Strings
Welcome to tutorial no. 14 in Golang tutorial series.
Strings deserve a special mention in Go as they are different in implementation when compared to other languages.
What is a String?
A string in Go is a read-only sequence of bytes that behaves much like a slice of bytes. Strings can be created by enclosing a set of characters inside double quotes " ".
Let’s look at a simple example that creates a string and prints it.
package main

import (
	"fmt"
)

func main() {
	name := "Hello World"
	fmt.Println(name)
}
The above program will print Hello World.
Strings in Go are Unicode compliant and are UTF-8 Encoded.
Accessing individual bytes of a string
Since a string is a slice of bytes, it’s possible to access each byte of a string.
1package main
2
3import (
4 "fmt"
5)
6
7func printBytes(s string) {
8 fmt.Printf("Bytes: ")
9 for i := 0; i < len(s); i++ {
10 fmt.Printf("%x ", s[i])
11 }
12}
13
14func main() {
15 name := "Hello World"
16 fmt.Printf("String: %s\n", name)
17 printBytes(name)
18}
%s is the format specifier to print a string. In line no. 16, the input string is printed. In line no. 9 of the program above, len(s) returns the number of bytes in the string and we use a for loop to print those bytes in hexadecimal notation. %x is the format specifier for hexadecimal. The above program outputs
String: Hello World
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
These are the Unicode UTF-8 encoded values of Hello World. A basic understanding of Unicode and UTF-8 is needed to understand strings better. I recommend reading https://naveenr.net/unicode-character-set-and-utf-8-utf-16-utf-32-encoding/ to know more about Unicode and UTF-8.
Accessing individual characters of a string
Let’s modify the above program a little bit to print the characters of the string.
1package main
2
3import (
4 "fmt"
5)
6
7func printBytes(s string) {
8 fmt.Printf("Bytes: ")
9 for i := 0; i < len(s); i++ {
10 fmt.Printf("%x ", s[i])
11 }
12}
13
14func printChars(s string) {
15 fmt.Printf("Characters: ")
16 for i := 0; i < len(s); i++ {
17 fmt.Printf("%c ", s[i])
18 }
19}
20
21func main() {
22 name := "Hello World"
23 fmt.Printf("String: %s\n", name)
24 printChars(name)
25 fmt.Printf("\n")
26 printBytes(name)
27}
In line no. 17 of the program above, the %c format specifier is used to print the characters of the string in the printChars function. The program prints
String: Hello World
Characters: H e l l o W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
Although the above program looks like a legitimate way to access the individual characters of a string, this has a serious bug. Let’s find out what that bug is.
1package main
2
3import (
4 "fmt"
5)
6
7func printBytes(s string) {
8 fmt.Printf("Bytes: ")
9 for i := 0; i < len(s); i++ {
10 fmt.Printf("%x ", s[i])
11 }
12}
13
14func printChars(s string) {
15 fmt.Printf("Characters: ")
16 for i := 0; i < len(s); i++ {
17 fmt.Printf("%c ", s[i])
18 }
19}
20
21func main() {
22 name := "Hello World"
23 fmt.Printf("String: %s\n", name)
24 printChars(name)
25 fmt.Printf("\n")
26 printBytes(name)
27 fmt.Printf("\n\n")
28 name = "Señor"
29 fmt.Printf("String: %s\n", name)
30 printChars(name)
31 fmt.Printf("\n")
32 printBytes(name)
33}
The output of the above program is
String: Hello World
Characters: H e l l o W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
String: Señor
Characters: S e Ã ± o r
Bytes: 53 65 c3 b1 6f 72
In line no. 30 of the program above, we try to print the characters of Señor, and it outputs S e Ã ± o r, which is wrong. Why does this program break for Señor when it works perfectly fine for Hello World? The reason is that the Unicode code point of ñ is U+00F1 and its UTF-8 encoding occupies 2 bytes, c3 and b1. We are printing characters on the assumption that each code point is one byte long, which is wrong. In UTF-8, a code point can occupy more than 1 byte. So how do we solve this? This is where rune saves us.
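To see the two-byte encoding of ñ directly, here is a minimal sketch that uses DecodeRuneInString from the standard unicode/utf8 package. The helper name runeAt is made up for this illustration and is not part of the original program.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeAt is a hypothetical helper: it decodes the rune that starts at
// byte offset i of s and reports how many bytes that rune occupies.
func runeAt(s string, i int) (rune, int) {
	return utf8.DecodeRuneInString(s[i:])
}

func main() {
	name := "Señor"
	r, size := runeAt(name, 2) // ñ starts at byte 2
	fmt.Printf("%c is U+%04X and occupies %d bytes\n", r, r, size)
}
```

Indexing with s[i] would only ever hand us one of those two bytes, which is why the byte-by-byte loop prints two wrong characters instead of ñ.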
Rune
A rune is a builtin type in Go and is an alias of int32. A rune represents a Unicode code point. No matter how many bytes the code point occupies in UTF-8, it can be represented by a single rune. Let’s modify the above program to print characters using runes.
1package main
2
3import (
4 "fmt"
5)
6
7func printBytes(s string) {
8 fmt.Printf("Bytes: ")
9 for i := 0; i < len(s); i++ {
10 fmt.Printf("%x ", s[i])
11 }
12}
13
14func printChars(s string) {
15 fmt.Printf("Characters: ")
16 runes := []rune(s)
17 for i := 0; i < len(runes); i++ {
18 fmt.Printf("%c ", runes[i])
19 }
20}
21
22func main() {
23 name := "Hello World"
24 fmt.Printf("String: %s\n", name)
25 printChars(name)
26 fmt.Printf("\n")
27 printBytes(name)
28 fmt.Printf("\n\n")
29 name = "Señor"
30 fmt.Printf("String: %s\n", name)
31 printChars(name)
32 fmt.Printf("\n")
33 printBytes(name)
34}
In line no. 16 of the program above, the string is converted to a slice of runes. We then loop over it and display the characters. This program prints,
String: Hello World
Characters: H e l l o W o r l d
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64
String: Señor
Characters: S e ñ o r
Bytes: 53 65 c3 b1 6f 72
The above output is perfect. Just what we wanted 😀.
Accessing individual runes using for range loop
The above program is a perfect way to iterate over the individual runes of a string. But Go offers us a much easier way to do this using the for range loop.
1package main
2
3import (
4 "fmt"
5)
6
7func charsAndBytePosition(s string) {
8 for index, r := range s {
9 fmt.Printf("%c starts at byte %d\n", r, index)
10 }
11}
12
13func main() {
14 name := "Señor"
15 charsAndBytePosition(name)
16}
In line no. 8 of the program above, the string is iterated using a for range loop. On each iteration, the loop returns the byte position where a rune starts along with the rune itself. This program outputs
S starts at byte 0
e starts at byte 1
ñ starts at byte 2
o starts at byte 4
r starts at byte 5
From the above output, it’s clear that ñ occupies 2 bytes since the next character o starts at byte 4 instead of byte 3 😀.
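The same loop can also be used to collect the starting byte offset of every rune. The sketch below does that with a hypothetical helper named runeOffsets. It also shows what for range does with a byte that is not valid UTF-8: it yields the replacement character U+FFFD and advances by one byte.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that returns the byte offset at
// which each rune of s starts.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Señor")) // [0 1 2 4 5]

	// "\xff" is not valid UTF-8, so range yields U+FFFD for that byte.
	for i, r := range "a\xffb" {
		fmt.Printf("%d: %U\n", i, r)
	}
}
```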
Creating a string from a slice of bytes
1package main
2
3import (
4 "fmt"
5)
6
7func main() {
8 byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
9 str := string(byteSlice)
10 fmt.Println(str)
11}
byteSlice in line no. 8 of the program above contains the UTF-8 encoded hex bytes of the string Café. The program prints
Café
What if we have the decimal equivalents of the hex values? Will the above program still work? Let’s check it out.
package main

import (
	"fmt"
)

func main() {
	// decimal equivalent of {'\x43', '\x61', '\x66', '\xC3', '\xA9'}
	byteSlice := []byte{67, 97, 102, 195, 169}
	str := string(byteSlice)
	fmt.Println(str)
}
Decimal values also work and the above program will also print Café.
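The hex and decimal literals describe the same byte values, so both slices produce the same string. Here is a minimal sketch that checks this with bytes.Equal and prints a string's bytes back as hex. The helper hexDump is a hypothetical name used only for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

// hexDump is a hypothetical helper that returns the bytes of s as
// space-separated hexadecimal values.
func hexDump(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	hexBytes := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
	decBytes := []byte{67, 97, 102, 195, 169}

	fmt.Println(bytes.Equal(hexBytes, decBytes)) // true
	fmt.Println(hexDump(string(decBytes)))       // 43 61 66 c3 a9
}
```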
Creating a string from a slice of runes
package main

import (
	"fmt"
)

func main() {
	runeSlice := []rune{0x0053, 0x0065, 0x00f1, 0x006f, 0x0072}
	str := string(runeSlice)
	fmt.Println(str)
}
In the above program, runeSlice contains the Unicode code points of the string Señor in hexadecimal. The program outputs
Señor
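Going the other way, we can list the code points of an existing string. The sketch below uses the %U format verb for this. The helper codePoints is a hypothetical name chosen for the example.

```go
package main

import "fmt"

// codePoints is a hypothetical helper that formats every rune of s in
// the U+XXXX notation.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	fmt.Println(codePoints("Señor")) // [U+0053 U+0065 U+00F1 U+006F U+0072]
}
```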
String length
The RuneCountInString(s string) (n int) function of the utf8 package can be used to find the length of a string. This function takes a string as an argument and returns the number of runes in it.
As we discussed earlier, len(s) returns the number of bytes in the string, not the number of characters. Some Unicode characters are encoded in UTF-8 using more than 1 byte, so using len on strings that contain them returns an incorrect string length.
package main

import (
	"fmt"
	"unicode/utf8"
)

func main() {
	word1 := "Señor"
	fmt.Printf("String: %s\n", word1)
	fmt.Printf("Length: %d\n", utf8.RuneCountInString(word1))
	fmt.Printf("Number of bytes: %d\n", len(word1))

	fmt.Printf("\n")
	word2 := "Pets"
	fmt.Printf("String: %s\n", word2)
	fmt.Printf("Length: %d\n", utf8.RuneCountInString(word2))
	fmt.Printf("Number of bytes: %d\n", len(word2))
}
The output of the above program is
String: Señor
Length: 5
Number of bytes: 6
String: Pets
Length: 4
Number of bytes: 4
The above output confirms that len(s) and RuneCountInString(s) return different values for Señor 😀.
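One caveat worth knowing: RuneCountInString counts code points, and a character that looks like a single letter on screen can be made of more than one code point. The sketch below compares é written as one code point with e followed by a combining accent. The helper describe is a hypothetical name used for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe is a hypothetical helper that reports the byte count and
// the rune count of s.
func describe(s string) (byteCount, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	precomposed := "\u00e9" // é as a single code point
	combined := "e\u0301"   // e followed by a combining acute accent

	for _, s := range []string{precomposed, combined} {
		b, r := describe(s)
		fmt.Printf("%d bytes, %d runes\n", b, r)
	}
}
```

The first string has 2 bytes and 1 rune, while the second has 3 bytes and 2 runes, even though both are usually displayed as é.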
String comparison
The == operator is used to compare two strings for equality. If both the strings are equal, then the result is true, else it’s false.
1package main
2
3import (
4 "fmt"
5)
6
7func compareStrings(str1 string, str2 string) {
8 if str1 == str2 {
9 fmt.Printf("%s and %s are equal\n", str1, str2)
10 return
11 }
12 fmt.Printf("%s and %s are not equal\n", str1, str2)
13}
14
15func main() {
16 string1 := "Go"
17 string2 := "Go"
18 compareStrings(string1, string2)
19
20 string3 := "hello"
21 string4 := "world"
22 compareStrings(string3, string4)
23
24}
In the compareStrings function above, line no. 8 checks whether the two strings str1 and str2 are equal using the == operator. If they are equal, it prints a corresponding message and the function returns.
The above program prints,
Go and Go are equal
hello and world are not equal
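The == operator is case sensitive and compares the strings byte by byte. As a small sketch beyond the program above, the standard strings package offers EqualFold for a case-insensitive comparison, and the < operator orders strings by their byte values. The wrapper equalIgnoreCase is a hypothetical name added for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// equalIgnoreCase is a hypothetical wrapper around strings.EqualFold,
// which compares two strings while ignoring letter case.
func equalIgnoreCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println("Go" == "go")                // false
	fmt.Println(equalIgnoreCase("Go", "go")) // true
	fmt.Println("apple" < "banana")          // true
	fmt.Println("Zebra" < "apple")           // true: 'Z' (0x5a) sorts before 'a' (0x61)
}
```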
String concatenation
There are multiple ways to perform string concatenation in Go. Let’s look at a couple of them.
The simplest way to perform string concatenation is using the + operator.
1package main
2
3import (
4 "fmt"
5)
6
7func main() {
8 string1 := "Go"
9 string2 := "is awesome"
10 result := string1 + " " + string2
11 fmt.Println(result)
12}
In line no. 10 of the program above, string1 is concatenated with string2 with a space in the middle. This program prints,
Go is awesome
The second way to concatenate strings is using the Sprintf function of the fmt package.
The Sprintf function formats a string according to the input format specifier and returns the resulting string. Let’s rewrite the above program using the Sprintf function.
1package main
2
3import (
4 "fmt"
5)
6
7func main() {
8 string1 := "Go"
9 string2 := "is awesome"
10 result := fmt.Sprintf("%s %s", string1, string2)
11 fmt.Println(result)
12}
In line no. 10 of the program above, %s %s is the format string passed to Sprintf. It takes two strings as input and places a space between them, so the two strings are concatenated with a space in the middle. The resulting string is stored in result. This program also prints,
Go is awesome
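When many pieces need to be joined, for example inside a loop, the standard strings package offers two more options: strings.Builder and strings.Join. The sketch below is only an illustration, and joinWords is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords is a hypothetical helper that builds a space-separated
// sentence from words using strings.Builder.
func joinWords(words []string) string {
	var b strings.Builder
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ') // separator before every word except the first
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	words := []string{"Go", "is", "awesome"}
	fmt.Println(joinWords(words))         // Go is awesome
	fmt.Println(strings.Join(words, " ")) // Go is awesome
}
```

Since strings are immutable, every + creates a new string. A Builder appends to an internal buffer instead, which usually avoids those intermediate copies.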
Strings are immutable
Strings are immutable in Go. Once a string is created it’s not possible to change it.
1package main
2
3import (
4 "fmt"
5)
6
7func mutate(s string)string {
8 s[0] = 'a'//any valid unicode character within single quote is a rune
9 return s
10}
11func main() {
12 h := "hello"
13 fmt.Println(mutate(h))
14}
In line no. 8 of the above program, we try to change the first character of the string to 'a'. Any valid Unicode character within single quotes is a rune. We try to assign the rune a to the zeroth position of the string. This is not allowed since the string is immutable and hence the program fails to compile with the error ./prog.go:8:7: cannot assign to s[0].
To work around this string immutability, strings are converted to a slice of runes. Then that slice is mutated with whatever changes are needed and converted back to a new string.
1package main
2
3import (
4 "fmt"
5)
6
7func mutate(s []rune) string {
8 s[0] = 'a'
9 return string(s)
10}
11func main() {
12 h := "hello"
13 fmt.Println(mutate([]rune(h)))
14}
In line no. 7 of the above program, the mutate function accepts a rune slice as an argument. It then changes the first element of the slice to 'a', converts the rune slice back to a string and returns it. This function is called from line no. 13 of the program, where h is converted to a slice of runes and passed to mutate. This program outputs aello.
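Converting a string to a rune slice makes a copy, so the original string is left untouched. The sketch below shows this and also replaces a multi-byte character. The helper replaceRune is a hypothetical name and not part of the program above.

```go
package main

import "fmt"

// replaceRune is a hypothetical helper: it returns a copy of s in which
// the rune at position i (counted in runes, not bytes) is replaced by r.
// If i is out of range, s is returned unchanged.
func replaceRune(s string, i int, r rune) string {
	runes := []rune(s) // the conversion copies the contents of s
	if i < 0 || i >= len(runes) {
		return s
	}
	runes[i] = r
	return string(runes)
}

func main() {
	original := "Señor"
	changed := replaceRune(original, 2, 'n')
	fmt.Println(original) // Señor
	fmt.Println(changed)  // Senor
}
```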
I have created a single program in GitHub which includes everything we discussed. You can download it here.
That’s it for strings. Have a great day.
Next tutorial - Pointers