Strings

Welcome to tutorial no. 14 in the Golang tutorial series.

Strings deserve a special mention in Go because their implementation differs from that of many other languages.

What is a String?

A string in Go is a read-only slice of bytes. Strings can be created by enclosing a set of characters inside double quotes " ".

Let’s look at a simple example that creates a string and prints it.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	name := "Hello World"
 9	fmt.Println(name)
10}

The above program will print Hello World.

String literals in Go are Unicode compliant and UTF-8 encoded, because Go source code itself is UTF-8.

Accessing individual bytes of a string

Since a string is a read-only slice of bytes, it’s possible to access each byte of a string by its index.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func printBytes(s string) {
 8	fmt.Printf("Bytes: ")
 9	for i := 0; i < len(s); i++ {
10		fmt.Printf("%x ", s[i])
11	}
12}
13
14func main() {
15	name := "Hello World"
16	fmt.Printf("String: %s\n", name)
17	printBytes(name)
18}

%s is the format specifier to print a string. In line no. 16, the input string is printed. In line no. 9 of the program above, len(s) returns the number of bytes in the string and we use a for loop to print those bytes in hexadecimal notation. %x is the format specifier for hexadecimal. The above program outputs

String: Hello World
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64  

These are the UTF-8 encoded values of the characters in Hello World. A basic understanding of Unicode and UTF-8 is needed to understand strings better. I recommend reading https://naveenr.net/unicode-character-set-and-utf-8-utf-16-utf-32-encoding/ to know more about Unicode and UTF-8.

Accessing individual characters of a string

Let’s modify the above program a little bit to print the characters of the string.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func printBytes(s string) {
 8	fmt.Printf("Bytes: ")
 9	for i := 0; i < len(s); i++ {
10		fmt.Printf("%x ", s[i])
11	}
12}
13
14func printChars(s string) {
15	fmt.Printf("Characters: ")
16	for i := 0; i < len(s); i++ {
17		fmt.Printf("%c ", s[i])
18	}
19}
20
21func main() {
22	name := "Hello World"
23	fmt.Printf("String: %s\n", name)
24	printChars(name)
25	fmt.Printf("\n")
26	printBytes(name)
27}

In line no. 17 of the program above, the %c format specifier is used to print the characters of the string in the printChars function. The program prints

String: Hello World
Characters: H e l l o   W o r l d 
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64 

Although the above program looks like a legitimate way to access the individual characters of a string, this has a serious bug. Let’s find out what that bug is.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func printBytes(s string) {
 8	fmt.Printf("Bytes: ")
 9	for i := 0; i < len(s); i++ {
10		fmt.Printf("%x ", s[i])
11	}
12}
13
14func printChars(s string) {
15	fmt.Printf("Characters: ")
16	for i := 0; i < len(s); i++ {
17		fmt.Printf("%c ", s[i])
18	}
19}
20
21func main() {
22	name := "Hello World"
23	fmt.Printf("String: %s\n", name)
24	printChars(name)
25	fmt.Printf("\n")
26	printBytes(name)
27	fmt.Printf("\n\n")
28	name = "Señor"
29	fmt.Printf("String: %s\n", name)
30	printChars(name)
31	fmt.Printf("\n")
32	printBytes(name)
33}

The output of the above program is

String: Hello World
Characters: H e l l o   W o r l d 
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64 

String: Señor
Characters: S e à ± o r 
Bytes: 53 65 c3 b1 6f 72 

In line no. 30 of the program above, we try to print the characters of Señor and the output is S e Ã ± o r, which is wrong. Why does this program break for Señor when it works perfectly fine for Hello World? The reason is that the Unicode code point of ñ is U+00F1 and its UTF-8 encoding occupies 2 bytes, c3 and b1. We are printing characters on the assumption that each code point is one byte long, which is wrong: %c treats each byte as a code point of its own, so c3 prints as Ã and b1 prints as ±. In UTF-8 encoding a code point can occupy more than 1 byte. So how do we solve this? This is where rune saves us.

Rune

A rune is a builtin type in Go and is an alias of int32. A rune represents a Unicode code point. No matter how many bytes the UTF-8 encoding of a code point occupies, the code point can be represented by a single rune. Let’s modify the above program to print characters using a rune.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func printBytes(s string) {
 8	fmt.Printf("Bytes: ")
 9	for i := 0; i < len(s); i++ {
10		fmt.Printf("%x ", s[i])
11	}
12}
13
14func printChars(s string) {
15	fmt.Printf("Characters: ")
16	runes := []rune(s)
17	for i := 0; i < len(runes); i++ {
18		fmt.Printf("%c ", runes[i])
19	}
20}
21
22func main() {
23	name := "Hello World"
24	fmt.Printf("String: %s\n", name)
25	printChars(name)
26	fmt.Printf("\n")
27	printBytes(name)
28	fmt.Printf("\n\n")
29	name = "Señor"
30	fmt.Printf("String: %s\n", name)
31	printChars(name)
32	fmt.Printf("\n")
33	printBytes(name)
34}

In line no. 16 of the program above, the string is converted to a slice of runes. We then loop over it and display the characters. This program prints,

String: Hello World
Characters: H e l l o   W o r l d 
Bytes: 48 65 6c 6c 6f 20 57 6f 72 6c 64 

String: Señor
Characters: S e ñ o r 
Bytes: 53 65 c3 b1 6f 72 

The above output is perfect. Just what we wanted 😀.
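
To make the statement that a rune is an alias of int32 more concrete, here is a minimal sketch. The helper name describeRune is hypothetical and not part of the programs above; it prints the character, its code point, its numeric value, its type, and the number of bytes its UTF-8 encoding occupies (using utf8.RuneLen from the standard library).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describeRune is a hypothetical helper used only for illustration.
// It returns the character, its code point (%U), its numeric value,
// its type (%T), and the width of its UTF-8 encoding in bytes.
func describeRune(r rune) string {
	return fmt.Sprintf("%c %U %d %T %d", r, r, r, r, utf8.RuneLen(r))
}

func main() {
	fmt.Println(describeRune('S')) // S U+0053 83 int32 1
	fmt.Println(describeRune('ñ')) // ñ U+00F1 241 int32 2
}
```

The %T verb reports int32 rather than rune because rune is only another name for the same type.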

Accessing individual runes using for range loop

The above program is a perfect way to iterate over the individual runes of a string. But Go offers us a much easier way to do this using the for range loop.

 1package main
 2
 3import (  
 4    "fmt"
 5)
 6
 7func charsAndBytePosition(s string) {
 8	for index, r := range s { // named r so the builtin type rune is not shadowed
 9		fmt.Printf("%c starts at byte %d\n", r, index)
10	}
11}
12
13func main() {  
14    name := "Señor"
15    charsAndBytePosition(name)
16}

In line no. 8 of the program above, the string is iterated using a for range loop. On each iteration, the loop yields the byte position where a rune starts along with the rune itself. This program outputs

S starts at byte 0
e starts at byte 1
ñ starts at byte 2
o starts at byte 4
r starts at byte 5

From the above output, it’s clear that ñ occupies 2 bytes since the next character o starts at byte 4 instead of byte 3 😀.
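
The same reasoning can be put into code. The sketch below is only an illustration, and the function name runeWidths is made up for it. It derives the width of every character from the gap between consecutive starting byte positions reported by for range.

```go
package main

import "fmt"

// runeWidths is a hypothetical helper. It returns the number of bytes each
// character of s occupies, computed from the distance between the starting
// byte positions that for range reports.
func runeWidths(s string) []int {
	widths := make([]int, 0, len(s))
	prev := -1 // start position of the previous rune, -1 before the first one
	for i := range s {
		if prev >= 0 {
			widths = append(widths, i-prev)
		}
		prev = i
	}
	if prev >= 0 {
		// The last rune runs until the end of the string.
		widths = append(widths, len(s)-prev)
	}
	return widths
}

func main() {
	fmt.Println(runeWidths("Señor")) // [1 1 2 1 1]
}
```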

Creating a string from a slice of bytes

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
 9	str := string(byteSlice)
10	fmt.Println(str)
11}

byteSlice in line no. 8 of the program above contains the UTF-8 Encoded hex bytes of the string Café. The program prints

Café

What if we have the decimal equivalents of the hex values? Will the above program still work? Let’s check it out.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	byteSlice := []byte{67, 97, 102, 195, 169}//decimal equivalent of {'\x43', '\x61', '\x66', '\xC3', '\xA9'}
 9	str := string(byteSlice)
10	fmt.Println(str)
11}

Decimal values also work and the above program will also print Café.

Creating a string from a slice of runes

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	runeSlice := []rune{0x0053, 0x0065, 0x00f1, 0x006f, 0x0072}
 9	str := string(runeSlice)
10	fmt.Println(str)
11}

In the above program runeSlice contains the Unicode code points of the string Señor in hexadecimal. The program outputs

Señor
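
The conversions also work in the opposite direction. As a small sketch (the helper roundTrips is hypothetical), the program below converts strings to byte and rune slices and back again. Note that the rune round trip is only guaranteed to return the original string when the string holds valid UTF-8, which is the case for both examples here.

```go
package main

import "fmt"

// roundTrips is a hypothetical helper. It reports whether s is unchanged
// after converting it to a byte slice and back, and to a rune slice and back.
func roundTrips(s string) bool {
	return string([]byte(s)) == s && string([]rune(s)) == s
}

func main() {
	fmt.Println([]byte("Café"))  // [67 97 102 195 169]
	fmt.Println([]rune("Señor")) // [83 101 241 111 114]
	fmt.Println(roundTrips("Café"), roundTrips("Señor")) // true true
}
```

The byte values printed for Café are the same decimal values used in the byteSlice example, and the rune values printed for Señor are the decimal form of the code points in runeSlice.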

String length

The RuneCountInString(s string) (n int) function of the utf8 package can be used to find the length of the string. This function takes a string as an argument and returns the number of runes in it.

As we discussed earlier, len(s) returns the number of bytes in the string, not the number of characters. Since the UTF-8 encoding of some code points occupies more than 1 byte, using len on strings that contain such code points gives a value larger than the number of runes.

 1package main
 2
 3import (
 4	"fmt"
 5	"unicode/utf8"
 6)
 7
 8func main() {
 9	word1 := "Señor"
10	fmt.Printf("String: %s\n", word1)
11	fmt.Printf("Length: %d\n", utf8.RuneCountInString(word1))
12	fmt.Printf("Number of bytes: %d\n", len(word1))
13
14	fmt.Printf("\n")
15	word2 := "Pets"
16	fmt.Printf("String: %s\n", word2)
17	fmt.Printf("Length: %d\n", utf8.RuneCountInString(word2))
18	fmt.Printf("Number of bytes: %d\n", len(word2))
19}

The output of the above program is

String: Señor
Length: 5
Number of bytes: 6

String: Pets
Length: 4
Number of bytes: 4

The above output confirms that len(s) and RuneCountInString(s) can return different values for the same string 😀.

String comparison

The == operator is used to compare two strings for equality. If the two strings are equal, the result is true; otherwise it is false.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func compareStrings(str1 string, str2 string) {
 8	if str1 == str2 {
 9		fmt.Printf("%s and %s are equal\n", str1, str2)
10		return
11	}
12	fmt.Printf("%s and %s are not equal\n", str1, str2)
13}
14
15func main() {
16	string1 := "Go"
17	string2 := "Go"
18	compareStrings(string1, string2)
19	
20	string3 := "hello"
21	string4 := "world"
22	compareStrings(string3, string4)
23
24}

In the compareStrings function above, line no. 8 compares whether the two strings str1 and str2 are equal using the == operator. If they are equal, it prints a corresponding message and the function returns.

The above program prints,

Go and Go are equal
hello and world are not equal
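
One consequence of == is worth a short illustration: the comparison is exact, so strings that differ only in letter case are not equal. The sketch below assumes that a case-insensitive comparison is wanted and uses strings.EqualFold from the standard library for it. The wrapper name equalIgnoringCase is made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// equalIgnoringCase is a hypothetical wrapper around strings.EqualFold,
// which compares two strings under Unicode case folding.
func equalIgnoringCase(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	string1 := "Go"
	string2 := "go"
	fmt.Println(string1 == string2)                  // false
	fmt.Println(equalIgnoringCase(string1, string2)) // true
	fmt.Println(equalIgnoringCase("hello", "world")) // false
}
```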

String concatenation

There are multiple ways to perform string concatenation in Go. Let’s look at a couple of them.

The simplest way to perform string concatenation is using the + operator.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	string1 := "Go"
 9	string2 := "is awesome"
10	result := string1 + " " + string2
11	fmt.Println(result)
12}

In line no. 10 of the program above, string1 is concatenated with string2 with a space in the middle. This program prints,

Go is awesome

The second way to concatenate strings is using the Sprintf function of the fmt package.

The Sprintf function formats a string according to the input format specifier and returns the resulting string. Let’s rewrite the above program using Sprintf function.

 1package main
 2
 3import (
 4	"fmt"
 5)
 6
 7func main() {
 8	string1 := "Go"
 9	string2 := "is awesome"
10	result := fmt.Sprintf("%s %s", string1, string2)
11	fmt.Println(result)
12}

In line no. 10 of the program above, %s %s is the format string passed to Sprintf. It contains two %s specifiers separated by a space, so it takes two strings as input and concatenates them with a space in the middle. The resulting string is stored in result. This program also prints,

Go is awesome
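
Beyond the two ways shown above, the standard library also provides strings.Builder, which can be handy when many pieces are joined in a loop. The following is a minimal sketch rather than a recommendation, and the function name joinWithSpaces is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithSpaces is a hypothetical helper. It appends each word to a
// strings.Builder, separating the words with a single space.
func joinWithSpaces(words ...string) string {
	var b strings.Builder
	for i, w := range words {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	string1 := "Go"
	string2 := "is awesome"
	fmt.Println(joinWithSpaces(string1, string2)) // Go is awesome
}
```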

Strings are immutable

Strings are immutable in Go. Once a string is created it’s not possible to change it.

 1package main
 2
 3import (  
 4    "fmt"
 5)
 6
 7func mutate(s string)string {
 8	s[0] = 'a'//any valid unicode character within single quote is a rune 
 9	return s
10}
11func main() {  
12    h := "hello"
13    fmt.Println(mutate(h))
14}

In line no. 8 of the above program, we try to change the first character of the string to 'a'. Any valid Unicode character within single quotes is a rune. We try to assign the rune a to the zeroth position of the string. This is not allowed since the string is immutable and hence the program fails to compile with error ./prog.go:8:7: cannot assign to s[0]

To work around string immutability, the string is converted to a slice of runes. That slice is then mutated with whatever changes are needed and converted back to a new string.

 1package main
 2
 3import (  
 4    "fmt"
 5)
 6
 7func mutate(s []rune) string {
 8	s[0] = 'a' 
 9	return string(s)
10}
11func main() {  
12    h := "hello"
13    fmt.Println(mutate([]rune(h)))
14}

In line no. 7 of the above program, the mutate function accepts a rune slice as an argument. It then changes the first element of the slice to 'a', converts the rune slice back to a string and returns it. This function is called from line no. 13 of the program, where h is converted to a slice of runes and passed to mutate. This program outputs aello
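
The rune slice matters as soon as the string contains characters that occupy more than one byte. The sketch below generalizes the idea. The helper replaceRuneAt is hypothetical, and its choice to return the input unchanged for an out-of-range index is an assumption made only for this example.

```go
package main

import "fmt"

// replaceRuneAt is a hypothetical helper. It returns a new string in which
// the character at rune index i is replaced by r. The original string is
// never modified. An out-of-range index returns s unchanged (an assumption
// made for this sketch).
func replaceRuneAt(s string, i int, r rune) string {
	runes := []rune(s)
	if i < 0 || i >= len(runes) {
		return s
	}
	runes[i] = r
	return string(runes)
}

func main() {
	name := "Señor"
	fmt.Println(replaceRuneAt(name, 2, 'n'))    // Senor
	fmt.Println(replaceRuneAt("hello", 0, 'a')) // aello
	fmt.Println(name)                           // Señor
}
```

Because the index counts runes and not bytes, index 2 refers to ñ as a whole even though it is stored in two bytes.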

I have created a single program in GitHub which includes everything we discussed. You can download it here.

That’s it for strings. Have a great day.

Next tutorial - Pointers