Part 14: Strings

Welcome to tutorial no. 14 in the Golang tutorial series.

Strings deserve a special mention in Go because their implementation differs from that of many other languages.

What is a String?

A string in Go is a read-only slice of bytes. Strings can be created by enclosing their contents in double quotes " ". Let's look at a simple example that creates a string and prints it.

package main

import (  
    "fmt"
)

func main() {  
    name := "Hello World"
    fmt.Println(name)
}

The above program will output Hello World.

Strings in Go are Unicode compliant and UTF-8 encoded: Go source code is UTF-8, so a string literal holds the UTF-8 bytes of its text.
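As a quick check of this, here is a minimal sketch using utf8.ValidString from the standard unicode/utf8 package. The helper name isValidUTF8 is my own and exists only for illustration. It also shows that a string built from arbitrary bytes is not guaranteed to be valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidUTF8 (a hypothetical helper) reports whether s consists
// entirely of valid UTF-8 encoded code points.
func isValidUTF8(s string) bool {
	return utf8.ValidString(s)
}

func main() {
	literal := "Señor"          // a string literal holds UTF-8 bytes
	raw := string([]byte{0xff}) // the byte 0xff never occurs in valid UTF-8

	fmt.Println(isValidUTF8(literal)) // true
	fmt.Println(isValidUTF8(raw))     // false
}
```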

Accessing individual bytes of a string

Since a string is a slice of bytes, it's possible to access each byte of a string.

package main

import (  
    "fmt"
)

func printBytes(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

func main() {  
    name := "Hello World"
    printBytes(name)
}

In line no. 8 of the above program, len(s) returns the number of bytes in the string, and we use a for loop to print those bytes in hexadecimal notation. %x is the format specifier for hexadecimal. The above program outputs 48 65 6c 6c 6f 20 57 6f 72 6c 64. These are the UTF-8 encoded bytes of "Hello World". A basic understanding of Unicode and UTF-8 is needed to understand strings better. I recommend reading https://naveenr.net/unicode-character-set-and-utf-8-utf-16-utf-32-encoding/ to learn what Unicode and UTF-8 are.
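As a side note, the fmt package can produce the same hex dump without an explicit loop: the space flag in "% x" puts a space between the bytes of a string. The sketch below assumes nothing beyond the standard library, and the helper name hexBytes is my own.

```go
package main

import "fmt"

// hexBytes (a hypothetical helper) formats every byte of s as two
// hex digits, separated by single spaces.
func hexBytes(s string) string {
	return fmt.Sprintf("% x", s)
}

func main() {
	fmt.Println(hexBytes("Hello World")) // 48 65 6c 6c 6f 20 57 6f 72 6c 64
}
```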

Let's modify the above program a little bit to print the characters of the string.

package main

import (  
    "fmt"
)

func printBytes(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}


func printChars(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%c ", s[i])
    }
}

func main() {  
    name := "Hello World"
    printBytes(name)
    fmt.Printf("\n")
    printChars(name)
}

In line no. 16, inside the printChars function, the %c format specifier is used to print the characters of the string. The program outputs

48 65 6c 6c 6f 20 57 6f 72 6c 64  
H e l l o   W o r l d  

Although the above program looks like a legitimate way to access the individual characters of a string, it has a serious bug. Let's break this code to see what we are doing wrong.

package main

import (  
    "fmt"
)

func printBytes(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

func printChars(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%c ", s[i])
    }
}

func main() {  
    name := "Hello World"
    printBytes(name)
    fmt.Printf("\n")
    printChars(name)
    fmt.Printf("\n\n")
    name = "Señor"
    printBytes(name)
    fmt.Printf("\n")
    printChars(name)
}

The output of the above program is

48 65 6c 6c 6f 20 57 6f 72 6c 64  
H e l l o   W o r l d 

53 65 c3 b1 6f 72  
S e Ã ± o r  

In line no. 28 of the above program, we are trying to print the characters of Señor, and it outputs S e Ã ± o r, which is wrong. Why does this program break for Señor when it works perfectly well for Hello World? The reason is that the Unicode code point of ñ is U+00F1, and its UTF-8 encoding occupies 2 bytes, c3 and b1. We are printing characters on the assumption that each code point is one byte long, which is wrong: s[i] is a single byte, so %c prints c3 and b1 as two separate characters. In UTF-8 encoding a code point can occupy more than 1 byte. So how do we solve this? This is where rune saves us.
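To make the two-byte encoding concrete, here is a small sketch that encodes a single code point with utf8.EncodeRune from the standard unicode/utf8 package. The helper name encode is my own and is used only for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode (a hypothetical helper) returns the UTF-8 bytes of the
// code point r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is the longest possible encoding (4 bytes)
	n := utf8.EncodeRune(buf, r)     // n is the number of bytes actually written
	return buf[:n]
}

func main() {
	fmt.Printf("% x\n", encode('e')) // 65
	fmt.Printf("% x\n", encode('ñ')) // c3 b1
}
```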



rune

A rune is a builtin type in Go, and it's an alias for int32. A rune represents a Unicode code point in Go. No matter how many bytes the code point occupies in UTF-8, it can be represented by a single rune. Let's modify the above program to print characters using a rune.

package main

import (  
    "fmt"
)

func printBytes(s string) {  
    for i := 0; i < len(s); i++ {
        fmt.Printf("%x ", s[i])
    }
}

func printChars(s string) {  
    runes := []rune(s)
    for i := 0; i < len(runes); i++ {
        fmt.Printf("%c ", runes[i])
    }
}

func main() {  
    name := "Hello World"
    printBytes(name)
    fmt.Printf("\n")
    printChars(name)
    fmt.Printf("\n\n")
    name = "Señor"
    printBytes(name)
    fmt.Printf("\n")
    printChars(name)
}

In line no. 14 of the above program, the string is converted to a slice of runes. We then loop over it and display the characters. This program outputs

48 65 6c 6c 6f 20 57 6f 72 6c 64  
H e l l o   W o r l d 

53 65 c3 b1 6f 72  
S e ñ o r  

The above output is perfect. Just what we wanted :).
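To see what a rune really is, here is a minimal sketch that prints a rune's underlying type (%T), its code point (%U) and its numeric value (%d). The helper name describe is my own, not part of any library.

```go
package main

import "fmt"

// describe (a hypothetical helper) shows a rune's underlying type,
// its Unicode code point, and its numeric value.
func describe(r rune) string {
	return fmt.Sprintf("%T %U %d", r, r, r)
}

func main() {
	fmt.Println(describe('ñ')) // int32 U+00F1 241
}
```

Since rune is only an alias, %T reports int32 rather than rune.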

for range loop on a string

The above program is a perfectly valid way to iterate over the individual runes of a string. But Go offers us a much easier way to do this using the for range loop, which also avoids allocating a separate slice of runes.

package main

import (  
    "fmt"
)

func printCharsAndBytes(s string) {  
    for index, r := range s {
        fmt.Printf("%c starts at byte %d\n", r, index)
    }
}

func main() {  
    name := "Señor"
    printCharsAndBytes(name)
}

In line no. 8 of the above program, the string is iterated using a for range loop. On each iteration the loop yields the byte position where the rune starts, along with the rune itself. This program outputs

S starts at byte 0  
e starts at byte 1  
ñ starts at byte 2
o starts at byte 4  
r starts at byte 5  

From the above output it's clear that ñ occupies 2 bytes :).
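For the curious, the start positions reported by for range can be reproduced by hand with utf8.DecodeRuneInString, which decodes one rune and reports how many bytes it occupied. This is only a sketch of the idea, not how the compiler implements the loop, and the helper name runeStarts is my own.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (a hypothetical helper) returns the byte offset at which
// each rune of s begins.
func runeStarts(s string) []int {
	var starts []int
	for i := 0; i < len(s); {
		// size is the number of bytes the rune at offset i occupies.
		_, size := utf8.DecodeRuneInString(s[i:])
		starts = append(starts, i)
		i += size
	}
	return starts
}

func main() {
	fmt.Println(runeStarts("Señor")) // [0 1 2 4 5]
}
```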

Constructing string from slice of bytes

package main

import (  
    "fmt"
)

func main() {  
    byteSlice := []byte{0x43, 0x61, 0x66, 0xC3, 0xA9}
    str := string(byteSlice)
    fmt.Println(str)
}

byteSlice in the program above contains the UTF-8 encoded bytes of the string "Café", written in hexadecimal. The program outputs Café.

What if we have the decimal equivalents of the hex values? Will the above program still work? Let's check it out.

package main

import (  
    "fmt"
)

func main() {  
    byteSlice := []byte{67, 97, 102, 195, 169} // decimal equivalent of {'\x43', '\x61', '\x66', '\xC3', '\xA9'}
    str := string(byteSlice)
    fmt.Println(str)
}

The above program will also output Café.
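The conversion also works in the other direction. The sketch below converts a string to a byte slice and prints the same bytes in both hexadecimal and decimal. The helper name toBytes is my own.

```go
package main

import "fmt"

// toBytes (a hypothetical helper) returns a copy of the bytes that
// make up s.
func toBytes(s string) []byte {
	return []byte(s)
}

func main() {
	b := toBytes("Café")
	fmt.Printf("% x\n", b)           // 43 61 66 c3 a9
	fmt.Println(b)                   // [67 97 102 195 169]
	fmt.Println(string(b) == "Café") // true
}
```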

Constructing a string from slice of runes

package main

import (  
    "fmt"
)

func main() {  
    runeSlice := []rune{0x0053, 0x0065, 0x00f1, 0x006f, 0x0072}
    str := string(runeSlice)
    fmt.Println(str)
}

In the above program, runeSlice contains the Unicode code points of the string Señor in hexadecimal. The program outputs Señor.
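Going the other way, a string can be converted to a slice of runes to recover its code points. This is a minimal sketch that prints them in U+XXXX notation with the %U verb. The helper name codePoints is my own.

```go
package main

import "fmt"

// codePoints (a hypothetical helper) lists the Unicode code points
// of s in U+XXXX notation.
func codePoints(s string) string {
	return fmt.Sprintf("%U", []rune(s))
}

func main() {
	fmt.Println(codePoints("Señor")) // [U+0053 U+0065 U+00F1 U+006F U+0072]
}
```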

Length of the string

The RuneCountInString function of the unicode/utf8 package, func RuneCountInString(s string) (n int), is used to find the length of a string in characters. It takes a string as its argument and returns the number of runes in it. The builtin len(s), by contrast, returns the number of bytes.

package main

import (  
    "fmt"
    "unicode/utf8"
)



func length(s string) {
    fmt.Printf("length of %s is %d\n", s, utf8.RuneCountInString(s))
}

func main() {
    word1 := "Señor"
    length(word1)
    word2 := "Pets"
    length(word2)
}

The output of the above program is

length of Señor is 5  
length of Pets is 4  
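To see why RuneCountInString is needed, here is a short sketch comparing it with the builtin len, which counts bytes rather than runes. The helper name sizes is my own.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes (a hypothetical helper) returns the byte length and the rune
// count of s.
func sizes(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, w := range []string{"Señor", "Pets"} {
		b, r := sizes(w)
		fmt.Printf("%s: %d bytes, %d runes\n", w, b, r)
	}
	// Señor: 6 bytes, 5 runes
	// Pets: 4 bytes, 4 runes
}
```

The two counts differ only when the string contains characters that need more than one byte in UTF-8.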

Strings are immutable

Strings are immutable in Go. Once a string is created, it's not possible to change it.

package main

import (  
    "fmt"
)

func mutate(s string) string {
    s[0] = 'a' // any valid Unicode character within single quotes is a rune
    return s
}
func main() {  
    h := "hello"
    fmt.Println(mutate(h))
}

In line no. 8 of the above program, we try to change the first character of the string to 'a'. This is not allowed since strings are immutable, and hence the program fails to compile with the error main.go:8: cannot assign to s[0] (the exact wording varies between Go versions).

To work around this immutability, a string is converted to a slice of runes. That slice is then mutated with whatever changes are needed and converted back to a new string.

package main

import (  
    "fmt"
)

func mutate(s []rune) string {  
    s[0] = 'a' 
    return string(s)
}
func main() {  
    h := "hello"
    fmt.Println(mutate([]rune(h)))
}

In line no. 7 of the above program, the mutate function accepts a rune slice as its argument. It then changes the first element of the slice to 'a', converts the rune slice back to a string and returns it. This function is called from line no. 13 of the program. h is converted to a slice of runes and passed to mutate. This program outputs aello
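Note that the original string is never modified by this approach, because converting a string to a slice of runes makes a copy. Here is a minimal sketch of that point. The helper name replaceFirst is my own, and the empty-string check is an extra safeguard I have added.

```go
package main

import "fmt"

// replaceFirst (a hypothetical helper) returns a copy of s whose first
// rune is replaced by r. It returns s unchanged when s is empty.
func replaceFirst(s string, r rune) string {
	runes := []rune(s) // the conversion copies, so s itself is never touched
	if len(runes) == 0 {
		return s
	}
	runes[0] = r
	return string(runes)
}

func main() {
	h := "hello"
	fmt.Println(replaceFirst(h, 'a')) // aello
	fmt.Println(h)                    // hello
}
```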

I have created a single program on GitHub which includes everything we discussed. You can download it here.

That's it for strings. Have a great day.

Next tutorial - Pointers