TL;DR
Memory allocation happens immediately when converting between a string and a slice. To avoid allocating unnecessary memory, the table below shows which form to use for each way of accessing a UTF-8 encoded (or plain ASCII) string:
string content | random read | sequential read | write |
---|---|---|---|
ascii | string | string | []byte |
utf-8 | []rune | range | []rune |
Random read
string to rune slice
To randomly access characters in a UTF-8 encoded string, it's common to convert the string to a rune slice.
str := "世界"
letters := []rune(str)
When we do letters[1], does it decode the UTF-8 character on the fly?
The answer is no. UTF-8 is a variable-length encoding, so there is no way to know the byte offset of a given character without decoding every character before it. A rune slice is not index-wise mapped to the underlying bytes of the string. That is to say, during construction all bytes of the string are read and decoded, which makes the construction quite expensive.
What's more, COW (copy-on-write) for a rune slice is unnecessary. If you construct a rune slice from a string and never read or write it, the optimizer can eliminate that construction entirely.
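To make the cost concrete, here is a minimal sketch (not from the original post) comparing byte indexing on the string with rune indexing on the converted slice:

package main

import "fmt"

func main() {
    str := "世界"          // 6 bytes: each character takes 3 bytes in UTF-8
    letters := []rune(str) // decodes the whole string up front into 2 runes

    fmt.Println(len(str), len(letters)) // 6 2
    fmt.Printf("%c\n", letters[1])      // 界 (a decoded character)
    fmt.Printf("%x\n", str[1])          // b8 (a raw continuation byte, not a character)
}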
From assembly code like this, we can see that Go calls runtime.stringtoslicerune, which hides the details of the memory allocation. Therefore, I turn to the Go benchmark tool.
The benchmark setup is below:
import "testing"
var str string
func init() {
// 1024B
tmp := make([]byte, 1024)
str = string(tmp)
}
The template string is 1024 ASCII bytes, i.e. 1 KiB in total. As we will see soon, this makes it easy to figure out how much memory is allocated in each benchmark.
func BenchmarkStrToBytes(b *testing.B) {
    ll := len(str)
    for ii := 0; ii < b.N; ii++ {
        newBytes := []byte(str)
        newBytes[ll-1] = 10
    }
}
func BenchmarkStrToRunes(b *testing.B) {
    // str is pure ASCII, so len(str) equals the number of runes
    ll := len(str)
    for ii := 0; ii < b.N; ii++ {
        newRunes := []rune(str)
        newRunes[ll-1] = 10
    }
}
In the above two benchmarks, I do an extra assignment to make sure memory is actually allocated if there is any COW. Let's run them with go test -benchmem -bench=. :
BenchmarkStrToBytes-12 8774433 131 ns/op 1024 B/op 1 allocs/op
BenchmarkStrToRunes-12 624459 1658 ns/op 4096 B/op 1 allocs/op
As we know, rune is an alias for int32, which is 4 bytes. As a result, the rune slice takes 4096 B, which is indeed 4x the size of the corresponding ASCII byte slice.
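A quick way to confirm the element sizes, as a minimal sketch separate from the benchmark code:

package main

import (
    "fmt"
    "unsafe"
)

func main() {
    // rune is an alias for int32, byte is an alias for uint8
    fmt.Println(unsafe.Sizeof(rune(0))) // 4
    fmt.Println(unsafe.Sizeof(byte(0))) // 1
}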
A rune slice is only meant for random reads or writes of UTF-8 strings. Using it for a sequential read is memory inefficient, as shown above. To sequentially read a UTF-8 encoded string, see the Sequential read section below.
string to byte slice
Converting a string to a byte slice just for reading is unnecessary. The index operator on the original string is sufficient for random reads, and range works for sequential reads.
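For instance, a minimal sketch of reading a string's content without any conversion (the string literal is just illustrative):

package main

import "fmt"

func main() {
    str := "hello"

    // random read: the index operator returns a byte, no allocation
    fmt.Println(str[1]) // 101 ('e')

    // sequential read of raw bytes: a plain index loop, no allocation
    for ii := 0; ii < len(str); ii++ {
        _ = str[ii]
    }
}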
What I want to discuss here is whether there is COW for byte slices. Unlike a rune slice, the elements of a byte slice map index-wise to the bytes of the string, so COW is possible in principle.
Let's see whether Go provides any COW by constructing a byte slice from a string and only reading it, without any writes.
func BenchmarkStrToBytesCow(b *testing.B) {
    tmp := func(str string) []byte {
        newBytes := []byte(str)
        return newBytes
    }
    for ii := 0; ii < b.N; ii++ {
        tmp(str)
    }
}
The benchmark result shows that allocation happens even though there is no write to the byte slice constructed from the string.
BenchmarkStrToBytesCow-12 9071371 131 ns/op 1024 B/op 1 allocs/op
That is to say, you should only convert a string to a byte slice when you intend to write to it; even if you never write, new memory is allocated anyway.
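Since the conversion always copies, mutating the resulting slice never touches the original string. A minimal sketch (not from the original post) to illustrate:

package main

import "fmt"

func main() {
    str := "hello"
    newBytes := []byte(str) // allocates and copies the string's bytes

    newBytes[0] = 'H'             // mutate the copy
    fmt.Println(str)              // hello (unchanged)
    fmt.Println(string(newBytes)) // Hello
}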
string from slice
Whenever you convert a slice to a string, new memory is allocated immediately. The reasoning is simple: strings are immutable, while slice contents can be mutated. Therefore, the string needs its own copy of the content in case the underlying memory pointed to by the byte slice is modified afterward.
// simple benchmark
func BenchmarkBytesToStr(b *testing.B) {
    tmp := func(bytes []byte) byte {
        newStr := string(bytes)
        return newStr[len(newStr)-1]
    }
    newBytes := []byte(str)
    for ii := 0; ii < b.N; ii++ {
        tmp(newBytes)
    }
}
The output for this benchmark test:
BenchmarkBytesToStr-12 9468996 125 ns/op 1024 B/op 1 allocs/op
As shown above, there is one allocation for each construction of a string from a byte slice. No COW is observed, since it would be troublesome for the runtime to monitor the memory pointed to by the byte slice.
Sequential read
You can also construct a byte or rune slice to iterate over the content of a string, but that is inefficient: the extra memory allocation drags down performance.
The better ways:
- use the original string to read bytes, with either range or the index operator;
- use range for a UTF-8 string, which decodes runes on the fly by calling runtime.decoderune, as sketched below.
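As a sketch of the second approach (using a small multi-byte string, separate from the benchmark code), range yields the starting byte index and the decoded rune for each character:

package main

import "fmt"

func main() {
    str := "世界" // 6 bytes, 2 characters

    // range decodes each UTF-8 character on the fly; no []rune allocation
    for idx, ch := range str {
        fmt.Printf("byte index %d: %c\n", idx, ch)
    }
    // byte index 0: 世
    // byte index 3: 界
}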
func BenchmarkStrToRuneRange(b *testing.B) {
    tmp := func(str string) rune {
        var tmp rune
        for _, ch := range str {
            tmp = ch
        }
        return tmp
    }
    for ii := 0; ii < b.N; ii++ {
        tmp(str)
    }
}
Benchmark output:
BenchmarkStrToRunes-12 624459 1658 ns/op 4096 B/op 1 allocs/op
BenchmarkStrToRuneRange-12 1660866 723 ns/op 0 B/op 0 allocs/op
No extra allocation is needed, and it is more than twice as fast as the approach from the earlier section that converts the string to a rune slice.