brk or break

I read a great article introducing JIT compilation. One section mentions that heap memory can be requested with the system calls brk and sbrk. But what do these names stand for?

From the man page,

brk() and sbrk() change the location of the program break, which defines the end of the process’s data segment (i.e., the program break is the first location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreasing the break deallocates memory.

  • brk is short for break, that is, the program break address, whose initial (and lowest) value is the first address after the end of BSS.

  • sbrk can be read as space increment after the program break address: it moves the break by a given number of bytes, incrementally adding new memory to the heap region, as shown below. Implementations of malloc and free have traditionally been built on this call.

program memory layout

digression on BSS

BSS stands for Block Started by Symbol. In the binary (object) file, this section holds no actual data: it only records the symbols that are left uninitialized in the source code, plus the metadata needed to set up their storage at runtime.

In C, statically-allocated objects without an explicit initializer are initialized to zero (for arithmetic types) or a null pointer (for pointer types). Implementations of C typically represent zero values and null pointer values using a bit pattern consisting solely of zero-valued bits (though this is not required by the C standard).

Unlike the data segment, where the actual values (data) are stored, the BSS section only stores each symbol and its size, because the values will be zero-initialized. For example, for an int a[1000] with a 4-byte int, the binary (object) file only needs to record the symbol and its size as a few bytes of metadata, instead of 4000 bytes of zeros. That’s why people sometimes call it Better Save Space. When the object file is loaded at runtime, right before the program starts, 4000 bytes of memory are set aside for the array in BSS, with all bytes set to zero.

The BSS segment typically includes all uninitialized objects (both variables and constants) declared at file scope (i.e., outside any function) as well as uninitialized static local variables (local variables declared with the static keyword).

Let’s examine it with a global variable:

cat <<EOF > gui.cc
int global; /* Uninitialized variable stored in bss*/
int main(void) { return 0; }
EOF

Compile it and check:

$ clang++ gui.cc
$ size
text  data   bss   dec   hex  filename
87    0      4     91    5b   a.out

The bss size is 4, which is the size of the int type on my machine. Now let’s initialize the global variable explicitly with some value:

cat <<EOF > gui.cc
int global = 53; /* initialized data in data segment */
int main(void) { return 0; }
EOF

Compile it and check the size required by each section:

$ clang++ gui.cc
$ size
text  data   bss   dec   hex  filename
87    4      0     91    5b   a.out

As you can see, the bss size is now 0 and the data size is 4.