tl;dr

For simplicity, part 1 only discusses how vtable works with single-inheritance. Part 2 will come soon to cover how vtable works with multi-inheritance. Also, I assume the inheritance is done through virtual functions not the static shadowing.

We all know virtual function call cannot be inlined and it has overhead of runtime table looking up. But how it’s implemented?

Reading high-level C/C++ code won’t give us any hints about the underlying implementation. In most cases, we can have a better understanding from low level assembly code since it’s very close to how machine understands the high level concept.

Through this post, I will use this high level Cpp code demo.cc for demonstration:

#include <cstdio>

struct Bar {
    virtual void method1() {
        std::puts("method1: this is bar");
    }

    virtual void method2() {
        std::puts("method2: this is bar");
    }
    virtual ~Bar() {};
};

struct Foo: public Bar {
    virtual void method1() {
        std::puts("method1: hi, I'm foo");
    }
    virtual ~Foo() {};
};

int main() {
    Bar bar;
    Foo foo;
    Bar* bar_arr[2];
    bar_arr[0] = &bar;
    bar_arr[1] = dynamic_cast<Bar*>(&foo);

    for (auto ptr: bar_arr) {
        ptr->method1();
        ptr->method2();
    }
    return 0;
}

C++ memory layout

First, let’s understand the C++ memory layout or how object is represented in C++.

$ clang -cc1 -std=c++11 -fdump-record-layouts demo.cc
... # ignore the error if you see any
*** Dumping AST Record Layout
         0 | struct Bar
         0 |   (Bar vtable pointer)
           | [sizeof=8, dsize=8, align=8,
           |  nvsize=8, nvalign=8]

*** Dumping AST Record Layout
         0 | struct Foo
         0 |   struct Bar (primary base)
         0 |     (Bar vtable pointer)
           | [sizeof=8, dsize=8, align=8,
           |  nvsize=8, nvalign=8]

0 on the left of the pipe means the offset from the object memory address. We can infer the size of the vtable pointer is 8 from sizeof because it starts at 0 and the total size of object is 8 bytes.

As the above output shows, the instance of struct Bar and Foo only contains a pointer inside. So the sizeof(bar) and sizeof(foo) should be 8 on 64 bits machine.

For single-instance, we only need to know sizeof, dsize and align.

nvsize (optional)

Feel free to skip this section.

Here, dsize means object size without trailing alignment. What about nvsize? nvsize simply means the size of non-pure-virtual part.

Let’s check a modified code example from pure virtual inheritance wiki:

// wiki.cc
struct A {
    virtual ~A() = default;
    int aa; // demo purpose, add non-vtable-pointer memory to A
    virtual void a_a() {}
};

struct B: virtual A {
    virtual void b_b() {}
};

struct C: virtual A {
    virtual void c_c() {}
};

struct D: B, C {};

int main() {
  return sizeof(D);
}

Below is a virtualization of the inheritance hierarchy,

vis-pure-virtual-inheritance

It’s possible to infer the same graphical information from the memory layout output by compiler frontend too.

$ clang -cc1 -std=c++11 -fdump-record-layouts wiki.cc
*** Dumping AST Record Layout
         0 | struct A
         0 |   (A vtable pointer)
         8 |   int aa
           | [sizeof=16, dsize=12, align=8,
           |  nvsize=12, nvalign=8]

*** Dumping AST Record Layout
         0 | struct B
         0 |   (B vtable pointer)
         8 |   struct A (virtual base)
         8 |     (A vtable pointer)
        16 |     int aa
           | [sizeof=24, dsize=20, align=8,
           |  nvsize=8, nvalign=8]

// omit C, which is same as B

*** Dumping AST Record Layout
         0 | struct D
         0 |   struct B (primary base)
         0 |     (B vtable pointer)
         8 |   struct C (base)
         8 |     (C vtable pointer)
        16 |   struct A (virtual base)
        16 |     (A vtable pointer)
        24 |     int aa
           | [sizeof=32, dsize=28, align=8,
           |  nvsize=16, nvalign=8]

As you can see the layout of struct D, it has 3 vtable pointers, namely, B’s vtable pointer at offset 0, C at 8, and A’s at 24. So the dsize is 3 * 8 + sizeof(int), 28. The final vtable generated by compiler for D is quite complex and we won’t cover that in part 1.

Back to the point, the nvsize of D is the sum of nvsize of B (8 bytes, a vtable pointer) and C (8 bytes, a vtable pointer), without virtual base A taken into account because D only has one A instance along the inheritance hierarchy. Thus, the dsize is nvsize of D plus one A’s nvsize.

In general, the virtual inheritance vtable implementation is quite complicated. You can try to remove int aa; line from the pure virtual base A. The result is quite surprising.


Assembly code for vtable implementation

Here’s an code example of how vtable is implemented in assembly code level. Below is using GNU assembler directives for vtable declaration:

vtable for Foo:
        .quad   0
        .quad   typeinfo for Foo    // RTTI for foo
        .quad   Foo::method1()      // where vtable starts
        .quad   Bar::method2()      // for method2 in vtable
        .quad   Foo::~Foo() [complete object destructor]
        .quad   Foo::~Foo() [deleting destructor]
vtable for Bar:
        .quad   0   // qword -> 8 bytes, used as mask for multi-inheritance
        .quad   typeinfo for Bar    // RTTI for bar
        .quad   Bar::method1()
        .quad   Bar::method2()
        .quad   Bar::~Bar() [complete object destructor]
        .quad   Bar::~Bar() [deleting destructor]

For each class instance, regardless of parent or children class, it carries a pointer to its RTTI) and vtable. In most cases, people refer this pointer as a vtable pointer, which is not precise.

let’s check this line,

    for (auto ptr: bar_arr) {
        ptr->method1();
        ptr->method2();
    }

First, it needs to store the vtable address into the stack along with the variable bar and foo, recall the memory layout:

        // bar memory layout is a simple vtable pointer, size is 8
        mov     eax, OFFSET FLAT:vtable for Bar+16
        mov     QWORD PTR [rbp-56], tax
        // same for foo
        mov     eax, OFFSET FLAT:vtable for Foo+16
        mov     QWORD PTR [rbp-64], rax

The compiler knows the offset of method1 and method2 in vtable for Foo abd Bar, where method2 should offset 8 from method1, see vtable layout.

        mov     rax, QWORD PTR [rbp-48] // temp var, store current arr iterator: ptr = bar_arr[0]
        mov     rax, QWORD PTR [rax]
        mov     rdx, QWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-48]
        mov     rdi, rax                // set up parameters for function call
        call    rdx                     // call method1 from vtable stored in the object, e.g., bar and foo
        mov     rax, QWORD PTR [rbp-48]
        mov     rax, QWORD PTR [rax]
        add     rax, 8                  // method1 + 8 offset -> method2
        mov     rdx, QWORD PTR [rax]
        mov     rax, QWORD PTR [rbp-48]
        mov     rdi, rax
        call    rdx

A visualization for stack layout of above assembly code,

# rbp - offset | c syntax | [// comment]
-24 | bar_arr (Bar**)     | // used to set up start and end
-32 | bar_arr (Bar**)     | // start of bar_arr[]
-40 | bar_arr + 2         | // end of bar_arr[]
-48 | ptr Bar*            | // like loop counter index i
-56 | bar (a vtable ptr)  |
-64 | foo (a vtable ptr)  |
-72 | &foo (bar_arr[1])   |
-80 | &bar (bar_arr[0])   |

As we can see, as long as the compiler knows the type information and method name at compile time, it knows where to load the real function call address from the object’s vtable.

Summary

Summarizing our example above.

vtable-layout

Each object carries a vtable pointer pointing to a vtable data structure that gathers addresses of all virtual functions calls associated with its type/class, including its destructor. In above example, it’s the address associated with assembly global label vtable for Foo and vtable for Bar.

At runtime, the compiler loads the vtable pointer and locate the function address for that object by adding offset according to the function name used in the source code. In above example, method1’s offset is 16 and method2 is 24 from the starting address of the vtable data structure of each type, i.e., Foo and Bar, in the rodata section.

Therefore, loading vtable pointer and performing runtime function look up will pose extra performance overhead.

Further readings