Learn Rust fat pointer and type erasure from a Cpp programmer's perspective

The concepts of Sized and Unsized relate to type layout in Rust.

Types where all values have the same size and alignment known at compile time implement the Sized trait and can be checked with the size_of and align_of functions. Types that are not Sized are known as dynamically sized types. Since all values of a Sized type share the same size and alignment, we refer to those shared values as the size of the type and the alignment of the type respectively.
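To make the distinction concrete, here is a minimal sketch that queries the layout of a few Sized types and then the run-time size of two slices. The helper layout_of is a name made up for this illustration; size_of, align_of and size_of_val are the standard library functions.

```rust
use std::mem::{align_of, size_of, size_of_val};

/// Hypothetical helper: (size, alignment) of a Sized type,
/// both known at compile time.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    // Sized types: one size and alignment shared by every value.
    println!("u8:       {:?}", layout_of::<u8>());
    println!("u32:      {:?}", layout_of::<u32>());
    println!("[u16; 4]: {:?}", layout_of::<[u16; 4]>());

    // Dynamically sized types: the size depends on the value, so it can
    // only be queried at run time through a reference.
    let short: &[u16] = &[1, 2];
    let long: &[u16] = &[1, 2, 3, 4, 5];
    println!("short slice: {} bytes", size_of_val(short));
    println!("long slice:  {} bytes", size_of_val(long));

    // size_of::<[u16]>() would not compile: [u16] is not Sized.
}
```

Both slices have the same type, [u16], yet size_of_val reports different sizes for them, which is exactly what "dynamically sized" means.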

Which types do not have a size known at compile time? In Rust, the two main kinds of Unsized types are slices ([T], including str) and trait objects (dyn Trait). References to these types are fat pointers: two words that together describe the underlying object. A C-ABI compatible view of the fat pointer layout looks like this:

// pointer1, pointer2
struct fat_pointer {
    void* first;
    void* second;
};

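The TraitObject struct from the unstable std::raw module (since removed from the standard library) was defined the same way. Here #[repr(C)] denotes that the layout (size and alignment) is C-ABI compatible: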

#[repr(C)]
#[derive(Copy, Clone)]
#[allow(missing_debug_implementations)]
pub struct TraitObject {
    pub data: *mut (),
    pub vtable: *mut (),
}

Note that this is the physical stack layout of the fat pointer for slices, but it does not always hold for trait objects. For a trait object, second is often a compile-time constant, so the compiler may keep it in a register instead of stack memory. See the later part on trait object type erasure for more details.
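A quick way to observe the two-word shape from Rust itself is to compare pointer sizes. The sketch below only checks sizes, not field order, and the helper words is a name invented for this example.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Hypothetical helper: size of T measured in machine words.
fn words<T>() -> usize {
    size_of::<T>() / size_of::<usize>()
}

fn main() {
    // Thin pointer: the pointee is Sized, so one word is enough.
    assert_eq!(words::<&u8>(), 1);

    // Fat pointers: data pointer plus one word of metadata.
    assert_eq!(words::<&[u8]>(), 2); // pointer + length
    assert_eq!(words::<&str>(), 2); // pointer + length in bytes
    assert_eq!(words::<&dyn Debug>(), 2); // pointer + vtable pointer
    assert_eq!(words::<Box<dyn Debug>>(), 2); // owning pointers are fat too

    println!("thin = {} word, fat = {} words", words::<&u8>(), words::<&[u8]>());
}
```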

inner vs external vpointer

As my previous post discussed, in C++ an object of a class with virtual functions carries a vtable pointer. Even if you never call those virtual functions or use any polymorphism, every such object still carries the pointer and pays for the increased object size. Some people call this pattern an inner vpointer, since the pointer resides inside each object.

Rust is different. If you do not use polymorphism through trait objects, you do not pay the extra size of a vtable pointer. Every method call on a concrete type is statically dispatched, even for trait methods that could also be dispatched dynamically. Only when you use a trait object does the extra vtable pointer come into play, and only then do you pay the overhead of dynamic dispatch.
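The sketch below illustrates the external vpointer with a made-up Shape trait and Square type (neither appears in the original example). The object itself stays as small as its fields, and the vtable pointer only shows up in the reference once a trait object is requested.

```rust
use std::mem::size_of;

// Hypothetical trait and type, used only for this illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

fn main() {
    let sq = Square { side: 3.0 };

    // No hidden vtable pointer inside the object: only the f64 field.
    assert_eq!(size_of::<Square>(), size_of::<f64>());

    // Called on the concrete type: resolved at compile time.
    let static_area = sq.area();

    // Called through a trait object: the vtable pointer lives in the
    // reference, not in sq.
    let shape: &dyn Shape = &sq;
    let dynamic_area = shape.area();

    assert_eq!(size_of::<&Square>(), size_of::<usize>());
    assert_eq!(size_of::<&dyn Shape>(), 2 * size_of::<usize>());
    assert_eq!(static_area, dynamic_area);
    println!("area = {}", dynamic_area);
}
```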

In general, C++ implementations obey the zero-overhead principle: What you don’t use, you don’t pay for. And further: What you do use, you couldn’t hand code any better. – Bjarne Stroustrup

Well, in this case, Rust wins over C++.

slice layout

In Rust vec deref implementation:

impl<T> ops::Deref for Vec<T> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        unsafe {
            let p = self.buf.ptr();
            assume(!p.is_null());
            slice::from_raw_parts(p, self.len)
        }
    }
}

The field first is a pointer to the underlying buffer of the vector (or to the start of the slice). Rust types it as T* rather than void* at compile time, since [T] fixes the type of each element in the slice. The field second is a plain usize that holds the length of the slice, counted in elements.
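The two halves can also be inspected from safe code and put back together. This is a small sketch rather than the actual Vec implementation; raw_parts is a helper name invented here, while as_ptr, len and slice::from_raw_parts are standard library functions.

```rust
use std::slice;

/// Hypothetical helper: splits a slice reference into data pointer and length.
fn raw_parts<T>(s: &[T]) -> (*const T, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let v = vec![10u32, 20, 30, 40];
    let (ptr, len) = raw_parts(v.as_slice());

    assert_eq!(ptr, v.as_ptr()); // first: start of the vector's buffer
    assert_eq!(len, 4); // second: element count, not byte count

    // Rebuild the slice from the two parts, similar to what Deref does.
    // SAFETY: ptr and len come from a live slice of v, and v is neither
    // mutated nor dropped while rebuilt is in use.
    let rebuilt: &[u32] = unsafe { slice::from_raw_parts(ptr, len) };
    assert_eq!(rebuilt, v.as_slice());
    println!("{:?}", rebuilt);
}
```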

As usual, let’s look at an example:

pub fn demo() {
    let x = "string".to_owned();
    let slice = &x[..];
    println!("{}", slice);
}

From assembly code:

    lea     rdi, [rsp + 40]
    call    <alloc::string::String as core::ops::index::Index<core::ops::range::RangeFull>>::index
    // move function call fat pointer result, 2 qword, to stack (temp vars)
    mov     qword ptr [rsp + 32], rdx
    mov     qword ptr [rsp + 24], rax
    // ...
    // slice storage: 2 qword
    mov     rax, qword ptr [rsp + 24]
    mov     qword ptr [rsp + 64], rax
    mov     rcx, qword ptr [rsp + 32]
    mov     qword ptr [rsp + 72], rcx

For slice types, the physical layout matches the C-ABI compatible layout. In contrast to the trait object case discussed later, it always matches, because the second field (the length of the slice) is not a constant: it can differ from instance to instance, even though the instances share the same slice type.
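A short sketch of that last point: two slices of the same string share one static type but carry different metadata. The helper parts is a name made up for this example.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: (address of first byte, length) of a string slice.
fn parts(s: &str) -> (usize, usize) {
    (s.as_ptr() as usize, s.len())
}

fn main() {
    let x = "string".to_owned();

    let whole: &str = &x[..]; // all 6 bytes
    let part: &str = &x[1..4]; // "tri"

    let (whole_addr, whole_len) = parts(whole);
    let (part_addr, part_len) = parts(part);

    // Same static type (&str), different run-time length.
    assert_eq!(whole_len, 6);
    assert_eq!(part_len, 3);

    // The data pointer moved by the start offset of the sub-slice.
    assert_eq!(part_addr - whole_addr, 1);

    // size_of_val reads the length from the fat pointer at run time.
    assert_eq!(size_of_val(whole), 6);
    assert_eq!(size_of_val(part), 3);
    println!("{} / {}", whole, part);
}
```

Because the length is only known per instance, the compiler has no constant to fold away and the second word has to be stored alongside the pointer.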

trait object layout

Let’s first dig into an example in playground:

trait Bar {
    fn bar_method(&self) {
        println!("this is bar");
    }
}

trait Foo: Bar {
    fn foo_method(&self) {
        println!("this is foo");
    }
}

impl Bar for u8 {}
impl Foo for u8 {}

fn main() {
    let x: u8 = 35;
    let foo: &dyn Foo = &x;
    x.bar_method();
    foo.bar_method();
}

The dyn keyword distinguishes the Unsized trait object type from ordinary Sized types. Earlier Rust versions accepted let foo: &Foo = &x for both, which hurt readability because you could not tell at first glance whether Foo was a trait or a normal type.

As we can see, a type that implements Foo must also implement Bar, so both foo_method and bar_method are available on it. Keep in mind that Rust has no inheritance: Foo is not a Bar. The syntax trait Foo: Bar simply means that a type that wants to implement Foo must also implement Bar.
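The supertrait requirement can be seen from both the generic and the dynamic side. The sketch below reuses Foo and Bar but, as an assumption made for easier checking, lets the methods return their message instead of printing it. The functions call_both and call_both_dyn are names invented here.

```rust
trait Bar {
    fn bar_method(&self) -> &'static str {
        "this is bar"
    }
}

trait Foo: Bar {
    fn foo_method(&self) -> &'static str {
        "this is foo"
    }
}

impl Bar for u8 {} // without this line, the impl of Foo below is rejected
impl Foo for u8 {}

/// Because of Foo: Bar, a T: Foo bound also grants access to Bar's methods.
fn call_both<T: Foo>(value: &T) -> (&'static str, &'static str) {
    (value.foo_method(), value.bar_method())
}

/// The same holds when going through a trait object.
fn call_both_dyn(value: &dyn Foo) -> (&'static str, &'static str) {
    (value.foo_method(), value.bar_method())
}

fn main() {
    let x: u8 = 35;
    assert_eq!(call_both(&x), ("this is foo", "this is bar"));
    assert_eq!(call_both_dyn(&x), ("this is foo", "this is bar"));
}
```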

// C++ equivalent for `impl Foo for u8`
class Baz : public Foo, public Bar { ... }

Baz should carry a vpointer with all the information about its virtual functions, which here are the trait methods declared in Foo and Bar. Since there’s no multiple inheritance in Rust, its vtable can be as simple as the C++ vtable we’ve seen before.

different vtable layout for different Trait type

In other words, Foo and Bar are siblings with no hierarchy between them, i.e., a Foo is not a child of Bar. All supertraits of a trait are known at compile time, so the compiler can define a separate vtable layout for each trait. It does not need a layout that supports dynamic casts between traits, because there is no hierarchical inheritance in Rust. I will explain this later.

Take this declaration from the example code:

trait Bar: A + B { /* ... */ }

Since there’s no multiple inheritance, the vtable for Bar can simply stack the methods from A, B and Bar itself.

.L__unnamed_1:
        .quad   core::ptr::drop_in_place // destructor
        .quad   1 // size
        .quad   1 // align
        .quad   example::Bar::bar_method
        .quad   example::B::b_method
        .quad   example::A::a_method

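Thus, the relative order of the methods is deterministic. When you call b_method through a trait object &dyn Bar, the compiler knows to load the function pointer at byte offset 32 in the vtable. Keep in mind that this layout is a compiler implementation detail, not a language guarantee.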
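The offset of 32 follows from simple arithmetic on the listing above: three header entries (destructor, size, align) come first, and every entry is one machine word. The sketch below spells that out. HEADER_ENTRIES and method_offset are names invented for this illustration, and the computation assumes the layout shown in the listing, which rustc does not guarantee.

```rust
use std::mem::size_of;

/// Entries that precede the methods in the listing:
/// drop_in_place, size, align.
const HEADER_ENTRIES: usize = 3;

/// Byte offset of the method at `index` (0-based, in listing order).
/// Assumption: the vtable looks like the listing in this post.
fn method_offset(index: usize) -> usize {
    (HEADER_ENTRIES + index) * size_of::<usize>()
}

fn main() {
    let names = ["bar_method", "b_method", "a_method"];
    for (i, name) in names.iter().enumerate() {
        println!("{:<10} at byte offset {}", name, method_offset(i));
    }
}
```

On a 64-bit target a word is 8 bytes, so b_method (index 1) lands at (3 + 1) * 8 = 32.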

Also, a vtable is only generated when a trait object is requested. That is to say, if you don’t use it, you don’t pay the storage for the vtable. Feel free to follow the comments in the example above and check the corresponding assembly code.

different trait object for different Trait type

For different traits, the Rust compiler synthesizes different trait object types. For example, the Foo trait gets a corresponding trait object TraitObjectFoo when &dyn Foo is requested, whereas Bar gets TraitObjectBar. As a consequence, even though Foo extends Bar, casting &dyn Foo to &dyn Bar (analogous to converting TraitObjectFoo to TraitObjectBar) was not allowed at the time of writing. Trait upcasting was only stabilized later, in Rust 1.86.

Let’s go through an example to better understand this observation.

// below is all pseudo code
pub struct TraitObjectFoo {
    data: *mut (),
    vtable_ptr: &VTableFoo,
}

pub struct VTableFoo {
    // destructor
    drop_in_place: unsafe fn(*mut ()),
    // layout of the concrete type
    size: usize,
    align: usize,
    // methods shown in deterministic order
    foo_method: fn(*mut ()),
    bar_method: fn(*mut ()),
}

// fields contain the Foo and Bar method addresses for the u8 implementation
static VTABLE_FOO_FOR_U8: VTableFoo = VTableFoo { ... };

// let foo: &dyn Foo = &x;
let foo = TraitObjectFoo { data: &x, vtable_ptr: &VTABLE_FOO_FOR_U8 };

At runtime, foo carries a pointer to a VTableFoo, and the compiler knows the offset of bar_method in that table.

// foo.bar_method();
// entry 4 (in words): after drop_in_place, size, align and foo_method
foo.vtable_ptr.offset(4)(foo.data);
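For readers who want to run something, the pseudo code can be approximated in real Rust by writing the vtable by hand. This is only a sketch of the idea, not what rustc generates: every name in it (VTableFoo, TraitObjectFoo, drop_u8, foo_for_u8, bar_for_u8) is a stand-in, and the methods return their message instead of printing it so the result can be checked.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical hand-written stand-in for a compiler-generated vtable.
#[allow(dead_code)]
struct VTableFoo {
    drop_in_place: unsafe fn(*mut ()),
    size: usize,
    align: usize,
    foo_method: unsafe fn(*const ()) -> &'static str,
    bar_method: unsafe fn(*const ()) -> &'static str,
}

// Hypothetical stand-in for &'a dyn Foo: data pointer plus vtable pointer.
#[derive(Clone, Copy)]
struct TraitObjectFoo<'a> {
    data: *const (),
    vtable: &'static VTableFoo,
    _borrow: PhantomData<&'a u8>, // ties the raw pointer to the borrow
}

unsafe fn drop_u8(_this: *mut ()) {} // u8 has nothing to drop

unsafe fn foo_for_u8(this: *const ()) -> &'static str {
    // Recover the concrete type that this vtable was built for.
    let _value: u8 = unsafe { *(this as *const u8) };
    "this is foo"
}

unsafe fn bar_for_u8(this: *const ()) -> &'static str {
    let _value: u8 = unsafe { *(this as *const u8) };
    "this is bar"
}

// One static table per (trait, concrete type) pair.
static VTABLE_FOO_FOR_U8: VTableFoo = VTableFoo {
    drop_in_place: drop_u8,
    size: size_of::<u8>(),
    align: align_of::<u8>(),
    foo_method: foo_for_u8,
    bar_method: bar_for_u8,
};

impl<'a> TraitObjectFoo<'a> {
    fn new(value: &'a u8) -> Self {
        TraitObjectFoo {
            data: value as *const u8 as *const (),
            vtable: &VTABLE_FOO_FOR_U8,
            _borrow: PhantomData,
        }
    }

    fn foo_method(self) -> &'static str {
        // SAFETY: data came from a &u8 and the vtable is the one for u8.
        unsafe { (self.vtable.foo_method)(self.data) }
    }

    fn bar_method(self) -> &'static str {
        // SAFETY: same invariant as foo_method.
        unsafe { (self.vtable.bar_method)(self.data) }
    }
}

fn main() {
    let x: u8 = 35;
    let foo = TraitObjectFoo::new(&x); // let foo: &dyn Foo = &x;
    assert_eq!(foo.foo_method(), "this is foo");
    assert_eq!(foo.bar_method(), "this is bar"); // foo.bar_method();
    assert_eq!((foo.vtable.size, foo.vtable.align), (1, 1));
}
```

The call goes through a function pointer loaded from the table, which is the same indirection the compiler emits for a real trait object.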

vpointer is not always stacked in stack

We can also verify above in assembly code:

// let x: u8 = 35;
mov     byte ptr [rsp + 7], 35
// let foo: &dyn Foo = &x;
lea     rax, [rsp + 7]
// foo.bar_method();
mov     rdi, rax                           // move self, &x
call    qword ptr [rip + .L__unnamed_1+32] // bar_method(self)
// Foo vtable for u8
.L__unnamed_1:
        .quad   core::ptr::drop_in_place
        .quad   1
        .quad   1
        .quad   example::Foo::foo_method
        .quad   example::Bar::bar_method

As we can see in the assembly above, let foo: &dyn Foo = &x only takes the address of x (1 qword) instead of constructing the C-ABI layout on the stack for variable foo, which would require 2 qwords. The vpointer is instead loaded into a register when needed and then discarded. We can also verify this by running the example that uses mem::transmute to convert a trait object into raw::TraitObject, and checking the assembly code:

// load vtable pointer address to rax; hit and run
// rip contains address of current instruction being executed in CPU
    lea     rax, [rip + .L__unnamed_2]
// ...
// let raw_object: raw::TraitObject = unsafe { mem::transmute(object) };
    mov     qword ptr [rsp + 232], rcx // data*
    mov     qword ptr [rsp + 240], rax // vtable*

The reason is that the compiler knows the concrete type of x and therefore the address of the Foo vtable for that type, i.e., a constant rip-relative address (.L__unnamed_2). Thus, there’s no need to reserve stack storage for the vpointer, and a register can be used for faster access to it. This optimization is a benefit of the fat pointer, or external vtable, layout. It’s like:

// c++ pseudo code
constexpr vpointer_for_foo = 0x1234;
Foo foo;
foo.data = &x;
foo.vpointer = vpointer_for_foo;
foo.vpointer.offset(4)(foo.data);

The vpointer can be inlined with optimization:

// c++ pseudo code
constexpr vpointer_for_foo = 0x1234;
int* data = &x;
(vpointer_for_foo+4)(data);

Therefore, a register is sufficient to hold the inlined value, and there is no need to touch the stack, which is much slower.

vpointer is stacked during type erasure

In this type erasure example, we can erase x’s type information like this:

trait Base {
    fn base_method(&self) {
        println!("this is Base");
    }
}

// x type info is erased when returned
fn test_dyn_ret(x: &dyn Base) -> &dyn Base {
    x.base_method();
    x
}

In the scope that calls test_dyn_ret, the compiler knows the concrete type of the value it passes as x. Inside test_dyn_ret, however, that type information is erased, and the compiler no longer has any knowledge of the type of x. It only knows that a &dyn Base is returned and cannot assume anything about the concrete type behind that trait object. Therefore, it needs to store the vpointer on the stack, not only in the callee frame but also in the caller frame, i.e., the outer scope. The behavior differs from the inlined vpointer we saw earlier because the compiler cannot infer the vpointer value statically here. If it could, it would load it as a constant into a register.

    // let foo = test_dyn_ret(&y);
    call    example::test_dyn_ret
    mov     qword ptr [rsp + 16], rax // self (data pointer) in caller stack
    mov     qword ptr [rsp + 8], rdx  // vpointer in caller stack

The layout above is the C-ABI layout we’ve seen. That is type erasure in Rust, and the most common use of trait objects.
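To see why the compiler has no constant to inline once the type is erased, consider a function whose returned trait object depends on a run-time value. This sketch uses made-up types Alpha and Beta and a helper pick; the Base trait here also differs from the one above in that its method returns a name, an assumption made so the result can be checked.

```rust
// Hypothetical variant of Base whose method returns a value.
trait Base {
    fn name(&self) -> &'static str;
}

struct Alpha;
struct Beta;

impl Base for Alpha {
    fn name(&self) -> &'static str {
        "Alpha"
    }
}

impl Base for Beta {
    fn name(&self) -> &'static str {
        "Beta"
    }
}

/// The caller only sees &dyn Base. Which concrete type sits behind it
/// depends on a run-time flag, so the vtable pointer has to travel
/// with the data pointer.
fn pick<'a>(first: bool, a: &'a Alpha, b: &'a Beta) -> &'a dyn Base {
    if first {
        return a;
    }
    b
}

fn main() {
    let (a, b) = (Alpha, Beta);
    assert_eq!(pick(true, &a, &b).name(), "Alpha");
    assert_eq!(pick(false, &a, &b).name(), "Beta");

    // The same erasure lets one collection hold different concrete types.
    let erased: Vec<Box<dyn Base>> = vec![Box::new(Alpha), Box::new(Beta)];
    let names: Vec<&str> = erased.iter().map(|item| item.name()).collect();
    assert_eq!(names, ["Alpha", "Beta"]);
}
```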
