Previously, we covered the basics behind virtual methods. If you aren’t wholly comfortable with the subject yet, I’d recommend you go check that post out first. But if you are comfortable, we’re going to delve into the wacky fun world of multiple inheritance, and see how that affects virtual function calls.
I touched briefly on how vtables work for multiple inheritance in the last post, so let’s recap that to frame our discussion.
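class A {
private:
    int i;
public:
    A() : i( 42 ) {}
    virtual void Foo() { }
};

class B {
private:
    double d;
public:
    B() : d( 1.0 ) {}
    virtual void Bar() { }
};

class C : public A, public B {
private:
    const char *s;  // a string literal can only bind to a pointer-to-const
public:
    // Bases are initialized before members, so list them in that order
    C() : A(), B(), s( "Hello World" ) {}
    void Foo() { }  // overrides A::Foo
};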
In this example, C inherits from both A and B. Therefore, C has an is-a relationship with both A and B. This means that C needs to contain information about the virtual methods of A and B in order to behave properly when invoked polymorphically. Let’s look at some example usages:
A *a = new C();
B *b = new C();
C *c = new C();
This is legal because of the is-a relationship C has with A and B. But what would the class layouts look like?
Class A:
Offset | Field | Size |
0 | vtable ptr | sizeof(void*) |
4 | i | sizeof(int) |
Class B:
Offset | Field | Size |
0 | vtable ptr | sizeof(void*) |
4 | d | sizeof(double) |
Class C:
Offset | Field | Size |
0 | vtable ptr | sizeof(void*) |
4 | i | sizeof(int) |
8 | vtable ptr | sizeof(void*) |
12 | d | sizeof(double) |
20 | s | sizeof(char*) |
The layouts for A and B should be of no real surprise as they’re the same concept discussed in the previous post. But the layout for C is rather interesting. Because C is an instance of both A and B, C needs to have the ability to call into either vtable. Based on our current class definition, C overrides A::Foo, so polymorphically attempting to call a->Foo() should wind up in C::Foo. Taking a more concrete look at the layouts in memory:
An instance of C:
Address | Field | Size | Value |
0x00045032 | vtable ptr | sizeof(void*) | 0x00345080 |
0x00045036 | i | sizeof(int) | 42 |
0x0004503A | vtable ptr | sizeof(void*) | 0x00169068 |
0x0004503E | d | sizeof(double) | 1.0 |
0x00045046 | s | sizeof(char*) | 0x0016907C |
Memory referenced from that instance:
Address | Value |
0x00345080 | index 0: 0x00e04580 |
0x00169068 | index 0: 0x00e04584 |
0x0016907C | Hello World |
0x00e04580 | C::Foo |
0x00e04584 | B::Bar |
Now that you’ve got a more firm memory model in mind, let’s look at the first interesting question — how does calling one of these virtual methods work? Let’s take a look at two cases:
c->Foo(); // #1
c->Bar(); // #2
(I think now would be a good time to mention that calling virtual functions is up to the compiler in much the same way as memory layouts are: compilers can do different things from what I am showing here. I am showing one possible example, on x86.)
When you call a C++ method on x86, one common convention (Microsoft’s thiscall, which is what I am showing here) stipulates the following:
- The “this” pointer is stored in the ECX register
- The returned value, if any, will live in EAX
- Parameters are passed on the stack from right to left
Keeping this in mind (which shouldn’t be hard since there are no parameters and no return values in our example calls), let’s cover the steps taken to call a virtual function. First, the compiler needs to determine what function to call. It does this by looking at the proper vtable ptr for c, at the proper index. The compiler knows both of these pieces of information at compile time, since it is what determined the layout in the first place. It just doesn’t know where the resulting call will take you (hence the requirement for vtables). The Foo function lives in the first vtable ptr, at the first index. So the compiler loads that address into a register. Then the compiler loads the “this” pointer, which is the variable c, into ECX. Then it calls into the address previously loaded to transfer control to the function.
mov eax, c ; Load the instance pointer
mov eax, [eax] ; Load the vtable pointer (the first field of the object)
mov eax, [eax] ; Load the function pointer at index 0
mov ecx, c ; Load the this pointer
call eax ; Call the function
The first three instructions may be the most difficult to understand. The variable c holds a pointer to the object. The first field at that location is the vtable pointer for class C, so dereferencing c yields the address of the vtable. Foo sits at the first index of that vtable, so dereferencing once more, with no extra offset, yields the address of the function.
To bring this assembly back into reality, it means that C::Foo will be called, and the this pointer will point to the same object as c. C::Foo is called because its address is the value stored at index 0 of the vtable that the object’s first vtable pointer refers to.
Now let’s take a look at #2 to see how this call works.
mov edx, c ; Get the instance pointer
add edx, 8 ; Advance by 8 bytes to the B subobject
mov eax, [edx] ; Load the vtable pointer
mov eax, [eax] ; Load the function pointer at index 0
mov ecx, edx ; Load the this pointer
call eax ; Call the function
This one is a little bit different. First we get the instance pointer c, then we add 8 to it, which is the offset of the second vtable pointer in C’s layout. From there on out, the rest of the steps are the same. That offset is a this-pointer adjustment. Here the compiler can apply it inline at the call site, because it knows c is a C*. When the adjustment instead has to happen in a small compiler-generated stub that then jumps to the real function, that stub is called a thunk. Either way, the adjustment serves a very important purpose. Because we are calling B::Bar, the this pointer needs to look like a B* when calling the function. Otherwise, the offset to the member variable d (if used) would be wrong within the context of the function call. We didn’t have to worry about this for calls to A or C’s methods because the compiler can calculate those offsets at compile time. The offsets to A’s data are inherently safe because A’s vtable pointer is the first member of the structure. The offsets to C’s data can be calculated because the compiler knows the location of C’s data.
These same concepts apply regardless of how deep the multiple inheritance goes. The most-derived class will share the vtable pointer of one of the base classes, and virtual function calls within some of the base classes may require thunks to ensure the offsets are calculated properly. The take-home points to remember from this are:
- A class definition with virtual methods will have at least one vtable associated with it (in this example, C has one for each of its vtable pointers), and the fields in each table are calculated at compile time (ignoring relocations handled by the loader).
- A class instance with virtual methods will have one or more vtable pointers used for polymorphic dispatch.
- The compiler may need to insert thunks to ensure that offsets to data members of class instances are resolved properly.
- Nothing comes for free! Using virtual methods has overhead, and using multiple inheritance has more overhead. If performance is a concern, don’t use either concept!
Hopefully this sheds some light on a somewhat dim concept in C++. As usual, if you have questions or think I missed something, please bring it up!
Thank you for your great article! It’s very clear! However, I have a question about c->Bar(). If we override Bar() in C, we may do something with s there. But the “this” pointer is changed to point to the B subobject in order to reach d, so how can we find the right s, given that the offset of s is 20, not 12?
@zhangqx — If you override Bar() in C, then when the “this” pointer is adjusted prior to calling Bar(), it is adjusted to point to the C object, not the B object (otherwise there would be no way to access C’s member variables). Similarly, if C::Bar() were to call into B::Bar(), the “this” pointer would be adjusted accordingly prior to jumping into B::Bar().
Overriding inherited virtual functions is easy — as long as you’re not trying to override a virtual function that has the same signature in two base classes. This can happen even when the base classes don’t come from different vendors!
How are the private variables i and d present in C?
why go assembler
How do I make sure the right function is being called?