Describing the MSVC ABI for Structure Return Types

An ABI is an “application binary interface”, which is basically a contract between pieces of executable code on how to behave. The ABI dictates things like how parameters are passed, where return values go, how to create and destroy stack frames, etc. As a programmer, you oftentimes don’t have to worry about this sort of thing because the compiler takes care of it for you. However, if you want code from one compiler to talk to code from another compiler, the ABI is extremely important because if the two compilers don’t agree, the two pieces of code won’t be able to work together.

I don’t want to go into the entire MSVC ABI (that could likely fill a book!), but instead would like to focus on the under-documented portion having to do with the way structures are returned from functions. There is some documentation on the subject on MSDN, the latest of which can be found here.

If you read the above link, you will see the documentation pertaining to return values:

Return values are also widened to 32 bits and returned in the EAX register, except for 8-byte structures, which are returned in the EDX:EAX register pair. Larger structures are returned in the EAX register as pointers to hidden return structures.

This seems quite definitive; however, it’s also quite inaccurate in practice. I went through all of the calling conventions except __clrcall, tried various interesting structures coupled with different structure packings, and want to share what I found.

Before I get to my findings, I should describe what I tested and why. All of my tests were performed with MSVC 10. I tested functions using four different calling conventions: __cdecl, __stdcall, __fastcall and __thiscall. Each function had six variants, each returning a structure of a different size: 3, 4, 7, 8, 15 and 16 bytes. I tested all six packing modes: 1, 2, 4, 8, 16 and natural. I only tested on x86, so there’s room for further research on x64 and ARM. All told, there is a lot of raw data involved (4 calling conventions × 6 structure sizes × 6 packing modes = 144 distinct data points)!
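The post does not list the structures that were tested, so the following is only a minimal sketch of what such a test matrix might look like. The structure names, the function names and the CC_* macros are hypothetical; the sizes assume a 2-byte short and a 4-byte int. It shows how packing 1 yields 3 and 7 byte structures, while the same members under natural packing are padded to 4 and 8 bytes.

```c
#include <assert.h>  /* static_assert (C11) */

/* Hypothetical shims: the calling-convention keywords only mean something
   to MSVC targeting 32-bit x86, so they expand to nothing elsewhere. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CC_CDECL    __cdecl
#  define CC_STDCALL  __stdcall
#  define CC_FASTCALL __fastcall
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

/* Packing 1: no padding is inserted, so each size is the sum of the members. */
#pragma pack(push, 1)
struct packed3 { char a; short b; };          /* 1 + 2     = 3 bytes */
struct packed7 { char a; short b; int c; };   /* 1 + 2 + 4 = 7 bytes */
#pragma pack(pop)

/* Natural packing: the same members are padded to their alignment. */
struct natural4 { char a; short b; };         /* 1 + 1 pad + 2     = 4 bytes */
struct natural8 { char a; short b; int c; };  /* 1 + 1 pad + 2 + 4 = 8 bytes */

static_assert(sizeof(struct packed3)  == 3, "packed3 should be 3 bytes");
static_assert(sizeof(struct packed7)  == 7, "packed7 should be 7 bytes");
static_assert(sizeof(struct natural4) == 4, "natural4 should be 4 bytes");
static_assert(sizeof(struct natural8) == 8, "natural8 should be 8 bytes");

/* One function per calling convention and structure. The return mechanism
   is only visible in the generated assembly, not in the C source. */
struct packed3 CC_CDECL ret_packed3_cdecl(void)
{
    struct packed3 r = { 1, 7 };
    return r;
}

struct packed7 CC_STDCALL ret_packed7_stdcall(void)
{
    struct packed7 r = { 1, 2, 7 };
    return r;
}

struct natural4 CC_FASTCALL ret_natural4_fastcall(void)
{
    struct natural4 r = { 1, 7 };
    return r;
}
```

Compiling a file like this with an assembly listing enabled and reading the code for each function is one way to see which return mechanism the compiler picked.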

__cdecl

With the cdecl calling convention (which is the default for C/C++ programs in MSVC), the stack is cleaned up by the caller instead of by the callee. This allows functions to take variable argument lists, at the expense of larger executables, since every call site carries its own cleanup code.

For packing sizes 2, 4, 8, 16 and natural, the cdecl calling convention behaves as documented. 3 and 4 byte structures were returned in EAX, and 7 and 8 byte structures were returned in EAX/EDX. 15 and 16 byte structures were returned through a hidden parameter: the caller allocates the storage and pushes a pointer to it onto the stack, the function returns that pointer in EAX, and the caller is responsible for cleaning the pointer off the stack.
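The hidden parameter mechanism can be pictured as a source-level rewrite. The sketch below is a hand-written model, not compiler output: make_big, make_big_lowered and hidden_ret are hypothetical names, and the structure is assumed to be 16 bytes (4-byte int). The second function takes the address of caller-owned storage as an extra first argument and returns that same address, which is the value that would end up in EAX.

```c
#include <assert.h>  /* static_assert (C11) */
#include <string.h>  /* memcmp */

struct big16 { int v[4]; };  /* 16 bytes when int is 4 bytes */

static_assert(sizeof(struct big16) == 16, "big16 should be 16 bytes");

/* What the programmer writes: a 16-byte structure returned by value. */
struct big16 make_big(int seed)
{
    struct big16 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = seed + i;
    return r;
}

/* A model of the lowered form: the caller supplies the storage through a
   hidden first argument, and the function hands the same pointer back. */
struct big16 *make_big_lowered(struct big16 *hidden_ret, int seed)
{
    for (int i = 0; i < 4; ++i)
        hidden_ret->v[i] = seed + i;
    return hidden_ret;  /* the pointer that would be returned in EAX */
}

/* Returns 1 when the by-value form and the lowered form agree. */
int lowering_matches(int seed)
{
    struct big16 by_value = make_big(seed);
    struct big16 slot;  /* the caller-allocated return slot */
    struct big16 *returned = make_big_lowered(&slot, seed);
    return returned == &slot
        && memcmp(&by_value, &slot, sizeof slot) == 0;
}
```

In this model the only difference between caller cleanup and callee cleanup is who removes the hidden_ret argument from the stack after the call.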

However, for packing size 1, the calling convention does not behave as documented in all cases. Structure sizes 4, 8, 15 and 16 all behave the same as in the other packing modes. But structure sizes 3 and 7 use the same hidden parameter mechanism as 15 and 16 byte structures, instead of using EAX or EAX/EDX.

__stdcall

With the stdcall calling convention (which is the default for Win32 APIs), the stack is cleaned up by the callee instead of the caller. The executable code is therefore typically smaller, but functions cannot take variable argument lists.

For packing sizes 2, 4, 8, 16 and natural, the stdcall calling convention behaves as documented. 3 and 4 byte structures were returned in EAX, and 7 and 8 byte structures were returned in EAX/EDX. 15 and 16 byte structures were returned through a hidden pointer to caller-allocated storage, with that pointer returned in EAX, and the callee was responsible for cleaning the pointer off the stack. Basically, the only difference between stdcall and cdecl was exactly what you would expect: callee cleaned instead of caller cleaned.

However, for packing size 1, the calling convention behaved the same as it did for cdecl with packing size 1. Structure sizes 4, 8, 15 and 16 all behaved as in the other stdcall packing modes. But structure sizes 3 and 7 use the same hidden parameter mechanism as 15 and 16 byte structures, instead of using EAX and EAX/EDX.

__fastcall

The fastcall calling convention is similar to stdcall in that the callee is responsible for stack maintenance. It differs in that the first two DWORD or smaller parameters are passed in the ECX and EDX registers. This isn’t a common calling convention on Windows for x86, although you can use the /Gr compiler option to make __fastcall the default for functions that don’t specify a convention. It is also awfully close to the calling convention used by default on x64.

The behavior of fastcall with returning structures is identical to the behavior seen with stdcall.

For packing sizes 2, 4, 8, 16 and natural, the fastcall calling convention behaves as documented. 3 and 4 byte structures were returned in EAX, and 7 and 8 byte structures were returned in EAX/EDX. 15 and 16 byte structures were returned through a hidden pointer to caller-allocated storage, with that pointer returned in EAX, and the callee was responsible for cleaning the pointer off the stack.

However, for packing size 1, the calling convention behaved the same as it did for cdecl and stdcall with packing size 1. Structure sizes 4, 8, 15 and 16 all behaved as in the other fastcall packing modes. But structure sizes 3 and 7 use the same hidden parameter mechanism as 15 and 16 byte structures, instead of using EAX and EAX/EDX.

__thiscall

The thiscall calling convention is almost like stdcall, and almost like fastcall, but not quite the same as either. All parameters are passed on the stack with the exception of the “this” pointer, which is passed via ECX. It is the default calling convention for C++ class member functions. It was also the odd man out in terms of behavior. Regardless of structure size or packing, structures were returned through a hidden pointer to caller-allocated storage, with that pointer returned in EAX, and the callee was responsible for cleaning the pointer off the stack.

Raw Data

Here is the raw data that I collected for this information. If you run your own experiment and have findings different from mine, please contact me so we can research the issue further. For a link to the Excel spreadsheet with this data, click here.

Packing Calling Convention Structure Size Behavior
1 cdecl 4 return value in eax
1 cdecl 8 return value in eax/edx
1 cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
1 cdecl 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
1 cdecl 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
1 cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
1 stdcall 4 return value in eax
1 stdcall 8 return value in eax/edx
1 stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 stdcall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 stdcall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 fastcall 4 return value in eax
1 fastcall 8 return value in eax/edx
1 fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 fastcall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 fastcall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
1 thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 cdecl 4 return value in eax
2 cdecl 8 return value in eax/edx
2 cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
2 cdecl 3 return value in eax
2 cdecl 7 return value in eax/edx
2 cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
2 stdcall 4 return value in eax
2 stdcall 8 return value in eax/edx
2 stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 stdcall 3 return value in eax
2 stdcall 7 return value in eax/edx
2 stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 fastcall 4 return value in eax
2 fastcall 8 return value in eax/edx
2 fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 fastcall 3 return value in eax
2 fastcall 7 return value in eax/edx
2 fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
2 thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 cdecl 4 return value in eax
4 cdecl 8 return value in eax/edx
4 cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
4 cdecl 3 return value in eax
4 cdecl 7 return value in eax/edx
4 cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
4 stdcall 4 return value in eax
4 stdcall 8 return value in eax/edx
4 stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 stdcall 3 return value in eax
4 stdcall 7 return value in eax/edx
4 stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 fastcall 4 return value in eax
4 fastcall 8 return value in eax/edx
4 fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 fastcall 3 return value in eax
4 fastcall 7 return value in eax/edx
4 fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
4 thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 cdecl 4 return value in eax
8 cdecl 8 return value in eax/edx
8 cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
8 cdecl 3 return value in eax
8 cdecl 7 return value in eax/edx
8 cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
8 stdcall 4 return value in eax
8 stdcall 8 return value in eax/edx
8 stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 stdcall 3 return value in eax
8 stdcall 7 return value in eax/edx
8 stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 fastcall 4 return value in eax
8 fastcall 8 return value in eax/edx
8 fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 fastcall 3 return value in eax
8 fastcall 7 return value in eax/edx
8 fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
8 thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 cdecl 4 return value in eax
16 cdecl 8 return value in eax/edx
16 cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
16 cdecl 3 return value in eax
16 cdecl 7 return value in eax/edx
16 cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
16 stdcall 4 return value in eax
16 stdcall 8 return value in eax/edx
16 stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 stdcall 3 return value in eax
16 stdcall 7 return value in eax/edx
16 stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 fastcall 4 return value in eax
16 fastcall 8 return value in eax/edx
16 fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 fastcall 3 return value in eax
16 fastcall 7 return value in eax/edx
16 fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
16 thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural cdecl 4 return value in eax
Natural cdecl 8 return value in eax/edx
Natural cdecl 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
Natural cdecl 3 return value in eax
Natural cdecl 7 return value in eax/edx
Natural cdecl 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, caller cleans up
Natural stdcall 4 return value in eax
Natural stdcall 8 return value in eax/edx
Natural stdcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural stdcall 3 return value in eax
Natural stdcall 7 return value in eax/edx
Natural stdcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural fastcall 4 return value in eax
Natural fastcall 8 return value in eax/edx
Natural fastcall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural fastcall 3 return value in eax
Natural fastcall 7 return value in eax/edx
Natural fastcall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 4 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 8 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 16 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 3 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 7 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
Natural thiscall 15 hidden parameter caller allocated, pushed onto stack. Return value in eax, callee cleans up
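
The table above can be condensed into a small lookup function. This is only a sketch that restates the observations recorded in the table; the enum and function names are hypothetical, packing 0 stands in for natural packing, and sizes that were not tested are reported as such rather than guessed.

```c
#include <assert.h>

enum convention  { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL, CONV_THISCALL };
enum return_kind { RET_EAX, RET_EAX_EDX, RET_HIDDEN_POINTER, RET_NOT_TESTED };

/* Restates the table: packing is 1, 2, 4, 8, 16, or 0 for natural;
   size is the structure size in bytes as listed in the table. */
enum return_kind observed_return(enum convention cc, int packing, int size)
{
    int small  = (size == 3  || size == 4);
    int medium = (size == 7  || size == 8);
    int large  = (size == 15 || size == 16);

    if (!small && !medium && !large)
        return RET_NOT_TESTED;          /* only six sizes were measured */
    if (cc == CONV_THISCALL || large)
        return RET_HIDDEN_POINTER;      /* always via the hidden parameter */
    if (packing == 1 && (size == 3 || size == 7))
        return RET_HIDDEN_POINTER;      /* the packing 1 exceptions */
    return small ? RET_EAX : RET_EAX_EDX;
}

/* Per the table, only cdecl leaves the hidden pointer for the caller to clean up. */
int caller_cleans_up(enum convention cc)
{
    return cc == CONV_CDECL;
}

/* A few rows of the table, spelled out as checks. */
void check_against_table(void)
{
    assert(observed_return(CONV_CDECL,    1, 3)  == RET_HIDDEN_POINTER);
    assert(observed_return(CONV_CDECL,    2, 3)  == RET_EAX);
    assert(observed_return(CONV_STDCALL,  0, 7)  == RET_EAX_EDX);
    assert(observed_return(CONV_FASTCALL, 16, 15) == RET_HIDDEN_POINTER);
    assert(observed_return(CONV_THISCALL, 4, 4)  == RET_HIDDEN_POINTER);
}
```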

6 Responses to Describing the MSVC ABI for Structure Return Types

  1. GregM says:

    __thiscall

    Regardless of structure size or packing, structures were returned via a callee-allocated pointer stored in EAX, and the callee was responsible for cleaning that pointer up.

    This should be “caller”, right?

  2. Aaron Ballman says:

    GregM: Yes, caller is correct, not callee. Sorry about that!

  3. Sam Hughes says:

    I am getting different results on the cdecl calling convention. My results are that the calling convention solely depends on the size of the structure. I think your structure sizes are not what you think they are. In particular, if I define


    struct b3 { char x[3]; };
    struct b3 func3(void) { struct b3 ret; ret.x[0] = 7; return ret; }

    The return value of func3 uses a hidden parameter, it has natural alignment, and it’s not returned in EAX! This contradicts your table.

    The same is true for these types:


    __pragma(pack(push, 1))
    struct bs { char x; short y; };
    struct sb { short x; char y; };
    __pragma(pack(pop))

    They also have size 3, and use a hidden parameter.

    However, if you define the packing alignment to be 2, or just define no packing alignment and define this struct:


    struct b_s { char x; short y; };

    Note that sizeof(struct b_s) is 4, not 3 — there’s a padding byte between x and y. Thus the structure gets returned in EAX, with no hidden return pointer parameter.

    So I think your results are wrong and so far I have seen nothing inconsistent with the theory that if the return type is a structure, the size of your structure is all that determines the behavior. I’ve observed that structures of size 1, 2, 4, and 8 are returned in AL, AX, EAX, or EAX:EDX, but others are returned using a hidden parameter.

    I did not check the behavior of the other calling conventions, so you might want to double-check those.
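
The size-only theory described in the comment above can be written down as a small function. This is a sketch of that theory as stated for cdecl, not a verified description of the compiler; the enum and function names are hypothetical, and the two structures are the ones quoted in the comment.

```c
#include <assert.h>
#include <stddef.h>  /* size_t */

enum return_kind { RET_AL, RET_AX, RET_EAX, RET_EAX_EDX, RET_HIDDEN_POINTER };

/* Size-only theory: sizes 1, 2, 4 and 8 come back in registers,
   every other size goes through a hidden parameter. */
enum return_kind size_only_return(size_t size)
{
    switch (size) {
    case 1:  return RET_AL;
    case 2:  return RET_AX;
    case 4:  return RET_EAX;
    case 8:  return RET_EAX_EDX;
    default: return RET_HIDDEN_POINTER;
    }
}

struct b3  { char x[3]; };         /* 3 bytes: an array of char needs no padding */
struct b_s { char x; short y; };   /* 4 bytes with natural packing (2-byte short) */

/* The two cases from the comment, expressed as checks. */
void check_size_only_theory(void)
{
    assert(size_only_return(sizeof(struct b3))  == RET_HIDDEN_POINTER);
    assert(size_only_return(sizeof(struct b_s)) == RET_EAX);
}
```

Under this theory, packing matters only through its effect on sizeof: the same members return in EAX at 4 bytes and through a hidden parameter at 3 bytes.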

  4. Artem Kerpatenko says:

    Hi Sam,

    as for me, the return value depends on structure size only.

    It doesn’t matter what the pack size is, what padding there is between fields, and so on…
    The type of standard calling convention for a 32-bit architecture with a push/pop
    instruction set does not matter either.

    All of these parameters vary by platform and compiler.

    You may try my code below to find out whether there is a hidden parameter on the machine stack.
    I think this is the easiest verification test, using a sliding stack procedure.

    Every unit test for a given size (ABI_EXEC_UNITTEST) changes the following variables:
    - structsz -> size of the structure under test
    - regbyteSz -> machine register size x86:32[1-al, 2-ax, 4-eax, 8-eax:edx]
    - regbitSz -> size of the return register in bits: 32/64 …
    - bAddressptr -> indicates a hidden stack value
    - bRegisterval -> indicates that the structure value is in register(s)

    I don’t see any sense in publishing results for all values & calling conventions.
    Try it…
    {
    ABI_SystemTest();
    }

    regards, Artem

    ----------------------------------------

    #pragma pack(1)

    //-

    #define ABI_MAX_RETSTRUCTSZ 16
    #define ABI_TESTING_CC __cdecl

    //-

    bool g_bAddr=false;
    char g_bData[ABI_MAX_RETSTRUCTSZ];

    //–

    #define ABI_NAME_CALL_PROC(xCDecl, xCnt) MVC_call_ ## xCDecl ## xCnt
    #define ABI_NAME_R_TYPE(xCDecl, xCnt) MVC_s_ret ## xCDecl ## xCnt
    #define ABI_NAME_STACKSHIFT_CALL_PROC(xCDecl, xCnt) fn_MVC_regret_ ## xCDecl ## xCnt

    #define ABI_DECLARE_UNITTEST(xCDecl, xCnt)\
    typedef struct {char a[xCnt];} ABI_NAME_R_TYPE(xCDecl, xCnt);\
    NEAR ABI_NAME_R_TYPE(xCDecl, xCnt) xCDecl \
    ABI_NAME_CALL_PROC(xCDecl, xCnt)(void* ip1)\
    {\
    ABI_NAME_R_TYPE(xCDecl, xCnt) buf={0};\
    g_bAddr=(ip1!=g_bData);\
    return buf;\
    }; typedef void*(xCDecl *ABI_NAME_STACKSHIFT_CALL_PROC(xCDecl, xCnt))(void*);

    //–

    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 1);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 2);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 3);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 4);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 5);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 6);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 7);
    ABI_DECLARE_UNITTEST(ABI_TESTING_CC, 8);

    //–

    #define ABI_EXEC_UNITTEST(xCDecl, xCnt) \
    ((ABI_NAME_STACKSHIFT_CALL_PROC(xCDecl, xCnt)) \
    ABI_NAME_CALL_PROC(xCDecl, xCnt)) ((void*)g_bData); \
    structsz=xCnt; bAddressptr=g_bAddr; bRegisterval=!bAddressptr; \
    regbyteSz=bAddressptr?sizeof(void*):xCnt%ABI_MAX_RETSTRUCTSZ; \
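    regbitSz=regbyteSz*8;
    //--
    void ABI_SystemTest(void)
    {
    int regbyteSz = 0; // return machine register size x86:32[al, ax, eax, eax:edx]
    int regbitSz = 0; // return bits: 32/64 …
    int structsz=0; // struct {xSize} fn(…
    bool bAddressptr=false; // struct PTR
    bool bRegisterval=false; // struct => reg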
    void *addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 1);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 2);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 3);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 4);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 5);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 6);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 7);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 8);
    }

  5. Artem Kerpatenko says:

    …sorry for the post. The site does not show the C/C++ left shift operator.

  6. Artem Kerpatenko says:

    correct one:

    //–
    #define ABI_EXEC_UNITTEST(xCDecl, xCnt) \
    ((ABI_NAME_STACKSHIFT_CALL_PROC(xCDecl, xCnt)) \
    ABI_NAME_CALL_PROC(xCDecl, xCnt)) ((void*)g_bData); \
    structsz=xCnt; bAddressptr=g_bAddr; bRegisterval=!bAddressptr; \
    regbyteSz=bAddressptr?sizeof(void*):xCnt%ABI_MAX_RETSTRUCTSZ; \
    regbitSz=regbyteSz*8;
    //–
    void ABI_SystemTest(void)
    {
    int regbyteSz = 0; // return machine register size x86:32[al, ax, eax, eax:edx]
    int regbitSz = 0; // return bits: 32/64 …
    int structsz=0; // struct {xSize} fn(…
    bool bAddressptr=false; // struct PTR
    bool bRegisterval=false; // struct => reg
    void *addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 1);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 2);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 3);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 4);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 5);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 6);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 7);
    addr = ABI_EXEC_UNITTEST(ABI_TESTING_CC, 8);
    }
