Enumerations for Framework Design

As someone who develops cross-platform and cross-language frameworks, one problem I frequently run up against is enumerations. They’re a very handy construct for a framework designer to use because they allow you to logically group related constants together with some degree of cohesion. However, they have some less-than-desirable behaviors.

For starters, you cannot rely on the size of the resulting datatype without a bit of extra legwork. The size of the datatype matters when building cross-language frameworks: if you are writing a C header file and you expect people to be able to use the library from C#, Visual Basic, etc., then you have to worry about how large the type actually is. If you don’t, it becomes difficult to marshal that type across call sites when it is used in method calls.

In C, the specification only requires that the enumerated type be capable of representing every constant in the enumeration; the actual choice of underlying type is implementation-defined. That’s not particularly useful for cross-language frameworks. Take an example:

enum foo {
  kBar,
  kBaz,
  kQuux
};

What should sizeof( enum foo ) return? Well, it can return almost anything! Since the largest value in the enumeration is 2 (kQuux), the smallest type required is a char. However, the compiler could also decide to size the enumeration based on the targeted CPU architecture, so a 32-bit integer wouldn’t be entirely unexpected either.
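To see what your particular toolchain decides, a quick throwaway test (not part of the framework itself) is to print the size directly; the result is implementation-defined, so two compilers can legitimately disagree:

#include <stdio.h>

enum foo {
  kBar,
  kBaz,
  kQuux
};

int main( void )
{
  /* Implementation-defined: commonly 4, but 1, 2, or 8 are all permitted. */
  printf( "sizeof( enum foo ) = %zu\n", sizeof( enum foo ) );
  return 0;
}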

Why is this a problem? Consider a function prototype like this:

void DoSomething( enum foo f );

If you are trying to call this function from another language, both compilers must agree on how to call DoSomething. In this case, the size of the f parameter is important: if the framework expects a char, then only one byte of information should be passed from the target language; if the framework expects an int, then the target language needs to pass four bytes. Now suppose the target language passes only one byte because it assumes f is a char. The framework still reads four bytes, three of which are garbage, and you’ve got a marshaling bug.

One approach open to the framework designer is to try to trick the compiler into using a specific size by adding a sentinel value to the enumeration:

enum foo {
  kBar,
  kBaz,
  kQuux,
  kPleaseIgnoreThis = 0xFFFFFFFF
};

By including the kPleaseIgnoreThis constant, whose value requires 32 bits to represent, you’d think you’re safe. Except you’re not: the type only needs to be “large enough”, so a compiler targeting a 64-bit CPU could very well pick 64 bits to represent the enumeration. You’re back to not actually knowing the size!
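If you do use the sentinel trick, one partial mitigation (assuming a C11 compiler) is to state the expectation with a _Static_assert, so a compiler that picks a different width breaks the build instead of silently changing the ABI. Note that this only protects the compiler building the framework, not the one consuming it:

#include <stdint.h>

enum foo {
  kBar,
  kBaz,
  kQuux,
  kPleaseIgnoreThis = 0xFFFFFFFF
};

/* Fails to compile if this compiler chooses anything other than 32 bits. */
_Static_assert( sizeof( enum foo ) == sizeof( uint32_t ),
                "enum foo is not 32 bits wide" );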

The only portable, reliable way that I’ve found to ensure the size is stable is to take the size determination away from the enumeration entirely. Instead, use a typedef to pick a stable size and rely on convention to enforce it.

// Don't do this
typedef enum Foo {
  kBar,
  kBaz,
  kQuux
} Foo;

void Blah( Foo f );

// Do this instead
enum {
  kBar,
  kBaz,
  kQuux
};
typedef uint32_t Foo;	// Requires stdint.h

void Blah( Foo f );

This ensures that the size of the parameter being passed to Blah is a consistent 32 bits regardless of which compiler or CPU architecture is being targeted. That certainly meets our goal for cross-language support, but it comes with an added benefit: since a cross-language framework is almost certainly going to ship as a shared library of some variety, this also lets you safely use the library from other C/C++ applications. That may seem a bit counter-intuitive at first, given that you’re writing the framework with C linkage; you’d likely expect that compatibility to be a given. However, because the declarations for your methods cannot be name-mangled, no parameter size information is encoded in the exported symbols (and if the declarations were name-mangled, there’s a high chance your framework wouldn’t be cross-compiler compatible). Just because the compiler used to build the library picks a specific size for an enumeration does not mean the compiler used to consume the library will pick the same size.
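As a minimal illustration of that last point, a consuming application built with a completely different compiler still passes exactly four bytes, because the parameter type is spelled uint32_t rather than an enum whose size each compiler decides for itself. (foo.h here is just a hypothetical name for the header shown above.)

/* consumer.c: built with a different compiler than the library itself. */
#include <stdint.h>	// for uint32_t, in case foo.h does not include it
#include "foo.h"

int main( void )
{
  Foo f = kQuux;   /* the enumerators still read naturally */
  Blah( f );       /* the argument is exactly 32 bits on every compiler */
  return 0;
}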

tl;dr: if you are using enumerations when making a C library, you should use a typedef to ensure the size of the enumeration is stable.
