The Joys of Bit Fields

In C and C++, bit fields are one of the odder declaration types, and one you run into only rarely. The basic idea behind them is to give the programmer a way to declare members whose widths are specified at the bit level. For instance, let’s say you have 8 boolean flags you want the programmer to be able to specify. You could use 8 bools to do this, but that could mean those flags take up anywhere from 8 to 32 bytes of memory! (Remember that in C++, the size of the bool datatype is implementation-defined.) But you really only need one bit of information to track each individual flag. In this case, you could use a single byte to encode all 8 flags by using a bit field.

When you declare a bit field, you first specify an integral type to be used as the underlying storage. For instance, in our case above we only require 8 bits, so we could declare our bit field’s type as char or unsigned char. If we needed to encode 9 flags instead of 8, we’d want to declare the bit field’s type as short or unsigned short.

Following the type is the name you wish to give the bit field. So far, this looks like any other declaration you’re used to: type, then identifier.

Finally, after the identifier for your bit field comes a colon (:) and then the width of the field in bits. A bit field can be an arbitrary number of bits wide, though for our example, we’re looking at widths of one bit because that’s all that’s required to store a boolean.

So what does our example case look like?

struct S {
	char flag1 : 1;
	char flag2 : 1;
	char flag3 : 1;
	char flag4 : 1;
	char flag5 : 1;
	char flag6 : 1;
	char flag7 : 1;
	char flag8 : 1;
};

In this case, we’ve declared eight members (flag1 through flag8), each taking up only a single bit. The compiler packs all of the one-bit flags together, so if you take the sizeof( struct S ), you will see it report back 1.
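As a quick sanity check, here’s a minimal usage sketch that uses the struct S declared above; it prints 1 on my compiler, though as the rest of this post shows, that value isn’t guaranteed:

#include <stdio.h>

int main( void )
{
	struct S s = { 0 };   // Zero all eight flags
	s.flag3 = 1;          // Setting a flag reads and writes like any other member

	// Prints 1 here, but nothing in the standard promises that.
	printf( "sizeof( struct S ) == %u\n", (unsigned)sizeof( struct S ) );
	return 0;
}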

This is awesome! Right? Am I right? No. I’m wrong. Very, very wrong.

Bit fields are one of the most implementation-defined concepts in the language. Almost every detail about them is left up to the compiler writer: how the bit fields are allocated, how they are aligned, and where the bits sit within the allocated storage. So our above example happened to be packed into a single byte for me, but it didn’t have to be. And because the location of the bits is up to the compiler, flag1 could be setting the most significant bit of the byte, or it could be setting the least significant bit. You have no control over these details, which makes bit fields entirely non-portable.
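If you want to see what your particular compiler does, a small diagnostic sketch like this can help: it copies the struct into a raw byte and prints it, so the pattern you see (0x01 versus 0x80, say) tells you where flag1 actually landed. This is purely for poking around and assumes the struct from above is in scope:

#include <stdio.h>
#include <string.h>

int main( void )
{
	struct S s = { 0 };
	s.flag1 = 1;

	unsigned char raw = 0;
	memcpy( &raw, &s, 1 );   // Inspect the first byte of the struct's storage

	// 0x01 means flag1 is the least significant bit; 0x80 means the most significant.
	printf( "0x%02X\n", raw );
	return 0;
}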

Let’s talk about some of the gotchas with regard to bit fields.

For starters, what happens if we add a flag9 to our declaration, and make it a char? How does it affect the size of our structure? It turns out, it adds another byte, so the sizeof operator returns 2. This is a very sensible action for the compiler to take; it appears to be trying to pack things as tightly as possible.
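A sketch of that nine-flag case, with flag9 declared as a char like the others:

struct S {
	char flag1 : 1;
	char flag2 : 1;
	char flag3 : 1;
	char flag4 : 1;
	char flag5 : 1;
	char flag6 : 1;
	char flag7 : 1;
	char flag8 : 1;
	char flag9 : 1;   // The ninth flag spills into a second byte
};

But what if the declaration is changed to look like this: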

struct S {
	char flag1 : 1;
	char flag2 : 1;
	char flag3 : 1;
	char flag4 : 1;
	char flag5 : 1;
	char flag6 : 1;
	char flag7 : 1;
	char flag8 : 1;
	long flag9 : 1;   // Notice that this is a long
};

You might expect that the sizeof( struct S ) would be sizeof( char ) + sizeof( long ), but you’d be wrong. For me, in Visual Studio 2010, the sizeof( struct S ) comes back as 8. The compiler decided to use two longs instead of a char and a long. I suspect the reason it chose to do so was structure member alignment. When I packed the structure onto one-byte boundaries, it improved the situation (there’s a sketch of that below). So here’s your first thing to watch out for: because bit fields live within a structure or a class, they are still subject to the compiler’s alignment requirements.
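Here’s a sketch of that packing experiment. #pragma pack is a common extension (Visual Studio and gcc both support it), but it is not standard either, so you’re trading one portability problem for another; the name PackedS is just for illustration:

#pragma pack( push, 1 )   // Force one-byte packing for this struct
struct PackedS {
	char flag1 : 1;
	char flag2 : 1;
	char flag3 : 1;
	char flag4 : 1;
	char flag5 : 1;
	char flag6 : 1;
	char flag7 : 1;
	char flag8 : 1;
	long flag9 : 1;   // Still a long, but no longer padded out to a long boundary
};
#pragma pack( pop )

// With one-byte packing you'd expect roughly one char allocation unit plus one
// long allocation unit, though the exact sizeof is still up to the compiler.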

What happens if we say we want a bit field that is larger than its declared type?

struct S {
	char flag1 : 10;
};

Different compilers believe different things about this declaration. According to the C++ specification, this is actually legal: flag1 will consist of eight bits of information, followed by two bits of padding. (C is stricter and doesn’t allow a width greater than the width of the declared type.) Visual Studio 2010 treats this case as an error, while gcc treats it as a warning and claims the sizeof( struct S ) is two.

But this brings up a good question: what if you do want some padding between your fields? Perhaps you’ve figured out your compiler’s bit field semantics and are exploiting them for embedded programming, so you need more control over bit positions. In this case, you can use an unnamed bit field: a bit field with no identifier.

struct S {
	char flag1 : 8;
	char : 2;
	char flag2 : 4;
};

This creates a structure with one named flag using 8 bits of data, then 2 bits of padding, and then 4 bits for the second named flag.
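As a quick check (the exact value is implementation-defined, but it can never be 1, since 8 + 2 + 4 = 14 bits don’t fit in a single byte; on a typical compiler you’d expect 2):

#include <stdio.h>

int main( void )
{
	printf( "%u\n", (unsigned)sizeof( struct S ) );   // Typically prints 2
	return 0;
}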

So you’ve learned that you can pad the fields, but what if you want to control their alignment? For instance, say you want to use two bits out of one char, and then four bits out of a second char. You could use padding to accomplish this by having an unnamed bit field of six bits. But instead, you can use an unnamed bit field of size zero to force the compiler to start another allocation unit for the subsequent declaration.

struct S {
	char flag1 : 2;
	char : 0;
	char flag2 : 4;
};

Without the : 0 declaration, flag1 and flag2 would be packed into the same char. But with the : 0, they are in different allocation units, and so the sizeof( struct S ) is actually two.
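Here’s a side-by-side sketch; the names WithoutBreak and WithBreak are mine, and the sizes are what a typical compiler produces rather than anything guaranteed:

struct WithoutBreak {
	char flag1 : 2;
	char flag2 : 4;   // Shares an allocation unit with flag1
};

struct WithBreak {
	char flag1 : 2;
	char : 0;         // Close out the current allocation unit
	char flag2 : 4;   // Starts in a fresh char
};

// Typically: sizeof( struct WithoutBreak ) == 1 and sizeof( struct WithBreak ) == 2.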

Another thing to note is that bit fields are not directly addressable. While they occupy some chunk of memory somewhere, you cannot obtain a pointer to them. You can still obtain a pointer to the structure that contains the bit fields, but since their layout is implementation-specific, mucking about with that memory is ill-advised at best. And while you cannot get a pointer to a bit field, references are a whole different can of worms. Getting a non-const reference to a bit field is impossible, for the same reason that getting a pointer to one is impossible. But const references to bit fields are allowed, in a truly bizarre fashion: the value of the bit field is copied into a temporary, and the const reference then refers to that temporary. So you still never get a reference to the actual bit field, just to a temporary copy!

struct S {
	char flag1 : 2;
	char : 0;
	char flag2 : 4;

	char normal;
};

const char& explode( const struct S& ohDear )
{
	return ohDear.flag2;
}

const char& just_fine( const struct S& ok )
{
	return ok.normal;
}

The just_fine function is an acceptable implementation, as the const reference being returned has guaranteed storage backed by the const struct S& being passed in. However, the explode function has a subtle and scary bug in it! Attempting to get a const reference to the bit field creates a temporary, and that temporary is what gets returned! If you’re lucky, your compiler will warn you about returning a reference to a temporary or local variable.
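To spell out how that bites you, here’s a sketch of a caller; by the time you read through the reference that explode returned, the temporary it refers to is already gone:

#include <stdio.h>

int main( void )
{
	struct S s = { 0 };
	s.flag2 = 3;
	s.normal = 42;

	const char& fine = just_fine( s );     // Refers to s.normal; valid as long as s is alive
	const char& dangling = explode( s );   // Refers to a temporary that has already been destroyed

	printf( "%d\n", (int)fine );           // Fine
	printf( "%d\n", (int)dangling );       // Undefined behavior: reading through a dangling reference
	return 0;
}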

As you can see, bit fields are one of those strange concepts that look pretty nice on the outside, but are better off left alone for production code. As soon as you start to use bit fields, you are signing yourself up for some pretty strange problems with portability and code clarity. The only benefit is that you are able to save some space with your declarations. I’m not convinced the tradeoff is worth it!

tl;dr: bit fields let you pack multiple declarations into a smaller memory footprint at the expense of code portability and developer sanity.


2 Responses to The Joys of Bit Fields

  1. Dan says:

    Aaron,

    I always shun the use of bit-fields. One of the big reasons is that they are inherently non-portable (as you said), and a lot of the firmware I work on inevitably is ported to other products & processors. Dependencies such as signedness, word size, endianness, compiler implementation, etc… come into play. That’s already a decent reason not to use bitfields, especially when code might be built & tested on different platforms (e.g. PC for unit / component testing).

    With normal RAM variables, portability is the primary concern. Things will usually work as expected on platform “A”, but then moving the code to platform “B” might be problematic. Stuff that was originally written for a little MSP430 with 8K of ROM and 1K of RAM is now being used on a 32-bit ARM9 running at 200MHz. Different CPU, different development toolset, different C dialect (C89 vs. C99), etc.

    Another thing — I often see embedded developers use bit fields when modeling hardware registers. Even the demo code from CPU vendors (e.g. Freescale) and toolset vendors (e.g. IAR) uses them.

    On a microcontroller, control registers are often split into groups of bits for reading status, controlling hardware, etc. It might seem obvious or intuitive to use a language feature (bitfields!) to access the individual “regions” of the register – hey, I’m all for abstraction too – but usually this is a very fragile way to do it at best, and downright wrong in the worst case.

    One of the pitfalls is write-only registers. (Most non-embedded developers, or even embedded application-level developers, have never seen these curious beasts.) In these cases a shadow copy of the register must be kept, otherwise read-modify-write operations may fail (since the read of the write-only register will either be indeterminate, or return a fixed 0 or 1 each time). An example might help. Keep in mind the way bitfields are usually implemented “behind the curtains”: a value is read, bits are twiddled, and the value is written back.

    Hypothetical example: suppose we have a write-only register with 32 bits. Each of the 32 bits controls a missile. Writing a ‘1’ to a bit launches a missile (writing “0” does nothing; writing 0xFFFF.FFFF launches 32 missiles simultaneously). With bitfields, you might do something like MissileControlReg.Missile5 = 1, thinking you’re setting bit 5, and launching missile 5.

    The problem is that behind the scenes, the assembly code will probably read the 32-bit write-only control register, bitwise-OR it with 0x20 (1UL << 5), and write it back. But reading the 32-bit register is undefined (it is a write-only register). The chip designer can have it return 0xFFFF.FFFF when read. So if this were the case, the bitwise OR is redundant (bit 5 is set, like all others), and we launch all 32 missiles, instead of just #5.

    The code should be something like MissileControlReg = (1UL << 5), writing a full 32-bit value with only 1 bit set (0x0000.0020). This can still be encapsulated with macros (blech) or inline functions (yay! C99 & C++ for the win) or even a simple C function if you can afford it.
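    A sketch of that last approach (the register address and names here are made up purely for illustration):

    #include <stdint.h>

    /* Hypothetical memory-mapped, write-only missile control register. */
    #define MISSILE_CONTROL_REG ( *(volatile uint32_t *)0x40001000u )

    /* Write a full 32-bit value with exactly one bit set -- no read-modify-write. */
    static inline void launch_missile( unsigned which )
    {
        MISSILE_CONTROL_REG = ( 1UL << which );
    }

    /* launch_missile( 5 ); fires missile #5 and nothing else. */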

    Sorry for the long comment / example. But it's these dusty corners that people get burned by. At least these dusty corners keep me busy with work ;-)

  2. Aaron Ballman says:

    @Dan — yeah, I can’t stress enough why not to use bit fields. I’m really surprised that the Win32 APIs actually use them (though very sparingly). For instance, the DCB structure for serial programming uses them. Certainly makes life more interesting (especially when you try to call it from a language like VB)!

    I had never heard of write-only registers (though it makes sense now that I imagine it), but that’s a situation where you can get into trouble even without bit fields. I would have to think that most modern optimizers wreak havoc with hardware like that!

    No worries on the long comment — it was fascinating!
