Discriminated Unions

In computer science, a discriminated union is one of the many names given to the concept of a “catch-all” datatype. (You’ll also hear it referred to as a variant or a tagged union.) It’s meant to hold a value of any one of a fixed set of types at any given point in time. It does so by storing a “tag” alongside the data that records which type is currently held. Generally speaking, it’s also an efficient datatype because the underlying storage can be shared amongst all of the alternatives. Since only one alternative is in use at a time, this sharing of memory can greatly reduce the overhead for some applications.
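To make the idea concrete, here is a minimal sketch of a hand-rolled discriminated union in C++: an enum serves as the tag, and an anonymous union provides the shared storage. The names Value, MakeInt, MakeDouble and AsDouble are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical hand-rolled discriminated union: the enum is the "tag" that
// records which member of the anonymous union currently holds valid data.
struct Value {
	enum Tag { Int, Double } tag;
	union {		// storage shared by both alternatives
		int i;
		double d;
	};
};

Value MakeInt( int i ) {
	Value v;
	v.tag = Value::Int;
	v.i = i;
	return v;
}

Value MakeDouble( double d ) {
	Value v;
	v.tag = Value::Double;
	v.d = d;
	return v;
}

// Reads either alternative as a double by consulting the tag first.
double AsDouble( const Value& v ) {
	if ( v.tag == Value::Int )
		return v.i;
	assert( v.tag == Value::Double );	// the only other alternative
	return v.d;
}
```

Every reader checks the tag before touching the union, which is the discipline the language itself does not enforce.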

In C and C++, the union keyword gets you something close to a discriminated union: the members share storage, but the union itself does not record which member is in use. However, you can only store very simple datatypes within the union. For instance:

union u1 {
	int the_int;
	char *the_string;
	double the_double;
};

This declares a union named u1, which can hold an int, a char * or a double at any given point in time. The programmer picks which datatype to use by accessing the corresponding member (the_int, the_string or the_double). These members act as the “tags”, but the union itself does not remember which one was last written, so keeping track of that is up to you. Since the double is the largest member (eight bytes on common platforms), the entire union typically requires eight bytes to store its data. By contrast, storing all three values at once would need at least 16 bytes on a 32-bit platform (4 for the int, 4 for the pointer and 8 for the double).
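A quick way to check the size claim on your own platform is to compare sizeof for the union against a struct that stores all three members side by side. This is a sketch: AllThree and PrintSizes are hypothetical names, and exact sizes depend on the platform’s type sizes and alignment (typically 8 versus 16 bytes on a 32-bit target, and 8 versus 24 on a common 64-bit target, where the struct picks up padding after the int).

```cpp
#include <cstdio>

union u1 {
	int the_int;
	char *the_string;
	double the_double;
};

// Hypothetical comparison type: stores all three values at once.
struct AllThree {
	int the_int;
	char *the_string;
	double the_double;
};

// A union is at least as large as its largest member...
static_assert( sizeof( u1 ) >= sizeof( double ), "u1 must fit its largest member" );
// ...but needs no more room than a struct holding every member separately.
static_assert( sizeof( u1 ) <= sizeof( AllThree ), "sharing storage costs nothing extra" );

void PrintSizes() {
	std::printf( "sizeof(u1) = %zu, sizeof(AllThree) = %zu\n", sizeof( u1 ), sizeof( AllThree ) );
}
```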

In older versions of C++ (C++03), a union member could not be of a type with a non-trivial constructor, copy constructor, destructor or copy-assignment operator. In practice, that limited you to the built-in datatypes (char, int, short, long, long long, float, double, long double), pointers such as char * and wchar_t *, and user-defined POD structs. Recall, a POD struct is a structure that is essentially plain data: no user-declared constructors, destructor or copy-assignment operator, and no virtual functions. This makes some degree of sense: none of these blessed datatypes require any special work on the part of the compiler. All of them are just a bucket of bytes with no worries about constructors or destructors. Unfortunately, it also severely limits the datatypes you can place into a union. Very few custom C++ datatypes get by without a constructor or destructor!
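The C++03 restriction can be approximated with C++11 type traits: a type could be a union member only if its default constructor, copy constructor, copy-assignment operator and destructor were all trivial. The trait name AllowedInCpp03Union and the PlainPoint type are hypothetical, and Position matches the struct used later in this post. Treat this as a sketch of the rule rather than a complete model of the C++03 wording.

```cpp
#include <type_traits>

// Hypothetical trait approximating the C++03 union-member rule: every special
// member function the union might need must be trivial.
template <typename T>
struct AllowedInCpp03Union {
	static const bool value =
		std::is_trivially_default_constructible<T>::value &&
		std::is_trivially_copy_constructible<T>::value &&
		std::is_trivially_copy_assignable<T>::value &&
		std::is_trivially_destructible<T>::value;
};

// Plain data: every special member is implicitly declared and trivial.
struct PlainPoint {
	int x, y;
};

// Same layout, but the user-provided constructors make it non-trivial.
struct Position {
	int x, y;

	Position( int x_, int y_ ) : x( x_ ), y( y_ ) {}
	Position() : x( 0 ), y( 0 ) {}
};

static_assert( AllowedInCpp03Union<double>::value, "built-in types were fine" );
static_assert( AllowedInCpp03Union<PlainPoint>::value, "POD structs were fine" );
static_assert( !AllowedInCpp03Union<Position>::value, "Position was rejected" );
```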

However, the new C++0x specification relaxes that rule so that C++ has something closer to truly discriminated unions. Now a union member can be of almost any class or struct type, including ones with constructors and destructors. (Reference members are still not allowed, and the union itself cannot have virtual functions or base classes.) That means you can now do:

struct Position {
	int x, y;
	
	Position( int x_, int y_ ) : x( x_ ), y( y_ ) {}
	Position() : x( 0 ), y( 0 ) {}
};

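union u1 {
	int the_int;
	char *the_string;
	double the_double;
	struct Position the_position;

	// Position's non-trivial default constructor causes C++0x to delete u1's
	// implicit default constructor, so u1 must provide one that picks a member.
	u1() : the_int( 0 ) {}
};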

However, this brings up an interesting question. When you declare a variable of type union u1, what happens to the_position? Does its constructor fire? Or, when u1 goes out of scope, does the_position’s destructor fire? After all, these were the dangerous things that C++ was protecting against previously.

Unions are not for the faint of heart, and discriminated unions are no different! The only entity that knows whether the constructor or destructor should fire is the programmer. The compiler cannot reliably figure it out, so it’s left up to you to run them yourself: you construct the member in place with placement new, and you destroy it by calling its destructor explicitly (one of the rare cases where calling a destructor by hand is the right thing to do).
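The compiler’s refusal to guess shows up directly in the C++11 rules: when a member type has a non-trivial default constructor or destructor, the union’s corresponding implicitly declared special member is defined as deleted, and the union has to supply its own. The sketch below checks this with std::string, which has both; the union names NoSpecialMembers and IntOrString are hypothetical.

```cpp
#include <string>
#include <type_traits>

// No user-provided special members: because std::string's constructor and
// destructor are non-trivial, the union's implicit versions are deleted.
union NoSpecialMembers {
	int the_int;
	std::string the_string;
};

// With them, the union can be declared, but neither function can know whether
// the_string is active, so destroying it is left to the programmer.
union IntOrString {
	int the_int;
	std::string the_string;

	IntOrString() : the_int( 0 ) {}	// start out holding the_int
	~IntOrString() {}			// intentionally does not touch the_string
};

static_assert( !std::is_default_constructible<NoSpecialMembers>::value, "implicit default ctor is deleted" );
static_assert( !std::is_destructible<NoSpecialMembers>::value, "implicit destructor is deleted" );
static_assert( std::is_default_constructible<IntOrString>::value, "user-provided ctor restores it" );
```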

Taking our example above, suppose you wanted to use the_position within the union. What would that look like?

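#include <new>	// for placement new

extern void SomeFunction( struct Position* pos );

void UsePosition() {
	union u1 the_union;
	new ( &the_union.the_position ) Position( 10, 20 );	// construct the_position in place
	SomeFunction( &the_union.the_position );

	the_union.the_position.~Position();	// destroy it before switching members
	the_union.the_int = 12;
}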

As you can see, the_position is constructed in place with placement new at the point when we want to use it, and then the (implicitly generated) Position::~Position destructor is called explicitly when we’re done using that member. Only then are we free to make use of one of the other members within the union. Note that nothing in the union records which member is active; that knowledge lives entirely in your code.
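Tracking the active member by hand is exactly the bookkeeping a discriminated union is supposed to do for you. As a sketch of how it can be wrapped up, the hypothetical TaggedU1 class below pairs a union with an enum tag, uses placement new and an explicit destructor call internally, and checks the tag on every read. It supports only int and Position to stay short, and it forbids copying rather than implementing it.

```cpp
#include <cassert>
#include <new>

struct Position {
	int x, y;

	Position( int x_, int y_ ) : x( x_ ), y( y_ ) {}
	Position() : x( 0 ), y( 0 ) {}
};

// Hypothetical wrapper: a union plus a tag, with the construct/destroy
// bookkeeping done in one place instead of at every call site.
class TaggedU1 {
public:
	enum Tag { Int, Pos };

	TaggedU1() : tag_( Int ) {}		// Storage's constructor makes the_int active
	~TaggedU1() { DestroyActive(); }

	// Copying would need tag-aware logic, so this sketch simply forbids it.
	TaggedU1( const TaggedU1& ) = delete;
	TaggedU1& operator=( const TaggedU1& ) = delete;

	void SetInt( int i ) {
		DestroyActive();
		storage_.the_int = i;
		tag_ = Int;
	}

	void SetPosition( int x, int y ) {
		DestroyActive();
		new ( &storage_.the_position ) Position( x, y );	// construct in place
		tag_ = Pos;
	}

	Tag tag() const { return tag_; }

	int GetInt() const {
		assert( tag_ == Int );
		return storage_.the_int;
	}

	const Position& GetPosition() const {
		assert( tag_ == Pos );
		return storage_.the_position;
	}

private:
	// Ends the lifetime of the active member; only the_position has a
	// destructor to call.
	void DestroyActive() {
		if ( tag_ == Pos )
			storage_.the_position.~Position();
	}

	union Storage {
		int the_int;
		Position the_position;

		Storage() : the_int( 0 ) {}	// required: Position's constructor is non-trivial
	} storage_;
	Tag tag_;
};
```

This is roughly the job that library types such as boost::variant and C++17’s std::variant take on, with far more care around copying and exception safety.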

While I certainly agree with the implementation, and the rationale behind it, I am hard-pressed to think of times when I’d want to use the feature in production code. I can see a lot of use within the embedded markets where space concerns are high. But the danger of forgetting to construct the member, or to destroy it (when needed), sets quite a high bar for most projects.
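To see what forgetting the destructor looks like, here is a sketch with a hypothetical Tracked type that counts its live instances. ForgetDestructor switches the union to the_int without destroying the_tracked, so the count never drops back to zero; with a type that owned memory or a file handle, the same mistake would leak that resource. The union name Slot and both functions are likewise hypothetical.

```cpp
#include <new>

// Hypothetical instrumented type: counts how many instances are alive.
struct Tracked {
	static int live;

	Tracked() { ++live; }
	~Tracked() { --live; }
};
int Tracked::live = 0;

union Slot {
	int the_int;
	Tracked the_tracked;

	Slot() : the_int( 0 ) {}
	~Slot() {}	// cannot tell whether the_tracked is active, so does nothing
};

// Correct: the placement new is paired with an explicit destructor call.
void UseCorrectly() {
	Slot s;
	new ( &s.the_tracked ) Tracked;
	s.the_tracked.~Tracked();
	s.the_int = 1;
}

// Buggy: switches to the_int without destroying the_tracked first, so
// ~Tracked() never runs and Tracked::live is left one too high.
void ForgetDestructor() {
	Slot s;
	new ( &s.the_tracked ) Tracked;
	s.the_int = 1;
}
```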

However, it is good to see that C++ has relaxed the rules. One of the benefits of working in C++ is that you’re allowed to shoot yourself in the foot (or in 30 copies of your foot, if you prefer). This allows you to implement powerful, efficient solutions, at the expense of the hand-holding provided by some other languages. I just hope I don’t catch any of my coworkers using this particular one! ;-)

This entry was posted in C/C++.

6 Responses to Discriminated Unions

  1. Dan says:

    Aaron,

    Always enjoy reading your posts.

    I know the purpose of the post was to talk about discriminated unions, and specifically the changes made possible with C++0x – and you did a great job. And I agree that while it’s an interesting and perhaps useful addition, I’m not sure I’ll ever actually use it.

    Having said that, I just wanted to point out that I often use boost::variant for these situations, at least when I’m able to use Boost (embedded guy here – things like Boost, exceptions, and RTTI are not always available / enabled anyway).

  2. Aaron Ballman says:

    @Dan, yeah, this is one of those language features where, as a language guy, I understand exactly why it exists, even if I can’t justify its existence from a practical perspective. I mean, I wouldn’t WANT many people to use this! But it definitely deserves to be allowed. If that makes sense. ;-)

    As strange as it may sound, I’ve never used Boost for anything practical, though I’ve certainly read plenty about it. But the concept of a variant has always left me feeling slightly sick coming from my REALbasic background. It’s just too easy to abuse! But in the sense of embedded programming, I can see a lot of utility to it!

    What sort of embedded stuff do you do?

  3. Dan says:

    Hi Aaron,

    Actually I don’t use Boost too much either. It’s rare that I see it used in embedded systems, although I’ve often seen shops with template code that is similar (things like smart pointers, dimensional analysis, etc.) Have you heard about the Highscore Boost book? One of my goals is to work through that material when I get a chance…

    I’m mostly an “embedded generalist” – I have lots of experience developing firmware for lots of different kinds of systems — telecom/datacom (from large racks of distributed boards connected via fiber, to deeply embedded handsets, cable modems and routers) to industrial automation/motion control/motor control, to medical devices, to defense/military communications & weapons systems. Background is BSEE, but I’ve been doing firmware most of my life. Started (many years ago) doing board diagnostics & drivers, then RTOS/multitasking stuff, then full-blown systems (digital logic design & all firmware, from bring-up code to the application).

    Now I’m a consultant, so the variety of projects I work on is pretty broad. But most of the time the client has the “domain knowledge”, but they don’t always know how to write good software / firmware. No processes, no architecture / design, poor (non-existent) use of tools (version control, static analysis, code review, etc.) So we couple my engineering background with their domain expertise, and we usually end up with a dog that can hunt ;-)

  4. Aaron Ballman says:

    @Dan — sounds like a lot of interesting work! I’m sure you’ve run into many of the hairy back alleys of C and C++ in your projects. ;-) But it also sounds like a lot of fun challenges too.

  5. Pingback: The Placement New Operator | Ruminations

  6. Pingback: Destructors | Ruminations
