The comma

As C and C++ programmers, we’ve probably seen and used the comma countless times in our applications, without thinking too much about it. However, there are some very interesting points to this piece of punctuation that are worth discussing. The comma is used as part of the syntax for a list of like objects (such as when declaring variables), or as part of the syntax for a list of unlike objects (such as a parameter list). It is even an operator that can be used as part of an expression!

The first place you’ve likely encountered the comma is with a parameter list. In this case, the comma creates a logical separation for the parameters. This is the case in the declaration, the definition and the call site. We’ve all seen it before:

#include <cstdio>	// for printf

// Declaration -- the comma separates i and j logically
void SomeFunction( int i, int j );

// Definition -- the comma still separates i and j logically
void SomeFunction( int i, int j )
{
	if (i + j < 10) {
		::printf( "Huttah\n" );
	} else {
		::printf( "Boo\n" );
	}
}

But let’s discuss the comma at the call site, because there’s something interesting happening there. Given the following code (and SomeFunction from above), what do you think the outputs should be?

int global_value = -10;
int Foo( void )
{
	global_value *= -1;
	::printf( "In Foo, %d\n", global_value );
	return global_value;
}

int Bar( void )
{
	global_value += 5;
	::printf( "In Bar, %d\n", global_value );
	return global_value;
}

int main( void )
{
	SomeFunction( Foo(), Bar() );
	return 0;
}

The truth of the matter is: you can’t know for certain what the output will be, because the arguments are evaluated in an unspecified order. That means you could get either of these outcomes:

In Foo, 10
In Bar, 15
Boo

In Bar, -5
In Foo, 5
Huttah

This is one of those gotchas that you probably never think about, but it can make for some very difficult porting projects. In the case of function arguments, the comma still logically separates them, but the order of evaluation is unspecified (as per Section 5.2.2 Clause 8 of the C++0x specification). All you can be sure of is that every argument expression is completely evaluated before the function is entered, and that function calls made while evaluating the arguments are not interleaved (Foo and Bar won’t be executed at the same time).
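If the order matters, one common way to pin it down is to evaluate each argument in its own statement before making the call. The following is a minimal sketch of that idea; gCounter, Step, Combine and OrderedCall are hypothetical names made up for this illustration and are not part of the code above.

```cpp
// Hypothetical stand-ins for functions with side effects.
int gCounter = 0;

int Step() { return ++gCounter; }   // returns 1, 2, 3, ... on successive calls

int Combine(int first, int second) { return first * 10 + second; }

int OrderedCall()
{
    gCounter = 0;
    // Each initializer lives in its own statement, so the first call to
    // Step finishes before the second one starts.
    const int first = Step();
    const int second = Step();
    return Combine(first, second);   // always Combine(1, 2)
}
```

Writing Combine( Step(), Step() ) directly would leave the compiler free to produce either Combine(1, 2) or Combine(2, 1).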

Another place we commonly see the comma pop up is in declarators. For instance:

int foo = 12, bar = 15;
for (std::string::iterator iter = someStr.begin(), end = someStr.end(); iter != end; ++iter) {}

In this case, the comma is being used to separate items in a list of like types. foo and bar are both ints, even though bar doesn’t have the type specifier in front of it. iter and end are both std::string::iterator objects, and so forth.
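One subtlety is worth a quick sketch: the comma shares only the type specifier, not the pointer or reference markers attached to each individual declarator. The variable names below are made up for illustration, and the static_assert checks assume a C++0x-capable compiler.

```cpp
#include <type_traits>

int* ptr, notPtr;   // ptr is an int*, but notPtr is a plain int

// Compile-time checks that the two names really have different types.
static_assert(std::is_same<decltype(ptr), int*>::value, "ptr is a pointer to int");
static_assert(std::is_same<decltype(notPtr), int>::value, "notPtr is just an int");
```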

So knowing what you may have learned above — what should the output be for the following code:

int global_value = -10;
int Foo( void )
{
	global_value *= -1;
	::printf( "In Foo, %d\n", global_value );
	return global_value;
}

int Bar( void )
{
	global_value += 5;
	::printf( "In Bar, %d\n", global_value );
	return global_value;
}

int main( void )
{
	int foo = Foo(), bar = Bar();
	::printf( "%d\n", foo + bar );
	return 0;
}

The answer is a bit murky, actually, because the specification makes no explicit assertions as to the order of non-static declaration initializers. According to Section 8 Clause 3, “Each init-declarator in a declaration is analyzed separately as if it was in a declaration by itself.” The footnote goes on to say that our declarations should be treated like int foo = Foo(); int bar = Bar(); So, while the specification doesn’t explicitly call out left-to-right ordering, it’s a pretty good bet that your output will be:

In Foo, 10
In Bar, 15
25

Another place you see the comma that’s of interest is in a class constructor’s initializer list. In this case, the comma is used to separate a list of class members to be initialized, like so:

class Foo {
private:
	int mBar, mBaz;
	const char *mBing;

public:
	Foo() : mBaz( 0 ), mBar( 0 ), mBing( "Aaron" )
	{
	}
};

What you might not realize about the comma in this instance is that order does not matter! According to Section 12.6.2 Clause 10, the order that the member variables will be initialized in depends on their declared order within the class, and not the order within the initializer list. So while our initializer list looks like it will initialize mBaz, then mBar, then mBing, it will actually initialize mBar, then mBaz, then mBing. This generally doesn’t cause problems, but it is possible to initialize one member based on another — so you should try to match the declaration order and the initializer order just to be on the safe side.
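To watch the ordering happen, here is a small sketch that records when each member is constructed. Everything in it (gInitLog, Tracer, Widget, InitOrder) is a hypothetical name invented for this illustration.

```cpp
#include <string>

std::string gInitLog;   // hypothetical log of construction order

// Hypothetical member type that records a tag when it is constructed.
struct Tracer {
    explicit Tracer(char tag) { gInitLog += tag; }
};

class Widget {
    Tracer mBar;   // declared first, so initialized first
    Tracer mBaz;   // declared second, so initialized second

public:
    // The initializer list names mBaz first on purpose; many compilers
    // can warn about this mismatch.
    Widget() : mBaz('z'), mBar('r') {}
};

std::string InitOrder()
{
    gInitLog.clear();
    Widget w;
    (void)w;           // silence unused-variable warnings
    return gInitLog;   // "rz": declaration order, not list order
}
```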

There’s still one more usage of the comma within C and C++, and it’s one you may have been unaware even existed. The comma is also an operator that can be used in an expression. You may have seen it before in a for loop, without even realizing it:

for (int i = 0, j = 10; i < 10 && j > 0; i++, j--) {}

There are two uses of the comma in that statement. The first use is as part of a declaration list, which we’ve already described above. But the second is the comma between the i++ and the j--. Here’s an unfair question to ask at this point: what do you think the output of this code should be?

int global_value = -10;
int Foo( void )
{
	global_value *= -1;
	::printf( "In Foo, %d\n", global_value );
	return global_value;
}

int Bar( void )
{
	global_value += 5;
	::printf( "In Bar, %d\n", global_value );
	return global_value;
}

int main( void )
{
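	// The parentheses are required: without them, the comma separates
	// declarators and "Bar()" merely redeclares the function without calling it.
	int foo = (Foo(), Bar());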
	::printf( "%d\n", foo );
	return 0;
}

This does have well-defined behavior, but you might be surprised to learn that the output should be:

In Foo, 10
In Bar, 15
15

As per Section 5.18 Clause 1 of the C++0x specification, the comma operator is evaluated from left-to-right, with the resulting value of the expression being the right-most sub-expression value. So this means Foo is called first, then Bar is called, and Bar’s results are assigned to the local variable.
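A compact sketch of the same rule, along with the operator's very low precedence, might look like the following. The function names are hypothetical and exist only for this illustration.

```cpp
int CommaResult()
{
    int calls = 0;
    // The left operand runs first and its value is discarded; the result
    // of the whole expression is the value of the right operand.
    int result = (++calls, calls + 41);
    return result;   // 42
}

int CommaPrecedence()
{
    int x = 0, y = 0;   // these commas separate declarators, they are not operators
    // The comma operator binds more loosely than assignment, so this parses
    // as (x = 1), (y = x + 1).
    x = 1, y = x + 1;
    return x * 10 + y;  // 12
}
```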

It may seem esoteric (and it probably is), but the strange behavior of the comma operator actually has some interesting implications. Let’s say you wanted to write an assert that also yields the result of the tested expression (so if the check fails, the assertion fires, and then the expression evaluates to false). You can’t do this with the stock assert macro because it evaluates to void. What’s more, you want your assert to give you file and line number information, which requires use of the preprocessor macros __FILE__ and __LINE__. Assuming that your platform’s assert is not a no-op in release builds, one possible implementation could look like this:

// DO NOT USE THIS, IT IS A BAD IDEA!
#define testAssert( expr )	(assert( expr ), (expr))

Since the comma operator only cares about the right-most expression’s value, it doesn’t matter that assert returns void. The result of the testAssert macro is the result of the expression passed in, so you could do something like this:

if (testAssert( i >= 10 )) {}

However, I do want to point out that this macro is a terrible idea because the expression passed in to testAssert will be evaluated multiple times. That means code like this will not behave as you expect:

if (testAssert( ++i >= 10 )) {}

i will actually end up getting incremented twice, which is a nasty problem to track down (always be wary of macros with expressions!). However, the testAssert macro does show you one possible interesting use of the comma operator.
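One way around the double evaluation is to hand the expression to a function, so it is evaluated exactly once. This is only a sketch under a few assumptions: ReportCheck and testCheck are hypothetical names, and the helper merely reports the failure to stderr rather than aborting the way assert does.

```cpp
#include <cstdio>

// Hypothetical helper: reports a failed check and hands the result back.
// Unlike assert, it does not abort the program.
inline bool ReportCheck(bool ok, const char* text, const char* file, int line)
{
    if (!ok) {
        std::fprintf(stderr, "%s:%d: check failed: %s\n", file, line, text);
    }
    return ok;
}

// expr appears once as code (and once as a string via #expr, which is
// never evaluated), so any side effects happen a single time.
#define testCheck( expr ) \
    ReportCheck( static_cast<bool>( expr ), #expr, __FILE__, __LINE__ )
```

With this version, if (testCheck( ++i >= 10 )) {} increments i only once.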

So what happens if you want to use the comma operator when making a function call? In this case, which comma wins out: the parameter list comma, or the operator comma? The answer is that the parameter list comma always wins out inside a function call argument list. To use the comma operator within the argument list, you have to use parentheses to specify what you mean. For instance:

SomeFunction( (i++, 2), 3 );

This would pass the values 2 and 3 to the SomeFunction call, but i would still be incremented before the call was made.
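Here is a small, checkable sketch of that call. Add and CallWithCommaOperator are hypothetical names standing in for SomeFunction and its caller.

```cpp
int Add(int a, int b) { return a + b; }   // hypothetical two-argument function

int CallWithCommaOperator(int& i)
{
    // The outer comma separates the two arguments. The inner comma is
    // parenthesized, so it is the operator: i++ runs, then 2 is the value.
    return Add((i++, 2), 3);
}
```

Starting with i equal to 0, the call returns 5 and leaves i equal to 1.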

The comma operator is allowed to be overloaded (it’s not among the operators that Section 13.5 Clause 3 says cannot be overloaded), but that does not make it a good idea to do so. The biggest danger is that an overloaded comma is just a function call: under the C++0x rules, the built-in guarantee that the left operand is evaluated before the right one no longer applies, and the result is whatever the overload returns rather than the right-most operand. So please don’t do this unless you’ve got a very good reason.
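For the curious, this is roughly what an overload looks like and how it changes the meaning of the expression. It is a sketch for illustration only, not a recommendation; IntList and Collect are hypothetical names.

```cpp
#include <vector>

// Hypothetical type used only to show what an overload looks like.
struct IntList {
    std::vector<int> items;
};

// An overloaded comma is an ordinary function taking two operands.
IntList operator,(IntList list, int value)
{
    list.items.push_back(value);
    return list;
}

std::vector<int> Collect()
{
    // With the overload in play, the result is the accumulated list,
    // not the right-most operand.
    return (IntList(), 1, 2, 3).items;
}
```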

It’s amazing how versatile a simple piece of punctuation can be! Thankfully, most of the comma’s uses in C++ are intuitive enough that we never need to think about them. But you should now have a better understanding of all the uses of comma!

tl;dr: the comma operator forms an expression whose result is the right-most value in the expression. Be careful of function call arguments with side effects, as their order of evaluation is unspecified. Constructor initializer lists do their initialization based on the order of declaration, not on the order of the initializer list.

Update (8/10/11): Clarified the comma usage in a function call argument list, and fixed a mistake about whether the comma could legally be overloaded or not. Thanks to Alberto Barbati for pointing this out!

4 Responses to The comma

Shakil says:

2011-08-11 at 10:05 am

The output in the first paragraph would not be

In Foo, 10
In Bar, 15
Huttah //This should be Boo

In Bar, -5
In Foo, 5
Boo //This should be Huttah
Aaron Ballman says:

2011-08-11 at 10:30 am

Good catch, I’ll fix the output! The values are right, just my comparison got mixed up.
Brandon says:

2011-08-18 at 12:19 am

Why? I mean you may say that with different compilers, it may get different result since it really depends on how the code is translated into binary. But for a same compiler, if you compile it 100 times, will the result be different? I really doubt it.
Aaron Ballman says:

2011-08-18 at 6:45 am

@Brandon — in theory, the compiler can do whatever it wants. If it wants to randomize the order on each pass, that’s fine. But in practice, that’s highly unlikely to happen (though not out of the realm of possibility; I’ve done some stuff like that in one of the compilers I’ve worked on).

But what can happen is having the behavior change between releases of the same compiler. The optimizer might find a more efficient way to order things, which is perfectly legal since the order is unspecified, and so upgrading your compiler may change the behavior of your application. These optimization changes are quite frequent.
