Lambdas

What is likely to be considered the biggest, sexiest feature of the new C++0x specification goes by many names. Some folks call them “function objects”, others call them “closures”, and still others call them “lambda functions.” Regardless of what you call this feature, you’ll hopefully love it!

“Lambda function” is a fancy term for a fairly simple concept: an object containing stateful information, which acts as a function. Breaking that sentence down may make it seem less scary. “An object” means that lambdas are something you can store in a variable. “Containing stateful information” means that this object acts like a class or structure — it has state information that it can access. “Which acts as a function” simply means you can call this object like a function, as though it had implemented operator().

You may have already used this concept in the past, actually. Functors in C++ are exactly the same thing. As you’ll see, lambdas are literally just syntactic sugar that the compiler turns into functors for you. To demonstrate this, each of my lambda function code snippets will show you that you can do exactly the same thing pre-C++0x by manually constructing a functor yourself.

Before I delve into examples of lambdas, let’s take a look at the syntax and what the clauses mean.

[capture list](parameters) mutable throw() -> return type { function body }

The start of every lambda expression is the lambda introducer (open and close square brackets) and optionally, a capture list. Don’t worry about the capture list for now, as I’ll get into the specifics of it in a bit.

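Following the lambda introducer is the optional parameter list. This is a list of formal parameters that need to be passed when the lambda is called as a function. If the parameter list is empty, then the open and close parentheses can be elided, provided you also leave off the mutable specifier, exception specification, and return type that would otherwise follow them.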
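As a small sketch of the two spellings (the function names here are hypothetical, chosen only for illustration), both lambdas below take no parameters and behave identically:

```cpp
// Hypothetical helper: the empty parameter list is written out.
int answerWithParens()
{
    auto f = []() { return 42; };
    return f();
}

// Hypothetical helper: the same lambda with the parentheses elided.
int answerWithoutParens()
{
    auto g = [] { return 42; };
    return g();
}
```

Which form you pick is a matter of taste; the compiler treats them the same way.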

Next comes the optional mutable specifier. You will only need to specify this keyword if you are capturing variables by value and wish to modify them within the function body. I will explain this a bit more when I discuss capture lists.

After the mutable specifier comes the optional exception specification. If you wish to be explicit as to what exceptions are thrown by the lambda (if any), you use the exception specification to denote them. For instance, if your lambda function can throw an integer, you can specify throw(int) as the exception specification. If your lambda function doesn’t throw any exceptions, you can specify throw() (with the empty type list). By using the exception specification, you can help the compiler to generate warnings if code is not properly exception-safe. If no exception specification is provided, no assumptions are made as to whether the function will or won’t throw exceptions.
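One caveat worth knowing: the final C++11 standard deprecated type-listing specifications such as throw(int) in favor of the noexcept specifier, and later standards removed them (throw(int) in C++17, throw() in C++20). The sketch below therefore uses noexcept rather than throw(). It assumes a C++11 or newer compiler, and the function names are hypothetical:

```cpp
// No exception specification: the call is treated as potentially throwing.
bool plainLambdaIsNoexcept( int v )
{
    auto f = []( int x ) { return x + 1; };
    return noexcept( f( v ) );   // the operand is not evaluated, only inspected
}

// noexcept sits where the exception specification goes in the lambda syntax.
bool markedLambdaIsNoexcept( int v )
{
    auto g = []( int x ) noexcept { return x + 1; };
    return noexcept( g( v ) );
}
```

The noexcept operator only looks at the declared specification, not at the body, so the first function reports false even though that lambda can never actually throw.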

Next is the optional return type. If you wish to be explicit as to what the lambda function returns, you can include the return type with “-> type”. However, you can also choose to leave off the return type specifier entirely if the lambda function only has a single return statement, or the function returns nothing. If there’s a single return statement, then the lambda expression deduces the return type automatically. If there are no return statements, the lambda expression returns void.
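Here is a sketch of a case where the explicit return type earns its keep: a body with more than one return statement, where the statements return different types. The name safeRatio is hypothetical.

```cpp
// Hypothetical helper: divides two ints, returning 0.0 for a zero denominator.
double safeRatio( int numerator, int denominator )
{
    auto ratio = []( int n, int d ) -> double {
        if ( d == 0 ) {
            return 0;   // an int literal, converted to double by the return type
        }
        return static_cast< double >( n ) / d;
    };
    return ratio( numerator, denominator );
}
```

Without the “-> double”, the two return statements would disagree about the type (int versus double), and the compiler would have nothing to settle the matter.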

Finally comes the function body itself (enclosed in curly braces). This follows all the usual rules about functions that you’re already used to. For instance, you can access parameters, global variables, local variables, etc. However, the function body can also access captured variables that come from the capture list.

Which brings us to the capture list, which is the last remaining piece of information to cover. The capture list tells the compiler what pieces of information you would like the lambda to retain as part of its “stateful information.” For instance, you can tell the lambda “keep this local variable around”, because you may want to use that local variable within the lambda function body itself. Any variables you place into the capture list will be “captured” by the lambda expression for use within the function body. You can specify that each variable in the capture list be captured either by value (the default) or by reference. The semantics for this should be familiar to you already because they are the same semantics that function calls use. When you capture something by value, its value is copied into the lambda at the point where the lambda expression is created. When you capture something by reference, by prefixing its name with &, the lambda stores a reference to the original variable instead.
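A quick way to see the difference is to change the variable after the lambdas have been created. This is a minimal sketch; CaptureResult and compareCaptures are hypothetical names used only for this illustration.

```cpp
// Hypothetical result type so both observations can be returned together.
struct CaptureResult {
    int seenByValue;
    int seenByReference;
};

CaptureResult compareCaptures()
{
    int x = 1;
    auto byValue = [x]() { return x; };        // copies x right now
    auto byReference = [&x]() { return x; };   // refers to the original x

    x = 2;   // change the original after both lambdas exist

    CaptureResult result = { byValue(), byReference() };
    return result;
}
```

The by-value lambda still sees 1, because it holds its own copy, while the by-reference lambda sees 2.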

There are three special tokens you can use within the capture list that tell the compiler “capture everything.” You can specify [=] to capture all variables by value, [&] to capture all variables by reference, and [this] to capture the this pointer, which gives the lambda body access to all of the enclosing object’s member variables and functions.
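Since [this] does not get its own example later on, here is a minimal sketch of it. The Scaler class is hypothetical.

```cpp
// Hypothetical class whose member function uses a lambda that needs a member.
class Scaler {
public:
    explicit Scaler( int factor ) : factor_( factor ) {}

    int scale( int value ) const
    {
        // Capturing this lets the body use factor_ as if it were
        // written directly inside the member function.
        auto apply = [this]( int v ) { return v * factor_; };
        return apply( value );
    }

private:
    int factor_;
};
```

Note that the lambda holds a pointer to the object rather than a copy of it, so it should not outlive the object it came from.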

Now we’re ready to look at some examples of lambda expressions! We’ll start simple by writing a lambda function to help sort user-defined datatypes in a vector.

#include <algorithm>
#include <vector>

// Our user-defined datatype
struct MyType {
  int i, j;
};

void sortObjects( std::vector< MyType > &objs )
{
	std::sort( objs.begin(), objs.end(), 
		[]( const MyType &left, const MyType &right ) { 
			return left.j < right.j; 
		}
	);
}

Breaking the lambda function down: we have an empty capture list, so the lambda expression has no access to the local variables of the enclosing function. However, it does take two parameters, both being const MyType&’s. There is no explicit return type, because it can be implicitly deduced from the lone return statement in the function body. All in all, this satisfies the requirement of a sorting predicate for the std::sort algorithm, without requiring us to modify the MyType structure, or use a functor!
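For comparison, this is roughly what the same sort looks like with a hand-written functor. It is a sketch rather than what any particular compiler generates, and the names CompareByJ and sortObjectsWithFunctor are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Same user-defined datatype as above, repeated so the sketch stands alone.
struct MyType {
    int i, j;
};

// Hand-written functor playing the role of the lambda.
// It has no data members because the lambda captured nothing.
struct CompareByJ {
    bool operator()( const MyType &left, const MyType &right ) const
    {
        return left.j < right.j;
    }
};

void sortObjectsWithFunctor( std::vector< MyType > &objs )
{
    std::sort( objs.begin(), objs.end(), CompareByJ() );
}
```

The behavior is the same; the lambda simply saves you from naming and declaring the structure somewhere away from the call to std::sort.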

Let’s try a slightly more complex example that uses the capture list. In this example, we will capture a local variable by value and use it while performing a calculation.

int i = ::rand();
::printf( "Our random value is %d\n", i );
[i]() {
	::printf( "Our random value, plus 100 is: %d\n", i + 100 );
}();

This will print a random value (for me, it printed 41 the first time I ran it), and then the lambda expression captures the local variable “i”, so that it can be used within the expression. The printf from the lambda expression then outputs i + 100.

Let’s take a look behind the curtain for a moment to see how lambda expressions are implemented under the hood. If you recall, I mentioned that lambdas are nothing more than syntactic sugar around functors. The compiler takes the short-hand syntax and expands it out into anonymously named functor structures for you. So the previous lambda expression is merely short-hand for this functor:

struct anon_functor {
private:
	const int i;

public:
	anon_functor( const int captureValue1 ) : i( captureValue1 ) {}

	void operator ()() const { ::printf( "Our random value, plus 100 is: %d\n", i + 100 ); }
};

struct anon_functor f( i );
f();

You’ll notice that our functor version has the capture value i as being a const int. You’re not imagining things — that is accurate. Let’s modify our lambda slightly to demonstrate that variables captured by value truly are constant.

int i = 10;
[i]() {
	i += 100;
	::printf( "i + 100 is: %d\n", i );
}();

If you attempt to compile that, you will get a compile error. So even though i is captured by value, it cannot be modified. This is why the mutable clause exists. If you add that to the lambda expression, you will be able to modify i within the lambda expression. But the modifications will not carry out of the expression.

int i = 10;
[i]() mutable {
	i += 100;
	::printf( "%d\n", i );
}();

::printf( "%d\n", i );

If you run that example, you’ll see 110 printed out, followed by 10. In terms of how the compiler implements the mutable keyword, it simply removes the const nature of the captured value. So our functor would look like this instead:

struct anon_functor {
private:
	mutable int i;

public:
	anon_functor( int captureValue1 ) : i( captureValue1 ) {}

	void operator ()() const { i += 100; ::printf( "%d\n", i ); }
};
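One consequence of the functor picture is that the modified copy lives as long as the lambda object does, so changes accumulate across calls. The following sketch shows this; CounterResult and runCounter are hypothetical names.

```cpp
// Hypothetical result type recording what each step observed.
struct CounterResult {
    int first;
    int second;
    int outer;
};

CounterResult runCounter()
{
    int n = 0;
    auto counter = [n]() mutable { return ++n; };

    CounterResult result;
    result.first = counter();    // the lambda's copy of n becomes 1
    result.second = counter();   // the same copy becomes 2
    result.outer = n;            // the original n was never touched
    return result;
}
```

The two calls return 1 and then 2, while the n in the enclosing function stays at 0.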

If you want the mutation to carry out of the lambda expression, you can capture i by reference instead, like this:

int i = 10;
[&i]() {
	i += 100;
	::printf( "%d\n", i );
}();

::printf( "%d\n", i );

If you run this example, you’ll see 110 printed twice because the reference to i was modified within the lambda expression when it was called. In terms of how this looks as a functor:

struct anon_functor {
private:
	int& i;

public:
	anon_functor( int& captureValue1 ) : i( captureValue1 ) {}

	void operator ()() const { i += 100; ::printf( "%d\n", i ); }
};

“Catch all” capture lists allow you to skip specifying each of the variables you wish to capture, and essentially mean “capture whatever the lambda body uses from the enclosing function.” They are a handy way to be lazy, which makes them dangerous to use. Remember, when you capture a variable, you can possibly affect its lifetime. For instance, if you capture a reference-counted smart pointer such as std::shared_ptr by value, then the object it points to now stays alive for the lifetime of the lambda expression too. Also, captured values require extra storage space, and time to copy them, which can have an effect on performance. These things may be acceptable, but they may also be unintended or harder to notice. So use caution when capturing everything. Let’s take a look at how it works:

void lambda5( void )
{
	int i = 1, j = 2, k = 3;
	double d = 1.0;

	printf( "%g\n", [=]() { return d / (double)(i + j * k); }() );
}

struct anon_functor {
private:
	const int i, j, k;
	const double d;

public:
	anon_functor( const int param1, const int param2, const int param3, const double param4 ) : 
	  i( param1 ), j( param2 ), k( param3 ), d( param4 ) {}

	  double operator()() const { return d / (double)(i + j * k); }
};
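The by-reference catch-all works the same way, except that the lambda can write back to the enclosing function’s variables. Here is a sketch under that assumption; sumAndCount is a hypothetical function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: returns the sum of the values and reports how many
// there were through the count parameter.
int sumAndCount( const std::vector< int > &values, int &count )
{
    int total = 0;
    count = 0;

    // [&] captures total and count by reference because the body uses them.
    std::for_each( values.begin(), values.end(), [&]( int v ) {
        total += v;
        ++count;
    } );

    return total;
}
```

This is safe here because the lambda is used and discarded before total goes out of scope. Storing such a lambda for later use would leave it holding references to variables that no longer exist.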

Since lambda expressions are really nothing more than automatically-created functor structures, it stands to reason that you can assign them into a local variable. However, the type information for the lambda expression isn’t something you can express via a standard type. So how do you assign one into a variable, or pass it as a parameter into a function? You have two choices: use the auto keyword to let the compiler infer the type information automatically, or use the std::function type to represent the lambda.

auto f = []( int i ) { return i + 10; };
::printf( "%d\n", f( 100 ) );

std::function< int(int) > f2 = []( int i ) { return i + 10; };
::printf( "%d\n", f2( 100 ) );

If you need to assign the lambda expression into a local variable, then using “auto” is definitely the way to go. But if you need to pass the lambda expression as a parameter to a function, then using std::function is likely the safest route.
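To make the parameter-passing case concrete, here is a sketch of a function that accepts a lambda through std::function, next to a template version, which is another common way to accept any callable. Both applyTwice and applyTwiceGeneric are hypothetical names.

```cpp
#include <functional>

// Takes any callable convertible to int(int), at the cost of the
// type erasure that std::function performs.
int applyTwice( const std::function< int( int ) > &fn, int value )
{
    return fn( fn( value ) );
}

// Template alternative: the compiler sees the lambda's real type,
// but the function must live in a header and is instantiated per type.
template < typename Fn >
int applyTwiceGeneric( Fn fn, int value )
{
    return fn( fn( value ) );
}
```

Either one can be called as applyTwice( []( int i ) { return i + 10; }, 100 ). The std::function version gives you a concrete, nameable signature, which is handy for virtual functions and stored callbacks.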

That’s a whole lot of information about what lambdas are and how they work under the hood! I’d like to finish this article up with some ideas as to how lambdas may be useful to you.

  • STL functionality that previously required hand-written functors is now easier to use. For instance, std::sort or std::for_each.
  • Callback functions can now be implemented inline — no more need for an entirely separate method. This can be useful for reducing networking or threading code.
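As a sketch of the callback idea, consider a tiny event source that stores callbacks as std::function objects. The Notifier class and collectNotifications function are hypothetical and stand in for whatever networking or threading API you actually use.

```cpp
#include <functional>
#include <vector>

// Hypothetical event source that remembers callbacks and invokes them later.
class Notifier {
public:
    void subscribe( std::function< void( int ) > callback )
    {
        callbacks_.push_back( callback );
    }

    void notify( int value ) const
    {
        for ( const auto &callback : callbacks_ ) {
            callback( value );
        }
    }

private:
    std::vector< std::function< void( int ) > > callbacks_;
};

int collectNotifications()
{
    int total = 0;   // declared first, so it outlives the notifier below
    Notifier notifier;

    // The callback is written inline, right where it is registered.
    notifier.subscribe( [&total]( int value ) { total += value; } );

    notifier.notify( 5 );
    notifier.notify( 7 );
    return total;
}
```

Before lambdas, the callback would have needed its own function or functor, plus some way of handing it the total variable.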

tl;dr: C++0x introduces lambda syntax, which is short-hand compiler syntax for functors. It’s a very powerful way to implement inline functions that can be stored as objects.


2 Responses to Lambdas

  1. Hi Aaron, thanks for your excellent blog! Your post about lambda expressions was very helpful for me. I discovered something that I think might be worth mentioning: returning from a function in a nested lambda expression does not terminate the function, but only the lambda expression!

    I have put an example (and a downloadable Qt Creator project) online at http://www.richelbilderbeek.nl/CppLambdaExpression.htm (note the reference to your blog!).

    Perhaps this is a nice addition to your, IMHO, excellent blog!

    Richel Bilderbeek

  2. Aaron Ballman says:

    Yes, that is very true (and thank you for the link back!). But think about the way it would be implemented with a functor, and it may become more clear that this is what should be happening. Say you have the trivial lambda:

    auto i = [] { return 12; };
    

    That will turn into a functor like:

    struct anon_functor {
    public:
        anon_functor() {}
     
        int operator ()() const { return 12; }
    };
    
    anon_functor anon;
    auto i = anon();
    

    Since the functor is nothing more than an operator() overload, it makes sense that the return statement does not leave the encapsulating function — it’s leaving the operator() instead! Regardless of the levels of nesting for the lambdas, they’re all simply operator()’s at the end of the day, and so return statements always leave the closure and not the caller. Which makes sense — otherwise how would you be able to do this?

    std::sort( myVec.begin(), myVec.end(), []( const blah& left, const blah& right ) { return left.val < right.val; } );
    

    However, this can certainly be confusing to run into in someone’s code — we’ve trained our eyes to see “return” and think “leaving the current function.” So it’s a great thing to call out in this blog post! Thanks for the feedback!
