C++20 introduced the likelihood attributes [[likely]] and [[unlikely]] as a way for a programmer to give an optimization hint to their implementation that a given code path is more or less likely to be taken. On its face, this seems like a great set of attributes because you can give hints to the optimizer in a way that is hopefully understood by all implementations and will result in faster performance. What’s not to love?
The attribute is specified to appertain to arbitrary statements or labels with the recommended practice “to optimize for the case where paths of execution including it are arbitrarily more likely|unlikely than any alternative path of execution that does not include such an attribute on a statement or label.” Pop quiz, what does this code do?
if (something) {
  [[likely]];
  [[unlikely]];
  foo(something);
}
Sorry, but the answer key for this quiz is currently unavailable. However, one rule you should follow about how to use these attributes is: never allow both attributes to appear in the same path of execution. Lest you think, “but who would write such bad code?”, consider this reasonable-looking-but-probably-very-unfortunate code:
#define MY_COOL_ASSERT(x) [[unlikely]] assert(x)

if (something) {
  [[likely]];
  MY_COOL_ASSERT(something > 0);
  foo(something);
}
Despite the name, these attributes do not mark whether the statement itself is likely, only whether the path leading to the statement is likely. This brings us to our second rule: only mark the dominating statement or label of the flow control path you want to optimize for. This will often mean you only mark the compound statement after a flow control statement, as in:
if (foo) [[likely]] {
  do_something(foo);
}

while (bar) [[unlikely]] {
  ;
}

switch (baz) {
[[likely]] case 0: whatever(); break;
[[unlikely]] case 1: something_else(); break;
default: break;
}
Speaking of code that looks reasonable when you apply the attribute to the dominating statement of flow control, what does this code do?
if (foo) [[likely]] { // A
  baz();
} else if (bar) [[likely]] { // B
  quux();
} else [[unlikely]] { // C
  bing();
}
It marks the true branch at A as being likely and says nothing about the false (else) branch (making it unlikely by default). It then marks the true (if) branch at B as being likely and has a redundant (but harmless) [[unlikely]] attribute at C. What it does not do is mark that A and B are equally likely and C is unlikely; it will optimize the path for A over B rather than treat them as equals. The issue is that the attribute is not written on the dominating statement of flow control, and the code should be written as:
if (foo) [[likely]] { // A
  baz();
} else [[likely]] if (bar) [[likely]] { // B
  quux();
} else [[unlikely]] { // C
  bing();
}
However, because of the duplicate likely branches at A and B (on the else), it’s not clear what the implementation will do with the construct from reading the code (not to mention that it’s super ugly and unintuitive code). Because of this, the initial rule should be augmented to be: never allow multiple likelihood attributes to appear in the same path of execution. This sort of confusion comes up in other places as well. Pop quiz, which cases are likely and which cases are unlikely in the following?
switch (foo) {
case 0:
[[likely]] case 1:
[[unlikely]] case 2: bar(); break;
[[likely]] case 3:
default: baz(); break;
}
Sorry, I still can’t find the answer key. Given that case 1 looks to be likely but falls through to case 2, which looks to be unlikely, it’s hard to say what should happen here. Further, it’s hard to say whether the default case is likely given that case 3 is likely. The only unambiguous cases are case 0, which says nothing about whether it is or isn’t likely, and case 3, which is likely. Unfortunately, the wording from the standard leaves a bit to be desired when considering switch statements because it says “A path of execution includes a label if and only if it contains a jump to that label.” A switch statement contains a path of execution which can jump to any of its labels, so when you couple this recommended practice with the earlier one about applying to arbitrary statements, you have to work to answer whether this code path is likely, unlikely, or something else:
if (foo) { // is this branch likely or unlikely?
  switch (*foo) {
  [[likely]] case 0: bar(); break;
  [[unlikely]] case 1: baz(); break;
  [[likely]] default: quux(); break;
  }
} else {
  ...
}
Now, a sensible person would look at this and say “aha, those attributes shouldn’t impact the if statement because they’re within a different control flow statement with its own substatements.” Unfortunately, the standard doesn’t say anything about whether these attributes should be treated recursively. For instance, one would certainly hope that an implementation allowing attributes on arbitrary statements would do something reasonable with this:
if (foo) { // Is this path likely?
  {
    [[likely]];
    SomeRAIIObject Obj;
    Obj.whatever(foo);
  }
} else {
  ...
}
The standard also doesn’t say what happens when you follow my rule to only mark the dominating statement or label and that leads to a conflict like this (with thanks to Arthur O’Dwyer for the example):
if (ch == ' ') [[likely]] {
  goto whitespace; // A
} else if (ch == '\n' || ch == '\t') [[unlikely]] {
  goto whitespace; // B
} else {
  foo();
}
[[likely]] whitespace: bar(); // C
The [[likely]] attribute at C says the paths through both A and B are likely, despite the path through B being marked as unlikely. Which attributes, if any, are ignored? Who knows — in all of these circumstances, the standard says nothing, so implementations will likely come up with different answers in different situations. This brings me to the next rule: assume no two implementations will behave the same way when optimizing with these attributes.
So given all of these odd issues with the attributes, why would you want to use them? In my mind, there are only two use cases for the likelihood attributes: either you have an implementation that does not support profile-guided optimizations (which will generally do a far better job of predicting branch weights for optimization than a programmer ever could), or you need to optimize a code path in a strange way where you cannot use PGO. The first question is the easier one to address: can you point to a C++20 implementation that doesn’t support profile-guided optimizations? I can’t find one. Maybe these implementations really do exist, but the major vendors all support the concept, so this isn’t a very compelling argument for adding the attributes to your own code unless you’re in that situation. That is why my rule is: prefer profile-guided optimization over likelihood attributes. It is more suited to the purpose of optimizing flow control and is likely to result in better-performing code.
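For reference, a typical instrumentation-based PGO workflow looks roughly like this (GCC-style flags shown; app.cpp and the workload name are placeholders, and Clang and MSVC spell the steps differently):

g++ -O2 -fprofile-generate app.cpp -o app   # build an instrumented binary
./app representative_workload               # run it to record which branches are actually taken
g++ -O2 -fprofile-use app.cpp -o app        # rebuild, letting the recorded profile drive optimization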
The second situation is more interesting to talk about because it seems off-the-wall until you understand it. Sometimes you want to optimize the failure path that almost never gets hit rather than the common paths that do. Consider writing some safety-critical piece of code to control an elevator where you need the failure path to meet some real-time obligations in order to stop the elevator from dashing its occupants to death. In that situation, your optimization needs can’t be met by PGO and the likelihood attributes could be very useful. Consider this use case which came up during the standards discussions about the feature:
try {
  foo();
} catch (...) [[likely]] {
  dont_kill_people();
}
This is an attempt to convince the optimizer to optimize the catch statement’s control flow path, but it has three problems that may not be obvious from looking at the code. The first problem is a small one: the attribute is misnamed for its use in this case, which makes the code far harder to read than it needs to be. The second problem is that the C++ grammar doesn’t allow you to write the attribute in that position! You’d have to put the [[likely]] attribute inside of the catch block’s compound statement. The final problem is that implementations typically have no idea how to optimize the failure path for C++ exceptions. So these attributes fail to address the intended need in this circumstance, which leads to another rule: not all flow control paths can be optimized. Exception handling, setjmp/longjmp, and the branches of a ?: operator are all examples of flow control where the likelihood attributes either cannot be written or may look like they’ll do something useful, but likely won’t (pun totally intended).
Let’s review the rules we’ve got so far:
0) Never allow multiple likelihood attributes to appear in the same path of execution.
1) Only mark the dominating statement or label of the flow control path you want to optimize for.
2) Assume no two implementations will behave the same way when optimizing with these attributes.
3) Prefer profile-guided optimization over likelihood attributes.
4) Not all flow control paths can be optimized.
These attributes are starting to look a bit more like some other code constructs we’ve seen in the past: the register keyword as an optimization hint to put things in registers and the inline keyword as an optimization hint to inline function bodies into the call site. Using register or inline for these purposes is often strongly discouraged because experience has shown that optimizer implementations eventually improved to the point where they were consistently better at optimizing than the user trying to give their own hints. However, at least the register and inline keywords have other semantic impact (like not being able to take the address of a register variable in C). The likelihood attributes have no semantic impact beyond their optimization hints. Given how hard it is to use these attributes properly (especially if the code is being compiled by multiple implementations), how good profile-guided optimization is by comparison, and that there is no semantic impact from the attribute, my recommendation is to never use the likelihood attributes. They’re just not worth it.
Rules 0) and 1) sound like good candidates for future compiler warnings. Thanks for the insight.
This is just a huge failing of the standard syntax. The GCC __builtin_expect() function works better and makes it much more obvious which parts of the code are actually being annotated.
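For example (typical usage of the builtin; the names are placeholders):

if (__builtin_expect(ptr == nullptr, 0)) { // annotate the condition itself as expected to be false
  handle_error();
}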
I’ll never understand why standards committees continually fail to understand that their inventions will always end up worse than de facto standards that have been refined over decades of use. Yes, changing the syntax still counts as “invention”.
The example with `MY_COOL_ASSERT` makes zero sense. Why would anyone put an `[[unlikely]]` attribute into the common control path of an assert-like macro? This is still an example of bad/nonsensical code, nothing else.
In all the samples, the “if else” case is the only one I’d consider a problem.
How do you know it is a problem, though? The standard doesn’t mention anything like that.
It looks rather like implementation-dependent behavior, which is true for any “hint” attribute anyway.
All the other samples are just bad code. If bad code is using some particular syntax, it’s not always a problem with that syntax.
The problem with the profile-guided optimization suggestion is that there are tons of environments (e.g., embedded) where it is simply not feasible or even possible. Whenever it comes to optimization, there is a discrepancy between the real world and standard behaviour: the standard doesn’t talk about anything related to optimization, yet people in the real world have real problems that need to be solved.
@AndreyT: You put the [[unlikely]] attribute there because it’s unlikely that the assert will be triggered! It’s not correct, but it looks _probable_ and unfortunately that’s enough to get it in somewhere in a large codebase. If the example were correct, I think it’d still apply. :)
Fun fact: There is no way to influence the branch prediction unit in any modern CPU. So this entire thing can just change the order in which things are evaluated, usually adding or subtracting a constant to the heuristic value. Usually the result is worse than not bothering.
The entire thing is so useless that the C committee rejected these two attributes entirely. I guess we will be the only community stuck with this useless crap.
The mentioned idea of using the C++ language in a hard real-time system, and using these attributes to maybe meet the requirements thereof, makes my skin crawl. In that case, a language truly suited to such tasks should be used, like Ada.
Karl Napf writes “There is no way to influence the branch prediction unit in any modern CPU. So this entire thing can just change the order in which things are evaluated, usually adding or subtracting a constant to the heuristic value.”
It is true that these statements don’t influence the branch prediction unit, but that isn’t the point. When the compiler generates code for the if/else it has to decide which path falls through after the test statement, and which path requires a jump. The start of the fall through case (where no jump is taken) is pretty much guaranteed to be in the icache (cache lines are 64 bytes on most modern CPUs, so at least the first few instructions in the fall through case are likely to be in the same cache line as the test instruction itself). The jump case may also be in the icache, but that is slightly less likely.
In general this is a very small optimization, and usually not worth polluting code with. That doesn’t mean it is *never* worth doing though.
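For illustration, the layout effect being described looks like this (a sketch; actual code generation varies by compiler and target):

if (ready) [[likely]] {
  fast_path(); // emitted as the fall-through, sharing a cache line with the test
} else {
  slow_path(); // reached via a taken jump, possibly on a colder cache line
}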