Some commentary about the accepted answer is at the bottom of this question post.
Problem statement
According to the C standard (C17 draft, 6.10.3.2 ¶2):
The order of evaluation of [the] # and ## operators is unspecified.
I am looking for an example where this evaluation order matters and where there are no other instances of undefined behavior and no errors.
After spending some time on this matter, I suspect that the following might work:
#define PRECEDENCETEST(a, b, c) # a ## b
PRECEDENCETEST(c, , d)
(Note that the preprocessor can be run as follows: cpp or gcc -E (GCC), cl /E (MSVC); see further below for a compilable dummy example. Note also that empty macro arguments are only legal since C99.)
My question: Does this actually work as an example where either relative evaluation order of # and ## produces legal output, according to the C standard? As I explain at the bottom of this post, the answer might, if I understand correctly, rely on whether the standard allows for the token after # to end up being different from the one originally specified.
If the answer is "yes (because ...)", then we've found an example! If the answer is "no, your example doesn't work (because ...)", then I'll later think of a way to solicit better examples.
(Note that the standard does not require a compiler to use any fixed relative evaluation order for the # and ## operators. The order could be left-to-right, right-to-left, based on some other logic, or entirely random.)
Documentation
Older GCC documentation (up to version 6.5 it seems) states:
The standard does not specify the order of evaluation of a chain of ‘##’ operators, nor whether ‘#’ is evaluated before, after, or at the same time as ‘##’. You should therefore not write any code which depends on any specific ordering. It is possible to guarantee an ordering, if you need one, by suitable use of nested macros.
An example of where this might matter is pasting the arguments ‘1’, ‘e’ and ‘-2’. This would be fine for left-to-right pasting, but right-to-left pasting would produce an invalid token ‘e-2’.
GCC 3.0 evaluates ‘#’ and ‘##’ at the same time and strictly left to right. Older versions evaluated all ‘#’ operators first, then all ‘##’ operators, in an unreliable order.
(As for the ##-only example in the middle paragraph (i.e., 1##e##-2): 1e is not a valid floating-constant (C17 draft, 6.4.4.2) but it's a valid pp-number ("preprocessing number"; C17 draft, 6.4.8) because a sole e is a valid identifier-nondigit. (Preprocessing numbers exist "to isolate the preprocessor from the full complexity of numeric constants"; see the GNU documentation for its C preprocessor.) That said, a better example would have been 2##.##e3 (valid for left-to-right but not right-to-left token concatenation), adapted from this MISRA discussion.)
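The GCC passage quoted above says that an ordering can be guaranteed with nested macros. Here is a minimal sketch of that idea for the 2 ## . ## e3 case. The macro names PASTE2, PASTE2_EXPANDED and PASTE3_LTR are made up for illustration and come from neither GCC nor the standard.

```c
/* Hypothetical helper names; only the ## operator itself is standard. */
#define PASTE2(a, b) a ## b

/* Extra level of indirection: the arguments are fully macro-expanded
   before PASTE2 gets to paste them. */
#define PASTE2_EXPANDED(a, b) PASTE2(a, b)

/* Force left-to-right pasting: (a ## b) first, then (result ## c). */
#define PASTE3_LTR(a, b, c) PASTE2_EXPANDED(PASTE2(a, b), c)

/* Step 1: 2 ## .   forms the pp-number 2.
   Step 2: 2. ## e3 forms the pp-number 2.e3 */
static const double forced_ltr = PASTE3_LTR(2, ., e3);
```

Printing forced_ltr with printf("%g\n", forced_ltr) should show 2000. The right-to-left order is deliberately not sketched, since . ## e3 would not form a valid preprocessing token.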
For what it's worth, Wikipedia claims the following in its article on the C preprocessor:
[F]unction-like macro expansion occurs in the following stages:
- Stringification operations are replaced with the textual representation of their argument's replacement list (without performing expansion).
- Parameters are replaced with their replacement list (without performing expansion).
- Concatenation operations are replaced with the concatenated result of the two operands (without expanding the resulting token).
- Tokens originating from parameters are expanded.
- The resulting tokens are expanded as normal.
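The "without performing expansion" part of the stages quoted above is what motivates the common two-level stringify idiom. A small sketch, where VALUE, STR, XSTR and CAT are illustrative names only:

```c
#define VALUE 42
#define STR(x) #x          /* the operand of # is not macro-expanded */
#define XSTR(x) STR(x)     /* x is fully expanded first, then stringified */
#define CAT(a, b) a ## b   /* the operands of ## are not macro-expanded */

/* The argument is stringified as written. */
static const char raw[] = STR(VALUE);

/* The argument is expanded to 42 before STR sees it. */
static const char expanded[] = XSTR(VALUE);

/* The whole unexpanded invocation is stringified. */
static const char raw_call[] = STR(CAT(VAL, UE));

/* VAL ## UE forms VALUE, which is rescanned to 42 and then stringified. */
static const char pasted[] = XSTR(CAT(VAL, UE));
```

Here raw holds "VALUE", expanded and pasted hold "42", and raw_call holds "CAT(VAL, UE)". This matches the usual behavior of # and ## on unexpanded arguments; it does not by itself confirm the exact stage ordering that Wikipedia lists.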
However, I can't find support for this specific order of evaluation in either the C standard or GNU's documentation for CPP (the C preprocessor, part of GCC), whose latest documentation as of the time of asking this question (GCC 13.2) is here.
Most importantly, none of the above-mentioned sources (including the C17 standard) provide examples of a function-like macro which would evaluate to something different depending on the relative precedence of # and ## in the replacement-list of the macro.
I'm looking for examples that don't lead to otherwise undefined behavior or an error, because macros that are seemingly valid are a potential source of hard-to-find bugs. Important in this regard are the following two constraints:
- "If the replacement that results [from the # operator] is not a valid character string literal, the behavior is undefined." (C17 draft, 6.10.3.2 ¶2)
- "If the result [of token concatenation with ##] is not a valid preprocessing token, the behavior is undefined." (C17 draft, 6.10.3.3 ¶3)
Finding an example
The search for a suitable example turns out to be surprisingly tricky.
For one thing, string literals (C17 draft, 6.4.5) – which we are considering because they are the result of applying # – can barely be concatenated with anything else using ##:
- ## cannot be used to concatenate two string literals, because something like "abc""def" wouldn't be a valid preprocessing-token (C17 draft, 6.4 ¶1). Important here is to note that ##-based token concatenation is not like the concatenation of string literals from translation phase 6 (C17 draft, 5.1.1.2 ¶1), which would merge "abc" and "def" into "abcdef".
- String literals can optionally start with an encoding-prefix (u8, u, U, L), but writing a replacement-list like [...] ## # b that leads to valid preprocessing tokens requires a delicate balance of #s (which, aside from starting a preprocessing directive or from being within a string or character literal, can only exist as part of the preprocessing tokens # and ## themselves), which I wasn't able to achieve. For example,
#define TEST(a, b) a ## # b
TEST(, c)
produces "c" under either evaluation order (assuming that # as the stringify operator can legally result from the application of ##), and I am not sure whether this example can be morphed into one producing two different valid results depending on the evaluation order.
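The distinction drawn in the two bullets above can be illustrated with a short sketch. WIDEN is a hypothetical macro name; pasting an encoding prefix onto a string literal is one of the few ## uses involving a string literal that yields a single valid token.

```c
#include <wchar.h>

/* Translation phase 6: adjacent string literals are merged.
   No ## is involved here. */
static const char joined[] = "abc" "def";

/* ## pastes the encoding prefix L onto the string literal,
   forming the single preprocessing token L"abc". */
#define WIDEN(s) L ## s
static const wchar_t wide[] = WIDEN("abc");

/* Deliberately not shown: "abc" ## "def" would be undefined behavior,
   because "abc""def" is not one valid preprocessing token. */
```

With this, joined holds "abcdef" (seven bytes including the terminator) and wide holds the wide string L"abc".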
Also, something like a ## b # c doesn't work, because in this expression, the "a ## b" and "# c" parts are independent.
However, it seems like the following might work:
#include <stdio.h>
#define PRECEDENCETEST(a, b, c) # a ## b
int main(void) {
    printf("%s\n", PRECEDENCETEST(c, , d));
    return 0;
}
Case A: With both GCC and MSVC, I get the output c, corresponding to a #-before-## evaluation order:
PRECEDENCETEST(c, , d)
# a ## b
"c" ## b
"c" ## <placemarker>
"c"
(A placemarker preprocessing token signifies an empty macro argument adjacent to ##. (C17 draft, 6.10.3.3 ¶2))
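To make the placemarker rule concrete, here is a tiny sketch (CAT is an illustrative name): an empty argument next to ## simply leaves the other operand unchanged.

```c
#define CAT(a, b) a ## b

/* 12 ## placemarker leaves 12 */
static const int right_empty = CAT(12, );

/* placemarker ## 34 leaves 34 */
static const int left_empty = CAT(, 34);

/* For comparison, two non-empty operands: 1 ## 2 forms 12 */
static const int neither_empty = CAT(1, 2);
```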
Case B: A ##-before-# evaluation order would give us the following:
PRECEDENCETEST(c, , d)
# a ## b
# c ## <placemarker>
# c
"d"
That is, the program's output would have to be d. Or would it? The last step here assumes that # can operate not only on parameters from the original replacement-list but also on those resulting from the application of ##. Note importantly that the following constraint (C17 draft, 6.10.3.2 ¶1)
Each # preprocessing token in the replacement list for a function-like macro shall be followed by a parameter as the next preprocessing token in the replacement list. [This doesn't apply to object-like macros.]
is not violated – it's just that in this example the actual parameter of # ends up being a different parameter (c) from the one specified in the replacement-list (a).
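One way to probe the two cases without relying on the unspecified order is to force each order explicitly with helper macros. The following is only a sketch under that assumption; STR, XSTR, CAT, XCAT, STR_THEN_PASTE and PASTE_THEN_STR are hypothetical names, and a forced order via nested macros is not necessarily the same thing as what an implementation does inside a single replacement list.

```c
#define STR(x) #x
#define XSTR(x) STR(x)        /* expand x, then stringify */
#define CAT(a, b) a ## b
#define XCAT(a, b) CAT(a, b)  /* expand a and b, then paste */

/* Forced "# before ##": stringify a, then paste the result with b. */
#define STR_THEN_PASTE(a, b) XCAT(STR(a), b)

/* Forced "## before #": paste a with b, then stringify the result. */
#define PASTE_THEN_STR(a, b) XSTR(CAT(a, b))

/* Same arguments as PRECEDENCETEST(c, , d), minus the unused third one. */
static const char str_first[] = STR_THEN_PASTE(c, );
static const char paste_first[] = PASTE_THEN_STR(c, );
```

Both objects hold "c". As far as I can tell, an explicitly forced order cannot reproduce the hypothetical "d" of Case B, since that result would need the pasted token c to be treated as a parameter again in a second substitution pass.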
Commentary about the accepted answer:
I believe that the accepted answer represents the most sensible interpretation of the standard. In fact, the standard should have been written in a way to force any reader to the same conclusions.
However, I do believe that the standard's authors didn't think it through. The reason is this: The combination of
- the accepted answer and
- my musings (in the bullet points at the beginning of the section "Finding an example" of my question post) about ##-concatenation of two string literals
is relatively close to a proof that
there are no cases where the same input, parsed in two different ways which differ only in the order in which # and ## are applied, leads to two different output possibilities which invoke neither errors (such as violations of preprocessor constraints) nor undefined behavior.
For, if there are indeed no such cases, the writers of the C standard could have simply prescribed "# before ##", as adding such a prescription wouldn't be able to affect existing valid/non-UB programs. (See my discussion with the answerer for additional details/points.)
Similarly, if the C standard was as clear as the accepted answer suggests, why did the GCC maintainers and documentation authors (who evidently gave the matter some thought) not provide relevant commentary with a similar conclusion (or otherwise a contrasting example)?
# can be the result of other preprocessor actions such as ##, but then that's kinda moot since that would rely on a specific evaluation order which the standard explicitly leaves unspecified. (Though it's not UB, assuming the compiler has chosen an evaluation order, I guess.)
##-before-# evaluation order) should never result in "d", as that would involve two substitution passes. No sensible implementation should be doing that.