C Programming/Alternative tokens
C was designed in English and assumes the common ASCII character set made for English, which includes such characters as {, }, [, ], and so on. Some other character sets (like EBCDIC on mainframes), however, do not have these or other characters which are required by C. Since even the simplest C program uses curly braces to define main(), developing in C would be impossible on these systems.
To solve this problem, trigraph sequences can be substituted for the symbols.[1] They will work in any situation; the first translation phase of compilation replaces the trigraph sequences with their corresponding single-character equivalents.[2]
Trigraphs were later supplemented with digraphs,[3] but neither was in widespread use outside mainframe systems. Trigraphs were removed from C in C23.[4] They are nevertheless described here because they may still appear in legacy codebases.
Trigraphs
The following trigraph sequences exist, and no others. A question mark ? that does not begin one of the listed trigraph sequences is left unchanged.
| Sequence | Replacement |
|---|---|
| ??= | # |
| ??( | [ |
| ??/ | \ |
| ??) | ] |
| ??' | ^ |
| ??< | { |
| ??! | | |
| ??> | } |
| ??- | ~ |
The effect of this is that source code such as

    ??=include <stdio.h>
    int main(int argc, char **argv)
    ??<
        char message??(20??) = "??-??-Hello, world.??-??-";
        printf("%s??/n", message);
    ??>

will, after trigraphs are replaced, be equivalent to

    #include <stdio.h>
    int main(int argc, char **argv)
    {
        char message[20] = "~~Hello, world.~~";
        printf("%s\n", message);
    }

Trigraphs are replaced everywhere, including inside string literals and character constants, which are the only places where an unwanted replacement would change the program. To keep a trigraph from being replaced there, the programmer can escape the second question mark; e.g.

    printf("Two question marks in a row: ?\?!\n");

MSVC and GCC do not process trigraphs by default; they require the /Zc:trigraphs or -trigraphs flag, respectively, on the compiler command line.[5][6] Clang processes them by default unless the C standard is set to a GNU mode.[7]
Digraphs
Digraphs are equivalent to the following tokens except for their spelling:
| Digraph | Equivalent |
|---|---|
| <: | [ |
| :> | ] |
| <% | { |
| %> | } |
| %: | # |
| %:%: | ## |
In other words, a digraph keeps its own spelling when it is stringized as part of a macro replacement (stringizing <: yields "<:", not "["), but it is otherwise equivalent to the token it stands for.
iso646.h
The header iso646.h addresses the symbol problem only for operators, and takes a different approach: it defines keyword-like macros that stand in for them. It is named after the ISO/IEC 646 character encoding standard, some variants of which lack these symbols; code written for some variants must also rely on trigraphs or digraphs.
Macros
The iso646.h header defines the following 11 macros:
| Macro | Defined as |
|---|---|
| and | && |
| and_eq | &= |
| bitand | & |
| bitor | | |
| compl | ~ |
| not | ! |
| not_eq | != |
| or | || |
| or_eq | |= |
| xor | ^ |
| xor_eq | ^= |
References
1. 1989 C standard (section 5.2.1.1)
2. 1989 C standard (section 5.1.1.2)
3. 1995 C standard (section 6.4.6)
4. https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2940.pdf
5. "/Zc:trigraphs (Trigraphs Substitution)". Microsoft Learn. Retrieved 2025-11-03.
6. "Initial processing (The C Preprocessor)". gcc.gnu.org. Retrieved 2025-11-03.
7. "Clang Compiler User's Manual — Clang 22.0.0git documentation". clang.llvm.org. Retrieved 2025-11-03.