I know there is a standard behind all C compiler implementations, so there should be no hidden features. Despite that, I am sure all C developers have hidden/secret tricks they use all the time.
- It'd be great if you/someone were to edit the “question” to indicate the pick of the best hidden features, such as in the C# and Perl versions of this question. – Donal Fellows, May 26, 2010 at 13:19
56 Answers
More of a trick of the GCC compiler than of C itself, but you can give the compiler branch prediction hints (common in the Linux kernel):

    #define likely(x)   __builtin_expect((x), 1)
    #define unlikely(x) __builtin_expect((x), 0)

see: http://kerneltrap.org/node/4705

What I like about this is that it also adds some expressiveness to some functions.

    void foo(int arg)
    {
        if (unlikely(arg == 0)) {
            do_this();
            return;
        }
        do_that();
        ...
    }
int8_t int16_t int32_t uint8_t uint16_t uint32_t

These exact-width types (declared in <stdint.h> since C99) are an optional item in the standard, but they must be a hidden feature, because people are constantly redefining them. One code base I've worked on (and still do, for now) has multiple redefinitions, all with different identifiers. Most of the time it's with preprocessor macros:

    #define INT16 short
    #define INT32 long

And so on. It makes me want to pull my hair out. Just use the freaking standard integer typedefs!
The comma operator isn't widely used. It can certainly be abused, but it can also be very useful. This use is the most common one:

    for (int i=0; i<10; i++, doSomethingElse()) { /* whatever */ }

But you can use this operator anywhere. Observe:

    int j = (printf("Assigning variable j\n"), getValueFromSomewhere());

Each operand is evaluated from left to right, but the value of the whole expression is that of the last operand evaluated.
Initializing a structure to zero:

    struct mystruct a = {0};

This will zero all structure elements.
- memset/calloc do "all bytes zero" (i.e. physical zeroes), which is indeed not defined for all types. { 0 } is guaranteed to initialize everything with proper logical zero values. Pointers, for example, are guaranteed to get their proper null values, even if the null value on the given platform is 0xBAADF00D.
- Physical zero is what memset does (with 0 as second argument). You get logical zero when you initialize/assign 0 (or { 0 }) to the object in the source code. These two kinds of zeros do not necessarily produce the same result, as in the example with the pointer: when you do memset on a pointer, you get a 0x0000 pointer, but when you assign 0 to a pointer, you get the null pointer value, which at the physical level might be 0xBAADF00D or anything else.
- Another example is double. Usually it is implemented in accordance with the IEEE-754 standard, in which the logical zero and physical zero are the same. But IEEE-754 is not required by the language, so it might happen that when you do double d = 0; (logical zero), physically some bits in the memory occupied by d will not be zero.

Function pointers. You can use a table of function pointers to implement, e.g., fast indirect-threaded code interpreters (FORTH) or byte-code dispatchers, or to simulate OO-like virtual methods.
Then there are hidden gems in the standard library, such as qsort(), bsearch(), strpbrk(), strcspn() [the latter two being useful for implementing a strtok() replacement].
A misfeature of C is that signed arithmetic overflow is undefined behavior (UB). So whenever you see an expression such as x+y, both being signed ints, it might potentially overflow and cause UB.
Multi-character constants:
    int x = 'ABCD';

This sets x to 0x41424344 (or 0x44434241, depending on the implementation; the standard leaves the value of a multi-character constant implementation-defined).
EDIT: This technique is not portable, especially if you serialize the int. However, it can be extremely useful to create self-documenting enums. e.g.
    enum state {
        stopped = 'STOP',
        running = 'RUN!',
        waiting = 'WAIT',
    };

This makes it much simpler if you're looking at a raw memory dump and need to determine the value of an enum without having to look it up.
I never used bit fields but they sound cool for ultra-low-level stuff.
    struct cat {
        unsigned int legs:3;   // 3 bits for legs (0-4 fit in 3 bits)
        unsigned int lives:4;  // 4 bits for lives (0-9 fit in 4 bits)
        // ...
    };

    struct cat make_cat(void)
    {
        struct cat kitty;
        kitty.legs = 4;
        kitty.lives = 9;
        return kitty;
    }

This means that sizeof(struct cat) can be as small as sizeof(char), although the layout is implementation-defined and many compilers round the struct up to sizeof(unsigned int).
C has a standard but not all C compilers are fully compliant (I've not seen any fully compliant C99 compiler yet!).
That said, the tricks I prefer are those that are non-obvious and portable across platforms, as they rely on C semantics. They are usually about macros or bit arithmetic.
For example, swapping two unsigned integers without using a temporary variable:

    ...
    a ^= b; b ^= a; a ^= b;
    ...

or "extending C" to represent finite state machines like:

    FSM {
        STATE(x) {
            ...
            NEXTSTATE(y);
        }
        STATE(y) {
            ...
            if (x == 0)
                NEXTSTATE(y);
            else
                NEXTSTATE(x);
        }
    }

that can be achieved with the following macros:

    #define FSM
    #define STATE(x)      s_##x :
    #define NEXTSTATE(x)  goto s_##x

In general, though, I don't like the tricks that are clever but make the code unnecessarily complicated to read (like the swap example), and I love the ones that make the code clearer and directly convey the intention (like the FSM example).
Interlacing structures like Duff's Device:
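    /* Copies count bytes from "from" to "to"; count must be greater than zero.
       Tom Duff's original wrote to a fixed device register, so it did not
       increment "to"; a memory-to-memory copy has to. */
    void duff_copy(char *to, const char *from, int count)
    {
        int n = (count + 7) / 8;
        switch (count % 8) {
        case 0: do { *to++ = *from++;
        case 7:      *to++ = *from++;
        case 6:      *to++ = *from++;
        case 5:      *to++ = *from++;
        case 4:      *to++ = *from++;
        case 3:      *to++ = *from++;
        case 2:      *to++ = *from++;
        case 1:      *to++ = *from++;
                } while (--n > 0);
        }
    }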
I'm very fond of designated initializers, added in C99 (and supported in gcc for a long time):
    #define FOO 16
    #define BAR 3

    myStructType_t myStuff[] = {
        [FOO] = { foo1, foo2, foo3 },
        [BAR] = { bar1, bar2, bar3 },
        ...

The array initialization is no longer position dependent. If you change the values of FOO or BAR, the array initialization will automatically correspond to their new value.
Anonymous structures and arrays (compound literals, added in C99) are my favourite. (cf. http://www.run.montefiore.ulg.ac.be/~martin/resources/kung-f00.html)

    setsockopt(yourSocket, SOL_SOCKET, SO_REUSEADDR, (int[]){1}, sizeof(int));

or

    void myFunction(type* values) {
        while(*values) x=*values++;
    }
    myFunction((type[]){val1,val2,val3,val4,0});

They can even be used to instantiate linked lists...
gcc has a number of extensions to the C language that I enjoy, which can be found here. Some of my favorites are function attributes. One extremely useful example is the format attribute. This can be used if you define a custom function that takes a printf format string. If you enable this function attribute, gcc will do checks on your arguments to ensure that your format string and arguments match up and will generate warnings or errors as appropriate.
    int my_printf (void *my_object, const char *my_format, ...)
        __attribute__ ((format (printf, 2, 3)));
The (hidden) feature that "shocked" me when I first saw it is about printf. It allows you to supply parts of a format specifier, such as the precision, through variables. Look at the code and you will see it better:

    #include <stdio.h>

    int main() {
        int a = 3;
        float b = 6.412355;
        printf("%.*f\n",a,b);
        return 0;
    }

The * character achieves this effect.
Well... I think that one of the strong points of C language is its portability and standardness, so whenever I find some "hidden trick" in the implementation I am currently using, I try not to use it because I try to keep my C code as standard and portable as possible.
Compile-time assertions, as already discussed here.
    //--- size of static_assertion array is negative if condition is not met
    #define STATIC_ASSERT(condition) \
        typedef struct { \
            char static_assertion[(condition) ? 1 : -1]; \
        } static_assertion_t

    //--- ensure structure fits in
    STATIC_ASSERT(sizeof(mystruct_t) <= 4096);
Constant string concatenation
I was quite surprised not to see it already in the answers, as all compilers I know of support it, but many programmers seem to ignore it. Sometimes it's really handy, and not only when writing macros.
Use case I have in my current code: I have a #define PATH "/some/path/" in a configuration file (really it is set by the makefile). Now I want to build the full path including filenames to open resources. It just goes to:

    fd = open(PATH "/file", flags);

Instead of the horrible, but very common:

    char buffer[256];
    snprintf(buffer, 256, "%s/file", PATH);
    fd = open(buffer, flags);

Notice that the common horrible solution is:
- three times as long
- much less easy to read
- much slower
- less powerful, as it is tied to an arbitrary buffer size limit (but you would have to use even longer code to avoid that without constant string concatenation).
- uses more stack space
Well, I've never used it, and I'm not sure whether I'd ever recommend it to anyone, but I feel this question would be incomplete without a mention of Simon Tatham's co-routine trick.
When initializing arrays or enums, you can put a comma after the last item in the initializer list. e.g:
    int x[] = { 1, 2, 3, };
    enum foo { bar, baz, boom, };

This was done so that if you're generating code automatically you don't need to worry about eliminating the last comma.
Struct assignment is cool. Many people don't seem to realize that structs are values too, and can be assigned around, there is no need to use memcpy(), when a simple assignment does the trick.
For example, consider some imaginary 2D graphics library, it might define a type to represent an (integer) screen coordinate:
    typedef struct {
        int x;
        int y;
    } Point;

Now, you do things that might look "wrong", like write a function that creates a point initialized from function arguments, and returns it, like so:

    Point point_new(int x, int y)
    {
        Point p;
        p.x = x;
        p.y = y;
        return p;
    }

This is safe, as long (of course) as the return value is copied by value using struct assignment:

    Point origin;
    origin = point_new(0, 0);

In this way you can write quite clean and object-oriented-ish code, all in plain standard C.
Strange vector indexing:
    int v[100];
    int index = 10;
    /* v[index] is the same thing as index[v] */

C compilers implement one of several standards. However, having a standard does not mean that all aspects of the language are defined. Duff's device, for example, is a favorite 'hidden' feature that has become so popular that modern compilers have special purpose recognition code to ensure that optimization techniques do not clobber the desired effect of this often used pattern.
In general hidden features or language tricks are discouraged as you are running on the razor edge of whichever C standard(s) your compiler uses. Many such tricks do not work from one compiler to another, and often these kinds of features will fail from one version of a compiler suite by a given manufacturer to another version.
Various tricks that have broken C code include:
- Relying on how the compiler lays out structs in memory.
- Assumptions on endianness of integers/floats.
- Assumptions on function ABIs.
- Assumptions on the direction that stack frames grow.
- Assumptions about order of execution within statements.
- Assumptions about the order in which function arguments are evaluated.
- Assumptions on the bit size or precision of short, int, long, float and double types.
Other problems arise whenever programmers make assumptions about aspects of the execution model that most C standards leave as 'compiler dependent' (implementation-defined) behavior.
When using sscanf you can use %n to find out where you should continue to read:
    sscanf ( string, "%d%n", &number, &length );
    string += length;

Apparently, you can't add another answer, so I'll include a second one here: you can use "&&" and "||" as conditionals:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        1 || puts("Hello");
        0 || puts("Hi");
        1 && puts("ROFL");
        0 && puts("LOL");
        exit( 0 );
    }

This code will output:

    Hi
    ROFL
Using INT 3 (the x86 breakpoint instruction) to set a breakpoint in the code is my all-time favorite.
My favorite "hidden" feature of C is the use of %n in printf to write back through a pointer argument. Normally printf only reads its arguments as directed by the format string, but %n stores the number of characters written so far into the int that the corresponding argument points to.
Check out section 3.4.2 here. Can lead to a lot of nasty vulnerabilities.
Compile-time assumption-checking using enums: Stupid example, but can be really useful for libraries with compile-time configurable constants.
    #define D 1
    #define DD 2

    enum CompileTimeCheck
    {
        MAKE_SURE_DD_IS_TWICE_D = 1/(2*(D) == (DD)),
        MAKE_SURE_DD_IS_POW2    = 1/((((DD) - 1) & (DD)) == 0)
    };
- #define CompilerAssert(exp) extern char _CompilerAssert[(exp)?1:-1]

C99-style variable argument macros, aka

    #define ERR(name, fmt, ...) fprintf(stderr, "ERROR " #name ": " fmt "\n", \
                                        __VA_ARGS__)

which would be used like

    ERR(errCantOpen, "File %s cannot be opened", filename);

Here I also use the stringize operator and string constant concatenation, other features I really like.
Variable size automatic variables are also useful in some cases. These were added in C99 and have been supported in gcc for a long time.

    void foo(uint32_t extraPadding) {
        uint8_t commBuffer[sizeof(myProtocol_t) + extraPadding];
        /* ... */
    }

You end up with a buffer on the stack with room for the fixed-size protocol header plus variable size data. You can get the same effect with alloca(), but this syntax is more compact.
You have to make sure extraPadding is a reasonable value before calling this routine, or you end up blowing the stack. You'd have to sanity check the arguments before calling malloc or any other memory allocation technique, so this isn't really unusual.