7

I need to get a string checksum or hash (or something equivalent) using just the C preprocessor, if possible.

The use case is as follows: I'm doing error logging on an embedded device with very limited memory and cpu. I would like to define a LogError() macro which inserts hash(__FILE__) and __LINE__ in a circular buffer (as 16bit numbers). But hash(__FILE__) needs to be compiled to a constant; if the actual filenames are stored as strings in the program that would use too much memory. The hash can be calculated using any method.

It is possible to #define FILE_ID with some unique number at the top of every file, and use that when logging, but that is not the preferred solution, it has a bit of a maintenance cost. Is there a better method?

10
  • 2
    In C, expressions like "foo.c"[0]+9*"foo.c"[1] are not constant expressions, but when used in code, may still actually compile to constants. Commented Feb 22, 2015 at 5:08
  • What if two files have the same hash? Commented Feb 22, 2015 at 5:12
  • @immibis: I guess I'd have to take my chances; there aren't anywhere near 62k files. Less worried about that than about someone accidentally defining FILE_ID to the same number in several files, if using the manual method... I would probably write a python script to update the IDs in that case. Commented Feb 22, 2015 at 5:15
  • 3
    This is not possible. Your compiler may or may not fold your hash function invocation into a constant; I would rather bet on it not being able to do so. The solution with FILE_ID is actually very low maintenance if you use something like -DFILE_ID=$$(myhashpgm $<) in your makefile. Commented Feb 22, 2015 at 5:39
  • 2
    This question is quite awkward since using the pre-processor to hash a string is useful in some cases - but almost certainly NOT a good solution for this use-case. I added own answer, but wonder if your detailed use-case should be a separate question on calculating logging ID's for primitive architectures. Commented Oct 4, 2017 at 10:41

3 Answers 3

7

The question "How to calculate the hash of a string literal using only the C preprocessor?" is valid, however I think you're adding a red-herring by including details about __FILE__ and logging ID's.

This means anyone answering needs to solve the problem you describe, or answer the question on hashing a string with the pre-processor (which may not be a good solution in your particular case!).

As it happens, __FILE__ expands to variable, not a literal string (GCC at least), so you will need to define the filename as a constant. You could use the build system to pass along a define for each for example.

As others have pointed out you can calculate the hash and pass this in via the build-system, although this avoids the question about hashing a sting literal.


Whatever the case, this question comes up when I searched for using the pre-processor for hashing, and none of the answers cover this, so heres is an answer that covers the string hashing part.

This is possible, albeit quite verbose

/** * Implement compile-time string hashing on string literals. * * This macro implements the widely used "djb" hash apparently posted * by Daniel Bernstein to comp.lang.c some time ago. The 32 bit * unsigned hash value starts at 5381 and for each byte 'c' in the * string, is updated: ``hash = hash * 33 + c``. This * function uses the signed value of each byte. * * note: this is the same hash method that glib 2.34.0 uses. */ #define SEED 5381 #if 0 // correct but causes insane expansion # define _SH(e, c) (((e) << 5) + (e) + (unsigned char)(c)) #elif defined(__GNUC__) // Use statement-expression extension # define _SH(e, c) ({ unsigned int _e = (unsigned int)(e); (_e << 5) + _e + (unsigned char)c; }) #else // use an inline function, the compiler will be able to optimize this out. static inline unsigned int _SH(unsigned int e, unsigned char c) { unsigned int _e = (unsigned int)e; return (_e << 5) + _e + (unsigned char)c; } #endif #define _SH_1(a) _SH(SEED, (a)[0]) #define _SH_2(a) _SH(_SH_1(a), (a)[1]) #define _SH_3(a) _SH(_SH_2(a), (a)[2]) #define _SH_4(a) _SH(_SH_3(a), (a)[3]) #define _SH_5(a) _SH(_SH_4(a), (a)[4]) #define _SH_6(a) _SH(_SH_5(a), (a)[5]) #define _SH_7(a) _SH(_SH_6(a), (a)[6]) #define _SH_8(a) _SH(_SH_7(a), (a)[7]) #define _SH_9(a) _SH(_SH_8(a), (a)[8]) #define _SH_10(a) _SH(_SH_9(a), (a)[9]) #define _SH_11(a) _SH(_SH_10(a), (a)[10]) #define _SH_12(a) _SH(_SH_11(a), (a)[11]) #define _SH_13(a) _SH(_SH_12(a), (a)[12]) #define _SH_14(a) _SH(_SH_13(a), (a)[13]) #define _SH_15(a) _SH(_SH_14(a), (a)[14]) #define _SH_16(a) _SH(_SH_15(a), (a)[15]) #define _SH_17(a) _SH(_SH_16(a), (a)[16]) #define _SH_18(a) _SH(_SH_17(a), (a)[17]) #define _SH_19(a) _SH(_SH_18(a), (a)[18]) #define _SH_20(a) _SH(_SH_19(a), (a)[19]) #define _SH_21(a) _SH(_SH_20(a), (a)[20]) #define _SH_22(a) _SH(_SH_21(a), (a)[21]) #define _SH_23(a) _SH(_SH_22(a), (a)[22]) #define _SH_24(a) _SH(_SH_23(a), (a)[23]) #define _SH_25(a) _SH(_SH_24(a), (a)[24]) #define _SH_26(a) _SH(_SH_25(a), (a)[25]) #define _SH_27(a) _SH(_SH_26(a), (a)[26]) #define _SH_28(a) _SH(_SH_27(a), (a)[27]) #define _SH_29(a) _SH(_SH_28(a), (a)[28]) #define _SH_30(a) _SH(_SH_29(a), (a)[29]) #define _SH_31(a) _SH(_SH_30(a), (a)[30]) #define _SH_32(a) _SH(_SH_31(a), (a)[31]) // initial check prevents too-large strings from compiling #define STRHASH(a) ( \ (void)(sizeof(int[(sizeof(a) > 33 ? -1 : 1)])), \ (sizeof(a) == 1) ? SEED : \ (sizeof(a) == 2) ? _SH_1(a) : \ (sizeof(a) == 3) ? _SH_2(a) : \ (sizeof(a) == 4) ? _SH_3(a) : \ (sizeof(a) == 4) ? _SH_3(a) : \ (sizeof(a) == 5) ? _SH_4(a) : \ (sizeof(a) == 6) ? _SH_5(a) : \ (sizeof(a) == 7) ? _SH_6(a) : \ (sizeof(a) == 8) ? _SH_7(a) : \ (sizeof(a) == 9) ? _SH_8(a) : \ (sizeof(a) == 10) ? _SH_9(a) : \ (sizeof(a) == 11) ? _SH_10(a) : \ (sizeof(a) == 12) ? _SH_11(a) : \ (sizeof(a) == 13) ? _SH_12(a) : \ (sizeof(a) == 14) ? _SH_13(a) : \ (sizeof(a) == 15) ? _SH_14(a) : \ (sizeof(a) == 16) ? _SH_15(a) : \ (sizeof(a) == 17) ? _SH_16(a) : \ (sizeof(a) == 18) ? _SH_17(a) : \ (sizeof(a) == 19) ? _SH_18(a) : \ (sizeof(a) == 20) ? _SH_19(a) : \ (sizeof(a) == 21) ? _SH_20(a) : \ (sizeof(a) == 22) ? _SH_21(a) : \ (sizeof(a) == 23) ? _SH_22(a) : \ (sizeof(a) == 24) ? _SH_23(a) : \ (sizeof(a) == 25) ? _SH_24(a) : \ (sizeof(a) == 26) ? _SH_25(a) : \ (sizeof(a) == 27) ? _SH_26(a) : \ (sizeof(a) == 28) ? _SH_27(a) : \ (sizeof(a) == 29) ? _SH_28(a) : \ (sizeof(a) == 30) ? _SH_29(a) : \ (sizeof(a) == 31) ? _SH_30(a) : \ (sizeof(a) == 32) ? _SH_31(a) : \ (sizeof(a) == 33) ? _SH_32(a) : \ 0) // last zero is unreachable // only for comparison unsigned int strhash_func(const void *str) { const signed char *p; unsigned int h = 5381; for (p = str; *p != '\0'; p++) { h = (h << 5) + h + (unsigned int)*p; } return h; } /* -------------------------------------------------------------------- */ #include <stdio.h> #define TEST_STR1 "Hello World" #define TEST_STR2 "Testing 123" int main(void) { unsigned int A = STRHASH(TEST_STR1); unsigned int B = STRHASH(TEST_STR2); printf("String hash: const %u <- '%s'\n", STRHASH(TEST_STR1), TEST_STR1); printf("String hash: const %u <- '%s'\n", STRHASH(TEST_STR2), TEST_STR2); printf("String hash: dyn %u <- '%s'\n", strhash_func(TEST_STR1), TEST_STR1); printf("String hash: dyn %u <- '%s'\n", strhash_func(TEST_STR2), TEST_STR2); #if defined(__GNUC__) printf("Is this known at compile time?, answer is: %d\n", __builtin_constant_p(A)); #endif } 

Note, for some reason Clang 5.0 prints answer is: 0, however on closer inspection is does in fact know the value at compile time, __builtin_constant_p just doesn't seem to work as GCC does.

Sign up to request clarification or add additional context in comments.

Comments

0

You're asking too much of the preprocessor.

Better just build each file with a 16bit checksum of its name defined as preprocessor macro, as @n.m suggests.

One trifling difficulty for this solution is laying your hands on a 16bit checksum utility. The GNU cksum tool is just 32-bit.

But FreeBSD has a better one that lets you choose 16 or 32 bits. If your development system is a Debian derivative then you can get it by:

sudo apt-get install freebsd-buildutils 

Then run:

dpkg-query -L freebsd-buildutils 

to see where the FreeBSD cksum has been installed. In my case, it is:

/usr/bin/freebsd-cksum 

but you might find differently.

You direct FreeBSD cksum to produce a 16bit checksum by passing the option -o 1. You can check its man page.

Take care when generating the preprocessor macro that defines a filename checksum that you define the checksum as a 16 bit unsigned int, as that's what you want. Should it come out as a plain decimal numeral, for instance, it would default to signed int, which could cause you surprises.

Here's a sketch of the solution with GNU make:

main.c

#include <stdio.h> #include <stdint.h> int main(int argc, char **argv) { printf("%hu\n",FILE_CHKSUM); return 0; } 

Makefile

.phony: all clean SRCS = main.c OBJS = $(SRCS:.c=.o) # Take the 1st word of the output of `echo <filename> | freebsd-cksum -o 1` FILE_CHKSUM = $(word 1,$(shell echo $(1) | freebsd-cksum -o 1)) all: prog %.o:%.c $(CC) $(CPPLAGS) $(CFLAGS) -DFILE_CHKSUM='((uint16_t)$(call FILE_CHKSUM,$<))' -c -o $@ $< prog: $(OBJS) $(CC) $(CFLAGS) $(LDFLAGS) -o $@ $(OBJS) $(LDLIBS) clean: rm -f $(OBJS) prog 

Running make:

cc -DFILE_CHKSUM='((uint16_t)3168)' -c -o main.o main.c cc -o prog main.o 

Run prog:

./prog 3168 

Comments

0

How to do this in CMake ? I can not figure out how to run script for every source file, and the result place as a value for -DFILE_CHEKSUM

Method with STRHASH, has a big limitation - it takes maximum of 32 characters as input, and when I have longer names, it just does not compile. I think that computing it before build and passing it as -D is beter approach, but I just can not manage to do it in cmake :(

1 Comment

never mind, I used ``` static inline unsigned int strhash() { const char *p = FILE; // ignore clang-diagnostic-static-local-in-inline - expected behaviour in this case static unsigned int h = 5381; // NOLINT if(h!=5381) { return h; } for (;*p != '\0'; p++) { h = (h << 5) + h + (unsigned int)*p; } return h; } ``` so after calculating it once in a file, I will just return fast

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.