I'm writing a language interpreter in C, and my string type contains a length attribute, like so:

    struct String {
        char* characters;
        size_t length;
    };

Because of this, I have to spend a lot of time in my interpreter handling this kind of string manually since C doesn't include built-in support for it. I've considered switching to simple null-terminated strings just to comply with the underlying C, but there seem to be a lot of reasons not to:
Bounds-checking is built-in if you use "length" instead of looking for a null.
With null termination, you have to traverse the entire string to find its length.
You have to do extra stuff to handle a null character in the middle of a null-terminated string.
Null-terminated strings deal poorly with some Unicode encodings: UTF-16 and UTF-32 text contains zero bytes inside ordinary characters.
Non-null-terminated strings can intern more, i.e. the characters for "Hello, world" and "Hello" can be stored in the same place, just with different lengths. This can't be done with null-terminated strings.
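
To make the points about embedded nulls and shared storage concrete, here is a minimal sketch. The names `shared_storage`, `shared_prefix` and `string_equal` are hypothetical and exist only for this illustration; they are not part of my interpreter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same layout as the struct in the question. */
struct String { char* characters; size_t length; };

/* One buffer that several strings can point into (hypothetical). */
char shared_storage[] = "Hello, world";

/* View of the first `length` bytes of the buffer; nothing is copied.
   The caller must keep `length` at or below 12 for this buffer. */
struct String shared_prefix(size_t length)
{
    struct String out = { shared_storage, length };
    return out;
}

/* Byte-wise equality. memcmp compares exactly `length` bytes,
   so an embedded null does not end the comparison early. */
bool string_equal(struct String a, struct String b)
{
    return a.length == b.length
        && memcmp(a.characters, b.characters, a.length) == 0;
}
```

With this, `shared_prefix(5)` ("Hello") and `shared_prefix(12)` ("Hello, world") point at the same bytes. Two strings built from the byte arrays `{'a', '\0', 'b'}` and `{'a', '\0', 'c'}` compare unequal under `string_equal`, whereas `strlen` reports 1 for both and `strcmp` treats them as the same string.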
Slicing is another case (note: strings are immutable in my language). Here are two versions of a slice function, one for each representation:

    /* Length-carrying string: the slice shares storage with the original. */
    struct String slice(struct String in, size_t begin, size_t end)
    {
        struct String out;
        out.characters = in.characters + begin;
        out.length = end - begin;
        return out;
    }

    /* Null-terminated string: the slice must be allocated and copied. */
    char* slice(char* in, size_t begin, size_t end)
    {
        char* out = malloc(end - begin + 1);
        if (out == NULL)
            return NULL;
        for (size_t i = 0; i < end - begin; i++)
            out[i] = in[i + begin];
        out[end - begin] = '\0';
        return out;
    }

Obviously the second is slower (and more error-prone: think about adding error-checking of begin and end to both functions).

After all this, my thinking is no longer about whether I should use null-terminated strings: I'm thinking about why C uses them!
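
As a rough sketch of the error-checking mentioned above, this is how the range check might look for each representation. The names `slice_checked` and `slice_checked_cstr`, and the choice to report errors through a `bool` or a `NULL` return, are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct String { char* characters; size_t length; };

/* Length-carrying version: the range check is two comparisons, O(1). */
bool slice_checked(struct String in, size_t begin, size_t end,
                   struct String* out)
{
    if (begin > end || end > in.length)
        return false;
    out->characters = in.characters + begin;
    out->length = end - begin;
    return true;
}

/* Null-terminated version: the check needs strlen, which walks the
   whole input. Returns NULL on a bad range or a failed allocation;
   the caller frees the result. */
char* slice_checked_cstr(const char* in, size_t begin, size_t end)
{
    if (begin > end || end > strlen(in))
        return NULL;
    char* out = malloc(end - begin + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, in + begin, end - begin);
    out[end - begin] = '\0';
    return out;
}
```

The first version validates the range without touching the character data at all, while the second has to scan the input before it can even decide whether the range is valid.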
So my question is: are there any benefits to null-termination that I'm missing?