13

What actually belongs to the "character type" in C11 — besides char of course?

To be more precise: the special exceptions for the character type (for example, that any object can be accessed by an lvalue expression of character type; see §6.5/7 in the C11 standard), to which concrete types do they apply? They seem to apply to uint8_t and int8_t from stdint.h, but is this guaranteed? On the other hand, gcc doesn't regard char16_t from uchar.h as a "character type".

5
  • Also signed char and unsigned char. Commented Aug 8, 2016 at 8:38
  • Note that int8_t and uint8_t are just aliases for existing types. Commented Aug 8, 2016 at 8:38
  • There have been serious proposals to base int8_t and uint8_t on extended integer types, functionally identical to signed char and unsigned char respectively except that they would not count as "character types" for §6.5/7. As far as I know, no implementation has carried through this idea, but I'm not aware of any reason it's forbidden, either. (The advantage of this would be, for instance, that you could now have string pointers that didn't alias all the other pointers in the program.) Commented Aug 8, 2016 at 13:16
  • @zwol Do you mean by using std::basic_string<uint8_t>, etc.? Commented Aug 8, 2016 at 13:38
  • @underscore_d Essentially yes. It would be awkward on account of all the library functions that expect plain char* and/or std::string, but it could be done. I suspect careful use of restrict gets you at least 90% of the benefit, though. Commented Aug 8, 2016 at 14:30

2 Answers

8

Only char, signed char, and unsigned char [1].

The types uint8_t, int8_t, char16_t, or any type of the form intN_t or charN_t, may or may not be synonyms for a character type.


[1] (Quoted from: ISO/IEC 9899:201x, 6.2.5 Types, paragraph 15)
The three types char, signed char, and unsigned char are collectively called the character types.
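
If you want to know what a particular implementation does, a C11 _Generic selection can test it at compile time. The following is only a sketch: the macro IS_CHARACTER_TYPE and the helper uint8_is_character_type are names made up for this illustration, not anything from the standard library.

```c
#include <stdint.h>

/* Evaluates to 1 if the expression x has one of the three character types,
   0 otherwise. char, signed char and unsigned char are three distinct
   types, so all of them may appear in the same _Generic association list. */
#define IS_CHARACTER_TYPE(x) \
    _Generic((x), char: 1, signed char: 1, unsigned char: 1, default: 0)

/* Hypothetical helper: 1 if uint8_t is a typedef for a character type on
   this implementation, 0 if it is some other type, -1 if uint8_t is not
   provided at all (it is optional). */
static int uint8_is_character_type(void)
{
#ifdef UINT8_MAX
    return IS_CHARACTER_TYPE((uint8_t)0);
#else
    return -1;
#endif
}
```

A result of 1 only describes the implementation you compiled on; it says nothing about what the standard guarantees elsewhere.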


4 Comments

If sizeof(char) == sizeof(int8_t) then int8_t is a character type?
@wolf-revo-cats No, that is not guaranteed. It could be theoretically typedefed as an extended integer type.
@2501 The intN_t types are optional and cannot exist on a machine that does not have an exactly N-bit type. And since char types must be the smallest addressable unit on a machine, and must be at least 8 bits... if a machine supports the int8_t types, then they must be aliases to [[un]signed] char. Or have I missed a logical loophole somewhere? I guess some exotic machine could offer types with the same width but different signedness or bit representations... making them more suited to one or the other of char or intN_t.
@underscore_d The point 2501 was making is that a system could provide types with the same size and representation but that are "technically" different types (in that signatures for functions taking one type will not match the other, etc)
7

char, signed char, and unsigned char are the character types in C11. This has been the case since C89.

Treating int8_t (or uint8_t) as a character type has many problems.

  1. They are optional.
  2. They may not exist if CHAR_BIT > 8.
  3. They are defined to work only if the implementation uses 2's complement representation (which is the most common). But there are other representations, namely 1's complement and sign-magnitude, that the C standard allows.

Since they are, if they exist, typedef'ed to an existing type, you can probably get away with using int8_t or uint8_t as a character type in practice. But the standard doesn't guarantee anything and there's no reason to treat them as such anyway when you have the real character types.
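
As a rough illustration of using a real character type for byte-wise access, here is a minimal sketch. The helper name xor_bytes is invented for this example, and the static assertion is an optional assumption you would add only if the rest of your code depends on 8-bit bytes.

```c
#include <limits.h>
#include <stddef.h>

/* Optional: refuse to compile where bytes are not 8 bits wide.
   The C standard only guarantees CHAR_BIT >= 8. */
_Static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Hypothetical helper: XORs together every byte of an object's
   representation. Reading through unsigned char is permitted for an
   object of any type (C11 6.5/7), with no reliance on uint8_t. */
static unsigned char xor_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    unsigned char acc = 0;
    for (size_t i = 0; i < size; ++i)
        acc ^= p[i];
    return acc;
}
```

Called as xor_bytes(&x, sizeof x), this works for an x of any object type, which is exactly the allowance the standard gives to the character types.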

3 Comments

I previously used the cstdint typedefs but stopped and went back to good old chars for precisely the reasons you gave. It just makes sense, states intent properly as per the special allowances given to chars, and protects against potential accidents later (on some exotic implementation).
I would like to ask whether there actually exists an implementation where char is larger than 8 bits. If you look, for example, at GNU libc's memset, at the lines cccc = (unsigned char) c; cccc |= cccc << 8; cccc |= cccc << 16; this would break if char were larger than 8 bits. So with GNU it seems to be implicitly guaranteed that char is 8 bits wide.
See: What platforms have something other than 8-bit char? for some examples. Yes, that'd break glibc. But glibc typically uses arch-specific assembly code for memset, memcpy etc. So, that code might not be the one actually used. Besides, CHAR_BIT!=8 allowance is mainly for DSPs. But you can reasonably assume CHAR_BIT=8 on most desktop systems and POSIX requires CHAR_BIT to be exactly 8 bits. But, the standard covers a lot of other systems as well.
