
Another thread discusses how reading a file with the following code can result in an infinite loop: EOF is an int value that may lie outside the range of char, so the loop's exit condition can never be met:

    FILE* f;
    char c;
    std::string s;
    f = fopen("/some/file", "r");
    while ((c = getc(f)) != EOF)
        s += c;

Actually, I have observed that this results in an infinite loop under Linux on a Raspberry Pi, but works fine under Linux on different hardware.

Why does it work at all?

The only explanation that comes to mind is that the value of the assignment expression (c = getc(f)) is not well-defined: it yields the left-hand value (a char in this case) on some platforms and the right-hand value (an int in this case) on others.

Can someone shed light on this behaviour?

  • Assuming CHAR_BIT is 8 ... YOU ALWAYS LOSE A CHARACTER ... there are 256 different characters "on the keyboard" (more probably in a file); getc() returns an int value from the set {<some negative value>, 0, 1, 2, ..., 255} (257 elements). There is absolutely no way to put 257 distinct values inside an 8-bit char. Using a char for the intermediate value, you are always, no matter the implementation (with 8-bit signed or unsigned char), either losing EOF or some other character (possibly '\xFF' or its signed equivalent). Commented Dec 19, 2024 at 9:36
  • "I'm using a function incorrectly. Why does it work sometimes, and why does it fail sometimes?" That makes no sense. Just use the function correctly and you are safe. Why can I pass printf a null pointer for %s on some systems and not on others? Answer: just don't. There is no reason to speculate about what happens when you do things wrong; just do it correctly from the start. Commented Dec 19, 2024 at 9:42
  • @phuclv No, it's not a duplicate of that. The OP is already aware that EOF is not a char, and didn't ask why that's the case. Commented Dec 19, 2024 at 13:49
  • How do you read the DEL char? Commented Dec 19, 2024 at 14:56
  • @4386427 This is an interesting philosophical point: is only getting it right important, or is there some value in understanding why it does (not) work? To parody a famous engineering quote: "It is not only important that a bridge stands, but also to understand why it breaks." Commented Dec 19, 2024 at 20:01

2 Answers

It depends on how the type char behaves: as signed char or as unsigned char. If it behaves as unsigned char, a value stored in such an object can never equal EOF, so the loop never ends. If it behaves as signed char, then, because of the integer promotions, its value can equal EOF, which is usually defined as -1.

Try the following demonstration program.

    #include <stdio.h>

    int main(void)
    {
        unsigned char c1 = EOF;
        signed char c2 = EOF;

        printf( "c1 == EOF is %s\n", c1 == EOF ? "true" : "false" );
        printf( "c2 == EOF is %s\n", c2 == EOF ? "true" : "false" );

        return 0;
    }

Its output is

    c1 == EOF is false
    c2 == EOF is true

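Whether plain char behaves like signed char or like unsigned char is implementation-defined. The default follows the target platform (GCC, for instance, makes char unsigned on ARM Linux targets such as the Raspberry Pi and signed on x86), and compilers usually let you override it with options such as GCC's -fsigned-char and -funsigned-char.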

From the C Standard (6.2.5 Types, paragraph 15):

The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

For example, as far as I know, char behaves as unsigned char by default on IBM mainframes.


@vlad-from-moscow beat me to it, explaining that the signedness of char is implementation-defined. That said, many compilers will tell you in one way or another if char is unsigned. For example, GCC will predefine

#define __CHAR_UNSIGNED__ 1 

if that is the case for the target¹. However, a well-written C program will hardly ever need to take such things into account. In your case, if you rewrote the loop as

    int c;
    while ((c = getc(f)) != EOF) {
        s += (char)c;
    }

it would not exhibit such implementation-dependent behavior. On targets where char is signed, it will also lead to identical machine code.


¹ With GCC and Clang, you can run the command

gcc -dM -E - < /dev/null 

or

clang -dM -E - < /dev/null 

to list all of the compiler's predefined macros and their values. This also works with cross compilers.

  • "Many compilers will inform you in some way or another" --> What works everywhere: CHAR_MIN < 0 is true when char is signed.
  • CHAR_MAX < UCHAR_MAX should also work everywhere, as long as you've included <limits.h>.
