
An obvious example of undefined behavior (UB), when reading a value, is:

    int a;
    printf("%d\n", a);

What about the following examples?

    int i = i;  // `i` is not initialized when we are reading it by assigning it to itself.

    int x;
    x = x;      // Is this the same as above?

    int y;
    int z = y;

Are all three examples above also UB, or are there exceptions to it?

  • All three have undefined behaviour. Commented May 23, 2021 at 18:20
  • @Dan The links from the duplicate you linked in your comment say that it is syntactically legal, but the behaviour is undefined. Commented May 23, 2021 at 18:26
  • Reading from an uninitialised variable is undefined behaviour, right there and then. Commented May 23, 2021 at 18:30
  • int i = i; is semantically equivalent to int i; i = i;. Both are UB. But just because you have UB in your code doesn't mean the compiler has to do anything about it; that's part of the whole "undefined" bit. A decent compiler will be able to detect it and warn you about it, but from the compiler's point of view it's not an error. Commented May 23, 2021 at 18:39
  • It is not because of what the compiler or the processor might or might not do, but because the C standard says so, although I can't find the relevant clause. Commented May 23, 2021 at 19:01

3 Answers


Each of the three lines triggers undefined behavior. The key part of the C standard that explains this is section 6.3.2.1p2, in the chapter on conversions:

Except when it is the operand of the sizeof operator, the _Alignof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

In each of the three cases, an uninitialized variable is used as the right-hand side of an assignment or initialization (which for this purpose is equivalent to an assignment), so it undergoes lvalue conversion. The final sentence of the quoted passage applies here, because the objects in question have not been initialized and their addresses are never taken.

This also applies to the int i = i; case as the lvalue on the right side has not (yet) been initialized.
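The exemptions at the start of the quoted passage can also be seen in code. The following is a minimal sketch, and the function name `no_lvalue_conversion` is hypothetical.

- `sizeof u`, `&u` and the left side of `u = 3` all name the uninitialized `u`, but none of them reads its value, because none of them performs lvalue conversion.
- The only read, `*p`, happens after the assignment has stored a value.

```c
#include <stddef.h>

/* Hypothetical helper: every use of `u` before the assignment is one of the
   contexts exempted from lvalue conversion in C11 6.3.2.1p2, so the
   indeterminate value of `u` is never read. */
size_t no_lvalue_conversion(void)
{
    int u;                 /* no initializer: value is indeterminate */
    size_t n = sizeof u;   /* operand of sizeof: u is not evaluated */
    int *p = &u;           /* operand of unary &: takes the address only */
    u = 3;                 /* left operand of =: a store, not a read */
    return n + (size_t)*p; /* u now holds 3, so this read is defined */
}
```

Calling `no_lvalue_conversion()` yields `sizeof(int) + 3`. Swapping the assignment and the `return` would reintroduce a read of an indeterminate value.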

In a related question, some argued that the right side of int i = i; is UB because the lifetime of i has not yet begun. That is not the case. From section 6.2.4 p5 and p6:

5 An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration, as do some compound literals. The result of attempting to indirectly access an object with automatic storage duration from a thread other than the one with which the object is associated is implementation-defined.

6 For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.

So the lifetime of i begins on entry into the enclosing block, before the declaration is encountered. int i = i; is therefore still undefined behavior, but not for this reason.
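The gap between lifetime and scope can be demonstrated with a backward `goto`. This is a hypothetical sketch (`lifetime_demo` is an illustrative name):

- On the second pass, `x` is not in scope at the label, but it is still within its lifetime, so reading it through `p` is well defined.
- When the declaration is reached again, its initializer runs again, as 6.2.4p6 says.

```c
#include <stddef.h>

/* Hypothetical demo: x's lifetime covers the whole enclosing block, including
   the part before its declaration, so a pointer to it stays valid after the
   goto jumps back above the declaration (where x is out of scope). */
int lifetime_demo(void)
{
    int *p = NULL;
    int seen = -1;
    int pass = 0;

again:
    if (p != NULL)
        seen = *p;        /* x is out of scope here but still alive: reads 42 */

    int x = 42 + pass;    /* the initializer is re-run each time this is reached */
    p = &x;

    if (pass++ == 0)
        goto again;

    return seen;          /* the value x held from the first pass */
}
```

`lifetime_demo()` returns 42. When the function returns, `x` has been re-initialized to 43.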

The final sentence of 6.3.2.1p2 does, however, leave room for reading an uninitialized variable without undefined behavior: namely, when the variable has had its address taken. For example:

    int a;
    printf("%p\n", (void *)&a);
    printf("%d\n", a);

In this case it is not undefined behavior if:

  • The implementation does not have trap representations for the given type, OR
  • The value chosen for a happens to not be a trap representation.

In that case, the value of a is merely unspecified. In particular, this will be the case with GCC and Microsoft Visual C++ (MSVC) in this example, as these implementations do not have trap representations for integer types.


2 Comments

Look at it another way: int a; ... might cause UB, but int a; a = 5; is not UB. So does whether int a = a; is UB depend on the next line? Is int a = a; a = 5; UB? I think not.
@KPCT The second line doesn't matter. int a = a; by itself is undefined behavior as per 6.3.2.1p2. If &a were used at some later point, then it might not be UB, depending on the implementation.

Using uninitialized objects with automatic storage duration invokes UB.

Using objects with static storage duration that have no explicit initializer is well defined, because such objects are initialized to zero.

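    #include <stdio.h>

    int a;                      // static storage duration: zero-initialized

    int foo(void)
    {
        static int b;           // static storage duration: zero-initialized
        int c;                  // automatic storage duration, uninitialized
        int d = d;              //UB
        // static int e = e;    // not valid C: a static object's initializer must be a constant expression
        printf("%d\n", a);      //OK
        printf("%d\n", b);      //OK
        printf("%d\n", c);      //UB
        return 0;
    }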
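The zero-initialization rule covers more than plain `int`. C11 6.7.9p10 gives the following defaults for objects with static storage duration that have no initializer:

- arithmetic types become zero;
- pointers become null pointers;
- aggregates are initialized member by member.

A small sketch that checks this (all names are hypothetical):

```c
#include <stddef.h>

/* Hypothetical objects with static storage duration and no initializer:
   each is initialized before program startup per C11 6.7.9p10. */
static int counter;                          /* arithmetic: 0 */
static double ratio;                         /* arithmetic: positive zero */
static int *ptr;                             /* pointer: null pointer */
static struct { int id; char name[8]; } rec; /* aggregate: every member zeroed */

/* Returns 1 if every default-initialized object holds its zero value. */
int static_zero_ok(void)
{
    return counter == 0 && ratio == 0.0 && ptr == NULL
        && rec.id == 0 && rec.name[0] == '\0' && rec.name[7] == '\0';
}
```

No such guarantee exists for `c` inside `foo`, which is why reading it is treated differently.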

4 Comments

is UB or not when.. was this an answer or a question?
Assuming compiler optimization is disabled, is it UB on its own?
That doesn't mean anything. It could still produce assembly and be UB in general. Maybe it works in one compiler but not in another.
@Dan Yes, clang generates the code. It is UB.

Some actions on an object of a given type might have unpredictable consequences on platforms where that type has trap representations, yet have at-least-somewhat predictable behavior on platforms where it doesn't. In such cases, the Standard tends not to distinguish between the two kinds of platforms, and instead throws everything into the catch-all category of "Undefined Behavior".

With regard to the behavior of uninitialized or partially-initialized objects, I don't think there's ever been a consensus over exactly which corner cases must be treated as though objects were initialized with Unspecified bit patterns, and which cases need not be treated in such fashion.

For example, given something like:

    #include <string.h>

    struct zstr15 { char dat[16]; } x, y;

    void test(void)
    {
        struct zstr15 hey;
        strcpy(hey.dat, "Hey");   // writes only dat[0..3]
        x = hey;
        y = hey;
    }

Depending upon how x and y will be used, there are at least four ways it might be useful to have an implementation process the above code:

  1. Squawk if an attempt is made to copy any automatic-duration object that isn't fully initialized. This could be very useful in cases where one must avoid leakage of confidential information.

  2. Zero-fill all unused portions of hey. This would prevent leakage of confidential information on the stack, but wouldn't flag code that might cause such leakage if the data weren't zero-filled.

  3. Ensure that all parts of x and y are identical, without regard for whether the corresponding members of hey were written.

  4. Write the first four bytes of x and y to match those of hey, but leave some or all of the remaining portions holding whatever they held before test() was called.

I don't think the Standard was intended to pass judgment on whether some of those approaches would be better or worse than others. However, it would have been awkward to write the Standard in a manner that would define the behavior of test() while allowing for option #4. The optimizations facilitated by #4 would only be useful if programmers could safely write code like the above in cases where client code wouldn't care about the contents of x.dat[4..15] and y.dat[4..15]. Suppose the only way to guarantee anything about the behavior of that function were to ensure that all portions of hey were written, including those whose values would be irrelevant to program behavior. That would nullify any optimization advantage approach #4 could have offered.
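If a program does need fully predictable contents in x and y, the portable route is to write every byte of hey before copying it. This amounts to doing option 2 by hand, and it gives up the optimization advantage discussed above. A minimal sketch follows; `test_fully_written` is a hypothetical variant of test().

```c
#include <string.h>

struct zstr15 { char dat[16]; } x, y;

/* Hypothetical variant of test(): hey is fully initialized before it is
   copied, so every byte of x and y is specified (bytes 4..15 are zero). */
void test_fully_written(void)
{
    struct zstr15 hey = {0};   /* all 16 bytes start out as zero */
    strcpy(hey.dat, "Hey");    /* overwrites dat[0..3] only */
    x = hey;
    y = hey;
}
```

After a call, x and y compare equal byte for byte, and x.dat[4] through x.dat[15] are all zero.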
