44

I have a function that I would like to be able to return special values for failure and uninitialized (it returns a pointer on success).

Currently it returns NULL for failure, and -1 for uninitialized, and this seems to work... but I could be cheating the system. IIRC, addresses are always positive, are they not? (although since the compiler is allowing me to set an address to -1, this seems strange).

[update]

Another idea I had (in the event that -1 is risky) is to malloc a char at global scope, and use that address as a sentinel.

4 Comments
  • What are you trying to do with this? What's the interface being used for? Combining so many different kinds of outputs in the return value seems like the wrong way to do this. Commented Jul 22, 2010 at 0:10
  • @kirk.burleson: int* foo(){ return -1;} gives a warning on G++ warning: return makes pointer from integer without a cast. I'm not sure whether that proves or disproves your point, but I'm still irked when you say "C compilers don't care what you feed 'em and they'll try to compile anything." (In g++, that's an error error: invalid conversion from ‘int’ to ‘int*’, BTW.) Commented Jul 22, 2010 at 3:08
  • @kirk.burleson: in my last comment, the warning is on GCC, the error is on G++. Commented Jul 23, 2010 at 15:54
  • Why not return a simple two value struct with one value being the pointer and the second value being a status code? It will not be that much more trouble and provide much greater flexibility and would be thread safe as well. Commented Sep 13, 2016 at 11:23

13 Answers

83

No, addresses aren't always positive - on x86_64, pointers are sign-extended and the address space is clustered symmetrically around 0 (though it is usual for the "negative" addresses to be kernel addresses).

However, the point is mostly moot, since C only defines the meaning of < and > comparisons between pointers that point into the same object, or one past the end of an array. Pointers to completely different objects cannot be meaningfully compared other than for exact equality, at least in standard C: if (p < NULL) has no well-defined semantics.

You should create a dummy object with static storage duration and use its address as your uninitialised value:

    extern char uninit_sentinel;
    #define UNINITIALISED ((void *)&uninit_sentinel)

It's guaranteed to have a single, unique address across your program.


11 Comments

+1 Just to mention a modification of this idea. The dedicated sentinel has the disadvantage that you have to instantiate it in one of your objects. If you just want a macro, you could use the address of a known system variable that you judge not to be a valid result of your function. There are not too many such variables defined, but on a POSIX system e.g. environ would do the trick.
@caf: could you point to a resource to verify that pointers in amd64 are sign-extended? Never have I read anything that implies or states this. Perhaps you are referring to the requirement that canonical addresses must have bits 48 through 63 of any virtual address to be copies of bit 47? If this is what you mean, it does not imply "negative" pointers. Neither does RIP-relative addressing.
@mfukar: Personally I find the architecture ABI to be a more authoritative document than Wikipedia. In the end, as I'm sure you know, it is ultimately a matter of interpretation or how you conceptualise the address space anyway.
@mfukar: Perhaps you could enlighten us as to what the "negative half of the address space" refers to then, if not the obvious. (And at least I haven't been retconning my comments!)
The 'negative half of the address space' refers to the x64 variant of the System V OS specification's interpretation of the bit pattern as a signed value, and as such applies only to environments which choose to make that interpretation. It could easily have been written to say 'addresses with the msb set'. Windows takes an unsigned interpretation, and calls such addresses the high address space. There is nothing in the x64 instruction set which implies that the value should be interpreted one way or the other, though most OS I've seen use unsigned - +1 for finding a signed pointer environment
23

The valid values for a pointer are entirely implementation-dependent, so, yes, a pointer address could be negative.

More importantly, however, consider (as an example of a possible implementation choice) the case where you are on a 32-bit platform with a 32-bit pointer size. Any value representable in those 32 bits might be a valid pointer: other than the null pointer, any pointer value might be a valid pointer to an object.

For your specific use case, you should consider returning a status code and perhaps taking the pointer as a parameter to the function.

1 Comment

Careful though, if your pointer is too negative it might end up addressing the machine next to your current one.
18

It's generally a bad design to try to multiplex special values onto a return value... you're trying to do too much with a single value. It would be cleaner to return your "success pointer" via argument, rather than the return value. That leaves lots of non-conflicting space in the return value for all of the conditions you want to describe:

    int SomeFunction(SomeType **p)
    {
        *p = NULL;
        if (/* check for uninitialized ... */)
            return UNINITIALIZED;
        if (/* check for failure ... */)
            return FAILURE;
        *p = yourValue;
        return SUCCESS;
    }

You should also do typical argument checking (ensure that 'p' isn't NULL).

7 Comments

This is absolutely the right way to design this function. Anything else will be a maintenance disaster and a bug magnet for anyone else using the code, and should be strongly discouraged.
Possibly. The guy who "invented" null pointers said it was a mistake, IIRC. Another special-case value may be a problem. Even so, sometimes using two separate values where one will do leads to overcomplex code. A common approach for simplifying some common algorithms is to assign special-case past-the-end objects, for instance, rather than use nulls - it avoids special-case null checks. Having a "valid" flag still needs those at-the-end checks, just in a different form. A valid pointer to a special object is a special-case pointer, and often saves a lot of complexity.
"The guy" is C.A.R. Hoare. On the other hand, he more than made up for the "billion dollar mistake" with the invention of Quicksort :-)
@James - all those guys, they're just guys, you know? I probably should remember Hoare, but the who is just history. The ideas are more important. Also, I find it helps to be vague - hard for people to contradict me when they don't know who I'm quoting ;-)
@Steve314: Yes, in a very specific context (where you can control SomeType in my example) having some common "special case" objects/pointers can work and be a little more streamlined... but in the general case separating the status and the returned object is more maintainable.
6

The C language does not define the notion of "negativity" for pointers. The property of "being negative" is a chiefly arithmetical one, not in any way applicable to values of pointer type.

If you have a pointer-returning function, then you cannot meaningfully return the value of -1 from it. In the C language, integral values (other than zero) are not implicitly convertible to pointer types. An attempt to return -1 from a pointer-returning function is an immediate constraint violation that will result in a diagnostic message. In short, it is an error. If your compiler allows it, that simply means it doesn't enforce the constraint strictly (most of the time compilers do this for compatibility with pre-standard code).

If you force the value of -1 to pointer type by an explicit cast, the result of the cast will be implementation-defined. The language itself makes no guarantees about it. It might easily prove to be the same as some other, valid pointer value.

If you want to create a reserved pointer value, there is no need to malloc anything. You can simply declare a global variable of the desired type and use its address as the reserved value. It is guaranteed to be unique.

Comments

4

Pointers can be negative like an unsigned integer can be negative. That is, sure, in a two's-complement interpretation, you could interpret the numerical value to be negative because the most-significant-bit is on.

2 Comments

Are you saying that they can be negative when cast to a signed type, e.g. int? You already know this, I'm sure, but other readers might not: an unsigned number will not store negative values. The sign bit James refers to is present in signed types only, and is precisely what makes a datatype a signed type.
WonderWorker, the bit you mention is present in signed and unsigned types, the only difference is how you interpret that bit. In signed types, that is the sign. In unsigned types, that is the second half of the range of that type
1

What's the difference between failure and uninitialized? If uninitialized is not just another kind of failure, then you probably want to redesign the interface to separate these two conditions.

Probably the best way to do this is to return the result through a parameter, so the return value only indicates an error. For example where you would write:

    void* func();

    void* result = func();
    if (result == 0)
        /* handle error */
    else if (result == (void*)-1)
        /* uninitialized */
    else
        /* initialized */

Change this to

    // sets *a to the returned object
    // *a will be null if the object has not been initialized
    // returns zero on success, nonzero on error
    int func(void** a);

    void* result;
    if (func(&result)) {
        /* handle error */
        return;
    }
    /* do real stuff now */
    if (!result) {
        /* initialize */
    }
    /* continue using the result now that it's been initialized */

1 Comment

I'm not specifically returning uninitialized. I'm working with a linked list, which is passed in as an argument, but which may or may not be initialized. Previously I had it set to NULL initially, but this conflicted with my returning "NULL" for failure. Thanks for your suggestions.
0

@James is correct, of course, but I'd like to add that pointers don't always represent absolute memory addresses, which theoretically would always be positive. Pointers also represent relative addresses to some point in memory, often a stack or frame pointer, and those can be both positive and negative.

So your best bet is to have your function accept a pointer to a pointer as a parameter and fill that pointer with a valid pointer value on success while returning a result code from the actual function.

1 Comment

Sure? The relative offset is usually an int in my experience.
0

James's answer is probably correct, but of course it describes an implementation choice, not a choice that you can make.

Personally, I think addresses are "intuitively" unsigned. Finding a pointer that compares as less than a null pointer would seem wrong. But ~0 and -1, for the same integer type, give the same value. If addresses are intuitively unsigned, ~0 may make a more intuitive special-case value; I use it for error-case unsigned ints quite a lot. It's not really different (zero is an int by default, so ~0 is -1 until you cast it) but it looks different.

Pointers on 32-bit systems can use all 32 bits BTW, though -1 or ~0 is an extremely unlikely pointer to occur for a genuine allocation in practice. There are also platform-specific rules - for example on 32-bit Windows, a process can only have a 2GB address space, and there's a lot of code around that encodes some kind of flag into the top bit of a pointer (e.g. for balancing flags in balanced binary trees).

Comments

0

Actually (at least on x86), the NULL-pointer exception is generated not only by dereferencing the NULL pointer itself, but by accesses to a larger range of addresses (e.g. the first 64 KB). This helps catch errors such as

    int* x = NULL;
    x[10] = 1;

So, there are more addresses that are guaranteed to generate the NULL pointer exception when dereferenced. Now consider this code (made compilable for AndreyT):

    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>

    #define ERR_NOT_ENOUGH_MEM (int)NULL
    #define ERR_NEGATIVE       ((int)NULL + 1)
    #define ERR_NOT_DIGIT      ((int)NULL + 2)

    char* fn(int i)
    {
        if (i < 0) return (char*)ERR_NEGATIVE;
        if (i >= 10) return (char*)ERR_NOT_DIGIT;
        char* rez = (char*)malloc(strlen("Hello World ") + sizeof(char)*2);
        if (rez)
            sprintf(rez, "Hello World %d", i);
        return rez;
    }

    int main()
    {
        char* rez = fn(3);
        switch ((int)rez) {
            case ERR_NOT_ENOUGH_MEM: printf("Not enough memory!\n"); break;
            case ERR_NEGATIVE:       printf("The parameter was negative\n"); break;
            case ERR_NOT_DIGIT:      printf("The parameter is not a digit\n"); break;
            default:                 printf("we received %s\n", rez);
        }
        return 0;
    }

this could be useful in some cases. It won't work on some Harvard architectures, but will work on von Neumann ones.

2 Comments

I'm not sure that's true "on x86" so much as on modern operating systems. The chip provides the ability to map a process address space to a physical address space etc, but it's the OS that usually decides which parts of the process address space are valid.
This will not even compile. Some C compilers with rather loose error checking will let you assign an integer value to a pointer (even though it is illegal in C), but none I know of will let you use a pointer as a controlling value for switch statement.
0

Do not use malloc for this purpose. It might keep unnecessary memory tied up (if a lot of memory is already in use when malloc gets called and the sentinel gets allocated at a high address, for example) and it confuses memory debuggers/leak detectors. Instead simply return a pointer to a local static const char object. This pointer will never compare equal to any pointer the program could obtain in any other way, and it only wastes one byte of bss.

2 Comments

Are you sure that the const char objects can never be coalesced like string literals can? (If you're sure they can't, perhaps you'd like to answer my question on the subject which (in my mind) had no real consensus.)
Yes, quite sure. See 6.5.9 paragraph 6.
0

You don't need to care about the signedness of a pointer, because it's implementation-defined. The real question here is "how to return special values from a function returning a pointer?", which I've explained in detail in my answer to the question Pointer address span on various platforms.

In summary, the all-ones bit pattern (-1) is (almost) always safe, because it's already at the end of the address spectrum: an object cannot be stored there and wrap around to the first address, and the malloc family never returns -1. In fact, this value is returned by many Linux system calls and Win32 APIs to indicate another state for the pointer (mmap's MAP_FAILED and Win32's INVALID_HANDLE_VALUE are both -1). So if you need just failure and uninitialized, it's a good choice.

But you can return far more error states by exploiting the fact that variables must be properly aligned (unless you specify some other arrangement). For example, in a pointer to int32_t the low 2 bits are always zero, which means only a quarter of the possible values are valid addresses, leaving all of the remaining bit patterns for you to use. A simple solution is to check the lowest bit:

    int* result = func();
    if (!result)
        error_happened();
    else if ((uintptr_t)result & 1)
        uninitialized();

In this way you can return both a valid pointer and some additional data at the same time.

You can also use the high bits to store data on 64-bit systems. On ARM there's a flag that tells the CPU to ignore the high bits in addresses. On x86 there isn't a similar thing, but you can still use those bits as long as you make the pointer canonical before dereferencing. See Using the extra 16 bits in 64-bit pointers


Comments

-1

NULL is the only valid error return in this case; this is true any time an unsigned value such as a pointer is returned. It may be true that in some cases pointers will not be large enough to use the sign bit as a data bit; however, since pointers are controlled by the OS, not the program, I would not rely on this behavior.

Remember that a pointer is basically a 32-bit value; whether this is a possibly negative or an always positive number is just a matter of interpretation, i.e. whether the 32nd bit is interpreted as the sign bit or as a data bit. So if you interpreted 0xFFFFFFFF as a signed number it would be -1; if you interpreted it as an unsigned number it would be 4294967295. Technically, it is unlikely that a pointer would ever be this large, but the case should be considered anyway.

As far as an alternative you could use an additional out parameter (returning NULL for all failures), however this would require clients to create and pass a value even if they don't need to distinguish between specific errors.

Another alternative would be to use the GetLastError/SetLastError mechanism to provide additional error information (This would be specific to Windows, don't know if that is an issue or not), or to throw an exception on error instead.

2 Comments

On 64-bit systems, pointers are 64-bit values stackoverflow.com/questions/399003/…
"a pointer is basically a 32-bit value" is far from correct. 64-bit systems have been available for decades, and not all systems are 32- or 64-bit. Have you ever used DOS, or something like a microcontroller with a 14-bit address bus?
-1

Positive and negative are not meaningful notions for pointer types. They pertain to signed integer types, including signed char, short, int, etc.

People talk about negative pointers mostly in situations that treat a pointer's machine representation as an integer type, e.g. reinterpret_cast<intptr_t>(ptr). In that case, they are actually talking about the converted integer, not the pointer itself.

In some scenarios a pointer is intuitively unsigned: we talk about one address being below or above another. 0xFFFFFFFF is above 0x0AAA0000, which is intuitive for human beings, even though 0xFFFFFFFF is actually "negative" as a signed value while 0x0AAA0000 is positive.

But other scenarios are less consistent. Pointer subtraction (ptr1 - ptr2) results in a signed value of type ptrdiff_t. Compare this with integer subtraction: signed_int_a - signed_int_b gives a signed int, while unsigned_int_a - unsigned_int_b produces an unsigned type. Pointer subtraction nevertheless produces a signed type, because its meaning is the distance between two pointers, measured in number of elements.

In summary I suggest treating pointer types as standalone types; every type has its own set of operations. For pointers (excluding function pointers, member function pointers, and void *) these are:

  1. +, +=: ptr + any_integer_type
  2. -, -=: ptr - any_integer_type, and ptr1 - ptr2
  3. ++, both prefix and postfix
  4. --, both prefix and postfix

Note there are no *, / or % operations for pointers. That also supports the view that a pointer should be treated as a standalone type, rather than "a type similar to int" or "a type whose underlying type is int, so it should look like int".

Comments
