30

I'm quite new at working with C++ and haven't grasped all the intricacies and subtleties of the language.

What is the most portable, correct and safe way to add an arbitrary byte offset to a pointer of any type in C++11?

    SomeType* ptr;
    int offset = 12345 /* bytes */;
    ptr = ptr + offset; // <--

I found many answers on Stack Overflow and Google, but they all propose different things. Some variants I have encountered:

  1. Cast to char *:

    ptr = (SomeType*)(((char*)ptr) + offset); 
  2. Cast to unsigned int:

    ptr = (SomeType*)(((unsigned int)ptr) + offset); 
  3. Cast to size_t:

    ptr = (SomeType*)(((size_t)ptr) + offset); 
  4. "The size of size_t and ptrdiff_t always coincide with the pointer's size. Because of this, it is these types that should be used as indexes for large arrays, for storage of pointers and pointer arithmetic." - About size_t and ptrdiff_t on CodeProject

    ptr = (SomeType*)((size_t)ptr + (ptrdiff_t)offset); 
  5. Or like the previous, but with intptr_t instead of size_t, which is signed instead of unsigned:

    ptr = (SomeType*)((intptr_t)ptr + (ptrdiff_t)offset); 
  6. Only cast to intptr_t, since offset is already a signed integer and intptr_t is not size_t:

    ptr = (SomeType*)(((intptr_t)ptr) + offset); 

And in all these cases, is it safe to use old C-style casts, or is it safer or more portable to use static_cast or reinterpret_cast for this?

Should I assume the pointer value itself is unsigned or signed?

18
  • 7
    There isn't any. It's undefined behaviour to add an arbitrary byte offset to a pointer. You can only do arithmetic on pointers that point to the same array (and one past the end of it). Commented Apr 10, 2013 at 18:57
  • 5
    @jrok It's perfectly well defined to add an arbitrary offset to a pointer. What's undefined is dereferencing a pointer that doesn't point to valid memory. Commented Apr 10, 2013 at 18:59
  • 1
    @sfstewman It won't cause errors on the implementations I know, but IIRC there's a clause prohibiting going more than one object past the end of an array (i.e. int a[5]; a + 5; is good, int a[5]; a + 6 is bad). Edit: found a source: stackoverflow.com/a/988220/395760 Commented Apr 10, 2013 at 19:07
  • 6
    @sfstewman: C++ draft n3092 5.7 5: “If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.” Commented Apr 10, 2013 at 19:12
  • 2
    @sfstewman You're wrong. The standard explicitly makes it UB (see the comment above). In practice, yeah, it just works, at least until you smash your own stack or something like that. Commented Apr 10, 2013 at 19:14
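To make the rule quoted from 5.7/5 concrete, here is a minimal sketch. The buffer, its size and the function name `checked_offset` are made up for illustration. Forming a pointer to an element of the array, or to one past its end, is fine; forming anything beyond that is undefined even if the result is never dereferenced, so the sketch checks the range before doing the arithmetic.

```cpp
#include <cstddef>

// Hypothetical buffer, used only to illustrate the array-bounds rule.
static unsigned char buffer[16];

// Returns a pointer `offset` bytes into `buffer`, or nullptr when the result
// would fall outside [buffer, buffer + sizeof buffer], i.e. where the
// pointer arithmetic itself would be undefined.
unsigned char* checked_offset(std::ptrdiff_t offset) {
    if (offset < 0 || static_cast<std::size_t>(offset) > sizeof buffer) {
        return nullptr;
    }
    return buffer + offset;  // in range: an element or one past the end
}
```

With this, `checked_offset(16)` yields the one-past-the-end pointer, while `checked_offset(17)` is rejected instead of evaluating `buffer + 17`.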

5 Answers 5

17

I would use something like:

    unsigned char* bytePtr = reinterpret_cast<unsigned char*>(ptr);
    bytePtr += offset;

26 Comments

I would use the more condensed (reinterpret_cast<unsigned char *>(ptr) + offset) perhaps wrapped in an inline (possibly template) function, depending on how often I needed it and what the returned type ought to be.
@Virtlink: unsigned char is preferred for working with bytes because the language standard requires it be a simple binary representation of the value and that all bit patterns correspond to a value. In contrast, char and signed char might use two’s complement, one’s complement, or signed magnitude and might have bit patterns that do not correspond to a value.
@Virtlink: For the purposes of C and C++, an unsigned char is a byte. The allowances in the standard for the number of bits in a char to vary are for old or esoteric platforms where the memory is organized in something like 9-bit units, not so that a C or C++ implementation can give you 16-bit char objects while addressing uses 8-bit units.
@Virtlink Yes, a char doesn't need to be 8 bits, but you know what, you don't care. char is guaranteed to be the unit in which C++ measures sizes and thus the granularity of your system's addressing. And mixing code written for an 8-bit platform (and working at that low a level) with code for a 9-bit platform is hopefully something you're not planning to do.
@Virtlink Because the standard doesn't make any guarantees about casting to int, changing the int and casting back. The only thing you can do with a pointer cast to int is cast it back. Of course it will most probably work on any practical platform (in the same way any practical platform will have 8-bit chars), but it's really UB to use this pointer afterwards (and if you don't want to use it, then why add an offset anyway?). And in the end I don't even think anybody guarantees that the pointer converts into a byte address (again, on most practical platforms it will).
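The wrapper suggested in the comments could look roughly like this. It is only a sketch: the name `byte_offset` is hypothetical, it handles non-const pointers only, and it assumes the caller keeps the result inside the same underlying object and suitably aligned for `T`.

```cpp
#include <cstddef>

// Hypothetical helper (the name byte_offset does not come from the thread).
// Moves `ptr` by `offset` bytes and returns the result as the same pointer
// type. Assumption: the result stays within the same underlying object and
// is correctly aligned for T before it is dereferenced.
template <typename T>
inline T* byte_offset(T* ptr, std::ptrdiff_t offset) {
    return reinterpret_cast<T*>(reinterpret_cast<unsigned char*>(ptr) + offset);
}
```

For an `int values[4]`, `byte_offset(values, 2 * sizeof(int))` gives the same address as `values + 2`.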
14

Using reinterpret_cast (or a C-style cast) means circumventing the type system and is neither portable nor safe. Whether it is correct depends on your architecture. If you (must) do it, you imply that you know what you are doing, and you are basically on your own from then on. So much for the warning.

If you add a number n to a pointer of type T, you move this pointer by n elements of type T. What you are looking for is a type where 1 element means 1 byte.
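A small sketch of that scaling, with a made-up helper name `bytes_moved`: it advances a pointer by n elements and measures the distance in bytes through unsigned char pointers. It assumes both pointers stay within the same array.

```cpp
#include <cstddef>

// Hypothetical helper: byte distance covered when a T* is advanced by n
// elements. Assumption: p and p + n lie within (or one past) the same array.
template <typename T>
std::ptrdiff_t bytes_moved(T* p, std::ptrdiff_t n) {
    T* q = p + n;  // moves n elements of type T, not n bytes
    return reinterpret_cast<unsigned char*>(q) -
           reinterpret_cast<unsigned char*>(p);
}
```

For a `double` array, `bytes_moved(arr, 3)` is `3 * sizeof(double)`; for an `unsigned char` array it is exactly 3, which is why the char types are the ones to use for byte offsets.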

From the sizeof section 5.3.3.1.:

The sizeof operator yields the number of bytes in the object representation of its operand. [...] sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined.

Note that this makes no statement about sizeof(int), etc.

Definition of byte (section 1.7.1.):

The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. [...] The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.

So, if sizeof returns the number of bytes and sizeof(char) is 1, then char has the size of one byte to C++. Therefore, char is logically a byte to C++ but not necessarily the de facto standard 8-bit byte. Adding n to a char* will return a pointer that is n bytes (in terms of the C++ memory model) away. Thus, if you want to play the dangerous game of manipulating an object's pointer bytewise, you should cast it to one of the char variants. If your type also has qualifiers like const, you should transfer them to your "byte type" too.
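These guarantees can be checked at compile time. The following is a small sketch; `bits_in` is a hypothetical helper name, and `CHAR_BIT` from `<climits>` is the number of bits in a byte on the current implementation.

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// Guaranteed by the standard on every conforming implementation.
static_assert(sizeof(char) == 1, "char is one byte");
static_assert(sizeof(signed char) == 1, "signed char is one byte");
static_assert(sizeof(unsigned char) == 1, "unsigned char is one byte");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits in the object representation of T.
// The result for types other than the char variants depends on the platform.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```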

    #include <cstddef>      // std::ptrdiff_t
    #include <type_traits>  // std::conditional, std::is_const, ...

    // Copies the const qualifier of Src onto Dst.
    template <typename Dst, typename Src>
    struct adopt_const {
        using type = typename std::conditional<
            std::is_const<Src>::value,
            typename std::add_const<Dst>::type, Dst>::type;
    };

    // Copies the volatile qualifier of Src onto Dst.
    template <typename Dst, typename Src>
    struct adopt_volatile {
        using type = typename std::conditional<
            std::is_volatile<Src>::value,
            typename std::add_volatile<Dst>::type, Dst>::type;
    };

    // Copies both cv-qualifiers of Src onto Dst.
    template <typename Dst, typename Src>
    struct adopt_cv {
        using type = typename adopt_const<
            typename adopt_volatile<Dst, Src>::type, Src>::type;
    };

    // Moves p by delta bytes, keeping the cv-qualifiers of T on the byte type.
    template <typename T>
    T* add_offset(T* p, std::ptrdiff_t delta) noexcept {
        using byte_type = typename adopt_cv<unsigned char, T>::type;
        return reinterpret_cast<T*>(reinterpret_cast<byte_type*>(p) + delta);
    }

Example

Comments

2

Please note that NULL is special: adding an offset to it is dangerous.
reinterpret_cast can't remove const or volatile qualifiers, so a C-style cast is the more general option.
reinterpret_cast with traits, as in @user2218982's answer, seems safer.

    template <typename T>
    inline void addOffset( std::ptrdiff_t offset, T *&ptr )
    {
        if ( !ptr )
            return;
        ptr = (T*)( (unsigned char*)ptr + offset );
    }
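If you want to keep the null check but avoid the C-style cast, one possible sketch is a pair of overloads, one of which goes through `const unsigned char*` so that reinterpret_cast never has to drop a qualifier. The name `offset_bytes` is hypothetical, volatile is not handled here, and the usual assumption applies that the result stays within the same object.

```cpp
#include <cstddef>

// Hypothetical overloads (the name offset_bytes is illustrative only).
// Non-const pointers go through unsigned char*.
template <typename T>
T* offset_bytes(T* ptr, std::ptrdiff_t offset) {
    if (!ptr) return nullptr;  // leave null pointers untouched
    return reinterpret_cast<T*>(
        reinterpret_cast<unsigned char*>(ptr) + offset);
}

// Pointers to const go through const unsigned char*, because
// reinterpret_cast is not allowed to cast away constness.
template <typename T>
const T* offset_bytes(const T* ptr, std::ptrdiff_t offset) {
    if (!ptr) return nullptr;
    return reinterpret_cast<const T*>(
        reinterpret_cast<const unsigned char*>(ptr) + offset);
}
```

Unlike addOffset above, this version returns the new pointer rather than modifying its argument, which is a design choice of the sketch, not something the answer prescribes.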

Comments

0

Mine isn't as elegant, but I hope it is more readable. char* helper_ptr; helper_ptr = (char*) ptr;

Then you can traverse byte-by-byte using helper_ptr.

Note that ptr = ptr + 1 will advance ptr by sizeof(SomeType) bytes, whereas ptr = (SomeType*)(((char*)ptr) + 1) will advance it by exactly 1 byte.

Comments

-2

if you have:

myType *ptr; 

and you do:

ptr+=3; 

The compiler will most certainly increment your variable by:

3*sizeof(myType) 

And it's the standard way to do it as far as I know.

If you want to iterate over let's say an array of elements of type myType that's the way to do it.

Ok, if you wanna cast do that using

myNewType *newPtr=reinterpret_cast < myNewType * > ( ptr ) 

Or stick to plain old C and do:

myNewType *newPtr=(myNewType *) ptr; 

And then increment

3 Comments

I know how it works when you don't cast. I want to add any byte offset (say, 0xABC bytes) to a pointer of any type MyType* regardless of its type's size. If MyType* ptr = (MyType*)0x1000 then I want to end up with ptr == (MyType*)0x1ABC.
You should avoid using C-style casts in C++ for anything except perhaps numerical casts. The compiler will apply the first C++ cast that works except for dynamic_cast. One issue (among others) is that C-style casts can remove the constness of an object with no indication it's happening either in source or via a compiler diagnostic.
How does this answer the question?
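The point made in the comments about C-style casts silently removing constness can be sketched as follows. The function names are hypothetical; the commented-out lines show the forms a conforming compiler rejects.

```cpp
// A C-style cast compiles without complaint and acts as a const_cast here,
// with nothing in the source to signal that const is being dropped.
inline int* strip_const_c_style(const int* p) {
    return (int*)p;
}

// Same effect, but the intent is explicit and easy to search for.
inline int* strip_const_explicit(const int* p) {
    return const_cast<int*>(p);
}

// Neither of these would compile, because static_cast and reinterpret_cast
// may not cast away constness:
//   return static_cast<int*>(p);
//   return reinterpret_cast<int*>(p);
```

Writing through the returned pointer is only valid when the object it points to was not itself declared const.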
