3

I am a bit confused by some parts of the basic_string implementation. I have been going through the source to understand its inner workings and learn new things, but I can't entirely grasp how the memory is managed.

Just some tidbits from the basic_string implementation:

  • The raw allocator is for char type

    typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc; 
  • ...then, when allocating, the _Rep is placed at the start of the allocated buffer, and __size is calculated so that the buffer also fits the characters

    size_type __size = (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep);
    void* __place = _Raw_bytes_alloc(__alloc).allocate(__size);
    _Rep *__p = new (__place) _Rep;
  • This is how the character data is fetched from the _Rep buffer

    _CharT* _M_refdata() throw() { return reinterpret_cast<_CharT*>(this + 1); } 
  • Setting up the characters (one of several ways this is done)

    _M_assign(__p->_M_refdata(), __n, __c); 

What is bothering me is that the raw allocator is type char, but the allocated memory may hold a _Rep object, plus the character data (which does not have to be type char)

Also, why (or rather how) does the call to _M_refdata know where the start (or end) of the character data is within the buffer (i.e. this + 1)?

Edit: does this+1 just push the internal pointer to the next position after the _Rep object?

I have a basic understanding of memory alignment and casting, but this seems to go beyond anything I have read up on.

Can anybody help, or point me to more informative reading material?

2 Answers 2

5

You're missing the placement new. The line

_Rep *__p = new (__place) _Rep; 

constructs a new _Rep object at __place. The space for it has already been allocated beforehand; a placement new does not allocate anything by itself, it only runs the constructor.
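To make the "only a constructor call" point concrete, here is a minimal sketch. `Tracked` and `constructs_in_place` are hypothetical names invented for this illustration, not part of the library source; the sketch only shows that the object ends up at exactly the address that was passed in.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how often its constructor runs.
struct Tracked {
    static int constructed;
    int value;
    Tracked() : value(42) { ++constructed; }
};
int Tracked::constructed = 0;

// Constructs a Tracked in caller-provided storage and reports whether
// the resulting object lives at the address that was handed in.
inline bool constructs_in_place(void* storage) {
    Tracked* p = new (storage) Tracked;  // no allocation, only the constructor runs
    assert(p->value == 42);
    bool same_address = static_cast<void*>(p) == storage;
    p->~Tracked();                       // placement new pairs with an explicit destructor call
    return same_address;
}
```

Calling `constructs_in_place` with a suitably sized and aligned buffer (for example `alignas(Tracked) unsigned char buf[sizeof(Tracked)]`) should return true, since the caller owns the storage and placement new never touches the heap.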

Pointer arithmetic in C and C++ tells you that this + 1 points sizeof(*this) bytes past this, i.e. just past the end of the _Rep object. Since (__capacity + 1) * sizeof(_CharT) + sizeof(_Rep) bytes were allocated, the space after the _Rep object is used for the character data. The layout is thus like this:

| _Rep | (__capacity + 1) * _CharT | 
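The layout above can be sketched in a few lines. This is a simplified illustration, not the library's actual code: `Header`, `bytes_for`, `create` and `destroy` are hypothetical names standing in for _Rep and its helpers, and the sketch assumes a trivial character type whose alignment is no stricter than that of the header.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical stand-in for _Rep: a header immediately followed by the characters.
template <typename CharT>
struct Header {
    static_assert(alignof(CharT) <= alignof(std::size_t),
                  "sketch assumes characters fit the header's alignment");

    std::size_t length;
    std::size_t capacity;

    // The character data starts right after the header object.
    CharT* data() noexcept { return reinterpret_cast<CharT*>(this + 1); }

    // Header plus capacity characters plus one terminator, measured in bytes.
    static std::size_t bytes_for(std::size_t capacity) {
        return sizeof(Header) + (capacity + 1) * sizeof(CharT);
    }

    static Header* create(std::size_t capacity) {
        std::allocator<char> raw;                       // byte allocator, like _Raw_bytes_alloc
        void* place = raw.allocate(bytes_for(capacity));
        Header* h = new (place) Header{0, capacity};    // construct the header in place
        h->data()[0] = CharT();                         // empty string: just the terminator
        return h;
    }

    static void destroy(Header* h) {
        std::size_t n = bytes_for(h->capacity);
        h->~Header();
        std::allocator<char>().deallocate(reinterpret_cast<char*>(h), n);
    }
};
```

Because the buffer is sized in bytes, a single char allocator serves every character type; only `sizeof(CharT)` changes in the size computation.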

0

Raw memory allocation, like C's malloc, hands back bytes, not constructed objects, so the return type is either char * or void *. Here the allocator is rebound to char, so allocate returns a char *.

The C and C++ standards contain a clause (the character-type exception to the strict aliasing rule) that explicitly allows any object to be accessed through a pointer to char or unsigned char. This is because C often needs to treat objects as byte arrays (as when writing to disk or a network socket) and it needs to treat byte arrays as objects (like when allocating a range of memory or reading from disk).
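As a small sketch of the byte-level access described here: `byte_at` and `from_bytes` are hypothetical helper names made up for this example. Reading goes through `unsigned char*`; going the other way uses `std::memcpy`, which is the conservative route for trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Reads byte i of an object's representation through unsigned char*.
template <typename T>
unsigned char byte_at(const T& obj, std::size_t i) {
    assert(i < sizeof(T));
    return reinterpret_cast<const unsigned char*>(&obj)[i];
}

// Rebuilds a T from sizeof(T) raw bytes by copying them into a fresh object.
template <typename T>
T from_bytes(const unsigned char* bytes) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "byte-wise copy is only valid for trivially copyable types");
    T out;
    std::memcpy(&out, bytes, sizeof(T));
    return out;
}
```

Copying a double out byte by byte with `byte_at` and feeding the bytes to `from_bytes<double>` should give back the original value.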

To protect against aliasing and optimization problems, you're not allowed to cast the same char * to different types of objects, and once you've cast a char * to an object type, you are not allowed to modify the object's value by writing to the bytes.

6 Comments

This has nothing to do with char. You can reinterpret_cast all simple pointer types into each other.
@filmor: not true. That violates the strict aliasing rule.
@filmor: The language will let you do it, but it does not promise to compile it to do what you think it should do. For example, after casting a char* to double* it might load the double into a register. Then you write something to the char *. It will not change the value of the double.
Thanks for enlightening me :). The rule of thumb "it compiles" just doesn't work here ;)
ok thanks (filmor and zan). the default string allocator (eg std::allocator<wchar_t> , std::allocator<char> ,...) is the argument to the _Raw_bytes_alloc which is std::allocator<char>. is this correct? then why is this?