Revisions to How are C++ style references implemented behind the scenes? Could they be implemented without pointers?

edited Jun 7, 2023 at 8:17

From an implementation perspective, there is one very important difference between a pointer and a reference in C++: references cannot be null.

This is undefined behaviour:

```cpp
int* px = nullptr;
int &rx = *px; // undefined behaviour: dereferences a null pointer to bind the reference
```
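The practical consequence is that binding a reference through a null pointer is already undefined, so a function that takes a reference may assume it refers to a real object. A function that takes a pointer has to allow for null. Here is a minimal sketch of that contract; the function names `value_or` and `value_of` are hypothetical and chosen for illustration:

```cpp
// A pointer parameter may legitimately be null, so the callee must check it.
int value_or(const int* p, int fallback) {
    return p != nullptr ? *p : fallback;
}

// A reference parameter is assumed to be bound to a live object,
// so no null check is needed (a null check here would be meaningless).
int value_of(const int& r) {
    return r;
}
```
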

In practice this rarely affects the generated code, but there is one notable scenario where it does: casting in the presence of multiple inheritance.

Consider this:

```cpp
#include <cstdint>

struct base1 { std::uint64_t member1; };
struct base2 { std::uint64_t member2; };
struct derived : public base1, base2 { };
```
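The offset that the casts below must apply can be measured directly. This sketch assumes a typical 64-bit ABI, such as the Itanium C++ ABI that Clang and GCC use on x86-64, where `base1` is laid out first and `base2` immediately after it. The C++ standard itself does not fix these offsets. The helper `base2_offset` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct base1 { std::uint64_t member1; };
struct base2 { std::uint64_t member2; };
struct derived : public base1, base2 { };

// Byte distance from the start of a derived object to its base2 subobject.
// The implicit derived* -> base2* conversion applies the same adjustment
// that the compiler emits for the casts discussed in this answer.
std::ptrdiff_t base2_offset(derived& d) {
    base2* b = &d; // implicit derived-to-base conversion
    return reinterpret_cast<char*>(b) - reinterpret_cast<char*>(&d);
}
```

On x86-64 with Clang or GCC, this reports 8 (the size of `base1`), and `sizeof(derived)` is 16.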

What code should this function compile to?

```cpp
base2& cast_from_derived_to_base2_ref(derived& pd) {
    return (base2&)pd;
}
```

If you said "basically add 8", you would be correct: with the usual x86-64 layout, the base2 subobject sits immediately after base1's 8-byte member1. Clang 16.0.0 on x64 at -O2 generates this code:

```asm
cast_from_derived_to_base2_ref(derived&):
        lea     rax, [rdi + 8]
        ret
```

Now, what code should this function compile to?

```cpp
base2* cast_from_derived_to_base2_ptr(derived* pd) {
    return (base2*)pd;
}
```

If you said "the same code", you would be wrong. Adding 8 unconditionally would give an incorrect answer if pd were a null pointer. Converting a null pointer to a base-class pointer must yield a null pointer, so the compiler has to test for null first.
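You can check the null-preserving rule with an ordinary `static_cast`. A null `derived*` converts to a null `base2*`, while a non-null one is moved to the `base2` subobject. This sketch repeats the struct definitions from the question so it stands alone. The name `to_base2` is hypothetical.

```cpp
#include <cstdint>

struct base1 { std::uint64_t member1; };
struct base2 { std::uint64_t member2; };
struct derived : public base1, base2 { };

// Pointer version of the cast: the conversion must map nullptr to nullptr,
// so the compiler cannot simply add the base2 offset unconditionally.
base2* to_base2(derived* pd) {
    return static_cast<base2*>(pd);
}
```
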

This is the code that Clang 16.0.0 generates:

```asm
cast_from_derived_to_base2_ptr(derived*):
        lea     rax, [rdi + 8]
        test    rdi, rdi
        cmove   rax, rdi
        ret
```
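Read one instruction at a time, the sequence works in two steps. It first computes the adjusted address unconditionally. It then keeps the original (zero) value if the input was null. The function below models that on plain integer addresses. It illustrates what the assembly computes and is not code the compiler is known to start from. The name `adjust_derived_to_base2` and the hard-coded offset of 8 (the x86-64 layout discussed above) are assumptions.

```cpp
#include <cstdint>

// Models `lea rax, [rdi + 8]; test rdi, rdi; cmove rax, rdi` on an integer address.
std::uintptr_t adjust_derived_to_base2(std::uintptr_t addr) {
    const std::uintptr_t kBase2Offset = 8;       // assumed offset of base2 within derived
    std::uintptr_t result = addr + kBase2Offset; // lea rax, [rdi + 8]
    if (addr == 0) {                             // test rdi, rdi (sets ZF when addr is 0)
        result = addr;                           // cmove rax, rdi (taken only when ZF is set)
    }
    return result;                               // ret
}
```
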

Of course, there are also high-level semantic differences (for example, a reference cannot be reseated to refer to a different object once it is bound), but this is the key difference in implementation.
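Here is a concrete example of "cannot be reseated". Assigning to a reference writes through to the object it is bound to, whereas assigning to a pointer changes which object it points at. This is a minimal sketch. The names `reseat_result` and `reseat_demo` are hypothetical.

```cpp
// Reports the final values of a and b, and what r and p end up referring to.
struct reseat_result { int a; int b; bool ref_still_a; bool ptr_now_b; };

reseat_result reseat_demo() {
    int a = 1, b = 2;

    int& r = a;   // r is bound to a for its whole lifetime
    r = b;        // copies b's value (2) into a; r still refers to a

    int* p = &a;  // p initially points at a
    p = &b;       // p now points at b; this assignment does not touch a
    *p = 5;       // writes 5 into b

    return {a, b, &r == &a, p == &b};
}
```
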

Having said all that, implementing a reference as a value rather than as a pointer would be a legitimate optimisation. The main barrier would be escape analysis: the compiler would have to prove that the referenced object cannot be accessed from outside the current function (for example, by another thread).
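The sketch below shows the two shapes escape analysis has to tell apart. All of its names are hypothetical. In `twice`, the reference never leaves the function, so after inlining a compiler is free to use the caller's value directly (for instance, from a register) without materialising an address. In `remember`, the address is stored in a global, so the object escapes and must stay in memory where other code can reach it. Whether a particular compiler actually performs the optimisation depends on the compiler and its flags. The results are the same either way.

```cpp
// Hypothetical global through which an address can escape.
int* g_last_seen = nullptr;

// The reference does not escape: once inlined, a compiler may operate on
// the caller's value directly instead of passing its address.
inline int twice(const int& x) {
    return x + x;
}

// The address escapes into a global, so the referenced object must remain
// addressable in memory, where other code may later read or modify it.
void remember(int& x) {
    g_last_seen = &x;
}

int caller() {
    int local = 21;
    int doubled = twice(local);   // candidate for "reference as value"
    remember(local);              // local's address now escapes
    *g_last_seen += 1;            // modification through the escaped pointer
    int result = doubled + local; // 42 + 22
    g_last_seen = nullptr;        // avoid leaving a dangling pointer behind
    return result;
}
```
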


edited Jun 7, 2023 at 6:54

answered Jun 7, 2023 at 6:45 by Pseudonym
