0

I am trying to understand some legacy code and noticed that the same struct is accessed very often within it. Would it make any difference if I copied the contents of the struct into a local variable first and then accessed the local copy instead of going through the pointer each time?

I have already compared some test code on Compiler Explorer (https://godbolt.org/, ARM64 gcc 8.2) to see whether the compiler would optimize it.

Variant A

    typedef struct STRUCT_D {
        int myInt1IND;
        int myInt2IND;
        int myInt3IND;
        int myInt4IND;
        int myInt5IND;
        int myInt6IND;
        int myInt7IND;
        int myInt8IND;
        int myInt9IND;
    } STRUCT_D;

    typedef struct STRUCT_C {
        STRUCT_D myStructInDIntINC;
    } STRUCT_C;

    typedef struct STRUCT_B {
        STRUCT_C * myPointerB;
    } STRUCT_B;

    typedef struct STRUCT_A {
        STRUCT_B * myPointerA;
    } STRUCT_A;

    int square(void) {
        struct STRUCT_C myStructC;
        struct STRUCT_B myStructB;
        struct STRUCT_A myStructA;
        struct STRUCT_A* startPointer;

        myStructC.myStructInDIntINC.myInt1IND = 55;
        myStructB.myPointerB = &myStructC;
        myStructA.myPointerA = &myStructB;
        startPointer = &myStructA;

        int myresult =
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt1IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt2IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt3IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt4IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt5IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt6IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt7IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt8IND +
            startPointer->myPointerA->myPointerB->myStructInDIntINC.myInt9IND;

        return myresult;
    }

Variant B

    typedef struct STRUCT_D {
        int myInt1IND;
        int myInt2IND;
        int myInt3IND;
        int myInt4IND;
        int myInt5IND;
        int myInt6IND;
        int myInt7IND;
        int myInt8IND;
        int myInt9IND;
    } STRUCT_D;

    typedef struct STRUCT_C {
        STRUCT_D myStructInDIntINC;
    } STRUCT_C;

    typedef struct STRUCT_B {
        STRUCT_C * myPointerB;
    } STRUCT_B;

    typedef struct STRUCT_A {
        STRUCT_B * myPointerA;
    } STRUCT_A;

    int square(void) {
        struct STRUCT_C myStructC;
        struct STRUCT_B myStructB;
        struct STRUCT_A myStructA;
        struct STRUCT_A* startPointer;

        myStructC.myStructInDIntINC.myInt1IND = 55;
        myStructB.myPointerB = &myStructC;
        myStructA.myPointerA = &myStructB;
        startPointer = &myStructA;

        struct STRUCT_D myResultStruct = startPointer->myPointerA->myPointerB->myStructInDIntINC;

        int myresult =
            myResultStruct.myInt1IND +
            myResultStruct.myInt2IND +
            myResultStruct.myInt3IND +
            myResultStruct.myInt4IND +
            myResultStruct.myInt5IND +
            myResultStruct.myInt6IND +
            myResultStruct.myInt7IND +
            myResultStruct.myInt8IND +
            myResultStruct.myInt9IND;

        return myresult;
    }

I know that STRUCT_D is not fully initialized, but that is not relevant for this example. My question is whether Variant B is "better". It is certainly more readable, but does it make sense to save the contents behind a pointer? As I said, in my file the same pointer is dereferenced approximately 150 times in the same function. I know, I know... this function should definitely be refactored. :D

  • A decent compiler should optimize away the repeated dereferences in Variant A. I propose a Variant C which is like your Variant B, but has struct STRUCT_D *myResultStruct = &startPointer->myPointerA->myPointerB->myStructInDIntINC; and int myresult = myResultStruct->myInt1IND + myResultStruct->myInt2IND + .... Just to give the compiler a helping hand (if it fails to optimize the code) and save typing. Commented Jun 24, 2019 at 16:27
  • I understand that, but when I use the online compiler, the assembly output is a lot bigger for Variant A than for Variant B. How can that be if the compiler is optimizing it? Commented Jun 24, 2019 at 20:53
  • Compilers vary in their ability to optimize. For gcc 8.3 on x86_64, your Variant B produces smaller code when optimization is disabled (and smaller than my Variant C by a couple of bytes). But with even the minimum optimization level -O1, gcc produced identical code for your Variant A and Variant B (and my Variant C). Commented Jun 25, 2019 at 9:27
  • Godbolt has a text box for entering compiler options. It doesn't seem to turn on any optimization options by default. Commented Jun 25, 2019 at 9:36
  • What if the final structs are not accessed directly one after another, but with a lot of other code in between? Would it then be wiser to save the final pointer rather than dereference the chain every time? Or when would it make a difference? Commented Jun 25, 2019 at 10:55
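The Variant C proposed in the first comment could look roughly like the sketch below. It reuses the struct layout from the question; the function name sum_variant_c and the const qualifiers are assumptions added for illustration, not part of the original code.

```c
/* Same struct layout as in the question. */
typedef struct STRUCT_D {
    int myInt1IND, myInt2IND, myInt3IND;
    int myInt4IND, myInt5IND, myInt6IND;
    int myInt7IND, myInt8IND, myInt9IND;
} STRUCT_D;

typedef struct STRUCT_C { STRUCT_D myStructInDIntINC; } STRUCT_C;
typedef struct STRUCT_B { STRUCT_C *myPointerB; } STRUCT_B;
typedef struct STRUCT_A { STRUCT_B *myPointerA; } STRUCT_A;

/* Variant C (hypothetical name): walk the pointer chain once, keep a
 * pointer to the innermost struct, and read every member through it.
 * No struct copy is made, unlike Variant B. */
int sum_variant_c(const STRUCT_A *startPointer)
{
    const STRUCT_D *d =
        &startPointer->myPointerA->myPointerB->myStructInDIntINC;

    return d->myInt1IND + d->myInt2IND + d->myInt3IND +
           d->myInt4IND + d->myInt5IND + d->myInt6IND +
           d->myInt7IND + d->myInt8IND + d->myInt9IND;
}
```

As the later comments note, whether this produces different machine code from Variant A depends on the compiler and on the optimization level, so it is mainly a readability gain.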

2 Answers

2

With optimization enabled there would be no real difference: any optimizing compiler (gcc, clang) would keep the value in a stack variable and/or a register in both variants.

0

Copying data to a local can be useful to let compilers prove that no other accesses through other pointers read or write it.

It's basically the same reason you'd use int *restrict p. If you write void func(struct foo *restrict ptr), you're promising the compiler that the objects you access via ptr->member are not also read or written through any other pointer, or as a global-scope variable, while the function runs.
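A minimal sketch of that promise, assuming a C99 or later compiler. The function names scale_all and scale_all_restrict are hypothetical and only serve to contrast the two signatures.

```c
/* Without restrict: scale might point at an element of dst, so each
 * store to dst[i] could change *scale. The compiler has to assume it
 * may need to reload *scale on every iteration. */
void scale_all(int *dst, const int *scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] *= *scale;
}

/* With restrict: the caller promises that dst and scale do not
 * overlap, so the compiler is free to load *scale once and keep it
 * in a register. Calling this with overlapping pointers would be
 * undefined behaviour. */
void scale_all_restrict(int *restrict dst, const int *restrict scale, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] *= *scale;
}
```

For non-overlapping arguments both functions give the same result; only the first one has defined behaviour when scale points into dst.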

Type-based alias analysis can already help significantly; accesses through a float* can't affect any int objects, for example. (Unless your program contains strict-aliasing UB; some compilers let you define that behaviour, e.g. gcc -fno-strict-aliasing).
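As a rough illustration of type-based alias analysis, compare the two hypothetical functions below. They differ only in the element type of the output buffer; what the compiler actually emits depends on the compiler and its options.

```c
/* Stores through a float* cannot legally modify an int object, so the
 * compiler may load *count once and keep it in a register for the
 * loop condition and the return value. */
int fill_floats(float *out, const int *count)
{
    for (int i = 0; i < *count; i++)
        out[i] = 1.0f;
    return *count;
}

/* Here out and count have compatible types, so a store to out[i]
 * might change *count. Without further information the compiler has
 * to assume that and reload *count after each store. */
int fill_ints(int *out, const int *count)
{
    for (int i = 0; i < *count; i++)
        out[i] = 1;
    return *count;
}
```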

If you aren't doing assignments or reads through other pointers (which the compiler has to assume might be pointing to a member of a struct), it won't make a difference: alias analysis will succeed and let the compiler keep a struct member in a register across other accesses to memory, just like it could for a local.

(Alias analysis is typically easy for locals, especially if they've never had their address taken: then nothing can be pointing to them.)
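The following sketch shows a case where a local copy does matter, because a write goes through another pointer of a compatible type. struct foo comes from the restrict example above in name only; its member and both function names are assumptions made for illustration.

```c
struct foo { int member; };

/* out may point at p->member, so after the first store the compiler
 * must reload p->member before the second addition. */
void add_twice_reload(struct foo *p, int *out)
{
    *out += p->member;
    *out += p->member;
}

/* The local m never has its address taken, so nothing can alias it.
 * It can stay in a register across both stores to *out. */
void add_twice_local(struct foo *p, int *out)
{
    int m = p->member;
    *out += m;
    *out += m;
}
```

The two functions are not equivalent when the pointers do alias: starting from member == 3 and calling each with out == &p->member, the first leaves 12 in member and the second leaves 9. That difference is exactly why the compiler cannot perform the transformation on its own, and why making the copy yourself is a statement about what you intend.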


BTW, the reason the compiler is allowed to optimize away non-volatile / non-_Atomic memory accesses is that it's undefined behaviour to write a non-atomic object at the same time another thread is reading or writing it.

That makes it safe to assume that variables don't change unless you write them yourself, and that you don't need the value in memory to be "in sync" with the C abstract machine except when you make non-inline function calls. (For any object that some unknown function might have pointers to. This is typically not the case for local vars like loop counters, so they can be kept in call-preserved registers instead of being spilled/reloaded.)
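A small sketch of the function-call case, with hypothetical names throughout (g_counter, bump, count_with_callback). The call goes through a function pointer so that the compiler generally cannot see what the callee does.

```c
int g_counter;  /* global: any unknown function may read or write it */

/* Example callee that modifies the global behind the caller's back. */
void bump(void)
{
    g_counter += 10;
}

int count_with_callback(void (*callback)(void), int n)
{
    /* sum and i never have their address taken, so no other function
     * can hold a pointer to them. They can live in call-preserved
     * registers across the calls. */
    int sum = 0;

    for (int i = 0; i < n; i++) {
        g_counter++;        /* must be stored to memory before the call */
        callback();         /* opaque call: may change g_counter */
        sum += g_counter;   /* must be reloaded after the call */
    }
    return sum;
}
```

With g_counter starting at 0, count_with_callback(bump, 2) returns 33 and leaves g_counter at 22, which only works because the global is kept in sync with memory around each call.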


But there is a potential downside to declaring locals to hold copies of globals or pointed-to data: if the compiler doesn't end up keeping that local in a register for the whole function, it might end up having to actually copy the data into stack memory so it can reread from there. (If it can't prove that the original object is unchanged.)


Normally just favor readability over this level of micro-optimization, but have a look at the optimized asm for some platform you care about if you're curious. If there's a lot of unnecessary store/reload happening, then try using locals.
