8

I want to implement what is, abstractly, a two-dimensional 4x4 matrix. All the code I write for matrix multiplication and so on will be entirely "unrolled": that is, I will not use loops to read or write the entries of the matrix.

My question is: In C, would it be faster to use a struct as such:

typedef struct { double e0, e1, e2, e3, e4, ..., e15; } My4x4Matrix;

Or would this be faster:

typedef double My4x4Matrix[16]; 

Given that I will be accessing each matrix element individually as such:

My4x4Matrix a, b, c;
// (Some initialization of a and b.)
...
c.e0 = a.e0 + b.e0;
c.e1 = a.e1 + b.e1;
...

Or

My4x4Matrix a, b, c;
// (Some initialization of a and b.)
...
c[0] = a[0] + b[0];
c[1] = a[1] + b[1];
...

Or are they exactly the same speed?

4 Answers

18

Any decent compiler will generate the exact same code, byte-for-byte. However, using arrays allows you a lot more flexibility; when accessing the matrix elements, you can choose whether you want to access fixed locations or address positions with variables.

I also highly question your choice to "unwind" (unroll?) all the operations by hand. Any good compiler can fully unroll loops with a constant number of iterations for you, and can perhaps even generate SIMD code and/or optimally schedule the order of instructions. You'll have a hard time doing better by hand, and you'll end up with code that's hideous to read. The fact that you asked this question suggests to me that you're probably not sufficiently experienced to do better than even a naive optimizing compiler.
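
As a rough illustration of the loop-based alternative, here is a minimal sketch of a 4x4 multiply whose loop bounds are compile-time constants, which is the kind of code an optimizer can unroll by itself. The name mat4_mul and the row-major layout are assumptions of mine, not something from the question.

```c
#include <stddef.h>

enum { N = 4 };

typedef double My4x4Matrix[N * N];

/* c = a * b for 4x4 matrices stored row-major (an assumed layout).
 * c must not overlap a or b. All loop bounds are the constant N, so an
 * optimizing compiler is free to unroll the loops fully. */
void mat4_mul(const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < N; ++i) {
        for (size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < N; ++k)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
    }
}
```

Whether the compiler actually unrolls or vectorizes this depends on the compiler and the optimization flags, so it is worth inspecting the generated assembly before unrolling anything by hand.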

4 Comments

Probably right about my experience. Still, I think I will try rolled and unrolled to see for myself what works better. I could use the experience.
@Collin You should read gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Optimize-Options.html . While there are other compilers, many of the technologies are the same. Note that loop unrolling can make run time slower.
This is probably more useful (as in more current), but thanks for your link. gcc.gnu.org/onlinedocs/gcc-4.4.5/gcc/Optimize-Options.html
Leaving out the version number in the link is even better; then you'll always get the latest gcc docs: gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
13

Struct elements (fields) can only be accessed by their names explicitly specified in the program's source, which means that every time you access a field the actual field must be selected and hardcoded at compile time. If you wanted to implement the same thing with arrays, that would mean that you would use explicit constant compile-time array indices (as in your example). In this case the performance of the two will be exactly the same and the code generated will be exactly the same (excluding from consideration "malicious" compilers).
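
To make "hardcoded at compile time" concrete, here is a small sketch comparing the byte offset of a struct field with the byte offset of the array element at the same position. Vec4Struct and Vec4Array are hypothetical 4-element stand-ins for the 16-element types in the question. On common ABIs the two offsets are equal, although the C standard does permit padding between struct members, so treat the equality as typical rather than guaranteed.

```c
#include <stddef.h>

/* Hypothetical 4-element stand-ins for the 16-element types in the question. */
typedef struct { double e0, e1, e2, e3; } Vec4Struct;
typedef double Vec4Array[4];

/* Byte offset of field e2 within the struct: a compile-time constant. */
size_t struct_offset_e2(void)
{
    return offsetof(Vec4Struct, e2);
}

/* Byte offset of element [2] within the array: also a compile-time constant. */
size_t array_offset_2(void)
{
    return 2 * sizeof(double);
}
```

Either way the access turns into a load or store at base address plus a constant, which is why the generated code ends up the same.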

However, note that arrays provide you with an extra degree of freedom: if necessary, you can select array elements by a run-time index. This is something that's not possible with structs. Only you know whether it matters to you.
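
For instance, a run-time index makes helpers like the following possible. This is only a sketch: mat4_get and mat4_trace are names I made up, and row-major storage is an assumption.

```c
#include <stddef.h>

typedef double My4x4Matrix[16];

/* Returns element (row, col), assuming row-major storage.
 * row and col are ordinary run-time values and must be in 0..3. */
double mat4_get(const double *m, size_t row, size_t col)
{
    return m[row * 4 + col];
}

/* Sum of the diagonal elements, selected by a run-time index. */
double mat4_trace(const double *m)
{
    double t = 0.0;
    for (size_t i = 0; i < 4; ++i)
        t += mat4_get(m, i, i);
    return t;
}
```

With the struct of sixteen named fields, the same helpers would need a switch or a chain of conditionals over the field names.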

On the other hand, note also that arrays in C cannot be copied by assignment, which means that you'll be forced to use memcpy to copy your array-based My4x4Matrix. With the struct-based version, ordinary assignment works. With arrays, this issue can be worked around by wrapping the actual array in a struct.
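
A minimal sketch of that workaround, with the array wrapped in a struct so that assignment, pass-by-value, and return-by-value all work. The field name e and the helper functions are my own choices for illustration.

```c
/* The array lives inside a struct, so the whole matrix is assignable. */
typedef struct {
    double e[16];
} My4x4Matrix;

/* Returns the identity matrix by value (row-major layout assumed). */
My4x4Matrix mat4_identity(void)
{
    My4x4Matrix m = {{0}};
    for (int i = 0; i < 4; ++i)
        m.e[i * 4 + i] = 1.0;
    return m;
}

/* Element-wise sum; arguments and result are all passed by value. */
My4x4Matrix mat4_add(My4x4Matrix a, My4x4Matrix b)
{
    My4x4Matrix c;
    for (int i = 0; i < 16; ++i)
        c.e[i] = a.e[i] + b.e[i];
    return c;
}
```

With this, something like `My4x4Matrix b = a;` copies all sixteen elements, and `a.e[i]` still accepts a run-time index.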

2 Comments

You can put the array inside a struct and then it's copyable directly, but also has all the advantages of an array.
The second paragraph describes a very important difference! Good point!
2

I guess both are the same speed. The difference between a struct and an array is mainly in what it means to a human reader. With constant indices, both compile down to accesses at fixed offsets from a base address.

2

I would say the best way is to create a test to try it yourself. Results may vary based on system environments and compilers.
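
A very rough sketch of such a test, using clock() from the standard library. Everything here is an assumption for illustration: the 4-element Vec4Struct and Vec4Array types stand in for the 16-element ones, and time_calls is a made-up helper. The function-pointer call adds overhead of its own, so this measures relative differences at best.

```c
#include <time.h>

/* Hypothetical 4-element stand-ins for the 16-element types in the question. */
typedef struct { double e0, e1, e2, e3; } Vec4Struct;
typedef double Vec4Array[4];

/* Globals with external linkage, so the inputs are not trivially constant. */
Vec4Struct sa = {1, 2, 3, 4}, sb = {5, 6, 7, 8}, sc;
Vec4Array  aa = {1, 2, 3, 4}, ab = {5, 6, 7, 8}, ac;

/* Writing to a volatile keeps the compiler from discarding the work. */
volatile double sink;

void add_struct(void)
{
    sc.e0 = sa.e0 + sb.e0;
    sc.e1 = sa.e1 + sb.e1;
    sc.e2 = sa.e2 + sb.e2;
    sc.e3 = sa.e3 + sb.e3;
    sink = sc.e3;
}

void add_array(void)
{
    ac[0] = aa[0] + ab[0];
    ac[1] = aa[1] + ab[1];
    ac[2] = aa[2] + ab[2];
    ac[3] = aa[3] + ab[3];
    sink = ac[3];
}

/* Returns the CPU time in seconds used by `iterations` calls to fn. */
double time_calls(void (*fn)(void), long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

You would call, say, `time_calls(add_struct, 100000000L)` and `time_calls(add_array, 100000000L)` several times each, at the optimization level you actually ship with, and compare the numbers.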

1 Comment

This is actually the sanest answer.