I have written a C implementation of AES and have tried to make it as fast as possible (I'm just starting out in programming and have some training in IT). So far I have sped it up by around 600%, but it's still awfully slow. To have something to compare my AES implementation with, I used the `openssl speed` command in the Linux terminal. In 3 seconds, OpenSSL encrypts around 36 977 043 blocks (16 bytes each). My implementation needs 72 seconds for the same number of blocks, which makes it roughly 25 times slower. That kind of sucks. I'm curious about two things:
- What would be a good goal? How fast is it realistic to aim for?
- Why is my code so slow, and how can I change that?
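In throughput terms, the figures above work out as follows (plain arithmetic on the numbers in this post, with MB = 10^6 bytes). Depending on the CPU and the options used, `openssl speed` may be measuring hand-tuned assembly or dedicated AES instructions (AES-NI on x86), so treat its number as a reference point rather than a target for plain C.

$$
\frac{36\,977\,043 \times 16\ \text{B}}{3\ \text{s}} \approx 197.2\ \text{MB/s},
\qquad
\frac{36\,977\,043 \times 16\ \text{B}}{72\ \text{s}} \approx 8.2\ \text{MB/s},
\qquad
\frac{72\ \text{s}}{3\ \text{s}} = 24
$$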
About my code: I tried leaving out some of my functions to see how much faster the code gets without them. The full code took 72 seconds.
- Without MixColumns: 14 seconds (this is the big problem)
- Without ShiftRows: 67 seconds
- Without SubBytes: 61 seconds
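Since leaving out MixColumns saves most of the time, one thing worth trying is computing it on a whole 32-bit column at once instead of extracting each byte and calling `xtime`/`xtime3` on it separately. The sketch below assumes the packing used by `state[c]` in the encryption function (row 0 in the most significant byte). `xtime_word`, `rotl32` and `mix_column_word` are hypothetical helper names, not part of the original code. It relies on the identity `2a ^ 3b ^ c ^ d == 2(a ^ b) ^ b ^ c ^ d` (multiplication in GF(2^8)), applied to all four rows in parallel through byte rotations. Whether it is actually faster depends on your compiler and CPU, so measure it.

```c
#include <stdint.h>

/* Multiply all four bytes packed in x by 2 in GF(2^8) at once. */
static uint32_t xtime_word(uint32_t x) {
    uint32_t hi = (x >> 7) & 0x01010101u;            /* top bit of each byte, moved to bit 0 */
    return ((x & 0x7f7f7f7fu) << 1) ^ (hi * 0x1bu);  /* shift without carry, then reduce */
}

/* Rotate left; n must be between 1 and 31. */
static uint32_t rotl32(uint32_t x, unsigned n) {
    return (x << n) | (x >> (32 - n));
}

/* MixColumns for one column w = (a, b, c, d), row 0 (a) in the top byte. */
static uint32_t mix_column_word(uint32_t w) {
    uint32_t r = rotl32(w, 8);   /* (b, c, d, a) */
    uint32_t t = w ^ r;          /* (a^b, b^c, c^d, d^a) */
    /* row 0: 2(a^b) ^ b ^ (c^d), and the same pattern shifted for rows 1 to 3 */
    return xtime_word(t) ^ r ^ rotl32(t, 16);
}
```

Inside the round loop, the whole MixColumns block would then become `for (int c = 0; c < 4; c++) state[c] = mix_column_word(state[c]);`.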
My encryption function:
```c
#include <stdint.h>

/* sbox, Nr and xtime3() are defined elsewhere (not shown). */

uint32_t *encrypt(uint32_t *expkey, uint32_t state[4]) {
    uint32_t temp[4];

    // Initial AddRoundKey
    state[0] = state[0] ^ expkey[0];
    state[1] = state[1] ^ expkey[1];
    state[2] = state[2] ^ expkey[2];
    state[3] = state[3] ^ expkey[3];

    for (int round = 1; round < Nr; round++) {
        // SubBytes (the cast keeps "<< 24" from overflowing a signed int)
        for (int c = 0; c < 4; c++) {
            temp[c] = ((uint32_t)sbox[state[c] >> 24 & 0xFF] << 24)
                    + (sbox[state[c] >> 16 & 0xFF] << 16)
                    + (sbox[state[c] >> 8 & 0xFF] << 8)
                    + sbox[state[c] & 0xFF];
        }
        // ShiftRows
        state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16)
                 + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
        state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16)
                 + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
        state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16)
                 + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
        state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16)
                 + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);
        // MixColumns
        for (int c = 0; c < 4; c++) {
            state[c] = ((xtime((state[c] >> 24) & 0xFF) ^ xtime3((state[c] >> 16) & 0xFF)
                         ^ ((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 24)
                     + ((((state[c] >> 24) & 0xFF) ^ xtime((state[c] >> 16) & 0xFF)
                         ^ xtime3((state[c] >> 8) & 0xFF) ^ (state[c] & 0xFF)) << 16)
                     + ((((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF)
                         ^ xtime((state[c] >> 8) & 0xFF) ^ xtime3(state[c] & 0xFF)) << 8)
                     + (xtime3((state[c] >> 24) & 0xFF) ^ ((state[c] >> 16) & 0xFF)
                        ^ ((state[c] >> 8) & 0xFF) ^ xtime(state[c] & 0xFF));
        }
        // AddRoundKey
        state[0] = state[0] ^ expkey[round * 4];
        state[1] = state[1] ^ expkey[round * 4 + 1];
        state[2] = state[2] ^ expkey[round * 4 + 2];
        state[3] = state[3] ^ expkey[round * 4 + 3];
    }

    // Final SubBytes
    for (int c = 0; c < 4; c++) {
        temp[c] = ((uint32_t)sbox[state[c] >> 24 & 0xFF] << 24)
                + (sbox[state[c] >> 16 & 0xFF] << 16)
                + (sbox[state[c] >> 8 & 0xFF] << 8)
                + sbox[state[c] & 0xFF];
    }
    // Final ShiftRows
    state[0] = (((temp[0] >> 24) & 0xFF) << 24) + (((temp[1] >> 16) & 0xFF) << 16)
             + (((temp[2] >> 8) & 0xFF) << 8) + (temp[3] & 0xFF);
    state[1] = (((temp[1] >> 24) & 0xFF) << 24) + (((temp[2] >> 16) & 0xFF) << 16)
             + (((temp[3] >> 8) & 0xFF) << 8) + (temp[0] & 0xFF);
    state[2] = (((temp[2] >> 24) & 0xFF) << 24) + (((temp[3] >> 16) & 0xFF) << 16)
             + (((temp[0] >> 8) & 0xFF) << 8) + (temp[1] & 0xFF);
    state[3] = (((temp[3] >> 24) & 0xFF) << 24) + (((temp[0] >> 16) & 0xFF) << 16)
             + (((temp[1] >> 8) & 0xFF) << 8) + (temp[2] & 0xFF);
    // Final AddRoundKey
    state[0] = state[0] ^ expkey[Nr * 4];
    state[1] = state[1] ^ expkey[Nr * 4 + 1];
    state[2] = state[2] ^ expkey[Nr * 4 + 2];
    state[3] = state[3] ^ expkey[Nr * 4 + 3];
    return state;
}
```

And the `xtime` function:
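```c
// Multiply by 2 in GF(2^8): shift left and, if the top bit was set,
// XOR with the AES polynomial 0x11b (which also cancels bit 8)
uint8_t xtime(uint8_t x) {
    return (x << 1) ^ (0x11b & -(x >> 7));
}
```

I'm looking forward to any tips, tricks and improvements.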
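A further step that many portable software AES implementations take is the "T-table" approach described in the Rijndael submission: SubBytes and MixColumns are folded into four precomputed tables of 256 32-bit words, and ShiftRows becomes a choice of which input column each byte is read from, so each output column of a round costs four lookups and four XORs. The sketch below is a minimal illustration using the same column packing as `state[c]` (row 0 in the top byte). `build_sbox`, `build_tables`, `t_round` and the table `T` are hypothetical names. It computes the S-box at start-up only to stay self-contained; if your `sbox` is a `uint8_t[256]` array, you could pass it to `build_tables` instead. Two caveats: it only covers the rounds inside the `for (round = 1; round < Nr; round++)` loop, because the final round has no MixColumns; and, like any S-box lookup indexed by secret data, these table lookups can leak key information through cache timing, so treat this as a performance experiment rather than production code.

```c
#include <stdint.h>

static uint32_t T[4][256];   /* hypothetical global lookup tables */

static uint8_t rotl8(uint8_t x, int s) {
    return (uint8_t)((x << s) | (x >> (8 - s)));
}

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial. */
static uint8_t xtime8(uint8_t x) {
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1b : 0x00));
}

/* Build the AES S-box: p walks through all non-zero field elements as powers
   of the generator 3 while q tracks p's inverse; S(p) = affine(q). */
static void build_sbox(uint8_t sbox[256]) {
    uint8_t p = 1, q = 1;
    do {
        p = (uint8_t)(p ^ (p << 1) ^ ((p & 0x80) ? 0x1b : 0));   /* p *= 3 */
        q ^= (uint8_t)(q << 1);                                    /* q /= 3 */
        q ^= (uint8_t)(q << 2);
        q ^= (uint8_t)(q << 4);
        if (q & 0x80)
            q ^= 0x09;
        sbox[p] = (uint8_t)(q ^ rotl8(q, 1) ^ rotl8(q, 2) ^ rotl8(q, 3)
                            ^ rotl8(q, 4) ^ 0x63);
    } while (p != 1);
    sbox[0] = 0x63;   /* 0 has no inverse; the affine map sends it to 0x63 */
}

/* T[0][x] holds S(x) times the MixColumns column (2,1,1,3), row 0 in the
   top byte; T[1..3] are the same word rotated right by 8, 16 and 24 bits. */
static void build_tables(const uint8_t sbox[256]) {
    for (int x = 0; x < 256; x++) {
        uint32_t s = sbox[x];
        uint32_t s2 = xtime8(sbox[x]);
        uint32_t s3 = s2 ^ s;
        uint32_t t0 = (s2 << 24) | (s << 16) | (s << 8) | s3;
        T[0][x] = t0;
        T[1][x] = (t0 >> 8) | (t0 << 24);
        T[2][x] = (t0 >> 16) | (t0 << 16);
        T[3][x] = (t0 >> 24) | (t0 << 8);
    }
}

/* One middle round: SubBytes + ShiftRows + MixColumns + AddRoundKey.
   in and out must not alias. */
static void t_round(const uint32_t in[4], uint32_t out[4], const uint32_t rk[4]) {
    for (int c = 0; c < 4; c++) {
        out[c] = T[0][in[c] >> 24]
               ^ T[1][(in[(c + 1) & 3] >> 16) & 0xFF]
               ^ T[2][(in[(c + 2) & 3] >> 8) & 0xFF]
               ^ T[3][in[(c + 3) & 3] & 0xFF]
               ^ rk[c];
    }
}
```

`in` and `out` have to be different arrays because each output column reads bytes from three other input columns; alternating between two buffers from one round to the next avoids the copy that the `temp` array needs in the original code.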