
I have a csv with over 4M lines that I'm loading into an array.

csv: EURUSD,20010102,230100,0.9507,0.9507,0.9507,0.9507,4

This operation takes about 3.5 minutes.

```c
...

typedef struct Rates_t {
    char open[7];
    char high[7];
    char low[7];
    char close[7];
} Rates_t;

void Substr(char *src, char **dst, int start, int length)
{
    char *ptr1 = *dst;
    char *ptr2 = src + start;
    int i;
    for (i = 0; i < length; i++) {
        *(ptr1 + i) = *(ptr2 + i);
    }
    (*dst)[length] = '\0';
}

void FillRates(char *tmp, char *price)
{
    Substr(tmp, &price, 0, 6);
}

bool BacktestServer()
{
    ...
    Rates_t r = { {0}, {0}, {0}, {0} };
    Rates_t *rates = &r;
    rates = (Rates_t *) malloc(sizeof(Rates_t));

    FILE *f;
    if (!(f = fopen("EURUSD.txt", "r"))) {
        fprintf(stderr, "Unable to open 'EURUSD.txt' for reading.\n");
        exit(1);
    }
    ...
    while (fgets(line, 72, f)) {
        tmp = line;
        for (skip = 0; skip < 3; skip++) {
            tmp = strchr(tmp, ',');
            tmp++;
        }
        sz += sizeof(Rates_t);
        rates = (Rates_t *) realloc(rates, sz);
        FillRates(tmp, rates[i].open);
        tmp = strchr(tmp, ','); tmp++;
        FillRates(tmp, rates[i].high);
        tmp = strchr(tmp, ','); tmp++;
        FillRates(tmp, rates[i].low);
        tmp = strchr(tmp, ','); tmp++;
        FillRates(tmp, rates[i].close);
        i++;
        free(line);
        line = NULL;
        line = (char *) malloc(72 * sizeof(char));
    }
    ...
}
```

This takes about 1 minute.

```c
...

typedef struct Rates_t {
    char *open;
    char *high;
    char *low;
    char *close;
} Rates_t;

void Substr(char *src, char **dst, int start, int length)
{
    char *ptr1 = *dst;
    char *ptr2 = src + start;
    int i;
    for (i = 0; i < length; i++) {
        *(ptr1 + i) = *(ptr2 + i);
    }
    (*dst)[length] = '\0';
}

void FillRates(char *tmp, char *price)
{
    Substr(tmp, &price, 0, 6);
}

bool BacktestServer()
{
    ...
    Rates_t r = { NULL, NULL, NULL, NULL };
    Rates_t *rates = &r;
    rates = (Rates_t *) malloc(sizeof(Rates_t));

    FILE *f;
    if (!(f = fopen("EURUSD.txt", "r"))) {
        fprintf(stderr, "Unable to open 'EURUSD.txt' for reading.\n");
        exit(1);
    }
    ...
    while (fgets(line, 72, f)) {
        tmp = line;
        for (skip = 0; skip < 3; skip++) {
            tmp = strchr(tmp, ',');
            tmp++;
        }
        sz += sizeof(Rates_t);
        rates = (Rates_t *) realloc(rates, sz);
        rates[i].open = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].open);
        tmp = strchr(tmp, ','); tmp++;
        rates[i].high = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].high);
        tmp = strchr(tmp, ','); tmp++;
        rates[i].low = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].low);
        tmp = strchr(tmp, ','); tmp++;
        rates[i].close = (char *) malloc(7 * sizeof(char));
        FillRates(tmp, rates[i].close);
        i++;
        free(line);
        line = NULL;
        line = (char *) malloc(72 * sizeof(char));
    }
    ...
}
```

Using either memcpy or snprintf, the program runs a few seconds longer.

```c
void Substr(char *src, char **dst, int start, int length)
{
    memcpy(*dst, src + start, length);
    (*dst)[length] = '\0';
}

void Substr(char *src, char **dst, int start, int length)
{
    snprintf(*dst, length + 1, "%s", src + start);
    (*dst)[length] = '\0';
}
```

From the consensus online, the static-array version should be faster than the dynamic-array version. If anyone needs more information, I'll edit the post to that effect.

UPDATE:

I increased the allocation chunk, not to 2 as suggested, but to 4096, and I'm still getting the same result for the dynamic-array version: about a minute or less. The static-array version has decreased to about 2.75 minutes.

The initial allocation:

```c
int sz = 256 * sizeof(Rates_t);
rates = (Rates_t *) malloc(sz);
```

The reallocation:

```c
if (realloc_count == 256) {
    sz += 256 * sizeof(Rates_t);
    rates = (Rates_t *) realloc(rates, sz);
    realloc_count = 0;
}
realloc_count++;
```

I am on a 64-bit Windows machine, but I compile 32-bit programs via cygwin gcc. On 64-bit Linux in a VM, the times are significantly lower overall, but reversed: the dynamically allocated version takes longer than the static version (dynamic memory ≈ 20-30 seconds, static ≈ 15 seconds). On Linux, chunk sizes of 1, 2, 256, 4096, or 524,288 made little to no difference in speed. When I increased the allocation chunk to 524,288 on cygwin, I get ~6 seconds for static allocation and ~8 seconds for dynamic allocation.

  • It may simply be that your plain loops are faster than calling yet another function that contains the same loops, especially since your arrays are very small. Commented May 3, 2015 at 0:46
  • When you say, "yet another function," are you referring to memcpy and snprintf? If so, memcpy and snprintf are not the cause of the slowdown. I just added them so that readers can see that I've already tried those functions. Commented May 3, 2015 at 0:56
  • 3
    Because you are reallocing the array every loop iteration. If you either made one large allocation or doubled your array size on each realloc, your program would run much faster. Commented May 3, 2015 at 0:58
  • Do you create r just so you can initialize rates to &r? That's unnecessary; you could skip that and do Rates_t *rates = malloc(sizeof(Rates_t));. Commented May 3, 2015 at 0:58
  • In the first example, it appears you're calling malloc() outside the scope of any function. Have you left out part of the code that shows what function that malloc() is being called from? Commented May 3, 2015 at 1:05

2 Answers


I'm surprised it would make this much of a difference, but since you realloc() the rates array for each line of data read, it's likely that the cost of copying that array so often is the culprit. If the target is a 32-bit machine, the Rates_t structure that contains the full arrays is probably about twice the size of a Rates_t structure that contains only pointers.

As JS1 mentioned in a comment, sizing the array appropriately up front (or reallocating in larger chunks if you don't know how big it needs to be) should make the run time difference disappear.


3 Comments

This makes a lot of sense. I'll look into this.
While reading around online, it didn't make much sense that passing an array that decays to a pointer would take longer to process than a plain pointer. To elaborate on my previous comment, "it makes a lot of sense" that the structures are different sizes and give different results when allocating the structure itself. I'm accepting your answer because yours was first and led me to look at the struct sizes more closely. The difference was clear: 28 vs 16 bytes, as mentioned by rcgldr. Though it made little difference in the Linux VM, it made a huge difference in cygwin.
@user2270773: I think that the key takeaway should be the less-than-optimal reallocation strategy, not so much the size difference of the struct. The continual-reallocation problem is magnified by the size difference. If you make even a small improvement to the reallocation strategy, the program's run time should drop to something reasonably close to the time it takes just to read the file, for either structure type.

One difference is the amount of data copied during a realloc. In the first case, the code reallocs an array of structs of size 28, which is not a RAM/cache-friendly boundary. In the second case, it reallocs an array of structs of size 16, which is a RAM/cache-friendly boundary, but then does four mallocs per row (and those never get realloc'ed).

I'm wondering if changing the character array sizes in the first case from 7 to 8 would help.

I'm also wondering whether, if this were done in 64-bit mode (64-bit pointers), the difference would be similar.

1 Comment

Plus one. Realloc is the likely culprit.
