Most efficient way to concatenate strings in c

Question

Consider this simple program that concatenates all specified parameters and prints them in standard output. I used 2 for loops to append the strings, one to calculate the length of that string and one to concatenate the strings. Is there a way doing it with only one loop? It wouldn't be more efficient reallocating memory for each string to concatenate, would it? How would Java's StringBuilder be implemented in C? Would it loop twice as I did?

#include <stdio.h> #include <string.h> #include <stdlib.h> int main(int argc, char** argv) { size_t len = 0; // start for loop at i = 1 to skip the program name specified in argv for(int i = 1; i < argc; i++) len += strlen(argv[i]) + 1; // +1 for the space char* toAppend = (char*)malloc(len * sizeof(char) + 1); toAppend[0] = '\0'; // first string is empty and null terminated for(int i = 1; i < argc; i++) { strcat(toAppend, argv[i]); strcat(toAppend, " "); } printf(toAppend); free(toAppend); }

The problem with your code is strcat, every iteration it goes through the entire string to find the last character, use a moving pointer instead. — imreal
– imreal, Commented Sep 20, 2018 at 0:22
Side tip: a more readable way to write (char*)malloc(len * sizeof(char) + 1) is malloc(len + 1). — Ry-
– Ry- ♦, Commented Sep 20, 2018 at 0:28
@Winter: Yep, and because void* doesn’t need to be cast. — Ry-
– Ry- ♦, Commented Sep 20, 2018 at 0:31
@ElliottFrisch: You can implement something like java AbstractStringBuilder in C, but it's highly inefficient compared to the right way to do it, allocating the right total size to begin with. Doing that kind of thing largely defeats the purpose of writing C; you get the difficulty of writing C and the slowness of Java. — R.. GitHub STOP HELPING ICE
– R.. GitHub STOP HELPING ICE, Commented Sep 20, 2018 at 0:35
Depending on your usage patterns, you may want to consider sized strings, rather than NUL-terminated ones. — Acorn
– Acorn, Commented Sep 20, 2018 at 0:45

R.. GitHub STOP HELPING ICE · Accepted Answer · 2018-09-20 00:37:55Z

Your method of allocation is efficient, measuring the total length and allocating just once. But the concatenation loop repeatedly measures the length of the output buffer from the start to concatenate to it, resulting in quadratic runtime.

To fix it track your position as you go:

size_t pos = 0; for(int i = 1; i < argc; i++) { size_t len = strlen(argv[i]); memcpy(toAppend+pos, argv[i], len); pos += len; toAppend[pos] = ' '; pos++; } toAppend[pos] = 0;

This is the most efficient way to actually concatenate in memory, but the most efficient of all is not to concatenate. Instead:

for(int i = 1; i < argc; i++) printf("%s ", argv[i]);

The whole reason stdio is buffered is so you don't have to build arbitrary-length in-memory buffers to do efficient output; instead it buffers up to a fixed size automatically and flushes when the buffer is full.

Note that your usage of printf is wrong and dangerous in the event that your input contains a % character anywhere; it should be printf("%s", toAppend);.

If you're writing to POSIX (or POSIX-ish) systems rather than just plain C, another option would be fmemopen, which would allow you to write the loop just as:

for(int i = 1; i < argc; i++) fprintf(my_memfile, "%s ", argv[i]);

The OP's method is NOT the most efficient, and clearly the OP is interested in learning about concatenation, part of the question mentions StringBuilder in Java.
Thank you for the answer and the warning. However, it goes that the point of this program was to create a minified, verifiable example. What I actually wanted to do requires using the string in memory.
Even better than printf("%s", toAppend) would be fputs(toAppend, stdout). (I think GCC will actually make this optimization for you.)
@R.. fmemopen appears to be a great idea but I still have to specify the length of the buffer, so it doesn't remove the first loop. Would it be faster in that case? It's indeed simpler at least.

chux · Accepted Answer · 2018-09-20 01:13:28Z

efficient way to concatenate strings in c

An efficient way is to calculate the string lengths - and remember them.

size_t sum = 1; // for \0 if (argc > 2) sum += argc - 2. // spaces size_t length[argc]; // This is a VLA, available C99 and optionally in C11 for(int i = 1; i < argc; i++) length[i] = strlen(argv[i]); sum += length[i]; }

Then allocate, and then check for errors.

char *dest = malloc(sum); if (dest == NULL) Handle_OutOfMemory();

Copy each string in turn

char *p = dest; for(int i = 1; i < argc; i++) // Use either memcpy() or strcpy(). // memcpy() tends to be faster for long strings than strcpy(). memcpy(p, argv[i], length[i]); p += length[i]; // advance insertion point if (i > 1) { *p++ = ' '; // space separators } } *p = '\0';

Now use dest[].

printf("<%s>\n", dest);

Free resources when done.

free(dest);

It wouldn't be more efficient reallocating memory for each string to concatenate, would it?

Usually repetitive re-allocations is best avoided, yet for small short strings it really makes scant difference. Focus on big O. My answer is O(n). Relocating in a loop tends to be O(n*n).

If performance was critical, try various approaches and profile for the intended system. The point being what is fast on one machine may differ on another. Usually it is best to first code a reasonable clear approach.

You are missing the spaces but it doesn't change much. Nice idea to keep the lengths

DYZ · Accepted Answer · 2018-09-20 00:43:13Z

0

The most efficient way is probably to not use any str functions and copy the characters "by hand":

char* toAppend = malloc(len + 1); size_t j = 0; for(size_t i = 1; i < argc; i++) { for(size_t k = 0; argv[i][k]; k++) toAppend[j++] = argv[i][k]; toAppend[j++] = ' '; } toAppend[j - 1] = '\0'; // Remove the last space and NULL-terminate the string

edited Sep 20, 2018 at 0:43

answered Sep 20, 2018 at 0:37

DYZ

57.3k10 gold badges73 silver badges101 bronze badges

14 Comments

R.. GitHub STOP HELPING ICE Over a year ago

j and k have the wrong type. They need to be size_t.

Winter Over a year ago

There's no point having i as size_t though, as argc is an int

chux Over a year ago

i as size_t may emit some warnings with i < argc due to comparison of mis-matched integer sign-ness. e.g. "warning: comparison between signed and unsigned integer expressions [-Wsign-compare]". There is little practical down-side either way here, yet it does encourage non-use of the finicky [-Wsign-compare]. That is a love/hate compiler option. Since argc is signed, I'd go with int i here.

Serge Over a year ago

@DYZ glibc functions are usually optimized using hardware features and/or built-in compiler features for the specific operation and it is very difficult to match. Thougth it depends on the library, compiler and os version. Arrays require position calculation(at least addition with single dimension), in particular 2-dimensional arrays (multiplication and addition).

DYZ Over a year ago

@Serge Not everything that looks like a 2d array, is a 2d array. This one is not a 2d array.

|

Collectives™ on Stack Overflow

Most efficient way to concatenate strings in c

3 Answers 3

6 Comments

2 Comments

14 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

6 Comments

2 Comments

14 Comments

Linked

Related