2

As part of a high performance computing course I'm trying to speed up a naive string search in C as much as possible, and I'd like to know if there's anything obvious that could be done that I've missed in my current iteration or if there's a better direction to go in.

Some restrictions are that the pattern must be searched left to right, each character must be checked separately, and it needs to be based on a for loop.

So far I've reduced the time taken for a pattern of 999 As followed by a B in a text of 9,999,999 As followed by a B from ~18 seconds to ~9 seconds, but I'm not sure if it could be faster given the restrictions above.

The current code is:

int match(char** text, int n, char** pattern, int m){ int i, j, last = n-m; for(i = 0; i <= last; i++){ for(j = 0; j < m; j++){ if((*text)[i+j] != (*pattern)[j]){ break; } } if(j == m){ return i; } } return -1; } 

Where text is the text to be searched, pattern is the pattern to be searched, n is the length of the text, and m is the length of the pattern.

Edit: This is an implementation of the naive string search algorithm. https://en.wikipedia.org/wiki/String_searching_algorithm#Na.C3.AFve_string_search

Edit 2: Code after @RadLexus idea to change from char** to char* (25% faster):

int match(char* text, int n, char* pattern, int m){ int i, j, last = n-m; for(i = 0; i <= last; i++){ for(j = 0; j < m; j++){ if(text[i+j] != pattern[j]){ break; } } if(j == m){ return i; } } return -1; } 
15
  • I'm not sure that I understood your task, but you can use if((*text)[i+j] != (*pattern)[j]){ i = i+j; break; } to avoid unnecessary iterations Commented May 8, 2016 at 0:24
  • 1
    @RadLexus Probably to make sure that we stick to a naive string search instead of using Boyer-Moore or KMP. Commented May 8, 2016 at 0:29
  • 1
    Hm. Was the prototype given to you? I wonder if the double indirection could slow things down, esp. inside the tight loops. For simple strings, a single char * for both is enough. Commented May 8, 2016 at 0:31
  • 1
    Declare last as const. const int last = n-m; Could be a hint to the compiler to keep last in a register on each iteration through the loop. Commented May 8, 2016 at 2:42
  • 1
    is the question about the patern AAAA...B or about arbitrary patterns ? Commented May 8, 2016 at 15:37

2 Answers 2

4

I got a 10x speedup simply by doing this:

for (int i = 0; i <= last; i++) { if (memcmp(&text[i], pattern, m) == 0) { return i; } } 

Now you may complain that this no longer uses "for" loops and therefore isn't a solution. But memcmp uses loops and you can lift its implementation into your code if you like. As for why it's so much faster, we have a good answer here: Why is memcmp so much faster than a for loop check?

Sign up to request clarification or add additional context in comments.

3 Comments

This most likely violates the requirement "each character must be checked separately" (there is a significant gain in not doing that). It probably violates the intention of the question as well. Why not use strstr?
@RadLexus: I addressed that directly in my answer. Yes, memcmp() may be outlawed. But you can simply copy its implementation tricks into your code if desired. There's no reason for us to all reinvent this wheel one opcode at a time.
@RadLexus: it is probable that memcmp is vectorized, hence ruled out (even rewritten).
-1

To use only loops, removing the array indexing in favour of pointers will probably provide another speed-up:

int match(char* text, int n, char* pattern, int m){ char *ptext= text, *pend= text+n, *ptext2; char *ppat, *ppatend; if (n<m) return -1; for (; ptext<pend; ptext++) { ptext2= ptext; ppat= pattern; ppatend= ppat+m; for (; ppat<ppatend; ppat++, ptext2++) { if (*ptext2 != *ppat) { break; } } if (ppat==ppatend) { return (ptext-text); } } return -1; } 

4 Comments

A good optimizer will choose the fastest implementation, whether accesses are done by indexing or pointer dereference. I bet there will be no measurable difference.
@Yves-daoust, 1) you have to inspect the assembler to learn whether you have a "good" optimizer and I doubt it will turn array-indexing into pointers; 2) his execercise is to optimize his search so he should learn/know about replacing indexing with pointers; he shoud not learn to rely on an optimizer that may not always be available or be "good".
I agree with @YvesDaoust: unless you actually know what asm you want, this is just a bad idea. Your version has an inner loop that's 5 uops instead of 6, on Intel Haswell, with gcc 5.3, since indexed addressing modes can't micro-fuse on SnB-family microarchitectures. What would be more interesting is to count up towards zero with a negative array index, to potentially remove the cmp instruction from the loop overhead. (So inc/jcc could micro-fuse.) It might be possible to get the inner loop down to 4 fused-domain uops, speeding it up to one per clock.
A more standard way to do that is to unroll a bit. Even gcc -funroll-loops probably helps a bit, but source-level unrolling might help gcc do a better job.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.