3

Say I have a large byte[] and I'm not only looking to see if, but also where, a smaller byte[] is in the larger array. For example:

byte[] large = new byte[100]; for (byte i = 0; i < 100; i++) { large[i] = i; } byte[] small = new byte[] { 23, 24, 25 }; int loc = large.IndexOf(small); // this is what I want to write 

I guess I'm asking about looking for a sequence of any type (primitive or otherwise) within a larger sequence.

I faintly remember reading about a specific approach to this in strings, but I don't remember the name of the algorithm. I could easily write some way to do this, but I know there's a good solution and it's on the tip of my tongue. If there's some .Net method that does this, I'll take that too (although I'd still appreciate the name of the searching algorithm for education's sake).

2
  • 6
    I think you are thinking of this: en.wikipedia.org/wiki/… Commented Apr 11, 2013 at 2:59
  • I think that's exactly what I was reading about before. The name doesn't ring a bell, but the process does. You should put this down as an answer. Commented Apr 11, 2013 at 3:12

2 Answers 2

4

You can do it with LINQ, like this:

var res = Enumerable.Range(0, large.Length-1) .Cast<int?>() .FirstOrDefault(n => large.Skip(n.Value).Take(small.Length).SequenceEqual(small)); if (res != null) { Console.Println("Found at {0}", res.Value); } else { Console.Println("Not found"); } 

The approach is self-explanatory except for the Cast<int?> part: you need it to decide between finding the result at the initial location of the large array, when zero is returned, and not finding the result at all, when the return is null.

Here is a demo on ideone.

The complexity of the above is O(M*N), where M and N are lengths of the large and small arrays. If the large array is very long, and contains a significant number of "almost right" sub-sequences that match long prefixes of small, you may be better off implementing an advanced algorithm for searching sequences, such as the Knuth–Morris–Pratt (KMP) algorithm. The KMP algorithm speeds up the search by making an observation that when a mismatch occurs, the small sequence contains enough information on how far ahead you can move in the large sequence based on where in the small sequence is the first mismatch. A look-up table is prepared for the small sequence, and then that table is used throughout the search to decide how to advance the search point. The complexity of KMP is O(N+M). See the Wikipedia article linked above for pseudocode of the KMP algorithm.

Sign up to request clarification or add additional context in comments.

6 Comments

I like the LINQ approach but this is brute forcing and has room for optimization.
@CoreyOgburn Of course it does. However, your run-of-the-mill IndexOf is not giving us the Knuth–Morris–Pratt performance either :)
THAT! THAT is the exact algorithm I was reading about before! KMP String Searching! LINQ works, but KMP works faster. Imagine if I was looking for a stretch in a large binary file?
@CoreyOgburn Coding a KMP from pseudocode should not be a big deal: it's not one's typical "cinch of an algorithm", but it's no rocket science either. I took a stab at it (link to my C++ implementation), and managed to do it relatively quickly.
I can't really mark your comment about KMP as the answer and the LINQ approach isn't quite what I'm looking for as it's a much slower approach. If you submit another answer about KMP, I'll mark it as the answer.
|
0

Are you thinking of Lambda expressions? That is what came to my mind when you said a more specific approach with strings.

http://www.dotnetperls.com/array-find

1 Comment

No, they're very useful but not here. These looks for individual entries in an array that meet certain criteria (at least that's how Find works) but what I'm looking for is a sequence of multiple elements in the array.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.