
I am trying to figure out an equivalent of C# string.IndexOf(string) that can handle Unicode surrogate pairs.

I am able to get the index when only comparing single characters, like in the code below:

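    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;

    // Hypothetical container class: extension methods must be declared in a static class.
    public static class StringExtensions
    {
        // Looks up `find` among the text elements of `input`. List<string>.IndexOf
        // compares whole elements, so this only matches a single text element.
        public static int UnicodeIndexOf(this string input, string find)
        {
            return input.ToTextElements().ToList().IndexOf(find);
        }

        // Splits a string into text elements (a surrogate pair is one element).
        public static IEnumerable<string> ToTextElements(this string input)
        {
            var e = StringInfo.GetTextElementEnumerator(input);
            while (e.MoveNext())
            {
                yield return e.GetTextElement();
            }
        }
    }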

But if find is a string of more than one character, it won't work, because each text element holds only a single character to compare against.

Are there any suggestions as to how to go about writing this?

Thanks for any and all help.

EDIT:

Below is an example of why this is necessary:

CODE

    Console.WriteLine("HolyCow𪘁BUBBYY𪘁YY𪘁Y".IndexOf("BUBB"));
    Console.WriteLine("HolyCow@BUBBYY@YY@Y".IndexOf("BUBB"));

OUTPUT

    9
    8

Notice that the values change when I replace the 𪘁 character with @.
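The shift happens because string.IndexOf returns an offset in UTF-16 code units, not in user-perceived characters. 𪘁 lies outside the Basic Multilingual Plane, so it is stored as a surrogate pair (two chars), and every index after it moves up by one. The helpers below (the class and method names are hypothetical) are a minimal sketch that makes the two ways of counting visible:

```csharp
using System.Globalization;

public static class IndexUnits
{
    // string.Length, and the offsets string.IndexOf returns, count UTF-16 code units.
    public static int CodeUnitCount(string s) => s.Length;

    // StringInfo counts text elements; a surrogate pair counts as one element.
    public static int TextElementCount(string s) => new StringInfo(s).LengthInTextElements;

    // True when s[index] and s[index + 1] form a surrogate pair,
    // i.e. one character that occupies two code units.
    public static bool IsSurrogatePairAt(string s, int index) => char.IsSurrogatePair(s, index);
}
```

For "HolyCow𪘁BUBBYY𪘁YY𪘁Y", CodeUnitCount gives 22 while TextElementCount gives 19, and IsSurrogatePairAt(s, 7) is true, which is why "BUBB" is reported at 9 instead of 8. For "HolyCow@BUBBYY@YY@Y" both counts are 19.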

  • use the same encoding for both strings and you are good Commented May 4, 2018 at 20:28
  • @Steve I added some information to my question. Are those strings the same encoding or is there a difference? Commented May 4, 2018 at 20:39
  • @Ibrennan208, from your initial implementation it looks like you are trying to find a single grapheme, because you are using an IndexOf on an array of strings that are in effect TextElements, but from your sample data it looks like you actually want to find an index of a substring with length > 1 grapheme. Can you specify which solution you are seeking? (Just run your code on your test data - it won't work - indexOf will return -1) Commented May 4, 2018 at 20:48
  • @ironstone13 I want to find an index of a substring with length > 1. In the question I explained that I can get it to work if I am only comparing a string with a single character, but I want to extend it to allow the user to input a multi-character string to find the index of. Commented May 4, 2018 at 20:58

1 Answer


You basically want to find the index of one string array within another string array. We can adapt code from this question for that:

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Linq;

    public static class Extensions
    {
        // Returns the index in text elements, not in UTF-16 code units.
        public static int UnicodeIndexOf(this string input, string find,
            StringComparison comparison = StringComparison.CurrentCulture)
        {
            return IndexOf(
                // split input into text elements
                input.ToTextElements().ToArray(),
                // split searched value into text elements
                find.ToTextElements().ToArray(),
                comparison);
        }

        // code from another answer
        private static int IndexOf(string[] haystack, string[] needle, StringComparison comparison)
        {
            var len = needle.Length;
            var limit = haystack.Length - len;
            for (var i = 0; i <= limit; i++)
            {
                var k = 0;
                for (; k < len; k++)
                {
                    if (!String.Equals(needle[k], haystack[i + k], comparison)) break;
                }
                if (k == len) return i;
            }
            return -1;
        }

        public static IEnumerable<string> ToTextElements(this string input)
        {
            var e = StringInfo.GetTextElementEnumerator(input);
            while (e.MoveNext())
            {
                yield return e.GetTextElement();
            }
        }
    }
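A different approach, not the one used in this answer, is sketched below: let an ordinal string.IndexOf find the match in UTF-16 code units, then translate that offset into a text-element index using StringInfo.ParseCombiningCharacters, which returns the code-unit offset at which each text element starts. The class and method names are hypothetical, the comparison is ordinal rather than the answer's default CurrentCulture, and a hit is accepted only when it begins and ends on text-element boundaries, so a match that would split a surrogate pair is skipped.

```csharp
using System;
using System.Globalization;

public static class OrdinalTextElementSearch
{
    // Hypothetical alternative: ordinal search in code units, result in text elements.
    public static int TextElementIndexOf(this string input, string find)
    {
        // Same convention as string.IndexOf: an empty value is found at 0.
        if (find.Length == 0)
        {
            return 0;
        }

        // starts[i] is the code-unit offset where text element i begins (ascending).
        int[] starts = StringInfo.ParseCombiningCharacters(input);

        int from = 0;
        while (from <= input.Length)
        {
            int offset = input.IndexOf(find, from, StringComparison.Ordinal);
            if (offset < 0)
            {
                return -1;
            }

            // The match must start and end on text-element boundaries.
            int startElement = Array.BinarySearch(starts, offset);
            int end = offset + find.Length;
            bool endsOnBoundary = end == input.Length || Array.BinarySearch(starts, end) >= 0;
            if (startElement >= 0 && endsOnBoundary)
            {
                return startElement;
            }

            // Misaligned hit (e.g. half of a surrogate pair): keep searching.
            from = offset + 1;
        }
        return -1;
    }
}
```

With either this sketch or the answer's UnicodeIndexOf, "BUBB" is found at 8 in both "HolyCow𪘁BUBBYY𪘁YY𪘁Y" and "HolyCow@BUBBYY@YY@Y". The trade-off is that the ordinal search avoids building one string per text element, but it cannot apply culture-sensitive comparison.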

2 Comments

Nice, I was writing the same, but you were faster!
Thank you, this is exactly what I was looking for!
