I am trying to figure out an equivalent to C# string.IndexOf(string) that can handle surrogate pairs in Unicode characters.
I am able to get the index when only comparing single characters, like in the code below:
    public static int UnicodeIndexOf(this string input, string find)
    {
        return input.ToTextElements().ToList().IndexOf(find);
    }

    public static IEnumerable<string> ToTextElements(this string input)
    {
        var e = StringInfo.GetTextElementEnumerator(input);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }

But if I try to actually use a string as the find variable then it won't work because each text element only contains a single character to compare against.
Are there any suggestions as to how to go about writing this?
Thanks for any and all help.
EDIT:
Below is an example of why this is necessary:
CODE

    Console.WriteLine("HolyCow𪘁BUBBYY𪘁YY𪘁Y".IndexOf("BUBB"));
    Console.WriteLine("HolyCow@BUBBYY@YY@Y".IndexOf("BUBB"));

OUTPUT

    9
    8

Notice where I replace the 𪘁 character with @ the values change.
IndexOf on an array of strings that are in effect TextElements, but from your sample data it looks like you actually want to find an index of a substring with length > 1 grapheme. Can you specify which solution you are seeking? (Just run your code on your test data - it won't work - IndexOf will return -1)
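For the substring case raised above, one possible approach is to let the ordinary `IndexOf` find the match in UTF-16 code units and then translate that offset into a text-element index. The following is only a minimal sketch under that assumption, not a definitive implementation: the class name `UnicodeStringExtensions` and the method name `TextElementIndexOf` are hypothetical, and the choice of ordinal comparison is mine. It relies on `StringInfo.ParseCombiningCharacters`, which returns the code-unit offset at which each text element starts.

```csharp
using System;
using System.Globalization;

public static class UnicodeStringExtensions
{
    // Hypothetical helper: returns the position of `find` inside `input`,
    // counted in text elements rather than UTF-16 code units, or -1 if absent.
    public static int TextElementIndexOf(this string input, string find)
    {
        // Code-unit offsets at which each text element of `input` begins.
        int[] starts = StringInfo.ParseCombiningCharacters(input);

        int from = 0;
        while (from <= input.Length)
        {
            // Ordinal search is an assumption; swap in another comparison if needed.
            int charIndex = input.IndexOf(find, from, StringComparison.Ordinal);
            if (charIndex < 0)
            {
                return -1;
            }

            // Accept the match only if it starts and ends on text-element
            // boundaries, so a surrogate pair or combining sequence is never split.
            int elementIndex = Array.BinarySearch(starts, charIndex);
            int end = charIndex + find.Length;
            bool endsOnBoundary = end == input.Length
                || Array.BinarySearch(starts, end) >= 0;
            if (elementIndex >= 0 && endsOnBoundary)
            {
                return elementIndex;
            }

            // Otherwise keep looking one code unit further along.
            from = charIndex + 1;
        }
        return -1;
    }
}
```

With this sketch, `"HolyCow𪘁BUBBYY𪘁YY𪘁Y".TextElementIndexOf("BUBB")` and `"HolyCow@BUBBYY@YY@Y".TextElementIndexOf("BUBB")` both return 8, because 𪘁 is counted as a single text element even though it occupies two code units.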