9

I am looking for a way to search a string for an exact, whole-word match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

    using System;
    using System.Text.RegularExpressions;

    namespace test
    {
        class Program
        {
            static void Main(string[] args)
            {
                string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
                // Finds the TOTAL inside SUBTOTAL (index 3), not the standalone word.
                int indx = str.IndexOf("TOTAL");
                string amount = str.Substring(indx + "TOTAL".Length, 10);
                // Keep only digits and the decimal point.
                string strAmount = Regex.Replace(amount, "[^.0-9]", "");
                Console.WriteLine(strAmount);
                Console.WriteLine("Press any key to continue...");
                Console.ReadKey();
            }
        }
    }

The output of the above code is:

    // 34.37
    // Press any key to continue...

The problem is that I don't want SUBTOTAL. IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.

So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force an exact whole-word match, so that I can get the index of that match and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case, it isn't enough to know that the exact match exists; I need to know where it occurs in the string.

Any advice would be appreciated.

1
  • str.IndexOf(" TOTAL "); But it's ugly. Commented Jun 26, 2014 at 18:05

6 Answers

12

You can use Regex

    string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
    // The match includes the \W on each side, so Index is 18: the space before TOTAL.
    var indx = Regex.Match(str, @"\WTOTAL\W").Index;

4 Comments

Thanks! That's much cleaner! Who knew there was a ".Index" hanging off of RegEx.Match? :) :) :)
A bit ago, there was a post on this answer using a RegEx pattern that returned the number following the exact match for "TOTAL". Did anyone else see it? Anyone care to weigh in on such a pattern?
@DJ Are you looking for something like var val = Regex.Match(str, @"\WTOTAL\W\s*([0-9\.]+)").Groups[1].Value;
WOW! I have got to learn more about RegEx. It seems very powerful, if not very intuitive. Thanks LB!
6

My method is faster than the accepted answer because it does not use Regex.

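    using System;

    string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
    var indx = str.IndexOfWholeWord("TOTAL"); // 19

    // Extension methods must live in a static class.
    public static class StringExtensions
    {
        // Returns the index of the first occurrence of word that has no letter
        // or digit directly before or after it, or -1 if there is none.
        public static int IndexOfWholeWord(this string str, string word)
        {
            for (int j = 0; j < str.Length && (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
                if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                    (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                    return j;
            return -1;
        }
    }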

1 Comment

This is also more flexible as it returns -1 if TOTAL is NOT in the line. The Regex above returns 0.
3

You can use word boundaries, \b, and the Match.Index property:

    var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
    var idx = Regex.Match(text, @"\bTOTAL\b").Index; // => 19


The \bTOTAL\b matches TOTAL when it is not enclosed with any other letters, digits or underscores.

If TOTAL should still count as a whole word when it is enclosed in underscores, use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index; 

where [^\W_] matches any letter or digit (a word character other than the underscore). (?<![^\W_]) is a negative lookbehind that fails the match if there is a letter or digit immediately to the left of the current location, so the word must be preceded by the start of the string or by a character that is neither a letter nor a digit. (?![^\W_]) is the matching negative lookahead: it only succeeds if the word is followed by the end of the string or by a character that is neither a letter nor a digit.

If the boundaries are whitespaces or start/end of string use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index; 

where (?<!\S) requires start of string or a whitespace immediately on the left, and (?!\S) requires the end of string or a whitespace on the right.

NOTE: \b, (?<!...) and (?!...) are non-consuming patterns: the regex index does not advance when they match, so Match.Index is the exact position of the word you search for.
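To compare the three boundary styles side by side, here is a minimal sketch. The class name WholeWordPatterns and the helper IndexOfMatch are made up for illustration; only Regex.Match, Match.Success and Match.Index come from .NET.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordPatterns
{
    // Hypothetical helper: index of the first match of pattern in text,
    // or -1 when there is no match (Match.Index alone reports 0 on failure).
    public static int IndexOfMatch(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }
}
```

With this helper, \bTOTAL\b gives 19 on the question's string and 0 on "TOTAL 37.43", while \WTOTAL\W gives 18 and -1 on the same inputs, because it has to consume a character on each side. On "SUB_TOTAL_X", \bTOTAL\b gives -1 (the underscore is a word character) and (?<![^\W_])TOTAL(?![^\W_]) gives 4. On "TAX:TOTAL 37.43", (?<!\S)TOTAL(?!\S) gives -1 because the colon is not whitespace.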


2

To make the accepted answer a little bit safer (since IndexOf returns -1 for unmatched):

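    // Fragment from a method that returns int.
    // Regex.Escape keeps special characters in findTxt from being read as regex syntax.
    string pattern = String.Format(@"\b{0}\b", Regex.Escape(findTxt));
    Match mtc = Regex.Match(queryTxt, pattern);
    if (mtc.Success)
    {
        return mtc.Index;
    }
    else
        return -1;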
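Since the original goal was the amount after TOTAL rather than the index itself, the word boundary, the Success check and a capture group can be combined. This is only a sketch: the names ReceiptParser and AmountAfter are invented here, and it assumes amounts are written with digits and an optional decimal point.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class ReceiptParser
{
    // Hypothetical helper: returns the number that follows the whole word
    // label, or null when the label (or a number after it) is not found.
    public static decimal? AmountAfter(string text, string label)
    {
        // \b keeps SUBTOTAL from matching when label is TOTAL;
        // group 1 captures digits with an optional fractional part.
        string pattern = @"\b" + Regex.Escape(label) + @"\b\s*([0-9]+(?:\.[0-9]+)?)";
        Match m = Regex.Match(text, pattern);
        if (!m.Success)
            return null;
        return decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }
}
```

For the string in the question, AmountAfter(str, "TOTAL") returns 37.43 and AmountAfter(str, "SUBTOTAL") returns 34.37; a label that does not occur, such as "TIP", returns null.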


0

While this may be a hack that just works for only your example, try

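    int indx = str.IndexOf(" TOTAL"); // the leading space skips SUBTOTAL
    string amount = str.Substring(indx + " TOTAL".Length); // to end of string; a length of 10 would overrun here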

giving an extra space before total. As this will not occur with SUBTOTAL, it should skip over the word you don't want and just look for an isolated TOTAL.

1 Comment

LOL!!! Why didn't I see that! It is a bit "hacky" but for my example only, it should work. I would really like to see if there is a way to force the whole word match in a more clean approach, but will mark this as the answer if I don't see a more refined answer in a day or so. THANKS MUCH!!! :)
0

I'd recommend the Regex solution from L.B. too, but if you can't use Regex, you could use String.LastIndexOf("TOTAL"). This assumes that TOTAL always comes after SUBTOTAL.

http://msdn.microsoft.com/en-us/library/system.string.lastindexof(v=vs.110).aspx
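A quick way to see how much that ordering assumption matters is the sketch below. The class and method names are made up for illustration; String.LastIndexOf(string, StringComparison) is the real .NET call.

```csharp
using System;

public static class LastIndexOfDemo
{
    // Hypothetical wrapper: last position of word in text using an ordinal
    // comparison, or -1 when word does not occur. This is a plain substring
    // search, so it does not check word boundaries.
    public static int LastIndexOfText(string text, string word)
    {
        return text.LastIndexOf(word, StringComparison.Ordinal);
    }
}
```

For "SUBTOTAL 34.37 TAX TOTAL 37.43" this returns 19, the standalone TOTAL. If the fields are reversed, as in "TOTAL 37.43 SUBTOTAL 34.37", it returns 15, which is the TOTAL inside SUBTOTAL again.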
