9

I tried looking for an answer to this question but couldn't find anything, and I hope there's an easy solution. I am using the following code in C#:

String pattern = ("(hello|hello world)");
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var matches = regex.Matches("hello world");

The question is: is there a way for the Matches method to return the longest match first? In this case, I want to get "hello world" as my match, as opposed to just "hello". This is just an example; my real pattern list consists of a fairly large number of words.

1
  • If there are many words which could match, why do you propose a Regex rather than, say, a Dictionary? Commented Jun 17, 2014 at 21:08

3 Answers

9

If you already know the lengths of the words beforehand, then put the longest first. For example:

String pattern = ("(hello world|hello)"); 

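Alternatives are tried from left to right, so the longest will be matched first. If the alternatives can't be ordered by length when the pattern is built, the regex by itself won't prefer the longest one.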
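When the keyword list is only available at run time, the same longest-first ordering can be applied just before the pattern is built. The following is a minimal sketch, not the asker's actual code: the class name `LongestFirst` and the method `Build` are hypothetical, and it assumes the keywords are literal text rather than regex fragments, which is why each one is passed through `Regex.Escape`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestFirst
{
    // Hypothetical helper: builds one alternation with the longest
    // keywords placed first, so the engine tries them before shorter ones.
    public static Regex Build(IEnumerable<string> keywords)
    {
        var alternatives = keywords
            .OrderByDescending(k => k.Length) // longest keyword first
            .Select(k => Regex.Escape(k));    // treat keywords as literal text

        string pattern = "(" + string.Join("|", alternatives) + ")";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

With this sketch, `LongestFirst.Build(new[] { "hello", "hello world" }).Match("hello world").Value` is "hello world", even though "hello" was listed first.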

An alternative approach would be to store all the matches in an array/hash/list and pick the longest one manually, using the language's built-in functions.
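The "collect everything and pick the longest" approach could look roughly like the sketch below. It is only an illustration under stated assumptions: `LongestMatch` and `Find` are hypothetical names, the keywords are treated as literal text, and each keyword is searched for independently, so the position of the match in the input is not taken into account.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LongestMatch
{
    // Hypothetical helper: tries every keyword on its own and returns the
    // longest matched text, or null when no keyword matches.
    public static string Find(string input, IEnumerable<string> keywords)
    {
        return keywords
            .Select(k => Regex.Match(input, Regex.Escape(k), RegexOptions.IgnoreCase))
            .Where(m => m.Success)            // keep only keywords that matched
            .Select(m => m.Value)             // the text that was actually matched
            .OrderByDescending(v => v.Length) // longest match first
            .FirstOrDefault();
    }
}
```

This runs one regex per keyword, so for a long keyword list a single alternation ordered by length is likely to be cheaper.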


1 Comment

that works! ordering the pattern by the length of the words did the trick. thanks!
2

A regular expression engine tries the alternatives in an alternation from left to right. If you want to make sure you get the longest possible match first, you'll need to change the order of your alternatives so that the longest comes first. The leftmost alternative is tried first. If it matches, the engine goes on to match the rest of the pattern against the rest of the string; the next alternative is tried only if no match can be found that way.

String pattern = ("(hello world|hello wor|hello)"); 
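The effect of the ordering can be checked with a small experiment. This is a sketch for illustration only; `AlternationOrderDemo` and `FirstMatch` are hypothetical names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationOrderDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern, RegexOptions.IgnoreCase).Value;
    }
}
```

Calling `AlternationOrderDemo.FirstMatch("(hello|hello world)", "hello world")` gives "hello", because the first alternative already succeeds. With the alternatives reversed, `AlternationOrderDemo.FirstMatch("(hello world|hello)", "hello world")` gives "hello world".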


0

Make two different regex matches. The first will match your longer option, and if that does not work, the second will match your shorter option.

string input = "hello world";
string patternFull = "hello world";
Regex regexFull = new Regex(patternFull, RegexOptions.IgnoreCase);
var matches = regexFull.Matches(input);

if (matches.Count == 0)
{
    string patternShort = "hello";
    Regex regexShort = new Regex(patternShort, RegexOptions.IgnoreCase);
    matches = regexShort.Matches(input);
}

At the end, matches will hold the result of either the "full" or the "short" pattern. The "full" pattern is checked first, and the "short" one is skipped entirely if the "full" one matches.

You can wrap the logic in a function if you plan on calling it many times. This is something I came up with (but there are plenty of other ways you can do this).

// Returns true as soon as one of the patterns matches; patterns are tried in the order given.
public bool HasRegexMatchInOrder(string input, params string[] patterns)
{
    foreach (var pattern in patterns)
    {
        Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
        if (regex.IsMatch(input))
        {
            return true;
        }
    }
    return false;
}

string input = "hello world";
// Pass the longest pattern first; further patterns can follow.
bool hasAMatch = HasRegexMatchInOrder(input, "hello world", "hello" /* , ... */);

4 Comments

I'm actually trying to avoid going that route because the pattern string actually contains a lot of words/keywords.
You can always wrap each regex call in a function and call it multiple times. That will cut down on a lot of copy-pasted code.
@user3749947 If you're searching for many possible words, then a Dictionary might be more appropriate.
@ClickRick, A List might actually be better. For a dictionary, I can understand putting the pattern in the key field, but there would be no need for the value field. But again, what I wrote is just one way to do it.
