
Pseudo/dummy-code that will be matched against:

RECOVERY: 'XXXXXXXXX' is UP
PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING
PROBLEM: 'XXXXXXXXX' is DOWN
RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK
PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is DOWN

Goal

Capture XXXXXXXXX (without the single quotes), but do NOT capture ABABABAB.

Best attempt so far:

(M: \'|Y: \')(.*)(?:\' )(?:is) 
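To see why this attempt falls short, here is a minimal Java sketch (the class and method names are hypothetical) that runs it against two of the sample lines. The greedy (.*) runs on to the last "' is" on the line, so on a line that also contains ABABABAB the capture swallows both quoted values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttemptDemo {
    // The asker's regex; the \' escapes are dropped because ' needs no escaping in a Java regex.
    static final Pattern ATTEMPT = Pattern.compile("(M: '|Y: ')(.*)(?:' )(?:is)");

    // Returns group 2 of the first match, or null if the line does not match.
    static String firstCapture(String line) {
        Matcher m = ATTEMPT.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Prints XXXXXXXXX: only one quoted value on this line.
        System.out.println(firstCapture("RECOVERY: 'XXXXXXXXX' is UP"));
        // Prints ABABABAB' on 'XXXXXXXXX: the greedy (.*) spans both quoted values.
        System.out.println(firstCapture("PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING"));
    }
}
```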

Is it even possible to achieve the goal above, and if so, how?

2 Answers


You can use a lookahead just to check that the matched string is followed by "is":

'([^']*)'\s*(?=\bis\b)


Breakdown:

  • ' - single apostrophe
  • ([^']*) - capture group matching 0 or more characters other than '
  • '\s* - a single apostrophe and 0 or more whitespace characters
  • (?=\bis\b) - a lookahead making sure the whole word "is" follows the current position (after the ' and any optional whitespace)
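Applied to every sample line, the pattern should pick out only the value directly before "is". A minimal sketch, assuming each notification is processed as a separate line; the class and method names are illustrative, not from the original.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // '([^']*)' captures the quoted text; the lookahead only checks that the word "is" follows.
    static final Pattern PTRN = Pattern.compile("'([^']*)'\\s*(?=\\bis\\b)");

    // Collects group 1 of every match found in the input.
    static List<String> captures(String input) {
        List<String> result = new ArrayList<>();
        Matcher matcher = PTRN.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "RECOVERY: 'XXXXXXXXX' is UP",
            "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING",
            "PROBLEM: 'XXXXXXXXX' is DOWN",
            "RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK",
            "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is DOWN"
        };
        for (String line : lines) {
            System.out.println(captures(line)); // prints [XXXXXXXXX] for every line
        }
    }
}
```

ABABABAB is never captured because the text after its closing quote is "on", not "is".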

Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern ptrn = Pattern.compile("'([^']*)'\\s*(?=\\bis\\b)");
Matcher matcher = ptrn.matcher("RECOVERY: 'XXXXXXXXX' is UP");
if (matcher.find()) {
    System.out.println(matcher.group(1)); // prints XXXXXXXXX
}

UPDATE

I used a lookahead only because your original regex contained a non-capturing group, (?:is). A non-capturing group with no quantifier and no alternation inside is redundant and can be omitted. However, people are often misled by the name "non-capturing" into thinking it excludes the substring matched by the group from the overall match. It does not: that text is still consumed and is part of the match. To check for the presence or absence of some text without matching it, a lookaround should be used. Thus, I used a lookahead.
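The difference shows up in the overall match (group 0). A small sketch with a hypothetical helper that returns the whole first match for a given regex:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingVsLookahead {
    // Returns the whole match (group 0) of the first match, or null if there is none.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "RECOVERY: 'XXXXXXXXX' is UP";
        // (?:is) still consumes "is": it is part of group 0, it just gets no group number.
        System.out.println(wholeMatch("'([^']*)'\\s*(?:is)", line));     // 'XXXXXXXXX' is
        // The lookahead only checks that "is" follows; group 0 ends before it (with a trailing space).
        System.out.println(wholeMatch("'([^']*)'\\s*(?=\\bis\\b)", line));
    }
}
```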

Indeed, in the current scenario there is no need for a lookahead: it only makes sense when you need to match subsequent substrings that start with the same characters the lookahead checks.

So, a better alternative would be:

'([^']*)'\s*is\b 

Java:

Pattern ptrn = Pattern.compile("'([^']*)'\\s*is\\b"); 
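To check that the simpler pattern behaves the same on this input, the sketch below runs both patterns over the sample lines and compares their group 1 captures. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareDemo {
    static final Pattern WITH_LOOKAHEAD = Pattern.compile("'([^']*)'\\s*(?=\\bis\\b)");
    // Consumes "is" instead of only checking for it; group 1 is unaffected.
    static final Pattern CONSUMING = Pattern.compile("'([^']*)'\\s*is\\b");

    // Collects group 1 of every match of the given pattern.
    static List<String> captures(Pattern p, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "RECOVERY: 'XXXXXXXXX' is UP",
            "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING",
            "RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK"
        };
        for (String line : lines) {
            List<String> a = captures(WITH_LOOKAHEAD, line);
            List<String> b = captures(CONSUMING, line);
            System.out.println(a + " " + b + " same=" + a.equals(b));
        }
    }
}
```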

5 Comments

Why bother using a lookahead? You're using a capturing group to extract the part you want, so what does it matter if the "is" is consumed?
@AlanMoore What is your solution? I'm not sure I understand. Isn't the lookahead the better solution here, since there could also be an ABABAB instead of XXXXXX?
@YassinHajaj: I'm not saying you shouldn't check for is after the quoted word, just that you don't need to use a lookahead. Copy NEO-xx's regex into the code above and you'll get exactly the same result. The lookahead and the word boundaries are both unnecessary complications.
@AlanMoore OK, I get it; I'm new to regex, sorry. So, if I understand correctly, he's using a group for nothing, since he doesn't need to capture it? I'm talking about the "is" group, of course.
@YassinHajaj: I added an explanation of why I chose that lookahead. Also, I posted my answer at midnight and went straight to bed; perhaps I shouldn't have done that (both :)).

The following regex should work:

\'([^']+)\'\s+is 

For each match, the quoted value is available in capture group 1 (matcher.group(1) in Java).
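A minimal sketch of using this regex from Java, with a find() loop that reads group 1 of each match (class and method names are hypothetical). In a Java string literal the backslashes have to be doubled; \' is accepted by java.util.regex as a literal apostrophe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondAnswerDemo {
    // \'([^']+)\'\s+is written as a Java string literal.
    static final Pattern PTRN = Pattern.compile("\\'([^']+)\\'\\s+is");

    // Collects group 1 (the quoted value) of every match.
    static List<String> captures(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PTRN.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(captures("PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING")); // [XXXXXXXXX]
    }
}
```

Unlike the first answer's pattern, this one has no \b after is, so a quoted value followed by a word that merely starts with "is" (for example "isolated") would also match.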
