1

I am trying to parse a few simple lines with a Java regex:

[txt1] [txt2] [txt3] /some/long/path?params=1,2,3
[txt1] [txt2] [txt3] /path/
[txt1] [txt2] [txt3] /

My regex string is ^\[(.*?)\] \[(.*?)\] \[(.*?)\] (/.*)(\?.*).

I am struggling with capturing the last group. With my regex, only the first line matches the pattern, but not the other two. If I change my regex to ^\[(.*?)\] \[(.*?)\] \[(.*?)\] (/.*)(\?.*)? then all 3 lines match, but the first line isn't captured correctly: I get only one group, /some/long/path?params=1,2,3, instead of two, /some/long/path and ?params=1,2,3.

How can I write this regex so that all lines have 5 matching groups?

1
  • What is the expected output for each input? Commented May 11, 2017 at 12:23

2 Answers

3

It is better to use a negated character class in your regex, for both correctness and performance:

^\[([^]]*)\] \[([^]]*)\] \[([^]]*)\] (/[^?]*)(\?.*)?$ 

RegEx Demo

With a negated character class you don't need a lazy quantifier, because [^?]* matches zero or more characters that are not ?, so the path group stops at the first ? by itself.
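As a minimal sketch of how this pattern could be applied to the three sample lines from the question: the class name NegatedClassDemo and the helper parse are made up for illustration, and the ] inside each character class is written as \\] here purely for readability (an assumption on my part, not a change in meaning).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedClassDemo {
    // Pattern from the answer; "]" inside the classes is escaped only for readability.
    static final Pattern LINE = Pattern.compile(
        "^\\[([^\\]]*)\\] \\[([^\\]]*)\\] \\[([^\\]]*)\\] (/[^?]*)(\\?.*)?$");

    // Hypothetical helper: returns groups 1..5, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[5];
        for (int i = 0; i < 5; i++) {
            // Group 5 is null when the line has no "?..." part.
            groups[i] = m.group(i + 1);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[txt1] [txt2] [txt3] /some/long/path?params=1,2,3",
            "[txt1] [txt2] [txt3] /path/",
            "[txt1] [txt2] [txt3] /"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parse(line)));
        }
    }
}
```

Note that the optional fifth group is reported as null, not as an empty string, when the query part is absent, so calling code should check for that.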

Code Demo


1

Make your last-but-one .* lazy, make the last capturing group optional, and append $, the end-of-string anchor:

^\[(.*?)] \[(.*?)] \[(.*?)] (/.*?)(\?.*)?$
                                ^       ^^

See the regex demo

  • The .*? in the (/.*?) group should be lazy so that it stops at the first ? and leaves the rest of the line to the subsequent group.
  • (\?.*)? must be optional, as that text can be absent.
  • $ is necessary because the preceding two groups can both match as little as possible (one is lazy, the other optional), so without it the text at the end of the string might not get matched. With $, we require the regex engine to grab the rest of the line.
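To make the last point concrete, here is a small sketch comparing the pattern with and without the trailing $ on the first sample line. The class name AnchorDemo and the helper pathAndQuery are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // The pattern from this answer, without the trailing "$".
    static final String BASE = "^\\[(.*?)] \\[(.*?)] \\[(.*?)] (/.*?)(\\?.*)?";

    // Hypothetical helper: returns {group 4, group 5} of the first match, or null.
    static String[] pathAndQuery(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? new String[] { m.group(4), m.group(5) } : null;
    }

    public static void main(String[] args) {
        String line = "[txt1] [txt2] [txt3] /some/long/path?params=1,2,3";

        // Without "$" the lazy group stops right after "/" and the optional group stays empty.
        String[] unanchored = pathAndQuery(BASE, line);
        System.out.println("without $: " + unanchored[0] + " , " + unanchored[1]);

        // With "$" the engine has to consume the whole line, so both groups get filled.
        String[] anchored = pathAndQuery(BASE + "$", line);
        System.out.println("with $:    " + anchored[0] + " , " + anchored[1]);
    }
}
```

Without the anchor, group 4 is just / and group 5 is null; with it, they are /some/long/path and ?params=1,2,3.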

See a Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^\\[(.*?)] \\[(.*?)] \\[(.*?)] (/.*?)(\\?.*)?$");
String[] ss = {
    "[txt1] [txt2] [txt3] /some/long/path?params=1,2,3",
    "[txt1] [txt2] [txt3] /path/",
    "[txt1] [txt2] [txt3] /"
};
for (String s : ss) {
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        System.out.println("Next match for \"" + s + "\"");
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
        System.out.println(matcher.group(3));
        System.out.println(matcher.group(4));
        // Group 5 is null when the line has no "?..." part
        System.out.println(matcher.group(5));
    }
}

Output:

Next match for "[txt1] [txt2] [txt3] /some/long/path?params=1,2,3"
txt1
txt2
txt3
/some/long/path
?params=1,2,3
Next match for "[txt1] [txt2] [txt3] /path/"
txt1
txt2
txt3
/path/
null
Next match for "[txt1] [txt2] [txt3] /"
txt1
txt2
txt3
/
null
