So I have a String of text lets say "hello world () (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3..." Is there a regex pattern I could use that will get the parentheses and include any parentheses inside them to nth depth. So the matches would be "()", "(foo bar)", "(foo bar 2 (this looks cozy))", ...?
- In Java regex engine doesn't support recursion so regex may not be best tool here.Pshemo– Pshemo2020-08-09 22:19:43 +00:00Commented Aug 9, 2020 at 22:19
- @Pshemo well is there way to do something similar?NotAFlyingGoose– NotAFlyingGoose2020-08-09 22:25:51 +00:00Commented Aug 9, 2020 at 22:25
- I'd use a loop over the string, not regexEcto– Ecto2020-08-09 22:29:26 +00:00Commented Aug 9, 2020 at 22:29
- 2Thats a classical example for a language that is not regular and thus cannot be parsed by regular expressions (see pumping-lemma). Its context-free.Polygnome– Polygnome2020-08-09 22:59:19 +00:00Commented Aug 9, 2020 at 22:59
- Summary of comments: you need a DPDA, i.e. a parser with an explicit or implicit stack, not a regular expression.user207421– user2074212020-08-09 23:26:09 +00:00Commented Aug 9, 2020 at 23:26
Add a comment |
1 Answer
Regex flavor in Java doesn't support recursion like some other flavors do. Instead you can write your own method which will build strings from characters only if they are:
()- inside parenthesis.
To know if currently handled character is inside parenthesis we can create counter which will check parenthesis balance (you can also think of it as counter for nesting level). In short: if we saw more ( than ) then we are inside unclosed (open) parenthesis section, so we should add current character to resulting string.
Using that idea our code can look like:
String str = "hello world () (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3...)"; List<String> result = new ArrayList<>(); int parenthesisNestingLevel = 0; StringBuilder sb = new StringBuilder(); for (char ch : str.toCharArray()) { if (ch == '(') { parenthesisNestingLevel++; sb.append(ch); } else if (ch == ')') { parenthesisNestingLevel--; sb.append(ch); if (parenthesisNestingLevel == 0) { result.add(sb.toString()); sb.delete(0, sb.length());//reset sb } } else if (parenthesisNestingLevel > 0) {//we are inside unclosed parenthesis sb.append(ch); } } result.forEach(System.out::println); Output:
() (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3...) 4 Comments
user207421
No 'yet' about it. Regular expressions can't support recursion by definition. Chomsky hierarchy 1956.
Pshemo
@MarquisofLorne Thank you for information. Will need to study it since I am not rally familiar with regex history an theories behind it. But now what confuses me (and is main reason why I added "yet") is existence of
(?R) in some regex flavors. Based on example regex101.com/r/NLElp8/1 from stackoverflow.com/a/35271017 it looks like it represents recursion. So are those flavors no longer proper regex?user207421
That is correct. They are no longer proper regular expressions, by definition.
Pshemo
@MarquisofLorne Thank you for clarification.