0

So I have a String of text lets say "hello world () (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3..." Is there a regex pattern I could use that will get the parentheses and include any parentheses inside them to nth depth. So the matches would be "()", "(foo bar)", "(foo bar 2 (this looks cozy))", ...?

5
  • In Java regex engine doesn't support recursion so regex may not be best tool here. Commented Aug 9, 2020 at 22:19
  • @Pshemo well is there way to do something similar? Commented Aug 9, 2020 at 22:25
  • I'd use a loop over the string, not regex Commented Aug 9, 2020 at 22:29
  • 2
    Thats a classical example for a language that is not regular and thus cannot be parsed by regular expressions (see pumping-lemma). Its context-free. Commented Aug 9, 2020 at 22:59
  • Summary of comments: you need a DPDA, i.e. a parser with an explicit or implicit stack, not a regular expression. Commented Aug 9, 2020 at 23:26

1 Answer 1

2

Regex flavor in Java doesn't support recursion like some other flavors do. Instead you can write your own method which will build strings from characters only if they are:

  • (
  • )
  • inside parenthesis.

To know if currently handled character is inside parenthesis we can create counter which will check parenthesis balance (you can also think of it as counter for nesting level). In short: if we saw more ( than ) then we are inside unclosed (open) parenthesis section, so we should add current character to resulting string.

Using that idea our code can look like:

String str = "hello world () (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3...)"; List<String> result = new ArrayList<>(); int parenthesisNestingLevel = 0; StringBuilder sb = new StringBuilder(); for (char ch : str.toCharArray()) { if (ch == '(') { parenthesisNestingLevel++; sb.append(ch); } else if (ch == ')') { parenthesisNestingLevel--; sb.append(ch); if (parenthesisNestingLevel == 0) { result.add(sb.toString()); sb.delete(0, sb.length());//reset sb } } else if (parenthesisNestingLevel > 0) {//we are inside unclosed parenthesis sb.append(ch); } } result.forEach(System.out::println); 

Output:

() (foo bar) (foo bar 2 (this looks cozy)) (foo bar 3...) 
Sign up to request clarification or add additional context in comments.

4 Comments

No 'yet' about it. Regular expressions can't support recursion by definition. Chomsky hierarchy 1956.
@MarquisofLorne Thank you for information. Will need to study it since I am not rally familiar with regex history an theories behind it. But now what confuses me (and is main reason why I added "yet") is existence of (?R) in some regex flavors. Based on example regex101.com/r/NLElp8/1 from stackoverflow.com/a/35271017 it looks like it represents recursion. So are those flavors no longer proper regex?
That is correct. They are no longer proper regular expressions, by definition.
@MarquisofLorne Thank you for clarification.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.