I have the following expression:

String formula7 = "(^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>> : ^INT805.Consensus.Value_<<Year>>) - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010 + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)"; 

Now I want to extract groups from this string like the following:

// First bracket
1. (^TBC10.Actual.Value_<<Year>> == "Final" ? ^FN10101.Actual.Value_<<Year>> : ^INT805.Consensus.Value_<<Year>>)
// Condition
2. TBC10.Actual.Value_<<Year>> == "Final"
// Condition left side
3. TBC10.Actual.Value_<<Year>>
// Condition right side
4. Final
// True condition value
5. FN10101.Actual.Value_<<Year>>
// False condition value
6. INT805.Consensus.Value_<<Year>>
// Other part
7. - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010 + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)

To achieve this, I wrote the following regular expression:

// Start with any characters, then an optional space, then the == sign, then an optional space.
// Then any characters, an optional space, a ?, and an optional space.
// Then any characters, an optional space, a :, and an optional space.
// Then any characters, followed by optional trailing characters.
public static final String CONDITION_REGEX = "((((.*)\\s?==\\s?(.*))\\s?\\?\\s?(.*)\\s?:\\s?(.*))(.*)?)";
public static final Pattern CONDITION_PATTERN = Pattern.compile(CONDITION_REGEX);

But when I run it, I get the following output:

// Remove carets (^), double quotes ("), and all whitespace from the string
String formula = formula7.replaceAll("[\\^\"\\s]", "");
Matcher matcher = CONDITION_PATTERN.matcher(formula);
if (matcher.matches()) {
    // (TBC10.Actual.Value_<<Year>>==Final?FN10101.Actual.Value_<<Year>>:INT805.Consensus.Value_<<Year>>)-(PH2003+PH2005+PH2011+PH2010+PH2837+PH2838+PH2839+PH2006+PH2089)
    String group1 = matcher.group(1);
    // (TBC10.Actual.Value_<<Year>>==Final?FN10101.Actual.Value_<<Year>>:INT805.Consensus.Value_<<Year>>)-(PH2003+PH2005+PH2011+PH2010+PH2837+PH2838+PH2839+PH2006+PH2089)
    String condition = matcher.group(2);
    // (TBC10.Actual.Value_<<Year>>==Final
    String conditionLeftSide = matcher.group(3);
    // (TBC10.Actual.Value_<<Year>>
    String conditionRightSide = matcher.group(4);
    // Final
    String trueCondition = matcher.group(5);
    // FN10101.Actual.Value_<<Year>>
    String condition6 = matcher.group(6);
    // INT805.Consensus.Value_<<Year>>)-(PH2003+PH2005+PH2011+PH2010+PH2837+PH2838+PH2839+PH2006+PH2089)
    String condition7 = matcher.group(7);
    // ""
    String condition8 = matcher.group(8);
}

What am I doing wrong, and how can I correct the regex to achieve this result? As I understand it, () denotes a group in a regex, so I wrapped each part I want to extract in ().

  • This is not a regex task; you'd be better off writing a dedicated parser or using an existing one, if any. However, if you want to struggle a bit more, here is a suggestion. Commented Apr 15 at 7:00
  • @WiktorStribiżew please post this as an answer and I will mark it as accepted. Thanks. Commented Apr 15 at 9:15
  • You know, this expression is too brittle, and there will surely be strings that won't match it. I just used my best guess for the provided example, and I only provided it for you to see how impractical it is to follow the regex approach here. Commented Apr 15 at 9:52
  • @WiktorStribiżew Basically, it's not just this string. In fact, I don't know what the string will be; I only provided an example so you can understand the problem. There are already regex patterns in place, and if a string matches one of them, I need to take some action. You are right that it is brittle, but I need it for now, and I learned from it :) Thanks. Commented Apr 15 at 11:27
  • The parts you would like to match in your example can't really be quantified. Can you add a distinctive framework that generalizes what you expect to match? Commented Apr 16 at 21:11

2 Answers

I am posting my answer since it was my suggestion that turned out to be the solution.

(\(\^(([^()]*)\s(?:[><]=?|==)\s*\"([^\"]*)\")\s*\?\s*\^(\S+)\s+:\s+\^([^()]*)\))(\s+-\s+\([^()]*\)) 

See the regex demo.

Details:

  • (\(\^(([^()]*)\s(?:[><]=?|==)\s*\"([^\"]*)\")\s*\?\s*\^(\S+)\s+:\s+\^([^()]*)\)) - Group 1, matching:
  • \(\^ - a (^ string
  • (([^()]*)\s(?:[><]=?|==)\s*\"([^\"]*)\") - Group 2, matching:
    • ([^()]*) - Group 3: any zero or more chars other than ( and )
    • \s - a whitespace
    • (?:[><]=?|==) - < or > followed with an optional = or a == string
    • \s* - zero or more whitespaces
    • \" - a " char
    • ([^\"]*) - Group 4: zero or more chars other than "
    • \" - a " char
  • \s*\?\s* - a literal ? char enclosed with zero or more whitespaces
  • \^ - a literal ^ char
  • (\S+) - Group 5: one or more non-whitespace chars
  • \s+:\s+ - a : char enclosed with one or more whitespaces
  • \^ - a literal ^ char
  • ([^()]*) - Group 6: any zero or more chars other than ( and )
  • \) - a literal ) char
  • (\s+-\s+\([^()]*\)) - Group 7:
    • \s+-\s+ - a - char enclosed with one or more whitespaces
    • \( - a ( char
    • [^()]* - any zero or more chars other than ( and )
    • \) - a ) char
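
As a rough illustration of how the pattern above might be used from Java, here is a minimal sketch. The class name `NumberedGroupsDemo` and the helper `extract` are made up for this example and are not part of the question. It assumes the pattern is applied to the original `formula7` string, because the pattern relies on the `^` and `"` characters that the question's code strips out beforehand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedGroupsDemo {
    // The regex from this answer, escaped as a Java string literal.
    static final Pattern FORMULA = Pattern.compile(
        "(\\(\\^(([^()]*)\\s(?:[><]=?|==)\\s*\"([^\"]*)\")\\s*\\?\\s*\\^(\\S+)\\s+:\\s+\\^([^()]*)\\))"
        + "(\\s+-\\s+\\([^()]*\\))");

    // Hypothetical helper: returns groups 0..7, or null when the formula does not match.
    public static String[] extract(String formula) {
        Matcher m = FORMULA.matcher(formula);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String formula7 = "(^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>>"
            + " : ^INT805.Consensus.Value_<<Year>>) - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010"
            + " + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)";
        String[] groups = extract(formula7);
        if (groups == null) {
            System.out.println("No match");
            return;
        }
        // Print groups 1..7 in the order described in the details list.
        for (int i = 1; i < groups.length; i++) {
            System.out.println(i + ": " + groups[i]);
        }
    }
}
```

Group 7 keeps the whitespace in front of the `-`, so call `trim()` on it if the leading space is unwanted.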

APPROACH:

  • I matched the contents of the two pairs of parentheses and captured the desired strings into named capture groups, (?<name>...), for clarity.
  • I used negated character classes, [^...]+, between the anchors instead of .* to be more specific and to have more control over what is matched between the anchors.
  • I used the following strings/characters as anchors:
    • Beginning of string ^.
    • (^ Begins the first bracket.
    • == Separates the left and right sides of the condition.
    • ? Separates the condition from the values.
    • : Separates the true and false values.
    • ) End first bracket.
    • - ( Begin Other Parts, second bracket.
    • ) End match.
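
To illustrate why negated character classes give more control than .*, here is a small sketch. The two patterns and the sample input are simplified stand-ins invented for this example, not the patterns from the question or from this answer. With the greedy pattern, the last `(.*)` runs to the end of the input; with negated classes, each capture stops at the next anchor character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsNegatedDemo {
    // Greedy version: the last (.*) swallows everything up to the end of the input.
    static final Pattern GREEDY = Pattern.compile("(.*)\\?(.*):(.*)");
    // Negated classes: each capture stops at the next anchor character (?, :, or )).
    static final Pattern NEGATED = Pattern.compile("\\(([^?]+)\\?([^:]+):([^)]+)\\).*");

    // Hypothetical helper: returns the third capture (the "false" branch), trimmed,
    // or null when the whole input does not match.
    public static String falseBranch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.matches() ? m.group(3).trim() : null;
    }

    public static void main(String[] args) {
        String input = "(a == b ? c : d) - (e + f)"; // made-up sample input
        System.out.println(falseBranch(GREEDY, input));  // prints "d) - (e + f)"
        System.out.println(falseBranch(NEGATED, input)); // prints "d"
    }
}
```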

REGEX PATTERN (Java 8 Flavor):

(?<firstBracket>^\(\^(?<condition>(?<conditionLeftSide>[^=\n]+[^= ])[ ]*?==[ ]*(?<conditionRightSide>[^? ][^?]+[^? ]))[ ]*[?][ ]*(?<trueConditionValue>[^:]+[^: ])[ ]*:[ ]*(?<falseConditionValue>[^)]+)\))[ ]*(?<otherPart>-[ ]*\([^)]+\))

Regex Demo: https://regex101.com/r/yFC663/3

TEST STRING:

(^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>> : ^INT805.Consensus.Value_<<Year>>) - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010 + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089) 

MATCH AND GROUPS:

MATCH 0-203 (^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>> : ^INT805.Consensus.Value_<<Year>>) - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010 + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)
firstBracket 0-111 (^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>> : ^INT805.Consensus.Value_<<Year>>)
condition 2-42 TBC10.Actual.Value_<<Year>> == \"Final\"
conditionLeftSide 2-29 TBC10.Actual.Value_<<Year>>
conditionRightSide 33-42 \"Final\"
trueConditionValue 45-75 ^FN10101.Actual.Value_<<Year>>
falseConditionValue 78-110 ^INT805.Consensus.Value_<<Year>>
otherPart 112-203 - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010 + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)

REGEX PATTERN NOTES:

  • (?<firstBracket> Begin named capture group, (?<name>...).
    • ^ Match beginning of string.
    • \( Match literal (.
    • \^ Match literal ^.
    • (?<condition> Begin named capture group.
      • (?<conditionLeftSide> Begin named capture group.
        • [^=\n]+ Negated character class [^...]. Match any character that is not literal = or newline character \n 1 or more times (+).
        • [^= ] Negated character class [^...]. Match one character that is not literal = or literal space character, to make sure the capture does not end in a space.
      • )
      • [ ]*? Match literal space character 0 or more times, as few as possible (*?).
      • == Match literal ==.
      • [ ]* Match literal space character 0 or more times (*).
      • (?<conditionRightSide> Begin named capture group.
        • [^? ] Negated character class [^...]. Match any character that is not literal ? or literal space character to make sure the capture does not begin with a space.
        • [^?]+ Negated character class [^...]. Match any character that is not literal ? 1 or more times (+).
        • [^? ] Negated character class [^...]. Match any character that is not literal ? or literal space character to make sure the capture does not end in a space.
      • )
    • )
    • [ ]* Match literal space character 0 or more times (*).
    • [?] Match literal ?.
    • [ ]* Match literal space character 0 or more times (*).
    • (?<trueConditionValue> Begin named capture group.
      • [^:]+ Negated character class [^...]. Match any character that is not literal : 1 or more times (+).
      • [^: ] Negated character class [^...]. Match any character that is not literal : or literal space character to make sure the capture does not end in a space.
    • )
    • [ ]* Match literal space character 0 or more times (*).
    • : Match literal :.
    • [ ]* Match literal space character 0 or more times (*).
    • (?<falseConditionValue> Begin named capture group.
      • [^)]+ Negated character class [^...]. Match any character that is not literal ) 1 or more times (+).
    • )
    • \) Match literal ).
  • )
  • [ ]* Match literal space character 0 or more times (*).
  • (?<otherPart> Begin named capture group.
    • - Match literal -.
    • [ ]* Match literal space character 0 or more times (*).
    • \( Match literal (.
    • [^)]+ Negated character class [^...]. Match anything that is not literal ) 1 or more times (+).
    • \) Match literal ).
  • )
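
A minimal Java sketch of reading the named groups is shown below. The class name `NamedGroupsDemo`, the `NAMES` array, and the `extract` helper are hypothetical, introduced only for this illustration. It assumes the pattern is applied to the original formula string, with `^` and `"` still present, rather than to the stripped version from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupsDemo {
    // The named-group pattern from this answer, escaped as a Java string literal.
    static final Pattern FORMULA = Pattern.compile(
        "(?<firstBracket>^\\(\\^(?<condition>(?<conditionLeftSide>[^=\\n]+[^= ])[ ]*?==[ ]*"
        + "(?<conditionRightSide>[^? ][^?]+[^? ]))[ ]*[?][ ]*"
        + "(?<trueConditionValue>[^:]+[^: ])[ ]*:[ ]*"
        + "(?<falseConditionValue>[^)]+)\\))[ ]*"
        + "(?<otherPart>-[ ]*\\([^)]+\\))");

    // Group names in the order they appear in the pattern.
    static final String[] NAMES = {
        "firstBracket", "condition", "conditionLeftSide", "conditionRightSide",
        "trueConditionValue", "falseConditionValue", "otherPart"
    };

    // Hypothetical helper: maps each group name to its captured text;
    // the map is empty when the formula does not match.
    public static Map<String, String> extract(String formula) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = FORMULA.matcher(formula);
        if (m.find()) {
            for (String name : NAMES) {
                result.put(name, m.group(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String formula7 = "(^TBC10.Actual.Value_<<Year>> == \"Final\" ? ^FN10101.Actual.Value_<<Year>>"
            + " : ^INT805.Consensus.Value_<<Year>>) - (^PH2003 + ^PH2005 + ^PH2011 + ^PH2010"
            + " + ^PH2837 + ^PH2838 + ^PH2839 + ^PH2006 + ^PH2089)";
        extract(formula7).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

With this pattern the right side of the condition keeps its double quotes, and the true and false values keep their leading `^`, so strip those afterwards if only the bare names are needed.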
