
I am trying to extract both the XML tags and the text inside them using regex. I understand that regex is not the best tool for this, but my inline text contains only a few tags, so I did not opt for an XML parser.

    String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in <DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";

    String re1 = "<([^>]+)>";    // Tag 1
    String re2 = "([^<]*)";      // Variable Name 1
    String re3 = "</([^>]+)>";   // Tag 2
    // String re3 = re1;

    Pattern p = Pattern.compile(re1 + re2 + re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    if (m.find()) {
        String tag1 = m.group(1);
        String var1 = m.group(2);
        System.out.println(tag1.toString());
        System.out.println(var1.toString());
    }

The problem is that it identifies only the first tag, not the second or any subsequent ones.

Current Output

    TRIPS
    100

Desired Output

    TRIPS
    100
    DATE
    December
    ROUTE
    Altantic
  • Use <([^>]*)>(.*?)<\/\1> & extract second group. Commented Oct 17, 2016 at 4:51
  • Change if (m.find()) to while (m.find()) Commented Oct 17, 2016 at 4:53
  • Close the TRIPS element properly like <TRIPS> 100 </TRIPS>, and use the commented-out version of re3. Otherwise you will not be able to match the other elements that are properly closed. Commented Oct 17, 2016 at 4:55
  • RegEx match open tags except XHTML self-contained tags (Tony the pony) Commented Oct 17, 2016 at 4:58
  • I hope you realise what you are doing. You are writing an application that will only process the XML if it is written in a very particular way. You will thus become the cause of a dozen SO questions from people asking how to generate XML with this very particular lexical form, because the consuming application only works if it is written in this particular way. There is a reason for standards, and this kind of abuse of standards leads to everyone in the industry incurring increased costs. Commented Oct 17, 2016 at 7:42

2 Answers


Change if to while, so that the loop visits every match instead of stopping after the first one:

    String txt = "American Airlines made <TRIPS> 100 <TRIPS> flights in <DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";

    String re1 = "<([^>]+)>";    // Tag 1
    String re2 = "([^<]*)";      // Variable Name 1
    // String re3="</([^>]+)>";  // Tag 2
    String re3 = re1;

    Pattern p = Pattern.compile(re1 + re2 + re3, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    Matcher m = p.matcher(txt);
    while (m.find()) {
        String tag1 = m.group(1);
        String var1 = m.group(2);
        System.out.println(tag1.toString());
        System.out.println(var1.toString());
    }
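For reference, here is a self-contained sketch of the same loop. It assumes the well-formed input from the question (with the TRIPS element closed by </TRIPS>) and the closing-tag pattern for re3. The class name TagExtractor and the extract helper are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class TagExtractor {
    // Opening tag, text up to the next '<', then a closing tag.
    private static final Pattern TAGGED_TEXT =
            Pattern.compile("<([^>]+)>([^<]*)</([^>]+)>");

    /** Returns one "TAG text" entry per match, with the text trimmed. */
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = TAGGED_TEXT.matcher(input);
        while (m.find()) { // each call moves on to the next match
            results.add(m.group(1) + " " + m.group(2).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in "
                + "<DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";
        // Prints: TRIPS 100, DATE December, ROUTE Altantic (one per line)
        extract(txt).forEach(System.out::println);
    }
}
```

Returning a list instead of printing inside the loop keeps the matching logic reusable. The trim() call is an addition here that drops the spaces around each value.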

If you came to this post looking for a way to parse XML, don't read this. Use an XML parser instead.
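As a rough illustration of the parser route, the sketch below uses the JDK's built-in DOM parser. It assumes the inline text becomes well-formed XML once wrapped in a synthetic root element, which is an assumption of this example and not something the question guarantees. Text containing a bare & or < would be rejected. The class name ParserTagExtractor is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named for illustration only.
class ParserTagExtractor {
    /** Returns one "TAG text" entry per top-level element in the inline text. */
    static List<String> extract(String inlineText) {
        // Assumption: a synthetic <root> wrapper makes the fragment a document.
        String xml = "<root>" + inlineText + "</root>";
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            List<String> results = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip the plain text between elements; keep only the tags.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    results.add(child.getNodeName() + " "
                            + child.getTextContent().trim());
                }
            }
            return results;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in "
                + "<DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";
        extract(txt).forEach(System.out::println);
    }
}
```

Unlike the regex, the parser rejects mismatched or unclosed tags outright instead of silently skipping them.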


Solution:

Change if (m.find()) to while (m.find()). Each call to find() advances to the next match, so the loop visits every match in the input.

The general pattern for iterating over all regex matches is:

    Pattern p = Pattern.compile(regex, flags);
    Matcher m = p.matcher(text);
    while (m.find()) {
        System.out.println("First group: " + m.group(1) + "\nSecond group: " + m.group(2));
    }
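One of the comments on the question suggests a backreference so that a closing tag must carry the same name as its opening tag. A minimal sketch of that variant follows. The class name BackrefTagExtractor and the map return type are choices made for this example. The character class [^>/] is a small tightening of the commenter's pattern, so that a closing tag is never taken for an opening one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class BackrefTagExtractor {
    // \1 requires the closing tag name to equal the captured opening tag name.
    private static final Pattern ELEMENT =
            Pattern.compile("<([^>/]+)>(.*?)</\\1>", Pattern.DOTALL);

    /**
     * Maps each tag name to its trimmed text, in order of appearance.
     * A repeated tag name overwrites the earlier value.
     */
    static Map<String, String> extract(String input) {
        Map<String, String> byTag = new LinkedHashMap<>();
        Matcher m = ELEMENT.matcher(input);
        while (m.find()) {
            byTag.put(m.group(1), m.group(2).trim());
        }
        return byTag;
    }

    public static void main(String[] args) {
        String txt = "American Airlines made <TRIPS> 100 </TRIPS> flights in "
                + "<DATE> December </DATE> over <ROUTE> Altantic </ROUTE> ";
        // Prints {TRIPS=100, DATE=December, ROUTE=Altantic}
        System.out.println(extract(txt));
    }
}
```

With this pattern, a fragment such as <A> 1 </B> is skipped because the tag names differ. It still does not handle nested elements or attributes, which is where an XML parser becomes the safer choice.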
