Extract content between XML tags

Question

I have this XML file:

<ApiHeader> <OperationName>findEntitiesResponse</OperationName> </ApiHeader> <ResponseHeader> <CompletedSuccessfully>true</CompletedSuccessfully> </ResponseHeader> <Page> <StartAtRow>0</StartAtRow> <MaxRows>999999</MaxRows> <TotalRowCount>44</TotalRowCount> </Page> <Entity> <Carrier>xd <Id>11460</Id> <CarrierCode>11460</CarrierCode> <CarrierDescription>11460 LOGIS COUTTER</CarrierDescription> <LanguageCode>en</LanguageCode> <LanguageCodeDescr>Inglés</LanguageCodeDescr> <CarrierTypeCode>GENERAL</CarrierTypeCode> <CarrierTypeCodeDescr>GENERAL</CarrierTypeCodeDescr> <SCACCode>Default</SCACCode> </Memo> </Carrier> </Entity> <Entity>

There are a lot of <Entity>CONTENT</Entity> blocks like the one in the example, but I kept it simple.

What I'm trying to do is extract everything between the <Entity></Entity> tags. I've done a lot of research but the closest thing I've found is extracting content from just one tag.

And the result would be this:

<Entity> <Carrier>xd <Id>11460</Id> <CarrierCode>11460</CarrierCode> <CarrierDescription>11460 LOGIS COUTTER</CarrierDescription> <LanguageCode>en</LanguageCode> <LanguageCodeDescr>Inglés</LanguageCodeDescr> <CarrierTypeCode>GENERAL</CarrierTypeCode> <CarrierTypeCodeDescr>GENERAL</CarrierTypeCodeDescr> <SCACCode>Default</SCACCode> </Memo> </Carrier> </Entity>

Remember that there could be one or more <Entity></Entity> tags.

Thank you very much.

EDIT

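```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReadXMLFile {

    // Backslashes in a Java string literal have to be escaped.
    private static final String filepath = "C:\\Users\\AGOJSO\\Desktop\\jordi\\test.xml";

    public static void main(String[] args) {
        printXml();
    }

    public static void printXml() {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try (InputStream in = new FileInputStream(filepath)) {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(in);
            NodeList list = filterNodesByXPath(doc, "//root/Entity");
            for (int i = 0; i < list.getLength(); i++) {
                Node node = list.item(i);
                printNode(node);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static NodeList filterNodesByXPath(Document doc, String xpathExpr) {
        try {
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();
            XPathExpression expr = xpath.compile(xpathExpr);
            Object eval = expr.evaluate(doc, XPathConstants.NODESET);
            return (NodeList) eval;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static void printNode(Node node)
            throws TransformerFactoryConfigurationError, TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamResult result = new StreamResult(new StringWriter());
        DOMSource source = new DOMSource(node);
        transformer.transform(source, result);
        String xmlString = result.getWriter().toString();
        System.out.println(xmlString);
    }
}
```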

It doesn't print any errors; it seems to be doing nothing.

Yes, I did. See the edited question for my solution.
– Jorge Luís Segura Oñate, Commented Oct 9, 2018 at 16:03
Do not edit an answer into your question. If your own answer is sufficiently different from the one(s) given, you can always post it as such.
– Jongware, Commented Oct 10, 2018 at 9:15

jschnasse · Accepted Answer · 2018-10-10 09:10:42Z

You could do it the good old way.

1. Read XML to DOM
2. Use XPath to extract the proper part
3. Print it out ... or do whatever you like

Code:

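```java
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.junit.Test; // assumes JUnit 4; with JUnit 5 use org.junit.jupiter.api.Test
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The class name is arbitrary; the original snippet showed only the methods.
public class PrintXmlTest {

    @Test
    public void printXml() {
        String yourSampleFile = "52720162.xml";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // The name is resolved against the classpath, not the file system.
        // If the resource is not found, the stream is null and parse() fails.
        try (InputStream in = Thread.currentThread().getContextClassLoader()
                .getResourceAsStream(yourSampleFile)) {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(in);
            // Assumes the sample was wrapped in a <root> element to make it well formed.
            NodeList list = filterNodesByXPath(doc, "//root/Entity");
            for (int i = 0; i < list.getLength(); i++) {
                Node node = list.item(i);
                printNode(node);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates the XPath expression against the document and returns the matching nodes.
    private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
        try {
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xpath = xPathFactory.newXPath();
            XPathExpression expr = xpath.compile(xpathExpr);
            Object eval = expr.evaluate(doc, XPathConstants.NODESET);
            return (NodeList) eval;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Serializes one node (including its children) to indented XML and prints it.
    private void printNode(Node node)
            throws TransformerFactoryConfigurationError, TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamResult result = new StreamResult(new StringWriter());
        DOMSource source = new DOMSource(node);
        transformer.transform(source, result);
        String xmlString = result.getWriter().toString();
        System.out.println(xmlString);
    }
}
```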

A somewhat generalized form can be found at: How to read XML using XPath in Java
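As a rough illustration of what a more general form might look like, here is a minimal sketch that takes the XML as a string and returns each matching node as text instead of printing it. The class name `EntityExtractor`, the method `extract`, and the sample data are made up for this example; the root element name `CISDocument` is taken from the comments below. Using `//Entity` rather than `//root/Entity` avoids depending on what the root element is called.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the answer above.
class EntityExtractor {

    /** Returns every node matched by xpathExpr, serialized back to XML text. */
    static List<String> extract(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpathExpr, doc, XPathConstants.NODESET);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Suppress the <?xml ...?> header that would otherwise precede each fragment.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            List<String> fragments = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                StringWriter writer = new StringWriter();
                transformer.transform(new DOMSource(nodes.item(i)), new StreamResult(writer));
                fragments.add(writer.toString());
            }
            return fragments;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data with two Entity elements under a CISDocument root.
        String xml = "<CISDocument>"
                + "<Entity><Carrier><Id>11460</Id></Carrier></Entity>"
                + "<Entity><Carrier><Id>11461</Id></Carrier></Entity>"
                + "</CISDocument>";
        // "//Entity" matches Entity elements at any depth, whatever the root is called.
        extract(xml, "//Entity").forEach(System.out::println);
    }
}
```

With this input, `extract(xml, "//root/Entity")` returns an empty list because there is no `root` element, which is the same "no errors, no output" symptom described in the question.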

All right, I'm going to try out your code. I'll let you know how it goes. Thanks for commenting on my post.
Actually, it didn't work. Starting from your code, I changed the String yourSampleFile to the full path of my XML file and made the methods static so I could call them from `public static void main(String[] args)`. Now I'm getting this error: Caused by: java.lang.IllegalArgumentException: InputStream cannot be null. I guess there's something wrong with my file path, but I don't know what; the path is correct and the file exists.
All right, it turns out the problem was the absolute path: .getResourceAsStream(yourSampleFile) resolves paths relative to the classpath. I changed it to try (InputStream in = new FileInputStream(filepath)), and now it doesn't print any errors, but I see no output. See the edit on the original post for the Java code.
Never mind what I said, I'm an idiot; your code was perfect. I created a new class with your code and changed this line to `NodeList list = filterNodesByXPath(doc, "CISDocument/Entity");`. I had left the XPath just as you wrote it, so it wasn't finding the parent/child node structure. Thanks man, you are my savior.
Cool! It worked. I forgot to mention that I had to wrap your XML in order to make it well formed. But you found out. With currentThread().getContextClassLoader() you have a very reliable pattern to access resources that are on the Java classpath, or on a path relative to a directory on the classpath.
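The classpath-versus-file-path confusion in this thread could be sidestepped with a small helper along these lines. This is only a sketch: the class `XmlSource` and the method `open` are hypothetical names, and trying the classpath first with a file system fallback is one possible design choice, not something the answer prescribes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of the answer above.
class XmlSource {

    /** Opens location as a classpath resource if one exists, otherwise as a file path. */
    static InputStream open(String location) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // getResourceAsStream returns null (it does not throw) when nothing is found.
        InputStream in = (loader == null) ? null : loader.getResourceAsStream(location);
        if (in != null) {
            return in; // found on the classpath
        }
        try {
            // Not a classpath resource: treat the location as a file system path.
            return Files.newInputStream(Paths.get(location));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot open " + location, e);
        }
    }
}
```

A caller would use `try (InputStream in = XmlSource.open(path)) { ... }` in place of calling `getResourceAsStream` or `new FileInputStream` directly, and would get an exception naming the location instead of a null stream when the file cannot be found.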
