I'm parsing an XML file with Node.js and RegExp, but I can't find a way to extract all children of a given parent. For example, I need every FormalName="(.+)" value under the parent PARENT1:

<TopicSet FormalName="PARENT1">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>

I tried this:

<TopicSet FormalName="PARENT1">(?:(?:\s|\S)*?)TopicType FormalName="(.+)"(?:(?:\s|\S)*?)<\/TopicSet>

But it only returns the first occurrence (Child1) under PARENT1, not Child1, Child2 and Child3.

https://regex101.com/r/3ESH29/2/

2 Answers

It is not advisable to parse XML with a regex.

Instead, you can use a DOMParser and, for example, querySelectorAll to get the FormalName values under PARENT1:

Example using jsdom

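// DOMParser is not a Node.js global; take it from a jsdom window (npm install jsdom).
const { JSDOM } = require("jsdom");
const { DOMParser } = new JSDOM().window;

// An XML document needs exactly one root element, so the two TopicSet
// blocks are wrapped in a <root> element here.
let xml = `<root>
<TopicSet FormalName="PARENT1">
  <Topic> <TopicType FormalName="Child1" /> </Topic>
  <Topic> <TopicType FormalName="Child2" /> </Topic>
  <Topic> <TopicType FormalName="Child3" /> </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic> <TopicType FormalName="Child1" /> </Topic>
  <Topic> <TopicType FormalName="Child2" /> </Topic>
  <Topic> <TopicType FormalName="Child3" /> </Topic>
</TopicSet>
</root>`;

let parser = new DOMParser();
let doc = parser.parseFromString(xml, "text/xml");

// Select only the TopicType elements nested inside PARENT1.
let res = doc.querySelectorAll("TopicSet[FormalName='PARENT1'] Topic TopicType");
res.forEach(e => console.log(e.getAttribute("FormalName")));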

It may not be the best idea to do this with regular expressions. However, if you have to, you can create three capturing groups, using the parent's opening and closing tags as the left and right boundaries and capturing everything in between:

(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>) 

RegEx

If this isn't the expression you wanted, you can modify and test it on regex101.com.

RegEx Circuit

You can also visualize your expressions on jex.im.

JavaScript Demo

const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>`;
const subst = `$2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

JavaScript Demo 2

If you also want to keep the parent tags in the output, replace with $1$2$3 instead of $2. The opening tag, body and closing tag are captured as separate groups to make this easy:

const regex = /(<TopicSet.*?>)([\s\S]*?)(<\/TopicSet>)/mg;
const str = `<TopicSet FormalName="PARENT1">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic>
    <TopicType FormalName="Child1" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child2" />
  </Topic>
  <Topic>
    <TopicType FormalName="Child3" />
  </Topic>
</TopicSet>`;
const subst = `$1$2$3`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

If you only want to extract the PARENT1 block, you can match its opening tag literally as the left boundary:

(<TopicSet FormalName="PARENT1">)([\s\S]*?)(<\/TopicSet>) 
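
A single capturing group yields only one value per match, so getting every child name takes a second pass over the captured body. The following is a minimal sketch of that two-step idea, not a definitive implementation: the function name childFormalNames and the inner pattern for TopicType are assumptions added for illustration, and the sample data is the XML from the question.

```javascript
// Sample data from the question.
const xml = `<TopicSet FormalName="PARENT1">
  <Topic> <TopicType FormalName="Child1" /> </Topic>
  <Topic> <TopicType FormalName="Child2" /> </Topic>
  <Topic> <TopicType FormalName="Child3" /> </Topic>
</TopicSet>
<TopicSet FormalName="PARENT2">
  <Topic> <TopicType FormalName="Child1" /> </Topic>
  <Topic> <TopicType FormalName="Child2" /> </Topic>
  <Topic> <TopicType FormalName="Child3" /> </Topic>
</TopicSet>`;

// Hypothetical helper: returns the FormalName of every TopicType inside PARENT1.
function childFormalNames(source) {
  // Step 1: isolate the PARENT1 block; group 2 is everything between its tags.
  const block = /(<TopicSet FormalName="PARENT1">)([\s\S]*?)(<\/TopicSet>)/.exec(source);
  if (!block) return [];

  // Step 2: scan only that body with a global pattern (assumed attribute layout)
  // and collect capture group 1 of each match.
  const inner = /<TopicType FormalName="([^"]+)"/g;
  return Array.from(block[2].matchAll(inner), m => m[1]);
}

console.log(childFormalNames(xml)); // logs [ 'Child1', 'Child2', 'Child3' ]
```

String.prototype.matchAll needs Node.js 12 or later. The sketch assumes TopicSet elements are not nested and that attributes are written exactly as in the sample; anything less regular is better handled by a real XML parser.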
