Regex python encoded xml

Question

I have an html encoded xml payload where I'd like to use regular expressions to extract a certain piece of that xml and write that to a file. I know it's generally not good practice to use regex for xml but I think this a special use case.

Anyway here is a sample encoded xml:

&lt;root&gt; &lt;parent&gt; &lt;test1&gt; &lt;another&gt; &lt;subelement&gt; &lt;value&gt;hello&lt;/value&gt; &lt;/subelement&gt; &lt;/another&gt; &lt;/test1&gt; &lt;/parent&gt; &lt;/root&gt;

I ultimately want my result to be:

<test1> <another> <subelement> <value>hello</value> </subelement> </another> </test1>

Here is my implementation in python to extract out all the text between the <test1> and </test1> inclusively:

import html import re file_stream = open('/path/to/test.xmp', 'r') raw_data = file_stream.read() escaped_raw_data = html.unescape(raw_data) result = re.search(r"<test1[\s\S]*?<\/test1>", escaped_raw_data)

However I get no matches for result, what am I doing wrong? How to accomplish my goal?

In your regex, Instead of ., use [\s\S] because . does not match newlines — Gurmanjot Singh
– Gurmanjot Singh, Commented Feb 4, 2018 at 3:18
@Gurman result = re.search(r"<test1[\s\S]*?<\/test1>", escaped_raw_data) still I get a result of None — mosawi
– mosawi, Commented Feb 4, 2018 at 3:19
Yes. But, I am not proficient in Python. Was just trying to help you with the regex. Let python experts also see your question and attempt answers — Gurmanjot Singh
– Gurmanjot Singh, Commented Feb 4, 2018 at 3:26
Try result = re.search(r"<test1>.*<\/test1>", escaped_raw_data, re.DOTALL). — accdias
– accdias, Commented Feb 4, 2018 at 3:43

accdias · Accepted Answer · 2018-02-04 03:54:17Z

This works for me:

import html import re raw_data = ''' &lt;root&gt; &lt;parent&gt; &lt;test1&gt; &lt;another&gt; &lt;subelement&gt; &lt;value&gt;hello&lt;/value&gt; &lt;/subelement&gt; &lt;/another&gt; &lt;/test1&gt; &lt;/parent&gt; &lt;/root&gt; ''' escaped_raw_data = html.unescape(raw_data) result = re.search(r'(<test1>.*</test1>)', escaped_raw_data, re.DOTALL) if result: print(result.group(0))

Collectives™ on Stack Overflow

Regex python encoded xml

1 Answer 1

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Linked

Related