
I have an HTML-encoded XML payload, and I'd like to use regular expressions to extract a certain piece of that XML and write it to a file. I know it's generally not good practice to use regex for XML, but I think this is a special use case.

Here is a sample of the encoded XML:

```
&lt;root&gt; &lt;parent&gt; &lt;test1&gt; &lt;another&gt; &lt;subelement&gt; &lt;value&gt;hello&lt;/value&gt; &lt;/subelement&gt; &lt;/another&gt; &lt;/test1&gt; &lt;/parent&gt; &lt;/root&gt;
```

I ultimately want my result to be:

```
<test1> <another> <subelement> <value>hello</value> </subelement> </another> </test1>
```
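For comparison with the regex approach, here is a minimal sketch that produces the same fragment with the standard-library XML parser `xml.etree.ElementTree` instead of a regex. It assumes the payload is HTML-encoded exactly once and that, after decoding, it is well-formed XML; the variable names are illustrative, not from the question.

```python
import html
import xml.etree.ElementTree as ET

# Sample payload: XML that has been HTML-encoded once (assumed, as in the question).
encoded = (
    "&lt;root&gt; &lt;parent&gt; &lt;test1&gt; &lt;another&gt; &lt;subelement&gt; "
    "&lt;value&gt;hello&lt;/value&gt; &lt;/subelement&gt; &lt;/another&gt; "
    "&lt;/test1&gt; &lt;/parent&gt; &lt;/root&gt;"
)

# Decode &lt; / &gt; into < / >, then parse the result as XML.
root = ET.fromstring(html.unescape(encoded))

# ".//test1" searches all descendants of root for the first <test1> element.
test1 = root.find(".//test1")
if test1 is None:
    raise ValueError("no <test1> element found")

# tostring() also emits the element's tail text; drop it so the fragment ends at </test1>.
test1.tail = None
fragment = ET.tostring(test1, encoding="unicode")
print(fragment)
```

Unlike a regex, the parser will raise an error on malformed input rather than silently returning a partial or empty match, which may or may not be what you want for this payload.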

Here is my Python implementation to extract all the text from `<test1>` through `</test1>`, inclusive:

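```python
import html
import re

# Read the HTML-encoded payload from disk.
with open('/path/to/test.xmp', 'r') as file_stream:
    raw_data = file_stream.read()

# Decode entities such as &lt; and &gt; back into < and >.
escaped_raw_data = html.unescape(raw_data)

# Match from <test1 up to the first </test1>, across newlines.
result = re.search(r"<test1[\s\S]*?<\/test1>", escaped_raw_data)
```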

However, `result` is always `None` (no match). What am I doing wrong, and how can I accomplish my goal?
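The question does not show the raw file contents, so the cause of the failed match is not known. One possibility worth ruling out (an assumption, not confirmed by the question) is that the payload is HTML-encoded more than once, e.g. `&amp;lt;test1&amp;gt;`. A single `html.unescape` call then leaves `&lt;test1&gt;`, and the pattern finds nothing. Printing `repr(escaped_raw_data[:200])` is a quick way to check. The sketch below uses a hypothetical helper, `fully_unescape`, that decodes until the text stops changing:

```python
import html
import re

def fully_unescape(text: str, max_rounds: int = 5) -> str:
    """Apply html.unescape repeatedly until the text stops changing.

    Hypothetical helper; max_rounds is an arbitrary safety limit.
    """
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text

# Hypothetical doubly encoded input: "&amp;lt;" decodes to "&lt;", then to "<".
doubly_encoded = "&amp;lt;test1&amp;gt;hello&amp;lt;/test1&amp;gt;"

once = html.unescape(doubly_encoded)
print(once)  # still entity-encoded after one pass
print(re.search(r"<test1[\s\S]*?</test1>", once))  # no match

full = fully_unescape(doubly_encoded)
match = re.search(r"<test1[\s\S]*?</test1>", full)
print(match.group(0))
```

Use this only if the data really is multiply encoded: decoding too many times would also turn a legitimate XML escape such as `&amp;` in text content into a bare `&`.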

  • In your regex, instead of `.`, use `[\s\S]`, because `.` does not match newlines by default. Commented Feb 4, 2018 at 3:18
  • @Gurman With `result = re.search(r"<test1[\s\S]*?<\/test1>", escaped_raw_data)` I still get `None`. Commented Feb 4, 2018 at 3:19
  • `<test1[\s\S]*?<\/test1>` Commented Feb 4, 2018 at 3:21
  • Yes. But I am not proficient in Python; I was just trying to help you with the regex. Let Python experts also see your question and attempt answers. Commented Feb 4, 2018 at 3:26
  • Try `result = re.search(r"<test1>.*<\/test1>", escaped_raw_data, re.DOTALL)`. Commented Feb 4, 2018 at 3:43

1 Answer


This works for me:

```python
import html
import re

# HTML-encoded sample payload spanning several lines.
raw_data = '''
&lt;root&gt;
    &lt;parent&gt;
        &lt;test1&gt;
            &lt;another&gt;
                &lt;subelement&gt;
                    &lt;value&gt;hello&lt;/value&gt;
                &lt;/subelement&gt;
            &lt;/another&gt;
        &lt;/test1&gt;
    &lt;/parent&gt;
&lt;/root&gt;
'''

# Turn &lt; / &gt; back into < / >.
escaped_raw_data = html.unescape(raw_data)

# re.DOTALL lets '.' match newlines, so the match can span lines.
result = re.search(r'(<test1>.*</test1>)', escaped_raw_data, re.DOTALL)
if result:
    print(result.group(0))
```
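The question also asks to write the extracted piece to a file. A minimal sketch of that last step, wrapping the extraction in a hypothetical helper `extract_element` and writing to a temporary path so the example is self-contained (substitute your real output path):

```python
import html
import re
import tempfile
from pathlib import Path
from typing import Optional

def extract_element(encoded: str, tag: str) -> Optional[str]:
    """Return the first <tag>...</tag> span (inclusive) from HTML-encoded XML, or None."""
    decoded = html.unescape(encoded)
    # re.escape guards against regex metacharacters in the tag name; \b stops
    # "<test1" from matching "<test10"; the lazy *? stops at the first closing tag.
    name = re.escape(tag)
    match = re.search(rf"<{name}\b[\s\S]*?</{name}>", decoded)
    return match.group(0) if match else None

raw_data = (
    "&lt;root&gt; &lt;parent&gt; &lt;test1&gt; &lt;another&gt; &lt;subelement&gt; "
    "&lt;value&gt;hello&lt;/value&gt; &lt;/subelement&gt; &lt;/another&gt; "
    "&lt;/test1&gt; &lt;/parent&gt; &lt;/root&gt;"
)

fragment = extract_element(raw_data, "test1")
if fragment is not None:
    # Hypothetical output location inside a fresh temporary directory.
    out_path = Path(tempfile.mkdtemp()) / "test1.xml"
    out_path.write_text(fragment, encoding="utf-8")
    print(out_path.read_text(encoding="utf-8"))
```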