Using regex to convert XML representation to Dictionary in python

Question

I sent a POST request in Python based on this answer on SO. The website then returns an XML representation that looks like this:

<status>Active</status>
<registeredname>MyTestName</registeredname>
<companyname>TEST</companyname>
<email>[email protected]</email>
<serviceid>8</serviceid>
<productid>1</productid>
<productname>Some Test Product</productname>
<regdate>2013-08-06</regdate>
<nextduedate>0000-00-00</nextduedate>
<billingcycle>One Time</billingcycle>
<validdomain>testing</validdomain>
<validip>XX.XX.XXX.XX</validip>
<validdirectory>/root</validdirectory>
<configoptions></configoptions>
<customfields></customfields>
<addons></addons>
<md5hash>58z9f70a9d738a98b18d0bf4304ac0c6</md5hash>

Now I would like to convert this into a Python dictionary of the form:

{"status": "Active", "registeredname": "MyTestName", ...}

The PHP code I am porting from does something like this:

preg_match_all('/<(.*?)>([^<]+)<\/\\1>/i', $data, $matches);

My corresponding Python code is:

import re

matches = {}
matches = re.findall('/<(.*?)>([^<]+)<\/\\1>/i', data)

'data' is the XML string I receive from the server. When I run this, 'matches' ends up empty. Is something wrong with the regex, or am I wrong to use re.findall in the first place?

Thanks in advance

You really don't want to do this with regular expressions. There are plenty of answers here on SO that show how to do this with a decent XML parser instead. – Martijn Pieters, Commented Aug 6, 2013 at 9:04
Is there no top-level XML tag, btw? Is this the whole document? – Martijn Pieters, Commented Aug 6, 2013 at 9:05
Yes, there is no top-level XML tag; this is the whole document. So I am guessing it's not an XML document per se. – rahuL, Commented Aug 6, 2013 at 9:21

falsetru · Accepted Answer · 2013-08-06 09:10:40Z

Remove the leading and trailing / delimiters from the regular expression; Python's re module does not use pattern delimiters. There is no need to escape /. Pass flags=re.IGNORECASE instead of the trailing i modifier.
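To see why the question's pattern came back empty, here is a minimal sketch on a shortened, hypothetical sample (the data string is made up, not the real server response). With the PHP delimiters left in, Python looks for a literal / right before the opening tag and a literal /i after the closing tag, which never occur. The inline (?i) flag is shown as one alternative to flags=re.IGNORECASE; both are standard re features, and which to use is a matter of taste.

```python
import re

# Hypothetical, shortened sample in the same shape as the server response.
data = "<status>Active</status> <companyname>TEST</companyname>"

# PHP-style pattern: Python's re has no delimiters or trailing modifiers,
# so the leading "/" and the trailing "/i" are matched as literal text.
php_style = r'/<(.*?)>([^<]+)<\/\1>/i'
print(re.findall(php_style, data))  # []

# Delimiters dropped; the inline (?i) flag plays the role of PHP's trailing i.
inline_flag = r'(?i)<(.*?)>([^<]+)</\1>'
print(re.findall(inline_flag, data))  # [('status', 'Active'), ('companyname', 'TEST')]
```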

matches = re.findall('<(.*?)>([^<]+)</\\1>', data, flags=re.IGNORECASE)
print(dict(matches))

With a raw string, there is no need to escape the \.

matches = re.findall(r'<(.*?)>([^<]+)</\1>', data, flags=re.IGNORECASE)
print(dict(matches))

Both snippets print:

{'status': 'Active', 'companyname': 'TEST', ...}
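One detail worth knowing about this pattern: [^<]+ needs at least one character between the opening and closing tags, so empty elements such as <configoptions></configoptions>, <customfields></customfields> and <addons></addons> in the question's response are simply missing from the dictionary. A sketch on a hypothetical three-element sample, comparing the answer's pattern with a [^<]* variant (an illustrative change, not part of the answer) that keeps them as empty strings:

```python
import re

# Hypothetical sample containing empty elements, like the question's response.
data = "<serviceid>8</serviceid> <configoptions></configoptions> <addons></addons>"

# [^<]+ requires at least one character of content, so empty elements are dropped.
plus = dict(re.findall(r'<(.*?)>([^<]+)</\1>', data, flags=re.IGNORECASE))
print(plus)  # {'serviceid': '8'}

# [^<]* also accepts empty content, so empty elements come back as "".
star = dict(re.findall(r'<(.*?)>([^<]*)</\1>', data, flags=re.IGNORECASE))
print(star)  # {'serviceid': '8', 'configoptions': '', 'addons': ''}
```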

non-regex alternative: lxml

This uses lxml.html instead of lxml.etree because data is not a complete XML document (it has no single top-level element).

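import lxml.html

# fromstring() wraps the root-less fragment in a container element, so
# iterating over it yields the individual tags. Empty elements such as
# <addons></addons> map to None.
print({x.tag: x.text for x in lxml.html.fromstring(data)})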
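If installing lxml is not an option, a similar result can be had with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not part of the answer: ElementTree is a strict XML parser and rejects the response as-is because of the missing top-level element, so the sketch wraps it in a made-up <root> element first. response_to_dict and <root> are hypothetical names, and the sketch assumes the response has no XML declaration and contains only flat, well-formed elements.

```python
import xml.etree.ElementTree as ET


def response_to_dict(data):
    """Parse a flat, root-less response into a {tag: text} dict.

    Hypothetical helper: assumes no XML declaration and only flat,
    well-formed child elements.
    """
    # A strict XML parser needs exactly one top-level element, so wrap
    # the fragment in a synthetic <root> element.
    root = ET.fromstring("<root>" + data + "</root>")
    # Empty elements such as <addons></addons> have text None; use "" instead.
    return {child.tag: (child.text or "") for child in root}


sample = "<status>Active</status> <serviceid>8</serviceid> <addons></addons>"
print(response_to_dict(sample))
# {'status': 'Active', 'serviceid': '8', 'addons': ''}
```

Unlike the regex, this keeps empty elements, and a malformed response raises ET.ParseError instead of silently producing a partial result.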

Both of these print the output as a dictionary. How do I store it as a dictionary? When I do print matches, I get [('status', 'Active'), ('companyname', 'test'), ...]
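print(dict(matches)) only builds the dictionary for printing; matches itself is still the list of (tag, text) tuples returned by re.findall. To keep the dictionary, assign the result of dict(...) to a name. A minimal sketch on a shortened, hypothetical sample:

```python
import re

# Hypothetical, shortened sample of the server response.
data = "<status>Active</status> <companyname>TEST</companyname>"

# re.findall returns a list of (tag, text) tuples ...
pairs = re.findall(r'<(.*?)>([^<]+)</\1>', data, flags=re.IGNORECASE)
# ... and dict() turns that list of pairs into a dictionary you can keep.
matches = dict(pairs)

print(pairs)              # [('status', 'Active'), ('companyname', 'TEST')]
print(matches["status"])  # Active
```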
