1

I sent a POST message in python based on this answer on SO. Once this is done, I get a resultant XML representation that looks like this from the website:

<status>Active</status> <registeredname>MyTestName</registeredname> <companyname>TEST</companyname> <email>[email protected]</email> <serviceid>8</serviceid> <productid>1</productid> <productname>Some Test Product</productname> <regdate>2013-08-06</regdate> <nextduedate>0000-00-00</nextduedate> <billingcycle>One Time</billingcycle> <validdomain>testing</validdomain> <validip>XX.XX.XXX.XX</validip> <validdirectory>/root</validdirectory> <configoptions></configoptions> <customfields></customfields> <addons></addons> <md5hash>58z9f70a9d738a98b18d0bf4304ac0c6</md5hash> 

Now, I would like to convert this into a python dictionary of the format:

{"status": "Active", "registeredname": "MyTestName".......} 

The corresponding PHP code from which I am trying to port has something like this:

preg_match_all('/<(.*?)>([^<]+)<\/\\1>/i', $data, $matches); 

My correponding Python code is as follows:

matches = {} matches = re.findall('/<(.*?)>([^<]+)<\/\\1>/i', data) 

'data' is the XML representation that I receive from the server. When I run this, my 'matches' dictionary remains empty. Is there something wrong in the regex statement? Or am I wrong in using re.findall in the first place?

Thanks in advance

3
  • 3
    You really don't want to do this with regular expressions. There are plenty of answers here on SO that show how to do this with a decent XML parser instead. Commented Aug 6, 2013 at 9:04
  • Is there no top-level XML tag, btw? Is this the whole document? Commented Aug 6, 2013 at 9:05
  • yes.. There is no top-level XML tag. this is the whole document. So I am guessing its not an XML doc per se Commented Aug 6, 2013 at 9:21

1 Answer 1

3

Remove leading/trailing /s from the regular expression. No need to escape /. Specify flags=re.IGNORECASE instead of trailing i.

matches = re.findall('<(.*?)>([^<]+)</\\1>', data, flags=re.IGNORECASE) print(dict(matches)) 

Using raw string, no need to escape \.

matches = re.findall(r'<(.*?)>([^<]+)</\1>', data, flags=re.IGNORECASE) print(dict(matches)) 

Both codes print:

{'status': 'Active', 'companyname': 'TEST', ...} 

non-regex alternative: lxml

Used lxml.html instead of lxml.etree because data is incomplete.

import lxml.html print({x.tag:x.text for x in lxml.html.fromstring(data)}) 
Sign up to request clarification or add additional context in comments.

2 Comments

Both these prints the output as dictionaries. How do I store them as dictionaries? Because when I give print matches, I get [('status', 'Active'), ('companyname', 'test'),....]
@i.h4d35, Try result = dict(matches).

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.