14

I have this XML from SQL, and I want to generate the same thing with Python 2.7 and lxml:

<?xml version="1.0" encoding="utf-16"?>
<results>
  <Country name="Germany" Code="DE" Storage="Basic" Status="Fresh" Type="Photo" />
</results>

Now I have:

from lxml import etree

# create XML
results = etree.Element('results')
country = etree.Element('country')
country.text = 'Germany'
results.append(country)

filename = "xmltestthing.xml"
FILE = open(filename, "w")
FILE.writelines(etree.tostring(results, pretty_print=True))
FILE.close()

Do you know how to add the rest of the attributes?

3
  • Have you even tried this? country.text adds "Germany" as text between the tags, i.e. <country>Germany</country>, not as an attribute, which is what you want/claim. Commented Dec 17, 2010 at 12:24
  • yes, I tried, but I didn't know how to add another attributes Commented Dec 17, 2010 at 12:29
  • 1
    There you go again. "Another". You did not add ANY attributes. How to add attributes is in the docs. Commented Dec 17, 2010 at 12:43

4 Answers

21

Note this also prints the BOM

>>> from lxml.etree import tostring
>>> from lxml.builder import E
>>> print tostring(
        E.results(
            E.Country(name='Germany', Code='DE', Storage='Basic', Status='Fresh', Type='Photo')
        ),
        pretty_print=True, xml_declaration=True, encoding='UTF-16')
��<?xml version='1.0' encoding='UTF-16'?>
<results>
  <Country Status="Fresh" Type="Photo" Code="DE" Storage="Basic" name="Germany"/>
</results>

1 Comment

Side note: BOM is only printed because Python's "UTF-16" codec adds it. The "utf-8" one doesn't.
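The codec behaviour this comment describes can be checked in plain Python, without lxml. The following is only a minimal sketch of how Python's own codecs treat the byte order mark; it makes no claim about how lxml serializes internally, and the variable names are made up for illustration.

```python
import codecs

xml_text = u"<results/>"

# The generic "utf-16" codec prepends a BOM and uses the native byte order.
utf16_bytes = xml_text.encode("utf-16")
# Naming the byte order explicitly ("utf-16-le") produces no BOM.
utf16le_bytes = xml_text.encode("utf-16-le")
# The "utf-8" codec does not add a BOM either.
utf8_bytes = xml_text.encode("utf-8")

has_utf16_bom = utf16_bytes[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
has_utf8_bom = utf8_bytes.startswith(codecs.BOM_UTF8)

print("utf-16 output starts with a BOM: %s" % has_utf16_bom)
print("utf-8 output starts with a BOM: %s" % has_utf8_bom)
```

If a BOM in UTF-8 output is ever wanted, Python has a separate "utf-8-sig" codec for that.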
15
from lxml import etree

# Create the root element
page = etree.Element('results')

# Make a new document tree
doc = etree.ElementTree(page)

# Add the subelements
pageElement = etree.SubElement(page, 'Country', name='Germany', Code='DE', Storage='Basic')
# For multiple attributes, use as shown above

# Save to XML file
outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16')
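The code above sets three of the five attributes from the question. As a hedged sketch, the remaining two (Status and Type) could either be passed as further keyword arguments or be added afterwards with set(). The fallback import is an assumption made only so the sketch runs where lxml is not installed; the Element, SubElement, set() and tostring() calls used here exist in both libraries.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the standard library, which offers the same
    # Element / SubElement / set() / tostring() calls used in this sketch.
    import xml.etree.ElementTree as etree

page = etree.Element('results')

# Attributes given as keyword arguments at creation time.
country = etree.SubElement(page, 'Country', name='Germany', Code='DE', Storage='Basic')

# Attributes added after the element already exists.
country.set('Status', 'Fresh')
country.set('Type', 'Photo')

# tostring() returns bytes; attribute order in the output may vary.
xml_bytes = etree.tostring(page)
print(xml_bytes.decode('ascii'))
```

Individual attributes can be read back with country.get('Status'), and country.attrib gives dictionary-style access to all of them.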

7 Comments

I would replace the last two lines with doc.write('output.xml', xml_declaration=True, encoding='utf-16')
Well yes that is correct, but my main intention was to show how it is done rather than the eye candy ;)
My xml now is: <?xml version='1.0' encoding='utf-16'?>਍㰀爀攀猀甀氀琀猀㸀㰀䌀漀甀渀琀爀礀 䌀漀搀攀㴀∀䐀䔀∀ 匀琀漀爀愀最攀㴀∀䈀愀猀椀挀∀ 渀愀洀攀㴀∀䜀攀爀洀愀渀礀∀⼀㸀㰀⼀爀攀猀甀氀琀猀㸀
It works for me though. I wonder why, try it with Firefox perhaps (no reason, but worth trying)
@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. Windows text mode then mangles this into 0D 0A 00, which makes everything after that look like UTF-16BE, hence the Chinese etc. characters when you display it. You can get around this in this instance by using "wb" instead of "w" when you open the file. However, I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? You like large files and/or weird problems?
5

Save to XML file

doc.write('output.xml', xml_declaration=True, encoding='utf-16') 

instead of:

outFile = open('output.xml', 'w')
doc.write(outFile, xml_declaration=True, encoding='utf-16')
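A small, hedged sketch of the filename form, reading the file back in binary mode to check what was written. The temporary output path is hypothetical, and the fallback import is an assumption so that the sketch also runs without lxml installed (the standard library's write() accepts the same two keyword arguments).

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption: the standard library's ElementTree.write() accepts the
    # same xml_declaration and encoding arguments used below.
    import xml.etree.ElementTree as etree

page = etree.Element('results')
etree.SubElement(page, 'Country', name='Germany', Code='DE')
doc = etree.ElementTree(page)

# Hypothetical output location in a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'output.xml')

# Passing a file name lets the library open and close the file itself.
doc.write(out_path, xml_declaration=True, encoding='utf-16')

# Read the raw bytes back and decode them to inspect the result.
with open(out_path, 'rb') as handle:
    raw = handle.read()
text = raw.decode('utf-16')
print(text)
```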

1 Comment

Will this respect XML indentation? I am creating the XML file in a similar fashion, but I have formatting issues whenever I add an element. If I modify a tag or modify text and write back to a new XML file, it works fine. I don't know why it's not working with additions. Here is the format: <InterfaceLogData><TEMP>test</TEMP><TEMP1>test1<ChildTEMP1>test1</ChildTEMP1></TEMP1></InterfaceLogData>
3

Promoting my comment to an answer:

@sukbir is probably not using Windows. What happens is that lxml writes a newline (0A 00 in UTF-16LE) between the XML header and the body. Windows text mode then mangles this into 0D 0A 00, which makes everything after that look like UTF-16BE, hence the Chinese etc. characters when you display it. You can get around this in this instance by using "wb" instead of "w" when you open the file. However, I'd strongly suggest that you use 'UTF-8' (spelled EXACTLY like that) as your encoding. Why are you using UTF-16? You like large files and/or weird problems?
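The byte-level effect described here can be reproduced on any platform by simulating the newline translation by hand. This is only a sketch: the replace() call stands in for what Windows text mode would do to the bytes, and the sample document is made up.

```python
import codecs

text = u"<?xml version='1.0' encoding='utf-16'?>\n<results/>"

# Correct UTF-16LE output: BOM, then two bytes per character.
good = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Simulated Windows text mode: every 0x0A byte is written as 0x0D 0x0A.
# The newline 0A 00 becomes 0D 0A 00, shifting everything after it by one byte.
mangled = good.replace(b"\n", b"\r\n")

# The intact bytes decode back to the original text.
print(repr(good.decode("utf-16")))

# The shifted bytes decode to unrelated characters after the header;
# "replace" is needed because the byte count is now odd.
garbled = mangled.decode("utf-16", "replace")
print(repr(garbled))
```

The first shifted pair, 0D 0A, decodes to U+0A0D and the "<" that follows becomes U+3C00, which matches the start of the garbled output quoted in the comments above.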

1 Comment

Unfortunately the "wb" didn't solve this issue for me, but the newlines were the cause, so I was able to work around the issue by writing the XML on one line (no pretty_print) and manually adding the declaration. On the question of "Why are you using UTF-16? You like large files and/or weird problems?", it could be (as in my case) that a third party required a file in UTF-16. If you deal with interfaces from other parties, you don't always have control over what you send them.
