Comparing XML in a unit test in Python

Question

I have an object that can build itself from an XML string, and write itself out to an XML string. I'd like to write a unit test to test round tripping through XML, but I'm having trouble comparing the two XML versions. Whitespace and attribute order seem to be the issues. Any suggestions for how to do this? This is in Python, and I'm using ElementTree (not that that really matters here since I'm just dealing with XML in strings at this level).

Stevoisiak · Accepted Answer · 2017-11-06 23:23:13Z

20

This is an old question, but the accepted answer by Kozyarchuk doesn't work for me because of attribute order, and the minidom solution doesn't work as-is either (no idea why; I haven't debugged it).

This is what I finally came up with:

    from doctest import Example
    from unittest import TestCase

    from lxml.doctestcompare import LXMLOutputChecker


    class XmlTest(TestCase):
        def assertXmlEqual(self, got, want):
            checker = LXMLOutputChecker()
            if not checker.check_output(want, got, 0):
                # Build a readable diff of the two documents for the failure message.
                message = checker.output_difference(Example("", want), got, 0)
                raise AssertionError(message)

This also produces a diff, which can be helpful for large XML files.

edited Nov 6, 2017 at 23:23

Stevoisiak

answered Aug 14, 2011 at 23:05

Mikhail Korobov

4 Comments

Aaron D Over a year ago

I have problems getting this to work in Python3, due to string encoding issues, no matter which combination of bytes(), bytearray() or encode('utf-8') I used. I'm not sure if that's a problem in the library or if I'm just missing something but this didn't work for me.

Mikhail Korobov Over a year ago

I'm not sure what's the issue; this approach is used both for Python 2 and Python 3 tests here: github.com/scrapinghub/webstruct/blob/…

Scott P. Over a year ago

I have stepped through the code and doctestcompare.py of lxml package isn't properly ported to python3 yet. self.get_parser() requires strings and then the same function requires bytes later down.

jhthompson Over a year ago

Really helpful answer for comparing HTML with lxml!! Thanks.

Hongbo Miao · Accepted Answer · 2023-10-12 07:23:19Z

19

First normalize both XML strings, then compare them. I've used the following with lxml:

    from lxml import etree, objectify

    # Inside a TestCase method; `expect` and `xml` are the two XML strings.
    obj1 = objectify.fromstring(expect)
    expect = etree.tostring(obj1)
    obj2 = objectify.fromstring(xml)
    result = etree.tostring(obj2)
    self.assertEqual(expect, result)

edited Oct 12, 2023 at 7:23

Hongbo Miao

answered Nov 26, 2008 at 19:35

Kozyarchuk

5 Comments

Adam Endicott Over a year ago

Oh man, I had tried this and thought the attributes were ordered differently, but I looked again and I was actually just missing one in my output. Thanks for hitting me over the head.

bobince Over a year ago

Heh. Slight note of caution, etree does not document any guarantee to serialise attributes in any particular order. At least the current pure-Python implementation of ElementTree does do a sort() on them, but it's not clear you can rely on this remaining so.

Mark E. Haase Over a year ago

My experience with etree is that it serializes them in the same order they were written to the document originally.

Stan Over a year ago

Beware : the serialization may vary with the version of Python, especially the attribute order.

kjaquier Over a year ago

Beware of the spacing as well that may be conserved. Example: t=ET.tostring; f=ET.fromstring; t(f('<A><B/></A>')) != t(f('<A> <B/></A>')). Downvoted, because I just shot myself in the foot with that.

bobince · Accepted Answer · 2008-11-26 19:56:19Z

If the problem is really just the whitespace and attribute order, and you have no other constructs than text and elements to worry about, you can parse the strings using a standard XML parser and compare the nodes manually. Here's an example using minidom, but you could write the same in etree pretty simply:

    from xml.dom import minidom


    def isEqualXML(a, b):
        da, db = minidom.parseString(a), minidom.parseString(b)
        return isEqualElement(da.documentElement, db.documentElement)


    def isEqualElement(a, b):
        if a.tagName != b.tagName:
            return False
        # Sort so that attribute order does not matter.
        if sorted(a.attributes.items()) != sorted(b.attributes.items()):
            return False
        if len(a.childNodes) != len(b.childNodes):
            return False
        for ac, bc in zip(a.childNodes, b.childNodes):
            if ac.nodeType != bc.nodeType:
                return False
            if ac.nodeType == ac.TEXT_NODE and ac.data != bc.data:
                return False
            if ac.nodeType == ac.ELEMENT_NODE and not isEqualElement(ac, bc):
                return False
        return True
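For the "same in etree" variant mentioned above, here is a minimal sketch using only the standard library. The names `elements_equal`, `xml_equal` and `_text` are made up for illustration. Unlike the minidom version, it assumes that leading and trailing whitespace in text is insignificant, which suits data-style XML but not mixed content.

```python
import xml.etree.ElementTree as ET


def _text(value):
    # Assumption: missing text and whitespace-only text count as equal.
    return (value or "").strip()


def elements_equal(a, b):
    """Recursively compare two Elements, ignoring attribute order."""
    if a.tag != b.tag:
        return False
    if a.attrib != b.attrib:  # dict comparison does not depend on order
        return False
    if _text(a.text) != _text(b.text) or _text(a.tail) != _text(b.tail):
        return False
    if len(a) != len(b):  # number of child elements
        return False
    return all(elements_equal(ca, cb) for ca, cb in zip(a, b))


def xml_equal(a, b):
    """Parse two XML strings and compare their root elements."""
    return elements_equal(ET.fromstring(a), ET.fromstring(b))
```

In a test this could be called as `self.assertTrue(xml_equal(got, want))`. It only reports equal or not equal, so it gives no diff on failure.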

If you need a more thorough equivalence comparison, covering the possibilities of other types of nodes including CDATA, PIs, entity references, comments, doctypes, namespaces and so on, you could use the DOM Level 3 Core method isEqualNode. Neither minidom nor etree has that, but pxdom is one implementation that supports it:

    import pxdom


    def isEqualXML(a, b):
        da, db = pxdom.parseString(a), pxdom.parseString(b)
        return da.isEqualNode(db)

(You may want to change some of the DOMConfiguration options on the parse if you need to specify whether entity references and CDATA sections match their replaced equivalents.)

A slightly more roundabout way of doing it would be to parse, then re-serialise to canonical form and do a string comparison. Again pxdom supports the DOM Level 3 LS option ‘canonical-form’ which you could use to do this; an alternative way using the stdlib's minidom implementation is to use c14n. However, you must have the PyXML extensions installed for this, so you still can't quite do it within the stdlib:

    from xml.dom import minidom
    from xml.dom.ext import c14n  # provided by PyXML, not the stdlib


    def isEqualXML(a, b):
        da, db = minidom.parseString(a), minidom.parseString(b)
        a, b = c14n.Canonicalize(da), c14n.Canonicalize(db)
        return a == b
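On Python 3.8 or later, the same canonicalise-then-compare idea can be sketched with the standard library's `xml.etree.ElementTree.canonicalize`, which writes C14N 2.0 output. This is a swapped-in technique, not the PyXML call above, and the function name `canonical_xml_equal` is made up for illustration. Passing `strip_text=True` is an assumption that surrounding whitespace in text does not matter.

```python
import xml.etree.ElementTree as ET


def canonical_xml_equal(a, b):
    """Compare two XML strings by their canonical (C14N 2.0) form."""
    # Canonical form sorts attributes and expands empty-element tags.
    # strip_text=True also drops leading/trailing whitespace in text,
    # which hides indentation differences; omit it to compare text exactly.
    return (ET.canonicalize(a, strip_text=True)
            == ET.canonicalize(b, strip_text=True))
```

Because both sides end up as plain strings, a failing `assertEqual` on the two canonical forms would also show a textual diff.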

andrewrk · Accepted Answer · 2008-11-26 19:19:11Z

5

Use xmldiff, a Python tool that figures out the differences between two similar XML files, the same way that diff does it.

answered Nov 26, 2008 at 19:19

andrewrk

1 Comment

guettli Over a year ago

xmldiff is GPL. Does this mean, I have to open source my script?

Robert Rossney · Accepted Answer · 2008-11-26 20:46:45Z

Why are you examining the XML data at all?

The way to test object serialization is to create an instance of the object, serialize it, deserialize it into a new object, and compare the two objects. When you make a change that breaks serialization or deserialization, this test will fail.
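A minimal sketch of such a round-trip test follows. The `Point` class and its `to_xml` / `from_xml` methods are hypothetical stand-ins for the asker's object; the dataclass is used only because it supplies `__eq__` for the final comparison.

```python
import unittest
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical object that can write itself to and from XML."""
    x: int
    y: int

    def to_xml(self):
        elem = ET.Element("point", {"x": str(self.x), "y": str(self.y)})
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_xml(cls, text):
        elem = ET.fromstring(text)
        return cls(x=int(elem.get("x")), y=int(elem.get("y")))


class RoundTripTest(unittest.TestCase):
    def test_round_trip(self):
        original = Point(3, 4)
        restored = Point.from_xml(original.to_xml())
        # Compare the objects, not the XML text, so whitespace and
        # attribute order in the serialised form never matter.
        self.assertEqual(original, restored)
```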

The only thing checking the XML data is going to find for you is if your serializer is emitting a superset of what the deserializer requires, and the deserializer silently ignores stuff it doesn't expect.

Of course, if something else is going to be consuming the serialized data, that's another matter. But in that case, you ought to be thinking about establishing a schema for the XML and validating it.

Yes, something else is going to be consuming the serialized data. I may get to the point of building a schema and validating it, but for now doing a string comparison of the XML is good enough.

Community · Accepted Answer · 2017-05-23 12:30:52Z

I also had this problem and did some digging around it today. The doctestcompare approach may suffice, but I found via Ian Bicking that it is based on formencode.doctest_xml_compare, which appears to now be here. As you can see, that is a pretty simple function, unlike doctestcompare (although I guess doctestcompare collects all the failures and may do more sophisticated checking). Anyway, copying or importing xml_compare out of formencode may be a good solution.

kjaw · Accepted Answer · 2020-10-15 10:36:37Z

Stevoisiak's solution doesn't work for Python 3 in my case. Fixed:

    from doctest import Example
    from unittest import TestCase

    from lxml.doctestcompare import LXMLOutputChecker, PARSE_XML


    class XmlTest(TestCase):
        def assertXmlEqual(self, got, want):
            checker = LXMLOutputChecker()
            if not checker.check_output(want.encode(), got.encode(), PARSE_XML):
                # doctest.Example calls str.endswith on its arguments, so it
                # is given the str form of `want` rather than bytes.
                message = checker.output_difference(
                    Example("", want), got.encode(), PARSE_XML)
                raise AssertionError(message)

Rob Williams · Accepted Answer · 2008-11-27 00:20:52Z

The Java component dbUnit does a lot of XML comparisons, so you might find it useful to look at their approach (especially to find any gotchas that they may have already addressed).

moylop260 · Accepted Answer · 2016-11-23 02:05:34Z

    import json

    # Both functions are methods of a unittest.TestCase subclass.

    def xml_to_json(self, xml):
        """Receive one lxml etree object and return a JSON string."""
        def recursive_dict(element):
            # Assumes namespaced tags of the form '{namespace}name'.
            return (element.tag.split('}')[1],
                    dict(map(recursive_dict, element.getchildren()),
                         **element.attrib))
        return json.dumps(dict([recursive_dict(xml)]),
                          default=lambda x: str(x))

    def assertEqualXML(self, xml_real, xml_expected):
        """Receive two objectify objects and fail with a diff if they differ."""
        xml_expected_str = json.loads(self.xml_to_json(xml_expected))
        xml_real_str = json.loads(self.xml_to_json(xml_real))
        self.maxDiff = None
        self.assertEqual(xml_real_str, xml_expected_str)

You would see output like this:

      u'date': u'2016-11-22T19:55:02',
      u'item2': u'MX-INV0007',
    - u'item3': u'Payments',
    ?             ^^^
    + u'item3': u'OAYments',
    ?             ^^^
    +

porton · Accepted Answer · 2018-09-27 15:15:07Z

It can be easily done with minidom:

    from unittest import TestCase
    from xml.dom.minidom import parseString


    class XmlTest(TestCase):
        def assertXmlEqual(self, got, want):
            return self.assertEqual(parseString(got).toxml(), parseString(want).toxml())
