I have an XML file and an XML schema in another file and I'd like to validate that my XML file adheres to the schema. How do I do this in Python?
I'd prefer something using the standard library, but I can install a third-party package if necessary.
I am assuming you mean using XSD files. Surprisingly there aren't many python XML libraries that support this. lxml does however. Check Validation with lxml. The page also lists how to use lxml to validate with other schema types.
xmlschema Python package instead, makes things easier, see the answer stackoverflow.com/a/52310735/11154841.You can easily validate an XML file or tree against an XML Schema (XSD) with the xmlschema Python package. It's pure Python, available on PyPi and doesn't have many dependencies.
Example - validate a file:
import xmlschema xmlschema.validate('doc.xml', 'some.xsd') The method raises an exception if the file doesn't validate against the XSD. That exception then contains some violation details.
If you want to validate many files you only have to load the XSD once:
xsd = xmlschema.XMLSchema('some.xsd') for filename in filenames: xsd.validate(filename) If you don't need the exception you can validate like this:
if xsd.is_valid('doc.xml'): print('do something useful') Alternatively, xmlschema directly works on file objects and in memory XML trees (either created with xml.etree.ElementTree or lxml). Example:
import xml.etree.ElementTree as ET t = ET.parse('doc.xml') result = xsd.is_valid(t) print('Document is valid? {}'.format(result)) Installation lxml
pip install lxml If you get an error like "Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?", try to do this first:
# Debian/Ubuntu apt-get install python-dev python3-dev libxml2-dev libxslt-dev # Fedora 23+ dnf install python-devel python3-devel libxml2-devel libxslt-devel The simplest validator
Let's create simplest validator.py
from lxml import etree def validate(xml_path: str, xsd_path: str) -> bool: xmlschema_doc = etree.parse(xsd_path) xmlschema = etree.XMLSchema(xmlschema_doc) xml_doc = etree.parse(xml_path) result = xmlschema.validate(xml_doc) return result then write and run main.py
from validator import validate if validate("path/to/file.xml", "path/to/scheme.xsd"): print('Valid! :)') else: print('Not valid! :(') A little bit of OOP
In order to validate more than one file, there is no need to create an XMLSchema object every time, therefore:
validator.py
from lxml import etree class Validator: def __init__(self, xsd_path: str): xmlschema_doc = etree.parse(xsd_path) self.xmlschema = etree.XMLSchema(xmlschema_doc) def validate(self, xml_path: str) -> bool: xml_doc = etree.parse(xml_path) result = self.xmlschema.validate(xml_doc) return result Now we can validate all files in the directory as follows:
main.py
import os from validator import Validator validator = Validator("path/to/scheme.xsd") # The directory with XML files XML_DIR = "path/to/directory" for file_name in os.listdir(XML_DIR): print('{}: '.format(file_name), end='') file_path = '{}/{}'.format(XML_DIR, file_name) if validator.validate(file_path): print('Valid! :)') else: print('Not valid! :(') For more options read here: Validation with lxml
xmlschema Python package instead, makes things easier, see the answer stackoverflow.com/a/52310735/11154841.As for "pure python" solutions: the package index lists:
xmlschema Python package, see the answer stackoverflow.com/a/52310735/11154841.There are two ways(actually there are more) that you could do this.
1. using lxml
pip install lxml
from lxml import etree, objectify from lxml.etree import XMLSyntaxError def xml_validator(some_xml_string, xsd_file='/path/to/my_schema_file.xsd'): try: schema = etree.XMLSchema(file=xsd_file) parser = objectify.makeparser(schema=schema) objectify.fromstring(some_xml_string, parser) print "YEAH!, my xml file has validated" except XMLSyntaxError: #handle exception here print "Oh NO!, my xml file does not validate" pass xml_file = open('my_xml_file.xml', 'r') xml_string = xml_file.read() xml_file.close() xml_validator(xml_string, '/path/to/my_schema_file.xsd') >> xmllint --format --pretty 1 --load-trace --debug --schema /path/to/my_schema_file.xsd /path/to/my_xml_file.xml
The PyXB package at http://pyxb.sourceforge.net/ generates validating bindings for Python from XML schema documents. It handles almost every schema construct and supports multiple namespaces.
lxml provides etree.DTD
from the tests on http://lxml.de/api/lxml.tests.test_dtd-pysrc.html
... root = etree.XML(_bytes("<b/>")) dtd = etree.DTD(BytesIO("<!ELEMENT b EMPTY>")) self.assert_(dtd.validate(root)) import xmlschema def get_validation_errors(xml_file, xsd_file): schema = xmlschema.XMLSchema(xsd_file) validation_error_iterator = schema.iter_errors(xml_file) errors = list() for idx, validation_error in enumerate(validation_error_iterator, start=1): err = validation_error.__str__() errors.append(err) print(err) return errors errors = get_validation_errors('sample3.xml', 'sample_schema.xsd')