I've got an XML file describing the directory structure of files I want to put into a (flattened) tar.gz file.
How should I parse the xml to extract the path for each file?
Right now I'm using lxml and finding the paths like this:

    paths = []
    for case in root.iter('case'):
        for language in case.iter('language'):
            for result in language.iter('result'):
                for file in result.iter('file'):
                    paths.append('/'.join([node.get('id') for node in [case, language, result, file]]))

But this feels too hardcoded, and it does not work well if the structure changes.
I can find each file-node with root.iter('file'), but how can I get all parents/directories for each node/file? Or should I do this a (completely?) different way?
The xml looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <files batch="regular">
        <case id="case_10_some_description">
            <language id="english">
                <result id="images">
                    <file id="screenshot_1.png"/>
                    <file id="screenshot_2.png"/>
                    <file id="screenshot_3.png"/>
                    <file id="screenshot_4.png"/>
                    <file id="screenshot_5.png"/>
                    <file id="screenshot_6.png"/>
                </result>
            </language>
        </case>
        <case id="case_12_some_description">
            <language id="english">
                <result id="images">
                    <file id="screenshot_1.png"/>
                    <file id="screenshot_2.png"/>
                    <file id="screenshot_3.png"/>
                </result>
            </language>
        </case>
    </files>

And these are the files:

    regular/case_10_some_description/english/images/screenshot_1.png
    regular/case_10_some_description/english/images/screenshot_2.png
    regular/case_10_some_description/english/images/screenshot_3.png
    regular/case_10_some_description/english/images/screenshot_4.png
    regular/case_10_some_description/english/images/screenshot_5.png
    regular/case_10_some_description/english/images/screenshot_6.png
    regular/case_12_some_description/english/images/screenshot_1.png
    regular/case_12_some_description/english/images/screenshot_2.png
    regular/case_12_some_description/english/images/screenshot_3.png
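
For reference, one way the hardcoded nesting could be avoided is a recursive walk that carries the chain of parent names down to each file node. This is only a sketch, not a definitive answer. It uses the standard library's xml.etree.ElementTree so it runs without extra dependencies, a shortened copy of the XML above, and a hypothetical helper named iter_file_paths. It also assumes that the root's batch attribute supplies the first path component and that every other element is named by its id.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumption: same layout).
XML = """<files batch="regular">
  <case id="case_10_some_description">
    <language id="english">
      <result id="images">
        <file id="screenshot_1.png"/>
        <file id="screenshot_2.png"/>
      </result>
    </language>
  </case>
  <case id="case_12_some_description">
    <language id="english">
      <result id="images">
        <file id="screenshot_1.png"/>
      </result>
    </language>
  </case>
</files>"""


def iter_file_paths(node, parents=()):
    """Yield a '/'-joined path for every <file> element at or below node.

    Hypothetical helper: it makes no assumption about how many levels
    sit between the root and the files, or what those levels are called.
    """
    # Assumption: the root is named by 'batch', everything else by 'id'.
    name = node.get('id') or node.get('batch')
    trail = parents + (name,) if name else parents
    if node.tag == 'file':
        yield '/'.join(trail)
        return
    for child in node:
        yield from iter_file_paths(child, trail)


root = ET.fromstring(XML)
paths = list(iter_file_paths(root))
for path in paths:
    print(path)
```

Because lxml elements can also be iterated for their children and expose get() and tag, the same function should work on an lxml tree. With lxml specifically, another option would be to start from each node returned by root.iter('file') and walk upwards with iterancestors(), reversing the collected names before joining them.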