How to parse xml-file with directory structure

Question

I've got an xml-file containing the directory structure for files I want to put into a tar.gz file (flattened).

How should I parse the xml to extract the path for each file?

Right now I'm using lxml and finding the paths like this:

    paths = []
    for case in root.iter('case'):
        for language in case.iter('language'):
            for result in language.iter('result'):
                for file in result.iter('file'):
                    paths.append('/'.join([node.get('id') for node in [case, language, result, file]]))

But this feels too hardcoded, and it does not work well if the structure changes.

I can find each file-node with root.iter('file'), but how can I get all parents/directories for each node/file? Or should I do this a (completely?) different way?

The xml looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <files batch="regular">
      <case id="case_10_some_description">
        <language id="english">
          <result id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
            <file id="screenshot_4.png"/>
            <file id="screenshot_5.png"/>
            <file id="screenshot_6.png"/>
          </result>
        </language>
      </case>
      <case id="case_12_some_description">
        <language id="english">
          <result id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
          </result>
        </language>
      </case>
    </files>

And these are the files:

    regular/case_10_some_description/english/images/screenshot_1.png
    regular/case_10_some_description/english/images/screenshot_2.png
    regular/case_10_some_description/english/images/screenshot_3.png
    regular/case_10_some_description/english/images/screenshot_4.png
    regular/case_10_some_description/english/images/screenshot_5.png
    regular/case_10_some_description/english/images/screenshot_6.png
    regular/case_12_some_description/english/images/screenshot_1.png
    regular/case_12_some_description/english/images/screenshot_2.png
    regular/case_12_some_description/english/images/screenshot_3.png

I wrote this Python package to manage evolving templates of directory structures... github.com/robmoggach/python-dirtt
– rjmoggach, Commented Oct 25, 2017 at 15:08

BendEg · Accepted Answer · 2013-09-04 09:15:22Z

Did you create this file schema yourself? If you can change it, I definitely would. Try something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <Directory id="regular">
      <Directory id="case_10_some_description">
        <Directory id="english">
          <Directory id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
            <file id="screenshot_4.png"/>
            <file id="screenshot_5.png"/>
            <file id="screenshot_6.png"/>
          </Directory>
        </Directory>
      </Directory>
      <Directory id="case_12_some_description">
        <Directory id="english">
          <Directory id="images">
            <file id="screenshot_1.png"/>
            <file id="screenshot_2.png"/>
            <file id="screenshot_3.png"/>
          </Directory>
        </Directory>
      </Directory>
    </Directory>

Always give tags the same name if they have the same meaning. Consider distinguishing elements by attributes rather than by tag name; that would make your parsing easier.
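
With a uniform schema like the one above, the paths could be collected by a short recursive walk. The following is only a sketch, assuming the standard-library `xml.etree.ElementTree` rather than lxml and a trimmed-down copy of the XML; `DIRECTORY_XML`, `iter_paths` and `directory_paths` are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the uniform <Directory>/<file> schema (assumed sample data).
DIRECTORY_XML = """\
<Directory id="regular">
  <Directory id="case_10_some_description">
    <Directory id="english">
      <Directory id="images">
        <file id="screenshot_1.png"/>
        <file id="screenshot_2.png"/>
      </Directory>
    </Directory>
  </Directory>
  <Directory id="case_12_some_description">
    <Directory id="english">
      <Directory id="images">
        <file id="screenshot_1.png"/>
      </Directory>
    </Directory>
  </Directory>
</Directory>
"""


def iter_paths(node, prefix=()):
    """Yield one slash-joined path for every <file> element at or below node."""
    here = prefix + (node.get("id"),)
    if node.tag == "file":
        yield "/".join(here)
    else:
        # Any other element is treated as a directory, however deeply nested.
        for child in node:
            yield from iter_paths(child, here)


directory_root = ET.fromstring(DIRECTORY_XML)
directory_paths = list(iter_paths(directory_root))
print("\n".join(directory_paths))
```

Because the walk only distinguishes `file` from everything else, adding or removing directory levels should not require changing the code.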

Srinivasreddy Jakkireddy · Accepted Answer · 2013-09-04 10:30:41Z

    import xml.etree.ElementTree as ET

    tree = ET.parse('sample.xml')
    root = tree.getroot()

    # Note: the directory prefix is hardcoded here.
    for file in root.iter('file'):
        print('regular/case_10_some_description/english/images/' + file.attrib['id'])

kristus Over a year ago

Thanks for the answer, but this is more hardcoded than the solution I want to get rid of. This only works for the first case also.
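
One way to avoid hardcoding while keeping the original schema is to start from each `file` node and walk up through its ancestors. The sketch below assumes the standard-library `xml.etree.ElementTree`, whose elements have no parent pointer, so it builds a child-to-parent map first; with lxml, `getparent()` or `iterancestors()` should make that map unnecessary. `FILES_XML`, `parent_of`, `path_for` and `file_paths` are illustrative names, and the XML is a shortened copy of the one in the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question (assumed sample data).
FILES_XML = """\
<files batch="regular">
  <case id="case_10_some_description">
    <language id="english">
      <result id="images">
        <file id="screenshot_1.png"/>
        <file id="screenshot_2.png"/>
      </result>
    </language>
  </case>
  <case id="case_12_some_description">
    <language id="english">
      <result id="images">
        <file id="screenshot_1.png"/>
      </result>
    </language>
  </case>
</files>
"""

files_root = ET.fromstring(FILES_XML)

# Build the child -> parent lookup once for the whole tree.
parent_of = {child: parent for parent in files_root.iter() for child in parent}


def path_for(node):
    """Walk from node up to the root and join the collected names with '/'."""
    parts = []
    while node is not None:
        # The root carries its name in "batch"; every other level uses "id".
        parts.append(node.get("id") or node.get("batch"))
        node = parent_of.get(node)  # None once the root has been handled
    return "/".join(reversed(parts))


file_paths = [path_for(f) for f in files_root.iter("file")]
print("\n".join(file_paths))
```

This never names `case`, `language` or `result`, so it should keep working if levels are renamed, added or removed, as long as each level has an `id` (or `batch`) attribute.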
