0

There is an info.xml file at /var/packs/{many folders}/info.xml for every one of the {many folders} directories. The directories differ, but each one describes its own contents in its info.xml.

I need to go through every {many folders} directory and build a list of file paths. A file path is the text inside the Path tags, and it should be included only if the file type is "config", that is, if "config" is the text inside the Type tags.

The info.xml file will look like this:

    <Files>
      <File>
        <Path>usr/share/doc/dialog/samples/form1</Path>
        <Type>doc</Type>
        <Size>1222</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>49744d73e8667d0e353923c0241891d46ebb9032</Hash>
      </File>
      <File>
        <Path>usr/share/doc/dialog/samples/form3</Path>
        <Type>config</Type>
        <Size>1294</Size>
        <Uid>0</Uid>
        <Gid>0</Gid>
        <Mode>0755</Mode>
        <Hash>f30277f73e468232c59a526baf3a5ce49519b959</Hash>
      </File>
    </Files>

3 Answers

2

Here is a very basic example with no error handling. It only works with strictly defined XML files, but you can take it as a starting point and continue with the following links:

The code:

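    import os
    import os.path
    from xml.dom.minidom import parse


    def parse_file(path):
        """Return the <Path> text of every <File> whose <Type> is 'config'."""
        config_paths = []
        dom = parse(path)
        try:
            for filetag in dom.getElementsByTagName('File'):
                file_type = filetag.getElementsByTagName('Type')[0].firstChild.data
                if file_type == 'config':
                    # Read <Path> from the same <File> element that was just checked.
                    config_paths.append(
                        filetag.getElementsByTagName('Path')[0].firstChild.data)
        finally:
            # Release the DOM tree even if a node is missing and an exception is raised.
            dom.unlink()
        return config_paths


    def main():
        result = []
        # Keep the result list separate from the file names that os.walk yields.
        for root, dirs, filenames in os.walk('/var/packs'):
            if 'info.xml' in filenames:
                result += parse_file(os.path.join(root, 'info.xml'))
        print('The list of desired files: %s' % result)


    if __name__ == '__main__':
        main()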

1

Using lxml.etree and XPath:

    import os
    import lxml.etree

    files = []
    for root, dirnames, filenames in os.walk('/var/packs'):
        for filename in filenames:
            if filename != 'info.xml':
                continue
            tree = lxml.etree.parse(os.path.join(root, filename))
            # Select the <Path> text of every <File> whose <Type> text is "config".
            files.extend(tree.getroot().xpath('//File[Type[text()="config"]]/Path/text()'))

If lxml is not available, you can alternatively use the etree API in the standard library:

    import os
    import xml.etree.ElementTree

    files = []
    for root, dirnames, filenames in os.walk('/var/packs'):
        for filename in filenames:
            if filename != 'info.xml':
                continue
            tree = xml.etree.ElementTree.parse(os.path.join(root, filename))
            # <File> elements are direct children of the <Files> root.
            for file_node in tree.findall('File'):
                type_node = file_node.find('Type')
                if type_node is not None and type_node.text == 'config':
                    path_node = file_node.find('Path')
                    if path_node is not None:
                        files.append(path_node.text)
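As a side note, the standard-library ElementTree also understands a small XPath subset, so the Type check can be pushed into the path expression much like the lxml version. Here is a rough sketch of that idea, written for Python 3. The function name `find_config_paths` and its `top` parameter are made up for this example, and it assumes each info.xml has `<Files>` as its root with `<File>` elements directly underneath, as in the question.

```python
import os
import xml.etree.ElementTree as ET


def find_config_paths(top):
    """Collect the <Path> text of every <File> whose <Type> is 'config'."""
    paths = []
    for root, dirnames, filenames in os.walk(top):
        if 'info.xml' not in filenames:
            continue
        tree = ET.parse(os.path.join(root, 'info.xml'))
        # [Type='config'] keeps only <File> elements that have a <Type>
        # child whose complete text is exactly 'config'.
        for path_node in tree.findall("File[Type='config']/Path"):
            if path_node.text:
                paths.append(path_node.text)
    return paths
```

Calling `find_config_paths('/var/packs')` should give the same list as the loop above, minus any empty Path elements. Note that it still raises if an info.xml is not well-formed.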

0

Writing this off the top of my head, but here goes. We're going to make use of os.path.walk to recursively descend into your directories and minidom for doing the parsing.

    import os
    from xml.dom import minidom

    # opens a given info.xml file and prints out "Path"'s contents
    def parseInfoXML(filename):
        doc = minidom.parse(filename)
        for fileNode in doc.getElementsByTagName("File"):
            # warning: we assume the existence of a Path node, and that it contains a Text node
            print fileNode.getElementsByTagName("Path")[0].childNodes[0].data
        doc.unlink()

    def checkDirForInfoXML(arg, dirname, names):
        if "info.xml" in names:
            parseInfoXML(os.path.join(dirname, "info.xml"))

    # recursively walk the directory tree, calling our visitor function to check for info.xml in each dir
    # this will include packs as well, so be sure that there's no info.xml in there
    os.path.walk("/var/packs", checkDirForInfoXML, None)

Not the most efficient way to accomplish it I'm sure, but it'll do if you don't expect any errors/whatever.
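If you do expect the odd broken file, here is a rough sketch of a more defensive variant, assuming Python 3 (so it uses os.walk) and the same minidom approach. It also filters on the "config" type, which the code above does not do. The helper names `first_text`, `config_paths` and `walk_packs` are invented for this example, and silently skipping malformed files is just one possible policy; you may prefer to log them.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def first_text(node, tag):
    """Return the text of the first <tag> under node, or None if missing/empty."""
    matches = node.getElementsByTagName(tag)
    if not matches:
        return None
    child = matches[0].firstChild
    if child is None or child.nodeType != child.TEXT_NODE:
        return None
    return child.data.strip()


def config_paths(info_xml):
    """List <Path> values of 'config' files; return [] if the XML is malformed."""
    try:
        doc = minidom.parse(info_xml)
    except ExpatError:
        return []
    try:
        found = []
        for file_node in doc.getElementsByTagName('File'):
            if first_text(file_node, 'Type') == 'config':
                path = first_text(file_node, 'Path')
                if path:
                    found.append(path)
        return found
    finally:
        doc.unlink()


def walk_packs(top):
    """Gather config paths from every info.xml found below top."""
    result = []
    for dirname, dirnames, filenames in os.walk(top):
        if 'info.xml' in filenames:
            result.extend(config_paths(os.path.join(dirname, 'info.xml')))
    return result
```

With this, `walk_packs('/var/packs')` returns the list instead of printing it, and a File entry without a Path or Type node is skipped rather than raising an IndexError.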

2 Comments

Just a side note: os.path.walk is deprecated and has been removed in 3.0 in favor of os.walk(). docs.python.org/library/os.path.html#os.path.walk
Aha, thank you. I'm unfortunately still living in the Python 2.6 stone age, heh.
