Parse XML files in a root folder and its subfolders

Question

I'm working on a directory with 27 folders and various XML files inside each folder.

I was able to:

parse one XML file and write it to a CSV file
traverse one folder and read and parse all the XML files in it

Challenge:

I'm having trouble going through the root folder and all of its subfolders to parse every XML file.

Any help is appreciated, thank you. My code is below:

```python
# Works for one folder only
import csv
import os
import xml.etree.ElementTree as ET

## directory
path = "/Users.../y"
filenames = []

# newline='' stops the csv module from writing blank rows on Windows
xml_data_to_csv = open('/Users.../xml_extract.csv', 'w', newline='')
csvwriter = csv.writer(xml_data_to_csv)

# Read XML files in a folder (os.listdir does not descend into subfolders)
for filename in os.listdir(path):
    if not filename.endswith('.xml'):
        continue
    fullname = os.path.join(path, filename)
    print("\n", fullname)
    filenames.append(fullname)

# Parse elements in each XML file
for filename in filenames:
    tree = ET.parse(filename)
    root = tree.getroot()
    extract_xml = []
    ## extract child elements per xml file
    print("\n")
    for x in root.iter('Info'):
        for element in x:
            print(element.tag, element.text)
            extract_xml.append(element.text)
    ## Write list nodes to csv
    csvwriter.writerow(extract_xml)

## Close CSV file
xml_data_to_csv.close()
```

Davide Madrisan · Accepted Answer · 2021-10-08 21:41:51Z

You can get a list of all the XML files under a given path, including its subfolders, with `os.walk`:

```python
import os

path = "/main/root"
filelist = []

for root, dirs, files in os.walk(path):
    for file in files:
        if not file.endswith('.xml'):
            continue
        filelist.append(os.path.join(root, file))

for file in filelist:
    print(file)  # or, in your case, parse the XML 'file'
```

For instance, given this layout:

```
$ tree /main/root
/main/root
├── a
│   ├── a.xml
│   ├── b.xml
│   └── c.xml
├── b
│   ├── d.xml
│   ├── e.xml
│   └── x.txt
└── c
    ├── f.xml
    └── g.xml
```

we get:

```
/main/root/c/g.xml
/main/root/c/f.xml
/main/root/b/e.xml
/main/root/b/d.xml
/main/root/a/c.xml
/main/root/a/b.xml
/main/root/a/a.xml
```
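To try this without a real /main/root, here is a small sketch that recreates the example layout in a temporary directory and runs the same `os.walk` loop. `LAYOUT`, `build_tree` and `collect_xml` are hypothetical names for illustration. Because the order `os.walk` reports entries in depends on the filesystem, the result is compared as a set of relative paths rather than as a list.

```python
import os
import tempfile

# Hypothetical stand-in for the /main/root layout shown above
LAYOUT = {
    "a": ["a.xml", "b.xml", "c.xml"],
    "b": ["d.xml", "e.xml", "x.txt"],
    "c": ["f.xml", "g.xml"],
}

def build_tree(base):
    """Create the folders and files from LAYOUT under `base`."""
    for folder, names in LAYOUT.items():
        os.makedirs(os.path.join(base, folder), exist_ok=True)
        for name in names:
            # Contents are irrelevant here; only the file names matter
            with open(os.path.join(base, folder, name), "w") as fh:
                fh.write("<root/>")

def collect_xml(path):
    """Same loop as the answer: every .xml file under `path`."""
    filelist = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if not file.endswith(".xml"):
                continue
            filelist.append(os.path.join(root, file))
    return filelist

with tempfile.TemporaryDirectory() as base:
    build_tree(base)
    found = collect_xml(base)
    # os.walk order is not guaranteed, so compare relative paths as a set
    relative = {os.path.relpath(p, base) for p in found}

print(sorted(relative))
```

x.txt is skipped by the `endswith('.xml')` check, so seven paths are found.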

If you want a deterministic order, sort both the directories and the files:

```python
for root, dirs, files in os.walk(path):
    dirs.sort()
    for file in sorted(files):
        if not file.endswith('.xml'):
            continue
        filelist.append(os.path.join(root, file))
```
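The `dirs.sort()` line works because `os.walk` is top-down by default: sorting the `dirs` list in place changes the order in which it descends into subfolders. A self-contained sketch of the sorted version, wrapped in a hypothetical `collect_xml_sorted` helper and run against a made-up temporary layout that mirrors the example above:

```python
import os
import tempfile

def collect_xml_sorted(path):
    """Return the .xml files under `path` in sorted, repeatable order."""
    filelist = []
    for root, dirs, files in os.walk(path):
        # In-place sort: os.walk (topdown=True) visits subfolders in this order
        dirs.sort()
        for file in sorted(files):
            if not file.endswith(".xml"):
                continue
            filelist.append(os.path.join(root, file))
    return filelist

# Hypothetical demo layout, created in a scrambled order on purpose
NAMES = ["c/g.xml", "a/b.xml", "b/x.txt", "c/f.xml",
         "a/a.xml", "b/e.xml", "a/c.xml", "b/d.xml"]

with tempfile.TemporaryDirectory() as base:
    for rel in NAMES:
        full = os.path.join(base, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as fh:
            fh.write("<root/>")
    ordered = [os.path.relpath(p, base) for p in collect_xml_sorted(base)]

for rel in ordered:
    print(rel)
```

Unlike the unsorted walk, this prints a/a.xml through c/g.xml in alphabetical order on any filesystem.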

Thanks for clearing this up. Thank you all so much, in love with you guys already.
Also, is there a way to include a header row (column names) when saving the parsed data to CSV?

frippe · Accepted Answer · 2021-10-08 21:20:03Z


You can use os.walk:

```python
import os

for dir_name, dirs, files in os.walk('<root_dir>'):
    # parse files
    ...
```
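Filling in the `# parse files` placeholder, here is a minimal sketch that combines `os.walk` with the parsing logic from the question. It assumes, as the question's code does, that each file contains `Info` elements whose children's text should become one CSV row. The helper name `xml_folder_to_csv` and the demo file are hypothetical.

```python
import csv
import os
import tempfile
import xml.etree.ElementTree as ET

def xml_folder_to_csv(root_dir, csv_path):
    """Hypothetical helper: write one CSV row per .xml file under root_dir."""
    with open(csv_path, "w", newline="") as out:
        csvwriter = csv.writer(out)
        for dir_name, dirs, files in os.walk(root_dir):
            for name in files:
                if not name.endswith(".xml"):
                    continue
                tree = ET.parse(os.path.join(dir_name, name))
                # Text of every child of every <Info> element, in document order
                row = [element.text
                       for info in tree.getroot().iter("Info")
                       for element in info]
                csvwriter.writerow(row)

# Demo: one made-up file inside a nested folder
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub"))
    with open(os.path.join(base, "sub", "one.xml"), "w") as fh:
        fh.write("<doc><Info><Name>Ann</Name><Age>30</Age></Info></doc>")
    out_path = os.path.join(base, "out.csv")
    xml_folder_to_csv(base, out_path)
    with open(out_path, newline="") as fh:
        rows = list(csv.reader(fh))

print(rows)  # → [['Ann', '30']]
```

`dir_name` already holds the folder currently being walked, so joining it with each file name gives a path that `ET.parse` can open directly.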


Panda: Thank you all so much. In love with you guys already.

tdelaney · Accepted Answer · 2021-10-08 21:51:31Z

You can use the pathlib module to "glob" the XML files. With a `**/` prefix, the pattern is matched in the folder and all of its subdirectories recursively, and each result is a Path object that already includes the path to the file. Cleaning up your script a bit, you would have:

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

## directory
path = Path("/Users.../y")

with open('/Users.../xml_extract.csv', 'w', newline='') as xml_data_to_csv:
    csvwriter = csv.writer(xml_data_to_csv)
    # Read XML files in the folder and all of its subfolders
    for filepath in path.glob("**/*.xml"):
        tree = ET.parse(filepath)
        root = tree.getroot()
        extract_xml = []
        ## extract child elements per xml file
        print("\n")
        for x in root.iter('Info'):
            for element in x:
                print(element.tag, element.text)
                extract_xml.append(element.text)
        ## Write list nodes to csv
        csvwriter.writerow(extract_xml)
```
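`Path.rglob(pattern)` behaves like `glob()` with `**/` prepended to the pattern, so the loop could equally use `path.rglob("*.xml")`. The sketch below builds a made-up layout in a temporary directory and checks that both calls find the same files. Glob results are not guaranteed to come back in any particular order, so they are sorted before comparing.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout: one XML file at the top level, others nested
    for rel in ["top.xml", "a/one.xml", "a/b/two.xml", "a/notes.txt"]:
        target = base / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("<root/>")

    # "**" also matches zero directories, so top.xml is included
    via_glob = sorted(p.relative_to(base).as_posix() for p in base.glob("**/*.xml"))
    via_rglob = sorted(p.relative_to(base).as_posix() for p in base.rglob("*.xml"))

print(via_glob)               # → ['a/b/two.xml', 'a/one.xml', 'top.xml']
print(via_glob == via_rglob)  # → True
```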

Any suggestions on how to include a header row (column names) in the CSV?
@Panda, right after creating csvwriter, you can write the header with csvwriter.writerow(["column 1", "column 2", "column 3"]) ... replacing them with your own names, of course.
I tried this, but it writes the column names on every row (as if it were inside the loop).
@Panda What column names do you want? I can add it to the example.
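A header that repeats on every row usually means its `writerow` call ended up inside the `for` loop. It has to run only once: either before the loop with fixed names, as suggested above, or, if the names should come from the XML itself, behind a flag that fires on the first file only. A minimal sketch of the second option, assuming the child tags of the first file's `Info` elements define the columns for every file. `DOCS` and `header_written` are hypothetical, and an in-memory buffer stands in for the real CSV file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up documents standing in for the parsed XML files
DOCS = [
    "<doc><Info><Name>Ann</Name><Age>30</Age></Info></doc>",
    "<doc><Info><Name>Bob</Name><Age>25</Age></Info></doc>",
]

buffer = io.StringIO()  # stands in for the opened CSV file
csvwriter = csv.writer(buffer)
header_written = False

for text in DOCS:
    root = ET.fromstring(text)
    info_elements = list(root.iter("Info"))
    if not header_written:
        # Assumption: the first file's tags are the column names for all files
        csvwriter.writerow([el.tag for info in info_elements for el in info])
        header_written = True
    csvwriter.writerow([el.text for info in info_elements for el in info])

lines = buffer.getvalue().splitlines()
print(lines)  # → ['Name,Age', 'Ann,30', 'Bob,25']
```

If the column names are known in advance, a single `csvwriter.writerow([...])` placed right after creating `csvwriter` and before the loop is simpler.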
