
I have 50 folders, each containing one XML file. The formatting is supposed to look like this:

```xml
<data>
  <items>
    <item name="item1_πα"></item>
    <item name="item2_πα"></item>
    <item name="item3_πα"></item>
    <item name="item4_πα"></item>
  </items>
</data>
```

but it is:

```
b'<data>\n <items>51041<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>\n <item name="item2"/>\n <item name="item3"/>\n <item name="item4"/>\n </items>\n</data>\n\n'
```

Can I modify them all with a loop and make them appear as they should?

Something like this:

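```python
import os
from xml.dom import minidom

base = r"C:\Users\user\Desktop\testin"  # <- here are the 50 folders
for folder in os.listdir(base):
    path = os.path.join(base, folder, 'bac.xml')
    example = minidom.parse(path)  # open each xml
    with open(path, 'w', encoding='utf-8') as file:  # write each xml formatted now
        example.writexml(file, indent='\n', addindent=' ', encoding='utf-8')
```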

Note: the XML file in every folder has the same name (bac.xml).

  • If you do print(name) you will see that the format of the string inside name is a good match for the example you present. For example, the line breaks and indentation are what you expect, and you will see the first item name shown as <item name="item1_παραγωγή"/>. You have your data in a bytestring. If you just display a bytestring in the terminal, Python will show \x escapes for non-ASCII characters. Commented Aug 7, 2018 at 8:28
  • So how do I make it appear as you describe, and not like \xcf\x80\xce\xb etc.? Commented Aug 7, 2018 at 8:31
  • This is an answer you wrote that I can't make work with my code: stackoverflow.com/a/44005629/9988562. What do you suggest to make the files appear formatted when exported? Commented Aug 7, 2018 at 9:10
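To illustrate the first comment's point: the \x sequences are only how Python displays a bytes object, not corrupted data. A minimal sketch, using the first item copied from the question's output (the variable names raw and text are illustrative), shows that decoding the bytes as UTF-8 recovers the Greek text:

```python
# The first <item> from the question's output, kept as a bytes literal.
raw = b'<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>'

# Printing a bytes object shows every non-ASCII byte as a \x escape.
print(raw)

# Decoding the same bytes as UTF-8 yields a str that prints the Greek letters.
text = raw.decode("utf-8")
print(text)  # prints <item name="item1_παραγωγή"/>
```

Nothing in the file changes here; only the Python object being displayed goes from bytes to str.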

1 Answer


The problem is probably the decoding of the bytes; this thread has a solution. Basically, you need to read the file in binary mode (hence 'rb', where b stands for binary) and then decode() it:

```python
import os

# this will get you all the subdirectories
name = r"C:\Users\user\Desktop\testin"
folders_list = [os.path.join(name, directory) for directory in os.listdir(name)
                if os.path.isdir(os.path.join(name, directory))]

for folder in folders_list:
    # when you read the file:
    with open(os.path.join(folder, 'bac.xml'), 'rb') as f:
        your_file = f.read().decode()  # bytes -> str (UTF-8 by default)

    # if you need to write it anywhere else:
    with open(os.path.join(folder, 'bac.xml'), 'wb') as f:
        f.write(your_file.encode())  # str -> bytes (UTF-8 by default)
```

Comments

How do I do this in a for loop over the folders I mentioned?
I don't understand one thing: are the data stored in the xml files formatted like b'<data>\n <items>51041<item name="item1_\xcf\x80\xce\xb1\xcf\x81\xce\xb1\xce\xb3\xcf\x89\xce\xb3\xce\xae"/>\n <item name="item2"/>\n <item name="item3"/>\n <item name="item4"/>\n </items>\n</data>\n\n' or are they already properly formatted but you get the format above while reading them?
The XML files are exported by my code, and if I open them in Notepad they appear on one line. The fix I am looking for is to format them in a for loop.
I'm not sure I understand what you need, but I hope this helps.
And what about the formatting fix? This just reads and writes the file, so apparently it keeps the bad formatting. It needs to format the XML before writing it. But how? What do you suggest?
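A minimal sketch of formatting before writing, assuming the files are well-formed UTF-8 XML and that the standard library's xml.dom.minidom (which the question's writexml call already points to) is acceptable. The helper names strip_blank_text, pretty_format and reformat_all are hypothetical, not from the thread. Whitespace-only text nodes are removed first because toprettyxml() adds its own indentation and would otherwise leave extra blank lines.

```python
from pathlib import Path
from xml.dom import minidom


def strip_blank_text(node):
    """Recursively remove whitespace-only text nodes (hypothetical helper)."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_blank_text(child)


def pretty_format(xml_bytes, indent="  "):
    """Parse raw XML bytes and return an indented, UTF-8 encoded document."""
    doc = minidom.parseString(xml_bytes)
    strip_blank_text(doc)
    # With encoding given, toprettyxml() returns bytes, declaration included.
    return doc.toprettyxml(indent=indent, encoding="utf-8")


def reformat_all(base, filename="bac.xml"):
    """Rewrite `filename` in place in every immediate subfolder of `base`."""
    for path in Path(base).glob(f"*/{filename}"):
        path.write_bytes(pretty_format(path.read_bytes()))
```

Calling reformat_all(r"C:\Users\user\Desktop\testin") rewrites every */bac.xml in place; running it twice gives the same result because the old indentation is stripped before re-indenting. minidom writes empty elements as <item .../> rather than <item ...></item>, which is equivalent XML. If an older Windows Notepad still shows a file on one line, a likely cause is LF-only line endings, which Notepad did not display as line breaks before the Windows 10 October 2018 Update; passing newl="\r\n" to toprettyxml() would write CRLF line endings instead.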
