I have an XML file that needs a TAB character as the value of a key. Based on the question "Represent space and tab in XML tag", I encoded it as &#009; rather than "\t", because "\t" was being interpreted as a string containing the two characters '\' and 't'.

I did not use a CDATA section, because "\t" inside CDATA would still be read as a string containing the two characters '\' and 't'.

The sample XML file of my use case looks like this

<?xml version="1.0" encoding="UTF-8"?>
<keys>
    <key>
        <name>key1</name>
        <value>value1</value>
    </key>
    <key>
        <name>key2</name>
        <value>&#009;</value>
    </key>
    <key>
        <name>key3</name>
        <value>2048</value>
    </key>
</keys>

This is the code that I have right now that is not able to handle this TAB character

...
dom_obj = minidom.parse(self.path_to_xml)
...
for each_key_child in key_child:
    if each_key_child.nodeType == Node.ELEMENT_NODE:
        if each_key_child.nodeName == 'name':
            node_name = str(each_key_child.childNodes[0].data.strip())
        elif each_key_child.nodeName == 'value':
            node_value = str(each_key_child.childNodes[0].data.strip())
        else:
            pass
    else:
        pass

The output that I get after the script is executed is

'key1': 'value1', 'key2': '', 'key3': '2048', 

But when I execute it on the Python interactive interpreter

mobj = minidom.parse(path_to_xml_file)
mobj.getElementsByTagName("value")[1].childNodes[0]

I get the following output

<DOM Text node "u'\t'"> 

But I am not able to assign the output to a variable. This step is not working

node = mobj.getElementsByTagName("value")[1].childNodes[0].data 

Another strange thing is that when I just type node at the interpreter, it prints u'\t'!

>>> node
u'\t'

To see if this was a genuine case where the TAB character was getting stored in the variable but not getting displayed I used it as a separator to concatenate two strings.

This works fine at the interpreter but not in the script, whose output I inspected in vim with the :set list option.

Can anyone tell me what is wrong with my approach? Help appreciated!

1 Answer

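You're calling strip(). With no arguments, strip() removes all leading and trailing whitespace, and a tab is whitespace, so a value that is only a tab becomes an empty string. Just don't do that. (Or, if you need to strip spaces or newlines or something specific but leave tabs, call it with a specific argument, like strip('\n').)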

Here's a demonstration (faked, because your example XML isn't valid, so I can't test it):

>>> mobj.getElementsByTagName("value")[1].childNodes[0]
<DOM Text node "u'\t'">
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data
u'\t'
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data.strip()
u''
>>> mobj.getElementsByTagName("value")[1].childNodes[0].data.strip('\n')
u'\t'
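
For a version that can actually be run, here is a minimal sketch of the same idea. It assumes Python 3 (the session above is Python 2, hence the u'' prefixes) and a well-formed copy of the sample XML embedded as a string. The helper name read_keys is made up for illustration and is not from the original script.

```python
from xml.dom import minidom

# Sample document from the question, laid out as well-formed XML.
XML = """<?xml version="1.0" encoding="UTF-8"?>
<keys>
  <key><name>key1</name><value>value1</value></key>
  <key><name>key2</name><value>&#009;</value></key>
  <key><name>key3</name><value>2048</value></key>
</keys>"""


def read_keys(xml_text):
    """Hypothetical helper: return {name: value}, keeping tabs in values."""
    dom = minidom.parseString(xml_text.encode("utf-8"))
    result = {}
    for key in dom.getElementsByTagName("key"):
        name_node = key.getElementsByTagName("name")[0].firstChild
        value_node = key.getElementsByTagName("value")[0].firstChild
        # Names can be stripped freely; values only lose newlines,
        # so a value consisting of a single tab survives.
        result[name_node.data.strip()] = value_node.data.strip("\n")
    return result


keys = read_keys(XML)
tab = keys["key2"]
print(repr(tab))          # '\t'
print(repr(tab.strip()))  # ''  (bare strip() removes the tab)
```

The parser has already turned &#009; into a real tab by the time you read .data, so the only thing left to get right is not stripping it away afterwards.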

1 Comment

I should have seen this thanks abamert. Wasted an hour on this silly thing
