0

I am parsing a file into a Python list and have encountered a nested list like this:

{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } } 

I want to interpret it as a list while preserving the nesting, so the resulting Python list should look as follows:

[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]] 

I managed to produce the correct string with the following:

l = """{ 1 4{ 2 0.0 }{ 3 0.0 }{ 4 0.0 }{ 5 0.0 } }"""
l = l.replace("{\t",",[").replace("\t}","]").replace("{","[").replace("}","]").replace("\t",",")[1:]

I can also apply l.split("\t") to get a list, but that does not work for nested input: the result is flattened, which I do not want.

I tried ast.literal_eval(l), but it fails on unquoted strings such as 2a.

6
  • I'm confused here. There is one 2a which becomes 2a, but there's one 4cc which becomes 4; and in the correct string that you posted, 2 and 4 do not have any letters attached to them. What's the right output? Commented Aug 23, 2018 at 11:30
  • Sorry, that was a typo: 4c remains 4c and 2a remains 2a. In general they are strings, i.e. they cannot be parsed with ast without being wrapped in quotes. Commented Aug 23, 2018 at 11:33
  • Just split by { and then do a nested split by space. Commented Aug 23, 2018 at 11:37
  • @E.Serra How do you do this nested split by space? By iterating? Commented Aug 23, 2018 at 11:59
  • Either use the answer below or a for loop within the first list. I also recommend you delete the empty strings in your list, which will be generated if you have more than one consecutive space. Commented Aug 23, 2018 at 13:23

2 Answers

7

Pyparsing has a built-in helper nestedExpr to help parse nested lists between opening and closing delimiters:

>>> import pyparsing as pp
>>> nested_braces = pp.nestedExpr('{', '}')
>>> t = """{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"""
>>> print(nested_braces.parseString(t).asList())
[['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
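Note that every leaf in that result is still a string and that the list is wrapped in one extra outer list. If numbers are wanted, a small post-processing step can convert them. The following is only a minimal sketch: the helper name `convert` is hypothetical (it is not part of pyparsing), and it assumes that any token accepted by `int()` or `float()` should become a number.

```python
def convert(node):
    """Recursively convert numeric strings in a nested list to int or float."""
    if isinstance(node, list):
        return [convert(child) for child in node]
    for cast in (int, float):
        try:
            return cast(node)
        except ValueError:
            pass
    return node  # not numeric (e.g. '2a'): keep it as a string


# Nested list of strings, copied from the parseString(t).asList() output above
tokens = [['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
result = convert(tokens)[0]  # [0] drops the extra outer list
print(result)
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
```

Trying `int` before `float` keeps `1` and `4` as integers while `0.0` becomes a float.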

1

You can develop your own parser using RegEx. In your situation, it is not too difficult. You can parse the enclosing curly brackets, then split the items and evaluate each item recursively.

Here is an example (which is not perfect):

import re

# Patterns, tried in this order: brace group, item such as "2a", float, int.
RE_BRACE = r"\{.*\}"
RE_ITEM = r"\d+[a-z]+"
RE_FLOAT = r"[-+]?\d*\.\d+"
RE_INT = r"\d+"

find_all_items = re.compile(
    "|".join([RE_BRACE, RE_ITEM, RE_FLOAT, RE_INT]), flags=re.DOTALL
).findall


def parse(text):
    # Brace group: strip the outer braces and parse each inner item recursively.
    mo = re.match(RE_BRACE, text, flags=re.DOTALL)
    if mo:
        content = mo.group()[1:-1]
        items = [parse(part) for part in find_all_items(content)]
        return items
    # Alphanumeric item such as "2a": keep it as a string.
    mo = re.match(RE_ITEM, text, flags=re.DOTALL)
    if mo:
        return mo.group()
    mo = re.match(RE_FLOAT, text, flags=re.DOTALL)
    if mo:
        return float(mo.group())
    mo = re.match(RE_INT, text, flags=re.DOTALL)
    if mo:
        return int(mo.group())
    raise Exception("Invalid text: {0}".format(text))

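Note: this parser cannot parse sibling groups such as {1 {2} {3} 4} correctly, because the greedy pattern \{.*\} matches from the first opening brace to the last closing brace. You need a recursive parser like pyparsing for that.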

Demo:

s = '''{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }'''
l = parse(s)
print(l)

You get the following (the sibling groups are merged, which is the limitation noted above):

[1, 4, ['2a', 0.0, [3, 0.0, '4c', 0.0], 5, 0.0]] 
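The merged groups in the output above come from the greedy `\{.*\}` pattern. One way to avoid that without an external library is to tokenize the text and track nesting with an explicit stack. The following is only a sketch of that alternative technique, not the parser from this answer: the names `TOKEN`, `to_value` and `parse_nested` are hypothetical, and it assumes that tokens are separated by whitespace or braces.

```python
import re

# A token is a single brace, or a run of characters that are neither
# braces nor whitespace.
TOKEN = re.compile(r"[{}]|[^\s{}]+")


def to_value(token):
    """Return the token as int or float when possible, otherwise unchanged."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_nested(text):
    """Parse brace-delimited text into nested Python lists."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for token in TOKEN.findall(text):
        if token == "{":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif token == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            stack.pop()
        else:
            stack[-1].append(to_value(token))
    if len(stack) != 1:
        raise ValueError("missing '}'")
    return root


s = "{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"
print(parse_nested(s)[0])
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
print(parse_nested("{1 {2} {3} 4}")[0])
# [1, [2], [3], 4]
```

`parse_nested` returns a list of all top-level items, so `[0]` selects the single outer group here.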
