Parse nested list from string that cannot be parsed with ast.literal_eval

Question

I am parsing a file into a Python list and encountered a nested list like this:

{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }

I want to interpret it as a nested list, so the resulting Python list should look as follows:

[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]]

I managed to build the correct string with the following:

l = """{ 1 4{ 2 0.0 }{ 3 0.0 }{ 4 0.0 }{ 5 0.0 } }"""
l = l.replace("{\t",",[").replace("\t}","]").replace("{","[").replace("}","]").replace("\t",",")[1:]

I can also apply l.split("\t") to get a list, but that does not work for the nested case: the result is flattened, which I do not want.

I tried ast.literal_eval(l), but it fails on unquoted strings, e.g. 2a.
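A minimal reproduction of that failure, assuming the substituted string looks like the desired output shown above (the names candidate, result and error_name are only illustrative):

```python
import ast

# Assumed result of the bracket substitution: bare tokens such as 2a are unquoted.
candidate = "[1,4,[2a,0.0],[3,0.0],[4c,0.0],[5,0.0]]"

try:
    result = ast.literal_eval(candidate)
    error_name = None
except (SyntaxError, ValueError) as exc:
    # 2a is not a valid Python literal, so the string is rejected as a whole.
    result = None
    error_name = type(exc).__name__

print(result, error_name)
```

The tokens would have to be quoted ('2a', '4c') before ast.literal_eval could accept the string.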

I'm confused here. One 2a becomes 2a, but one 4cc becomes 4, and in the string you posted as correct, 2 and 4 do not have any letters attached to them. What's the right output?
– Sushant, Commented Aug 23, 2018 at 11:30
Sorry, that was a typo: 4c remains 4c and 2a remains 2a. Generally they are strings, i.e. they cannot be parsed with ast without being put in quotes.
– Intelligent-Infrastructure, Commented Aug 23, 2018 at 11:33
Either use the answer below or a for loop within the first list. I also recommend deleting the empty strings in your list, which will be generated if you have more than one consecutive space.
– E.Serra, Commented Aug 23, 2018 at 13:23

PaulMcG · Accepted Answer · 2018-08-23 12:00:19Z

Pyparsing has a built-in helper nestedExpr to help parse nested lists between opening and closing delimiters:

>>> import pyparsing as pp
>>> nested_braces = pp.nestedExpr('{', '}')
>>> t = """{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"""
>>> print(nested_braces.parseString(t).asList())
[['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]
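Every token comes back as a string. If numbers are wanted, as in the question's desired output, a small post-processing step could convert them. This is only a sketch: convert is a hypothetical helper, not part of pyparsing, and the tokens list is copied from the output printed above so the snippet runs without pyparsing installed.

```python
def convert(node):
    """Recursively turn numeric-looking strings into int or float."""
    if isinstance(node, list):
        return [convert(child) for child in node]
    for cast in (int, float):
        try:
            return cast(node)
        except ValueError:
            pass  # not parseable by this cast, try the next one
    return node  # anything else, e.g. '2a', stays a string


# Nested list of strings, copied from the pyparsing output above.
tokens = [['1', '4', ['2a', '0.0'], ['3', '0.0'], ['4c', '0.0'], ['5', '0.0']]]

# nestedExpr wraps the outermost group in an extra list, hence tokens[0].
print(convert(tokens[0]))
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
```

Note that float() also accepts strings such as 'nan' or 'inf', so a stricter check may be needed if such tokens can occur in the data.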

Laurent LAPORTE · Accepted Answer · 2018-08-23 12:35:43Z

You can develop your own parser using RegEx. In your situation, it is not too difficult. You can parse the enclosing curly brackets, then split the items and evaluate each item recursively.

Here is an example (which is not perfect):

import re

RE_BRACE = r"\{.*\}"
RE_ITEM = r"\d+[a-z]+"
RE_FLOAT = r"[-+]?\d*\.\d+"
RE_INT = r"\d+"

find_all_items = re.compile(
    "|".join([RE_BRACE, RE_ITEM, RE_FLOAT, RE_INT]),
    flags=re.DOTALL).findall


def parse(text):
    mo = re.match(RE_BRACE, text, flags=re.DOTALL)
    if mo:
        content = mo.group()[1:-1]
        items = [parse(part) for part in find_all_items(content)]
        return items
    mo = re.match(RE_ITEM, text, flags=re.DOTALL)
    if mo:
        return mo.group()
    mo = re.match(RE_FLOAT, text, flags=re.DOTALL)
    if mo:
        return float(mo.group())
    mo = re.match(RE_INT, text, flags=re.DOTALL)
    if mo:
        return int(mo.group())
    raise Exception("Invalid text: {0}".format(text))

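Note: this parser cannot parse {1 {2} {3} 4} correctly, because the greedy pattern \{.*\} matches from the first opening brace to the last closing brace, so sibling groups get merged. You need a recursive parser like pyparsing for that.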

Demo:

s = '''{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }'''
l = parse(s)
print(l)

You get the following (the sibling groups are merged, which is the limitation noted above):

[1, 4, ['2a', 0.0, [3, 0.0, '4c', 0.0], 5, 0.0]]
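For comparison, here is a sketch of a small recursive-descent parser that uses only the standard library and keeps sibling groups separate. It is not the code from this answer: the names TOKEN, to_value and parse_nested are hypothetical, and it assumes items are separated by whitespace or braces.

```python
import re

# A token is either a single brace or a run of non-space, non-brace characters.
TOKEN = re.compile(r"[{}]|[^\s{}]+")


def to_value(token):
    """Convert a token to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_nested(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_group():
        nonlocal pos
        pos += 1  # skip the opening '{'
        items = []
        while pos < len(tokens) and tokens[pos] != "}":
            if tokens[pos] == "{":
                items.append(parse_group())  # recurse into the nested group
            else:
                items.append(to_value(tokens[pos]))
                pos += 1
        if pos >= len(tokens):
            raise ValueError("unbalanced braces: missing '}'")
        pos += 1  # skip the closing '}'
        return items

    if not tokens or tokens[0] != "{":
        raise ValueError("input must start with '{'")
    result = parse_group()
    if pos != len(tokens):
        raise ValueError("unexpected text after the closing brace")
    return result


print(parse_nested("{ 1 4{ 2a 0.0 }{ 3 0.0 }{ 4c 0.0 }{ 5 0.0 } }"))
# [1, 4, ['2a', 0.0], [3, 0.0], ['4c', 0.0], [5, 0.0]]
print(parse_nested("{1 {2} {3} 4}"))
# [1, [2], [3], 4]
```

Tokenizing first and then consuming one token at a time means each closing brace ends only the group that is currently open, which avoids the greedy-match problem.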
