I need to build a tree like the following using the Lark package:

    start
      expr
        or_expr
          and_expr
            comp_expr
              identifier    Name
              comparator    eq
              value    'Milk'
            comp_expr
              identifier    Price
              comparator    lt
              value    2.55

The grammar I am using is the following:

    from lark import Lark

    # Raw string so the regex backslashes (\d, \.) reach Lark unchanged.
    odata_grammar = r"""
    start: expr
    expr: or_expr
    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr
    comparator: "eq" | "lt" | "gt" | "le" | "ge" | "ne"
    value: STRING | NUMBER
    identifier: CNAME
    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/
    %import common.CNAME
    %import common.WS
    %ignore WS
    """

    parser = Lark(odata_grammar, start='start', parser='lalr')
    url_filter = "Name eq 'Milk' and Price lt 2.55"
    tree = parser.parse(url_filter)
    print(tree.pretty())

When I print this tree, I get the following instead:

    start
      expr
        or_expr
          and_expr
            comp_expr
              identifier    Name
              comparator
              value    'Milk'
            comp_expr
              identifier    Price
              comparator
              value    2.55

For some reason the comparator value is missing. Lark clearly recognizes it, since the input parses, but it is not printed in the tree. Curiously, when I try to "force" it with an alias in the grammar, such as comparator: "eq" -> eq, the node is simply renamed to eq; I do not get comparator with eq as its value.

1 Answer

See the Tree Construction section of the Lark documentation (https://lark-parser.readthedocs.io/en/stable/tree_construction.html):

" Lark filters out certain types of terminals by default, considering them punctuation:

Terminals that won’t appear in the tree are:

  • Unnamed literals (like "keyword" or "+")

  • Terminals whose name starts with an underscore (like _DIGIT)

Terminals that will appear in the tree are:

  • Unnamed regular expressions (like /[0-9]/)

  • Named terminals whose name starts with a letter (like DIGIT)"

So, option one: turn the string literals in your comparator rule into regexps:

    odata_grammar = r"""
    start: expr
    expr: or_expr
    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr
    comparator: /eq/ | /lt/ | /gt/ | /le/ | /ge/ | /ne/
    value: STRING | NUMBER
    identifier: CNAME
    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/
    %import common.CNAME
    %import common.WS
    %ignore WS
    """

Option two: add rules for each comparator literal:

    odata_grammar = r"""
    start: expr
    expr: or_expr
    or_expr: and_expr ("or" and_expr)*
    and_expr: comp_expr ("and" comp_expr)*
    comp_expr: identifier comparator value -> comp_expr
    comparator: eq | lt | gt | le | ge | ne
    eq: "eq"
    lt: "lt"
    gt: "gt"
    le: "le"
    ge: "ge"
    ne: "ne"
    value: STRING | NUMBER
    identifier: CNAME
    STRING: /'(''|[^'])*'/
    DATE: /\d{4}-\d{2}-\d{2}/
    NUMBER: /-?\d+(\.\d+)?/
    %import common.CNAME
    %import common.WS
    %ignore WS
    """

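Both solutions will capture eq in the parse tree. With option one it appears as a token under comparator; with option two it appears as an empty eq subtree under comparator.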
