Python package Lark does not build the grammar correctly

Question

I need to build a tree like the following using the Lark package:

    start
      expr
        or_expr
          and_expr
            comp_expr
              identifier    Name
              comparator    eq
              value         'Milk'
            comp_expr
              identifier    Price
              comparator    lt
              value         2.55

The grammar I use is the following:

    from lark import Lark

    odata_grammar = """
        start: expr
        expr: or_expr
        or_expr: and_expr ("or" and_expr)*
        and_expr: comp_expr ("and" comp_expr)*
        comp_expr: identifier comparator value -> comp_expr
        comparator: "eq" | "lt" | "gt" | "le" | "ge" | "ne"
        value: STRING | NUMBER
        identifier: CNAME
        STRING: /'(''|[^'])*'/
        DATE: /\d{4}-\d{2}-\d{2}/
        NUMBER: /-?\d+(\.\d+)?/
        %import common.CNAME
        %import common.WS
        %ignore WS
    """

    parser = Lark(odata_grammar, start='start', parser='lalr')
    url_filter = "Name eq 'Milk' and Price lt 2.55"
    tree = parser.parse(url_filter)
    print(tree.pretty())

When I print the tree, this is what I get instead:

    start
      expr
        or_expr
          and_expr
            comp_expr
              identifier    Name
              comparator
              value         'Milk'
            comp_expr
              identifier    Price
              comparator
              value         2.55

For some reason the comparator value is missing from the tree. I say "missing from the tree" because Lark clearly recognizes it (the input parses), yet it is not printed. Curiously, when I try to force it with an alias in the grammar, for example

    comparator: "eq" -> eq

the node is simply renamed to eq rather than printed as comparator eq.

Alberto Garcia · Accepted Answer · 2025-02-19 13:11:21Z

See the Tree Construction section of the Lark documentation (https://lark-parser.readthedocs.io/en/stable/tree_construction.html):

"Lark filters out certain types of terminals by default, considering them punctuation:

Terminals that won’t appear in the tree are:

Unnamed literals (like "keyword" or "+")
Terminals whose name starts with an underscore (like _DIGIT)

Terminals that will appear in the tree are:

Unnamed regular expressions (like /[0-9]/)
Named terminals whose name starts with a letter (like DIGIT)"

So, option one: turn the string literals in your comparator rule into regular expressions:

    odata_grammar = """
        start: expr
        expr: or_expr
        or_expr: and_expr ("or" and_expr)*
        and_expr: comp_expr ("and" comp_expr)*
        comp_expr: identifier comparator value -> comp_expr
        comparator: /eq/ | /lt/ | /gt/ | /le/ | /ge/ | /ne/
        value: STRING | NUMBER
        identifier: CNAME
        STRING: /'(''|[^'])*'/
        DATE: /\d{4}-\d{2}-\d{2}/
        NUMBER: /-?\d+(\.\d+)?/
        %import common.CNAME
        %import common.WS
        %ignore WS
    """

Option two: add a rule for each comparator literal:

    odata_grammar = """
        start: expr
        expr: or_expr
        or_expr: and_expr ("or" and_expr)*
        and_expr: comp_expr ("and" comp_expr)*
        comp_expr: identifier comparator value -> comp_expr
        comparator: eq | lt | gt | le | ge | ne
        eq: "eq"
        lt: "lt"
        gt: "gt"
        le: "le"
        ge: "ge"
        ne: "ne"
        value: STRING | NUMBER
        identifier: CNAME
        STRING: /'(''|[^'])*'/
        DATE: /\d{4}-\d{2}-\d{2}/
        NUMBER: /-?\d+(\.\d+)?/
        %import common.CNAME
        %import common.WS
        %ignore WS
    """

Both solutions keep eq in the parse tree.
