1
$\begingroup$

In the C language, a function call has a parameter list of assignment-expressions defined as:

argument-expression-list : assignment-expression | argument-expression-list "," assignment-expression ; 

as well as general expression:

expression : assignment-expression | expression "," assignment-expression ; 

In the example above, 1 assignment-expression term have 2 potential LHS, and while testing, my self-made parser bumped into error with duplicate production with differing LHS like the examples above. And as C has such grammar, I think there must be a solution.

So Q: how do language parsers deal with these cases?

$\endgroup$
3
  • $\begingroup$ Does this help? en.wikipedia.org/wiki/Left_recursion , en.wikipedia.org/wiki/Left_recursion#Removing_left_recursion $\endgroup$ Commented Jan 18 at 16:21
  • 2
    $\begingroup$ Hand written parsers don't necessarily start with grammar and parser generator. And further, there are bottoms-up parsers that work quite differently, i.e. without recursion. $\endgroup$ Commented Jan 18 at 16:27
  • $\begingroup$ I think you'd probably want to provide more details about what problem you're running into in your parser (like how is your parser implemented) $\endgroup$ Commented Jan 18 at 23:34

3 Answers 3

5
$\begingroup$

In traditional shift/reduce or bottom-up parsing, this is called a "reduce/reduce conflict" and needs to be resolved somehow. There are multiple approaches in roughly increasing order of complexity/difficulty:

  • using information from what you've seen already. This is what happens in a language like C using an LR parser -- you have a state machine that knows whether to expect an expression or an argument-expression-list (probably based on whether you're trying to parse an argument list or not)

  • using lookahead. Looking at future tokens to try to figure out which alternative to use.

  • using trial and backtracking -- try one, and if it doesn't work out, backtrack to this point and try the other.

  • using some kind of fork and join -- create multiple threads of processing where each alternative is explored in a separate thread, and then later prune threads that fail, and join non-failing threads back together in a common parent.

The main questions are how complex your grammar is, is it ambiguous, and what do you want to do about ambiguity? The first two alternatives above will generally fail for ambiguous grammars (but can be tweaked to chose one possible parse), the third will give you one possible parse, and the last will give all possible parses.

$\endgroup$
5
$\begingroup$

In addition to the solutions covered in other answers (lookahead, context, backtracking), there is one more technique called a cover grammar which is used in several places in the ECMAScript standard to keep the grammar context-free and avoid requiring more than a single token of lookahead.

Put simply, it consists of coming up with a relaxed grammar that can "cover" both alternatives and reinterpreting what was parsed more strictly when more information becomes available.

For example, in JavaScript, (a) could be a parenthesized expression as in (a)+2 or the start of an arrow function expression as in (a) => 2*a. The standard uses a single grammar rule that covers both and calls it CoverParenthesizedExpressionAndArrowParameterList. The parser, when it encounters (a), can parse it into a "cover" AST that could represent either alternative. Then, if it sees that an arrow token (=>) follows, it can reinterpret it more strictly as an arrow parameter or report an error if the reinterpretation fails. Similarly, if it's followed by something else, it can reinterpret it as a parenthesized expression or report an error on failure.

$\endgroup$
4
  • $\begingroup$ I assume the ambiguities the cover grammars are meant to solve can be delayed all the way until all semantic information are available just preceeding the cover node in the AST? As would be the case when syntax information are not sufficient to immediately resolve the ambiguity? $\endgroup$ Commented Apr 3 at 5:52
  • $\begingroup$ @DannyNiu I think it could be delayed if the language requires it. But I don't think it's required in the JavaScript case. I believe JS can be parsed without semantic analysis. At least Babel (a popular JS parser) doesn't seem to do any semantic analysis at the parsing phase as far as I can see. $\endgroup$ Commented Apr 3 at 9:10
  • $\begingroup$ I mean which rule of a "cover production" is chosen require semantic information that may not be available at the time parser encounters it; and parsing in this case is of course syntactic. $\endgroup$ Commented Apr 3 at 9:23
  • $\begingroup$ Like I said, it could be. But in this instance, resolving the ambiguity doesn't require any semantic information. The ambiguity in "Cover Parenthesized Expression and Arrow Parameter List" is solved simply by looking at the next token. If it's =>, it's an arrow parameter list, otherwise it's a parenthesized expression. $\endgroup$ Commented Apr 3 at 9:56
2
$\begingroup$

Disclaimer

I'd note that the grammar given in the C standard is intended only for exposition, not really for implementation.

Ambiguity

I'm a bit confused as to how you encountered ambiguity (at least in this respect), assuming you used the rest of the grammar as given in the C standard. In particular, the only place argument-expression-list can occur is:

postfix-expression:
       primary-expression
       postfix-expression [ expression ]
       postfix-expression ( argument-expression-listopt )
       postfix-expression . identifier
       postfix-expression -> identifier
       postfix-expression ++
       postfix-expression --
       ( type-name ) { initializer-list }
       ( type-name ) { initializer-list , }

So if you have a postfix-expression followed immediately by a left paren, what follows can only be an (optional) argument-expression-list, and under any other circumstance, it cannot be an argument-expression-list.

$\endgroup$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.