I have recently implemented the Paull's algorithm for removing left-recursion from context-free grammars:
Assign an ordering $A_1, \dots, A_n$ to the nonterminals of the grammar.
for $i := 1$ to $n$ do begin
$\quad$ for $j:=1$ to $i-1$ do begin
$\quad\quad$ for each production of the form $A_i \to A_j\alpha$ do begin
$\quad\quad\quad$ remove $A_i \to A_j\alpha$ from the grammar
$\quad\quad\quad$ for each production of the form $A_j \to \beta$ do begin
$\quad\quad\quad\quad$ add $A_i \to \beta\alpha$ to the grammar
$\quad\quad\quad$ end
$\quad\quad$ end
$\quad$ end
$\quad$ transform the $A_i$-productions to eliminate direct left recursion
end
According to this document, the efficiency of the algorithm crucially depends on the ordering of the nonterminals chosen in the beginning; the paper discusses this issue in detail and suggest optimisations.
Some notation:
We will say that a symbol $X$ is a direct left corner of a nonterminal $A$, if there is an $A$-production with $X$ as the left-most symbol on the right-hand side. We define the left-corner relation to be the reflexive transitive closure of the direct-left-corner relation, and we define the proper-left-corner relation to be the transitive closure of the direct-left-corner relation. A nonterminal is left recursive if it is a proper left corner of itself; a nonterminal is directly left recursive if it is a direct left corner of itself; and a nonterminal is indirectly left recursive if it is left recursive, but not directly left recursive.
Here is what the authors propose:
In the inner loop of Paull’s algorithm, for nonterminals $A_i$ and $A_j$, such that $i > j$ and $A_j$ is a direct left corner of $A_i$, we replace all occurrences of $A_j$ as a direct left corner of $A_i$ with all possible expansions of $A_j$.
This only contributes to elimination of left recursion from the grammar if $A_i$ is a left-recursive nonterminal, and $A_j$ lies on a path that makes $A_i$ left recursive; that is, if $A_i$ is a left corner of $A_j$ (in addition to $A_j$ being a left corner of $A_i$).
We could eliminate replacements that are useless in removing left recursion if we could order the nonterminals of the grammar so that, if $i > j$ and $A_j$ is a direct left corner of $A_i$, then $A_i$ is also a left corner of $A_j$.
We can achieve this by ordering the nonterminals in decreasing order of the number of distinct left corners they have.
Since the left-corner relation is transitive, if C is a direct left corner of B, every left corner of C is also a left corner of B.
In addition, since we defined the left-corner relation to be reflexive, B is a left corner of itself.
Hence, if C is a direct left corner of B, it must follow B in decreasing order of number of distinct left corners, unless B is a left corner of C.
All I want is to know how to order the nonterminals in the beginning, but I don't get it from the paper. Can someone explain it in a simpler way? Pseudocode would help me to understand it better.