I am having difficulty figuring out the best way to conditionally create nodes using JavaCC and JJTree.
In my grammar there are literals, variables, operators and functions. Variables can have the same identifier as functions, so we may have a function named 'upper' and a variable also named 'upper'. The syntactic distinction between a function and a variable is the left parenthesis that immediately follows the function name.
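For example (an illustrative input, not taken from the actual grammar), a single name can play both roles in one expression:

```
upper(upper)
```

Here the outer `upper` is a function call, because a left parenthesis immediately follows it, while the inner `upper` is a variable reference.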
I initially defined the grammar using regular expressions for each function name, including the left parenthesis:
```
TOKEN [ IGNORE_CASE ] : // Defines functions
{
    <UPPER: "upper(" >
  | <LOWER: "lower(" >
  | <CAPITALIZE: "capitalize(" >
  | <INDEXOF: "indexOf(" >
  | <LASTINDEXOF: "lastIndexOf(" >
  | <LENGTH: "length(" >
  | <TRIM: "trim(" >
  // etc...
}
```

Variable identifiers are similar, but do not end with a parenthesis:
```
TOKEN: // Defines identifiers
{
    < ID: <LETTER> ( <LETTER> | <DIGIT> | "." )* >
  | < #LETTER: [
```

Function non-terminals are detected and nodes are generated easily:
```
void functions(): { }
{
    (<UPPER> ArgumentList() <RPAREN>) #Upper
  | (<LENGTH> ArgumentList() <RPAREN>) #Length
  | (<LOWER> ArgumentList() <RPAREN>) #Lower
  | (<CAPITALIZE> ArgumentList() <RPAREN>) #Capitalize
  | (<TRIM> ArgumentList() <RPAREN>) #Trim
  | (<INDEXOF> ArgumentList() <RPAREN>) #IndexOf
  | (<LASTINDEXOF> ArgumentList() <RPAREN>) #LastIndexOf
  // etc...
}
```

While this works, it has its shortcomings. One problem is that whitespace is not allowed between the name of the function and the left parenthesis. Another issue with the left parenthesis being absorbed into the function-name token is that syntax highlighting of the token stream does not work very well.
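To make the whitespace problem concrete, an input like the following (hypothetical) fails with the token definitions above, because `upper (` never matches the combined `<UPPER>` token; the tokenizer instead produces an `<ID>` followed by a left parenthesis, which the `functions()` production does not accept:

```
upper (name)
```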
I tried to remedy this by changing the grammar: I got rid of all of the function regular expressions and instead parse a function as any identifier followed by a left parenthesis:
```
LOOKAHEAD(2) (t=<ID> <LPAREN> ArgumentList() <RPAREN>)
```

Parsing this works great. Constructing the nodes seems a bit awkward, though. The best that I have been able to figure out is to use semantic lookahead, like this:
```
void function(): { Token t; }
{
    LOOKAHEAD(2) (t=<ID> <LPAREN> functionName(t))
}

void functionName(Token t): { String name = t.image; }
{
    LOOKAHEAD({"upper".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #Upper
  | LOOKAHEAD({"lower".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #Lower
  | LOOKAHEAD({"capitalize".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #Capitalize
  | LOOKAHEAD({"trim".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #Trim
  | LOOKAHEAD({"indexof".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #IndexOf
  | LOOKAHEAD({"lastindexof".equalsIgnoreCase(name)}) (ArgumentList() <RPAREN>) #LastIndexOf
  // etc...
}
```

My concern is that this is going to result in a slow parser. There are about 80 functions, so the generated parser will contain a long cascaded if-else-if-else chain testing the semantic lookaheads, and that chain will have to be traversed for each function that is parsed. The original grammar was able to dispatch with a switch statement over the token kinds, which I expect would have been quite fast.
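To illustrate the performance concern outside of JavaCC, here is a sketch in plain Java contrasting the two dispatch shapes: a cascaded chain of `equalsIgnoreCase` tests (the shape the semantic lookaheads produce) versus a single hash-table lookup. The function names and kind codes are made up for the example; the real grammar would have about 80 entries.

```java
import java.util.HashMap;
import java.util.Map;

public class Dispatch {
    // Shape 1: cascaded comparisons, mirroring the semantic-lookahead chain.
    // Every call walks the chain from the top until a name matches.
    static int dispatchByChain(String name) {
        if ("upper".equalsIgnoreCase(name)) return 1;
        else if ("lower".equalsIgnoreCase(name)) return 2;
        else if ("capitalize".equalsIgnoreCase(name)) return 3;
        else if ("trim".equalsIgnoreCase(name)) return 4;
        // ... roughly 80 branches in the real grammar
        else return -1;
    }

    // Shape 2: table lookup -- one hash probe no matter how many
    // function names exist.
    static final Map<String, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put("upper", 1);
        TABLE.put("lower", 2);
        TABLE.put("capitalize", 3);
        TABLE.put("trim", 4);
        // ... remaining function names
    }

    static int dispatchByTable(String name) {
        Integer kind = TABLE.get(name.toLowerCase());
        return kind == null ? -1 : kind;
    }

    public static void main(String[] args) {
        // Both strategies agree on known and unknown names;
        // the table stays O(1) as functions are added.
        System.out.println(dispatchByChain("Trim"));    // 4
        System.out.println(dispatchByTable("Trim"));    // 4
        System.out.println(dispatchByChain("unknown")); // -1
    }
}
```

The chain's cost grows linearly with the number of functions and with the position of the matched name, while the table lookup is effectively constant; that is the gap I am worried about relative to the original token-based switch.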
Is there a better way of doing this?