Using ANTLR Parser and Lexer Separately

Question

I used ANTLR version 4 to create a compiler. The first phase was the lexer. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine.

CompilerLexer.g4:

    lexer grammar CompilerLexer;

    INT      : 'int' ;      //1
    FLOAT    : 'float' ;    //2
    BEGIN    : 'begin' ;    //3
    END      : 'end' ;      //4
    To       : 'to' ;       //5
    NEXT     : 'next' ;     //6
    REAL     : 'real' ;     //7
    BOOLEAN  : 'bool' ;     //8
    . . .
    NOTEQUAL : '!=' ;       //46
    AND      : '&&' ;       //47
    OR       : '||' ;       //48
    POW      : '^' ;        //49
    ID       : [a-zA-Z]+ ;  //50
    WS       : ' ' -> channel(HIDDEN) //50
             ;

Now it is time for phase 2, the parser. I created a "CompilerParser.g4" file and put the grammar rules in it, but I get dozens of warnings and errors.

CompilerParser.g4:

    parser grammar CompilerParser;

    options { tokenVocab = CompilerLexer; }

    STATEMENT : EXPRESSION SEMIC
              | IFSTMT
              | WHILESTMT
              | FORSTMT
              | READSTMT SEMIC
              | WRITESTMT SEMIC
              | VARDEF SEMIC
              | BLOCK
              ;

    BLOCK : BEGIN STATEMENTS END ;

    STATEMENTS : STATEMENT STATEMENTS* ;

    EXPRESSION : ID ASSIGN EXPRESSION
               | BOOLEXP
               ;

    RELEXP : MODEXP (GT | LT | EQUAL | NOTEQUAL | LE | GE | AND | OR) RELEXP
           | MODEXP
           ;
    . . .
    VARDEF : (ID COMA)* ID COLON VARTYPE ;

    VARTYPE : INT | FLOAT | CHAR | STRING ;

    compileUnit : EOF ;

Warnings and errors:

implicit definition of token 'BLOCK' in parser

implicit definition of token 'BOOLEXP' in parser

implicit definition of token 'EXP' in parser

implicit definition of token 'EXPLIST' in parser

lexer rule 'BLOCK' not allowed in parser

lexer rule 'EXP' not allowed in parser

lexer rule 'EXPLIST' not allowed in parser

lexer rule 'EXPRESSION' not allowed in parser

I get dozens of these warnings and errors. What is the cause?

General Questions: What is the difference between using a combined grammar and using the lexer and parser separately? How should I join separate grammar and lexer files?

Bart Kiers · Accepted Answer · 2020-10-02 08:16:12Z

Lexer rules start with a capital letter, and parser rules start with a lowercase letter. In a parser grammar, you can't define tokens. And since ANTLR thinks all your upper-cased rules are lexer rules, it produces these errors/warnings.
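
To make the fix concrete, here is a minimal sketch of how the parser grammar from the question might look with lowercase rule names. It covers only a few of the rules. The rule names (statement, block, varDef, varType) are my own renamings, not something from the question, and the sketch assumes SEMIC, COMA and COLON are defined in the elided part of CompilerLexer.g4. If they are not, ANTLR will again warn about an implicit token definition for them.

```antlr
parser grammar CompilerParser;

options { tokenVocab = CompilerLexer; }

// Parser rules start with a lowercase letter.
// Upper-case names (BEGIN, END, ID, ...) refer to tokens from CompilerLexer.
compileUnit : statements EOF ;

statements  : statement+ ;

statement
    : block
    | varDef SEMIC      // SEMIC assumed to exist in the lexer grammar
    ;

block       : BEGIN statements END ;

// COMA and COLON are likewise assumed to be lexer tokens.
varDef      : (ID COMA)* ID COLON varType ;

varType     : INT | FLOAT ;
```

The remaining rules from the question (expressions, if/while/for statements, and so on) would be renamed the same way.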

EDIT

user2998131 wrote:

General Questions: What is the difference between using a combined grammar and using the lexer and parser separately?

Separating the lexer and parser rules keeps things organized. Also, with separate lexer and parser grammars you can't (accidentally) put literal tokens inside your parser grammar: you have to define all tokens in your lexer grammar. This makes it apparent which lexer rules get matched before others, and it rules out typos in recurring literal tokens, such as this one in a combined grammar:

    grammar P;

    r1 : 'foo' r2;
    r2 : r3 'foo '; // added an accidental space after 'foo'

But when you have a parser grammar, you can't make that mistake. You will have to use the lexer rule that matches 'foo':

    parser grammar P;

    options { tokenVocab=L; }

    r1 : FOO r2;
    r2 : r3 FOO;

    lexer grammar L;

    FOO : 'foo';
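
On the point about which lexer rules get matched before others: the sketch below extends the lexer grammar L with two hypothetical rules, ID and WS, that are not part of the example above. ANTLR's lexer picks the rule that matches the longest stretch of input, and when two rules match the same length, the rule defined first wins. With every token in a single lexer grammar, that order is visible at a glance.

```antlr
lexer grammar L;

// Keyword first: for the input "foo", both FOO and ID match three
// characters, and the tie goes to the rule that is defined first.
FOO : 'foo' ;

// General identifier rule (hypothetical, added for illustration).
// For the input "foobar", ID matches six characters and FOO only
// three, so the longer match wins and the token is an ID.
ID  : [a-zA-Z]+ ;

// Whitespace is discarded (also hypothetical).
WS  : [ \t\r\n]+ -> skip ;
```

If ID were placed above FOO, the input "foo" would always come out as an ID and the FOO token would never be produced.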

user2998131 wrote:

How should I join separate grammar and lexer files?

Just like you do in your parser grammar: you point to the proper tokenVocab inside the options { ... } block.

Note that you can also import grammars, which is something different: https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports
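
For contrast, a grammar import might look roughly like the sketch below. The grammar names (CommonLexerRules, Expr) and the rules inside them are made up for illustration. Unlike tokenVocab, which only makes token names and types available to the parser grammar, import pulls the rules of the imported grammar into the importing one.

```antlr
// File: CommonLexerRules.g4 (hypothetical)
lexer grammar CommonLexerRules;

ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;

// File: Expr.g4 (hypothetical)
// A combined grammar that reuses the lexer rules defined above.
grammar Expr;
import CommonLexerRules;

prog : ID+ EOF ;
```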

@user2998131, ah, missed those. Will answer those at a later time.
If I could go a little further: writing a combined grammar means the language is pushing you to write context-sensitive lexer rules. These are antithetical to the way most lexers, including ANTLR's lexer, work. In my case (likely a common one), by using a combined grammar I was adding keywords in a number of places, which removed those strings from the set matched by my general ID lexer rule. With split lexer/parser grammar files, this becomes really obvious, since you now must declare a lexer entry for each keyword, and that re-emphasizes the lack of context the lexer must operate under.
Yes, a "combined" grammar has both lexer rules and parser rules in one grammar file. Using tokenVocab inside a parser grammar (which you must do) lets you point your parser grammar to the lexer rules it needs. Importing grammars is something a combined, parser, or lexer grammar can do in addition to all that.
@Conffusion note that it is tokenVocab=EventsLexer;, not tokenVocab=EventsLexer.g4;
