
I am using ANTLR version 4 to create a compiler. The first phase was the lexer. I created a "CompilerLexer.g4" file and put the lexer rules in it. It works fine.

CompilerLexer.g4:


    lexer grammar CompilerLexer;

    INT      : 'int' ;    //1
    FLOAT    : 'float' ;  //2
    BEGIN    : 'begin' ;  //3
    END      : 'end' ;    //4
    To       : 'to' ;     //5
    NEXT     : 'next' ;   //6
    REAL     : 'real' ;   //7
    BOOLEAN  : 'bool' ;   //8
    .
    .
    .
    NOTEQUAL : '!=' ;     //46
    AND      : '&&' ;     //47
    OR       : '||' ;     //48
    POW      : '^' ;      //49
    ID       : [a-zA-Z]+ ; //50

    WS : ' ' -> channel(HIDDEN) //50
       ;

Now it is time for phase 2, the parser. I created a "CompilerParser.g4" file and put the grammar rules in it, but I get dozens of warnings and errors.

CompilerParser.g4:


    parser grammar CompilerParser;

    options { tokenVocab = CompilerLexer; }

    STATEMENT : EXPRESSION SEMIC
              | IFSTMT
              | WHILESTMT
              | FORSTMT
              | READSTMT SEMIC
              | WRITESTMT SEMIC
              | VARDEF SEMIC
              | BLOCK
              ;

    BLOCK : BEGIN STATEMENTS END ;

    STATEMENTS : STATEMENT STATEMENTS* ;

    EXPRESSION : ID ASSIGN EXPRESSION
               | BOOLEXP
               ;

    RELEXP : MODEXP (GT | LT | EQUAL | NOTEQUAL | LE | GE | AND | OR) RELEXP
           | MODEXP
           ;
    .
    .
    .
    VARDEF : (ID COMA)* ID COLON VARTYPE ;

    VARTYPE : INT | FLOAT | CHAR | STRING ;

    compileUnit : EOF ;

Warnings and errors:

  • implicit definition of token 'BLOCK' in parser
  • implicit definition of token 'BOOLEXP' in parser
  • implicit definition of token 'EXP' in parser
  • implicit definition of token 'EXPLIST' in parser
  • lexer rule 'BLOCK' not allowed in parser
  • lexer rule 'EXP' not allowed in parser
  • lexer rule 'EXPLIST' not allowed in parser
  • lexer rule 'EXPRESSION' not allowed in parser

I get dozens of these warnings and errors. What is the cause?

General questions: What is the difference between using a combined grammar and using a separate lexer and parser? How should I join separate parser and lexer grammar files?

1 Answer 1


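Lexer rules start with a capital letter, and parser rules start with a lowercase letter. In a parser grammar, you can't define tokens. Since ANTLR treats all your upper-cased rules as lexer rules, it produces these errors and warnings. Rename the rules in your parser grammar so they start with a lowercase letter (statement, block, expression, and so on) and keep the upper-case names only for the tokens defined in CompilerLexer.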
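As a rough sketch of what that renaming could look like, here is a cut-down version of the parser grammar with lowercase rule names. It covers only a few of the rules visible in the question; the elided rules are left out. It assumes that SEMIC, ASSIGN, COMA and COLON are defined in the part of CompilerLexer.g4 that was not shown. The compileUnit entry rule and the simplified expression rule are illustrative choices, not the asker's actual grammar.

```antlr
parser grammar CompilerParser;

options { tokenVocab = CompilerLexer; }

// Entry rule (assumed shape): one or more statements, then end of input.
compileUnit : statements EOF ;

// statement+ replaces the original STATEMENT STATEMENTS* formulation.
statements : statement+ ;

statement
    : expression SEMIC
    | vardef SEMIC
    | block
    ;

block : BEGIN statements END ;

// Simplified for illustration: the original falls through to a boolean
// expression rule that the question does not show.
expression
    : ID ASSIGN expression
    | ID
    ;

vardef : (ID COMA)* ID COLON vartype ;

// Only the type keywords visible in the posted lexer are used here.
vartype : INT | FLOAT ;
```

Every name that starts with a lowercase letter is now a parser rule, and every upper-case name refers to a token from the lexer grammar, so the "lexer rule not allowed in parser" errors should no longer apply to these rules.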

EDIT

user2998131 wrote:

General questions: What is the difference between using a combined grammar and using a separate lexer and parser?

Separating the lexer and parser rules keeps things organized. Also, when you create separate lexer and parser grammars, you can't (accidentally) put literal tokens inside your parser grammar; you have to define all tokens in your lexer grammar. This makes it apparent which lexer rules get matched before others, and it rules out typos in recurring literal tokens. In a combined grammar, a mistake like this is possible:

    grammar P;

    r1 : 'foo' r2;
    r2 : r3 'foo '; // added an accidental space after 'foo'

But when you have a parser grammar, you can't make that mistake. You will have to use the lexer rule that matches 'foo':

    // P.g4
    parser grammar P;

    options { tokenVocab=L; }

    r1 : FOO r2;
    r2 : r3 FOO;

    // L.g4
    lexer grammar L;

    FOO : 'foo';

user2998131 wrote:

How should I join separate parser and lexer grammar files?

Just like you already do in your parser grammar: you point to the proper tokenVocab inside the options { ... } block.

Note that you can also import grammars, which is something different: https://github.com/antlr/antlr4/blob/master/doc/grammars.md#grammar-imports
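To illustrate the difference, here is a small sketch of a grammar import. The grammar names CommonLexerRules and Expr and the prog rule are made up for this example. With import, the rules of the imported grammar are merged into the importing grammar, whereas tokenVocab only tells a parser grammar which token types exist.

```antlr
// CommonLexerRules.g4: a reusable set of lexer rules
lexer grammar CommonLexerRules;

ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;

// Expr.g4: a combined grammar that pulls in the rules above
grammar Expr;

import CommonLexerRules;

prog : ID+ EOF ;
```

As in the tokenVocab case, each grammar lives in its own file named after the grammar.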


12 Comments

@user2998131, ah, missed those. Will answer those at a later time.
If I could go a little bit further, writing a combined grammar means the language is pushing you to write context-sensitive lexer rules. These are antithetical to the way most lexers, including ANTLR's lexer, work. In my case (likely a common one), by using a combined grammar I was adding keywords in a number of places, which removed those strings from the set my general ID lexer rule could match. With split lexer/parser grammar files, this becomes really obvious, since you now must declare a lexer entry for each keyword, and that re-emphasizes the lack of context the lexer must operate under.
@Peanut changed the link
Yes, a "combined" grammar has both lexer rules and parser rules in one grammar file. Using tokenVocab inside a parser grammar (which you must do) lets you point your parser grammar to the lexer rules it needs. Importing grammars is something a combined, parser, or lexer grammar can do in addition to all that.
@Conffusion note that it is tokenVocab=EventsLexer;, not tokenVocab=EventsLexer.g4;