Lex and Yacc COP - 3402
General Compiler Infra-structure Scanner (tokenizer) Parser Semantic Routines Analysis/ Transformations/ optimizations Code Generator Program source (stream of characters) Tokens Syntactic Structure IR: Intermediate Representation (1) Assembly code IR: Intermediate Representation (2) Symbol and Attribute Tables Lex Yacc
Lex & Yacc • Lex – generates C code for the lexical analyzer (scanner) – Token patterns specified by regular expressions • Yacc – generates C code for a LR(1) syntax analyzer (parser) – BNF rules for the grammar
Lex  lex is a program (generator) that generates lexical analyzers, (widely used on Unix).  It is mostly used with Yacc parser generator.  Written by Eric Schmidt and Mike Lesk.  It reads the input stream (specifying the lexical analyzer ) and outputs source code implementing the lexical analyzer in the C programming language.  Lex will read patterns (regular expressions); then produces C code for a lexical analyzer that scans for identifiers.
Lex predefined variables and functions
Lex – Pattern Matching Primitives
Lex – Pattern Matching Examples
Example: Simple Calculator • Computes the basic arithmetic operations • Allows declaration of variables • Enough to illustrate the basic structure of Lex and Yacc programs
Lex program structure … definitions … %% … rules … %% … subroutines … %{ #include <stdio.h> #include "y.tab.h" int c; extern int yylval; %} %% " " ; [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } [^a-z0-9b] { c = yytext[0]; return(c); }
Pattern Matching and Action [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } Match a character in the a-z range Match a positive integer (sequence of 0-9 digits) Place the offset c – ‘a’ In the stack Place the integer value In the stack Buffer
Yacc • Grammars described by rules using a variant of the Backus Naur Form (BNF) – Context-free grammars • LALR(1) parse table is generated automatically based on the rules • Actions are added to the rules and executed after each reduction
Yacc Program Structure %{ #include <stdio.h> int regs[26]; int base; %} %token NUMBER LETTER %left '+' '-‘ %left '*' '/‘ %% list: | list stat 'n' |list error 'n' {yyerrok;} ; stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: '(' expr ')' {$$ = $2;} | expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} %% main(){return(yyparse());} yyerror(CHAR *s){fprintf(stderr, "%sn",s);} yywrap(){ return(1);} … definitions … %% … rules … %% … subroutines …
Rule Reduction and Action stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} Grammar rule Action “or” operator: For multiple RHS
Further reading…  “A Compact Guide to Lex & Yacc”, Thomas Niemann (recommended);  “Lex & Yacc”, Doug Brown (O’Reily);  Lots of resources on the web • Check our website for some suggestions
Conclusions • Yacc and Lex are very helpful for building the compiler front-end • A lot of time is saved when compared to hand-implementation of parser and scanner • They both work as a mixture of “rules” and “C code” • C code is generated and is merged with the rest of the compiler code

Compiler Design Brief in Computer Application

  • 1.
  • 2.
    General Compiler Infra-structure Scanner (tokenizer) ParserSemantic Routines Analysis/ Transformations/ optimizations Code Generator Program source (stream of characters) Tokens Syntactic Structure IR: Intermediate Representation (1) Assembly code IR: Intermediate Representation (2) Symbol and Attribute Tables Lex Yacc
  • 3.
    Lex & Yacc •Lex – generates C code for the lexical analyzer (scanner) – Token patterns specified by regular expressions • Yacc – generates C code for a LR(1) syntax analyzer (parser) – BNF rules for the grammar
  • 4.
    Lex  lex isa program (generator) that generates lexical analyzers, (widely used on Unix).  It is mostly used with Yacc parser generator.  Written by Eric Schmidt and Mike Lesk.  It reads the input stream (specifying the lexical analyzer ) and outputs source code implementing the lexical analyzer in the C programming language.  Lex will read patterns (regular expressions); then produces C code for a lexical analyzer that scans for identifiers.
  • 5.
  • 6.
    Lex – PatternMatching Primitives
  • 7.
    Lex – PatternMatching Examples
  • 8.
    Example: Simple Calculator •Computes the basic arithmetic operations • Allows declaration of variables • Enough to illustrate the basic structure of Lex and Yacc programs
  • 9.
    Lex program structure …definitions … %% … rules … %% … subroutines … %{ #include <stdio.h> #include "y.tab.h" int c; extern int yylval; %} %% " " ; [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } [^a-z0-9b] { c = yytext[0]; return(c); }
  • 10.
    Pattern Matching andAction [a-z] { c = yytext[0]; yylval = c - 'a'; return(LETTER); } [0-9]* { yylval = atoi(yytext); return(NUMBER); } Match a character in the a-z range Match a positive integer (sequence of 0-9 digits) Place the offset c – ‘a’ In the stack Place the integer value In the stack Buffer
  • 11.
    Yacc • Grammars describedby rules using a variant of the Backus Naur Form (BNF) – Context-free grammars • LALR(1) parse table is generated automatically based on the rules • Actions are added to the rules and executed after each reduction
  • 12.
    Yacc Program Structure %{ #include<stdio.h> int regs[26]; int base; %} %token NUMBER LETTER %left '+' '-‘ %left '*' '/‘ %% list: | list stat 'n' |list error 'n' {yyerrok;} ; stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: '(' expr ')' {$$ = $2;} | expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} %% main(){return(yyparse());} yyerror(CHAR *s){fprintf(stderr, "%sn",s);} yywrap(){ return(1);} … definitions … %% … rules … %% … subroutines …
  • 13.
    Rule Reduction andAction stat: expr {printf("%dn",$1);} | LETTER '=' expr {regs[$1] = $3;}; expr: expr '+' expr {$$ = $1 + $3;} | LETTER {$$ = regs[$1];} Grammar rule Action “or” operator: For multiple RHS
  • 14.
    Further reading…  “A CompactGuide to Lex & Yacc”, Thomas Niemann (recommended);  “Lex & Yacc”, Doug Brown (O’Reily);  Lots of resources on the web • Check our website for some suggestions
  • 15.
    Conclusions • Yacc andLex are very helpful for building the compiler front-end • A lot of time is saved when compared to hand-implementation of parser and scanner • They both work as a mixture of “rules” and “C code” • C code is generated and is merged with the rest of the compiler code