Prof Monika Shah(NU) 1 Code Generation Course : 2CS701/IT794 – Compiler Construction Prof Monika Shah Nirma University Ref : Ch.9 Compilers Principles, Techniques, and Tools by Alfred Aho, Ravi Sethi, and Jeffrey Ullman
Prof Monika Shah(NU) 3 Machine independent asm to machine dependent Lexical Analysis Syntax Analysis Semantic Analysis Intermediate Code Generation Optimization Code Generation Source Program Assembly Code IR to low-level IR Symbol Table Front end Back end Optimization Token stream AST or Parse tree AST or Annotted Parse tree Code Generation • Final Phase of Compilation • Produces a semantically equivalent target program • Main Functionalities : • Instruction selection • Register allocation and assignment • Instruction ordering • Functionalities for QoS • Produce Correct code (semantic preserving) • Efficiency • Effective use of resources • User Heuristics to generate good, but suboptimal code
4.
Prof Monika Shah(NU) 4 Major functionalities of Code Generation • Instruction selection Choose semantically equivalent target machine instructions for the IR statement Choose cost effective instructions to implement the IR statements • Register allocation and assignment • Register allocation: selecting the set of variables that will reside in registers at each point in the program • Resister assignment: selecting specific register that a variable reside in • Instruction ordering Decide schedule of Instruction (Challenging for Parallel )
5.
Prof Monika Shah(NU) 5 Issues in design of Code Generation phase • Input to Code Generator • Target Program • Memory Management • Instruction Selection • Register Allocation • Schedule order of Instructions • Approaches
6.
Prof Monika Shah(NU) 6 Issues in Code Generation design Input to Code Generation • Input : IR + Symbol • IR format choices • AST • Postfix Notation • Three Address Code • Assumption : • Front end phase generate low-level IR • Syntactic and semantic errors detected by earlier phases
7.
Prof Monika Shah(NU) 7 Issues in Code Generation design Target Program (Output) • The back-end code generator of a compiler may generate different forms of code, depending on the requirements: • Absolute machine code (executable code) +ve : Can be placed in fixed location in memory. immediate executed • Relocatable machine code (object files for linker) +ve : allows subprograms to be compiled separately • Assembly language +ve : facilitates debugging +ve: Easy Code Generation using Assembler • Byte code forms for interpreters (e.g. JVM) +ve : Platform independence
8.
Prof Monika Shah(NU) 8 The Target Machine • Implementing code generation requires thorough understanding of the target machine architecture and its instruction set • Our (hypothetical) machine: • Byte-addressable (word = 4 bytes) • Has n general purpose registers R0, R1, …, Rn-1 • Two-address instructions of the form op source, destination Issues in Code Generation design Target Program (Output)
9.
Prof Monika Shah(NU) 9 The Target Machine: Op-codes and Address Modes • Op-codes (op), for example MOV (move content of source to destination) ADD (add content of source to destination) SUB (subtract content of source from dest.) • Address modes Mode Form Address Added Cost Absolute M M 1 Register R R 0 Indexed c(R) c+contents(R) 1 Indirect register *R contents(R) 0 Indirect indexed *c(R) contents(c+contents(R)) 1 Literal #c N/A 1
10.
Prof Monika Shah(NU) 10 Instruction Costs • Machine is a simple, non-super-scalar processor with fixed instruction costs • Realistic machines have deep pipelines, I-cache, D-cache, etc. • Define the cost of instruction = 1 + cost(source-mode) + cost(destination-mode)
11.
Prof Monika Shah(NU) 11 Instruction Operation Cost MOV R0,R1 Store content(R0) into register R1 1 MOV R0,M Store content(R0) into memory location M 2 MOV M,R0 Store content(M) into register R0 2 MOV 4(R0),M Store contents(4+contents(R0)) into M 3 MOV *4(R0),M Store contents(contents(4+contents(R0))) into M 3 MOV #1,R0 Store 1 into R0 2 ADD 4(R0),*12(R1) Add contents(4+contents(R0)) to value at location contents(12+contents(R1)) 3 Examples
12.
Prof Monika Shah(NU) 12 Instruction Selection • Instruction selection is important to obtain efficient code • Suppose we translate three-address code x:=y+z to: MOV y,R0 ADD z,R0 MOV R0,x a:=a+1 MOV a,R0 ADD #1,R0 MOV R0,a ADD #1,a INC a Cost = 6 Cost = 3 Cost = 2 Better Best Issues in Code Generation design Target Program (Output)
13.
Prof Monika Shah(NU) 13 Instruction Selection: Utilizing Addressing Modes • Suppose we translate a:=b+c into MOV b,R0 ADD c,R0 MOV R0,a • Assuming addresses of a, b, and c are stored in R0, R1, and R2 MOV *R1,*R0 ADD *R2,*R0 • Assuming R1 and R2 contain values of b and c ADD R2,R1 MOV R1,a Issues in Code Generation design Cost = 6 Cost = 2 Cost = 3
14.
Prof Monika Shah(NU) 14 Need for Global Machine-Specific Code Optimizations • Suppose we translate three-address code x:=y+z to: MOV y,R0 ADD z,R0 MOV R0,x • Then, we translate a:=b+c d:=a+e to: MOV a,R0 ADD b,R0 MOV R0,a MOV a,R0 ADD e,R0 MOV R0,d Redundant Issues in Code Generation design
15.
Prof Monika Shah(NU) 15 Register Allocation and Assignment • Efficient utilization of the limited set of registers is important to generate good code • Registers are assigned by • Register allocation to select the set of variables that will reside in registers at a point in the code • Register assignment to pick the specific register that a variable will reside in • Finding an optimal register assignment in general is NP-complete Issues in Code Generation design
16.
Prof Monika Shah(NU) 16 Example t:=a*b t:=t+a t:=t/d MOV a,R1 MUL b,R1 ADD a,R1 DIV d,R1 MOV R1,t t:=a*b t:=t+a t:=t/d MOV a,R0 MOV R0,R1 MUL b,R1 ADD R0,R1 DIV d,R1 MOV R1,t { R1=t } { R0=a, R1=t }
17.
Prof Monika Shah(NU) 17 Choice of Evaluation Order • When instructions are independent, their evaluation order can be changed t1:=a+b t2:=c+d t3:=e*t2 t4:=t1-t3 a+b-(c+d)*e MOV a,R0 ADD b,R0 MOV R0,t1 MOV c,R1 ADD d,R1 MOV e,R0 MUL R1,R0 MOV t1,R1 SUB R0,R1 MOV R1,t4 t2:=c+d t3:=e*t2 t1:=a+b t4:=t1-t3 MOV c,R0 ADD d,R0 MOV e,R1 MUL R0,R1 MOV a,R0 ADD b,R0 SUB R1,R0 MOV R0,t4 reorder Issues in Code Generation design
18.
Prof Monika Shah(NU) 18 • Mapping names in source program to memory address is done by Code Generator phase • Labels in three address code need to be converted to addresses of Instructions • Back patching technique Issues in Code Generation design Memory Mapping