Intermediate Representation Design
. More wizardry than science
. Each compiler uses 2-3 IRs
. HIR (high level IR) preserves loop structure and array bounds
. MIR (medium level IR) reflects the range of features in a set of source languages
- language independent
- good for code generation for one or more architectures
- appropriate for most optimizations
. LIR (low level IR) is low level, similar to the target machine
Intermediate Representation (IR) is language-independent and machine-independent.
A good intermediate representation is one that:
. Captures high level language constructs,
. Is easy to translate from the abstract syntax tree,
. Supports high-level optimizations,
. Captures low-level machine features,
. Is easy to translate to assembly,
. Supports machine-dependent optimizations,
. Has a narrow interface, i.e. a small number of node types (instructions), and
. Is easy to optimize and retarget.
Designing a single IR with all of these features is very difficult, so most compilers use multiple IRs. Each optimization is then performed on the IR best suited to it, which keeps the optimizations easy to implement and extend.
For this purpose, IRs can be categorized into three levels:
1. High Level IR (HIR): This is language independent but closer to the high level language. HIR preserves high-level language constructs such as structured control flow (if, for, while, etc.), variables, expressions and functions. It also allows high-level optimizations that depend on the source language, e.g., function inlining, memory dependence analysis and loop transformations. For example:
for v <- v1 by v2 to v3 do
a[v]:=2
endfor
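
One way to see what "preserves loop structure" means is to model the HIR loop above as a tree. The following is a minimal sketch, not a definitive design; the class names (Var, Const, ArrayStore, ForLoop) and the helper hir_text are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    name: str


@dataclass
class Const:
    value: int


Expr = Union[Var, Const]


@dataclass
class ArrayStore:
    """a[index] := value, with the array access kept as a single node."""
    array: str
    index: Expr
    value: Expr


@dataclass
class ForLoop:
    """for var <- start by step to stop do body endfor."""
    var: Var
    start: Expr
    step: Expr
    stop: Expr
    body: List[ArrayStore]


def expr_text(e: Expr) -> str:
    return e.name if isinstance(e, Var) else str(e.value)


def hir_text(loop: ForLoop) -> str:
    """Print the loop back in the notation used in these notes."""
    lines = [
        f"for {expr_text(loop.var)} <- {expr_text(loop.start)} "
        f"by {expr_text(loop.step)} to {expr_text(loop.stop)} do"
    ]
    for s in loop.body:
        lines.append(f"    {s.array}[{expr_text(s.index)}] := {expr_text(s.value)}")
    lines.append("endfor")
    return "\n".join(lines)


v = Var("v")
loop = ForLoop(v, Var("v1"), Var("v2"), Var("v3"), [ArrayStore("a", v, Const(2))])
print(hir_text(loop))
```

Because the loop bounds and the array subscript are still explicit fields, a loop transformation or dependence analysis can read them directly instead of rediscovering them from jumps and address arithmetic.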
2. Medium Level IR (MIR): This is machine and language independent and can represent a set of source languages. Thus MIR is good for code generation for one or more architectures. It uses simple control flow constructs such as "if" and "goto", and allows source language variables (human-readable names) as well as front-end created "temporaries" (symbolic registers). Compared to HIR, it reveals computations in greater detail (it is much closer to the machine than HIR), and is therefore usually preferred for optimization.
The HIR example is translated into the following MIR code:
v <- v1
t2 <- v2
t3 <- v3
L1:
if v > t3 goto L2
t4 <- addr a
t5 <- 4 * v
t6 <- t4 + t5
*t6 <- 2
v <- v + t2
goto L1
L2:
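
The translation from the HIR loop to the MIR listing above can be sketched as a small lowering function. This is an illustrative sketch under stated assumptions: the function name lower_for_store is hypothetical, temporaries are numbered from t2 only to match the listing, the element size of 4 bytes is taken from the listing, and the exit test `v > t3` assumes a positive step.

```python
from itertools import count


def lower_for_store(var, start, step, stop, array, value, elem_size=4):
    """Lower 'for var <- start by step to stop do array[var] := value endfor'
    into a list of MIR-style instructions (assumes a positive step)."""
    temps = (f"t{n}" for n in count(2))    # t2, t3, ... as in the listing
    labels = (f"L{n}" for n in count(1))   # L1, L2, ...

    t_step, t_stop = next(temps), next(temps)
    top, done = next(labels), next(labels)
    t_base, t_off, t_addr = next(temps), next(temps), next(temps)

    return [
        f"{var} <- {start}",                 # initialise the loop variable
        f"{t_step} <- {step}",               # evaluate the step once
        f"{t_stop} <- {stop}",               # evaluate the bound once
        f"{top}:",
        f"if {var} > {t_stop} goto {done}",  # structured loop becomes if/goto
        f"{t_base} <- addr {array}",         # array access becomes address arithmetic
        f"{t_off} <- {elem_size} * {var}",
        f"{t_addr} <- {t_base} + {t_off}",
        f"*{t_addr} <- {value}",
        f"{var} <- {var} + {t_step}",
        f"goto {top}",
        f"{done}:",
    ]


mir = lower_for_store("v", "v1", "v2", "v3", "a", 2)
print("\n".join(mir))
```

The structured "for" disappears at this level: what remains is a conditional branch, a back edge, and explicit address computation, which is the extra detail that makes MIR convenient for most optimizations.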
3. Low Level IR (LIR): This is machine independent but much closer to the machine (e.g., RTL used in GCC). It is easy to generate machine code from LIR, but generating LIR from the input program may involve some work. LIR has low level constructs such as unstructured jumps, registers and memory locations. In practice an IR at this level may also keep features of MIR, and even of HIR, depending on the needs.
The LIR code for the above MIR example is:
s2 <- s1
s4 <- s3
s6 <- s5
L1:
if s2 > s6 goto L2
s7 <- addr a
s8 <- 4 * s2
s10 <- s7 + s8
[s10] <- 2
s2 <- s2 + s4
goto L1
L2:
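
The step from MIR to LIR in this example is essentially a renaming of variables and temporaries to symbolic registers, with the pointer store written as a memory reference. A minimal sketch follows; the REGS table and the function to_lir are hypothetical, and the sketch assumes the array index is held in the same register as the loop variable v (s2).

```python
import re

MIR = [
    "v <- v1", "t2 <- v2", "t3 <- v3",
    "L1:", "if v > t3 goto L2",
    "t4 <- addr a", "t5 <- 4 * v", "t6 <- t4 + t5", "*t6 <- 2",
    "v <- v + t2", "goto L1", "L2:",
]

# Assumed assignment of source names and temporaries to symbolic registers.
REGS = {
    "v1": "s1", "v": "s2", "v2": "s3", "t2": "s4", "v3": "s5",
    "t3": "s6", "t4": "s7", "t5": "s8", "t6": "s10",
}


def to_lir(line: str) -> str:
    """Rewrite one MIR instruction into LIR notation."""
    # A pointer store '*t <- x' becomes a memory reference '[t] <- x'.
    line = re.sub(r"^\*(\w+)", r"[\1]", line)
    # Replace every mapped name; labels, keywords and constants pass through.
    return re.sub(r"\b\w+\b", lambda m: REGS.get(m.group(0), m.group(0)), line)


lir = [to_lir(line) for line in MIR]
print("\n".join(lir))
```

A real back end would choose the registers itself rather than read them from a fixed table; the table here only makes the correspondence between the two listings explicit.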