CONTROL FLOW GRAPHS • Motivation: language-independent and machineindependent representation of control flow in programs used in high-level and low-level code optimizers. The flow graph data structure lends itself to use of several important algorithms from graph theory. CS297- B. Narahari Basic Blocks • A straight line sequence of code – a maximal sequence of instructions that can be entered only at the first of them and exited only from the last of them. • First instruction of basic block can be: – entry point of routine – target of branch – instruction immediately following branch - leaders • extended basic block is maximal sequence that contains no join nodes other than first node. – Join node: if it has more than one predecessor – Branch node: if it has more than one successor CS297- B. Narahari Control Flow Graph: Definition A control flow graph CFG = ( Nc ; Ec ; Tc ) consists of • Nc, a set of nodes. A node represents a straight-line sequence of operations with no intervening control flow i.e. a basic block. • Ec ⊆ Nc x Nc x Labels, a set of labeled edges. • Tc , a node type mapping. Tc(n) identies the type of node n as one of: START, STOP, OTHER. We assume that CFG contains a unique START node and a unique STOP node, and that for any node N in CFG, there exist directed paths from START to N and from N to STOP. CS297- B. Narahari Dominators: Definition Node V dominates another node W = V of and only if every directed path from START to W in CFG contains V. Define dom ( W ) = { V | V dominates W } , the set of dominators of node W. Consider any simple path from START to W containing W 's dominators in the order V1,…,Vk. Then all simple paths from START to W must contain W 's dominators in the same order. The element closest to W, Vk = idom (W), is called the immediate dominator of W. CS297- B. Narahari Control Flow Graphs • From code can construct “flowchart” • from flowchart to basic blocks • edges come from the control flow information CS297- B. Narahari Control Flow Analysis • Use dominators to discover loops and simply note loops found for use in optimization • alternate approach- interval analysis: Analyze overall structure of routine and decompose it into nested regions called intervals – nesting structure forms tree, useful in structuring and speeding up data flow analysis CS297- B. Narahari Dominators: Definition (Contd.) The idom relation can be represented as a directed tree with root = START, and parent(W) = idom (W ). CS297- B. Narahari Postdominators: Definition • Node W postdominates another node V = W if and only if every directed path from V to STOP in CFG contains W . • Define pdom (V) = { W | W postdominates V}, the set of postdominators of node V • Consider any simple path from V to STOP containing V’s postdominators in the order W1,…,Wk. Then all simple paths from V to STOP must contain V’s postdominators in the same order. The element closest to V, W1 = ipdom(V), is called the immediate postdominator of V. • The ipdom relation can be represented as a directed tree with root = is STOP and parent(V) = ipdom(V) CS297- B. Narahari Example: Dominator and Postdominator Trees STOP START 3 START T T 1 2 F F T T 2 ∑ ∀x ∈ 1 POST-DOMINATOR TREE F START 3 3 F STOP STOP 2 CONTROL FLOW GRAPH 1 DOM INATOR TREE CS297- B. Narahari Algorithms for computing Dominator/Postdominator • [Purdom and Moore, 1972] O( N x E ) execution time • [Lengauer and Tarjan, 1979] Simple version: O( E x logN ) execution time Sophisticated version: O( E x α(E,N )) execution time • [Harel, 1985] O( N + E ) execution time CS297- B. Narahari Loop Nesting Structure of a Control Flow Graph • The loop nesting structure of a CFG is revealed by its interval structure: • Edge e = ( x, h, l ) in CFG is called a back edge if h ∈ dom (x); h is called a header node, and x is called a latch node. • The strongly connected region defined by back edge e = ( x, h, l ) is STR(e), which consists of the nodes and edges belonging to all paths from node h to node x, along with the back edge (x,h, l). • The interval with header h, I(h), is defined as the union of STR(e) over all back edges e targeted to header node h. CS297- B. Narahari Loop Nesting Structure of a Control Flow Graph (Contd.) • Interval nesting is defined by the subgraph relationship. I(h1) is a subinterval of I(h2) if I(h1) is a subgraph of I(h2). • The interval nesting relation can then be represented by a unique interval nesting tree (or forest of trees). • It is convenient to add a pseudo-edge from STOP to START to make START a header node with I(START) = entire CFG, and thus force the interval relation to be a single tree rather CS297- B. Narahari Reducible Control Flow Graphs • A CFG is reducible if and only if the directed graph obtained by removing all back-edges is acyclic. This graph is referred to as the forward control flow graph. CS297- B. Narahari Example: Interval Structure for a Reducible CFG START I(START) T T * I(1) 1 * * I(START) F T F * T I(1) I(2) 2 F I(2) 3 F STOP * Back edge INTERVALS IN A CONTROL FLOW GRAPH CS297- B. Narahari INTERVAL NESTING TREE Example: Interval Structure for an Irreducible CFG I START START T F I(START) T 1 * * F I(1) T 2 I(1) T F 3 INTERVAL NESTING TREE F STOP * Back edge INTERVALS IN A CONTROL FLOW GRAPH CS297- B. Narahari Data and Control Dependences Motivation: identify only the essential control and data dependences which need to be obeyed by transformations for code optimization. Program Dependence Graph (PDG) consists of 1. Set of nodes, as in the CFG 2. Control dependence edges 3. Data dependence edges Together, the control and data dependence edges dictate whether or not a proposed code transformation is legal. CS297- B. Narahari Control Dependence Analysis We want to capture two related ideas with control dependence analysis of a CFG: 1. Node Y should be control dependent on node X if node X evaluates a predicate (conditional branch) which can control whether node Y will subsequently be executed or not. This idea is useful for determining whether node Y needs to wait for node X to complete, even though they have no data dependences. CS297- B. Narahari Control Dependence Analysis (contd.) 2. Two nodes, Y and Z, should be identified as having identical control conditions if in every run of the program, node Y is executed if and only if node Z is executed. This idea is useful for determining whether nodes Y and Z can be made adjacent and executed concurrently, even though they may be far apart in the CFG. CS297- B. Narahari Control Dependence: Definition [Ferrante et al, 1987] Node Y is control dependent on node X with label L in CFG if and only if 1. there exists a nonnull path X Y, starting with the edge labeled L, such that Y post-dominates every node, W, strictly between X and Y in the path, and 2. Y does not post-dominate X Y is control dependent on X only if X can directly affect whether Y is executed or not; indirect control dependence can be defined as the transitive closure of control dependence CS297- B. Narahari Example: acyclic CFG and its Control Dependence Graph STOP START F T START 3 1 4 1 T F 2 POSTDOMINATOR TREE 2 3 u START u 4 u T 1 STOP T CONTROL FLOW GRAPH 2 T F 4 3 CONTROL DEPENDENCE GRAPH CS297- B. Narahari Example: CFG with multi-way branch and its CDG STOP START F T START 3 1 2 1 F <0 POSTDOMINATOR TREE 2 3 =0 u START u T 1 STOP <0 CONTROL FLOW GRAPH 2 >0 =0 3 CONTROL DEPENDENCE GRAPH CS297- B. Narahari Algorithm for Computing Control Dependence Given node X and branch label L, all control dependence successors can be enumerated as follows: 1. Z CFG successor of node X with label L 2. while Z ≠ ipdom(X) do (a) /* Z is control dependent on X with label L process Z as desired */ (b) Z end while CS297- B. Narahari ipdom(Z) Example: cyclic CFG and its Control Dependence Graph STOP START F T START 3 1 4 1 T F 2 POSTDOMINATOR TREE 2 3 u T START u 4 T F 1 T STOP 2 T F 4 3 CONTROL FLOW GRAPH CS297- B. Narahari CONTROL DEPENDENCE GRAPH Example: another cyclic CFG and its CDG STOP START 3 ENTRY T T 1 2 F T F 1 2 POST-DOMINATOR TREE F ENTRY 3 T T T F STOP T 3 T T 2 T CONTROL FLOW GRAPH T CS297- B. Narahari CONTROL DEPENDENCE GRAPH 1 Properties of Control Dependence • CDG is a tree → CFG is structured • CDG is acyclic → CFG is acyclic • CDG is cyclic → CFG is cyclic The control conditions of node Y is the set, CC(Y) = {(X,L)|Y is control dependent on X with label L} Two nodes, A and B, are said to be identically control dependent if and only if they have the same set of control conditions i.e. CC(A) = CC(B) CS297- B. Narahari Data Dependence Analysis If two operations have potentially interfering data accesses, data dependence analysis is necessary for determining whether or not an interference actually exists. If there is no interference, it may be possible to reorder the operations or execute them concurrently. The data accesses examined for data dependence analysis may arise from array variables, scalar variables, procedure parameters, pointer dereferences, etc. in the original source program. Data dependence analysis is conservative, in that it may state that a data dependence exists between two statements, when actually none exists. CS297- B. Narahari Data Dependence: Definition A data dependence, S1 → S2, exists between CFG nodes S1 and S2 with respect to variable X if and only if 1. there exists a path P: S1 → S2 in CFG, with no intervening write to X, and 2. at least one of the following is true: (a) (flow) X is written by S1 and later read by S2, or (b) (anti) X is read by S1 and later is written by S2 or (c) (output) X is written by S1 and later written by S2 CS297- B. Narahari Def/Use chaining for Data Dependence Analysis A def-use chain links a definition D (i.e. a write access of variable X to each use U (i.e. a read access), such that there is a path from D to U in CFG that does not redefine X. Similarly, a use-def chain links a use U to a definition D, and a def-def chain links a definition D to a definition D’ (with no intervening write to X in all cases). Def-use, use-def, and def-def chains can be computed by data flow analysis, and provide a simple but conservative way of enumerating flow, anti, and output data dependences. CS297- B. Narahari Static single assignment (SSA) form • Static single assignment (SSA) form provides a more efficient data structure for enumerating defuse, def-use and def-def chains. • SSA form requires that each use be reached by a single def (when representing def-use information; analogous requirements are enforced for representing use-def and def-def information). Each def is treated as a new “name” for the variable. • Each variable is assumed to have a dummy definition at the START node of the CFG. • A φ function is used to capture the merge of multiple reaching definitions CS297- B. Narahari Dealing with Merge Points If Cond Then X <--- 4 Else X <--- 6 o o o Use variable X several times • Tricky situation since both defs can reach all subsequent uses; exact reaching def depends on whether Cond evaluated to true or not • Keeping track of true and false cases separately is complicated and intractable (in the presence of nested conditionals) CS297- B. Narahari The SSA approach If Cond Then X1 <--- 4 Else X2 <--- 6 X3 <--- φ (X1,X2) o o o Use variable X3 several times Add this line • The SSA solution is to add a special ∅ function at each merge point • The new ∅-def X3 captures the merge of X1 and X2 CS297- B. Narahari