Garbage collection
David Walker
CS 320

Where are we?
• Last time: a survey of common garbage collection techniques
  – Manual memory management
  – Reference counting (Appel 13.2)
  – Copying collection (Appel 13.3)
  – Generational collection (Appel 13.4)
  – Baker's algorithm (Appel 13.6)
• Today:
  – Mark-sweep collection (Appel 13.1)
  – Conservative collection
  – Compiler interface (Appel 13.7)

Mark-sweep
• A two-phase algorithm
  – Mark phase: depth-first traversal of the object graph from the roots, marking the live data
  – Sweep phase: iterate over the entire heap, adding the unmarked data back onto the free list

Example
[A sequence of heap diagrams starting from root r1 and the free list; legend: in use / on free list / marked.]
• Mark phase: mark the nodes reachable from the roots
• Sweep phase: set up the sweep pointer p and begin the sweep; add unmarked blocks to the free list; retain and unmark the marked blocks
• GC is complete when the sweep pointer reaches the heap boundary; the program then resumes

Cost of Mark-Sweep
• Cost of the mark phase:
  – O(R), where R is the number of reachable words
  – Assume the cost is c1 * R (c1 may be around 10 instructions)
• Cost of the sweep phase:
  – O(H), where H is the number of words in the entire heap
  – Assume the cost is c2 * H (c2 may be around 3 instructions)
• Amortized analysis
  – Each collection returns H - R words to the free list
  – For every allocated word, the GC cost is:
    • (c1 * R + c2 * H) / (H - R)
  – R / H must be sufficiently small or the GC cost is high
  – E.g., if R / H grows larger than 0.5, increase the heap size

A Hidden Cost
• Depth-first search is usually implemented as a recursive algorithm
  – It uses stack space proportional to the longest path in the graph of reachable objects
    • one activation record per node in the path
    • activation records are big
  – If the heap is one long linked list, the stack space used by the algorithm will be greater than the heap size!
  – What do we do?
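To make the two phases concrete, here is a minimal C sketch (not from the lecture) of mark and sweep over a hypothetical heap of fixed-size two-field records; the type and function names are illustrative assumptions. The mark phase is written recursively for clarity, which is exactly the hidden stack cost noted above; the pointer-reversal trick that follows removes it.

/* A minimal mark-sweep sketch over a heap of fixed-size records.
   Hypothetical layout: each object carries a mark bit and a fixed
   number of fields, each either a pointer or NULL. */
#include <stddef.h>

#define NFIELDS  2
#define HEAPSIZE 1024

typedef struct obj {
    int mark;                     /* set during the mark phase         */
    struct obj *fields[NFIELDS];  /* child pointers (NULL if absent)   */
    struct obj *free_next;        /* links objects on the free list    */
} obj;

static obj heap[HEAPSIZE];        /* the whole heap, swept linearly    */
static obj *free_list = NULL;

/* Mark phase: depth-first traversal from one root.  Recursive for
   clarity -- this is the hidden O(path length) stack cost. */
static void mark(obj *x) {
    if (x == NULL || x->mark) return;
    x->mark = 1;
    for (int i = 0; i < NFIELDS; i++)
        mark(x->fields[i]);
}

/* Sweep phase: walk the entire heap; unmarked objects go back onto
   the free list, marked objects are retained and unmarked. */
static void sweep(void) {
    free_list = NULL;
    for (size_t i = 0; i < HEAPSIZE; i++) {
        if (heap[i].mark) {
            heap[i].mark = 0;                 /* retain & unmark       */
        } else {
            heap[i].free_next = free_list;    /* add to the free list  */
            free_list = &heap[i];
        }
    }
}

void gc(obj *roots[], size_t nroots) {
    for (size_t i = 0; i < nroots; i++)
        mark(roots[i]);                       /* cost roughly c1 * R   */
    sweep();                                  /* cost roughly c2 * H   */
}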
A nifty trick
• Deutsch-Schorr-Waite pointer reversal
  – Rather than using a recursive algorithm, reuse the components of the graph you are traversing as an explicit stack
  – This implementation trick only demands a few extra bits per block rather than an entire activation record per block
  – We already needed a few extra bits per block to hold the "mark" anyway

DSW Algorithm
[A sequence of diagrams showing the traversal: as the search advances along a "next" pointer, the field just followed is overwritten with the "back" pointer, threading the stack of return points through the records themselves.]
• extra bits are needed to keep track of which record fields we have processed so far

DSW Setup
• Extra space required:
  – 1 bit per record to keep track of whether the record has been seen (the "mark bit")
  – ⌈log2 f⌉ bits per record, where f is the number of fields in the record, to keep track of how many fields have been processed
    • assume a vector: done[x]
• Functions:
  – mark x = sets x's mark bit
  – marked x = true if x's mark bit is set
  – pointer x = true if x is a pointer
  – fields x = returns the number of fields in the record x

DSW Algorithm (depth-first search in constant space)

fun dfs(next) =
  if (pointer next) & not (marked next) then
    (* initialization *)
    back = nil; mark next; done[next] = 0;
    while true do
      (* next is the object being processed; done[next] is the field being processed *)
      i = done[next]
      if i < (fields next) then
        (* process the i-th field *)
        y = next.i
        if (pointer y) & not (marked y) then
          next.i = back;             (* reuse the field to store the back pointer *)
          back = next;
          next = y;
          mark next; done[next] = 0  (* initialize for the next iteration *)
        else
          done[next] = i + 1         (* this field is done *)
      else
        (* back-track to the previous record *)
        y = next; next = back;
        if next = nil then return;   (* dfs complete *)
        i = done[next];
        back = next.i; next.i = y;   (* restore the i-th field; recover the old back pointer *)
        done[next] = i + 1           (* advance to the next field *)
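A C rendering of the same algorithm may help. This is a sketch under the same hypothetical object layout as the earlier mark-sweep code, with an explicit per-object done counter standing in for the ⌈log2 f⌉ bits; all names are illustrative, not the lecture's.

/* Deutsch-Schorr-Waite pointer reversal: depth-first marking in
   constant extra space.  The stack of return points is threaded
   through the fields we have already visited. */
#include <stddef.h>

#define NFIELDS 2

typedef struct obj {
    int mark;                     /* mark bit                         */
    int done;                     /* index of the next field to visit */
    struct obj *fields[NFIELDS];  /* pointer fields (NULL if absent)  */
} obj;

void dsw_mark(obj *next) {
    obj *back = NULL, *y;
    if (next == NULL || next->mark) return;
    next->mark = 1;                      /* initialization */
    next->done = 0;
    for (;;) {
        int i = next->done;
        if (i < NFIELDS) {
            /* process the i-th field */
            y = next->fields[i];
            if (y != NULL && !y->mark) {
                next->fields[i] = back;  /* reuse the field to store the back pointer */
                back = next;
                next = y;
                next->mark = 1;          /* initialize for the next iteration */
                next->done = 0;
            } else {
                next->done = i + 1;      /* this field is done */
            }
        } else {
            /* back-track to the previous record */
            y = next;
            next = back;
            if (next == NULL) return;    /* dfs complete */
            i = next->done;
            back = next->fields[i];      /* recover the old back pointer  */
            next->fields[i] = y;         /* restore the original field    */
            next->done = i + 1;          /* advance to the next field     */
        }
    }
}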
More Mark-Sweep
• Mark-sweep collectors can benefit from the tricks used to implement malloc/free efficiently
  – multiple free lists, with one size of block per list
• Mark-sweep can suffer from fragmentation
  – blocks are not copied and compacted as in copying collection
• Mark-sweep doesn't require 2x the live data size to operate
  – but if the ratio of live data to heap size is too large, then performance suffers

Conservative Collection
• Even languages like C can benefit from GC
  – The Boehm-Demers-Weiser conservative GC uses heuristics to determine which values are pointers and which are integers, without any language support
    • last 2 bits are non-zero => can't be a pointer
    • value is not in the allocated heap range => can't be a pointer
    • the mark phase traverses all possible pointers
    • conservative because it may retain data that isn't reachable – it thinks an integer is actually a pointer
      • all GC is conservative anyway, so this is almost never an issue (despite what people say)
    • sound if your program doesn't manufacture pointers from integers by, say, using xor (using normal pointer arithmetic is fine)

Compiler Interface
• The interface to the garbage collector involves two main parts
  – allocation code
    • programs can allocate up to approximately 1 word per 7 instructions
    • allocation code must be blazingly fast!
    • it should be inlined and optimized to avoid call-return overhead
  – gc code
    • to call the gc code, the program must identify the roots
    • to traverse data, the heap layout must be specified somehow

Allocation Code
Assume the size of the record allocated is N:
1. Call the alloc function
2. Test next + N < limit (call gc on failure)
3. Move next into the function result
4. Clear M[next], ..., M[next + N - 1]
5. next = next + N
6. Return from the alloc function
7. Move the result into a computationally useful place
8. Store useful values into M[next], ..., M[next + N - 1]

• Steps 7 and 8 are useful computation, not allocation overhead
• Inline the alloc code so the call and return (steps 1 and 6) disappear
• Combine the moves: move next directly into the computationally useful place (steps 3 and 7 become one move)
• Eliminate the useless store: the clearing in step 4 is unnecessary because step 8 overwrites those words anyway
• Total overhead for allocation is on the order of 3 instructions per allocation (the limit test, the move, and the bump of next); a sketch of the resulting fast path follows
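As a concrete illustration of the optimized allocation sequence, here is a hedged C sketch. The reserved variables next and limit, the gc entry point, and the pair-allocation example are all assumptions for illustration, not the lecture's code.

/* Sketch of the inlined allocation fast path.  In a real system,
   "next" and "limit" would live in reserved registers, and gc()
   would be the collector entry point; both are assumed here. */
#include <stddef.h>

extern char *next, *limit;   /* bump pointer and end of free space            */
void gc(void);               /* collect; assumed to free enough space to retry */

/* Allocate and initialize a two-word pair.  Only the limit test, the
   move of next, and the bump are allocation overhead -- roughly 3
   instructions per allocation. */
static inline void *make_pair(void *first, void *second) {
    size_t n = 2 * sizeof(void *);
    if (next + n >= limit)          /* test next + N < limit: call gc on failure */
        gc();
    void **result = (void **)next;  /* move next into a computationally useful place */
    next = next + n;                /* next = next + N */
    result[0] = first;              /* store useful values into the new record */
    result[1] = second;             /* (no separate clearing step is needed)   */
    return result;
}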
Calling GC code
• To call the GC, the program must identify the roots:
  – a GC-point is a control-flow point where the garbage collector may be called
    • allocation points and function calls
  – for every GC-point, the compiler generates a pointer map that says which registers and stack locations in the current frame contain pointers
  – a global table maps GC-points (code addresses) to pointer maps
  – when the program calls the GC, to find all the roots:
    • the GC scans down the stack, one activation record at a time, looking up the pointer map for that record (see the sketch after the summary)
• The program must also enable the GC to determine the data layout of all objects in the heap
  – for ML, Tiger, Pascal:
    • every record has a header with size and pointer information
  – for Java, Modula-3:
    • each object has an extra field that points to its class definition
    • the GC uses the class definition to determine the object layout, including size and pointer information

Summary
• Garbage collectors are a complex and fascinating part of any modern language implementation
• Different collection algorithms have pros and cons
  – explicit memory management, reference counting, copying, generational, mark-sweep
  – all methods, including explicit memory management, have costs
  – optimizations make allocation fast and keep GC time, space, and latency requirements acceptable
  – read Appel Chapter 13 and be able to analyze, compare, and contrast different GC mechanisms
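Finally, to make the pointer-map idea from the compiler interface concrete, here is a hedged C sketch of a compiler-generated GC-point table and the per-frame root scan it enables. The struct layout, table names, and frame conventions are illustrative assumptions, not a description of any particular compiler.

/* Sketch of a compiler-emitted pointer map.  All names and the
   frame layout are illustrative assumptions. */
#include <stdint.h>
#include <stddef.h>

/* One entry per GC-point: for the frame active at that code address,
   bit i of pointer_bits says whether frame word i holds a pointer. */
typedef struct {
    uintptr_t gc_point;      /* code address of the allocation or call site */
    uint32_t  frame_words;   /* size of the frame in words                  */
    uint32_t  pointer_bits;  /* which frame words are roots                 */
} pointer_map;

extern const pointer_map gc_map_table[];   /* global table emitted by the compiler */
extern const size_t      gc_map_count;

static const pointer_map *lookup_map(uintptr_t gc_point) {
    for (size_t i = 0; i < gc_map_count; i++)
        if (gc_map_table[i].gc_point == gc_point)
            return &gc_map_table[i];
    return NULL;                           /* unknown GC-point */
}

/* Report the roots in one activation record; the collector would call
   this for each frame as it scans down the stack. */
void scan_frame(uintptr_t *frame, uintptr_t gc_point,
                void (*mark_root)(void *)) {
    const pointer_map *m = lookup_map(gc_point);
    if (m == NULL) return;
    for (uint32_t i = 0; i < m->frame_words && i < 32; i++)
        if (m->pointer_bits & ((uint32_t)1 << i))
            mark_root((void *)frame[i]);
}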