Garbage collection is a process that automatically frees up memory not being used by a program. A garbage collector is a link that is also required to serve memory allocation requests.
The garbage collection process typically works by periodically scanning the memory used by a program to identify objects that are no longer being referenced or used. These objects are then marked as "garbage", and the memory they occupy is freed up by the garbage collector. This helps prevent memory leaks, which can cause a program to use up more and more memory over time and eventually crash.
There are different types of garbage collection algorithms, such as mark-and-sweep, reference counting, and copying. Each algorithm has its own advantages and disadvantages, and the choice of algorithm depends on the specific requirements of the program being developed.
The heap area of memory can be thought of as a set of objects with the following attributes: a memory address, size, and a set of references to other objects. This way, the objects and their references constitue vertices and edges of a graph.
Similarly, the stack area of memory can also be modeled as a graph, we label the stack objects as "root" objects which have references pointing to the heap objects (or non-root objects).
More formally, we have modeled the memory as a graph where is the set of memory objects (vertices), and is the set of references (edges). The graph here represents the memory as a whole. Two subgraphs and of the graph . THe subgraph represents the stack area of memory and the subgraph represents the heap area of memory, respectively.
Reachability: We say that some object is "reachable" if there is a path in for some object .
We define the set of "alive" objects. An object in is alive if and only if its is reachable from some object on the stack. That is, such that there is a path from to in .
Garbage: All objects in the heap, that are not reachable from the stack are classified as "garbage".
Syntactic Garbage: Heap memory blocks that are not reachable from the stack area.
Semantic Garbage: Heap memory blocks that the program would never use again.
An ideal garbage collector should have the following properties:
These are some of the goals while desigining a garbage collection algorithm.
Garbage collection algorithms have been an active field of research since the 1960's. There are different variations of the basic algorithm. Some of the metrics and optimization areas in garbage collection algorithms are:
A tracing algorithm is used to determine objects that are alive. Given the graph , and subgraphs where , , and where , , and . A garbage collection algorithm will compute the and sets. Consider this naïve garbage collection algorithm called "Find the Reachable".
Input:
Output: ,
Find the Reachable:
:
Alive = Aive \cup \
:
:
A Mark-Sweep garbage collector would perfrom a scan of the heap area of memory and mark them as "dead" or "alive". Usually, an additional data structure is needed for this task to store a mark bit (0/1) for each object. A memory cell is marked "alive" when the cell has references pointing to it from stack area or from someother memory cell in the heap that is alive.
Compaction can be an expensive operation and this leads to necessity of better and more efficient garbage collection algorithms. The time complexity of the mark-sweep algorithm is linear w.r.t the heap size. However, there is an overhead involved during memory allocation due to fragmentation. Hence, a mark-sweep algorithm might be optimal in a scenario where there is a series of allocation requests in the initial stage of the program, followed by objects becoming unreachable after that. This will eliminate the need to allocate objects when the heap is fragmented.
The advantage of a copying collector is that it has a very low allocation cost. Allocation can be done in time (amortized). Hence, a copying collector would be ideal when a program has a large number of allocation requests. A drawback would be the need for more garbage collection cycles as there is only half of the heap available for allocation (and "alive" objects).
Emprical analysis of applications has shown that most objects are short lived. This observation movtivated the idea of having a generational garbage collector.
The heap is divided into generations, garbage collection on younger generations of the heap are performed more frequently.
The heap is divided into 3 generations: Young, Old, and Permament. New objects are always allocated in the Young generation region.
As objects age (surivive more garbage collections cycles), they get promoted to the Old generation area, and ultimately to the Permanent generation area.
Generational garbage collectors are usually concurrent garbage collectors where a garbage collection cycle happens simultaneously during program execution by using multiple CPU cores.
A. W. Appel, “Simple generational garbage collection and fast allocation,” Software: Practice and Experience, vol. 19, no. 2, pp. 171–183, Feb. 1989, doi: https://doi.org/10.1002/spe.4380190206.
S. Graham, R. Rivest Editors, and F. Morris, “Programming Techniques A Time-and Space- Efficient Garbage Compaction Algorithm.” Accessed: May 11, 2023. [Online]. Available: https://sites.cs.ucsb.edu/~ckrintz/racelab/gc/papers/morris-compaction.pdf
S. Blackburn, R. Jones, K. Mckinley, J. Eliot, and B. Moss, “Beltway: Getting Around Garbage Collection Gridlock.” Accessed: May 11, 2023. [Online]. Available: https://sites.cs.ucsb.edu/~ckrintz/racelab/gc/papers/beltway-pldi-2002.pdf
D. Detlefs, C. Flood, S. Heller, and T. Printezis, “Garbage-first garbage collection,” Proceedings of the 4th international symposium on Memory management, Oct. 2004, doi: https://doi.org/10.1145/1029873.1029879.
R. R. Fenichel and J. C. Yochelson, “A LISP garbage-collector for virtual-memory computer systems,” Communications of The ACM, vol. 12, no. 11, pp. 611–612, Nov. 1969, doi: https://doi.org/10.1145/363269.363280.
T. Würthinger, “Incremental Garbage Collection: The Train Algorithm.” Accessed: May 13, 2023. [Online]. Available: https://ssw.jku.at/General/Staff/TW/Wuerthinger05Train.pdf
An extensive bibliography on garbage collection has been produced by Richard Jones.