How Does Java Garbage Collection Works: Java GC Internal Overview
Table of Contents
Introduction to Garbage Collection
Garbage collection is an important feature of many modern computing environments, There are basically two styles of garbage collection algorithms: reference-counting collectors and tracing collectors.
Reference-Counting GC
Reference counting garbage collection is where each object has a count of the number of references to it. Garbage is identified by having a reference count of zero. An object's reference count is incremented when a reference to it is created and decremented when a reference is destroyed.
Reference counting GC has the desirable characteristics of low memory overhead and short pause times, which are key in today's interactive mobile platforms. However, RC has a higher execution time overhead than its counterpart, tracing garbage collection. The reason is that RC implementations maintain per-object counters, which must be continually updated. In particular, the execution time overhead is high in environments where low memory overhead is critical and, therefore, non-deferred RC is used. This is because the counter updates need to be performed atomically. [1]
Tracing GC
Tracing garbage collection is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics. [2]
There are two common approaches to reducing the pause time in tracing collectors:
Parallel GC: Use multi-core CPUs for parallel garbage collection.
Concurrent GC: Running most of garbage collection steps concurrent with application. Only pause application for a short time during a GC cycle.
In this article, we will focus on tracing GC and summarize the common algorithms implemented in tracing GC category.
Tracing GC Implementations
A theoretical, most straightforward garbage collection algorithm iterates over every reachable object every time it runs. Any leftover objects are considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
JVM incorporates a number of different garbage collection algorithms that all use a technique called generational collection.
Generational Collection
While naive garbage collection examines every live object in the heap every time, generational collection exploits several empirically observed properties of most applications to minimize the work required to reclaim unused (garbage) objects. The most important of these observed properties is the weak generational hypothesis, which states that most objects survive for only a short period of time.
Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects "die young."
Generation
To optimize for this scenario, memory is managed in generations (memory pools holding objects of different ages). Garbage collection occurs in each generation when the generation fills up.
The vast majority of objects are allocated in a pool dedicated to young objects (the young generation), and most objects die there. When the young generation fills up, it causes a minor collection in which only the young generation is collected; garbage in other generations isn't reclaimed. The costs of such collections are, to the first order, proportional to the number of live objects being collected; a young generation full of dead objects is collected very quickly.
Typically, some fraction of the surviving objects from the young generation are moved to the old generation during each minor collection. Eventually, the old generation fills up and must be collected, resulting in a major collection, in which the entire heap is collected. Major collections usually last much longer than minor collections because a significantly larger number of objects are involved.
Young generation contains two separate areas: the eden space and the survivor space. Both serve different functions with the overall goal to increase memory efficiency.
Eden space: New objects are allocated in the eden space. When the eden space is full and there is no room for allocating a new object, the young generation (minor) garbage collector runs
Survivor space: There are two equally divided survivor spaces, namely S0 and S1. The minor garbage collector uses these regions in an alternate fashion.
Old generation space – This is also known as tenured space. This is where longer-lived objects reside. In other words, the garbage collector moves objects that have survived a certain number of GCs here. When the tenured space becomes full, this triggers a major GC.
Minor GC
Minor GC algorithm:
Given: S0 as the target survivor and S1 as the source survivor spaces initially.
When: Minor garbage collector runs. In other words, the eden space does not have enough space for an object that the JVM wishes to allocate.
Then:
All live objects from the eden space are copied to the S0 survivor space. The ages of these objects are set to 1, as they have just survived their first GC cycle.
S1 is examined and any live objects whose ages meet a given threshold (the tenuring threshold) are copied to the old generation space, meaning they are tenured. In other words, this is a long-lived object, so copy it to the old generation area where longer-lived objects reside. This makes future minor GC runs more efficient as it ensures these same objects are not re-examined.
The remaining live S1 objects (the ones that were not tenured) are copied to S0, where their ages are incremented by one, as they have just passed another GC cycle.
Minor GC again:
Some facts about Minor GC:
Minor GC is always triggered when JVM is unable to allocate space for a new Object, e.g. the Eden is getting full. So the higher the allocation rate, the more frequently Minor GC is executed.
Whenever the pool is filled, its entire content is copied and the pointer can start tracking the free memory from zero again. So instead of classical Mark, Sweep and Compact, cleaning Eden and Survivor spaces is carried out with Mark and Copy instead. So, no fragmentation actually takes place inside Eden or Survivor spaces.
During a Minor GC event, Tenured generation is effectively ignored. References from tenured generation to young generation are considered de facto GC roots. References from young generation to Tenured generation are simply ignored during the markup phase.
Against common belief, all Minor GCs do trigger stop-the-world pauses, stopping the application threads. For most of the applications, the length of the pauses is negligible latency-wise. This is true if most of the objects in Eden can be considered garbage and are never copied to Survivor/Old spaces. If the opposite is true and most of the newborn objects are not eligible for GC, Minor GC pauses start taking considerably more time.
Major GC
Major GC cleans up the old generation. The task of Major GC is as same as the minor GC, but the only difference is minor GC reclaims the memory of the young generation whereas Major GC reclaims the memory of the old generation. It is also said that many major GCs are triggered by minor GCs.
Major GC (old generation GC) uses mark and sweep with compacting. Minor GC uses mark and sweep with copying.
Full GC
A Full GC is triggered whenever the heap fills up. In such a case the young generation is swept first followed by the old generation. If the old generation is too full to accept the objects of the young generation, the young generation GC is omitted and the old generation GC (major GC) is used to clean and compact the full heap, either in parallel or serial. Either way, the whole heap is collected with a stop-the-world event.
GC Steps
The process of garbage collection involves several steps, including marking, sweeping, and compacting.
Marking Phase
Marking marks any live objects and anything not marked as ready to be garbage collected. The objects keep a special bit that determines whether they are marked or not. Upon creation, the bit is 0. In the mark phase, if an object is still in use and should not be removed, it gets set to 1.
GC roots
A GC root is a special type of live object and is, therefore, not eligible for GC. All objects reachable from GC roots are also live and are, therefore, not eligible for GC. The GC roots act as starting points in GC, that is, start at these roots and mark all objects reachable as live.
The most common GC roots are the following
Local variables on the stack
All active Java threads
Static variables (as these can be referenced by their classes)
Java Native Interface (JNI) references – Objects created by the native code as part of a JNI call. This is a very special case of GC roots because the JVM does not know whether the objects are referenced by the native code or not.
Stop-The-World
While you are checking all the variables on the stack and marking all their objects and the nested objects, new objects could have been created in the meantime. It’s possible that you missed that part of the stack. This would lead to unmarked objects (remember, objects are initially unmarked upon creation) that should have been marked, and they would be removed as a result. That would be very problematic.
The solution to this impacts performance as the garbage collector needs to pause the execution of the main application in order to make sure no new objects will be created during the marking phase. This strategy is called stop-the-world
Sweeping
Once the objects that need to be kept are marked, it’s time to start the next phase to actually free the memory. This deletion of the objects is called sweeping in GC jargon. To make it more interesting, we have three kinds of sweeping:
Normal sweeping
Sweeping with compacting
Sweeping with copying
Fragmentation of memory happens after storing the memory first and then removing blocks from the middle. In between the memory blocks, new memory can be allocated. If the memory blocks don’t fit in between the gaps (or at the end in this overview), we have a fragmentation problem.
Sweeping with compacting
Sweeping with compacting is a two-step process. Just like in normal sweeping, the memory blocks are deleted. This time, we do not accept the fragmented memory as the end result but execute an extra step called compacting. This moves the blocks of memory to ensure there are no gaps between them
The compacting of the memory is a costly process in terms of performance, since all the memory blocks need to be moved (and this needs to happen mostly sequentially).
Sweeping with copying
First, we have our memory region with the objects that are no longer needed, and then we have a second memory region that is not allocated yet.
In the next step, we copy all the objects we need to the second memory region.
Common Garbage Collectors In JDK
Serial GC
The serial GC runs on a single thread and uses the stop-the-world strategy. This means that the application is not running its main tasks when the garbage collector runs. It is the simplest option for garbage collection.
For the young generation, it uses the mark strategy to identify which objects are eligible for GC and the sweep with copying approach for the actual freeing of the memory. For the old generation, it uses the mark and sweep with compacting approach.
It's best-suited to single processor machines because it can't take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB). The serial collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseSerialGC.
Parallel GC
The parallel garbage collector is the default garbage collector of Java 8. It uses the mark-and-copy approach for the young generation and the mark-sweep-compact approach for the old generation, just like the serial garbage collector. However, and this might come as a surprise, it does so in parallel. In this case, parallel means that uses multiple threads to clean up the heap space. So, there is not one single thread taking care of the marking, copying, and compacting phases, but multiple threads. Even though it is still stop-the-world, it performs better than the serial garbage collector, since the world needs to be stopped for a shorter amount of time.
The parallel garbage collector will work well on machines with multiple cores. On (rarer) single-core machines, the serial garbage collector is probably a better choice, due to the costs of managing multiple threads and not really parallel processing on the single core.
The parallel collector is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. You can enable it by using the -XX:+UseParallelGC option.
CMS GC
Concurrent Mark Sweep (CMS) collector and Garbage-First (G1) garbage collector are the two mostly concurrent collectors. Mostly concurrent collectors perform some expensive work concurrently to the application.
The CMS GC is a generational garbage collector as well. It has separate cycles for the young and old generation. For the young generation, it uses mark and copy with stop-the-world. So, during the GC of the young generation, the main application threads are paused.
The old generation is garbage collected with mostly concurrent mark and sweep. The term mostly concurrent means that it does most of the GC concurrently, but it will still use stop-the-world twice in a GC cycle. It pauses all the main application threads for the first time at the very beginning, then during the marking for a very short time, and then for a (usually) somewhat longer time around the middle of the GC cycle to do the final marking.
These pauses are typically very short, because the CMS GC attempts to collect enough of the old generation while running concurrently to the main application threads and, this way, prevent it from getting full. Sometimes, this is not possible. If the CMS GC cannot free up enough while the old generation is getting full, or the application fails to allocate an object, the CMS GC pauses all the application threads and the main focus shifts to GC. The situation in which this garbage collector fails to do the GC mostly concurrently is called concurrent mode failure.
If the collector then still cannot free up enough memory, OutOfMemoryError gets thrown. This happens when 98% of the application time is spent on GC and less than 2% of the heap is recovered.
G1 GC
The G1 (garbage-first) garbage collector came with Java 7. and is an upgrade of the CMS GC.
The G1 collector is parallel, concurrent, and aims for short pauses of the application. It employs a technique called incrementally compacting.
The G1 garbage collector divides the heap into smaller regions: much smaller than the generational garbage collector. It works with these smaller memory segments to mark and sweep them. It keeps track of the amount of reachable and unreachable objects per memory region. The regions with the most unreachable objects are garbage collected first since that frees up the most memory. That’s why it is called the garbage-first garbage collector. Regions with the most garbage are collected first.
It does all this while copying objects from one region to another region. This will result in freeing up the first region completely. This way the G1 GC kills two birds with one stone: achieving GC and compacting at the same time. This is why it is such an upgrade compared to earlier mentioned garbage collector.
The G1 GC is a great garbage collector. You may wonder whether this garbage collector manages to work without stop-the-world. No, the compacting still needs to happen this way. But due to the smaller regions, the pauses are much shorter.
Just like the CMS GC, the G1 GC tries to do a lot of the GC concurrently. So, the application threads don’t need to be paused most of the time. However, if the G1 GC cannot free up enough memory and the application is allocating more than can be freed up concurrently, the application threads need to be paused.
ZGC
It does all the garbage collecting concurrently and does not need to pause the application for more than 10 ms per pause.
It manages to do this by starting with marking the live objects. It doesn’t keep a map but uses reference coloring. Reference coloring means that the live state of reference is stored as the bits that are part of the reference. This requires some extra bits, which is why the ZGC only works on 64-bit systems and not on 32-bit systems.
Fragmentation is avoided by using relocation. This process happens in parallel with the application in order to avoid pauses of more than 10 ms, but this happens while the application is being executed
Without extra measurements, this could lead to unpleasant surprises. Imagine that we were trying to access a certain object with the reference, but while doing so, it got relocated and has a new reference. The old memory location could be overwritten or cleared already. In such a scenario, debugging would be a nightmare.
The load barriers run whenever a reference from the heap is loaded. It checks the metadata bits of the reference and, based on the result, it may or may not do some processing before retrieving the result. This magic is called remapping.
In the next article, we will dive deep into how G1 GC and ZGC works.
References
[1] Biased reference counting: minimizing atomic operations in garbage collection
[3] HotSpot Virtual Machine Garbage Collection Tuning Guide
[4] Java Memory Management: A comprehensive guide to garbage collection and JVM tuning