Garbage Collection Tuning Fundamentals
The core concepts behind garbage collector tuning — generational collection, pause times, and throughput trade-offs — applicable across JVM, .NET, and Go.
Garbage collection (GC) tuning is one of the more intimidating areas of performance engineering because of the sheer number of available flags and algorithms. The underlying concepts, however, are consistent across most managed runtimes (JVM, .NET CLR, Go) and are worth understanding before diving into any specific collector’s tuning flags.
The generational hypothesis
Most GC algorithms exploit the generational hypothesis: the empirical observation that most allocated objects die young (are no longer referenced shortly after allocation), while a smaller fraction survive much longer. Generational collectors split the heap into a “young generation” (collected frequently and cheaply, since most objects there are already dead) and an “old generation” (collected less often and at greater cost, holding longer-lived objects). Because a young collection only has to deal with the few survivors, a generational collector is generally far more efficient than a naive mark-and-sweep over the entire heap on every cycle. Most JVM collectors and the .NET GC are generational; Go’s collector is a notable exception, being concurrent but not generational, so the generation-specific advice here applies to it only loosely.
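To make the hypothesis concrete, here is a toy model in Python. It is only a sketch: the lifetimes and the collection age are made-up numbers, and `minor_collection_survivors` is a hypothetical helper, not part of any runtime. It assumes a copying-style young collection, where the work is roughly proportional to the objects that survive rather than to the objects that died.

```python
# Toy model of the generational hypothesis. All numbers are invented for
# illustration; real lifetime distributions must be measured.

def minor_collection_survivors(lifetimes, age_at_collection):
    """Return the lifetimes of objects still referenced when the young
    generation is collected.

    lifetimes: how long each object stays referenced (arbitrary time units).
    age_at_collection: how old the objects are when the collection runs.
    """
    return [t for t in lifetimes if t > age_at_collection]


# Hypothetical workload: 95 short-lived request temporaries and
# 5 long-lived cache entries.
lifetimes = [1] * 95 + [10_000] * 5

survivors = minor_collection_survivors(lifetimes, age_at_collection=10)
dead_fraction = 1 - len(survivors) / len(lifetimes)

print(f"survivors the collector must process: {len(survivors)} of {len(lifetimes)}")
print(f"fraction already dead at collection time: {dead_fraction:.0%}")
```

With this workload only 5 of 100 objects survive, so a young collection does a small amount of work to reclaim most of the space. A workload where most objects survive would not benefit in the same way.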
Throughput vs pause time: the central trade-off
Similar in spirit to the throughput/latency trade-off covered elsewhere on this site, GC tuning fundamentally trades throughput (the share of CPU time that goes to application work rather than GC, over the long run) against pause time (how long the application is stopped during a collection, particularly for old-generation or full collections). Different GC algorithms and tuning choices sit at different points on this trade-off curve. There is no universally “best” collector, only the one that fits your application’s latency tolerance and throughput needs.
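The two sides of the trade-off can be put into numbers. The sketch below summarises a list of pause durations over an observation window; `gc_summary` and both data sets are hypothetical. It counts only paused time, so it understates the cost of collectors that do most of their work concurrently.

```python
def gc_summary(pauses_ms, wall_clock_ms):
    """Summarise stop-the-world pauses over an observation window.

    Only paused time is counted. Concurrent collectors also use CPU while
    the application runs, which this simple ratio does not capture.
    """
    total_pause_ms = sum(pauses_ms)
    return {
        "throughput": 1 - total_pause_ms / wall_clock_ms,
        "max_pause_ms": max(pauses_ms),
        "collections": len(pauses_ms),
    }


# Hypothetical 60-second windows for two collector configurations.
throughput_oriented = gc_summary([400, 450, 350], wall_clock_ms=60_000)
low_pause = gc_summary([2] * 900, wall_clock_ms=60_000)

for name, s in (("throughput-oriented", throughput_oriented),
                ("low-pause", low_pause)):
    print(f"{name}: throughput={s['throughput']:.1%}, "
          f"max pause={s['max_pause_ms']} ms, collections={s['collections']}")
```

In this invented data the first configuration spends less total time paused but stalls the application for up to 450 ms at a time. The second pays slightly more in total but never pauses for more than 2 ms. Which one is better depends entirely on the application.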
Stop-the-world pauses and why they matter for latency SLOs
Many collectors, to varying degrees, pause all application threads during some portion of collection work. These “stop-the-world” pauses show up directly as latency spikes from the application’s perspective. For latency-sensitive services with strict p99 SLOs (covered in this site’s SLO article), GC pause time can be a significant contributor to tail latency, and it is easy to overlook if you monitor only application-level metrics without correlating them against GC logs.
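A small worked example shows why pauses hit tail latency rather than typical latency. It is a simplification under stated assumptions: every number is hypothetical, each affected request is charged the full pause duration, and `percentile` is a plain nearest-rank implementation written for this sketch.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical: 1000 requests that normally take 5 ms. 20 of them (2%)
# arrive during a 200 ms stop-the-world pause and are delayed by it.
base_ms, pause_ms = 5, 200
latencies = [base_ms] * 980 + [base_ms + pause_ms] * 20

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)

print(f"p50 = {p50} ms, p99 = {p99} ms")
```

The median is unaffected because 98% of requests never see a pause, while the p99 lands squarely in the delayed group. A dashboard that shows only averages or medians would hide this entirely.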
Modern low-pause collectors
Newer collectors (the JVM’s ZGC and Shenandoah, for instance) specifically target very low pause times (sub-millisecond to low-millisecond, largely independent of heap size) by doing much more collection work concurrently with application execution rather than during stop-the-world pauses. This generally comes at some cost to peak throughput or memory overhead compared to older, more throughput-oriented collectors (the JVM’s Parallel GC, for instance). Choosing between them is a direct instance of the throughput/pause-time trade-off, made explicit through collector choice rather than through tuning parameters within a single collector.
Heap sizing’s effect on GC behavior
A heap that is too small causes frequent collections, which hurts throughput because a larger fraction of time goes to GC overhead relative to application work. A very large heap reduces collection frequency, but it can lengthen individual pauses for collectors whose pause time scales with heap or live-set size, and it raises memory cost. Right-sizing requires understanding your application’s actual allocation rate and object lifetime distribution, not just picking a large round number.
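The link between heap size and collection frequency is simple arithmetic: the young generation fills at the allocation rate, so the time between young collections is roughly its size divided by that rate. The sketch below assumes a measured allocation rate of 200 MB/s and a fixed 10 ms pause per young collection. Both numbers and both function names are hypothetical, and the fixed pause only holds when the volume of surviving objects does not grow with the generation size.

```python
def young_collection_interval_s(young_gen_mb, allocation_rate_mb_per_s):
    """Approximate seconds between young collections: the time it takes
    the application to fill the young generation."""
    return young_gen_mb / allocation_rate_mb_per_s


def pause_overhead(pause_ms, interval_s):
    """Approximate fraction of wall-clock time spent paused, assuming one
    pause of pause_ms per interval."""
    return (pause_ms / 1000) / interval_s


ALLOCATION_RATE_MB_PER_S = 200  # hypothetical measured value
YOUNG_PAUSE_MS = 10             # hypothetical, assumed independent of size

for young_gen_mb in (100, 400, 1600):
    interval = young_collection_interval_s(young_gen_mb, ALLOCATION_RATE_MB_PER_S)
    overhead = pause_overhead(YOUNG_PAUSE_MS, interval)
    print(f"{young_gen_mb:>5} MB young gen: collection every {interval:.1f} s, "
          f"about {overhead:.2%} of time paused")
```

Under these assumptions the interval grows from 0.5 s to 2 s to 8 s as the young generation grows, and the pause overhead shrinks accordingly. The model leaves out the cases where a larger heap makes each pause longer, which is why the result still has to be validated against real GC logs.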
Diagnosing GC as a problem in the first place
Before tuning anything, confirm that GC actually explains the observed latency or throughput problem. GC logs (enabled via runtime flags that vary by platform) are the primary diagnostic data: they show collection frequency, duration, and which generation was collected. Correlating GC pause timestamps against application-level latency spikes (the same trace/metric correlation principle covered in this site’s observability articles) confirms or rules out GC as the cause before you invest tuning effort.
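The correlation step can be as simple as an interval-overlap check. The sketch below assumes the GC log and the request log have already been parsed into tuples on a shared clock. The parsing itself is platform-specific and omitted, and the function names and sample data are hypothetical.

```python
def pauses_overlapping(request_start, request_end, pauses):
    """Return the (start, duration) pauses that overlap a request's window.
    All times are in seconds on the same clock."""
    return [
        (start, duration)
        for start, duration in pauses
        if start < request_end and start + duration > request_start
    ]


def fraction_explained_by_gc(slow_requests, pauses):
    """Share of slow requests whose window overlaps at least one GC pause."""
    if not slow_requests:
        return 0.0
    hits = sum(
        1 for start, end in slow_requests
        if pauses_overlapping(start, end, pauses)
    )
    return hits / len(slow_requests)


# Hypothetical data, in seconds since process start.
pauses = [(10.00, 0.25), (42.50, 0.30)]            # (start, duration)
slow_requests = [(9.95, 10.30), (42.60, 42.85),    # (start, end)
                 (55.00, 55.40), (10.10, 10.28)]

share = fraction_explained_by_gc(slow_requests, pauses)
print(f"{share:.0%} of slow requests overlap a GC pause")
```

A high share suggests GC is worth tuning. A low share, as with the third request here, points at some other cause. Overlap alone is not proof, so it helps to also check that the pause duration is comparable to the extra latency.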
A practical tuning approach
- Confirm GC is actually a meaningful contributor via GC logs correlated against application latency.
- Identify whether your priority is pause time (latency-sensitive service) or raw throughput (batch processing).
- Choose a collector aligned with that priority, rather than starting from default flags and micro-tuning.
- Tune heap size and generation ratios based on observed allocation rate and object lifetime, validated empirically under realistic load (the same load testing principle covered throughout this site).
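The first three steps above can be sketched as a small decision helper. This version is JVM-only and deliberately simplistic. The flags are HotSpot collector-selection options, but their availability depends on JDK version and vendor build, and the mapping and the `suggest_collectors` function are assumptions made for illustration rather than a recommendation.

```python
# JVM-only sketch. Flag availability depends on JDK version and vendor
# build; the priority-to-collector mapping is a simplification.
CANDIDATES = {
    "pause_time": ["-XX:+UseZGC", "-XX:+UseShenandoahGC"],
    "throughput": ["-XX:+UseParallelGC"],
}


def suggest_collectors(gc_confirmed_as_cause, priority):
    """Return candidate collector flags to evaluate under realistic load.

    gc_confirmed_as_cause: True only if GC logs correlate with the problem.
    priority: "pause_time" or "throughput".
    """
    if not gc_confirmed_as_cause:
        return []  # step 1 failed: look for the real cause before tuning
    if priority not in CANDIDATES:
        raise ValueError(f"unknown priority: {priority!r}")
    return list(CANDIDATES[priority])


print(suggest_collectors(True, "pause_time"))
print(suggest_collectors(False, "throughput"))
```

The helper returns candidates to test, not an answer. The last step, tuning heap size and generation ratios, still has to be validated against measurements from a realistic load test.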
Takeaway: GC tuning is fundamentally about choosing your position on the throughput/pause-time trade-off curve for your specific application’s needs. Confirm via logs that GC is actually the bottleneck before tuning, and let your latency SLO (or lack thereof) guide collector choice before you reach for individual tuning flags.