The USE Method: Utilization, Saturation, Errors for Resource Monitoring
How Brendan Gregg's USE method systematically checks system resources for performance bottlenecks, and how it pairs with the RED method.
The USE method, developed by Brendan Gregg, is a systematic checklist for diagnosing resource-level performance bottlenecks: for every system resource (CPU, memory, disk, network), check its Utilization, Saturation, and Errors. This replaces ad hoc, intuition-driven investigation, which can miss a relevant resource entirely.
Utilization
The percentage of time a resource is busy doing work. For a CPU this is straightforward (percent busy), but other resources need a careful definition: for memory, utilization usually means the percentage of capacity in use; for a disk, the percentage of time the device is busy servicing requests.
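Here is a minimal sketch of the CPU case. It assumes a Linux host, where `/proc/stat` exposes cumulative CPU time counters. Utilization over an interval is then the busy share of the time that elapsed between two snapshots. The function names are illustrative. Treating `iowait` as not busy is a choice this sketch makes, not the only valid definition.

```python
# Sketch (assumes Linux): CPU utilization from two snapshots of the aggregate
# "cpu" line in /proc/stat. Field order: user nice system idle iowait irq
# softirq steal guest guest_nice. guest/guest_nice are already included in
# user/nice, so only the first eight counters are summed.

def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle_ticks, total_ticks) for the aggregate /proc/stat 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:9]]  # user .. steal
    idle = values[3] + values[4]  # idle + iowait are treated as not busy
    return idle, sum(values)


def cpu_utilization(before: str, after: str) -> float:
    """Fraction (0.0-1.0) of elapsed CPU time spent busy between two samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0
    busy_delta = total_delta - (idle1 - idle0)
    return busy_delta / total_delta
```

To sample a live host, read the first line of `/proc/stat` twice, a few seconds apart, and pass both lines to `cpu_utilization`. Per-CPU lines (`cpu0`, `cpu1`, …) can be handled the same way by relaxing the name check. This is useful because a single hot CPU can hide inside a modest aggregate figure.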
Saturation
The degree to which a resource has more work than it can service immediately, so that extra work has to queue. Utilization alone misses this signal. A CPU at 100% utilization with an empty run queue is fully busy but is not yet making work wait. A CPU at 100% utilization with a long run queue of waiting threads is saturated, which is a worse and more actionable signal. Saturation metrics (queue length, wait time) are often more useful than utilization for confirming a genuine bottleneck.
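A CPU saturation check can be sketched the same way, again assuming Linux. The fourth field of `/proc/loadavg` reports the number of currently runnable scheduling entities over the total that exist. Runnable tasks beyond the CPU count are waiting in the run queue. The helper names are hypothetical. A single reading is only an instantaneous snapshot, so in practice you would sample it repeatedly, or use the runnable column (`r`) of a tool like `vmstat`.

```python
# Sketch (assumes Linux): CPU saturation as runnable tasks in excess of CPUs.

def runnable_from_loadavg(text: str) -> int:
    """Return the runnable-entity count from /proc/loadavg text.

    The fourth whitespace-separated field has the form "runnable/total".
    """
    runnable, _total = text.split()[3].split("/")
    return int(runnable)


def cpu_saturation(runnable: int, ncpus: int) -> int:
    """Runnable tasks beyond the CPU count, i.e. tasks left waiting to run."""
    return max(0, runnable - ncpus)
```

On a live host you might call `cpu_saturation(runnable_from_loadavg(open("/proc/loadavg").read()), os.cpu_count() or 1)`. The process reading the file is itself runnable at that moment, so small nonzero values deserve less weight than a persistently long queue.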
Errors
Resource-level error counts, such as disk I/O errors, network interface errors, and memory errors (ECC corrections, for instance). These can silently degrade performance through retries and fallback paths without showing up clearly in utilization or saturation metrics. Errors are easy to overlook, but they are a real and sometimes surprising root cause of otherwise mysterious performance issues.
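For network interface errors, here is a sketch that assumes Linux's `/proc/net/dev` layout. That file has two header lines, followed by per-interface counters in which receive `errs` is the third field and transmit `errs` is the eleventh. The counters are cumulative since boot. What matters during an investigation is whether they increase over the window you are examining, so the sketch compares two snapshots. The function names are illustrative.

```python
# Sketch (assumes Linux /proc/net/dev format): find interfaces whose error
# counters grew between two snapshots.

def interface_errors(proc_net_dev: str) -> dict[str, int]:
    """Map interface name -> cumulative rx errs + tx errs."""
    errors: dict[str, int] = {}
    for line in proc_net_dev.splitlines()[2:]:  # first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        rx_errs, tx_errs = int(fields[2]), int(fields[10])
        errors[name.strip()] = rx_errs + tx_errs
    return errors


def new_errors(before: str, after: str) -> dict[str, int]:
    """Errors that appeared between two snapshots, for interfaces that had any."""
    old, new = interface_errors(before), interface_errors(after)
    deltas = {name: count - old.get(name, 0) for name, count in new.items()}
    return {name: delta for name, delta in deltas.items() if delta > 0}
```

The same snapshot-and-diff pattern applies to other cumulative error counters, such as disk or ECC error counts, wherever your platform exposes them.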
Why check every resource systematically, not just the obvious one
A common diagnostic mistake is fixating on CPU, the most commonly monitored resource, when the actual bottleneck is disk I/O saturation, network bandwidth, or something less obvious. Examples include file descriptor exhaustion, or memory pressure that pushes the system into swap. The USE method’s value lies in being a checklist: it prompts you to check the utilization, saturation, and errors of every resource systematically, rather than jumping straight to whichever resource happens to be top-of-mind.
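File descriptors are a good example of a less obvious resource that still fits the utilization framing. This sketch assumes Linux. System-wide, `/proc/sys/fs/file-nr` holds three numbers: allocated, free, and maximum file handles. Per process, usage compares open descriptors against the process's soft limit. The function names, and the choice to treat each ratio as "utilization", belong to this sketch.

```python
# Sketch (assumes Linux): file descriptor utilization, system-wide and per process.

def file_nr_utilization(text: str) -> float:
    """System-wide file handle utilization from /proc/sys/fs/file-nr text.

    The file holds three numbers: allocated handles, free handles, and the
    system-wide maximum.
    """
    allocated, free, maximum = (int(v) for v in text.split())
    return (allocated - free) / maximum


def fd_utilization(open_fds: int, soft_limit: int) -> float:
    """Per-process descriptor utilization against its soft open-files limit."""
    if soft_limit <= 0:
        raise ValueError("soft_limit must be positive")
    return open_fds / soft_limit
```

On Linux, you can gather the per-process inputs by counting the entries in `/proc/<pid>/fd` and reading the soft limit from the "Max open files" line of `/proc/<pid>/limits`. A process that hits its limit starts failing `open` and `accept` calls with `EMFILE`. That failure surfaces as errors rather than as high CPU.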
Applying USE during a load test
Suppose a load test shows degraded performance at high concurrency (the kind of result covered throughout this site’s tool-specific load testing articles). Run through USE on the host of the system under test: CPU, memory, disk, network, and, for databases, connection pool saturation. This is a structured way to find the actual resource bottleneck rather than guessing. It connects directly to the “correlate client-side results with server-side metrics” workflow covered in this site’s LoadRunner monitoring integration article.
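One way to make the correlation concrete is to pair each load step's client-side latency with server-side saturation readings taken over the same window. You then ask which resources were saturated at the first step where latency degraded. Everything here is hypothetical: the `LoadStep` record, the resource names, and the "2x over baseline" rule for degradation are assumptions for illustration. None of it is output from any particular load testing tool.

```python
from dataclasses import dataclass


@dataclass
class LoadStep:
    concurrency: int
    p95_ms: float  # client-side latency reported by the load tool
    # Server-side saturation per resource over the same window (hypothetical
    # units, e.g. run-queue excess or pool waiters); 0 means none observed.
    saturation: dict[str, float]


def first_degraded_step(steps: list[LoadStep], baseline_factor: float = 2.0):
    """Return (step, saturated resource names) for the first step whose p95
    exceeds baseline_factor times the lowest-concurrency p95, else None.
    Assumes steps are ordered by increasing concurrency."""
    baseline = steps[0].p95_ms
    for step in steps:
        if step.p95_ms > baseline * baseline_factor:
            saturated = sorted(
                name for name, value in step.saturation.items() if value > 0
            )
            return step, saturated
    return None
```

If the degraded step shows no saturated resource at all, the checklist has not covered the right resource yet. The next step is to extend the set of resources being measured rather than to guess.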
USE for software resources, not just hardware
The method extends naturally to software resources that don’t map directly onto a hardware metric but behave analogously. A thread pool has utilization (busy threads) and saturation (the queue depth of waiting tasks). A database connection pool has utilization (connections in use) and saturation (callers waiting for a connection). A message queue’s depth is a direct saturation signal. The same three-part check applies to each.
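To show USE applied to a software resource, here is a hypothetical fixed-capacity pool wrapper. It is not any real driver's API. It tracks slots in use (utilization), callers currently blocked waiting for a slot (saturation), and acquisitions that timed out (errors). Real connection and thread pools usually expose their own statistics, and in practice that is where you would read these numbers.

```python
import threading


class InstrumentedPool:
    """Hypothetical fixed-capacity pool that records USE metrics."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_use = 0     # utilization numerator
        self.waiting = 0    # saturation: callers blocked waiting for a slot
        self.timeouts = 0   # errors: acquisitions that gave up
        self._cond = threading.Condition()

    def acquire(self, timeout: float) -> None:
        with self._cond:
            self.waiting += 1
            try:
                # wait_for returns immediately if a slot is already free.
                ok = self._cond.wait_for(lambda: self.in_use < self.capacity, timeout)
            finally:
                self.waiting -= 1
            if not ok:
                self.timeouts += 1
                raise TimeoutError("no pool slot available")
            self.in_use += 1

    def release(self) -> None:
        with self._cond:
            if self.in_use == 0:
                raise RuntimeError("release without matching acquire")
            self.in_use -= 1
            self._cond.notify()

    def use_snapshot(self) -> dict:
        with self._cond:
            return {
                "utilization": self.in_use / self.capacity,
                "saturation": self.waiting,
                "errors": self.timeouts,
            }
```

A caller that gets a slot immediately is counted in `waiting` only while it holds the lock, so a snapshot never sees it. Only callers actually blocked inside `wait_for` contribute to saturation.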
Where USE fits relative to RED
As covered in this site’s RED method article, RED (Rate, Errors, Duration) measures client-facing service health, while USE measures the health of the underlying resources. When a RED dashboard shows degraded Duration or rising Errors, USE provides the systematic next step: finding which specific resource is actually responsible, rather than leaving you to guess.
A practical checklist format
For each resource (CPU, memory, disk, network, and relevant software resources such as connection and thread pools), ask three questions. What is the utilization? Is there evidence of saturation (queue depth, wait time)? Are there any errors? Running through these explicitly, resource by resource, during an investigation catches bottlenecks that intuition-driven investigation often misses.
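The checklist can also be expressed as data, which makes the "visit every resource" rule explicit. This is a minimal sketch: `ResourceCheck`, `use_findings`, and the 80% utilization threshold are assumptions for illustration. The readings would come from whatever monitoring you already have.

```python
from dataclasses import dataclass


@dataclass
class ResourceCheck:
    name: str
    utilization: float  # 0.0-1.0
    saturation: float   # queue length or wait time; 0 means none observed
    errors: int         # errors observed during the investigation window


def use_findings(checks: list[ResourceCheck], util_threshold: float = 0.8) -> list[str]:
    """Run all three USE questions against every resource; never stop early."""
    findings = []
    for c in checks:
        if c.utilization >= util_threshold:
            findings.append(f"{c.name}: high utilization ({c.utilization:.0%})")
        if c.saturation > 0:
            findings.append(f"{c.name}: saturated ({c.saturation:g} queued/waiting)")
        if c.errors > 0:
            findings.append(f"{c.name}: {c.errors} errors")
    return findings
```

Because the loop never returns early, a finding on the resource you suspected first does not hide findings on the others.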
Takeaway: the USE method’s systematic, resource-by-resource checklist (utilization, saturation, errors) is designed to catch a bottleneck in a resource you did not suspect first. Pair it with RED’s client-facing metrics for a complete diagnostic picture.