Prometheus and Grafana Basics for Performance Monitoring
How Prometheus's pull-based metrics model and PromQL work, and how to build Grafana dashboards that actually answer performance questions.
Prometheus and Grafana together form one of the most widely used open-source metrics monitoring stacks, and understanding their core model clarifies a lot of behavior that’s otherwise confusing if you’re used to push-based monitoring systems.
Prometheus’s pull model
Unlike many older monitoring systems, where applications push metrics to a central collector, Prometheus scrapes (pulls) metrics from instrumented targets at a configured interval by requesting each target’s /metrics HTTP endpoint. Your application therefore exposes its current metric values in Prometheus’s text exposition format, which most language client libraries generate for you, instead of actively sending data anywhere. This is a meaningful architectural difference. It simplifies network configuration, because applications don’t need to know where to send data; only Prometheus needs to know what to scrape. The cost is that dynamic environments need service discovery, because Prometheus has to know which targets currently exist before it can scrape them.
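To make the pull model concrete, here is a minimal Python sketch of the text a target returns when Prometheus scrapes /metrics. The helper name render_metric and the sample values are hypothetical. In practice a client library such as prometheus_client produces this output for you, and this sketch skips details like label-value escaping.

```python
# Hypothetical helper: renders one metric family in the Prometheus text
# exposition format returned by a scrape of /metrics. Simplified sketch:
# label values are not escaped and no timestamps are emitted.

def render_metric(name, metric_type, help_text, samples):
    """samples: list of (labels_dict, value) pairs for this metric family."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            # Labels are written as {key="value",...}; sorted for stable output.
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "http_requests_total", "counter", "Total HTTP requests served.",
    [({"method": "GET", "code": "200"}, 1027), ({"method": "POST", "code": "500"}, 3)],
))
```

Printed, this yields the HELP and TYPE comment lines followed by `http_requests_total{code="200",method="GET"} 1027` and `http_requests_total{code="500",method="POST"} 3`. The application only has to keep these values current; Prometheus decides when to read them.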
The four metric types
- Counter: a value that only increases (total requests served, total errors), except that it resets to zero when the process restarts. It is useful for calculating rates over time, not for reading the raw value directly.
- Gauge: a value that can go up or down (current memory usage, current queue depth).
- Histogram: counts observations into configurable cumulative buckets (each bucket, identified by its le label, counts observations less than or equal to its upper bound), plus a running sum and count. This enables percentile-style queries directly within Prometheus at query time (the same percentile-over-average principle covered throughout this site).
- Summary: similar to a histogram, but calculates its configured percentiles client-side rather than server-side at query time. It is generally less flexible for ad hoc analysis, since the percentiles are fixed at instrumentation time rather than queryable after the fact, and precomputed percentiles from several instances cannot be meaningfully averaged into a combined percentile.
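The behavior of the first three types can be modeled in a few lines of Python. This is an illustrative sketch, not prometheus_client’s API: the class names simply mirror the metric types, and the histogram keeps per-bucket counts internally but reports them cumulatively, the way Prometheus exposes *_bucket series with an implicit +Inf bucket. Summaries are left out because their client-side quantile algorithms are considerably more involved.

```python
import bisect
import math


class Counter:
    """Only goes up; a process restart is what resets it to zero."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Can go up or down, e.g. current queue depth."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations per bucket plus a running sum and count."""

    def __init__(self, upper_bounds):
        # A +Inf bucket is always present so every observation lands somewhere.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.per_bucket = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # First bucket whose upper bound is >= value ("le" = less or equal).
        i = bisect.bisect_left(self.bounds, value)
        self.per_bucket[i] += 1
        self.sum += value
        self.count += 1

    def cumulative_buckets(self):
        """[(le, cumulative_count), ...], the shape of the *_bucket series."""
        result, running = [], 0
        for le, n in zip(self.bounds, self.per_bucket):
            running += n
            result.append((le, running))
        return result


latency = Histogram([0.1, 0.5, 1, 5])
for seconds in [0.05, 0.3, 0.5, 2, 10]:
    latency.observe(seconds)
print(latency.cumulative_buckets())  # [(0.1, 1), (0.5, 3), (1, 3), (5, 4), (inf, 5)]
```

Because the buckets are cumulative, the +Inf bucket always equals the total count, which is what lets query-time functions compute rank-based estimates.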
PromQL: querying time-series data
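PromQL is Prometheus’s query language. rate(http_requests_total[5m]) calculates the per-second average rate of increase of a counter over a trailing 5-minute window, compensating for counter resets along the way; it is the standard way to turn a raw counter into a meaningful rate metric. histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) computes an approximate 95th percentile from histogram bucket data, separately for each labeled series. To get one percentile across all instances, sum the bucket rates by le first: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))). This is the standard PromQL pattern for percentile-based latency monitoring, directly applying the percentiles-over-averages principle covered elsewhere on this site.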
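As a rough model of what rate() does with the raw samples inside a [5m] range, consider the Python sketch below. The function name simple_rate is hypothetical, and it deliberately omits one detail of the real implementation: Prometheus extrapolates the increase toward the edges of the window, so its results differ slightly from this sketch. The counter-reset handling, however, follows the same idea.

```python
def simple_rate(samples):
    """Per-second increase of a counter over the samples in one range window.

    samples: [(unix_seconds, counter_value), ...] sorted by time, i.e. the
    raw points a selector like http_requests_total[5m] returns.
    Simplified: no extrapolation to the window edges, unlike Prometheus.
    """
    if len(samples) < 2:
        return None  # rate() likewise needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Value dropped: the counter reset (e.g. process restart), so
            # everything counted since the reset is new increase.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Scrapes every 15 s; the process restarted between t=30 and t=45.
window = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(simple_rate(window))  # 100 counted over 60 s, about 1.67 per second
```

Reading the raw counter here would show a confusing drop from 160 to 10; the rate view shows a steady flow of requests.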
Why histogram bucket boundaries matter
A histogram’s percentile accuracy is limited by its configured bucket boundaries. If your buckets are [0.1, 0.5, 1, 5] seconds, Prometheus only knows how many observations fell into each range, not where inside a range they fell. histogram_quantile() interpolates linearly within the bucket that contains the requested percentile, so an estimate landing in the 1 to 5 s bucket is only as good as the assumption that observations are spread evenly across that four-second span, and a percentile beyond the highest finite boundary is simply reported as that boundary (5 s here). Configure bucket boundaries deliberately around your actual expected latency range, with a boundary at or near each SLO threshold, rather than around arbitrary round numbers.
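The interpolation described above can be sketched in Python. The function estimate_quantile is an illustrative model of histogram_quantile()’s linear interpolation for the common case of non-negative latency buckets; the bucket counts are made up, and edge cases the real function handles (q outside (0, 1), non-monotonic buckets, a histogram with only a +Inf bucket) are left out.

```python
import math


def estimate_quantile(q, buckets):
    """Linear-interpolation quantile estimate from cumulative histogram buckets.

    buckets: [(le, cumulative_count), ...] sorted by le, last le = +Inf.
    Assumes 0 < q < 1 and that the lowest bucket starts at 0 (latencies).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Falls beyond the last finite bound: report that bound.
                return prev_le
            # Assume observations are spread evenly inside this bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_le + (le - prev_le) * fraction
        prev_le, prev_count = le, count
    return math.nan


coarse = [(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]
print(estimate_quantile(0.95, coarse))  # 0.8125
print(estimate_quantile(0.99, coarse))  # 3.0
```

The p99 of 3.0 s comes entirely from assuming the two slowest of 100 requests sit evenly across the 1 to 5 s bucket; their real latencies could be anywhere in that range. A boundary placed exactly at the SLO threshold at least tells you precisely how many requests met it.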
Building Grafana dashboards that answer real questions
A common mistake is building a dashboard from many small, individually reasonable panels that don’t collectively answer “is the system healthy, and if not, why?” at a glance. A more useful structure puts a small number of SLO-relevant panels prominently at the top (error rate, and p95/p99 latency plotted against your actual SLO threshold as a clear pass/fail line), with more granular diagnostic panels (per-endpoint breakdowns, resource utilization) available but secondary. This mirrors the RED/USE methods covered in this site’s dedicated article.
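One way to keep that top row consistent across services is to generate its queries from a template. The Python helper below is hypothetical and assumes every service exposes an http_requests_total counter with a code label and an http_request_duration_seconds histogram. PromQL’s vector() turns the SLO threshold into a constant series that a Grafana time-series panel can draw next to the latency line; Grafana’s built-in panel thresholds are an alternative way to show the same pass/fail line.

```python
def slo_panel_queries(job, slo_latency_seconds, window="5m"):
    """Hypothetical generator for a dashboard's top-row PromQL queries.

    Assumes http_requests_total (with a `code` label) and an
    http_request_duration_seconds histogram; rename to match your metrics.
    """
    sel = f'job="{job}"'
    return {
        # Fraction of requests answered with a 5xx status.
        "error_ratio": (
            f'sum(rate(http_requests_total{{{sel},code=~"5.."}}[{window}]))'
            f" / sum(rate(http_requests_total{{{sel}}}[{window}]))"
        ),
        # Service-wide p99: aggregate bucket rates by le before the quantile.
        "p99_latency": (
            "histogram_quantile(0.99, sum by (le) "
            f"(rate(http_request_duration_seconds_bucket{{{sel}}}[{window}])))"
        ),
        # Constant series drawn as the pass/fail line on the latency panel.
        "latency_slo": f"vector({slo_latency_seconds})",
    }


for title, expr in slo_panel_queries("checkout", 0.3).items():
    print(f"{title}: {expr}")
```

Generating the queries keeps every service’s headline panels identical in meaning, so the diagnostic panels below them are the only part that needs per-service tailoring.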
Alerting from Prometheus directly
The Prometheus server itself evaluates alerting rules (PromQL expressions with thresholds, optionally with a for duration the condition must hold) and sends firing alerts to Alertmanager, which handles grouping, deduplication, routing, and silencing. The same burn-rate-based alerting principle covered in this site’s SLO article can be implemented directly as PromQL alerting rules, making error-budget-aware alerting a concrete, queryable configuration rather than an abstract policy.
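The burn-rate arithmetic behind such a rule fits in a short Python sketch. The default threshold of 14.4 follows the multiwindow approach described in Google’s SRE Workbook (spending 2% of a 30-day error budget within one hour: 0.02 × 720 h / 1 h = 14.4). The function names, the window lengths, and the recording-rule names in the generated expression are illustrative assumptions; in Prometheus the comparison would live in an alerting rule’s expr, evaluated by the Prometheus server.

```python
def burn_rate(error_ratio, slo_target):
    """Budget consumption speed: 1.0 spends the budget exactly over the SLO window."""
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(long_window_ratio, short_window_ratio, slo_target, threshold=14.4):
    """Multiwindow check: page only if both the long (e.g. 1h) and the short
    (e.g. 5m) window burn faster than the threshold, so alerts fire on
    sustained burns and clear quickly once the burn stops."""
    return (burn_rate(long_window_ratio, slo_target) > threshold
            and burn_rate(short_window_ratio, slo_target) > threshold)


def burn_rate_alert_expr(slo_target, threshold=14.4):
    """Same condition as a PromQL expr. The recording-rule names are
    hypothetical and assume error ratios are precomputed per window."""
    limit = threshold * (1.0 - slo_target)
    return (f"job:slo_errors_per_request:ratio_rate1h > {limit:g}"
            f" and job:slo_errors_per_request:ratio_rate5m > {limit:g}")


print(should_page(0.02, 0.03, 0.999))    # True: burn rates of about 20 and 30
print(should_page(0.02, 0.0005, 0.999))  # False: the short window has recovered
print(burn_rate_alert_expr(0.999))
```

Requiring both windows trades a little detection speed for far fewer alerts that keep firing long after an incident has ended.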
Takeaway: Prometheus’s pull-based, label-rich metrics model paired with PromQL’s rate() and histogram_quantile() functions gives you the building blocks for genuinely SLO-aligned monitoring — the discipline is in configuring histogram buckets deliberately and structuring dashboards around what actually answers “are we meeting our SLO,” not just what’s easy to graph.