Writing
Blog
Articles on performance testing, SRE, observability, and AI systems performance.
Measuring LLM Inference Performance: Latency, Throughput, and Cost
The metrics that actually matter for LLM serving — TTFT, TPOT, tokens/sec, and cost per request — how they trade off, and how to load-test an inference endpoint.
Installing and Configuring JMeter for Real Load Testing
How to install Apache JMeter correctly, the JVM heap settings that matter, and the first configuration changes you should make before your first real test.
What Is Apache JMeter? An Introduction for Performance Testers
What Apache JMeter is, why it's still the most widely used open-source load testing tool, and where it fits next to k6, Gatling, and LoadRunner.
JMeter Thread Groups Explained: Users, Ramp-Up, and Loops
How JMeter Thread Groups control virtual users, ramp-up time, and loop count, and how to choose values that actually model your real traffic pattern.
SLOs and Error Budgets: A Practical Guide for Performance Engineers
How to turn vague reliability goals into measurable SLIs, SLOs, and error budgets — and how that math directly governs release velocity and on-call load.
JMeter Assertions: Validating Responses Under Load
How to use JMeter assertions to catch silent failures — wrong content, slow responses, and unexpected status codes — that a simple pass/fail check misses.
The JMeter HTTP Request Sampler: A Deep Dive
Every important setting on JMeter's HTTP Request sampler, from implementation choice to connection reuse, explained for accurate load testing.
JMeter Listeners: Collecting and Reporting Results Correctly
Which JMeter listeners to use during scripting versus load generation, and how to produce a trustworthy HTML report from a non-GUI run.
JMeter Correlation: Handling Session Tokens and Dynamic Values
How to extract and reuse dynamic values like session tokens, CSRF tokens, and IDs in JMeter so recorded scripts work correctly under load.
JMeter Parameterization with CSV Data Config
How to drive JMeter test data from CSV files so virtual users don't all hammer the same account, search term, or product ID.
JMeter Timers: Pacing and Think Time Done Right
The difference between JMeter's Constant Timer, Uniform Random Timer, and Constant Throughput Timer, and which one actually controls throughput.
Distributed Load Testing with JMeter
How JMeter's controller/agent (master/slave) distributed testing mode works, and what to check before trusting results from multiple load generators.
JMeter Logic Controllers: If, Loop, and Transaction Controllers
How JMeter's Logic Controllers (If Controller, Loop Controller, Transaction Controller) shape test flow and how to use them without breaking your results.
JMeter Plugins: Extending What JMeter Can Do
An overview of the JMeter Plugins ecosystem — the Plugins Manager, the most widely used plugins, and how to install them safely.
JMeter Best Practices and Common Pitfalls
A checklist of JMeter mistakes that produce misleading results, and the practices experienced performance testers use to avoid them.
Reading JMeter's HTML Dashboard Report Correctly
A guide to JMeter's generated HTML dashboard report — which graphs matter, which are easy to misread, and how to compare two runs properly.
Running JMeter in CI/CD: Non-GUI Mode and Automation
How to run JMeter from the command line in a CI pipeline, fail builds on performance regressions, and avoid common automation pitfalls.
Database Load Testing with JMeter's JDBC Sampler
How to load test a database directly with JMeter's JDBC Request sampler, including connection pooling configuration and common gotchas.
JMeter Groovy Scripting: Beyond the GUI
How to use JSR223 Groovy scripting in JMeter for custom logic that the built-in components can't express, with practical examples.
JMeter vs k6 vs Gatling: Choosing the Right Load Testing Tool
A practical comparison of JMeter, k6, and Gatling across scripting model, protocol support, CI fit, and team skill requirements.
Load Testing REST APIs with JMeter: A Practical Walkthrough
A practical walkthrough of scripting a realistic REST API load test in JMeter, from authentication to JSON assertions to reporting.
Analyzing JMeter Results: Why Percentiles Beat Averages
How to properly analyze JMeter result data using percentiles instead of averages, with a worked example showing how averages hide real problems.
Testing WebSockets with JMeter
How to load test WebSocket connections in JMeter using the WebSocket Samplers plugin, and what makes WebSocket load testing different from HTTP.
LoadRunner Architecture: How VuGen, Controller, and Analysis Fit Together
A deeper look at how LoadRunner's three main components — VuGen, Controller, and Analysis — work together in a typical performance testing workflow.
Introduction to LoadRunner: OpenText's Performance Engineering Platform
What LoadRunner is, its core components (VuGen, Controller, Analysis), and where it fits in 2026 alongside open-source alternatives.
Recording Your First Script in VuGen
A step-by-step guide to recording, replaying, and validating your first LoadRunner VuGen script.
LoadRunner Correlation Techniques: Handling Dynamic Values
How to correlate dynamic values in LoadRunner scripts using the Correlation Studio, manual web_reg_save_param, and best practices for reliable scripts.
LoadRunner Parameterization: Driving Scripts with Real Test Data
How to parameterize LoadRunner VuGen scripts to avoid testing with hardcoded, repeated data, including parameter types and data allocation strategies.
LoadRunner Protocols: Choosing the Right One for Your Application
How to choose the correct LoadRunner protocol for web, Citrix, SAP, and other application types, and why this decision matters more than in open-source tools.
LoadRunner Analysis: Reading Graphs and Reports Correctly
A guide to the most useful graphs in LoadRunner Analysis — transaction response time, throughput, and Vuser status — and how to merge them for real insight.
LoadRunner Controller: Designing a Load Test Scenario
How to design a LoadRunner Controller scenario, including Vuser groups, scheduling, and load generator assignment.
LoadRunner Rendezvous Points and Pacing Explained
How LoadRunner Rendezvous Points create synchronized concurrency spikes, and how Pacing controls iteration timing — two commonly confused concepts.
LoadRunner Functions and Custom C Code in VuGen Scripts
How to extend LoadRunner scripts with custom C code and the lr_* runtime API for logic the recorder and built-in functions can't express.
Integrating LoadRunner with Monitoring Tools for Root-Cause Analysis
How to correlate LoadRunner test results with server-side and APM monitoring data to find the real cause of performance regressions.
LoadRunner vs Modern Load Testing Tools: When to Use It
A practical framework for deciding when LoadRunner is the right choice in 2026 versus open-source alternatives like k6, JMeter, and Gatling.
Getting Started with k6: Modern Load Testing in JavaScript
An introduction to k6, Grafana's open-source load testing tool, and why its code-first JavaScript scripting model fits modern CI/CD workflows.
k6 Scenarios and Executors: Modeling Realistic Load Shapes
How k6's scenarios and executors let you model open-system arrival-rate traffic and closed-system concurrent-user traffic precisely.
k6 Thresholds and Checks: Automating Pass/Fail Criteria
How k6 thresholds turn performance budgets into automated pass/fail criteria for CI, and how they differ from checks.
Introduction to Gatling: Scala-Based Load Testing
What Gatling is, its Scala/Java DSL approach to scripting, and where it fits for JVM-comfortable teams doing serious load testing.
Running k6 in CI/CD and k6 Cloud
How to integrate k6 into a CI/CD pipeline, and when k6 Cloud's distributed execution is worth it over self-hosted runs.
Extending k6 with xk6: Custom Protocols and Functionality
How xk6 lets you build custom k6 binaries with extended protocol support and functionality beyond what's built in.
Gatling Assertions and Reports
How Gatling's global and per-request assertions provide CI-friendly pass/fail criteria, and how to read its generated HTML reports.
Gatling Simulations and Injection Profiles
How Gatling's injection profiles (rampUsers, constantUsersPerSec, and more) model different load shapes, and how to choose the right one.
Introduction to Locust: Python-Based Load Testing
What Locust is, how its Python-based, code-first approach compares to k6 and Gatling, and when it's the right choice for your team.
Chaos Engineering: Testing Reliability by Breaking Things on Purpose
What chaos engineering is, how to run a safe first experiment, and how it connects to error budgets and SLOs.
Distributed Load Testing with Locust
How to run Locust in distributed mode across multiple machines, and the practical considerations for scaling beyond a single node.
Introduction to NeoLoad: Tricentis's Performance Testing Platform
What NeoLoad is, its Design Studio and Controller-based workflow, and where it sits between LoadRunner and open-source tools.
Capacity Planning with the Universal Scalability Law
How the Universal Scalability Law models contention and coherency penalties to predict where a system's throughput will actually peak and decline.
Writing Incident Response Runbooks That Actually Get Used
What makes an incident runbook useful under real pressure versus one that gets ignored, with a practical structure to follow.
On-Call Best Practices That Prevent Burnout
Practical on-call practices — rotation design, alert quality, and post-incident follow-up — that keep on-call sustainable rather than dreaded.
Building a Genuine Blameless Postmortem Culture
What separates a blameless postmortem culture that actually works from one that's blameless only in name, and how to build the former.
SRE vs DevOps vs Platform Engineering: What Actually Differs
A clear-eyed comparison of SRE, DevOps, and platform engineering as organizational approaches, and where the real differences (and overlaps) lie.
Toil Reduction: Identifying and Eliminating Operational Toil
What SRE means by 'toil,' how to identify it systematically, and a practical framework for deciding what to automate first.
Monitoring vs Observability: A Practical Distinction
What actually separates monitoring from observability beyond the buzzword, and why the distinction matters for debugging unknown failure modes.
Runbooks vs Playbooks: A Useful Distinction for Incident Response
The practical difference between an incident runbook and a playbook, and when each is the right tool to write and maintain.
SRE Team Topologies: Embedded, Centralized, and Hybrid Models
How SRE teams are typically organized — embedded, centralized, and hybrid models — and the trade-offs each makes between context and consistency.
Continuous Batching: How Modern LLM Servers Achieve High Throughput
How continuous batching differs from static batching, why it's central to vLLM and TGI's throughput advantage, and what it costs individual requests.
Prompt Caching and KV Cache: Why Repeated Context Gets Cheaper
How prompt/KV caching reduces cost and latency for repeated context in LLM applications, and when it actually helps versus doesn't.
Benchmarking Vector Database Performance for RAG Systems
What actually matters when benchmarking a vector database for retrieval-augmented generation — recall, latency, and indexing trade-offs.
GPU Utilization for LLM Model Serving: What to Actually Measure
Why GPU utilization percentage alone is a misleading metric for LLM serving, and what to measure instead to understand real efficiency.
Quantization and Performance Trade-offs in LLM Serving
How model quantization (INT8, INT4, and similar) trades accuracy for latency, throughput, and memory savings, and how to evaluate the trade-off.
Optimizing RAG Pipeline Latency: Where the Time Actually Goes
A breakdown of where latency accumulates in a retrieval-augmented generation pipeline, and the highest-leverage places to optimize it.
Benchmarking Open-Source LLM Inference Servers: vLLM, TGI, and Ollama
A practical comparison framework for benchmarking vLLM, TGI, and Ollama, and what each is actually optimized for.
Load Testing LLM APIs: A Practical Guide
How to design a load test specifically for LLM APIs, covering realistic prompt distributions, streaming measurement, and concurrency sweeps.
Token Economics 101: Understanding LLM API Cost Structure
How LLM API pricing actually works — input vs output token pricing, why output costs more, and the practical levers for controlling cost.
OpenTelemetry for Performance Engineers: A Practical Start
A practical introduction to OpenTelemetry's traces, metrics, and logs, and how to instrument a service for meaningful performance analysis.
Prometheus and Grafana Basics for Performance Monitoring
How Prometheus's pull-based metrics model and PromQL work, and how to build Grafana dashboards that actually answer performance questions.
The RED Method: Rate, Errors, Duration for Service Monitoring
How the RED method gives a simple, consistent framework for monitoring any request-driven service, and how it complements the USE method.
Distributed Tracing Explained: Spans, Context, and Sampling
How distributed tracing actually works under the hood — spans, trace context propagation, and sampling strategies — explained from first principles.
Structured Logging Best Practices for Debuggable Systems
Why structured logging (key-value fields, not free text) matters for debugging at scale, and practical conventions worth adopting.
The USE Method: Utilization, Saturation, Errors for Resource Monitoring
How Brendan Gregg's USE method systematically checks system resources for performance bottlenecks, and how it pairs with the RED method.
APM Tool Comparison: Datadog, Dynatrace, and New Relic
A practical comparison of how Datadog, Dynatrace, and New Relic approach instrumentation, AI-assisted root-cause analysis, and pricing.
Building SLO Dashboards That Drive Real Decisions
How to design an SLO dashboard that actually informs the ship/freeze decisions error budgets are meant to enable, not just display pretty graphs.
Little's Law for Performance Engineers, with Worked Examples
An intuitive explanation of Little's Law (L = λW), how to derive concurrency, throughput, or latency from the other two, and common misuses.
Amdahl's Law for Performance Engineers
How Amdahl's Law quantifies the limit parallelization can achieve when part of a workload is inherently serial, with practical examples.
Queueing Theory Basics for Performance Engineers
An accessible introduction to queueing theory concepts — utilization, queue length, and waiting time — and why systems get dramatically slower near full utilization.
Why p99 Matters: Understanding Latency Percentiles
What latency percentiles actually mean, why averages systematically mislead, and the pitfalls of averaging or combining percentiles incorrectly.
Concurrency vs Parallelism: A Clear Distinction
The genuine technical distinction between concurrency and parallelism, why it matters for performance reasoning, and common confusions.
Garbage Collection Tuning Fundamentals
The core concepts behind garbage collector tuning — generational collection, pause times, and throughput trade-offs — applicable across JVM, .NET, and Go.
Throughput vs Latency: Why You Usually Can't Maximize Both
Why throughput and latency often trade off against each other through batching, and how to decide where to sit on that trade-off curve.
Setting Performance Budgets for Web Applications
How to set practical performance budgets (page weight, load time, Core Web Vitals) and enforce them in CI before they regress in production.
Synthetic Monitoring vs Real User Monitoring (RUM)
How synthetic monitoring and real user monitoring complement each other for understanding production performance, and when to rely on each.
Spike, Stress, and Soak Testing: Three Different Questions
How spike testing, stress testing, and soak testing each answer a different reliability question, and why a single load test can't cover all three.
How to Write a Performance Test Plan That Answers a Real Question
A practical template for a performance test plan that starts from a specific question, not a generic checklist of tools and metrics.
A Pre-Launch Performance Testing Checklist
A practical checklist to run through before considering a performance testing effort complete and ready to inform a launch decision.
Top Performance Testing Mistakes (and How to Avoid Them)
A roundup of the most common, costly performance testing mistakes across tools and teams, distilled into a practical avoidance guide.
Understanding Apdex: Translating Latency into User Satisfaction
What the Apdex score actually measures, how to set its thresholds meaningfully, and its limitations as a single summary metric.
How to Calculate an Error Budget, Step by Step
A step-by-step walkthrough of calculating an error budget from an SLO, with worked examples at different reliability targets.
What is DevPerfOps? Performance as a First-Class Citizen
DevPerfOps extends DevOps by embedding performance engineering across the entire delivery pipeline — shifting it left from a pre-release gate to a continuous, shared responsibility.