Concurrency vs Parallelism: A Clear Distinction

The genuine technical distinction between concurrency and parallelism, why it matters for performance reasoning, and common confusions.

By perf-test.com Editorial · AI-assisted
Tags: concurrency, parallelism, concepts

“Concurrency” and “parallelism” are often used interchangeably in casual conversation, but they describe different things. The distinction has practical consequences for reasoning about performance, particularly for understanding what a given concurrency model can and cannot speed up.

The core distinction

Concurrency is about managing multiple tasks that are in progress over the same period. The tasks may or may not execute at the same instant; what matters is that the system interleaves or otherwise juggles their execution. Parallelism is specifically about tasks executing at the same instant, which typically requires multiple physical execution units (CPU cores). A single-core machine can be concurrent, switching rapidly between tasks to give the appearance of simultaneity, but it cannot be truly parallel, because only one core is executing instructions at any given instant.

A practical example: async I/O on a single thread

A single-threaded event loop (Node.js’s traditional model, for instance) handling many simultaneous network connections is a clear example of concurrency without parallelism. While one connection waits on I/O (a network response, a disk read), the single thread switches to handling another connection. Many operations are in flight at once, so concurrency is high, yet no CPU work runs in parallel: only one thread is doing CPU work at any instant.
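The event-loop behaviour described above can be sketched in a few lines. This is a minimal illustration using Python's asyncio rather than Node.js; `handle_connection`, `serve`, and `run_demo` are hypothetical names, and `asyncio.sleep` merely stands in for a real network or disk wait.

```python
import asyncio
import threading
import time


async def handle_connection(conn_id: int, io_delay: float) -> tuple:
    # asyncio.sleep stands in for waiting on a network response or disk read.
    # While this task waits, the event loop runs other tasks on the same thread.
    await asyncio.sleep(io_delay)
    # Report which OS thread resumed this task after the wait.
    return conn_id, threading.get_ident()


async def serve(n_connections: int, io_delay: float) -> list:
    # Start every simulated connection, then wait for all of them to finish.
    tasks = [handle_connection(i, io_delay) for i in range(n_connections)]
    return await asyncio.gather(*tasks)


def run_demo(n_connections: int = 50, io_delay: float = 0.1):
    # Hypothetical helper: returns results, elapsed seconds, and thread ids seen.
    start = time.perf_counter()
    results = asyncio.run(serve(n_connections, io_delay))
    elapsed = time.perf_counter() - start
    thread_ids = {tid for _, tid in results}
    return results, elapsed, thread_ids


if __name__ == "__main__":
    results, elapsed, thread_ids = run_demo()
    print(f"{len(results)} connections, {len(thread_ids)} thread(s), {elapsed:.2f}s")
```

All the simulated waits overlap, so the total time should be close to one `io_delay` rather than the sum of all of them, even though every task runs on the same thread.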

Why this distinction matters for performance reasoning

If a workload is I/O-bound (mostly waiting on network or disk rather than consuming CPU), concurrency alone can dramatically improve throughput: many operations can be in flight at once, each making progress while others wait, without needing multiple CPU cores at all. If a workload is CPU-bound (it needs computation rather than waiting), concurrency without parallelism provides no speedup for the CPU-bound portion. You need multiple cores executing in parallel to get more CPU-bound work done in the same wall-clock time. This connects directly to Amdahl’s Law (covered in this site’s dedicated article), which describes how much parallel execution can actually help.
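To make the Amdahl's Law connection concrete, here is a small sketch of the standard formula, speedup = 1 / ((1 - p) + p / n). The function name `amdahl_speedup` and the parameter names are illustrative choices, not taken from the site's dedicated article.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup according to Amdahl's Law.

    parallel_fraction: share of the work (0.0 to 1.0) that can run in parallel.
    cores: number of execution units working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial share is unaffected by extra cores; only the parallel share shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% of the work parallelizable, the serial 10% caps the speedup below 10x no matter how many cores are added.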

Why “just use more threads” doesn’t always help

A common confusion: adding more threads to a CPU-bound workload does not provide parallelism beyond the machine’s core count. You get more concurrency (more tasks notionally “in progress”), but parallel execution is capped by the available cores. Thread counts well beyond that cap can even hurt performance through context-switching and scheduling overhead, without adding any parallel throughput.
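The cap described above can be expressed as a simple model. The sketch below is an idealization under stated assumptions: tasks are equal-sized, purely CPU-bound, and run to completion, and context-switch overhead is ignored (so real oversubscribed runs would be no faster than this, and possibly slower). `capped_workers` and `ideal_wall_time` are hypothetical helper names.

```python
import math
import os
from typing import Optional


def capped_workers(requested: int, cores: Optional[int] = None) -> int:
    """Cap a requested worker count at the number of available cores."""
    # os.cpu_count() may return None, so fall back to a single core.
    available = cores if cores is not None else (os.cpu_count() or 1)
    return max(1, min(requested, available))


def ideal_wall_time(tasks: int, task_seconds: float, workers: int, cores: int) -> float:
    """Idealized wall-clock time for equal-sized CPU-bound tasks.

    Only min(workers, cores) tasks can execute at the same instant,
    so the work proceeds in "waves" of that size.
    """
    running_at_once = min(workers, cores)
    waves = math.ceil(tasks / running_at_once)
    return waves * task_seconds


if __name__ == "__main__":
    # 64 one-second tasks on a 4-core machine with growing worker counts.
    for workers in (1, 2, 4, 16, 64):
        print(workers, ideal_wall_time(64, 1.0, workers, cores=4))
```

In this model the wall-clock time stops improving once the worker count reaches the core count; extra workers only add tasks that wait their turn.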

Connecting to load testing virtual user models

This distinction has a direct echo in load testing tool design (covered throughout this site’s JMeter, k6, Locust, and Gatling articles). A load generator’s ability to simulate many concurrent virtual users efficiently on limited hardware depends heavily on its concurrency model. A lightweight model (event-loop or coroutine-based) achieves high concurrency without a dedicated OS thread per virtual user. A thread-heavy model (JMeter’s traditional thread-per-virtual-user design) scales less efficiently per machine at very high virtual user counts. This is part of why tools like Locust (gevent-based) and k6 (goroutine-based) can often simulate more concurrent virtual users per machine than JMeter’s traditional thread model.
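The two virtual-user models can be contrasted with a toy sketch. It does not reflect how JMeter, Locust, or k6 are actually implemented: Python threads stand in for the thread-per-virtual-user model, asyncio tasks stand in for the lightweight model, and a short sleep stands in for waiting on a response. The function names are hypothetical.

```python
import asyncio
import threading
import time


def run_thread_per_vu(n_users: int, think_time: float) -> int:
    """Thread-heavy model: one OS thread per virtual user.

    Returns the number of distinct OS threads that ran virtual users.
    """
    seen = set()
    lock = threading.Lock()
    # The barrier keeps every thread alive until all have started,
    # so thread identifiers cannot be reused within this run.
    barrier = threading.Barrier(n_users)

    def virtual_user() -> None:
        with lock:
            seen.add(threading.get_ident())
        barrier.wait()
        time.sleep(think_time)  # stands in for waiting on a response

    threads = [threading.Thread(target=virtual_user) for _ in range(n_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(seen)


def run_coroutine_per_vu(n_users: int, think_time: float) -> int:
    """Lightweight model: every virtual user is a task on one event loop.

    Returns the number of distinct OS threads that ran virtual users.
    """
    seen = set()

    async def virtual_user() -> None:
        seen.add(threading.get_ident())
        await asyncio.sleep(think_time)  # stands in for waiting on a response

    async def main() -> None:
        await asyncio.gather(*(virtual_user() for _ in range(n_users)))

    asyncio.run(main())
    return len(seen)


if __name__ == "__main__":
    print("thread-per-VU OS threads:", run_thread_per_vu(20, 0.05))
    print("coroutine-per-VU OS threads:", run_coroutine_per_vu(20, 0.05))
```

The first function needs as many OS threads as virtual users, while the second runs all of them on a single thread, which is the per-machine efficiency difference the paragraph above describes.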

Structured concurrency and modern language features

Newer concurrency primitives (Go’s goroutines, Kotlin’s coroutines, structured concurrency proposals in various languages) aim to make highly concurrent code safer and more manageable than raw OS threads. The runtime still schedules that code onto however many CPU cores are available for the parallel portion of the work. The abstraction makes concurrent code easier to write correctly, but it changes neither the concurrency-vs-parallelism distinction nor the number of hardware cores available for genuine parallel execution.

Takeaway: concurrency is about managing many in-progress tasks; parallelism is about executing tasks simultaneously on multiple cores. Knowing whether your workload is I/O-bound (where concurrency alone helps a lot) or CPU-bound (where you need genuine parallelism, capped by core count) determines which lever will improve your performance.
