LLM Cost & Latency Estimator

Estimate per-request and monthly token cost plus end-to-end latency (TTFT + decode) for an LLM workload. Free, instant, no signup.


The model

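Two independent estimates, both driven by token counts. Prices are per million tokens, TTFT is the time to first token, and decode_speed is output tokens per second, so latency comes out in seconds: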

cost_per_request = input_tokens/1e6 * input_price
                 + output_tokens/1e6 * output_price
cost_per_month   = cost_per_request * requests_per_month

latency ≈ TTFT + output_tokens / decode_speed

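Output tokens usually dominate both estimates: they're typically priced several times higher than input tokens, and they're the only term that scales latency, through the decode loop. Two practical levers follow. Capping output length lowers cost and latency at once. Raising decode speed (faster hardware, smaller or quantized models) lowers latency; under per-token pricing it changes cost only if the faster option also carries a lower price. Batching raises aggregate throughput, which helps when you pay for hardware time, but it does not by itself make a single request decode faster.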
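The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the estimator's actual implementation: the function name, the Estimate type, and every number in the example (prices, token counts, TTFT, decode speed, request volume) are hypothetical values chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    cost_per_request: float  # same currency as the prices
    cost_per_month: float
    latency_s: float         # seconds


def estimate(input_tokens, output_tokens, input_price, output_price,
             requests_per_month, ttft_s, decode_speed):
    """Apply the cost and latency formulas from this page.

    Prices are per million tokens; decode_speed is output tokens per second.
    """
    if decode_speed <= 0:
        raise ValueError("decode_speed must be positive")
    cost_per_request = (input_tokens / 1e6 * input_price
                        + output_tokens / 1e6 * output_price)
    return Estimate(
        cost_per_request=cost_per_request,
        cost_per_month=cost_per_request * requests_per_month,
        latency_s=ttft_s + output_tokens / decode_speed,
    )


# Hypothetical workload: all values below are made up for illustration.
workload = dict(input_tokens=2000, input_price=3.0, output_price=15.0,
                requests_per_month=100_000, ttft_s=0.5, decode_speed=50.0)

base = estimate(output_tokens=500, **workload)
capped = estimate(output_tokens=250, **workload)  # output length halved

for name, e in (("base", base), ("capped", capped)):
    print(f"{name}: ${e.cost_per_request:.5f}/req, "
          f"${e.cost_per_month:,.2f}/month, {e.latency_s:.1f} s")
```

Comparing the two results shows the effect of the output cap: both the output share of the cost and the decode share of the latency shrink in proportion, while the input cost and TTFT stay fixed.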

Caveats
