Benchmark snapshot
This page is a short reference for our latest public benchmark: GoModel against LiteLLM, Portkey, and Bifrost, all pointed at the same instant mock backend so the numbers reflect gateway overhead, not model latency. The full article has the complete write-up, all the context, and the charts: AI Gateway Benchmark 2026: GoModel vs LiteLLM, Portkey & Bifrost.This is a point-in-time snapshot from a June 2026 run on AWS. Treat it as data,
not dogma. Gateway performance depends on your workload, provider mix, deployment
setup, and tuning. Older runs (March 2026, LiteLLM only, on localhost) are still
on the blog for history.
What we tested
A simple, like-for-like setup:- One gateway at a time, in Docker, on an AWS
c7i.large(2 vCPU, 4 GiB). - The same shared mock backend for everyone, so we measure only gateway overhead.
- Six workloads: chat completions, the Responses API, and Anthropic messages - each streaming and non-streaming.
8,000requests per workload at concurrency10, across two randomized-order trials (latency is the median across them).- Fair config: retries off for everyone, GoModel’s circuit breaker off, and LiteLLM run at its recommended one worker per CPU core.
At a glance
GoModel came out ahead on every operational signal most teams care about: the tightest latency tail, the highest sustained throughput, the smallest image and memory, and the fastest cold start.| Gateway | p50 (ms) | p99 (ms) | Throughput (req/s) | Peak RAM | Image (compressed) | Cold start |
|---|---|---|---|---|---|---|
| GoModel | 1.8 | 6.9 | 4,900 | 37 MB | 16 MB | 0.56 s |
| Bifrost | 2.5 | 18.3 | 3,100 | 143 MB | 77 MB | 7.1 s |
| Portkey | 9.7 | 30.5 | 950 | 112 MB | 59 MB | 1.1 s |
| LiteLLM | 30.6 | 39.3 | 324 | 2.3 GB | 372 MB | 25.5 s |
Key readouts
- GoModel has both the lowest median (
1.8 ms) and the tightest tail (6.9 ms). - It pushes the most traffic per box (
~4,900 req/s) and the most per CPU core. - It is the smallest to ship and run: a
16 MBcompressed image and37 MBof RAM under load, ready to serve0.56 safter launch. - LiteLLM, even at its recommended multi-worker config, uses
~2.3 GBof RAM and takes~25 sto start - the cost of Python on the hot path. - Portkey did not serve the Anthropic messages dialect in this single-provider setup, so it covers 4 of the 6 workloads.
Reproduce it yourself
The whole thing is one command. It provisions a small AWS box, runs all four gateways against the same mock backend, prints the tables, and tears the infrastructure back down on its own. The harness lives in the repo atdocs/2026-06-25_aws_gateway_benchmark/:
N (requests per workload) and REPEATS (trials) are env vars, e.g.
N=20000 REPEATS=5 ./run.sh for a heavier run. For a quick local check against
just LiteLLM, the older localhost harness is still in
docs/about/benchmark-tools/.