# Benchmarks

> A short, up-to-date summary of GoModel benchmark results, with a link to the full write-up and the tooling to reproduce it.

## Benchmark snapshot

This page is a short reference for our latest public benchmark: GoModel against
**LiteLLM, Portkey, and Bifrost**, all pointed at the same instant mock backend so
the numbers reflect gateway overhead, not model latency.

The full article has the complete write-up, all the context, and the charts:
[AI Gateway Benchmark 2026: GoModel vs LiteLLM, Portkey & Bifrost](https://enterpilot.io/blog/gomodel-vs-litellm-portkey-bifrost-june-2026/).

<Note>
  This is a point-in-time snapshot from a June 2026 run on AWS. Treat it as data,
  not dogma. Gateway performance depends on your workload, provider mix, deployment
  setup, and tuning. Older runs (March 2026, LiteLLM only, on localhost) are still
  on the blog for history.
</Note>

## What we tested

A simple, like-for-like setup:

* One gateway at a time, in Docker, on an AWS `c7i.large` (2 vCPU, 4 GiB).
* The same shared mock backend for everyone, so we measure only gateway overhead.
* Six workloads: chat completions, the Responses API, and Anthropic messages,
  each in streaming and non-streaming mode.
* `8,000` requests per workload at concurrency `10`, across two randomized-order
  trials (latency is the median across them).
* Fair config: retries off for everyone, GoModel's circuit breaker off, and
  LiteLLM run at its recommended one worker per CPU core.

## At a glance

GoModel came out ahead on every metric in the table below: the lowest median
latency and tightest tail, the highest sustained throughput, the smallest image
and memory footprint, and the fastest cold start.

| Gateway     | p50 (ms)  | p99 (ms)  | Throughput (req/s) | Peak RAM    | Image (compressed) | Cold start   |
| ----------- | --------- | --------- | ------------------ | ----------- | ------------------ | ------------ |
| **GoModel** | **`1.8`** | **`6.9`** | **`4,900`**        | **`37 MB`** | **`16 MB`**        | **`0.56 s`** |
| Bifrost     | `2.5`     | `18.3`    | `3,100`            | `143 MB`    | `77 MB`            | `7.1 s`      |
| Portkey     | `9.7`     | `30.5`    | `950`              | `112 MB`    | `59 MB`            | `1.1 s`      |
| LiteLLM     | `30.6`    | `39.3`    | `324`              | `2.3 GB`    | `372 MB`           | `25.5 s`     |

Latency figures are for non-streaming chat completions, shown as the
representative workload. Throughput is the sustained rate from a separate
concurrency sweep. Image size is the compressed pull size.
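
As a reminder of what the p50 and p99 columns mean, here is a minimal sketch of a nearest-rank percentile over per-request latencies. The `percentile` helper and the sample values are made up for illustration; the harness may compute its percentiles differently.

```bash
#!/usr/bin/env bash
# percentile P: reads one latency (ms) per line on stdin and prints the
# nearest-rank P-th percentile (P is an integer from 1 to 100).
percentile() {
  LC_ALL=C sort -n | awk -v p="$1" '
    { v[NR] = $1 }                       # samples arrive already sorted
    END {
      if (NR == 0) exit 1                # no samples, no percentile
      rank = int((p * NR + 99) / 100)    # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Ten illustrative samples (NOT benchmark data): nine fast requests, one slow.
samples="1.7 1.8 1.6 2.0 1.9 1.8 6.5 1.7 2.1 1.8"
printf '%s\n' $samples | percentile 50   # prints 1.8
printf '%s\n' $samples | percentile 99   # prints 6.5
```

A single slow request barely moves the median but sets the p99, which is why the table reports both.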

## Key readouts

* GoModel has both the lowest median (`1.8 ms`) and the tightest tail (`6.9 ms`).
* It pushes the most traffic per box (`~4,900 req/s`) and the most per CPU core.
* It is the smallest to ship and run: a `16 MB` compressed image and `37 MB` of
  RAM under load, ready to serve `0.56 s` after launch.
* LiteLLM, even at its recommended multi-worker config, uses `~2.3 GB` of RAM and
  takes `~25 s` to start - the cost of Python on the hot path.
* Portkey did not serve the Anthropic messages dialect in this single-provider
  setup, so it covers 4 of the 6 workloads.
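
To put those gaps in relative terms, the ratios can be recomputed from the "At a glance" table. This is a small sketch: the `ratio` helper is a hypothetical name, and the inputs are copied from the table rather than measured here.

```bash
#!/usr/bin/env bash
# ratio A B: prints A / B rounded to one decimal place.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Sustained throughput, GoModel (4,900 req/s) relative to each gateway.
echo "throughput vs Bifrost: $(ratio 4900 3100)x"   # 1.6x
echo "throughput vs Portkey: $(ratio 4900 950)x"    # 5.2x
echo "throughput vs LiteLLM: $(ratio 4900 324)x"    # 15.1x

# p99 latency, each gateway relative to GoModel (6.9 ms).
echo "p99 Bifrost vs GoModel: $(ratio 18.3 6.9)x"   # 2.7x
echo "p99 Portkey vs GoModel: $(ratio 30.5 6.9)x"   # 4.4x
echo "p99 LiteLLM vs GoModel: $(ratio 39.3 6.9)x"   # 5.7x
```

These ratios come from a single run on one instance type, so read them as an order-of-magnitude guide, not a guarantee.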

## Reproduce it yourself

The whole thing is one command. It provisions a small AWS box, runs all four
gateways against the same mock backend, prints the tables, and tears the
infrastructure back down on its own.

<Warning>
  This runs on **paid** AWS infrastructure, not the free tier. A `c7i.large` is
  about $0.09/hour and the run self-destructs within an hour or two, so budget
  **under $1** per run to be safe. If you pass `KEEP=1` or a teardown fails, you
  keep paying until you destroy the box - so confirm it is gone.
</Warning>
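
A rough way to sanity-check the budget and confirm nothing was left behind is sketched below. It assumes the AWS CLI is installed and pointed at the same account and region the harness used. The helper names `run_cost` and `list_benchmark_boxes` are hypothetical, and filtering by instance type is an assumption, since this page does not say how the harness tags its resources.

```bash
#!/usr/bin/env bash
# run_cost HOURS [RATE]: estimated on-demand cost in USD.
# RATE defaults to the ~$0.09/hour quoted above for a c7i.large.
run_cost() {
  awk -v h="$1" -v r="${2:-0.09}" 'BEGIN { printf "%.2f\n", h * r }'
}

# list_benchmark_boxes: lists every c7i.large that is not terminated.
# Assumption: the benchmark box is the only c7i.large in this account/region.
list_benchmark_boxes() {
  aws ec2 describe-instances \
    --filters "Name=instance-type,Values=c7i.large" \
              "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query 'Reservations[].Instances[].[InstanceId,State.Name,LaunchTime]' \
    --output table
}

run_cost 2     # a two-hour run: prints 0.18
run_cost 24    # a box forgotten for a day: prints 2.16
```

After a run, call `list_benchmark_boxes`; an empty result means no matching instance is still billing in that region.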

The harness lives in the repo at
[`docs/2026-06-25_aws_gateway_benchmark/`](https://github.com/ENTERPILOT/GoModel/tree/main/docs/2026-06-25_aws_gateway_benchmark):

```bash
# Requires Docker, Terraform, and AWS credentials
git clone https://github.com/ENTERPILOT/GoModel.git
# The clone directory keeps the repository's capitalization (GoModel)
cd GoModel/docs/2026-06-25_aws_gateway_benchmark
./run.sh
```

Knobs like `N` (requests per workload) and `REPEATS` (trials) are env vars, e.g.
`N=20000 REPEATS=5 ./run.sh` for a heavier run. For a quick local check against
just LiteLLM, the older localhost harness is still in
[`docs/about/benchmark-tools/`](https://github.com/ENTERPILOT/GoModel/tree/main/docs/about/benchmark-tools).

## Why this page is short

It is meant to give you the result fast, inside the product docs, without a full
article. For the narrative, the charts, and the methodology details, read the
[full post](https://enterpilot.io/blog/gomodel-vs-litellm-portkey-bifrost-june-2026/).

No single benchmark settles the question for every environment. If you are
evaluating gateways seriously, reproduce the test against your own traffic and
infrastructure.