Soterra Labs Anvil · MLPerf Results
MLPerf Inference Results
Published MLPerf Inference Datacenter results for common workloads, sourced from MLCommons. Every row links to the official submission detail page.
How to read these tables
- Workloads — llama2-70b-99 (Meta Llama 2 70B chat), llama3.1-405b (Meta Llama 3.1 405B — the largest tracked model), mixtral-8x7b (Mistral Mixtral mixture-of-experts), stable-diffusion-xl (text-to-image). The -99 / -99.9 suffix is the accuracy track (see below).
- Server vs Offline — MLPerf's two scenario types. Server simulates real-time traffic with per-query latency limits (the system has to keep up with arriving requests). Offline is batch processing with no latency constraint — pure peak throughput. The same hardware will score differently on each.
- Engine — the inference / serving software (TensorRT-LLM, vLLM, LLMBoost, etc.). Throughput often differs more between engines than between hardware variants of the same chip — a top result on LLMBoost or a vendor-internal stack means the buyer would need that same stack to reproduce. Pulled from MLCommons' Software field; version numbers stripped.
- # GPUs — total GPUs/AI accelerators the submission used (per-node count × nodes). The system column reports throughput for all of them together; the per-GPU column divides that total by this count.
- Tokens/s/GPU vs Tokens/s (system) — the per-GPU column is the buyer-comparable metric (what does one chip do?). The system column is the absolute total MLCommons reports: useful for capacity planning, but it rewards bigger clusters on a leaderboard. Rows here are grouped by GPU type and, within each group, sorted by per-GPU rate. Different workloads use different units: LLMs report tokens/s, image generation samples/s, classifiers queries/s.
- Accuracy: 99.0% vs 99.9% — MLPerf's two accuracy tiers per workload. Both mean the submission cleared a quality bar relative to a reference baseline; 99.9% is the stricter "very-high-accuracy" track. This is not the model's absolute accuracy — it's which accuracy track the submitter chose.
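The relationship between the two throughput columns described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual ingestion code: the `Row` class and its field names are hypothetical, and the three sample values are copied from the llama2-70b-99 Offline table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical container for one table row (names are illustrative)."""
    gpu_type: str
    submitter: str
    gpus: int            # "# GPUs" column
    system_rate: float   # "Tokens/s (system)" column

    @property
    def per_gpu(self) -> float:
        # Per-GPU rate is the system total divided by the GPU count.
        return self.system_rate / self.gpus


rows = [
    Row("NVIDIA Hopper H200", "Cisco", 8, 34548.2),
    Row("NVIDIA Hopper H200", "Cisco", 16, 68894.8),
    Row("AMD Instinct MI325X", "MangoBoost", 48, 169197.0),
]

# Rank by per-GPU rate so larger clusters are not favored by their size alone.
ranked = sorted(rows, key=lambda r: r.per_gpu, reverse=True)
for r in ranked:
    print(f"{r.gpu_type:22} {r.gpus:>3} GPUs {r.per_gpu:>10,.1f} tokens/s/GPU")
```

Ranked by system total, the 48-GPU cluster would come first. Ranked by per-GPU rate, it comes last of the three, which is the distinction the per-GPU column is meant to expose.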
llama2-70b-99 — Offline
| GPU Type | Submitter | System | # GPUs | Engine | Tokens/s/GPU | Tokens/s (system) | Accuracy |
|---|---|---|---|---|---|---|---|
| NVIDIA Blackwell GB200 | Azure | ND_GB200_v6 | 4 | TensorRT | 13,015.4 | 52,061.6 | 99% |
| NVIDIA Blackwell GB200 | NVIDIA | — | 4 | TensorRT | 12,934.2 | 51,736.9 | 99% |
| NVIDIA Blackwell GB200 | Nebius | Nebius GB200 | 4 | TensorRT | 12,420.9 | 49,683.5 | 99% |
| NVIDIA Blackwell B200 | Lenovo | ThinkSystem SR680a V3 | 8 | TensorRT | 12,863.6 | 102,909.0 | 99% |
| NVIDIA Blackwell B200 | Lambda | NVIDIA DGX B200 | 8 | TensorRT | 12,840.6 | 102,725.0 | 99% |
| NVIDIA Blackwell B200 | Supermicro | SYS-422GS-NBRT-LCC | 8 | TensorRT | 12,812.2 | 102,498.0 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9685L | 8 | TensorRT | 12,766.4 | 102,131.0 | 99% |
| NVIDIA Blackwell B200 | NVIDIA | NVIDIA DGX B200 | 8 | TensorRT | 12,690.9 | 101,527.0 | 99% |
| NVIDIA Blackwell B200 | Supermicro | AS-4126GS-NBR-LCC | 8 | TensorRT | 12,689.5 | 101,516.0 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9680L | 8 | TensorRT | 12,656.6 | 101,253.0 | 99% |
| NVIDIA Blackwell B200 | Nebius | Nebius B200 | 8 | TensorRT | 12,655.8 | 101,246.0 | 99% |
| NVIDIA Blackwell B200 | GigaComputing | G894-SD1 | 8 | TensorRT | 12,326.0 | 98,607.7 | 99% |
| NVIDIA Blackwell B200 | — | NVIDIA DGX B200 | 8 | TensorRT | 9,807.9 | 78,463.5 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE9680 | 8 | TensorRT | 4,414.6 | 35,316.7 | 99% |
| NVIDIA Hopper H200 | Quanta Cloud Technology | QuantaGrid D74H-7U | 8 | TensorRT | 4,386.4 | 35,091.5 | 99% |
| NVIDIA Hopper H200 | ASUSTeK | ASUSTeK ESC N8 H200 | 8 | TensorRT | 4,384.4 | 35,075.1 | 99% |
| NVIDIA Hopper H200 | — | NVIDIA H200 | 8 | TensorRT | 4,380.5 | 35,043.7 | 99% |
| NVIDIA Hopper H200 | HPE | HPE Cray XD670 | 8 | TensorRT | 4,370.6 | 34,964.6 | 99% |
| NVIDIA Hopper H200 | Nebius | Nebius H200 | 8 | TensorRT | 4,351.5 | 34,812.1 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE9680L | 8 | TensorRT | 4,323.5 | 34,588.3 | 99% |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 8 | TensorRT | 4,318.5 | 34,548.2 | 99% |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 16 | TensorRT | 4,305.9 | 68,894.8 | 99% |
| NVIDIA Hopper H200 | Dell Broadcom | Dell PowerEdge XE9680 | 8 | TensorRT | 4,287.7 | 34,301.9 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE7745 | 8 | TensorRT | 3,908.4 | 31,267.4 | 99% |
| NVIDIA Hopper H200 | ASUSTeK | ESC8000A-E12 | 8 | TensorRT | 3,810.9 | 30,487.0 | 99% |
| NVIDIA Hopper H200 | Quanta Cloud Technology | D75E-4U_H200-NVL-141GBx4 | 4 | TensorRT | 3,764.4 | 15,057.6 | 99% |
| AMD Instinct MI325X | AMD | QuantaGrid D74A-7U | 8 | vLLM | 4,319.4 | 34,555.0 | 99% |
| AMD Instinct MI325X | AMD | QuantaGrid D74A-7U | 8 | vLLM | 4,315.1 | 34,520.4 | 99% |
| AMD Instinct MI325X | GigaComputing | G893-ZX1-AAX2 | 8 | vLLM | 4,310.4 | 34,483.2 | 99% |
| AMD Instinct MI325X | MangoBoost | MangoBoost | 8 | LLMBoost | 4,306.8 | 34,454.4 | 99% |
| AMD Instinct MI325X | ASUSTeK | ESC_A8A_MI325X_256GBx8 | 8 | vLLM | 4,248.4 | 33,987.4 | 99% |
| AMD Instinct MI325X | Vultr | Supermicro AS -8126GS-TNMR | 8 | vLLM | 4,220.3 | 33,762.5 | 99% |
| AMD Instinct MI325X | MiTAC | G8825Z5 | 8 | vLLM | 4,212.9 | 33,703.0 | 99% |
| AMD Instinct MI325X | Supermicro | AS-8126GS-TNMR | 8 | LLMBoost | 4,183.6 | 33,468.4 | 99% |
| AMD Instinct MI325X | Quanta Cloud Technology | D75T-7U_8xMI325X | 8 | vLLM | 4,095.1 | 32,760.5 | 99% |
| AMD Instinct MI325X | Supermicro MangoBoost | Supermicro AS-8126GS-TNMR | 16 | LLMBoost | 4,082.5 | 65,320.1 | 99% |
| AMD Instinct MI325X | Supermicro MangoBoost | Supermicro AS-8126GS-TNMR | 24 | LLMBoost | 3,839.9 | 92,158.2 | 99% |
| AMD Instinct MI325X | MangoBoost | MangoBoost Heterogeneous Cluster | 48 | LLMBoost | 3,524.9 | 169,197.0 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 32 | TensorRT | 3,902.5 | 124,879.0 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 16 | TensorRT | 3,897.0 | 62,351.6 | 99% |
| AMD Instinct MI300X | MangoBoost | Supermicro AS-8125GS-TNMR2 | 8 | LLMBoost | 3,481.8 | 27,854.4 | 99% |
| AMD Instinct MI300X | AMD | Supermicro AS-8125GS-TNMR2 | 8 | vLLM | 3,475.5 | 27,803.9 | 99% |
| AMD Instinct MI300X | Dell MangoBoost | Dell PowerEdge XE9680 | 8 | LLMBoost | 3,459.8 | 27,678.1 | 99% |
| AMD Instinct MI300X | Dell | Dell Poweredge XE9680 | 8 | vLLM | 3,415.7 | 27,325.9 | 99% |
| AMD Instinct MI300X | Dell MangoBoost | Dell PowerEdge XE9680 | 16 | LLMBoost | 3,348.3 | 53,572.2 | 99% |
llama2-70b-99 — Server
| GPU Type | Submitter | System | # GPUs | Engine | Tokens/s/GPU | Tokens/s (system) | Accuracy |
|---|---|---|---|---|---|---|---|
| NVIDIA Blackwell B200 | Nebius | Nebius B200 | 8 | TensorRT | 12,701.4 | 101,611.0 | 99% |
| NVIDIA Blackwell B200 | Lambda | NVIDIA DGX B200 | 8 | TensorRT | 12,499.2 | 99,993.9 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9685L | 8 | TensorRT | 12,401.8 | 99,214.4 | 99% |
| NVIDIA Blackwell B200 | Supermicro | AS-4126GS-NBR-LCC | 8 | TensorRT | 12,398.3 | 99,186.5 | 99% |
| NVIDIA Blackwell B200 | Supermicro | SYS-422GS-NBRT-LCC | 8 | TensorRT | 12,397.6 | 99,181.1 | 99% |
| NVIDIA Blackwell B200 | — | NVIDIA DGX B200 | 8 | TensorRT | 12,396.2 | 99,169.8 | 99% |
| NVIDIA Blackwell B200 | Lenovo | ThinkSystem SR680a V3 | 8 | TensorRT | 12,394.9 | 99,159.3 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9680L | 8 | TensorRT | 12,392.4 | 99,139.2 | 99% |
| NVIDIA Blackwell B200 | NVIDIA | NVIDIA DGX B200 | 8 | TensorRT | 12,390.4 | 99,123.0 | 99% |
| NVIDIA Blackwell B200 | GigaComputing | G894-SD1 | 8 | TensorRT | 12,383.2 | 99,066.0 | 99% |
| NVIDIA Blackwell GB200 | NVIDIA | — | 4 | TensorRT | 12,339.9 | 49,359.6 | 99% |
| NVIDIA Blackwell GB200 | Nebius | Nebius GB200 | 4 | TensorRT | 12,304.0 | 49,215.9 | 99% |
| NVIDIA Blackwell GB200 | Azure | ND_GB200_v6 | 4 | TensorRT | 11,504.1 | 46,016.5 | 99% |
| NVIDIA Hopper H200 | ASUSTeK | ASUSTeK ESC N8 H200 | 8 | TensorRT | 4,274.2 | 34,193.8 | 99% |
| NVIDIA Hopper H200 | Nebius | Nebius H200 | 8 | TensorRT | 4,253.7 | 34,029.4 | 99% |
| NVIDIA Hopper H200 | Dell Broadcom | Dell PowerEdge XE9680 | 8 | TensorRT | 4,173.1 | 33,384.7 | 99% |
| NVIDIA Hopper H200 | Quanta Cloud Technology | QuantaGrid D74H-7U | 8 | TensorRT | 4,169.5 | 33,356.2 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE9680 | 8 | TensorRT | 4,155.5 | 33,244.3 | 99% |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 16 | TensorRT | 4,152.4 | 66,439.0 | 99% |
| NVIDIA Hopper H200 | HPE | HPE Cray XD670 | 8 | TensorRT | 4,145.6 | 33,164.4 | 99% |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 8 | TensorRT | 4,145.4 | 33,163.3 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE9680L | 8 | TensorRT | 4,145.2 | 33,161.4 | 99% |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE7745 | 8 | TensorRT | 3,633.8 | 29,070.3 | 99% |
| NVIDIA Hopper H200 | Quanta Cloud Technology | D75E-4U_H200-NVL-141GBx4 | 4 | TensorRT | 3,434.0 | 13,736.1 | 99% |
| NVIDIA Hopper H200 | ASUSTeK | ESC8000A-E12 | 8 | TensorRT | 3,201.8 | 25,614.7 | 99% |
| AMD Instinct MI325X | GigaComputing | G893-ZX1-AAX2 | 8 | vLLM | 4,017.4 | 32,139.4 | 99% |
| AMD Instinct MI325X | AMD | QuantaGrid D74A-7U | 8 | vLLM | 4,003.4 | 32,027.6 | 99% |
| AMD Instinct MI325X | MiTAC | G8825Z5 | 8 | vLLM | 3,969.5 | 31,755.7 | 99% |
| AMD Instinct MI325X | MangoBoost | MangoBoost | 8 | LLMBoost | 3,959.0 | 31,671.9 | 99% |
| AMD Instinct MI325X | Supermicro | AS-8126GS-TNMR | 8 | LLMBoost | 3,955.8 | 31,646.2 | 99% |
| AMD Instinct MI325X | Quanta Cloud Technology | D75T-7U_8xMI325X | 8 | vLLM | 3,925.2 | 31,401.4 | 99% |
| AMD Instinct MI325X | ASUSTeK | ESC_A8A_MI325X_256GBx8 | 8 | vLLM | 3,905.1 | 31,241.0 | 99% |
| AMD Instinct MI325X | Vultr | Supermicro AS -8126GS-TNMR | 8 | vLLM | 3,792.4 | 30,339.4 | 99% |
| AMD Instinct MI325X | Supermicro MangoBoost | Supermicro AS-8126GS-TNMR | 16 | LLMBoost | 3,587.0 | 57,391.8 | 99% |
| AMD Instinct MI325X | Supermicro MangoBoost | Supermicro AS-8126GS-TNMR | 24 | LLMBoost | 3,358.1 | 80,594.9 | 99% |
| AMD Instinct MI325X | MangoBoost | MangoBoost Heterogeneous Cluster | 48 | LLMBoost | 3,189.1 | 153,076.0 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 32 | TensorRT | 3,821.1 | 122,274.0 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 16 | TensorRT | 3,818.8 | 61,101.3 | 99% |
| AMD Instinct MI300X | MangoBoost | Supermicro AS-8125GS-TNMR2 | 8 | LLMBoost | 3,107.6 | 24,860.5 | 99% |
| AMD Instinct MI300X | Dell | Dell Poweredge XE9680 | 8 | vLLM | 3,093.4 | 24,747.6 | 99% |
| AMD Instinct MI300X | AMD | Supermicro AS-8125GS-TNMR2 | 8 | vLLM | 3,074.2 | 24,593.8 | 99% |
| AMD Instinct MI300X | Dell MangoBoost | Dell PowerEdge XE9680 | 8 | LLMBoost | 3,066.6 | 24,532.7 | 99% |
| AMD Instinct MI300X | Dell MangoBoost | Dell PowerEdge XE9680 | 16 | LLMBoost | 2,972.6 | 47,561.7 | 99% |
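The llama2-70b-99 Offline and Server tables above can be read side by side to see how much per-GPU throughput a system keeps once per-query latency limits apply. The sketch below is one way to compute that, assuming the two tables are joined on GPU type, submitter, and system. The `results` dictionary and the `server_to_offline_ratio` helper are hypothetical names, and the three value pairs are copied from the tables.

```python
# Tokens/s/GPU for llama2-70b-99, copied from the Offline and Server tables.
# Keys are (GPU type, submitter, system); values are (offline, server).
results = {
    ("NVIDIA Blackwell B200", "Nebius", "Nebius B200"): (12655.8, 12701.4),
    ("NVIDIA Hopper H200", "Dell", "Dell PowerEdge XE9680"): (4414.6, 4155.5),
    ("NVIDIA Hopper H200", "ASUSTeK", "ESC8000A-E12"): (3810.9, 3201.8),
}


def server_to_offline_ratio(offline: float, server: float) -> float:
    """Fraction of Offline per-GPU throughput retained in the Server scenario."""
    return server / offline


ratios = {key: server_to_offline_ratio(*pair) for key, pair in results.items()}
for (gpu, submitter, system), ratio in ratios.items():
    print(f"{gpu} / {submitter} / {system}: {ratio:.1%}")
```

A ratio close to 1 means the latency limits cost that system little throughput. A lower ratio means the system gives up more of its Offline rate to stay within them.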
llama3.1-405b — Server
| GPU Type | Submitter | System | # GPUs | Engine | Tokens/s/GPU | Tokens/s (system) | Accuracy |
|---|---|---|---|---|---|---|---|
| NVIDIA Blackwell GB200 | NVIDIA | — | 72 | TensorRT | 161.3 | 11,614.3 | 99% |
| NVIDIA Blackwell GB200 | Nebius | Nebius GB200 | 4 | TensorRT | 149.0 | 596.1 | 99% |
| NVIDIA Blackwell GB200 | Azure | ND_GB200_v6 | 4 | TensorRT | 146.0 | 584.0 | 99% |
| NVIDIA Blackwell B200 | Nebius | Nebius B200 | 8 | TensorRT | 159.9 | 1,279.5 | 99% |
| NVIDIA Blackwell B200 | Lenovo | ThinkSystem SR780a V3 | 8 | TensorRT | 156.1 | 1,249.0 | 99% |
| NVIDIA Blackwell B200 | — | NVIDIA DGX B200 | 8 | TensorRT | 155.8 | 1,246.8 | 99% |
| NVIDIA Blackwell B200 | Lambda | NVIDIA DGX B200 | 8 | TensorRT | 155.8 | 1,246.8 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9685L | 8 | TensorRT | 155.8 | 1,246.6 | 99% |
| NVIDIA Blackwell B200 | GigaComputing | G894-SD1 | 8 | TensorRT | 155.6 | 1,245.0 | 99% |
| NVIDIA Blackwell B200 | Supermicro | SYS-422GS-NBRT-LCC | 8 | TensorRT | 155.6 | 1,244.7 | 99% |
| NVIDIA Blackwell B200 | Broadcom Supermicro | Supermicro SYS-422GA-NBRT-LCC | 8 | TensorRT | 155.5 | 1,243.9 | 99% |
| NVIDIA Blackwell B200 | Oracle | BM.GPU.B200.8 | 8 | TensorRT | 155.5 | 1,243.7 | 99% |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9680L | 8 | TensorRT | 155.4 | 1,243.2 | 99% |
| NVIDIA Blackwell B200 | Supermicro | SYS-A21GE-NBRT | 8 | TensorRT | 155.4 | 1,242.8 | 99% |
| NVIDIA Blackwell B200 | Supermicro | SYS-422GA-NBRT-LCC | 8 | TensorRT | 155.1 | 1,240.4 | 99% |
| NVIDIA Blackwell B200 | NVIDIA | NVIDIA DGX B200 | 8 | TensorRT | 155.1 | 1,240.4 | 99% |
| NVIDIA Hopper H200 | Quanta Cloud Technology | QuantaGrid D74H-7U | 8 | TensorRT | 36.9 | 295.5 | 99% |
| NVIDIA Hopper H200 | Nebius | Nebius H200 | 8 | TensorRT | 36.9 | 295.4 | 99% |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 16 | TensorRT | 36.5 | 584.3 | 99% |
| NVIDIA Hopper H200 | Dell Broadcom | Dell PowerEdge XE9680 | 8 | TensorRT | 34.7 | 277.3 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 32 | TensorRT | 35.2 | 1,126.9 | 99% |
| NVIDIA Hopper H100 | Cisco | HPF HGX System | 16 | TensorRT | 35.2 | 563.4 | 99% |
mixtral-8x7b — Server
| GPU Type | Submitter | System | # GPUs | Engine | Tokens/s/GPU | Tokens/s (system) | Accuracy |
|---|---|---|---|---|---|---|---|
| NVIDIA Hopper H200 | HPE | HPE Cray XD670 | 8 | TensorRT | 7,619.4 | 60,955.1 | 99% |
| NVIDIA Hopper H200 | ASUSTeK | ASUSTeK ESC N8 H200 | 8 | TensorRT | 7,499.1 | 59,992.8 | 99% |
| AMD Instinct MI325X | AMD | QuantaGrid D74A-7U | 8 | vLLM | 7,472.9 | 59,783.6 | 99% |
| AMD Instinct MI325X | Quanta Cloud Technology | D75T-7U_8xMI325X | 8 | vLLM | 7,380.1 | 59,040.8 | 99% |
| AMD Instinct MI325X | GigaComputing | G893-ZX1-AAX2 | 8 | vLLM | 7,280.3 | 58,242.7 | 99% |
| AMD Instinct MI325X | MiTAC | G8825Z5 | 8 | vLLM | 7,135.9 | 57,087.1 | 99% |
| AMD Instinct MI325X | Vultr | Supermicro AS -8126GS-TNMR | 8 | vLLM | 7,081.7 | 56,653.7 | 99% |
| AMD Instinct MI325X | ASUSTeK | ESC_A8A_MI325X_256GBx8 | 8 | vLLM | 6,975.0 | 55,800.3 | 99% |
| AMD Instinct MI300X | AMD | Supermicro AS-8125GS-TNMR2 | 8 | vLLM | 5,975.5 | 47,804.3 | 99% |
| AMD Instinct MI300X | Dell | Dell Poweredge XE9680 | 8 | vLLM | 5,975.2 | 47,801.9 | 99% |
stable-diffusion-xl — Offline
| GPU Type | Submitter | System | # GPUs | Engine | Samples/s/GPU | Samples/s (system) | Accuracy |
|---|---|---|---|---|---|---|---|
| NVIDIA Blackwell B200 | Lambda | NVIDIA DGX B200 | 8 | TensorRT | 4.1 | 32.6 | — |
| NVIDIA Blackwell B200 | Dell | Dell PowerEdge XE9685L | 8 | TensorRT | 4.0 | 32.3 | — |
| NVIDIA Blackwell B200 | — | NVIDIA DGX B200 | 8 | TensorRT | 4.0 | 32.1 | — |
| NVIDIA Blackwell B200 | University-of-Florida | NVIDIA DGX B200 | 8 | TensorRT | 4.0 | 31.9 | — |
| NVIDIA Blackwell B200 | University-of-Florida | NVIDIA DGX B200 | 1 | TensorRT | 4.0 | 4.0 | — |
| NVIDIA Blackwell B200 | Broadcom Supermicro | Supermicro SYS-422GA-NBRT-LCC | 8 | TensorRT | 4.0 | 31.8 | — |
| NVIDIA Blackwell B200 | Lenovo | ThinkSystem SR780a V3 | 8 | TensorRT | 4.0 | 31.8 | — |
| NVIDIA Blackwell B200 | NVIDIA | NVIDIA DGX B200 | 8 | TensorRT | 4.0 | 31.8 | — |
| NVIDIA Blackwell B200 | GigaComputing | G894-SD1 | 8 | TensorRT | 3.9 | 31.6 | — |
| NVIDIA Hopper H200 | Quanta Cloud Technology | QuantaGrid D74H-7U | 8 | TensorRT | 2.4 | 19.2 | — |
| NVIDIA Hopper H200 | Cisco | Cisco UCS C885A M8 | 8 | TensorRT | 2.4 | 19.1 | — |
| NVIDIA Hopper H200 | Dell Broadcom | Dell PowerEdge XE9680 | 8 | TensorRT | 2.3 | 18.6 | — |
| NVIDIA Hopper H200 | Dell | Dell PowerEdge XE7745 | 8 | TensorRT | 2.1 | 17.1 | — |
| NVIDIA Hopper H200 | Quanta Cloud Technology | D75E-4U_H200-NVL-141GBx4 | 4 | TensorRT | 2.1 | 8.4 | — |
| AMD Instinct MI325X | AMD | Quanta S7PA | 8 | shark-ai | 2.3 | 18.6 | — |
| NVIDIA Hopper H100 | The Stage | Nebius H100 | 8 | TheStageAI | 2.3 | 18.1 | — |
| NVIDIA L40S | Dell | Dell PowerEdge XE7740 | 8 | TensorRT | 0.7 | 5.5 | — |
| NVIDIA L4 | Dell | PowerEdge R570 | 4 | TensorRT | 0.3 | 1.0 | — |