How to read these tables

  • Workloads: llama2-70b-99 (Meta Llama 2 70B chat), llama3.1-405b (Meta Llama 3.1 405B — the largest tracked model), mixtral-8x7b (Mistral Mixtral mixture-of-experts), stable-diffusion-xl (text-to-image). The -99 / -99.9 suffix is the accuracy track (see below).
  • Server vs Offline — the two MLPerf Inference scenarios shown in these tables. Server simulates real-time traffic with per-query latency limits (the system has to keep up with arriving requests). Offline is batch processing with no latency constraint — pure peak throughput. The same hardware will score differently on each.
  • Engine — the inference / serving software (TensorRT-LLM, vLLM, LLMBoost, etc.). Throughput often differs more between engines than between hardware variants of the same chip — a top result on LLMBoost or a vendor-internal stack means the buyer would need that same stack to reproduce. Pulled from MLCommons' Software field; version numbers stripped.
  • # GPUs — total GPUs or AI accelerators the submission used (per-node count × nodes). The system column reports throughput for the whole submitted system; the per-GPU column divides that total by this count.
  • Tokens/s/GPU vs Tokens/s (system) — the per-GPU column is the buyer-comparable metric (what does one chip do?). The system column is the absolute total MLCommons reports — useful for capacity planning but rewards bigger clusters on a leaderboard. Rows are grouped by GPU type and sorted by per-GPU rate within each group; the groups themselves are ordered by their best per-GPU result. Different workloads use different units: LLMs report tokens/s, image generation samples/s, classifiers queries/s.
  • Accuracy: 99.0% vs 99.9% — MLPerf's two accuracy tiers per workload. Both mean the submission cleared a quality bar relative to a reference baseline; 99.9% is the stricter "very-high-accuracy" track. This is not the model's absolute accuracy — it's which accuracy track the submitter chose.
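
The per-GPU normalization described above can be sketched in a few lines of Python. This is only an illustration, not the code behind these tables: the Submission class and its field names are hypothetical, and the three rows are copied from the llama2-70b-99 Offline table below. It shows how ranking by system total and ranking by per-GPU rate can give opposite orders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One table row (hypothetical structure, for illustration only)."""
    submitter: str
    gpu_type: str
    num_gpus: int
    system_tokens_per_s: float

    @property
    def tokens_per_s_per_gpu(self) -> float:
        # Per-GPU rate = system total divided by the number of GPUs used.
        return self.system_tokens_per_s / self.num_gpus


# Three rows taken from the llama2-70b-99 Offline table.
rows = [
    Submission("Azure", "NVIDIA Blackwell GB200", 4, 52_061.6),
    Submission("Lenovo", "NVIDIA Blackwell B200", 8, 102_909.0),
    Submission("MangoBoost", "AMD Instinct MI325X", 48, 169_197.0),
]

# Ranking by system total favors the largest cluster ...
by_system = sorted(rows, key=lambda r: r.system_tokens_per_s, reverse=True)
# ... while ranking by per-GPU rate compares individual chips.
by_gpu = sorted(rows, key=lambda r: r.tokens_per_s_per_gpu, reverse=True)

for r in by_gpu:
    print(
        f"{r.submitter:<12}{r.num_gpus:>3} GPUs "
        f"{r.tokens_per_s_per_gpu:>10,.1f} tok/s/GPU "
        f"{r.system_tokens_per_s:>12,.1f} tok/s system"
    )
```

With these three rows, the 48-GPU cluster leads on system throughput and comes last per GPU, which is why the per-GPU column is the one to compare across chips.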

llama2-70b-99 — Offline 90 submissions · top per-GPU: 13,015 tok/s · 52,062 tok/s system (Azure 4× GB200)
GPU Type Submitter System # GPUs Engine Tokens/s/GPU Tokens/s (system) Accuracy
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 13,015.4 52,061.6 99%
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 13,015.4 52,061.6 99%
NVIDIA Blackwell GB200 NVIDIA 4 TensorRT 12,934.2 51,736.9 99%
NVIDIA Blackwell GB200 NVIDIA 4 TensorRT 12,934.2 51,736.9 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 12,420.9 49,683.5 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 12,420.9 49,683.5 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR680a V3 8 TensorRT 12,863.6 102,909.0 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR680a V3 8 TensorRT 12,863.6 102,909.0 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 12,840.6 102,725.0 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 12,840.6 102,725.0 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 12,812.2 102,498.0 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 12,812.2 102,498.0 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 12,766.4 102,131.0 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 12,766.4 102,131.0 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 12,690.9 101,527.0 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 12,690.9 101,527.0 99%
NVIDIA Blackwell B200 Supermicro AS-4126GS-NBR-LCC 8 TensorRT 12,689.5 101,516.0 99%
NVIDIA Blackwell B200 Supermicro AS-4126GS-NBR-LCC 8 TensorRT 12,689.5 101,516.0 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 12,656.6 101,253.0 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 12,656.6 101,253.0 99%
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 12,655.8 101,246.0 99%
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 12,655.8 101,246.0 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 12,326.0 98,607.7 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 12,326.0 98,607.7 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 9,807.9 78,463.5 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 9,807.9 78,463.5 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680 8 TensorRT 4,414.6 35,316.7 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680 8 TensorRT 4,414.6 35,316.7 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 4,386.4 35,091.5 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 4,386.4 35,091.5 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 4,384.4 35,075.1 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 4,384.4 35,075.1 99%
NVIDIA Hopper H200 Google NVIDIA H200 8 TensorRT 4,380.5 35,043.7 99%
NVIDIA Hopper H200 Google NVIDIA H200 8 TensorRT 4,380.5 35,043.7 99%
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 4,370.6 34,964.6 99%
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 4,370.6 34,964.6 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 4,351.5 34,812.1 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 4,351.5 34,812.1 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680L 8 TensorRT 4,323.5 34,588.3 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680L 8 TensorRT 4,323.5 34,588.3 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 4,318.5 34,548.2 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 4,318.5 34,548.2 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 4,305.9 68,894.8 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 4,305.9 68,894.8 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 4,287.7 34,301.9 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 4,287.7 34,301.9 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 3,908.4 31,267.4 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 3,908.4 31,267.4 99%
NVIDIA Hopper H200 ASUSTeK ESC8000A-E12 8 TensorRT 3,810.9 30,487.0 99%
NVIDIA Hopper H200 ASUSTeK ESC8000A-E12 8 TensorRT 3,810.9 30,487.0 99%
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 3,764.4 15,057.6 99%
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 3,764.4 15,057.6 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,319.4 34,555.0 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,319.4 34,555.0 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,315.1 34,520.4 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,315.1 34,520.4 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 4,310.4 34,483.2 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 4,310.4 34,483.2 99%
AMD Instinct MI325X MangoBoost MangoBoost 8 LLMBoost 4,306.8 34,454.4 99%
AMD Instinct MI325X MangoBoost MangoBoost 8 LLMBoost 4,306.8 34,454.4 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 4,248.4 33,987.4 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 4,248.4 33,987.4 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 4,220.3 33,762.5 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 4,220.3 33,762.5 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 4,212.9 33,703.0 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 4,212.9 33,703.0 99%
AMD Instinct MI325X Supermicro AS-8126GS-TNMR 8 LLMBoost 4,183.6 33,468.4 99%
AMD Instinct MI325X Supermicro AS-8126GS-TNMR 8 LLMBoost 4,183.6 33,468.4 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 4,095.1 32,760.5 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 4,095.1 32,760.5 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 16 LLMBoost 4,082.5 65,320.1 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 16 LLMBoost 4,082.5 65,320.1 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 24 LLMBoost 3,839.9 92,158.2 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 24 LLMBoost 3,839.9 92,158.2 99%
AMD Instinct MI325X MangoBoost MangoBoost Heterogeneous Cluster 48 LLMBoost 3,524.9 169,197.0 99%
AMD Instinct MI325X MangoBoost MangoBoost Heterogeneous Cluster 48 LLMBoost 3,524.9 169,197.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 3,902.5 124,879.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 3,902.5 124,879.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 3,897.0 62,351.6 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 3,897.0 62,351.6 99%
AMD Instinct MI300X MangoBoost Supermicro AS-8125GS-TNMR2 8 LLMBoost 3,481.8 27,854.4 99%
AMD Instinct MI300X MangoBoost Supermicro AS-8125GS-TNMR2 8 LLMBoost 3,481.8 27,854.4 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 3,475.5 27,803.9 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 3,475.5 27,803.9 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 8 LLMBoost 3,459.8 27,678.1 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 8 LLMBoost 3,459.8 27,678.1 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 3,415.7 27,325.9 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 3,415.7 27,325.9 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 16 LLMBoost 3,348.3 53,572.2 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 16 LLMBoost 3,348.3 53,572.2 99%
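
As a rough illustration of why a system total favors larger clusters while the per-GPU rate does not, the sketch below compares the three Supermicro AS-8126GS-TNMR rows from the table above (8, 16 and 24 MI325X GPUs, all on LLMBoost). It assumes the three rows are otherwise comparable, which the table does not confirm: the 8-GPU row lists Supermicro as submitter and the larger ones list Supermicro MangoBoost. The name scaling_efficiency is hypothetical.

```python
# System tokens/s by GPU count, copied from the llama2-70b-99 Offline table
# (Supermicro AS-8126GS-TNMR, AMD Instinct MI325X, LLMBoost).
SYSTEM_TOK_S = {8: 33_468.4, 16: 65_320.1, 24: 92_158.2}

# Per-GPU rate of the smallest configuration, used as the baseline.
BASELINE_PER_GPU = SYSTEM_TOK_S[8] / 8


def scaling_efficiency(num_gpus: int) -> float:
    """Per-GPU rate at num_gpus relative to the 8-GPU per-GPU rate."""
    per_gpu = SYSTEM_TOK_S[num_gpus] / num_gpus
    return per_gpu / BASELINE_PER_GPU


for n in sorted(SYSTEM_TOK_S):
    print(
        f"{n:>2} GPUs: {SYSTEM_TOK_S[n]:>10,.1f} tok/s system, "
        f"efficiency {scaling_efficiency(n):.1%}"
    )
```

The system total keeps growing with GPU count while the per-GPU rate declines, so the larger configurations rise on a system-total leaderboard and fall on a per-GPU one.
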
llama2-70b-99 — Server 86 submissions · top per-GPU: 12,701 tok/s · 101,611 tok/s system (Nebius 8× B200)
GPU Type Submitter System # GPUs Engine Tokens/s/GPU Tokens/s (system) Accuracy
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 12,701.4 101,611.0 99%
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 12,701.4 101,611.0 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 12,499.2 99,993.9 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 12,499.2 99,993.9 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 12,401.8 99,214.4 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 12,401.8 99,214.4 99%
NVIDIA Blackwell B200 Supermicro AS-4126GS-NBR-LCC 8 TensorRT 12,398.3 99,186.5 99%
NVIDIA Blackwell B200 Supermicro AS-4126GS-NBR-LCC 8 TensorRT 12,398.3 99,186.5 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 12,397.6 99,181.1 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 12,397.6 99,181.1 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 12,396.2 99,169.8 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 12,396.2 99,169.8 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR680a V3 8 TensorRT 12,394.9 99,159.3 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR680a V3 8 TensorRT 12,394.9 99,159.3 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 12,392.4 99,139.2 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 12,392.4 99,139.2 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 12,390.4 99,123.0 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 12,390.4 99,123.0 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 12,383.2 99,066.0 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 12,383.2 99,066.0 99%
NVIDIA Blackwell GB200 NVIDIA 4 TensorRT 12,339.9 49,359.6 99%
NVIDIA Blackwell GB200 NVIDIA 4 TensorRT 12,339.9 49,359.6 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 12,304.0 49,215.9 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 12,304.0 49,215.9 99%
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 11,504.1 46,016.5 99%
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 11,504.1 46,016.5 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 4,274.2 34,193.8 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 4,274.2 34,193.8 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 4,253.7 34,029.4 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 4,253.7 34,029.4 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 4,173.1 33,384.7 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 4,173.1 33,384.7 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 4,169.5 33,356.2 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 4,169.5 33,356.2 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680 8 TensorRT 4,155.5 33,244.3 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680 8 TensorRT 4,155.5 33,244.3 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 4,152.4 66,439.0 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 4,152.4 66,439.0 99%
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 4,145.6 33,164.4 99%
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 4,145.6 33,164.4 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 4,145.4 33,163.3 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 4,145.4 33,163.3 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680L 8 TensorRT 4,145.2 33,161.4 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE9680L 8 TensorRT 4,145.2 33,161.4 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 3,633.8 29,070.3 99%
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 3,633.8 29,070.3 99%
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 3,434.0 13,736.1 99%
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 3,434.0 13,736.1 99%
NVIDIA Hopper H200 ASUSTeK ESC8000A-E12 8 TensorRT 3,201.8 25,614.7 99%
NVIDIA Hopper H200 ASUSTeK ESC8000A-E12 8 TensorRT 3,201.8 25,614.7 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 4,017.4 32,139.4 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 4,017.4 32,139.4 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,003.4 32,027.6 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 4,003.4 32,027.6 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 3,969.5 31,755.7 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 3,969.5 31,755.7 99%
AMD Instinct MI325X MangoBoost MangoBoost 8 LLMBoost 3,959.0 31,671.9 99%
AMD Instinct MI325X MangoBoost MangoBoost 8 LLMBoost 3,959.0 31,671.9 99%
AMD Instinct MI325X Supermicro AS-8126GS-TNMR 8 LLMBoost 3,955.8 31,646.2 99%
AMD Instinct MI325X Supermicro AS-8126GS-TNMR 8 LLMBoost 3,955.8 31,646.2 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 3,925.2 31,401.4 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 3,925.2 31,401.4 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 3,905.1 31,241.0 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 3,905.1 31,241.0 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 3,792.4 30,339.4 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 3,792.4 30,339.4 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 16 LLMBoost 3,587.0 57,391.8 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 16 LLMBoost 3,587.0 57,391.8 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 24 LLMBoost 3,358.1 80,594.9 99%
AMD Instinct MI325X Supermicro MangoBoost Supermicro AS-8126GS-TNMR 24 LLMBoost 3,358.1 80,594.9 99%
AMD Instinct MI325X MangoBoost MangoBoost Heterogeneous Cluster 48 LLMBoost 3,189.1 153,076.0 99%
AMD Instinct MI325X MangoBoost MangoBoost Heterogeneous Cluster 48 LLMBoost 3,189.1 153,076.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 3,821.1 122,274.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 3,821.1 122,274.0 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 3,818.8 61,101.3 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 3,818.8 61,101.3 99%
AMD Instinct MI300X MangoBoost Supermicro AS-8125GS-TNMR2 8 LLMBoost 3,107.6 24,860.5 99%
AMD Instinct MI300X MangoBoost Supermicro AS-8125GS-TNMR2 8 LLMBoost 3,107.6 24,860.5 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 3,093.4 24,747.6 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 3,093.4 24,747.6 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 3,074.2 24,593.8 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 3,074.2 24,593.8 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 8 LLMBoost 3,066.6 24,532.7 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 8 LLMBoost 3,066.6 24,532.7 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 16 LLMBoost 2,972.6 47,561.7 99%
AMD Instinct MI300X Dell MangoBoost Dell PowerEdge XE9680 16 LLMBoost 2,972.6 47,561.7 99%
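
To see how much the Server latency constraint costs relative to Offline, the two llama2-70b-99 tables above can be compared system by system. The sketch below is a minimal example under that reading: the per-GPU figures are copied from the tables, while the dictionary layout and the function name server_to_offline_ratio are hypothetical.

```python
# (offline, server) tokens/s/GPU for llama2-70b-99, copied from the two tables.
PER_GPU = {
    "Azure ND_GB200_v6 (4x GB200)": (13_015.4, 11_504.1),
    "Nebius B200 (8x B200)": (12_655.8, 12_701.4),
    "ASUSTeK ESC8000A-E12 (8x H200)": (3_810.9, 3_201.8),
    "MangoBoost Heterogeneous Cluster (48x MI325X)": (3_524.9, 3_189.1),
}


def server_to_offline_ratio(offline: float, server: float) -> float:
    """Fraction of Offline throughput retained in the Server scenario."""
    return server / offline


ratios = {
    name: server_to_offline_ratio(offline, server)
    for name, (offline, server) in PER_GPU.items()
}

# Print systems from the largest Server penalty to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:<48}{ratio:7.1%}")
```

Most systems retain somewhat less than their Offline rate under Server, but the ratio is not guaranteed to be below 1: the Nebius B200 row reports a slightly higher Server figure than Offline.
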
llama3.1-405b — Server 44 submissions · top per-GPU: 161 tok/s · 11,614 tok/s system (NVIDIA 72× GB200)
GPU Type Submitter System # GPUs Engine Tokens/s/GPU Tokens/s (system) Accuracy
NVIDIA Blackwell GB200 NVIDIA 72 TensorRT 161.3 11,614.3 99%
NVIDIA Blackwell GB200 NVIDIA 72 TensorRT 161.3 11,614.3 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 149.0 596.1 99%
NVIDIA Blackwell GB200 Nebius Nebius GB200 4 TensorRT 149.0 596.1 99%
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 146.0 584.0 99%
NVIDIA Blackwell GB200 Azure ND_GB200_v6 4 TensorRT 146.0 584.0 99%
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 159.9 1,279.5 99%
NVIDIA Blackwell B200 Nebius Nebius B200 8 TensorRT 159.9 1,279.5 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR780a V3 8 TensorRT 156.1 1,249.0 99%
NVIDIA Blackwell B200 Lenovo ThinkSystem SR780a V3 8 TensorRT 156.1 1,249.0 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 155.8 1,246.8 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 155.8 1,246.8 99%
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 155.8 1,246.8 99%
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 155.8 1,246.8 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 155.8 1,246.6 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 155.8 1,246.6 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 155.6 1,245.0 99%
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 155.6 1,245.0 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 155.6 1,244.7 99%
NVIDIA Blackwell B200 Supermicro SYS-422GS-NBRT-LCC 8 TensorRT 155.6 1,244.7 99%
NVIDIA Blackwell B200 Broadcom Supermicro Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 155.5 1,243.9 99%
NVIDIA Blackwell B200 Broadcom Supermicro Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 155.5 1,243.9 99%
NVIDIA Blackwell B200 Oracle BM.GPU.B200.8 8 TensorRT 155.5 1,243.7 99%
NVIDIA Blackwell B200 Oracle BM.GPU.B200.8 8 TensorRT 155.5 1,243.7 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 155.4 1,243.2 99%
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9680L 8 TensorRT 155.4 1,243.2 99%
NVIDIA Blackwell B200 Supermicro SYS-A21GE-NBRT 8 TensorRT 155.4 1,242.8 99%
NVIDIA Blackwell B200 Supermicro SYS-A21GE-NBRT 8 TensorRT 155.4 1,242.8 99%
NVIDIA Blackwell B200 Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 155.1 1,240.4 99%
NVIDIA Blackwell B200 Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 155.1 1,240.4 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 155.1 1,240.4 99%
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 155.1 1,240.4 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 36.9 295.5 99%
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 36.9 295.5 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 36.9 295.4 99%
NVIDIA Hopper H200 Nebius Nebius H200 8 TensorRT 36.9 295.4 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 36.5 584.3 99%
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 16 TensorRT 36.5 584.3 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 34.7 277.3 99%
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 34.7 277.3 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 35.2 1,126.9 99%
NVIDIA Hopper H100 Cisco HPF HGX System 32 TensorRT 35.2 1,126.9 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 35.2 563.4 99%
NVIDIA Hopper H100 Cisco HPF HGX System 16 TensorRT 35.2 563.4 99%
mixtral-8x7b — Server 20 submissions · top per-GPU: 7,619 tok/s · 60,955 tok/s system (HPE 8× H200)
GPU Type Submitter System # GPUs Engine Tokens/s/GPU Tokens/s (system) Accuracy
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 7,619.4 60,955.1 99%
NVIDIA Hopper H200 HPE HPE Cray XD670 8 TensorRT 7,619.4 60,955.1 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 7,499.1 59,992.8 99%
NVIDIA Hopper H200 ASUSTeK ASUSTeK ESC N8 H200 8 TensorRT 7,499.1 59,992.8 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 7,472.9 59,783.6 99%
AMD Instinct MI325X AMD QuantaGrid D74A-7U 8 vLLM 7,472.9 59,783.6 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 7,380.1 59,040.8 99%
AMD Instinct MI325X Quanta Cloud Technology D75T-7U_8xMI325X 8 vLLM 7,380.1 59,040.8 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 7,280.3 58,242.7 99%
AMD Instinct MI325X GigaComputing G893-ZX1-AAX2 8 vLLM 7,280.3 58,242.7 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 7,135.9 57,087.1 99%
AMD Instinct MI325X MiTAC G8825Z5 8 vLLM 7,135.9 57,087.1 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 7,081.7 56,653.7 99%
AMD Instinct MI325X Vultr Supermicro AS -8126GS-TNMR 8 vLLM 7,081.7 56,653.7 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 6,975.0 55,800.3 99%
AMD Instinct MI325X ASUSTeK ESC_A8A_MI325X_256GBx8 8 vLLM 6,975.0 55,800.3 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 5,975.5 47,804.3 99%
AMD Instinct MI300X AMD Supermicro AS-8125GS-TNMR2 8 vLLM 5,975.5 47,804.3 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 5,975.2 47,801.9 99%
AMD Instinct MI300X Dell Dell Poweredge XE9680 8 vLLM 5,975.2 47,801.9 99%
stable-diffusion-xl — Offline 36 submissions · top per-GPU: 4 smp/s · 33 smp/s system (Lambda 8× B200)
GPU Type Submitter System # GPUs Engine Samples/s/GPU Samples/s (system) Accuracy
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 4.1 32.6
NVIDIA Blackwell B200 Lambda NVIDIA DGX B200 8 TensorRT 4.1 32.6
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 4.0 32.3
NVIDIA Blackwell B200 Dell Dell PowerEdge XE9685L 8 TensorRT 4.0 32.3
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 4.0 32.1
NVIDIA Blackwell B200 Google NVIDIA DGX B200 8 TensorRT 4.0 32.1
NVIDIA Blackwell B200 University-of-Florida NVIDIA DGX B200 8 TensorRT 4.0 31.9
NVIDIA Blackwell B200 University-of-Florida NVIDIA DGX B200 8 TensorRT 4.0 31.9
NVIDIA Blackwell B200 University-of-Florida NVIDIA DGX B200 1 TensorRT 4.0 4.0
NVIDIA Blackwell B200 University-of-Florida NVIDIA DGX B200 1 TensorRT 4.0 4.0
NVIDIA Blackwell B200 Broadcom Supermicro Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 Broadcom Supermicro Supermicro SYS-422GA-NBRT-LCC 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 Lenovo ThinkSystem SR780a V3 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 Lenovo ThinkSystem SR780a V3 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 NVIDIA NVIDIA DGX B200 8 TensorRT 4.0 31.8
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 3.9 31.6
NVIDIA Blackwell B200 GigaComputing G894-SD1 8 TensorRT 3.9 31.6
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 2.4 19.2
NVIDIA Hopper H200 Quanta Cloud Technology QuantaGrid D74H-7U 8 TensorRT 2.4 19.2
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 2.4 19.1
NVIDIA Hopper H200 Cisco Cisco UCS C885A M8 8 TensorRT 2.4 19.1
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 2.3 18.6
NVIDIA Hopper H200 Dell Broadcom Dell PowerEdge XE9680 8 TensorRT 2.3 18.6
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 2.1 17.1
NVIDIA Hopper H200 Dell Dell PowerEdge XE7745 8 TensorRT 2.1 17.1
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 2.1 8.4
NVIDIA Hopper H200 Quanta Cloud Technology D75E-4U_H200-NVL-141GBx4 4 TensorRT 2.1 8.4
AMD Instinct MI325X AMD Quanta S7PA 8 shark-ai 2.3 18.6
AMD Instinct MI325X AMD Quanta S7PA 8 shark-ai 2.3 18.6
NVIDIA Hopper H100 The Stage Nebius H100 8 TheStageAI 2.3 18.1
NVIDIA Hopper H100 The Stage Nebius H100 8 TheStageAI 2.3 18.1
NVIDIA L40S Dell Dell PowerEdge XE7740 8 TensorRT 0.7 5.5
NVIDIA L40S Dell Dell PowerEdge XE7740 8 TensorRT 0.7 5.5
NVIDIA L4 Dell PowerEdge R570 4 TensorRT 0.3 1.0
NVIDIA L4 Dell PowerEdge R570 4 TensorRT 0.3 1.0