How to read these tables

Workloads — llama2-70b-99 (Meta Llama 2 70B chat), llama3.1-405b (Meta Llama 3.1 405B — the largest tracked model), mixtral-8x7b (Mistral Mixtral mixture-of-experts), stable-diffusion-xl (text-to-image). The -99 / -99.9 suffix is the accuracy track (see below).
Server vs Offline — MLPerf's two scenario types. Server simulates real-time traffic with per-query latency limits (the system has to keep up with arriving requests). Offline is batch processing with no latency constraint — pure peak throughput. The same hardware will score differently on each.
Engine — the inference / serving software (TensorRT-LLM, vLLM, LLMBoost, etc.). Throughput often differs more between engines than between hardware variants of the same chip, so reproducing a top result obtained on LLMBoost or a vendor-internal stack requires that same stack. Values are taken from the Software field in MLCommons' results, with version numbers stripped.
# GPUs — total number of GPUs / AI chips the submission used (GPUs per node × number of nodes). The system column reports throughput for the whole submitted system; the per-GPU column divides that figure by this count.
Tokens/s/GPU vs Tokens/s (system) — the per-GPU column is the buyer-comparable metric (what does one chip do?). The system column is the absolute total MLCommons reports; it is useful for capacity planning but rewards bigger clusters on a leaderboard. Rows are grouped by GPU type and, within each group, sorted by per-GPU rate in descending order. Different workloads use different units: LLMs report tokens/s, image generation reports samples/s, and classifiers report queries/s.
Accuracy: 99.0% vs 99.9% — MLPerf's two accuracy tiers per workload. Both mean the submission cleared a quality bar relative to a reference baseline; 99.9% is the stricter "very-high-accuracy" track. This is not the model's absolute accuracy — it's which accuracy track the submitter chose.
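
The relationship between the two throughput columns can be checked directly. The following is a minimal Python sketch, assuming each table row is tab-separated with the eight columns shown in the table headers; the names ResultRow, parse_row and per_gpu_from_system are illustrative and not part of any MLCommons tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRow:
    gpu_type: str
    submitter: str
    system: str
    num_gpus: int
    engine: str
    per_gpu_rate: float  # Tokens/s/GPU (or Samples/s/GPU)
    system_rate: float   # Tokens/s (system) (or Samples/s (system))
    accuracy: str


def parse_number(text: str) -> float:
    """Convert a figure such as '52,061.6' to a float."""
    return float(text.replace(",", ""))


def parse_row(line: str) -> ResultRow:
    """Split one tab-separated table row into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 columns, got {len(fields)}")
    gpu_type, submitter, system, num_gpus, engine, per_gpu, total, accuracy = fields
    return ResultRow(gpu_type, submitter, system, int(num_gpus), engine,
                     parse_number(per_gpu), parse_number(total), accuracy)


def per_gpu_from_system(row: ResultRow) -> float:
    """Recompute the per-GPU rate: system throughput divided by GPU count."""
    return row.system_rate / row.num_gpus


# First row of the llama2-70b-99 Offline table.
row = parse_row(
    "NVIDIA Blackwell GB200\tAzure\tND_GB200_v6\t4\tTensorRT\t13,015.4\t52,061.6\t99%"
)
recomputed = per_gpu_from_system(row)
# Published figures are rounded to one decimal place, so allow a small tolerance.
assert abs(recomputed - row.per_gpu_rate) < 0.1
print(f"{row.submitter} {row.num_gpus}x GPUs: {recomputed:,.1f} per GPU")
```

The tolerance is there because both columns are rounded independently, so the recomputed value can differ from the published one in the last digit.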

llama2-70b-99 — Offline
90 submissions · top per-GPU: 13,015 tok/s · 52,062 tok/s system (Azure 4× GB200)

GPU Type	Submitter	System	# GPUs	Engine	Tokens/s/GPU	Tokens/s (system)	Accuracy
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	13,015.4	52,061.6	99%
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	13,015.4	52,061.6	99%
NVIDIA Blackwell GB200	NVIDIA	—	4	TensorRT	12,934.2	51,736.9	99%
NVIDIA Blackwell GB200	NVIDIA	—	4	TensorRT	12,934.2	51,736.9	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	12,420.9	49,683.5	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	12,420.9	49,683.5	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR680a V3	8	TensorRT	12,863.6	102,909.0	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR680a V3	8	TensorRT	12,863.6	102,909.0	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	12,840.6	102,725.0	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	12,840.6	102,725.0	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	12,812.2	102,498.0	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	12,812.2	102,498.0	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	12,766.4	102,131.0	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	12,766.4	102,131.0	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	12,690.9	101,527.0	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	12,690.9	101,527.0	99%
NVIDIA Blackwell B200	Supermicro	AS-4126GS-NBR-LCC	8	TensorRT	12,689.5	101,516.0	99%
NVIDIA Blackwell B200	Supermicro	AS-4126GS-NBR-LCC	8	TensorRT	12,689.5	101,516.0	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	12,656.6	101,253.0	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	12,656.6	101,253.0	99%
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	12,655.8	101,246.0	99%
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	12,655.8	101,246.0	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	12,326.0	98,607.7	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	12,326.0	98,607.7	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	9,807.9	78,463.5	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	9,807.9	78,463.5	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680	8	TensorRT	4,414.6	35,316.7	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680	8	TensorRT	4,414.6	35,316.7	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	4,386.4	35,091.5	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	4,386.4	35,091.5	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	4,384.4	35,075.1	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	4,384.4	35,075.1	99%
NVIDIA Hopper H200	Google	NVIDIA H200	8	TensorRT	4,380.5	35,043.7	99%
NVIDIA Hopper H200	Google	NVIDIA H200	8	TensorRT	4,380.5	35,043.7	99%
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	4,370.6	34,964.6	99%
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	4,370.6	34,964.6	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	4,351.5	34,812.1	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	4,351.5	34,812.1	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680L	8	TensorRT	4,323.5	34,588.3	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680L	8	TensorRT	4,323.5	34,588.3	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	4,318.5	34,548.2	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	4,318.5	34,548.2	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	4,305.9	68,894.8	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	4,305.9	68,894.8	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	4,287.7	34,301.9	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	4,287.7	34,301.9	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	3,908.4	31,267.4	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	3,908.4	31,267.4	99%
NVIDIA Hopper H200	ASUSTeK	ESC8000A-E12	8	TensorRT	3,810.9	30,487.0	99%
NVIDIA Hopper H200	ASUSTeK	ESC8000A-E12	8	TensorRT	3,810.9	30,487.0	99%
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	3,764.4	15,057.6	99%
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	3,764.4	15,057.6	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,319.4	34,555.0	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,319.4	34,555.0	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,315.1	34,520.4	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,315.1	34,520.4	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	4,310.4	34,483.2	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	4,310.4	34,483.2	99%
AMD Instinct MI325X	MangoBoost	MangoBoost	8	LLMBoost	4,306.8	34,454.4	99%
AMD Instinct MI325X	MangoBoost	MangoBoost	8	LLMBoost	4,306.8	34,454.4	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	4,248.4	33,987.4	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	4,248.4	33,987.4	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	4,220.3	33,762.5	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	4,220.3	33,762.5	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	4,212.9	33,703.0	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	4,212.9	33,703.0	99%
AMD Instinct MI325X	Supermicro	AS-8126GS-TNMR	8	LLMBoost	4,183.6	33,468.4	99%
AMD Instinct MI325X	Supermicro	AS-8126GS-TNMR	8	LLMBoost	4,183.6	33,468.4	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	4,095.1	32,760.5	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	4,095.1	32,760.5	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	16	LLMBoost	4,082.5	65,320.1	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	16	LLMBoost	4,082.5	65,320.1	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	24	LLMBoost	3,839.9	92,158.2	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	24	LLMBoost	3,839.9	92,158.2	99%
AMD Instinct MI325X	MangoBoost	MangoBoost Heterogeneous Cluster	48	LLMBoost	3,524.9	169,197.0	99%
AMD Instinct MI325X	MangoBoost	MangoBoost Heterogeneous Cluster	48	LLMBoost	3,524.9	169,197.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	3,902.5	124,879.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	3,902.5	124,879.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	3,897.0	62,351.6	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	3,897.0	62,351.6	99%
AMD Instinct MI300X	MangoBoost	Supermicro AS-8125GS-TNMR2	8	LLMBoost	3,481.8	27,854.4	99%
AMD Instinct MI300X	MangoBoost	Supermicro AS-8125GS-TNMR2	8	LLMBoost	3,481.8	27,854.4	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	3,475.5	27,803.9	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	3,475.5	27,803.9	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	8	LLMBoost	3,459.8	27,678.1	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	8	LLMBoost	3,459.8	27,678.1	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	3,415.7	27,325.9	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	3,415.7	27,325.9	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	16	LLMBoost	3,348.3	53,572.2	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	16	LLMBoost	3,348.3	53,572.2	99%
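
The MI325X rows above show why the system column favors bigger clusters while the per-GPU column does not: total throughput grows with GPU count, but the per-GPU rate slips. Below is a rough Python sketch of that comparison. It assumes the 8-GPU Supermicro row and the 16- and 24-GPU Supermicro MangoBoost rows (all LLMBoost on AS-8126GS-TNMR) describe comparable nodes, which the table itself does not confirm; scaling_efficiency is an illustrative name.

```python
def scaling_efficiency(per_gpu_rate: float, baseline_per_gpu_rate: float) -> float:
    """Per-GPU rate of a larger run relative to the smallest run (1.0 = perfect scaling)."""
    return per_gpu_rate / baseline_per_gpu_rate


# GPU count -> Tokens/s/GPU, copied from the llama2-70b-99 Offline table
# (MI325X, LLMBoost, AS-8126GS-TNMR rows). Treating them as one scaling
# series is an assumption made for illustration.
runs = {8: 4183.6, 16: 4082.5, 24: 3839.9}

baseline = runs[min(runs)]
efficiency = {n: scaling_efficiency(rate, baseline) for n, rate in runs.items()}

for n in sorted(runs):
    system_rate = runs[n] * n  # approximate system total from the rounded per-GPU rate
    print(f"{n:>2} GPUs: {runs[n]:,.1f} tok/s/GPU, "
          f"~{system_rate:,.0f} tok/s system, "
          f"efficiency {efficiency[n]:.1%}")
```

Sorting by the system total would put the 24-GPU run first; sorting by per-GPU rate puts the 8-GPU run first, which is the ordering used within each GPU group here.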

llama2-70b-99 — Server
86 submissions · top per-GPU: 12,701 tok/s · 101,611 tok/s system (Nebius 8× B200)

GPU Type	Submitter	System	# GPUs	Engine	Tokens/s/GPU	Tokens/s (system)	Accuracy
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	12,701.4	101,611.0	99%
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	12,701.4	101,611.0	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	12,499.2	99,993.9	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	12,499.2	99,993.9	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	12,401.8	99,214.4	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	12,401.8	99,214.4	99%
NVIDIA Blackwell B200	Supermicro	AS-4126GS-NBR-LCC	8	TensorRT	12,398.3	99,186.5	99%
NVIDIA Blackwell B200	Supermicro	AS-4126GS-NBR-LCC	8	TensorRT	12,398.3	99,186.5	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	12,397.6	99,181.1	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	12,397.6	99,181.1	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	12,396.2	99,169.8	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	12,396.2	99,169.8	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR680a V3	8	TensorRT	12,394.9	99,159.3	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR680a V3	8	TensorRT	12,394.9	99,159.3	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	12,392.4	99,139.2	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	12,392.4	99,139.2	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	12,390.4	99,123.0	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	12,390.4	99,123.0	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	12,383.2	99,066.0	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	12,383.2	99,066.0	99%
NVIDIA Blackwell GB200	NVIDIA	—	4	TensorRT	12,339.9	49,359.6	99%
NVIDIA Blackwell GB200	NVIDIA	—	4	TensorRT	12,339.9	49,359.6	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	12,304.0	49,215.9	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	12,304.0	49,215.9	99%
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	11,504.1	46,016.5	99%
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	11,504.1	46,016.5	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	4,274.2	34,193.8	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	4,274.2	34,193.8	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	4,253.7	34,029.4	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	4,253.7	34,029.4	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	4,173.1	33,384.7	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	4,173.1	33,384.7	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	4,169.5	33,356.2	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	4,169.5	33,356.2	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680	8	TensorRT	4,155.5	33,244.3	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680	8	TensorRT	4,155.5	33,244.3	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	4,152.4	66,439.0	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	4,152.4	66,439.0	99%
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	4,145.6	33,164.4	99%
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	4,145.6	33,164.4	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	4,145.4	33,163.3	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	4,145.4	33,163.3	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680L	8	TensorRT	4,145.2	33,161.4	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE9680L	8	TensorRT	4,145.2	33,161.4	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	3,633.8	29,070.3	99%
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	3,633.8	29,070.3	99%
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	3,434.0	13,736.1	99%
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	3,434.0	13,736.1	99%
NVIDIA Hopper H200	ASUSTeK	ESC8000A-E12	8	TensorRT	3,201.8	25,614.7	99%
NVIDIA Hopper H200	ASUSTeK	ESC8000A-E12	8	TensorRT	3,201.8	25,614.7	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	4,017.4	32,139.4	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	4,017.4	32,139.4	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,003.4	32,027.6	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	4,003.4	32,027.6	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	3,969.5	31,755.7	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	3,969.5	31,755.7	99%
AMD Instinct MI325X	MangoBoost	MangoBoost	8	LLMBoost	3,959.0	31,671.9	99%
AMD Instinct MI325X	MangoBoost	MangoBoost	8	LLMBoost	3,959.0	31,671.9	99%
AMD Instinct MI325X	Supermicro	AS-8126GS-TNMR	8	LLMBoost	3,955.8	31,646.2	99%
AMD Instinct MI325X	Supermicro	AS-8126GS-TNMR	8	LLMBoost	3,955.8	31,646.2	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	3,925.2	31,401.4	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	3,925.2	31,401.4	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	3,905.1	31,241.0	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	3,905.1	31,241.0	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	3,792.4	30,339.4	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	3,792.4	30,339.4	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	16	LLMBoost	3,587.0	57,391.8	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	16	LLMBoost	3,587.0	57,391.8	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	24	LLMBoost	3,358.1	80,594.9	99%
AMD Instinct MI325X	Supermicro MangoBoost	Supermicro AS-8126GS-TNMR	24	LLMBoost	3,358.1	80,594.9	99%
AMD Instinct MI325X	MangoBoost	MangoBoost Heterogeneous Cluster	48	LLMBoost	3,189.1	153,076.0	99%
AMD Instinct MI325X	MangoBoost	MangoBoost Heterogeneous Cluster	48	LLMBoost	3,189.1	153,076.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	3,821.1	122,274.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	3,821.1	122,274.0	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	3,818.8	61,101.3	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	3,818.8	61,101.3	99%
AMD Instinct MI300X	MangoBoost	Supermicro AS-8125GS-TNMR2	8	LLMBoost	3,107.6	24,860.5	99%
AMD Instinct MI300X	MangoBoost	Supermicro AS-8125GS-TNMR2	8	LLMBoost	3,107.6	24,860.5	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	3,093.4	24,747.6	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	3,093.4	24,747.6	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	3,074.2	24,593.8	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	3,074.2	24,593.8	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	8	LLMBoost	3,066.6	24,532.7	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	8	LLMBoost	3,066.6	24,532.7	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	16	LLMBoost	2,972.6	47,561.7	99%
AMD Instinct MI300X	Dell MangoBoost	Dell PowerEdge XE9680	16	LLMBoost	2,972.6	47,561.7	99%
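
With both llama2-70b-99 tables available, the Server and Offline scores of the same system can be set side by side. The sketch below is one way to do that in Python, keyed on (GPU type, submitter, system, # GPUs); the three entries are copied from the tables above, and the helper name server_to_offline_ratio is illustrative. It reports the ratio only and makes no claim about why a given system lands where it does.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str, int]  # (GPU type, submitter, system, # GPUs)

# Tokens/s/GPU copied from the llama2-70b-99 Offline table.
offline: Dict[Key, float] = {
    ("NVIDIA Blackwell GB200", "Azure", "ND_GB200_v6", 4): 13015.4,
    ("NVIDIA Blackwell B200", "Nebius", "Nebius B200", 8): 12655.8,
    ("NVIDIA Hopper H200", "HPE", "HPE Cray XD670", 8): 4370.6,
}

# Tokens/s/GPU copied from the llama2-70b-99 Server table.
server: Dict[Key, float] = {
    ("NVIDIA Blackwell GB200", "Azure", "ND_GB200_v6", 4): 11504.1,
    ("NVIDIA Blackwell B200", "Nebius", "Nebius B200", 8): 12701.4,
    ("NVIDIA Hopper H200", "HPE", "HPE Cray XD670", 8): 4145.6,
}


def server_to_offline_ratio(offline: Dict[Key, float],
                            server: Dict[Key, float]) -> Dict[Key, float]:
    """Return Server / Offline per-GPU rate for systems present in both tables."""
    return {key: server[key] / offline[key] for key in offline if key in server}


ratios = server_to_offline_ratio(offline, server)
for (gpu, submitter, system, n), ratio in ratios.items():
    print(f"{submitter} {n}x {gpu}: Server is {ratio:.1%} of Offline")
```

A ratio below 100% means the Server score is lower than the Offline score for that system; a ratio above 100% means the reverse.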

llama3.1-405b — Server
44 submissions · top per-GPU: 161 tok/s · 11,614 tok/s system (NVIDIA 72× GB200)

GPU Type	Submitter	System	# GPUs	Engine	Tokens/s/GPU	Tokens/s (system)	Accuracy
NVIDIA Blackwell GB200	NVIDIA	—	72	TensorRT	161.3	11,614.3	99%
NVIDIA Blackwell GB200	NVIDIA	—	72	TensorRT	161.3	11,614.3	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	149.0	596.1	99%
NVIDIA Blackwell GB200	Nebius	Nebius GB200	4	TensorRT	149.0	596.1	99%
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	146.0	584.0	99%
NVIDIA Blackwell GB200	Azure	ND_GB200_v6	4	TensorRT	146.0	584.0	99%
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	159.9	1,279.5	99%
NVIDIA Blackwell B200	Nebius	Nebius B200	8	TensorRT	159.9	1,279.5	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR780a V3	8	TensorRT	156.1	1,249.0	99%
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR780a V3	8	TensorRT	156.1	1,249.0	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	155.8	1,246.8	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	155.8	1,246.8	99%
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	155.8	1,246.8	99%
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	155.8	1,246.8	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	155.8	1,246.6	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	155.8	1,246.6	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	155.6	1,245.0	99%
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	155.6	1,245.0	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	155.6	1,244.7	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GS-NBRT-LCC	8	TensorRT	155.6	1,244.7	99%
NVIDIA Blackwell B200	Broadcom Supermicro	Supermicro SYS-422GA-NBRT-LCC	8	TensorRT	155.5	1,243.9	99%
NVIDIA Blackwell B200	Broadcom Supermicro	Supermicro SYS-422GA-NBRT-LCC	8	TensorRT	155.5	1,243.9	99%
NVIDIA Blackwell B200	Oracle	BM.GPU.B200.8	8	TensorRT	155.5	1,243.7	99%
NVIDIA Blackwell B200	Oracle	BM.GPU.B200.8	8	TensorRT	155.5	1,243.7	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	155.4	1,243.2	99%
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9680L	8	TensorRT	155.4	1,243.2	99%
NVIDIA Blackwell B200	Supermicro	SYS-A21GE-NBRT	8	TensorRT	155.4	1,242.8	99%
NVIDIA Blackwell B200	Supermicro	SYS-A21GE-NBRT	8	TensorRT	155.4	1,242.8	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GA-NBRT-LCC	8	TensorRT	155.1	1,240.4	99%
NVIDIA Blackwell B200	Supermicro	SYS-422GA-NBRT-LCC	8	TensorRT	155.1	1,240.4	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	155.1	1,240.4	99%
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	155.1	1,240.4	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	36.9	295.5	99%
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	36.9	295.5	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	36.9	295.4	99%
NVIDIA Hopper H200	Nebius	Nebius H200	8	TensorRT	36.9	295.4	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	36.5	584.3	99%
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	16	TensorRT	36.5	584.3	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	34.7	277.3	99%
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	34.7	277.3	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	35.2	1,126.9	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	32	TensorRT	35.2	1,126.9	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	35.2	563.4	99%
NVIDIA Hopper H100	Cisco	HPF HGX System	16	TensorRT	35.2	563.4	99%

mixtral-8x7b — Server
20 submissions · top per-GPU: 7,619 tok/s · 60,955 tok/s system (HPE 8× H200)

GPU Type	Submitter	System	# GPUs	Engine	Tokens/s/GPU	Tokens/s (system)	Accuracy
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	7,619.4	60,955.1	99%
NVIDIA Hopper H200	HPE	HPE Cray XD670	8	TensorRT	7,619.4	60,955.1	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	7,499.1	59,992.8	99%
NVIDIA Hopper H200	ASUSTeK	ASUSTeK ESC N8 H200	8	TensorRT	7,499.1	59,992.8	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	7,472.9	59,783.6	99%
AMD Instinct MI325X	AMD	QuantaGrid D74A-7U	8	vLLM	7,472.9	59,783.6	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	7,380.1	59,040.8	99%
AMD Instinct MI325X	Quanta Cloud Technology	D75T-7U_8xMI325X	8	vLLM	7,380.1	59,040.8	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	7,280.3	58,242.7	99%
AMD Instinct MI325X	GigaComputing	G893-ZX1-AAX2	8	vLLM	7,280.3	58,242.7	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	7,135.9	57,087.1	99%
AMD Instinct MI325X	MiTAC	G8825Z5	8	vLLM	7,135.9	57,087.1	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	7,081.7	56,653.7	99%
AMD Instinct MI325X	Vultr	Supermicro AS -8126GS-TNMR	8	vLLM	7,081.7	56,653.7	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	6,975.0	55,800.3	99%
AMD Instinct MI325X	ASUSTeK	ESC_A8A_MI325X_256GBx8	8	vLLM	6,975.0	55,800.3	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	5,975.5	47,804.3	99%
AMD Instinct MI300X	AMD	Supermicro AS-8125GS-TNMR2	8	vLLM	5,975.5	47,804.3	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	5,975.2	47,801.9	99%
AMD Instinct MI300X	Dell	Dell Poweredge XE9680	8	vLLM	5,975.2	47,801.9	99%

stable-diffusion-xl — Offline
36 submissions · top per-GPU: 4.1 samples/s · 32.6 samples/s system (Lambda 8× B200)

GPU Type	Submitter	System	# GPUs	Engine	Samples/s/GPU	Samples/s (system)	Accuracy
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	4.1	32.6	—
NVIDIA Blackwell B200	Lambda	NVIDIA DGX B200	8	TensorRT	4.1	32.6	—
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	4.0	32.3	—
NVIDIA Blackwell B200	Dell	Dell PowerEdge XE9685L	8	TensorRT	4.0	32.3	—
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	4.0	32.1	—
NVIDIA Blackwell B200	Google	NVIDIA DGX B200	8	TensorRT	4.0	32.1	—
NVIDIA Blackwell B200	University-of-Florida	NVIDIA DGX B200	8	TensorRT	4.0	31.9	—
NVIDIA Blackwell B200	University-of-Florida	NVIDIA DGX B200	8	TensorRT	4.0	31.9	—
NVIDIA Blackwell B200	University-of-Florida	NVIDIA DGX B200	1	TensorRT	4.0	4.0	—
NVIDIA Blackwell B200	University-of-Florida	NVIDIA DGX B200	1	TensorRT	4.0	4.0	—
NVIDIA Blackwell B200	Broadcom Supermicro	Supermicro SYS-422GA-NBRT-LCC	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	Broadcom Supermicro	Supermicro SYS-422GA-NBRT-LCC	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR780a V3	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	Lenovo	ThinkSystem SR780a V3	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	NVIDIA	NVIDIA DGX B200	8	TensorRT	4.0	31.8	—
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	3.9	31.6	—
NVIDIA Blackwell B200	GigaComputing	G894-SD1	8	TensorRT	3.9	31.6	—
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	2.4	19.2	—
NVIDIA Hopper H200	Quanta Cloud Technology	QuantaGrid D74H-7U	8	TensorRT	2.4	19.2	—
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	2.4	19.1	—
NVIDIA Hopper H200	Cisco	Cisco UCS C885A M8	8	TensorRT	2.4	19.1	—
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	2.3	18.6	—
NVIDIA Hopper H200	Dell Broadcom	Dell PowerEdge XE9680	8	TensorRT	2.3	18.6	—
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	2.1	17.1	—
NVIDIA Hopper H200	Dell	Dell PowerEdge XE7745	8	TensorRT	2.1	17.1	—
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	2.1	8.4	—
NVIDIA Hopper H200	Quanta Cloud Technology	D75E-4U_H200-NVL-141GBx4	4	TensorRT	2.1	8.4	—
AMD Instinct MI325X	AMD	Quanta S7PA	8	shark-ai	2.3	18.6	—
AMD Instinct MI325X	AMD	Quanta S7PA	8	shark-ai	2.3	18.6	—
NVIDIA Hopper H100	The Stage	Nebius H100	8	TheStageAI	2.3	18.1	—
NVIDIA Hopper H100	The Stage	Nebius H100	8	TheStageAI	2.3	18.1	—
NVIDIA L40S	Dell	Dell PowerEdge XE7740	8	TensorRT	0.7	5.5	—
NVIDIA L40S	Dell	Dell PowerEdge XE7740	8	TensorRT	0.7	5.5	—
NVIDIA L4	Dell	PowerEdge R570	4	TensorRT	0.3	1.0	—
NVIDIA L4	Dell	PowerEdge R570	4	TensorRT	0.3	1.0	—
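
The system column is the one to use for capacity planning. As a rough illustration, the Python sketch below estimates how many identical systems would be needed to reach a target image rate, using Samples/s (system) figures from the table above. The target of 100 samples/s is hypothetical, and systems_needed is an illustrative helper. Offline results carry no latency constraint, so a real deployment would likely need extra headroom beyond this estimate.

```python
import math


def systems_needed(target_rate: float, system_rate: float) -> int:
    """Smallest whole number of systems whose combined rate meets the target."""
    if target_rate <= 0 or system_rate <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_rate / system_rate)


# Hypothetical requirement, chosen only for illustration.
TARGET_SAMPLES_PER_S = 100.0

# Samples/s (system) copied from the stable-diffusion-xl Offline table.
candidates = {
    "Lambda, NVIDIA DGX B200 (8x B200)": 32.6,
    "Quanta Cloud Technology, QuantaGrid D74H-7U (8x H200)": 19.2,
    "Dell, PowerEdge XE7740 (8x L40S)": 5.5,
}

for name, rate in candidates.items():
    count = systems_needed(TARGET_SAMPLES_PER_S, rate)
    print(f"{name}: {count} systems for {TARGET_SAMPLES_PER_S:.0f} samples/s")
```

The same arithmetic applies to the LLM tables with tokens/s in place of samples/s, provided the target and the system rate use the same unit.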
