LLM Evals

LMSYS Chatbot Arena vs Patronus AI

LMSYS Chatbot Arena 46% vs Patronus AI 54%

Leading: Patronus AI (53.8%)

Insufficient data

This matchup has only 13 decisive cases; a minimum of 30 is required for publication.
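The headline figures follow from the raw counts. Only picks of one tool or the other count as decisive. The leader's share is its wins divided by decisive cases (7/13 ≈ 53.8%), and publication requires at least 30 decisive cases. Below is a minimal sketch of that logic. The names `headline` and `MIN_DECISIVE` are hypothetical, and sending ties to the first tool listed is an assumption.

```python
# Hypothetical sketch of the headline logic; not the site's actual code.
# MIN_DECISIVE = 30 comes from the page; everything else is illustrative.
MIN_DECISIVE = 30

def headline(a_name: str, a_wins: int, b_name: str, b_wins: int) -> dict:
    # Abstentions and "other tool" picks are excluded from decisive cases.
    decisive = a_wins + b_wins
    if decisive == 0:
        return {"decisive": 0, "leader": None, "share": None, "publishable": False}
    # Assumption: a tie is reported in favour of the first tool listed.
    leader, lead_wins = (a_name, a_wins) if a_wins >= b_wins else (b_name, b_wins)
    return {
        "decisive": decisive,
        "leader": leader,
        "share": lead_wins / decisive,
        "publishable": decisive >= MIN_DECISIVE,
    }

result = headline("LMSYS Chatbot Arena", 6, "Patronus AI", 7)
print(result["leader"], f"{result['share']:.1%}", result["publishable"])
# Patronus AI 53.8% False
```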

Statistics

Metric	Value
LMSYS Chatbot Arena wins	6
Patronus AI wins	7
Abstentions (no tool recommended)	90
Other tool chosen	2341
Decisive cases (picks of either tool)	13
LMSYS Chatbot Arena win rate (unweighted)	46.2%
95% CI for LMSYS Chatbot Arena win rate	23.2% to 70.9%
LMSYS Chatbot Arena win rate (weighted)	46.2%
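The reported 95% interval matches a Wilson score interval on 6 wins out of 13 decisive cases. The page does not state its method, so treating it as Wilson is an assumption. The sketch below reproduces the figures. `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion wins/n (z = 1.96 gives ~95%).

    Assumption: this is the method behind the page's CI; the page does not say.
    """
    if n == 0:
        raise ValueError("need at least one decisive case")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

wins, decisive = 6, 13
lo, hi = wilson_interval(wins, decisive)
print(f"win rate {wins / decisive:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
# win rate 46.2%, 95% CI 23.2% to 70.9%
```

The interval spans about 48 percentage points and includes 50%. At this sample size, the 46.2% point estimate does not distinguish the two tools.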

Per-model breakdown

Model	Tier	LMSYS Chatbot Arena wins	Patronus AI wins	None (abstained)	Other tool	LMSYS Chatbot Arena win rate
Llama 4 Scout	Small	6	0	4	111	100%
MiMo V2 Pro	Frontier	0	4	8	120	0%
Gemini 2.5 Pro	Frontier	0	2	9	121	0%
GPT 5.4	Frontier	0	1	0	131	0%
Claude Haiku 4.5	Small	0	0	1	124	n/a
Claude Opus 4.6	Frontier	0	0	0	132	n/a
Claude Sonnet 4.6	Frontier	0	0	0	132	n/a
DeepSeek R1 0528	Frontier	0	0	7	125	n/a
DeepSeek V3.2	Mid	0	0	22	106	n/a
Devstral 2 2512	Mid	0	0	4	121	n/a
Gemini 2.5 Flash	Small	0	0	1	126	n/a
GLM 5 Turbo	Frontier	0	0	19	113	n/a
GPT 5.3 Codex	Frontier	0	0	0	132	n/a
GPT 5.4 Mini	Mid	0	0	3	129	n/a
Kimi K2.5	Frontier	0	0	3	116	n/a
Llama 4 Maverick	Frontier	0	0	0	127	n/a
MiniMax M2.7	Frontier	0	0	5	124	n/a
Mistral Small 4	Mid	0	0	1	123	n/a
Qwen3 Coder Next	Mid	0	0	3	128	n/a
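The win-rate column appears to be LMSYS Chatbot Arena wins divided by decisive picks in that row. It shows n/a when a row has none. For example, the advanced ai-support-agent-platform prompt has 3 of 5 decisive picks, giving 60%. The sketch below uses the per-model rows that have decisive picks, plus one n/a row. `a_rate` and `rows` are hypothetical names.

```python
# Hypothetical sketch of the per-row win-rate column; not the site's code.
# Rows copied from the per-model table: (model, LMSYS wins, Patronus wins).
# Every other model has 0 picks for either tool, so omitting them does not
# change the totals.
rows = [
    ("Llama 4 Scout", 6, 0),
    ("MiMo V2 Pro", 0, 4),
    ("Gemini 2.5 Pro", 0, 2),
    ("GPT 5.4", 0, 1),
    ("Claude Haiku 4.5", 0, 0),
]

def a_rate(a_wins: int, b_wins: int) -> str:
    decisive = a_wins + b_wins
    if decisive == 0:
        return "n/a"  # no decisive picks in this row
    return f"{a_wins / decisive:.0%}"

for model, a, b in rows:
    print(f"{model}\t{a_rate(a, b)}")

# Row sums reproduce the overall counts: 6 LMSYS wins, 7 Patronus wins.
print(sum(a for _, a, _ in rows), sum(b for _, _, b in rows))
```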

Per-prompt breakdown

Prompt	Tier	LMSYS Chatbot Arena wins	Patronus AI wins	None (abstained)	Other tool	LMSYS Chatbot Arena win rate
ai-support-agent-platform	Advanced	3	2	5	400	60%
ai-revenue-ops-copilot	Advanced	0	4	2	394	0%
ai-support-agent-platform	Intermediate	2	0	5	402	100%
ai-revenue-ops-copilot	Beginner	1	0	10	399	100%
ai-support-agent-platform	Beginner	0	1	64	346	0%
ai-revenue-ops-copilot	Intermediate	0	0	4	400	n/a