Preseason

© 2026 Preseason. All rights reserved.


LMSYS Chatbot Arena vs Patronus AI

LMSYS Chatbot Arena: 46%
Patronus AI: 54%

Leading: Patronus AI (53.8%)

Insufficient data
This matchup has 13 decisive cases (minimum 30 required for publication).
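The publication rule above can be sketched as a simple threshold check. This is a minimal illustration, assuming a "decisive case" is one in which either of the two tools was chosen; the function and constant names are hypothetical and not taken from the site.

```python
# Threshold quoted on this page; the constant name is illustrative.
MIN_DECISIVE_CASES = 30


def is_publishable(wins_a: int, wins_b: int, minimum: int = MIN_DECISIVE_CASES) -> bool:
    """Return True when the matchup has enough decisive cases to publish.

    Assumption: decisive cases = wins for tool A + wins for tool B.
    """
    return wins_a + wins_b >= minimum


# 6 + 7 = 13 decisive cases, which is below the minimum of 30.
print(is_publishable(6, 7))  # False
```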

Statistics

| Metric | Value |
| --- | --- |
| LMSYS Chatbot Arena wins | 6 |
| Patronus AI wins | 7 |
| Abstains (no tool) | 90 |
| Other tool chosen | 2341 |
| Decisive cases | 13 |
| LMSYS Chatbot Arena win rate (unweighted) | 46.2% |
| 95% CI | 23.2% - 70.9% |
| LMSYS Chatbot Arena win rate (weighted) | 46.2% |
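The page does not say how the 95% CI is computed. The reported bounds are consistent with a Wilson score interval on 6 wins out of 13 decisive cases, so the sketch below assumes that method; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("need at least one decisive case")
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(6, 13)
print(f"{6 / 13:.1%}  [{low:.1%}, {high:.1%}]")  # 46.2%  [23.2%, 70.9%]
```

The interval spans 50%, which is in line with the "Insufficient data" label: with 13 decisive cases, neither tool can be called the leader with confidence.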


Per-model breakdown

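| Model | Tier | LMSYS Chatbot Arena | Patronus AI | None | Other | A rate (LMSYS Chatbot Arena) |
| --- | --- | --- | --- | --- | --- | --- |
| Llama 4 Scout | Small | 6 | 0 | 4 | 111 | 100% |
| MiMo V2 Pro | Frontier | 0 | 4 | 8 | 120 | 0% |
| Gemini 2.5 Pro | Frontier | 0 | 2 | 9 | 121 | 0% |
| GPT 5.4 | Frontier | 0 | 1 | 0 | 131 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 124 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 7 | 125 | n/a |
| DeepSeek V3.2 | Mid | 0 | 0 | 22 | 106 | n/a |
| Devstral 2 2512 | Mid | 0 | 0 | 4 | 121 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 1 | 126 | n/a |
| GLM 5 Turbo | Frontier | 0 | 0 | 19 | 113 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 132 | n/a |
| GPT 5.4 Mini | Mid | 0 | 0 | 3 | 129 | n/a |
| Kimi K2.5 | Frontier | 0 | 0 | 3 | 116 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 0 | 127 | n/a |
| MiniMax M2.7 | Frontier | 0 | 0 | 5 | 124 | n/a |
| Mistral Small 4 | Mid | 0 | 0 | 1 | 123 | n/a |
| Qwen3 Coder Next | Mid | 0 | 0 | 3 | 128 | n/a |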
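The "A rate" column appears to be the share of decisive cases won by tool A (LMSYS Chatbot Arena), shown as n/a when a model never chose either tool. A minimal sketch under that assumption, with hypothetical helper names:

```python
from typing import Optional


def a_rate(wins_a: int, wins_b: int) -> Optional[float]:
    """Share of decisive cases won by tool A, or None when there are none."""
    decisive = wins_a + wins_b
    return wins_a / decisive if decisive else None


def format_rate(rate: Optional[float]) -> str:
    """Render a rate the way the table does: whole percent, or n/a."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (LMSYS Chatbot Arena wins, Patronus AI wins) for three rows of the table.
rows = {
    "Llama 4 Scout": (6, 0),
    "MiMo V2 Pro": (0, 4),
    "Claude Opus 4.6": (0, 0),
}
for model, (wins_a, wins_b) in rows.items():
    print(model, format_rate(a_rate(wins_a, wins_b)))
# Llama 4 Scout 100%
# MiMo V2 Pro 0%
# Claude Opus 4.6 n/a
```

Note that "None" and "Other" responses do not enter the rate, so a 100% or 0% row can rest on very few decisive cases.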

Per-prompt breakdown

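| Prompt | Tier | LMSYS Chatbot Arena | Patronus AI | None | Other | A rate (LMSYS Chatbot Arena) |
| --- | --- | --- | --- | --- | --- | --- |
| ai-support-agent-platform | Advanced | 3 | 2 | 5 | 400 | 60% |
| ai-revenue-ops-copilot | Advanced | 0 | 4 | 2 | 394 | 0% |
| ai-support-agent-platform | Intermediate | 2 | 0 | 5 | 402 | 100% |
| ai-revenue-ops-copilot | Beginner | 1 | 0 | 10 | 399 | 100% |
| ai-support-agent-platform | Beginner | 0 | 1 | 64 | 346 | 0% |
| ai-revenue-ops-copilot | Intermediate | 0 | 0 | 4 | 400 | n/a |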
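As a sanity check, the per-prompt rows can be summed and compared with the headline statistics. The sketch below assumes each row reads as (LMSYS Chatbot Arena wins, Patronus AI wins, None, Other); the variable names are illustrative.

```python
# (prompt, tier, wins_a, wins_b, none, other), read from the per-prompt breakdown.
PROMPT_ROWS = [
    ("ai-support-agent-platform", "Advanced", 3, 2, 5, 400),
    ("ai-revenue-ops-copilot", "Advanced", 0, 4, 2, 394),
    ("ai-support-agent-platform", "Intermediate", 2, 0, 5, 402),
    ("ai-revenue-ops-copilot", "Beginner", 1, 0, 10, 399),
    ("ai-support-agent-platform", "Beginner", 0, 1, 64, 346),
    ("ai-revenue-ops-copilot", "Intermediate", 0, 0, 4, 400),
]

# Column-wise sums over the four count columns.
totals = [sum(column) for column in zip(*(row[2:] for row in PROMPT_ROWS))]
wins_a, wins_b, abstains, other = totals

print(totals)           # [6, 7, 90, 2341]
print(wins_a + wins_b)  # 13
```

The sums agree with the Statistics section: 6 and 7 wins, 90 abstains, 2341 other-tool choices, and 13 decisive cases.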