Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Arize AI vs Braintrust

Arize AI 13% vs Braintrust 87% (share of decisive cases)

Leading: Braintrust (86.6%)

Statistics

Metric                          Value
Arize AI wins                   47
Braintrust wins                 303
Abstains (no tool)              36
Other tool chosen               614
Decisive cases                  350
Arize AI win rate (unweighted)  13.4%
95% CI (Arize AI win rate)      10.3% - 17.4%
Arize AI win rate (weighted)    13.4%
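
The page does not state how the 95% CI is computed. A Wilson score interval on the 350 decisive cases (z = 1.96) reproduces the reported 10.3% - 17.4%, so the sketch below assumes that method; `wilson_interval` is a name chosen for illustration. The weighting behind the weighted win rate is not described here, so it is not reproduced.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion wins / n (assumed method)."""
    if n == 0:
        raise ValueError("need at least one decisive case")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


arize_wins, braintrust_wins = 47, 303
# Decisive cases = runs where one of the two tools was picked;
# abstains and other-tool picks are excluded from the denominator.
decisive = arize_wins + braintrust_wins
rate = arize_wins / decisive
lo, hi = wilson_interval(arize_wins, decisive)

print(f"decisive={decisive} Arize AI rate={rate:.1%} 95% CI={lo:.1%} - {hi:.1%}")
# → decisive=350 Arize AI rate=13.4% 95% CI=10.3% - 17.4%
print(f"Braintrust share={braintrust_wins / decisive:.1%}")
# → Braintrust share=86.6%
```

Leaving abstains and other-tool picks out of the denominator is why the rates are computed over 350 cases rather than all 1,000 runs (47 + 303 + 36 + 614).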

Per-model breakdown

Model              Tier      Arize AI  Braintrust  None  Other  Arize AI rate
GPT 5.3 Codex      Frontier  0         51          0     3      0%
Claude Opus 4.6    Frontier  0         46          0     8      0%
Claude Haiku 4.5   Small     0         44          1     7      0%
Kimi K2.5          Frontier  0         44          0     4      0%
GPT 5.4            Frontier  0         39          0     15     0%
GLM 5 Turbo        Frontier  0         32          7     15     0%
Claude Sonnet 4.6  Frontier  0         28          0     26     0%
Llama 4 Scout      Small     19        0           3     27     100%
MiniMax M2.7       Frontier  0         17          1     34     0%
Gemini 2.5 Flash   Small     12        0           0     40     100%
Gemini 2.5 Pro     Frontier  5         0           6     43     100%
Llama 4 Maverick   Frontier  5         0           0     48     100%
Devstral 2 2512    Mid       4         0           1     46     100%
DeepSeek R1 0528   Frontier  1         0           2     51     100%
Qwen3 Coder Next   Mid       1         0           3     50     100%
GPT 5.4 Mini       Mid       0         1           1     52     0%
MiMo V2 Pro        Frontier  0         1           2     51     0%
DeepSeek V3.2      Mid       0         0           9     43     n/a
Mistral Small 4    Mid       0         0           0     51     n/a

None = abstained (no tool); Other = chose a different tool. Arize AI rate = Arize AI wins / (Arize AI wins + Braintrust wins); n/a means the model chose neither tool.
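
The Arize AI rate column is Arize AI's share of each row's decisive picks. A minimal sketch of that calculation, assuming whole-percent rounding and "n/a" when a model picked neither tool (`a_rate` is an illustrative name; the counts are copied from three rows of the table):

```python
def a_rate(arize: int, braintrust: int) -> str:
    """Arize AI share of decisive picks as a whole percent, or 'n/a' if there were none."""
    decisive = arize + braintrust
    if decisive == 0:
        return "n/a"
    return f"{arize / decisive:.0%}"


# (Arize AI, Braintrust) counts copied from the per-model table
rows = {
    "Llama 4 Scout": (19, 0),
    "GPT 5.4 Mini": (0, 1),
    "DeepSeek V3.2": (0, 0),
}
for model, (arize, braintrust) in rows.items():
    print(f"{model}: {a_rate(arize, braintrust)}")
# → Llama 4 Scout: 100%
# → GPT 5.4 Mini: 0%
# → DeepSeek V3.2: n/a
```

Rates built on very few decisive picks say little on their own: DeepSeek R1 0528 and Qwen3 Coder Next each reach 100% from a single Arize AI win.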

Per-prompt breakdown

Prompt                     Tier          Arize AI  Braintrust  None  Other  Arize AI rate
ai-revenue-ops-copilot     Advanced      10        63          1     90     14%
ai-revenue-ops-copilot     Beginner      5         63          4     96     7%
ai-support-agent-platform  Advanced      11        52          1     105    17%
ai-revenue-ops-copilot     Intermediate  6         49          1     108    11%
ai-support-agent-platform  Beginner      7         39          25    98     15%
ai-support-agent-platform  Intermediate  8         37          4     117    18%
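
Both breakdowns partition the same 1,000 runs, so summing the rows of either table reproduces the counts in the Statistics table. A short check over the per-prompt rows (counts copied from the table; `per_prompt` and `totals` are illustrative names):

```python
# (Arize AI, Braintrust, None, Other) per prompt and tier, copied from the table
per_prompt = {
    ("ai-revenue-ops-copilot", "Advanced"): (10, 63, 1, 90),
    ("ai-revenue-ops-copilot", "Beginner"): (5, 63, 4, 96),
    ("ai-support-agent-platform", "Advanced"): (11, 52, 1, 105),
    ("ai-revenue-ops-copilot", "Intermediate"): (6, 49, 1, 108),
    ("ai-support-agent-platform", "Beginner"): (7, 39, 25, 98),
    ("ai-support-agent-platform", "Intermediate"): (8, 37, 4, 117),
}

# Column-wise sums across all prompt/tier rows
totals = [sum(col) for col in zip(*per_prompt.values())]
arize, braintrust, abstain, other = totals

print(totals)                                 # [47, 303, 36, 614]
print(sum(totals))                            # 1000
print(f"{arize / (arize + braintrust):.1%}")  # 13.4%
```

Pooling the rows before dividing gives the headline unweighted rate; averaging the six per-row percentages instead would weight each prompt/tier equally regardless of how many decisive cases it had.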