Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Arize AI vs Braintrust

Arize AI 13% vs Braintrust 87%

Leading: Braintrust (87.3%)

Statistics

| Metric | Value |
|---|---|
| Arize AI wins | 105 |
| Braintrust wins | 720 |
| Abstains (no tool) | 90 |
| Other tool chosen | 1529 |
| Decisive cases | 825 |
| Arize AI win rate (unweighted) | 12.7% |
| 95% CI | 10.6% - 15.2% |
| Arize AI win rate (weighted) | 12.7% |
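
The unweighted win rate and its interval can be reproduced from the two win counts. The sketch below assumes that decisive cases are simply Arize AI wins plus Braintrust wins (105 + 720 = 825, matching the table) and that the 95% CI is a Wilson score interval. The page does not name the interval method; Wilson is an assumption that happens to reproduce the listed bounds. The function name `wilson_interval` is illustrative and not part of the site.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion wins / n."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


arize_wins = 105
braintrust_wins = 720

# Assumption: abstains and "other tool" picks are excluded from the denominator.
decisive = arize_wins + braintrust_wins
win_rate = arize_wins / decisive
low, high = wilson_interval(arize_wins, decisive)

print(f"{win_rate:.1%} (95% CI {low:.1%} - {high:.1%})")
```

The weighted win rate is not reproduced here because the page does not describe the weighting scheme.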


Per-model breakdown

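| Model | Tier | Arize AI | Braintrust | None | Other | A rate |
|---|---|---|---|---|---|---|
| GPT 5.3 Codex | Frontier | 0 | 126 | 0 | 6 | 0% |
| Claude Opus 4.6 | Frontier | 0 | 113 | 0 | 19 | 0% |
| Kimi K2.5 | Frontier | 0 | 109 | 3 | 7 | 0% |
| Claude Haiku 4.5 | Small | 0 | 103 | 1 | 21 | 0% |
| GPT 5.4 | Frontier | 0 | 92 | 0 | 40 | 0% |
| GLM 5 Turbo | Frontier | 0 | 84 | 19 | 29 | 0% |
| Claude Sonnet 4.6 | Frontier | 0 | 58 | 0 | 74 | 0% |
| Llama 4 Scout | Small | 40 | 0 | 4 | 77 | 100% |
| MiniMax M2.7 | Frontier | 1 | 31 | 5 | 92 | 3% |
| Gemini 2.5 Flash | Small | 27 | 0 | 1 | 99 | 100% |
| Llama 4 Maverick | Frontier | 13 | 0 | 0 | 114 | 100% |
| Gemini 2.5 Pro | Frontier | 10 | 0 | 9 | 113 | 100% |
| Devstral 2 2512 | Mid | 8 | 0 | 4 | 113 | 100% |
| DeepSeek R1 0528 | Frontier | 3 | 0 | 7 | 122 | 100% |
| GPT 5.4 Mini | Mid | 0 | 3 | 3 | 126 | 0% |
| Mistral Small 4 | Mid | 2 | 0 | 1 | 121 | 100% |
| Qwen3 Coder Next | Mid | 1 | 0 | 3 | 127 | 100% |
| MiMo V2 Pro | Frontier | 0 | 1 | 8 | 123 | 0% |
| DeepSeek V3.2 | Mid | 0 | 0 | 22 | 106 | n/a |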
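
The "A rate" column appears to be the Arize AI share of decisive cases for each row, with "n/a" when a model never picked either tool. A minimal sketch under that assumption follows; the helper name `a_rate` and the rounding to whole percentages are illustrative choices, not taken from the site.

```python
def a_rate(arize: int, braintrust: int) -> str:
    """Arize AI share of decisive cases, as a whole percentage or 'n/a'."""
    decisive = arize + braintrust
    if decisive == 0:
        # Neither tool was chosen, so no rate can be computed.
        return "n/a"
    return f"{round(100 * arize / decisive)}%"


# A few rows from the per-model table: (model, Arize AI wins, Braintrust wins).
sample_rows = [
    ("GPT 5.3 Codex", 0, 126),
    ("Llama 4 Scout", 40, 0),
    ("MiniMax M2.7", 1, 31),
    ("DeepSeek V3.2", 0, 0),
]

rates = {model: a_rate(a, b) for model, a, b in sample_rows}
for model, rate in rates.items():
    print(f"{model}: {rate}")
```

Because the "None" and "Other" counts are excluded, a model with very few decisive cases (for example Qwen3 Coder Next, with a single one) still shows a rate of 0% or 100%.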

Per-prompt breakdown

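| Prompt | Tier | Arize AI | Braintrust | None | Other | A rate |
|---|---|---|---|---|---|---|
| ai-revenue-ops-copilot | Advanced | 23 | 153 | 2 | 222 | 13% |
| ai-revenue-ops-copilot | Beginner | 15 | 143 | 10 | 242 | 9% |
| ai-support-agent-platform | Advanced | 18 | 123 | 5 | 264 | 13% |
| ai-revenue-ops-copilot | Intermediate | 11 | 122 | 4 | 267 | 8% |
| ai-support-agent-platform | Beginner | 18 | 92 | 64 | 237 | 16% |
| ai-support-agent-platform | Intermediate | 20 | 87 | 5 | 297 | 19% |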
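
The per-prompt rows can be pooled across tiers to compare the two prompts directly. The sketch below sums the Arize AI and Braintrust counts for each prompt name and divides by the pooled decisive cases. This pooling is an illustration only; the page itself reports the rows per tier and does not publish a pooled figure.

```python
from collections import defaultdict

# (prompt, tier, Arize AI wins, Braintrust wins) copied from the per-prompt table.
rows = [
    ("ai-revenue-ops-copilot", "Advanced", 23, 153),
    ("ai-revenue-ops-copilot", "Beginner", 15, 143),
    ("ai-support-agent-platform", "Advanced", 18, 123),
    ("ai-revenue-ops-copilot", "Intermediate", 11, 122),
    ("ai-support-agent-platform", "Beginner", 18, 92),
    ("ai-support-agent-platform", "Intermediate", 20, 87),
]

# Accumulate [arize, braintrust] totals per prompt name, ignoring the tier.
totals = defaultdict(lambda: [0, 0])
for prompt, _tier, arize, braintrust in rows:
    totals[prompt][0] += arize
    totals[prompt][1] += braintrust

pooled = {prompt: a / (a + b) for prompt, (a, b) in totals.items()}
for prompt, rate in pooled.items():
    print(f"{prompt}: {rate:.1%}")
```

Summing the same columns over all six rows gives 105 and 720, which agrees with the headline win counts in the Statistics section.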