Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Arize Phoenix vs Promptfoo

Arize Phoenix: 49%
Promptfoo: 51%

Leading: Promptfoo (51.2%)

Statistics

| Metric | Value |
| --- | --- |
| Arize Phoenix wins | 21 |
| Promptfoo wins | 22 |
| Abstains (no tool) | 90 |
| Other tool chosen | 2311 |
| Decisive cases | 43 |
| Arize Phoenix win rate (unweighted) | 48.8% |
| 95% CI | 34.6% - 63.2% |
| Arize Phoenix win rate (weighted) | 48.8% |
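
As a rough check on the figures above, the unweighted win rate is Arize Phoenix wins divided by decisive cases (21 of 43). The page does not say how the 95% CI is computed. The sketch below assumes a Wilson score interval with z = 1.96, which happens to reproduce the reported bounds. The function name `wilson_interval` is illustrative, not something taken from the site.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion wins / n.

    Assumption: the page does not name its CI method; Wilson is a guess
    that matches the published numbers.
    """
    if n == 0:
        raise ValueError("no decisive cases, the win rate is undefined")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the Statistics table.
phoenix_wins, promptfoo_wins = 21, 22
decisive = phoenix_wins + promptfoo_wins  # abstains and other tools are excluded
rate = phoenix_wins / decisive
low, high = wilson_interval(phoenix_wins, decisive)

print(f"{rate:.1%} (95% CI {low:.1%} - {high:.1%})")
# prints: 48.8% (95% CI 34.6% - 63.2%)
```

The interval spans 50%, so with only 43 decisive cases the 51.2% lead for Promptfoo is well within sampling noise.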

Per-model breakdown

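| Model | Tier | Arize Phoenix | Promptfoo | None | Other | Arize Phoenix win rate |
| --- | --- | --- | --- | --- | --- | --- |
| Mistral Small 4 | Mid | 2 | 7 | 1 | 114 | 22% |
| Qwen3 Coder Next | Mid | 8 | 0 | 3 | 120 | 100% |
| GPT 5.4 Mini | Mid | 4 | 2 | 3 | 123 | 67% |
| MiMo V2 Pro | Frontier | 0 | 6 | 8 | 118 | 0% |
| MiniMax M2.7 | Frontier | 5 | 0 | 5 | 119 | 100% |
| GLM 5 Turbo | Frontier | 0 | 5 | 19 | 108 | 0% |
| Llama 4 Scout | Small | 2 | 0 | 4 | 115 | 100% |
| Kimi K2.5 | Frontier | 0 | 2 | 3 | 114 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 124 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 0 | 132 | n/a |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 7 | 125 | n/a |
| DeepSeek V3.2 | Mid | 0 | 0 | 22 | 106 | n/a |
| Devstral 2 2512 | Mid | 0 | 0 | 4 | 121 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 1 | 126 | n/a |
| Gemini 2.5 Pro | Frontier | 0 | 0 | 9 | 123 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 132 | n/a |
| GPT 5.4 | Frontier | 0 | 0 | 0 | 132 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 0 | 127 | n/a |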
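
The rate column in the breakdown above appears to be Arize Phoenix wins divided by decisive cases (Arize Phoenix wins plus Promptfoo wins), with n/a when a model never picked either tool. A minimal sketch of that calculation, using three rows from the table; the helper names `win_rate` and `format_rate` are hypothetical:

```python
from typing import Optional


def win_rate(phoenix_wins: int, promptfoo_wins: int) -> Optional[float]:
    """Arize Phoenix share of decisive cases, or None when there are none."""
    decisive = phoenix_wins + promptfoo_wins
    return phoenix_wins / decisive if decisive else None


def format_rate(rate: Optional[float]) -> str:
    """Render a rate as a whole-number percentage, or n/a when undefined."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (Arize Phoenix wins, Promptfoo wins) for a few rows of the per-model table.
rows = {
    "Mistral Small 4": (2, 7),
    "GPT 5.4 Mini": (4, 2),
    "Claude Opus 4.6": (0, 0),
}

for model, (phoenix, promptfoo) in rows.items():
    print(f"{model}: {format_rate(win_rate(phoenix, promptfoo))}")
```

The "None" and "Other" counts do not enter the rate, which is why models with many runs but no decisive cases show n/a rather than 0%.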

Per-prompt breakdown

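| Prompt | Tier | Arize Phoenix | Promptfoo | None | Other | Arize Phoenix win rate |
| --- | --- | --- | --- | --- | --- | --- |
| ai-revenue-ops-copilot | Beginner | 6 | 4 | 10 | 390 | 60% |
| ai-revenue-ops-copilot | Advanced | 4 | 6 | 23 | 388 | 40% |
| ai-revenue-ops-copilot | Intermediate | 3 | 6 | 4 | 391 | 33% |
| ai-support-agent-platform | Beginner | 7 | 0 | 64 | 340 | 100% |
| ai-support-agent-platform | Intermediate | 1 | 5 | 5 | 398 | 17% |
| ai-support-agent-platform | Advanced | 0 | 1 | 5 | 404 | 0% |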