Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Weights & Biases vs Ragas

Weights & Biases: 48%
Ragas: 52%

Leading: Ragas (51.5%)

Statistics

| Metric | Value |
|---|---|
| Weights & Biases wins | 96 |
| Ragas wins | 102 |
| Abstains (no tool) | 90 |
| Other tool chosen | 2156 |
| Decisive cases | 198 |
| Weights & Biases win rate (unweighted) | 48.5% |
| 95% CI | 41.6% - 55.4% |
| Weights & Biases win rate (weighted) | 48.5% |
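The win rate and interval above can be recomputed from the two win counts. The page does not say which interval method it uses; the sketch below assumes a Wilson score interval at z = 1.96, which happens to reproduce the reported bounds. The function names are illustrative and not taken from the site.

```python
import math


def win_rate(wins_a: int, wins_b: int) -> float:
    """Share of decisive cases (A wins + B wins) won by tool A."""
    decisive = wins_a + wins_b
    if decisive == 0:
        raise ValueError("no decisive cases")
    return wins_a / decisive


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


wb_wins, ragas_wins = 96, 102  # counts from the Statistics table
rate = win_rate(wb_wins, ragas_wins)
low, high = wilson_interval(wb_wins, wb_wins + ragas_wins)
print(f"{rate:.1%} (95% CI {low:.1%} - {high:.1%})")
# prints: 48.5% (95% CI 41.6% - 55.4%)
```

Abstains and "other tool" responses are excluded from the denominator, so the interval reflects only the 198 decisive cases. Because it spans 50%, the 51.5% lead for Ragas is not clearly separated from an even split.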

Per-model breakdown

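| Model | Tier | Weights & Biases | Ragas | None | Other | A rate (Weights & Biases) |
|---|---|---|---|---|---|---|
| Gemini 2.5 Flash | Small | 41 | 0 | 1 | 85 | 100% |
| MiMo V2 Pro | Frontier | 10 | 24 | 8 | 90 | 29% |
| Devstral 2 2512 | Mid | 25 | 1 | 4 | 95 | 96% |
| MiniMax M2.7 | Frontier | 0 | 24 | 5 | 100 | 0% |
| Claude Opus 4.6 | Frontier | 0 | 18 | 0 | 114 | 0% |
| Llama 4 Scout | Small | 11 | 3 | 4 | 103 | 79% |
| Mistral Small 4 | Mid | 0 | 9 | 1 | 114 | 0% |
| GPT 5.4 Mini | Mid | 1 | 7 | 3 | 121 | 13% |
| DeepSeek R1 0528 | Frontier | 7 | 0 | 7 | 118 | 100% |
| Claude Sonnet 4.6 | Frontier | 0 | 6 | 0 | 126 | 0% |
| GLM 5 Turbo | Frontier | 0 | 6 | 19 | 107 | 0% |
| DeepSeek V3.2 | Mid | 0 | 2 | 22 | 104 | 0% |
| Gemini 2.5 Pro | Frontier | 1 | 0 | 9 | 122 | 100% |
| GPT 5.4 | Frontier | 0 | 1 | 0 | 131 | 0% |
| Kimi K2.5 | Frontier | 0 | 1 | 3 | 115 | 0% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 124 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 132 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 0 | 127 | n/a |
| Qwen3 Coder Next | Mid | 0 | 0 | 3 | 128 | n/a |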
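The A rate column appears to be the Weights & Biases share of each model's decisive cases, shown as n/a when a model never picked either tool. A minimal sketch under that assumption follows. The helper names are hypothetical, and the half-up rounding is also an assumption, chosen because it matches the 13% shown for GPT 5.4 Mini (1 of 8 decisive cases, or 12.5%).

```python
from typing import Optional


def a_rate_percent(wb_wins: int, ragas_wins: int) -> Optional[int]:
    """Weights & Biases share of decisive cases as a whole percent, or None."""
    decisive = wb_wins + ragas_wins
    if decisive == 0:
        return None  # neither tool was ever chosen, so no rate is defined
    # Integer half-up rounding: floor(100 * wb_wins / decisive + 0.5)
    return (200 * wb_wins + decisive) // (2 * decisive)


def format_rate(rate: Optional[int]) -> str:
    return "n/a" if rate is None else f"{rate}%"


# (Weights & Biases wins, Ragas wins) for a few rows of the per-model table
rows = {
    "MiMo V2 Pro": (10, 24),
    "GPT 5.4 Mini": (1, 7),
    "Devstral 2 2512": (25, 1),
    "Claude Haiku 4.5": (0, 0),
}
for model, (wb, ragas) in rows.items():
    print(model, format_rate(a_rate_percent(wb, ragas)))
```

Several of the 0% and 100% rows rest on very few decisive cases (for example, one case each for GPT 5.4 and Gemini 2.5 Pro), so the per-model rates are best read together with the raw counts.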

Per-prompt breakdown

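| Prompt | Tier | Weights & Biases | Ragas | None | Other | A rate (Weights & Biases) |
|---|---|---|---|---|---|---|
| ai-support-agent-platform | Advanced | 19 | 36 | 5 | 350 | 35% |
| ai-support-agent-platform | Beginner | 12 | 34 | 64 | 301 | 26% |
| ai-revenue-ops-copilot | Advanced | 31 | 9 | 2 | 358 | 78% |
| ai-revenue-ops-copilot | Beginner | 14 | 13 | 10 | 373 | 52% |
| ai-support-agent-platform | Intermediate | 10 | 8 | 5 | 386 | 56% |
| ai-revenue-ops-copilot | Intermediate | 10 | 2 | 4 | 388 | 83% |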
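As a consistency check, the per-prompt rows can be summed and compared with the overall Statistics counts, then pooled across tiers to see how each prompt splits between the two tools. This is a rough sketch: the row values are copied from the per-prompt breakdown, and the pooling by prompt is an illustration rather than something the page itself reports.

```python
from collections import defaultdict

# (prompt, tier, Weights & Biases wins, Ragas wins, none, other)
ROWS = [
    ("ai-support-agent-platform", "Advanced", 19, 36, 5, 350),
    ("ai-support-agent-platform", "Beginner", 12, 34, 64, 301),
    ("ai-revenue-ops-copilot", "Advanced", 31, 9, 2, 358),
    ("ai-revenue-ops-copilot", "Beginner", 14, 13, 10, 373),
    ("ai-support-agent-platform", "Intermediate", 10, 8, 5, 386),
    ("ai-revenue-ops-copilot", "Intermediate", 10, 2, 4, 388),
]

# Column totals should match the Statistics table: 96, 102, 90, 2156.
totals = [sum(row[i] for row in ROWS) for i in range(2, 6)]
print(totals)

# Pool the three tiers of each prompt into [W&B wins, Ragas wins].
by_prompt = defaultdict(lambda: [0, 0])
for prompt, _tier, wb, ragas, _none, _other in ROWS:
    by_prompt[prompt][0] += wb
    by_prompt[prompt][1] += ragas

for prompt, (wb, ragas) in by_prompt.items():
    print(f"{prompt}: W&B {wb}, Ragas {ragas}, A rate {wb / (wb + ragas):.1%}")
```

Pooled this way, Ragas leads on ai-support-agent-platform (78 wins to 41) and Weights & Biases leads on ai-revenue-ops-copilot (55 wins to 24), so the near-even overall split masks a strong dependence on the prompt.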