Preseason

© 2026 Preseason. All rights reserved.

LLM Observability

Braintrust vs Vellum

Braintrust 43% vs Vellum 57%

Leading: Vellum (57.1%)

Insufficient data
This matchup has 7 decisive cases (minimum 30 required for publication).

Statistics

Metric                              Value
Braintrust wins                     3
Vellum wins                         4
Abstains (no tool)                  22
Other tool chosen                   980
Decisive cases                      7
Braintrust win rate (unweighted)    42.9%
95% CI                              15.8% - 75.0%
Braintrust win rate (weighted)      42.9%
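
The headline figures above can be reproduced from the two win counts. The sketch below is illustrative only: the page does not say how the 95% CI is computed, and the Wilson score interval with z = 1.96 is an assumption that happens to match the published bounds. The names `wilson_interval` and `MIN_DECISIVE` are hypothetical, and the weighted win rate is not reproduced because the weighting scheme is not described here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("need at least one decisive case")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts taken from the Statistics table.
braintrust_wins, vellum_wins = 3, 4
MIN_DECISIVE = 30  # publication threshold stated on the page

# Abstains and "other tool" picks are excluded from decisive cases.
decisive = braintrust_wins + vellum_wins
win_rate = braintrust_wins / decisive
low, high = wilson_interval(braintrust_wins, decisive)
publishable = decisive >= MIN_DECISIVE

print(f"decisive cases: {decisive}")
print(f"Braintrust win rate: {win_rate:.1%}")
print(f"95% CI: {low:.1%} - {high:.1%}")
print(f"meets publication minimum: {publishable}")
```

With only 7 decisive cases the interval spans roughly 59 percentage points, which is why the matchup is flagged as having insufficient data.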

Per-model breakdown

Model              Tier      Braintrust  Vellum  None  Other  A rate (Braintrust)
Devstral 2 2512    Mid       0           4       8     38     0%
Claude Haiku 4.5   Small     3           0       0     51     100%
Claude Opus 4.6    Frontier  0           0       0     54     n/a
Claude Sonnet 4.6  Frontier  0           0       0     54     n/a
DeepSeek R1 0528   Frontier  0           0       1     53     n/a
DeepSeek V3.2      Mid       0           0       0     54     n/a
Gemini 2.5 Flash   Small     0           0       0     53     n/a
Gemini 2.5 Pro     Frontier  0           0       4     50     n/a
GLM 5 Turbo        Frontier  0           0       0     54     n/a
GPT 5.3 Codex      Frontier  0           0       0     54     n/a
GPT 5.4            Frontier  0           0       0     54     n/a
GPT 5.4 Mini       Mid       0           0       1     53     n/a
Kimi K2.5          Frontier  0           0       1     47     n/a
Llama 4 Maverick   Frontier  0           0       0     54     n/a
Llama 4 Scout      Small     0           0       5     46     n/a
MiMo V2 Pro        Frontier  0           0       0     54     n/a
MiniMax M2.7       Frontier  0           0       1     52     n/a
Mistral Small 4    Mid       0           0       0     52     n/a
Qwen3 Coder Next   Mid       0           0       1     53     n/a
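
The "A rate" column appears to be Braintrust wins divided by decisive cases (Braintrust plus Vellum wins), shown as n/a when a model produced no decisive case. A minimal sketch of that reading follows; the function names `a_rate` and `format_rate` are hypothetical, and the formula is inferred from the rows rather than stated by the page.

```python
from typing import Optional


def a_rate(a_wins: int, b_wins: int) -> Optional[float]:
    """Share of decisive cases won by tool A, or None if there are none."""
    decisive = a_wins + b_wins
    if decisive == 0:
        return None
    return a_wins / decisive


def format_rate(rate: Optional[float]) -> str:
    """Render a rate the way the table does: whole percent or n/a."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (Braintrust wins, Vellum wins) for three rows of the per-model table.
rows = {
    "Devstral 2 2512": (0, 4),
    "Claude Haiku 4.5": (3, 0),
    "Claude Opus 4.6": (0, 0),
}

formatted = {model: format_rate(a_rate(a, b)) for model, (a, b) in rows.items()}
for model, text in formatted.items():
    print(f"{model}: {text}")
```

Under this reading, all 7 decisive cases come from just two models, so the headline rate reflects those two models rather than the full panel.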

Per-prompt breakdown

Prompt                     Tier          Braintrust  Vellum  None  Other  A rate (Braintrust)
ai-support-agent-platform  Intermediate  0           4       1     166    0%
ai-revenue-ops-copilot     Beginner      3           0       15    152    100%
ai-revenue-ops-copilot     Intermediate  0           0       0     164    n/a
ai-revenue-ops-copilot     Advanced      0           0       1     163    n/a
ai-support-agent-platform  Beginner      0           0       5     164    n/a
ai-support-agent-platform  Advanced      0           0       0     171    n/a