Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

UpTrain vs Vellum

UpTrain 44% vs Vellum 56%

Leading: Vellum (55.6%)

Insufficient data
This matchup has 9 decisive cases (minimum 30 required for publication).

Statistics

| Metric | Value |
| --- | --- |
| UpTrain wins | 4 |
| Vellum wins | 5 |
| Abstains (no tool) | 36 |
| Other tool chosen | 955 |
| Decisive cases | 9 |
| UpTrain win rate (unweighted) | 44.4% |
| 95% CI | 18.9% - 73.3% |
| UpTrain win rate (weighted) | 44.4% |
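
The unweighted win rate and its interval can be reproduced from the two win counts. The sketch below assumes the 95% CI is a Wilson score interval; the page does not name its method, but the Wilson formula yields the same 18.9% - 73.3% for 4 wins out of 9 decisive cases. The function name `wilson_interval` and the constant `MIN_DECISIVE` are illustrative, not taken from the site.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion wins/n (z=1.96 for ~95%)."""
    if n == 0:
        raise ValueError("need at least one decisive case")
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts from the Statistics table.
uptrain_wins, vellum_wins = 4, 5

# Decisive cases are those where one of the two tools was chosen.
decisive = uptrain_wins + vellum_wins
win_rate = uptrain_wins / decisive
low, high = wilson_interval(uptrain_wins, decisive)

# Assumed name for the publication threshold quoted on the page (30 cases).
MIN_DECISIVE = 30
publishable = decisive >= MIN_DECISIVE

print(f"UpTrain win rate: {win_rate:.1%} (95% CI {low:.1%} - {high:.1%})")
print(f"Publishable: {publishable}")
```

With only 9 decisive cases the interval spans more than 50 percentage points, which is why the matchup falls under the insufficient-data rule.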

Per-model breakdown

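| Model | Tier | UpTrain | Vellum | None | Other | A rate |
| --- | --- | --- | --- | --- | --- | --- |
| Devstral 2 2512 | Mid | 0 | 5 | 1 | 45 | 0% |
| DeepSeek V3.2 | Mid | 4 | 0 | 9 | 39 | 100% |
| Claude Haiku 4.5 | Small | 0 | 0 | 1 | 51 | n/a |
| Claude Opus 4.6 | Frontier | 0 | 0 | 0 | 54 | n/a |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 0 | 54 | n/a |
| DeepSeek R1 0528 | Frontier | 0 | 0 | 2 | 52 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 0 | 52 | n/a |
| Gemini 2.5 Pro | Frontier | 0 | 0 | 6 | 48 | n/a |
| GLM 5 Turbo | Frontier | 0 | 0 | 7 | 47 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 0 | 54 | n/a |
| GPT 5.4 | Frontier | 0 | 0 | 0 | 54 | n/a |
| GPT 5.4 Mini | Mid | 0 | 0 | 1 | 53 | n/a |
| Kimi K2.5 | Frontier | 0 | 0 | 0 | 48 | n/a |
| Llama 4 Maverick | Frontier | 0 | 0 | 0 | 53 | n/a |
| Llama 4 Scout | Small | 0 | 0 | 3 | 46 | n/a |
| MiMo V2 Pro | Frontier | 0 | 0 | 2 | 52 | n/a |
| MiniMax M2.7 | Frontier | 0 | 0 | 1 | 51 | n/a |
| Mistral Small 4 | Mid | 0 | 0 | 0 | 51 | n/a |
| Qwen3 Coder Next | Mid | 0 | 0 | 3 | 51 | n/a |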

Per-prompt breakdown

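| Prompt | Tier | UpTrain | Vellum | None | Other | A rate |
| --- | --- | --- | --- | --- | --- | --- |
| ai-support-agent-platform | Intermediate | 0 | 4 | 4 | 158 | 0% |
| ai-revenue-ops-copilot | Beginner | 2 | 1 | 4 | 161 | 67% |
| ai-support-agent-platform | Advanced | 2 | 0 | 1 | 166 | 100% |
| ai-revenue-ops-copilot | Intermediate | 0 | 0 | 1 | 163 | n/a |
| ai-revenue-ops-copilot | Advanced | 0 | 0 | 1 | 163 | n/a |
| ai-support-agent-platform | Beginner | 0 | 0 | 25 | 144 | n/a |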
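
As a consistency check, the per-prompt rows can be summed back to the headline statistics. The sketch below assumes the last column ("A rate") is UpTrain wins divided by decisive cases for that row, shown as n/a when neither tool was chosen; this reading is inferred from the figures (for example 2 / (2 + 1) = 67%), not stated on the page. The `Row` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    prompt: str
    tier: str
    uptrain: int
    vellum: int
    none: int
    other: int

    @property
    def a_rate(self) -> Optional[float]:
        """UpTrain share of decisive cases, or None when there are none."""
        decisive = self.uptrain + self.vellum
        return self.uptrain / decisive if decisive else None


# Rows from the per-prompt breakdown.
rows = [
    Row("ai-support-agent-platform", "Intermediate", 0, 4, 4, 158),
    Row("ai-revenue-ops-copilot", "Beginner", 2, 1, 4, 161),
    Row("ai-support-agent-platform", "Advanced", 2, 0, 1, 166),
    Row("ai-revenue-ops-copilot", "Intermediate", 0, 0, 1, 163),
    Row("ai-revenue-ops-copilot", "Advanced", 0, 0, 1, 163),
    Row("ai-support-agent-platform", "Beginner", 0, 0, 25, 144),
]

# Column totals should match the Statistics table (4, 5, 36, 955).
totals = {
    "uptrain": sum(r.uptrain for r in rows),
    "vellum": sum(r.vellum for r in rows),
    "none": sum(r.none for r in rows),
    "other": sum(r.other for r in rows),
}
print(totals)

for r in rows:
    rate = "n/a" if r.a_rate is None else f"{r.a_rate:.0%}"
    print(f"{r.prompt} ({r.tier}): {rate}")
```

The same check applied to the per-model breakdown gives the same totals, and shows that all nine decisive cases come from just two models (Devstral 2 2512 and DeepSeek V3.2).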