Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Vellum vs Helicone

Vellum 45% vs Helicone 55% (share of decisive cases)

Leading: Helicone (54.5%)

Insufficient data
This matchup has 11 decisive cases (minimum 30 required for publication).

Statistics

Metric                        Value
Vellum wins                   5
Helicone wins                 6
Abstains (no tool)            36
Other tool chosen             953
Decisive cases                11
Vellum win rate (unweighted)  45.5%
95% CI                        21.3% to 72.0%
Vellum win rate (weighted)    45.5%
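The headline numbers follow from the counts above: decisive cases are Vellum wins plus Helicone wins, and the win rate is Vellum's share of them. The page does not say how its 95% interval is computed; a Wilson score interval with z = 1.96 happens to reproduce the published 21.3% to 72.0%, so the sketch below assumes that method. The helper name `wilson_interval` is hypothetical, and the weighted rate is not reproduced because its weighting is not described.

```python
import math

# Hypothetical helper (assumption): the page does not name its interval method,
# but a Wilson score interval at z = 1.96 matches the published bounds.
def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion wins / n."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

MIN_DECISIVE = 30  # publication threshold stated on the page

vellum_wins, helicone_wins = 5, 6
decisive = vellum_wins + helicone_wins  # abstains and other tools are excluded
rate = vellum_wins / decisive
low, high = wilson_interval(vellum_wins, decisive)

print(f"decisive={decisive} publishable={decisive >= MIN_DECISIVE}")
print(f"Vellum win rate {rate:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

With only 11 decisive cases the interval spans roughly 50 percentage points, which is why the matchup falls below the 30-case publication minimum.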

Per-model breakdown

Model             Tier      Vellum  Helicone  None  Other  Vellum rate
Devstral 2 2512   Mid       5       1         1     44     83%
Llama 4 Maverick  Frontier  0       4         0     49     0%
DeepSeek V3.2     Mid       0       1         9     42     0%
Claude Haiku 4.5  Small     0       0         1     51     n/a
Claude Opus 4.6   Frontier  0       0         0     54     n/a
Claude Sonnet 4.6 Frontier  0       0         0     54     n/a
DeepSeek R1 0528  Frontier  0       0         2     52     n/a
Gemini 2.5 Flash  Small     0       0         0     52     n/a
Gemini 2.5 Pro    Frontier  0       0         6     48     n/a
GLM 5 Turbo       Frontier  0       0         7     47     n/a
GPT 5.3 Codex     Frontier  0       0         0     54     n/a
GPT 5.4           Frontier  0       0         0     54     n/a
GPT 5.4 Mini      Mid       0       0         1     53     n/a
Kimi K2.5         Frontier  0       0         0     48     n/a
Llama 4 Scout     Small     0       0         3     46     n/a
MiMo V2 Pro       Frontier  0       0         2     52     n/a
MiniMax M2.7      Frontier  0       0         1     51     n/a
Mistral Small 4   Mid       0       0         0     51     n/a
Qwen3 Coder Next  Mid       0       0         3     51     n/a
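The rate column appears to be Vellum's share of each model's decisive picks (Vellum or Helicone), shown as n/a when a model made no decisive pick; Devstral's 5 of 6 gives the listed 83%. A minimal sketch of that reading, where `a_rate` is a hypothetical name and only a few rows are transcribed:

```python
# Assumed definition: Vellum picks / (Vellum picks + Helicone picks), "n/a" if none.
def a_rate(vellum: int, helicone: int) -> str:
    decisive = vellum + helicone
    if decisive == 0:
        return "n/a"  # the model never chose either tool
    return f"{vellum / decisive:.0%}"

# (Vellum, Helicone) counts transcribed from the per-model table.
rows = {
    "Devstral 2 2512": (5, 1),
    "Llama 4 Maverick": (0, 4),
    "DeepSeek V3.2": (0, 1),
    "Claude Haiku 4.5": (0, 0),
}
for model, (v, h) in rows.items():
    print(f"{model}: {a_rate(v, h)}")
```

Only three of the nineteen models ever picked either tool, so the per-model rates rest on very few decisive cases.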

Per-prompt breakdown

Prompt                     Tier          Vellum  Helicone  None  Other  Vellum rate
ai-support-agent-platform  Intermediate  4       0         4     158    100%
ai-revenue-ops-copilot     Beginner      1       2         4     161    33%
ai-revenue-ops-copilot     Intermediate  0       2         1     161    0%
ai-revenue-ops-copilot     Advanced      0       2         1     161    0%
ai-support-agent-platform  Beginner      0       0         25    144    n/a
ai-support-agent-platform  Advanced      0       0         1     168    n/a
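As a consistency check, the per-prompt rows, read as Vellum, Helicone, None, Other, should sum column by column to the Statistics totals (5, 6, 36, 953). The sketch below does that sum; the dictionary layout is an illustrative assumption, not the site's data format.

```python
# Per-prompt rows as (Vellum, Helicone, None, Other), transcribed from the table.
per_prompt = {
    ("ai-support-agent-platform", "Intermediate"): (4, 0, 4, 158),
    ("ai-revenue-ops-copilot", "Beginner"): (1, 2, 4, 161),
    ("ai-revenue-ops-copilot", "Intermediate"): (0, 2, 1, 161),
    ("ai-revenue-ops-copilot", "Advanced"): (0, 2, 1, 161),
    ("ai-support-agent-platform", "Beginner"): (0, 0, 25, 144),
    ("ai-support-agent-platform", "Advanced"): (0, 0, 1, 168),
}

# Sum each column across all prompt/tier rows.
totals = tuple(sum(col) for col in zip(*per_prompt.values()))
print(totals)       # (5, 6, 36, 953), matching the Statistics table
print(sum(totals))  # 1000 outcomes overall
```

The same reconciliation holds for the per-model table, which also sums to 5, 6, 36 and 953.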