Preseason

© 2026 Preseason. All rights reserved.
LLM Evals

Humanloop vs HumanFirst

Humanloop 49% vs HumanFirst 51%

Leading: HumanFirst (51.4%)

Statistics

Metric                            Value
Humanloop wins                    17
HumanFirst wins                   18
Abstains (no tool)                36
Other tool chosen                 929
Decisive cases                    35
Humanloop win rate (unweighted)   48.6%
95% CI                            33.0% - 64.4%
Humanloop win rate (weighted)     48.6%
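
The win rate and interval above can be recomputed from the two win counts. The sketch below is an illustration, not the site's own code: the page does not say how the 95% CI is calculated, and the Wilson score interval is used here as an assumption because it reproduces the reported bounds to one decimal place. The function name `wilson_interval` and the value `z = 1.96` are choices made for this example.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the Statistics table.
humanloop_wins = 17
humanfirst_wins = 18

# Decisive cases exclude abstentions and cases where another tool was chosen.
decisive = humanloop_wins + humanfirst_wins
humanloop_rate = humanloop_wins / decisive
humanfirst_rate = humanfirst_wins / decisive
low, high = wilson_interval(humanloop_wins, decisive)

print(f"Humanloop win rate: {humanloop_rate:.1%}")
print(f"HumanFirst win rate: {humanfirst_rate:.1%}")
print(f"95% CI (Wilson, assumed): {low:.1%} - {high:.1%}")
```

With only 35 decisive cases the interval spans roughly 31 percentage points and includes 50%, so the 51.4% lead for HumanFirst is well within noise.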

Per-model breakdown

Model              Tier      Humanloop  HumanFirst  None  Other  A rate (Humanloop)
Devstral 2 2512    Mid       8          18          1     24     31%
Gemini 2.5 Flash   Small     4          0           0     48     100%
DeepSeek R1 0528   Frontier  2          0           2     50     100%
DeepSeek V3.2      Mid       2          0           9     41     100%
Claude Haiku 4.5   Small     1          0           1     50     100%
Claude Opus 4.6    Frontier  0          0           0     54     n/a
Claude Sonnet 4.6  Frontier  0          0           0     54     n/a
Gemini 2.5 Pro     Frontier  0          0           6     48     n/a
GLM 5 Turbo        Frontier  0          0           7     47     n/a
GPT 5.3 Codex      Frontier  0          0           0     54     n/a
GPT 5.4            Frontier  0          0           0     54     n/a
GPT 5.4 Mini       Mid       0          0           1     53     n/a
Kimi K2.5          Frontier  0          0           0     48     n/a
Llama 4 Maverick   Frontier  0          0           0     53     n/a
Llama 4 Scout      Small     0          0           3     46     n/a
MiMo V2 Pro        Frontier  0          0           2     52     n/a
MiniMax M2.7       Frontier  0          0           1     51     n/a
Mistral Small 4    Mid       0          0           0     51     n/a
Qwen3 Coder Next   Mid       0          0           3     51     n/a

Per-prompt breakdown

Prompt                     Tier          Humanloop  HumanFirst  None  Other  A rate (Humanloop)
ai-revenue-ops-copilot     Beginner      5          7           4     152    42%
ai-revenue-ops-copilot     Intermediate  5          1           1     157    83%
ai-support-agent-platform  Beginner      1          5           25    138    17%
ai-revenue-ops-copilot     Advanced      3          2           1     158    60%
ai-support-agent-platform  Intermediate  3          2           4     157    60%
ai-support-agent-platform  Advanced      0          1           1     167    0%
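
The "A rate" column in both breakdowns appears to be Humanloop wins divided by decisive cases (Humanloop plus HumanFirst), shown as n/a when a row has no decisive cases. The sketch below recomputes it for the per-prompt rows and checks that the rows add up to the totals in the Statistics table. The names `ROWS`, `a_rate`, and `format_rate` are made up for this example, and the formula is inferred from the numbers rather than stated on the page.

```python
from typing import Optional

# (prompt, tier, humanloop, humanfirst, none, other), copied from the per-prompt table.
ROWS = [
    ("ai-revenue-ops-copilot", "Beginner", 5, 7, 4, 152),
    ("ai-revenue-ops-copilot", "Intermediate", 5, 1, 1, 157),
    ("ai-support-agent-platform", "Beginner", 1, 5, 25, 138),
    ("ai-revenue-ops-copilot", "Advanced", 3, 2, 1, 158),
    ("ai-support-agent-platform", "Intermediate", 3, 2, 4, 157),
    ("ai-support-agent-platform", "Advanced", 0, 1, 1, 167),
]


def a_rate(a_wins: int, b_wins: int) -> Optional[float]:
    """Share of decisive cases won by tool A; None when nothing was decisive."""
    decisive = a_wins + b_wins
    if decisive == 0:
        return None
    return a_wins / decisive


def format_rate(rate: Optional[float]) -> str:
    """Render a rate the way the tables do: whole percent, or n/a."""
    return "n/a" if rate is None else f"{rate:.0%}"


for prompt, tier, humanloop, humanfirst, _none, _other in ROWS:
    print(f"{prompt} ({tier}): {format_rate(a_rate(humanloop, humanfirst))}")

# Column sums should match the Statistics table: 17, 18, 36, 929.
totals = tuple(sum(row[i] for row in ROWS) for i in range(2, 6))
print(totals)
```

Every rate here rests on at most 12 decisive cases, so differences between prompts or tiers should be read as anecdotal.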