Preseason

© 2026 Preseason. All rights reserved.

LLM Evals

Braintrust vs LangSmith

Braintrust 46% vs LangSmith 54%

Leading: LangSmith (53.8%)

Statistics

| Metric | Value |
|---|---|
| Braintrust wins | 720 |
| LangSmith wins | 837 |
| Abstains (no tool) | 90 |
| Other tool chosen | 797 |
| Decisive cases | 1557 |
| Braintrust win rate (unweighted) | 46.2% |
| 95% CI | 43.8% - 48.7% |
| Braintrust win rate (weighted) | 46.2% |
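
The unweighted win rate and its interval can be checked from the two win counts alone. The sketch below assumes a normal-approximation (Wald) interval with z = 1.96; the page does not say which interval method it uses, so that choice is an assumption, and the function name `win_rate_with_ci` is hypothetical. With these counts the result matches the published figures after rounding.

```python
import math


def win_rate_with_ci(wins_a: int, wins_b: int, z: float = 1.96):
    """Return (rate, low, high) for A's share of decisive cases.

    Decisive cases are those where either tool won; abstains and
    "other tool" picks are excluded. The interval is a normal
    approximation, which is an assumption, not the page's stated method.
    """
    decisive = wins_a + wins_b
    if decisive == 0:
        raise ValueError("no decisive cases")
    rate = wins_a / decisive
    half_width = z * math.sqrt(rate * (1 - rate) / decisive)
    return rate, rate - half_width, rate + half_width


# Braintrust wins = 720, LangSmith wins = 837 (from the table above)
rate, low, high = win_rate_with_ci(720, 837)
print(f"{rate:.1%} ({low:.1%} - {high:.1%})")  # 46.2% (43.8% - 48.7%)
```

The weighted win rate is not reproduced here because the weighting scheme is not described on this page.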

Per-model breakdown

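| Model | Tier | Braintrust | LangSmith | None | Other | Braintrust win rate |
|---|---|---|---|---|---|---|
| GPT 5.3 Codex | Frontier | 126 | 6 | 0 | 0 | 95% |
| Claude Haiku 4.5 | Small | 103 | 13 | 1 | 8 | 89% |
| Claude Opus 4.6 | Frontier | 113 | 1 | 0 | 18 | 99% |
| Claude Sonnet 4.6 | Frontier | 58 | 55 | 0 | 19 | 51% |
| GPT 5.4 | Frontier | 92 | 18 | 0 | 22 | 84% |
| Kimi K2.5 | Frontier | 109 | 0 | 3 | 7 | 100% |
| DeepSeek R1 0528 | Frontier | 0 | 108 | 7 | 17 | 0% |
| Gemini 2.5 Pro | Frontier | 0 | 106 | 9 | 17 | 0% |
| GLM 5 Turbo | Frontier | 84 | 18 | 19 | 11 | 82% |
| GPT 5.4 Mini | Mid | 3 | 97 | 3 | 29 | 3% |
| Mistral Small 4 | Mid | 0 | 94 | 1 | 29 | 0% |
| MiniMax M2.7 | Frontier | 31 | 62 | 5 | 31 | 33% |
| Qwen3 Coder Next | Mid | 0 | 88 | 3 | 40 | 0% |
| DeepSeek V3.2 | Mid | 0 | 80 | 22 | 26 | 0% |
| MiMo V2 Pro | Frontier | 1 | 62 | 8 | 61 | 2% |
| Llama 4 Maverick | Frontier | 0 | 21 | 0 | 106 | 0% |
| Devstral 2 2512 | Mid | 0 | 5 | 4 | 116 | 0% |
| Gemini 2.5 Flash | Small | 0 | 3 | 1 | 123 | 0% |
| Llama 4 Scout | Small | 0 | 0 | 4 | 117 | n/a |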
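
The rate in the last column appears to be Braintrust wins as a share of decisive cases (Braintrust plus LangSmith), which is why a row with no decisive cases shows n/a. A minimal sketch of that reading, with hypothetical helper names `row_win_rate` and `format_rate`:

```python
from typing import Optional


def row_win_rate(braintrust: int, langsmith: int) -> Optional[float]:
    """Braintrust share of decisive cases, or None when there are none."""
    decisive = braintrust + langsmith
    return braintrust / decisive if decisive else None


def format_rate(rate: Optional[float]) -> str:
    """Render a rate as a whole percentage, or 'n/a' when undefined."""
    return "n/a" if rate is None else f"{rate:.0%}"


print(format_rate(row_win_rate(126, 6)))  # GPT 5.3 Codex: 95%
print(format_rate(row_win_rate(0, 0)))    # Llama 4 Scout: n/a
```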

Per-prompt breakdown

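| Prompt | Tier | Braintrust | LangSmith | None | Other | Braintrust win rate |
|---|---|---|---|---|---|---|
| ai-revenue-ops-copilot | Intermediate | 122 | 182 | 4 | 96 | 40% |
| ai-support-agent-platform | Intermediate | 87 | 195 | 5 | 122 | 31% |
| ai-revenue-ops-copilot | Advanced | 153 | 124 | 2 | 121 | 55% |
| ai-revenue-ops-copilot | Beginner | 143 | 119 | 10 | 138 | 55% |
| ai-support-agent-platform | Advanced | 123 | 134 | 5 | 148 | 48% |
| ai-support-agent-platform | Beginner | 92 | 83 | 64 | 172 | 53% |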