Preseason

© 2026 Preseason. All rights reserved.

LangSmith vs Braintrust

LangSmith 54% vs Braintrust 46%

Leading: LangSmith (53.8%)

Statistics

Metric                            Value
LangSmith wins                    837
Braintrust wins                   720
Abstains (no tool)                90
Other tool chosen                 797
Decisive cases                    1557
LangSmith win rate (unweighted)   53.8%
95% CI                            51.3% - 56.2%
LangSmith win rate (weighted)     53.8%
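
The unweighted win rate and its interval can be reproduced from the two win counts. The sketch below is a rough check, not the site's own code: it assumes decisive cases are LangSmith wins plus Braintrust wins, and it assumes a normal-approximation (Wald) interval, since the page does not say which interval method it uses. The function name `win_rate_with_ci` is illustrative. The weighted rate is not reproduced because the weighting scheme is not given here.

```python
import math


def win_rate_with_ci(wins_a: int, wins_b: int, z: float = 1.96):
    """Return (rate, low, high) for A's share of decisive cases.

    Assumption: a normal-approximation (Wald) interval. The page does
    not state which interval method it actually uses.
    """
    decisive = wins_a + wins_b  # abstains and "other tool" picks are excluded
    if decisive == 0:
        raise ValueError("no decisive cases")
    rate = wins_a / decisive
    half_width = z * math.sqrt(rate * (1 - rate) / decisive)
    return rate, rate - half_width, rate + half_width


# Counts taken from the Statistics table: 837 LangSmith wins, 720 Braintrust wins.
rate, low, high = win_rate_with_ci(837, 720)
print(f"{rate:.1%} (95% CI {low:.1%} - {high:.1%})")
# prints: 53.8% (95% CI 51.3% - 56.2%)
```

Under these assumptions the output matches the published 53.8% and 51.3% - 56.2%.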

Per-model breakdown

Model               Tier      LangSmith  Braintrust  None  Other  A rate (LangSmith)
GPT 5.3 Codex       Frontier  6          126         0     0      5%
Claude Haiku 4.5    Small     13         103         1     8      11%
Claude Opus 4.6     Frontier  1          113         0     18     1%
Claude Sonnet 4.6   Frontier  55         58          0     19     49%
GPT 5.4             Frontier  18         92          0     22     16%
Kimi K2.5           Frontier  0          109         3     7      0%
DeepSeek R1 0528    Frontier  108        0           7     17     100%
Gemini 2.5 Pro      Frontier  106        0           9     17     100%
GLM 5 Turbo         Frontier  18         84          19    11     18%
GPT 5.4 Mini        Mid       97         3           3     29     97%
Mistral Small 4     Mid       94         0           1     29     100%
MiniMax M2.7        Frontier  62         31          5     31     67%
Qwen3 Coder Next    Mid       88         0           3     40     100%
DeepSeek V3.2       Mid       80         0           22    26     100%
MiMo V2 Pro         Frontier  62         1           8     61     98%
Llama 4 Maverick    Frontier  21         0           0     106    100%
Devstral 2 2512     Mid       5          0           4     116    100%
Gemini 2.5 Flash    Small     3          0           1     123    100%
Llama 4 Scout       Small     0          0           4     117    n/a
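
The "A rate" column appears to be LangSmith's share of decisive cases for each model, with "n/a" shown when a model never picked either tool. A minimal sketch of that calculation follows; the helper names `a_rate` and `format_rate` are hypothetical, and the rounding to whole percentages is an assumption based on the values shown in the table.

```python
from typing import Optional


def a_rate(wins_a: int, wins_b: int) -> Optional[float]:
    """A's share of decisive cases, or None when there are none."""
    decisive = wins_a + wins_b
    return wins_a / decisive if decisive else None


def format_rate(rate: Optional[float]) -> str:
    """Render as a whole percentage, or 'n/a' when undefined."""
    return "n/a" if rate is None else f"{rate:.0%}"


# (LangSmith wins, Braintrust wins) for three rows of the per-model table.
rows = {
    "GPT 5.3 Codex": (6, 126),
    "Claude Sonnet 4.6": (55, 58),
    "Llama 4 Scout": (0, 0),
}
for model, (langsmith, braintrust) in rows.items():
    print(model, format_rate(a_rate(langsmith, braintrust)))
# prints:
# GPT 5.3 Codex 5%
# Claude Sonnet 4.6 49%
# Llama 4 Scout n/a
```

Returning `None` rather than 0 for the no-decisive-cases row keeps "never chose either tool" distinct from "always chose Braintrust", which is the distinction the table draws between Llama 4 Scout (n/a) and Kimi K2.5 (0%).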

Per-prompt breakdown

Prompt                      Tier          LangSmith  Braintrust  None  Other  A rate (LangSmith)
ai-revenue-ops-copilot      Intermediate  182        122         4     96     60%
ai-support-agent-platform   Intermediate  195        87          5     122    69%
ai-revenue-ops-copilot      Advanced      124        153         2     121    45%
ai-revenue-ops-copilot      Beginner      119        143         10    138    45%
ai-support-agent-platform   Advanced      134        123         5     148    52%
ai-support-agent-platform   Beginner      83         92          64    172    47%
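
As a sanity check, the per-prompt counts can be summed column by column and compared with the headline statistics. The sketch below does only that; the tuple layout and the variable names are illustrative, and the counts are copied from the per-prompt table.

```python
# (prompt, tier, LangSmith, Braintrust, None, Other), copied from the table above.
per_prompt = [
    ("ai-revenue-ops-copilot", "Intermediate", 182, 122, 4, 96),
    ("ai-support-agent-platform", "Intermediate", 195, 87, 5, 122),
    ("ai-revenue-ops-copilot", "Advanced", 124, 153, 2, 121),
    ("ai-revenue-ops-copilot", "Beginner", 119, 143, 10, 138),
    ("ai-support-agent-platform", "Advanced", 134, 123, 5, 148),
    ("ai-support-agent-platform", "Beginner", 83, 92, 64, 172),
]

# Sum each count column across all prompt/tier rows.
totals = [sum(column) for column in zip(*(row[2:] for row in per_prompt))]
print(totals)
# prints: [837, 720, 90, 797]
```

The four sums agree with the LangSmith wins, Braintrust wins, abstains, and other-tool counts in the Statistics section.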