LLM Evals

Braintrust vs LangSmith

Braintrust: 46%
LangSmith: 54%

Leading: LangSmith (53.8%)

Statistics

Metric	Value
Braintrust wins	720
LangSmith wins	837
Abstains (no tool)	90
Other tool chosen	797
Decisive cases	1557
Braintrust win rate (unweighted)	46.2%
95% CI	43.8% - 48.7%
Braintrust win rate (weighted)	46.2%
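
The page does not say how the win rate and interval are derived. The figures above are consistent with dividing Braintrust wins by decisive cases (Braintrust wins plus LangSmith wins) and applying a normal-approximation 95% interval. The sketch below reproduces them under that assumption; the function name `win_rate_with_ci` is hypothetical, and the choice of interval method is a guess, since other methods round to the same values at this sample size.

```python
import math

# Counts copied from the Statistics table.
braintrust_wins = 720
langsmith_wins = 837


def win_rate_with_ci(wins, losses, z=1.96):
    """Return (rate, low, high) for wins / (wins + losses).

    Assumption: a normal-approximation (Wald) interval; the page
    does not state which interval method it uses.
    """
    n = wins + losses  # decisive cases only; abstains and "other" are excluded
    p = wins / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


decisive = braintrust_wins + langsmith_wins
rate, low, high = win_rate_with_ci(braintrust_wins, langsmith_wins)
print(f"decisive={decisive} rate={rate:.1%} ci={low:.1%} - {high:.1%}")
```

Abstains and "other tool" responses do not enter the denominator, which is why the two win rates sum to 100% even though 887 responses picked neither tool.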

Per-model breakdown

Model	Tier	Braintrust	LangSmith	None	Other	Braintrust rate
GPT 5.3 Codex	Frontier	126	6	0	0	95%
Claude Haiku 4.5	Small	103	13	1	8	89%
Claude Opus 4.6	Frontier	113	1	0	18	99%
Claude Sonnet 4.6	Frontier	58	55	0	19	51%
GPT 5.4	Frontier	92	18	0	22	84%
Kimi K2.5	Frontier	109	0	3	7	100%
DeepSeek R1 0528	Frontier	0	108	7	17	0%
Gemini 2.5 Pro	Frontier	0	106	9	17	0%
GLM 5 Turbo	Frontier	84	18	19	11	82%
GPT 5.4 Mini	Mid	3	97	3	29	3%
Mistral Small 4	Mid	0	94	1	29	0%
MiniMax M2.7	Frontier	31	62	5	31	33%
Qwen3 Coder Next	Mid	0	88	3	40	0%
DeepSeek V3.2	Mid	0	80	22	26	0%
MiMo V2 Pro	Frontier	1	62	8	61	2%
Llama 4 Maverick	Frontier	0	21	0	106	0%
Devstral 2 2512	Mid	0	5	4	116	0%
Gemini 2.5 Flash	Small	0	3	1	123	0%
Llama 4 Scout	Small	0	0	4	117	n/a
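
The rate column in the table above appears to be Braintrust's share of each model's decisive picks, with "n/a" when a model chose neither tool. A minimal sketch under that assumption follows; the helper name `braintrust_rate` is hypothetical, and the three rows are copied from the table.

```python
def braintrust_rate(braintrust, langsmith):
    """Braintrust share of decisive picks, formatted like the table's rate column.

    Assumption: "None" and "Other" counts are ignored, and a row with
    no decisive picks is reported as "n/a".
    """
    decisive = braintrust + langsmith
    if decisive == 0:
        return "n/a"
    return f"{braintrust / decisive:.0%}"


# (model, Braintrust picks, LangSmith picks) copied from the per-model table.
rows = [
    ("GPT 5.3 Codex", 126, 6),
    ("Claude Sonnet 4.6", 58, 55),
    ("Llama 4 Scout", 0, 0),
]

rates = {model: braintrust_rate(b, l) for model, b, l in rows}
for model, value in rates.items():
    print(f"{model}: {value}")
```

Because the rate ignores the "Other" column, a model such as Llama 4 Maverick shows 0% on only 21 decisive picks out of 127 responses, so the rate is best read alongside the raw counts.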

Per-prompt breakdown

Prompt	Tier	Braintrust	LangSmith	None	Other	Braintrust rate
ai-revenue-ops-copilot	Intermediate	122	182	4	96	40%
ai-support-agent-platform	Intermediate	87	195	5	122	31%
ai-revenue-ops-copilot	Advanced	153	124	2	121	55%
ai-revenue-ops-copilot	Beginner	143	119	10	138	55%
ai-support-agent-platform	Advanced	123	134	5	148	48%
ai-support-agent-platform	Beginner	92	83	64	172	53%