LLM Observability

LangSmith vs Langfuse

LangSmith: 67%, Langfuse: 33%

Leading: LangSmith (67.5%)

Statistics

Metric	Value
LangSmith wins	1341
Langfuse wins	646
Abstains (no tool)	45
Other tool chosen	441
Decisive cases	1987
LangSmith win rate (unweighted)	67.5%
95% CI	65.4% - 69.5%
LangSmith win rate (weighted)	67.5%
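
The win rate and interval above can be reproduced from the two win counts. The page does not say which interval method it uses, so the sketch below assumes a Wilson score interval with z = 1.96. A plain normal approximation rounds to the same bounds for these counts. The function name `wilson_interval` is illustrative, not taken from the source.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


langsmith_wins = 1341
langfuse_wins = 646

# Decisive cases exclude abstentions and "other tool" picks.
decisive = langsmith_wins + langfuse_wins
win_rate = langsmith_wins / decisive
low, high = wilson_interval(langsmith_wins, decisive)

print(f"decisive cases: {decisive}")
print(f"LangSmith win rate: {win_rate:.1%}")
print(f"95% CI: {low:.1%} - {high:.1%}")
```

The weighted win rate is not reproduced here because the page does not describe the weights.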

Per-model breakdown

Model	Tier	LangSmith	Langfuse	None	Other	LangSmith rate (decisive cases)
GLM 5 Turbo	Frontier	120	12	0	0	91%
Claude Sonnet 4.6	Frontier	102	30	0	0	77%
Claude Opus 4.6	Frontier	22	110	0	0	17%
GPT 5.3 Codex	Frontier	14	118	0	0	11%
GPT 5.4	Frontier	13	119	0	0	10%
GPT 5.4 Mini	Mid	105	23	1	3	82%
DeepSeek V3.2	Mid	120	7	0	5	94%
MiMo V2 Pro	Frontier	125	1	2	4	99%
Gemini 2.5 Pro	Frontier	122	1	6	3	99%
DeepSeek R1 0528	Frontier	118	0	1	13	100%
Mistral Small 4	Mid	112	4	0	13	97%
Qwen3 Coder Next	Mid	61	55	1	14	53%
Claude Haiku 4.5	Small	37	79	0	13	32%
Kimi K2.5	Frontier	65	50	4	0	57%
MiniMax M2.7	Frontier	105	1	3	21	99%
Llama 4 Maverick	Frontier	75	11	0	46	87%
Llama 4 Scout	Small	0	24	11	90	0%
Devstral 2 2512	Mid	21	0	15	90	100%
Gemini 2.5 Flash	Small	4	1	1	126	80%
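
The rate column in this table is consistent with LangSmith wins divided by decisive cases (LangSmith plus Langfuse), ignoring the None and Other columns. That reading is inferred from the numbers, not stated on the page. A minimal sketch for three of the rows, with the helper name `langsmith_rate` being hypothetical:

```python
from typing import Optional


def langsmith_rate(langsmith: int, langfuse: int) -> Optional[float]:
    """Percentage of decisive cases won by LangSmith, or None if there are none."""
    decisive = langsmith + langfuse
    if decisive == 0:
        return None
    return 100 * langsmith / decisive


# (LangSmith, Langfuse, None, Other) copied from the table above.
rows = {
    "Claude Opus 4.6": (22, 110, 0, 0),
    "Gemini 2.5 Flash": (4, 1, 1, 126),
    "Devstral 2 2512": (21, 0, 15, 90),
}

rates = {}
for model, (langsmith, langfuse, _none, _other) in rows.items():
    rates[model] = langsmith_rate(langsmith, langfuse)
    print(f"{model}: {rates[model]:.0f}% of {langsmith + langfuse} decisive cases")
```

Printing the decisive count next to the rate matters for rows such as Gemini 2.5 Flash, where the 80% figure rests on only 5 decisive cases out of 132 runs.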

Per-prompt breakdown

Prompt	Tier	LangSmith	Langfuse	None	Other	LangSmith rate (decisive cases)
ai-support-agent-platform	Intermediate	264	81	1	70	77%
ai-revenue-ops-copilot	Intermediate	251	87	1	64	74%
ai-support-agent-platform	Beginner	174	159	11	69	52%
ai-revenue-ops-copilot	Advanced	232	96	2	79	71%
ai-support-agent-platform	Advanced	206	118	1	90	64%
ai-revenue-ops-copilot	Beginner	214	105	29	69	67%
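
As a sanity check, the per-prompt rows can be summed and compared with the Statistics table. The sketch below hard-codes the six rows as tuples. The variable names are illustrative only.

```python
# (prompt, tier, LangSmith, Langfuse, None, Other) copied from the table above.
per_prompt = [
    ("ai-support-agent-platform", "Intermediate", 264, 81, 1, 70),
    ("ai-revenue-ops-copilot", "Intermediate", 251, 87, 1, 64),
    ("ai-support-agent-platform", "Beginner", 174, 159, 11, 69),
    ("ai-revenue-ops-copilot", "Advanced", 232, 96, 2, 79),
    ("ai-support-agent-platform", "Advanced", 206, 118, 1, 90),
    ("ai-revenue-ops-copilot", "Beginner", 214, 105, 29, 69),
]

# Sum each outcome column across all prompt/tier combinations.
totals = tuple(sum(row[i] for row in per_prompt) for i in range(2, 6))
langsmith_total, langfuse_total, none_total, other_total = totals

print(f"LangSmith wins: {langsmith_total}")
print(f"Langfuse wins: {langfuse_total}")
print(f"Abstains (no tool): {none_total}")
print(f"Other tool chosen: {other_total}")
print(f"Decisive cases: {langsmith_total + langfuse_total}")
```

The column sums match the headline figures (1341, 646, 45, 441, and 1987 decisive cases), so the per-prompt table accounts for every recorded run.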