LLM Observability

Langfuse vs LangSmith

Langfuse: 33%
LangSmith: 67%

Leading: LangSmith (67.5%)

Statistics

Metric	Value
Langfuse wins	646
LangSmith wins	1341
Abstains (no tool)	45
Other tool chosen	441
Decisive cases	1987
Langfuse win rate (unweighted)	32.5%
95% CI	30.5% - 34.6%
Langfuse win rate (weighted)	32.5%
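
The unweighted win rate and its interval can be reproduced from the counts in the table. The sketch below assumes the rate is taken over decisive cases only (Langfuse wins plus LangSmith wins) and that the interval is a Wilson score interval. The page does not state which method it uses, and a plain normal approximation gives the same figures to one decimal. The function name `wilson_interval` is illustrative. The weighted rate is not reproduced because the weighting scheme is not described.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts copied from the Statistics table.
langfuse_wins = 646
langsmith_wins = 1341

# Assumption: abstains and "other tool" picks are excluded from the denominator.
decisive = langfuse_wins + langsmith_wins

rate = langfuse_wins / decisive
low, high = wilson_interval(langfuse_wins, decisive)

print(f"decisive cases: {decisive}")
print(f"Langfuse win rate: {rate:.1%}")
print(f"95% CI: {low:.1%} - {high:.1%}")
```

With these counts the sketch yields 1987 decisive cases, a 32.5% win rate, and an interval of 30.5% - 34.6%, matching the table.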

Per-model breakdown

Model	Tier	Langfuse	LangSmith	None	Other	Langfuse rate
GPT 5.4	Frontier	119	13	0	0	90%
GPT 5.3 Codex	Frontier	118	14	0	0	89%
Claude Opus 4.6	Frontier	110	22	0	0	83%
Claude Sonnet 4.6	Frontier	30	102	0	0	23%
GLM 5 Turbo	Frontier	12	120	0	0	9%
GPT 5.4 Mini	Mid	23	105	1	3	18%
DeepSeek V3.2	Mid	7	120	0	5	6%
MiMo V2 Pro	Frontier	1	125	2	4	1%
Gemini 2.5 Pro	Frontier	1	122	6	3	1%
DeepSeek R1 0528	Frontier	0	118	1	13	0%
Claude Haiku 4.5	Small	79	37	0	13	68%
Qwen3 Coder Next	Mid	55	61	1	14	47%
Mistral Small 4	Mid	4	112	0	13	3%
Kimi K2.5	Frontier	50	65	4	0	43%
MiniMax M2.7	Frontier	1	105	3	21	1%
Llama 4 Maverick	Frontier	11	75	0	46	13%
Llama 4 Scout	Small	24	0	11	90	100%
Devstral 2 2512	Mid	0	21	15	90	0%
Gemini 2.5 Flash	Small	1	4	1	126	20%
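
The last column of the table above is consistent with Langfuse / (Langfuse + LangSmith), with the None and Other columns left out of the denominator. That formula is inferred from the numbers rather than stated on the page. A minimal sketch, using the hypothetical helper `two_way_rate` and a few rows copied from the table:

```python
from typing import Optional


def two_way_rate(langfuse: int, langsmith: int) -> Optional[float]:
    """Share of decisive picks that went to Langfuse (inferred formula)."""
    decisive = langfuse + langsmith
    if decisive == 0:
        return None  # no decisive cases, so the rate is undefined
    return langfuse / decisive


# (Langfuse, LangSmith) counts copied from the per-model table.
rows = {
    "GPT 5.4": (119, 13),
    "Qwen3 Coder Next": (55, 61),
    "Llama 4 Scout": (24, 0),
    "Gemini 2.5 Flash": (1, 4),
}

rates = {model: two_way_rate(lf, ls) for model, (lf, ls) in rows.items()}

for model, (lf, ls) in rows.items():
    print(f"{model}: {rates[model]:.0%} over {lf + ls} decisive cases")
```

Printing the decisive count next to each rate shows how much data sits behind it: Llama 4 Scout's 100% comes from 24 decisive cases out of 125 responses, and Gemini 2.5 Flash's 20% from only 5.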

Per-prompt breakdown

Prompt	Tier	Langfuse	LangSmith	None	Other	Langfuse rate
ai-support-agent-platform	Intermediate	81	264	1	70	23%
ai-revenue-ops-copilot	Intermediate	87	251	1	64	26%
ai-support-agent-platform	Beginner	159	174	11	69	48%
ai-revenue-ops-copilot	Advanced	96	232	2	79	29%
ai-support-agent-platform	Advanced	118	206	1	90	36%
ai-revenue-ops-copilot	Beginner	105	214	29	69	33%
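
As a consistency check, the per-prompt rows can be summed column by column and compared with the headline counts in the Statistics table. This is only a sketch of that arithmetic; the variable names are illustrative.

```python
# (Langfuse, LangSmith, None, Other) for each per-prompt row, in table order.
prompt_rows = [
    (81, 264, 1, 70),
    (87, 251, 1, 64),
    (159, 174, 11, 69),
    (96, 232, 2, 79),
    (118, 206, 1, 90),
    (105, 214, 29, 69),
]

# Column-wise totals across all prompt rows.
totals = tuple(sum(column) for column in zip(*prompt_rows))

# Headline counts from the Statistics table:
# Langfuse wins, LangSmith wins, abstains, other tool chosen.
headline = (646, 1341, 45, 441)

print("per-prompt totals:", totals)
print("matches headline counts:", totals == headline)
```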