LLM Observability

Arize Phoenix vs Datadog

Arize Phoenix 40% vs Datadog 60%

Leading: Datadog (60.2%)

Statistics

Metric	Value
Arize Phoenix wins	33
Datadog wins	50
Abstains (no tool)	45
Other tool chosen	2345
Decisive cases	83
Arize Phoenix win rate (unweighted)	39.8%
Arize Phoenix win rate 95% CI	29.9% - 50.5%
Arize Phoenix win rate (weighted)	39.8%
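
The win rate and interval above can be reproduced from the two win counts. The sketch below assumes the rate is Arize Phoenix wins divided by decisive cases, and that the 95% CI is a Wilson score interval. The page does not name its method; Wilson is an assumption that happens to match the published bounds. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion wins/n (z=1.96 for ~95%)."""
    if n == 0:
        raise ValueError("no decisive cases")
    p = wins / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts taken from the Statistics table.
arize_wins, datadog_wins = 33, 50
decisive = arize_wins + datadog_wins  # abstains and other-tool picks are excluded

win_rate = arize_wins / decisive
low, high = wilson_interval(arize_wins, decisive)
print(f"{win_rate:.1%}  CI {low:.1%} - {high:.1%}")  # 39.8%  CI 29.9% - 50.5%
```

Only the 83 decisive cases enter the calculation, so the 45 abstentions and 2345 other-tool choices widen the interval indirectly by shrinking the sample rather than by counting against either tool.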

Per-model breakdown

Model	Tier	Arize Phoenix wins	Datadog wins	Abstains	Other tool	Arize Phoenix win rate
Gemini 2.5 Flash	Small	0	44	1	87	0%
MiniMax M2.7	Frontier	15	0	3	112	100%
Qwen3 Coder Next	Mid	11	0	1	119	100%
Mistral Small 4	Mid	3	1	0	125	75%
DeepSeek V3.2	Mid	2	2	0	128	50%
Claude Haiku 4.5	Small	0	2	0	127	0%
Devstral 2 2512	Mid	1	0	15	110	100%
GPT 5.4 Mini	Mid	1	0	1	130	100%
MiMo V2 Pro	Frontier	0	1	2	129	0%
Claude Opus 4.6	Frontier	0	0	0	132	n/a
Claude Sonnet 4.6	Frontier	0	0	0	132	n/a
DeepSeek R1 0528	Frontier	0	0	1	131	n/a
Gemini 2.5 Pro	Frontier	0	0	6	126	n/a
GLM 5 Turbo	Frontier	0	0	0	132	n/a
GPT 5.3 Codex	Frontier	0	0	0	132	n/a
GPT 5.4	Frontier	0	0	0	132	n/a
Kimi K2.5	Frontier	0	0	4	115	n/a
Llama 4 Maverick	Frontier	0	0	0	132	n/a
Llama 4 Scout	Small	0	0	11	114	n/a
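
The rate column in this table is consistent with Arize Phoenix wins divided by the row's decisive cases, with "n/a" where a model never picked either tool. The page does not define the column, so the formula below is an inference from the rows; `a_rate` is a hypothetical helper name.

```python
def a_rate(arize_wins: int, datadog_wins: int) -> str:
    """Arize Phoenix share of decisive picks, or 'n/a' when there are none."""
    decisive = arize_wins + datadog_wins
    if decisive == 0:
        return "n/a"
    return f"{arize_wins / decisive:.0%}"


# (Arize Phoenix wins, Datadog wins) for three rows of the per-model table.
rows = {
    "Mistral Small 4": (3, 1),
    "DeepSeek V3.2": (2, 2),
    "Claude Opus 4.6": (0, 0),
}
for model, (arize, datadog) in rows.items():
    print(model, a_rate(arize, datadog))
```

Rows with very few decisive picks (for example a single win giving 100%) carry little weight on their own, which is why the aggregate interval in the Statistics table is the more reliable figure.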

Per-prompt breakdown

Prompt	Tier	Arize Phoenix wins	Datadog wins	Abstains	Other tool	Arize Phoenix win rate
ai-support-agent-platform	Advanced	8	18	1	388	31%
ai-revenue-ops-copilot	Advanced	6	16	2	385	27%
ai-revenue-ops-copilot	Beginner	5	13	29	370	28%
ai-support-agent-platform	Intermediate	6	1	1	408	86%
ai-revenue-ops-copilot	Intermediate	5	0	1	397	100%
ai-support-agent-platform	Beginner	3	2	11	397	60%