| Metric | Value |
|---|---|
| Plausible wins | 116 |
| Firebase wins | 116 |
| Abstains (no tool) | 364 |
| Other tool chosen | 5318 |
| Decisive cases | 232 |
| Plausible win rate (unweighted) | 50.0% |
| 95% CI | 43.6% - 56.4% |
| Plausible win rate (weighted) | 50.0% |
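
The unweighted win rate and its interval can be reproduced from the two win counts alone. The sketch below is a minimal illustration, not the report's own method: the source does not say which interval was used, so the Wilson score interval is an assumption (the simpler normal approximation gives the same figures after rounding at this sample size). The function name `win_rate_with_ci` is hypothetical. The weighted rate is not reproduced because the weights are not given.

```python
import math


def win_rate_with_ci(wins_a, wins_b, z=1.96):
    """Return (rate, lower, upper) for A's share of decisive cases.

    Decisive cases are those where either A or B was chosen.
    The interval is a Wilson score interval (an assumption; the
    source does not name its method). Returns None when there are
    no decisive cases, matching the "n/a" rows in the tables.
    """
    n = wins_a + wins_b
    if n == 0:
        return None
    p = wins_a / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half


# Counts taken from the summary table: 116 Plausible wins, 116 Firebase wins.
rate, lo, hi = win_rate_with_ci(116, 116)
print(f"{rate:.1%} (95% CI {lo:.1%} - {hi:.1%})")
# prints: 50.0% (95% CI 43.6% - 56.4%)
```

The same function applied to a single row, for example `win_rate_with_ci(65, 9)`, gives the per-row rate shown in the breakdown tables.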

| Model | Tier | Plausible | Firebase | None | Other | Plausible win rate |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 | Mid | 46 | 0 | 26 | 241 | 100% |
| Claude Haiku 4.5 | Small | 20 | 20 | 4 | 261 | 50% |
| Qwen3 Coder Next | Mid | 10 | 20 | 5 | 280 | 33% |
| DeepSeek R1 0528 | Frontier | 4 | 20 | 114 | 175 | 17% |
| Llama 4 Scout | Small | 9 | 8 | 20 | 277 | 53% |
| MiMo V2 Pro | Frontier | 4 | 13 | 24 | 274 | 24% |
| MiniMax M2.7 | Frontier | 2 | 11 | 34 | 257 | 15% |
| Devstral 2 2512 | Mid | 0 | 13 | 0 | 300 | 0% |
| Claude Opus 4.6 | Frontier | 10 | 0 | 1 | 304 | 100% |
| Mistral Small 4 | Mid | 0 | 10 | 2 | 289 | 0% |
| GPT 5.4 | Frontier | 9 | 0 | 0 | 303 | 100% |
| Llama 4 Maverick | Frontier | 2 | 0 | 0 | 312 | 100% |
| GPT 5.4 Mini | Mid | 0 | 1 | 5 | 309 | 0% |
| Claude Sonnet 4.6 | Frontier | 0 | 0 | 1 | 314 | n/a |
| Gemini 2.5 Flash | Small | 0 | 0 | 21 | 291 | n/a |
| Gemini 2.5 Pro | Frontier | 0 | 0 | 23 | 289 | n/a |
| GLM 5 Turbo | Frontier | 0 | 0 | 61 | 254 | n/a |
| GPT 5.3 Codex | Frontier | 0 | 0 | 1 | 313 | n/a |
| Kimi K2.5 | Frontier | 0 | 0 | 22 | 275 | n/a |

| Prompt | Tier | Plausible | Firebase | None | Other | Plausible win rate |
|---|---|---|---|---|---|---|
| fitness-tracking-app | Beginner | 7 | 93 | 1 | 297 | 7% |
| url-shortener | Beginner | 65 | 9 | 84 | 240 | 88% |
| url-shortener | Intermediate | 34 | 8 | 55 | 300 | 81% |
| fitness-tracking-app | Intermediate | 1 | 6 | 12 | 380 | 14% |
| saas-application | Beginner | 6 | 0 | 13 | 380 | 100% |
| multi-tenant-crm | Intermediate | 1 | 0 | 28 | 363 | 100% |
| saas-application | Intermediate | 1 | 0 | 45 | 351 | 100% |
| url-shortener | Advanced | 1 | 0 | 4 | 390 | 100% |
| ai-revenue-ops-copilot | Intermediate | 0 | 0 | 3 | 389 | n/a |
| ai-revenue-ops-copilot | Beginner | 0 | 0 | 5 | 380 | n/a |
| ai-revenue-ops-copilot | Advanced | 0 | 0 | 14 | 363 | n/a |
| fitness-tracking-app | Advanced | 0 | 0 | 9 | 383 | n/a |
| multi-tenant-crm | Beginner | 0 | 0 | 35 | 362 | n/a |
| multi-tenant-crm | Advanced | 0 | 0 | 31 | 367 | n/a |
| saas-application | Advanced | 0 | 0 | 25 | 373 | n/a |