LLM evals and red-team framework for prompt, model, and system testing
23 decisive cases (30 needed)