LLM evals and red-team framework for prompt, model, and system testing