Topics
Browse posts by category and tag — every topic we cover, with the latest pieces under each.
Tags
- #llm-safety 7
- #content-moderation 6
- #production 3
- #meta 2
- #accuracy 1
- #api-review 1
- #architecture 1
- #benchmark 1
- #classifier 1
- #conversation-control 1
- #ensemble 1
- #false-positives 1
- #google-jigsaw 1
- #guardrails 1
- #llama-guard 1
- #nemo-guardrails 1
- #nvidia 1
- #openai-moderation 1
- #ops 1
- #perspective-api 1
- #prompt-injection 1
- #rag 1
- #retrieval-augmented-generation 1
- #safety-classifier 1
- #toxicity-detection 1
- #user-experience 1
Categories
reviews 4 posts
- Perspective API: Still Good at Its Original Job, Still Wrong for LLM Safety
  Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong. For LLM application safety, it was never designed for the use case, and it shows.
- OpenAI Moderation API: An Honest Review After 18 Months in Production
  OpenAI's Moderation API is the path-of-least-resistance choice for teams already in the OpenAI ecosystem. The speed is good. The category granularity has improved. The gaps are predictable.
- Llama Guard Benchmark Review: Real-World Performance vs. Vendor Claims
  Meta's Llama Guard series has become a default choice for open-source content moderation. Benchmarks on the standard test sets look strong. Production behavior is more complicated.
- NeMo Guardrails in Production: What It Does Well and Where It Falls Over
  NVIDIA's NeMo Guardrails offers conversation-flow control that classifiers can't provide. The deployment complexity is real. This is an honest review from a team that has run it in production.
ops 3 posts
- Content Moderation for RAG Applications: The Retrieval Layer Is an Attack Surface
  RAG pipelines have a moderation problem at the retrieval layer that input/output classifiers don't address. Injected content in retrieved documents can override model behavior. Here's the architecture that covers it.
- Classifier Ensembles for Production Content Moderation
  Single classifiers have characteristic failure modes. Ensembles that combine models with different architectures and training distributions reduce coverage gaps. How to build and operate them.
- The Real Cost of False Positives in AI Content Moderation
  False positive rates in content moderation are usually discussed as a technical metric. The business costs — user abandonment, manual review queues, appeal escalations — are rarely quantified. Here's how to measure and manage them.