Reviews and benchmarks of content-moderation and safety tooling for LLM applications. Llama Guard, NeMo Guardrails, OpenAI Moderation, Perspective API, custom classifier patterns — what works, what regresses, what costs more than it saves.
Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong. It was never designed for LLM application safety, though, and it shows.
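To make the mismatch concrete, here is a minimal sketch of scoring a comment through Perspective's `commentanalyzer` endpoint, the community-moderation workflow it was built for. The endpoint and the TOXICITY attribute come from Perspective's public API; the helper name is illustrative. Notice what this can't see: a prompt-injection string or jailbreak attempt usually isn't toxic, so it can sail through with a near-zero score.

```python
import requests

PERSPECTIVE_URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
)

def toxicity_score(text: str, api_key: str) -> float:
    """Score one comment with Perspective's TOXICITY attribute (0.0-1.0)."""
    resp = requests.post(
        PERSPECTIVE_URL,
        params={"key": api_key},
        json={
            "comment": {"text": text},
            "requestedAttributes": {"TOXICITY": {}},
            "languages": ["en"],
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```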
RAG pipelines have a moderation problem at the retrieval layer that input/output classifiers don't address. Injected content in retrieved documents can override model behavior. Here's the architecture that covers it.
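As a sketch of the retrieval-layer piece: scan retrieved chunks before they enter the context window, not just the user input and the model output. The injection scorer below is a hypothetical callable standing in for a trained classifier; the regex is only a first-pass tripwire, since pattern lists are trivially evaded.

```python
import re
from typing import Callable, Iterable

# Hypothetical injection classifier: returns the probability that a chunk
# contains instruction-like content aimed at the model rather than the user.
InjectionScorer = Callable[[str], float]

# Crude heuristic patterns; pair these with a trained classifier in
# practice, since pattern lists alone are trivially evaded.
_SUSPECT = re.compile(
    r"(ignore (all |any )?(previous|prior) instructions"
    r"|you are now"
    r"|system prompt)",
    re.IGNORECASE,
)

def filter_retrieved(
    chunks: Iterable[str],
    scorer: InjectionScorer,
    threshold: float = 0.8,
) -> list[str]:
    """Drop retrieved chunks that look like prompt injection before they
    reach the context window. Flagged chunks should be quarantined and
    logged, not silently discarded, so poisoned documents can be traced
    back to their source."""
    kept = []
    for chunk in chunks:
        if _SUSPECT.search(chunk) or scorer(chunk) >= threshold:
            continue  # retrieval-layer block
        kept.append(chunk)
    return kept
```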
Single classifiers have characteristic failure modes. Ensembles that combine models with different architectures and training distributions reduce coverage gaps. How to build and operate them.
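A sketch of one common composition, assuming each member exposes a probability-of-violation score (all names hypothetical): any single confident member can block outright, and mid-range disagreement routes to human review instead of a coin flip.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# Each member returns a probability that the text violates policy.
# Members are assumed to differ in architecture and training data,
# so their failure modes are less correlated.
Member = Callable[[str], float]

@dataclass
class Ensemble:
    members: list[Member]
    block_threshold: float = 0.85   # any single confident member blocks
    review_threshold: float = 0.5   # disagreement escalates to a human

    def decide(self, text: str) -> str:
        scores = [m(text) for m in self.members]
        if max(scores) >= self.block_threshold:
            return "block"   # one model is confident enough on its own
        if mean(scores) >= self.review_threshold:
            return "review"  # models disagree: escalate, don't guess
        return "allow"
```

The thresholds are where the operating cost lives: lowering `review_threshold` buys recall at the price of manual queue volume.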
False positive rates in content moderation are usually discussed as a technical metric. The business costs — user abandonment, manual review queues, appeal escalations — are rarely quantified. Here's how to measure and manage them.
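A back-of-envelope model is a workable starting point. Every parameter below is illustrative and should come from your own telemetry; the structure, FP volume times the sum of review, appeal, and churn costs, is the part that transfers.

```python
from dataclasses import dataclass

@dataclass
class FalsePositiveCostModel:
    """Back-of-envelope model; all numbers are placeholders."""
    daily_volume: int     # moderated items per day
    fp_rate: float        # false positives / total items
    review_cost: float    # $ per item entering the manual queue
    appeal_rate: float    # share of FPs the user appeals
    appeal_cost: float    # $ per appeal escalation
    churn_rate: float     # share of wrongly blocked users who leave
    user_ltv: float       # $ lifetime value of a retained user

    def daily_cost(self) -> float:
        fps = self.daily_volume * self.fp_rate
        return fps * (
            self.review_cost
            + self.appeal_rate * self.appeal_cost
            + self.churn_rate * self.user_ltv
        )

# Example: 1M items/day at a 0.5% FP rate adds up fast.
model = FalsePositiveCostModel(
    daily_volume=1_000_000, fp_rate=0.005,
    review_cost=0.30, appeal_rate=0.10, appeal_cost=2.0,
    churn_rate=0.02, user_ltv=40.0,
)
print(f"${model.daily_cost():,.0f}/day")  # -> $6,500/day with these numbers
```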
Honest reviews and benchmarks of AI content-moderation tooling, delivered when there's something worth your inbox.