Dispatches from the AI security beat.

Honest reviews and benchmarks of AI content-moderation tooling.

Reviews and benchmarks of content-moderation and safety tooling for LLM applications. Llama Guard, NeMo Guardrails, OpenAI Moderation, Perspective API, custom classifier patterns — what works, what regresses, what costs more than it saves.

Lead dispatch

Best AI Content Moderation Tools 2026: Platform Comparison

A practitioner's comparison of the best AI content moderation tools in 2026 — Azure AI Content Safety, Hive Moderation, AWS Rekognition, Perspective API, and OpenAI's Moderation API, with capability matrices, pricing, and selection criteria.

Recent dispatches

ops

Fine-Tuned Classifiers vs. Off-the-Shelf Moderation APIs: Cost & Tradeoffs

Off-the-shelf moderation APIs are cheap to start and expensive to outgrow. Fine-tuned classifiers are the reverse. Here's the honest cost and tradeoff comparison — including the costs teams forget — and where the crossover actually is.

May 12, 2026

guides

Image & Video Content Moderation Tools (2026)

Text moderation gets the attention, but image and video are where the hard moderation problems live. A practitioner's map of the major tools — cloud APIs, open-source multimodal classifiers, and CSAM-specialist services — and how to choose.

May 10, 2026

guides

Llama Guard vs Llama Guard 2 vs Llama Guard 3: The Lineage, Clarified

Meta's Llama Guard series gets cited loosely, often with the wrong base model or category count. Here's the verified lineage — base models, taxonomies, and category counts — with the version differences that actually matter in production.

May 8, 2026

reviews

Perspective API: Good at Its Original Job, Wrong for LLM Safety

Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong. It was never designed for LLM application safety, and it shows.

May 5, 2026

ops

Content Moderation for RAG: The Retrieval Layer Is an Attack Path

RAG pipelines have a moderation problem at the retrieval layer that input/output classifiers don't address. Injected content in retrieved documents can override model behavior. Here's the architecture that covers it.

May 4, 2026

ops

Classifier Ensembles for Production Content Moderation

Single classifiers have characteristic failure modes. Ensembles that combine models with different architectures and training distributions reduce coverage gaps. How to build and operate them.

May 4, 2026

Why trust us

Trusted by researchers across the AI security community

AI Moderation Tools is part of a 26-site editorial network covering adversarial ML, AI governance, defensive tooling, and ops engineering — all open access.

Sites in network: 26, across 6 topic clusters

Expert articles: 400+, and growing daily

New content: daily, automated + editorial

Free: always free to read, newsletter included

AI Moderation Tools — in your inbox

Honest reviews and benchmarks of AI content-moderation tooling, delivered when there's something worth your inbox.

No spam. Unsubscribe anytime.