Tag
#content-moderation
6 posts tagged content-moderation.
- reviews
Perspective API: Still Good at Its Original Job, Still Wrong for LLM Safety
Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong. For LLM application safety, it was never designed for the job, and it shows.
- ops
Content Moderation for RAG Applications: The Retrieval Layer Is an Attack Surface
RAG pipelines have a moderation problem at the retrieval layer that input/output classifiers don't address. Injected content in retrieved documents can override model behavior. Here's the architecture that covers it.
- ops
Classifier Ensembles for Production Content Moderation
Single classifiers have characteristic failure modes. Ensembles that combine models with different architectures and training distributions reduce coverage gaps. How to build and operate them.
- ops
The Real Cost of False Positives in AI Content Moderation
False positive rates in content moderation are usually discussed as a technical metric. The business costs — user abandonment, manual review queues, appeal escalations — are rarely quantified. Here's how to measure and manage them.
- reviews
OpenAI Moderation API: An Honest Review After 18 Months in Production
OpenAI's Moderation API is the path-of-least-resistance choice for teams already in the OpenAI ecosystem. The speed is good. The category granularity has improved. The gaps are predictable.
- reviews
Llama Guard Benchmark Review: Real-World Performance vs. Vendor Claims
Meta's Llama Guard series has become a default choice for open-source content moderation. Benchmarks on the standard test sets look strong. Production behavior is more complicated.