
Perspective API: Still Good at Its Original Job, Still Wrong for LLM Safety

Jigsaw's Perspective API has 8+ years of production data on toxicity detection. For community content moderation it remains strong. For LLM application safety, it was never designed for that use case, and it shows.

By Noor Khalid · 8 min read

Perspective API was built by Jigsaw (Google’s technology incubator) to help publishers moderate comments sections. The first version launched in 2017. The core model has been trained on millions of labeled comments from major publishers and has genuine expertise in the specific problem it was designed for: detecting toxic language in user-generated text comments.

It also gets evaluated as an LLM safety tool, a use case it was not designed for and does not excel at. This review covers both.

What Perspective API is good at

Toxicity in community content. Detecting insulting, threatening, or demeaning language in short-form community contributions (comments, forum posts, social media replies) is the core use case. The API has been trained on real moderation decisions from real publishers. In this domain, it’s genuinely competitive.

Speed and scale. The API handles high request volumes at low latency (~20ms typical). For high-volume comment moderation, this matters.

Attribute breadth for community moderation. The API scores multiple attributes, not just a single toxicity number: TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, and THREAT.

For community moderation, this granularity is useful. A PROFANITY score of 0.8 might warrant a warning; an IDENTITY_ATTACK score of 0.8 warrants removal; a THREAT score of 0.8 warrants reporting.
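The graduated responses above can be sketched as a small policy function. The thresholds and action names here are illustrative assumptions, not Perspective recommendations:

```python
# Sketch: route a comment based on per-attribute Perspective scores.
# Thresholds (0.8) and action names are illustrative, not official guidance.

def route(scores: dict[str, float]) -> str:
    """Map attribute scores (0.0-1.0) to the most severe warranted action."""
    if scores.get("THREAT", 0.0) >= 0.8:
        return "report"   # threats escalate beyond the platform
    if scores.get("IDENTITY_ATTACK", 0.0) >= 0.8:
        return "remove"   # identity attacks are removed outright
    if scores.get("PROFANITY", 0.0) >= 0.8:
        return "warn"     # profanity alone only warrants a warning
    return "allow"

print(route({"PROFANITY": 0.85}))                # warn
print(route({"THREAT": 0.9, "PROFANITY": 0.9}))  # report
```

Ordering the checks from most to least severe means a comment that trips several attributes gets the strictest applicable action.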

Language coverage. Perspective supports 12+ languages with explicitly trained models. The multilingual coverage is better documented and more transparent than most competitors.
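For reference, a Perspective AnalyzeComment request body looks roughly like the sketch below; consult the official documentation for the authoritative schema. The helper name is ours, and the actual HTTP call is omitted:

```python
import json

# Sketch of a Perspective AnalyzeComment request body. The endpoint is
#   POST https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=API_KEY
# Sending the request (e.g. with requests.post) is left out of this sketch.

def build_analyze_request(text: str, attributes: list[str], languages: list[str]) -> dict:
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
        "languages": languages,  # hint the model; omit to let the API auto-detect
    }

payload = build_analyze_request("example comment text", ["TOXICITY", "INSULT"], ["en"])
print(json.dumps(payload, indent=2))
```

Passing an explicit `languages` list pins the request to one of the explicitly trained language models rather than relying on auto-detection.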

What Perspective API is not good at

LLM output safety classification. This is the use case it gets misapplied to most often. The problem:

Perspective API was trained on human-written comments. LLM outputs are different in character — they can be harmful without being toxic in the Perspective sense. An LLM that provides detailed instructions for synthesizing dangerous chemicals is not producing “toxic” content in the comment-moderation sense; it’s producing harmful content in the instruction-following-safety sense. Perspective API will not catch it.

In our testing, Perspective API's toxicity scores on LLM responses stayed low whenever harmful content was phrased in a neutral, helpful register: detailed dangerous instructions read as polite prose, and the classifier scores register, not intent.

Safety-relevant content in professional contexts. Medical providers, security researchers, and legal professionals discuss topics that Perspective will flag as toxic because those topics appear in toxic comments in training data. The false positive rate in professional contexts is high.

Adversarial inputs. Perspective API was not designed with adversarial evaluation in mind. Adversarial users can trivially reduce toxicity scores through polite rephrasing while preserving harmful intent.

When to use it

Use Perspective API for:

- Moderating user-to-user content: comments, forum posts, social replies
- High-volume, low-latency toxicity screening
- Multilingual community moderation across its explicitly trained languages

Do not use Perspective API for:

- Classifying LLM output safety, where harmful-but-polite content scores low
- Professional contexts (medicine, security research, law), where topical overlap with toxic training data drives false positives
- Any setting with adversarial users, who can rephrase politely while preserving harmful intent

The right architecture if you need both

For platforms that need both community toxicity moderation (user-to-user content) and LLM safety (model output safety), run two separate classifiers: route user-to-user content through Perspective API, and route model inputs and outputs through a classifier trained specifically for LLM safety. The two problems share vocabulary but not failure modes, and neither tool covers the other's gaps.

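A minimal sketch of that two-track routing layer. Both classifier functions here are hypothetical stubs standing in for real API calls:

```python
from typing import Callable

# Sketch of a two-track moderation layer: user-to-user content goes to a
# toxicity classifier (e.g. a Perspective wrapper), model outputs go to a
# dedicated LLM safety classifier. Both checks are hypothetical stubs.

def make_moderator(
    toxicity_check: Callable[[str], bool],    # would wrap Perspective API
    llm_safety_check: Callable[[str], bool],  # would wrap an LLM safety model
) -> Callable[[str, str], bool]:
    def is_blocked(text: str, source: str) -> bool:
        if source == "user_to_user":
            return toxicity_check(text)
        if source == "model_output":
            return llm_safety_check(text)
        raise ValueError(f"unknown content source: {source}")
    return is_blocked

# Stub classifiers for illustration only:
moderate = make_moderator(
    toxicity_check=lambda t: "idiot" in t.lower(),
    llm_safety_check=lambda t: "synthesis steps" in t.lower(),
)
print(moderate("you idiot", "user_to_user"))             # True
print(moderate("Synthesis steps: ...", "model_output"))  # True
```

Keeping the routing explicit at the call site means each classifier only ever sees the content class it was trained on.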
The distinction between toxicity detection and LLM safety detection is the framing we apply across the full comparative landscape of tools at aisecreviews.com.

Sources

  1. Perspective API Documentation
  2. Jigsaw: Perspective API Model Cards
#perspective-api #google-jigsaw #toxicity-detection #content-moderation #llm-safety
