AI Content Authenticity Detection
Instructions
Evaluate content for AI authorship using detection APIs and analytical methods. AI content detection is probabilistic — never treat results as definitive. Build workflows that use detection as one signal among several in editorial decision-making.
Detection API Integration
Pangram API
Pangram provides granular AI detection with model attribution:
POST https://api.pangram.com/v1/detect
Headers:
Authorization: Bearer {API_KEY}
Content-Type: application/json
Body:
{
"text": "content to analyze",
"model": "latest"
}
Response:
{
"ai_probability": 0.87,
"model_attribution": {
"gpt-4": 0.65,
"claude": 0.20,
"other": 0.15
},
"sentence_level": [
{ "text": "sentence 1", "ai_probability": 0.92 },
{ "text": "sentence 2", "ai_probability": 0.45 }
]
}
Strengths: Sentence-level detection, model attribution
Limitations: Accuracy drops below 200 words, may flag highly formal human writing
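As a rough illustration of how the sentence-level scores might be used, the Python sketch below picks out high-scoring sentences and the leading attributed model from a response shaped like the example above. The helper names `flag_sentences` and `top_attribution` and the 0.8 cutoff are assumptions made for this example; they are not part of the Pangram API.

```python
import json
from typing import Optional

# Example response copied from the section above.
SAMPLE_RESPONSE = json.loads("""
{
  "ai_probability": 0.87,
  "model_attribution": {"gpt-4": 0.65, "claude": 0.20, "other": 0.15},
  "sentence_level": [
    {"text": "sentence 1", "ai_probability": 0.92},
    {"text": "sentence 2", "ai_probability": 0.45}
  ]
}
""")


def flag_sentences(response: dict, cutoff: float = 0.8) -> list:
    """Return the text of sentences scoring at or above the cutoff.

    The 0.8 default is an illustrative assumption, not a recommended value.
    """
    return [
        s["text"]
        for s in response.get("sentence_level", [])
        if s["ai_probability"] >= cutoff
    ]


def top_attribution(response: dict) -> Optional[str]:
    """Return the model name with the highest attribution share, if any."""
    attribution = response.get("model_attribution") or {}
    if not attribution:
        return None
    return max(attribution, key=attribution.get)


if __name__ == "__main__":
    print(flag_sentences(SAMPLE_RESPONSE))
    print(top_attribution(SAMPLE_RESPONSE))
```

Flagged sentences are best treated as pointers for a human reviewer, not as findings in themselves.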
Grammarly Authorship API
Grammarly Authorship focuses on writing pattern analysis:
POST https://api.grammarly.com/authorship/v1/analyze
Headers:
Authorization: Bearer {API_KEY}
Content-Type: application/json
Body:
{
"text": "content to analyze",
"context": "article"
}
Response:
{
"authorship_score": 0.73,
"classification": "likely_ai",
"confidence": "medium",
"stylistic_indicators": ["uniform sentence length", "low lexical diversity"]
}
Strengths: Writing style analysis, low false positive rate on informal writing
Limitations: Less effective on heavily edited AI content
Chrysalis API
Chrysalis specializes in detecting AI content that has been edited or paraphrased:
POST https://api.chrysalis.ai/v1/detect
Headers:
X-API-Key: {API_KEY}
Content-Type: application/json
Body:
{
"text": "content to analyze",
"mode": "detailed"
}
Response:
{
"ai_generated_probability": 0.81,
"editing_detected": true,
"estimated_human_edit_percentage": 25,
"paragraph_analysis": [
{ "paragraph": 1, "ai_probability": 0.95 },
{ "paragraph": 2, "ai_probability": 0.40 }
]
}
Strengths: Detects partially edited AI content, paragraph-level analysis
Limitations: Newer service with less benchmarking data
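The three example responses use different field names for their headline score. A minimal sketch of normalizing them into one dictionary could look like the following. It assumes that Grammarly's `authorship_score` sits on the same 0 to 1 scale as the other two, with higher values meaning more likely AI (consistent with the `likely_ai` example above, but not confirmed here). `extract_scores` is a hypothetical helper name.

```python
# Field that carries the headline score in each example response above.
SCORE_FIELDS = {
    "pangram": "ai_probability",
    "grammarly": "authorship_score",  # assumed: 0-1, higher = more likely AI
    "chrysalis": "ai_generated_probability",
}


def extract_scores(responses: dict) -> dict:
    """Map {api_name: raw_response} to {api_name: score in [0, 1]}.

    Raises ValueError if a score is missing or outside the 0-1 range,
    so that malformed responses are not silently averaged in later.
    """
    scores = {}
    for api, field in SCORE_FIELDS.items():
        if api not in responses or field not in responses[api]:
            raise ValueError(f"missing {field!r} for {api}")
        value = float(responses[api][field])
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{api} score out of range: {value}")
        scores[api] = value
    return scores


if __name__ == "__main__":
    example = {
        "pangram": {"ai_probability": 0.87},
        "grammarly": {"authorship_score": 0.73, "classification": "likely_ai"},
        "chrysalis": {"ai_generated_probability": 0.81, "editing_detected": True},
    }
    print(extract_scores(example))
```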
Multi-API Ensemble Strategy
No single detector is reliable enough alone. Use an ensemble approach:
Consensus Scoring
Ensemble Score = (Pangram × 0.35) + (Grammarly × 0.35) + (Chrysalis × 0.30)
| Ensemble Score | Classification | Action |
|---|---|---|
| 0.85-1.00 | Very likely AI-generated | Flag for editorial review; request provenance |
| 0.65-0.84 | Probably AI-generated | Flag for review; additional investigation needed |
| 0.40-0.64 | Inconclusive | Note for awareness; do not act on detection alone |
| 0.20-0.39 | Probably human-written | No action needed; file result |
| 0.00-0.19 | Very likely human-written | No action needed |
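The weighted formula and the table above can be sketched in Python as follows. The function names are illustrative. Because the table's bands leave small gaps (for example between 0.84 and 0.85), this sketch assumes each band starts at its lower bound and runs up to the next band.

```python
# Weights from the consensus scoring formula above.
WEIGHTS = {"pangram": 0.35, "grammarly": 0.35, "chrysalis": 0.30}

# (lower bound, classification), checked from the highest band down.
BANDS = [
    (0.85, "Very likely AI-generated"),
    (0.65, "Probably AI-generated"),
    (0.40, "Inconclusive"),
    (0.20, "Probably human-written"),
    (0.00, "Very likely human-written"),
]


def ensemble_score(scores: dict) -> float:
    """Weighted average of the three per-API scores (each in [0, 1])."""
    return sum(WEIGHTS[api] * scores[api] for api in WEIGHTS)


def classify(score: float) -> str:
    """Return the classification band for an ensemble score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise AssertionError("unreachable: the last band starts at 0.0")


if __name__ == "__main__":
    s = ensemble_score({"pangram": 0.87, "grammarly": 0.73, "chrysalis": 0.81})
    print(round(s, 3), classify(s))
```

With the example scores from the API sections (0.87, 0.73, 0.81), the ensemble score is 0.3045 + 0.2555 + 0.243 = 0.803, which falls in the "Probably AI-generated" band.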
Disagreement Handling
When APIs disagree significantly (>0.3 spread between highest and lowest):
- Weight the API with the best track record for the content type
- Run additional analysis (stylistic indicators, provenance check)
- Escalate to human reviewer with all API results
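The spread check described above might be implemented along these lines. This is a sketch with hypothetical function names; the 0.3 limit comes from the text.

```python
def score_spread(scores: dict) -> float:
    """Difference between the highest and lowest per-API score."""
    values = list(scores.values())
    if not values:
        raise ValueError("no scores provided")
    return max(values) - min(values)


def needs_escalation(scores: dict, max_spread: float = 0.3) -> bool:
    """True when the APIs disagree by more than max_spread."""
    return score_spread(scores) > max_spread


if __name__ == "__main__":
    agreeing = {"pangram": 0.87, "grammarly": 0.73, "chrysalis": 0.81}
    disagreeing = {"pangram": 0.90, "grammarly": 0.40, "chrysalis": 0.70}
    print(needs_escalation(agreeing), needs_escalation(disagreeing))
```

When `needs_escalation` returns true, the ensemble score alone says little, so the reviewer should receive all three raw results.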
Confidence Calibration
Detection confidence varies by content characteristics:
| Factor | Impact on Accuracy |
|---|---|
| Content length | <200 words: significantly reduced; >500 words: optimal |
| Content type | Technical/formal: higher false positives; Creative/informal: more accurate |
| Language | English: best accuracy; Other languages: reduced accuracy |
| Editing level | Unedited AI: high accuracy; Heavily edited: reduced accuracy |
| Model recency | Newer AI models may evade older detectors |
Adjust confidence thresholds based on these factors:
Adjusted Confidence = Raw Score × Length Factor × Type Factor × Language Factor
Length Factor:
< 200 words: 0.6
200-500 words: 0.8
500-1000 words: 0.9
> 1000 words: 1.0
Type Factor:
Creative/informal: 1.0
News/general: 0.95
Technical/academic: 0.85
Legal/regulatory: 0.80
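A minimal Python sketch of the adjustment follows, under two stated assumptions. First, the length ranges above overlap at 500 and 1000 words, so this sketch places 500 to 1000 words inclusive in the 0.9 band. Second, no language factor values are listed, so `language_factor` is left as a caller-supplied parameter that defaults to 1.0. The short content-type keys are also illustrative.

```python
# Type factors from the list above; the short keys are illustrative.
TYPE_FACTORS = {
    "creative": 1.0,    # Creative/informal
    "news": 0.95,       # News/general
    "technical": 0.85,  # Technical/academic
    "legal": 0.80,      # Legal/regulatory
}


def length_factor(word_count: int) -> float:
    """Length factor; boundary handling at 500 and 1000 is an assumption."""
    if word_count < 200:
        return 0.6
    if word_count < 500:
        return 0.8
    if word_count <= 1000:
        return 0.9
    return 1.0


def adjusted_confidence(
    raw_score: float,
    word_count: int,
    content_type: str,
    language_factor: float = 1.0,  # assumed default; not given in the source
) -> float:
    """Raw Score x Length Factor x Type Factor x Language Factor."""
    return (
        raw_score
        * length_factor(word_count)
        * TYPE_FACTORS[content_type]
        * language_factor
    )


if __name__ == "__main__":
    # 350-word technical article with a raw score of 0.80
    print(round(adjusted_confidence(0.80, 350, "technical"), 3))
```

For a 350-word technical article with a raw score of 0.80, this gives 0.80 × 0.8 × 0.85 = 0.544.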
False Positive Mitigation
False positives (human content flagged as AI) are the primary risk.
Common false positive triggers:
- Highly structured, formal writing
- Non-native English speakers with learned patterns
- Template-based content (legal, regulatory, boilerplate)
- Content written with AI writing assistants (grammar tools)
- SEO-optimized content with formulaic structure
Mitigation strategies:
- Never make an AI determination based on a single API
- Consider the writer’s known style and history
- Check for provenance (drafts, version history, interview notes)
- Weight stylistic indicators alongside probability scores
- Establish an appeal process for contested determinations
Editorial Workflow Integration
Incoming Content Pipeline
Content Submitted
↓
Automatic API Screening (all 3 APIs)
↓
Ensemble Score Calculated
↓
[Score ≥ 0.65?]
├─ No → Normal workflow
└─ Yes → Flag for Editorial Review
↓
Human reviewer examines:
- API results and disagreements
- Content provenance
- Author history
- Stylistic indicators
↓
[Decision: Accept / Request revision / Reject]
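The routing step in the pipeline above could be sketched as below. The `Triage` type and route names are hypothetical. Note that the function only routes content to a queue; the accept, revise, or reject decision stays with the human reviewer.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.65  # from the pipeline diagram above

# Items the human reviewer examines, as listed in the pipeline.
REVIEW_CHECKLIST = [
    "API results and disagreements",
    "Content provenance",
    "Author history",
    "Stylistic indicators",
]


@dataclass
class Triage:
    route: str  # "editorial_review" or "normal_workflow"
    checklist: list = field(default_factory=list)


def triage(ensemble_score: float) -> Triage:
    """Route content by ensemble score; never rejects automatically."""
    if ensemble_score >= REVIEW_THRESHOLD:
        return Triage("editorial_review", list(REVIEW_CHECKLIST))
    return Triage("normal_workflow")


if __name__ == "__main__":
    print(triage(0.803))
    print(triage(0.30))
```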
Bulk Screening
For auditing existing content libraries:
- Extract all content with metadata (author, date, word count)
- Run through ensemble API pipeline
- Sort by ensemble score descending
- Manually review top 10% (highest AI probability)
- Sample review middle tier for calibration
- Document findings and adjust thresholds
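The sorting and sampling steps above might look like this in Python. It is a sketch under assumptions: each item is a dict with a `score` key holding its ensemble score, the "middle tier" is taken to be the inconclusive band (0.40 up to 0.65), and the sample size and seed are arbitrary illustrative defaults.

```python
import math
import random


def plan_bulk_review(
    items: list,
    top_fraction: float = 0.10,
    middle_band: tuple = (0.40, 0.65),  # assumed meaning of "middle tier"
    sample_size: int = 5,               # illustrative default
    seed: int = 0,                      # fixed seed for a repeatable sample
):
    """Return (top items for manual review, middle-tier calibration sample)."""
    ranked = sorted(items, key=lambda it: it["score"], reverse=True)
    top_n = math.ceil(len(ranked) * top_fraction)
    top = ranked[:top_n]

    low, high = middle_band
    middle = [it for it in ranked[top_n:] if low <= it["score"] < high]
    rng = random.Random(seed)
    sample = rng.sample(middle, min(sample_size, len(middle)))
    return top, sample


if __name__ == "__main__":
    library = [{"id": i, "score": i / 20} for i in range(20)]
    top, sample = plan_bulk_review(library)
    print([it["id"] for it in top], sorted(it["id"] for it in sample))
```

Rounding the top 10% up with `math.ceil` means a small library still gets at least one manual review.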
Reporting Format
For each content piece analyzed:
## AI Authenticity Analysis: [Content Title]
### Ensemble Result
- **Classification**: [Very likely AI / Probably AI / Inconclusive / Probably Human / Very likely Human]
- **Ensemble Score**: [X.XX]
- **Adjusted Confidence**: [X.XX] (after calibration factors)
### API Results
| API | Score | Classification | Key Indicators |
|-----|-------|---------------|---------------|
| Pangram | X.XX | [classification] | [indicators] |
| Grammarly | X.XX | [classification] | [indicators] |
| Chrysalis | X.XX | [classification] | [indicators] |
### Calibration Factors Applied
- Content length: [X words] → Factor: [X.X]
- Content type: [type] → Factor: [X.X]
- Language: [language] → Factor: [X.X]
### Stylistic Indicators
[Notable patterns identified by APIs]
### Recommendation
[Action recommendation with rationale]
Inputs Required
- Content text to analyze (minimum 200 words recommended)
- Content metadata: author, publication date, content type
- API credentials for Pangram, Grammarly Authorship, and/or Chrysalis
- Context: editorial review, audit, or real-time screening
- Any known provenance information (drafts, version history)
Output Format
## Content Authenticity Report
### Summary
- Content: [title/identifier]
- Word count: [X]
- Ensemble classification: [classification]
- Ensemble score: [X.XX]
- Recommendation: [accept / review / flag]
### Detailed API Results
[Per-API breakdown with scores and indicators]
### Confidence Assessment
[Calibration factors and adjusted confidence]
### Editorial Action
[Specific recommendation with reasoning]
Anti-Patterns
- Treating detection as proof — AI detection is probabilistic; never use a single API score as definitive evidence of AI authorship
- Binary AI/human classification — Content exists on a spectrum; AI-assisted, AI-drafted-human-edited, and fully AI-generated are all different
- Ignoring false positives — Non-native speakers, formal writers, and template-based content trigger false positives; always investigate
- Single-API reliance — No detector is reliable enough alone; always use ensemble scoring
- Threshold rigidity — Adjust confidence thresholds based on content type, length, and language
- Automated rejection — Detection should flag for human review, never automatically reject content
- Ignoring provenance — If the author can produce drafts, notes, or version history, that outweighs API scores
- Weaponizing detection — Detection tools are for editorial integrity, not for punishing contributors without due process
