The GD Chatbot Accuracy System


🎯 Chatbot Accuracy Systems v2.2.0

Core Principle

“Multiple sources of truth, cross-verified and disambiguated, with explicit guardrails against common errors.”

📊 System Overview

The GD Chatbot uses an eight-layer accuracy system to give users accurate, reliable, and comprehensive information about the Grateful Dead. Each layer serves a specific purpose, and together they prevent misinformation, resolve ambiguities, and provide verified facts.

8 Accuracy Layers
600+ Songs Detected
2,340+ Shows Indexed
125+ Disambiguated Terms
55+ Context Files
100% Verified Sources

🏗️ Multi-Layer Architecture

User Question

[1] Disambiguation Layer ────→ Resolve ambiguous terms

[2] Content Sanitization ────→ Filter incorrect data

[3] Knowledge Base ──────────→ 8 core topic files (60KB+)

[4] Context Files ───────────→ Specialized detailed data (~55 files)

[5] Pinecone Vector DB ──────→ Semantic search (optional)

[6] Tavily Web Search ───────→ Current information (always on)

[7] Token Optimization ──────→ Intent-based context budgeting (optional)

[8] System Prompt Guardrails → Enforce accuracy rules

Claude AI Processing

Verified Response

🔍 The Eight Layers

1. Disambiguation Layer

Purpose: Resolve ambiguous terms before processing

Coverage: 125+ disambiguated terms across 19 categories

Examples:

  • “The Matrix” → San Francisco venue (not the movie)
  • “Tiger” → Jerry’s guitar (not the animal)
  • “The Archive” → UCSC collection (not Internet Archive)
  • “GDP” → Grateful Dead Productions (not economics)

Benefit: Prevents context confusion and ensures correct interpretation
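
A minimal sketch of how such a lookup could work, assuming the terms live in a simple dictionary. The table below holds only the four examples listed above, and the function name `disambiguate` is hypothetical, not the chatbot's actual code.

```python
import re

# Hypothetical subset of the disambiguation table, taken from the
# examples above. The real system is described as covering 125+ terms.
DISAMBIGUATION = {
    "the matrix": "San Francisco venue (not the movie)",
    "tiger": "Jerry's guitar (not the animal)",
    "the archive": "UCSC collection (not Internet Archive)",
    "gdp": "Grateful Dead Productions (not economics)",
}


def disambiguate(query: str) -> list[str]:
    """Return a clarification note for each known ambiguous term in the query."""
    notes = []
    for term, meaning in DISAMBIGUATION.items():
        # Whole-word, case-insensitive match, so "tiger" does not hit "tigers".
        if re.search(rf"\b{re.escape(term)}\b", query, flags=re.IGNORECASE):
            notes.append(f'"{term}" refers to: {meaning}')
    return notes


print(disambiguate("Who played The Matrix in 1966?"))
```

The notes would then be passed along with the query so later layers interpret the term in its Grateful Dead sense.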

2. Content Sanitization

Purpose: Filter out incorrect or conflicting information

Special Case: The Bahr Gallery

  • All incorrect location references removed from knowledge base
  • Exclusive source: bahr-gallery.md
  • Location always: Oyster Bay, Long Island, NY
  • Triple-layer protection: Sanitization + Injection + System Prompt

Benefit: Eliminates common errors (e.g., Bahr Gallery in San Francisco)
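
The "exclusive source" rule can be illustrated with a small filter. This is a sketch under assumptions: knowledge chunks are modeled as `(source, text)` tuples, and the function name `sanitize` is hypothetical.

```python
# File named above as the only allowed source for Bahr Gallery facts.
EXCLUSIVE_SOURCE = "bahr-gallery.md"


def sanitize(chunks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Drop any chunk that mentions the Bahr Gallery unless it comes
    from the exclusive source file. Each chunk is (source, text)."""
    kept = []
    for source, text in chunks:
        mentions_gallery = "bahr gallery" in text.lower()
        if mentions_gallery and source != EXCLUSIVE_SOURCE:
            continue  # conflicting or unverified mention: filter it out
        kept.append((source, text))
    return kept


chunks = [
    ("galleries-and-art.md", "The Bahr Gallery is in San Francisco."),
    ("bahr-gallery.md", "The Bahr Gallery is in Oyster Bay, Long Island, NY."),
    ("equipment.md", "Tiger was one of Jerry's guitars."),
]
print(sanitize(chunks))
```

Filtering by source rather than by the wrong city name means a new incorrect location cannot slip through just because nobody anticipated it.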

3. Knowledge Base System

Structure: 8 focused topic files in context/core/

File                       Content                                Size
band-and-history.md        Formation, evolution, members, eras    ~12KB
books-and-literature.md    Essential bibliography                 ~9KB
culture-and-community.md   Deadhead culture, philosophy           ~10KB
equipment.md               Instruments, Wall of Sound             ~6KB
galleries-and-art.md       Art galleries, museums                 ~3KB
music-and-recordings.md    Song catalog, discography              ~7KB
resources-and-media.md     Online communities, URLs               ~12KB
terminology.md             125+ disambiguated terms               ~8KB

Benefit: Organized, topic-focused knowledge for better AI comprehension

4. Context Files Integration

Structure: 55+ specialized files across 5 subdirectories

📅 Setlist Database (2,340 Shows)

  • 31 CSV files (1965-1995, one per year)
  • Complete setlists for every show
  • Venue names and locations
  • Segue information (e.g., “Scarlet > Fire”)

🎵 Song Database (605 Songs)

  • Song titles and composers
  • First performance dates
  • Performance frequency
  • Album appearances

🎸 Equipment Database

  • Instrument specifications
  • Ownership history
  • Technical details
  • Usage periods

🎤 Interview Archives

  • Direct quotes from band members
  • Interview URLs and sources
  • Historical context from primary sources

🏛️ UC Santa Cruz Archive

  • Official archive documentation
  • Collection descriptions
  • Research resources

Benefit: Deep-dive accuracy with specialized, verified data sources
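
As an illustration of a setlist lookup, here is a sketch that reads one year's CSV and returns the songs for a date. The column layout (`date,venue,song,segue`) is an assumption, since the actual file schema is not documented here, and the sample is a two-row excerpt rather than a full setlist.

```python
import csv
import io

# Two-row excerpt in an ASSUMED column layout; the real CSV schema may differ.
SAMPLE_1977_CSV = """date,venue,song,segue
1977-05-08,"Barton Hall, Cornell University",Scarlet Begonias,>
1977-05-08,"Barton Hall, Cornell University",Fire on the Mountain,
"""


def songs_for_date(csv_text: str, date: str) -> list[str]:
    """Return the songs listed for a show date, in file order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["song"] for row in reader if row["date"] == date]


def was_played(csv_text: str, date: str, song: str) -> bool:
    """Check the data instead of guessing whether a song was played."""
    played = {s.lower() for s in songs_for_date(csv_text, date)}
    return song.lower() in played


print(songs_for_date(SAMPLE_1977_CSV, "1977-05-08"))
```

A helper like `was_played` lets the response be grounded in the file contents, which supports the rule against inventing setlists.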

5. Pinecone Vector Database (Optional)

Purpose: Semantic search using AI embeddings

How It Works:

  • Converts knowledge into vector embeddings
  • Finds semantically similar content
  • Returns top-K most relevant results
  • Works with natural language queries

Example: Query “Jerry’s favorite guitar” finds relevant content about Tiger and Wolf without exact keyword matches

Benefit: Finds relevant context even with different wording
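
The idea behind top-K semantic retrieval can be sketched without any external service. The example below does not call Pinecone; it uses hand-made three-dimensional vectors as stand-ins for real embeddings and ranks them by cosine similarity. All document IDs and vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k vectors most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings" (axes roughly: guitars, venues, songs). Hand-made for
# illustration only; real embeddings come from an embedding model.
INDEX = {
    "tiger-guitar": [0.9, 0.1, 0.0],
    "wolf-guitar": [0.8, 0.0, 0.2],
    "matrix-venue": [0.0, 1.0, 0.1],
}

query = [1.0, 0.0, 0.0]  # stand-in for the embedding of "Jerry's favorite guitar"
print(top_k(query, INDEX))
```

Neither guitar entry shares a keyword with the query; they rank highest only because their vectors point in a similar direction.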

6. Tavily Web Search (Always On)

Purpose: Real-time information from trusted sources

Features:

  • Trusted Domain Filtering: 50+ pre-approved Grateful Dead websites
  • Search Depth: Basic (faster) or Advanced (thorough)
  • Max Results: 3-10 results per search
  • Always Current: Latest news, events, releases

Trusted Domains Include:

  • dead.net (official site)
  • archive.org (live recordings)
  • deaddisc.com (discography)
  • jerrybase.com (Jerry Garcia)
  • And 45+ more verified sources

Benefit: Current information with source verification
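
Trusted domain filtering can be sketched as a check on each result's hostname. This is a client-side illustration only, assuming search results arrive as dictionaries with a `url` key; it does not show how the chatbot actually configures Tavily, and the domain set below contains only the four sites named above.

```python
from urllib.parse import urlparse

# Subset of the trusted list, limited to the domains named above.
TRUSTED_DOMAINS = {"dead.net", "archive.org", "deaddisc.com", "jerrybase.com"}


def is_trusted(url: str) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def filter_results(results: list[dict], max_results: int = 10) -> list[dict]:
    """Keep only results from trusted domains, capped at max_results."""
    return [r for r in results if is_trusted(r["url"])][:max_results]


results = [
    {"url": "https://www.dead.net/show/may-8-1977"},
    {"url": "https://notdead.net/rumors"},
    {"url": "https://archive.org/details/GratefulDead"},
]
print(filter_results(results))
```

Matching on the parsed hostname, rather than on a substring of the URL, keeps look-alike hosts such as `notdead.net` from passing the filter.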

7. Token Optimization System (Optional)

Purpose: Intelligent context selection based on query intent

How It Works:

  1. Intent Detection: Analyzes query to determine topic
  2. Context Selection: Loads only relevant context files
  3. Token Budgeting: Enforces token limits (default 500)
  4. Caching: Stores fragments for faster retrieval

Example: Equipment question loads equipment files, not setlist data

Benefit: Faster responses, lower API costs, focused context
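
The four steps above can be sketched as follows. Everything here is an assumption for illustration: the keyword lists, the placeholder fragments, and the rough estimate of four characters per token are not taken from the actual implementation.

```python
from functools import lru_cache

# Hypothetical keyword map for step 1 (intent detection).
INTENT_KEYWORDS = {
    "equipment": ("guitar", "amp", "wall of sound"),
    "setlist": ("setlist", "show", "played"),
}

# Placeholder fragments; real ones would be read from the context files.
FRAGMENTS = {
    "equipment": ("equipment fragment A", "equipment fragment B"),
    "setlist": ("setlist fragment A",),
}


def detect_intent(query: str) -> str:
    """Step 1: crude substring-based intent detection."""
    q = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(kw in q for kw in keywords):
            return intent
    return "general"


@lru_cache(maxsize=None)
def load_fragments(intent: str) -> tuple[str, ...]:
    """Steps 2 and 4: load only the matching fragments and cache them."""
    return FRAGMENTS.get(intent, ())


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


def within_budget(fragments, budget: int = 500) -> list[str]:
    """Step 3: keep fragments in order, skipping any that would exceed the budget."""
    chosen, used = [], 0
    for frag in fragments:
        cost = estimate_tokens(frag)
        if used + cost > budget:
            continue
        chosen.append(frag)
        used += cost
    return chosen


def build_context(query: str, budget: int = 500) -> list[str]:
    return within_budget(load_fragments(detect_intent(query)), budget)


print(build_context("Which guitar did Jerry play in 1979?"))
```

With this shape, an equipment question never loads setlist fragments, and the budget caps how much text reaches the model.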

8. System Prompt Guardrails

Purpose: Explicit rules to prevent common errors

Key Guardrails:

  • Never invent setlists or show dates
  • Always cite sources for quotes
  • Distinguish between studio and live versions
  • Clarify composer vs. performer
  • Use correct venue names and locations
  • Verify equipment specifications
  • Acknowledge uncertainty when appropriate
  • Prioritize official sources

Benefit: Enforces accuracy standards at the AI processing level
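
One plausible way to apply these rules is to append them to the system prompt as a numbered list. The sketch below assumes that approach; the base prompt wording and the function name `build_system_prompt` are hypothetical.

```python
# The eight guardrails listed above, kept as data so they are easy to audit.
GUARDRAILS = [
    "Never invent setlists or show dates",
    "Always cite sources for quotes",
    "Distinguish between studio and live versions",
    "Clarify composer vs. performer",
    "Use correct venue names and locations",
    "Verify equipment specifications",
    "Acknowledge uncertainty when appropriate",
    "Prioritize official sources",
]


def build_system_prompt(base: str = "You answer questions about the Grateful Dead.") -> str:
    """Combine a base prompt (hypothetical wording) with numbered accuracy rules."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(GUARDRAILS, start=1))
    return f"{base}\n\nAccuracy rules:\n{rules}"


print(build_system_prompt())
```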

🎯 How It All Works Together

Example Query: “Tell me about Dark Star at Cornell”

Layer 1 (Disambiguation): “Dark Star” = song (not astronomy)

Layer 2 (Sanitization): No conflicting data to filter

Layer 3 (Knowledge Base): Loads song info from music-and-recordings.md

Layer 4 (Context Files): Searches setlists/1977.csv for 5/8/77

Layer 5 (Pinecone): Finds related Cornell ’77 content

Layer 6 (Tavily): Searches for current Cornell ’77 discussions

Layer 7 (Token Optimization): Focuses on setlist + song data

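Layer 8 (Guardrails): Ensures the response reports only what the 5/8/77 setlist contains; if “Dark Star” is not in that setlist, the response says so instead of inventing a performance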

Result: Accurate, comprehensive response with verified setlist, song history, and current context
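
The layer-by-layer flow can be pictured as a list of stages applied in order to a shared state. This is a simplified sketch with two placeholder stages; the stage functions and the state layout are hypothetical and stand in for the real layers.

```python
def stage_disambiguation(state: dict) -> None:
    """Placeholder for Layer 1: note how an ambiguous term is read."""
    if "dark star" in state["query"].lower():
        state["notes"].append('"Dark Star" = song (not astronomy)')


def stage_guardrails(state: dict) -> None:
    """Placeholder for Layer 8: attach an accuracy rule to the request."""
    state["notes"].append("Never invent setlists or show dates")


def run_pipeline(query: str, stages) -> dict:
    """Run each (name, function) stage in order and record the trace."""
    state = {"query": query, "notes": [], "trace": []}
    for name, stage in stages:
        stage(state)
        state["trace"].append(name)
    return state


STAGES = [("disambiguation", stage_disambiguation), ("guardrails", stage_guardrails)]
result = run_pipeline("Tell me about Dark Star at Cornell", STAGES)
print(result["trace"], result["notes"])
```

Keeping the stages in an ordered list makes it straightforward to disable an optional layer by leaving it out of the list.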

🎵 Music Streaming Integration (v2.2.0)

New Accuracy Layer: Archive.org Database

Version 2.2.0 adds a supplementary data layer for music streaming, alongside the eight accuracy layers described above:

  • Database: 4 new tables with Archive.org metadata
  • Shows: 2,340+ shows with complete information
  • Recordings: Individual track data
  • Sync: Automatic background updates
  • Detection: 600+ songs automatically recognized

Benefit: Song mentions become clickable links with instant access to live recordings
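
Song detection and linking could look roughly like the sketch below. The three-title song list, the Markdown link output, and the Archive.org search URL pattern are all assumptions for illustration; the actual plugin may build links from its own database tables instead.

```python
import re
from urllib.parse import quote_plus

# Tiny hypothetical subset of the 600+ recognized titles.
KNOWN_SONGS = ["Dark Star", "Scarlet Begonias", "Fire on the Mountain"]

# Longest titles first, so overlapping titles prefer the longer match.
_SONG_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in sorted(KNOWN_SONGS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def link_songs(text: str) -> str:
    """Wrap each recognized song title in a Markdown link to a search page."""
    def to_link(match: re.Match) -> str:
        title = match.group(0)
        # ASSUMED URL format: an Archive.org search for the band plus the title.
        url = "https://archive.org/search?query=" + quote_plus(f"Grateful Dead {title}")
        return f"[{title}]({url})"

    return _SONG_PATTERN.sub(to_link, text)


print(link_songs("They opened the second set with Scarlet Begonias."))
```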

📈 Accuracy Metrics

Metric                    Value    Description
Source Verification       100%     All context files from verified sources
Show Data Accuracy        100%     2,340 shows with verified setlists
Song Detection            600+     Grateful Dead songs automatically recognized
Disambiguation Coverage   125+     Terms with explicit context clarification
Context Files             55+      Specialized knowledge sources
Trusted Domains           50+      Pre-approved websites for web search
Response Time             < 3s     Average response with full context

🔒 Quality Assurance

Common Errors We Prevent

  • ❌ Inventing show dates or setlists
  • ❌ Confusing venue locations (e.g., Bahr Gallery)
  • ❌ Misattributing songs to wrong composers
  • ❌ Mixing up equipment specifications
  • ❌ Confusing studio vs. live versions
  • ❌ Using unreliable sources
  • ❌ Hallucinating band member quotes
  • ❌ Incorrect disambiguation of terms

🎯 Best Practices for Users

How to Get the Most Accurate Responses

  1. Be Specific: “Dark Star at Cornell ’77” vs. “Dark Star”
  2. Use Dates: “5/8/77” or “May 8, 1977”
  3. Specify Context: “Jerry’s Tiger guitar” vs. just “Tiger”
  4. Ask for Sources: “Where can I verify this?”
  5. Clarify Ambiguity: “The Matrix venue” vs. “The Matrix”
  6. Request Details: “Full setlist” vs. “What songs”

📊 System Status

Current Configuration

  • 8 Core Topic Files Loaded
  • 55+ Context Files Available
  • 125+ Terms Disambiguated
  • 2,340+ Shows Indexed
  • 600+ Songs Detected
  • 50+ Trusted Domains Configured
  • Archive.org Integration Active
  • Streaming Services Available (Optional)

🔮 Future Enhancements

Planned improvements to the accuracy system:

  • Machine Learning: Train on user feedback to improve responses
  • Expanded Sources: Add more verified Grateful Dead resources
  • Real-Time Verification: Cross-check facts against multiple sources
  • User Corrections: Allow users to report inaccuracies
  • Confidence Scores: Display confidence level for each response
  • Source Citations: Automatic footnotes for all facts

GD Chatbot v2.2.0 Accuracy Systems | Eight layers of verification for the most accurate Grateful Dead information | IT Influentials