Skip to main content
< All Topics
Print

ScubaGPT Showcase

AI Project Showcase: ScubaGPT

Document type: AI Project Showcase

Project: ScubaGPT

Status: Active β€” canonical version

Last updated by Claude Code: 2026-04-19

Populated from: CLAUDE.md, ARCHITECTURE.md, REQUIREMENTS.md, documentation/README.md, documentation/CLAUDE.md, scubagpt-chatbot/readme.txt, scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md, data-pipelines/README.md, git log, plugin-installs/ directory listing

Section 0 β€” Pre-Population Audit

0.1 β€” Project root reconnaissance

Root structure: 29 items including scubagpt-chatbot/ (plugin), data-pipelines/ (22 Python scripts), .agents/ (Skills/Agents), Scuba GPT Training Data/ (580+ files, 3.7 GB), plugin-installs/ (20 versioned zips), documentation/, marketing/.

Context docs found: CLAUDE.md (root), documentation/CLAUDE.md, ARCHITECTURE.md, REQUIREMENTS.md, documentation/README.md, data-pipelines/README.md, scubagpt-chatbot/readme.txt, 17 markdown files in documentation/, 3 markdown files in scubagpt-chatbot/documentation/.

No changelog file by name β€” release history maintained in scubagpt-chatbot/readme.txt changelog section.

0.2 β€” Knowledge system discovery

Knowledge base directories:

  • scubagpt-chatbot/knowledgebase/ β€” runtime KB injected into prompts
  • data/ β€” dive-sites.json (14,642 sites), dive-operators.json (6,900+ operators), seasonal-baselines.json, analytics JSON
  • almanac/ β€” 17 regional almanac markdown files
  • destinations/ β€” 12 regional destination guides
  • topics/ β€” 15+ topical reference files plus templates directory
  • reference-encyclopedia.md β€” 150K char prompt-cached encyclopedia distilled from 536 PDFs
  • scubagpt-chatbot/disambiguations/ β€” diving terminology disambiguation JSON (EN + multilingual: ES/FR/DE/JA)
  • Scuba GPT Training Data/ β€” 580+ source files (CSVs, PDFs, seed lists) β€” 3.7 GB, excluded from git

Vector store: Pinecone (external) β€” 12,487 vectors across 4 namespaces (PDF corpus, almanac, KB markdown, site-level)

Prompt files: System prompt configured via ScubaGPT_Admin settings page and assembled dynamically by ScubaGPT_Chat::build_augmented_prompt().

0.3 β€” Version and evolution history

Git commits touching products/scuba-gpt/: 7 commits from 2026-03-27 to 2026-04-18 (repository is part of the larger ITI monorepo; earlier development history predates the current git structure).

Version timeline from plugin releases (plugin-installs/ directory):

  • v1.0.0 β€” January 2026 (initial release)
  • v1.1.0 β€” January 2026 (safety guardrails, admin UI)
  • v1.2.0–v1.2.4 β€” January–February 2026 (AI Engine integration, external APIs, bug fixes)
  • v1.3.0–v1.3.4 β€” February–March 2026 (crash-proof rewrite, Google Places, security hardening)
  • v1.4.0–v1.4.1 β€” March–April 2026 (14,642 sites, tool use, Vision, trip planner, performance)
  • v1.5.0 β€” April 2026 (data enrichment, dual-layer map, streaming, 240-test suite)

20 versioned zip files in plugin-installs/.

0.4 β€” Technology and dependency stack

Platform: WordPress plugin (PHP 8.0+, WordPress 6.0+)
AI models: Anthropic Claude (Messages API) β€” model, vision, tool use
Vector DB: Pinecone β€” semantic search via OpenAI/Voyage embeddings
Web search: Tavily β€” real-time web context
Maps: Leaflet.js (CDN) + Leaflet.markercluster
Browser APIs: Web Speech API (voice input), localStorage (sessions), FormData (image upload)
Data pipelines: Python 3 with openpyxl, pymupdf, openai, pinecone, requests, tavily
Testing: pytest
External APIs: Open-Meteo Marine, Stormglass, NOAA CO-OPS, WorldTides, OpenStreetMap Nominatim, RapidAPI (TheDiveAPI, World Dive Centres)
Shared library: ITI Shared Library (Claude API client, Tavily, Pinecone, Base Agent, Chat Handler, Vision Handler, Workflow Adapter)

0.5 β€” Product artifacts

Plugin zip releases: 20 versioned zips in plugin-installs/ from v1.0.0 through v1.5.0-map-streaming (latest: 4.3 MB)
SVG icons: assets/images/icon-dive-site.svg, assets/images/icon-dive-operator.svg
Data exports: data-pipelines/output/ (QA spreadsheet, SQL import, vector manifests, dive-sites.xlsx, dive-operators.xlsx)
Documentation: 17 markdown files in documentation/, 3 in scubagpt-chatbot/documentation/

0.6 β€” Core context documents read

  • CLAUDE.md (root) β€” project overview, directory structure, key features, development notes
  • ARCHITECTURE.md β€” component architecture, data flow, security, technology stack
  • REQUIREMENTS.md β€” user stories v1.0–v1.5.0, non-functional requirements, traceability
  • documentation/README.md β€” project README with feature list, quick start, version history
  • documentation/CLAUDE.md β€” documentation-specific context
  • scubagpt-chatbot/readme.txt β€” WordPress plugin readme with full changelog
  • scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md β€” visual design system
  • data-pipelines/README.md β€” pipeline steps and structure

0.7 β€” Market and competitive research files

No dedicated competitive analysis or market research files found. The marketing/ directory exists but is empty.


Section 1 β€” Product Overview

1.1 Product name and tagline

Name: ScubaGPT
Tagline: AI-powered chatbot and interactive map for recreational scuba divers, delivering expert guidance on diving techniques, safety, equipment, and 14,642 destinations worldwide.
Current status: Live
First commit / project start: January 2026 (v1.0.0 initial release per changelog; earliest git commit touching this project: 2026-03-27)

1.2 What it is

ScubaGPT is a WordPress plugin that provides an AI chatbot and dual-layer interactive map for recreational scuba divers. It combines Claude AI with a 6-layer RAG knowledge system (prompt-cached encyclopedia, keyword-gated markdown KB, Pinecone vector search, Tavily web search, live marine API tools, and diving terminology disambiguation) to deliver expert-level guidance across 60+ countries. The interactive Leaflet.js map visualises 14,642 enriched dive sites and 6,900+ dive operators with searchable, clustered markers and detail modals.

1.3 What makes it meaningfully different

The founding insight is that recreational divers need a safety-conscious, domain-expert AI assistant rather than a generic chatbot. Existing AI chatbots lack the domain-specific guardrails required for diving advice (medical fitness referrals, gas-planning refusals, depth-vs-certification cross-checks) and don’t have access to curated dive site data with provenance tracking. ScubaGPT’s 6-layer RAG architecture and safety pipeline were purpose-built for this domain, and its data enrichment pipeline (Tavily web search + Claude fallback with provenance tagging) means answers are grounded in verifiable, source-attributed content rather than unchecked AI generation.

πŸ’‘ [CLAUDE NOTE: inferred from CLAUDE.md safety emphasis, REQUIREMENTS.md safety user stories, and the explicit provenance/transparency architecture]

1.4 Platform and deployment context

Platform: WordPress plugin
Deployment: Self-hosted on WordPress (wp-content/plugins)
Primary interface: Chat widget + interactive map (shortcodes: [scubagpt_chat], [scubagpt_map])


Section 2 β€” User Needs and Problem Statement

2.1 Target user

Primary user: Recreational scuba divers planning trips, researching destinations, and seeking safety-conscious diving guidance. Range from beginners (Open Water certification) to experienced divers. Non-technical β€” they interact through a chat interface and visual map.
Secondary users: Dive operators (embeddable white-label widget), WordPress site administrators (admin dashboard and settings)
User environment: Embedded on a WordPress diving website (scubagpt.com), accessed via desktop or mobile browsers

2.2 The problem being solved

When recreational divers research dive destinations, conditions, and safety information online, they want to get accurate, safety-conscious answers from a domain expert, so they can plan trips with realistic expectations and avoid risks that exceed their certification level.

πŸ’‘ [CLAUDE NOTE: inferred from REQUIREMENTS.md user stories US-CORE-01 through US-CORE-03 and the safety guardrails architecture]

2.3 Unmet needs this addresses

Need How the product addresses it Source of evidence
Safety-critical advice with guardrails Medical fitness referrals, gas-planning refusals, depth-vs-certification cross-checks via ScubaGPT_Safety REQUIREMENTS.md US-CORE-02, US-CORE-03; ARCHITECTURE.md Safety Layer
Current marine conditions for dive planning Claude tool use with 8 tools calling live APIs (Open-Meteo, Stormglass, NOAA, WorldTides) REQUIREMENTS.md US-1.4-T1-01; ARCHITECTURE.md Feature Modules
Visual exploration of global dive sites Dual-layer Leaflet map with 14,642 sites and 6,900+ operators, search, and detail modals REQUIREMENTS.md US-1.5-01, US-1.5-02
Marine life identification from photos Claude Vision API integration via /chat/image endpoint REQUIREMENTS.md US-1.4-T1-02
Structured trip planning dialogue Multi-step trip planner with state machine collecting destination, dates, preferences, certification REQUIREMENTS.md US-1.4-T2-02
Trustworthy, source-attributed information Provenance tracking (description_source: api / web_sourced / ai_generated) and (AI Generated) transparency labels CLAUDE.md Important Context; test_data_attribution.py

2.4 What users were doing before this existed

Recreational divers relied on fragmented sources: PADI dive site databases (limited detail), diving forums (unvetted advice), Google searches across dozens of dive sites, and manual cross-referencing of weather/tide/condition data from separate marine weather services. No single tool combined domain-expert AI, curated dive site data, live conditions, and safety guardrails.

πŸ’‘ [CLAUDE NOTE: inferred from the product’s multi-source RAG architecture and the explicit integration of external marine APIs β€” these design choices imply the problem was information fragmentation]


Section 3 β€” Market Context and Competitive Landscape

3.1 Market category

Primary category: AI-powered vertical-market chatbots / domain-specific AI assistants
Market maturity: Emerging (AI chatbots for niche domains are proliferating post-2024, but few have deep domain knowledge systems)
Key dynamics: Rapid commoditization of generic AI chat; differentiation shifting to domain data, safety guardrails, and retrieval quality. Dive industry itself is stable with ~6M active certified divers globally. ⚑

πŸ’‘ [CLAUDE KNOWLEDGE β€” verify before publishing: diver count is approximate from PADI certification statistics]

3.2 Competitive landscape

Product / Company Approach Strengths Key gap ScubaGPT addresses Source
⚑ DiveBook (divebook.app) AI dive recommendations + digital log + trip booking + community Integrated booking monetization; AI personalization by experience level No systematic safety rails; no prompt-cached knowledge architecture; shallower RAG April 2026 web search
⚑ Scuba Steve AI (scubasteve.rocks) AI dive assistant + marine photo ID + dive planning checklists + SIMI training mode Closest functional match: AI chat, marine ID, planning checklists; mobile-first No medical/gas-planning safety detection; no multi-source RAG; no knowledge encyclopedia April 2026 web search
⚑ DiveHelp (divehelp.com) AI-powered companion + voice assistant + real-time conditions + training Voice control; smartwatch sync; AI photo editing; dive computer integration New entrant; breadth-first approach; no curated KB depth; no safety-critical guardrails April 2026 web search
⚑ theDiveGlobe / Neptune AI 3D globe dive site explorer + AI recommendations + buddy matching + dive passport Strong UX (3D globe); gamification (passport/badges); community-driven data AI advising lacks depth; no safety system; no tool use for live conditions April 2026 web search
⚑ DiveKit (divekit.app) Technical dive planning tools (deco planner, gas blender, MOD/EAD) Offline-first; high-contrast dive-condition UI; serious technical planning No AI; no conversational interface; technical divers only; no destination knowledge April 2026 web search
⚑ FINS (getfins.app) AI marine species ID (5,000+ species) + dive log + destination planning Largest species database; strong photo ID; gamified sighting tracking No conversational AI; no safety guardrails; species ID only, not an advisor April 2026 web search
⚑ ScubaSnap (scubasnap.app) AI fish recognition + dive log + community species database Simple photo ID; community contributions; 14,900+ dive sites listed Small user base (~140); limited species coverage (~108); no AI chat or safety April 2026 web search
⚑ OceanScout (oceanscout.app) Gamified marine species collection (Pokémon-style) + offline AI ID Offline capability; gamification; 100+ species Gamified niche; not a planning or advisory tool April 2026 web search
⚑ ScubAI (scub.ai) AI underwater photography + color correction; Fish ID coming Q3 2026 Best-in-class underwater photo editing; depth-aware color science Photography-focused; Fish ID not yet shipped; no advisory or planning April 2026 web search
⚑ PADI App Unified certification + logbook + dive prep + shop locator Certification authority; massive user base; official training pipeline No AI advising (as of April 2026); static content; no real-time conditions April 2026 web search
⚑ ScubaBoard Forum community 20+ years of diver knowledge; peer advice No AI; hard to search; variable quality; declining engagement General knowledge
⚑ DAN (Divers Alert Network) Safety resources + insurance + medical hotline Authoritative safety information; medical expertise Static content; no interactive advising; no trip planning General knowledge

3.3 Market positioning

ScubaGPT positions as a domain-expert AI assistant purpose-built for diving safety and trip planning, differentiated from generic chatbots by its 6-layer RAG architecture, safety guardrails, and curated data with provenance. It sits between the broad but shallow coverage of general AI and the deep but static content of traditional dive databases.

πŸ’‘ [CLAUDE NOTE: inferred from the product’s architecture and feature set relative to known alternatives]

3.4 Defensibility assessment

ScubaGPT’s defensibility rests on three layers: (1) a curated, enriched dive site database of 14,642 sites with provenance tracking and 100% description coverage β€” built through a 22-script data pipeline ingesting from 4 external sources with Tavily web enrichment; (2) domain-specific safety guardrails that require diving expertise to configure correctly (medical referrals, gas-planning refusals, certification-depth cross-checks); and (3) a 12,487-vector Pinecone index spanning 4 namespaces that powers precise retrieval.


Section 4 β€” Requirements Framing

4.1 How requirements were approached

Requirements were formalized in a structured REQUIREMENTS.md document using user story format (As a / I want / So that) with acceptance criteria tied to specific PHP classes and JavaScript files. Requirements are organized in tiered delivery groups (Critical, High Value, Strategic, Exploratory) across version milestones (v1.0–v1.5.0). Non-functional requirements cover security, performance, accessibility, and safety.

4.2 Core requirements (what it must do)

  1. Deliver safety-conscious diving advice with medical fitness referrals, gas-planning refusals, and depth-vs-certification cross-checks (US-CORE-02, US-CORE-03)
  2. Retrieve contextual knowledge from 6 layers: encyclopedia, keyword KB, Pinecone vectors, Tavily web, live tools, and disambiguation (US-CORE-01, ARCHITECTURE.md)
  3. Call live marine condition APIs (waves, tides, weather, suitability) via Claude tool use (US-1.4-T1-01)
  4. Render an interactive dual-layer map of 14,642 dive sites and 6,900+ operators with search and detail modals (US-1.5-01, US-1.5-02)
  5. Stream responses in real time with live markdown rendering (US-1.5-04, US-1.5-05)

4.3 Constraints and non-goals

Hard constraints:

  • All AI-generated descriptions must end with (AI Generated) for transparency (CLAUDE.md)
  • Safety is paramount β€” guardrails for medical, gas-planning, and depth-vs-certification are non-negotiable (REQUIREMENTS.md Safety section)
  • Plugin must never crash WordPress β€” 5-layer safety guardrail system (readme.txt v1.3.0 notes)

Explicit non-goals:

  • Not a dive computer or decompression calculator β€” gas-planning requests are explicitly refused (US-CORE-02)
  • Not a medical clearance tool β€” medical fitness queries are redirected to dive physicians and DAN (US-CORE-02)
  • Training data excluded from git due to 3.7 GB size (CLAUDE.md)

4.4 Key design decisions and their rationale

Decision Alternatives considered Rationale Evidence source
6-layer RAG over single-source retrieval Pure Pinecone RAG, pure KB injection, fine-tuning Each layer handles different knowledge needs: encyclopedia for breadth, keyword KB for depth, Pinecone for semantic, Tavily for recency, tools for live data, disambiguation for terminology ARCHITECTURE.md Knowledge System
Tavily web search + Claude fallback for descriptions instead of proximity-based backfill Proximity-based depth/type backfill from nearby sites Proximity backfill was tested and removed as too localized; web search produces higher-quality, verifiable descriptions CLAUDE.md Important Context
Data loading cascade (DB β†’ JSON β†’ CSV) for map Direct JSON only, DB only Graceful fallback ensures map functions in any deployment state; DB allows WordPress-native queries when populated REQUIREMENTS.md US-1.5-01; class-scubagpt-map.php
Provenance tagging on all descriptions No provenance tracking, simple AI/human labels Three-tier tracking (api / web_sourced / ai_generated) enables transparency auditing and prevents circular RAG (AI descriptions excluded from embeddings) CLAUDE.md, test_data_attribution.py

Section 5 β€” Knowledge System Architecture

5.1 Knowledge system overview

KB type: Multi-layer RAG with static files, vector store, web search, live APIs, and dynamic prompt assembly
Location in repo: scubagpt-chatbot/knowledgebase/ (runtime), data-pipelines/ (build), Scuba GPT Training Data/ (sources)
Estimated size: ~200 files in runtime KB; 12,487 Pinecone vectors; 150K char encyclopedia; 14,642 site records; 6,900+ operator records

5.2 Knowledge system structure


scubagpt-chatbot/knowledgebase/
β”œβ”€β”€ reference-encyclopedia.md        # 150K char prompt-cached encyclopedia (536 PDFs distilled)
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ dive-sites.json              # 14,642 enriched sites with provenance
β”‚   β”œβ”€β”€ dive-operators.json          # 6,900+ operators with GPS, certification parsing
β”‚   β”œβ”€β”€ seasonal-baselines.json      # NOAA monthly temperature baselines
β”‚   β”œβ”€β”€ country-analytics.json       # Derivative analytics
β”‚   β”œβ”€β”€ region-analytics.json
β”‚   └── species-analytics.json
β”œβ”€β”€ almanac/                         # 17 regional almanac .md files (~135K words total)
β”‚   β”œβ”€β”€ caribbean.md
β”‚   β”œβ”€β”€ indo-pacific.md
β”‚   β”œβ”€β”€ ... (15 more regions)
β”œβ”€β”€ destinations/                    # 12 regional destination guides
β”‚   β”œβ”€β”€ caribbean.md
β”‚   β”œβ”€β”€ southeast-asia.md
β”‚   β”œβ”€β”€ ... (10 more regions)
β”œβ”€β”€ topics/                          # 15+ topical reference files
β”‚   β”œβ”€β”€ equipment-guide.md
β”‚   β”œβ”€β”€ safety-medicine.md
β”‚   β”œβ”€β”€ marine-life.md
β”‚   β”œβ”€β”€ seasonal-dive-planner.md
β”‚   β”œβ”€β”€ ... (11+ more)
β”‚   └── templates/                   # Content generation templates
└── disambiguations/                 # (sibling directory)
    β”œβ”€β”€ scuba-diving-terms.json      # English terminology
    β”œβ”€β”€ scuba-diving-terms-es.json   # Spanish
    β”œβ”€β”€ scuba-diving-terms-fr.json   # French
    β”œβ”€β”€ scuba-diving-terms-de.json   # German
    └── scuba-diving-terms-ja.json   # Japanese

5.3 Knowledge categories

Category Files / format Purpose Update frequency
Reference encyclopedia 1 markdown file (150K chars) Prompt-cached comprehensive diving reference Regenerated via pipeline script 02/13
Regional almanacs 17 markdown files Seasonal conditions, marine life, site highlights per region Regenerated via pipeline script 11
Destination guides 12 markdown files Detailed regional diving destination information Manual curation
Topical references 15+ markdown files Equipment, safety, marine life, conservation, etc. Manual curation
Dive site data JSON (14,642 records) Georeferenced sites with descriptions, types, marine life, provenance Pipeline scripts 04–09, 15–17
Operator data JSON (6,900+ records) GPS-located operators with certification, tier, nearby sites Pipeline scripts 20–22
Seasonal baselines JSON NOAA monthly temperature baselines per destination Pipeline script 06
Analytics 3 JSON files Country, region, species derivative analytics Pipeline script 10
Disambiguation terms 5 JSON files Diving terminology for system prompt (EN + 4 languages) Manual curation
Vector embeddings Pinecone index (12,487 vectors, 4 namespaces) Semantic retrieval for chat context Pipeline scripts 05, 17, 22

5.4 How the knowledge system was built

Step 1 β€” Source identification:
580+ source files assembled: 536 PDFs (US Diving Manual, PADI materials, marine biology texts), 200+ diving website seed lists, dive site CSVs with coordinates, and 4 external API sources (PADI/OpenDiveMap, Dive Vibe Community, TheDiveAPI, World Dive Centres API).

Step 2 β€” Curation and cleaning:
Pipeline script 01 extracts text from 1,077 PDFs, strips boilerplate, and classifies by topic. Script 04 normalises CSV sites into structured JSON. Scripts 07–08 ingest external API data with GPS grid traversal and rate limiting. Script 09 runs multi-phase enrichment (raw field recovery, keyword extraction, region standardisation, Tavily web search + Claude fallback for descriptions).

Step 3 β€” Structuring and formatting:
Encyclopedia (script 02/13) synthesises extracted text into a 150K char prompt-cached reference. Almanac files (script 11) are generated per region. Topic KB files (script 03) are created from classified PDFs. Dive site schema extended with description_source, marine_life_source, visibility_m, rating, entry_type, ocean.

Step 4 β€” Embedding / indexing:
Pipeline script 05 chunks extracted text and upserts to Pinecone with topic/region/cert metadata. Script 17 embeds individual dive sites. Script 22 embeds operators with operator- prefix. AI-generated descriptions are excluded from embeddings to prevent circular RAG. Total: 12,487 vectors across 4 namespaces.

Step 5 β€” Retrieval configuration:
ScubaGPT_Knowledgebase loads at most one destination and one topic file per query with a 60K char budget and transient caching. Pinecone queries use top-k=5 and similarity threshold 0.7. Tavily adds real-time web context. Claude tool use provides live marine conditions.

Step 6 β€” Testing and validation:
240 pytest tests across 4 files validate data schema, GeoJSON structure, provenance tracking, and frontend behaviour. QA spreadsheet generated via data pipeline for enrichment auditing.

5.5 System prompt and agent configuration

System prompt approach: Dynamic assembly via build_augmented_prompt() β€” base system prompt (admin-configurable) β†’ disambiguation terms β†’ language detection β†’ keyword-gated KB injection β†’ safety analysis β†’ seasonal context β†’ dive plan analysis β†’ retrieved context from RAG layers.
Key behavioural guardrails: Medical fitness queries redirected to dive physicians/DAN; gas-planning/deco calculations refused; depth-vs-certification cross-checked; species ID includes confidence framing and conservation compliance.
Persona / tone configuration: MSDT-level diving advisor β€” knowledgeable, adventurous, safety-conscious, approachable, encouraging. Sound like a knowledgeable dive buddy, never robotic or corporate.
Tool use / function calling: 8 Claude tools β€” get_dive_conditions, get_tide_info, get_marine_weather, check_dive_suitability, get_equipment_recommendation, search_dive_sites_natural, and related handlers.


Section 6 β€” Build Methodology

6.1 Development approach

AI-assisted iterative development using Cursor IDE with Claude Code. The project follows a CLAUDE.md-driven specification approach where context documents anchor each development session. Formal requirements exist in REQUIREMENTS.md with tiered user stories and acceptance criteria. Data pipelines are built as numbered, sequential Python scripts.

6.2 Build phases

Phase Approximate timeframe What was built Key milestones
Foundation January 2026 Core chat plugin: Claude AI integration, Pinecone, Tavily, conversation management, rate limiting, admin UI v1.0.0, v1.1.0 (safety guardrails, admin dashboard)
Integration January–February 2026 AI Engine integration, external marine APIs (Open-Meteo, Stormglass, NOAA, WorldTides), crash-proof rewrite v1.2.0–v1.2.4, v1.3.0 (crash-proof rewrite)
Security & APIs February–March 2026 Security hardening (10 fixes), Google Places API, RapidAPI dive sites v1.3.1–v1.3.4 (CSRF, XSS, rate limiting, GDPR)
Data Expansion March–April 2026 14,642 sites from 4 sources, 22 data pipeline scripts, tool use, Vision, trip planner, species log, operators, multilingual v1.4.0, v1.4.1 (parallel RAG, performance)
Enrichment & UX April 2026 Data enrichment (100% descriptions), dual-layer map, real-time streaming, 240-test suite, operator pipeline v1.5.0 (current)

6.3 Claude Code / AI-assisted development patterns

The codebase shows extensive AI-assisted development evidenced by: (1) structured CLAUDE.md files at multiple directory levels providing context for AI assistants; (2) formal ARCHITECTURE.md and REQUIREMENTS.md that serve as both human documentation and AI session context; (3) numbered, sequential data pipeline scripts (01–22) that follow a clear build-on-previous pattern; (4) a .agents/ directory with 5 product-level Skills and 1 Agent for quarterly data maintenance; and (5) a comprehensive pytest test suite that validates source code patterns (PHP, JS, CSS) rather than executing them β€” a pattern consistent with AI-assisted test generation.

6.4 Key technical challenges and how they were resolved

Challenge How resolved Evidence
Plugin crashing WordPress on errors 5-layer safety guardrail system: pre-install validation, safe activation, graceful degradation, automatic recovery, emergency shutdown readme.txt v1.3.0 changelog; REQUIREMENTS.md safety section
Data enrichment at scale (14,642 sites) 22-script pipeline with Tavily web search (90%) + Claude Haiku fallback (10%) + provenance tracking CLAUDE.md data enrichment notes; data-pipelines/
Preventing circular RAG from AI-generated content AI-generated descriptions excluded from Pinecone embeddings; provenance tagging enables filtering CLAUDE.md Important Context
Map performance with 20,000+ markers Leaflet.markercluster for both layers; data loaded via REST endpoints with chunked loading patterns map.js, test_map_shortcode.py TestMapJsFetchPatterns
Plugin packaging missing data files Architectural flaw identified and fixed: zip rebuilt to include dive-sites.json and dive-operators.json (4.3 MB) plugin-installs/ directory (v1.5.0-map-streaming.zip)

Section 7 β€” AI Tools and Techniques

7.1 AI models and APIs used

Model / API Provider Role in product Integration method
Claude (Messages API) Anthropic Primary chat AI, tool use execution, description generation ITI Shared Library ITI_Claude_API
Claude Vision Anthropic Marine life photo identification ITI Shared Library ITI_Vision_Handler
Claude Haiku Anthropic Fallback description generation for dive sites/operators Direct API via data pipeline scripts
OpenAI text-embedding-3-small OpenAI Query embeddings for Pinecone retrieval ScubaGPT_API configuration
Tavily Search Tavily Real-time web context for chat; description sourcing for enrichment ITI Shared Library ITI_Tavily_API; pipeline scripts
Pinecone Pinecone Vector similarity search (12,487 vectors, 4 namespaces) ITI Shared Library ITI_Pinecone_API

7.2 AI orchestration and tooling

Tool Category Purpose
ITI Shared Library Orchestration Reusable WordPress components for Claude, Tavily, Pinecone, agents
Claude Tool Use (ITI_Claude_Tools) Function calling Multi-turn tool execution loop with 8 registered tools
ITI Workflow Adapter Orchestration Optional n8n routing for chat messages
Pinecone Vector DB 4-namespace index for semantic retrieval
Leaflet.js + markercluster Visualization Interactive map rendering with clustering

7.3 Prompting techniques used

  • [x] Chain-of-thought reasoning (implicit in multi-tool execution loops)
  • [ ] Few-shot examples in prompts
  • [x] Structured / JSON output prompting (tool return schemas)
  • [x] Tool use / function calling (8 tools via ITI_Claude_Tools)
  • [x] RAG context injection (6-layer: encyclopedia, KB, Pinecone, Tavily, tools, disambiguation)
  • [x] System prompt persona/role setting (MSDT-level advisor persona)
  • [x] Multi-turn conversation management (session-based history)
  • [x] Output guardrails / content filtering (medical, gas-planning, depth safety)
  • [x] Fallback / error recovery prompting (graceful degradation when tools unavailable)
  • [x] Prompt caching (anthropic-beta header for large system prompts)
  • [x] Dynamic prompt assembly (budget-controlled injection of KB, safety, seasonal context)

7.4 AI development tools used to build this

Tool How used in build
Cursor IDE with Claude Primary development environment β€” CLAUDE.md-driven sessions, code generation, test generation, documentation
Claude Code Context-aware coding, refactoring, test suite creation
ITI Agent System Orchestrator + specialist agents for architecture, testing, documentation
Product-level Skills (.agents/skills/) 5 Skills for ingestion, enrichment, scraping, QA, embeddings β€” used for data pipeline development

Section 8 β€” Version History and Evolution

8.1 Version timeline

Version Date Summary of changes Significance
v1.0.0 Jan 2026 Initial release: Claude AI chat, Pinecone RAG, Tavily web search, conversation history, rate limiting, admin settings Foundation product launch
v1.1.0 Jan 2026 5-layer safety guardrails, enhanced system prompt (9 rules), admin UI/statistics dashboard, news integration, Google Maps links Safety-first architecture established
v1.2.0 Jan 2026 AI Engine integration, external marine APIs (Open-Meteo, Stormglass, NOAA, WorldTides), function calling for live conditions Real-time data capability added
v1.2.1–v1.2.4 Jan–Feb 2026 Bug fixes: streaming, URL sanitization, AI Engine compatibility, duplicate loading protection Stability hardening
v1.3.0 Feb 2026 Complete rewrite for crash-proof operation: all code in Throwable catch blocks, recovery page, one-click restart Architectural resilience milestone
v1.3.1–v1.3.3 Feb 2026 Streaming fix, RapidAPI dive sites, Google Places API Feature expansion
v1.3.4 Mar 2026 Security hardening: 10 fixes (CSRF, XSS, rate limiting bypass, GDPR, daily token budget, API key rotation) Security milestone
v1.4.0 Mar 2026 14,642 dive sites from 4 sources, 14 data pipelines, Claude tool use (8 tools), Vision, trip planner, species log, operators, multilingual, dive log parsing, embeddable widget, buddy matching Major feature expansion
v1.4.1 Apr 2026 Parallel RAG lookups (curl_multi), prompt caching, streaming performance (requestAnimationFrame batching) Performance optimization
v1.5.0 Apr 2026 Data enrichment (100% descriptions, provenance), dual-layer map (sites + operators), real-time streaming with markdown, image upload, voice input, offline detection, session management, 240-test suite Current release β€” UX and data quality milestone

8.2 Notable pivots or scope changes

  1. AI Engine integration disabled by default (v1.3.0) β€” after compatibility issues across AI Engine plugin versions, the integration was made opt-in rather than automatic. This reflected a pivot from tight third-party coupling to a self-contained architecture.
  2. Proximity-based data backfill removed β€” during the v1.5.0 enrichment cycle, depth/type inference from nearby dive sites was tested and removed as too localized. Replaced by Tavily web search + Claude fallback for higher-quality, verifiable descriptions.
  3. Data packaging architecture change β€” the v1.5.0 plugin zip initially excluded data files (664 KB). After discovering the map would show no markers without them, the zip was rebuilt to include dive-sites.json and dive-operators.json (4.3 MB).

πŸ’‘ [CLAUDE NOTE: pivot details from CLAUDE.md “Important Context” and conversation history]

8.3 What has been cut or deferred

  • Fine-tuned models (training data and datasets exist in Fine Tunings/ but are not used in current architecture)
  • Mobile native app integration (listed in early roadmap, not implemented)
  • Content recommendation engine (listed in v1.1.0 roadmap)

Section 9 β€” Product Artifacts

9.1 Design and UX artifacts

Artifact Path Type What it shows
Dive site marker icon assets/images/icon-dive-site.svg SVG icon Blue circle with diver wave motif (24Γ—24)
Dive operator marker icon assets/images/icon-dive-operator.svg SVG icon Orange circle with shop motif (24Γ—24)
Visual Style Guide scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md Design system Color palette, typography, components, map components, streaming components

9.2 Documentation artifacts

Document Path Type Status
Architecture ARCHITECTURE.md System architecture Complete (v1.5.0)
Requirements REQUIREMENTS.md Software requirements Complete (v1.5.0)
README documentation/README.md Project documentation Complete (v1.5.0)
Plugin readme scubagpt-chatbot/readme.txt WordPress plugin readme Complete (v1.5.0)
Safety guardrails Multiple files in documentation/ Safety system docs Complete
UI/UX test plan documentation/UI-UX-TEST-PLAN.md Test plan Complete
This document SHOWCASE.md Project showcase Draft

9.3 Data and output artifacts

Artifact Path Description
Plugin releases (20 versions) plugin-installs/scubagpt-chatbot-v*.zip Versioned WordPress plugin zips from v1.0.0 to v1.5.0
Dive sites Excel export data-pipelines/output/dive-sites.xlsx 14,642 sites with all fields, styled headers, metadata sheet
Dive operators Excel export data-pipelines/output/dive-operators.xlsx 6,900+ operators with all fields
SQL import data-pipelines/output/dive-sites-import.sql WordPress database import
Pinecone vector manifest data-pipelines/output/pinecone-vectors.json Vector upsert manifest
QA spreadsheet data-pipelines/output/ Pre/post enrichment comparison

Section 10 β€” Product Ideation Story

10.1 Origin of the idea

ScubaGPT originated as a domain-specific vertical application of the GD Claude Chatbot architecture, adapted for the recreational scuba diving market. The project started in January 2026, building on an existing WordPress chatbot framework and applying it to a domain where safety-critical AI guidance, curated geographic data, and real-time marine conditions create a differentiated product.

πŸ’‘ [CLAUDE NOTE: inferred from CLAUDE.md “Based on gd-claude-chatbot architecture”, v1.0.0 release in January 2026, and the early changelog referencing AI Power and GD Chatbot integration]

10.2 How the market was assessed

Research approach used:
Domain expertise combined with iterative product development. No formal competitive analysis files exist in the repository. Market assessment appears to have been based on the builder’s domain knowledge of the diving industry and firsthand experience with the fragmentation of diving information resources.

πŸ’‘ [CLAUDE NOTE: inferred from empty marketing/ directory and absence of research files in Section 0.7]

Key market observations that shaped the product:

  1. Generic AI chatbots provide diving advice without safety guardrails, creating risk for medical and depth-related queries
  2. Existing dive site databases (PADI, SSI) are static and don’t combine with real-time conditions or AI guidance
  3. No single tool combines domain-expert AI, curated site data, live marine conditions, and visual exploration

What existing products got wrong (the gap that justified building this):
They treat diving information as either a static database problem (dive site directories) or a generic AI problem (chatbots without domain guardrails). The gap is a product that respects both the depth of domain knowledge required and the safety-critical nature of diving advice.

10.3 The core product bet

We believe that recreational divers will use an AI assistant for trip planning and diving guidance because it combines the conversational accessibility of a chatbot with the domain authority of curated data and safety-first design β€” something neither generic AI nor static dive databases provide.

πŸ’‘ [CLAUDE NOTE: inferred from the product’s architecture choices and user story framing in REQUIREMENTS.md]

10.4 How the idea evolved from first conception to current state

The product started as a chat-only AI assistant (v1.0.0) and rapidly expanded through five phases: (1) safety infrastructure (v1.1.0), (2) external API integration for live data (v1.2.0), (3) architectural resilience and security (v1.3.x), (4) massive data expansion from 455 to 14,642 sites with tool use, Vision, and multiple feature modules (v1.4.0), and (5) data quality enrichment with a visual map and streaming UX overhaul (v1.5.0). The trajectory shows a consistent pattern of deepening domain specificity β€” each version adds more diving-specific capability rather than generic features.


Section 11 β€” Lessons and Next Steps

11.1 Current state assessment

What works well: Comprehensive 6-layer RAG architecture; 14,642 enriched sites with provenance; safety guardrails; 240-test quality suite; dual-layer interactive map; real-time streaming UX.
Current limitations: No mobile native app; fine-tuned models exist but are unused; no formal A/B testing or user analytics beyond admin dashboard; marketing directory is empty.
Estimated completeness: Production-ready with active feature expansion. Core chat, map, and data systems are mature. Operator and trip planner features are functional but could be deepened.

11.2 Visible next steps

  1. Operator enrichment completion β€” extend Tavily + Claude enrichment to achieve higher coverage of operator descriptions, contacts, and specialties
  2. Quarterly data maintenance via the dive-site-data-steward Agent β€” automated staleness detection, re-enrichment, and QA auditing
  3. Embed widget deployment β€” enable dive operators to embed ScubaGPT on their own sites with white-label branding
  4. User analytics and A/B testing β€” instrument chat and map interactions to measure engagement and optimize
  5. Operator enrichment depth β€” extend descriptions, contacts, and specialties coverage for the 6,900+ operator database

11.3 Lessons learned

On the problem definition:
_[Manual input required β€” the builder should reflect on what surprised them about the user problem]_

On the knowledge system:
_[Manual input required β€” what worked and what didn’t in how the KB was structured]_

On the build process:
_[Manual input required β€” what would they do differently in the AI-assisted workflow]_

On market fit:
_[Manual input required β€” what does the current state tell them about the original hypothesis]_


Section 12 β€” Validation Checklist

  • [x] Every [PLACEHOLDER] has been replaced or marked ⚠️ [NOT FOUND]
  • [x] All externally-sourced competitive data is marked with ⚑
  • [x] All inferences are marked with πŸ’‘ [CLAUDE NOTE]
  • [x] Section 0 audit trail lists every file examined
  • [x] Version history in Section 8 is derived from actual changelog and plugin-installs/ directory
  • [x] Knowledge system paths in Section 5 reflect real directory structure
  • [x] AI tools in Section 7 are confirmed from code/config
  • [x] Section 11.3 is left blank for manual input
  • [x] Document header shows today’s date and files examined

Sources Examined

File / Path What it contributed
CLAUDE.md (root) Sections 1, 4, 5, 6, 7, 10 β€” project overview, features, data enrichment decisions, development notes
ARCHITECTURE.md Sections 1, 4, 5, 7 β€” component architecture, data flow, knowledge system, technology stack
REQUIREMENTS.md Sections 2, 4, 5, 11 β€” user stories, acceptance criteria, non-functional requirements
documentation/README.md Sections 1, 6, 8 β€” feature list, version history, project structure
documentation/CLAUDE.md Section 5 β€” plugin architecture details, test suite
scubagpt-chatbot/readme.txt Section 8 β€” full changelog from v1.0.0 to v1.5.0
scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md Section 9 β€” design system, component styling
data-pipelines/README.md Section 5 β€” pipeline steps and structure
git log --format="%h %ad %s" --date=short -- products/scuba-gpt/ Sections 6, 8 β€” build phase dates, commit history
ls -la plugin-installs/ Sections 8, 9 β€” version timeline, artifact inventory


Addendum β€” April 2026 Competitive Landscape and Build Impact

1. Industry Context (Updated April 2026)

The scuba diving app market has undergone rapid transformation driven by two converging forces: the mainstreaming of AI capabilities and the proliferation of mobile-first recreational apps. By April 2026, at least eight AI-native dive platforms have entered the space, fragmenting the market across three segments:

  • AI Advisors: DiveBook (AI recommendations + booking), Scuba Steve AI (AI assistant + marine photo ID + training mode), DiveHelp (AI companion + voice + wearable sync), theDiveGlobe Neptune AI (3D globe + AI recommendations + buddy matching)
  • Marine ID Tools: FINS (5,000+ species), ScubaSnap (community-driven photo ID), OceanScout (gamified collection), ScubAI (underwater photography with Fish ID launching Q3 2026)
  • Technical Planning: DiveKit (offline deco planner, gas blender, MOD/EAD calculators)

The claim “only AI-powered scuba advisor” has been untenable since at least three competitors began offering AI chat. ScubaGPT’s positioning shifts from “we have AI” to “we have the deepest knowledge architecture and the only systematic safety engineering in this space.”

2. Parity Gaps Closed by v1.4.0–v1.5.0

Gap (from April 2026 analysis) Resolution Competitor Parity
Marine life photo identification β€” 4 competitors had it Claude Vision via ITI_Vision_Handler + /chat/image endpoint Now at parity with Scuba Steve AI; FINS still leads on species breadth (5,000+ vs. Vision-based)
Marine weather APIs designed but unimplemented 8 Claude tools live via ITI_Claude_Tools (bypassed AI Engine dependency) Ahead of most competitors on live condition integration
No structured recommendation engine Dive operator recommendation engine with scored matching At parity with DiveBook; different approach (content-based vs. booking-integrated)
No social features / buddy matching Buddy matching with profile + compatibility scoring At parity with theDiveGlobe; lighter implementation
No interactive map / limited to 455 sites Dual-layer Leaflet.js map with 14,642 sites + 6,900+ operators Comparable in site count to ScubaSnap (14,900+); theDiveGlobe has 3D globe UX

3. New Differentiators Created by v1.4.0–v1.5.0

Differentiator What it is Who else has it
Prompt-cached encyclopedia (150K chars from 1,077 PDFs) Distilled domain knowledge as first system block for Anthropic caching No competitor has a comparable prompt-cached knowledge architecture
Six-layer RAG pipeline Encyclopedia + keyword KB + Pinecone + Tavily + tool use + disambiguation Most competitors use single-layer RAG or none
Proactive safety briefing Automatic depth-vs-certification cross-reference flagging risky dive plans No competitor has systematic proactive safety analysis
Multi-language disambiguation Scuba terminology in 5 languages with deterministic resolution No competitor offers localized disambiguation
Embeddable white-label widget Operators can embed ScubaGPT on their sites with branding and topic restrictions No competitor offers B2B embeddable deployment
Dual-layer interactive map 14,642 sites + 6,900+ operators on independently toggleable Leaflet layers with search, modals, ARIA No competitor combines operator and site layers on an embeddable map
Real-time streaming with live markdown Markdown renders progressively as text streams β€” not after completion Unique in this niche; competitors stream text but not formatted markdown
240-test automated suite pytest coverage across map, chat, data attribution, operator schema Demonstrates engineering rigor unusual for vertical AI products
Data pipeline infrastructure 22-script reproducible pipeline from raw PDFs to production knowledge Competitors’ knowledge systems are opaque
Provenance tagging Three-tier source tracking (api / web_sourced / ai_generated) enabling transparency and circular-RAG prevention No competitor publishes data provenance

4. Honest Assessment

Strengths after v1.5.0:

  • Safety detection system plus proactive dive plan analysis is genuinely unique β€” still no competitor has systematic safety rails
  • Six-layer RAG with prompt-cached encyclopedia and 12,487 Pinecone vectors across 4 namespaces provides measurably deeper responses
  • 55+ curated knowledgebase files (12 regions + 15 topics + 17 almanac + analytics + data) with 7 disambiguation files
  • 22-script data pipeline infrastructure means knowledge can be updated reproducibly from raw sources to production vectors
  • WordPress plugin model creates a B2B distribution channel (operator embedding) that no competitor addresses
  • 240-test automated suite provides quality infrastructure no other niche plugin has

Gaps we’re honest about:

  • FINS has 5,000+ species for photo ID; Claude Vision approach is general-purpose and not specialized
  • theDiveGlobe has a 3D globe with gamified engagement; Leaflet.js map is functional but less visually compelling
  • DiveBook has booking integration creating a revenue flywheel we lack
  • WordPress plugin deployment means no native mobile app β€” all consumer-facing competitors are mobile-first
  • DiveHelp has smartwatch integration β€” hardware-adjacent features we cannot match as a WordPress plugin
  • The niche is small β€” total addressable market for AI-powered scuba advisory tools is inherently limited

What we’re watching:

  • DiveHelp as a new entrant with aggressive breadth (voice, wearables, AI photo editing, training)
  • ScubAI’s Fish ID launch in Q3 2026 β€” another competitor entering marine species identification
  • Whether PADI adds AI advising to their unified app β€” if they do, they own the certification-to-advice pipeline
  • DiveBook’s booking monetization β€” could create winner-take-most dynamics
  • Whether WordPress plugin is the right form factor, or a standalone PWA would reach more divers

5. Portfolio Context

ScubaGPT demonstrates ITI’s ability to build AI products for safety-critical niche verticals where generic AI tools are insufficient. The v1.4.0–v1.5.0 builds demonstrate the full product development lifecycle: competitive analysis β†’ roadmap β†’ requirements β†’ implementation (20 features across 4 tiers) β†’ data pipeline engineering (22 scripts) β†’ multi-source data aggregation (14,642 sites) β†’ knowledge architecture (almanac + analytics + 4-namespace embeddings) β†’ UX engineering (dual-layer map, streaming markdown) β†’ quality infrastructure (240-test suite) β†’ documentation. The safety detection system, multi-layer RAG architecture, prompt-cached encyclopedia, and streaming optimization represent genuine engineering. The product’s value as consulting portfolio evidence lies in showing that responsible AI product development in a safety-critical domain requires domain-specific guardrails, knowledge architecture, data engineering, and performance tuning that go beyond what the base model provides.

Populated by Claude Code on 2026-04-19 using the AI Project Showcase skill methodology.

Table of Contents