ScubaGPT Showcase
AI Project Showcase: ScubaGPT
Section 0 — Pre-Population Audit
0.1 — Project root reconnaissance
Root structure: 29 items including scubagpt-chatbot/ (plugin), data-pipelines/ (22 Python scripts), .agents/ (Skills/Agents), Scuba GPT Training Data/ (580+ files, 3.7 GB), plugin-installs/ (20 versioned zips), documentation/, marketing/.
Context docs found: CLAUDE.md (root), documentation/CLAUDE.md, ARCHITECTURE.md, REQUIREMENTS.md, documentation/README.md, data-pipelines/README.md, scubagpt-chatbot/readme.txt, 17 markdown files in documentation/, 3 markdown files in scubagpt-chatbot/documentation/.
No changelog file by name — release history maintained in scubagpt-chatbot/readme.txt changelog section.
0.2 — Knowledge system discovery
Knowledge base directories:
-
scubagpt-chatbot/knowledgebase/— runtime KB injected into prompts -
data/—dive-sites.json(14,642 sites),dive-operators.json(6,900+ operators),seasonal-baselines.json, analytics JSON -
almanac/— 17 regional almanac markdown files -
destinations/— 12 regional destination guides -
topics/— 15+ topical reference files plus templates directory -
reference-encyclopedia.md— 150K char prompt-cached encyclopedia distilled from 536 PDFs -
scubagpt-chatbot/disambiguations/— diving terminology disambiguation JSON (EN + multilingual: ES/FR/DE/JA) -
Scuba GPT Training Data/— 580+ source files (CSVs, PDFs, seed lists) — 3.7 GB, excluded from git
Vector store: Pinecone (external) — 12,487 vectors across 4 namespaces (PDF corpus, almanac, KB markdown, site-level)
Prompt files: System prompt configured via ScubaGPT_Admin settings page and assembled dynamically by ScubaGPT_Chat::build_augmented_prompt().
0.3 — Version and evolution history
Git commits touching products/scuba-gpt/: 7 commits from 2026-03-27 to 2026-04-18 (repository is part of the larger ITI monorepo; earlier development history predates the current git structure).
Version timeline from plugin releases (plugin-installs/ directory):
- v1.0.0 — January 2026 (initial release)
- v1.1.0 — January 2026 (safety guardrails, admin UI)
- v1.2.0–v1.2.4 — January–February 2026 (AI Engine integration, external APIs, bug fixes)
- v1.3.0–v1.3.4 — February–March 2026 (crash-proof rewrite, Google Places, security hardening)
- v1.4.0–v1.4.1 — March–April 2026 (14,642 sites, tool use, Vision, trip planner, performance)
- v1.5.0 — April 2026 (data enrichment, dual-layer map, streaming, 240-test suite)
20 versioned zip files in plugin-installs/.
0.4 — Technology and dependency stack
Platform: WordPress plugin (PHP 8.0+, WordPress 6.0+)
AI models: Anthropic Claude (Messages API) — model, vision, tool use
Vector DB: Pinecone — semantic search via OpenAI/Voyage embeddings
Web search: Tavily — real-time web context
Maps: Leaflet.js (CDN) + Leaflet.markercluster
Browser APIs: Web Speech API (voice input), localStorage (sessions), FormData (image upload)
Data pipelines: Python 3 with openpyxl, pymupdf, openai, pinecone, requests, tavily
Testing: pytest
External APIs: Open-Meteo Marine, Stormglass, NOAA CO-OPS, WorldTides, OpenStreetMap Nominatim, RapidAPI (TheDiveAPI, World Dive Centres)
Shared library: ITI Shared Library (Claude API client, Tavily, Pinecone, Base Agent, Chat Handler, Vision Handler, Workflow Adapter)
0.5 — Product artifacts
Plugin zip releases: 20 versioned zips in plugin-installs/ from v1.0.0 through v1.5.0-map-streaming (latest: 4.3 MB)
SVG icons: assets/images/icon-dive-site.svg, assets/images/icon-dive-operator.svg
Data exports: data-pipelines/output/ (QA spreadsheet, SQL import, vector manifests, dive-sites.xlsx, dive-operators.xlsx)
Documentation: 17 markdown files in documentation/, 3 in scubagpt-chatbot/documentation/
0.6 — Core context documents read
-
CLAUDE.md(root) — project overview, directory structure, key features, development notes -
ARCHITECTURE.md— component architecture, data flow, security, technology stack -
REQUIREMENTS.md— user stories v1.0–v1.5.0, non-functional requirements, traceability -
documentation/README.md— project README with feature list, quick start, version history -
documentation/CLAUDE.md— documentation-specific context -
scubagpt-chatbot/readme.txt— WordPress plugin readme with full changelog -
scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md— visual design system -
data-pipelines/README.md— pipeline steps and structure
0.7 — Market and competitive research files
No dedicated competitive analysis or market research files found. The marketing/ directory exists but is empty.
Section 1 — Product Overview
1.1 Product name and tagline
Name: ScubaGPT
Tagline: AI-powered chatbot and interactive map for recreational scuba divers, delivering expert guidance on diving techniques, safety, equipment, and 14,642 destinations worldwide.
Current status: Live
First commit / project start: January 2026 (v1.0.0 initial release per changelog; earliest git commit touching this project: 2026-03-27)
1.2 What it is
ScubaGPT is a WordPress plugin that provides an AI chatbot and dual-layer interactive map for recreational scuba divers. It combines Claude AI with a 6-layer RAG knowledge system (prompt-cached encyclopedia, keyword-gated markdown KB, Pinecone vector search, Tavily web search, live marine API tools, and diving terminology disambiguation) to deliver expert-level guidance across 60+ countries. The interactive Leaflet.js map visualises 14,642 enriched dive sites and 6,900+ dive operators with searchable, clustered markers and detail modals.
1.3 What makes it meaningfully different
The founding insight is that recreational divers need a safety-conscious, domain-expert AI assistant rather than a generic chatbot. Existing AI chatbots lack the domain-specific guardrails required for diving advice (medical fitness referrals, gas-planning refusals, depth-vs-certification cross-checks) and don’t have access to curated dive site data with provenance tracking. ScubaGPT’s 6-layer RAG architecture and safety pipeline were purpose-built for this domain, and its data enrichment pipeline (Tavily web search + Claude fallback with provenance tagging) means answers are grounded in verifiable, source-attributed content rather than unchecked AI generation.
💡 [CLAUDE NOTE: inferred from CLAUDE.md safety emphasis, REQUIREMENTS.md safety user stories, and the explicit provenance/transparency architecture]
1.4 Platform and deployment context
Platform: WordPress plugin
Deployment: Self-hosted on WordPress (wp-content/plugins)
Primary interface: Chat widget + interactive map (shortcodes: [scubagpt_chat], [scubagpt_map])
Section 2 — User Needs and Problem Statement
2.1 Target user
Primary user: Recreational scuba divers planning trips, researching destinations, and seeking safety-conscious diving guidance. Range from beginners (Open Water certification) to experienced divers. Non-technical — they interact through a chat interface and visual map.
Secondary users: Dive operators (embeddable white-label widget), WordPress site administrators (admin dashboard and settings)
User environment: Embedded on a WordPress diving website (scubagpt.com), accessed via desktop or mobile browsers
2.2 The problem being solved
When recreational divers research dive destinations, conditions, and safety information online, they want to get accurate, safety-conscious answers from a domain expert, so they can plan trips with realistic expectations and avoid risks that exceed their certification level.
💡 [CLAUDE NOTE: inferred from REQUIREMENTS.md user stories US-CORE-01 through US-CORE-03 and the safety guardrails architecture]
2.3 Unmet needs this addresses
| Need | How the product addresses it | Source of evidence |
|---|---|---|
| Safety-critical advice with guardrails | Medical fitness referrals, gas-planning refusals, depth-vs-certification cross-checks via ScubaGPT_Safety
|
REQUIREMENTS.md US-CORE-02, US-CORE-03; ARCHITECTURE.md Safety Layer |
| Current marine conditions for dive planning | Claude tool use with 8 tools calling live APIs (Open-Meteo, Stormglass, NOAA, WorldTides) | REQUIREMENTS.md US-1.4-T1-01; ARCHITECTURE.md Feature Modules |
| Visual exploration of global dive sites | Dual-layer Leaflet map with 14,642 sites and 6,900+ operators, search, and detail modals | REQUIREMENTS.md US-1.5-01, US-1.5-02 |
| Marine life identification from photos | Claude Vision API integration via /chat/image endpoint |
REQUIREMENTS.md US-1.4-T1-02 |
| Structured trip planning dialogue | Multi-step trip planner with state machine collecting destination, dates, preferences, certification | REQUIREMENTS.md US-1.4-T2-02 |
| Trustworthy, source-attributed information | Provenance tracking (description_source: api / web_sourced / ai_generated) and (AI Generated) transparency labels |
CLAUDE.md Important Context; test_data_attribution.py |
2.4 What users were doing before this existed
Recreational divers relied on fragmented sources: PADI dive site databases (limited detail), diving forums (unvetted advice), Google searches across dozens of dive sites, and manual cross-referencing of weather/tide/condition data from separate marine weather services. No single tool combined domain-expert AI, curated dive site data, live conditions, and safety guardrails.
💡 [CLAUDE NOTE: inferred from the product’s multi-source RAG architecture and the explicit integration of external marine APIs — these design choices imply the problem was information fragmentation]
Section 3 — Market Context and Competitive Landscape
3.1 Market category
Primary category: AI-powered vertical-market chatbots / domain-specific AI assistants
Market maturity: Emerging (AI chatbots for niche domains are proliferating post-2024, but few have deep domain knowledge systems)
Key dynamics: Rapid commoditization of generic AI chat; differentiation shifting to domain data, safety guardrails, and retrieval quality. Dive industry itself is stable with ~6M active certified divers globally. ⚡
💡 [CLAUDE KNOWLEDGE — verify before publishing: diver count is approximate from PADI certification statistics]
3.2 Competitive landscape
| Product / Company | Approach | Strengths | Key gap ScubaGPT addresses | Source |
|---|---|---|---|---|
| ⚡ DiveBook (divebook.app) | AI dive recommendations + digital log + trip booking + community | Integrated booking monetization; AI personalization by experience level | No systematic safety rails; no prompt-cached knowledge architecture; shallower RAG | April 2026 web search |
| ⚡ Scuba Steve AI (scubasteve.rocks) | AI dive assistant + marine photo ID + dive planning checklists + SIMI training mode | Closest functional match: AI chat, marine ID, planning checklists; mobile-first | No medical/gas-planning safety detection; no multi-source RAG; no knowledge encyclopedia | April 2026 web search |
| ⚡ DiveHelp (divehelp.com) | AI-powered companion + voice assistant + real-time conditions + training | Voice control; smartwatch sync; AI photo editing; dive computer integration | New entrant; breadth-first approach; no curated KB depth; no safety-critical guardrails | April 2026 web search |
| ⚡ theDiveGlobe / Neptune AI | 3D globe dive site explorer + AI recommendations + buddy matching + dive passport | Strong UX (3D globe); gamification (passport/badges); community-driven data | AI advising lacks depth; no safety system; no tool use for live conditions | April 2026 web search |
| ⚡ DiveKit (divekit.app) | Technical dive planning tools (deco planner, gas blender, MOD/EAD) | Offline-first; high-contrast dive-condition UI; serious technical planning | No AI; no conversational interface; technical divers only; no destination knowledge | April 2026 web search |
| ⚡ FINS (getfins.app) | AI marine species ID (5,000+ species) + dive log + destination planning | Largest species database; strong photo ID; gamified sighting tracking | No conversational AI; no safety guardrails; species ID only, not an advisor | April 2026 web search |
| ⚡ ScubaSnap (scubasnap.app) | AI fish recognition + dive log + community species database | Simple photo ID; community contributions; 14,900+ dive sites listed | Small user base (~140); limited species coverage (~108); no AI chat or safety | April 2026 web search |
| ⚡ OceanScout (oceanscout.app) | Gamified marine species collection (Pokémon-style) + offline AI ID | Offline capability; gamification; 100+ species | Gamified niche; not a planning or advisory tool | April 2026 web search |
| ⚡ ScubAI (scub.ai) | AI underwater photography + color correction; Fish ID coming Q3 2026 | Best-in-class underwater photo editing; depth-aware color science | Photography-focused; Fish ID not yet shipped; no advisory or planning | April 2026 web search |
| ⚡ PADI App | Unified certification + logbook + dive prep + shop locator | Certification authority; massive user base; official training pipeline | No AI advising (as of April 2026); static content; no real-time conditions | April 2026 web search |
| ⚡ ScubaBoard | Forum community | 20+ years of diver knowledge; peer advice | No AI; hard to search; variable quality; declining engagement | General knowledge |
| ⚡ DAN (Divers Alert Network) | Safety resources + insurance + medical hotline | Authoritative safety information; medical expertise | Static content; no interactive advising; no trip planning | General knowledge |
3.3 Market positioning
ScubaGPT positions as a domain-expert AI assistant purpose-built for diving safety and trip planning, differentiated from generic chatbots by its 6-layer RAG architecture, safety guardrails, and curated data with provenance. It sits between the broad but shallow coverage of general AI and the deep but static content of traditional dive databases.
💡 [CLAUDE NOTE: inferred from the product’s architecture and feature set relative to known alternatives]
3.4 Defensibility assessment
ScubaGPT’s defensibility rests on three layers: (1) a curated, enriched dive site database of 14,642 sites with provenance tracking and 100% description coverage — built through a 22-script data pipeline ingesting from 4 external sources with Tavily web enrichment; (2) domain-specific safety guardrails that require diving expertise to configure correctly (medical referrals, gas-planning refusals, certification-depth cross-checks); and (3) a 12,487-vector Pinecone index spanning 4 namespaces that powers precise retrieval.
Section 4 — Requirements Framing
4.1 How requirements were approached
Requirements were formalized in a structured REQUIREMENTS.md document using user story format (As a / I want / So that) with acceptance criteria tied to specific PHP classes and JavaScript files. Requirements are organized in tiered delivery groups (Critical, High Value, Strategic, Exploratory) across version milestones (v1.0–v1.5.0). Non-functional requirements cover security, performance, accessibility, and safety.
4.2 Core requirements (what it must do)
- Deliver safety-conscious diving advice with medical fitness referrals, gas-planning refusals, and depth-vs-certification cross-checks (US-CORE-02, US-CORE-03)
- Retrieve contextual knowledge from 6 layers: encyclopedia, keyword KB, Pinecone vectors, Tavily web, live tools, and disambiguation (US-CORE-01, ARCHITECTURE.md)
- Call live marine condition APIs (waves, tides, weather, suitability) via Claude tool use (US-1.4-T1-01)
- Render an interactive dual-layer map of 14,642 dive sites and 6,900+ operators with search and detail modals (US-1.5-01, US-1.5-02)
- Stream responses in real time with live markdown rendering (US-1.5-04, US-1.5-05)
4.3 Constraints and non-goals
Hard constraints:
- All AI-generated descriptions must end with
(AI Generated)for transparency (CLAUDE.md) - Safety is paramount — guardrails for medical, gas-planning, and depth-vs-certification are non-negotiable (REQUIREMENTS.md Safety section)
- Plugin must never crash WordPress — 5-layer safety guardrail system (readme.txt v1.3.0 notes)
Explicit non-goals:
- Not a dive computer or decompression calculator — gas-planning requests are explicitly refused (US-CORE-02)
- Not a medical clearance tool — medical fitness queries are redirected to dive physicians and DAN (US-CORE-02)
- Training data excluded from git due to 3.7 GB size (CLAUDE.md)
4.4 Key design decisions and their rationale
| Decision | Alternatives considered | Rationale | Evidence source |
|---|---|---|---|
| 6-layer RAG over single-source retrieval | Pure Pinecone RAG, pure KB injection, fine-tuning | Each layer handles different knowledge needs: encyclopedia for breadth, keyword KB for depth, Pinecone for semantic, Tavily for recency, tools for live data, disambiguation for terminology | ARCHITECTURE.md Knowledge System |
| Tavily web search + Claude fallback for descriptions instead of proximity-based backfill | Proximity-based depth/type backfill from nearby sites | Proximity backfill was tested and removed as too localized; web search produces higher-quality, verifiable descriptions | CLAUDE.md Important Context |
| Data loading cascade (DB → JSON → CSV) for map | Direct JSON only, DB only | Graceful fallback ensures map functions in any deployment state; DB allows WordPress-native queries when populated | REQUIREMENTS.md US-1.5-01; class-scubagpt-map.php |
| Provenance tagging on all descriptions | No provenance tracking, simple AI/human labels | Three-tier tracking (api / web_sourced / ai_generated) enables transparency auditing and prevents circular RAG (AI descriptions excluded from embeddings) | CLAUDE.md, test_data_attribution.py |
Section 5 — Knowledge System Architecture
5.1 Knowledge system overview
KB type: Multi-layer RAG with static files, vector store, web search, live APIs, and dynamic prompt assembly
Location in repo: scubagpt-chatbot/knowledgebase/ (runtime), data-pipelines/ (build), Scuba GPT Training Data/ (sources)
Estimated size: ~200 files in runtime KB; 12,487 Pinecone vectors; 150K char encyclopedia; 14,642 site records; 6,900+ operator records
5.2 Knowledge system structure
scubagpt-chatbot/knowledgebase/
├── reference-encyclopedia.md # 150K char prompt-cached encyclopedia (536 PDFs distilled)
├── data/
│ ├── dive-sites.json # 14,642 enriched sites with provenance
│ ├── dive-operators.json # 6,900+ operators with GPS, certification parsing
│ ├── seasonal-baselines.json # NOAA monthly temperature baselines
│ ├── country-analytics.json # Derivative analytics
│ ├── region-analytics.json
│ └── species-analytics.json
├── almanac/ # 17 regional almanac .md files (~135K words total)
│ ├── caribbean.md
│ ├── indo-pacific.md
│ ├── ... (15 more regions)
├── destinations/ # 12 regional destination guides
│ ├── caribbean.md
│ ├── southeast-asia.md
│ ├── ... (10 more regions)
├── topics/ # 15+ topical reference files
│ ├── equipment-guide.md
│ ├── safety-medicine.md
│ ├── marine-life.md
│ ├── seasonal-dive-planner.md
│ ├── ... (11+ more)
│ └── templates/ # Content generation templates
└── disambiguations/ # (sibling directory)
├── scuba-diving-terms.json # English terminology
├── scuba-diving-terms-es.json # Spanish
├── scuba-diving-terms-fr.json # French
├── scuba-diving-terms-de.json # German
└── scuba-diving-terms-ja.json # Japanese
5.3 Knowledge categories
| Category | Files / format | Purpose | Update frequency |
|---|---|---|---|
| Reference encyclopedia | 1 markdown file (150K chars) | Prompt-cached comprehensive diving reference | Regenerated via pipeline script 02/13 |
| Regional almanacs | 17 markdown files | Seasonal conditions, marine life, site highlights per region | Regenerated via pipeline script 11 |
| Destination guides | 12 markdown files | Detailed regional diving destination information | Manual curation |
| Topical references | 15+ markdown files | Equipment, safety, marine life, conservation, etc. | Manual curation |
| Dive site data | JSON (14,642 records) | Georeferenced sites with descriptions, types, marine life, provenance | Pipeline scripts 04–09, 15–17 |
| Operator data | JSON (6,900+ records) | GPS-located operators with certification, tier, nearby sites | Pipeline scripts 20–22 |
| Seasonal baselines | JSON | NOAA monthly temperature baselines per destination | Pipeline script 06 |
| Analytics | 3 JSON files | Country, region, species derivative analytics | Pipeline script 10 |
| Disambiguation terms | 5 JSON files | Diving terminology for system prompt (EN + 4 languages) | Manual curation |
| Vector embeddings | Pinecone index (12,487 vectors, 4 namespaces) | Semantic retrieval for chat context | Pipeline scripts 05, 17, 22 |
5.4 How the knowledge system was built
Step 1 — Source identification:
580+ source files assembled: 536 PDFs (US Diving Manual, PADI materials, marine biology texts), 200+ diving website seed lists, dive site CSVs with coordinates, and 4 external API sources (PADI/OpenDiveMap, Dive Vibe Community, TheDiveAPI, World Dive Centres API).
Step 2 — Curation and cleaning:
Pipeline script 01 extracts text from 1,077 PDFs, strips boilerplate, and classifies by topic. Script 04 normalises CSV sites into structured JSON. Scripts 07–08 ingest external API data with GPS grid traversal and rate limiting. Script 09 runs multi-phase enrichment (raw field recovery, keyword extraction, region standardisation, Tavily web search + Claude fallback for descriptions).
Step 3 — Structuring and formatting:
Encyclopedia (script 02/13) synthesises extracted text into a 150K char prompt-cached reference. Almanac files (script 11) are generated per region. Topic KB files (script 03) are created from classified PDFs. Dive site schema extended with description_source, marine_life_source, visibility_m, rating, entry_type, ocean.
Step 4 — Embedding / indexing:
Pipeline script 05 chunks extracted text and upserts to Pinecone with topic/region/cert metadata. Script 17 embeds individual dive sites. Script 22 embeds operators with operator- prefix. AI-generated descriptions are excluded from embeddings to prevent circular RAG. Total: 12,487 vectors across 4 namespaces.
Step 5 — Retrieval configuration:ScubaGPT_Knowledgebase loads at most one destination and one topic file per query with a 60K char budget and transient caching. Pinecone queries use top-k=5 and similarity threshold 0.7. Tavily adds real-time web context. Claude tool use provides live marine conditions.
Step 6 — Testing and validation:
240 pytest tests across 4 files validate data schema, GeoJSON structure, provenance tracking, and frontend behaviour. QA spreadsheet generated via data pipeline for enrichment auditing.
5.5 System prompt and agent configuration
System prompt approach: Dynamic assembly via build_augmented_prompt() — base system prompt (admin-configurable) → disambiguation terms → language detection → keyword-gated KB injection → safety analysis → seasonal context → dive plan analysis → retrieved context from RAG layers.
Key behavioural guardrails: Medical fitness queries redirected to dive physicians/DAN; gas-planning/deco calculations refused; depth-vs-certification cross-checked; species ID includes confidence framing and conservation compliance.
Persona / tone configuration: MSDT-level diving advisor — knowledgeable, adventurous, safety-conscious, approachable, encouraging. Sound like a knowledgeable dive buddy, never robotic or corporate.
Tool use / function calling: 8 Claude tools — get_dive_conditions, get_tide_info, get_marine_weather, check_dive_suitability, get_equipment_recommendation, search_dive_sites_natural, and related handlers.
Section 6 — Build Methodology
6.1 Development approach
AI-assisted iterative development using Cursor IDE with Claude Code. The project follows a CLAUDE.md-driven specification approach where context documents anchor each development session. Formal requirements exist in REQUIREMENTS.md with tiered user stories and acceptance criteria. Data pipelines are built as numbered, sequential Python scripts.
6.2 Build phases
| Phase | Approximate timeframe | What was built | Key milestones |
|---|---|---|---|
| Foundation | January 2026 | Core chat plugin: Claude AI integration, Pinecone, Tavily, conversation management, rate limiting, admin UI | v1.0.0, v1.1.0 (safety guardrails, admin dashboard) |
| Integration | January–February 2026 | AI Engine integration, external marine APIs (Open-Meteo, Stormglass, NOAA, WorldTides), crash-proof rewrite | v1.2.0–v1.2.4, v1.3.0 (crash-proof rewrite) |
| Security & APIs | February–March 2026 | Security hardening (10 fixes), Google Places API, RapidAPI dive sites | v1.3.1–v1.3.4 (CSRF, XSS, rate limiting, GDPR) |
| Data Expansion | March–April 2026 | 14,642 sites from 4 sources, 22 data pipeline scripts, tool use, Vision, trip planner, species log, operators, multilingual | v1.4.0, v1.4.1 (parallel RAG, performance) |
| Enrichment & UX | April 2026 | Data enrichment (100% descriptions), dual-layer map, real-time streaming, 240-test suite, operator pipeline | v1.5.0 (current) |
6.3 Claude Code / AI-assisted development patterns
The codebase shows extensive AI-assisted development evidenced by: (1) structured CLAUDE.md files at multiple directory levels providing context for AI assistants; (2) formal ARCHITECTURE.md and REQUIREMENTS.md that serve as both human documentation and AI session context; (3) numbered, sequential data pipeline scripts (01–22) that follow a clear build-on-previous pattern; (4) a .agents/ directory with 5 product-level Skills and 1 Agent for quarterly data maintenance; and (5) a comprehensive pytest test suite that validates source code patterns (PHP, JS, CSS) rather than executing them — a pattern consistent with AI-assisted test generation.
6.4 Key technical challenges and how they were resolved
| Challenge | How resolved | Evidence |
|---|---|---|
| Plugin crashing WordPress on errors | 5-layer safety guardrail system: pre-install validation, safe activation, graceful degradation, automatic recovery, emergency shutdown | readme.txt v1.3.0 changelog; REQUIREMENTS.md safety section |
| Data enrichment at scale (14,642 sites) | 22-script pipeline with Tavily web search (90%) + Claude Haiku fallback (10%) + provenance tracking | CLAUDE.md data enrichment notes; data-pipelines/ |
| Preventing circular RAG from AI-generated content | AI-generated descriptions excluded from Pinecone embeddings; provenance tagging enables filtering | CLAUDE.md Important Context |
| Map performance with 20,000+ markers | Leaflet.markercluster for both layers; data loaded via REST endpoints with chunked loading patterns | map.js, test_map_shortcode.py TestMapJsFetchPatterns |
| Plugin packaging missing data files | Architectural flaw identified and fixed: zip rebuilt to include dive-sites.json and dive-operators.json (4.3 MB) | plugin-installs/ directory (v1.5.0-map-streaming.zip) |
Section 7 — AI Tools and Techniques
7.1 AI models and APIs used
| Model / API | Provider | Role in product | Integration method |
|---|---|---|---|
| Claude (Messages API) | Anthropic | Primary chat AI, tool use execution, description generation | ITI Shared Library ITI_Claude_API
|
| Claude Vision | Anthropic | Marine life photo identification | ITI Shared Library ITI_Vision_Handler
|
| Claude Haiku | Anthropic | Fallback description generation for dive sites/operators | Direct API via data pipeline scripts |
| OpenAI text-embedding-3-small | OpenAI | Query embeddings for Pinecone retrieval |
ScubaGPT_API configuration |
| Tavily Search | Tavily | Real-time web context for chat; description sourcing for enrichment | ITI Shared Library ITI_Tavily_API; pipeline scripts |
| Pinecone | Pinecone | Vector similarity search (12,487 vectors, 4 namespaces) | ITI Shared Library ITI_Pinecone_API
|
7.2 AI orchestration and tooling
| Tool | Category | Purpose |
|---|---|---|
| ITI Shared Library | Orchestration | Reusable WordPress components for Claude, Tavily, Pinecone, agents |
Claude Tool Use (ITI_Claude_Tools) |
Function calling | Multi-turn tool execution loop with 8 registered tools |
| ITI Workflow Adapter | Orchestration | Optional n8n routing for chat messages |
| Pinecone | Vector DB | 4-namespace index for semantic retrieval |
| Leaflet.js + markercluster | Visualization | Interactive map rendering with clustering |
7.3 Prompting techniques used
- [x] Chain-of-thought reasoning (implicit in multi-tool execution loops)
- [ ] Few-shot examples in prompts
- [x] Structured / JSON output prompting (tool return schemas)
- [x] Tool use / function calling (8 tools via
ITI_Claude_Tools) - [x] RAG context injection (6-layer: encyclopedia, KB, Pinecone, Tavily, tools, disambiguation)
- [x] System prompt persona/role setting (MSDT-level advisor persona)
- [x] Multi-turn conversation management (session-based history)
- [x] Output guardrails / content filtering (medical, gas-planning, depth safety)
- [x] Fallback / error recovery prompting (graceful degradation when tools unavailable)
- [x] Prompt caching (anthropic-beta header for large system prompts)
- [x] Dynamic prompt assembly (budget-controlled injection of KB, safety, seasonal context)
7.4 AI development tools used to build this
| Tool | How used in build |
|---|---|
| Cursor IDE with Claude | Primary development environment — CLAUDE.md-driven sessions, code generation, test generation, documentation |
| Claude Code | Context-aware coding, refactoring, test suite creation |
| ITI Agent System | Orchestrator + specialist agents for architecture, testing, documentation |
| Product-level Skills (.agents/skills/) | 5 Skills for ingestion, enrichment, scraping, QA, embeddings — used for data pipeline development |
Section 8 — Version History and Evolution
8.1 Version timeline
| Version | Date | Summary of changes | Significance |
|---|---|---|---|
| v1.0.0 | Jan 2026 | Initial release: Claude AI chat, Pinecone RAG, Tavily web search, conversation history, rate limiting, admin settings | Foundation product launch |
| v1.1.0 | Jan 2026 | 5-layer safety guardrails, enhanced system prompt (9 rules), admin UI/statistics dashboard, news integration, Google Maps links | Safety-first architecture established |
| v1.2.0 | Jan 2026 | AI Engine integration, external marine APIs (Open-Meteo, Stormglass, NOAA, WorldTides), function calling for live conditions | Real-time data capability added |
| v1.2.1–v1.2.4 | Jan–Feb 2026 | Bug fixes: streaming, URL sanitization, AI Engine compatibility, duplicate loading protection | Stability hardening |
| v1.3.0 | Feb 2026 | Complete rewrite for crash-proof operation: all code in Throwable catch blocks, recovery page, one-click restart | Architectural resilience milestone |
| v1.3.1–v1.3.3 | Feb 2026 | Streaming fix, RapidAPI dive sites, Google Places API | Feature expansion |
| v1.3.4 | Mar 2026 | Security hardening: 10 fixes (CSRF, XSS, rate limiting bypass, GDPR, daily token budget, API key rotation) | Security milestone |
| v1.4.0 | Mar 2026 | 14,642 dive sites from 4 sources, 14 data pipelines, Claude tool use (8 tools), Vision, trip planner, species log, operators, multilingual, dive log parsing, embeddable widget, buddy matching | Major feature expansion |
| v1.4.1 | Apr 2026 | Parallel RAG lookups (curl_multi), prompt caching, streaming performance (requestAnimationFrame batching) | Performance optimization |
| v1.5.0 | Apr 2026 | Data enrichment (100% descriptions, provenance), dual-layer map (sites + operators), real-time streaming with markdown, image upload, voice input, offline detection, session management, 240-test suite | Current release — UX and data quality milestone |
8.2 Notable pivots or scope changes
- AI Engine integration disabled by default (v1.3.0) — after compatibility issues across AI Engine plugin versions, the integration was made opt-in rather than automatic. This reflected a pivot from tight third-party coupling to a self-contained architecture.
- Proximity-based data backfill removed — during the v1.5.0 enrichment cycle, depth/type inference from nearby dive sites was tested and removed as too localized. Replaced by Tavily web search + Claude fallback for higher-quality, verifiable descriptions.
- Data packaging architecture change — the v1.5.0 plugin zip initially excluded data files (664 KB). After discovering the map would show no markers without them, the zip was rebuilt to include dive-sites.json and dive-operators.json (4.3 MB).
💡 [CLAUDE NOTE: pivot details from CLAUDE.md “Important Context” and conversation history]
8.3 What has been cut or deferred
- Fine-tuned models (training data and datasets exist in
Fine Tunings/but are not used in current architecture) - Mobile native app integration (listed in early roadmap, not implemented)
- Content recommendation engine (listed in v1.1.0 roadmap)
Section 9 — Product Artifacts
9.1 Design and UX artifacts
| Artifact | Path | Type | What it shows |
|---|---|---|---|
| Dive site marker icon | assets/images/icon-dive-site.svg |
SVG icon | Blue circle with diver wave motif (24×24) |
| Dive operator marker icon | assets/images/icon-dive-operator.svg |
SVG icon | Orange circle with shop motif (24×24) |
| Visual Style Guide | scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md |
Design system | Color palette, typography, components, map components, streaming components |
9.2 Documentation artifacts
| Document | Path | Type | Status |
|---|---|---|---|
| Architecture | ARCHITECTURE.md |
System architecture | Complete (v1.5.0) |
| Requirements | REQUIREMENTS.md |
Software requirements | Complete (v1.5.0) |
| README | documentation/README.md |
Project documentation | Complete (v1.5.0) |
| Plugin readme | scubagpt-chatbot/readme.txt |
WordPress plugin readme | Complete (v1.5.0) |
| Safety guardrails | Multiple files in documentation/
|
Safety system docs | Complete |
| UI/UX test plan | documentation/UI-UX-TEST-PLAN.md |
Test plan | Complete |
| This document | SHOWCASE.md |
Project showcase | Draft |
9.3 Data and output artifacts
| Artifact | Path | Description |
|---|---|---|
| Plugin releases (20 versions) | plugin-installs/scubagpt-chatbot-v*.zip |
Versioned WordPress plugin zips from v1.0.0 to v1.5.0 |
| Dive sites Excel export | data-pipelines/output/dive-sites.xlsx |
14,642 sites with all fields, styled headers, metadata sheet |
| Dive operators Excel export | data-pipelines/output/dive-operators.xlsx |
6,900+ operators with all fields |
| SQL import | data-pipelines/output/dive-sites-import.sql |
WordPress database import |
| Pinecone vector manifest | data-pipelines/output/pinecone-vectors.json |
Vector upsert manifest |
| QA spreadsheet | data-pipelines/output/ |
Pre/post enrichment comparison |
Section 10 — Product Ideation Story
10.1 Origin of the idea
ScubaGPT originated as a domain-specific vertical application of the GD Claude Chatbot architecture, adapted for the recreational scuba diving market. The project started in January 2026, building on an existing WordPress chatbot framework and applying it to a domain where safety-critical AI guidance, curated geographic data, and real-time marine conditions create a differentiated product.
💡 [CLAUDE NOTE: inferred from CLAUDE.md “Based on gd-claude-chatbot architecture”, v1.0.0 release in January 2026, and the early changelog referencing AI Power and GD Chatbot integration]
10.2 How the market was assessed
Research approach used:
Domain expertise combined with iterative product development. No formal competitive analysis files exist in the repository. Market assessment appears to have been based on the builder’s domain knowledge of the diving industry and firsthand experience with the fragmentation of diving information resources.
💡 [CLAUDE NOTE: inferred from empty marketing/ directory and absence of research files in Section 0.7]
Key market observations that shaped the product:
- Generic AI chatbots provide diving advice without safety guardrails, creating risk for medical and depth-related queries
- Existing dive site databases (PADI, SSI) are static and don’t combine with real-time conditions or AI guidance
- No single tool combines domain-expert AI, curated site data, live marine conditions, and visual exploration
What existing products got wrong (the gap that justified building this):
They treat diving information as either a static database problem (dive site directories) or a generic AI problem (chatbots without domain guardrails). The gap is a product that respects both the depth of domain knowledge required and the safety-critical nature of diving advice.
10.3 The core product bet
We believe that recreational divers will use an AI assistant for trip planning and diving guidance because it combines the conversational accessibility of a chatbot with the domain authority of curated data and safety-first design — something neither generic AI nor static dive databases provide.
💡 [CLAUDE NOTE: inferred from the product’s architecture choices and user story framing in REQUIREMENTS.md]
10.4 How the idea evolved from first conception to current state
The product started as a chat-only AI assistant (v1.0.0) and rapidly expanded through five phases: (1) safety infrastructure (v1.1.0), (2) external API integration for live data (v1.2.0), (3) architectural resilience and security (v1.3.x), (4) massive data expansion from 455 to 14,642 sites with tool use, Vision, and multiple feature modules (v1.4.0), and (5) data quality enrichment with a visual map and streaming UX overhaul (v1.5.0). The trajectory shows a consistent pattern of deepening domain specificity — each version adds more diving-specific capability rather than generic features.
Section 11 — Lessons and Next Steps
11.1 Current state assessment
What works well: Comprehensive 6-layer RAG architecture; 14,642 enriched sites with provenance; safety guardrails; 240-test quality suite; dual-layer interactive map; real-time streaming UX.
Current limitations: No mobile native app; fine-tuned models exist but are unused; no formal A/B testing or user analytics beyond admin dashboard; marketing directory is empty.
Estimated completeness: Production-ready with active feature expansion. Core chat, map, and data systems are mature. Operator and trip planner features are functional but could be deepened.
11.2 Visible next steps
- Operator enrichment completion — extend Tavily + Claude enrichment to achieve higher coverage of operator descriptions, contacts, and specialties
- Quarterly data maintenance via the
dive-site-data-stewardAgent — automated staleness detection, re-enrichment, and QA auditing - Embed widget deployment — enable dive operators to embed ScubaGPT on their own sites with white-label branding
- User analytics and A/B testing — instrument chat and map interactions to measure engagement and optimize
- Operator enrichment depth — extend descriptions, contacts, and specialties coverage for the 6,900+ operator database
11.3 Lessons learned
On the problem definition:
_[Manual input required — the builder should reflect on what surprised them about the user problem]_
On the knowledge system:
_[Manual input required — what worked and what didn’t in how the KB was structured]_
On the build process:
_[Manual input required — what would they do differently in the AI-assisted workflow]_
On market fit:
_[Manual input required — what does the current state tell them about the original hypothesis]_
Section 12 — Validation Checklist
- [x] Every
[PLACEHOLDER]has been replaced or marked⚠️ [NOT FOUND] - [x] All externally-sourced competitive data is marked with
⚡ - [x] All inferences are marked with
💡 [CLAUDE NOTE] - [x] Section 0 audit trail lists every file examined
- [x] Version history in Section 8 is derived from actual changelog and plugin-installs/ directory
- [x] Knowledge system paths in Section 5 reflect real directory structure
- [x] AI tools in Section 7 are confirmed from code/config
- [x] Section 11.3 is left blank for manual input
- [x] Document header shows today’s date and files examined
Sources Examined
| File / Path | What it contributed |
|---|---|
CLAUDE.md (root) |
Sections 1, 4, 5, 6, 7, 10 — project overview, features, data enrichment decisions, development notes |
ARCHITECTURE.md |
Sections 1, 4, 5, 7 — component architecture, data flow, knowledge system, technology stack |
REQUIREMENTS.md |
Sections 2, 4, 5, 11 — user stories, acceptance criteria, non-functional requirements |
documentation/README.md |
Sections 1, 6, 8 — feature list, version history, project structure |
documentation/CLAUDE.md |
Section 5 — plugin architecture details, test suite |
scubagpt-chatbot/readme.txt |
Section 8 — full changelog from v1.0.0 to v1.5.0 |
scubagpt-chatbot/documentation/VISUAL-STYLE-GUIDE.md |
Section 9 — design system, component styling |
data-pipelines/README.md |
Section 5 — pipeline steps and structure |
git log --format="%h %ad %s" --date=short -- products/scuba-gpt/ |
Sections 6, 8 — build phase dates, commit history |
ls -la plugin-installs/ |
Sections 8, 9 — version timeline, artifact inventory |
Addendum — April 2026 Competitive Landscape and Build Impact
1. Industry Context (Updated April 2026)
The scuba diving app market has undergone rapid transformation driven by two converging forces: the mainstreaming of AI capabilities and the proliferation of mobile-first recreational apps. By April 2026, at least eight AI-native dive platforms have entered the space, fragmenting the market across three segments:
- AI Advisors: DiveBook (AI recommendations + booking), Scuba Steve AI (AI assistant + marine photo ID + training mode), DiveHelp (AI companion + voice + wearable sync), theDiveGlobe Neptune AI (3D globe + AI recommendations + buddy matching)
- Marine ID Tools: FINS (5,000+ species), ScubaSnap (community-driven photo ID), OceanScout (gamified collection), ScubAI (underwater photography with Fish ID launching Q3 2026)
- Technical Planning: DiveKit (offline deco planner, gas blender, MOD/EAD calculators)
The claim “only AI-powered scuba advisor” has been untenable since at least three competitors began offering AI chat. ScubaGPT’s positioning shifts from “we have AI” to “we have the deepest knowledge architecture and the only systematic safety engineering in this space.”
2. Parity Gaps Closed by v1.4.0–v1.5.0
| Gap (from April 2026 analysis) | Resolution | Competitor Parity |
|---|---|---|
| Marine life photo identification — 4 competitors had it | Claude Vision via ITI_Vision_Handler + /chat/image endpoint |
Now at parity with Scuba Steve AI; FINS still leads on species breadth (5,000+ vs. Vision-based) |
| Marine weather APIs designed but unimplemented | 8 Claude tools live via ITI_Claude_Tools (bypassed AI Engine dependency) |
Ahead of most competitors on live condition integration |
| No structured recommendation engine | Dive operator recommendation engine with scored matching | At parity with DiveBook; different approach (content-based vs. booking-integrated) |
| No social features / buddy matching | Buddy matching with profile + compatibility scoring | At parity with theDiveGlobe; lighter implementation |
| No interactive map / limited to 455 sites | Dual-layer Leaflet.js map with 14,642 sites + 6,900+ operators | Comparable in site count to ScubaSnap (14,900+); theDiveGlobe has 3D globe UX |
3. New Differentiators Created by v1.4.0–v1.5.0
| Differentiator | What it is | Who else has it |
|---|---|---|
| Prompt-cached encyclopedia (150K chars from 1,077 PDFs) | Distilled domain knowledge as first system block for Anthropic caching | No competitor has a comparable prompt-cached knowledge architecture |
| Six-layer RAG pipeline | Encyclopedia + keyword KB + Pinecone + Tavily + tool use + disambiguation | Most competitors use single-layer RAG or none |
| Proactive safety briefing | Automatic depth-vs-certification cross-reference flagging risky dive plans | No competitor has systematic proactive safety analysis |
| Multi-language disambiguation | Scuba terminology in 5 languages with deterministic resolution | No competitor offers localized disambiguation |
| Embeddable white-label widget | Operators can embed ScubaGPT on their sites with branding and topic restrictions | No competitor offers B2B embeddable deployment |
| Dual-layer interactive map | 14,642 sites + 6,900+ operators on independently toggleable Leaflet layers with search, modals, ARIA | No competitor combines operator and site layers on an embeddable map |
| Real-time streaming with live markdown | Markdown renders progressively as text streams — not after completion | Unique in this niche; competitors stream text but not formatted markdown |
| 240-test automated suite | pytest coverage across map, chat, data attribution, operator schema | Demonstrates engineering rigor unusual for vertical AI products |
| Data pipeline infrastructure | 22-script reproducible pipeline from raw PDFs to production knowledge | Competitors’ knowledge systems are opaque |
| Provenance tagging | Three-tier source tracking (api / web_sourced / ai_generated) enabling transparency and circular-RAG prevention | No competitor publishes data provenance |
4. Honest Assessment
Strengths after v1.5.0:
- Safety detection system plus proactive dive plan analysis is genuinely unique — still no competitor has systematic safety rails
- Six-layer RAG with prompt-cached encyclopedia and 12,487 Pinecone vectors across 4 namespaces provides measurably deeper responses
- 55+ curated knowledgebase files (12 regions + 15 topics + 17 almanac + analytics + data) with 7 disambiguation files
- 22-script data pipeline infrastructure means knowledge can be updated reproducibly from raw sources to production vectors
- WordPress plugin model creates a B2B distribution channel (operator embedding) that no competitor addresses
- 240-test automated suite provides quality infrastructure no other niche plugin has
Gaps we’re honest about:
- FINS has 5,000+ species for photo ID; Claude Vision approach is general-purpose and not specialized
- theDiveGlobe has a 3D globe with gamified engagement; Leaflet.js map is functional but less visually compelling
- DiveBook has booking integration creating a revenue flywheel we lack
- WordPress plugin deployment means no native mobile app — all consumer-facing competitors are mobile-first
- DiveHelp has smartwatch integration — hardware-adjacent features we cannot match as a WordPress plugin
- The niche is small — total addressable market for AI-powered scuba advisory tools is inherently limited
What we’re watching:
- DiveHelp as a new entrant with aggressive breadth (voice, wearables, AI photo editing, training)
- ScubAI’s Fish ID launch in Q3 2026 — another competitor entering marine species identification
- Whether PADI adds AI advising to their unified app — if they do, they own the certification-to-advice pipeline
- DiveBook’s booking monetization — could create winner-take-most dynamics
- Whether WordPress plugin is the right form factor, or a standalone PWA would reach more divers
5. Portfolio Context
ScubaGPT demonstrates ITI’s ability to build AI products for safety-critical niche verticals where generic AI tools are insufficient. The v1.4.0–v1.5.0 builds demonstrate the full product development lifecycle: competitive analysis → roadmap → requirements → implementation (20 features across 4 tiers) → data pipeline engineering (22 scripts) → multi-source data aggregation (14,642 sites) → knowledge architecture (almanac + analytics + 4-namespace embeddings) → UX engineering (dual-layer map, streaming markdown) → quality infrastructure (240-test suite) → documentation. The safety detection system, multi-layer RAG architecture, prompt-cached encyclopedia, and streaming optimization represent genuine engineering. The product’s value as consulting portfolio evidence lies in showing that responsible AI product development in a safety-critical domain requires domain-specific guardrails, knowledge architecture, data engineering, and performance tuning that go beyond what the base model provides.
Populated by Claude Code on 2026-04-19 using the AI Project Showcase skill methodology.
