ScubaGPT — Architecture
ScubaGPT — Architecture
System Overview
ScubaGPT is a WordPress plugin chatbot that uses the Anthropic Claude API with RAG (Retrieval Augmented Generation) from Pinecone and Tavily, optional tool use for live marine and dive-site data, Claude Vision for image-based marine life identification, and a layered safety pipeline for diving-related queries.
The runtime is organized around a singleton main class (ScubaGPT_Chatbot) that wires admin, API, chat, REST, and feature modules. Optional pieces (disambiguations, file-based knowledgebase loader, safety analyzer, tool definitions, external APIs) load after core init so a bad optional file cannot take down the whole site.
Knowledge System
| Class | Role |
|---|---|
ScubaGPT_Knowledgebase |
Keyword-gated loading of markdown under scubagpt-chatbot/knowledgebase/: at most one destination and one topic file per query, with a character budget (KB_BUDGET, 60k chars) and transient caching. Injects … into the system prompt. |
ScubaGPT_Disambiguations |
Diving terminology disambiguation snippets for the system prompt (char-limited slice passed into get_for_system_prompt()). |
Directory layout (plugin-shipped KB):
knowledgebase/destinations/— Twelve regional destination guides (e.g. Caribbean, Indo-Pacific, Red Sea & Maldives, Mediterranean & Europe, Southeast Asia, Florida & Americas, Africa & Indian Ocean, Pacific Islands & Micronesia, East Asia, Central America, Costa Rica & Galápagos, Great Barrier Reef & Australia).knowledgebase/topics/— Nine+ topical guides (e.g. equipment, safety & medicine, marine life, wrecks, liveaboards, conservation, seasonal planner, navigation & technology, dive industry reference), plustemplates/for dive site/destination/operator content patterns.knowledgebase/data/dive-sites.json— 14,642 enriched dive sites with coordinates, types, marine life, descriptions, and provenance tracking. Consumed byScubaGPT_MapREST endpoints and map.js for GeoJSON rendering.knowledgebase/data/dive-operators.json— 6,900+ operators with GPS coordinates, certification body/tier, nearby sites, and contact info. Consumed byScubaGPT_Mapfor operator layer GeoJSON andScubaGPT_Operatorsfor recommendation logic.knowledgebase/data/operators.json— Legacy structured operator data consumed byScubaGPT_Operatorsfor prompt-injection scoring and recommendations.
Note: The repository also contains a larger product-level
knowledgebase/(PDFs, embeddings) outside the plugin directory; the runtime file-based injection described above uses the plugin’sscubagpt-chatbot/knowledgebase/tree.
Safety Layer
| Class | Role |
|---|---|
ScubaGPT_Safety |
analyze_message() — regex-based detection of medical / fitness-to-dive questions and gas planning requests (deco, MOD, NDL, SAC, etc.); injects XML-style instructions and DAN referral text into the system prompt. analyze_dive_plan() — proactive cross-checks for dive-plan narratives (certification depth, conditions). Species identification paths can apply confidence framing where relevant. |
Feature Modules
| Class | Role |
|---|---|
ScubaGPT_Tools |
Defines Claude tools and handlers (marine conditions, equipment lookups, search, etc.) and exposes an ITI_Claude_Tools runner; uses ScubaGPT_External_APIs for live API calls when the tool-use path is active. |
ScubaGPT_External_APIs |
Weather, tides, dive-site discovery, and related integrations (cached, configurable via scubagpt_external_api_settings). Loaded only from init_optional_integrations() when AI engine settings are not disabled. |
ScubaGPT_Map |
[scubagpt_map] shortcode (default 1024×768px); enqueues Leaflet, Leaflet.markercluster, and map.js / map.css; public REST GeoJSON for dive sites (/dive-sites, /dive-sites/{id}) and dive operators (/dive-operators, /dive-operators/{id}). Dual-layer architecture with toggleable site/operator layers, search bar with filter overlay, detail modals, and custom SVG marker icons. Data loading cascade: DB table → JSON file → CSV fallback. |
ScubaGPT_Trip_Planner |
Multi-turn trip planner with transient-backed state (scubagpt_trip_plan_*). |
ScubaGPT_Species_Log |
Species sighting persistence, REST surface, install hook for {prefix}scubagpt_species_log. |
ScubaGPT_Operators |
Operator database and recommendation logic (pairs with knowledgebase/data/operators.json). |
ScubaGPT_Dive_Log |
Parsers for Subsurface / Shearwater-style logs; injects context for the model when user uploads or references logs. |
ScubaGPT_Embed |
White-label embed for operators; can filter scubagpt_system_prompt with operator-specific XML blocks. |
ScubaGPT_Buddy |
Buddy matching with anonymous profiles; {prefix}scubagpt_buddy_profiles. |
ITI Shared Library Components
The host WordPress install is expected to load the ITI Shared Library (see project CLAUDE.md). ScubaGPT directly references these classes when they exist:
| Component | Usage in ScubaGPT |
|---|---|
ITI_Claude_API |
Tool-use and vision paths: configured with model, tokens, temperature, system prompt, API key from plugin settings. |
ITI_Claude_Tools |
Executes the multi-turn tool use loop with ScubaGPT_Tools definitions. |
ITI_Vision_Handler |
/chat/image endpoint: image + prompt analysis for marine life ID. |
ITI_Workflow_Adapter |
Optional n8n path in process_message(); if the workflow returns success, the plugin skips the local Claude pipeline for that request. |
The following are standard ITI shared components documented in CLAUDE.md / shared/CATALOG.md; use them when extending the plugin even if not every symbol is referenced in ScubaGPT PHP today:
| Component | Typical role |
|---|---|
ITI_Map_Embed |
Reusable Leaflet patterns elsewhere in ITI; this plugin implements ScubaGPT_Map with Leaflet directly. |
ITI_Token_Budget_Manager |
Canonical token budgeting for prompts; ScubaGPT enforces limits via KB budget, history truncation, and daily token transients. |
ITI_Logger |
File-based structured logging pattern for production diagnostics. |
ITI_Cache_Manager |
Transient/cache abstractions; ScubaGPT also uses WordPress transients directly (e.g. rate limits, KB cache keys). |
ITI_Settings_Page |
Declarative admin UI pattern; ScubaGPT admin currently uses the WordPress Settings API in ScubaGPT_Admin. |
Frontend
| Asset | Role |
|---|---|
assets/js/chatbot.js |
Chat widget: REST calls to scubagpt/v1, real-time streaming with live markdown rendering, processing indicator, image upload via FormData to /chat/image, voice input (Web Speech API where available), offline/online detection, session management with localStorage, URL sanitization, mobile full-screen layout, error categorization with retry logic. |
assets/css/chatbot.css |
Responsive layout, mobile full-viewport chat with safe area insets, streaming state classes (.scubagpt-processing-indicator, .scubagpt-processing-spinner, .streaming-content), voice input pulse animation, image upload/dragover styles, dark mode via prefers-color-scheme, prefers-reduced-motion support, focus-visible accessibility, iOS zoom prevention, print styles. |
assets/js/map.js |
Leaflet dual-layer map: loads GeoJSON from /dive-sites and /dive-operators REST endpoints via Promise.all, Leaflet.markercluster for both layers, custom SVG icons (icon-dive-site.svg, icon-dive-operator.svg), toggleable layer control, search bar with real-time filter overlay and clear button, detail modals with copy/close functionality, ARIA attributes and keyboard navigation (Escape to close, focus trapping), badge counters showing visible/total markers, fitBounds on load. |
assets/css/map.css |
Map cluster styles (.scubagpt-cluster-site, .scubagpt-cluster-operator), search control layout (.scubagpt-map-search), badge control, modal overlay with animation (.scubagpt-map-modal-overlay, .scubagpt-map-modal), modal table/description/nearby-list formatting, Leaflet layer control overrides. |
assets/images/icon-dive-site.svg |
Blue circle SVG (24×24) with diver wave motif, used for dive site map markers. |
assets/images/icon-dive-operator.svg |
Orange circle SVG (24×24) with shop/building motif, used for dive operator map markers. |
Data Flow
End-to-end path for a normal chat message:
- Browser sends
POST /wp-json/scubagpt/v1/chat(or/chat/stream) withX-WP-Nonce: wp_restand payload (message, optionalsession_id, optionalconversation_history). ScubaGPT_REST—check_chat_permission()verifies nonce and anonymous/login policy →ScubaGPT_Chat::process_message()(or streaming variant).ScubaGPT_Chat— rate limit → daily token budget → ifITI_Workflow_Adapteris available and returns success, return n8n output (non-streaming path only).- RAG retrieval — Pinecone context (if configured), Tavily web context (if configured), and if tool-use is not available, inline marine API context from
ScubaGPT_External_APIsvia pattern helpers. build_augmented_prompt()— basescubagpt_system_prompt→ScubaGPT_Disambiguations→ optional language detection and localized term JSON →ScubaGPT_Knowledgebasefile injection →ScubaGPT_Safety::analyze_message()(medical / gas-planning XML) → seasonal excerpt fromtopics/seasonal-dive-planner.md→ScubaGPT_Safety::analyze_dive_plan()→ appendfrom step 4.- Claude — either
call_claude_with_tools()(ITI_Claude_API+ITI_Claude_Tools+ScubaGPT_Tools) orScubaGPT_API::call_claude()for a single completion. - Response — text extracted, optional usage totals, conversation logging, query stats, daily token accounting → JSON to client (or streamed chunks).
Vision flow: image posted to /chat/image → ITI_Vision_Handler + ITI_Claude_API → structured response to the client.
Database Tables
All names use the WordPress table prefix {prefix} (e.g. wp_).
| Table | Purpose |
|---|---|
{prefix}scubagpt_conversations |
Chat sessions (session id, user, IP, timestamps). |
{prefix}scubagpt_messages |
Message rows per conversation (role, content, tokens, sources). |
{prefix}scubagpt_query_stats |
Aggregated query/type counts and token usage by day. |
{prefix}scubagpt_species_log |
Species sighting records (ScubaGPT_Species_Log). |
{prefix}scubagpt_buddy_profiles |
Anonymous buddy-matching profiles (ScubaGPT_Buddy). |
{prefix}scubagpt_dive_sites |
Dive site rows for the interactive map (ScubaGPT_Map) when populated. |
Security Architecture
- REST: Chat routes require a valid
wp_restnonce via theX-WP-Nonceheader. Admin-only routes usecurrent_user_can('manage_options')(and related checks on specific handlers). - Input: Parameters use
sanitize_textarea_field,sanitize_text_field, length validation (e.g. 10k message cap), typed REST args. - Output: Admin notices and templates use
esc_html,esc_url, and standard WordPress escaping patterns. - Secrets: API keys stored in
wp_optionswith encryption viaScubaGPT_Admin::decrypt_api_key()/ encrypt on save. - Safety: Medical and gas-planning patterns drive non-executable XML/text injection into the system prompt so the model refuses personalized medical advice and numeric gas plans per policy.
Technology Stack
| Layer | Technologies |
|---|---|
| CMS | WordPress 6.0+ (plugin header), PHP 8.0+ |
| Data | MySQL / MariaDB |
| AI | Anthropic Claude (Messages API), optional Pinecone, Tavily |
| Embeddings | OpenAI / Voyage endpoints (via ScubaGPT_API configuration) |
| Maps | Leaflet.js (CDN), GeoJSON over REST |
| Browser | Web Speech API for voice input where supported |
| Automation (optional) | n8n via ITI_Workflow_Adapter |
Document version: Align with plugin SCUBAGPT_VERSION in scubagpt-chatbot/scubagpt-chatbot.php.
Last updated: April 2026
