ScubaGPT — Architecture

## System Overview

ScubaGPT is a WordPress plugin chatbot that uses the Anthropic Claude API with RAG (Retrieval Augmented Generation) from Pinecone and Tavily, optional tool use for live marine and dive-site data, Claude Vision for image-based marine life identification, and a layered safety pipeline for diving-related queries.

The runtime is organized around a singleton main class (ScubaGPT_Chatbot) that wires admin, API, chat, REST, and feature modules. Optional pieces (disambiguations, file-based knowledgebase loader, safety analyzer, tool definitions, external APIs) load after core init so a bad optional file cannot take down the whole site.
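
One way to get the isolation described above is to construct each optional module inside a try/catch, so a failure is recorded rather than propagated. The sketch below illustrates that idea only; `Sketch_Chatbot`, `load_optional()`, and the module names are hypothetical and are not taken from the plugin source.

```php
<?php
// Hypothetical stand-in for the bootstrap class; not the plugin's actual code.
final class Sketch_Chatbot
{
    private static ?Sketch_Chatbot $instance = null;

    /** @var array<string, object> optional modules that loaded cleanly */
    private array $optional = [];

    /** @var array<string, string> failure messages keyed by module name */
    private array $failures = [];

    private function __construct() {}

    // Singleton accessor: always returns the same object.
    public static function instance(): self
    {
        return self::$instance ??= new self();
    }

    /**
     * Build each optional module; record failures instead of rethrowing.
     *
     * @param array<string, callable> $factories module name => factory
     */
    public function load_optional(array $factories): void
    {
        foreach ($factories as $name => $factory) {
            try {
                $this->optional[$name] = $factory();
            } catch (\Throwable $e) {
                // \Throwable also covers \Error, e.g. a missing class.
                $this->failures[$name] = $e->getMessage();
            }
        }
    }

    public function has(string $name): bool
    {
        return isset($this->optional[$name]);
    }

    public function failures(): array
    {
        return $this->failures;
    }
}

$bot = Sketch_Chatbot::instance();
$bot->load_optional([
    'safety' => fn() => new ArrayObject(),         // placeholder module object
    'tools'  => fn() => new Missing_Tools_Class(), // undefined class, throws \Error
]);
```

Catching `\Throwable` rather than `\Exception` matters here, because PHP reports a missing class as an `\Error`. After the call, the `safety` module is available and the `tools` failure is recorded without interrupting the request.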

## Component Architecture

### Core Pipeline

| Class | Role |
|-------|------|
| ScubaGPT_Chatbot | Main plugin bootstrap: plugins_loaded safe init, activation hooks (options, DB tables, cron), core component construction, shortcode [scubagpt_chat], assets, optional integrations on init (priority 99). Singleton via ScubaGPT_Chatbot::instance(). |
| ScubaGPT_Chat | Core chat handler: rate limits, daily token budget, optional n8n routing via ITI_Workflow_Adapter, Pinecone + Tavily + inline marine API context, build_augmented_prompt() (disambiguations, language, KB, safety, seasonal, dive-plan analysis), conversation history, ITI_Claude_Tools loop when tools are enabled, streaming and non-streaming paths, conversation logging and query stats. |
| ScubaGPT_API | Thin wrapper for direct HTTP calls to Claude, Tavily, embeddings (OpenAI/Voyage), and related helpers; exposes get_api_key_for_tools() for shared-library clients. |
| ScubaGPT_REST | Registers scubagpt/v1 REST routes: chat, streaming chat, image/vision chat, conversation clear/history, connection tests, settings, GDPR delete, admin data retention. Delegates to ScubaGPT_Chat. |
| ScubaGPT_Admin | WordPress admin menus, Settings API registration (scubagpt_* options), encrypted API key storage/decryption, UI for Claude, Pinecone, Tavily, general UI, system prompt, external APIs, AI engine flags, stats views. |

Interaction: ScubaGPT_Chatbot constructs Admin → API → Chat → REST in order. ScubaGPT_REST depends on ScubaGPT_Chat; ScubaGPT_Chat depends on ScubaGPT_API. Optional modules (dive log, embed, buddy, species log, map) are constructed from the main class after the core chain is stable.

### Knowledge System

| Class | Role |
|-------|------|
| ScubaGPT_Knowledgebase | Keyword-gated loading of markdown under scubagpt-chatbot/knowledgebase/: at most one destination and one topic file per query, with a character budget (KB_BUDGET, 60k chars) and transient caching. Injects into the system prompt. |
| ScubaGPT_Disambiguations | Diving terminology disambiguation snippets for the system prompt (char-limited slice passed into get_for_system_prompt()). |

Directory layout (plugin-shipped KB):

  • knowledgebase/destinations/ — Twelve regional destination guides (e.g. Caribbean, Indo-Pacific, Red Sea & Maldives, Mediterranean & Europe, Southeast Asia, Florida & Americas, Africa & Indian Ocean, Pacific Islands & Micronesia, East Asia, Central America, Costa Rica & Galápagos, Great Barrier Reef & Australia).
  • knowledgebase/topics/ — Nine+ topical guides (e.g. equipment, safety & medicine, marine life, wrecks, liveaboards, conservation, seasonal planner, navigation & technology, dive industry reference), plus templates/ for dive site/destination/operator content patterns.
  • knowledgebase/data/dive-sites.json — 14,642 enriched dive sites with coordinates, types, marine life, descriptions, and provenance tracking. Consumed by ScubaGPT_Map REST endpoints and map.js for GeoJSON rendering.
  • knowledgebase/data/dive-operators.json — 6,900+ operators with GPS coordinates, certification body/tier, nearby sites, and contact info. Consumed by ScubaGPT_Map for operator layer GeoJSON and ScubaGPT_Operators for recommendation logic.
  • knowledgebase/data/operators.json — Legacy structured operator data consumed by ScubaGPT_Operators for prompt-injection scoring and recommendations.

Note: The repository also contains a larger product-level knowledgebase/ (PDFs, embeddings) outside the plugin directory; the runtime file-based injection described above uses the plugin’s scubagpt-chatbot/knowledgebase/ tree.
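
The keyword gating and budget described for ScubaGPT_Knowledgebase can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the function names, the keyword index shape, the file names, and the first-match selection rule are all assumptions, and transient caching is left out.

```php
<?php
// Mirrors the 60k KB_BUDGET figure; measured in bytes here for portability
// (the mb_* functions would count characters instead).
const SKETCH_KB_BUDGET = 60000;

/**
 * Return the first file whose keyword list matches the query, or null.
 *
 * @param array<string, string[]> $index file path => trigger keywords (hypothetical)
 */
function sketch_pick_file(string $query, array $index): ?string
{
    $haystack = strtolower($query);
    foreach ($index as $file => $keywords) {
        foreach ($keywords as $keyword) {
            if (str_contains($haystack, strtolower($keyword))) {
                return $file;
            }
        }
    }
    return null;
}

/**
 * Concatenate at most one destination and one topic file, trimmed to the budget.
 *
 * @param array<string, string> $contents file path => markdown body
 */
function sketch_kb_context(
    string $query,
    array $destinations,
    array $topics,
    array $contents,
    int $budget = SKETCH_KB_BUDGET
): string {
    // array_filter drops the nulls returned when nothing matched.
    $picked = array_filter([
        sketch_pick_file($query, $destinations),
        sketch_pick_file($query, $topics),
    ]);

    $out = '';
    foreach ($picked as $file) {
        $remaining = $budget - strlen($out);
        if ($remaining <= 0) {
            break;
        }
        $out .= substr($contents[$file] ?? '', 0, $remaining);
    }
    return $out;
}

// Hypothetical index and contents, for illustration only.
$destinations = ['destinations/red-sea.md' => ['red sea', 'maldives']];
$topics       = ['topics/wrecks.md' => ['wreck']];
$contents     = [
    'destinations/red-sea.md' => 'Red Sea guide. ',
    'topics/wrecks.md'        => 'Wreck guide.',
];

$context = sketch_kb_context('Best wreck dives in the Red Sea?', $destinations, $topics, $contents);
```

A query that matches no keyword yields an empty string, so nothing is injected into the system prompt for that request.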


### Safety Layer

| Class | Role |
|-------|------|
| ScubaGPT_Safety | analyze_message(): regex-based detection of medical / fitness-to-dive questions and gas planning requests (deco, MOD, NDL, SAC, etc.); injects XML-style instructions and DAN referral text into the system prompt. analyze_dive_plan(): proactive cross-checks for dive-plan narratives (certification depth, conditions). Species identification paths can apply confidence framing where relevant. |
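
To make the regex-based detection concrete, here is a minimal sketch of the general shape such a check could take. The patterns, the XML wording, and the function name `sketch_analyze_message()` are illustrative assumptions; the plugin's real rules are not shown in this document.

```php
<?php
/**
 * Flag a message and return prompt instructions to append.
 * Patterns and instruction text are illustrative, not the plugin's rules.
 *
 * @return array{flags: string[], injection: string}
 */
function sketch_analyze_message(string $message): array
{
    $patterns = [
        'medical'      => '/\b(asthma|medication|surgery|pregnan\w*|fit(ness)? to dive)\b/i',
        'gas_planning' => '/\b(deco(mpression)?|MOD|NDL|SAC|gas plan\w*)\b/i',
    ];
    $instructions = [
        'medical'      => '<safety type="medical">Do not give personalized medical advice; refer the diver to a dive physician or DAN.</safety>',
        'gas_planning' => '<safety type="gas_planning">Do not compute numeric gas plans; explain the concepts instead.</safety>',
    ];

    $flags = [];
    foreach ($patterns as $flag => $regex) {
        if (preg_match($regex, $message) === 1) {
            $flags[] = $flag;
        }
    }

    // Only the instructions for matched categories reach the system prompt.
    $injection = implode("\n", array_map(fn($flag) => $instructions[$flag], $flags));

    return ['flags' => $flags, 'injection' => $injection];
}

$medical = sketch_analyze_message('Can I dive with asthma?');
$gas     = sketch_analyze_message('What is my NDL at 30 m?');
$neutral = sketch_analyze_message('Best reefs in Bonaire?');
```

Short acronyms such as MOD or SAC can produce false positives with patterns this simple, so a production rule set would likely need more context than a single word match.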

### Feature Modules

| Class | Role |
|-------|------|
| ScubaGPT_Tools | Defines Claude tools and handlers (marine conditions, equipment lookups, search, etc.) and exposes an ITI_Claude_Tools runner; uses ScubaGPT_External_APIs for live API calls when the tool-use path is active. |
| ScubaGPT_External_APIs | Weather, tides, dive-site discovery, and related integrations (cached, configurable via scubagpt_external_api_settings). Loaded only from init_optional_integrations() when AI engine settings are not disabled. |
| ScubaGPT_Map | [scubagpt_map] shortcode (default 1024×768px); enqueues Leaflet, Leaflet.markercluster, and map.js / map.css; public REST GeoJSON for dive sites (/dive-sites, /dive-sites/{id}) and dive operators (/dive-operators, /dive-operators/{id}). Dual-layer architecture with toggleable site/operator layers, search bar with filter overlay, detail modals, and custom SVG marker icons. Data loading cascade: DB table → JSON file → CSV fallback. |
| ScubaGPT_Trip_Planner | Multi-turn trip planner with transient-backed state (scubagpt_trip_plan_*). |
| ScubaGPT_Species_Log | Species sighting persistence, REST surface, install hook for {prefix}scubagpt_species_log. |
| ScubaGPT_Operators | Operator database and recommendation logic (pairs with knowledgebase/data/operators.json). |
| ScubaGPT_Dive_Log | Parsers for Subsurface / Shearwater-style logs; injects context for the model when the user uploads or references logs. |
| ScubaGPT_Embed | White-label embed for operators; can filter scubagpt_system_prompt with operator-specific XML blocks. |
| ScubaGPT_Buddy | Buddy matching with anonymous profiles; {prefix}scubagpt_buddy_profiles. |
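
The ScubaGPT_Map data loading cascade (DB table → JSON file → CSV fallback) is an ordered-fallback pattern. A minimal sketch follows, assuming that a source counts as unavailable when it throws or returns no rows; `sketch_load_cascade()` and the sample data are hypothetical.

```php
<?php
/**
 * Return the first non-empty result from an ordered list of loaders.
 *
 * @param array<string, callable> $loaders source name => loader returning rows
 * @return array{source: ?string, rows: array}
 */
function sketch_load_cascade(array $loaders): array
{
    foreach ($loaders as $source => $loader) {
        try {
            $rows = $loader();
        } catch (\Throwable $e) {
            continue; // treat a failing source as empty and fall through
        }
        if ($rows !== []) {
            return ['source' => $source, 'rows' => $rows];
        }
    }
    return ['source' => null, 'rows' => []];
}

// Hypothetical loaders: the DB table is empty, so the JSON source wins.
$json = '[{"id": 1, "name": "Example Reef", "lat": 27.5, "lng": 34.2}]';

$result = sketch_load_cascade([
    'db'   => fn() => [],
    'json' => fn() => json_decode($json, true, 512, JSON_THROW_ON_ERROR),
    'csv'  => fn() => [['id' => 2, 'name' => 'Never reached']],
]);
```

Keeping the loaders as callables means later sources are never evaluated once an earlier one returns data.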

### ITI Shared Library Components

The host WordPress install is expected to load the ITI Shared Library (see project CLAUDE.md). ScubaGPT directly references these classes when they exist:

| Component | Usage in ScubaGPT |
|-----------|-------------------|
| ITI_Claude_API | Tool-use and vision paths: configured with model, tokens, temperature, system prompt, API key from plugin settings. |
| ITI_Claude_Tools | Executes the multi-turn tool use loop with ScubaGPT_Tools definitions. |
| ITI_Vision_Handler | /chat/image endpoint: image + prompt analysis for marine life ID. |
| ITI_Workflow_Adapter | Optional n8n path in process_message(); if the workflow returns success, the plugin skips the local Claude pipeline for that request. |

The following are standard ITI shared components documented in CLAUDE.md / shared/CATALOG.md; use them when extending the plugin even if not every symbol is referenced in ScubaGPT PHP today:

| Component | Typical role |
|-----------|--------------|
| ITI_Map_Embed | Reusable Leaflet patterns elsewhere in ITI; this plugin implements ScubaGPT_Map with Leaflet directly. |
| ITI_Token_Budget_Manager | Canonical token budgeting for prompts; ScubaGPT enforces limits via KB budget, history truncation, and daily token transients. |
| ITI_Logger | File-based structured logging pattern for production diagnostics. |
| ITI_Cache_Manager | Transient/cache abstractions; ScubaGPT also uses WordPress transients directly (e.g. rate limits, KB cache keys). |
| ITI_Settings_Page | Declarative admin UI pattern; ScubaGPT admin currently uses the WordPress Settings API in ScubaGPT_Admin. |

### Frontend

| Asset | Role |
|-------|------|
| assets/js/chatbot.js | Chat widget: REST calls to scubagpt/v1, real-time streaming with live markdown rendering, processing indicator, image upload via FormData to /chat/image, voice input (Web Speech API where available), offline/online detection, session management with localStorage, URL sanitization, mobile full-screen layout, error categorization with retry logic. |
| assets/css/chatbot.css | Responsive layout, mobile full-viewport chat with safe area insets, streaming state classes (.scubagpt-processing-indicator, .scubagpt-processing-spinner, .streaming-content), voice input pulse animation, image upload/dragover styles, dark mode via prefers-color-scheme, prefers-reduced-motion support, focus-visible accessibility, iOS zoom prevention, print styles. |
| assets/js/map.js | Leaflet dual-layer map: loads GeoJSON from /dive-sites and /dive-operators REST endpoints via Promise.all, Leaflet.markercluster for both layers, custom SVG icons (icon-dive-site.svg, icon-dive-operator.svg), toggleable layer control, search bar with real-time filter overlay and clear button, detail modals with copy/close functionality, ARIA attributes and keyboard navigation (Escape to close, focus trapping), badge counters showing visible/total markers, fitBounds on load. |
| assets/css/map.css | Map cluster styles (.scubagpt-cluster-site, .scubagpt-cluster-operator), search control layout (.scubagpt-map-search), badge control, modal overlay with animation (.scubagpt-map-modal-overlay, .scubagpt-map-modal), modal table/description/nearby-list formatting, Leaflet layer control overrides. |
| assets/images/icon-dive-site.svg | Blue circle SVG (24×24) with diver wave motif, used for dive site map markers. |
| assets/images/icon-dive-operator.svg | Orange circle SVG (24×24) with shop/building motif, used for dive operator map markers. |

## Data Flow

End-to-end path for a normal chat message:

  1. Browser sends POST /wp-json/scubagpt/v1/chat (or /chat/stream) with an X-WP-Nonce header carrying a wp_rest nonce and a payload (message, optional session_id, optional conversation_history).
  2. ScubaGPT_REST → check_chat_permission() verifies the nonce and the anonymous/login policy → ScubaGPT_Chat::process_message() (or the streaming variant).
  3. ScubaGPT_Chat: rate limit → daily token budget → if ITI_Workflow_Adapter is available and returns success, return the n8n output (non-streaming path only).
  4. RAG retrieval: Pinecone context (if configured), Tavily web context (if configured), and, if tool use is not available, inline marine API context from ScubaGPT_External_APIs via pattern helpers.
  5. build_augmented_prompt(): base scubagpt_system_prompt → ScubaGPT_Disambiguations → optional language detection and localized term JSON → ScubaGPT_Knowledgebase file injection → ScubaGPT_Safety::analyze_message() (medical / gas-planning XML) → seasonal excerpt from topics/seasonal-dive-planner.md → ScubaGPT_Safety::analyze_dive_plan() → append the retrieved context from step 4.
  6. Claude: either call_claude_with_tools() (ITI_Claude_API + ITI_Claude_Tools + ScubaGPT_Tools) or ScubaGPT_API::call_claude() for a single completion.
  7. Response: text extracted, optional usage totals, conversation logging, query stats, daily token accounting → JSON to client (or streamed chunks).
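
The layering in step 5 amounts to joining prompt sections in a fixed order and skipping any layer that produced nothing for the current message. The sketch below shows that shape only; `sketch_build_augmented_prompt()`, the layer keys, and the sample text are hypothetical and do not reproduce the plugin's build_augmented_prompt().

```php
<?php
/**
 * Join prompt layers in the order given, skipping empty ones.
 *
 * @param array<string, string> $layers layer name => text, already in pipeline order
 */
function sketch_build_augmented_prompt(array $layers): string
{
    $parts = [];
    foreach ($layers as $text) {
        $text = trim($text);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    // A blank line between layers keeps the sections visually separate.
    return implode("\n\n", $parts);
}

// Placeholder layer contents; keys follow the order listed in step 5.
$prompt = sketch_build_augmented_prompt([
    'base'            => 'You are ScubaGPT, a diving assistant.',
    'disambiguations' => 'Term note: "MOD" means maximum operating depth.',
    'language'        => '',  // no localized terms for this message
    'knowledgebase'   => 'KB excerpt goes here.',
    'safety'          => '',  // no medical or gas-planning match
    'seasonal'        => '',
    'dive_plan'       => '',
    'retrieved'       => 'Retrieved context from step 4 goes here.',
]);
```

Passing the layers as an ordered array keeps the sequence explicit in one place, which makes it easier to reason about which instructions the model sees first.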

Vision flow: image posted to /chat/image → ITI_Vision_Handler + ITI_Claude_API → structured response to the client.


## Database Tables

All names use the WordPress table prefix {prefix} (e.g. wp_).

| Table | Purpose |
|-------|---------|
| {prefix}scubagpt_conversations | Chat sessions (session id, user, IP, timestamps). |
| {prefix}scubagpt_messages | Message rows per conversation (role, content, tokens, sources). |
| {prefix}scubagpt_query_stats | Aggregated query/type counts and token usage by day. |
| {prefix}scubagpt_species_log | Species sighting records (ScubaGPT_Species_Log). |
| {prefix}scubagpt_buddy_profiles | Anonymous buddy-matching profiles (ScubaGPT_Buddy). |
| {prefix}scubagpt_dive_sites | Dive site rows for the interactive map (ScubaGPT_Map) when populated. |

## Security Architecture

  • REST: Chat routes require a valid wp_rest nonce via the X-WP-Nonce header. Admin-only routes use current_user_can('manage_options') (and related checks on specific handlers).
  • Input: Parameters use sanitize_textarea_field, sanitize_text_field, length validation (e.g. 10k message cap), typed REST args.
  • Output: Admin notices and templates use esc_html, esc_url, and standard WordPress escaping patterns.
  • Secrets: API keys are stored encrypted in wp_options; they are encrypted on save and decrypted on read via ScubaGPT_Admin::decrypt_api_key().
  • Safety: Medical and gas-planning patterns drive non-executable XML/text injection into the system prompt so the model refuses personalized medical advice and numeric gas plans per policy.
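
For the Secrets item, the document does not say which cipher or key source the plugin uses. The sketch below shows one common approach to encrypting a key at rest, assuming AES-256-GCM through PHP's OpenSSL extension with a key derived from a site-level secret; the function names and the storage layout are hypothetical.

```php
<?php
// Assumed scheme: AES-256-GCM, key derived from a site secret.
// Stored layout (before base64): 12-byte IV | 16-byte tag | ciphertext.

function sketch_encrypt_api_key(string $plain, string $secret): string
{
    $key = hash('sha256', $secret, true); // 32 raw bytes
    $iv  = random_bytes(12);              // 96-bit nonce, the usual size for GCM
    $tag = '';

    $cipher = openssl_encrypt($plain, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($cipher === false) {
        throw new RuntimeException('Encryption failed');
    }
    return base64_encode($iv . $tag . $cipher);
}

function sketch_decrypt_api_key(string $stored, string $secret): ?string
{
    $raw = base64_decode($stored, true);
    if ($raw === false || strlen($raw) < 28) {
        return null; // not valid base64, or too short to hold IV and tag
    }

    $key   = hash('sha256', $secret, true);
    $plain = openssl_decrypt(
        substr($raw, 28),       // ciphertext
        'aes-256-gcm',
        $key,
        OPENSSL_RAW_DATA,
        substr($raw, 0, 12),    // IV
        substr($raw, 12, 16)    // authentication tag
    );

    // Decryption returns false when the tag does not verify (wrong key or tampering).
    return $plain === false ? null : $plain;
}

// Placeholder values, for illustration only.
$stored    = sketch_encrypt_api_key('sk-example-123', 'site-secret');
$roundtrip = sketch_decrypt_api_key($stored, 'site-secret');
$wrong_key = sketch_decrypt_api_key($stored, 'other-secret');
```

An authenticated mode such as GCM is used in this sketch so that a modified or mismatched stored value fails to decrypt instead of yielding garbage.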

## Technology Stack

| Layer | Technologies |
|-------|--------------|
| CMS | WordPress 6.0+ (plugin header), PHP 8.0+ |
| Data | MySQL / MariaDB |
| AI | Anthropic Claude (Messages API), optional Pinecone, Tavily |
| Embeddings | OpenAI / Voyage endpoints (via ScubaGPT_API configuration) |
| Maps | Leaflet.js (CDN), GeoJSON over REST |
| Browser | Web Speech API for voice input where supported |
| Automation (optional) | n8n via ITI_Workflow_Adapter |

Document version: Align with plugin SCUBAGPT_VERSION in scubagpt-chatbot/scubagpt-chatbot.php.
Last updated: April 2026