
Chapter 22: Safety & Guardrails

Last Updated: 2026-04

## 22.1 Overview

Safety guardrails protect users from harmful AI outputs, protect ITI from legal and reputational risk, and ensure that AI-generated code meets security standards. Four complementary guardrail systems govern all ITI development:

| System | Scope | Tool |
|--------|-------|------|
| Vibe Coding Safeguards | AI-assisted development practices | Vibe Coding Guardian agent + vibe-coding-guardrails skill |
| AI Safety Guardrails | AI output behavior in high-stakes products | safety-guardrails skill + product-level prompt constraints |
| Claims Integrity | Published content accuracy | claims-integrity-audit + claims-evidence-registry skills |
| Dependency Hygiene | Third-party dependency management | dependency-hygiene skill |

## 22.2 Vibe Coding Safeguards

“Vibe coding” refers to AI-assisted development in which the developer accepts AI-generated code without rigorous review. This accelerates development, but it also introduces risks that ITI groups into 15 known pitfall categories. The Vibe Coding Safeguards system requires every ITI product to address all 15.

### The 15 pitfall categories

| # | Pitfall | Safeguard |
|---|---------|-----------|
| 1 | Hardcoded secrets | API keys in Keychain/env only; .env in .gitignore |
| 2 | Unsanitized input | Platform sanitization functions on all user input |
| 3 | Unescaped output | Platform escape functions on all output |
| 4 | Missing nonce/CSRF | check_ajax_referer() on every WordPress AJAX handler |
| 5 | Eval on user data | Never use eval(), exec(), or equivalent on user input |
| 6 | Missing permission checks | current_user_can() / capability checks before privileged actions |
| 7 | Committed secrets | Pre-commit hook or .gitignore enforcement |
| 8 | Missing error logging | All errors logged; security events explicitly logged |
| 9 | Scope creep | Scope Owner agent review for any out-of-scope additions |
| 10 | Missing tests | QA agent writes tests for every new feature |
| 11 | Stale context | Context Keeper agent maintains CLAUDE.md accuracy |
| 12 | Missing artifacts | All 5 required artifacts present (see Chapter 24) |
| 13 | Dependency audit skipped | dependency-hygiene skill applied before any new dependency |
| 14 | No deployment procedure | DEPLOY.md exists and is tested before first deployment |
| 15 | Claims without evidence | Claims Ombudsman agent audits all marketing and documentation |
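
As an illustration of pitfalls 1 and 7, the check a pre-commit hook might run can be sketched in Python. This is a minimal sketch, not ITI's actual hook: the function name `find_hardcoded_secrets` and the two patterns are assumptions chosen for the example, and a real hook would rely on a vetted ruleset.

```python
import re

# Hypothetical patterns for illustration only; not an exhaustive ruleset.
SECRET_PATTERNS = [
    # A secret-like name assigned a quoted literal, e.g. API_KEY = "abc12345"
    re.compile(r"""(?i)(api[_-]?key|secret|password|token)\w*\s*[:=]\s*['"][^'"]{8,}['"]"""),
    # Strings shaped like an AWS access key ID
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that look like hardcoded secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

A line that reads the key from the environment, such as `api_key = os.environ["API_KEY"]`, is not flagged, because no quoted literal follows the assignment.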

### Running a Vibe Coding Guardrail audit

At the start of a new product kickoff and before any release:

“Run a vibe coding guardrail audit on [product name].”

The Vibe Coding Guardian agent reviews the product against all 15 pitfalls and produces a pass/fail report with remediation steps. Alternatively, apply the vibe-coding-guardrails skill directly for a structured risk scorecard and prioritized remediation plan.


## 22.3 AI Safety Guardrails for High-Stakes Products

Some ITI products operate in domains where incorrect AI outputs can cause real harm: medical, legal, financial, or safety-related advice. These products require additional guardrails in their system prompts.

### Risk tiers

| Tier | Domain | Required Guardrails |
|------|--------|---------------------|
| 1 (Low) | Travel, gardening, general advice | Standard output quality controls |
| 2 (Medium) | Career, financial planning | Uncertainty acknowledgment; professional referral paths |
| 3 (High) | Medical, legal, safety | Mandatory disclaimers; refusal rules; emergency referrals |

### Required guardrails for Tier 2 (Medium) products


    ## Limitations
    - I provide information and analysis, not professional [financial/legal/medical] advice.
    - Consult a qualified professional before making major [financial/legal/health] decisions.
    - My knowledge has a training cutoff. For current regulations or market conditions, verify with an up-to-date source.

### Required guardrails for Tier 3 (High) products


    ## Non-Negotiable Safety Rules
    1. I will not diagnose medical conditions or prescribe treatments.
    2. For any medical emergency, immediately respond: "This is a medical emergency.
       Call 911 (US) or your local emergency number immediately."
    3. I will not provide legal advice that would create an attorney-client relationship.
    4. I will acknowledge uncertainty: if I am not confident in information,
       I will say so explicitly rather than presenting a guess as fact.
    5. I will always provide a referral path to a qualified professional.
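
One way to attach the tier-specific block to a product's system prompt is sketched below. This is a hedged illustration, not the safety-guardrails skill itself: `TIER_GUARDRAILS` and `build_system_prompt` are hypothetical names, the guardrail text is abbreviated, and whether High-tier products should also carry the Medium-tier block is not stated in this chapter, so the sketch does not assume it.

```python
# Abbreviated guardrail text for illustration; a real product would paste
# the full required block for its tier.
TIER_GUARDRAILS = {
    "low": "",
    "medium": (
        "## Limitations\n"
        "- I provide information and analysis, not professional advice."
    ),
    "high": (
        "## Non-Negotiable Safety Rules\n"
        "1. I will not diagnose medical conditions or prescribe treatments."
    ),
}


def build_system_prompt(base_prompt: str, tier: str) -> str:
    """Append the guardrail block for the given risk tier to a base prompt."""
    guardrails = TIER_GUARDRAILS[tier]  # raises KeyError for an unknown tier
    if not guardrails:
        return base_prompt
    return f"{base_prompt.rstrip()}\n\n{guardrails}"
```

Failing loudly on an unknown tier is a deliberate choice in this sketch: a product with a mistyped tier should not silently ship without guardrails.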

## 22.4 The Prompt Auditor

Before building any new AI feature, apply the Prompt Auditor skill. It checks system prompts for common failure patterns identified in ITI’s RAID (Risks, Assumptions, Issues, Dependencies) history.

“Apply the prompt-auditor skill to this system prompt: [paste prompt]”

The auditor flags:

  • Scope ambiguity — unclear what the model should/shouldn’t do
  • Missing verification steps — no way to validate outputs
  • Underspecified user scenarios — edge cases not addressed
  • Safety gaps — missing constraints for the product’s risk tier
  • Format conflicts — contradictory output format instructions
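
The "safety gaps" check in particular lends itself to a mechanical first pass. The sketch below is an assumption-laden illustration and does not represent how the prompt-auditor skill works internally: `REQUIRED_SECTIONS` and `find_safety_gaps` are hypothetical names, and the check only looks for the section headings from 22.3, not for the quality of their content.

```python
# Hypothetical mapping from risk tier to the headings a prompt must contain.
REQUIRED_SECTIONS = {
    "low": [],
    "medium": ["## Limitations"],
    "high": ["## Non-Negotiable Safety Rules"],
}


def find_safety_gaps(system_prompt: str, tier: str) -> list[str]:
    """Return the required section headings missing from a system prompt."""
    prompt_lower = system_prompt.lower()
    return [
        heading
        for heading in REQUIRED_SECTIONS[tier]
        if heading.lower() not in prompt_lower
    ]
```

An empty result only means the headings are present; the rules underneath them still need review.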

## 22.5 Claims Integrity

All marketing content, portfolio documents, and product descriptions must pass a Claims Integrity audit before publication.

The Claims Ombudsman agent audits:

  • Factual accuracy of all stated claims
  • Evidence supporting performance claims (“increases conversions by 40%”)
  • Currency of cited statistics and sources
  • Legal exposure from unsubstantiated superlatives

Run the audit:

“Run a claims integrity audit on [document].”

### Claims evidence registry

All claims used in marketing are tracked in a Claims Evidence Registry with:

  • The claim text
  • The evidence supporting it (source, date, methodology)
  • Staleness threshold (date after which evidence must be refreshed)

The registry lives in ITI/operations/ and is maintained by the Claims Evidence Curator agent. Two skills support this process: claims-integrity-audit for running audits and claims-evidence-registry for building and maintaining registries.
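
The registry fields listed above could be modeled roughly as follows. This is a sketch under stated assumptions: the chapter does not specify the registry's file format or schema, so `ClaimEvidence`, its field names, and `stale_claims` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ClaimEvidence:
    """One registry entry: a claim, its evidence, and its staleness threshold."""
    claim_text: str
    source: str
    evidence_date: date
    methodology: str
    stale_after: date  # date after which the evidence must be refreshed

    def is_stale(self, today: date) -> bool:
        # Evidence is still current on the threshold date itself.
        return today > self.stale_after


def stale_claims(registry: list[ClaimEvidence], today: date) -> list[ClaimEvidence]:
    """Return the entries whose evidence needs refreshing as of `today`."""
    return [entry for entry in registry if entry.is_stale(today)]
```

Passing `today` explicitly, rather than calling `date.today()` inside the method, keeps the staleness check deterministic and easy to test.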


## 22.6 Security Standards for All Code

These rules apply to all ITI code, regardless of whether it was written by a human or an AI agent:

| Rule | Applies To |
|------|------------|
| Never hardcode API keys or passwords | All platforms |
| Sanitize all user input before processing | All platforms |
| Escape all output before rendering | All platforms |
| Never commit .env or credential files to Git | All platforms |
| Never use eval() on user-controlled data | PHP, JS, Python |
| Log all security events (failed auth, invalid tokens) | All platforms |
| Use prepared statements for all database queries | PHP, Python, Rust |
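
Two of these rules, prepared statements and output escaping, can be illustrated with the Python standard library. The sketch assumes a hypothetical `users` table and hypothetical helper names (`find_user`, `render_greeting`); it shows the pattern, not ITI's actual data layer.

```python
import html
import sqlite3


def find_user(conn: sqlite3.Connection, username: str):
    """Look up a user with a parameterized query (hypothetical `users` table)."""
    # The ? placeholder passes the value separately from the SQL text,
    # so user input is never interpreted as SQL.
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?",
        (username,),
    )
    return cursor.fetchone()


def render_greeting(username: str) -> str:
    """Escape user-supplied text before placing it in HTML output."""
    return f"<p>Hello, {html.escape(username)}!</p>"
```

With this pattern, an input such as `alice' OR '1'='1` is treated as a literal username and matches no row.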

Full security standards are in Chapter 26 — Security Standards.

