Safety Guardrails

name: safety-guardrails

description: Designs and enforces non-negotiable safety constraints for AI products operating in high-stakes domains. Covers refusal rules, mandatory referral paths, uncertainty acknowledgment, data-currency disclosure, and confidence calibration. Use when building prompts or agents in medical, legal, financial, safety, or any domain where a wrong answer causes real harm.

Instructions

Apply this framework when defining or auditing safety constraints for any AI product in a high-stakes domain (diving, medical, legal, financial, security, engineering, etc.).

1. Classify the Stakes

Determine which category of risk applies:

Category | Examples | Default Posture
--- | --- | ---
Physical harm | Medical clearance, gas/decompression calculations, drug interactions | Hard refuse + professional referral
Legal harm | Specific legal advice, jurisdiction-specific compliance | Soft refuse + disclaim + refer
Financial harm | Investment picks, tax calculation, insurance determination | Soft refuse + general guidance only
Reputational harm | Species ID with low confidence, unverified quotes | State uncertainty + confidence level
Data currency | Site conditions, regulations, prices, availability | Timestamp disclosure mandatory
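
The classification above can be kept as data so that prompts and tests share one source of truth. The following is a minimal sketch, not a prescribed implementation: the category keys (`physical`, `legal`, and so on) and the `postures_for` helper are hypothetical names, and returning postures in table order is an assumption about how overlapping categories should be handled.

```python
from enum import Enum


class Posture(Enum):
    HARD_REFUSE_AND_REFER = "hard refuse + professional referral"
    SOFT_REFUSE_DISCLAIM_REFER = "soft refuse + disclaim + refer"
    SOFT_REFUSE_GENERAL_ONLY = "soft refuse + general guidance only"
    STATE_UNCERTAINTY = "state uncertainty + confidence level"
    TIMESTAMP_DISCLOSURE = "timestamp disclosure mandatory"


# Hypothetical category keys, listed in the same order as the table rows.
DEFAULT_POSTURE = {
    "physical": Posture.HARD_REFUSE_AND_REFER,
    "legal": Posture.SOFT_REFUSE_DISCLAIM_REFER,
    "financial": Posture.SOFT_REFUSE_GENERAL_ONLY,
    "reputational": Posture.STATE_UNCERTAINTY,
    "data_currency": Posture.TIMESTAMP_DISCLOSURE,
}


def postures_for(categories):
    """Return every posture that applies, in table order (assumed strictest first)."""
    unknown = set(categories) - set(DEFAULT_POSTURE)
    if unknown:
        raise ValueError(f"unknown risk categories: {sorted(unknown)}")
    return [posture for name, posture in DEFAULT_POSTURE.items() if name in categories]
```

A question about current dive-site conditions that also touches medical fitness would pass `{"physical", "data_currency"}` and get both the hard-refuse posture and the timestamp requirement back.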

2. Define Non-Negotiable Refusals

For physical-harm categories, establish hard rules the model cannot override regardless of user framing:

Rule template:

For ANY question about [topic], provide general information only and ALWAYS direct the user to [named authority or credential]. Never [specific forbidden action].

Common hard rules by domain:

  • Health/Medical: Never clear anyone for an activity. Direct to a licensed physician or domain-specific body (e.g., DAN for dive medicine).
  • Calculations with life-safety implications: Never provide [gas planning / dosing / load calculations]. These require trained human judgment and validated tools.
  • Legal advice: Never provide jurisdiction-specific rulings as definitive guidance. Refer to a licensed attorney in the applicable jurisdiction.
  • Financial advice: Never recommend specific securities, tax positions, or insurance products. Refer to a licensed advisor.
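
One way to keep hard rules consistent is to render them from the rule template rather than writing each by hand. This sketch assumes a hypothetical helper, `render_hard_rule`, that refuses to produce a rule with any blank field; the example values in the usage note are illustrative only.

```python
RULE_TEMPLATE = (
    "For ANY question about {topic}, provide general information only and "
    "ALWAYS direct the user to {authority}. Never {forbidden_action}."
)


def render_hard_rule(topic, authority, forbidden_action):
    """Fill the rule template, rejecting blank fields so no rule ships half-specified."""
    fields = {
        "topic": topic,
        "authority": authority,
        "forbidden_action": forbidden_action,
    }
    cleaned = {}
    for name, value in fields.items():
        # Trim whitespace and a trailing period so the template's own punctuation is not doubled.
        text = (value or "").strip().rstrip(".")
        if not text:
            raise ValueError(f"{name} must be filled in; a hard rule cannot have blanks")
        cleaned[name] = text
    return RULE_TEMPLATE.format(**cleaned)
```

For example, `render_hard_rule("fitness to dive", "a licensed physician or DAN", "clear anyone to dive")` yields a single sentence that can be pasted into a system prompt.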

3. Build the Referral Path

For every hard refusal, pair it with a specific referral so the user is never left stranded:

Refusal Trigger | Referral Target
--- | ---
Medical/fitness questions | [Domain body] (e.g., DAN, AMA, specialist)
Legal questions | Licensed attorney in [jurisdiction]
Financial questions | CFP, CPA, or RIA as appropriate
Emergency situations | Emergency services first, then domain expert
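
The referral pairing can be enforced with a lookup that fails loudly when a refusal trigger has no referral defined. This is a sketch under assumptions: the trigger keys and the `referral_for` function are hypothetical, and the bracketed placeholders are kept from the table for the product team to fill in.

```python
# Hypothetical trigger keys; bracketed placeholders are left for the product team to fill.
REFERRALS = {
    "medical": "[Domain body] (e.g., DAN, AMA, specialist)",
    "legal": "a licensed attorney in [jurisdiction]",
    "financial": "a CFP, CPA, or RIA as appropriate",
}
EMERGENCY_REFERRAL = "emergency services first, then a domain expert"


def referral_for(trigger, is_emergency=False):
    """Return the referral target for a refusal trigger; emergencies always take priority."""
    if is_emergency:
        return EMERGENCY_REFERRAL
    try:
        return REFERRALS[trigger]
    except KeyError:
        # A refusal without a referral leaves the user stranded, so treat it as an error.
        raise LookupError(f"no referral defined for trigger {trigger!r}") from None
```

Raising on a missing entry, rather than falling back to a generic "consult a professional", keeps gaps visible during development.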

4. Calibrate Confidence and Uncertainty

For domains where the model may be operating near the edge of reliable knowledge:

Confidence levels:

  • High: Multiple corroborating authoritative sources; core facts unlikely to change.
  • Medium: Single source or subject to local variation; state the basis.
  • Low / Unknown: Explicitly say “I cannot verify this” or “I’m not confident.” Never fabricate.

Uncertainty language:

  • “Based on [source] as of [date/period] — verify current conditions before relying on this.”
  • “I cannot confirm this with high confidence. [Authority] would be the right resource.”
  • “This varies significantly by [jurisdiction/location/individual] — treat this as general guidance only.”
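
The confidence levels and the uncertainty language can be tied together in code. The sketch below is illustrative: `classify_confidence` and `hedge` are hypothetical names, and treating "several sources but likely to change" as medium confidence is an assumption, since the levels above do not cover that case explicitly.

```python
def classify_confidence(authoritative_sources, varies_locally=False, likely_to_change=False):
    """Map the criteria above to 'high', 'medium', or 'low'."""
    if authoritative_sources <= 0:
        return "low"
    if authoritative_sources >= 2 and not varies_locally and not likely_to_change:
        return "high"
    # Single source, local variation, or changeable facts (the last is an assumption).
    return "medium"


def hedge(claim, level, basis="", authority="A qualified authority"):
    """Wrap a claim in uncertainty language that matches its confidence level."""
    if level == "high":
        return claim
    if level == "medium":
        if not basis:
            raise ValueError("medium-confidence claims must state their basis")
        return f"Based on {basis}: {claim} Treat this as general guidance only."
    # Low or unknown: do not repeat the claim at all, so nothing unverified is asserted.
    return f"I cannot confirm this with high confidence. {authority} would be the right resource."
```

Dropping the claim entirely at low confidence is a deliberately conservative choice; a product might instead show it with a stronger warning.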

5. Handle Data Currency

For time-sensitive information (regulations, site conditions, prices, hours, weather):

  1. State the likely age of the information: “as of my training data” or “as of [known publication date].”
  2. Identify what can change: “Regulations, permits, and site conditions change frequently.”
  3. Direct to a live source: “Check [specific authority URL or body] for current information.”
  4. Never present time-sensitive data as current fact.
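
The four steps above can be sketched as a small formatter that never emits a time-sensitive statement without its age and a live source. The function name `with_currency_notice` and the example statement in the usage note are hypothetical.

```python
from datetime import date


def with_currency_notice(statement, live_source, as_of=None):
    """Attach the information's age and a live source to a time-sensitive statement."""
    if not live_source or not live_source.strip():
        raise ValueError("a live source is required for time-sensitive information")
    # Fall back to the training-data wording when no publication date is known.
    age = f"as of {as_of.isoformat()}" if as_of else "as of my training data"
    return (
        f"{statement.rstrip('.')} ({age}). "
        f"This may have changed; check {live_source} for current information."
    )
```

For instance, `with_currency_notice("Night dives at the site require a permit.", "the site's managing authority", date(2023, 5, 1))` produces a sentence that carries its own date and pointer to a current source.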

6. Audit Checklist

Before shipping a prompt or agent in a high-stakes domain, verify:

  • [ ] Hard refusals defined for all physical-harm topics
  • [ ] Every refusal paired with a specific referral path
  • [ ] Confidence levels stated for species IDs, technical calculations, condition reports
  • [ ] Timestamps or uncertainty language on all time-sensitive data
  • [ ] Emergency escalation path included (“In an emergency, call [number] first”)
  • [ ] No calculations presented that require trained human validation
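
The checklist can also be tracked as a structured record so a release script can block shipping while any item is open. This is a minimal sketch; the `GuardrailAudit` class and its field names are hypothetical, and each field still has to be confirmed by a human reviewer.

```python
from dataclasses import dataclass, fields


@dataclass
class GuardrailAudit:
    # One flag per checklist item; everything starts unchecked.
    hard_refusals_defined: bool = False
    refusals_have_referrals: bool = False
    confidence_levels_stated: bool = False
    time_sensitive_data_flagged: bool = False
    emergency_path_included: bool = False
    no_unvalidated_calculations: bool = False

    def failures(self):
        """Names of the checklist items that are still unchecked."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def ready_to_ship(self):
        return not self.failures()
```

Calling `GuardrailAudit(hard_refusals_defined=True).failures()` lists the five remaining items by name.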

Standards

  • Err toward caution: When in doubt about whether a question crosses a line, apply the soft-refuse pattern (general info + referral) rather than a hard refusal. Never leave the user with nothing.
  • Name the authority: Vague referrals (“consult a professional”) are less useful than named ones (“contact DAN at DAN.org or call 1-800-446-2671”).
  • State what you CAN do: Every refusal should be paired with what the model can helpfully provide within its lane.
  • Layer, don’t just block: Add guardrails at the instruction level (the prompt) AND at the response-generation level (calibration language). Defense in depth.
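
As one illustration of layering, a post-generation check can sit behind the prompt-level rules and reject outputs that slip through. The sketch below is an assumption-heavy example: the phrase patterns are hypothetical and far from exhaustive, and the plain substring test for the referral is crude (a short name like "DAN" would also match inside "dangerous").

```python
import re

# Hypothetical patterns; a real deployment needs a reviewed, domain-specific list.
FORBIDDEN_PATTERNS = [
    re.compile(r"\byou(?:'ll| will| should)? be fine\b", re.IGNORECASE),
    re.compile(r"\byou are cleared\b", re.IGNORECASE),
]


def passes_output_check(response, required_referral):
    """Second layer: reject reassurance phrasing and require the named referral."""
    if any(pattern.search(response) for pattern in FORBIDDEN_PATTERNS):
        return False
    # Crude case-insensitive substring test for the named authority.
    return required_referral.lower() in response.lower()
```

A failed check would typically trigger regeneration or a fallback refusal rather than showing the response to the user.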

Examples

Example: Dive product (scuba-gpt pattern)

Input: “I take blood thinners — can I dive?”

Correct response structure:

  1. Acknowledge the valid concern.
  2. Provide general information: “Blood thinners can affect dive safety due to [brief factual reason].”
  3. Hard-refuse the clearance decision: “I can’t clear you to dive — that requires a dive physician.”
  4. Named referral: “Contact DAN (dan.org) or your GP to get a dive medical clearance form completed.”

Incorrect: “Blood thinners generally don’t prevent diving, you should be fine for recreational depths.”


Example: Financial advisory product

Input: “Should I put my entire IRA into [ticker]?”

Correct response structure:

  1. Provide factual context about concentration risk (general guidance).
  2. Soft-refuse the specific recommendation: “I can’t recommend specific securities or allocation percentages.”
  3. Named referral: “A Registered Investment Advisor (RIA) or CFP can review your full financial picture and give personalized advice.”

Example: Legal product

Input: “Can I get out of this non-compete?”

Correct response structure:

  1. Explain non-compete enforceability factors in general terms.
  2. Disclaim: “Enforceability varies significantly by state, contract language, and circumstances.”
  3. Named referral: “An employment attorney in [state] would review your specific agreement and advise you.”
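
The three examples share one shape: optional acknowledgment, general information, a refusal or disclaimer, and a named referral. A minimal sketch of that shape follows; the `GuardedResponse` class and its field names are hypothetical, and in practice the parts would come from the model rather than be hard-coded.

```python
from dataclasses import dataclass


@dataclass
class GuardedResponse:
    general_info: str
    boundary: str  # the refusal or disclaimer sentence
    referral: str
    acknowledgment: str = ""  # optional opening line

    def render(self):
        """Join the parts in order, refusing to render if a required part is missing."""
        for name in ("general_info", "boundary", "referral"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required in a guarded response")
        parts = [self.acknowledgment, self.general_info, self.boundary, self.referral]
        return " ".join(part.strip() for part in parts if part.strip())
```

Making the referral a required field means a response that refuses without pointing anywhere cannot be rendered at all.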