
AI Coworker Trust Protocol

name: ai-coworker-trust-protocol

description: Establish and enforce structural trust patterns when using AI (Claude, Cursor, Codex, etc.) as a development coworker. Trust comes from verification infrastructure, not from the AI’s self-report. Covers code review discipline, credential isolation, dependency verification, prompt-injection surface reduction, and behavioral consistency checks. Use when onboarding AI into a project, setting up a new repo, or auditing an existing AI-assisted project for trust hygiene.

Core Principle

An AI coworker’s assurance that it is safe, honest, or aligned is weak evidence. A deceptively aligned system, a trained bias the AI is unaware of, and an honest-but-fallible AI all produce the same surface text. Trust must come from structure — verification infrastructure you control — not from the AI’s testimony about itself.

Treat AI as: a fast, knowledgeable, occasionally wrong contractor with no security clearance and no memory across sessions. Valuable. Not trusted on its own word.

Instructions

Apply this protocol when onboarding an AI coworker to a new project, auditing an existing project, or periodically re-verifying trust hygiene on a live one.

1. Establish the Verification Layer

The AI should never be the last line of defense for its own output. Every category of output needs an independent verification path:

| AI Output | Independent Verification |
|---|---|
| Code changes | Human diff review; CI-based SAST (Semgrep / CodeQL); lint + typecheck; test suite |
| Dependencies added | Lock file committed; dependency audit (npm audit, pip-audit, cargo audit); package existence verified against canonical registry; supply-chain scanner (Socket / Snyk) |
| Shell commands | Commands surfaced, not hidden; dangerous categories (rm, curl piped to sh, chmod 777, network egress) flagged for approval |
| File writes | Diff inspected before accept; protected files (see CLAUDE.md) flagged |
| Claims about behavior (“this is safe”, “this doesn’t do X”) | Verified from the artifact, not the self-report |
| Architectural or design decisions | Questioned in a fresh session; compared to requirements doc |

Gate rule: If a verification path does not exist for a class of output, that class is not ready for AI generation.
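
The gate rule can be sketched as a small registry lookup. This is a minimal illustration, not a prescribed implementation: the class name `VerificationRegistry` and the output-class labels are hypothetical, chosen to mirror the table above.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationRegistry:
    """Maps each class of AI output to the independent checks that cover it."""

    paths: dict[str, list[str]] = field(default_factory=dict)

    def register(self, output_class: str, check: str) -> None:
        self.paths.setdefault(output_class, []).append(check)

    def is_ready(self, output_class: str) -> bool:
        # Gate rule: a class with no verification path is not ready for AI generation.
        return bool(self.paths.get(output_class))


# Hypothetical labels; a real project would list its own output classes.
registry = VerificationRegistry()
registry.register("code_changes", "human diff review")
registry.register("code_changes", "CI SAST")
registry.register("dependencies", "lock file + dependency audit")
```

Here `registry.is_ready("code_changes")` is true, while `registry.is_ready("shell_commands")` is false until someone registers a check for that class.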

2. Credential and Privilege Isolation

The AI must not have privileges that, if misused (deliberately or accidentally), cause unrecoverable harm.

  • No production secrets. AI-assisted sessions get dev/staging credentials only. Production credentials are removed from any file the AI can read.
  • Least-privilege tokens. API tokens the AI uses for tool calls are scoped to the minimum surface (read-only, specific repos, rate-limited).
  • No unrestricted shell on critical systems. Run AI-assisted development in a sandbox. Destructive commands require approval.
  • Separate environments for AI-assisted work and sensitive operations (client isolation, PII handling, payment flows).
  • Revocable access. Any credential exposed during an AI session is assumed compromised; the rotation procedure is documented and tested.
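
One way to check that files the AI can read carry no production secrets is a pattern scan before the session starts. The sketch below is illustrative only: the pattern set and the name `find_secret_risks` are assumptions, the `PROD` naming rule will produce false positives, and a dedicated secret scanner covers far more cases.

```python
import re

# Illustrative patterns only; not an exhaustive secret-detection rule set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Assumption: production variables carry "PROD" somewhere in their name.
    "prod_named_variable": re.compile(
        r"^\s*(?:export\s+)?\w*PROD\w*\s*=\s*\S+", re.IGNORECASE | re.MULTILINE
    ),
}


def find_secret_risks(text: str) -> list[str]:
    """Return the names of the patterns that match anywhere in the text."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )
```

A non-empty result for any file in the AI's readable scope would be a reason to remove the value and rotate the credential before the session begins.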

3. Dependency and Supply-Chain Hygiene

AI is a known vector for dependency risk: both package-name hallucination, sometimes called slopsquatting (the AI invents a plausible package name that attackers then register), and over-installation (the AI adds libraries to solve problems better solved with existing tooling).

  • Verify every new dependency against the canonical registry before install. Invoke the dependency-hygiene skill.
  • Pin versions. Lock files committed. No floating major versions in production.
  • SBOM generation in CI for any deployable artifact.
  • Periodic audit (at least every 30 days): npm audit / pip-audit / cargo audit results are clean, or exceptions are documented.
  • Flag unusual sources. If the AI suggests a package not on the canonical registry, treat it as a red flag until verified.
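
Two of the checks above (pinned versions and suspicious package names) can be approximated offline. The following is a rough sketch, assuming pip-style requirement lines and a hypothetical team allowlist `VERIFIED_PACKAGES`; it does not replace verification against the canonical registry.

```python
import difflib
import re

# Hypothetical allowlist of packages the team has already verified.
VERIFIED_PACKAGES = {"requests", "numpy", "pandas", "flask"}

# Matches an exact pin such as "requests==2.31.0".
PINNED = re.compile(r"^([A-Za-z0-9._-]+)==[A-Za-z0-9.*+!_-]+$")


def review_requirements(lines: list[str]) -> list[str]:
    """Flag unpinned requirements and names that are unverified or near-misses."""
    findings = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = PINNED.match(line)
        if match:
            name = match.group(1)
        else:
            findings.append(f"unpinned: {line}")
            name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
        name = name.lower()
        if name in VERIFIED_PACKAGES:
            continue
        # A name very close to a verified one is treated as a possible typosquat.
        near = difflib.get_close_matches(name, sorted(VERIFIED_PACKAGES), n=1, cutoff=0.8)
        if near:
            findings.append(f"possible typosquat: {name} (close to {near[0]})")
        else:
            findings.append(f"unverified: {name}")
    return findings
```

The 0.8 similarity cutoff is an arbitrary starting point. Any finding is a prompt for a human to check the registry, not a verdict.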

4. Prompt Injection Surface Reduction

The AI can be manipulated by content in files, web pages, tool outputs, or previous conversation. This is an active attack surface, not a theoretical risk.

  • Treat external content as untrusted. Web fetches, third-party docs, scraped data, and files contributed by external actors can carry instructions targeted at the AI.
  • Scope tool permissions tightly. The AI should not have tools that let it act on instructions embedded in external content (e.g., sending email, modifying production systems) without a human confirmation step.
  • Quarantine suspicious outputs. If the AI begins behaving unexpectedly after reading external content, back out of the session and start fresh.
  • Log tool use. Every action the AI takes should be visible in a reviewable log.
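
The confirmation step and the tool log can live in one wrapper that every tool call passes through. This is a sketch under stated assumptions: `ToolGate`, the tool names in `SENSITIVE_TOOLS`, and the callback signature are hypothetical and not tied to any specific agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tool names; which tools count as sensitive is a project decision.
SENSITIVE_TOOLS = {"send_email", "deploy_production"}


@dataclass
class ToolGate:
    confirm: Callable[[str, dict], bool]  # human confirmation callback
    log: list[dict] = field(default_factory=list)  # reviewable record of every request

    def call(self, tool: str, args: dict, run: Callable[[dict], object]) -> object:
        # Sensitive tools run only if the human callback approves them.
        approved = tool not in SENSITIVE_TOOLS or self.confirm(tool, args)
        # Log the request whether or not it was approved.
        self.log.append({"tool": tool, "args": args, "approved": approved})
        if not approved:
            raise PermissionError(f"{tool} requires human confirmation")
        return run(args)
```

With `ToolGate(confirm=lambda tool, args: False)`, a `read_file` call runs normally, while a `send_email` call raises `PermissionError`; both appear in `gate.log`. Logging denied requests as well as approved ones keeps attempted actions visible after a suspected injection.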

5. Behavioral Consistency Checks

AI behavior can shift across sessions, model versions, and prompt variations. Consistency is a verifiable signal.

  • Red-team probes: Periodically ask the AI to produce output it should refuse. Large shifts in behavior over time are a signal worth investigating.
  • Cross-session reproduction: For security-critical logic, ask the same question in a fresh session and compare the answers.
  • Fresh-eyes review: Have a second AI session (or a human) review code the first session produced. Assumptions made early in a session tend to go unquestioned later in that same session.
  • Watch for sycophancy. If answers feel too agreeable when you’re wrong, that’s a signal that the AI is tuned toward pleasing over accuracy. Press and see if the answer changes.
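
Cross-session reproduction needs some way to decide when two answers differ enough to deserve a closer look. A crude option, sketched below, is a text-similarity threshold. The function name `answers_diverge` and the 0.6 threshold are assumptions. Textual similarity is only a proxy: two answers can be worded alike and still disagree on a critical detail, so this narrows the human's attention without replacing the comparison.

```python
import difflib


def answers_diverge(first: str, second: str, threshold: float = 0.6) -> bool:
    """Return True when two session answers differ enough to warrant human review."""

    def normalize(text: str) -> str:
        # Ignore case and whitespace differences before comparing.
        return " ".join(text.lower().split())

    ratio = difflib.SequenceMatcher(None, normalize(first), normalize(second)).ratio()
    return ratio < threshold
```

For example, `answers_diverge("yes", "no, this input must be validated before use")` flags the pair for review, while two answers that differ only in spacing or capitalization do not.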

6. Human Ownership of Artifacts

Every piece of AI-generated output must have a human who takes ownership of it.

  • Diff review before merge. Every line merged is read by a human who can defend it.
  • Commit messages reflect human decisions, not “Claude did this.” The author owns it.
  • Code ownership assigned. The human who merged the AI’s code is the owner on record.
  • Post-merge scan. CI runs SAST, dependency audit, and tests on every AI-generated merge.

7. Documented Rollback

Nothing the AI touches goes out without a rollback path.

  • Rollback procedure in DEPLOY.md — trigger conditions, steps, verification, communication.
  • Atomic commits so any AI-generated change can be reverted in isolation.
  • Backup before destructive operations (migrations, data transforms, config changes).
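
For file-level changes, the backup-first rule can be sketched as a wrapper that copies the file aside, applies the change, and restores the copy if the change fails. The helper name `transform_with_backup` and the `.bak` suffix are hypothetical. Database migrations and similar operations need their own backup mechanism.

```python
import shutil
from pathlib import Path
from typing import Callable


def transform_with_backup(path: Path, transform: Callable[[str], str]) -> Path:
    """Copy the file aside, apply the transform, and restore the copy on failure."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # backup exists before anything is modified
    try:
        path.write_text(transform(path.read_text()))
    except Exception:
        shutil.copy2(backup, path)  # roll back to the pre-change contents
        raise
    return backup
```

The function returns the backup path so the caller can keep it until the change has been verified.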

Trust Hygiene Audit

Before beginning AI-assisted work on a project, verify each control:

  • [ ] Code review gate is active (no direct-to-main merges)
  • [ ] CI runs SAST on every PR (Semgrep / CodeQL or equivalent)
  • [ ] Dependency audit is scheduled and current (run within the last 30 days)
  • [ ] Lock files committed; no unverified packages
  • [ ] .gitignore excludes secrets; no hardcoded credentials in repo
  • [ ] AI session has dev/staging credentials only
  • [ ] Protected files are listed in CLAUDE.md and respected
  • [ ] Rollback procedure documented in DEPLOY.md
  • [ ] At least one alternative AI tool or manual workflow is documented (no single point of dependency)
  • [ ] Commit history shows human authorship even for AI-assisted changes

Scoring: Any Missing on checklist items 1, 2, 4, 5, 6, or 8 (counting from the top) = Block. AI-assisted work should not begin until resolved.
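
The scoring rule can be expressed as a small function over the set of missing checklist items, numbered from the top of the checklist. Note the assumption: the protocol defines only the Block condition. Mapping "a non-critical item is missing" to Watch and "nothing is missing" to Trusted is this sketch's own guess at the other two posture values, not something the protocol states.

```python
# Checklist positions (1-based, counting from the top) that block work when missing.
CRITICAL_ITEMS = {1, 2, 4, 5, 6, 8}


def overall_posture(missing: set[int]) -> str:
    """Map the set of missing checklist items to an overall posture."""
    if missing & CRITICAL_ITEMS:
        return "Block"
    # Assumption (not from the protocol): remaining gaps downgrade to Watch.
    return "Watch" if missing else "Trusted"
```

For instance, a project missing only the dependency-audit schedule (item 3) would score Watch under this assumption, while one missing SAST in CI (item 2) is blocked.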

Output Format


## AI Coworker Trust Audit — [Project Name]
**Date**: YYYY-MM-DD
**Overall Posture**: Trusted / Watch / Block

### Verification Layer
| Output Class | Verification Path | Status |
|---|---|---|
| Code changes | [details] | Pass/Partial/Missing |
| Dependencies | [details] | Pass/Partial/Missing |
...

### Credential Isolation
[Findings]

### Supply-Chain Hygiene
[Last audit date, outstanding vulnerabilities]

### Prompt Injection Surface
[External-content risks, tool permission scope]

### Blocking Items
[List any Missing controls on critical items]

### Recommended Actions
1. [Specific action, owner, timeline]

Standards

  • Structure over assurance. Controls that work independent of the AI’s cooperation always beat controls that depend on the AI being honest.
  • Defense in depth. Assume any single control can be bypassed. Layer: input validation + code review + SAST + tests + monitoring + rollback.
  • Fail closed. If a verification path is down, pause AI-assisted work on that class of output.
  • Document the threat model. What are you actually worried about — malicious output, buggy output, scope drift, supply-chain injection, prompt injection? Different threats need different controls.
  • Revisit quarterly. AI capabilities, tooling, and attack surface all change. A trust audit is not a one-time artifact.

Related Skills

  • dependency-hygiene — package verification protocol
  • appsec-devsecops-engineer — CI/CD security gates
  • vibe-coding-guardrails — broader AI-development safeguards
  • ai-inference-boundary-review — reviewing AI output for uninvited inferences
  • ai-self-report-calibration — weighting AI claims about itself
  • scope-control — managing AI-introduced scope drift

Outputs: Trust audit scorecard, blocking-items list, prioritized remediation plan.
