Test Plan Writing
Instructions
Write test plans that give engineers and QA a complete, executable testing program — no guesswork, no gaps.
Test plan document structure:
# [Feature/Sprint/Release] Test Plan
**Version**: [x.y.z]
**Date**: YYYY-MM-DD
**Scope**: [what is and is not being tested]
## Objectives
[What quality attributes are being verified: correctness, security, performance, reliability]
## Test Types
- Functional: [which user stories / requirements]
- Integration: [which component interactions]
- Regression: [which previously passing tests must still pass]
- Performance: [which operations need timing validation]
- Security: [which attack vectors are being tested]
- Exploratory: [which areas are being probed without a script]
## Test Cases
[See format below]
## Exit Criteria
[What constitutes a complete, passing test run]
## Environment Requirements
[OS, browser, app version, database state, API credentials needed]
## Dependencies
[External services, test data, credentials, third-party sandboxes]
Test case format:
### TC-[product]-[number]: [Test case title]
**Type**: Functional | Integration | Regression | Performance | Security
**Related**: [Story ID or Requirement ID]
**Preconditions**: [State of system before test]
**Steps**:
1. [Action]
2. [Action]
**Expected result**: [Exact expected outcome]
**Pass criteria**: [How to determine pass vs fail]
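For illustration, a filled-in case following this format (the product code `AUTH` and story ID are hypothetical):

```markdown
### TC-AUTH-003: Login rejects expired session token
**Type**: Security
**Related**: STORY-142
**Preconditions**: Test user exists; a session token has been issued and has expired
**Steps**:
1. Send a request to a protected endpoint with the expired token in the Authorization header
2. Inspect the response status and body
**Expected result**: 401 response with a generic authentication error; no token contents echoed back
**Pass criteria**: Status code is 401 and the body exposes no session internals
```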
Minimum test coverage per user story:
- Happy path (primary success scenario)
- Primary error state (most likely failure)
- Boundary condition (empty input, maximum value, edge case)
- If story involves AI: malformed AI response handling
Minimum test coverage per API endpoint:
- 200 (success with expected payload)
- 400 (invalid input — missing required field, wrong type)
- 401/403 (unauthenticated / unauthorized)
- 404 (resource not found)
- 500 (server error handling — verify error message doesn’t expose internals)
- Timeout (network latency simulation)
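A minimal pytest sketch of the matrix above, assuming a hypothetical `/items` resource with bearer-token auth; `BASE_URL`, paths, and payloads are placeholders to adapt per service:

```python
import httpx
import pytest

BASE_URL = "http://localhost:8000"             # assumption: service under test
AUTH = {"Authorization": "Bearer test-token"}  # assumption: bearer auth

@pytest.mark.parametrize(
    "method, path, kwargs, expected",
    [
        ("POST", "/items", {"json": {"name": "widget"}, "headers": AUTH}, 200),  # success
        ("POST", "/items", {"json": {}, "headers": AUTH}, 400),                  # missing field
        ("POST", "/items", {"json": {"name": "widget"}}, 401),                   # no credentials
        ("GET", "/items/does-not-exist", {"headers": AUTH}, 404),                # not found
    ],
)
def test_endpoint_status_matrix(method, path, kwargs, expected):
    resp = httpx.request(method, BASE_URL + path, **kwargs)
    assert resp.status_code == expected

def test_timeout_is_surfaced():
    # Simulate network latency with an aggressive client timeout; the point
    # is to assert the caller's timeout path, not to let the test hang.
    with pytest.raises(httpx.TimeoutException):
        httpx.get(BASE_URL + "/items/slow", timeout=0.001)  # hypothetical slow route
```

The 500 case usually needs a fault-injection hook or a mock rather than a live call; however it is triggered, assert that the error body carries no stack trace or internal path.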
AI-specific test scenarios (for Claude API integrations):
- Valid request → verify response structure and content quality
- Rate limit (429) → verify retry behavior and user-facing message
- Token limit exceeded (400) → verify graceful degradation
- Network timeout → verify fallback behavior
- Malformed JSON response → verify error handling, no crash
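One way to script the 429 scenario without hitting the real API is httpx's `MockTransport`; `call_claude` here is a stand-in for your real client wrapper, and the request/response bodies are simplified for the sketch:

```python
import httpx

def call_claude(client: httpx.Client, prompt: str) -> str:
    # Stand-in app wrapper: one retry on 429, then parse the text block.
    # Real code would use the Anthropic SDK and backoff between attempts.
    for attempt in range(2):
        resp = client.post("/v1/messages", json={"prompt": prompt})
        if resp.status_code != 429:
            break
    resp.raise_for_status()
    return resp.json()["content"][0]["text"]

def test_rate_limit_triggers_retry_and_succeeds():
    attempts = []

    def handler(request: httpx.Request) -> httpx.Response:
        attempts.append(request)
        if len(attempts) == 1:
            return httpx.Response(429, json={"error": {"type": "rate_limit_error"}})
        return httpx.Response(200, json={"content": [{"type": "text", "text": "ok"}]})

    client = httpx.Client(transport=httpx.MockTransport(handler),
                          base_url="https://api.anthropic.com")
    assert call_claude(client, "hello") == "ok"
    assert len(attempts) == 2  # exactly one retry after the 429
```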
Performance test thresholds (ITI standard):
- Page load: < 3 seconds
- API response: < 5 seconds (95th percentile)
- Claude API call: < 10 seconds (with streaming progress indicator)
- Database query: < 500ms for simple CRUD, < 2s for complex joins
- n8n webhook response: < 2 seconds (non-AI workflows), < 15 seconds (AI Agent workflows)
- Redis operation: < 10ms
- Dify KB retrieval: < 3 seconds
- Docker container health check: < 5 seconds
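A rough way to spot-check the API p95 threshold from pytest; the sample size and endpoint are illustrative, and sustained load measurement belongs in a dedicated tool (Locust, k6) rather than the unit suite:

```python
import statistics
import time
import httpx
import pytest

@pytest.mark.slow
def test_api_response_p95_under_5s():
    samples = []
    with httpx.Client(base_url="http://localhost:8000") as client:  # assumption
        for _ in range(20):
            start = time.perf_counter()
            client.get("/api/health")                               # hypothetical route
            samples.append(time.perf_counter() - start)
    p95 = statistics.quantiles(samples, n=20)[18]  # 19 cut points; index 18 ~ 95th pct
    assert p95 < 5.0
```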
Infrastructure/container testing patterns (Docker Compose stack):
When writing test plans for containerized services, include these test types:
- Container health: verify each container responds to its health check (PostgreSQL `pg_isready`, Redis `PING`, n8n `/healthz`, Dify API `/console/api/setup`)
- Service connectivity: verify cross-container DNS resolution and port accessibility
- Database readiness: verify expected databases exist (n8n, dify, dify_plugin), pgvector extension loaded
- Nginx proxy routing: verify routes resolve correctly (Dify API on :3001, n8n on :5678, Dify Web on :3000)
- Volume persistence: verify data survives container restart
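A smoke-level sketch of the health and connectivity checks in the list above, assuming the host-mapped ports named earlier (adjust hostnames and ports to your compose file):

```python
import socket
import httpx
import pytest

PORTS = {  # assumption: host-mapped ports from the compose stack
    "postgres": ("localhost", 5432),
    "redis": ("localhost", 6379),
    "n8n": ("localhost", 5678),
}

@pytest.mark.smoke
@pytest.mark.parametrize("service", sorted(PORTS))
def test_port_reachable(service):
    with socket.create_connection(PORTS[service], timeout=5):
        pass  # connection accepted means the container is listening

@pytest.mark.smoke
def test_n8n_healthz():
    resp = httpx.get("http://localhost:5678/healthz", timeout=5)
    assert resp.status_code == 200
```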
Test markers for infrastructure tests:
- `@pytest.mark.smoke` — quick health checks: containers running, ports responding (<30s total)
- `@pytest.mark.integration` — multi-service integration, requires full Docker stack
- `@pytest.mark.slow` — long-running tests (AI Agent workflows, Dify retrieval)
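Register these markers (in pytest.ini, or the `[tool.pytest.ini_options]` table in pyproject.toml) so runs like `pytest --strict-markers -m smoke` stay clean:

```ini
[pytest]
markers =
    smoke: quick health checks (containers running, ports responding)
    integration: multi-service integration, requires full Docker stack
    slow: long-running tests (AI Agent workflows, Dify retrieval)
```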
n8n workflow webhook testing patterns:
- Reachability: GET/POST to `/webhook/{path}` returns 200 with valid JSON
- AI Agent workflows: verify `chatInput` is accepted and the response contains an `output` field
- Router workflows: verify each Switch branch is reachable with the appropriate input discriminator
- KB-augmented workflows: verify Dify retrieval tool returns context in agent output
- Empty/invalid payload: verify workflow handles gracefully without 5xx
- Session memory: verify session-based workflows maintain context across turns using `$execution.id`
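A sketch of the AI Agent webhook checks from this list; the webhook path is hypothetical, and the `chatInput`/`output` shape follows the convention above:

```python
import httpx
import pytest

N8N = "http://localhost:5678"  # assumption: compose-mapped n8n port

@pytest.mark.integration
def test_agent_webhook_returns_output_field():
    resp = httpx.post(
        f"{N8N}/webhook/chat-agent",                      # hypothetical webhook path
        json={"chatInput": "What is the refund policy?"},
        timeout=15,                                       # AI Agent threshold above
    )
    assert resp.status_code == 200
    assert "output" in resp.json()

@pytest.mark.integration
def test_agent_webhook_tolerates_empty_payload():
    resp = httpx.post(f"{N8N}/webhook/chat-agent", json={}, timeout=15)
    assert resp.status_code < 500  # graceful handling, no 5xx
```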
Dify KB retrieval quality testing:
- Relevant query: at least one result returned with score > 0
- Irrelevant query: empty results or results with score < threshold
- Invalid dataset ID: 404 response
- Missing authentication: 401 response
- Chunking quality: retrieved segments are coherent and contextually complete
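A sketch of the relevance checks; the `/v1/datasets/{id}/retrieve` route and the `records`/`score` response shape are assumptions to verify against your Dify version's knowledge-base API docs:

```python
import httpx
import pytest

DIFY_API = "http://localhost:3001/v1"                    # assumption: proxied Dify API
HEADERS = {"Authorization": "Bearer <dataset-api-key>"}  # placeholder credential

@pytest.mark.integration
def test_relevant_query_returns_scored_result():
    resp = httpx.post(
        f"{DIFY_API}/datasets/<dataset-id>/retrieve",    # placeholder dataset ID
        headers=HEADERS,
        json={"query": "refund policy"},
        timeout=3,                                        # retrieval threshold above
    )
    assert resp.status_code == 200
    records = resp.json().get("records", [])
    assert records and records[0]["score"] > 0

@pytest.mark.integration
def test_missing_auth_is_rejected():
    resp = httpx.post(f"{DIFY_API}/datasets/<dataset-id>/retrieve",
                      json={"query": "anything"}, timeout=3)
    assert resp.status_code == 401
```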
FastAPI async endpoint testing patterns:
- In-process (unit): pytest + httpx `AsyncClient` with `ASGITransport` — no external services
- Live (integration): pytest + httpx against the Docker Compose URL — requires the stack running
- Auth matrix: valid token, expired token, missing token, invalid role
- CORS: allowed origin, blocked origin, preflight OPTIONS request
- Security headers: X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security
- File uploads: valid MIME accepted, invalid MIME rejected (415), EXIF stripping verified
- Error responses: no PII, no stack traces, no internal file paths
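The in-process pattern from the list above in code; `myservice.main.app` and the routes are assumptions, and the async tests need the anyio or pytest-asyncio plugin:

```python
import httpx
import pytest
from myservice.main import app  # assumption: your FastAPI application instance

@pytest.mark.anyio
async def test_health_in_process():
    transport = httpx.ASGITransport(app=app)  # drives the app directly, no server
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        resp = await client.get("/health")    # hypothetical route
    assert resp.status_code == 200

@pytest.mark.anyio
async def test_errors_leak_no_internals():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        resp = await client.get("/definitely-missing")
    assert resp.status_code == 404
    assert "Traceback" not in resp.text       # no stack traces in error bodies
```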
Antigravity test execution integration:
When the test plan will be executed via Google Antigravity (the ITI test/debug lane), include these considerations:
- Agent dispatch format: Structure test cases so they can be dispatched as Antigravity Agent Manager tasks using Planning mode. Each test suite maps to a `/test-session` or `/browser-test` workflow invocation.
- Browser QA test cases: For UI-facing tests, specify viewport sizes (1440px, 1024px, 768px, 375px) and expected visual states. Antigravity’s browser sub-agent captures screenshots and recordings as Walkthrough artifacts.
- Visual regression baselines: Include a baseline capture step before code changes. Antigravity compares screenshots against baselines to detect unintended UI changes.
- Artifact review criteria: Define what constitutes a pass/fail in Walkthrough artifact review — include expected screenshot states, acceptable visual differences, and `[TEST-FAILURE]` flag thresholds.
- Knowledge sync: Test plans should include a post-session step to scan Walkthrough artifacts for `[CONTEXT-UPDATE]` flags and route findings to the appropriate CLAUDE.md tier.
See the antigravity-testing and antigravity-browser-qa skills for detailed dispatch and artifact review protocols.
Outputs: Test plan document (Markdown), test case matrix (story/requirement → test cases), exit criteria checklist, environment setup guide, defect report template, infrastructure smoke test checklist.
