

name: test-plan-writing

description: Write comprehensive test plans covering functional, non-functional, integration, regression, and exploratory testing. Use when planning testing for a new feature, sprint, or release, defining test coverage for an API or component, writing acceptance tests for user stories, or reviewing whether testing is complete before sign-off.

Test Plan Writing

Instructions

Write test plans that give engineers and QA a complete, executable testing program — no guesswork, no gaps.

Test plan document structure:


# [Feature/Sprint/Release] Test Plan
**Version**: [x.y.z]
**Date**: YYYY-MM-DD
**Scope**: [what is and is not being tested]

## Objectives
[What quality attributes are being verified: correctness, security, performance, reliability]

## Test Types
- Functional: [which user stories / requirements]
- Integration: [which component interactions]
- Regression: [which previously passing tests must still pass]
- Performance: [which operations need timing validation]
- Security: [which attack vectors are being tested]
- Exploratory: [which areas are being probed without a script]

## Test Cases
[See format below]

## Exit Criteria
[What constitutes a complete, passing test run]

## Environment Requirements
[OS, browser, app version, database state, API credentials needed]

## Dependencies
[External services, test data, credentials, third-party sandboxes]

Test case format:


### TC-[product]-[number]: [Test case title]
**Type**: Functional | Integration | Regression | Performance | Security
**Related**: [Story ID or Requirement ID]
**Preconditions**: [State of system before test]
**Steps**:
1. [Action]
2. [Action]
**Expected result**: [Exact expected outcome]
**Pass criteria**: [How to determine pass vs fail]

Minimum test coverage per user story:

  • Happy path (primary success scenario)
  • Primary error state (most likely failure)
  • Boundary condition (empty input, maximum value, edge case)
  • If the story involves AI: malformed AI response handling

Minimum test coverage per API endpoint:

  • 200 (success with expected payload)
  • 400 (invalid input — missing required field, wrong type)
  • 401/403 (unauthenticated / unauthorized)
  • 404 (resource not found)
  • 500 (server error handling — verify error message doesn’t expose internals)
  • Timeout (network latency simulation)
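
A minimal pytest sketch of this coverage matrix, assuming a hypothetical /items resource behind a bearer-token API on localhost:8000 — the paths, port, token, and fault-injection route are placeholders for illustration, not part of any specific service:

```python
# test_items_api.py -- illustrative coverage matrix for a single endpoint.
# Base URL, /items paths, and auth header are assumptions; adapt to your API.
import httpx
import pytest

BASE_URL = "http://localhost:8000"  # assumed local test deployment
AUTH = {"Authorization": "Bearer test-token"}  # placeholder credential


@pytest.fixture
def client():
    with httpx.Client(base_url=BASE_URL, timeout=5.0) as c:
        yield c


def test_get_item_success(client):
    r = client.get("/items/1", headers=AUTH)
    assert r.status_code == 200
    assert "id" in r.json()


def test_create_item_invalid_input(client):
    # Missing required field should be rejected, not accepted silently.
    r = client.post("/items", json={}, headers=AUTH)
    assert r.status_code == 400


def test_get_item_unauthenticated(client):
    r = client.get("/items/1")  # no Authorization header
    assert r.status_code in (401, 403)


def test_get_item_not_found(client):
    r = client.get("/items/999999", headers=AUTH)
    assert r.status_code == 404


def test_server_error_hides_internals(client):
    # Route assumed to be rigged (e.g. via a fault-injection flag) to fail.
    r = client.get("/items/trigger-error", headers=AUTH)
    assert r.status_code == 500
    assert "Traceback" not in r.text


def test_timeout_is_surfaced():
    # Crude latency simulation: a very small client-side timeout.
    with httpx.Client(base_url=BASE_URL, timeout=0.001) as c:
        with pytest.raises(httpx.TimeoutException):
            c.get("/items/1")
```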

AI-specific test scenarios (for Claude API integrations):

  • Valid request → verify response structure and content quality
  • Rate limit (429) → verify retry behavior and user-facing message
  • Token limit exceeded (400) → verify graceful degradation
  • Network timeout → verify fallback behavior
  • Malformed JSON response → verify error handling, no crash
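
A sketch of the 429 and malformed-response scenarios, assuming a hypothetical application wrapper `ask_claude()` in `app.claude_client` that retries and returns a structured result — the module, functions, and `RateLimitExceeded` exception are placeholder names, not part of any SDK:

```python
# test_claude_integration.py -- rate-limit and malformed-response sketches.
# ask_claude, RateLimitExceeded, and app.claude_client are hypothetical
# application names; substitute your own wrapper around the Claude API.
import pytest

from app.claude_client import ask_claude, RateLimitExceeded


def test_rate_limit_retries_then_surfaces_error(monkeypatch):
    calls = {"n": 0}

    def always_429(*args, **kwargs):
        calls["n"] += 1
        raise RateLimitExceeded("429 Too Many Requests")

    # Force every underlying API call to fail with a rate-limit error.
    monkeypatch.setattr("app.claude_client._send_request", always_429)

    with pytest.raises(RateLimitExceeded):
        ask_claude("Summarise this document")

    # Wrapper is expected to retry (e.g. 3 attempts) before giving up.
    assert calls["n"] > 1


def test_malformed_json_response_does_not_crash(monkeypatch):
    monkeypatch.setattr(
        "app.claude_client._send_request",
        lambda *a, **kw: "{not valid json",  # truncated/malformed model output
    )

    result = ask_claude("Return a JSON summary")

    # Expected behaviour: a structured error result, never an unhandled exception.
    assert result["ok"] is False
    assert "parse" in result["error"].lower()
```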

Performance test thresholds (ITI standard):

  • Page load: < 3 seconds
  • API response: < 5 seconds (95th percentile)
  • Claude API call: < 10 seconds (with streaming progress indicator)
  • Database query: < 500ms for simple CRUD, < 2s for complex joins
  • n8n webhook response: < 2 seconds (non-AI workflows), < 15 seconds (AI Agent workflows)
  • Redis operation: < 10ms
  • Dify KB retrieval: < 3 seconds
  • Docker container health check: < 5 seconds
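
These thresholds translate directly into assertions. A sketch for the API and Redis budgets, assuming a locally reachable stack (the health endpoint URL, Redis host/port, and probe key are assumptions):

```python
# test_performance_budgets.py -- spot checks against the thresholds above.
# Endpoint URL and Redis connection details are assumptions for illustration.
import time

import httpx
import pytest
import redis

API_URL = "http://localhost:8000/api/health"  # assumed endpoint


@pytest.mark.slow
def test_api_response_under_5s():
    start = time.perf_counter()
    r = httpx.get(API_URL, timeout=10.0)
    elapsed = time.perf_counter() - start
    assert r.status_code == 200
    assert elapsed < 5.0, f"API responded in {elapsed:.2f}s, budget is 5s"


def test_redis_roundtrip_under_10ms():
    client = redis.Redis(host="localhost", port=6379)
    client.set("perf-probe", "1")  # warm the connection before timing
    start = time.perf_counter()
    client.get("perf-probe")
    elapsed = time.perf_counter() - start
    assert elapsed < 0.010, f"Redis GET took {elapsed * 1000:.1f}ms, budget is 10ms"
```

A single-request check like this is only a smoke test of the budget; the 95th-percentile API target needs a load tool (k6, Locust, or similar) run over many requests.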

Infrastructure/container testing patterns (Docker Compose stack):

When writing test plans for containerized services, include these test types:

  • Container health: verify each container responds to its health check (PostgreSQL pg_isready, Redis PING, n8n /healthz, Dify API /console/api/setup)
  • Service connectivity: verify cross-container DNS resolution and port accessibility
  • Database readiness: verify expected databases exist (n8n, dify, dify_plugin), pgvector extension loaded
  • Nginx proxy routing: verify routes resolve correctly (Dify API on :3001, n8n on :5678, Dify Web on :3000)
  • Volume persistence: verify data survives container restart
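
A smoke-test sketch for the health-check, database-readiness, and connectivity items, assuming the container names, ports, and credentials below match the compose file (they are assumptions):

```python
# test_stack_smoke.py -- container health checks for the Docker Compose stack.
# Container names, ports, and credentials are assumptions; match your compose file.
import subprocess

import httpx
import pytest
import redis

pytestmark = pytest.mark.smoke


def test_postgres_ready():
    # pg_isready exits 0 when the server is accepting connections.
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "postgres", "pg_isready", "-U", "postgres"],
        capture_output=True,
    )
    assert result.returncode == 0, result.stderr.decode()


def test_redis_ping():
    assert redis.Redis(host="localhost", port=6379).ping() is True


def test_n8n_healthz():
    r = httpx.get("http://localhost:5678/healthz", timeout=5.0)
    assert r.status_code == 200


def test_dify_api_setup_endpoint():
    r = httpx.get("http://localhost:3001/console/api/setup", timeout=5.0)
    assert r.status_code == 200


def test_expected_databases_exist():
    result = subprocess.run(
        ["docker", "compose", "exec", "-T", "postgres",
         "psql", "-U", "postgres", "-tAc", "SELECT datname FROM pg_database"],
        capture_output=True, text=True,
    )
    databases = result.stdout.split()
    for name in ("n8n", "dify", "dify_plugin"):
        assert name in databases, f"expected database {name} not found"
```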

Test markers for infrastructure tests:

  • @pytest.mark.smoke — quick health checks, containers running, ports responding (<30s total)
  • @pytest.mark.integration — multi-service integration, requires full Docker stack
  • @pytest.mark.slow — long-running tests (AI Agent workflows, Dify retrieval)
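
The markers must be registered so pytest does not warn about unknown marks. One way to do this in Python (rather than pytest.ini) is a conftest.py hook — a sketch:

```python
# conftest.py -- register the custom markers so `pytest -m smoke` runs cleanly.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "smoke: quick health checks, containers and ports (<30s total)")
    config.addinivalue_line(
        "markers", "integration: multi-service tests requiring the full Docker stack")
    config.addinivalue_line(
        "markers", "slow: long-running tests (AI Agent workflows, Dify retrieval)")
```

Suites can then be selected per stage, for example `pytest -m smoke` as a pre-deploy gate and `pytest -m "integration and not slow"` in CI.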

n8n workflow webhook testing patterns:

  • Reachability: GET/POST to /webhook/{path} returns 200 with valid JSON
  • AI Agent workflows: verify chatInput is accepted and response contains output field
  • Router workflows: verify each Switch branch is reachable with an appropriate input discriminator
  • KB-augmented workflows: verify Dify retrieval tool returns context in agent output
  • Empty/invalid payload: verify the workflow handles it gracefully without returning a 5xx
  • Session memory: verify session-based workflows maintain context across turns using $execution.id
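
A sketch of the reachability, AI Agent, and invalid-payload checks, assuming n8n on :5678 and a webhook path of chat-agent (both assumptions about the deployed workflows):

```python
# test_n8n_webhooks.py -- webhook-level checks against a running n8n instance.
# The webhook path "chat-agent" and port 5678 are assumptions for illustration.
import httpx
import pytest

N8N_BASE = "http://localhost:5678"

pytestmark = pytest.mark.integration


def test_webhook_reachable_returns_json():
    r = httpx.post(f"{N8N_BASE}/webhook/chat-agent",
                   json={"chatInput": "ping"}, timeout=15.0)
    assert r.status_code == 200
    r.json()  # raises if the body is not valid JSON


@pytest.mark.slow
def test_ai_agent_returns_output_field():
    r = httpx.post(
        f"{N8N_BASE}/webhook/chat-agent",
        json={"chatInput": "What are the KB retrieval thresholds?"},
        timeout=30.0,
    )
    assert r.status_code == 200
    assert "output" in r.json()


def test_empty_payload_handled_gracefully():
    r = httpx.post(f"{N8N_BASE}/webhook/chat-agent", json={}, timeout=15.0)
    # Workflow may return 200 with an error message or a 4xx, but never a 5xx.
    assert r.status_code < 500
```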

Dify KB retrieval quality testing:

  • Relevant query: at least one result returned with score > 0
  • Irrelevant query: empty results or results with score < threshold
  • Invalid dataset ID: 404 response
  • Missing authentication: 401 response
  • Chunking quality: retrieved segments are coherent and contextually complete
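
A sketch of the retrieval-quality and error cases. The endpoint path, port, and response field names below are assumptions based on Dify's dataset retrieval API and should be verified against the deployed version; the dataset ID, API key, and sample query are placeholders:

```python
# test_dify_retrieval.py -- KB retrieval quality checks against a Dify dataset.
# Endpoint path, port, and response field names are assumptions; verify them
# against your Dify deployment. Dataset ID, API key, and query are placeholders.
import httpx
import pytest

DIFY_API = "http://localhost:3001"
DATASET_ID = "REPLACE_WITH_DATASET_ID"
HEADERS = {"Authorization": "Bearer REPLACE_WITH_DATASET_API_KEY"}

pytestmark = pytest.mark.slow


def retrieve(query, dataset_id=DATASET_ID, headers=HEADERS):
    return httpx.post(
        f"{DIFY_API}/v1/datasets/{dataset_id}/retrieve",
        json={"query": query},
        headers=headers,
        timeout=10.0,
    )


def test_relevant_query_returns_scored_result():
    r = retrieve("onboarding checklist")  # query assumed to match KB content
    assert r.status_code == 200
    records = r.json().get("records", [])
    assert len(records) >= 1
    assert records[0]["score"] > 0


def test_invalid_dataset_returns_404():
    assert retrieve("anything", dataset_id="does-not-exist").status_code == 404


def test_missing_auth_returns_401():
    assert retrieve("anything", headers={}).status_code == 401
```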

FastAPI async endpoint testing patterns:

  • In-process (unit): pytest + httpx AsyncClient with ASGITransport — no external services
  • Live (integration): pytest + httpx against Docker Compose URL — requires stack running
  • Auth matrix: valid token, expired token, missing token, invalid role
  • CORS: allowed origin, blocked origin, preflight OPTIONS request
  • Security headers: X-Content-Type-Options, X-Frame-Options, Strict-Transport-Security
  • File uploads: valid MIME accepted, invalid MIME rejected (415), EXIF stripping verified
  • Error responses: no PII, no stack traces, no internal file paths
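
A sketch of the in-process variant, assuming the FastAPI instance is importable as app.main:app and exposes /health and an auth-protected /protected route (import path and routes are assumptions); it uses the anyio pytest marker, though pytest-asyncio works equally well:

```python
# test_api_inprocess.py -- in-process FastAPI tests: no server, no Docker stack.
# The app import path and the /health and /protected routes are assumptions.
import httpx
import pytest

from app.main import app  # the FastAPI instance under test


@pytest.mark.anyio
async def test_health_and_security_headers():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        r = await client.get("/health")
    assert r.status_code == 200
    assert r.headers.get("X-Content-Type-Options") == "nosniff"
    assert "X-Frame-Options" in r.headers


@pytest.mark.anyio
async def test_missing_token_rejected():
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://test") as client:
        r = await client.get("/protected")  # no Authorization header
    assert r.status_code in (401, 403)
```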

Antigravity test execution integration:

When the test plan will be executed via Google Antigravity (the ITI test/debug lane), include these considerations:

  • Agent dispatch format: Structure test cases so they can be dispatched as Antigravity Agent Manager tasks using Planning mode. Each test suite maps to a /test-session or /browser-test workflow invocation.
  • Browser QA test cases: For UI-facing tests, specify viewport sizes (1440px, 1024px, 768px, 375px) and expected visual states. Antigravity’s browser sub-agent captures screenshots and recordings as Walkthrough artifacts.
  • Visual regression baselines: Include a baseline capture step before code changes. Antigravity compares screenshots against baselines to detect unintended UI changes.
  • Artifact review criteria: Define what constitutes a pass/fail in Walkthrough artifact review — include expected screenshot states, acceptable visual differences, and [TEST-FAILURE] flag thresholds.
  • Knowledge sync: Test plans should include a post-session step to scan Walkthrough artifacts for [CONTEXT-UPDATE] flags and route findings to the appropriate CLAUDE.md tier.

See the antigravity-testing and antigravity-browser-qa skills for detailed dispatch and artifact review protocols.

Outputs: Test plan document (Markdown), test case matrix (story/requirement → test cases), exit criteria checklist, environment setup guide, defect report template, infrastructure smoke test checklist.
