Chapter 4: Daily Operations
Last Updated: 2026-03
4.2 Health Check Procedure
Run these five checks in order. All must pass before the system is considered healthy.
Step 1 — Container status
docker compose ps
All 9 containers must show running or healthy. Any container showing exited or restarting requires immediate investigation.
Step 2 — n8n health endpoint
curl -s http://localhost:5678/healthz
Expected response: {"status":"ok"}
Step 3 — Dify Nginx proxy
curl -s -o /dev/null -w "%{http_code}" http://localhost:3001/console/api/setup
Expected: an HTTP status from 200 to 499 (200, 401, and 404 are all acceptable). Any 5xx means the proxy or API is down, and 000 means curl could not connect at all.
Step 4 — Redis
docker exec iti-redis redis-cli ping
Expected: PONG
Step 5 — PostgreSQL
docker exec iti-postgres pg_isready -U postgres
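Expected: a line ending in "accepting connections", for example localhost:5432 - accepting connections. Inside the container the host part may appear as a socket path such as /var/run/postgresql instead of localhost.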
Tip: Script these five checks into a shell alias for one-command health verification.
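One way to follow the tip above is a small shell function that runs the five checks and returns non-zero if any of them fails. This is a minimal sketch, not the project's own script: the function name health_check is hypothetical, the --all flag on docker compose ps is added so that stopped containers are listed, and Step 1 only flags exited or restarting states rather than counting the nine containers.

```shell
# Sketch of a one-command health check; not the project's official script.
# Assumes it is run from the directory that holds docker-compose.yml.
health_check() {
  local failed=0 code

  # Step 1: --all also lists stopped containers, which plain "ps" may hide.
  if docker compose ps --all | grep -Eiq 'exited|restarting'; then
    echo "FAIL containers"; failed=1
  else
    echo "ok   containers"
  fi

  # Step 2: the n8n health endpoint must report status ok.
  if curl -s http://localhost:5678/healthz | grep -q '"status":"ok"'; then
    echo "ok   n8n"
  else
    echo "FAIL n8n"; failed=1
  fi

  # Step 3: accept 100-499; curl prints 000 when it cannot connect.
  code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3001/console/api/setup)
  if [ "${code:-0}" -ge 100 ] && [ "$code" -lt 500 ]; then
    echo "ok   dify ($code)"
  else
    echo "FAIL dify ($code)"; failed=1
  fi

  # Step 4: Redis must answer PONG.
  if docker exec iti-redis redis-cli ping | grep -q PONG; then
    echo "ok   redis"
  else
    echo "FAIL redis"; failed=1
  fi

  # Step 5: pg_isready exits 0 only when the server accepts connections.
  if docker exec iti-postgres pg_isready -U postgres >/dev/null; then
    echo "ok   postgres"
  else
    echo "FAIL postgres"; failed=1
  fi

  return "$failed"
}
```

Sourcing this from a shell profile and running health_check prints one line per check; the exit status can then gate other scripts, for example health_check && bash backup.sh.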
4.3 Log Review
Check all services for errors in the last 24 hours
docker compose logs --since 24h | grep -i error
Check a specific service
docker compose logs --since 24h iti-n8n | grep -iE "error|warn|fail"
Save logs to archive
docker compose logs --since 24h > /Users/username/Cursor/Archives/n8n-dify-logs-$(date +%Y%m%d).log
Watch live logs
docker compose logs -f --tail 50
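To see at a glance which service is producing the most noise, the per-service check above can be looped over every service in the stack. The sketch below is illustrative only: count_log_errors is a hypothetical name, and the pattern simply reuses the error, warn, and fail keywords from the command above.

```shell
# Sketch: count suspicious log lines per service over the last 24 hours.
# Assumes it is run from the directory that holds docker-compose.yml.
count_log_errors() {
  local svc n
  # "docker compose config --services" prints one service name per line.
  for svc in $(docker compose config --services); do
    # grep -c prints the number of matching lines (0 when none match).
    n=$(docker compose logs --since 24h --no-color "$svc" 2>&1 |
        grep -ciE 'error|warn|fail' || true)
    printf '%-24s %s\n' "$svc" "$n"
  done
}
```

A service whose count jumps compared with previous days is a reasonable first candidate for a closer look with the service-specific command.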
4.4 Backup
The backup script creates timestamped archives of all databases and n8n data volumes.
Run a backup
cd ITI/infrastructure/n8n-dify
bash backup.sh
Verify the backup files were created
ls -lh /Users/username/Cursor/Archives/ | grep $(date +%Y%m%d)
Expected files for today:
| File | Contents |
|---|---|
| n8n-db-YYYYMMDD.sql | n8n PostgreSQL dump |
| dify-db-YYYYMMDD.sql | Dify PostgreSQL dump |
| n8n-data-YYYYMMDD.tar.gz | n8n Docker volume data |
| dify-data-YYYYMMDD.tar.gz | Dify Docker volume data |
Warning: If any of these files are zero bytes, the backup failed. Do not proceed with upgrades or destructive operations until you have a verified non-zero backup.
Backups are pruned automatically after 7 days, so the Archives directory holds only the most recent 7 days of backups.
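The zero-byte check from the warning above can be automated rather than read off the ls output. The following is a minimal sketch: verify_backup is a hypothetical helper, and it assumes the four file names listed in the table with the default Archives path used in this chapter.

```shell
# Sketch: confirm the four expected backup files exist and are non-empty.
# Usage: verify_backup [archive-dir] [YYYYMMDD]
verify_backup() {
  local dir="${1:-/Users/username/Cursor/Archives}"
  local stamp="${2:-$(date +%Y%m%d)}"
  local f failed=0

  for f in "n8n-db-$stamp.sql" "dify-db-$stamp.sql" \
           "n8n-data-$stamp.tar.gz" "dify-data-$stamp.tar.gz"; do
    # -s is true only if the file exists and is larger than zero bytes.
    if [ -s "$dir/$f" ]; then
      echo "ok   $f"
    else
      echo "FAIL $f (missing or zero bytes)"; failed=1
    fi
  done
  return "$failed"
}
```

Called with no arguments it checks today's files; a non-zero exit status means at least one file is missing or empty. A non-empty file is still not proof of a restorable backup, so this only catches the most obvious failure.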
4.5 Restore from Backup
Restore n8n database
docker exec -i iti-postgres psql -U postgres -d n8n < /Users/username/Cursor/Archives/n8n-db-YYYYMMDD.sql
Restore Dify database
docker exec -i iti-postgres psql -U postgres -d dify < /Users/username/Cursor/Archives/dify-db-YYYYMMDD.sql
Restore n8n volume data
docker run --rm \
-v n8n_data:/data \
-v /Users/username/Cursor/Archives:/backup \
alpine sh -c "cd /data && tar xzf /backup/n8n-data-YYYYMMDD.tar.gz"
Restart after restore
docker compose restart
Note: After a database restore, active workflows in n8n may need to be re-published. In the n8n UI, open Workflows and verify that all expected workflows are active.
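Because a plain psql restore keeps going after a failed statement, it can help to wrap the database restore commands above in a guard. The sketch below is one possible approach, not part of the existing tooling: restore_db is a hypothetical name, and it adds psql's ON_ERROR_STOP variable plus a check that the dump file is not empty.

```shell
# Sketch: restore one database dump, refusing empty files and stopping on errors.
# Usage: restore_db <database> <dump-file>
restore_db() {
  local db="$1" dump="$2"

  if [ ! -s "$dump" ]; then
    echo "refusing to restore: $dump is missing or zero bytes" >&2
    return 1
  fi

  # ON_ERROR_STOP=1 makes psql exit non-zero at the first failed statement.
  docker exec -i iti-postgres psql -U postgres -d "$db" -v ON_ERROR_STOP=1 < "$dump"
}
```

For example, restore_db n8n /Users/username/Cursor/Archives/n8n-db-YYYYMMDD.sql with the real date substituted. If the function returns non-zero, the database may be partially restored and should be inspected before the stack is restarted.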
4.6 n8n Execution Monitoring
The n8n UI at http://localhost:5678 provides an Executions panel showing all workflow runs.
What to check weekly
- Navigate to Executions in the left sidebar.
- Filter by Status: Error.
- For each failed execution, open it and read the error message.
- Determine if the error is transient (network timeout, API rate limit) or systematic (broken workflow logic, invalid credentials).
Execution data management
n8n stores execution data in the n8n PostgreSQL database. This data grows over time and should be pruned.
Check current execution DB size:
docker exec iti-postgres psql -U postgres -d n8n -c \
"SELECT pg_size_pretty(pg_database_size('n8n'));"
Configure pruning in n8n UI: Settings > Execution Data > set “Prune data older than” to 30 days.
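The total database size does not show where the space goes. As a rough sketch, the query below lists the five largest tables in the n8n database using standard PostgreSQL catalog views; n8n_table_sizes is a hypothetical helper name, and no n8n table names are assumed.

```shell
# Sketch: list the five largest tables in the n8n database.
n8n_table_sizes() {
  docker exec iti-postgres psql -U postgres -d n8n -c "
    SELECT relname AS table_name,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
    FROM pg_catalog.pg_statio_user_tables
    ORDER BY pg_total_relation_size(relid) DESC
    LIMIT 5;"
}
```

If execution-related tables dominate the list, that is a sign the pruning setting is worth checking.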
4.7 Dify Knowledge Base Quality Check
Run this check weekly to ensure retrieval quality has not degraded.
- Open the Dify console at http://localhost:3000.
- Navigate to Knowledge.
- Select a knowledge base used by a production workflow.
- Use Retrieval Testing to test 3–5 sample queries.
- Verify that the top results are relevant and the scores are above the threshold configured for that KB (typically 0.7+).
If retrieval quality has degraded, consider:
- Re-indexing the dataset (Knowledge > select dataset > Re-index).
- Reviewing recently added documents for quality.
- Adjusting chunk size or overlap settings.
4.8 Resource Monitoring
Check memory and CPU for all containers
docker stats --no-stream
Save resource snapshot to archive
docker stats --no-stream > /Users/username/Cursor/Archives/docker-stats-$(date +%Y%m%d).log
Check disk usage
docker system df
If disk usage is high (images, stopped containers, or build cache), prune with:
docker system prune
Warning: docker system prune removes stopped containers, unused networks, dangling images, and build cache. It does NOT remove named volumes. Use docker volume prune only after a confirmed backup.
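Scanning the full docker stats table by eye gets tedious with nine containers. The sketch below filters it down to containers above a memory threshold; high_memory_containers is a hypothetical name and the default of 80 percent is an illustrative value, not a limit taken from this stack.

```shell
# Sketch: print containers whose memory use exceeds a percentage threshold.
# Usage: high_memory_containers [threshold-percent]   (default 80, illustrative)
high_memory_containers() {
  local threshold="${1:-80}"
  # --format limits the output to container name and memory percentage.
  docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' |
    awk -v t="$threshold" '{
      pct = $2
      sub(/%/, "", pct)                  # strip the trailing percent sign
      if (pct + 0 > t + 0) print $1, $2  # compare numerically
    }'
}
```

Empty output means no container is above the threshold. Note that the percentage is relative to each container's memory limit, or to host memory when no limit is set.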
4.9 Incident Triage Decision Tree
When something is wrong, follow this sequence:
1. docker compose ps
   ├── Container stopped?
   │   └── docker compose logs <service> --tail 50
   │       ├── OOM killed? → Increase memory limit or reduce load
   │       ├── Config error? → Fix .env or docker-compose.yml
   │       └── Restart: docker compose up -d <service>
   │
   ├── Container running but unhealthy?
   │   ├── Test the healthcheck endpoint manually (see 4.2)
   │   └── Check logs for crash loops
   │
   ├── Memory pressure?
   │   └── docker stats --no-stream → identify the consumer
   │       └── docker compose restart <heavy-service>
   │
   ├── Disk full?
   │   └── docker system df → docker system prune
   │
   ├── n8n webhooks returning 500?
   │   └── Check n8n Executions panel for error details
   │
   └── Dify returning 502?
       └── docker compose restart iti-dify-api iti-dify-nginx
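The first branch of the tree (container stopped, then check for an OOM kill and read the logs) can be gathered in one pass. This is a sketch under assumptions: triage_stopped is a hypothetical name, and it uses plain docker ps, so it reports every exited or restarting container on the host, not only those in this stack.

```shell
# Sketch: for each exited or restarting container, show the OOM flag and recent logs.
triage_stopped() {
  local name oom
  # Repeating --filter with the same key matches either status.
  docker ps --all --filter status=exited --filter status=restarting \
            --format '{{.Names}}' |
  while read -r name; do
    # .State.OOMKilled is true when the kernel killed the container for memory.
    oom=$(docker inspect --format '{{.State.OOMKilled}}' "$name")
    echo "== $name (OOMKilled=$oom)"
    docker logs --tail 50 "$name" 2>&1
  done
}
```

An OOMKilled value of true points to the memory branch of the tree; otherwise the log tail usually shows whether a configuration error is the cause.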
4.10 Container Recovery Commands
| Situation | Command |
|---|---|
| Restart one service | docker compose restart <service> |
| Recreate and restart one service | docker compose up -d --force-recreate <service> |
| Full stack restart | docker compose down && docker compose up -d |
| Nuclear reset (destroys data) | docker compose down -v — only after confirmed backup |
