
Chapter 4: Daily Operations

Last Updated: 2026-03

## 4.1 Operations Cadence

The stack requires regular attention. The following table summarizes the required cadence:

| Task | Command / Location | Frequency |
|------|--------------------|-----------|
| Health check | docker compose ps + endpoint checks | Daily |
| Error log review | docker compose logs --since 24h | Daily |
| Backup | bash backup.sh | Daily |
| n8n execution review | n8n UI > Executions > filter “failed” | Weekly |
| Dify KB quality check | Dify UI > Knowledge > Retrieval Testing | Weekly |
| Resource monitoring | docker stats --no-stream | Weekly |
| Image updates | docker compose pull && docker compose up -d | Monthly |

All commands run from ITI/infrastructure/n8n-dify/ unless otherwise noted.

## 4.2 Health Check Procedure

Run these five checks in order. All must pass before the system is considered healthy.

Step 1 — Container status


docker compose ps

All 9 containers must show running or healthy. Any container showing exited or restarting requires immediate investigation.

Step 2 — n8n health endpoint


curl -s http://localhost:5678/healthz

Expected response: {"status":"ok"}

Step 3 — Dify Nginx proxy


curl -s -o /dev/null -w "%{http_code}" http://localhost:3001/console/api/setup

Expected: HTTP status below 500 (200, 401, or 404 are all acceptable — any 5xx means the proxy or API is down).

Step 4 — Redis


docker exec iti-redis redis-cli ping

Expected: PONG

Step 5 — PostgreSQL


docker exec iti-postgres pg_isready -U postgres

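Expected: a line ending in "accepting connections", for example /var/run/postgresql:5432 - accepting connections. The prefix shows the socket or host that was checked, so it reads localhost:5432 only when the check goes over TCP.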

Tip: Script these five checks into a shell alias for one-command health verification.
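One way to script the five checks is a small set of shell functions rather than an alias, since a function can report each step separately. The sketch below is a minimal example under stated assumptions: the container names, ports, and URLs are copied from the steps above, while the function names (`check`, `run_health_checks`, and the `*_ok` helpers) are illustrative and not part of the stack. Treating `unhealthy` as a failure in Step 1 is also an assumption added here.

```shell
#!/bin/sh
# Sketch: run the five health checks from section 4.2 in order.
# Function names are illustrative; adjust names and ports to your stack.

# check LABEL COMMAND...: run a command quietly and report PASS or FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

# Step 1: fail if any listed container is exited, restarting, or unhealthy.
containers_ok() {
    out=$(docker compose ps) || return 1
    ! printf '%s\n' "$out" | grep -Eiq 'exited|restarting|unhealthy'
}

# Step 2: n8n must answer {"status":"ok"}.
n8n_ok() {
    curl -s http://localhost:5678/healthz | grep -q '"status":"ok"'
}

# Step 3: the Dify proxy must return an HTTP status below 500.
# curl prints 000 when it cannot connect, which fails the lower bound.
dify_ok() {
    code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:3001/console/api/setup)
    [ "$code" -ge 100 ] && [ "$code" -lt 500 ]
}

# Step 4: Redis must reply PONG.
redis_ok() {
    docker exec iti-redis redis-cli ping | grep -q PONG
}

# Step 5: pg_isready exits 0 only when PostgreSQL accepts connections.
postgres_ok() {
    docker exec iti-postgres pg_isready -U postgres
}

# Run every step, keep going after a failure, return non-zero if any failed.
run_health_checks() {
    failed=0
    check "containers"  containers_ok || failed=1
    check "n8n healthz" n8n_ok        || failed=1
    check "dify proxy"  dify_ok       || failed=1
    check "redis"       redis_ok      || failed=1
    check "postgres"    postgres_ok   || failed=1
    return "$failed"
}
```

After sourcing the file, calling `run_health_checks` from the compose directory prints one PASS or FAIL line per step and returns a non-zero status if any step failed.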


## 4.3 Log Review

Check all services for errors in the last 24 hours


docker compose logs --since 24h | grep -i error

Check a specific service


docker compose logs --since 24h iti-n8n | grep -iE "error|warn|fail"

Save logs to archive


docker compose logs --since 24h > /Users/username/Cursor/Archives/n8n-dify-logs-$(date +%Y%m%d).log

Watch live logs


docker compose logs -f --tail 50
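For a daily review, a count per keyword can be quicker to scan than raw grep output. The following is a minimal sketch, assuming the same three keywords used in the grep above; `summarize_log_levels` is an illustrative name, and a line that contains several keywords is counted once for each.

```shell
# Sketch: summarise a log stream by keyword.
# Reads log lines on stdin and prints a single line of counts.
summarize_log_levels() {
    awk '
        { line = tolower($0) }          # case-insensitive matching
        line ~ /error/ { errors++ }
        line ~ /warn/  { warnings++ }
        line ~ /fail/  { failures++ }
        END {
            printf "error=%d warn=%d fail=%d\n", errors, warnings, failures
        }
    '
}
```

It could be used as `docker compose logs --since 24h | summarize_log_levels`; a sudden jump in any count is a cue to read the matching lines in full.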

## 4.4 Backup

The backup script creates timestamped archives of all databases and n8n data volumes.

Run a backup


cd ITI/infrastructure/n8n-dify
bash backup.sh

Verify the backup files were created


ls -lh /Users/username/Cursor/Archives/ | grep $(date +%Y%m%d)

Expected files for today:

| File | Contents |
|------|----------|
| n8n-db-YYYYMMDD.sql | n8n PostgreSQL dump |
| dify-db-YYYYMMDD.sql | Dify PostgreSQL dump |
| n8n-data-YYYYMMDD.tar.gz | n8n Docker volume data |
| dify-data-YYYYMMDD.tar.gz | Dify Docker volume data |

Warning: If any of these files are zero bytes, the backup failed. Do not proceed with upgrades or destructive operations until you have a verified non-zero backup.
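The zero-byte check can be automated instead of read off the `ls` output. This is a minimal sketch, assuming the four file names listed above; `verify_backups` is an illustrative name, and the archive directory is passed in rather than hard-coded.

```shell
# Sketch: confirm the four backup files for a given day exist and are non-empty.
# Usage: verify_backups ARCHIVE_DIR [YYYYMMDD]   (date defaults to today)
verify_backups() {
    dir=$1
    stamp=${2:-$(date +%Y%m%d)}
    bad=0
    for name in "n8n-db-$stamp.sql" "dify-db-$stamp.sql" \
                "n8n-data-$stamp.tar.gz" "dify-data-$stamp.tar.gz"; do
        # -s is true only if the file exists and is larger than zero bytes.
        if [ -s "$dir/$name" ]; then
            echo "OK      $name"
        else
            echo "FAILED  $name"
            bad=1
        fi
    done
    return "$bad"
}
```

For example, `verify_backups /Users/username/Cursor/Archives && echo "backup verified"` prints the confirmation only when all four files are present and non-empty.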

Backups auto-prune after 7 days, so the Archives directory holds at most the last 7 days of backups. Copy any backup you need to keep longer to another location.
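The contents of backup.sh are not shown in this chapter, so the following is only a sketch of how a 7-day prune step is commonly written with `find`, not a description of what the script actually does. The function name and the file-name patterns are assumptions based on the backup file names listed above.

```shell
# Sketch: delete backup files older than a retention window.
# Usage: prune_old_backups ARCHIVE_DIR [DAYS]   (DAYS defaults to 7)
prune_old_backups() {
    dir=$1
    days=${2:-7}
    # Only match the dump and volume archive patterns, so other files in the
    # directory (for example saved logs) are left alone.
    find "$dir" -type f \( -name '*-db-*.sql' -o -name '*-data-*.tar.gz' \) \
        -mtime +"$days" -exec rm -f {} +
}
```

Note that `-mtime +7` counts whole 24-hour periods, so a file is removed once it is a little over 7 days old rather than exactly at the 7-day mark.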


## 4.5 Restore from Backup

Restore n8n database


docker exec -i iti-postgres psql -U postgres -d n8n < /Users/username/Cursor/Archives/n8n-db-YYYYMMDD.sql

Restore Dify database


docker exec -i iti-postgres psql -U postgres -d dify < /Users/username/Cursor/Archives/dify-db-YYYYMMDD.sql

Restore n8n volume data


docker run --rm \
  -v n8n_data:/data \
  -v /Users/username/Cursor/Archives:/backup \
  alpine sh -c "cd /data && tar xzf /backup/n8n-data-YYYYMMDD.tar.gz"

Restart after restore


docker compose restart

Note: After a database restore, n8n may need active workflows re-published. Check n8n UI > Workflows and verify all expected workflows are active.
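Before running the volume restore, it can help to confirm that the archive is readable, since extracting a corrupt tarball into a live volume leaves it half-restored. A minimal sketch, assuming a gzip-compressed tar file as named in section 4.4; `archive_readable` is an illustrative name.

```shell
# Sketch: check that a .tar.gz archive can be listed end to end.
# Usage: archive_readable /path/to/n8n-data-YYYYMMDD.tar.gz
archive_readable() {
    archive=$1
    if [ ! -s "$archive" ]; then
        echo "empty or missing: $archive"
        return 1
    fi
    # tar -tzf lists the contents without extracting; it fails on a
    # truncated or corrupt archive.
    if tar -tzf "$archive" >/dev/null 2>&1; then
        echo "readable: $archive"
    else
        echo "unreadable: $archive"
        return 1
    fi
}
```

Listing the archive reads it completely, so a passing check suggests the extraction itself is unlikely to fail partway through.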


## 4.6 n8n Execution Monitoring

The n8n UI at http://localhost:5678 provides an Executions panel showing all workflow runs.

What to check weekly

  1. Navigate to Executions in the left sidebar.
  2. Filter by Status: Error.
  3. For each failed execution, open it and read the error message.
  4. Determine if the error is transient (network timeout, API rate limit) or systematic (broken workflow logic, invalid credentials).

Execution data management

n8n stores execution data in the n8n PostgreSQL database. This data grows over time and should be pruned.

Check current execution DB size:


docker exec iti-postgres psql -U postgres -d n8n -c \
  "SELECT pg_size_pretty(pg_database_size('n8n'));"

Configure pruning in n8n UI: Settings > Execution Data > set “Prune data older than” to 30 days.
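On self-hosted n8n, execution pruning is generally controlled through the `EXECUTIONS_DATA_PRUNE` and `EXECUTIONS_DATA_MAX_AGE` environment variables, with the maximum age given in hours rather than days. If the UI setting is not available in your version, the equivalent .env lines can be derived as in this sketch; `n8n_prune_env` is an illustrative helper, and where the lines go (the stack's .env file or the n8n service environment) depends on how this stack is configured.

```shell
# Sketch: print the environment lines for a retention period given in days.
# n8n reads EXECUTIONS_DATA_MAX_AGE in hours, so days are multiplied by 24.
# Usage: n8n_prune_env [DAYS]   (DAYS defaults to 30)
n8n_prune_env() {
    days=${1:-30}
    echo "EXECUTIONS_DATA_PRUNE=true"
    echo "EXECUTIONS_DATA_MAX_AGE=$((days * 24))"
}
```

For the 30-day retention described above this yields a maximum age of 720 hours. The n8n container has to be recreated for changed environment values to take effect.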


## 4.7 Dify Knowledge Base Quality Check

Run this check weekly to ensure retrieval quality has not degraded.

  1. Open the Dify console at http://localhost:3000.
  2. Navigate to Knowledge.
  3. Select a knowledge base used by a production workflow.
  4. Use Retrieval Testing to test 3–5 sample queries.
  5. Verify that the top results are relevant and the scores are above the threshold configured for that KB (typically 0.7+).

If retrieval quality has degraded, consider:

  • Re-indexing the dataset (Knowledge > select dataset > Re-index).
  • Reviewing recently added documents for quality.
  • Adjusting chunk size or overlap settings.
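To compare results week over week, the top score for each sample query can be noted in a plain text file and checked against the threshold. This sketch assumes a hypothetical hand-recorded format of one `score<TAB>query` line per test; neither the format nor the `flag_low_scores` helper comes from Dify.

```shell
# Sketch: flag recorded retrieval scores that fall below a threshold.
# Input on stdin: "score<TAB>query" per line (hypothetical format).
# Usage: flag_low_scores [THRESHOLD]   (THRESHOLD defaults to 0.7)
flag_low_scores() {
    threshold=${1:-0.7}
    awk -F '\t' -v t="$threshold" '
        # Adding 0 forces a numeric comparison.
        $1 + 0 < t + 0 { print "LOW  " $1 "  " $2; low = 1 }
        END { exit low + 0 }    # non-zero exit if any score was low
    '
}
```

A non-zero exit status means at least one query scored below the threshold and the knowledge base is worth a closer look.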

## 4.8 Resource Monitoring

Check memory and CPU for all containers


docker stats --no-stream

Save resource snapshot to archive


docker stats --no-stream > /Users/username/Cursor/Archives/docker-stats-$(date +%Y%m%d).log

Check disk usage


docker system df

If disk usage is high and most of it is reclaimable from images, stopped containers, or build cache, prune with:


docker system prune

Warning: docker system prune removes stopped containers, unused networks, dangling images, and unused build cache. It does NOT remove volumes unless you pass --volumes. Use docker volume prune only after a confirmed backup.
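For the weekly review, the `docker stats` output can be reduced to just the containers that are close to their memory limit. A minimal sketch, assuming input produced with the `.Name` and `.MemPerc` format placeholders of `docker stats`; the 80 percent default and the `flag_heavy_containers` name are illustrative choices, not values from this stack.

```shell
# Sketch: list containers whose memory usage exceeds a percentage limit.
# Input on stdin: "name 12.34%" per line, for example from
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}'
# Usage: flag_heavy_containers [LIMIT_PERCENT]   (defaults to 80)
flag_heavy_containers() {
    limit=${1:-80}
    awk -v limit="$limit" '
        {
            pct = $2
            sub(/%/, "", pct)           # strip the trailing percent sign
            if (pct + 0 > limit + 0)
                print $1 " is at " $2 " memory"
        }
    '
}
```

Piping the formatted stats into `flag_heavy_containers 80` prints nothing when every container is under the limit.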


## 4.9 Incident Triage Decision Tree

When something is wrong, follow this sequence:


1. docker compose ps
   ├── Container stopped?
   │   └── docker compose logs <service> --tail 50
   │       ├── OOM killed? → Increase memory limit or reduce load
   │       ├── Config error? → Fix .env or docker-compose.yml
   │       └── Restart: docker compose up -d <service>
   │
   ├── Container running but unhealthy?
   │   ├── Test the healthcheck endpoint manually (see 4.2)
   │   └── Check logs for crash loops
   │
   ├── Memory pressure?
   │   └── docker stats --no-stream → identify the consumer
   │       └── docker compose restart <heavy-service>
   │
   ├── Disk full?
   │   └── docker system df → docker system prune
   │
   ├── n8n webhooks returning 500?
   │   └── Check n8n Executions panel for error details
   │
   └── Dify returning 502?
       └── docker compose restart iti-dify-api iti-dify-nginx

## 4.10 Container Recovery Commands

| Situation | Command |
|-----------|---------|
| Restart one service | docker compose restart <service> |
| Recreate and restart one service | docker compose up -d --force-recreate <service> |
| Full stack restart | docker compose down && docker compose up -d |
| Nuclear reset (destroys data) | docker compose down -v (only after confirmed backup) |

Previous: Chapter 3 — The Docker Stack | Next: Chapter 5 — Infrastructure Upgrades
