n8n crash triage proof
Do not touch the migration first.
A first paid slice for self-hosted n8n instability. Redacted or synthetic incident signals become a triage ledger, migration-readiness gates, backup verification checks, and blocked rows before anyone attempts a MySQL-to-Postgres cutover. No client data, credentials, live n8n instance, or database connection is used.
8 incident signals mapped to triage rows
6 migration gates before cutover work
5 backup and restore checks
3 blocked rows that stop unsafe action
Triage Ledger CSV
Migration Readiness CSV
Backup Verification CSV
Error Log CSV
Source Signals CSV
Runbook
Source Script
Input
Eight redacted or synthetic incident signals from Docker logs, n8n env, MySQL status, retention settings, workflow history, and backup notes.
Output
Eight triage rows, six migration gates, and five backup checks that separate safe proof work from production cutover risk.
Failure Path
Unproven restore evidence, credential export, and missing downtime/rollback approval are blocked visibly.
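As a rough illustration of the failure path, the three blocking conditions can be expressed as a small check that returns every reason a cutover is blocked. This is a minimal sketch, not the proof's source script: the `Evidence` class, its field names, and `blocked_reasons` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence flags; field names are assumptions, not the proof's schema."""
    restore_rehearsed: bool            # restore proven on a disposable target
    credential_export_requested: bool  # someone asked to export credentials
    downtime_rollback_approved: bool   # owner approved downtime and rollback plan


def blocked_reasons(evidence: Evidence) -> list[str]:
    """Return every reason the cutover is blocked; an empty list means no block."""
    reasons = []
    if not evidence.restore_rehearsed:
        reasons.append("restore evidence unproven")
    if evidence.credential_export_requested:
        reasons.append("credential export requested")
    if not evidence.downtime_rollback_approved:
        reasons.append("downtime/rollback approval missing")
    return reasons


if __name__ == "__main__":
    # Worst case: all three blocking conditions are present.
    for reason in blocked_reasons(Evidence(False, True, False)):
        print("BLOCKED:", reason)
```

Returning all reasons rather than stopping at the first keeps each block visible, which matches the intent of surfacing blocked rows instead of hiding them.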
Triage ledger excerpt
| Triage | Status | Diagnosis | Next Action | Approval |
|---|---|---|---|---|
| TRI-1001 | review | Restart storm during webhook burst. | Freeze workflow edits and collect container logs before any migration. | required |
| TRI-1002 | review | Execution table bloat is a likely database-lock source. | Measure table sizes and slow queries; define pruning window. | required |
| TRI-1003 | ready | Queue mode is missing from the current architecture. | Prepare main/webhook/worker split plan with Redis/Postgres dependencies. | required |
| TRI-1004 | blocked | Restore path is unproven. | Run a restore rehearsal on a disposable target before touching production. | required |
| TRI-1005 | blocked | Credential export is not needed for first proof. | Keep credentials in owner-controlled account and document access boundaries. | required |
| TRI-1006 | review | Workflow-change loss already happened. | Add workflow JSON export and commit-style snapshot before cutover. | required |
| TRI-1007 | review | Slow execution inserts support the MySQL bottleneck hypothesis. | Verify with slow-query samples and execution retention counts. | required |
| TRI-1008 | ready | Execution retention is too long for high-volume workflows. | Propose save-on-error and max-age pruning after backup proof. | required |
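A ledger like the excerpt above can be summarized by status before any cutover discussion. The sketch below is an assumption-laden illustration: the column names (`triage`, `status`, `approval`) and the inline `LEDGER_CSV` sample are reconstructed from the excerpt and may not match the layout of the actual Triage Ledger CSV, and `summarize` and `cutover_allowed` are hypothetical helper names.

```python
import csv
import io
from collections import Counter

# Inline sample rebuilt from the excerpt above; the real CSV's column
# names and layout are an assumption here.
LEDGER_CSV = """
triage,status,approval
TRI-1001,review,required
TRI-1002,review,required
TRI-1003,ready,required
TRI-1004,blocked,required
TRI-1005,blocked,required
TRI-1006,review,required
TRI-1007,review,required
TRI-1008,ready,required
"""


def summarize(csv_text: str) -> tuple[Counter, list[str]]:
    """Count rows per status and list the IDs of blocked rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text.strip())))
    counts = Counter(row["status"] for row in rows)
    blocked = [row["triage"] for row in rows if row["status"] == "blocked"]
    return counts, blocked


def cutover_allowed(csv_text: str) -> bool:
    """Assumed rule: any blocked row stops cutover work."""
    _, blocked = summarize(csv_text)
    return not blocked


if __name__ == "__main__":
    counts, blocked = summarize(LEDGER_CSV)
    print(dict(counts))
    print("blocked rows:", blocked)
    print("cutover allowed:", cutover_allowed(LEDGER_CSV))
```

The rule in `cutover_allowed` is deliberately conservative: it only checks for blocked rows and says nothing about rows still in review, which would need their own approval step.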