WorkflowPatch

n8n crash triage proof

Do not touch the migration first.

A first paid slice for self-hosted n8n instability. Redacted or synthetic incident signals become a triage ledger, migration-readiness gates, backup verification checks, and blocked rows before anyone attempts a MySQL-to-Postgres cutover. No client data, credentials, live n8n instance, or database connection is used.

8 incident signals mapped to triage rows
6 migration gates before cutover work
5 backup and restore checks
3 blocked rows that stop unsafe action

Input

Eight redacted or synthetic incident signals from Docker logs, n8n env, MySQL status, retention settings, workflow history, and backup notes.

Output

Eight triage rows, six migration gates, and five backup checks that separate safe proof work from production cutover risk.

Failure Path

Unproven restore evidence, credential export, and missing downtime/rollback approval are blocked visibly.
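
The three blocking conditions above can be sketched as a small gate. This is a minimal illustration, not the actual WorkflowPatch implementation: the `CutoverEvidence` class, its field names, and the `blocking_reasons` function are hypothetical names introduced here, and the evidence values are synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutoverEvidence:
    """Hypothetical evidence record; field names are assumptions for illustration."""
    restore_rehearsed: bool            # restore proven on a disposable target
    credential_export_requested: bool  # someone asked to export credentials
    downtime_approved: bool            # owner signed off on a downtime window
    rollback_approved: bool            # owner signed off on a rollback plan


def blocking_reasons(evidence: CutoverEvidence) -> list[str]:
    """Return every reason the cutover must stay blocked (empty list = no block)."""
    reasons = []
    if not evidence.restore_rehearsed:
        reasons.append("restore evidence is unproven")
    if evidence.credential_export_requested:
        reasons.append("credential export is out of scope for the first proof")
    if not (evidence.downtime_approved and evidence.rollback_approved):
        reasons.append("downtime/rollback approval is missing")
    return reasons


# Synthetic example: nothing has been proven or approved yet.
unproven = CutoverEvidence(
    restore_rehearsed=False,
    credential_export_requested=True,
    downtime_approved=False,
    rollback_approved=False,
)
for reason in blocking_reasons(unproven):
    print("BLOCKED:", reason)
```

Returning the full list of reasons, rather than stopping at the first failure, keeps every block visible at once, which matches the intent of surfacing blocked rows instead of hiding them.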

Triage ledger excerpt

Triage | Status | Diagnosis | Next Action | Approval
TRI-1001 | review | Restart storm during webhook burst. | Freeze workflow edits and collect container logs before any migration. | required
TRI-1002 | review | Execution table bloat is a likely database-lock source. | Measure table sizes and slow queries; define pruning window. | required
TRI-1003 | ready | Queue mode is missing from the current architecture. | Prepare main/webhook/worker split plan with Redis/Postgres dependencies. | required
TRI-1004 | blocked | Restore path is unproven. | Run a restore rehearsal on a disposable target before touching production. | required
TRI-1005 | blocked | Credential export is not needed for first proof. | Keep credentials in owner-controlled account and document access boundaries. | required
TRI-1006 | review | Workflow-change loss already happened. | Add workflow JSON export and commit-style snapshot before cutover. | required
TRI-1007 | review | Slow execution inserts support the MySQL bottleneck hypothesis. | Verify with slow-query samples and execution retention counts. | required
TRI-1008 | ready | Execution retention is too long for high-volume workflows. | Propose save-on-error and max-age pruning after backup proof. | required
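
The ledger rows above lend themselves to a simple status check. The sketch below copies only the row IDs and statuses from the excerpt; the `LEDGER` structure and the helpers `status_counts`, `blocked_ids`, and `cutover_allowed` are hypothetical, and the rule "any blocked row stops cutover" is an assumption about how the ledger is meant to be read.

```python
from collections import Counter

# Row IDs and statuses as listed in the excerpt; the tuple layout is an assumption.
LEDGER = [
    ("TRI-1001", "review"),
    ("TRI-1002", "review"),
    ("TRI-1003", "ready"),
    ("TRI-1004", "blocked"),
    ("TRI-1005", "blocked"),
    ("TRI-1006", "review"),
    ("TRI-1007", "review"),
    ("TRI-1008", "ready"),
]


def status_counts(ledger):
    """Count rows per status."""
    return Counter(status for _, status in ledger)


def blocked_ids(ledger):
    """List the IDs of rows whose status is 'blocked', in ledger order."""
    return [row_id for row_id, status in ledger if status == "blocked"]


def cutover_allowed(ledger):
    """Assumed rule: cutover work may start only when no row is blocked."""
    return not blocked_ids(ledger)


print(dict(status_counts(LEDGER)))
print("blocked:", blocked_ids(LEDGER))
print("cutover allowed:", cutover_allowed(LEDGER))
```

Every row in the excerpt also carries `required` in the Approval column, so a real check would likely need an approval step even for `ready` rows; that is left out here to keep the sketch short.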