Incident Playbook (Registry + S3 + Governance)
This playbook is for operating Helix as a system-of-record: governed exports, Registry v1, and content-addressed blob storage (FS or S3/MinIO).
Helix is pre-clinical / in-silico software. This playbook is about deterministic artifacts, approvals, and operational controls.
Quick triage (always start here)
- Capture posture (makes screenshots self-explanatory):
helix status
- Check Teams/Registry health:
curl -fsS "$HELIX_REGISTRY_URL/healthz"
- If you have server access, restart and re-check:
# Example (systemd):
sudo systemctl restart helix-teams
curl -fsS "$HELIX_REGISTRY_URL/healthz"
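After a restart the service may need a few seconds before it answers, so a single curl can report a failure that clears on its own. The following is a minimal polling sketch that assumes only the /healthz endpoint shown above. The function name wait_for_healthz and the default attempt count and delay are illustrative, not part of Helix.

```shell
# Hypothetical helper (not a Helix command): poll a health URL until it
# answers with a success status or the attempts run out.
# Usage: wait_for_healthz URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_healthz() {
  hz_url=$1
  hz_attempts=${2:-10}
  hz_delay=${3:-3}
  hz_i=1
  while [ "$hz_i" -le "$hz_attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; --max-time bounds each probe.
    if curl -fsS --max-time 5 "$hz_url" >/dev/null 2>&1; then
      echo "healthy after $hz_i attempt(s)"
      return 0
    fi
    hz_i=$((hz_i + 1))
    sleep "$hz_delay"
  done
  echo "still unhealthy after $hz_attempts attempt(s)" >&2
  return 1
}
```

For example, wait_for_healthz "$HELIX_REGISTRY_URL/healthz" 10 3 waits up to roughly 30 seconds plus probe time and returns non-zero if the registry never becomes healthy.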
Registry is down (or unreachable)
Symptoms
- Official exports fail closed in HELIX_GOVERNANCE_MODE=enforce with a reason like registry_unreachable.
- helix status shows registry_required: yes and registry_url: <...>.
Operator actions
- Confirm the URL and token are set on the exporting machine:
helix status
- Confirm the registry server is reachable:
curl -v "$HELIX_REGISTRY_URL/healthz"
- If the registry is healthy but auth fails, rotate or re-issue a token (see “Token rotation”).
Emergency export (last resort)
Helix defaults to fail-closed when the registry is configured but unreachable. If you must export during an outage:
export HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT=1
helix status
Run the single required export, record the incident ticket ID in your change log, then immediately unset the override:
unset HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT
helix status
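One way to reduce the risk of leaving the override set is to scope it to a single command instead of exporting it into the shell session. The wrapper below is a sketch under that assumption. The name with_emergency_override and the log line format are hypothetical. Only the HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT variable comes from this playbook.

```shell
# Hypothetical wrapper (not a Helix command): run exactly one command with
# the override set, and note the incident ticket on stderr.
# Usage: with_emergency_override TICKET_ID COMMAND [ARGS...]
with_emergency_override() {
  eo_ticket=$1
  shift
  echo "emergency export override: ticket=$eo_ticket at $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
  # env sets the variable for the child process only; the calling shell's
  # environment is left unchanged, so there is nothing to unset afterwards.
  env HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT=1 "$@"
}
```

Pass the ticket id first and your usual export command after it. Running helix status afterwards should show the override is not active in the session.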
S3 / MinIO is down (blob store outage)
Symptoms
- Registry publish/fetch errors referencing missing blobs or permission/network errors.
- helix status shows blob_backend: s3.
Operator actions
- Confirm MinIO is reachable (example):
curl -v "$HELIX_S3_ENDPOINT_URL"
- Confirm S3 credentials are present in the runtime environment (do not paste secrets into tickets):
helix status
- If a single blob is missing, Helix fails closed. Restore it from object storage versioning/backups.
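To confirm credentials are present without printing them, a presence-only check is safer to paste into a ticket than raw environment output. This is a sketch; the function name report_env_presence is hypothetical, and which variable names Helix actually reads beyond HELIX_S3_ENDPOINT_URL is an assumption you should verify for your deployment.

```shell
# Hypothetical helper: report whether each named variable is set and
# non-empty, without ever printing its value.
# Usage: report_env_presence NAME [NAME...]
report_env_presence() {
  ep_missing=0
  for ep_name in "$@"; do
    # Accept only plain identifiers, since the name is passed to eval below.
    case $ep_name in
      ''|[0-9]*|*[!A-Za-z0-9_]*)
        echo "$ep_name: invalid variable name" >&2
        ep_missing=1
        continue
        ;;
    esac
    eval "ep_val=\${$ep_name:-}"
    if [ -n "$ep_val" ]; then
      echo "$ep_name: set"
    else
      echo "$ep_name: MISSING"
      ep_missing=1
    fi
  done
  unset ep_val
  return "$ep_missing"
}
```

For example, report_env_presence HELIX_S3_ENDPOINT_URL AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY, where the two AWS_* names are the conventional S3 client variables and are assumed here rather than confirmed for Helix.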
Emergency waiver (export while not Approved)
This creates an append-only, signed waiver receipt inside the bundle's governance ledger. The receipt is replayable and audit-friendly.
helix governance waive \
--bundle out/bundle_dir \
--transition sha256:... \
--scope export_unapproved \
--justification @incident_1234_waiver.txt \
--role safety_reviewer \
--identity "jane.doe@org.example" \
--signing-key keys/safety_reviewer_ed25519.priv \
--license out/bundle_dir/governance/license.json
Then re-run the export command. Even with a waiver, exports in HELIX_GOVERNANCE_MODE=enforce remain blocked if integrity checks fail.
Token rotation / revocation (Teams/Registry)
Issue a new token (offline helper)
helix teams token --db out/teams/teams.db --user alice --role approver --workspace-id <WORKSPACE_ID>
Disable a compromised token (DB-level fallback)
If you cannot use the admin UI/CLI, disable the token by its SHA-256 hash directly in the SQLite database:
sqlite3 out/teams/teams.db "UPDATE api_tokens SET disabled=1 WHERE token_sha256='<TOKEN_SHA256_HEX>';"
Restart the server if long-lived processes may be caching auth state.
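The UPDATE above needs the hex digest of the compromised token. The sketch below assumes that token_sha256 holds the lowercase hex SHA-256 of the raw token string with no salt or prefix; that is an assumption about the schema, so check it against a known token before relying on it. The function name token_sha256_hex is hypothetical, and sha256sum is the GNU coreutils tool (on macOS, shasum -a 256 is the usual substitute).

```shell
# Hypothetical helper: read a token on stdin and print its SHA-256 as hex.
# ASSUMPTION: the database stores an unsalted SHA-256 of the raw token text.
# Reading from stdin keeps the token out of shell history and process args.
token_sha256_hex() {
  # Strip CR/LF so a trailing newline in the input does not change the digest.
  tr -d '\r\n' | sha256sum | cut -d ' ' -f 1
}
```

For example, token_sha256_hex < compromised_token.txt prints the value to substitute for <TOKEN_SHA256_HEX>. Delete the token file once the token is disabled.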
Audit: who exported what (last 7 days)
Teams records mutating actions in audit_events. To review the last 7 days:
sqlite3 out/teams/teams.db "
SELECT ts, user_id, action, entity_type, entity_id
FROM audit_events
WHERE ts >= datetime('now', '-7 days')
ORDER BY ts DESC;
"
If you need a machine-readable export for incident response, redirect to a file and attach it to the ticket.
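The redirect can be sketched with the sqlite3 CLI's CSV mode, as below. The function name export_audit_csv and the output path are illustrative. The query is the one above, and it assumes ts is stored as UTC text in a format that compares correctly with SQLite's datetime() output (for example YYYY-MM-DD HH:MM:SS). If ts uses another format, the 7-day filter needs adjusting.

```shell
# Hypothetical helper: write the last 7 days of audit events to a CSV file.
# Usage: export_audit_csv DB_PATH OUTPUT_CSV
export_audit_csv() {
  ac_db=$1
  ac_out=$2
  # -readonly avoids accidental writes to the live database;
  # -header and -csv produce a header row followed by CSV records.
  sqlite3 -readonly -header -csv "$ac_db" "
    SELECT ts, user_id, action, entity_type, entity_id
    FROM audit_events
    WHERE ts >= datetime('now', '-7 days')
    ORDER BY ts DESC;
  " > "$ac_out"
}
```

For example, export_audit_csv out/teams/teams.db incident_1234_audit.csv produces a file that can be attached to the ticket. Review it for sensitive identifiers first.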