Helix CLI docs

Incident Playbook (Registry + S3 + Governance)

This playbook is for operating Helix as a system of record: governed exports, Registry v1, and content-addressed blob storage (FS or S3/MinIO).

Helix is pre-clinical / in-silico software. This playbook is about deterministic artifacts, approvals, and operational controls.

Quick triage (always start here)

  1. Capture the current posture so that screenshots are self-explanatory:
helix status
  2. Check Teams/Registry health:
curl -fsS "$HELIX_REGISTRY_URL/healthz"
  3. If you have server access, restart the service and re-check:
# Example (systemd):
sudo systemctl restart helix-teams
curl -fsS "$HELIX_REGISTRY_URL/healthz"
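A service that has just been restarted may need a few seconds before it answers, so one immediate health check can fail even though the restart worked. Below is a minimal sketch of a retry wrapper in POSIX sh. The `retry` function and the attempt and delay values are illustrative assumptions and are not part of Helix.

```sh
# Hypothetical helper (not part of Helix): run a command up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Return success as soon as the command exits 0.
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "retry: gave up after $attempts attempts: $*" >&2
  return 1
}
```

For example, `retry 10 3 curl -fsS --max-time 5 "$HELIX_REGISTRY_URL/healthz"` makes up to 10 attempts, 3 seconds apart. The `--max-time` flag keeps a hung connection from stalling the whole loop.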

Registry is down (or unreachable)

Symptoms

  • Official exports fail closed in HELIX_GOVERNANCE_MODE=enforce with a reason like registry_unreachable.
  • helix status shows registry_required: yes and registry_url: <...>.

Operator actions

  1. Confirm that the registry URL and token are set on the exporting machine:
helix status
  2. Confirm that the registry server is reachable:
curl -v "$HELIX_REGISTRY_URL/healthz"
  3. If the registry is healthy but authentication fails, rotate or re-issue the token (see “Token rotation / revocation”).

Emergency export (last resort)

By default, Helix fails closed when a registry is configured but unreachable. If you must export during an outage:

export HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT=1
helix status

Run only the single required export, record the incident ticket ID in your change log, and then immediately unset the override:

unset HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT
helix status
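The override is an ordinary environment variable. If someone forgets the `unset`, the machine can keep exporting unregistered artifacts. Scoping the override to a subshell prevents that, because the variable disappears when the subshell exits. Below is a minimal sketch in POSIX sh. The `emergency_export` function and the `HELIX_CHANGE_LOG` path are hypothetical, and the command you pass in is whatever export you would normally run.

```sh
# Hypothetical wrapper (not part of Helix): run ONE command with the
# unregistered-export override set, and log the incident ticket ID first.
# Usage: emergency_export TICKET_ID command [args...]
emergency_export() {
  if [ "$#" -lt 2 ] || [ -z "$1" ]; then
    echo "usage: emergency_export TICKET_ID command [args...]" >&2
    return 2
  fi
  local ticket=$1
  shift
  # Append an audit line to a local change log (the path is an assumption).
  printf '%s ticket=%s cmd=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$ticket" "$*" \
    >> "${HELIX_CHANGE_LOG:-./change-log.txt}"
  # The parentheses create a subshell, so the override exists only for this
  # one command and is gone when it exits, whether it succeeds or fails.
  (
    export HELIX_REGISTRY_ALLOW_UNREGISTERED_EXPORT=1
    "$@"
  )
}
```

Afterwards, run `helix status` as usual to confirm that the calling shell never picked up the override.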

S3 / MinIO is down (blob store outage)

Symptoms

  • Registry publish or fetch errors that mention missing blobs, permission failures, or network failures.
  • helix status shows blob_backend: s3.

Operator actions

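  1. Confirm that the S3/MinIO endpoint is reachable (example). Any HTTP response, even an error status such as 403, means the endpoint is up. A DNS, connection, or TLS error means it is not:
curl -v "$HELIX_S3_ENDPOINT_URL"
  2. Confirm that S3 credentials are present in the runtime environment (do not paste secrets into tickets):
helix status
  3. Helix fails closed if even a single blob is missing. Restore the missing blob from object-storage versioning or backups.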
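Blobs are content-addressed, so you can check a restored object before putting it back: its SHA-256 should match the digest it is stored under. Below is a minimal sketch in POSIX sh. It assumes the blob's address is a lowercase hex SHA-256 digest, optionally written with a `sha256:` prefix; the exact key layout in your bucket is not documented here. It also assumes GNU coreutils `sha256sum` is available. On macOS, `shasum -a 256` prints the same format. `verify_blob` is a hypothetical helper.

```sh
# Hypothetical helper: check that FILE hashes to EXPECTED, a hex SHA-256
# digest given with or without a "sha256:" prefix.
# Usage: verify_blob FILE EXPECTED
verify_blob() {
  local file=$1 expected=${2#sha256:} actual
  if [ ! -r "$file" ]; then
    echo "verify_blob: cannot read $file" >&2
    return 2
  fi
  # Hash via stdin so the output is "<hex>  -"; keep only the hex digest.
  actual=$(sha256sum < "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    return 0
  fi
  echo "verify_blob: digest mismatch for $file: got $actual, want $expected" >&2
  return 1
}
```

Run it on the downloaded copy of the restored version before you upload that copy to the live key. A mismatch means the backup is not the blob Helix expects.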

Emergency waiver (export while not Approved)

This command writes a signed waiver receipt to the bundle's append-only governance ledger. The receipt is replayable and audit-friendly.

helix governance waive \
  --bundle out/bundle_dir \
  --transition sha256:... \
  --scope export_unapproved \
  --justification @incident_1234_waiver.txt \
  --role safety_reviewer \
  --identity "jane.doe@org.example" \
  --signing-key keys/safety_reviewer_ed25519.priv \
  --license out/bundle_dir/governance/license.json

Then re-run the export command. A waiver does not bypass integrity checks: in HELIX_GOVERNANCE_MODE=enforce, exports remain blocked if those checks fail.
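The waiver command references several local files. A typo in any of those paths only surfaces when the command runs, which is usually mid-incident. Below is a minimal pre-flight sketch in POSIX sh. The `check_waiver_inputs` function is hypothetical. It only checks that the paths exist, that the justification file is non-empty, and that the private key does not have its "others may read" permission bit set.

```sh
# Hypothetical pre-flight check (not part of Helix) for the waiver inputs.
# Usage: check_waiver_inputs BUNDLE_DIR JUSTIFICATION_FILE SIGNING_KEY LICENSE
check_waiver_inputs() {
  local bundle=$1 justification=$2 key=$3 license=$4 ok=0
  [ -d "$bundle" ] || { echo "missing bundle dir: $bundle" >&2; ok=1; }
  [ -s "$justification" ] || { echo "missing or empty justification: $justification" >&2; ok=1; }
  [ -f "$license" ] || { echo "missing license: $license" >&2; ok=1; }
  if [ ! -f "$key" ]; then
    echo "missing signing key: $key" >&2
    ok=1
  elif [ -n "$(find "$key" -perm -004)" ]; then
    # -perm -004 matches when the "others may read" bit is set.
    echo "signing key is world-readable: $key" >&2
    ok=1
  fi
  return "$ok"
}
```

For the example above, the call would be `check_waiver_inputs out/bundle_dir incident_1234_waiver.txt keys/safety_reviewer_ed25519.priv out/bundle_dir/governance/license.json`. Note that the justification path is passed without the leading `@`.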

Token rotation / revocation (Teams/Registry)

Issue a new token (offline helper)

helix teams token --db out/teams/teams.db --user alice --role approver --workspace-id <WORKSPACE_ID>

Disable a compromised token (DB-level fallback)

If you cannot use the admin UI or CLI, disable the token directly in SQLite, matching it by its SHA-256 hash:

sqlite3 out/teams/teams.db "UPDATE api_tokens SET disabled=1 WHERE token_sha256='<TOKEN_SHA256_HEX>';"

If long-lived server processes cache auth state, restart the server so that the change takes effect.
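The UPDATE needs the hex SHA-256 of the compromised token, and it helps to confirm that a row actually matched. Below is a minimal sketch in POSIX sh. It assumes the stored hash is the SHA-256 of the raw token string with no trailing newline; verify this against your Teams version before relying on it. `token_sha256` and `disable_token` are hypothetical helpers. `changes()` is a built-in SQLite function that returns the number of rows changed by the most recent statement on the same connection.

```sh
# Hypothetical helper: print the lowercase hex SHA-256 of a token string.
# printf '%s' is used instead of echo so that no trailing newline is hashed.
token_sha256() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Hypothetical helper: disable a token by hash and print how many rows
# changed (expect 1). The hash is hex-only, so it is safe to inline in SQL.
# Usage: disable_token DB TOKEN
disable_token() {
  local db=$1 hash
  hash=$(token_sha256 "$2")
  sqlite3 "$db" "UPDATE api_tokens SET disabled=1 WHERE token_sha256='$hash'; SELECT changes();"
}
```

An output of `0` means no row matched. In that case, check the token value and the hashing assumption before concluding that the token is disabled.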

Audit: who exported what (last 7 days)

Teams records mutating actions in the audit_events table. To review the last 7 days:

sqlite3 out/teams/teams.db "
  SELECT ts, user_id, action, entity_type, entity_id
  FROM audit_events
  WHERE ts >= datetime('now', '-7 days')
  ORDER BY ts DESC;
"

If you need a machine-readable export for incident response, redirect the output to a file and attach it to the ticket.
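The sqlite3 CLI's `-header` and `-csv` options turn the query above into a CSV file with a header row, which spreadsheets and scripts can ingest directly. Below is a minimal sketch in POSIX sh that wraps the same query. `export_audit` is a hypothetical helper, and the day count is validated because it is interpolated into the SQL.

```sh
# Hypothetical helper: write the last DAYS days of audit_events to OUTFILE
# as CSV with a header row, using the same query as above.
# Usage: export_audit DB DAYS OUTFILE
export_audit() {
  local db=$1 days=$2 out=$3
  # Accept only a non-negative integer, since it is spliced into the SQL.
  case $days in
    ''|*[!0-9]*) echo "export_audit: DAYS must be a non-negative integer" >&2; return 2 ;;
  esac
  sqlite3 -header -csv "$db" "
    SELECT ts, user_id, action, entity_type, entity_id
    FROM audit_events
    WHERE ts >= datetime('now', '-$days days')
    ORDER BY ts DESC;
  " > "$out"
}
```

For example, `export_audit out/teams/teams.db 7 audit_last7d.csv` produces a file you can attach to the ticket. CSV mode may use CRLF line endings, so strip `\r` if downstream tools are strict about line endings.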