Cross-Version Semantic Drift Checks

Goal: catch silent meaning changes across releases by running the same bundle through two Helix versions and comparing outputs.

Quick start (two installed versions)

# previous release in /tmp/helix-prev, current repo in /home/chris/helix
/tmp/helix-prev/bin/helix run repro/helix_repro_bundle_v1/inputs/case01.spec.json --out /tmp/out-prev --backends cpu-reference
helix run repro/helix_repro_bundle_v1/inputs/case01.spec.json --out /tmp/out-curr --backends cpu-reference
python tools/semantic_drift.py --prev /tmp/out-prev/cpu-reference/run.json --curr /tmp/out-curr/cpu-reference/run.json

The script normalizes known nondeterministic fields and reports PASS/FAIL with a focused diff.

CI hook (optional)

Keep the previous release wheel cached (e.g., pip install helix-governance==<last> into .cache/helix-prev).
Run tools/semantic_drift.py --prev-cmd .cache/helix-prev/bin/helix --curr-cmd ./venv/bin/helix in CI.
Fail the job if drift is detected. If the change is intentional, update the expected tolerances or document it in the release notes.
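
The CI steps above could be wrapped in a small gate script. This is a minimal sketch, not the project's actual CI code: it assumes tools/semantic_drift.py exits with a non-zero status when drift is detected (the text only says it reports PASS/FAIL), and the helper names build_drift_command and gate are hypothetical.

```python
import subprocess
import sys


def build_drift_command(prev_cmd, curr_cmd, script="tools/semantic_drift.py"):
    """Assemble the argv for the drift check, using the flags shown above."""
    return [sys.executable, script, "--prev-cmd", prev_cmd, "--curr-cmd", curr_cmd]


def gate(argv):
    """Run a command and return its exit status without raising.

    Assumption: a non-zero status from the drift script means FAIL.
    """
    completed = subprocess.run(argv, check=False)
    return completed.returncode
```

A CI step would then call sys.exit(gate(build_drift_command(".cache/helix-prev/bin/helix", "./venv/bin/helix"))) so that the job status mirrors the script's result.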

What is compared

Canonical repro spec: repro/helix_repro_bundle_v1/inputs/case01.spec.json (extendable).
Primary comparison: run.json payloads for D0 backends; for D1 backends, payloads are compared within stored tolerances when available.
Normalized out: env_fingerprint, helix_version, git_sha, and timestamps.
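
The normalization and comparison described above can be illustrated as follows. This is a sketch under stated assumptions, not the contents of tools/semantic_drift.py: the run.json layout is not specified here, so the code treats it as arbitrary nested JSON, and it guesses that timestamps appear under a key named "timestamp" or a key ending in "_at". The function names normalize and diff and the single tol parameter are hypothetical.

```python
import math

# Field names from the list above. How timestamps are keyed is an assumption.
NORMALIZED_KEYS = {"env_fingerprint", "helix_version", "git_sha", "timestamp"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def normalize(value):
    """Recursively drop fields that are expected to differ between runs."""
    if isinstance(value, dict):
        return {
            key: normalize(item)
            for key, item in value.items()
            if key not in NORMALIZED_KEYS and not key.endswith("_at")
        }
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


def diff(prev, curr, tol=0.0, path="$"):
    """Return a list of human-readable differences; empty means PASS.

    tol=0.0 demands exact equality; tol>0 lets numeric leaves differ
    by up to tol (relative or absolute).
    """
    if isinstance(prev, dict) and isinstance(curr, dict):
        out = []
        for key in sorted(set(prev) | set(curr)):
            sub = f"{path}.{key}"
            if key not in prev:
                out.append(f"{sub}: added")
            elif key not in curr:
                out.append(f"{sub}: removed")
            else:
                out.extend(diff(prev[key], curr[key], tol, sub))
        return out
    if isinstance(prev, list) and isinstance(curr, list):
        if len(prev) != len(curr):
            return [f"{path}: length {len(prev)} -> {len(curr)}"]
        out = []
        for index, (p, c) in enumerate(zip(prev, curr)):
            out.extend(diff(p, c, tol, f"{path}[{index}]"))
        return out
    if tol > 0 and _is_number(prev) and _is_number(curr):
        if math.isclose(prev, curr, rel_tol=tol, abs_tol=tol):
            return []
    if prev == curr:
        return []
    return [f"{path}: {prev!r} -> {curr!r}"]
```

Calling diff(normalize(prev), normalize(curr)) with the default tol gives the exact comparison; passing a stored tolerance as tol gives the tolerant one. An empty list corresponds to PASS, and a non-empty list is the focused diff.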

When drift is acceptable

Drift is acceptable only when it is intentional and documented: include a drift_reason in the release notes and update the conformance packs and fixtures.

Extending coverage

Add more specs to repro/ and reference them in tools/semantic_drift.py.
Store prior-version expected outputs in artifacts/semantic_drift/<tag>/... if you need offline comparisons.
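
One way to pick up additional specs without listing each one by hand is to discover them from the directory layout. This is only a sketch: it assumes new bundles follow the same <bundle>/inputs/<name>.spec.json layout as the canonical case01 spec, and discover_specs is a hypothetical helper, not an existing function in tools/semantic_drift.py.

```python
from pathlib import Path


def discover_specs(repro_root):
    """Return spec files under repro_root, sorted for a stable run order.

    Assumption: every bundle keeps its specs in <bundle>/inputs/*.spec.json.
    """
    root = Path(repro_root)
    return sorted(p for p in root.glob("*/inputs/*.spec.json") if p.is_file())
```

Sorting keeps the comparison order identical across machines, so the drift report itself does not vary between runs.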