Data Model Ergonomics and Schema Hygiene
Canonical paths and naming rules to prevent drift across artifacts, bundles, and policies.
Canonical JSON pointers (core artifacts)
- `/schema/kind` and `/schema/version`: schema identity on every JSON artifact.
- `/schema_versions` (bundle-level): registry of schemas used by files inside the bundle.
- `/env_fingerprint/helix_version`, `/env_fingerprint/git_sha`, `/env_fingerprint/python/version`: build/runtime anchors.
- `/case_id` and `/runs/*/run_id` (for repro specs), `/backends/<name>/files/*/sha256` (for manifests), `/policy/path` (relative path to policy JSON).
- `/manifest_sha256` (bundle manifest), `/model_packs` entries (name + sha256) when present.
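A minimal sketch of how the canonical pointers could be checked against an artifact. The `resolve` and `missing_pointers` helpers and the sample payload are hypothetical, not part of the project. The `*` wildcard is treated here as "every key or element", which is a convention of this sketch rather than standard RFC 6901 JSON Pointer syntax, and `~0`/`~1` escapes are not handled.

```python
from typing import Any, Iterator, List


def resolve(doc: Any, pointer: str) -> Iterator[Any]:
    """Yield every value in `doc` matched by a slash-separated pointer.

    A `*` segment matches every value of an object or every element of a list.
    """
    segments = [s for s in pointer.split("/") if s]

    def walk(node: Any, depth: int) -> Iterator[Any]:
        if depth == len(segments):
            yield node
            return
        seg = segments[depth]
        if isinstance(node, dict):
            if seg == "*":
                children = list(node.values())
            else:
                children = [node[seg]] if seg in node else []
        elif isinstance(node, list):
            if seg == "*":
                children = node
            elif seg.isdigit() and int(seg) < len(node):
                children = [node[int(seg)]]
            else:
                children = []
        else:
            children = []
        for child in children:
            yield from walk(child, depth + 1)

    return walk(doc, 0)


def missing_pointers(doc: Any, pointers: List[str]) -> List[str]:
    """Return the pointers that match nothing in `doc`."""
    return [p for p in pointers if not list(resolve(doc, p))]


# Hypothetical repro-spec payload; field values are illustrative only.
artifact = {
    "schema": {"kind": "repro_spec", "version": "1.0"},
    "case_id": "case-001",
    "runs": [{"run_id": "run-a"}, {"run_id": "run-b"}],
}
REQUIRED = ["/schema/kind", "/schema/version", "/case_id", "/runs/*/run_id"]

print(missing_pointers(artifact, REQUIRED))  # prints []
```

A check like this only confirms that a path is present; type and value constraints still belong in the JSON Schema itself.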
Naming conventions
- Keys: snake_case; enums: lower_snake; metrics: dot-separated (`repro.run.time_ms`, `viz.render.frames`).
- Policy/profile ids: `policy.<domain>.<intent>` (e.g., `policy.repro.strict`).
- Selectors: explicit suffixes (`*_id`, `*_sha256`, `*_path`).
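The conventions above could be linted with a few regular expressions. This is a sketch under assumptions: the exact character classes (digits allowed after the first letter, exactly three dot-separated parts in a policy id) are guesses drawn from the examples, and `PATTERNS`, `is_valid`, and `selector_suffix` are hypothetical names.

```python
import re
from typing import Optional

# Assumed patterns, inferred from the examples in the conventions list.
PATTERNS = {
    "key": re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*"),
    "metric": re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+"),
    "policy_id": re.compile(r"policy\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*"),
}

SELECTOR_SUFFIXES = ("_id", "_sha256", "_path")


def is_valid(kind: str, name: str) -> bool:
    """Return True if `name` fully matches the pattern registered for `kind`."""
    return PATTERNS[kind].fullmatch(name) is not None


def selector_suffix(key: str) -> Optional[str]:
    """Return the explicit selector suffix carried by `key`, or None."""
    for suffix in SELECTOR_SUFFIXES:
        if key.endswith(suffix):
            return suffix
    return None


print(is_valid("key", "manifest_sha256"))            # prints True
print(is_valid("key", "runId"))                      # prints False
print(is_valid("metric", "repro.run.time_ms"))       # prints True
print(is_valid("policy_id", "policy.repro.strict"))  # prints True
print(selector_suffix("run_id"))                     # prints _id
```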
Documentation from schemas
- Keep JSON Schemas in `schemas/`; update `docs/schema-reference.md` via `helix schema manifest --out docs/schema-reference.md` when schemas change.
- When adding a new schema, include: human doc stub under `docs/`, sample payload under `docs/assets/` or `tests/data/`, and a pointer in `mkdocs.yml` if user-facing.
Strict validation
- Prefer `additionalProperties: false` for core objects; whitelist extension points explicitly.
- Lint new/changed artifacts with `python -m jsonschema` and the bundled schemas; add unit tests under `tests/` for every new schema.
- CI should run a schema drift check (e.g., `tests/test_schema_manifest.py`) and fail on unknown fields.
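To illustrate what "fail on unknown fields" means with one whitelisted extension point, here is a standard-library-only sketch. It is not a JSON Schema validator: it consults only `properties`, `additionalProperties`, and `items`, and assumes every subschema is a dict. The schema, the payload, and the `unknown_fields` helper are hypothetical; in practice the `jsonschema` library with the bundled schemas would do the real validation.

```python
from typing import Any, Dict, List


def unknown_fields(instance: Any, schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return pointer-style paths of fields that a closed object does not declare."""
    found: List[str] = []
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        # An object is "closed" only when additionalProperties is exactly false.
        closed = schema.get("additionalProperties", True) is False
        for key, value in instance.items():
            child_path = f"{path}/{key}"
            if key in props:
                found.extend(unknown_fields(value, props[key], child_path))
            elif closed:
                found.append(child_path)
    elif isinstance(instance, list) and isinstance(schema.get("items"), dict):
        for index, item in enumerate(instance):
            found.extend(unknown_fields(item, schema["items"], f"{path}/{index}"))
    return found


# Hypothetical core schema: closed objects plus one open `extensions` map.
CORE_SCHEMA = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "schema": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"kind": {"type": "string"}, "version": {"type": "string"}},
        },
        "case_id": {"type": "string"},
        "extensions": {"type": "object", "additionalProperties": True},
    },
}

artifact = {
    "schema": {"kind": "repro_spec", "version": "1.0", "flavor": "x"},
    "case_id": "case-001",
    "extensions": {"vendor_note": "ok"},  # allowed: explicit extension point
    "debug": True,                        # rejected: not declared anywhere
}

print(unknown_fields(artifact, CORE_SCHEMA))  # prints ['/schema/flavor', '/debug']
```

Keeping the open map under a single named key means vendor or experimental data has one predictable home, while every other object stays closed.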
Change control
- Any schema change must note compatibility impact in `docs/policies/compatibility_deprecation.md` and add/adjust conformance coverage.
- Unknown fields in core artifacts are rejected unless a specific `extensions` map is defined for that schema.