Helix validation

Scientific validation (planned)

What we plan to track for lab-facing accuracy checks:

CRISPR outcome distributions vs small public datasets (frameshift %, del/ins mix).
Prime editing predicted efficiencies / outcome profiles vs published assays.
Basic metrics: correlation, KL divergence, frameshift accuracy, top-outcome match.

Status (seed entries)

Dataset	Modality	Metrics	Observed vs Predicted	Notes
BRCA1 exon demo (synthetic)	CRISPR	cut%, frameshift%, small_del vs small_ins	TODO: drop numbers	Synthetic scaffold; replace with public figure once picked
peg RTT=GATTACA (synthetic)	PRIME	intended_prime%, no_cut%, frameshift%	TODO: drop numbers	Synthetic; swap for a published peg when available

Planned CLI (future):

helix-cli validate --dataset path/to/ground_truth.json --backend gpu → prints metrics summary.

Notes:

Helix scoring versions (CRISPR/Prime) are recorded in every run/report; use those when comparing.
Engine backend (GPU/CPU) is captured via EngineInfo for reproducibility.

Validation packs (shipped)

Helix also ships a pack runner that produces a single machine-parsable verdict artifact suitable for CI and archival.

List packs: helix validate list
Run a pack: helix validate pack ampseq_indel_profile_v1 --outdir out/validation --ci
Sign the verdict: helix validate pack ampseq_indel_profile_v1 --outdir out/validation --sign-private-key publisher.ed25519
Verdict schema: schemas/validation/helix_validation_verdict_v1.json

Shipped packs (Public Validation Pack v1):

Pack	Demonstrates	Mode	Typical runtime	Key outputs (besides `verdict.json`)
`ampseq_indel_profile_v1`	Paired-end FASTQ ➜ deterministic editing outcomes	Exact-golden	Seconds	`artifacts/alleles.tsv`, `artifacts/top_alleles.tsv`, `artifacts/indel_hist.json`, `artifacts/summary.json`
`crisprresso_import_v1`	CRISPResso folder ➜ normalized artifacts	Exact-golden	Seconds	`artifacts/editing_summary.tsv`, `artifacts/alleles.tsv`, `artifacts/metadata.json`
`bam_qc_coverage_v1`	BAM ➜ QC counts + coverage bins + per-region coverage	Exact-golden	Seconds	`artifacts/summary.json`, `artifacts/coverage_hist.json`, `artifacts/region_coverage.tsv`
`vcf_qc_annot_v1`	VCF ➜ QC + lightweight annotation join	Exact-golden	Seconds	`artifacts/qc_summary.json`, `artifacts/annotated_variants.tsv`
`signed_plugin_trust_chain_v1`	Signed plugin install/load + tamper control	Exact-golden	Seconds	`artifacts/trust_chain.json`

Killer snippet (trust posture in 30 seconds):

helix validate pack signed_plugin_trust_chain_v1 --ci --explain

Negative controls (intentional mismatch that still returns OK + crisp --explain):

helix validate pack ampseq_indel_profile_v1 --case negative --ci --explain

Verify a signed verdict:

helix plugins keygen --out-dir out/keys --name publisher
helix validate pack ampseq_indel_profile_v1 --outdir out/validation --sign-private-key out/keys/publisher.ed25519
helix verify verdict out/validation/ampseq_indel_profile_v1/verdict.json --pubkey out/keys/publisher.pub

Verify a reference validation bundle (directory or .tar.gz):

helix verify bundle reference_validation_bundle/
helix verify bundle helix-reference-validation-bundle-v1.2.3-linux.tar.gz

Reference validation bundle contents:

publisher.pub: public key for all verdict signatures in the bundle
bundle.index.json: inventory of verdict files + sha256 + expected status
verdicts/**/verdict.json: signed verdict artifacts (one per pack + case)
verdicts/**/verdict.summary.txt: human-readable summaries

Machine summary output (useful for CI / Teams acceptance gates):

helix verify bundle helix-reference-validation-bundle-v1.2.3-linux.tar.gz --json-out verify_bundle.json

Pack fixtures are bundled under src/helix/datasets/validation_packs/ and each pack includes a manifest.json with per-file SHA-256.

UI layout snapshots

Studio layout regressions are tracked via tools/snap_layout.py, which writes artifacts/layout/*.png at common resolutions. CI workflow .github/workflows/ui-layout.yml runs this on PRs that touch Studio UI; reviewers can open the uploaded PNGs to sanity-check proportions and docking.