Legacy README (full manual). For the short GitHub front door, see README.md on GitHub: https://github.com/omniscoder/Helix/blob/master/README

Helix Studio — The Genome IDE (Helix kernel + CLI)

Badges: PyPI · Reproducible Viz (spec v1.0) · Benchmarks · Verified by VeriBiota · VeriBiota CI · Scientific Contract CI

Helix Studio is the Visual Studio of bioinformatics: a snapshot-driven IDE that helps labs run tight CRISPR/Prime design loops — design → simulate → compare → report — with deterministic replays and exportable, provenance-stamped artifacts. PCR and Lightcone GPU viz now ship alongside CRISPR/Prime.

Under the hood is the Helix kernel + headless CLI: the same simulation + reporting contract, runnable in CI/pipelines and exportable into HTML/PNG/JSON evidence bundles.

👉 Docs & Playground: https://omniscoder.github.io/Helix/
👉 Strategy Playbook: docs/helix_studio_strategy
👉 Snapshot Spec v1: docs/snapshot_spec_v1
👉 Headless CLI: docs/cli_headless
👉 UI/Theming Instructions: docs/instructions
👉 Studio/CLI reports: see quick commands below
👉 Changelog: https://github.com/omniscoder/Helix/blob/master/CHANGELOG
👉 Scientific Contract: docs/scientific_contract_v1

Tests: use tools/test.sh (Linux/macOS) or tools/test.ps1 (Windows). These are the only supported test entrypoints. GPU/Lightcone lanes are opt-in via env flags (see runbooks/lightcone_runbook.md).

Why Helix Is Auditable (TL;DR)

  • Audit in minutes: follow the Audit Walkthrough (hash check → helix verify --kind auto → read taint/trust).
  • Contracts baked in: scientific contract + D2 fixtures live in repro/scientific_contract_v1/ (see docs/scientific_contract_v1).
  • Conformance on demand: ./tools/conformance.sh runs the D0/D1 packs; add to any CI lane.
  • Longevity promise: bundles remain verifiable for 5 years; see docs/policies/longevity.
  • Plugin governance: signed/tainted plugin rules in docs/policies/plugin_governance.
  • D2 replay guard: distribution-check fixture and replay requirements shipped with the contract bundle (bundle_d2/verify.sh).
  • One-command trust check: helix trust check --backend cpu-reference (uses repro spec) → PASS banner or a focused diff + policy hash.
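
The "hash check" step in the walkthrough comes down to recomputing a file digest and comparing it with a recorded value. The sketch below shows that idea with Python's standard hashlib. The helper names `sha256_of` and `matches_recorded` are hypothetical, and where Helix records the expected digest is not assumed here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path: Path, recorded_hex: str) -> bool:
    """Compare a freshly computed digest with a recorded one (case-insensitive)."""
    return sha256_of(path) == recorded_hex.strip().lower()
```

Reading in chunks keeps memory flat for large bundles; the comparison is a plain string equality on lowercase hex.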

Model Limitations (CRISPR Edit DAG)

  • DSBs are modeled as instantaneous transitions; no continuous-time kinetics or latent repairable states.
  • Probability mass is conserved within modeled outcomes; cell death, arrest, and large-scale genome instability are out-of-scope.
  • “No Edit” collapses multiple kinetic failure modes into a single terminal.
  • “PAM softness” is an abstract parameter, not a binding-energy or kinetic-rate model.
  • Dominant-path visualization is a convenience; tail-risk outcomes are not visualized.

Full limitations and invalid regimes: docs/model_limitations/crispr_edit_dag_limitations.

First Run (copy/paste, ~2 minutes)

Goal: install → open a demo session → export a report.

macOS/Linux:

python -m venv .venv
source .venv/bin/activate
pip install "helix-governance"

curl -L -o demo_session.helix https://raw.githubusercontent.com/omniscoder/Helix/v1.0.2/docs/demo/demo_session.helix
helix-cli report --session demo_session.helix --outdir reports_demo/
python -m http.server --directory reports_demo 8000

Windows (PowerShell):

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install "helix-governance"

Invoke-WebRequest -Uri https://raw.githubusercontent.com/omniscoder/Helix/v1.0.2/docs/demo/demo_session.helix -OutFile demo_session.helix
helix-cli report --session demo_session.helix --outdir reports_demo\
python -m http.server --directory reports_demo 8000

If you want the GUI: pip install "helix-governance[studio]" then launch helix-studio and click Open CRISPR demo. Lightcone panel ships behind the same extra; GPU path requires NVIDIA OpenGL (see runbooks/lightcone_runbook.md).

Latest release: v1.0.2 – “Partner eval + intake pipeline”

  • One-command partner evaluation: helix partner run --outdir out --with-support-bundle --json-out out/partner_run.json
  • Intake + ledger automation: tools/partner_intake.py --partner-run … --json-out … --strict plus tools/partner_case.py and tools/partner_followup.py
  • Repro bundle v1 + backend parity: helix run --out repro_out --backends all and helix verify --kind repro repro_out
  • Lightcone GPU + audit packs: Studio Lightcone panel exports audit zips; verify with tools/lightcone_verify_audit_pack.py
  • Design Partner pack ships in design_partner/ and release assets

Why Helix?

  • End-to-end edit DAGs – CRISPR, Prime, and PCR simulations all produce explicit edit DAGs with per-node genome materialization, log-probabilities, and provenance. The same artifacts drive notebooks, PNGs, and interactive web/desktop viz.
  • Blueprint-level provenance – every DAG, PNG, and .viz.json is tagged with schema kind, spec version, SHA-256, and runtime metadata. You can always answer “where did this figure come from?”.
  • Realtime + batch in one toolkit – run tiny “bench” genomes in a notebook, or stream full edit DAGs into a GUI / Playground via JSONL frames for live inspection.
  • Formal verification hooks – optional VeriBiota integration re-encodes Helix DAGs in Lean and proves structural + probabilistic invariants. If the badge is green, the math checked out.
  • Teaching-friendly, research-grade – small, inspectable examples and clear CLI affordances, but under the hood you get FM-indexes, De Bruijn graphs, MinHash/HLL, prime editing physics, and CRISPR scoring engines.

Why Helix Is Auditable

  • Explicit, enforced contract — policy + semantics are schemaed, hash-pinned, and embedded in every artifact header; determinism class is required.
  • Determinism with receipts — D0/D1 comparators plus D2 distribution checks (topKMass, entropy, noEditRate, scoreQuantiles, probMassSum) with tolerances and mass conservation.
  • Replay you can trust — optional deterministic replay via replaySampler: module:callable, import-hardened; logs show replay_used, sampler_available, seedDomain, and sampler id; provenance records the replay op.
  • Trust/taint enforcement — plugin- or viz-derived data cannot silently upgrade trust; verifier fails if headers or trust labels don’t align.
  • CI proof: repro/scientific_contract_v1/bundle_d2/verify.sh runs in CI (optional pass + strict fail) to guard against regressions.
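
To make the D2 distribution checks concrete, here is a minimal sketch that computes a few of the named summaries from a label-to-probability mapping. The metric names come from the list above; their exact definitions, the `no_edit` label, and the default `k` are assumptions for illustration, not the contract's specification.

```python
import math


def distribution_summary(outcomes: dict[str, float], k: int = 3,
                         no_edit_label: str = "no_edit") -> dict[str, float]:
    """Summarize an outcome distribution (label -> probability)."""
    probs = sorted(outcomes.values(), reverse=True)
    return {
        "probMassSum": math.fsum(probs),
        "topKMass": math.fsum(probs[:k]),
        # Shannon entropy in bits; zero-probability outcomes contribute nothing.
        "entropy": -math.fsum(p * math.log2(p) for p in probs if p > 0),
        "noEditRate": outcomes.get(no_edit_label, 0.0),
    }


def within_tolerance(a: dict[str, float], b: dict[str, float], tol: float = 1e-6) -> bool:
    """True when every metric of two summaries agrees within an absolute tolerance."""
    return all(abs(a[key] - b[key]) <= tol for key in a)
```

Comparing summaries within a tolerance, rather than comparing raw samples, is what lets two runs count as equivalent even when individual draws differ.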

Pinned policy profiles (ready to copy)

  • policies/dev-fast.json — day-to-day D1 determinism, strict float rules, no replay requirement, plugin trust isolation via headers/taint.
  • policies/audit-strict.json — D2 with distribution checks, replay required, headers required, plugin influence rejected, viz transforms enforced.
  • Studio shows “Policy locked: <name>” when HELIX_POLICY_PATH is set; CI defaults to policies/audit-strict.json.

Verified by VeriBiota™

Helix’s CRISPR, Prime, and PCR DAGs carry the Verified by VeriBiota badge when every generated artifact satisfies:

  • Structural correctness: unique root, acyclic topology, monotonic depth, and rule-consistent transitions.
  • Semantic correctness: every edit event matches the formal CRISPR/Prime/PCR rewrite semantics tracked in Lean.
  • Probability sanity: outgoing probabilities and terminal probabilities sum to ≈1 with no negative or undefined weights.
  • Reproducibility: deterministic replays under recorded seeds, provenance-aligned frame streams, and matching sequence hashes.
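
The structural and probability conditions above can be illustrated on a toy graph. This sketch is not the VeriBiota checker or the Lean encoding: the node/edge representation (a set of node ids plus `(parent, child, probability)` tuples) is a hypothetical stand-in, and every edge is assumed to reference a known node.

```python
from collections import defaultdict, deque


def check_dag(nodes: set[str], edges: list[tuple[str, str, float]],
              tol: float = 1e-9) -> list[str]:
    """Return human-readable violations; an empty list means all checks passed."""
    problems = []
    children = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for parent, child, prob in edges:
        if not (0.0 <= prob <= 1.0):  # also rejects NaN
            problems.append(f"bad weight on {parent}->{child}: {prob}")
        children[parent].append((child, prob))
        indegree[child] += 1

    roots = [n for n, d in indegree.items() if d == 0]
    if len(roots) != 1:
        problems.append(f"expected one root, found {len(roots)}")

    # Kahn's algorithm: if some node is never released, the graph has a cycle.
    remaining = dict(indegree)
    queue = deque(roots)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child, _ in children.get(node, ()):
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        problems.append("graph contains a cycle")

    # Outgoing probabilities of every non-terminal node should sum to ~1.
    for parent, outgoing in children.items():
        total = sum(p for _, p in outgoing)
        if abs(total - 1.0) > tol:
            problems.append(f"outgoing mass at {parent} is {total}")
    return problems
```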

The contract is documented in docs/veribiota and enforced by the helix veribiota lean-check, preflight, and export-dags commands + the veribiota.yml GitHub Action badge above. If you see the badge, the math has run.

What Helix Is

Think of Helix as:

  • A computational wet lab you can run in a notebook.
  • A CRISPR/Prime/PCR digital twin simulator using deterministic rules and DAGs.
  • A teaching toolkit for genomics algorithms and bioinformatics basics.
  • A bridge into OGN, sharing concepts and API style while staying lightweight and experimental.

Everything Helix emits is a software artifact: JSON, PNG, YAML, GraphML, viz-specs, or CLI summaries. No wet-lab instructions. No reagent guidance.

Highlights

Core genome + sequence tools

  • DNA summaries, k-mer counting, GC skew, motif clustering
  • FM-index search + approximate search via Myers bit-vector
  • Minimizers, syncmers, and seed-extend demos
  • ORF detection, frameshift heuristics, translation, protein metrics
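
As a rough illustration of the first bullet, the following sketch counts overlapping k-mers and computes a windowed GC skew. It is a textbook version written for this manual, not the code behind `helix dna`; the function names are hypothetical.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gc_skew(seq: str, window: int, step: int) -> list[float]:
    """GC skew (G - C) / (G + C) per sliding window; 0.0 when a window has no G or C."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```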

RNA folding + ensembles

  • Zuker-style MFE
  • McCaskill partition function
  • Ensembles, dot-plots, entropy tracks, centroid structures

CRISPR simulation stack

  • Guide discovery with PAM registries
  • Off-target scanning (exact + mismatch-tolerant)
  • Probabilistic CRISPR cut/repair simulation
  • Edit DAGs (helix.crispr.edit_dag.v1.1) with node-level genome snapshots
  • Multi-guide, region-specific, FASTA-native support
  • Real-time JSONL frames for animation + Playground sync
  • Protein-impact annotation via transcript models

Prime editing simulator

  • pegRNA + Prime Editor definitions
  • RTT/RTT-branching logic
  • Outcome probabilities
  • Full Prime Edit DAGs with per-node genome materialization
  • Multi-peg batch workflows for large design sets

PCR amplicon DAGs

  • Primer binding + per-cycle branching
  • Amplicon growth simulation
  • Visualization + JSON artifact export
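
For intuition about amplicon growth, the standard textbook model is N_c = N_0 · (1 + E)^c, where E is the per-cycle efficiency between 0 and 1. The sketch below implements only that closed-form idea. It is not Helix's per-cycle branching model, and the function name is hypothetical.

```python
def amplicon_copies(initial_copies: float, efficiency: float, cycles: int) -> list[float]:
    """Expected copy number after each cycle under N_c = N_0 * (1 + E)^c."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    copies = [float(initial_copies)]
    for _ in range(cycles):
        # Each cycle multiplies the pool by (1 + E); E = 1 means perfect doubling.
        copies.append(copies[-1] * (1.0 + efficiency))
    return copies
```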

Graphs & sketching

  • Build/clean/color De Bruijn graphs
  • MinHash/HLL sketching for fast genome distance
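
A minimal MinHash sketch shows why sketching gives a fast distance estimate: the fraction of matching signature slots approximates the Jaccard similarity of the two k-mer sets. This is a generic illustration using hashlib's BLAKE2b with a per-slot salt; Helix's own hashing scheme and parameters are not assumed.

```python
import hashlib


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence (the sequence must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(kmers: set[str], num_hashes: int = 64) -> list[int]:
    """One minimum per salted hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + kmer.encode(), digest_size=8).digest(), "big")
            for kmer in kmers
        ))
    return signature


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions where the two sketches agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

More hash functions tighten the estimate at the cost of a longer signature.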

Reproducible visualization

  • Viz-spec system (structured spec, SHA-verified)
  • Every PNG ships a provenance sidebar + sibling .viz.json
  • CLI + notebook-friendly

Workflows & experiments

  • YAML-driven experiments: CRISPR, Prime, PCR
  • Generates DAGs, PNGs, reports, provenance manifests
  • Reproducible from a single .helix.yml

GUI + realtime viz

  • Optional PySide6 desktop shell
  • Cytoscape/D3 visualization of live edit DAG streams
  • Offline-capable; accepts CLI artifacts and real-time frames

Studio / CLI report exports

  • Save a multi-run session from Studio as .helix, then:
    • All runs: helix-cli report --session my_session.helix --outdir reports/
    • Single run: helix-cli report --session my_session.helix --run-id demo_crispr_g2 --outdir reports/
  • Each HTML report includes guide info, physics + backend (GPU/CPU), outcomes JSON + CSV, and run metadata (project/sample/author/run_id).

Headless simulation + reports

  • Simulate from a JSON config into a session file: helix-cli simulate guides.json batch.helix
  • Append more runs: helix-cli simulate more_guides.json batch.helix --append
  • Then export reports (all or filtered by run_id) with helix-cli report ... as above.
  • Config schema documented in docs/cli_config_schema.md (CRISPR + PRIME via prime.pbs_sequence / prime.rt_sequence fields).
  • Quickstart demo: open docs/demo/demo_session.helix in Studio or run helix-cli report --session docs/demo/demo_session.helix --outdir reports_demo/.

Benchmarks & validation

  • Quick throughput check: helix-cli bench --backend gpu (prints CRISPR MPairs/s and PRIME preds/s)
  • Validation scaffold: see docs/validation/README.md (datasets/metrics to be filled as we publish comparisons).

Visual regression (EVS → PNG)

  • Install dev deps (in a venv): python -m pip install -r requirements-dev.txt
  • Run visual tests headless: HELIX_RUN_EVS_VISUAL=1 QT_QPA_PLATFORM=offscreen tools/test.sh -q tests/evs_visual (Windows: powershell -ExecutionPolicy Bypass -File tools/test.ps1 -q tests/evs_visual)
  • Quick smoke: scripts/run_evs_visual_smoke.sh
  • Add/refresh goldens: see docs/evs_visual.md / docs/evs_visual_tests.md

Screenshots (docs/img)

  • helix_studio_start_1920x1080.png — Start panel with CRISPR/Prime demos
  • helix_studio_evs_overlay_1920x1080.png — Outcome Explorer overlay (CRISPR baseline vs Prime candidate)
  • helix_studio_prime_inspector_1920x1080.png — Prime demo with RTT/PBS inspector rail

Quick Start

CRISPR Edit DAG

helix crispr dag \
  --genome genome.fna \
  --guide-sequence GGGGTTTAGAGCTATGCT \
  --json crispr_dag.json

Prime Editing Simulator

helix prime dag \
  --genome genome.fna \
  --peg-config peg.json \
  --editor-config pe3.json \
  --json prime_dag.json

PCR Amplicon DAG

helix pcr dag \
  --genome plasmid.fna \
  --primer-config primers.json \
  --pcr-config pcr_settings.json \
  --out pcr_dag.json

Rendered edit DAGs (examples)

Helix ships two reference DAG renders so you can see the provenance-stamped figures the CLI produces.

Regenerate them yourself with the bundled example payloads:

python -m helix.cli edit-dag viz --input examples/crispr_edit_dag.json --out docs/img/crispr_dag.png
python -m helix.cli edit-dag viz --input examples/prime_edit_dag.json  --out docs/img/dag.png

CRISPR edit DAG

Prime edit DAG

RNA Folding

helix rna mfe --fasta hairpin.fna --dotbracket mfe.dbn
helix rna ensemble --fasta hairpin.fna --gamma 1.0 --dotplot plot.png

Sequence Tools

helix dna --input seq.fna --window 400 --k 5 --plot-skew
helix seed index seq.fna --method minimizer --k 15 --plot seeds.png
helix string search seqs.fna --pattern GATTACA --k 1
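
To show what an approximate search with `--k 1` means, here is a plain dynamic-programming sketch (Sellers' algorithm) that reports every text position where the pattern matches with at most k edits. Helix lists a Myers bit-vector search, which speeds up this same recurrence; the version below is the unaccelerated form and the function name is hypothetical.

```python
def approximate_matches(text: str, pattern: str, k: int) -> list[tuple[int, int]]:
    """Return (end_index, edit_distance) for each text position where the pattern
    matches some substring ending there with at most k edits."""
    m = len(pattern)
    # column[i] = best edit distance between pattern[:i] and a substring of text
    # ending at the current position. Row 0 stays 0 so matches may start anywhere.
    column = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        previous_diag = column[0]  # column[i-1] as it was at the previous text position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            current = min(previous_diag + cost,  # match or substitution
                          column[i] + 1,         # extra character in the text
                          column[i - 1] + 1)     # pattern character skipped
            previous_diag = column[i]
            column[i] = current
        if column[m] <= k:
            hits.append((j, column[m]))
    return hits
```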

Engine Backends

CRISPR and Prime editing share the same encoded physics boundary, so every simulator and CLI call can swap kernels without touching call sites. Pick one of the following backends per run (details + throughput numbers live in docs/engine_architecture.md):

  • cpu-reference – pure Python reference implementation, always available.
  • native-cpu – pybind11 extension from src/helix_engine, fastest on CPUs.
  • gpu – CUDA build of the same kernels for machines compiled with HELIX_ENGINE_ENABLE_CUDA=ON.

Prime editing mirrors the same backend preference automatically so both engines emit consistent artifacts.

Selecting a Backend

Set the backend globally with environment variables:

  • HELIX_CRISPR_BACKEND / HELIX_PRIME_BACKEND: cpu-reference, native-cpu, or gpu.
  • HELIX_CRISPR_ALLOW_FALLBACK / HELIX_PRIME_ALLOW_FALLBACK: 1 enables automatic fallback to the Python reference backend when the requested engine is unavailable; 0 keeps failures loud.

CLI helpers accept the same knobs:

helix crispr dag \
  --engine-backend gpu \
  --engine-allow-fallback \
  --genome genome.fna \
  --guide-sequence GGGGTTTAGAGCTATGCT \
  --json crispr_dag.json
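
The selection and fallback semantics described above can be summarized in a few lines. This is a sketch of the documented behavior only, not Helix's resolver: the function, its `available` argument, and the default used when the variable is unset are all assumptions.

```python
def resolve_backend(env: dict[str, str], available: set[str],
                    var: str = "HELIX_CRISPR_BACKEND",
                    fallback_var: str = "HELIX_CRISPR_ALLOW_FALLBACK",
                    default: str = "cpu-reference") -> str:
    """Pick the backend that would run, or raise when fallback is disabled."""
    requested = env.get(var, default)  # default for an unset variable is assumed
    if requested in available:
        return requested
    if env.get(fallback_var, "0") == "1":
        return "cpu-reference"  # described above as always available
    raise RuntimeError(f"backend {requested!r} is unavailable and fallback is disabled")
```

In a real process you would pass `dict(os.environ)` as `env`; `helix engine info` remains the authoritative view of what was actually resolved.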

Studio inherits the process environment, so the same settings power live viz. Use helix engine info (or python -m helix.cli engine info) to see the resolved backend + fallback behavior that will be stamped into artifacts:

$ helix engine info
{
  "selected_backend": "native-cpu",
  "allow_fallback": false,
  "native_available": true,
  "native_backend": "available",
  "env_backend": null,
  "env_allow_fallback": null
}

Native backend availability (Path B)

  • A compiled helix_engine._native wheel ships for CPython 3.12. If it is present, helix engine info shows native_backend: available; otherwise it reports missing.
  • Developers can force tests to require the extension by running with HELIX_REQUIRE_NATIVE=1; the suite will fail fast if the module is absent.
  • CI has a dedicated tests-native-backend job that builds/loads the native backend and runs the native-focused tests to prevent silent bitrot.
  • Studio shows a status-bar hint (native backend: available/missing) so you can tell at a glance which path you’re on.

Performance Benchmark

helix engine benchmark reproduces the CRISPR/Prime workloads from docs/engine_architecture.md. It prints a human-readable table and can emit JSON when you want to compare against CI/doc baselines.

helix engine benchmark
helix engine benchmark --backends cpu-reference native-cpu --json helix_benchmark.json

Sample output:

CRISPR throughput (MPairs/s)
requested     actual           G       N     L    MPairs/s
cpu-reference cpu-reference     1     512    20        0.06
native-cpu    native-cpu        1     512    20        0.90

Prime throughput (predictions/sec)
backend      targets    genome  spacer   preds/sec
cpu-reference      32     2000      20     5634.48

JSON schema (v1) excerpt:

{
  "helix_version": "1.0.2",
  "scoring_versions": {"crispr": "1.0.0", "prime": "1.0.0"},
  "env": {"platform": "...", "python_version": "3.12.3", "cuda_available": false},
  "seed": 1,
  "config": {"backends": ["cpu-reference"], "crispr_shapes": ["1x512x20"]},
  "benchmarks": {
    "crispr": [{"backend_requested": "cpu-reference", "shape": "1x512x20", "mpairs_per_s": 0.06}],
    "prime": [{"workload": "32x2000x20", "predictions_per_s": 5634.48}]
  }
}

The full schema is documented in docs/engine_architecture.md.
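
One way to use the JSON output is to compare a fresh run against a stored baseline. The sketch below relies only on the fields visible in the excerpt above (`benchmarks.crispr[].backend_requested`, `shape`, `mpairs_per_s`); the function names and the 10% threshold are hypothetical choices.

```python
def crispr_throughput(report: dict) -> dict[tuple[str, str], float]:
    """Map (backend_requested, shape) -> MPairs/s from a benchmark report."""
    return {
        (row["backend_requested"], row["shape"]): row["mpairs_per_s"]
        for row in report["benchmarks"]["crispr"]
    }


def regressions(baseline: dict, current: dict, max_drop: float = 0.10) -> list[str]:
    """List entries whose throughput fell by more than max_drop (a fraction) vs baseline."""
    base, cur = crispr_throughput(baseline), crispr_throughput(current)
    flagged = []
    for key, old in base.items():
        new = cur.get(key)
        if new is not None and new < old * (1.0 - max_drop):
            flagged.append(f"{key[0]} {key[1]}: {old} -> {new} MPairs/s")
    return flagged
```

Load each report with `json.load` before passing it in; entries missing from the current run are skipped rather than flagged.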

GPU Backend (Experimental)

Helix ships CPU-only wheels on PyPI. To try the CUDA backend locally:

  1. Install CUDA toolkit + drivers (12.x tested).
  2. Build the native extension:
    ./scripts/build_native_cuda.sh
    
  3. Verify it works:
    HELIX_CRISPR_BACKEND=gpu helix engine benchmark --backends gpu --json gpu_benchmark.json
    

If _native*.so is missing, the CLI automatically falls back to the Python reference backend and records the actual backend that ran. Lightcone GPU rendering also requires NVIDIA OpenGL; see runbooks/lightcone_runbook.md for the pinned-runner expectations.

Prime Physics Scoring

Prime simulations can surface physics heuristics (PBS ΔG, microhomology, flap competition) by enabling --physics-score:

helix prime simulate --physics-score --json prime.json \
  --genome genome.fna \
  --peg-config peg.json \
  --editor-config editor.json

Each payload now carries a physics_score block:

"physics_score": {
  "pbs_dG": 2.50,
  "flap_ddG": 2.30,
  "microhomology": 1,
  "P_RT": 0.182,
  "P_flap": 0.373,
  "E_pred": 0.054
}

See docs/prime_physics.md for a deep dive on each term and how to interpret them when ranking peg designs.
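
If you collect several payloads, a simple ranking pass might look like the sketch below. It assumes that a higher `E_pred` indicates a better design and that each payload is a dict carrying the `physics_score` block shown above; both are assumptions to confirm against docs/prime_physics.md. The `name` key used in the usage note is hypothetical.

```python
def rank_by_predicted_efficiency(payloads: list[dict]) -> list[dict]:
    """Sort payloads from highest to lowest physics_score.E_pred.
    Payloads without a score sort last."""
    def key(payload: dict) -> float:
        return payload.get("physics_score", {}).get("E_pred", float("-inf"))
    return sorted(payloads, key=key, reverse=True)
```

For example, `[p["name"] for p in rank_by_predicted_efficiency(payloads)]` would list designs best-first under that assumption.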

Building with CUDA (Optional)

The CUDA backend ships in src/helix_engine. Build it with CMake and enable the toggle mentioned in docs/engine_cuda_plan.md:

cmake -S src/helix_engine -B build/helix_engine -DHELIX_ENGINE_ENABLE_CUDA=ON \
      -DPython_EXECUTABLE="$(which python)"
cmake --build build/helix_engine --target helix_engine_py
cp build/helix_engine/helix_engine_py.*.so \
   src/helix_engine/_native$(python3-config --extension-suffix)

Once _native*.so is on PYTHONPATH, HELIX_CRISPR_BACKEND=gpu helix ... will exercise the CUDA kernels; helix_engine.native.cuda_available() reports the hardware status.

Scoring Versioning & Metadata

Helix freezes its physics numerics via CRISPR_SCORING_VERSION and PRIME_SCORING_VERSION. Every artifact stamped by the simulators or CLI exports carries:

  • crispr_engine_backend / prime_engine_backend
  • crispr_scoring_version / prime_scoring_version
  • runtime fields such as helix_version, timestamps, command replay

Example snippet:

"meta": {
  "helix_version": "1.0.2",
  "crispr_engine_backend": "gpu",
  "crispr_scoring_version": "1.0.0",
  "prime_engine_backend": "cpu-reference",
  "prime_scoring_version": "1.0.0"
}

If you deliberately change scoring math, bump the scoring version, refresh the golden fixtures, and update the release notes.
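
Because scoring versions are stamped into every artifact, a comparison script can refuse to mix results produced under different numerics. The sketch below reads only the `meta` keys shown in the snippet above; the helper name is hypothetical.

```python
SCORING_KEYS = ("crispr_scoring_version", "prime_scoring_version")


def scoring_mismatches(meta_a: dict, meta_b: dict) -> dict[str, tuple]:
    """Return the scoring-version keys whose values differ between two meta blocks."""
    return {
        key: (meta_a.get(key), meta_b.get(key))
        for key in SCORING_KEYS
        if meta_a.get(key) != meta_b.get(key)
    }
```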

Reproducible Viz (Spec 1.x)

Every plot:

  • stamps a provenance footer (Helix version, SHA-256, viz kind)
  • emits a .viz.json viz-spec
  • stores full command replay in <image>.provenance.json

Use:

helix viz --schema
helix schema manifest
helix schema diff old.json new.json

Workflows

Turn a single YAML file into a full simulation:

helix experiment new --type crispr --out demo.helix.yml
helix experiment run --config demo.helix.yml --out dag.json
helix experiment viz --config demo.helix.yml --out dag.png

Everything—FASTA, guides, Cas config, seed, viz parameters—is captured and reproducible.

Repo Layout

src/helix/
    crispr/         # CRISPR models, guides, simulators, edit DAGs
    prime/          # Prime editing engine + DAGs
    pcr/            # PCR amplicon simulator
    rna/            # MFE + ensemble folding
    bioinformatics/ # DNA utilities, k-mers, motif clustering
    seed/           # minimizers/syncmers + mapping demo
    string/         # FM-index + Myers ED
    graphs/         # DBG tooling
    viz/            # spec-driven visualization system
    workflows/      # YAML runner
    gui/            # optional PySide6 desktop app
benchmarks/
docs/
examples/
tests/

Getting Started

Requirements

  • Python 3.10+ (3.12 tested)
  • pip or another package manager
  • Optional extras: matplotlib for plotting, biopython for protein helpers, pyyaml for workflow configs (already included in base deps).

Installation

Stable release from PyPI (installs CLI + package):

python -m venv .venv
source .venv/bin/activate
pip install "helix-governance[viz,protein,schema]"

Need only the core library? Drop the extras (viz/matplotlib, protein/Biopython, schema/pydantic). For local development, clone the repo and run:

pip install -e ".[dev]"

This exposes the helix console command and the helix Python package (from helix import bioinformatics).

Run a Script

  • K-mer + skew analysis

    helix dna --input path/to/sequence.fna --window 400 --step 50 --k 5 --plot-skew
    

    Change the GC window/step, filter top k-mers, or point at the bundled dataset src/helix/datasets/dna/plasmid_demo.fna. For quick clustering with exports, try python examples/kmer_counter.py --max-diff 1 --csv clusters.csv --plot-top 10.

  • Neural net demo

    python ann.py
    

    Prints training progress and final weights for a tiny XOR-style problem.

  • Translate a sequence

    python examples/translate_sequence.py AUGGCCUUU
    

    Add --no-stop to continue through stop codons or point to a file with --input.

  • Find ORFs

    python examples/find_orfs.py --min-length 90 --include-partial --detect-frameshifts --input your_sequence.fna --orf-fasta peptides.faa --orf-csv orfs.csv --frameshift-csv shifts.csv
    

    Prints coordinates, frames, strands, optional frameshift candidates, and can export FASTA/CSV artifacts.

  • Cyclo-spectrum playground

    python examples/cyclospectrum_demo.py --peptide NQEL --spectrum "0,113,114,128,227,242,242,355,356,370,371,484"
    

    Print linear/cyclic spectra, score against an experiment, or recover candidate peptides with the leaderboard search.
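
For readers new to the idea, this sketch computes the theoretical spectrum of a cyclic peptide and scores it against an experimental spectrum by multiset overlap. It is a textbook version, not the code in examples/cyclospectrum_demo.py, and the mass table covers only the four residues of the example peptide (integer masses).

```python
from collections import Counter

# Integer residue masses for the amino acids in the example peptide NQEL.
MASS = {"N": 114, "Q": 128, "E": 129, "L": 113}


def cyclic_spectrum(peptide: str, mass: dict[str, int] = MASS) -> list[int]:
    """Masses of all cyclic subpeptides, plus 0 and the full peptide mass, sorted."""
    n = len(peptide)
    prefix = [0]
    for residue in peptide:
        prefix.append(prefix[-1] + mass[residue])
    total = prefix[-1]
    spectrum = [0, total]
    for i in range(n):
        for j in range(i + 1, n + 1):
            if j - i == n:
                continue  # the full peptide was already added once
            linear = prefix[j] - prefix[i]
            spectrum.append(linear)  # subpeptide that does not wrap around
            if i > 0 and j < n:
                spectrum.append(total - linear)  # complementary wrap-around subpeptide
    return sorted(spectrum)


def spectrum_score(peptide: str, experimental: list[int]) -> int:
    """Number of masses shared between theoretical and experimental spectra (with multiplicity)."""
    shared = Counter(cyclic_spectrum(peptide)) & Counter(experimental)
    return sum(shared.values())
```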

  • RNA folding trace

    python examples/nussinov_trace.py --input hairpin.fasta --min-loop 4
    

    Outputs the dot-bracket structure, base-pair list, and optional file export using the upgraded Nussinov implementation.
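
The core of the Nussinov recurrence fits in a short function. The sketch below returns only the maximum number of nested base pairs and omits the traceback that would produce the dot-bracket string. It is a textbook version, not the upgraded implementation in examples/nussinov_trace.py, and the pairing set (Watson-Crick plus G-U wobble) is an assumption.

```python
def nussinov_pairs(seq: str, min_loop: int = 4) -> int:
    """Maximum number of nested base pairs, with at least min_loop unpaired
    bases enclosed by every hairpin."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]            # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    value = max(value, left + 1 + best[k + 1][j - 1])
            best[i][j] = value
    return best[0][n - 1] if n else 0
```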

  • Protein summary

    helix protein --input src/helix/datasets/protein/demo_protein.faa --window 11 --top 8
    

    Computes molecular weight, charge, hydropathy windows, and more (requires the protein extra / Biopython).

  • Unified Helix CLI

    helix dna --sequence ACGTACGT --k 4
    helix spectrum --peptide NQEL --spectrum "0,113,114,128,227,242,242,355,356,370,371,484"
    helix rna mfe --fasta src/helix/datasets/dna/plasmid_demo.fna --dotbracket mfe.dbn
    helix rna ensemble --fasta src/helix/datasets/dna/plasmid_demo.fna --gamma 1.0 --dotplot dotplot.png --entropy entropy.png
    

    The helix entry point wraps the DNA, spectrum, RNA, protein, triage, viz, and workflow helpers so you can run ad-hoc analyses without hunting for scripts.

  • CRISPR guide + off-target scan

    helix crispr find-guides --fasta target.fna --pam SpCas9-NGG --guide-len 20 --json guides.json
    helix crispr offtargets --fasta genome.fna --guides guides.json --max-mm 3 --json hits.json
    helix crispr score --guides guides.json --hits hits.json --weights weights/cfd-lite.json --json scores.json
    helix crispr simulate --fasta target.fna --guides guides.json --guide-id g1 --draws 1000 --seed 42 --json crispr_sim.json
    helix viz crispr-track --input crispr_sim.json --save crispr_track.png
    

    Produces schema-tagged JSON (crispr.guides, crispr.offtargets, crispr.sim) with optional scoring and cut/repair simulations; CLI viz renders a provenance-stamped PNG. Sequences remain masked unless --emit-sequences is explicitly passed.
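
Conceptually, guide discovery for an NGG PAM scans for "GG" two bases after each candidate protospacer. The sketch below covers the forward strand only and returns raw sequences for clarity; it is not the `find-guides` implementation, and the output keys are hypothetical rather than the crispr.guides schema.

```python
def find_ngg_guides(seq: str, guide_len: int = 20) -> list[dict]:
    """Forward-strand candidates: guide_len bases immediately 5' of an NGG PAM."""
    seq = seq.upper()
    guides = []
    for pam_start in range(guide_len, len(seq) - 2):
        # The PAM occupies pam_start..pam_start+2; N can be any base.
        if seq[pam_start + 1:pam_start + 3] == "GG":
            guides.append({
                "start": pam_start - guide_len,  # 0-based, inclusive
                "end": pam_start,                # 0-based, exclusive
                "guide": seq[pam_start - guide_len:pam_start],
                "pam": seq[pam_start:pam_start + 3],
            })
    return guides
```

A reverse-strand pass would look for CCN upstream of the protospacer and reverse-complement the hit.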

  • CRISPR genome simulation

    helix crispr genome-sim --genome genome.fna --guide-sequence GGGGTTTAGAGCTATGCT --cas cas9 --json crispr_cut_events.json
    

    Loads the genome into a DigitalGenome, instantiates a preset (or JSON-defined) CasSystem, and calls the in-silico cut simulator so you can inspect potential target sites. Outputs include serialized guides, Cas parameters, and any simulated CutEvent entries.

  • CRISPR edit DAG

    helix crispr dag --genome genome.fna --guide-sequence GGGGTTTAGAGCTATGCT --max-depth 1 --json crispr_edit_dag.json
    

    Builds the first-version “digital twin” graph using the new edit runtime. Nodes contain fully materialized genome views, edges capture each clean-cut event, and the JSON artifact (helix.crispr.edit_dag.v1.1) can feed notebooks and future viz surfaces.

  • CRISPR edit DAG (FASTA-native, multi-guide)

    helix crispr dag \
      --genome examples/hg19_chr_demo.fa \
      --cas-config examples/cas9.json \
      --guides-file examples/guides.tsv \
      --region chr7:55000000-55005000 \
      --max-depth 2 \
      --min-prob 1e-4 \
      --max-sites 20 \
      --seed 0 \
      --out-dir out/crispr_dags/
    
    • cas9.json (Cas/physics config)
      {
        "name": "SpCas9",
        "system_type": "cas9",
        "pam_pattern": "NGG",
        "cut_offset": 3,
        "max_mismatches": 3,
        "weight_mismatch_penalty": 1.0,
        "weight_pam_penalty": 2.0
      }
      
    • guides.tsv (guide library)
      name	sequence	region
      G1	ACGTACGTACGTACGTACGT	chr7:55000010-55000040
      G2	TGCATGCATGCATGCATGCA	chr7:55003000-55003025
      G3	CCCCCGGGGGAAAAATTTTT	.
      

    The CLI slices the FASTA per guide (falling back to --region), fans out over each design, and writes one artifact per guide (e.g., out/crispr_dags/crispr_001_G1.edit_dag.json). Artifacts stay compatible with the viz/report/Playground surfaces.

  • Protein-impact annotations

    helix crispr dag \
      --genome examples/hg19_chr_demo.fa \
      --guide-sequence ACGTACGTACGTACGTACGT \
      --coding-json transcripts/BRCA1_tx.json \
      --coding-transcript BRCA1-201 \
      --out out/crispr_brca1.edit_dag.json
    

    Provide a transcript JSON (same schema used by the GUI loader) to annotate SNV outcomes with protein_impact metadata (silent, missense, nonsense). Works for both helix crispr dag and helix prime dag.
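
The three impact labels follow directly from the standard genetic code. The sketch below classifies a single-codon change; it is an illustration, not the annotation code, and it works on DNA codons in the coding-strand orientation. A change from a stop codon to an amino acid falls through to "missense" here because the source lists only three labels.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as silent, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    return "missense"
```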

  • Prime editing sandbox

    helix prime simulate --genome genome.fna --peg-config peg.json --editor-config pe3.json --max-outcomes 16 --json prime_edits.json
    

    Wraps the new prime-editing models: pegRNA definitions (inline flags or JSON), prime-editor parameters, and the simulate_prime_edit entrypoint. Like the CRISPR command, this is purely computational—it emits hypothetical outcomes for downstream notebooks and viz.

  • Prime edit DAG

    helix prime dag --genome genome.fna --peg-config peg.json --editor-config pe3.json --json prime_edit_dag.json
    

    Produces the prime-editing DAG artifact (helix.prime.edit_dag.v1.1) so you can inspect RTT-driven branches, log probabilities, and materialized genome snapshots per node.

  • Prime edit DAG (config-driven, multi-peg)

    helix prime dag \
      --genome examples/hg19_chr_demo.fa \
      --editor-config examples/prime_editor.json \
      --pegs-file examples/pegs.tsv \
      --region chr11:5227000-5227500 \
      --max-depth 3 \
      --min-prob 1e-4 \
      --seed 0 \
      --out-dir out/prime_dags/
    
    • prime_editor.json
      {
        "name": "PE2-like",
        "cas": {
          "name": "SpCas9-H840A",
          "type": "cas9",
          "pam_pattern": "NGG",
          "cut_offset": 3,
          "max_mismatches": 2
        },
        "nick_to_rtt_offset": 0,
        "efficiency_scale": 0.6,
        "mismatch_tolerance": 2,
        "indel_bias": 0.1,
        "metadata": {
          "flap_model": "left>right (demo)"
        }
      }
      
    • pegs.tsv
      name	spacer	pbs	rtt	region
      peg1	ACGTACGTACGTACGTACGT	GCTAGCTA	TCTGACTCTCTCAGGAGTC	chr11:5227000-5227100
      peg2	TGCATGCATGCATGCATGCA	AACCGGTT	AAGGTTCCGGAACTTG	chr11:5227200-5227300
      

    Each peg row emits a dedicated helix.prime.edit_dag.v1.1 artifact, mirroring the CRISPR workflow. The Playground buttons (?demo=prime) load the same format, so teams can plug their configs directly into visualization/reporting pipelines.

  • Experiment configs (YAML → DAG)

    helix experiment new --type crispr --out experiments/demo_crispr.helix.yml
    helix experiment run --config experiments/demo_crispr.helix.yml --out out/demo_crispr.edit_dag.json
    helix experiment viz --config experiments/demo_crispr.helix.yml --out out/demo_crispr.png
    

    A *.helix.yml file captures everything humans care about—FASTA path, optional region, Cas/Prime config, guide or peg design, and simulation knobs. Helix regenerates DAG JSON, PNGs, and HTML reports from that single spec (helix experiment run/viz/report). Starter templates live in templates/, and helix experiment new bootstraps a fresh config with placeholders ready to fill in.

  • Real-time DAG frames (JSONL)

    helix crispr dag \
      --genome examples/hg19_chr_demo.fa \
      --cas-config examples/cas9.json \
      --guide-sequence ACGTACGTACGTACGTACGT \
      --frames - \
      --out out/crispr_rt.edit_dag.json
    

    Streams helix.edit_dag.frame.v1 JSON lines so you can animate CRISPR edits as they unfold. See Edit DAG Frames for schema details, or try it live in the Realtime Playground.
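
A consumer for the frame stream only needs to parse one JSON object per line. The sketch below is a generic JSONL reader; it does not assume any field names inside a frame, and it assumes `--frames -` writes the stream to standard output.

```python
import json
from typing import Iterable, Iterator


def iter_frames(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
```

Piping the command above into a script that loops over `iter_frames(sys.stdin)` processes frames as they arrive, since the generator never buffers the whole stream.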

  • CRISPR DAG micro-verification

    python benchmarks/verify_crispr_micro.py
    tools/test.sh tests/test_crispr_dag_micro.py
    

    A synthetic tests/data/crispr_micro.fna genome (two short chromosomes packed with overlapping NGG PAMs) powers a brute-force verifier that rebuilds the entire probability tree outside of the CRISPR physics engine. The benchmarks/verify_crispr_micro.py script emits a pass/fail summary, while tests/test_crispr_dag_micro.py runs the same cross-check during CI so we know every DAG leaf and probability mass matches the reference enumeration.

  • Lean/VeriBiota bridge

    helix crispr dag --genome genome.fna --guide-sequence ACGT... --json out/eg.edit_dag.json
    helix veribiota export \
      --input out/eg.edit_dag.json \
      --out out/eg.lean \
      --dag-name exampleDag \
      --module-name VeriBiota.Bridge
    

    Converts any Helix helix.*edit_dag.v1.* artifact into a Lean module that defines exampleDag : EditDAG, emits the node/edge lists, and inserts a ready-to-run #eval VeriBiota.check exampleDag plus a theorem stub (disable with --skip-theorem / --skip-eval).

    To consolidate several JSON artifacts into one Lean namespace (faster CI, shared proofs):

    helix veribiota export-dags \
      --inputs out/dag1.json out/dag2.json \
      --module-name Helix.CrisprExamples \
      --list-name exampleDags \
      --out veribiota/generated/Helix/CrisprExamples.lean
    

    The generated module defines def dag1 : EditDAG, dag2, bundles them into def exampleDags : List EditDAG, and inserts an aggregate theorem stub (∀ dag ∈ exampleDags, VeriBiota.check dag) you can turn into a real proof.

  • Lean pipeline glue

    helix veribiota lean-check --input out/dag1.json --out out/dag1.lean-check.json
    helix veribiota preflight --checks out/*.lean-check.json
    helix veribiota export-dags --inputs out/dag*.json --out veribiota/generated/Helix/CrisprExamples.lean
    

    Each DAG JSON gets a companion .lean-check.json (hashes, probabilities, metadata). preflight validates those summaries (and, optionally, re-hashes the source DAGs) so CI fails fast before Lean boots. Once preflight passes, export-dags emits a single Lean module for all DAGs, letting VeriBiota prove shared invariants (well_formed, probability sums, hash consistency) in one place.

    When integrating with the external VeriBiota/VeriBiota repo, point Helix directly at that checkout and let it populate the generated namespace:

    helix veribiota export-suite \
      --inputs out/dag*.json \
      --veribiota-root ../VeriBiota \
      --module-path Biosim/VeriBiota/Helix/MicroSuite.lean \
      --module-name Biosim.VeriBiota.Helix.MicroSuite
    

    This writes the Lean file straight into Biosim/VeriBiota/Helix/MicroSuite.lean, ready for VeriBiota’s Lake build + proof pipeline.

  • Frames → dataset

    helix edit-dag generate-dataset --n 0 --frames-input run.frames.jsonl --out dataset.jsonl
    

    Converts a JSONL frame stream into a dataset record (mix with random generations by combining --frames-input and --n). Each row stays human-readable so you can audit what landed in the corpus:

    {
      "id": 7,
      "mechanism": "crispr",
      "node_count": 11,
      "edge_count": 10,
      "top_outcomes": [
        {"stage": "repaired", "prob": 0.61, "sequence_hash": "9f4b1cbe"},
        {"stage": "error", "prob": 0.31, "sequence_hash": "72acdc01"}
      ],
      "frame_source": "runs/hbb_demo.frames.jsonl",
      "artifact": { "...": "helix.crispr.edit_dag.v1.1 payload" }
    }
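
    Because each row is plain JSON, a corpus audit needs only the standard library. The sketch below assumes the field names shown in the sample row above; summarize_rows is a hypothetical helper, not part of Helix.

```python
import json

# Sample row copied from the example above (artifact payload omitted).
SAMPLE_ROW = {
    "id": 7,
    "mechanism": "crispr",
    "node_count": 11,
    "edge_count": 10,
    "top_outcomes": [
        {"stage": "repaired", "prob": 0.61, "sequence_hash": "9f4b1cbe"},
        {"stage": "error", "prob": 0.31, "sequence_hash": "72acdc01"},
    ],
}


def summarize_rows(lines):
    """Yield a compact summary for each non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        outcomes = row["top_outcomes"]
        best = max(outcomes, key=lambda outcome: outcome["prob"])
        yield {
            "id": row["id"],
            "mechanism": row["mechanism"],
            "best_stage": best["stage"],
            # Probability mass covered by the listed outcomes.
            "listed_mass": sum(outcome["prob"] for outcome in outcomes),
        }


for summary in summarize_rows([json.dumps(SAMPLE_ROW)]):
    print(summary)
```

    For a real file, pass an open handle instead of the list, for example summarize_rows(open("dataset.jsonl")).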
    
  • Hero comparison (HBB vs mutant)

    1. Open the Realtime Playground.
    2. Guide A: ACCCAGGAAACCCGGGTTTT, Guide B: TTTACCCAGGAAACCCGGGT, PAM NGG.
    3. Hit “Compare” to watch probability mass shift between the intended and indel branches, then export the experiment .helix.yml for reproducible CLI runs.
  • Desktop GUI (optional PySide6 extra)

    pip install helix-governance[gui]
    helix gui
    

    Ships a PySide6 desktop shell with a QWebEngineView + Cytoscape canvas. The GUI streams the same JSONL frames as the CLI/Playground, so you can iterate on CRISPR or Prime runs locally (even offline) and export specs later.

  • PCR amplicon DAG

    python -m helix.cli pcr dag \
      --genome src/helix/datasets/dna/plasmid_demo.fna \
      --primer-config examples/pcr_primers.json \
      --pcr-config examples/pcr_config.json \
      --out pcr_amplicon_dag.json
    python -m helix.cli edit-dag viz --input pcr_amplicon_dag.json --out pcr_dag.png
    

    Simulates in-silico amplification (binding → cycles → error branches) and emits helix.pcr.amplicon_dag.v1. Drag-drop the JSON into the Playground for an interactive tour or animate it via helix edit-dag animate.

  • Edit DAG visualization

    helix edit-dag viz --input examples/crispr_edit_dag.json --out crispr_dag.png
    helix edit-dag viz --input examples/prime_edit_dag.json --out prime_dag.png
    

    Renders any DAG artifact to a PNG using the built-in networkx/matplotlib helper. The examples/ directory ships ready-to-plot JSON fixtures for CRISPR and Prime editing so you can kick the tires immediately. For an interactive walkthrough (including sequence diffs), open docs/notebooks/edit_dag_visual_demo.ipynb.

  • JSON configs → CLI

    # Convert the JSON genome to FASTA on the fly
    python - <<'PY' > /tmp/demo_genome.fna
    import json
    cfg = json.load(open("examples/crispr_demo_genome.json"))
    for chrom in cfg["chromosomes"]:
        print(f">{chrom['name']}\n{chrom['sequence']}")
    PY
    GUIDE=$(jq -r '.sequence' examples/crispr_demo_guide.json)
    helix crispr dag --genome /tmp/demo_genome.fna --guide-sequence "$GUIDE" --json /tmp/demo_dag.json
    

    The examples/crispr_demo_genome.json + examples/crispr_demo_guide.json pair provides a self-contained, copy-pastable config for CLI experiments without needing any external FASTA files.

  • Prime config quickstart

    python examples/scripts/make_prime_demo_fasta.py --input examples/prime_demo_genome.json --out /tmp/prime_demo.fna
    helix prime dag --genome /tmp/prime_demo.fna \
      --peg-config examples/prime_demo_configs.json \
      --editor-config examples/prime_demo_configs.json \
      --json /tmp/prime_dag.json
    

    This uses the bundled examples/prime_demo_genome.json plus peg/editor definitions in examples/prime_demo_configs.json to build a full prime-edit DAG without any external data.

  • Workflow runner

    helix workflows --config workflows/plasmid_screen.yaml --output-dir workflow_runs
    

    Chains multiple subcommands from YAML, captures per-step logs, and writes artifacts to structured run directories.

  • Visualization helpers

    helix viz triage --json triage.json --output triage.png
    helix viz hydropathy --input src/helix/datasets/protein/demo_protein.faa --window 11
    

    Render plots directly from CLI artifacts (triage JSON, hydropathy windows). Requires matplotlib; hydropathy also needs Biopython.
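
    For readers unfamiliar with hydropathy windows, the sketch below averages a per-residue hydrophobicity score over a sliding window. It assumes the Kyte-Doolittle scale, which the text above does not name, and hydropathy_profile is a hypothetical helper rather than the function helix viz hydropathy calls.

```python
# Kyte-Doolittle hydropathy values (assumed scale; Helix may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=11):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


profile = hydropathy_profile("MKTLLILAVVAAALA", window=11)
print(len(profile))  # one value per full window
```

    Sequences shorter than the window yield an empty profile, and residues outside the 20 standard amino acids raise KeyError in this sketch.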

  • Python API demo

    python examples/helix_api_demo.py
    

    Showcases the helix_api module for notebook-friendly access to DNA summaries, triage reports, spectra, RNA folding, and (optionally) protein metrics. For full signatures and payload descriptions, see the API reference.

  • Triage report CLI

    python examples/triage_report.py --input your_sequence.fna --output triage.png --clusters-csv clusters.csv --orfs-csv orfs.csv
    

    Generates a composite plot plus optional CSV/FASTA exports for quick daily snapshots.

  • Notebook triage dashboard

    Open notebooks/triage_dashboard.ipynb to plot GC skew, ORFs, and k-mer hotspots side-by-side for a quick daily scan.

  • Protein sequence peek

    from protein import show_sequence
    show_sequence("1CRN.cif")
    

    Requires the target structure file in the working directory (or adjust the loader).

Browse task-specific quickstarts in examples/README.md. Tiny datasets ship inside the package (see helix.datasets.available()), including dna/human.txt, dna/plasmid_demo.fna, and protein/demo_protein.faa for quick experiments with pandas, sklearn, or hydropathy charts.

Run Tests

tools/test.sh
powershell -ExecutionPolicy Bypass -File tools/test.ps1

The test suite runs via tools/test.sh (or tools/test.ps1 on Windows) to keep environments deterministic.

Backend parity + repro bundle:

HELIX_RUN_LIGHTCONE_GPU=1 tools/test.sh tests/test_repro_backend_parity.py

Doctor (environment sanity)

python tools/doctor.py

Benchmarks

python -m benchmarks.api_benchmarks --repeat 5 --warmup 1 --limit 0 \
  --out bench-results/api.json --summary-md bench-results/api.md

The benchmark harness now emits a schema-stamped payload (bench_result v1.0) that records commit SHA, dataset provenance, BLAS vendor, CPU/threads, locale, RNG seed, and per-case timing/RSS stats. Use --scenario dna_summary to focus on a subset, --limit 10000 to mimic CI’s faster sweep, and --summary-md to capture the Markdown table that CI publishes automatically.

  • CI drift tracking: the benchmarks GitHub Actions job pins OMP_NUM_THREADS/MKL_NUM_THREADS, seeds RNGs, runs the suite (repeat=3 by default, repeat=10 when bench_heavy=true via workflow_dispatch), uploads benchmarks/out/bench-$GITHUB_SHA.{json,md}, and appends the rendered Markdown summary to the workflow summary tab.
  • Regression gate: scripts/bench_check.py .bench/baseline.json benchmarks/out/latest.json --threshold 5 fails the workflow when timings regress by more than 5 % against the baseline. Update .bench/baseline.json whenever you intentionally change performance characteristics.
  • Heavier datasets: export HELIX_BENCH_DNA_FASTA / HELIX_BENCH_PROTEIN_FASTA (or pass them as workflow_dispatch inputs) to stress-test larger references. The benchmark JSON records the absolute paths and sizes so dashboards can keep apples-to-apples comparisons. Store private or future references under benchmarks/data/ and toggle them via bench_heavy=true when ready.
  • Dashboard: CI appends every main-branch run to docs/data/bench/history.csv; the published chart lives at docs/benchmarks.
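
The regression gate described above boils down to a percent-change comparison. The sketch below assumes each result file has already been reduced to a mapping of case name to seconds, which is a simplification and not the bench_result v1.0 schema; find_regressions and the "triage" case name are hypothetical, and this is not the logic of scripts/bench_check.py.

```python
def find_regressions(baseline, latest, threshold_pct=5.0):
    """Return {case: percent_slowdown} for cases slower than the threshold."""
    regressions = {}
    for case, base_seconds in baseline.items():
        if case not in latest or base_seconds <= 0:
            continue  # nothing to compare against
        change_pct = (latest[case] - base_seconds) / base_seconds * 100.0
        if change_pct > threshold_pct:
            regressions[case] = round(change_pct, 2)
    return regressions


# Hypothetical timings in seconds.
baseline = {"dna_summary": 1.00, "triage": 2.00}
latest = {"dna_summary": 1.04, "triage": 2.20}

failed = find_regressions(baseline, latest, threshold_pct=5.0)
print(failed)
```

A CI wrapper would exit non-zero when the returned mapping is non-empty.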

Reproducible Viz & Viz-Spec

  • Every helix viz ... (and CLI modes that call them) accepts --save out.png (PNG/SVG/PDF) and auto-emits a sibling .viz.json unless --save-viz-spec overrides the path.
  • Each plot footer stamps Helix vX.Y • viz-kind • spec=1.x • key params • timestamp • input_sha256 so shared figures always carry their provenance and the SHA-256 of the original JSON payload.
  • The viz-spec JSON captures counts, quantiles, bounds, and the input_sha256 of the source payload; regression tests assert against that structured payload instead of brittle pixel hashes.
  • You can feed those viz-specs (plus the original JSON inputs) into docs/notebooks to explain how a figure was produced and which parameters generated it.
  • Explore or inspect schemas with helix viz --schema, diff manifests with helix schema diff --base old.json, export everything via helix schema manifest --out schemas.json, or render ready-to-plot payloads via helix demo viz.
  • Workflows can enforce schemas per step and print provenance tables/JSON with helix workflows ... --with-schema [--as-json].
  • Every saved plot writes <image>.provenance.json next to the PNG, capturing {schema_kind, spec_version, input_sha256, viz_spec_sha256, image_sha256, helix_version, command} for chain-of-custody.
  • Full schemas, screenshots, and sample payloads live under docs/viz and the Schema Reference.
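
A chain-of-custody check against the provenance record can be sketched as follows. It assumes image_sha256 is the hex-encoded SHA-256 of the image file's raw bytes, and it takes the provenance path as an explicit argument because the exact sidecar file name is not spelled out here; image_matches_provenance is a hypothetical helper.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def image_matches_provenance(image_path, provenance_path):
    """True when the recorded image_sha256 equals the file's current hash."""
    record = json.loads(Path(provenance_path).read_text())
    return record.get("image_sha256") == sha256_of(image_path)


def demo():
    """Write a fake image plus record, then tamper with the image."""
    with tempfile.TemporaryDirectory() as tmp:
        image = Path(tmp) / "plot.png"
        image.write_bytes(b"demo bytes, not a real PNG")
        provenance = Path(tmp) / "plot.provenance.json"
        provenance.write_text(json.dumps({"image_sha256": sha256_of(image)}))
        before = image_matches_provenance(image, provenance)
        image.write_bytes(b"tampered")
        after = image_matches_provenance(image, provenance)
    return before, after


print(demo())
```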

Weekend Project Ideas

  • Plot the GC skew for a bacterial plasmid and compare predicted origins to literature.
  • Extend the ORF scanner to sweep reverse complements and test on viral genomes.
  • Compare frameshift candidates against known gene models to flag likely sequencing errors.
  • Pair the ORF scanner with the GC skew plot to compare predicted origins and coding regions.
  • Use the CSV/plot outputs from examples/kmer_counter.py to highlight SNP hotspots and share charts with the community.
  • Customize notebooks/triage_dashboard.ipynb with your own sequences and publish the visuals for project updates (digital reports only).
  • Hook cyclospectrum.py into a simple leaderboard scorer and visualize the mass differences.
  • Swap the activation function in ann.py, log loss curves, and document what changes.
  • Build a notebook that fetches a PDB entry, prints its sequence via protein.py, and sketches the secondary structure counts.
  • Chain examples/translate_sequence.py with peptide_mass_lookup.py to score translated open reading frames.

Browse ready-to-run snippets in examples/README.md, and share your results in examples/ (add new files freely) or link to gist/notebook URLs in issues so others can remix.

Design Philosophy

  • Approachable first: readable code, inline comments when helpful, datasets that fit in memory.
  • Composable: functions return plain Python data structures so you can plug them into pandas, NumPy, or future OGN pipelines.
  • Biopython-friendly: we stand on Biopython's shoulders; no wheel reinvention when a stable API exists.
  • Prototype-to-production bridge: helper scripts should make it easy to migrate successful ideas into OGN when the time comes.

Roadmap

  • Real-time 2.5D simulation panels (CRISPR + Prime + PCR)
  • Unified DAG viewer with edit-diff timelines
  • Notebook-to-OGN adapters for migrating prototypes
  • More ensemble RNA metrics + stochastic simulators
  • Expanded motif discovery solvers + GPU-ready variants
  • Community-driven examples gallery

Contributing

We welcome ideas, experiments, and docs improvements. To keep things playful:

  • Open issues with context, references, or notebooks that inspired your idea.
  • Tag contributions by complexity (good-first-experiment, deep-dive, etc.).
  • Respect the code of conduct (be kind, give credit, document assumptions).
  • If you plan a larger refactor, start a discussion thread so we can pair-program or offer pointers.

Happy hacking!

  • String search

    helix string search sequences.fna --pattern GATTACA --k 1 --json hits.json
    

    Uses the FM-index for exact matches (k=0) or Myers bit-vector streaming for ≤k edit-distance hits in FASTA/plaintext inputs.
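
    To make the ≤k edit-distance semantics concrete, the sketch below uses a plain dynamic-programming scan (Sellers' algorithm) rather than the Myers bit-vector method named above; both report the same hits, but the bit-vector form is much faster. approx_find is a hypothetical helper, not Helix code.

```python
def approx_find(text, pattern, k):
    """Return (end_index, distance) pairs where pattern matches with <= k edits.

    end_index is the exclusive end of the match in text.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching i pattern chars costs i edits.
    prev = list(range(m + 1))
    hits = []
    for end, char in enumerate(text, start=1):
        cur = [0]  # a match may start at any text position for free
        for i in range(1, m + 1):
            substitution = prev[i - 1] + (pattern[i - 1] != char)
            cur.append(min(substitution, prev[i] + 1, cur[i - 1] + 1))
        if cur[m] <= k:
            hits.append((end, cur[m]))
        prev = cur
    return hits


print(approx_find("CCGATTACACC", "GATTACA", 0))
```

    With k > 0, neighbouring end positions often qualify as well, so real tools usually collapse overlapping hits to the best one.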

  • Seed + extend demo

    helix seed index src/helix/datasets/dna/plasmid_demo.fna --method minimizer --k 15 --window 10 --plot seeds.png
    helix seed map --ref src/helix/datasets/dna/plasmid_demo.fna --reads src/helix/datasets/dna/plasmid_demo.fna --k 15 --window 10 --band 64 --xdrop 10
    

    Generates deterministic minimizers (or syncmers) and a simple seed-and-extend JSON summary; --plot uses helix.viz.seed for density snapshots.
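
    As a rough illustration of minimizer selection, the sketch below keeps the smallest k-mer from every window of w consecutive k-mers. Lexicographic ordering with leftmost tie-breaking is an assumption made for readability; production indexes typically order k-mers by a hash, and this is not the helix seed implementation.

```python
def minimizers(sequence, k, w):
    """Return sorted (position, kmer) pairs chosen as window minimizers."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        # Smallest k-mer in the window; ties go to the leftmost position.
        best = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add((best, kmers[best]))
    return sorted(picked)


print(minimizers("ACGTACGT", k=3, w=2))
```

    The selection depends only on the sequence and the (k, w) parameters, so repeated runs give identical seeds.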

  • DBG toolbox

    helix dbg build --reads reads1.fna reads2.fna --k 31 --graph dbg.json --graphml dbg.graphml
    helix dbg clean --graph dbg.json --out dbg_clean.json
    helix dbg color --reads sample1.fna sample2.fna --labels case control --k 31 --out colored.json
    

    Builds/cleans JSON + GraphML de Bruijn graphs and produces colored DBG presence tables ready for pseudoalignment experiments.
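
    The core of a de Bruijn graph build fits in a few lines. This sketch assumes a node-centric layout where nodes are (k-1)-mers and each k-mer contributes one edge; it ignores reverse complements and cleaning, and build_dbg is a hypothetical helper whose output is unrelated to the dbg.json format.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with edge counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]][kmer[1:]] += 1
    return {node: dict(successors) for node, successors in graph.items()}


print(build_dbg(["ACGT", "CGTA"], k=3))
```

    Edge counts double as coverage, which is the usual signal for pruning low-support tips and bubbles during cleaning.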

  • Motif discovery

    helix motif find --fasta promoters.fasta --width 8 --solver steme --iterations 40 --json motif.json --plot pwm.png
    

    Runs EM/STEME/online solvers to infer PWMs/log-likelihoods and renders optional probability heatmaps.

  • Sketching (MinHash/HLL)

    helix sketch build --method minhash --fasta seq.fna --k 21 --size 1000
    helix sketch compare --method hll --fasta-a a.fna --fasta-b b.fna --precision 12
    

    Quickly approximate genome distances via Mash-style MinHash or HLL cardinality/Jaccard estimates.
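
    A bottom-k MinHash comparison can be sketched with the standard library alone. The SHA-1 based hash, the helper names, and the tiny parameters are assumptions for illustration; the sketch does not reproduce the helix sketch output format or its hash function.

```python
import hashlib


def kmer_hashes(sequence, k):
    """Hash every distinct k-mer to a 64-bit integer."""
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    return {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers
    }


def bottom_sketch(sequence, k, size):
    """Keep the `size` smallest hash values as the sketch."""
    return sorted(kmer_hashes(sequence, k))[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Share of the merged bottom sketch that appears in both inputs."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for value in merged if value in set_a and value in set_b)
    return shared / len(merged)


a = bottom_sketch("ACGTACGTTGCA", k=4, size=8)
b = bottom_sketch("ACGTACGTTGCA", k=4, size=8)
print(jaccard_estimate(a, b, size=8))
```

    Larger sketch sizes reduce the variance of the estimate at the cost of memory, which is the trade-off the --size flag above controls.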