CRISPR CTMC Tau-Leap (GPU-friendly)

Pure Gillespie simulation (exact, event-by-event CTMC) is accurate but causes heavy warp divergence on GPUs: each cell draws its own variable-length time steps and takes different branches. Helix’s tau-leap CTMC path instead advances every cell with a fixed Δt and samples events from their hazards, so the core update is SIMD-friendly and maps cleanly to CUDA.

What’s implemented

  • CPU reference simulator: src/helix/crispr/ctmc_tau_leap.py
  • Config schema (JSON Schema draft 2020-12):
    • src/helix/schema/simulation/helix_ctmc_tau_leap_v1.json
    • schemas/simulation/helix_ctmc_tau_leap_v1.json
  • CLI hook:
    • helix crispr tau-leap --config <yaml/json> --json <out.json>

Core math (tau-leap hazard stepping)

For hazard rate λ over timestep Δt:

  • p(event in Δt) = 1 - exp(-λΔt)

For competing hazards {λ_i}:

  • Λ = Σ λ_i
  • sample p(any) = 1 - exp(-ΛΔt)
  • if any event happens, pick event i with probability λ_i / Λ
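The competing-hazard rule above can be sketched as a single per-allele step function. This is a minimal illustration, not Helix's actual code. The name `tau_leap_event` and its signature are hypothetical. The function takes two independent uniforms, one for "did anything fire" and one for "which event". It uses `expm1` so that 1 - exp(-ΛΔt) stays accurate when ΛΔt is small.

```python
import math

def tau_leap_event(hazards, dt, u_any, u_pick):
    """One fixed-Δt step over competing hazards (hypothetical helper).

    hazards: non-negative rates λ_i for the events available in this state
    u_any, u_pick: independent uniforms in [0, 1)
    Returns the index of the event that fired, or None if nothing fired.
    """
    total = sum(hazards)  # Λ = Σ λ_i
    if total <= 0.0:
        return None
    p_any = -math.expm1(-total * dt)  # 1 - exp(-ΛΔt)
    if u_any >= p_any:
        return None
    # Pick event i with probability λ_i / Λ by scanning the cumulative sum.
    target = u_pick * total
    acc = 0.0
    for i, lam in enumerate(hazards):
        acc += lam
        if target < acc:
            return i
    # Floating-point round-off can leave target == acc; use the last nonzero hazard.
    return max(i for i, lam in enumerate(hazards) if lam > 0.0)
```

In a counter-based RNG scheme, `u_any` and `u_pick` would naturally come from two different subdraw ids of the same (cell, locus, allele, step) counter.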

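Rule of thumb: keep ΛΔt ≲ 0.1 for most states. Each step samples at most one event, so a large ΛΔt silently drops events that would have happened in quick succession within the same step. If the bound does not hold, clamp with sim.maxLambdaDt or reduce Δt.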
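The 0.1 threshold can be checked with a short calculation. It assumes the hazards stay constant over one step, so the true event count in Δt is Poisson with mean x = ΛΔt. The quantity to compare is the fraction of firing steps that would actually have contained two or more events. At ΛΔt = 0.1 this is roughly 5%; at ΛΔt = 1 it rises to about 42%. The helper name is hypothetical.

```python
import math

def single_event_error(lam_dt):
    """Quantify the one-event-per-step approximation for x = ΛΔt (constant hazards assumed)."""
    p_any = -math.expm1(-lam_dt)                        # P(>= 1 event) = 1 - e^{-x}
    p_multi = 1.0 - math.exp(-lam_dt) * (1.0 + lam_dt)  # P(>= 2 events) for a Poisson(x) count
    return p_any, p_multi, p_multi / p_any              # last: share of firing steps hiding extra events

for x in (0.01, 0.1, 0.5, 1.0):
    p_any, p_multi, frac = single_event_error(x)
    print(f"ΛΔt={x:<5} p_any={p_any:.4f} p_multi={p_multi:.5f} hidden={frac:.3f}")
```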

Minimal mechanistic model (v1)

Per-allele state (diploid, per locus):

  • INTACT → can be cut
  • DSB → can resect and/or repair
  • REPAIRED_MUT → terminal mutated allele (no recut in v1)

Mechanistic gates:

  • resection state rs advances 0 → 1 → 2 and determines which repair pathways are available
  • competing repair hazards gated by resection:
    • NHEJ (rs==0)
    • alt-EJ (rs≥1)
    • HDR (rs==2 and donor>0)
    • SSA (rs==2 and repeat_context)
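The gates above translate directly into a function that returns the hazards competing at a given resection state. The function and its rate argument names (`k_nhej`, `k_altej`, `k_hdr`, `k_ssa`) are hypothetical placeholders, not Helix config keys. The resulting dict is what a fixed-Δt step would draw one repair event from.

```python
def repair_hazards(rs, donor, repeat_context, k_nhej, k_altej, k_hdr, k_ssa):
    """Return the repair hazards available at resection state rs (0, 1, or 2).

    Gating follows the v1 rules: NHEJ only before resection, alt-EJ after any
    resection, HDR and SSA only at full resection with a donor / repeat context.
    """
    hazards = {}
    if rs == 0:
        hazards["NHEJ"] = k_nhej
    if rs >= 1:
        hazards["alt-EJ"] = k_altej
    if rs == 2 and donor > 0:
        hazards["HDR"] = k_hdr
    if rs == 2 and repeat_context:
        hazards["SSA"] = k_ssa
    return hazards
```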

Chromatin memory:

  • cut → TRANSIENT
  • repair → REFRACTORY
  • relaxation hazard kRelax returns to OPEN
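A small state-machine sketch of the chromatin memory follows. It assumes two things the text does not spell out. First, relaxation applies from both TRANSIENT and REFRACTORY. Second, it uses the same fixed-Δt rule p = 1 - exp(-kRelax·Δt). How chromatin state feeds back into cut or repair hazards is not described here, so it is not modeled. All names are hypothetical.

```python
import math
from enum import Enum

class Chromatin(Enum):
    OPEN = "OPEN"
    TRANSIENT = "TRANSIENT"
    REFRACTORY = "REFRACTORY"

def on_cut(_state):
    """A cut puts the site into TRANSIENT chromatin."""
    return Chromatin.TRANSIENT

def on_repair(_state):
    """A completed repair leaves the site REFRACTORY."""
    return Chromatin.REFRACTORY

def relax_step(state, k_relax, dt, u):
    """Fixed-Δt relaxation (assumed to apply to any non-OPEN state) back to OPEN."""
    if state is Chromatin.OPEN:
        return state
    p_relax = -math.expm1(-k_relax * dt)  # 1 - exp(-kRelax Δt)
    return Chromatin.OPEN if u < p_relax else state
```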

Outcomes:

  • NHEJ: insertion vs deletion mixture + geometric length sampling
  • alt-EJ: microhomology length + larger deletions
  • HDR: precise with pPrecise else fallback to NHEJ
  • SSA: fixed-size deletion (v1)
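The NHEJ and HDR outcome rules can be sketched with inverse-CDF geometric sampling. The sketch assumes lengths are geometric on {1, 2, ...}, with separate success probabilities for insertions and deletions. Every parameter name here (`p_insertion`, `p_ins_len`, `p_del_len`, `p_precise`) is hypothetical. It uses Python's `random.Random` only for illustration; the CPU reference draws its uniforms from counter-based hashing instead.

```python
import math
import random

def sample_geometric(rng, p):
    """Geometric length on {1, 2, ...}: P(G > k) = (1 - p)^k, sampled by inverse CDF."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    u = rng.random()  # uniform in [0, 1)
    return 1 + int(math.log1p(-u) / math.log1p(-p))

def sample_nhej_outcome(rng, p_insertion, p_ins_len, p_del_len):
    """Insertion vs deletion mixture, each with a geometric length."""
    if rng.random() < p_insertion:
        return "ins", sample_geometric(rng, p_ins_len)
    return "del", sample_geometric(rng, p_del_len)

def sample_hdr_outcome(rng, p_precise, p_insertion, p_ins_len, p_del_len):
    """Precise HDR with probability p_precise, otherwise fall back to an NHEJ outcome."""
    if rng.random() < p_precise:
        return "precise", 0
    return sample_nhej_outcome(rng, p_insertion, p_ins_len, p_del_len)
```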

CUDA-shaped architecture (planned)

Use Structure-of-Arrays (one contiguous array per field) and flatten (cell, locus, allele) to a 1D index, where L is the number of loci per cell and allele ∈ {0, 1}:

idx = (cell * L + locus) * 2 + allele
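The flattening formula and its inverse are shown below as a sketch with hypothetical helper names. The two alleles of a locus sit next to each other, and consecutive loci of a cell are contiguous. So threads with consecutive idx touch consecutive array elements.

```python
def flat_index(cell, locus, allele, n_loci):
    """Map (cell, locus, allele) to idx = (cell * L + locus) * 2 + allele, with L = n_loci."""
    return (cell * n_loci + locus) * 2 + allele

def unflatten(idx, n_loci):
    """Inverse of flat_index: recover (cell, locus, allele) from a flat index."""
    allele = idx % 2
    cell_locus = idx // 2
    return cell_locus // n_loci, cell_locus % n_loci, allele
```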

Kernel decomposition (matches the CPU reference flow):

  1. cell cycle update (optional)
  2. cut intact alleles
  3. DSB progress + competing repair + outcome sampling
  4. chromatin relaxation
  5. phenotype feedback (optional; currently a biallelic KO applies multipliers to the repair hazards)
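Each stage above can be written as a masked, branch-free update over whole SoA arrays. This is what makes it map to one CUDA thread per flat index. The sketch below shows stage 2 (cut intact alleles) in NumPy. The integer state codes and the function name are hypothetical, and `u` stands for the per-allele uniforms the RNG would supply for that step. With kCut = 0 the cut probability is exactly 0, so no allele is cut, which matches the "kCut=0 → all no_cut" invariant.

```python
import numpy as np

# Hypothetical integer codes for the v1 allele states.
INTACT, DSB, REPAIRED_MUT = 0, 1, 2

def cut_intact_alleles(state, k_cut, dt, u):
    """Masked cut step over a flattened allele-state array.

    state: int8 array indexed by idx = (cell * L + locus) * 2 + allele
    k_cut: cut hazard, scalar or array broadcastable to state
    u:     one uniform in [0, 1) per allele for this step
    Returns (new_state, fired), where fired marks alleles cut this step.
    """
    p_cut = -np.expm1(-np.asarray(k_cut, dtype=np.float64) * dt)  # 1 - exp(-kCut Δt)
    fired = (state == INTACT) & (u < p_cut)  # only INTACT alleles can be cut
    new_state = np.where(fired, DSB, state).astype(state.dtype)
    return new_state, fired
```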

RNG (debuggable + reproducible)

Use a counter-based RNG so results are deterministic and independent of block sizes:

  • key: (global_seed, replicate_id)
  • counter tuple: (cell_id, locus_id, allele_id, step_id, subdraw_id)

The CPU reference uses splitmix64-based hashing to generate uniforms; the CUDA path should use Philox with the same counter-tuple structure.
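The sketch below is one plausible way to turn the (key, counter) tuple into a uniform with splitmix64. The exact field order and folding used by Helix's CPU reference are not documented here, so this layout is an assumption. Each draw depends only on the tuple, never on how work was scheduled, and that is what makes results reproducible across block sizes.

```python
MASK64 = (1 << 64) - 1

def splitmix64_mix(x):
    """SplitMix64 finalizer: a bijective 64-bit mixing function."""
    z = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)

def counter_uniform(global_seed, replicate_id, cell_id, locus_id, allele_id, step_id, subdraw_id):
    """Hash a (key, counter) tuple into a uniform double in [0, 1) (assumed layout)."""
    h = 0
    for field in (global_seed, replicate_id, cell_id, locus_id, allele_id, step_id, subdraw_id):
        h = splitmix64_mix(h ^ (field & MASK64))  # fold each field in order
    return (h >> 11) * (1.0 / (1 << 53))  # top 53 bits -> double in [0, 1)
```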

Validation strategies

  • Unit invariants:
    • kCut=0 → all no_cut
    • deterministic replay with fixed seed
    • KO feedback triggers when outcomes force biallelic frameshift
  • Statistical sanity:
    • sweep Δt and confirm convergence of allele spectrum as Δt → 0
    • check pathway fractions respond monotonically to hazard multipliers
  • GPU parity (future):
    • match the CPU reference bit-for-bit on the RNG stream and event selection, then validate output distributions.
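The Δt-convergence check can be prototyped on a toy two-step chain, INTACT → DSB → REPAIRED, before running it on the full allele spectrum. This sketch uses hypothetical rates and helper names. It propagates the exact state distribution of the fixed-Δt chain instead of sampling, so there is no Monte Carlo noise. It then compares the result with the closed-form continuous-time answer, the hypoexponential CDF. Because each allele fires at most one event per step, the chain lags the CTMC by roughly Δt, and the error should shrink as Δt → 0.

```python
import math

def p_repaired_tau_leap(k_cut, k_repair, t_end, dt):
    """P(REPAIRED by t_end) for INTACT -> DSB -> REPAIRED under fixed-Δt stepping."""
    p_cut = -math.expm1(-k_cut * dt)
    p_rep = -math.expm1(-k_repair * dt)
    intact, dsb, repaired = 1.0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        # At most one event per allele per step; all updates use the previous step's values.
        intact, dsb, repaired = (
            intact * (1.0 - p_cut),
            intact * p_cut + dsb * (1.0 - p_rep),
            repaired + dsb * p_rep,
        )
    return repaired

def p_repaired_exact(k_cut, k_repair, t):
    """Continuous-time CDF of the two-step chain (requires k_cut != k_repair)."""
    return 1.0 - (k_repair * math.exp(-k_cut * t) - k_cut * math.exp(-k_repair * t)) / (k_repair - k_cut)

for dt in (0.1, 0.01, 0.001):
    err = abs(p_repaired_tau_leap(2.0, 1.0, 1.0, dt) - p_repaired_exact(2.0, 1.0, 1.0))
    print(f"dt={dt}: |error|={err:.5f}")
```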