CRISPR CTMC Tau-Leap (GPU-friendly)

Pure Gillespie (exact, event-by-event CTMC simulation) is accurate but divergence-heavy on GPUs: every cell takes variable-length time steps and branches differently. Helix’s tau-leap CTMC path instead uses fixed-Δt hazard stepping, so the core update is SIMD-friendly and maps cleanly to CUDA.

What’s implemented

CPU reference simulator: src/helix/crispr/ctmc_tau_leap.py
Config schema (JSON Schema draft 2020-12):
- src/helix/schema/simulation/helix_ctmc_tau_leap_v1.json
- schemas/simulation/helix_ctmc_tau_leap_v1.json
CLI hook:
- helix crispr tau-leap --config <yaml/json> --json <out.json>

Core math (tau-leap hazard stepping)

For hazard rate λ over timestep Δt:

p(event in Δt) = 1 - exp(-λΔt)

For competing hazards {λ_i}:

Λ = Σ λ_i
p(any event in Δt) = 1 - exp(-ΛΔt)
draw u ~ U(0,1); an event fires if u < p(any event in Δt)
if an event fires, pick event i with probability λ_i / Λ
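
The two-stage draw above can be sketched in a few lines of Python. This is a minimal illustration, not the code in ctmc_tau_leap.py: the function name tau_leap_step and the use of random.Random as the uniform source are assumptions made for the example.

```python
import math
import random


def tau_leap_step(hazards, dt, rng):
    """Return the index of the event that fires in this step, or None.

    hazards: non-negative rates lambda_i (hypothetical argument name).
    rng: any object with a random() method returning a float in [0, 1).
    """
    total = sum(hazards)
    if total <= 0.0:
        return None
    # p(any) = 1 - exp(-Lambda * dt); expm1 keeps precision for small Lambda*dt.
    p_any = -math.expm1(-total * dt)
    if rng.random() >= p_any:
        return None
    # Pick event i with probability lambda_i / Lambda (inverse-CDF walk).
    u = rng.random() * total
    acc = 0.0
    last_positive = None
    for i, lam in enumerate(hazards):
        if lam > 0.0:
            last_positive = i
        acc += lam
        if u < acc:
            return i
    # Floating-point round-off guard: fall back to the last event with lam > 0.
    return last_positive


rng = random.Random(0)
fired = [tau_leap_step([1.0, 3.0], 0.1, rng) for _ in range(20000)]
```

With hazards [1.0, 3.0] and Δt = 0.1, about 1 - exp(-0.4) ≈ 33% of steps should fire an event, and about three quarters of those should be event 1.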

Rule of thumb: keep ΛΔt ≲ 0.1 for most alleles and steps, because each step fires at most one of the competing events. If ΛΔt is larger, clamp it with sim.maxLambdaDt or reduce Δt.
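
A rough way to see where the 0.1 threshold comes from: if events are treated as a Poisson stream at the total rate Λ (a simplifying assumption, since hazards change after each transition), the chance that two or more events would have occurred within one step is 1 - exp(-ΛΔt)(1 + ΛΔt). The helper name multi_event_prob is hypothetical.

```python
import math


def multi_event_prob(lambda_dt):
    """P(N >= 2) for N ~ Poisson(lambda_dt): events a one-event step would miss."""
    return 1.0 - math.exp(-lambda_dt) * (1.0 + lambda_dt)


for x in (0.01, 0.1, 0.5, 1.0):
    print(f"LambdaDt={x:<5} P(>=2 events)={multi_event_prob(x):.4f}")
```

At ΛΔt = 0.1 this is roughly 0.5%, while at ΛΔt = 1 it exceeds 25%, which is why large ΛΔt values call for clamping or a smaller Δt.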

Minimal mechanistic model (v1)

Per allele state (diploid, per locus):

INTACT: can be cut
DSB: can resect and/or repair
REPAIRED_MUT: terminal mutated allele (no recut in v1)

Mechanistic gates:

resection state (rs) advances 0 → 1 → 2 and determines which repair pathways are available
competing repair hazards gated by resection:
- NHEJ (rs==0)
- alt-EJ (rs≥1)
- HDR (rs==2 and donor>0)
- SSA (rs==2 and repeat_context)
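
The gating rules above can be written as a small pure function. This is a sketch under assumed names: repair_hazards, the dictionary keys, and the base-rate dictionary k are all hypothetical and are not taken from the Helix config schema.

```python
def repair_hazards(rs, donor, repeat_context, k):
    """Return per-pathway hazards for one DSB allele, gated by resection state.

    rs: resection state (0, 1, or 2).
    donor: donor template amount; HDR needs donor > 0.
    repeat_context: True if the locus has flanking repeats (enables SSA).
    k: dict of base rates with hypothetical keys NHEJ, altEJ, HDR, SSA.
    """
    return {
        "NHEJ": k["NHEJ"] if rs == 0 else 0.0,
        "altEJ": k["altEJ"] if rs >= 1 else 0.0,
        "HDR": k["HDR"] if (rs == 2 and donor > 0) else 0.0,
        "SSA": k["SSA"] if (rs == 2 and repeat_context) else 0.0,
    }


base = {"NHEJ": 1.0, "altEJ": 0.3, "HDR": 0.2, "SSA": 0.1}
print(repair_hazards(2, donor=1.0, repeat_context=False, k=base))
```

Gated-off pathways get a hazard of exactly zero, so they drop out of Λ and can never be selected. On a GPU the same gates could be expressed as 0/1 masks multiplied into the base rates to avoid branching.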

Chromatin memory:

cut → TRANSIENT
repair → REFRACTORY
a relaxation hazard kRelax returns chromatin to OPEN
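
One possible reading of the chromatin memory rules as a three-state machine is sketched below. The enum and function names are hypothetical, and the sketch assumes that both TRANSIENT and REFRACTORY relax to OPEN under the same kRelax hazard, which the description above does not spell out.

```python
import math
from enum import Enum


class Chromatin(Enum):
    OPEN = 0
    TRANSIENT = 1
    REFRACTORY = 2


def on_cut(state):
    """A cut moves chromatin to TRANSIENT regardless of the prior state."""
    return Chromatin.TRANSIENT


def on_repair(state):
    """A completed repair moves chromatin to REFRACTORY."""
    return Chromatin.REFRACTORY


def relax(state, k_relax, dt, u):
    """Apply the relaxation hazard for one step, given a uniform draw u in [0, 1).

    Assumption: any non-OPEN state relaxes to OPEN with the same rate k_relax.
    """
    if state is Chromatin.OPEN:
        return state
    p_relax = -math.expm1(-k_relax * dt)  # 1 - exp(-kRelax * dt)
    return Chromatin.OPEN if u < p_relax else state
```

Passing the uniform draw u in as an argument keeps the function pure, which makes it easy to drive from a counter-based RNG.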

Outcomes:

NHEJ: insertion vs deletion mixture + geometric length sampling
alt-EJ: microhomology length + larger deletions
HDR: precise edit with probability pPrecise; otherwise falls back to NHEJ
SSA: fixed-size deletion (v1)
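
To make the NHEJ and HDR outcome rules concrete, here is a minimal sampling sketch. All function and parameter names (geometric_length, sample_nhej, sample_hdr, p_insertion, p_len, p_precise) are hypothetical, and the exact parameterization in the Helix simulator may differ. alt-EJ and SSA are left out.

```python
import math
import random


def geometric_length(rng, p):
    """Sample L >= 1 with P(L = n) = (1 - p)**(n - 1) * p via inverse CDF."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if p == 1.0:
        return 1
    u = rng.random()  # u in [0, 1), so log1p(-u) is finite and <= 0
    return 1 + int(math.log1p(-u) / math.log1p(-p))


def sample_nhej(rng, p_insertion, p_len):
    """Insertion vs deletion mixture, each with a geometric length."""
    kind = "insertion" if rng.random() < p_insertion else "deletion"
    return kind, geometric_length(rng, p_len)


def sample_hdr(rng, p_precise, p_insertion, p_len):
    """Precise edit with probability p_precise, otherwise an NHEJ-style outcome."""
    if rng.random() < p_precise:
        return "precise", 0
    return sample_nhej(rng, p_insertion, p_len)


rng = random.Random(0)
print(sample_hdr(rng, p_precise=0.6, p_insertion=0.3, p_len=0.5))
```

A geometric length with parameter p_len has mean 1 / p_len, so p_len = 0.5 gives indels of about 2 bp on average.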

CUDA-shaped architecture (planned)

Use Structure-of-Arrays and a 1D flattening for (cell,locus,allele):

idx = (cell * L + locus) * 2 + allele
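
The flattening and its inverse can be checked with a short sketch. The function names and the two example state arrays are hypothetical. Here num_loci plays the role of L in the formula above, and plain Python lists stand in for the device arrays of a Structure-of-Arrays layout.

```python
def flat_index(cell, locus, allele, num_loci):
    """Map (cell, locus, allele) to a 1D index; allele is 0 or 1 (diploid)."""
    return (cell * num_loci + locus) * 2 + allele


def unflatten(idx, num_loci):
    """Inverse of flat_index: recover (cell, locus, allele)."""
    pair, allele = divmod(idx, 2)
    cell, locus = divmod(pair, num_loci)
    return cell, locus, allele


num_cells, num_loci = 3, 4
n = num_cells * num_loci * 2

# Structure-of-Arrays: one flat array per field, all sharing the same index.
allele_state = [0] * n
resection_state = [0] * n

i = flat_index(2, 1, 1, num_loci)
allele_state[i] = 1
print(i, unflatten(i, num_loci))
```

Because allele is the fastest-varying coordinate, the two alleles of a locus sit next to each other in memory, which helps for checks such as biallelic KO.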

Kernel decomposition (matches the CPU reference flow):

cell cycle update (optional)
cut intact alleles
DSB progress + competing repair + outcome sampling
chromatin relaxation
phenotype feedback (optional; currently biallelic KO → repair multipliers)

RNG (debuggable + reproducible)

Use a counter-based RNG so results are deterministic and independent of block size:

key: (global_seed, replicate_id)
counter tuple: (cell_id, locus_id, allele_id, step_id, subdraw_id)

The CPU reference uses splitmix64-based hashing to generate uniforms; the CUDA port should use Philox with the same counter tuple structure.
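
The idea of a stateless, counter-based uniform can be sketched as follows. The splitmix64 mixing constants are the standard ones, but the way the key and counter fields are chained into a single hash is an assumption made for illustration and is not necessarily how the CPU reference combines them.

```python
MASK64 = (1 << 64) - 1


def splitmix64(x):
    """One splitmix64 step: add the golden-ratio increment, then mix the bits."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    z = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def uniform(global_seed, replicate_id, cell_id, locus_id, allele_id, step_id, subdraw_id):
    """Uniform in [0, 1) that depends only on the key and the counter tuple.

    Assumption: fields are folded in one at a time by XOR followed by a mix.
    """
    h = 0
    for field in (global_seed, replicate_id, cell_id, locus_id,
                  allele_id, step_id, subdraw_id):
        h = splitmix64(h ^ (field & MASK64))
    return (h >> 11) * 2.0 ** -53  # top 53 bits give a double in [0, 1)


# Same tuple, same value, regardless of evaluation order or thread layout.
a = uniform(42, 0, cell_id=7, locus_id=3, allele_id=1, step_id=100, subdraw_id=0)
b = uniform(42, 0, cell_id=7, locus_id=3, allele_id=1, step_id=100, subdraw_id=0)
```

Since no generator state is carried between calls, any single draw can be replayed in isolation when debugging one allele at one step.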

Validation strategies

Unit invariants:
- kCut=0 → all no_cut
- deterministic replay with fixed seed
- KO feedback triggers when outcomes force biallelic frameshift
Statistical sanity:
- sweep Δt and confirm convergence of allele spectrum as Δt → 0
- check pathway fractions respond monotonically to hazard multipliers
GPU parity (future):
- match the CPU reference bit-for-bit for the RNG stream + event selection (which requires both sides to use the same generator), then validate distributions.
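
The Δt sweep under Statistical sanity can be illustrated with a toy two-stage chain, INTACT → DSB → REPAIRED, where the exact answer is known in closed form. This is not the Helix model: it assumes constant rates, at most one transition per allele per step, and it propagates state probabilities instead of sampling so that the result is deterministic. All names are hypothetical.

```python
import math


def repaired_fraction_tau_leap(k_cut, k_repair, t_end, dt):
    """Fraction repaired at t_end under fixed-dt stepping, one transition per step."""
    n_steps = int(round(t_end / dt))
    p_cut = -math.expm1(-k_cut * dt)
    p_rep = -math.expm1(-k_repair * dt)
    intact, dsb, repaired = 1.0, 0.0, 0.0
    for _ in range(n_steps):
        # Simultaneous update: an allele cut in this step cannot repair until the next.
        intact, dsb, repaired = (
            intact * (1.0 - p_cut),
            dsb * (1.0 - p_rep) + intact * p_cut,
            repaired + dsb * p_rep,
        )
    return repaired


def repaired_fraction_exact(k_cut, k_repair, t_end):
    """Exact CTMC result (hypoexponential CDF); requires k_cut != k_repair."""
    num = k_repair * math.exp(-k_cut * t_end) - k_cut * math.exp(-k_repair * t_end)
    return 1.0 - num / (k_repair - k_cut)


exact = repaired_fraction_exact(1.0, 2.0, 2.0)
errors = [abs(repaired_fraction_tau_leap(1.0, 2.0, 2.0, dt) - exact)
          for dt in (1.0, 0.5, 0.1, 0.01)]
for dt, err in zip((1.0, 0.5, 0.1, 0.01), errors):
    print(f"dt={dt:<5} abs error={err:.4f}")
```

The error should shrink steadily as Δt decreases. A test on the real simulator could follow the same pattern, comparing allele spectra across Δt values instead of a single fraction.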