CRISPR Edit DAG Model Limitations & Invalid Regimes
This document is a technical limitations and risk disclosure for Helix CRISPR Edit DAG modeling and visualization. It defines known invalid regimes and required mitigations that are not implemented today. If your use case depends on any of the unmet requirements below, treat the current model as out-of-scope.
Version: 1.0
Last updated: 2025-12-26
Applies to: helix.crispr.edit_dag.v1.x artifacts; Helix Studio Edit DAG view; CRISPR Sim Builder PAM softness control.
Non-goals / invalid uses
- Predicting kinetic rates, binding free energies, or time-to-repair dynamics.
- Making safety or therapeutic risk claims without external tail-risk analysis.
- Interpreting multiplex or interacting DSB outcomes as independent single-cut events.
Scope
- Applies to helix.crispr.edit_dag.v1.x artifacts and the Edit DAG view in Helix Studio.
- Describes modeling and UI semantics, not biological safety claims.
- Prime editing and PCR DAGs have separate failure modes and are not covered here.
1) Non-atomic modeling of DNA double-strand breaks (DSBs)
Limitation / Risk
The model encodes cut -> repaired as an atomic transition. It does not represent the time-extended, spatially heterogeneous DSB process.
Why this matters
DSB dynamics (end resection, protein occupancy, and cell-cycle gating) shape outcome distributions. An instantaneous transition cannot capture these dependencies, so predictions outside toy loci are not biologically grounded.
What the model currently assumes
- A cut is a single event with immediate repair branching.
- No latent states exist between “cut” and “repair.”
- No explicit time or spatial structure is modeled.
Known invalid regimes
- Cell-cycle-dependent pathway switching (e.g., NHEJ vs HDR bias).
- Contexts where end resection length distributions are outcome-determining.
- Chromatin tension, nuclear subdomain effects, or protein crowding constraints.
Design requirement / mitigation
- Replace the atomic transition with a continuous-time stochastic process.
- Introduce latent states for end chemistry, resection length (as a distribution), protein occupancy, and chromatin mobility constraints.
- Allow “cut but not yet repairable” states with explicit time-to-repair dynamics.
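As an illustration of what a continuous-time replacement could look like, the sketch below runs a Gillespie-style simulation over a small set of latent states. The state names, the rate values, and the `simulate_dsb` function are hypothetical and chosen only for illustration; they are not part of any Helix artifact or API, and the rates are not fitted to data.

```python
import random

# Hypothetical latent states and transition rates (per hour).
# Illustrative values only; not fitted to any dataset.
RATES = {
    "cut": [("resected", 0.5), ("repaired_nhej", 1.0)],
    "resected": [("repaired_hdr", 0.2), ("repaired_mmej", 0.3)],
}

def simulate_dsb(rng, t_max=48.0):
    """Walk from 'cut' to a repaired state, or stop at t_max unrepaired."""
    state, t = "cut", 0.0
    while state in RATES:
        transitions = RATES[state]
        total = sum(rate for _, rate in transitions)
        # Waiting time in the current state is exponential in the total exit rate.
        t += rng.expovariate(total)
        if t > t_max:
            # "Cut but not yet repaired" is an explicit outcome, not a silent drop.
            return "unrepaired", t_max
        # Pick the next state with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        for nxt, rate in transitions:
            acc += rate
            if r < acc:
                state = nxt
                break
        else:
            state = transitions[-1][0]  # guard against floating-point rounding
    return state, t

rng = random.Random(0)
print(simulate_dsb(rng))
```

Unlike the atomic cut -> repaired transition, each run carries a time-to-repair and can end in an explicit unrepaired state.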
2) Probability mass conservation and missing terminal states
Limitation / Risk
The DAG conserves probability mass across modeled outcomes. Biological outcomes that remove a cell from the observed distribution are not represented.
Why this matters
Ignoring cell death, arrest, and catastrophic genome damage overstates confidence and underestimates risk. A closed-world distribution is not biologically faithful.
What the model currently assumes
- All probability mass is captured by alignable, small-scale edits.
- Outcomes always sum to 1.0 across observed terminals.
Known invalid regimes
- Cytotoxic guides and stress-response activation.
- Large deletions, chromothripsis, or gross rearrangements that are not observable under short-read sequencing assumptions.
- Cases where a significant fraction of cells become nonviable or transcriptionally incompetent.
Design requirement / mitigation
- Explicitly leak probability mass into terminal states such as cell death, senescence, and large-scale genome instability.
- Represent “unobserved/out-of-scope” mass as a first-class output.
- Treat mass conservation as a reporting choice, not a biological truth.
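A minimal sketch of leaking mass into explicit terminals, assuming the existing outcome probabilities are conditional on a cell being observed. The function name `with_unobserved_mass`, the terminal labels, and the numbers are hypothetical and used only for illustration.

```python
def with_unobserved_mass(observed, leaks):
    """Rescale observed-outcome probabilities to make room for leak terminals.

    observed: outcome -> probability, conditional on being observed (sums to 1).
    leaks:    terminal -> absolute probability (e.g. cell death, senescence).
    """
    overlap = set(observed) & set(leaks)
    if overlap:
        raise ValueError(f"labels appear in both inputs: {sorted(overlap)}")
    leak_total = sum(leaks.values())
    if not 0.0 <= leak_total <= 1.0:
        raise ValueError("leaked mass must lie in [0, 1]")
    scale = 1.0 - leak_total
    combined = {label: p * scale for label, p in observed.items()}
    combined.update(leaks)
    return combined

result = with_unobserved_mass(
    {"no_edit": 0.5, "small_indel": 0.5},
    {"cell_death": 0.15, "unobserved_large_deletion": 0.05},
)
print(result)
```

The combined distribution still sums to 1, but the sum is now taken over observed and unobserved terminals together, which makes the reporting choice visible.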
3) “No Edit” as a collapsed kinetic failure class
Limitation / Risk
A single “no edit” terminal collapses multiple mechanistically distinct failures (no binding, no cleavage, rapid repair, gRNA degradation).
Why this matters
These mechanisms carry different correlations across guides, multiplex designs, and downstream phenotypes. Collapsing them destroys interpretability and bias diagnostics.
What the model currently assumes
- “No edit” is a single, mechanistically uniform biological outcome.
- The causal mechanism behind no-edit events is irrelevant.
Known invalid regimes
- Multiplex editing where one guide’s failure mode is correlated with another’s success.
- Chromatin accessibility changes across loci or cell states.
- Delivery and gRNA stability effects.
Design requirement / mitigation
- Decompose “no edit” into mechanistic non-events (no binding, no cut, fast repair, target inaccessible).
- Alternatively, remove the terminal and infer no-edit events from time-to-cut distributions and latent kinetics.
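One way to picture the decomposition is as a chain of conditional steps, where each failed step contributes its own share of the collapsed “no edit” mass. The sketch below assumes the steps are sequential and independent given the previous step; the function `decompose_no_edit` and its parameter names are hypothetical.

```python
def decompose_no_edit(p_accessible, p_bind, p_cut, p_scarless_repair):
    """Split 'no edit' into mechanistic components.

    Each argument is the probability of a step succeeding, conditional on the
    previous step having succeeded. p_scarless_repair is the probability that
    a cut is repaired back to the original sequence.
    """
    for p in (p_accessible, p_bind, p_cut, p_scarless_repair):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    reached_cut = p_accessible * p_bind * p_cut
    no_edit_parts = {
        "target_inaccessible": 1.0 - p_accessible,
        "no_binding": p_accessible * (1.0 - p_bind),
        "no_cut": p_accessible * p_bind * (1.0 - p_cut),
        "fast_repair": reached_cut * p_scarless_repair,
    }
    p_edit = reached_cut * (1.0 - p_scarless_repair)
    return no_edit_parts, p_edit

parts, p_edit = decompose_no_edit(0.8, 0.9, 0.7, 0.25)
print(parts, p_edit)
```

The four components sum to the value a single “no edit” terminal would report, but they can now be correlated separately with accessibility, delivery, or guide stability.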
4) Non-physical PAM softness abstraction
Limitation / Risk
“PAM softness” is a scalar abstraction and does not represent binding free energy or kinetic rates.
Why this matters
PAM recognition is sequence- and context-dependent and directly alters binding kinetics and off-rates. A single scalar cannot express these dynamics.
What the model currently assumes
- PAM permissiveness can be scaled uniformly with a single weight.
- Binding/cleavage kinetics are not required to explain outcome distributions.
Known invalid regimes
- Off-target prediction based on kinetics or dwell-time thresholds.
- Chromatin context or methylation-dependent PAM recognition.
- Cas variants with non-uniform PAM/seed energetics.
Design requirement / mitigation
- Replace PAM softness with a binding energy landscape (ΔG) that depends on sequence and context.
- Model transition rates for search -> bind -> cleave, including off-rates.
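To show how an energy-based description differs from a scalar weight, the sketch below maps a binding free energy to an off-rate and then to a per-binding-event cleavage probability. It is a deliberately simplified two-rate competition, assuming an Arrhenius-like off-rate k_off = k_ref * exp(ΔG / RT); the function names, the reference rate `k_ref`, and the example values are hypothetical and not derived from any Cas variant.

```python
import math

# RT in kcal/mol at 37 C (gas constant in kcal/(mol*K) times 310.15 K).
RT_KCAL = 0.0019872 * 310.15

def off_rate(delta_g, k_ref=1.0):
    """Assumed off-rate model: more negative delta_g gives slower unbinding."""
    return k_ref * math.exp(delta_g / RT_KCAL)

def p_cleave_per_binding(delta_g, k_cleave, k_ref=1.0):
    """Probability that a bound complex cleaves before it dissociates."""
    k_off = off_rate(delta_g, k_ref)
    return k_cleave / (k_cleave + k_off)

# Two hypothetical sites with different binding energies (kcal/mol).
for label, dg in [("strong_site", -3.0), ("weak_site", -0.5)]:
    print(label, p_cleave_per_binding(dg, k_cleave=0.1))
```

In this form, sequence and context would enter through ΔG per site rather than through one global softness weight.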
5) Single-cell vs population-level inference mismatch
Limitation / Risk
The DAG displays fixed probabilities that read like population averages, but CRISPR outcomes are mixtures over heterogeneous cell states.
Why this matters
Averaging across subpopulations is not equivalent to modeling them. Mixture effects can change both dominant outcomes and tail risk.
What the model currently assumes
- A homogeneous cell population or a single representative cell.
- Fixed probabilities are meaningful across all cells.
Known invalid regimes
- Cell-cycle phase heterogeneity.
- Epigenetic state differences or repair competency variation.
- Stress-response activation in subsets of cells.
Design requirement / mitigation
- Output explicit single-cell stochastic traces or a mixture model over latent cell states.
- Carry mixture weights and uncertainty alongside outcome probabilities.
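A minimal sketch of a mixture over latent cell states, assuming each state has its own outcome distribution and a known weight. The state labels, weights, and probabilities are hypothetical illustration values, and `mix_outcomes` is not an existing Helix function.

```python
def mix_outcomes(weights, per_state):
    """Combine per-state outcome distributions into a population distribution."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    population = {}
    for state, w in weights.items():
        for outcome, p in per_state[state].items():
            population[outcome] = population.get(outcome, 0.0) + w * p
    return population

weights = {"G1": 0.7, "S_G2": 0.3}
per_state = {
    "G1": {"nhej_indel": 0.9, "hdr": 0.0, "no_edit": 0.1},
    "S_G2": {"nhej_indel": 0.5, "hdr": 0.4, "no_edit": 0.1},
}
print(mix_outcomes(weights, per_state))
```

The population-level HDR probability here comes entirely from one subpopulation, which a single fixed probability would hide. Reporting `weights` and `per_state` alongside the mixed result keeps that structure available downstream.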
6) Dominant-path visualization and tail-risk suppression
Limitation / Risk
Highlighting a dominant path suggests a most-likely or most-relevant outcome, while tail risks are not visualized.
Why this matters
Low-probability, high-impact events (e.g., large deletions) are often the true safety concern. A dominant-path highlight can be materially misleading.
What the model currently assumes
- The highest-probability path is the most informative summary.
- Tail risk can be ignored for visual clarity.
Known invalid regimes
- Heavy-tailed outcome distributions.
- Therapeutic or safety-critical use cases where rare events dominate risk.
Design requirement / mitigation
- Surface tail-risk visualizations, worst-case genome damage bounds, and maximum deletion credible intervals.
- Pair any dominant-path view with explicit tail-risk context.
- Figure reference: [FIGURE_REF] should show dominant-path highlighting alongside missing tail risk context.
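As a sketch of pairing a dominant-path summary with tail context, the function below reports the most probable outcome together with the probability mass at or above a deletion-size threshold and the largest modeled deletion. The outcome records, field names, and threshold are hypothetical; a real implementation would also need the unobserved mass discussed in section 2.

```python
def tail_summary(outcomes, size_threshold_bp):
    """Summarize the dominant outcome alongside tail mass and worst case."""
    if not outcomes:
        raise ValueError("no outcomes to summarize")
    dominant = max(outcomes, key=lambda o: o["p"])
    worst = max(outcomes, key=lambda o: o["deletion_bp"])
    tail_mass = sum(o["p"] for o in outcomes if o["deletion_bp"] >= size_threshold_bp)
    return {
        "dominant": dominant["label"],
        "dominant_p": dominant["p"],
        "tail_mass": tail_mass,
        "worst_case": worst["label"],
        "worst_case_bp": worst["deletion_bp"],
    }

outcomes = [
    {"label": "1bp_insertion", "p": 0.60, "deletion_bp": 0},
    {"label": "small_deletion", "p": 0.37, "deletion_bp": 7},
    {"label": "large_deletion", "p": 0.03, "deletion_bp": 2500},
]
print(tail_summary(outcomes, size_threshold_bp=100))
```

A view built on this summary would show the 60% dominant path and the 3% large-deletion tail side by side instead of highlighting only the former.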
7) Null-outcome simulation states
Limitation / Risk
A run that yields zero outcomes indicates a null simulation state, failed sampling, or a disconnected DAG, but is not surfaced as a first-class outcome.
Why this matters
Silent nulls can be misinterpreted as meaningful biological results, corrupting downstream decisions.
What the model currently assumes
- Zero-outcome runs are benign or ignorable.
- The simulator always produces valid terminal states.
Known invalid regimes
- Misconfigured runs, invalid parameters, or zero-probability rule sets.
- Non-recoverable simulation failures.
Design requirement / mitigation
- Require every run to emit at least one outcome: a run that would otherwise be empty must emit an explicit failure outcome (e.g., “simulation failed” or “all cells died”).
- Surface the failure reason prominently in UI and artifacts.
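A minimal sketch of this guard, assuming outcomes are plain records with a label and a probability. The function `finalize_run`, the record fields, and the failure label are hypothetical and do not describe the current artifact schema.

```python
def finalize_run(outcomes, failure_reason=None):
    """Return the run's outcomes, never an empty list.

    If the simulator produced nothing, emit one explicit failure outcome
    that carries the reason, so downstream code cannot mistake it for a result.
    """
    if outcomes:
        return list(outcomes)
    return [
        {
            "label": "simulation_failed",
            "probability": 1.0,
            "reason": failure_reason or "unknown",
        }
    ]

print(finalize_run([], failure_reason="zero-probability rule set"))
```

A biological null such as “all cells died” would be emitted by the simulator itself as a regular outcome; this guard only covers runs that produced nothing at all.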
8) Non-scalability to multiplex and interacting DSBs
Limitation / Risk
The current DAG architecture does not scale to multiplex editing with interacting DSBs. The state space is combinatorial and the semantics degrade quickly.
Why this matters
Real experiments often involve multiple guides, correlated repairs, and translocations. A single-cut DAG cannot represent these interactions.
What the model currently assumes
- Cuts are independent or can be treated sequentially.
- The DAG can remain tractable as guides increase.
Known invalid regimes
- Two or more gRNAs with correlated repair outcomes.
- Inter-DSB interactions, translocations, and chromosomal rearrangements.
Design requirement / mitigation
- Use factorized graphical models or event-driven simulation.
- Track genome topology and explicit translocation probabilities.
- Provide interaction-aware outcome semantics instead of linear path lists.
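To make the combinatorial growth concrete, the sketch below counts joint outcomes under two simplifying assumptions: every cut has the same number of per-cut outcomes, and each cut leaves two free ends that are all rejoined pairwise. Both functions are hypothetical helpers for illustration; real rearrangement counting would also need orientation, unrepaired ends, and sequence loss.

```python
import math

def joint_outcome_count(outcomes_per_cut, n_cuts):
    """Number of joint outcome combinations if cuts are enumerated jointly."""
    return outcomes_per_cut ** n_cuts

def end_pairings(n_cuts):
    """Ways to pair up the 2 * n_cuts free ends, including the correct rejoining."""
    ends = 2 * n_cuts
    return math.factorial(ends) // (2 ** n_cuts * math.factorial(n_cuts))

for n in (1, 2, 3, 4):
    print(n, joint_outcome_count(10, n), end_pairings(n))
```

Even with ten outcomes per cut, four guides give 10,000 joint combinations before any end-joining topology is considered, which is why a linear path list stops being a usable representation.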
Interpretation guidance
- This model is appropriate for in-silico, sequence-level exploration when the above limitations do not affect the decision you need to make.
- If you require cell-state heterogeneity, DSB kinetics, or safety-critical tail risk bounds, the current model is insufficient.
For a concise summary, see the warning section in docs/edit_dag_overview.md.