Run Ingest Pipeline
Omnis/Helix treats every instrument or cloud ecosystem the same once a run is translated into a canonical RunDescriptor and bundled with checksums.
RunDescriptor
A minimal example (see schemas/run/run_descriptor.schema.json for the full contract):
{
  "schema_version": "run_descriptor_v0.1",
  "instrument": {"vendor": "illumina", "model": "NextSeq", "vendor_run_id": "RUN123", "data_uri": "/runs/RUN123"},
  "assay": {"name": "PanelA", "read_structure": "151T8I151T", "read_lengths": [151, 8, 151]},
  "samples": [{"sample_id": "SampleA"}, {"sample_id": "SampleB"}],
  "data_products": {"fastq_sets": [{"sample_id": "SampleA", "lane": "L001", "read_pair": {"r1": "...", "r2": "..."}}]},
  "routing": {"pipeline": "generic_align", "reference_build": "hg38"}
}
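As a rough illustration of how the fields in the example relate to each other, the sketch below cross-checks a descriptor dict before it is bundled. It is not the schema validation itself: the function names and both checks (read_lengths matching read_structure, and every FASTQ set pointing at a declared sample) are assumptions made for this example, and schemas/run/run_descriptor.schema.json remains the actual contract.

```python
import re


def read_lengths_from_structure(read_structure: str) -> list[int]:
    """Parse a string such as '151T8I151T' into cycle counts [151, 8, 151]."""
    segments = re.findall(r"(\d+)([A-Za-z])", read_structure)
    # Reject anything that does not decompose cleanly into <count><letter> pairs.
    if "".join(count + kind for count, kind in segments) != read_structure:
        raise ValueError(f"unparseable read_structure: {read_structure!r}")
    return [int(count) for count, _ in segments]


def check_descriptor(descriptor: dict) -> list[str]:
    """Return a list of human-readable problems (empty when none are found).

    Hypothetical consistency checks, not the project's schema validation.
    """
    problems = []
    assay = descriptor.get("assay", {})
    try:
        parsed = read_lengths_from_structure(assay.get("read_structure", ""))
    except ValueError as exc:
        problems.append(str(exc))
    else:
        if parsed != assay.get("read_lengths"):
            problems.append("assay.read_lengths disagrees with assay.read_structure")

    declared = {sample["sample_id"] for sample in descriptor.get("samples", [])}
    for fastq_set in descriptor.get("data_products", {}).get("fastq_sets", []):
        sample_id = fastq_set.get("sample_id")
        if sample_id not in declared:
            problems.append(f"fastq_set references undeclared sample {sample_id!r}")
    return problems
```

Calling check_descriptor on the example above returns an empty list, since "151T8I151T" parses to [151, 8, 151] and SampleA is declared under samples.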
RunBundle layout
run-<vendor_run_id>-<timestamp>.tar.gz
  descriptor.json
  manifest.json          # file list with size + sha256 + bundle digest
  checksums.sha256       # convenience copy
  sample_sheet.raw.csv   # optional
  data/                  # optional embedded FASTQ/BAM/VCF
  reports/               # optional vendor QC
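The layout above can be sketched with the standard library alone. This is a minimal illustration, not the edge agent's implementation: the manifest keys ("files", "path", "size", "sha256", "bundle_digest") and the way the bundle digest is derived (a SHA-256 over the checksum lines) are assumptions chosen for the example, as are the function names.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_bundle(files: dict[str, bytes], out_path: Path) -> dict:
    """Write a .tar.gz bundle; `files` maps member name -> content.

    Returns the manifest that was written into the archive.
    """
    entries = [
        {"path": name, "size": len(data), "sha256": sha256_bytes(data)}
        for name, data in sorted(files.items())
    ]
    # Same "<digest>  <path>" line format that the sha256sum tool emits.
    checksum_lines = "".join(f"{e['sha256']}  {e['path']}\n" for e in entries)
    # Assumption: the bundle digest is a hash over all checksum lines.
    manifest = {"files": entries, "bundle_digest": sha256_bytes(checksum_lines.encode())}

    members = dict(files)
    members["manifest.json"] = json.dumps(manifest, indent=2).encode()
    members["checksums.sha256"] = checksum_lines.encode()
    with tarfile.open(out_path, "w:gz") as tar:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return manifest


def verify_bundle(bundle_path: Path) -> bool:
    """Recompute size and sha256 for every file listed in manifest.json."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        manifest = json.load(tar.extractfile("manifest.json"))
        for entry in manifest["files"]:
            data = tar.extractfile(entry["path"]).read()
            if len(data) != entry["size"] or sha256_bytes(data) != entry["sha256"]:
                return False
    return True
```

Building the archive from in-memory bytes keeps the sketch short; a real agent would more likely stream large FASTQ files from disk rather than hold them in memory.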
Edge → Ingest flow (quick start)
- Start ingest verifier/registry:
  python tools/mock_run_ingest_server.py --host 0.0.0.0 --port 8001 --store /tmp/ingest_store
- Start Ion receiver if testing Torrent payloads:
  python tools/ion_receiver_server.py --config edge_config.yaml --host 0.0.0.0 --port 8082
- Start edge agent watching an Illumina fixture:
  python tools/helix_edge_agent.py --config edge_config.yaml --loop
- Drop the Illumina fixture run into the watched folder. The agent bundles and uploads it, and ingest verifies the bundle.
Results:
- GET /runs on ingest shows the new run summary.
- GET /runs/<id> returns the full descriptor + verification flag.
- Helix Instrument Runs view can call the same endpoints through the RunRegistryClient.
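For a quick check from a script, the two endpoints can be read with the standard library. The helper names below are made up for this sketch, and it assumes only that both endpoints return JSON; the exact response fields are not spelled out here, so the code returns the parsed payload unchanged. This is not the RunRegistryClient itself.

```python
import json
import urllib.parse
import urllib.request


def get_json(base_url: str, path: str, opener=urllib.request.urlopen):
    """GET base_url + path and parse the body as JSON.

    `opener` is injectable so the function can be exercised without a server.
    """
    with opener(f"{base_url.rstrip('/')}{path}") as response:
        return json.load(response)


def list_runs(base_url: str, opener=urllib.request.urlopen):
    """Fetch the run summaries from GET /runs."""
    return get_json(base_url, "/runs", opener)


def get_run(base_url: str, run_id: str, opener=urllib.request.urlopen):
    """Fetch one run from GET /runs/<id>, percent-encoding the id."""
    return get_json(base_url, f"/runs/{urllib.parse.quote(run_id, safe='')}", opener)
```

With the quick-start server running on port 8001, list_runs("http://localhost:8001") would hit GET /runs on that instance.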
Run states
- received: bundle verified and stored by ingest.
- queued: OGN accepted the run and created a job.
- running: OGN job is executing.
- finished: pipeline completed successfully.
- failed: pipeline errored; check last_message for context.
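The states above can be modelled as a small state machine. The transition table in this sketch is an assumption inferred from the order in which the states are listed (for example, that a queued run may fail before it starts); the source of truth is whatever ingest and OGN actually enforce.

```python
from enum import Enum


class RunState(str, Enum):
    RECEIVED = "received"
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


# Assumed transitions; finished and failed are treated as terminal.
ALLOWED_TRANSITIONS = {
    RunState.RECEIVED: {RunState.QUEUED},
    RunState.QUEUED: {RunState.RUNNING, RunState.FAILED},
    RunState.RUNNING: {RunState.FINISHED, RunState.FAILED},
    RunState.FINISHED: set(),
    RunState.FAILED: set(),
}


def is_valid_history(history: list[str]) -> bool:
    """Check that a list of state names starts at 'received' and only
    follows transitions from ALLOWED_TRANSITIONS."""
    try:
        states = [RunState(name) for name in history]
    except ValueError:
        return False  # unknown state name
    if not states or states[0] is not RunState.RECEIVED:
        return False
    return all(
        nxt in ALLOWED_TRANSITIONS[cur] for cur, nxt in zip(states, states[1:])
    )
```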
Live demo walkthrough
- Start ingest verifier/registry (tools/mock_run_ingest_server.py).
- Start Ion receiver (optional) and edge agent watching an Illumina fixture.
- Start OGN scheduler with ingest callbacks (see helix/ogn/scheduler_status_hooks.py).
- Submit a fixture run via the edge agent.
- Watch /runs/{id}: status_history should walk received → queued → running → finished.
- Open Helix Instrument Runs tab:
  - Filter “Active” to see queued/running turn blue.
  - Filter “Failed” to show broken jobs with tooltip last_message.
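The two filters in the last step amount to a simple predicate over each run's latest state. The sketch below assumes a run record is a dict whose status_history is a list of state names with the most recent last; that shape is a guess for illustration, and the real payload may carry richer entries.

```python
ACTIVE_STATES = {"queued", "running"}


def current_status(run: dict):
    """Return the most recent entry of status_history, or None if empty.

    Assumption: status_history is a list of state-name strings.
    """
    history = run.get("status_history") or []
    return history[-1] if history else None


def filter_runs(runs: list[dict], view: str) -> list[dict]:
    """Mimic the 'Active' and 'Failed' filters of the Instrument Runs tab."""
    if view == "Active":
        return [run for run in runs if current_status(run) in ACTIVE_STATES]
    if view == "Failed":
        return [run for run in runs if current_status(run) == "failed"]
    raise ValueError(f"unknown view: {view!r}")
```

A failed run returned by the "Failed" filter would then expose its last_message for the tooltip.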
Vendor mapping notes
- Illumina run folder: parse RunInfo/RunParameters/SampleSheet; vendor_run_id comes from RunId or folder name.
- BaseSpace: REST API provides Run.Id and FASTQ files; token refresh via BaseSpaceTokenManager.
- Thermo Connect: treat the synced folder as the run root; read run_completed.json and glob for FASTQ files.
- Ion Torrent: plugin posts JSON to Ion receiver; paths in fastq_sets are bundled/uploaded.
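To make the Illumina rule concrete, here is a minimal sketch of the "RunId or folder name" fallback. It assumes the RunId in question is the Id attribute on the Run element of RunInfo.xml; the function name is hypothetical, and the agent's real parser may also consult RunParameters or the sample sheet.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def vendor_run_id_from_folder(run_dir: Path) -> str:
    """Return the run id from RunInfo.xml, falling back to the folder name."""
    run_info = run_dir / "RunInfo.xml"
    if run_info.is_file():
        try:
            run_element = ET.parse(run_info).getroot().find("Run")
        except ET.ParseError:
            run_element = None  # malformed XML: fall through to the folder name
        if run_element is not None and run_element.get("Id"):
            return run_element.get("Id")
    return run_dir.name
```

Falling back to the folder name keeps partially copied or hand-assembled fixture runs ingestible, at the cost of trusting whatever the directory happens to be called.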
For developer-facing details see schemas/run/README.md; this doc is the user-facing narrative.