Nextflow DSL2 · Docker Ready · v1.0.0-beta

Clinical Bacterial WGS
from raw reads to report

A reproducible, containerized Nextflow pipeline for clinical microbiology labs — automating QC, assembly, AMR typing, MLST, and phylogenetic surveillance into a single audit-ready HTML report.

nextflow ≥23.10 · DSL2 · Docker · Singularity · MIT License · CI: passing
$ nextflow run your-github/BacTrack-NF -profile docker

Clinical labs need a pipeline
that actually fits their workflow

Whole-genome sequencing is now standard in clinical microbiology. But the bioinformatics tooling hasn't kept up — labs stitch together ad-hoc scripts that break, can't be audited, and aren't reproducible between runs.

Current Pain Points
  • Existing tools (Bactopia, Nullarbor) are research-focused, not clinical-grade
  • No single pipeline covers QC → assembly → AMR → MLST → phylogeny end-to-end
  • Outputs aren't audit-ready — no structured HTML report for clinical record-keeping
  • Hard to deploy in air-gapped hospital HPC environments
  • No built-in outbreak detection or cluster alert thresholds
BacTrack-NF solves this
  • Full DSL2 Nextflow pipeline with modular, testable processes
  • Six integrated modules: QC → Assembly → Annotation → AMR → MLST → Phylogeny
  • Structured HTML + JSON report suitable for clinical audit trails
  • Singularity support for air-gapped HPC; all containers are versioned and can be pre-pulled
  • SNP-distance outbreak alerting with configurable cluster thresholds

Six modules, one command

Each module is a self-contained Nextflow process with its own Docker container, test data, and documentation. Mix and match with --skip_* flags.

01 📥 Input QC: FastQC, fastp
02 🧩 Assembly: Shovill, SPAdes
03 🔬 Annotation: Prokka, QUAST
04 💊 AMR Detection: AMRFinder+, ResFinder
05 🧬 MLST Typing: mlst, Kleborate
06 🌳 Phylogeny: Snippy, FastTree 2

Built for clinical reproducibility

📋
Audit-Ready HTML Reports
Every run produces a structured, timestamped HTML + JSON report with QC flags, AMR profiles, and chain-of-custody metadata for clinical records.
report.html · report.json
🚨
Outbreak Cluster Alerting
SNP-distance matrix with configurable threshold alerts. Isolates within N SNPs of a reference are flagged automatically — critical for infection control.
--snp_threshold 10
🐳
Air-Gap & HPC Ready
All containers are versioned and can be pre-pulled. Singularity profile included for hospital HPCs with no internet access. Zero dependency hell.
-profile singularity
🧪
CI-Tested with nf-test
Each module ships with nf-test unit tests run on every commit. Snapshot testing ensures outputs remain consistent across tool version updates.
nf-test · GitHub Actions
⚙️
Flexible Config Profiles
Pre-built profiles for local laptop, Slurm HPC, AWS Batch, and Google Cloud. Switch with a single flag — no config file editing required.
-profile slurm,docker
📊
MultiQC Integration
Aggregate QC across all samples in a run. Instantly flag low-coverage, contaminated, or failed samples before downstream analysis wastes compute time.
multiqc · coverage · N50
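
The low-coverage flagging described in the MultiQC card can be pictured with a minimal sketch. It assumes per-sample metrics have already been parsed into dictionaries; the field names (`sample`, `coverage`), the function name, and the example values are all hypothetical, and only the default of 30 mirrors the `min_coverage` parameter shown in nextflow.config. This is an illustration of the idea, not the pipeline's own code.

```python
from typing import Iterable


def flag_low_coverage(samples: Iterable[dict], min_coverage: float = 30.0) -> list[str]:
    """Return the IDs of samples whose mean depth is below min_coverage."""
    return [s["sample"] for s in samples if s["coverage"] < min_coverage]


# Made-up metrics for illustration only (not pipeline output).
metrics = [
    {"sample": "S1", "coverage": 98.2},
    {"sample": "S2", "coverage": 12.5},
]

low = flag_low_coverage(metrics)
print(low)  # ['S2']
```

Filtering on a single threshold before assembly is what lets failed samples drop out early; contamination checks would need their own metric and are left out of this sketch.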

Mock results from 12 K. pneumoniae isolates

Simulated run on clinical-grade Illumina reads (2×150 bp, ~100× coverage). All outputs were generated by the pipeline with no manual intervention.

Samples passed QC: 12/12
Mean assembly size: 5.4 Mb
Mean coverage depth: 98.2×
Mean N50: 127 kb
Mean Q30 score: 94.1%
Adapter trimmed: 8.2%
Mapping rate: 99.3%
Contigs < 500 bp: 3.8%
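
For readers less familiar with the N50 and coverage-depth figures above, here is a small sketch of the standard definitions. The function names are hypothetical and the inputs are illustrative; this is not how the pipeline or its bundled tools compute the values, only the arithmetic behind them.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def mean_depth(n_read_pairs: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth for paired-end reads: sequenced bases / genome size."""
    return 2 * n_read_pairs * read_length / genome_size


print(n50([400, 300, 200, 100]))             # 300
print(mean_depth(1_800_000, 150, 5_400_000))  # 100.0
```

By this arithmetic, roughly 1.8 million 2×150 bp read pairs give about 100× depth on a 5.4 Mb genome, which is the scale of the mock run.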
Sample ID  Gene            Drug Class          % Identity  Coverage  Phenotype
KP-001     blaCTX-M-15     Beta-lactam (ESBL)  99.8%       100%      R
KP-001     aac(6')-Ib-cr   Aminoglycoside      98.2%       100%      R
KP-003     oqxAB           Fluoroquinolone     97.5%       98%       I
KP-005     blaTEM-1B       Beta-lactam         100%        100%      R
KP-007     mcr-1.1         Colistin            99.1%       99%       R
KP-009     (no AMR genes)  -                   -           -         S
KP-011     blaKPC-2        Carbapenem          100%        100%      R
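
One way to read an AMR table like this is to collapse gene hits into drug classes per sample after applying identity and coverage cut-offs. The sketch below uses a few rows copied from the mock table; the cut-off values and the function name are assumptions chosen for illustration, not thresholds used by the pipeline.

```python
from collections import defaultdict

# (sample, gene, drug class, % identity, % coverage), copied from the mock table.
hits = [
    ("KP-001", "blaCTX-M-15", "Beta-lactam (ESBL)", 99.8, 100.0),
    ("KP-001", "aac(6')-Ib-cr", "Aminoglycoside", 98.2, 100.0),
    ("KP-003", "oqxAB", "Fluoroquinolone", 97.5, 98.0),
    ("KP-011", "blaKPC-2", "Carbapenem", 100.0, 100.0),
]


def classes_per_sample(rows, min_identity=98.0, min_coverage=90.0):
    """Group drug classes by sample, keeping hits that pass both cut-offs.

    The default cut-offs are hypothetical values for this example.
    """
    grouped = defaultdict(set)
    for sample, _gene, drug_class, identity, coverage in rows:
        if identity >= min_identity and coverage >= min_coverage:
            grouped[sample].add(drug_class)
    return {sample: sorted(classes) for sample, classes in grouped.items()}


summary = classes_per_sample(hits)
print(summary)
# {'KP-001': ['Aminoglycoside', 'Beta-lactam (ESBL)'], 'KP-011': ['Carbapenem']}
```

With the assumed 98% identity cut-off, the 97.5% oqxAB hit drops out, which shows how sensitive a per-sample summary is to the chosen thresholds.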
Klebsiella pneumoniae · ST258 · gapA:2 infB:1 mdh:1 pgi:1 phoE:9 rpoB:4 tonB:12
Klebsiella pneumoniae · ST307 · gapA:2 infB:1 mdh:1 pgi:1 phoE:1 rpoB:1 tonB:70
Klebsiella pneumoniae · ST147 · gapA:3 infB:3 mdh:1 pgi:1 phoE:8 rpoB:4 tonB:4
Klebsiella pneumoniae · ST11 · gapA:2 infB:2 mdh:2 pgi:2 phoE:3 rpoB:2 tonB:7
[Figure: phylogenetic tree of isolates KP-007, KP-011, KP-001 (ST258), KP-003, KP-005 (ST307), KP-009 (ST147), and KP-012 (ST11), with the outbreak cluster (≤8 SNPs) highlighted. Scale: 10 SNPs.]
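
Threshold-based cluster alerting can be sketched as single-linkage grouping over pairwise SNP distances. The source only states that isolates within N SNPs are flagged, so the linkage rule, the function name, and the distances below are assumptions made for illustration; `threshold` plays the role of `--snp_threshold`.

```python
def snp_clusters(dist: dict[tuple[str, str], int], threshold: int) -> list[list[str]]:
    """Single-linkage clusters: two isolates join if their distance is <= threshold.

    Returns only clusters with two or more members, each sorted by name.
    """
    isolates = sorted({name for pair in dist for name in pair})
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(g for g in groups.values() if len(g) > 1)


# Made-up pairwise SNP distances between four hypothetical isolates.
distances = {("A", "B"): 5, ("B", "C"): 7, ("A", "C"): 11, ("A", "D"): 250}

print(snp_clusters(distances, threshold=8))  # [['A', 'B', 'C']]
```

Note that single linkage chains isolates together: A and C are 11 SNPs apart but still share a cluster through B. A stricter rule (for example, distance to a fixed reference) would behave differently at the same threshold.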

Up and running in three commands

# 1. Install Nextflow (requires Java 17+)
curl -s https://get.nextflow.io | bash   # creates ./nextflow; move it onto your PATH

# 2. Pull and run BacTrack-NF
nextflow run your-github/BacTrack-NF \
  --input samplesheet.csv \
  --outdir ./results \
  --genome GCF_000240185.1 \
  --snp_threshold 10 \
  -profile docker

# 3. Open results/report.html in any browser
open results/report.html                 # macOS; use xdg-open on Linux

// nextflow.config: customise resources per process
params {
    input           = null
    outdir          = "results"
    genome          = null
    snp_threshold   = 20      // default; --snp_threshold on the command line overrides it
    min_coverage    = 30
    skip_annotation = false
    skip_phylogeny  = false
}

process {
    withName: 'SHOVILL' {
        cpus   = 8
        memory = 32.GB
        time   = 2.h
    }
    withName: 'FASTTREE' {
        cpus   = 16
        memory = 16.GB
    }
}

# Submit to a Slurm cluster with Singularity
nextflow run your-github/BacTrack-NF \
  --input samplesheet.csv \
  --outdir /scratch/$USER/bactrack_out \
  -profile slurm,singularity \
  -resume

# Singularity images are pre-cached at:
# NXF_SINGULARITY_CACHEDIR=/path/to/cache

# List the most recent runs
nextflow log | tail -5
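
The commands above pass `--input samplesheet.csv`, but the page does not show the file's layout. The sketch below checks a samplesheet before launch under a loudly stated assumption: it uses the common nf-core-style columns `sample`, `fastq_1`, `fastq_2`, which are not confirmed for BacTrack-NF. The function name and file paths are likewise hypothetical.

```python
import csv
import io

# Assumed column names (nf-core convention); not confirmed by the source.
REQUIRED = ("sample", "fastq_1", "fastq_2")


def validate_samplesheet(text: str) -> list[dict]:
    """Parse CSV text and reject missing columns, empty fields, or duplicate IDs."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = list(reader)
    seen = set()
    for row in rows:
        if not all(row[c] for c in REQUIRED):
            raise ValueError(f"empty field in row: {row}")
        if row["sample"] in seen:
            raise ValueError(f"duplicate sample ID: {row['sample']}")
        seen.add(row["sample"])
    return rows


# Hypothetical samplesheet content for illustration.
sheet = "sample,fastq_1,fastq_2\nS1,reads/S1_R1.fastq.gz,reads/S1_R2.fastq.gz\n"
rows = validate_samplesheet(sheet)
print(len(rows))  # 1
```

Catching a malformed samplesheet before submission is cheaper than discovering it after a Slurm job has queued; the real column names should be taken from the pipeline's own documentation.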

PhD researcher in
clinical metagenomics

I build reproducible bioinformatics tools at the intersection of clinical microbiology and computational genomics. BacTrack-NF grew out of frustration with fragmented tooling in a clinical WGS setting — I needed something I could hand to a lab technician with one command.

Open to remote bioinformatics positions — analyst, pipeline engineer, or computational researcher roles. Comfortable across the full stack: bash to Python to Nextflow to Docker.

Python · R / Bioconductor · Bash · Nextflow DSL2 · Snakemake · Docker · Singularity · QIIME2 · BLAST · scikit-learn · Git / GitHub Actions
6 Pipeline modules, each independently testable
~18 min Runtime for 12 samples on 32-core HPC node
4 Deployment profiles: local, Slurm, AWS, GCP
100% Processes containerised — no dependency conflicts
DSL2 Modern Nextflow with sub-workflows & nf-test
