nf-core/longraredisease is a specialized bioinformatics pipeline for structural variant (SV) detection and clinical interpretation from long-read sequencing data (Oxford Nanopore and PacBio). Designed for rare disease diagnostics, it delivers high-confidence variant discovery through multi-caller consensus, family-based analysis, and phenotype-driven prioritization.
The pipeline excels at identifying and interpreting structural variants through:
- Multi-caller SV consensus - Sniffles, CuteSV, SVIM with JASMINE merging
- Phase-aware calling - Haplotype-resolved SV detection using LongPhase
- Family analysis - Trio-based joint calling and de novo variant detection
- Clinical annotation - AnnotSV with disease database integration
- Phenotype prioritization - SVANNA-based ranking using HPO terms
Core SV Analysis (Always Enabled):
- ✅ Structural Variants - Multi-caller detection (DEL, INS, DUP, INV, BND)
- ✅ Phasing - Long-range haplotyping with LongPhase
- ✅ Quality Control - Comprehensive QC with NanoPlot, mosdepth, MultiQC
Optional Analyses:
- 🧬 Single Nucleotide Variants - Clair3 or DeepVariant (enable with `--snv true`)
- 📈 Copy Number Variants - Spectre or HiFiCNV (enable with `--cnv true`)
- 🔁 Short Tandem Repeats - Straglr genotyping (enable with `--str true`)
- 🧪 DNA Methylation - Modkit extraction for ONT (enable with `--methyl true`)
- Nextflow ≥25.04.6 (DSL2)
- Container engine: Docker, Singularity/Apptainer, or Podman
- Java ≥17 (required by Nextflow)
| Analysis Type | CPU Cores | Memory | Storage |
|---|---|---|---|
| Single WGS sample | 8-16 | 32-64 GB | 100 GB |
Notes:
- Coverage recommendations: ≥10x for accurate SV calling, ≥30x for high-confidence trio analysis
- Storage includes space for input data, intermediate files, and results
- Adjust the `--max_cpus` and `--max_memory` parameters based on available resources
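You can check whether a sample meets these coverage recommendations using the mosdepth output in `qc/mosdepth/`. The function below is a minimal sketch and is not part of the pipeline. It assumes mosdepth's standard tab-separated `<prefix>.mosdepth.summary.txt` file, in which the `total` row carries the genome-wide mean depth in the fourth column. The function name `check_coverage` and the exact file name inside the results folder are assumptions.

```bash
# check_coverage SUMMARY MIN_DEPTH
# Hypothetical helper (not shipped with the pipeline).
# Assumes a mosdepth *.mosdepth.summary.txt file: tab-separated, header
# "chrom length bases mean min max", plus a "total" row with the overall mean.
# Prints "PASS <mean>" or "FAIL <mean>"; returns 0 on PASS, 1 on FAIL,
# 2 if no "total" row is found.
check_coverage() {
  awk -F'\t' -v min="$2" '
    $1 == "total" { mean = $4; found = 1 }
    END {
      if (!found) { print "ERROR: no total row in summary" > "/dev/stderr"; exit 2 }
      status = (mean + 0 >= min + 0) ? "PASS" : "FAIL"
      print status, mean
      exit (status == "PASS" ? 0 : 1)
    }' "$1"
}
```

For example, before trusting trio results you could run `check_coverage <summary file> 30` for each family member. The path depends on how mosdepth was invoked.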
```bash
# Install Nextflow (≥25.04.6)
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

# Verify installation
nextflow -version
```

```bash
# Run with test data
nextflow run nf-core/longraredisease \
  -profile test,docker \
  --outdir test_results
```

Minimal SV-focused run:

```bash
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir results \
  --fasta reference.fasta \
  --sequencing_platform ont \
  -profile docker
```

With family analysis and phenotype prioritization:

```bash
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir results \
  --fasta reference.fasta \
  --sequencing_platform ont \
  --trio_analysis true \
  --run_svanna true \
  --svanna_db /path/to/svanna_db \
  -profile docker
```

See `docs/usage.md` for complete examples and parameter details.
| Parameter | Description | Format | Example |
|---|---|---|---|
| `--input` | Samplesheet with sample metadata | CSV | `samplesheet.csv` |
| `--outdir` | Output directory | Path | `./results` |
| `--fasta` | Reference genome FASTA | `.fasta`/`.fa` | `GRCh38.fasta` |
| `--sequencing_platform` | Platform type | `ont`, `pacbio`, or `hifi` | `ont` |
The input samplesheet is a CSV file with the following columns:
Minimal format (single samples):
```csv
sample,bam,bai
sample1,/path/to/sample1.bam,/path/to/sample1.bam.bai
sample2,/path/to/sample2.bam,/path/to/sample2.bam.bai
```

Family analysis format (trios):

```csv
sample,bam,bai,family,paternal_id,maternal_id,sex,phenotype,hpo_terms
proband,proband.bam,proband.bam.bai,family1,father,mother,1,affected,"HP:0001250,HP:0002066"
father,father.bam,father.bam.bai,family1,0,0,1,unaffected,
mother,mother.bam,mother.bam.bai,family1,0,0,2,unaffected,
```

Column descriptions:

- `sample` - Unique sample identifier
- `bam` - Path to aligned BAM file
- `bai` - Path to BAM index file
- `family` - Family identifier (for trio analysis)
- `paternal_id` - Father's sample ID (or `0` if not in study)
- `maternal_id` - Mother's sample ID (or `0` if not in study)
- `sex` - `1` = male, `2` = female, `0` = unknown
- `phenotype` - `affected` or `unaffected`
- `hpo_terms` - Comma-separated HPO terms (e.g., `HP:0001250,HP:0002066`)
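Malformed samplesheets are a common cause of early failures. A pre-flight check like the one below can catch a wrong header, duplicate sample IDs, or missing BAM/BAI files before you launch a run. This is a hypothetical POSIX shell sketch and is not shipped with the pipeline. It assumes plain CSV with no quoted commas in the first three columns, and it resolves relative paths from the current directory.

```bash
# validate_samplesheet CSV
# Hypothetical pre-flight check (not part of the pipeline). Verifies that:
#   - the header starts with sample,bam,bai
#   - sample IDs are unique
#   - every BAM and BAI path exists
# Prints one ERROR line per problem to stderr; returns 1 if any were found.
validate_samplesheet() {
  local csv="$1" status=0 dups line=0 sample bam bai rest f

  case "$(head -n 1 "$csv")" in
    sample,bam,bai|sample,bam,bai,*) ;;
    *) echo "ERROR: header must start with sample,bam,bai" >&2; return 1 ;;
  esac

  # Duplicate IDs in the first column (header excluded)
  dups=$(tail -n +2 "$csv" | cut -d, -f1 | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "ERROR: duplicate sample IDs: $dups" >&2
    status=1
  fi

  # "|| [ -n ... ]" also processes a last line that lacks a trailing newline
  while IFS=, read -r sample bam bai rest || [ -n "$sample" ]; do
    line=$((line + 1))
    [ "$line" -eq 1 ] && continue      # skip header
    [ -z "$sample" ] && continue       # skip blank lines
    for f in "$bam" "$bai"; do
      if [ ! -f "$f" ]; then
        echo "ERROR: $sample: missing file $f" >&2
        status=1
      fi
    done
  done < "$csv"

  return "$status"
}
```

Running `validate_samplesheet samplesheet.csv && nextflow run ...` prevents the pipeline from starting when a referenced file is missing.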
| Parameter | Description | Required For |
|---|---|---|
| `--bed` | Target regions BED file | Targeted sequencing |
| `--annotsv_db` | AnnotSV database path | SV annotation |
| `--svanna_db` | SVANNA database path | Phenotype prioritization |
| `--str_bed` | STR loci BED file | STR analysis |
Structural variant analysis is always enabled. Optional analyses:
| Parameter | Description | Default |
|---|---|---|
| `--snv` | Enable SNV calling (Clair3/DeepVariant) | `false` |
| `--cnv` | Enable CNV detection (Spectre) | `false` |
| `--str` | Enable STR genotyping (Straglr) | `false` |
| `--methyl` | Enable methylation calling (Modkit, ONT only) | `false` |
| Parameter | Description | Default |
|---|---|---|
| `--run_cutesv` | Enable CuteSV caller | `true` |
| `--run_svim` | Enable SVIM caller (recommended for BNDs) | `false` |
| `--haplotag_bam` | Haplotag BAM for phase-aware SV calling | `true` |
| `--min_sv_size` | Minimum SV size to report (bp) | `30` |
| `--min_read_support` | Minimum supporting reads | `auto` |
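The pipeline applies `--min_sv_size` internally. The sketch below only illustrates what a given size threshold means for an existing call set: it filters an uncompressed VCF by the absolute value of `INFO/SVLEN`. It assumes that `SVLEN` is present for sized events (check the `##INFO` header of your VCF). Records without `SVLEN`, such as breakends, pass through unchanged. `filter_by_svlen` is a hypothetical helper.

```bash
# filter_by_svlen VCF MIN_SIZE
# Illustrative only: keeps header lines and records whose INFO/SVLEN has an
# absolute value >= MIN_SIZE. Records without SVLEN (e.g. BND) are kept.
# Expects an uncompressed VCF; for .vcf.gz input, decompress with gzip -dc.
filter_by_svlen() {
  awk -F'\t' -v min="$2" '
    /^#/ { print; next }                          # keep all header lines
    {
      n = split($8, info, ";")                    # INFO is column 8
      svlen = ""
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SVLEN=/) svlen = substr(info[i], 7)
      if (svlen == "") { print; next }            # no SVLEN: keep as-is
      if (svlen + 0 < 0) svlen = -svlen           # deletions have negative SVLEN
      if (svlen + 0 >= min + 0) print
    }' "$1"
}
```

Comparing `filter_by_svlen calls.vcf 20` with `filter_by_svlen calls.vcf 30` shows how many events the default threshold removes.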
| Parameter | Description | Default |
|---|---|---|
| `--trio_analysis` | Enable trio/family-based calling | `false` |
| `--run_svanna` | Enable phenotype-driven prioritization | `false` |
| `--svanna_db` | Path to SVANNA database | - |
| Parameter | Description | Default |
|---|---|---|
| `--jasmine_max_dist` | Max distance for merging breakpoints (bp) | `1000` |
| `--jasmine_min_support` | Min callers supporting merged variant | `2` |
| `--jasmine_spec_reads` | Min supporting reads for consensus | `3` |
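The JASMINE parameters control how per-caller calls are collapsed into one consensus record. To inspect caller agreement after the fact, the sketch below keeps records supported by at least N input callsets. It assumes the merged VCF carries a SURVIVOR-style `INFO/SUPP_VEC` tag with one 0/1 digit per input VCF (e.g. `SUPP_VEC=110`). Confirm this in the `##INFO` header of your `{sample}.jasmine.vcf.gz`. The sketch illustrates the idea behind `--jasmine_min_support`; it is not the pipeline's merging code. `filter_by_caller_support` is a hypothetical helper.

```bash
# filter_by_caller_support VCF MIN_CALLERS
# Sketch, assuming a SURVIVOR/JASMINE-style INFO/SUPP_VEC tag where each digit
# marks whether one input VCF (caller) reported the SV (1) or not (0).
# Records without SUPP_VEC count as support 0.
filter_by_caller_support() {
  awk -F'\t' -v min="$2" '
    /^#/ { print; next }
    {
      n = split($8, info, ";")
      vec = ""
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^SUPP_VEC=/) vec = substr(info[i], 10)
      support = gsub(/1/, "1", vec)   # gsub returns the number of 1s in vec
      if (support >= min + 0) print
    }' "$1"
}
```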
| Parameter | Description | Options |
|---|---|---|
| `--sequencing_platform` | Sequencing platform | `ont`, `pacbio` |
| `--preset` | Minimap2 alignment preset | `map-ont`, `map-hifi`, `map-pb` |
| `--snv_caller` | SNV caller choice | `clair3`, `deepvariant` |
```bash
# Basic ONT structural variant analysis
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  -profile docker
```

```bash
# PacBio with SNV, CNV, and STR analysis
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir results \
  --fasta GRCh38.fasta \
  --sequencing_platform pacbio \
  --snv true \
  --cnv true \
  --str true \
  -profile singularity
```

```bash
# Trio analysis with phenotype prioritization and AnnotSV annotation
nextflow run nf-core/longraredisease \
  --input trio_samplesheet.csv \
  --outdir family_results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  --trio_analysis true \
  --run_svanna true \
  --svanna_db /databases/svanna_data \
  --annotsv_db /databases/AnnotSV \
  -profile docker
```

```bash
# High-sensitivity SV calling
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir sensitive_results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  --run_svim true \
  --min_sv_size 20 \
  --min_read_support 2 \
  --jasmine_min_support 1 \
  -profile docker
```

```bash
# Targeted sequencing
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --outdir targeted_results \
  --fasta GRCh38.fasta \
  --bed targets.bed \
  --sequencing_platform ont \
  -profile docker
```

```text
results/
├── pipeline_info/                 # Pipeline execution reports
│   ├── execution_report.html      # Resource usage report
│   ├── execution_timeline.html    # Process execution timeline
│   └── multiqc_report.html        # Comprehensive QC report
│
├── qc/                            # Quality control metrics
│   ├── mosdepth/                  # Coverage statistics per sample
│   ├── nanoplot/                  # Read quality metrics (ONT)
│   └── cramino/                   # BAM/CRAM alignment summary (optional)
│
├── structural_variants/           # 🎯 PRIMARY OUTPUT: SV calls
│   ├── sniffles/                  # Per-sample Sniffles VCFs
│   │   └── {sample}.sniffles.vcf.gz
│   ├── cutesv/                    # Per-sample CuteSV VCFs
│   │   └── {sample}.cutesv.vcf.gz
│   ├── svim/                      # Per-sample SVIM VCFs (if enabled)
│   │   └── {sample}.svim.vcf.gz
│   ├── merged/                    # Multi-caller consensus SVs
│   │   ├── {sample}.jasmine.vcf.gz
│   │   └── {sample}.survivor.vcf.gz
│   ├── annotated/                 # AnnotSV annotations
│   │   └── {sample}.annotated.tsv
│   └── svanna/                    # Phenotype-prioritized SVs
│       └── {sample}.svanna.html
│
├── phasing/                       # Haplotype-resolved results
│   ├── haplotagged_bams/          # Phase-tagged alignments
│   │   └── {sample}.haplotagged.bam
│   ├── whatshap/                  # Phasing statistics
│   │   └── {sample}.phased.vcf.gz
│   └── longphase/                 # Alternative phasing
│       └── {sample}.longphase.vcf.gz
│
├── snv_calls/                     # SNVs (if --snv enabled)
│   ├── clair3/
│   │   └── {sample}.clair3.vcf.gz
│   └── deepvariant/
│       └── {sample}.deepvariant.vcf.gz
│
├── cnv_calls/                     # CNVs (if --cnv enabled)
│   └── spectre/
│       └── {sample}.cnv.vcf.gz
│
├── str_calls/                     # STRs (if --str enabled)
│   └── straglr/
│       └── {sample}.straglr.tsv
│
└── methylation/                   # Methylation (if --methyl enabled, ONT only)
    └── modkit/
        └── {sample}.bedmethyl.gz
```

Key output files:
- Merged SVs: `structural_variants/merged/{sample}.jasmine.vcf.gz` (high-confidence consensus)
- Annotated SVs: `structural_variants/annotated/{sample}.annotated.tsv` (clinical interpretation)
- QC Report: `pipeline_info/multiqc_report.html` (overall quality assessment)
- Phenotype-prioritized: `structural_variants/svanna/{sample}.svanna.html` (ranked by phenotype match)
Available Profiles:
- test: Minimal test dataset
- docker: Use Docker containers
- singularity: Use Singularity containers
Custom Configuration
```groovy
// custom.config
params {
    max_cpus   = 16
    max_memory = '64.GB'
    outdir     = '/scratch/results'
}

process {
    withName: 'CLAIR3' {
        cpus   = 8
        memory = '32.GB'
    }
}
```

Run with:

```bash
nextflow run main.nf -c custom.config -profile docker
```

For family-based SV analysis, provide pedigree information in your samplesheet:
```csv
sample,bam,bai,family,paternal_id,maternal_id,sex,phenotype,hpo_terms
child_001,child.bam,child.bam.bai,FAM001,father_001,mother_001,2,affected,"HP:0001250,HP:0002066,HP:0001263"
father_001,father.bam,father.bam.bai,FAM001,0,0,1,unaffected,
mother_001,mother.bam,mother.bam.bai,FAM001,0,0,2,unaffected,
```

Sex encoding: `1` = male, `2` = female, `0` = unknown.

Parental IDs: use `0` for founders (individuals with no parents in the study).
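Pedigree mistakes silently weaken trio analysis. Examples include a parent ID that matches no sample, a parent assigned to a different family, or a father coded as female. The awk sketch below is a hypothetical check and is not part of the pipeline. It reads the samplesheet twice: the first pass indexes every sample's family and sex, and the second pass validates each row's `paternal_id` and `maternal_id`. It assumes the column order shown above and no quoted commas before the `hpo_terms` column.

```bash
# check_pedigree CSV
# Hypothetical pre-flight check for the trio samplesheet. Every non-zero
# paternal_id/maternal_id must name a sample in the same family; fathers must
# have sex 1 and mothers sex 2. Prints one ERROR line per problem to stdout
# and returns 1 if any were found.
check_pedigree() {
  awk -F, '
    FNR == 1 || NF == 0 { next }                    # header and blank lines
    NR == FNR { fam[$1] = $4; sex[$1] = $7; next }  # pass 1: index samples
    {                                               # pass 2: check each row
      if ($5 != "0") {
        if (!($5 in fam) || fam[$5] != $4) {
          print "ERROR: " $1 ": father " $5 " not found in family " $4; bad = 1
        } else if (sex[$5] != "1") {
          print "ERROR: " $1 ": father " $5 " has sex " sex[$5] ", expected 1"; bad = 1
        }
      }
      if ($6 != "0") {
        if (!($6 in fam) || fam[$6] != $4) {
          print "ERROR: " $1 ": mother " $6 " not found in family " $4; bad = 1
        } else if (sex[$6] != "2") {
          print "ERROR: " $1 ": mother " $6 " has sex " sex[$6] ", expected 2"; bad = 1
        }
      }
    }
    END { exit bad }' "$1" "$1"
}
```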
Enable trio analysis to identify de novo structural variants:
```bash
nextflow run nf-core/longraredisease \
  --input trio_samplesheet.csv \
  --trio_analysis true \
  --outdir trio_results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  -profile docker
```

The pipeline will:
- ✅ Call SVs in each family member independently
- ✅ Merge calls using JASMINE with family-aware parameters
- ✅ Identify variants present in child but absent in parents
- ✅ Filter based on read support and quality metrics
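The core idea of the third step, a variant that is present in the child and absent in both parents, can be illustrated with a genotype-only filter on a merged multi-sample VCF. This is a simplified sketch, not the pipeline's de novo logic, which also filters on read support and quality metrics. It assumes an uncompressed VCF with `GT` as the first FORMAT key (the VCF specification requires this when `GT` is present). Missing parental genotypes (`./.`) are treated as not confirming absence. `denovo_candidates` is a hypothetical helper.

```bash
# denovo_candidates VCF CHILD FATHER MOTHER
# Genotype-only sketch: prints header lines plus records where CHILD carries a
# non-reference allele and both parents are homozygous reference (0/0 or 0|0).
# Sample names are looked up in the #CHROM header line.
denovo_candidates() {
  awk -F'\t' -v c="$2" -v f="$3" -v m="$4" '
    /^##/ { print; next }                     # meta-information lines
    /^#CHROM/ {                               # map sample name -> column index
      for (i = 10; i <= NF; i++) col[$i] = i
      if (!(c in col) || !(f in col) || !(m in col)) {
        print "ERROR: sample not found in #CHROM header" > "/dev/stderr"
        exit 2
      }
      print; next
    }
    {
      # GT is the first colon-separated FORMAT value for each sample
      split($(col[c]), cg, ":"); split($(col[f]), fg, ":"); split($(col[m]), mg, ":")
      child_alt   = (cg[1] ~ /[1-9]/)         # any non-reference allele index
      parents_ref = (fg[1] ~ "^0[/|]0$" && mg[1] ~ "^0[/|]0$")
      if (child_alt && parents_ref) print
    }' "$1"
}
```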
When HPO terms are provided, SVANNA ranks SVs by phenotype relevance:
```bash
nextflow run nf-core/longraredisease \
  --input trio_samplesheet.csv \
  --trio_analysis true \
  --run_svanna true \
  --svanna_db /path/to/svanna/2302 \
  --outdir prioritized_results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  -profile docker
```

Required: download the SVANNA database from the Monarch Initiative.
Output: HTML report ranking SVs by:
- Overlap with disease-associated genes
- Regulatory impact predictions
- Phenotype similarity scores
- De novo status (if trio data available)
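SVANNA can only use correctly formatted HPO IDs: `HP:` followed by seven digits. The sketch below is a hypothetical helper that checks an `hpo_terms` value from the samplesheet before a run, after trimming surrounding spaces and quotes. It checks format only, not whether the term exists in the current HPO release.

```bash
# check_hpo_terms "HP:0001250,HP:0002066"
# Hypothetical helper: splits on commas, trims spaces/tabs/double quotes, and
# reports any term that is not "HP:" followed by exactly seven digits.
# Returns 1 if any malformed term was found.
check_hpo_terms() {
  printf '%s\n' "$1" | tr ',' '\n' | awk '
    { gsub(/^[ \t"]+|[ \t"]+$/, "") }         # trim surrounding whitespace/quotes
    $0 == "" { next }                         # ignore empty entries
    $0 !~ /^HP:[0-9][0-9][0-9][0-9][0-9][0-9][0-9]$/ {
      print "invalid HPO term: " $0; bad = 1
    }
    END { exit bad }'
}
```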
Enable comprehensive SV annotation:
```bash
nextflow run nf-core/longraredisease \
  --input samplesheet.csv \
  --annotsv_db /path/to/AnnotSV_db \
  --outdir annotated_results \
  --fasta GRCh38.fasta \
  --sequencing_platform ont \
  -profile docker
```

AnnotSV provides:
- Gene overlap and functional impact
- ClinGen/ClinVar annotations
- DGV/gnomAD population frequencies
- Pathogenicity predictions (ACMG criteria)
- Regulatory element disruption
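For a first pass over `{sample}.annotated.tsv`, it can help to keep only one row per SV and only the higher ACMG classes. The sketch below assumes an AnnotSV 3-style TSV with `Annotation_mode` (`full` or `split`) and `ACMG_class` columns. It locates these columns by header name rather than position; check the column names produced by your AnnotSV version. `pathogenic_full_rows` is a hypothetical helper. Classes 4 and 5 correspond to likely pathogenic and pathogenic.

```bash
# pathogenic_full_rows TSV [MIN_CLASS]
# Sketch, assuming AnnotSV 3-style "Annotation_mode" and "ACMG_class" columns.
# Keeps the header plus "full" rows whose ACMG class is >= MIN_CLASS
# (default 4). Rows with non-numeric classes (e.g. NA) are dropped.
pathogenic_full_rows() {
  awk -F'\t' -v min="${2:-4}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "Annotation_mode") mode = i
        if ($i == "ACMG_class") cls = i
      }
      if (!mode || !cls) {
        print "ERROR: Annotation_mode or ACMG_class column missing" > "/dev/stderr"
        exit 2
      }
      print; next
    }
    $mode == "full" {
      c = $cls
      sub(/^full=/, "", c)                    # tolerate a "full=N" style value
      if (c ~ /^[1-5]$/ && c + 0 >= min + 0) print
    }' "$1"
}
```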
Symptoms: Fewer SVs than expected
Solutions:
```bash
# Lower read support threshold
--min_read_support 2

# Reduce minimum SV size
--min_sv_size 20

# Enable SVIM for better breakend detection
--run_svim true

# Lower consensus requirement
--jasmine_min_support 1
```

Symptoms: Many low-quality SV calls
Solutions:
```bash
# Increase read support
--min_read_support 5

# Require multiple caller agreement
--jasmine_min_support 2

# Increase minimum SV size
--min_sv_size 50
```

Symptoms: Process killed due to OOM
Solutions:
```bash
# Increase max memory
--max_memory 128.GB

# Cap the CPUs requested by any single process
--max_cpus 16

# Use chromosome-based parallelization (automatic)
```

Symptoms: Expected de novo variants not detected
Checklist:
- ✅ Ensure `--trio_analysis true` is set
- ✅ Verify pedigree information in samplesheet
- ✅ Check read coverage in all samples (≥30×)
- ✅ Review `structural_variants/merged/` for family calls
- ✅ Lower `--jasmine_min_support` if needed
Symptoms: SVANNA fails or produces no rankings
Solutions:
```bash
# Verify database path and version
ls -lh /path/to/svanna/2302

# Ensure HPO terms are valid (HP:XXXXXXX format)
# Check samplesheet for proper HPO term formatting

# Download the SVANNA 2302 database release:
wget https://storage.googleapis.com/svanna-db/svanna-data-2302.tar.gz
tar -xzf svanna-data-2302.tar.gz
```

For large cohorts (>10 samples):
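```bash
# Raise resource caps for large cohorts
--max_cpus 64 --max_memory 256.GB

# Use Singularity (common on HPC systems without Docker)
-profile singularity

# Resume from cached results after an interruption
-resume

# Write a diagram of the workflow
-with-dag flowchart.html

# After a successful run, remove the work directory of the last run
nextflow clean -f
```

For whole genome sequencing: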
- Expect 8-24 hours runtime (depending on coverage)
- Allocate 64-128GB RAM per sample for SV calling
- Use SSD storage for work directory (I/O intensive)
The pipeline includes test data for validation:
- Location: assets/test_data/
- Genome: Chromosome 22 subset
- Samples: Simulated nanopore data
- Runtime: ~10-15 minutes
Debugging Failed Runs:
```bash
# Check Nextflow log for detailed errors
less .nextflow.log

# Resume from last successful step (repeat the original command and parameters, adding -resume)
nextflow run nf-core/longraredisease -resume

# Enable debug mode for verbose output
nextflow run nf-core/longraredisease --debug -profile docker
```

Reporting Issues:
When reporting issues, please include:
- Nextflow version (`nextflow -version`)
- Command used to run the pipeline
- Relevant error messages from `.nextflow.log`
- Sample metadata (anonymized if sensitive)
- System specifications (CPU, RAM, storage)
If you use nf-core/longraredisease in your research, please cite:
nf-core/longraredisease: A Nextflow pipeline for long-read sequencing analysis in rare disease research

> Citation to be added upon publication
Additionally, please cite the tools used in your analysis:
Core SV Tools:
- Sniffles/Sniffles2: Sedlazeck et al. (2018) Nature Methods; Smolka et al. (2024) Nature Biotechnology
- CuteSV: Jiang et al. (2020) Genome Biology
- JASMINE: Kirsche et al. (2023) Nature Methods
- LongPhase: Lin et al. (2022) Bioinformatics
- AnnotSV: Geoffroy et al. (2018) Bioinformatics
Optional Analysis Tools:
- SVANNA: Danis et al. (2022) Genome Medicine
- Clair3: Zheng et al. (2022) Nature Computational Science
- Spectre: Suvakov et al. (2021) Genome Research
- Straglr: Chiu et al. (2021) Genome Biology
Contributions are welcome! To contribute:
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Make your changes following nf-core guidelines
- Test with `nextflow run . -profile test,docker --outdir <OUTDIR>`
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Please ensure:
- ✅ Code follows nf-core style guidelines
- ✅ All tests pass successfully
- ✅ Documentation is updated accordingly
- ✅ Commit messages are descriptive
This project is licensed under the MIT License – see the LICENSE file for details.
This pipeline was developed with support from [institution/funding sources]. We thank the nf-core community for infrastructure and best practices, and all tool developers whose software makes this pipeline possible.
Pipeline Version: 1.0.0

Nextflow Version: ≥25.04.6

Last Updated: 2024
