umi-pipeline-nf
Nextflow pipeline to analyze ONT-UMI-Sequencing data
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: nature.com
- ○ Academic email domains
- ✓ Institutional organization owner: Organization genepi has institutional domain (genepi.i-med.ac.at)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary
Repository
Statistics
- Stars: 4
- Watchers: 3
- Forks: 4
- Open Issues: 2
- Releases: 5
Metadata Files
README.md
Umi-pipeline-nf
Umi-pipeline-nf creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
The pipeline processes FastQ files (typically from the fastq_pass folder of your nanopore run) and outputs high-quality aligned consensus sequences in BAM format for each UMI cluster. The optional variant calling step creates a VCF file containing all variants found in the consensus sequences.
The newest version of the pipeline supports live analysis of the clusters during sequencing and seamless polishing of the clusters as soon as enough clusters are found.
Umi-pipeline-nf originated from a Snakemake-based analysis pipeline (pipeline-umi-amplicon; originally developed by Karst et al., Nat Methods 18:165–169, 2021). We have migrated the pipeline to Nextflow and incorporated several optimizations and additional functionalities.
Workflow
The pipeline is organized into four main subworkflows, each with its own processing steps and outputs:
LIVE UMI PROCESSING
- Purpose: Real-time processing of raw FastQ files.
- Steps:
- Merge and filter raw FastQ files.
- Align reads to the reference genome.
- Extract UMI sequences.
- Cluster UMI-tagged reads.
- Outputs:
- Processed UMI clusters are passed on to later stages.
- Raw alignment files (e.g., in <output>/<barcodeXX>/raw/align/ or <output>/<barcodeXX>/<target>/fastq_filtered/raw/).
- Filtered FastQ files and clustering statistics.
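The "merge and filter" step above can be pictured with a small shell sketch. This is only an illustration of length-based filtering on a merged FastQ stream, not the pipeline's own filter: the function name `filter_fastq_by_length` and the thresholds are hypothetical, and quality filtering is left out for brevity.

```shell
# Hypothetical helper (not part of umi-pipeline-nf): keep FastQ records whose
# read length lies within [min, max]. Expects plain 4-line-per-record FastQ
# on stdin and writes the passing records to stdout.
filter_fastq_by_length() {
  awk -v min="$1" -v max="$2" '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { seq = $0 }      # sequence line
    NR % 4 == 3 { plus = $0 }     # separator line
    NR % 4 == 0 {                 # quality line completes the record
      n = length(seq)
      if (n >= min && n <= max) {
        print header; print seq; print plus; print $0
      }
    }
  '
}
```

Assuming uncompressed input, merged reads could then be filtered with something like `cat fastq_pass/barcode01/*.fastq | filter_fastq_by_length 1000 5000 > filtered.fastq`, where the paths and thresholds are placeholders.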
To stop the pipeline when it's in live mode, create a CONTINUE file in the output directory:
touch <output>/CONTINUE
OFFLINE UMI PROCESSING
- Purpose: Batch processing with an optional subsampling step.
- Steps:
- Merge and filter FastQ files.
- Optionally subsample the merged reads.
- Perform alignment, UMI extraction, and clustering similar to LIVE processing.
- Outputs:
- Processed UMI clusters.
- Alignment and subsampling reports (e.g., in <output>/<barcodeXX>/raw/subsampling/ and <output>/<barcodeXX>/<target>/fastq_filtered/raw/).
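As a rough illustration of what optional subsampling of merged reads means, the sketch below keeps each read with a fixed probability. It is an assumption-laden stand-in: the tool the pipeline actually uses for subsampling is not stated here, and the function name `subsample_fastq`, the fraction, and the seed are hypothetical.

```shell
# Hypothetical helper (not part of umi-pipeline-nf): keep each FastQ record
# with probability p, using a fixed seed so the result is reproducible.
# Expects plain 4-line-per-record FastQ on stdin.
subsample_fastq() {
  # $1 = fraction of reads to keep (0..1), $2 = random seed
  awk -v p="$1" -v seed="$2" '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < p) }   # decide once per record, on the header line
    keep                                   # print all four lines of a kept record
  '
}
```

For example, `subsample_fastq 0.1 42 < merged.fastq > subsampled.fastq` would keep roughly 10% of the reads; because the decision is made per read, the exact number varies with the seed.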
UMI POLISHING
- Purpose: Refine UMI clusters to generate high-quality consensus sequences.
- Steps:
- Polish clusters using medaka.
- Realign consensus sequences to the reference genome.
- Re-extract and re-cluster UMIs from consensus reads.
- Parse final consensus clusters.
- Outputs:
- Consensus BAM and FastQ files (e.g., in <output>/<barcodeXX>/<target>/align/consensus/ and <output>/<barcodeXX>/<target>/fastq/consensus/).
- Polishing logs and detailed cluster statistics.
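To collect the consensus alignments after a run, the directory pattern listed above can be searched directly. The sketch below assumes only that consensus BAM files sit somewhere under an align/consensus/ directory; the function name `list_consensus_bams` is hypothetical, and the actual file names depend on the pipeline version.

```shell
# Hypothetical helper (not part of umi-pipeline-nf): list all BAM files that
# live under an align/consensus/ directory inside the pipeline output folder.
list_consensus_bams() {
  # $1 = pipeline output directory
  find "$1" -type f -path '*/align/consensus/*' -name '*.bam' | sort
}
```

Calling `list_consensus_bams <output>` prints one path per line, which can be piped into downstream tools such as a per-file `samtools index` loop.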
VARIANT CALLING
See the output documentation for a detailed overview of the pipeline outputs and directory structure.
Main Adaptations
- It ships with a Docker/Singularity container, which makes installation simple, the pipeline easy to use on clusters, and results highly reproducible.
- The pipeline is optimized for parallelization.
- Additional UMI cluster splitting step to remove admixed UMI clusters.
- Read filtering strategy per UMI cluster was adapted to preserve the highest quality reads.
- Three commonly used variant callers (freebayes, lofreq or mutserve) are supported by the pipeline.
- The raw reads can be optionally subsampled.
- The raw reads can be filtered by read length and quality.
- GPU acceleration for cluster polishing by Medaka is available when using the docker profile. Tested with an RTX 4080 SUPER GPU (16 GB).
- Allows multi-line BED files to run the pipeline for several targets at once.
- Supports live analysis of the clusters during sequencing and seamless polishing of the clusters as soon as enough clusters are found.
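For the multi-target case, a multi-line BED file simply carries one region per line. The sketch below writes a two-target example and runs a basic sanity check; the contig names, coordinates, and target names are placeholders, and the assumption that the fourth column names the target should be checked against the pipeline documentation.

```shell
# Hypothetical two-target BED file: tab-separated chrom, start, end, name.
# All values below are placeholders, not real amplicon coordinates.
bed_file="$(mktemp -d)/targets.bed"
printf 'chr1\t1000\t6000\ttarget_a\n' >  "$bed_file"
printf 'chr2\t2500\t7500\ttarget_b\n' >> "$bed_file"

# Sanity check: every line has four fields and start < end.
awk -F '\t' 'NF != 4 || $2 >= $3 { bad = 1 } END { exit bad }' "$bed_file" \
  && echo "BED file looks well-formed: $(wc -l < "$bed_file" | tr -d ' ') targets"
```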
To see all available parameters run
bash
nextflow run genepi/umi-pipeline-nf -r main --help
Quick Start
1. Install nextflow.
2. Download the pipeline and test it on a minimal dataset with a single command.
bash
nextflow run genepi/umi-pipeline-nf -r v1.0.0-beta -profile test,docker
3. Start running your own analysis!
3.1 Download and adapt the config/custom.config with paths to your data (relative and absolute paths are possible).
bash
nextflow run genepi/umi-pipeline-nf -r v1.0.0-beta -c <custom.config> -profile custom,<docker,singularity>
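The placeholder `<docker,singularity>` in the command above reads as "pick one container engine". A small wrapper can make that choice explicit; the sketch below only prints the resulting command rather than running it, and the function name `build_run_command` is hypothetical.

```shell
# Hypothetical helper: print (do not execute) the run command for a given
# config file and container engine, rejecting anything other than the two
# engines named in the Quick Start.
build_run_command() {
  config="$1"
  engine="$2"
  case "$engine" in
    docker|singularity) ;;
    *) echo "engine must be docker or singularity" >&2; return 1 ;;
  esac
  echo "nextflow run genepi/umi-pipeline-nf -r v1.0.0-beta -c $config -profile custom,$engine"
}

build_run_command config/custom.config singularity
```

Printing the command first makes it easy to review the config path and profile before launching a long run.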
Citation
If you use the pipeline, please cite our paper:
Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schoenherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 16, 117 (2024). https://doi.org/10.1186/s13073-024-01391-8
Credits
The pipeline was written by @StephanAmstler.
Nextflow template pipeline: EcSeq.
Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: nanoporetech/pipeline-umi-amplicon.
UMI-corrected nanopore sequencing analysis first shown by: SorenKarst/longread_umi.
Owner
- Name: Institute of Genetic Epidemiology
- Login: genepi
- Kind: organization
- Location: Innsbruck, Austria
- Website: http://genepi.i-med.ac.at
- Repositories: 55
- Profile: https://github.com/genepi
Medical University of Innsbruck
Citation (CITATION.cff)
cff-version: "1.2.0"
message: "If you use this software, please cite it as below."
title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR"
authors:
  - family-names: "Amstler"
    given-names: "Stephan"
  - family-names: "Streiter"
    given-names: "Gertraud"
  - family-names: "Pfurtscheller"
    given-names: "Cathrin"
  - family-names: "Forer"
    given-names: "Lukas"
  - family-names: "Di Maio"
    given-names: "Silvia"
  - family-names: "Weissensteiner"
    given-names: "Hansi"
  - family-names: "Paulweber"
    given-names: "Bernhard"
  - family-names: "Schoenherr"
    given-names: "Sebastian"
  - family-names: "Kronenberg"
    given-names: "Florian"
  - family-names: "Coassin"
    given-names: "Stefan"
doi: "10.1186/s13073-024-01391-8"
date-released: "2024-10-08"
license: "Apache-2.0"
repository-code: "https://github.com/genepi/umi-pipeline-nf"
preferred-citation:
  type: "article"
  authors:
    - family-names: "Amstler"
      given-names: "Stephan"
    - family-names: "Streiter"
      given-names: "Gertraud"
    - family-names: "Pfurtscheller"
      given-names: "Cathrin"
    - family-names: "Forer"
      given-names: "Lukas"
    - family-names: "Di Maio"
      given-names: "Silvia"
    - family-names: "Weissensteiner"
      given-names: "Hansi"
    - family-names: "Paulweber"
      given-names: "Bernhard"
    - family-names: "Schoenherr"
      given-names: "Sebastian"
    - family-names: "Kronenberg"
      given-names: "Florian"
    - family-names: "Coassin"
      given-names: "Stefan"
  doi: "10.1186/s13073-024-01391-8"
  journal: "Genome Medicine"
  day: 8
  month: 10
  title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR"
  year: 2024
GitHub Events
Total
- Create event: 14
- Release event: 9
- Issues event: 21
- Watch event: 3
- Delete event: 17
- Issue comment event: 16
- Push event: 110
- Pull request event: 5
- Fork event: 3
Last Year
- Create event: 14
- Release event: 9
- Issues event: 21
- Watch event: 3
- Delete event: 17
- Issue comment event: 16
- Push event: 110
- Pull request event: 5
- Fork event: 3
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 3
- Average time to close issues: about 1 year
- Average time to close pull requests: 2 minutes
- Total issue authors: 6
- Total pull request authors: 1
- Average comments per issue: 0.91
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 3
- Average time to close issues: 27 days
- Average time to close pull requests: 2 minutes
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 1.5
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AmstlerStephan (4)
- webbchen (3)
- coro1c (3)
- cmbrunet (1)
- ebwinter95 (1)
- cnk113 (1)
- camcl (1)
Pull Request Authors
- AmstlerStephan (14)
- vmelichar (1)
Dependencies
- actions/checkout v2 composite
- actions/setup-java v2 composite
- docker/build-push-action v5 composite
- docker/setup-buildx-action v3 composite
- nf-core/setup-nextflow v1 composite
- ubuntu 22.04 build
- bedtools 2.30.0.*
- freebayes 1.3.2.*
- lofreq 2.1.5.*
- medaka >2.0.0
- minimap2 2.24.*
- openjdk 11.0.9.*
- pip 22.2.2.*
- python >=3.8
- samtools 1.15.1.*
- seqtk 1.3.*
- unzip 6.0.*
- vcflib 1.0.0.*
- vsearch 2.21.2.*