minmutfinder

A tool written in Python and Bash, implemented as a Nextflow pipeline, that accurately identifies mutations and assesses their frequencies, accounting for multiple nucleotide mutations occurring within a single codon. It can also annotate mutations associated with phenotypic changes in viral populations, based on user-supplied datasets.

https://github.com/valldhebron-bioinformatics/minmutfinder

Keywords

bioinformatics-analysis bioinformatics-tool microbiology minority-mutations-analysis mutation-analysis nextflow python3 variant-calling

Repository

Basic Info
  • Host: GitHub
  • Owner: ValldHebron-Bioinformatics
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 79.7 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 1 year ago · Last pushed 9 months ago

README.md

minMutFinder

🎯 Overview

minMutFinder is a bioinformatics tool designed to help you identify minority mutations in population variants with precision and accuracy. Unlike other tools, minMutFinder considers the possibility of multiple nucleotide mutations within the same codon and provides comprehensive metrics for your sequences.


🔍 Features

  • 🧬 Advanced Mutation Detection: Identifies minority mutations while accounting for multiple nucleotide changes within a single codon.
  • 📊 Comprehensive Analysis: Provides detailed metrics and plots for a thorough understanding of your sequences.
  • 🔧 Customizable: Supports various versions and annotated mutations for enhanced flexibility.
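To illustrate why handling multiple nucleotide changes within one codon matters, here is a minimal Python sketch. It is not taken from the minMutFinder source: the `CODON_TABLE` subset, the `apply_changes` helper, and the example codon are assumptions chosen for illustration. It shows that translating two substitutions separately can suggest different amino acid changes than translating them together.

```python
# Minimal codon table: only the codons used in this illustration
# (values follow the standard genetic code).
CODON_TABLE = {
    "AAA": "K",  # lysine (reference codon)
    "GAA": "E",  # glutamate
    "ACA": "T",  # threonine
    "GCA": "A",  # alanine
}


def apply_changes(codon, changes):
    """Return the codon after applying {position: base} substitutions (0-based).

    Hypothetical helper, not part of minMutFinder.
    """
    bases = list(codon)
    for pos, base in changes.items():
        bases[pos] = base
    return "".join(bases)


ref_codon = "AAA"
snv_1 = {0: "G"}  # A->G at the first codon position
snv_2 = {1: "C"}  # A->C at the second codon position

# Translating each substitution on its own suggests K->E and K->T ...
separate = [CODON_TABLE[apply_changes(ref_codon, s)] for s in (snv_1, snv_2)]
# ... but a read carrying both changes encodes K->A.
combined = CODON_TABLE[apply_changes(ref_codon, {**snv_1, **snv_2})]

print(separate, combined)
```

When both substitutions occur on the same reads, only the combined translation reflects the amino acid actually encoded.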

🛠 Prerequisites

Ensure the following programs are installed:

Required Software

| Software | Version | Installation |
| ------------- | ------- | ------------ |
| Nextflow | 23.10.1 or higher | Install |
| Python | 3.9 | |
| libgcc-ng | 12 or higher | conda-forge |
| Trimmomatic | 0.39 | Bioconda |
| Minimap2 | 2.26 | Bioconda |
| Lofreq | 2.1.5 | Bioconda |
| Bcftools | 1.17 | Bioconda |
| Samtools | 1.17 | Bioconda |

Required Python Packages:

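os, pandas, sys, csv, gzip, shutil, matplotlib, seaborn, Bio, re, plotly, numpy

Of these, os, sys, csv, gzip, shutil, and re are part of the Python standard library and need no separate installation.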


📥 Installation

Step 1: Clone the Repository

```bash
git clone https://github.com/ValldHebron-Bioinformatics/minMutFinder.git
cd minMutFinder
```

Step 2: Install Dependencies

```bash
# Install conda-forge dependencies (quoted so the shell does not treat ">" as a redirect)
conda install -c conda-forge "libgcc-ng>=12"

# Install bioconda dependencies
conda install -c bioconda minimap2=2.26 htslib=1.17 bcftools=1.17 samtools=1.18 trimmomatic=0.39 lofreq=2.1.5=py39hb7ef6d5_10

# Install Python dependencies (version specifiers quoted for the same reason)
pip install "pandas>=2.1.4" "Bio>=1.5.9" "plotly>=5.18.0" "numpy>=1.26.2" "pysam>=0.21.0" "matplotlib>=3.8.2" "seaborn>=0.13.0"
```


🚀 How to Run minMutFinder

```bash
nextflow run minMutFinder.nf \
  --ref_seq <reference.fasta> \
  --out_path <output_name> \
  --r1 <forward_reads.fastq.gz> \
  --r2 <reverse_reads.fastq.gz> \
  --annotate <mutations.tsv> \
  --syn_muts <"yes"/"no">
```


⚙️ Arguments

  • --ref_seq: Path and filename of the reference genome FASTA file (1)(2)(4)
  • --out_path: Output name for the virus column
  • --r1: Path and filename of the forward FASTQ compressed file
  • --r2: Path and filename of the reverse FASTQ compressed file
  • --annotate: Path and filename of the TSV file containing the annotated mutations (3)
  • --syn_muts: "yes" or "no", depending on whether to include synonymous mutations in the output plot (default is "no")

📝 Notes

  1. The reference genome must contain the coding sequences (CDS) of the proteins. If there are multiple proteins, they should be separated in the FASTA file.
  2. FASTA headers must use underscores (_) between words. For example: >NC_006273_2_UL96.
  3. The annotated mutation file should be tab-separated and contain a column named mutation for annotated mutations.
  4. If FASTA headers share patterns (e.g. H3N2PA and H3N2PAX share H3N2PA) they must be differentiated (e.g. H3N2PAprot, H3N2PAX).
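The notes above can be checked before launching the pipeline. The following Python sketch is not part of minMutFinder; `check_fasta_headers` and `has_mutation_column` are hypothetical helpers, and "sharing a pattern" is interpreted here as one header being contained in another, which is an assumption based on the H3N2PA / H3N2PAX example.

```python
import csv
import io


def check_fasta_headers(headers):
    """Return a list of problems found in FASTA header names (given without '>').

    Hypothetical pre-flight check, not part of minMutFinder.
    """
    problems = []
    # Note 2: words must be joined with underscores, so whitespace is flagged.
    for h in headers:
        if any(ch.isspace() for ch in h):
            problems.append(f"{h!r} contains whitespace; use underscores")
    # Note 4 (assumed meaning): no header may be contained in another header.
    for a in headers:
        for b in headers:
            if a != b and a in b:
                problems.append(f"{a!r} is contained in {b!r}; rename one")
    return problems


def has_mutation_column(tsv_text):
    """Note 3: the tab-separated annotation file needs a 'mutation' column."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader, [])
    return "mutation" in header


print(check_fasta_headers(["H3N2PA", "H3N2PAX"]))      # one problem reported
print(check_fasta_headers(["H3N2PAprot", "H3N2PAX"]))  # no problems
print(has_mutation_column("mutation\tphenotype\nK65R\tresistance\n"))
```

The example rows passed to `has_mutation_column` are invented purely to exercise the check; only the `mutation` column name comes from the notes.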

🔏 License

This project, minMutFinder, is licensed under the GNU General Public License v3.0. You are free to use, modify, and distribute this software under the terms of this license. For more details, refer to the LICENSE file.


🖊️ Citing minMutFinder

A research paper on minMutFinder is currently in progress. In the meantime, please cite this GitHub repository using the citation provided by GitHub. You can find the official citation by clicking the "Cite this repository" button at the top of the repository page or view the citation file directly.


🔮 Future Work and Limitations

Limitations:

  • As of now, minMutFinder has only been tested on viral sequencing data.
  • Only available for Illumina® sequencing data.

Current Thresholds:

  • Allele Frequency (AF) ≥ 5%
  • Read depth per nucleotide position ≥ 20
  • Number of threads = 64
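As a rough illustration of how the allele frequency and read depth thresholds act on variant calls, consider the Python sketch below. It is not the pipeline's own filtering code: the `passes_thresholds` helper, the dictionary layout, and the example values are assumptions, with `AF` and `DP` named after the corresponding LoFreq VCF INFO fields.

```python
MIN_AF = 0.05    # allele frequency >= 5%
MIN_DEPTH = 20   # read depth per nucleotide position >= 20


def passes_thresholds(variant, min_af=MIN_AF, min_depth=MIN_DEPTH):
    """Return True if a variant meets both thresholds (hypothetical helper)."""
    return variant["AF"] >= min_af and variant["DP"] >= min_depth


# Invented example records, for illustration only.
variants = [
    {"pos": 101, "AF": 0.12, "DP": 350},  # kept: passes both thresholds
    {"pos": 202, "AF": 0.03, "DP": 900},  # dropped: AF below 5%
    {"pos": 303, "AF": 0.40, "DP": 15},   # dropped: depth below 20
]

kept = [v["pos"] for v in variants if passes_thresholds(v)]
print(kept)
```

Exposing the thresholds as function parameters mirrors the planned improvement of letting users choose them.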

Future Improvements:

  • At the moment we are working on uploading minMutFinder to nf-core, for easier distribution and use.
  • Support for user-provided VCF files along with SAM/BAM files, skipping the initial quality control, mapping, and variant calling steps.
  • Allow the user to choose the number of threads, the AF threshold, and the read depth per nucleotide position.
  • Allow the user to choose the kind of output desired (complete, or only the final result files).
  • Support for ONT® sequencing data.

✉️ Get in Touch

If you encounter any issues, have feature requests, or need assistance, feel free to reach out:

We're always happy to help!


Owner

  • Name: ValldHebron-Bioinformatics
  • Login: ValldHebron-Bioinformatics
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Prats-Méndez"
  given-names: "Ignasi"
  orcid: "https://orcid.org/0009-0004-0170-9043"
title: "minMutFinder"
version: 1.0.0
date-released: 2024-09-17
url: "https://github.com/ValldHebron-Bioinformatics/minMutFinder"

GitHub Events

Total
  • Release event: 1
  • Delete event: 3
  • Push event: 18
  • Pull request event: 6
  • Create event: 2
Last Year
  • Release event: 1
  • Delete event: 3
  • Push event: 18
  • Pull request event: 6
  • Create event: 2

Dependencies

minMutFinder-nf/docs/bioconda_requirements.txt pypi
  • bcftools =1.17
  • htslib =1.17
  • lofreq =2.1.5=py39hb7ef6d5_10
  • minimap2 =2.26
  • samtools =1.18
  • trimmomatic =0.39
minMutFinder-nf/docs/conda-forge_requirements.txt pypi
  • libgcc-ng >=12
minMutFinder-nf/docs/pip_requirements.txt pypi
  • Bio >=1.5.9
  • matplotlib >=3.8.2
  • numpy >=1.26.2
  • pandas >=2.1.4
  • plotly >=5.18.0
  • pysam >=0.21.0
  • seaborn >=0.13.0