metagenomics_snakemake
Snakemake-based pipeline for metagenomics classification. Currently supports short-read Illumina sequences and long-read ONT sequences.
Snakemake Metagenomics Workflow
Summary:
This Snakemake-based program classifies metagenomic samples from short-read NGS (Illumina) and long-read ONT sequencing data.
Brief Description
The program classifies and analyzes short-read NGS sequences and long-read ONT samples. It consists of three workflows: short-read classification, long-read classification, and post-classification. The short-read and long-read workflows each have a QC-only mode, which runs FastQC and NanoPlot respectively. This mode is useful for evaluating sequence quality before classification. The post-classification workflow takes the results of the classification workflows and provides additional information using a metadata file and an additional target variable.
Prerequisites
This installation requires git and conda/miniconda. If they are already installed, skip these steps; otherwise, install them as follows:
Git Installation:
sudo apt install git
Miniconda Installation:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
After installing, initialize your newly installed Miniconda. The following commands initialize it for the bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
Snakemake Installation
git clone https://github.com/pablorr24/metagenomics_snakemake/
cd metagenomics/Snakemake
conda env create -f environment.yml -n snakemake_meta
conda activate snakemake_meta
Database Installation
If you already have a database such as Silva, Greengenes, RefSeq, Kraken2, or a similar classification database, you can skip this step. Otherwise, make sure you install one. The following commands download and build the Silva and Greengenes databases with kraken2-build:
kraken2-build --special silva --db SilvaDB
kraken2-build --special greengenes --db greengenes
Running a workflow
To run a workflow, first edit the configuration file to match your parameters, then run Snakemake:
Short-reads
snakemake -s Snakefile_fastqc --cores all
snakemake -s Snakefile_full_workflow --cores all
Long-reads
snakemake -s Snakefile_nanoplot --cores all
snakemake -s Snakefile_long_read --cores all
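The short-read and long-read commands above follow one pattern: a QC Snakefile, then a classification Snakefile. As a minimal sketch, that pattern could be wrapped in a small Python helper that builds the command lines. The Snakefile names come from the commands above. The `SNAKEFILES` table, the `"qc"`/`"full"` labels, and the `build_command` and `plan` functions are hypothetical names introduced for illustration and are not part of the repository. Running QC before classification follows the suggestion in the description, not a requirement of the pipeline.

```python
import shlex

# Snakefile names are taken from the commands listed in this README.
# The ("read type", "mode") keys are illustrative labels, not pipeline options.
SNAKEFILES = {
    ("short", "qc"): "Snakefile_fastqc",
    ("short", "full"): "Snakefile_full_workflow",
    ("long", "qc"): "Snakefile_nanoplot",
    ("long", "full"): "Snakefile_long_read",
}


def build_command(read_type, mode, cores="all"):
    """Return the snakemake argument list for one workflow run."""
    try:
        snakefile = SNAKEFILES[(read_type, mode)]
    except KeyError:
        raise ValueError(f"unknown workflow: {read_type!r}/{mode!r}") from None
    return ["snakemake", "-s", snakefile, "--cores", str(cores)]


def plan(read_type, cores="all"):
    """Return the QC command followed by the classification command."""
    return [build_command(read_type, mode, cores) for mode in ("qc", "full")]


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, pass each
    # list to subprocess.run(cmd, check=True) from the pipeline directory.
    for cmd in plan("short"):
        print(shlex.join(cmd))
```

Building the arguments as a list keeps the commands safe to pass to `subprocess.run` without shell quoting, and the printed form matches the commands shown above.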
Post Classification Workflow
snakemake -s Snakefile_post_analysis --cores all
Metadata File
The post-classification workflow requires a metadata file with one row per sample and one column per sample variable (sample location, species, etc.).
Output
After running the workflow, a timestamped folder is created in the output folder. All your results will be inside this folder.
References
Rules
The rule ‘create_otu_table’ uses a modified version of the ‘kraken2OTU.py’ script created by GitHub user sipost1, available at https://github.com/sipost1/kraken2OTUtable/blob/main/kraken2otu.py
The rules ‘calculate_alpha_diversity’ and ‘calculate_beta_diversity’ use modified versions of the ‘alpha_diversity.py’ and ‘beta_diversity.py’ scripts created by GitHub user jenniferlu717 in the DiversityTools directory of the KrakenTools repository, available at https://github.com/jenniferlu717/KrakenTools/blob/master/DiversityTools/alpha_diversity.py
The rules ‘create_dendogram’ and ‘create_pcoa_plot’ use modified versions of the ‘dendro.R’ and ‘pca.R’ scripts created by GitHub user GATB in the simka repository, both available at https://github.com/GATB/simka/tree/master/scripts/visualization
External Software
Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.
Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0
Ondov, B.D., Bergman, N.H. & Phillippy, A.M. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385 (2011). https://doi.org/10.1186/1471-2105-12-385
Wouter De Coster, Svenn D’Hert, Darrin T Schultz, Marc Cruts, Christine Van Broeckhoven, NanoPack: visualizing and processing long-read sequencing data, Bioinformatics, Volume 34, Issue 15, August 2018, Pages 2666–2669, https://doi.org/10.1093/bioinformatics/bty149
Kim D, Song L, Breitwieser FP, Salzberg SL. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 2016 Dec;26(12):1721-1729. doi: 10.1101/gr.210641.116. Epub 2016 Oct 17. PMID: 27852649; PMCID: PMC5131823.
Owner
- Login: pablorr24
- Kind: user
- Repositories: 2
- Profile: https://github.com/pablorr24
Citation (citations/fastqc_citation.txt)
Short-reads FastQC workflow FastQC Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Dependencies
- _libgcc_mutex 0.1
- _openmp_mutex 4.5
- aioeasywebdav 2.4.0
- aiohttp 3.9.0
- aiosignal 1.2.0
- amply 0.1.6
- appdirs 1.4.4
- attmap 0.13.2
- attrs 23.1.0
- bcrypt 3.2.0
- boto3 1.29.1
- botocore 1.32.1
- brotli-python 1.0.9
- bzip2 1.0.8
- c-ares 1.21.0
- ca-certificates 2023.08.22
- cachetools 4.2.2
- certifi 2023.11.17
- cffi 1.16.0
- charset-normalizer 2.0.4
- coin-or-cbc 2.10.10
- coin-or-cgl 0.60.7
- coin-or-clp 1.17.8
- coin-or-osi 0.108.8
- coin-or-utils 2.11.9
- coincbc 2.10.10
- configargparse 1.4
- connection_pool 0.0.3
- cryptography 41.0.7
- datrie 0.8.2
- defusedxml 0.7.1
- docutils 0.18.1
- dpath 2.1.6
- dropbox 11.36.1
- eido 0.2.2
- fastqc 0.11.8
- filechunkio 1.8
- frozenlist 1.4.0
- ftputil 5.0.4
- gdbm 1.18
- gitdb 4.0.7
- gitpython 3.1.37
- google-api-core 2.10.1
- google-api-python-client 2.108.0
- google-auth 2.22.0
- google-auth-httplib2 0.1.1
- google-cloud-core 2.3.2
- google-cloud-storage 2.6.0
- google-crc32c 1.5.0
- google-resumable-media 2.4.0
- googleapis-common-protos 1.56.4
- grpcio 1.59.3
- httplib2 0.22.0
- humanfriendly 10.0
- idna 3.4
- iniconfig 1.1.1
- jinja2 3.1.2
- jmespath 1.0.1
- jsonschema 4.19.2
- jsonschema-specifications 2023.7.1
- jupyter_core 5.5.0
- krona 2.7
- ld_impl_linux-64 2.38
- libabseil 20230802.1
- libblas 3.9.0
- libcblas 3.9.0
- libcrc32c 1.1.2
- libexpat 2.5.0
- libffi 3.4.4
- libgcc-ng 13.2.0
- libgfortran-ng 13.2.0
- libgfortran5 13.2.0
- libgomp 13.2.0
- libgrpc 1.59.3
- liblapack 3.9.0
- liblapacke 3.9.0
- libnsl 2.0.0
- libopenblas 0.3.24
- libprotobuf 4.24.4
- libre2-11 2023.06.02
- libsodium 1.0.18
- libsqlite 3.44.0
- libstdcxx-ng 13.2.0
- libuuid 2.38.1
- libzlib 1.2.13
- logmuse 0.2.6
- markdown-it-py 2.2.0
- markupsafe 2.1.1
- mdurl 0.1.0
- multidict 6.0.4
- nbformat 5.9.2
- ncurses 6.4
- numpy 1.26.0
- oauth2client 4.1.3
- openjdk 8.0.152
- openssl 3.1.4
- packaging 23.1
- pandas 2.1.3
- paramiko 2.8.1
- peppy 0.35.7
- perl 5.34.0
- perl-threaded 5.32.1
- pip 23.3.1
- plac 1.3.4
- platformdirs 3.10.0
- pluggy 1.0.0
- ply 3.11
- prettytable 3.5.0
- protobuf 4.24.4
- psutil 5.9.0
- pulp 2.7.0
- pyasn1 0.4.8
- pyasn1-modules 0.2.8
- pycparser 2.21
- pygments 2.15.1
- pynacl 1.5.0
- pyopenssl 23.2.0
- pyparsing 3.0.9
- pysftp 0.2.9
- pysocks 1.7.1
- pytest 7.4.0
- python 3.11.6
- python-dateutil 2.8.2
- python-fastjsonschema 2.16.2
- python-irodsclient 1.1.9
- python-tzdata 2023.3
- python_abi 3.11
- pytz 2023.3.post1
- pyyaml 6.0.1
- re2 2023.06.02
- readline 8.2
- referencing 0.30.2
- requests 2.31.0
- reretry 0.11.8
- rich 13.3.5
- rpds-py 0.10.6
- rsa 4.7.2
- s3transfer 0.7.0
- setuptools 68.0.0
- setuptools-scm 7.1.0
- six 1.16.0
- slacker 0.14.0
- smart_open 5.2.1
- smmap 4.0.0
- snakemake 7.32.4
- snakemake-minimal 7.32.4
- stone 3.3.1
- stopit 1.1.2
- tabulate 0.9.0
- throttler 1.2.2
- tk 8.6.13
- toposort 1.10
- traitlets 5.7.1
- trimmomatic 0.39
- typing-extensions 4.7.1
- typing_extensions 4.7.1
- tzdata 2023c
- ubiquerg 0.6.3
- uritemplate 4.1.1
- urllib3 1.26.18
- veracitools 0.1.3
- wcwidth 0.2.5
- wheel 0.41.2
- wrapt 1.14.1
- xz 5.4.5
- yaml 0.2.5
- yarl 1.9.3
- yte 1.5.1