metagenomics_snakemake

Snakemake-based pipeline for metagenomics classification. It currently supports short-read Illumina sequences and long-read ONT sequences.

https://github.com/pablorr24/metagenomics_snakemake

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
○ .zenodo.json file
✓ DOI references: found 13 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (5.4%) to scientific vocabulary

Keywords

metagenomics pipeline

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: pablorr24
Language: HTML
Default Branch: main
Homepage:
Size: 319 KB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Topics

metagenomics pipeline

Created about 2 years ago · Last pushed almost 2 years ago

README

Snakemake Metagenomics Workflow

Summary:
This Snakemake-based program classifies metagenomic samples from
short-read (NGS) and long-read (ONT) sequencing data.

Brief Description
The program classifies and analyzes short-read NGS sequences and
long-read ONT sequences. It consists of three workflows: short-reads
classification, long-reads classification, and a post-classification
workflow. The short-reads and long-reads workflows each have a QC-only
mode, which runs FastQC and NanoPlot respectively. This mode is useful
for evaluating sequence quality before classification. The
post-classification workflow takes the results of the classification
workflows and provides additional information using a metadata file and
an additional target variable.

Prerequisites
This installation requires git and conda/miniconda. If they are
already installed, skip these steps; otherwise install them with
the following commands:
Git Installation:

sudo apt install git
Miniconda Installation:
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
After installing, initialize your newly-installed Miniconda. The
following commands initialize for bash and zsh shells:
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
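
Before moving on, it can help to confirm that both prerequisites are visible on your PATH. The following is a minimal sketch and not part of the pipeline; the function name missing_tools is hypothetical, and it relies only on Python's standard shutil.which.

```python
import shutil


def missing_tools(tools=("git", "conda"), which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH.

    `which` is injectable so the check can be exercised without
    touching the real environment.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before continuing:", ", ".join(missing))
    else:
        print("git and conda are both available.")
```

If conda is reported as missing right after installation, opening a new shell (so that the conda init changes take effect) is usually enough.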

Snakemake Installation
git clone https://github.com/pablorr24/metagenomics_snakemake/
cd metagenomics_snakemake
conda env create -f environment.yml -n snakemake_meta
conda activate snakemake_meta

Database Installation
If you already have a classification database such as Silva,
Greengenes, RefSeq, Kraken2, or similar, you can skip this step.
Otherwise, make sure you install one. The following commands use
kraken2-build to download and build the Silva and Greengenes databases:
kraken2-build --special silva --db SilvaDB
kraken2-build --special greengenes --db greengenes
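
A finished Kraken 2 build leaves three index files (hash.k2d, opts.k2d, taxo.k2d) in the database directory. The sketch below checks for them; it is an illustration only, the function name missing_db_files is hypothetical, and the directory names are the ones passed to --db above.

```python
from pathlib import Path

# Index files that a completed Kraken 2 database build produces.
REQUIRED_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_db_files(db_dir):
    """Return the required index files that are absent from `db_dir`."""
    db_path = Path(db_dir)
    return [name for name in REQUIRED_DB_FILES if not (db_path / name).is_file()]


if __name__ == "__main__":
    # Directory names match the --db arguments used in the commands above.
    for db in ("SilvaDB", "greengenes"):
        missing = missing_db_files(db)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{db}: {status}")
```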

Running a workflow
To run a workflow, first edit the configuration file to match your
parameters, then run Snakemake with the corresponding Snakefile.

Short-reads
QC only (FastQC):
snakemake -s Snakefile_fastqc --cores all

Full classification workflow:
snakemake -s Snakefile_full_workflow --cores all

Long-reads
QC only (NanoPlot):
snakemake -s Snakefile_nanoplot --cores all

Full classification workflow:
snakemake -s Snakefile_long_read --cores all

Post Classification Workflow
snakemake -s Snakefile_post_analysis --cores all
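
The five commands above differ only in the Snakefile they point to. As a rough sketch, a small helper can assemble them and optionally add Snakemake's dry-run flag (-n) to preview the jobs without executing anything. The Snakefile names come from this README; the dictionary keys and the function name snakemake_command are hypothetical labels chosen for this example.

```python
import shlex

# Snakefile names are taken from the commands in this README;
# the keys are hypothetical labels used only in this sketch.
SNAKEFILES = {
    "short_qc": "Snakefile_fastqc",
    "short_full": "Snakefile_full_workflow",
    "long_qc": "Snakefile_nanoplot",
    "long_full": "Snakefile_long_read",
    "post": "Snakefile_post_analysis",
}


def snakemake_command(workflow, cores="all", dry_run=False):
    """Build the argument list for one of the workflows."""
    if workflow not in SNAKEFILES:
        raise ValueError(f"unknown workflow: {workflow!r}")
    cmd = ["snakemake", "-s", SNAKEFILES[workflow], "--cores", str(cores)]
    if dry_run:
        cmd.append("-n")  # dry run: list the jobs without running them
    return cmd


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run to execute it.
    print(shlex.join(snakemake_command("short_qc", dry_run=True)))
```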
Metadata File

The post-classification workflow requires a metadata file with one row
per sample and one column per sample variable (sample location,
species, etc.).
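
To make the expected shape concrete, here is a small sketch of such a table and of grouping samples by one variable. Everything specific in it is an assumption: the tab-separated format, the column names (sample, location, species), and the sample IDs are invented for illustration, so check the configuration file for what the workflow actually expects.

```python
import csv
import io

# Hypothetical metadata: one row per sample, one column per variable.
EXAMPLE_METADATA = (
    "sample\tlocation\tspecies\n"
    "S1\tsite_A\tcow\n"
    "S2\tsite_A\tgoat\n"
    "S3\tsite_B\tcow\n"
)


def load_metadata(handle, id_column="sample"):
    """Read a tab-separated table, enforcing one row per sample."""
    rows = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        sample_id = row[id_column]
        if sample_id in rows:
            raise ValueError(f"duplicate row for sample {sample_id!r}")
        rows[sample_id] = {k: v for k, v in row.items() if k != id_column}
    return rows


def group_by(metadata, variable):
    """Group sample IDs by the value of one metadata column."""
    groups = {}
    for sample_id, values in metadata.items():
        groups.setdefault(values[variable], []).append(sample_id)
    return groups


metadata = load_metadata(io.StringIO(EXAMPLE_METADATA))
species_groups = group_by(metadata, "species")
```

Grouping by a column in this way mirrors the role of the target variable: it decides which samples are compared with each other.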

Output
Each run creates a timestamped folder inside the output folder. All
results of that run are written to this folder.
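
After several runs the output folder holds one subfolder per run. The sketch below picks the most recent one by modification time, which avoids guessing the timestamp format used in the folder names. The path "output" and the function name latest_run_folder are assumptions for this example.

```python
from pathlib import Path


def latest_run_folder(output_dir):
    """Return the most recently modified subfolder, or None if there is none."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # "output" is a placeholder; use the folder set in your configuration.
    print(latest_run_folder("output"))
```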

References
Rules
The rule ‘create_otu_table’ uses a modified version of the
‘kraken2otu.py’ script created by GitHub user sipost1, available at https://github.com/sipost1/kraken2OTUtable/blob/main/kraken2otu.py
The rules ‘calculate_alpha_diversity’ and ‘calculate_beta_diversity’
use modified versions of the ‘alpha_diversity.py’ and
‘beta_diversity.py’ scripts created by GitHub user jenniferlu717 in the
DiversityTools directory of the KrakenTools repository, available at https://github.com/jenniferlu717/KrakenTools/blob/master/DiversityTools/alpha_diversity.py
The rules ‘create_dendogram’ and ‘create_pcoa_plot’ use modified
versions of the ‘dendro.R’ and ‘pca.R’ scripts created by GitHub user
GATB in the simka repository, both available at https://github.com/GATB/simka/tree/master/scripts/visualization
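
For orientation, the kind of quantities that diversity rules like these report can be sketched in a few lines. This is not the modified scripts used by the pipeline: it shows only two common measures, the Shannon index (alpha diversity, natural log) and Bray-Curtis dissimilarity (beta diversity), on made-up read counts per taxon.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with reads."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    total = sum(a) + sum(b)
    if total <= 0:
        raise ValueError("samples must contain at least one read")
    shared = sum(min(x, y) for x, y in zip(a, b))
    return 1 - 2 * shared / total


if __name__ == "__main__":
    # Invented read counts for three taxa in two samples.
    sample_1 = [50, 30, 20]
    sample_2 = [10, 10, 80]
    print(shannon_index(sample_1), shannon_index(sample_2))
    print(bray_curtis(sample_1, sample_2))
```

A Bray-Curtis value of 0 means two samples have identical counts and 1 means they share no taxa.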
External Software
Andrews, S. (2010). FastQC: A Quality Control Tool for High
Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A
flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.
Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis
with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0
Ondov, B.D., Bergman, N.H. & Phillippy, A.M. Interactive
metagenomic visualization in a Web browser. BMC Bioinformatics 12, 385
(2011). https://doi.org/10.1186/1471-2105-12-385
Wouter De Coster, Svenn D’Hert, Darrin T Schultz, Marc Cruts,
Christine Van Broeckhoven, NanoPack: visualizing and processing
long-read sequencing data, Bioinformatics, Volume 34, Issue 15, August
2018, Pages 2666–2669, https://doi.org/10.1093/bioinformatics/bty149
Kim D, Song L, Breitwieser FP, Salzberg SL. Centrifuge: rapid and
sensitive classification of metagenomic sequences. Genome Res. 2016
Dec;26(12):1721-1729. doi: 10.1101/gr.210641.116. Epub 2016 Oct 17.
PMID: 27852649; PMCID: PMC5131823.

Owner

Login: pablorr24
Kind: user

Repositories: 2
Profile: https://github.com/pablorr24

Citation (citations/fastqc_citation.txt)

Short-reads FastQC workflow

FastQC
Andrews, S. (2010). FastQC:  A Quality Control Tool for High Throughput Sequence Data [Online]. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Dependencies

environment.yml conda

_libgcc_mutex 0.1
_openmp_mutex 4.5
aioeasywebdav 2.4.0
aiohttp 3.9.0
aiosignal 1.2.0
amply 0.1.6
appdirs 1.4.4
attmap 0.13.2
attrs 23.1.0
bcrypt 3.2.0
boto3 1.29.1
botocore 1.32.1
brotli-python 1.0.9
bzip2 1.0.8
c-ares 1.21.0
ca-certificates 2023.08.22
cachetools 4.2.2
certifi 2023.11.17
cffi 1.16.0
charset-normalizer 2.0.4
coin-or-cbc 2.10.10
coin-or-cgl 0.60.7
coin-or-clp 1.17.8
coin-or-osi 0.108.8
coin-or-utils 2.11.9
coincbc 2.10.10
configargparse 1.4
connection_pool 0.0.3
cryptography 41.0.7
datrie 0.8.2
defusedxml 0.7.1
docutils 0.18.1
dpath 2.1.6
dropbox 11.36.1
eido 0.2.2
fastqc 0.11.8
filechunkio 1.8
frozenlist 1.4.0
ftputil 5.0.4
gdbm 1.18
gitdb 4.0.7
gitpython 3.1.37
google-api-core 2.10.1
google-api-python-client 2.108.0
google-auth 2.22.0
google-auth-httplib2 0.1.1
google-cloud-core 2.3.2
google-cloud-storage 2.6.0
google-crc32c 1.5.0
google-resumable-media 2.4.0
googleapis-common-protos 1.56.4
grpcio 1.59.3
httplib2 0.22.0
humanfriendly 10.0
idna 3.4
iniconfig 1.1.1
jinja2 3.1.2
jmespath 1.0.1
jsonschema 4.19.2
jsonschema-specifications 2023.7.1
jupyter_core 5.5.0
krona 2.7
ld_impl_linux-64 2.38
libabseil 20230802.1
libblas 3.9.0
libcblas 3.9.0
libcrc32c 1.1.2
libexpat 2.5.0
libffi 3.4.4
libgcc-ng 13.2.0
libgfortran-ng 13.2.0
libgfortran5 13.2.0
libgomp 13.2.0
libgrpc 1.59.3
liblapack 3.9.0
liblapacke 3.9.0
libnsl 2.0.0
libopenblas 0.3.24
libprotobuf 4.24.4
libre2-11 2023.06.02
libsodium 1.0.18
libsqlite 3.44.0
libstdcxx-ng 13.2.0
libuuid 2.38.1
libzlib 1.2.13
logmuse 0.2.6
markdown-it-py 2.2.0
markupsafe 2.1.1
mdurl 0.1.0
multidict 6.0.4
nbformat 5.9.2
ncurses 6.4
numpy 1.26.0
oauth2client 4.1.3
openjdk 8.0.152
openssl 3.1.4
packaging 23.1
pandas 2.1.3
paramiko 2.8.1
peppy 0.35.7
perl 5.34.0
perl-threaded 5.32.1
pip 23.3.1
plac 1.3.4
platformdirs 3.10.0
pluggy 1.0.0
ply 3.11
prettytable 3.5.0
protobuf 4.24.4
psutil 5.9.0
pulp 2.7.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
pygments 2.15.1
pynacl 1.5.0
pyopenssl 23.2.0
pyparsing 3.0.9
pysftp 0.2.9
pysocks 1.7.1
pytest 7.4.0
python 3.11.6
python-dateutil 2.8.2
python-fastjsonschema 2.16.2
python-irodsclient 1.1.9
python-tzdata 2023.3
python_abi 3.11
pytz 2023.3.post1
pyyaml 6.0.1
re2 2023.06.02
readline 8.2
referencing 0.30.2
requests 2.31.0
reretry 0.11.8
rich 13.3.5
rpds-py 0.10.6
rsa 4.7.2
s3transfer 0.7.0
setuptools 68.0.0
setuptools-scm 7.1.0
six 1.16.0
slacker 0.14.0
smart_open 5.2.1
smmap 4.0.0
snakemake 7.32.4
snakemake-minimal 7.32.4
stone 3.3.1
stopit 1.1.2
tabulate 0.9.0
throttler 1.2.2
tk 8.6.13
toposort 1.10
traitlets 5.7.1
trimmomatic 0.39
typing-extensions 4.7.1
typing_extensions 4.7.1
tzdata 2023c
ubiquerg 0.6.3
uritemplate 4.1.1
urllib3 1.26.18
veracitools 0.1.3
wcwidth 0.2.5
wheel 0.41.2
wrapt 1.14.1
xz 5.4.5
yaml 0.2.5
yarl 1.9.3
yte 1.5.1
