https://github.com/asadprodhan/phylotree

Bayesian phylogenetic analysis

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary

Keywords

bayesian-inference nextflow-pipelines phylogenetic-trees singularity-container

Last synced: 5 months ago

Repository

Bayesian phylogenetic analysis

Basic Info

Host: GitHub
Owner: asadprodhan
License: gpl-3.0
Language: Nextflow
Default Branch: main
Homepage: https://github.com/asadprodhan/phyloTree
Size: 68.4 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

bayesian-inference nextflow-pipelines phylogenetic-trees singularity-container

Created almost 3 years ago · Last pushed 8 months ago

Metadata Files

Readme License

README.md

phyloTree, a Phylogenetic Analysis Workflow using Nextflow and Singularity

M. Asaduzzaman Prodhan*

DPIRD Diagnostics and Laboratory Services

Department of Primary Industries and Regional Development

3 Baron-Hay Court, South Perth, WA 6151, Australia

*Correspondence: Asad.Prodhan@dpird.wa.gov.au

Step 1: Create a conda environment and install the required packages

Create a conda environment

conda create -n phyloTree

Activate the environment

conda activate phyloTree

Install the following packages

conda install -c bioconda nextflow

conda install -c conda-forge singularity

conda install -c bioconda trimal

conda install bioconda::iqtree

conda install bioconda::emboss

EMBOSS provides seqret, which is used later to convert the alignment file from FASTA to NEXUS (.nex) format.

conda install conda-forge::dos2unix

conda install bioconda::mafft
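
Before moving on, it may help to confirm that the installed tools are visible in the activated environment. The following is a minimal sketch, not part of the phyloTree pipeline; `missing_tools` is a hypothetical helper name, and the tool names are the executables used in this README.

```shell
# Hypothetical helper: print each named tool that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executables used in the steps of this README; no output means all were found.
missing_tools nextflow singularity trimal iqtree2 seqret dos2unix mafft
```

If any name is printed, re-run the corresponding conda install command inside the phyloTree environment.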

Step 2: Make an alignment and trim it

Alignment

mafft --auto all_genomes_concatenated_together.fasta > all_genomes_alignment.fasta

Trimming

trimal -in all_genomes_alignment.fasta -out trimmed_all_genomes_alignment.fasta -gappyout
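
As a quick sanity check after trimming, the number of sequences in the two FASTA files can be compared. This is only a sketch under the assumption that trimming with -gappyout removes alignment columns and not whole sequences; `count_seqs` and `same_seq_count` are hypothetical helper names.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical helper: succeed only if both files hold the same number of records.
same_seq_count() {
    [ "$(count_seqs "$1")" -eq "$(count_seqs "$2")" ]
}

# Example with the file names used in this step:
# same_seq_count all_genomes_alignment.fasta trimmed_all_genomes_alignment.fasta \
#     || echo "Sequence count changed during trimming"
```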

Step 3: Find the best model

The following command tests only the models supported by MrBayes.

iqtree2 -s trimmed_all_genomes_alignment.fasta -mset JC,F81,K2P,HKY85,GTR,SYM,TrN,JC+I,JC+G,JC+I+G,F81+I,F81+G,F81+I+G,K2P+I,K2P+G,K2P+I+G,HKY85+I,HKY85+G,HKY85+I+G,GTR+I,GTR+G,GTR+I+G,SYM+I,SYM+G,SYM+I+G,TrN+I,TrN+G,TrN+I+G -m TEST
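
IQ-TREE writes a text report next to the alignment (by default the alignment file name with `.iqtree` appended). The sketch below pulls the selected model out of that report, assuming the report contains a line beginning with "Best-fit model according to BIC:"; `best_bic_model` is a hypothetical helper name.

```shell
# Hypothetical helper: print the model named on the
# "Best-fit model according to BIC:" line of an IQ-TREE report.
best_bic_model() {
    sed -n 's/^Best-fit model according to BIC: *//p' "$1"
}

# Example with the report expected from the command above:
# best_bic_model trimmed_all_genomes_alignment.fasta.iqtree
```

The reported model is what guides the lset line in the MrBayes parameter file; for example, a GTR model with invariable sites and gamma rates corresponds to nst=6 rates=invgamma.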

Step 4: Run phyloTree pipeline

Make a directory
Change the format of the alignment from fasta to nex to be compatible with MrBayes

seqret -sequence trimmed_all_genomes_alignment.fasta -outseq trimmed_all_genomes_alignment.nex -osformat2 nexus

Keep your alignment/s (nexus format, e.g., cox1_alignment.nex) in this directory

Note: you can replace only the ‘cox1’ part. The alignment file name must end with ‘_alignment.nex’.
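
The naming rule can be checked before launching the run. This is a minimal sketch that only tests the ‘_alignment.nex’ suffix described above; `is_valid_alignment_name` is a hypothetical helper name, and the loop skips the parameter file because it is expected to have a different name.

```shell
# Hypothetical helper: succeed if the file name ends with _alignment.nex.
is_valid_alignment_name() {
    case "$1" in
        *_alignment.nex) return 0 ;;
        *) return 1 ;;
    esac
}

# Report any .nex file in the current directory that breaks the rule.
for f in *.nex; do
    [ -e "$f" ] || continue                             # no .nex files at all
    [ "$f" = "zzzmrbayesparameters.nex" ] && continue   # parameter file, not an alignment
    is_valid_alignment_name "$f" || echo "Rename before running: $f"
done
```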

Keep the following file (again, nexus format, e.g., zzzmrbayesparameters.nex) in the same directory.

BEGIN mrbayes;
lset nst=6 rates=invgamma;
propset ExtTBR$prob=0;
mcmc ngen=1000000 printfreq=100 samplefreq=1000 diagnfreq=1000 nchains=4 savebrlens=yes;
sumt burnin=12500;
sump burnin=12500;
END;

“zzzmrbayesparameters.nex” contains the analysis run parameters. You can modify them (see the explanation below). However, you must keep the file name unchanged, i.e., “zzzmrbayesparameters.nex”.
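
One way to create the parameter file from the shell is a here-document, sketched below with the same settings shown in this README. The delimiter is quoted ('EOF') so that the shell does not try to expand `$prob` in the propset line.

```shell
# Write the MrBayes block to the required file name in the current directory.
# The quoted 'EOF' keeps "$prob" literal instead of expanding it as a shell variable.
cat > zzzmrbayesparameters.nex <<'EOF'
BEGIN mrbayes;
lset nst=6 rates=invgamma;
propset ExtTBR$prob=0;
mcmc ngen=1000000 printfreq=100 samplefreq=1000 diagnfreq=1000 nchains=4 savebrlens=yes;
sumt burnin=12500;
sump burnin=12500;
END;
EOF
```

Run it from inside the directory that holds the alignment files, then adjust the values as needed.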
Run the following command

nextflow run asadprodhan/phyloTree -r 73f3c10

Step 5: Look at the outputs

phyloTree will generate three directories
- ‘results’ that contains the results
- ‘reports’ that contains the run reports
- ‘work’ that contains the temporary files
Visualise the tree
- ‘yourAlignmentName.mrbayes.con.tre’ is the file that contains the tree with the ‘posterior probability’ supports. You can visualise the tree using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Parameters in the “zzzmrbayesparameters.nex” file explained

“lset nst=6 rates=invgamma” sets a nucleotide substitution model called “GTR + I + G”

Likelihood-based phylogenetic methods, including the Bayesian inference used by MrBayes, require a nucleotide substitution model such as “GTR + I + G”. “GTR + I + G” is a widely used General Time Reversible (GTR) nucleotide substitution model with gamma-distributed rate variation across sites (G) and a proportion of invariable sites (I). Here, “nst=6” selects GTR and “rates=invgamma” selects I + G. The invariable sites account for the static, unchanging sites in a dataset.

- “ngen” is the number of generations for which the analysis is run.
- “printfreq” controls how often brief information about the analysis is printed to the screen. The default value is 1,000.
- “samplefreq” determines how often the chain is sampled. The default is every 500 generations.
- “diagnfreq” sets how often, in generations, the diagnostics are calculated.
- “nchains” sets the number of chains. By default, MrBayes uses Metropolis coupling to improve the MCMC sampling of the target distribution. The Swapfreq, Nswaps, Nchains, and Temp settings together control the Metropolis coupling behavior. When Nchains is set to 1, no heating is used. When Nchains is set to a value n larger than 1, then n−1 heated chains are used. By default, Nchains is set to 4, meaning that MrBayes will use 3 heated chains and one “cold” chain.
- “sumt” summarises the sampled trees and creates five additional files.
- “sump” summarises the sampled parameter values.
- The “burnin” value for sumt and sump is calculated as (number of generations / sample frequency) / 4. Dividing by 4 discards the first 25% of the samples.
- Every time the diagnostics are calculated, either a fixed number of samples (burnin) or a percentage of samples (burninfrac) from the beginning of the chain is discarded.
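
The burn-in arithmetic described above can be worked through in the shell. This is a small sketch, with `burnin_samples` as a hypothetical helper name; it uses integer division, so any remainder is dropped.

```shell
# Hypothetical helper: burn-in (in samples) for a given run length.
# Usage: burnin_samples NGEN SAMPLEFREQ [DIVISOR]
burnin_samples() {
    ngen=$1
    samplefreq=$2
    divisor=${3:-4}   # 4 discards the first 25% of the samples
    echo $(( ngen / samplefreq / divisor ))
}

# ngen=1000000 and samplefreq=1000 give 1000 samples, so the burn-in is 250.
burnin_samples 1000000 1000
```

Under this formula, the example settings ngen=1000000 and samplefreq=1000 give a burn-in of 250 samples, whereas a burn-in of 12500 would match a run that stores 50,000 samples. It may be worth checking that the burnin values in “zzzmrbayesparameters.nex” agree with the ngen and samplefreq you actually use.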

Owner

Name: Asad Prodhan
Login: asadprodhan
Kind: user
Location: Perth, Australia
Company: Department of Primary Industries and Regional Development

Website: www.linkedin.com/in/asadprodhan
Twitter: Asad_Prodhan
Repositories: 2
Profile: https://github.com/asadprodhan

Laboratory Scientist at DPIRD. My work involves Oxford Nanopore Sequencing and Bioinformatics for pest and pathogen diagnosis.
