https://github.com/asadprodhan/phylotree
Bayesian phylogenetic analysis
Basic Info
- Host: GitHub
- Owner: asadprodhan
- License: gpl-3.0
- Language: Nextflow
- Default Branch: main
- Homepage: https://github.com/asadprodhan/phyloTree
- Size: 68.4 KB
phyloTree, a Phylogenetic Analysis Workflow using Nextflow and Singularity
M. Asaduzzaman Prodhan*
Step 1: Create a conda environment and install the required packages
- Create a conda environment
conda create -n phyloTree
- Activate the environment
conda activate phyloTree
- Install the following packages
conda install -c bioconda nextflow
conda install -c conda-forge singularity
conda install -c bioconda trimal
conda install bioconda::iqtree
conda install bioconda::emboss
Note: emboss provides seqret, which is used later to convert the alignment file from FASTA to NEXUS format.
conda install conda-forge::dos2unix
conda install bioconda::mafft
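Before moving on, it can help to confirm that the installed tools are visible on your PATH. The following is a minimal sketch, not part of phyloTree itself: `missing_tools` is a hypothetical helper name, and the binary names (for example `iqtree2` and `seqret`) are assumed from the commands used in this README and may differ on your system.

```shell
# Print the name of every tool given as an argument that is not on PATH.
# missing_tools is a hypothetical helper, not part of phyloTree.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Binary names are assumed from the commands used in this README.
missing=$(missing_tools nextflow singularity trimal iqtree2 seqret dos2unix mafft)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools were found."
fi
```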
Step 2: Make an alignment and trim it
Alignment
mafft --auto all_genomes_concatenated_together.fasta > all_genomes_alignment.fasta
Trimming
trimal -in all_genomes_alignment.fasta -out trimmed_all_genomes_alignment.fasta -gappyout
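As an optional sanity check, you may want to confirm that every sequence in the trimmed file has the same length, as expected for an alignment. The sketch below is an illustration only: `fasta_lengths` and `is_aligned` are hypothetical helper names, and it assumes a plain FASTA file in which sequence names contain no tab characters.

```shell
# Print "name<TAB>length" for each record in a FASTA file.
# Sequences that wrap over several lines are handled.
fasta_lengths() {
    awk '
        /^>/ { if (name != "") print name "\t" len; name = substr($0, 2); len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (name != "") print name "\t" len }
    ' "$1"
}

# Succeed only if the file has at least one record and all records
# have the same length.
is_aligned() {
    fasta_lengths "$1" |
        awk -F '\t' 'NR == 1 { first = $NF } $NF != first { bad = 1 } END { exit (NR == 0 || bad) }'
}

# Example call (file name taken from the trimal command above):
# is_aligned trimmed_all_genomes_alignment.fasta && echo "alignment lengths are consistent"
```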
Step 3: Find the best model
The following command tests only the MrBayes-supported models
iqtree2 -s trimmed_all_genomes_alignment.fasta -mset JC,F81,K2P,HKY85,GTR,SYM,TrN,JC+I,JC+G,JC+I+G,F81+I,F81+G,F81+I+G,K2P+I,K2P+G,K2P+I+G,HKY85+I,HKY85+G,HKY85+I+G,GTR+I,GTR+G,GTR+I+G,SYM+I,SYM+G,SYM+I+G,TrN+I,TrN+G,TrN+I+G -m TEST
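To pick out the selected model without reading the whole report, a small helper such as the one below may be useful. It is a sketch under two assumptions: that IQ-TREE writes its report to a file named after the alignment with a `.iqtree` suffix, and that the report contains a line beginning with "Best-fit model according to". `best_fit_model` is a hypothetical name; check the report manually if the helper prints nothing.

```shell
# Print the best-fit model named in an IQ-TREE report file.
# Assumption: the report contains a line of the form
#   Best-fit model according to <criterion>: <model>
best_fit_model() {
    sed -n 's/^Best-fit model according to [A-Za-z]*: *//p' "$1" | head -n 1
}

# Example call (report file name is an assumption based on the command above):
# best_fit_model trimmed_all_genomes_alignment.fasta.iqtree
```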
Step 4: Run phyloTree pipeline
- Make a directory for the analysis
- Convert the alignment from FASTA to NEXUS format so that it is compatible with MrBayes
seqret -sequence trimmed_all_genomes_alignment.fasta -outseq trimmed_all_genomes_alignment.nex -osformat2 nexus
- Keep your alignment/s (nexus format, e.g., cox1_alignment.nex) in this directory
Note: only the ‘cox1’ part of the name may be changed. The alignment file name must end with ‘_alignment.nex’.
- Keep the following file (again, nexus format, e.g., zzzmrbayesparameters.nex) in the same directory.
BEGIN mrbayes;
lset nst=6 rates=invgamma;
propset ExtTBR$prob=0;
mcmc ngen=1000000 printfreq=100 samplefreq=1000 diagnfreq=1000 nchains=4 savebrlens=yes;
sumt burnin=12500;
sump burnin=12500;
END;
“zzzmrbayesparameters.nex” contains the analysis run parameters. You can modify them (see the explanation below), but you must keep the file name unchanged, i.e., “zzzmrbayesparameters.nex”.
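The two naming requirements above can be checked before launching the pipeline. The following is a minimal sketch, not part of phyloTree: `check_run_dir` is a hypothetical helper that only looks for the file names described above and does not validate their contents.

```shell
# Check that a run directory holds at least one file ending in
# _alignment.nex and a file named zzzmrbayesparameters.nex.
# Prints one message per problem and returns non-zero if any is found.
check_run_dir() {
    dir=${1:-.}
    status=0
    found=0
    for f in "$dir"/*_alignment.nex; do
        [ -e "$f" ] && found=1
    done
    if [ "$found" -eq 0 ]; then
        echo "No *_alignment.nex file in $dir"
        status=1
    fi
    if [ ! -f "$dir/zzzmrbayesparameters.nex" ]; then
        echo "zzzmrbayesparameters.nex is missing from $dir"
        status=1
    fi
    return "$status"
}

# Example call from inside the analysis directory:
# check_run_dir . && echo "directory looks ready"
```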
Run the following command
nextflow run asadprodhan/phyloTree -r 73f3c10
Step 5: Look at the outputs
phyloTree will generate three directories
- ‘results’ that contains the results
- ‘reports’ that contains the run reports
- ‘work’ that contains the temporary files
Visualise the tree
- ‘yourAlignmentName.mrbayes.con.tre’ is the file that contains the tree with the ‘posterior probability’ supports. You can visualise the tree using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
Parameters in the “zzzmrbayesparameters.nex” file explained
- “lset nst=6 rates=invgamma” sets a nucleotide substitution model called “GTR + I + G”
Model-based methods of phylogenetic analysis, such as Bayesian inference and maximum likelihood, require a nucleotide substitution model such as “GTR + I + G”. This is the widely used General Time Reversible (GTR) model combined with gamma-distributed rate variation across sites (G) and a proportion of invariable sites (I). The invariable sites account for positions in the dataset that do not change.
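If model testing selects a different model, the lset line needs to change accordingly. The sketch below shows one way to derive it from a model name. `model_to_lset` is a hypothetical helper covering only the nst values 1, 2 and 6 and the rates settings equal, gamma, propinv and invgamma. Base-frequency settings (for example, the equal frequencies of JC, K2P and SYM) are set separately in MrBayes and are not handled here.

```shell
# Translate a model name such as "GTR+I+G" into a MrBayes lset line.
# Only the substitution type (nst) and rate variation (rates) are mapped;
# base-frequency priors are NOT handled by this sketch.
model_to_lset() {
    model=$1
    base=${model%%+*}          # text before the first "+"
    case $base in
        JC|F81)            nst=1 ;;
        K2P|K80|HKY|HKY85) nst=2 ;;
        GTR|SYM)           nst=6 ;;
        *) echo "unsupported base model: $base" >&2; return 1 ;;
    esac
    case $model in
        *+I*+G*|*+G*+I*) rates=invgamma ;;
        *+G*)            rates=gamma ;;
        *+I*)            rates=propinv ;;
        *)               rates=equal ;;
    esac
    echo "lset nst=$nst rates=$rates;"
}

model_to_lset GTR+I+G    # prints: lset nst=6 rates=invgamma;
```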
“ngen” is the number of generations for which the analysis will be run.
“printfreq” controls how often brief information about the analysis is printed to the screen. The default value is 1,000.
“samplefreq” determines how often the chain is sampled; the default is every 500 generations.
“diagnfreq” sets the number of generations between calculations of the MCMC diagnostics.
By default, MrBayes uses Metropolis coupling to improve the MCMC sampling of the target distribution. The Swapfreq, Nswaps, Nchains, and Temp settings together control the Metropolis coupling behavior. When Nchains is set to 1, no heating is used. When Nchains is set to a value n larger than 1, then n−1 heated chains are used. By default, Nchains is set to 4, meaning that MrBayes will use 3 heated chains and one “cold” chain.
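“sumt” summarises the sampled trees and creates five additional files, including the consensus tree (.con.tre).
“sump” summarises the parameter values.
The burnin value for sumt and sump is calculated as (number of generations / sample frequency) / 4, where dividing by 4 discards the first 25% of the samples. For example, burnin=12500 corresponds to 50,000,000 generations sampled every 1,000 generations. With ngen=1000000 and samplefreq=1000, the same formula gives 250, so recalculate burnin whenever you change ngen or samplefreq.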
Every time the diagnostics are calculated, either a fixed number of samples (burnin) or a percentage of samples (burninfrac) from the beginning of the chain is discarded.
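The burn-in arithmetic described above can be sketched in shell as follows. `burnin` is a hypothetical helper name. The first call uses the ngen and samplefreq values from the example parameter file, and the second uses a longer run length chosen here purely for illustration.

```shell
# Number of samples to discard as burn-in:
# (number of generations / sample frequency) / 4, i.e. the first 25%.
burnin() {
    ngen=$1
    samplefreq=$2
    echo $(( ngen / samplefreq / 4 ))
}

burnin 1000000 1000     # prints 250
burnin 50000000 1000    # prints 12500
```

Integer division is used, so choose ngen and samplefreq values that divide evenly if you want an exact 25%.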
Owner
- Name: Asad Prodhan
- Login: asadprodhan
- Kind: user
- Location: Perth, Australia
- Company: Department of Primary Industries and Regional Development
- Website: www.linkedin.com/in/asadprodhan
- Twitter: Asad_Prodhan
- Repositories: 2
- Profile: https://github.com/asadprodhan
Laboratory Scientist at DPIRD. My work involves Oxford Nanopore Sequencing and Bioinformatics for pest and pathogen diagnosis.