Phylen
Phylen: automatic phylogenetic reconstruction using the EggNOG database - Published in JOSS (2018)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 6 DOI reference(s) in README and JOSS metadata -
✓Academic publication links
Links to: joss.theoj.org -
✓Committers with academic emails
1 of 3 committers (33.3%) from academic institutions -
○Institutional organization owner
-
✓JOSS paper metadata
Published in Journal of Open Source Software
Keywords
Repository
Compute a phylogeny using EggNOG database
Basic Info
- Host: GitHub
- Owner: iferres
- License: mit
- Language: R
- Default Branch: master
- Size: 6.44 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 2
- Open Issues: 1
- Releases: 0
Topics
Metadata Files
README.md
phylen
Phylen is an R package that performs automatic phylogenetic reconstruction given a set of Hidden Markov Models (HMMs). Genomes are screened against these HMMs, genes found in all genomes ("core genes") are aligned individually, those alignments are concatenated into a single supergene alignment, and a phylogenetic reconstruction is performed and returned as an object of class "phylo" so it can be further analysed using ape/phangorn framework in R. Functions to download well curated HMMs from clade-specific orthologous sets from the EggNOG database are provided although any custom set of HMMs can be used as well.
Installation
The easiest way to install this package is using devtools package:
r
if(!require(devtools)){
install.packages("devtools")
}
devtools::install_github("iferres/phylen")
Requirements
phylen depends on HMMER 3.1b2 and MAFFT. You should have them installed on you $PATH variable prior to using this software. Besure that the following MAFFT aliases are also installed: linsi (alias for mafft --maxiterate 1000 --localpair), ginsi (alias for mafft --maxiterate 1000 --globalpair) and einsi (alias for mafft --ep 0 --maxiterate 1000 --genafpair). This shortcuts are installed together with mafft if you download and install the software from the MAFFT webpage (link above), but probably not if use a package manager as apt or brew.
It also depends on phangorn package, which in turn depends on igraph. Some system requirements are needed to install the latter, please check them here.
Standard workflow
This tutorial begins with the extraction of toy data attached on this package. It consists in 10 genomes of the Campylobacterales order, 5 of them are Campylobacter species and the other 5 are Helicobacter species.
Note: phylen input files must be in gff3 format, as returned by the prokka annotation software.
``` r
List the attached tar.gz file
tgz <- system.file('extdata', 'toydata.tar.gz', package = 'phylen')
List the files inside it
gff <- untar(tarfile = tgz, exdir = getwd(), list = T)
Decompress on current working directory
untar(tarfile = tgz,exdir = getwd())
List the decompressed files
gff ```
## [1] "C_fetus_subsp_testudinum_Sp3.gff"
## [2] "C_fetus_subsp_venerealis_str_84-112.gff"
## [3] "C_hyointestinalis_subsp_lawsonii_CCUG_27631.gff"
## [4] "C_iguaniorum_str_RM11343.gff"
## [5] "C_pinnipediorum_subsp_pinnipediorum_str_RM17262.gff"
## [6] "H_bilis_str_AAQJH.gff"
## [7] "H_himalayensis_str_YS1.gff"
## [8] "H_pylori_J166.gff"
## [9] "H_pylori_str_2018.gff"
## [10] "H_typhlonius_str_1.gff"
Let's load the phylen package and list the available Hidden Markov Models (HMMs) on the EggNOG database.
``` r
Load the phylen package
library(phylen) ```
## Loading required package: phangorn
## Loading required package: ape
``` r
List first 50 available sets of EggNOG
list_eggnogdb()[1:50, ] ```
## level.name nog.prefix
## 1 LUCA NOG
## 2 Acidobacteria aciNOG
## 3 Acidobacteriia acidNOG
## 4 Aconoidasida acoNOG
## 5 Actinobacteria actNOG
## 6 Agaricales agaNOG
## 7 Agaricomycetes agarNOG
## 8 Apicomplexa apiNOG
## 9 Proteobacteria_alpha aproNOG
## 10 Aquificae aquNOG
## 11 Archaea arNOG
## 12 Archaeoglobi arcNOG
## 13 Arthropoda artNOG
## 14 Arthrodermataceae arthNOG
## 15 Ascomycota ascNOG
## 16 Aves aveNOG
## 17 Bacilli bacNOG
## 18 Bacteria bactNOG
## 19 Bacteroidia bacteNOG
## 20 Basidiomycota basNOG
## 21 Bacteroidetes bctoNOG
## 22 Bilateria biNOG
## 23 Proteobacteria_beta bproNOG
## 24 Brassicales braNOG
## 25 Carnivora carNOG
## 26 Chaetomiaceae chaNOG
## 27 Chlorobi chlNOG
## 28 Chlamydiae chlaNOG
## 29 Chloroflexi chloNOG
## 30 Chloroflexi chlorNOG
## 31 Chlorophyta chloroNOG
## 32 Chordata chorNOG
## 33 Chromadorea chrNOG
## 34 Clostridia cloNOG
## 35 Coccidia cocNOG
## 36 Crenarchaeota creNOG
## 37 Cryptosporidiidae cryNOG
## 38 Cyanobacteria cyaNOG
## 39 Cytophagia cytNOG
## 40 Debaryomycetaceae debNOG
## 41 Deferribacteres defNOG
## 42 Dehalococcoidetes dehNOG
## 43 Deinococcusthermus deiNOG
## 44 delta/epsilon delNOG
## 45 Diptera dipNOG
## 46 Dothideomycetes dotNOG
## 47 Proteobacteria_delta dproNOG
## 48 Drosophilidae droNOG
## 49 Proteobacteria_epsilon eproNOG
## 50 Erysipelotrichi eryNOG
The order Campylobacterales belongs to the class Epsilonproteobacteria, which is listed at row number 49. The corresponding set of HMMs is, then, eproNOG.
We will download the HMMs directly from the EggNOG servers using the download_nog_hmm() function.
r
hmm <- download_nog_hmm(nog.prefix = "eproNOG", onDir = getwd())
hmm
## [1] "/home/iferres/Documents/eproNOG.hmm.tar.gz"
Now we have the HMMs, so we can proceed to run the main function in order to obtain a core genome alignment, and a phylogeny. This would take ~20-30 min using 4 CPUs.
r
p <- phylen(gffs = gff, # The gff files extracted on the first step
hmmFile = hmm, # The downloaded HMMs
isCompressed = TRUE, # The downloaded HMMs are compressed (tar.gz)
phyloMode = "ml", # nj or ml, in this case we will use maximum likelihood
nbs = 100, # The number of bootstrap.
level = 100, # The percentage of genomes a gene must be in to be considered as part of the coregenome.
outDir = "phylen", # Lets create a directory called "phylen" to put the output files
n_threads = 4) # Use 4 threads
## Decompressing.. DONE!
## Concatenating..
## ===========================================================================
## DONE!
## Pressing.. DONE!
## Getting information from hmms.. DONE!
## Searching.. DONE!
## Computing panmatrix.. DONE!
## Getting core-genes.. DONE!
## Writting fastas.. DONE!
## Aligning.. DONE!
## Concatenating.. DONE!
## Removing intermediate files.. DONE!
## Generating phylogeny..
## optimize edge weights: -5958637 --> -5731557
## optimize base frequencies: -5731557 --> -5686007
## optimize rate matrix: -5686007 --> -5621016
## optimize edge weights: -5621016 --> -5619537
## optimize base frequencies: -5619537 --> -5616203
## optimize rate matrix: -5616203 --> -5615171
## optimize edge weights: -5615171 --> -5615158
## optimize base frequencies: -5615158 --> -5614921
## optimize rate matrix: -5614921 --> -5614863
## optimize edge weights: -5614863 --> -5614861
## optimize base frequencies: -5614861 --> -5614846
## optimize rate matrix: -5614846 --> -5614842
## optimize edge weights: -5614842 --> -5614842
## optimize base frequencies: -5614842 --> -5614841
## optimize rate matrix: -5614841 --> -5614841
## optimize edge weights: -5614841 --> -5614841
## optimize base frequencies: -5614841 --> -5614841
## optimize rate matrix: -5614841 --> -5614841
## optimize edge weights: -5614841 --> -5614841
## Generating phylogeny.. DONE!
##
## Finished: 658 groups of orthologous from 10 isolates have been used in the alignment.
## Returning an object of class "phylo" with 10 tips and 8 nodes.
Finally we have a "phylo" object, and we can continue analizing it with phangorn and ape packages.
``` r
Print it
p ```
##
## Phylogenetic tree with 10 tips and 8 internal nodes.
##
## Tip labels:
## C_fetus_subsp_testudinum_Sp3, C_fetus_subsp_venerealis_str_84-112, C_hyointestinalis_subsp_lawsonii_CCUG_27631, C_iguaniorum_str_RM11343, C_pinnipediorum_subsp_pinnipediorum_str_RM17262, H_bilis_str_AAQJH, ...
## Node labels:
## 100, 100, 100, 100, 100, 100, ...
##
## Unrooted; includes branch lengths.
``` r
Class?
class(p) ```
## [1] "phylo"
``` r
Plot it
plot(p, type = 'unrooted', cex = 0.7, lab4ut = 'axial') ```

Any questions?
If you have any question about usage, implementation, or result interpretation, please feel free to ask. Send them to the issue tracker or directly to my e-mail (iferres@pasteur.edu.uy). If you think your concern could help others, we encourage you to use the first channel of communication.
Contributing
If you want to contribute to the development of this software, please refer to the contributing guidelines.
Citation
Ferrés et al., (2018). Phylen: automatic phylogenetic reconstruction using the EggNOG database. Journal of Open Source Software, 3(25), 593, DOI: 10.21105/joss.00593
Owner
- Name: Ignacio Ferrés
- Login: iferres
- Kind: user
- Location: Montevideo - Uruguay
- Company: Institut Pasteur Montevideo
- Repositories: 3
- Profile: https://github.com/iferres
Bioinformatics. Research assistant at Institut Pasteur Montevideo (Uruguay), Bioinformatics Unit, Microbial Genomics Lab
JOSS Publication
Phylen: automatic phylogenetic reconstruction using the EggNOG database
Authors
Tags
phylogenetics bioinformatics EggNOGGitHub Events
Total
Last Year
Committers
Last synced: 7 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| iferres | i****s@g****m | 101 |
| Gregorio Iraola | g****a@p****y | 10 |
| klash | k****p@g****m | 2 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 7
- Total pull requests: 11
- Average time to close issues: 7 days
- Average time to close pull requests: about 5 hours
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 2.14
- Average comments per pull request: 0.36
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- kortschak (6)
- iferres (1)
Pull Request Authors
- giraola (10)
- KlausVigo (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- ape * depends
- phangorn * depends
- XML * imports
- graphics * imports
- httr * imports
- parallel * imports
- seqinr * imports
- testthat * suggests
