Phylen

Phylen: automatic phylogenetic reconstruction using the EggNOG database - Published in JOSS (2018)

https://github.com/iferres/phylen

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bioinformatics eggnog genome hidden-markov-models phylogeny r
Last synced: 6 months ago · JSON representation

Repository

Compute a phylogeny using EggNOG database

Basic Info
  • Host: GitHub
  • Owner: iferres
  • License: mit
  • Language: R
  • Default Branch: master
  • Size: 6.44 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Topics
bioinformatics eggnog genome hidden-markov-models phylogeny r
Created over 8 years ago · Last pushed over 6 years ago
Metadata Files
Readme Contributing License Code of conduct

README.md

Travis-CI Build Status DOI

phylen

Phylen is an R package that performs automatic phylogenetic reconstruction given a set of Hidden Markov Models (HMMs). Genomes are screened against these HMMs, genes found in all genomes ("core genes") are aligned individually, those alignments are concatenated into a single supergene alignment, and a phylogenetic reconstruction is performed and returned as an object of class "phylo" so it can be further analysed using ape/phangorn framework in R. Functions to download well curated HMMs from clade-specific orthologous sets from the EggNOG database are provided although any custom set of HMMs can be used as well.

Installation

The easiest way to install this package is using devtools package:

r if(!require(devtools)){ install.packages("devtools") } devtools::install_github("iferres/phylen")

Requirements

phylen depends on HMMER 3.1b2 and MAFFT. You should have them installed on you $PATH variable prior to using this software. Besure that the following MAFFT aliases are also installed: linsi (alias for mafft --maxiterate 1000 --localpair), ginsi (alias for mafft --maxiterate 1000 --globalpair) and einsi (alias for mafft --ep 0 --maxiterate 1000 --genafpair). This shortcuts are installed together with mafft if you download and install the software from the MAFFT webpage (link above), but probably not if use a package manager as apt or brew.

It also depends on phangorn package, which in turn depends on igraph. Some system requirements are needed to install the latter, please check them here.

Standard workflow

This tutorial begins with the extraction of toy data attached on this package. It consists in 10 genomes of the Campylobacterales order, 5 of them are Campylobacter species and the other 5 are Helicobacter species.

Note: phylen input files must be in gff3 format, as returned by the prokka annotation software.

``` r

List the attached tar.gz file

tgz <- system.file('extdata', 'toydata.tar.gz', package = 'phylen')

List the files inside it

gff <- untar(tarfile = tgz, exdir = getwd(), list = T)

Decompress on current working directory

untar(tarfile = tgz,exdir = getwd())

List the decompressed files

gff ```

##  [1] "C_fetus_subsp_testudinum_Sp3.gff"                   
##  [2] "C_fetus_subsp_venerealis_str_84-112.gff"            
##  [3] "C_hyointestinalis_subsp_lawsonii_CCUG_27631.gff"    
##  [4] "C_iguaniorum_str_RM11343.gff"                       
##  [5] "C_pinnipediorum_subsp_pinnipediorum_str_RM17262.gff"
##  [6] "H_bilis_str_AAQJH.gff"                              
##  [7] "H_himalayensis_str_YS1.gff"                         
##  [8] "H_pylori_J166.gff"                                  
##  [9] "H_pylori_str_2018.gff"                              
## [10] "H_typhlonius_str_1.gff"

Let's load the phylen package and list the available Hidden Markov Models (HMMs) on the EggNOG database.

``` r

Load the phylen package

library(phylen) ```

## Loading required package: phangorn

## Loading required package: ape

``` r

List first 50 available sets of EggNOG

list_eggnogdb()[1:50, ] ```

##                level.name nog.prefix
## 1                    LUCA        NOG
## 2           Acidobacteria     aciNOG
## 3          Acidobacteriia    acidNOG
## 4            Aconoidasida     acoNOG
## 5          Actinobacteria     actNOG
## 6              Agaricales     agaNOG
## 7          Agaricomycetes    agarNOG
## 8             Apicomplexa     apiNOG
## 9    Proteobacteria_alpha    aproNOG
## 10              Aquificae     aquNOG
## 11                Archaea      arNOG
## 12           Archaeoglobi     arcNOG
## 13             Arthropoda     artNOG
## 14      Arthrodermataceae    arthNOG
## 15             Ascomycota     ascNOG
## 16                   Aves     aveNOG
## 17                Bacilli     bacNOG
## 18               Bacteria    bactNOG
## 19            Bacteroidia   bacteNOG
## 20          Basidiomycota     basNOG
## 21          Bacteroidetes    bctoNOG
## 22              Bilateria      biNOG
## 23    Proteobacteria_beta    bproNOG
## 24            Brassicales     braNOG
## 25              Carnivora     carNOG
## 26          Chaetomiaceae     chaNOG
## 27               Chlorobi     chlNOG
## 28             Chlamydiae    chlaNOG
## 29            Chloroflexi    chloNOG
## 30            Chloroflexi   chlorNOG
## 31            Chlorophyta  chloroNOG
## 32               Chordata    chorNOG
## 33            Chromadorea     chrNOG
## 34             Clostridia     cloNOG
## 35               Coccidia     cocNOG
## 36          Crenarchaeota     creNOG
## 37      Cryptosporidiidae     cryNOG
## 38          Cyanobacteria     cyaNOG
## 39             Cytophagia     cytNOG
## 40      Debaryomycetaceae     debNOG
## 41        Deferribacteres     defNOG
## 42      Dehalococcoidetes     dehNOG
## 43     Deinococcusthermus     deiNOG
## 44          delta/epsilon     delNOG
## 45                Diptera     dipNOG
## 46        Dothideomycetes     dotNOG
## 47   Proteobacteria_delta    dproNOG
## 48          Drosophilidae     droNOG
## 49 Proteobacteria_epsilon    eproNOG
## 50        Erysipelotrichi     eryNOG

The order Campylobacterales belongs to the class Epsilonproteobacteria, which is listed at row number 49. The corresponding set of HMMs is, then, eproNOG.

We will download the HMMs directly from the EggNOG servers using the download_nog_hmm() function.

r hmm <- download_nog_hmm(nog.prefix = "eproNOG", onDir = getwd()) hmm

## [1] "/home/iferres/Documents/eproNOG.hmm.tar.gz"

Now we have the HMMs, so we can proceed to run the main function in order to obtain a core genome alignment, and a phylogeny. This would take ~20-30 min using 4 CPUs.

r p <- phylen(gffs = gff, # The gff files extracted on the first step hmmFile = hmm, # The downloaded HMMs isCompressed = TRUE, # The downloaded HMMs are compressed (tar.gz) phyloMode = "ml", # nj or ml, in this case we will use maximum likelihood nbs = 100, # The number of bootstrap. level = 100, # The percentage of genomes a gene must be in to be considered as part of the coregenome. outDir = "phylen", # Lets create a directory called "phylen" to put the output files n_threads = 4) # Use 4 threads

## Decompressing.. DONE!
## Concatenating.. 
## ===========================================================================
## DONE!
## Pressing.. DONE!
## Getting information from hmms.. DONE!
## Searching.. DONE!
## Computing panmatrix.. DONE!
## Getting core-genes.. DONE!
## Writting fastas.. DONE!
## Aligning.. DONE!
## Concatenating.. DONE!
## Removing intermediate files.. DONE!
## Generating phylogeny..
## optimize edge weights:  -5958637 --> -5731557 
## optimize base frequencies:  -5731557 --> -5686007 
## optimize rate matrix:  -5686007 --> -5621016 
## optimize edge weights:  -5621016 --> -5619537 
## optimize base frequencies:  -5619537 --> -5616203 
## optimize rate matrix:  -5616203 --> -5615171 
## optimize edge weights:  -5615171 --> -5615158 
## optimize base frequencies:  -5615158 --> -5614921 
## optimize rate matrix:  -5614921 --> -5614863 
## optimize edge weights:  -5614863 --> -5614861 
## optimize base frequencies:  -5614861 --> -5614846 
## optimize rate matrix:  -5614846 --> -5614842 
## optimize edge weights:  -5614842 --> -5614842 
## optimize base frequencies:  -5614842 --> -5614841 
## optimize rate matrix:  -5614841 --> -5614841 
## optimize edge weights:  -5614841 --> -5614841 
## optimize base frequencies:  -5614841 --> -5614841 
## optimize rate matrix:  -5614841 --> -5614841 
## optimize edge weights:  -5614841 --> -5614841

## Generating phylogeny.. DONE!
## 
## Finished: 658 groups of orthologous from 10 isolates have been used in the alignment.
## Returning an object of class "phylo" with 10 tips and 8 nodes.

Finally we have a "phylo" object, and we can continue analizing it with phangorn and ape packages.

``` r

Print it

p ```

## 
## Phylogenetic tree with 10 tips and 8 internal nodes.
## 
## Tip labels:
##  C_fetus_subsp_testudinum_Sp3, C_fetus_subsp_venerealis_str_84-112, C_hyointestinalis_subsp_lawsonii_CCUG_27631, C_iguaniorum_str_RM11343, C_pinnipediorum_subsp_pinnipediorum_str_RM17262, H_bilis_str_AAQJH, ...
## Node labels:
##  100, 100, 100, 100, 100, 100, ...
## 
## Unrooted; includes branch lengths.

``` r

Class?

class(p) ```

## [1] "phylo"

``` r

Plot it

plot(p, type = 'unrooted', cex = 0.7, lab4ut = 'axial') ```

Any questions?

If you have any question about usage, implementation, or result interpretation, please feel free to ask. Send them to the issue tracker or directly to my e-mail (iferres@pasteur.edu.uy). If you think your concern could help others, we encourage you to use the first channel of communication.

Contributing

If you want to contribute to the development of this software, please refer to the contributing guidelines.

Citation

Ferrés et al., (2018). Phylen: automatic phylogenetic reconstruction using the EggNOG database. Journal of Open Source Software, 3(25), 593, DOI: 10.21105/joss.00593

Owner

  • Name: Ignacio Ferrés
  • Login: iferres
  • Kind: user
  • Location: Montevideo - Uruguay
  • Company: Institut Pasteur Montevideo

Bioinformatics. Research assistant at Institut Pasteur Montevideo (Uruguay), Bioinformatics Unit, Microbial Genomics Lab

JOSS Publication

Phylen: automatic phylogenetic reconstruction using the EggNOG database
Published
May 04, 2018
Volume 3, Issue 25, Page 593
Authors
Ignacio Ferrés ORCID
Unidad de Bioinformática, Institut Pasteur de Montevideo, Uruguay
Gregorio Iraola ORCID
Unidad de Bioinformática, Institut Pasteur de Montevideo, Uruguay
Editor
Kevin M. Moerman ORCID
Tags
phylogenetics bioinformatics EggNOG

GitHub Events

Total
Last Year

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 113
  • Total Committers: 3
  • Avg Commits per committer: 37.667
  • Development Distribution Score (DDS): 0.106
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
iferres i****s@g****m 101
Gregorio Iraola g****a@p****y 10
klash k****p@g****m 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 7
  • Total pull requests: 11
  • Average time to close issues: 7 days
  • Average time to close pull requests: about 5 hours
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 2.14
  • Average comments per pull request: 0.36
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kortschak (6)
  • iferres (1)
Pull Request Authors
  • giraola (10)
  • KlausVigo (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • ape * depends
  • phangorn * depends
  • XML * imports
  • graphics * imports
  • httr * imports
  • parallel * imports
  • seqinr * imports
  • testthat * suggests