homoplasyfinder

A tool to identify and annotate homoplasies on a phylogeny and sequence alignment

https://github.com/josephcrispell/homoplasyfinder

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.5%) to scientific vocabulary

Keywords

consistency homoplasy indels mutations nucleotide-alignment phylogeny

Repository

Basic Info
  • Host: GitHub
  • Owner: JosephCrispell
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 557 KB
Statistics
  • Stars: 18
  • Watchers: 3
  • Forks: 3
  • Open Issues: 6
  • Releases: 0
Created almost 8 years ago · Last pushed over 4 years ago

README.md

HomoplasyFinder

Author: Joseph Crispell

Licence: GPL-3

Requires: R (>= v3.3.3) & Java (>= v10.0.1)



Description

HomoplasyFinder is an open-source tool that identifies homoplasies on a phylogeny and its nucleotide alignment. It uses the consistency index to find sites in the nucleotide alignment that are inconsistent with the phylogeny provided. This R package is a wrapper that makes the Java code behind HomoplasyFinder easy to use from R. Full documentation is provided on the HomoplasyFinder wiki.
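To make the consistency index concrete, here is a minimal sketch in base R. It uses the textbook definition (minimum possible number of changes at a site divided by the number of changes the tree requires) with Fitch parsimony on a small hand-built tree. The nested-list tree, the `fitch` and `consistencyIndex` helpers and the example sites are all illustrative assumptions; this is not the Java implementation that HomoplasyFinder runs.

```r
# Fitch parsimony on a rooted binary tree given as nested lists.
# Tips are character tip names; internal nodes are lists of two children.
# Returns the possible state set at the node and the number of changes below it.
fitch <- function(node, states) {
  if (is.character(node)) {
    return(list(set = states[[node]], changes = 0))
  }
  left <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  shared <- intersect(left$set, right$set)
  changes <- left$changes + right$changes
  if (length(shared) == 0) {
    # Children share no state: take the union and count one extra change
    return(list(set = union(left$set, right$set), changes = changes + 1))
  }
  list(set = shared, changes = changes)
}

# Consistency index = minimum possible changes / changes required on the tree
consistencyIndex <- function(tree, states) {
  minimumChanges <- length(unique(unlist(states))) - 1
  observedChanges <- fitch(tree, states)$changes
  if (observedChanges == 0) {
    return(1) # Invariant site: nothing to be inconsistent with
  }
  minimumChanges / observedChanges
}

# Hypothetical four-tip tree: ((A,B),(C,D))
tree <- list(list("A", "B"), list("C", "D"))

# A site that agrees with the tree and one that does not
consistentSite <- c(A = "G", B = "G", C = "T", D = "T")
homoplasiousSite <- c(A = "G", B = "T", C = "G", D = "T")

ciConsistent <- consistencyIndex(tree, consistentSite)
ciHomoplasious <- consistencyIndex(tree, homoplasiousSite)
print(c(consistent = ciConsistent, homoplasious = ciHomoplasious))
```

In this sketch the first site needs a single change on the tree and scores 1, while the second needs two changes for two states and scores 0.5. Sites scoring below 1 are the kind of inconsistent sites described above.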

Installation

install.packages("devtools")
devtools::install_github("JosephCrispell/homoplasyFinder")
devtools::install_github("JosephCrispell/basicPlotteR") # Makes annotated plotted phylogeny prettier :-)
library(homoplasyFinder)
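Because the package calls Java through rJava, it can help to confirm that R can see a Java installation before running anything. The following is a small sketch of such a check. `getJavaVersion` is a hypothetical helper written for this example, not a function from homoplasyFinder, and it returns `NA` if rJava or Java cannot be loaded.

```r
# Hypothetical helper (not part of homoplasyFinder): report the Java version
# visible to rJava, or NA if rJava is not installed or the JVM cannot start.
getJavaVersion <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(NA_character_)
  }
  tryCatch({
    rJava::.jinit() # Start the Java virtual machine
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

javaVersion <- getJavaVersion()
print(javaVersion)
```

If this prints `NA`, the Java or rJava installation is the first thing to check.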

Executing

```
# Find the FASTA and tree files attached to package
fastaFile <- system.file("extdata", "example.fasta", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, fastaFile=fastaFile, path=workingDirectory)

# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)

# Read in the annotated tree
tree <- readAnnotatedTree(workingDirectory)

# Plot the annotated tree
plotAnnotatedTree(tree, inconsistentPositions, fastaFile)
```
You should get a plot of the annotated phylogeny.

Now extended to deal with the presence/absence of INDELs

HomoplasyFinder can now calculate the consistency of INDELs (or any regions) on a phylogeny. To do this, simply replace the FASTA file with a CSV-formatted table reporting the presence/absence of regions. Here is an example of the format:

start,end,isolateA,isolateB,isolateC
34802,35208,0,1,0
39068,39069,0,0,1
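A table in this format could be written from R along the following lines. This is a sketch using only base R. The output file name is arbitrary, and it is an assumption here that the isolate column names need to match the tip labels in the tree.

```r
# One row per region; one 0/1 column per isolate (1 = present, 0 = absent).
# Assumption: isolate column names should match the tip labels of the phylogeny.
presenceAbsence <- data.frame(
  start = c(34802L, 39068L),
  end = c(35208L, 39069L),
  isolateA = c(0L, 0L),
  isolateB = c(1L, 0L),
  isolateC = c(0L, 1L)
)

# Write without row names or quotes so the file matches the format shown above
outputFile <- file.path(tempdir(), "presenceAbsence_example.csv")
write.csv(presenceAbsence, outputFile, row.names = FALSE, quote = FALSE)

# Check what was written
csvLines <- readLines(outputFile)
print(csvLines)
```

The resulting file path can then be passed as the `presenceAbsenceFile` argument in place of a FASTA file.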

Test it out using the following:
```
# Find the presence/absence and tree files attached to package
presenceAbsenceFile <- system.file("extdata", "presenceAbsence_INDELs.csv", package = "homoplasyFinder")
treeFile <- system.file("extdata", "example.tree", package = "homoplasyFinder")

# Get the current working directory
workingDirectory <- paste0(getwd(), "/")

# Run the HomoplasyFinder jar tool
inconsistentPositions <- runHomoplasyFinderInJava(treeFile=treeFile, presenceAbsenceFile=presenceAbsenceFile, path=workingDirectory)

# Get the current date
date <- format(Sys.Date(), "%d-%m-%y")

# Read in the output table
resultsFile <- paste0(workingDirectory, "consistencyIndexReport_", date, ".txt")
results <- read.table(resultsFile, header=TRUE, sep="\t", stringsAsFactors=FALSE)
```
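Both examples rebuild the report file name from the current date, which can miss the file if a run finishes after midnight or if the results are read on a later day. One alternative, sketched below, is to look for the most recently modified report in the output directory. `findLatestReport` is a hypothetical helper, not part of homoplasyFinder, and the only assumption about the file name is the `consistencyIndexReport_` prefix and `.txt` extension used in the examples.

```r
# Hypothetical helper (not part of homoplasyFinder): return the path of the most
# recently modified consistency index report in a directory.
findLatestReport <- function(directory) {
  reports <- list.files(directory,
                        pattern = "^consistencyIndexReport_.*\\.txt$",
                        full.names = TRUE)
  if (length(reports) == 0) {
    stop("No consistencyIndexReport_*.txt file found in ", directory)
  }
  reports[which.max(as.numeric(file.mtime(reports)))]
}

# Demonstration with a stand-in file in a temporary directory
demoDirectory <- file.path(tempdir(), "homoplasyFinderDemo")
dir.create(demoDirectory, showWarnings = FALSE)
demoFile <- file.path(demoDirectory,
                      paste0("consistencyIndexReport_", format(Sys.Date(), "%d-%m-%y"), ".txt"))
writeLines("placeholder", demoFile)

latestReport <- findLatestReport(demoDirectory)
print(basename(latestReport))
```

With real output, `findLatestReport(workingDirectory)` could replace the `resultsFile` path passed to `read.table`.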

Source code

Java source code is available here and R package (wrapper) code here.

Citation

If you use HomoplasyFinder in your research, it would be great if you could cite the following article: Crispell, J., Balaz, D., & Gordon, S. V. (2019). HomoplasyFinder: a simple tool to identify homoplasies on a phylogeny. Microbial Genomics. https://doi.org/10.1099/mgen.0.000245

Owner

  • Name: Joseph Crispell
  • Login: JosephCrispell
  • Kind: user

I'm a data scientist helping to use and promote reproducible data science

Citation (CITATION)

@article{crispell2019homoplasyfinder,
	title = {HomoplasyFinder: a simple tool to identify homoplasies on a phylogeny},
	author = {Crispell, J. and Balaz, D. and Gordon, S. V.},
	doi = {10.1099/mgen.0.000245},
	issn = {2057-5858},
	journal = {Microbial Genomics},
	publisher = {Microbiology Society},
	url = {http://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000245.v1},
	year = {2019}
}

GitHub Events

Total
  • Watch event: 1
  • Issue comment event: 1
Last Year
  • Watch event: 1
  • Issue comment event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 112
  • Total Committers: 1
  • Avg Commits per committer: 112.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
JosephCrispell c****h@g****m 112

Dependencies

DESCRIPTION cran
  • R >= 3.3.3 depends
  • rJava >= 0.9 depends
  • ape * imports
  • rJava * imports