https://github.com/csiro/nrca-phylodist
Calculating phylogenetic distances for weed biological control
Authors of code: Nunzio Knerr, Stephanie Chen, Alexander Schmidt-Lebuhn
What does this script do and what is it useful for?
This code accompanies the paper 'Phylogenomics-driven host test list selection for weed biological control' (Biological Control, 2024). It contains functions for calculating phylogenetic distance measures that are useful for creating host test lists in classical weed biological control.
The degree of relatedness between two taxa on a phylogeny is indicated by the number of nodes separating them. Here, we provide functions that calculate two distance measures from an input phylogenetic tree: the degree of separation (i.e. node count) and the patristic distance (the sum of branch lengths along the path connecting two terminals).
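To make the patristic distance concrete, the sketch below computes it by hand for a hypothetical three-tip tree ((A:1,B:2):1,C:3), encoded as an ape-style edge matrix (tips numbered 1..n, root n+1) so that it runs without ape. The helpers `path_to_root()` and `patristic()` are illustrative names and not part of this repository; in the example usage below, the same quantity is obtained with adephylo's `distTips()`.

```r
# Hypothetical toy tree ((A:1,B:2):1,C:3) in ape's edge-matrix convention:
# tips are 1..3, the root is node 4, node 5 is the ancestor of A and B.
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
edge_length <- c(1, 1, 2, 3)  # branch length of each row of 'edge'

# Path from 'node' down to the root, starting with 'node' itself
path_to_root <- function(edge, node) {
  path <- node
  while (any(edge[, 2] == node)) {  # the root has no parent edge
    node <- edge[edge[, 2] == node, 1]
    path <- c(path, node)
  }
  path
}

# Patristic distance: sum of the branch lengths between tips i and j
patristic <- function(edge, edge_length, i, j) {
  path_i <- path_to_root(edge, i)
  path_j <- path_to_root(edge, j)
  mrca <- path_i[path_i %in% path_j][1]  # first shared node = MRCA
  # every node below the MRCA contributes the branch leading to it
  below <- c(path_i[seq_len(match(mrca, path_i) - 1)],
             path_j[seq_len(match(mrca, path_j) - 1)])
  sum(edge_length[match(below, edge[, 2])])
}

patristic(edge, edge_length, 1, 2)  # A-B: 1 + 2 = 3
patristic(edge, edge_length, 1, 3)  # A-C: 1 + 1 + 3 = 5
```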
Descendant List Function
First, define a function that recursively collects the terminals (tips) descending from a node. It is used by the 'degreesofsep' function later on.
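``` {.r .cell-code}
# Return the tip numbers descending from 'thisnode'.
# Relies on ape's numbering: tips are 1..Ntip, internal nodes are > Ntip.
descendantlist <- function(thistree, thisnode)
{
  if (thisnode <= length(thistree$tip.label))
  {
    return(thisnode)  # a tip is its own (only) descendant
  }
  else
  {
    wherenext <- which(thistree$edge[,1]==thisnode) # get immediate descendants
    thislist <- NULL
    for (x in seq_along(wherenext))
    {
      # recurse into each child and append the tips found below it
      thislist <- c(thislist, descendantlist(thistree, thistree$edge[wherenext[x],2]))
    }
    return(thislist)
  }
}
```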
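`descendantlist` recurses once per level of the tree, so on very deep, ladder-like trees it may run into R's limits on nested evaluation. A hedged alternative, not part of this repository, is to walk the tree with an explicit stack. The sketch below assumes only an ape-style `edge` matrix and tip count; `descendant_tips()` is a hypothetical name, and visiting children in edge order returns the tips in the same order as the recursive version.

```r
# Hypothetical iterative version of descendantlist(): returns the tip numbers
# below 'node', assuming ape's numbering (tips are 1..ntip).
descendant_tips <- function(edge, ntip, node) {
  stack <- node        # nodes still to visit; the front is processed next
  result <- integer(0)
  while (length(stack) > 0) {
    current <- stack[1]
    stack <- stack[-1]
    if (current <= ntip) {
      result <- c(result, current)  # a tip: keep it
    } else {
      children <- edge[edge[, 1] == current, 2]
      stack <- c(children, stack)   # visit children next, in edge order
    }
  }
  result
}

# Toy tree ((A,B),C): tips 1..3, root 4, node 5 = ancestor of A and B
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
descendant_tips(edge, ntip = 3, node = 5)  # tips 1 and 2
descendant_tips(edge, ntip = 3, node = 4)  # tips 1, 2 and 3
```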
Degree Of Separation Function
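Function for calculating degrees of separation, i.e. node counts, between all pairs of terminals. For each terminal (row), it moves down the tree towards the root; every other terminal (column) receives the number of nodes passed on the row terminal's lineage before reaching the pair's most recent common ancestor, so sister terminals get 0. Because the count is taken along the row terminal's lineage, the matrix is generally not symmetric; the row of the specified target weed is extracted later.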
``` {.r .cell-code}
# Return a tip-by-tip matrix of degrees of separation (node counts).
# Assumes an ape 'phylo' object, where the root is node Ntip + 1.
degreesofsep <- function(thistree)
{
  ntip <- length(thistree$tip.label)
  dosmatrix <- matrix(0, nrow=ntip, ncol=ntip)
  colnames(dosmatrix) <- thistree$tip.label
  rownames(dosmatrix) <- thistree$tip.label
  for (x in 1:ntip)
  {
    prior_y <- x # start at present terminal
    y <- thistree$edge[which(thistree$edge[,2]==x),1] # get immediately ancestral node
    currentdist <- 0
    while (y != (ntip+1)) # move downtree until root node is found
    {
      # assign the current node count to all tips in the sibling subtrees
      currentdesc <- which(thistree$edge[,1]==y)
      for (z in seq_along(currentdesc))
      {
        if (thistree$edge[currentdesc[z],2]!=prior_y)
        {
          dosmatrix[x,descendantlist(thistree,thistree$edge[currentdesc[z],2])] <- currentdist
        }
      }
      prior_y <- y
      y <- thistree$edge[which(thistree$edge[,2]==y),1] # get immediately ancestral node
      currentdist <- currentdist + 1
    }
    # finally, handle the subtrees attached directly to the root
    currentdesc <- which(thistree$edge[,1]==y)
    for (z in seq_along(currentdesc))
    {
      if (thistree$edge[currentdesc[z],2]!=prior_y)
      {
        dosmatrix[x,descendantlist(thistree,thistree$edge[currentdesc[z],2])] <- currentdist
      }
    }
  }
  return(dosmatrix)
}
```
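As a cross-check on what the matrix contains, the sketch below is an independent, hypothetical re-implementation (`node_count_matrix()` is not from the authors): for terminals x and z it takes x's ancestor path to the root, finds the first node shared with z's ancestors (their most recent common ancestor) and counts the nodes passed before it. A hand-built toy tree ((A,B),C) in ape's edge-matrix convention stands in for a tree read with ape.

```r
# Hypothetical re-implementation of the node-count matrix via ancestor paths.
# Assumes ape's conventions: tips are 1..n and only the root lacks a parent edge.
node_count_matrix <- function(edge, tip_label) {
  n <- length(tip_label)
  # ancestors of a node, from its parent down to the root
  ancestors <- function(node) {
    path <- c()
    while (any(edge[, 2] == node)) {
      node <- edge[edge[, 2] == node, 1]
      path <- c(path, node)
    }
    path
  }
  anc <- lapply(seq_len(n), ancestors)
  out <- matrix(0, n, n, dimnames = list(tip_label, tip_label))
  for (x in seq_len(n)) {
    for (z in seq_len(n)) {
      if (x == z) next
      # position of the MRCA on x's lineage (first ancestor shared with z)
      mrca_pos <- which(anc[[x]] %in% anc[[z]])[1]
      out[x, z] <- mrca_pos - 1  # nodes passed on x's lineage before the MRCA
    }
  }
  out
}

# Toy tree ((A,B),C): tips 1..3, root 4, node 5 = ancestor of A and B
edge <- matrix(c(4, 5,
                 5, 1,
                 5, 2,
                 4, 3), ncol = 2, byrow = TRUE)
m <- node_count_matrix(edge, c("A", "B", "C"))
m
```

On this toy tree the value from A to C is 1 but from C to A it is 0, illustrating that each row is measured along the row terminal's own lineage.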
User Input Variables
Specify the inputs and outputs used by the script. A tree file in Newick format is required. The outgroup taxa are specified so that the tree can be rooted at their most recent common ancestor. The target taxon, i.e. the target weed for biological control, is also specified here so that the distance measures can be reported relative to it.
``` {.r .cell-code}
# phylogenetic tree as newick file
treeFileName <- "astereae_concatenated.tre"
# specify outgroup(s)
taxonListForOutgroup <- c("Dimorphotheca_pluvialis", "Ewartia_nubigena", "Abrotanella_nivigena", "Cotula_coronopifolia")
# the target taxon to calculate distances from
myTargetTaxon <- "Erigeron_bonariensis"
# the output file name
outputFileName <- paste0("phylodists_", myTargetTaxon, ".tsv")
```
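The outgroup and target names must match the tree's tip labels exactly, including underscores, otherwise the rooting and target-row steps will not behave as intended. A small pre-flight check such as the hypothetical `check_taxa()` below, which is not part of the repository, can catch a mismatch before the slower distance calculations run. In practice `tip_labels` would be `mytree$tip.label`; a literal vector is used here so the sketch runs on its own.

```r
# Hypothetical check that every requested taxon is a tip label in the tree
check_taxa <- function(tip_labels, taxa) {
  missing <- setdiff(taxa, tip_labels)
  if (length(missing) > 0) {
    stop("Not found in tree: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in for mytree$tip.label (illustrative labels only)
tip_labels <- c("Erigeron_bonariensis", "Dimorphotheca_pluvialis", "Cotula_coronopifolia")
check_taxa(tip_labels, "Erigeron_bonariensis")     # passes silently
try(check_taxa(tip_labels, "Erigeronbonariensis"))  # reports the mismatch
```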
Example Usage
``` {.r .cell-code}
# load libraries
library(ape)       # read.tree(), getMRCA(), root()
library(adephylo)  # distTips()
library(MASS)      # write.matrix()

# read phylogenetic tree
mytree <- ape::read.tree(treeFileName)

# get the most recent common ancestor (MRCA) of the outgroup taxa
myOG <- getMRCA(mytree, taxonListForOutgroup)

# root the tree at the outgroup MRCA
mytree <- root(mytree, node = myOG)

# infer matrix of pairwise patristic distances between all terminals
# this takes quite some time for larger trees
myPatristic <- distTips(mytree, method = "patristic")
myPatristicM <- as.matrix(myPatristic)
myPatristicMordered <- myPatristicM[order(rownames(myPatristicM)), order(rownames(myPatristicM))]
write.matrix(myPatristicMordered, file="patristicdists.tsv", sep="\t")

# now calculate degrees of separation, i.e. counting nodes between any terminal and its ancestral lineage splits
# this will take quite some time for larger trees
myDegsep <- degreesofsep(mytree)
myDegsep <- myDegsep[order(rownames(myDegsep)), order(rownames(myDegsep))]
write.matrix(myDegsep, file="degsep.tsv", sep="\t")

# make data frame for one target species with both its degrees of separation and patristic distances
targetTaxon <- myTargetTaxon
Terminal <- row.names(myDegsep)
Degsep <- myDegsep[which(row.names(myDegsep)==targetTaxon), ]
PatristicDist <- myPatristicMordered[which(row.names(myPatristicMordered) == targetTaxon), ]

# sort all terminals by increasing patristic distance from the target
PhyloDists <- data.frame(Species = Terminal[order(PatristicDist)], DegSep = Degsep[order(PatristicDist)], PatristicDist = PatristicDist[order(PatristicDist)], row.names = NULL)
write.table(PhyloDists, file = outputFileName, sep = "\t", row.names = FALSE)
knitr::kable(PhyloDists)
```
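The data-frame step above pairs `Terminal`, `Degsep` and `PatristicDist` by position, which works because both matrices were first sorted by name. A hypothetical variant, `target_table()` (not part of the repository), indexes both matrix rows by taxon name instead, so the columns stay aligned however the matrices are ordered. Small toy matrices, consistent with the tree ((A:1,B:2):1,C:3), stand in for `myDegsep` and `myPatristicMordered`.

```r
# Hypothetical name-based version of the PhyloDists step
target_table <- function(degsep, patristic, target) {
  pd <- patristic[target, ]
  species <- names(pd)[order(pd)]  # taxa ordered by patristic distance
  data.frame(Species = species,
             DegSep = degsep[target, species],
             PatristicDist = pd[species],
             row.names = NULL)
}

# Toy matrices for the tree ((A:1,B:2):1,C:3)
taxa <- c("A", "B", "C")
degsep <- matrix(c(0, 0, 1,
                   0, 0, 1,
                   0, 0, 0), nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))
patristic <- matrix(c(0, 3, 5,
                      3, 0, 6,
                      5, 6, 0), nrow = 3, byrow = TRUE, dimnames = list(taxa, taxa))
tab <- target_table(degsep, patristic, "A")
print(tab)
```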
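The output is a tab-separated file with one row per terminal, sorted by increasing patristic distance from the target, and columns for the species name (Species), degree of separation (DegSep) and patristic distance (PatristicDist). See the file phylodists_Erigeron_bonariensis.tsv for an example that was used as a case study in the paper.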
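To reuse the output in R, the file can be read back with `read.delim()`, whose defaults (tab separator, header row) match how the script writes it. The round trip below uses a temporary file and a small stand-in table with made-up values in the same three-column layout; the cut-off of 0 nodes in the last step is arbitrary and only illustrates subsetting, not the selection criteria used in the paper.

```r
# Stand-in for a phylodists_<target>.tsv file (made-up values)
phylo <- data.frame(Species = c("A", "B", "C"),
                    DegSep = c(0, 0, 1),
                    PatristicDist = c(0, 3, 5))
path <- tempfile(fileext = ".tsv")
write.table(phylo, file = path, sep = "\t", row.names = FALSE)  # as in the script

# read.delim() defaults to sep = "\t" and header = TRUE
back <- read.delim(path)
str(back)

# Arbitrary example cut-off: taxa with no intervening nodes
close_taxa <- back$Species[back$DegSep <= 0]
close_taxa
```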
Copyright and license information
Copyright (C) 2024 CSIRO
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Citation and contact information
Please cite the following paper if you use these scripts:
Stephanie H. Chen, Ben Gooden, Michelle A. Rafter, Gavin C. Hunter, Alicia Grealy, Nunzio Knerr, Alexander N. Schmidt-Lebuhn. Phylogenomics-driven host test list selection for weed biological control. Biological Control, Volume 193, 2024, 105529, https://doi.org/10.1016/j.biocontrol.2024.105529
Contact the corresponding author of the paper, Alexander Schmidt-Lebuhn, if you have any questions.