Repository
IQR Tree Pruner
Basic Info
- Host: GitHub
- Owner: barizona
- License: mit
- Language: R
- Default Branch: main
- Size: 253 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
IQR Tree Pruner
The aim of the R script is to exclude extremely long branches -- representing potentially misclassified taxa or genomic regions with undetected recombinant parts -- from a phylogenetic tree.
Running the code
Rscript iqr_tree_pruner_v1.0.R --original_tree tree.nwk --tipprop 0.05
Be aware that running the script deletes all previously created output files in the working directory.
Citation
Please cite the IQR Tree Pruner if it was helpful for your research. This will allow me to continue maintaining this project in the future.
Ari, E. (2023). IQR Tree Pruner (Version 1.0) [Computer software]. DOI: 10.5281/zenodo.8220477.
Required R packages
optparse, tidyverse, magrittr, caper, treeio (Bioconductor), MASS, phytools, viridis, ggtree (Bioconductor)
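One possible way to install the packages listed above is sketched below. It assumes the non-Bioconductor packages are taken from CRAN and that BiocManager is used for the two Bioconductor packages; the README itself does not prescribe an installation method.

```r
# CRAN packages named in the README (assumption: installed from CRAN)
install.packages(c("optparse", "tidyverse", "magrittr", "caper",
                   "MASS", "phytools", "viridis"))

# Bioconductor packages named in the README, installed through BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c("treeio", "ggtree"))
```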
Inputs and outputs
Input file and argument
- required:
  - --original_tree: a NEWICK tree file
- optional argument:
  - --tipprop: The proportion of tips on a single branch that can be excluded based on the upper fence of the IQR of branch lengths. The default value is 0.05 (5%).
Output files
- pruned.nwk: the pruned NEWICK tree file
- branchlengthnroftips.png: a scatter plot of branch lengths and number of tips with the IQR upper fence and the proportion of tips given by the --tipprop argument
- branchlengthdistribution.png: a histogram of branch lengths indicating the count of included and excluded tips
- originalvspruned_tree.png: a tree plot indicating the pruned tip branches
- tree_pruning.log: a text file containing the summary comparison of the original tree and the pruned tree, and the R session info.
The IQR pruning algorithm
Abbreviations:
IQR: the upper fence of the interquartile range of a vector: Q3 + 3 * (Q3 - Q1) = extreme outlier threshold
R2T: root-to-tip distance on a midpoint rooted tree
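As a worked example of the threshold defined above, the following minimal sketch computes Q3 + 3 * (Q3 - Q1) for a made-up vector of branch lengths. The helper name `upper_fence` and the toy values are assumptions for illustration, and R's default quantile definition is used, which may differ from what the script does.

```r
# Hypothetical helper: upper fence for extreme outliers, Q3 + 3 * (Q3 - Q1)
upper_fence <- function(x, k = 3) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  q[2] + k * (q[2] - q[1])
}

# Made-up branch lengths with one very long branch
branch_lengths <- c(0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.90)

fence <- upper_fence(branch_lengths)
extreme <- branch_lengths > fence   # TRUE for values above the fence
```

Here Q1 is 0.02 and Q3 is 0.0425, so the fence is 0.11 and only the 0.90 branch is flagged.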
1st part: Pruning the unrooted tree based on the upper fence of IQR branch lengths.
1. Calculating the minimum number of tips for each branch (on the unrooted bifurcating tree the tips are counted in both directions, then the smaller count is chosen to represent the branch).
2. Calculating the IQR for branch lengths.
3. Excluding those extreme outlier branches containing less than a 0.05 (or a user-given) proportion of the tips.
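The logic of the 1st part can be sketched on a small table of branches. This is an illustration under stated assumptions, not the script's code: the data frame, its column names, and the 40-tip tree are hypothetical, and only a handful of branches are listed.

```r
# Toy branch table from a hypothetical 40-tip tree (not real data)
n_tips <- 40
branches <- data.frame(
  branch        = 1:10,
  length        = c(0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.04, 0.90, 0.80),
  tips_one_side = c(1, 1, 2, 39, 3, 1, 1, 35, 39, 20)
)

# Smaller of the two tip counts found on either side of each branch
branches$min_tips <- pmin(branches$tips_one_side, n_tips - branches$tips_one_side)

# Upper fence of the branch lengths: Q3 + 3 * (Q3 - Q1)
q <- quantile(branches$length, probs = c(0.25, 0.75), names = FALSE)
fence <- q[2] + 3 * (q[2] - q[1])

# Flag extreme outlier branches that carry less than `tipprop` of the tips
tipprop <- 0.05
branches$prune <- branches$length > fence &
  branches$min_tips / n_tips < tipprop
```

In this toy table both long branches (9 and 10) exceed the fence, but only branch 9 is flagged, because branch 10 separates half of the tips from the rest.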
2nd part: Pruning the midpoint rooted tree based on root-to-tip distances.
Excluding tips based on the R2T IQRs, while iteratively midpoint rooting and excluding the greatest extreme outlier.
4. Midpoint rooting the tree (after pruning with the method described in the 1st part).
5. Calculating the IQR for root-to-tip distances.
6. Excluding the most extreme outlier tip based on the IQR for root-to-tip distances.
7. Repeating points 4 to 6 until no more extreme outlier tips are found.
8. Unrooting the pruned tree.
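The iterative loop of the 2nd part can be sketched in base R if the tree is represented by its tip-to-tip (patristic) distance matrix, for example the kind of matrix returned by the `cophenetic()` method that the ape package provides for trees. Everything below is an assumption made for illustration: the function names are hypothetical, the input is a made-up star-shaped tree, and the script itself works on tree objects rather than on a distance matrix.

```r
# Hypothetical helper: root-to-tip distances under midpoint rooting,
# computed from a patristic distance matrix D of a tree.
midpoint_r2t <- function(D) {
  ends <- which(D == max(D), arr.ind = TRUE)[1, ]  # one most distant tip pair
  a <- ends[["row"]]
  b <- ends[["col"]]
  t <- D[a, b] / 2                       # midpoint, measured from tip a
  u <- (D[a, ] + D[a, b] - D[b, ]) / 2   # where each tip joins the a-b path
  h <- (D[a, ] + D[b, ] - D[a, b]) / 2   # distance from that path to the tip
  abs(t - u) + h
}

# Upper fence for extreme outliers: Q3 + 3 * (Q3 - Q1)
upper_fence <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  q[2] + 3 * (q[2] - q[1])
}

# Re-root, recompute the fence, drop the single most extreme tip, repeat
prune_r2t <- function(D) {
  removed <- character(0)
  repeat {
    r2t <- midpoint_r2t(D)
    worst <- which.max(r2t)
    if (r2t[worst] <= upper_fence(r2t)) break   # no extreme outlier left
    removed <- c(removed, names(worst))
    D <- D[-worst, -worst, drop = FALSE]
  }
  list(kept = rownames(D), removed = removed)
}

# Made-up star tree: 20 short tips and 3 long tips around one centre
len <- c(rep(c(0.8, 0.9, 1.0, 1.1, 1.2), 4), 10, 10, 10)
names(len) <- c(paste0("short", 1:20), paste0("long", 1:3))
D <- outer(len, len, "+")
diag(D) <- 0

result <- prune_r2t(D)
```

On this toy input the loop removes two of the three long tips and then stops: once a single long tip remains, the midpoint moves onto its branch, every root-to-tip distance changes, and no tip exceeds the recomputed fence. This is why the distances and the fence are recalculated after each removal instead of once.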
Let's see an example
A tree with some extremely long branches:

We tried to prune the tree with the TreeShrink software (v1.3.9; Mai & Mirarab, 2018), but no tip was removed, so the extremely long branches remained.
Therefore, we applied the IQR Tree Pruner R script (v1.0) with which the following branches were pruned:

Here you can see the distribution of the branch length and number of tips of the original tree:

The branches to the right of the dark red line and below the dark green line were pruned during the 1st part of the algorithm.
Here is the histogram of tip branch lengths:

References
Mai, U. and Mirarab, S. (2018) TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees. BMC Genomics, 19, 272.
Owner
- Name: Eszter Ari
- Login: barizona
- Kind: user
- Location: Hungary
- Company: ELTE and BRC
- Website: https://genet.elte.hu/bioinformatic
- Twitter: EszterAri
- Repositories: 1
- Profile: https://github.com/barizona
Bioinformatician
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Ari
    given-names: Eszter
    orcid: https://orcid.org/0000-0001-7774-1067
title: IQR Tree Pruner
version: first
date-released: 2023-08-07