tcfinder
A lightweight tool to find clusters of samples within a phylogeny
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
tcfinder
A lightweight tool that finds clusters of samples, given as a list of identifiers, within a phylogeny in phylo4 format
(see phylobase),
subject to a minimum cluster size and a minimum proportion of targets in the cluster.
Installation
This tool is available through the bioconda conda channel.
Check out the package recipe for more information.
Run the following command to install it in the current conda environment.
```shell
conda install -c bioconda tcfinder
```
The most recent release can be retrieved through the Releases
section (see latest release).
As a Rust project, it can be built and tested using cargo.
Usage
Quick reference
```txt
Usage: tcfinder [OPTIONS] --tree
Options:
-i, --tree
```
For example, the following command will analyze the test data using the default thresholds.
```shell
tcfinder -i test/rtree.csv -t test/targets.txt -o test/clusters.csv
```
Building a phylo4 tree
Trees can easily be converted to phylo4 format using the phylobase R package.
As an example, the following code was used for generating the test tree.
A similar approach may be used to convert an existing tree after reading it from disk
(for instance, Newick files can be read using the read.tree function from the ape package).
```R
library(ape)        # v5.6-2
library(phylobase)  # v0.8.10

# Generate random tree with 100 tips
tree <- rtree(100)

# Convert to phylo4
tree.p4 <- as(tree, "phylo4")

# Convert to dataframe and set undotted column names
tree.p4.df <- as(tree.p4, "data.frame")
names(tree.p4.df) <- c("label", "node", "ancestor", "edgelength", "nodetype")
```
Note that column names are key. The input tree file must contain the following columns:
- `label`: a node label string (often found only for tip nodes).
- `node`: an integer indicating the node index, starting from 1.
- `ancestor`: an integer indicating the node index of the ancestor of the `node`. In a tree, a `node` can only have a single ancestor. A value of 0 indicates that the `node` has no ancestors (i.e. it is the tree root).
- `nodetype`: a string, either "tip", "internal" or "root". It is only used for checking if the node is a tip.
Branch lengths are not considered in the current implementation. Users can take this variable into consideration by collapsing nodes beforehand.
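The column requirements above can be checked before running the tool. The following Python snippet is a minimal sketch and is not part of tcfinder: the function name `load_tree_rows` is hypothetical, and it assumes a comma-separated file with a header row, as the `.csv` extension of the test tree suggests.

```python
import csv
import io

# Columns the README lists as required (edgelength is not used).
REQUIRED_COLUMNS = {"label", "node", "ancestor", "nodetype"}
NODE_TYPES = {"tip", "internal", "root"}


def load_tree_rows(handle):
    """Read tree rows and check the constraints described above.

    Returns (parent_of, tips): a node -> ancestor mapping and a
    tip node -> label mapping. Hypothetical helper, not tcfinder code.
    """
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    parent_of = {}
    tips = {}
    for row in reader:
        node = int(row["node"])
        ancestor = int(row["ancestor"])
        if node < 1:
            raise ValueError(f"node index must start from 1, got {node}")
        if row["nodetype"] not in NODE_TYPES:
            raise ValueError(f"unknown nodetype: {row['nodetype']!r}")
        if node in parent_of:
            # A node listed twice would imply more than one ancestor.
            raise ValueError(f"node {node} appears more than once")
        parent_of[node] = ancestor
        if row["nodetype"] == "tip":
            tips[node] = row["label"]

    # An ancestor of 0 marks the root; a tree has exactly one.
    roots = [n for n, a in parent_of.items() if a == 0]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {len(roots)}")
    return parent_of, tips


# Toy tree with three tips (made-up data for illustration).
example = io.StringIO(
    "label,node,ancestor,edgelength,nodetype\n"
    "A,1,4,0.1,tip\n"
    "B,2,4,0.2,tip\n"
    "C,3,5,0.3,tip\n"
    ",4,5,0.1,internal\n"
    ",5,0,,root\n"
)
parent_of, tips = load_tree_rows(example)
```

To check a real file, pass an open file handle instead of the in-memory example, e.g. `load_tree_rows(open("test/rtree.csv", newline=""))`.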
Clade stats
Threshold clade stats are provided via the command line. Qualifying clades must meet both criteria.
- `--minimum-size`: an integer indicating the minimum cluster size, i.e. the minimum number of target tips (or leaves) in a qualifying clade.
- `--minimum-prop`: a floating point number in [0,1] indicating the minimum proportion of target tips in a qualifying clade.
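To make the two criteria concrete, the following Python snippet counts target tips and total tips under every node of a toy tree and applies both thresholds. It is an illustrative sketch only and not how tcfinder is implemented: the names `clade_stats` and `qualifies` are hypothetical, the tree and target labels are made up, and the threshold values are arbitrary rather than the tool's defaults. It also lists every qualifying node, which may differ from the clusters the tool reports.

```python
from collections import defaultdict


def clade_stats(parent_of, tips, targets):
    """Return {node: (n_target_tips, n_tips)} for every node.

    parent_of maps node -> ancestor (0 for the root), tips maps
    tip node -> label. Assumes a valid tree without cycles.
    """
    counts = defaultdict(lambda: [0, 0])
    for tip, label in tips.items():
        is_target = label in targets
        node = tip
        # Walk from the tip up to the root, updating each clade on the way.
        while node != 0:
            counts[node][0] += is_target
            counts[node][1] += 1
            node = parent_of[node]
    return {node: tuple(c) for node, c in counts.items()}


def qualifies(n_targets, n_tips, minimum_size, minimum_prop):
    """A clade must meet both criteria to qualify."""
    return n_targets >= minimum_size and n_targets / n_tips >= minimum_prop


# Toy tree: tips A and B share ancestor 4; node 5 is the root.
parent_of = {1: 4, 2: 4, 3: 5, 4: 5, 5: 0}
tips = {1: "A", 2: "B", 3: "C"}
targets = {"A", "B"}

stats = clade_stats(parent_of, tips, targets)
qualifying = sorted(
    node
    for node, (n_targets, n_tips) in stats.items()
    if qualifies(n_targets, n_tips, minimum_size=2, minimum_prop=0.9)
)
print(qualifying)  # [4]
```

Here node 4 contains two target tips out of two (proportion 1.0) and qualifies, while the root contains two target tips out of three (proportion about 0.67) and fails the proportion criterion.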
The following figure shows the stats of two clades, the first of which qualifies under the default thresholds (✓), while the second does not (✗).

Citation
This tool was originally developed to support this work:
Álvarez-Herrera M, Ruiz-Rodriguez P, Navarro-Domínguez B, Zulaica J, Grau B, Bracho MA, Guerreiro M, Aguilar-Gallardo C, González-Candelas F, Comas I, Geller R & Coscollá M (2025). Genome data artifacts and functional studies of deletion repair in the BA.1 SARS-CoV-2 spike protein. Virus Evolution, 11(1), veaf015. https://doi.org/10.1093/ve/veaf015
If you find the tool helpful, please feel free to cite the above (see also CITATION.cff).
Owner
- Name: PathoGenOmics Lab
- Login: PathoGenOmics-Lab
- Kind: organization
- Location: Spain
- Website: https://www.uv.es/pathogenomic
- Twitter: gen_UV
- Repositories: 1
- Profile: https://github.com/PathoGenOmics-Lab
Citation (CITATION.cff)
cff-version: 1.2.0
message: "This pipeline was developed as part of a larger study. If you use this software, please cite it as below."
authors:
  - family-names: "Álvarez-Herrera"
    given-names: "Miguel"
    orcid: "https://orcid.org/0000-0002-7922-3180"
title: "tcfinder"
version: 1.0.0
date-released: 2024-06-14
url: "https://github.com/PathoGenOmics-Lab/tcfinder"
preferred-citation:
  type: article
  authors:
    - family-names: "Álvarez-Herrera"
      given-names: "Miguel"
    - family-names: "Ruiz-Rodriguez"
      given-names: "Paula"
    - family-names: "Navarro-Domínguez"
      given-names: "Beatriz"
    - family-names: "Zulaica"
      given-names: "Joao"
    - family-names: "Grau"
      given-names: "Brayan"
    - family-names: "Bracho"
      given-names: "María Alma"
    - family-names: "Guerreiro"
      given-names: "Manuel"
    - family-names: "Aguilar-Gallardo"
      given-names: "Cristóbal"
    - family-names: "González-Candelas"
      given-names: "Fernando"
    - family-names: "Comas"
      given-names: "Iñaki"
    - family-names: "Geller"
      given-names: "Ron"
    - family-names: "Coscollá"
      given-names: "Mireia"
  doi: "10.1093/ve/veaf015"
  journal: "Virus Evolution"
  month: 3
  pages: veaf015
  title: "Genome data artifacts and functional studies of deletion repair in the BA.1 SARS-CoV-2 spike protein"
  issue: 1
  volume: 11
  year: 2025
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Dependencies
- actions/checkout v4 composite
- actions/checkout v4 composite
- anstream 0.6.14
- anstyle 1.0.7
- anstyle-parse 0.2.4
- anstyle-query 1.1.0
- anstyle-wincon 3.0.3
- clap 4.5.7
- clap_builder 4.5.7
- clap_derive 4.5.5
- clap_lex 0.7.1
- colorchoice 1.0.1
- csv 1.3.0
- csv-core 0.1.11
- deranged 0.3.11
- equivalent 1.0.1
- fixedbitset 0.4.2
- hashbrown 0.14.5
- heck 0.5.0
- indexmap 2.2.6
- is_terminal_polyfill 1.70.0
- itoa 1.0.11
- libc 0.2.155
- log 0.4.21
- memchr 2.7.2
- num-conv 0.1.0
- num_threads 0.1.7
- petgraph 0.6.5
- powerfmt 0.2.0
- proc-macro2 1.0.85
- quote 1.0.36
- ryu 1.0.18
- serde 1.0.203
- serde_derive 1.0.203
- simplelog 0.12.2
- strsim 0.11.1
- syn 2.0.66
- termcolor 1.4.1
- time 0.3.36
- time-core 0.1.2
- time-macros 0.2.18
- unicode-ident 1.0.12
- utf8parse 0.2.2
- winapi-util 0.1.8
- windows-sys 0.52.0
- windows-targets 0.52.5
- windows_aarch64_gnullvm 0.52.5
- windows_aarch64_msvc 0.52.5
- windows_i686_gnu 0.52.5
- windows_i686_gnullvm 0.52.5
- windows_i686_msvc 0.52.5
- windows_x86_64_gnu 0.52.5
- windows_x86_64_gnullvm 0.52.5
- windows_x86_64_msvc 0.52.5