TDApplied

TDApplied: An R package for machine learning and inference with persistence diagrams - Published in JOSS (2024)

https://github.com/shaelebrown/tdapplied

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Engineering Computer Science - 60% confidence
Last synced: 6 months ago

Repository

An R package for statistics and machine learning with persistence diagrams

Basic Info
  • Host: GitHub
  • Owner: shaelebrown
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 8.33 MB
Statistics
  • Stars: 17
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created about 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# **TDApplied**




[![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)
[![CRAN version](http://www.r-pkg.org/badges/version/TDApplied)](https://CRAN.R-project.org/package=TDApplied)
[![CRAN Downloads](http://cranlogs.r-pkg.org/badges/grand-total/TDApplied)](https://CRAN.R-project.org/package=TDApplied)

[![JOSS DOI](https://joss.theoj.org/papers/10.21105/joss.06321/status.svg)](https://doi.org/10.21105/joss.06321)
[![Zenodo DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10814141.svg)](https://doi.org/10.5281/zenodo.10814141)

## Overview

**TDApplied** is an R package for analyzing persistence diagrams using machine learning and statistical inference, and is designed to interface with persistent (co)homology calculations from the R packages **TDA** and **TDAstats**. Note that while **TDApplied** was being developed, **TDA** was available on CRAN and was therefore used in package examples and tests. Since that is presently not the case, the dependency on **TDA** has been removed, and some examples and tests have been modified accordingly. **TDApplied** will still work with persistence diagrams computed by **TDA**, and with **TDA** functions, if a user already has a working version installed.

R package **TDA**:

> Fasy, Brittany T., Jisu Kim, Fabrizio Lecci, Clement Maria, David L. Millman, and Vincent Rouvreau. 2021. TDA: Statistical Tools for Topological Data Analysis. https://CRAN.R-project.org/package=TDA.

R package **TDAstats**:

> Wadhwa, Raoul R., Drew R. K. Williamson, Andrew Dhawan, and Jacob G. Scott. 2018. TDAstats: R pipeline for computing persistent homology in topological data analysis. https://CRAN.R-project.org/package=TDAstats.

## Installation

To install the latest version of this R package directly from GitHub:

    install.packages("devtools")
    library(devtools)
    devtools::install_github("shaelebrown/TDApplied")
    library(TDApplied)

To install from GitHub you might need: 

- **Windows:** Rtools (https://cran.r-project.org/bin/windows/Rtools/)
- **macOS:** Xcode (from the App Store)
- **Linux:** `apt-get install r-base-dev` (or similar).

To install the stable version of this R package from CRAN:

    install.packages("TDApplied")
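
Because **TDA** and **TDAstats** are optional, it can help to check which of them is available before running the examples. The following is a minimal sketch using only base R; the variable names `optional` and `available` are chosen here for illustration.

```r
# Packages that TDApplied can interface with but does not require
optional <- c("TDA", "TDAstats")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise,
# and does not attach the package to the search path
available <- vapply(optional, requireNamespace, logical(1), quietly = TRUE)

print(available)
```

A `FALSE` entry only means that persistence diagrams would need to be computed with another engine.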

## Citation

If you use TDApplied, please consider citing as:

- Brown et al., (2024). TDApplied: An R package for machine learning and inference with persistence diagrams. Journal of Open Source Software, 9(95), 6321, https://doi.org/10.21105/joss.06321
  
If you wish to cite a particular method used in **TDApplied** see the REFERENCES.bib file in the vignette directory.

## Functionality

**TDApplied** has three major modules:

1. Computing and interpreting persistence diagrams. The `PyH` function connects with Python, providing a persistent (co)homology engine that is fast compared to alternatives. The `plot_diagram` function can be used to plot diagrams computed from `PyH` or the **TDA** and **TDAstats** packages. The `rips_graphs` and `plot_rips_graphs` functions can be used to visualize dataset structure at the scale of particular topological features. The `bootstrap_persistence_thresholds` function can be used to identify statistically significant topological features in a dataset.
2. Machine learning. The functions `diagram_mds`, `diagram_kpca` and `predict_diagram_kpca` can be used to project a group of diagrams into a low dimensional space (i.e. dimension reduction). The functions `diagram_kkmeans` and `predict_diagram_kkmeans` can be used to cluster a group of diagrams. The functions `diagram_ksvm` and `predict_diagram_ksvm` can be used to link, through a prediction function, persistence diagrams and an outcome (i.e. dependent) variable.
3. Statistics. The `permutation_test` function acts like an ANOVA test for identifying group differences of persistence diagrams. The `independence_test` function can determine if two groups of paired persistence diagrams are likely independent or not.

**TDApplied** provides methods for the applied analysis of persistence diagrams that were previously unavailable. Its emphasis on speed and scalability, through parallelization, C code, and avoiding redundant slow computations, also makes **TDApplied** a powerful tool for carrying out applied analyses of persistence diagrams.
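
To give a feel for the kind of group-difference test described in the statistics module, here is a generic distance-based permutation test in base R. It is only a sketch of the general idea and is not the implementation behind `permutation_test`. The helper names `within_loss` and `perm_test`, the loss function, and the toy data are all assumptions made for illustration, and the objects being compared are plain numbers rather than persistence diagrams.

```r
# Loss: mean pairwise distance inside each group, summed over groups.
# Small values mean that objects within a group are close to each other.
within_loss <- function(d, labels) {
  sum(sapply(unique(labels), function(g) {
    idx <- which(labels == g)
    sub <- d[idx, idx]
    mean(sub[upper.tri(sub)])
  }))
}

# Compare the observed loss with losses obtained after shuffling group labels
perm_test <- function(d, labels, iterations = 200) {
  observed <- within_loss(d, labels)
  permuted <- replicate(iterations, within_loss(d, sample(labels)))
  # The add-one correction keeps the p-value strictly positive
  p_value <- (sum(permuted <= observed) + 1) / (iterations + 1)
  list(statistic = observed, p_value = p_value)
}

set.seed(42)
labels <- rep(c("A", "B"), each = 5)
# Toy data: group B is shifted far away from group A
x <- c(rnorm(5, mean = 0, sd = 0.1), rnorm(5, mean = 3, sd = 0.1))
d <- as.matrix(dist(x))

res <- perm_test(d, labels)
print(res)
```

A small p-value indicates that the observed grouping is tighter than most random relabelings, which is evidence of a group difference.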

## Example Code

This example creates six persistence diagrams, plots one and projects all six into 2D space using multidimensional scaling (MDS) to demonstrate **TDApplied** functionalities.

```{r example, eval = FALSE}
library(TDApplied)

# create 6 persistence diagrams
# 3 from circles and 3 from spheres
circ1 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,size = 50),],dim = 1,threshold = 2)
circ2 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,size = 50),],dim = 1,threshold = 2)
circ3 <- TDAstats::calculate_homology(TDAstats::circle2d[sample(1:100,size = 50),],dim = 1,threshold = 2)
sphere1 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100,size = 50),],dim = 1,threshold = 2)
sphere2 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100,size = 50),],dim = 1,threshold = 2)
sphere3 <- TDAstats::calculate_homology(TDAstats::sphere3d[sample(1:100,size = 50),],dim = 1,threshold = 2)

# plot a diagram
plot_diagram(circ1,title = "Circle 1")

# project into 2D and plot
proj_2D <- diagram_mds(list(circ1,circ2,circ3,sphere1,sphere2,sphere3),dim = 1,k = 2)
plot(x = proj_2D[,1],y = proj_2D[,2])
```
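
For readers unfamiliar with MDS, the projection step can be illustrated without any persistence computations. The sketch below uses base R only. The distance matrix is invented (two groups of three objects, close within a group and far apart across groups) and merely stands in for pairwise distances between diagrams. It is not a description of how `diagram_mds` works internally, only an illustration of what an MDS embedding of a distance matrix looks like.

```r
# Hypothetical group labels mirroring the example above
groups <- rep(c("circle", "sphere"), each = 3)

# Invented distances: 0.2 within a group, 2 between groups, 0 on the diagonal
d <- ifelse(outer(groups, groups, "=="), 0.2, 2)
diag(d) <- 0

# Classical MDS: find 2D coordinates whose pairwise distances approximate d
coords <- stats::cmdscale(d, k = 2)

# Each row holds the 2D coordinates of one object
print(round(coords, 3))
```

In the resulting coordinates the two groups separate along the first axis, which is the pattern one would hope to see for the circle and sphere diagrams.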

## Documentation

**TDApplied** has five major vignettes:

1. "TDApplied Theory and Practice", which documents the background theory and practical usage of all functions (on simple simulated data).
2. "Human Connectome Project Analysis", which provides a sample analysis of real neurological data using **TDApplied**.
3. "Benchmarking and Speedups", which describes all implemented optimizations of **TDApplied** functions and compares the runtime of **TDApplied** functions with functions from other packages.
4. "Personalized Analyses with TDApplied", which demonstrates how machine learning (or statistical) models and pipelines, other than those implemented in **TDApplied**, can be fit to persistence diagrams.
5. "Comparing Distance Calculations", which accounts for differences in distance functions of persistence diagrams across R packages.

## Contribute

To contribute to **TDApplied** you can create issues for any bugs/suggestions on the [issues page](https://github.com/shaelebrown/TDApplied/issues). You can also fork the **TDApplied** repository and create pull requests to add features you think will be useful for users.

## Published applications

- Shael Brown and Reza Farivar. The topology of representational geometry. bioRxiv, 2024.
- Yashbir Singh, Colleen M. Farrelly, Quincy A. Hathaway, Tim Leiner, Jaidip Jagtap, Gunnar E. Carlsson, and Bradley J. Erickson. Topological data analysis in medical imaging: current state of the art. Insights into Imaging, 14(1):58, 2023.
- Rui Dong. Linguistics from a topological viewpoint. arXiv, 2024.

Owner

  • Login: shaelebrown
  • Kind: user

JOSS Publication

TDApplied: An R package for machine learning and inference with persistence diagrams
Published
March 27, 2024
Volume 9, Issue 95, Page 6321
Authors
Shael Brown
Department of Quantitative Life Sciences, McGill University, Montreal, Canada
Reza Farivar-Mohseni
McGill Vision Research, Department of Ophthalmology, McGill University, Montreal, Canada
Editor
AHM Mahfuzur Rahman
Tags
topological data analysis persistent homology

GitHub Events

Total
  • Watch event: 1
  • Push event: 15
Last Year
  • Watch event: 1
  • Push event: 15

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 446
  • Total Committers: 4
  • Avg Commits per committer: 111.5
  • Development Distribution Score (DDS): 0.675
Past Year
  • Commits: 23
  • Committers: 1
  • Avg Commits per committer: 23.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • JIB account (j****t@J****l): 145 commits
  • shaelebrown (s****n@g****m): 144 commits
  • JIB account (j****t@J****l): 134 commits
  • sbrown-ww (s****l@w****i): 23 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 1
  • Average time to close issues: 6 days
  • Average time to close pull requests: 1 day
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • peekxc (4)
Pull Request Authors
  • danielskatz (2)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 533 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 13
  • Total maintainers: 1
cran.r-project.org: TDApplied

Machine Learning and Inference for Topological Data Analysis

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 533 Last month
Rankings
Stargazers count: 18.7%
Forks count: 21.9%
Dependent packages count: 29.8%
Average: 32.3%
Dependent repos count: 35.5%
Downloads: 55.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.2.2 depends
  • clue * imports
  • doParallel * imports
  • foreach * imports
  • iterators * imports
  • kernlab * imports
  • methods * imports
  • parallel * imports
  • parallelly * imports
  • rdist * imports
  • stats * imports
  • utils * imports
  • TDA * suggests
  • TDAstats * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests