torchdr

TorchDR - PyTorch Dimensionality Reduction

https://github.com/torchdr/torchdr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.4%) to scientific vocabulary

Keywords

affinity-matrix dimensionality-reduction manifold-learning optimal-transport python
Last synced: 6 months ago

Repository

TorchDR - PyTorch Dimensionality Reduction

Basic Info
  • Host: GitHub
  • Owner: TorchDR
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage: https://torchdr.github.io
  • Size: 6.2 MB
Statistics
  • Stars: 151
  • Watchers: 4
  • Forks: 11
  • Open Issues: 16
  • Releases: 4
Topics
affinity-matrix dimensionality-reduction manifold-learning optimal-transport python
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

Torch Dimensionality Reduction

torchdr logo


TorchDR is an open-source library for dimensionality reduction (DR) built on PyTorch. DR constructs low-dimensional representations (or embeddings) that best preserve the intrinsic geometry of an input dataset encoded via a pairwise affinity matrix. TorchDR provides GPU-accelerated implementations of popular DR algorithms in a unified framework, ensuring high performance by leveraging the latest advances of the PyTorch ecosystem.
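To make the notion of a pairwise affinity matrix concrete, here is a minimal NumPy sketch of a Gaussian affinity between all pairs of samples. This is an illustration of the concept only, not TorchDR's internal implementation (which offers many affinity classes with GPU and symbolic backends):

```python
import numpy as np

def gaussian_affinity(x, sigma=1.0):
    # Illustrative pairwise Gaussian affinity matrix: A[i, j] is close to 1
    # for nearby samples and decays toward 0 for distant ones.
    sq_norms = (x ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from floating-point error
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
A = gaussian_affinity(x)
# The two nearby points share a much larger affinity than either does
# with the distant third point.
```

DR methods then seek a low-dimensional embedding whose own affinities match this matrix as closely as possible.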

Key Features

🚀 Blazing Fast: engineered for speed with GPU acceleration, torch.compile support, and optimized algorithms leveraging sparsity and negative sampling.

🧩 Modular by Design: every component is designed to be easily customized, extended, or replaced to fit your specific needs.

🪶 Memory-Efficient: natively handles sparsity and memory-efficient symbolic operations to process massive datasets without memory overflows.

🤝 Seamless Integration: Fully compatible with the scikit-learn and PyTorch ecosystems. Use familiar APIs and integrate effortlessly into your existing workflows.

📦 Minimal Dependencies: requires only PyTorch, NumPy, and scikit‑learn; optionally add Faiss for fast k‑NN or KeOps for symbolic computation.

Getting Started

TorchDR offers a user-friendly API similar to scikit-learn's, in which dimensionality reduction modules expose a fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, and the output matches the type and backend of the input.

```python
from sklearn.datasets import fetch_openml
from torchdr import UMAP

x = fetch_openml("mnist_784").data.astype("float32")

z = UMAP(n_neighbors=30).fit_transform(x)
```

🚀 GPU Acceleration

TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:

```python
z_gpu = UMAP(n_neighbors=30, device="cuda").fit_transform(x)
```

Device Management: By default (device="auto"), computations use the input data's device. For optimal memory management, you can keep input data on CPU while specifying device="cuda" to perform computations on GPU; TorchDR will handle transfers automatically.
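The device rule above can be summarized in a few lines. The helper below is hypothetical (not part of the TorchDR API); it only illustrates the stated behavior:

```python
def resolve_device(requested, input_device):
    # Hypothetical helper illustrating the documented rule, not TorchDR internals:
    # "auto" defers to wherever the input data already lives; any explicit
    # device string overrides it.
    return input_device if requested == "auto" else requested

# resolve_device("auto", "cpu")  -> computations stay on CPU
# resolve_device("cuda", "cpu")  -> data is moved to GPU for computation
```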

🔥 PyTorch 2.0+ torch.compile Support

TorchDR supports torch.compile for an additional performance boost on modern PyTorch versions. Just add the compile=True flag as follows:

```python
z_gpu_compile = UMAP(n_neighbors=30, device="cuda", compile=True).fit_transform(x)
```

⚙️ Backends

The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.

  • Set backend="faiss" to rely on Faiss for fast kNN computations (Recommended).
  • To perform exact symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library. This library also allows computing kNN graphs. To enable KeOps, set backend="keops".
  • Finally, setting backend=None will use raw PyTorch for all computations.
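To illustrate what such a backend keyword governs, here is a simplified dispatch sketch. The function and its branches are illustrative only (not TorchDR internals); the backend=None path shows the kind of brute-force kNN that raw array operations provide:

```python
import numpy as np

def knn_indices(x, k, backend=None):
    # Simplified sketch of backend dispatch for kNN; not TorchDR's implementation.
    if backend == "faiss":
        raise NotImplementedError("would delegate to a Faiss index here")
    if backend == "keops":
        raise NotImplementedError("would build a KeOps lazy distance matrix here")
    # backend=None: brute-force kNN with plain array operations.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

x = np.array([[0.0], [0.1], [5.0]])
idx = knn_indices(x, k=1)  # each row holds the index of the nearest neighbor
```

The real backends differ mainly in where the distance computation happens (a Faiss index, a KeOps symbolic tensor, or dense PyTorch ops), while the calling code stays the same.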

Methods

Neighbor Embedding (best suited for data visualization)

TorchDR provides a suite of neighbor embedding methods.

Linear-time (Negative Sampling). State-of-the-art speed on large datasets: UMAP, LargeVis, InfoTSNE, PACMAP.

Quadratic-time (Exact Repulsion). Compute the full pairwise repulsion: SNE, TSNE, TSNEkhorn, COSNE.

Remark. For quadratic-time algorithms, TorchDR provides exact implementations that scale linearly in memory using backend="keops". For TSNE specifically, one can also explore fast approximations, such as FIt-SNE implemented in tsne-cuda, which bypass full pairwise repulsion.
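The idea behind linear-memory exact computation is that the full pairwise matrix is reduced over without ever being materialized. The NumPy sketch below mimics this with explicit chunking (KeOps does it symbolically on GPU; this is only an illustration of the memory pattern):

```python
import numpy as np

def repulsion_rowsums_chunked(z, chunk=2):
    # Row sums of a Student-t repulsion kernel, computed one slice at a time.
    # Only a (chunk x n) block of the distance matrix exists in memory at once,
    # so peak memory is linear in n instead of quadratic.
    n = z.shape[0]
    out = np.zeros(n)
    for start in range(0, n, chunk):
        block = z[start:start + chunk]
        d2 = ((block[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        out[start:start + chunk] = (1.0 / (1.0 + d2)).sum(axis=1)
    return out

z = np.random.default_rng(0).normal(size=(5, 2))
# Reference: the same reduction with the full (n x n) matrix materialized.
full = (1.0 / (1.0 + ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))).sum(axis=1)
```

Both computations are exact and agree to floating-point precision; only the memory footprint differs.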

Spectral Embedding

TorchDR provides various spectral embedding methods: PCA, IncrementalPCA, ExactIncrementalPCA, KernelPCA, PHATE.
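For reference, classic PCA reduces to an SVD of the centered data. The sketch below shows that textbook algorithm in NumPy; it is not TorchDR's PCA class, which adds GPU support and incremental variants:

```python
import numpy as np

def pca_embed(x, n_components=2):
    # Textbook PCA: center the data, take the SVD, and project onto the
    # top right-singular vectors (principal directions).
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T

x = np.random.default_rng(0).normal(size=(10, 5))
z = pca_embed(x, n_components=2)  # (10, 2) embedding, components ordered by variance
```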

Benchmarks

Relying on TorchDR enables orders-of-magnitude improvements in runtime compared to CPU-based implementations. See the code.

UMAP benchmark on single cell data

Examples

See the examples folder for all examples.

MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.

various neighbor embedding methods on MNIST

CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.

TSNE on CIFAR100 DINO features

Advanced Features

Affinities

TorchDR features a wide range of affinities that can be used as building blocks for DR algorithms.

Evaluation Metric

TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.
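For context, the silhouette score measures, per sample, how much closer it is to its own cluster than to the nearest other cluster. The plain-NumPy sketch below implements the standard definition for illustration; it is not TorchDR's GPU-compatible silhouette_score, though both follow the same formula:

```python
import numpy as np

def silhouette(x, labels):
    # Standard silhouette: s(i) = (b - a) / max(a, b), where a is the mean
    # intra-cluster distance and b the mean distance to the nearest other cluster.
    n = len(x)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    scores = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster mean
        a = d[i][same].mean()
        b = min(d[i][labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Two tight, well-separated 1D clusters should score close to 1.
x = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 0, 1, 1])
s = silhouette(x, labels)
```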

Installation

Install the core torchdr library from PyPI:

```bash
pip install torchdr
```

:warning: torchdr does not install faiss-gpu or pykeops by default. You need to install them separately to use the corresponding backends.

  • Faiss (Recommended): For the fastest k-NN computations, install Faiss. Please follow their official installation guide. A common method is using conda:

    ```bash
    conda install -c pytorch -c nvidia faiss-gpu
    ```

  • KeOps: For memory-efficient symbolic computations, install PyKeOps:

    ```bash
    pip install pykeops
    ```

Installation from Source

If you want to use the latest, unreleased version of torchdr, you can install it directly from GitHub:

```bash
pip install git+https://github.com/torchdr/torchdr
```

Finding Help

If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.

Owner

  • Name: TorchDR
  • Login: TorchDR
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: TorchDR
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Hugues
    family-names: Van Assel
    email: vanasselhugues@gmail.com
    affiliation: ENS Lyon
  - given-names: Nicolas
    family-names: Courty
    email: ncourty@irisa.fr
    affiliation: Université Bretagne Sud
  - given-names: Rémi
    family-names: Flamary
    email: remi.flamary@polytechnique.edu
    affiliation: École Polytechnique
  - given-names: Aurélien
    family-names: Garivier
    email: aurelien.garivier@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Mathurin
    family-names: Massias
    email: mathurin.massias@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Titouan
    family-names: Vayer
    email: titouan.vayer@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Cédric
    family-names: Vincent-Cuaz
    email: cedric.vincent-cuaz@inria.fr
    affiliation: EPFL
repository-code: 'https://github.com/TorchDR/TorchDR'
url: 'https://torchdr.github.io/'
abstract: Pytorch Dimensionality Reduction toolbox.
keywords:
  - machine learning
  - dimensionality reduction
  - manifold learning
  - clustering
  - GPU acceleration
license: BSD-3-Clause

GitHub Events

Total
  • Create event: 3
  • Release event: 3
  • Issues event: 27
  • Watch event: 68
  • Delete event: 2
  • Issue comment event: 27
  • Push event: 48
  • Pull request event: 106
  • Fork event: 4
Last Year
  • Create event: 3
  • Release event: 3
  • Issues event: 27
  • Watch event: 68
  • Delete event: 2
  • Issue comment event: 27
  • Push event: 48
  • Pull request event: 106
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 45
  • Total pull requests: 280
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 5 days
  • Total issue authors: 9
  • Total pull request authors: 10
  • Average comments per issue: 0.6
  • Average comments per pull request: 0.25
  • Merged pull requests: 227
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 21
  • Pull requests: 131
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 4 days
  • Issue authors: 6
  • Pull request authors: 7
  • Average comments per issue: 0.86
  • Average comments per pull request: 0.24
  • Merged pull requests: 104
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • huguesva (33)
  • mathurinm (4)
  • e-pet (2)
  • jacobgil (1)
  • sirluk (1)
  • simon-burke (1)
  • qgallouedec (1)
  • KnSun99 (1)
  • rflamary (1)
Pull Request Authors
  • huguesva (209)
  • mathurinm (40)
  • rflamary (9)
  • cedricvincentcuaz (7)
  • ncourty (4)
  • guillaumehu (4)
  • sirluk (2)
  • Danqi7 (2)
  • tvayer (2)
  • dependabot[bot] (1)
Top Labels
Issue Labels
good first issue (6)
Pull Request Labels
dependencies (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 440 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: torchdr

Torch Dimensionality Reduction Library

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 440 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.5%
Dependent repos count: 65.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/flake8.yaml actions
docs/requirements.txt pypi
requirements.txt pypi
setup.py pypi