torchdr

TorchDR - PyTorch Dimensionality Reduction

https://github.com/torchdr/torchdr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.4%) to scientific vocabulary

Keywords

affinity-matrix dimensionality-reduction manifold-learning optimal-transport python
Last synced: 6 months ago

Repository

TorchDR - PyTorch Dimensionality Reduction

Basic Info
  • Host: GitHub
  • Owner: TorchDR
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage: https://torchdr.github.io
  • Size: 6.2 MB
Statistics
  • Stars: 151
  • Watchers: 4
  • Forks: 11
  • Open Issues: 16
  • Releases: 4
Topics
affinity-matrix dimensionality-reduction manifold-learning optimal-transport python
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Code of conduct Citation

README.md

Torch Dimensionality Reduction

torchdr logo


TorchDR is an open-source library for dimensionality reduction (DR) built on PyTorch. DR constructs low-dimensional representations (or embeddings) that best preserve the intrinsic geometry of an input dataset encoded via a pairwise affinity matrix. TorchDR provides GPU-accelerated implementations of popular DR algorithms in a unified framework, ensuring high performance by leveraging the latest advances of the PyTorch ecosystem.
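To make the notion of a pairwise affinity matrix concrete, here is a minimal NumPy sketch of a Gaussian affinity between all pairs of samples. This is an illustration of the concept only, not TorchDR's internal implementation (which offers many affinity classes with GPU and symbolic backends):

```python
import numpy as np

def gaussian_affinity(x, sigma=1.0):
    # Illustrative pairwise Gaussian affinity matrix: A[i, j] is close to 1
    # for nearby samples and decays toward 0 for distant ones.
    sq_norms = (x ** 2).sum(axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from floating-point error
    return np.exp(-d2 / (2.0 * sigma ** 2))

x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
A = gaussian_affinity(x)
# The two nearby points share a much larger affinity than either does
# with the distant third point.
```

DR methods then seek a low-dimensional embedding whose own affinities match this matrix as closely as possible.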

Key Features

🚀 Blazing Fast: engineered for speed with GPU acceleration, torch.compile support, and optimized algorithms leveraging sparsity and negative sampling.

🧩 Modular by Design: every component is designed to be easily customized, extended, or replaced to fit your specific needs.

🪶 Memory-Efficient: natively handles sparsity and memory-efficient symbolic operations to process massive datasets without memory overflows.

🤝 Seamless Integration: Fully compatible with the scikit-learn and PyTorch ecosystems. Use familiar APIs and integrate effortlessly into your existing workflows.

📦 Minimal Dependencies: requires only PyTorch, NumPy, and scikit‑learn; optionally add Faiss for fast k‑NN or KeOps for symbolic computation.

Getting Started

TorchDR offers a user-friendly API similar to scikit-learn's, in which dimensionality reduction modules expose a fit_transform method. It seamlessly accepts both NumPy arrays and PyTorch tensors as input, and the output matches the type and backend of the input.

```python
from sklearn.datasets import fetch_openml
from torchdr import UMAP

x = fetch_openml("mnist_784").data.astype("float32")

z = UMAP(n_neighbors=30).fit_transform(x)
```

🚀 GPU Acceleration

TorchDR is fully GPU compatible, enabling significant speed-ups when a GPU is available. To run computations on the GPU, simply set device="cuda" as shown in the example below:

```python
z_gpu = UMAP(n_neighbors=30, device="cuda").fit_transform(x)
```

Device Management: By default (device="auto"), computations use the input data's device. For optimal memory management, you can keep input data on CPU while specifying device="cuda" to perform computations on GPU; TorchDR will handle transfers automatically.
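The device rule above can be summarized in a few lines. The helper below is hypothetical (not part of the TorchDR API); it only illustrates the stated behavior:

```python
def resolve_device(requested, input_device):
    # Hypothetical helper illustrating the documented rule, not TorchDR internals:
    # "auto" defers to wherever the input data already lives; any explicit
    # device string overrides it.
    return input_device if requested == "auto" else requested

# resolve_device("auto", "cpu")  -> computations stay on CPU
# resolve_device("cuda", "cpu")  -> data is moved to GPU for computation
```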

🔥 PyTorch 2.0+ torch.compile Support

TorchDR supports torch.compile for an additional performance boost on modern PyTorch versions. Just add the compile=True flag as follows:

```python
z_gpu_compile = UMAP(n_neighbors=30, device="cuda", compile=True).fit_transform(x)
```

⚙️ Backends

The backend keyword specifies which tool to use for handling kNN computations and memory-efficient symbolic computations.

  • Set backend="faiss" to rely on Faiss for fast kNN computations (Recommended).
  • To perform exact symbolic tensor computations on the GPU without memory limitations, you can leverage the KeOps library. This library also allows computing kNN graphs. To enable KeOps, set backend="keops".
  • Finally, setting backend=None will use raw PyTorch for all computations.
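To illustrate what such a backend keyword governs, here is a simplified dispatch sketch. The function and its branches are illustrative only (not TorchDR internals); the backend=None path shows the kind of brute-force kNN that raw array operations provide:

```python
import numpy as np

def knn_indices(x, k, backend=None):
    # Simplified sketch of backend dispatch for kNN; not TorchDR's implementation.
    if backend == "faiss":
        raise NotImplementedError("would delegate to a Faiss index here")
    if backend == "keops":
        raise NotImplementedError("would build a KeOps lazy distance matrix here")
    # backend=None: brute-force kNN with plain array operations.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

x = np.array([[0.0], [0.1], [5.0]])
idx = knn_indices(x, k=1)  # each row holds the index of the nearest neighbor
```

The real backends differ mainly in where the distance computation happens (a Faiss index, a KeOps symbolic tensor, or dense PyTorch ops), while the calling code stays the same.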

Methods

Neighbor Embedding (best suited for data visualization)

TorchDR provides a suite of neighbor embedding methods.

Linear-time (Negative Sampling). State-of-the-art speed on large datasets: UMAP, LargeVis, InfoTSNE, PACMAP.

Quadratic-time (Exact Repulsion). Compute the full pairwise repulsion: SNE, TSNE, TSNEkhorn, COSNE.

Remark. For quadratic-time algorithms, TorchDR provides exact implementations that scale linearly in memory using backend="keops". For TSNE specifically, one can also explore fast approximations, such as FIt-SNE implemented in tsne-cuda, which bypass full pairwise repulsion.
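The idea behind linear-memory exact computation is that the full pairwise matrix is reduced over without ever being materialized. The NumPy sketch below mimics this with explicit chunking (KeOps does it symbolically on GPU; this is only an illustration of the memory pattern):

```python
import numpy as np

def repulsion_rowsums_chunked(z, chunk=2):
    # Row sums of a Student-t repulsion kernel, computed one slice at a time.
    # Only a (chunk x n) block of the distance matrix exists in memory at once,
    # so peak memory is linear in n instead of quadratic.
    n = z.shape[0]
    out = np.zeros(n)
    for start in range(0, n, chunk):
        block = z[start:start + chunk]
        d2 = ((block[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        out[start:start + chunk] = (1.0 / (1.0 + d2)).sum(axis=1)
    return out

z = np.random.default_rng(0).normal(size=(5, 2))
# Reference: the same reduction with the full (n x n) matrix materialized.
full = (1.0 / (1.0 + ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1))).sum(axis=1)
```

Both computations are exact and agree to floating-point precision; only the memory footprint differs.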

Spectral Embedding

TorchDR provides various spectral embedding methods: PCA, IncrementalPCA, ExactIncrementalPCA, KernelPCA, PHATE.
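For reference, classic PCA reduces to an SVD of the centered data. The sketch below shows that textbook algorithm in NumPy; it is not TorchDR's PCA class, which adds GPU support and incremental variants:

```python
import numpy as np

def pca_embed(x, n_components=2):
    # Textbook PCA: center the data, take the SVD, and project onto the
    # top right-singular vectors (principal directions).
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:n_components].T

x = np.random.default_rng(0).normal(size=(10, 5))
z = pca_embed(x, n_components=2)  # (10, 2) embedding, components ordered by variance
```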

Benchmarks

Relying on TorchDR enables orders-of-magnitude improvements in runtime compared to CPU-based implementations. See the code.

UMAP benchmark on single cell data

Examples

See the examples folder for all examples.

MNIST. (Code) A comparison of various neighbor embedding methods on the MNIST digits dataset.

various neighbor embedding methods on MNIST

CIFAR100. (Code) Visualizing the CIFAR100 dataset using DINO features and TSNE.

TSNE on CIFAR100 DINO features

Advanced Features

Affinities

TorchDR features a wide range of affinities that can be used as building blocks for DR algorithms.

Evaluation Metric

TorchDR provides efficient GPU-compatible evaluation metrics: silhouette_score.
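For context, the silhouette score measures, per sample, how much closer it is to its own cluster than to the nearest other cluster. The plain-NumPy sketch below implements the standard definition for illustration; it is not TorchDR's GPU-compatible silhouette_score, though both follow the same formula:

```python
import numpy as np

def silhouette(x, labels):
    # Standard silhouette: s(i) = (b - a) / max(a, b), where a is the mean
    # intra-cluster distance and b the mean distance to the nearest other cluster.
    n = len(x)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    scores = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from its own cluster mean
        a = d[i][same].mean()
        b = min(d[i][labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

# Two tight, well-separated 1D clusters should score close to 1.
x = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 0, 1, 1])
s = silhouette(x, labels)
```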

Installation

Install the core torchdr library from PyPI:

```bash
pip install torchdr
```

:warning: torchdr does not install faiss-gpu or pykeops by default. You need to install them separately to use the corresponding backends.

  • Faiss (Recommended): For the fastest k-NN computations, install Faiss. Please follow their official installation guide. A common method is using conda:

    ```bash
    conda install -c pytorch -c nvidia faiss-gpu
    ```

  • KeOps: For memory-efficient symbolic computations, install PyKeOps:

    ```bash
    pip install pykeops
    ```

Installation from Source

If you want to use the latest, unreleased version of torchdr, you can install it directly from GitHub:

```bash
pip install git+https://github.com/torchdr/torchdr
```

Finding Help

If you have any questions or suggestions, feel free to open an issue on the issue tracker or contact Hugues Van Assel directly.

Owner

  • Name: TorchDR
  • Login: TorchDR
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: TorchDR
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Hugues
    family-names: Van Assel
    email: vanasselhugues@gmail.com
    affiliation: ENS Lyon
  - given-names: Nicolas
    family-names: Courty
    email: ncourty@irisa.fr
    affiliation: Université Bretagne Sud
  - given-names: Rémi
    family-names: Flamary
    email: remi.flamary@polytechnique.edu
    affiliation: École Polytechnique
  - given-names: Aurélien
    family-names: Garivier
    email: aurelien.garivier@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Mathurin
    family-names: Massias
    email: mathurin.massias@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Titouan
    family-names: Vayer
    email: titouan.vayer@ens-lyon.fr
    affiliation: ENS Lyon
  - given-names: Cédric
    family-names: Vincent-Cuaz
    email: cedric.vincent-cuaz@inria.fr
    affiliation: EPFL
repository-code: 'https://github.com/TorchDR/TorchDR'
url: 'https://torchdr.github.io/'
abstract: Pytorch Dimensionality Reduction toolbox.
keywords:
  - machine learning
  - dimensionality reduction
  - manifold learning
  - clustering
  - GPU acceleration
license: BSD-3-Clause

GitHub Events

Total
  • Create event: 3
  • Release event: 3
  • Issues event: 27
  • Watch event: 68
  • Delete event: 2
  • Issue comment event: 27
  • Push event: 48
  • Pull request event: 106
  • Fork event: 4
Last Year
  • Create event: 3
  • Release event: 3
  • Issues event: 27
  • Watch event: 68
  • Delete event: 2
  • Issue comment event: 27
  • Push event: 48
  • Pull request event: 106
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 45
  • Total pull requests: 280
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 5 days
  • Total issue authors: 9
  • Total pull request authors: 10
  • Average comments per issue: 0.6
  • Average comments per pull request: 0.25
  • Merged pull requests: 227
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 21
  • Pull requests: 131
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 4 days
  • Issue authors: 6
  • Pull request authors: 7
  • Average comments per issue: 0.86
  • Average comments per pull request: 0.24
  • Merged pull requests: 104
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • huguesva (33)
  • mathurinm (4)
  • e-pet (2)
  • jacobgil (1)
  • sirluk (1)
  • simon-burke (1)
  • qgallouedec (1)
  • KnSun99 (1)
  • rflamary (1)
Pull Request Authors
  • huguesva (209)
  • mathurinm (40)
  • rflamary (9)
  • cedricvincentcuaz (7)
  • ncourty (4)
  • guillaumehu (4)
  • sirluk (2)
  • Danqi7 (2)
  • tvayer (2)
  • dependabot[bot] (1)
Top Labels
Issue Labels
good first issue (6)
Pull Request Labels
dependencies (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 440 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: torchdr

Torch Dimensionality Reduction Library

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 440 Last month
Rankings
Dependent packages count: 9.9%
Average: 37.5%
Dependent repos count: 65.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/flake8.yaml actions
docs/requirements.txt pypi
requirements.txt pypi
setup.py pypi