https://github.com/cvxgrp/pymde
Minimum-distortion embedding with PyTorch
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 10 committers (10.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (18.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cvxgrp
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://pymde.org
- Size: 46.8 MB
Statistics
- Stars: 560
- Watchers: 9
- Forks: 26
- Open Issues: 27
- Releases: 5
Metadata Files
README.md
PyMDE
The official documentation for PyMDE is available at www.pymde.org.
This repository accompanies the monograph Minimum-Distortion Embedding.
PyMDE is a Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network, or any other abstract object.
What sets PyMDE apart from other embedding libraries is that it provides a simple but general framework for embedding, called Minimum-Distortion Embedding (MDE). With MDE, it is easy to recreate well-known embeddings and to create new ones, tailored to your particular application.
PyMDE is competitive in runtime with more specialized embedding methods. With a GPU, it can be even faster.
Overview
PyMDE can be enjoyed by beginners and experts alike. It can be used to:
- visualize datasets, small or large;
- generate feature vectors for supervised learning;
- compress high-dimensional vector data;
- draw graphs (in up to orders of magnitude less time than packages like NetworkX);
- create custom embeddings, with custom objective functions and constraints (such as having uncorrelated feature columns);
- and more.
PyMDE is very young software, under active development. If you run into issues, or have any feedback, please reach out by filing a GitHub issue.
This README gives a very brief overview of PyMDE. Make sure to read the official documentation at www.pymde.org, which has in-depth tutorials and API documentation.
Installation
PyMDE is available on the Python Package Index, and on Conda Forge.
To install with pip, use
pip install pymde
Alternatively, to install with conda, use
conda install -c pytorch -c conda-forge pymde
PyMDE has the following requirements:
- Python >= 3.7
- numpy >= 1.17.5
- scipy
- torch >= 1.7.1
- torchvision >= 0.8.2
- pynndescent
- requests
Getting started
Getting started with PyMDE is easy. For embeddings that work out of the box, we provide two main functions: pymde.preserve_neighbors, which preserves the local structure of the original data, and pymde.preserve_distances, which preserves pairwise distances or dissimilarity scores in the original data.
Arguments. The input to these functions is the original data, represented
either as a data matrix in which each row is a feature vector, or as a
(possibly sparse) graph encoding pairwise distances. The embedding dimension is
specified by the embedding_dim keyword argument, which is 2 by default.
Return value. The return value is an MDE object. Calling the embed()
method on this object returns an embedding, which is a matrix
(torch.Tensor) in which each row is an embedding vector. For example, if the
original input is a data matrix of shape (n_items, n_features), then the
embedding matrix has shape (n_items, embedding_dim).
We give examples of using these functions below.
Preserving neighbors
The following code produces an embedding of the MNIST dataset (images of
handwritten digits), in a fashion similar to LargeVis, t-SNE, UMAP, and other
neighborhood-based embeddings. The original data is a matrix of shape (70000,
784), with each row representing an image.
```python3
import pymde

mnist = pymde.datasets.MNIST()
embedding = pymde.preserve_neighbors(mnist.data, verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])
```

Unlike most other embedding methods, PyMDE can compute embeddings that satisfy constraints. For example:
```python3
embedding = pymde.preserve_neighbors(mnist.data, constraint=pymde.Standardized(), verbose=True).embed()
pymde.plot(embedding, color_by=mnist.attributes['digits'])
```

The standardization constraint requires the embedding vectors to be centered and to have uncorrelated features.
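To make the constraint concrete, here is a small pure-Python check, independent of PyMDE. It assumes that "standardized" means every column of the embedding matrix X sums to zero and that (1/n) XᵀX equals the identity, which is how the accompanying monograph defines the constraint. The helper name is_standardized and the two toy matrices are hypothetical, introduced only for illustration.

```python
def is_standardized(X, tol=1e-9):
    """Return True if X is centered and (1/n) * X^T X is the identity.

    X is a list of rows (one embedding vector per row). This is an
    illustrative helper, not part of the PyMDE API.
    """
    n, m = len(X), len(X[0])
    # Centered: every column sums to zero.
    for j in range(m):
        if abs(sum(row[j] for row in X)) > tol:
            return False
    # Unit variance on the diagonal, zero correlation off the diagonal.
    for j in range(m):
        for k in range(m):
            gram = sum(row[j] * row[k] for row in X) / n
            target = 1.0 if j == k else 0.0
            if abs(gram - target) > tol:
                return False
    return True


# Columns are centered, have unit variance, and are uncorrelated.
standardized = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Columns are centered but identical, hence perfectly correlated.
correlated = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]

print(is_standardized(standardized))  # True
print(is_standardized(correlated))    # False
```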
Preserving distances
The function pymde.preserve_distances is useful when you're more interested
in preserving the gross global structure than the local structure.
Here's an example that produces an embedding of an academic coauthorship network, from Google Scholar. The original data is a sparse graph on roughly 40,000 authors, with an edge between authors who have collaborated on at least one paper.
```python3
import pymde

google_scholar = pymde.datasets.google_scholar()
embedding = pymde.preserve_distances(google_scholar.data, verbose=True).embed()
pymde.plot(embedding, color_by=google_scholar.attributes['coauthors'], color_map='viridis', background_color='black')
```

More collaborative authors are colored brighter, and are near the center of the embedding.
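As a rough illustration of what "preserving distances" means, the sketch below scores two candidate embeddings of a three-node path graph by their average distortion. It assumes an absolute-deviation distortion, |δ − d|, where δ is the original distance between two items and d is the Euclidean distance between their embedding vectors; the loss PyMDE uses internally may differ. The helper name average_distortion and the toy data are hypothetical.

```python
import math


def average_distortion(embedding, pairs, original_distances):
    """Mean of |delta_ij - ||x_i - x_j||| over the listed pairs.

    Illustrative helper only; not part of the PyMDE API.
    """
    total = 0.0
    for (i, j), delta in zip(pairs, original_distances):
        # Euclidean distance between the two embedding vectors.
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(embedding[i], embedding[j])))
        total += abs(delta - d)
    return total / len(pairs)


# Path graph 0 - 1 - 2 with its shortest-path distances.
pairs = [(0, 1), (1, 2), (0, 2)]
original_distances = [1.0, 1.0, 2.0]

# Laying the nodes out on a line reproduces every distance exactly.
on_a_line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
# Folding node 2 back onto node 0 collapses the distance between them.
folded = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)]

print(average_distortion(on_a_line, pairs, original_distances))  # 0.0
print(average_distortion(folded, pairs, original_distances))     # 2/3, about 0.667
```

The lower the average distortion, the more faithfully the embedding reproduces the original distances.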
Example notebooks
We have several example notebooks that show how to use PyMDE on real (and synthetic) datasets.
Citing
To cite our work, please use the following BibTeX entry.
@article{agrawal2021minimum,
author = {Agrawal, Akshay and Ali, Alnur and Boyd, Stephen},
title = {Minimum-Distortion Embedding},
journal = {arXiv},
year = {2021},
}
PyMDE was designed and developed by Akshay Agrawal.
Owner
- Name: Stanford University Convex Optimization Group
- Login: cvxgrp
- Kind: organization
- Location: Stanford, CA
- Website: www.stanford.edu/~boyd
- Repositories: 102
- Profile: https://github.com/cvxgrp
GitHub Events
Total
- Create event: 5
- Release event: 1
- Issues event: 3
- Watch event: 19
- Delete event: 1
- Issue comment event: 1
- Push event: 12
- Pull request event: 2
Last Year
- Create event: 5
- Release event: 1
- Issues event: 3
- Watch event: 19
- Delete event: 1
- Issue comment event: 1
- Push event: 12
- Pull request event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Akshay Agrawal | a****a@c****u | 151 |
| Bastian Zimmermann | 1****m | 3 |
| Kashif Rasul | k****l@g****m | 2 |
| Adina Wagner | a****r@t****e | 2 |
| Therese Koch | 4****h | 1 |
| Rajarshi Guha | r****a@g****m | 1 |
| Guillermo Angeris | g****e@a****t | 1 |
| Frederik Schubert | g****b@f****e | 1 |
| Erik Kruus | e****s@g****m | 1 |
| Adam Gayoso | a****o | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 50
- Total pull requests: 31
- Average time to close issues: 23 days
- Average time to close pull requests: 4 days
- Total issue authors: 39
- Total pull request authors: 11
- Average comments per issue: 2.72
- Average comments per pull request: 0.9
- Merged pull requests: 29
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 2
- Average time to close issues: 3 months
- Average time to close pull requests: about 1 hour
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- akshayka (6)
- cowjen01 (2)
- sjfleming (2)
- pkmital (2)
- mfansler (2)
- adamgayoso (2)
- lyf7115 (2)
- kevinsweeney84 (1)
- koaning (1)
- turmeric-blend (1)
- PierreBoyeau (1)
- kyleford8 (1)
- theholymath (1)
- jlmelville (1)
- ErrorUnknown88 (1)
Pull Request Authors
- akshayka (19)
- kashif (3)
- SSamDav (1)
- BastianZim (1)
- kruus (1)
- rajarshi (1)
- adamgayoso (1)
- theresekoch (1)
- frederikschubert (1)
- adswa (1)
- angeris (1)
Packages
- Total packages: 3
-
Total downloads:
- pypi 3,215 last-month
- Total docker downloads: 10,684
-
Total dependent packages: 6
(may contain duplicates) -
Total dependent repositories: 13
(may contain duplicates) - Total versions: 58
- Total maintainers: 1
pypi.org: pymde
Minimum-Distortion Embedding
- Homepage: https://github.com/cvxgrp/pymde
- Documentation: https://pymde.readthedocs.io/
- License: Apache License, Version 2.0
- Latest release: 0.2.3 (published 8 months ago)
proxy.golang.org: github.com/cvxgrp/pymde
- Documentation: https://pkg.go.dev/github.com/cvxgrp/pymde#section-documentation
- License: apache-2.0
- Latest release: v0.2.3 (published 8 months ago)
conda-forge.org: pymde
PyMDE is a Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network, or any other abstract object.
- Homepage: https://github.com/cvxgrp/pymde
- License: Apache-2.0
- Latest release: 0.1.16 (published over 3 years ago)
Dependencies
- cython *
- numpy >=1.17.5
- scipy >=1.6
- matplotlib *
- numpy *
- pynndescent *
- requests *
- scipy *
- torch *
- torchvision *
- actions/checkout v2 composite
- actions/download-artifact v2 composite
- actions/setup-python v2 composite
- actions/upload-artifact v2 composite
- pypa/cibuildwheel v2.11.2 composite
- pypa/gh-action-pypi-publish master composite
- actions/checkout v2 composite
- actions/setup-python v2 composite