opentsne

Extensible, parallel implementations of t-SNE

https://github.com/pavlin-policar/opentsne

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com, springer.com, nature.com
  • Committers with academic emails
    3 of 13 committers (23.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

dimensionality-reduction embedding machine-learning tsne visualization
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pavlin-policar
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: master
  • Homepage: https://opentsne.rtfd.io
  • Size: 70 MB
Statistics
  • Stars: 1,555
  • Watchers: 21
  • Forks: 173
  • Open Issues: 9
  • Releases: 10
Created over 7 years ago · Last pushed 6 months ago

README.rst

openTSNE
========

|Build Status| |ReadTheDocs Badge| |License Badge|

openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE) [1]_, a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings [2]_, massive speed improvements [3]_ [4]_ [5]_ that enable t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations [6]_.

.. figure:: docs/source/images/macosko_2015.png
   :alt: Macosko 2015 mouse retina t-SNE embedding
   :align: center

   A visualization of 44,808 single-cell transcriptomes obtained from the mouse retina [7]_, embedded using the multiscale kernel trick to better preserve the global alignment of the clusters.

- Documentation
- User Guide and Tutorial
- Examples: basic, advanced, preserving global alignment, embedding large data sets
- Speed benchmarks

Installation
------------

openTSNE requires Python 3.9 or higher in order to run.
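The version requirement above can be checked from within the interpreter you plan to install into. The following is a minimal sketch using only the standard library; the helper name ``python_is_supported`` is hypothetical and not part of openTSNE.

```python
import sys

# Minimum interpreter version stated in this README.
MINIMUM_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given (major, minor, ...) version meets the minimum.

    Hypothetical helper for illustration; it is not part of openTSNE.
    """
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if python_is_supported():
    print("This interpreter satisfies the stated Python requirement.")
else:
    print("This interpreter is older than Python 3.9.")
```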

Conda
~~~~~

openTSNE can be easily installed from ``conda-forge`` with

::

   conda install --channel conda-forge opentsne

Conda package

PyPi
~~~~

openTSNE is also available through ``pip`` and can be installed with

::

   pip install opentsne

PyPi package

Installing from source
~~~~~~~~~~~~~~~~~~~~~~

If you wish to install openTSNE from source, please run

::

   pip install .


in the root directory to install the appropriate dependencies and compile the necessary binary files.

Please note that openTSNE requires a C/C++ compiler to be available on the system.

In order for openTSNE to utilize multiple threads, the C/C++ compiler
must support ``OpenMP``. In practice, almost all compilers
implement this, with the exception of older versions of ``clang`` on OSX
systems.

To squeeze the most out of openTSNE, you may also consider installing
FFTW3 prior to installation. FFTW3 implements the Fast Fourier
Transform, which is heavily used in openTSNE. If FFTW3 is not available,
openTSNE will use NumPy's implementation of the FFT, which is slightly
slower than FFTW. The difference is only noticeable with large data sets
containing millions of data points.

A hello world example
---------------------

Getting started with openTSNE is very simple. First, we'll load up some data using scikit-learn:

.. code:: python

   from sklearn import datasets

   iris = datasets.load_iris()
   x, y = iris["data"], iris["target"]

Then, we'll import openTSNE and run t-SNE with its default settings:

.. code:: python

   from openTSNE import TSNE

   embedding = TSNE().fit(x)

Citation
--------

If you make use of openTSNE for your work, we would appreciate it if you would cite the paper:

.. code::

    @article{Policar2024,
        title={openTSNE: A Modular Python Library for t-SNE Dimensionality Reduction and Embedding},
        author={Poli{\v c}ar, Pavlin G. and Stra{\v z}ar, Martin and Zupan, Bla{\v z}},
        journal={Journal of Statistical Software},
        year={2024},
        volume={109},
        number={3},
        pages={1–30},
        doi={10.18637/jss.v109.i03},
        url={https://www.jstatsoft.org/index.php/jss/article/view/v109i03}
    }
    
openTSNE implements two efficient algorithms for t-SNE. Please consider citing the original authors of the algorithm that you use. If you use FIt-SNE (default), then the citation is [5]_ below, but if you use Barnes-Hut the citations are [3]_ and [4]_.


References
----------

.. [1] Van Der Maaten, Laurens, and Hinton, Geoffrey. “Visualizing data using
    t-SNE.” Journal of Machine Learning Research 9.Nov (2008): 2579-2605.
.. [2] Poličar, Pavlin G., Martin Stražar, and Blaž Zupan. “Embedding to Reference t-SNE Space Addresses Batch Effects in Single-Cell Classification.” Machine Learning (2021): 1-20.
.. [3] Van Der Maaten, Laurens. “Accelerating t-SNE using tree-based algorithms.”
    Journal of Machine Learning Research 15.1 (2014): 3221-3245.
.. [4] Yang, Zhirong, Jaakko Peltonen, and Samuel Kaski. “Scalable optimization of neighbor embedding for visualization.” International Conference on Machine Learning. PMLR, 2013.
.. [5] Linderman, George C., et al. “Fast interpolation-based t-SNE for improved
    visualization of single-cell RNA-seq data.” Nature Methods 16.3 (2019): 243.
.. [6] Kobak, Dmitry, and Berens, Philipp. “The art of using t-SNE for single-cell transcriptomics.”
    Nature Communications 10, 5416 (2019).
.. [7] Macosko, Evan Z., et al. “Highly parallel genome-wide expression profiling of
    individual cells using nanoliter droplets.” Cell 161.5 (2015): 1202-1214.

.. |Build Status| image:: https://dev.azure.com/pavlingp/openTSNE/_apis/build/status/Test?branchName=master
   :target: https://dev.azure.com/pavlingp/openTSNE/_build/latest?definitionId=1&branchName=master
.. |ReadTheDocs Badge| image:: https://readthedocs.org/projects/opentsne/badge/?version=latest
   :target: https://opentsne.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status
.. |License Badge| image:: https://img.shields.io/badge/License-BSD%203--Clause-blue.svg
   :target: https://opensource.org/licenses/BSD-3-Clause

Owner

  • Name: Pavlin Poličar
  • Login: pavlin-policar
  • Kind: user
  • Location: Slovenia
  • Company: University of Ljubljana

PhD student working on applying machine learning methods to biomedical and scRNA-seq data.

GitHub Events

Total
  • Issues event: 8
  • Watch event: 85
  • Issue comment event: 36
  • Push event: 4
  • Pull request review event: 5
  • Pull request review comment event: 8
  • Pull request event: 3
  • Fork event: 14
  • Create event: 2
Last Year
  • Issues event: 8
  • Watch event: 85
  • Issue comment event: 36
  • Push event: 4
  • Pull request review event: 5
  • Pull request review comment event: 8
  • Pull request event: 3
  • Fork event: 14
  • Create event: 2

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 580
  • Total Committers: 13
  • Avg Commits per committer: 44.615
  • Development Distribution Score (DDS): 0.112
Past Year
  • Commits: 6
  • Committers: 3
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.333
Top Committers
==============  =============  =======
Name            Email          Commits
==============  =============  =======
Pavlin Poličar  p****p@g****m  515
Dmitry Kobak    d****k@u****e  41
BlazZupan       b****n@f****i  6
cclauss         c****s@b****h  5
PrimozGodec     p****9@g****m  4
mstrazar        m****r@g****m  2
inejc           n****c@g****m  1
Todd            t****8@g****m  1
Tim             T****e         1
Sean Pedersen   3****n         1
Leo@DS          1****G         1
Jake            j****g@g****m  1
Ales Erjavec    a****c@f****i  1
==============  =============  =======

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 77
  • Total pull requests: 50
  • Average time to close issues: 3 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 46
  • Total pull request authors: 8
  • Average comments per issue: 3.78
  • Average comments per pull request: 2.1
  • Merged pull requests: 44
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 4
  • Average time to close issues: 11 months
  • Average time to close pull requests: about 1 hour
  • Issue authors: 5
  • Pull request authors: 3
  • Average comments per issue: 4.0
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dkobak (21)
  • yurivict (4)
  • fsvbach (3)
  • sbembenek18 (2)
  • pavlin-policar (2)
  • tomcsojn (2)
  • ritagonmar (2)
  • Yangxiaojun1230 (2)
  • jdtuck (2)
  • bbaabemhp (1)
  • austinpatrickm (1)
  • morningphase (1)
  • valentinlageard (1)
  • jlmelville (1)
  • ivan-marroquin (1)
Pull Request Authors
  • pavlin-policar (31)
  • dkobak (12)
  • Leoo99G (2)
  • PrimozGodec (2)
  • luanamarinho (2)
  • ales-erjavec (1)
  • jnboehm (1)
  • SeanPedersen (1)
  • TimRepke (1)
Top Labels
Issue Labels
  • bug (8)
  • question (4)
  • enhancement (4)
  • wishlist (2)
  • wontfix (1)

Packages

  • Total packages: 5
  • Total downloads:
    • pypi 37,310 last-month
  • Total dependent packages: 11
    (may contain duplicates)
  • Total dependent repositories: 40
    (may contain duplicates)
  • Total versions: 91
  • Total maintainers: 1
pypi.org: opentsne

  • Versions: 25
  • Dependent Packages: 8
  • Dependent Repositories: 38
  • Downloads: 37,310 Last month
Rankings
Dependent packages count: 1.6%
Downloads: 1.8%
Stargazers count: 1.9%
Average: 2.3%
Dependent repos count: 2.4%
Forks count: 4.0%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/pavlin-policar/openTSNE
  • Versions: 22
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 6 months ago
proxy.golang.org: github.com/pavlin-policar/opentsne
  • Versions: 22
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 6.9%
Last synced: 6 months ago
conda-forge.org: opentsne
  • Versions: 17
  • Dependent Packages: 2
  • Dependent Repositories: 1
Rankings
Stargazers count: 11.8%
Forks count: 15.3%
Average: 17.8%
Dependent packages count: 19.6%
Dependent repos count: 24.4%
Last synced: 6 months ago
anaconda.org: opentsne

openTSNE is a modular Python implementation of t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular dimensionality-reduction algorithm for visualizing high-dimensional data sets. openTSNE incorporates the latest improvements to the t-SNE algorithm, including the ability to add new data points to existing embeddings, massive speed improvements that enable t-SNE to scale to millions of data points, and various tricks to improve the global alignment of the resulting visualizations.

  • Versions: 5
  • Dependent Packages: 1
  • Dependent Repositories: 1
Rankings
Stargazers count: 22.2%
Forks count: 27.4%
Dependent packages count: 30.7%
Average: 32.9%
Dependent repos count: 51.4%
Last synced: 6 months ago