Jupyter Scatter

Jupyter Scatter: Interactive Exploration of Large-Scale Datasets - Published in JOSS (2024)

https://github.com/flekschas/jupyter-scatter

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

jupyter-notebook-extension jupyterlab-extension scatter-plot visualization

Keywords from Contributors

mesh
Last synced: 4 months ago

Repository

Interactive 2D scatter plot widget for Jupyter Lab and Notebook. Scales to millions of points!

Basic Info
  • Host: GitHub
  • Owner: flekschas
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage: https://jupyter-scatter.dev
  • Size: 3.4 MB
Statistics
  • Stars: 444
  • Watchers: 7
  • Forks: 26
  • Open Issues: 11
  • Releases: 52
Topics
jupyter-notebook-extension jupyterlab-extension scatter-plot visualization
Created over 5 years ago · Last pushed 4 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Jupyter Scatter

[![pypi version](https://img.shields.io/pypi/v/jupyter-scatter.svg?color=1a8cff&style=flat-square)](https://pypi.org/project/jupyter-scatter) [![python versions](https://img.shields.io/pypi/pyversions/jupyter-scatter.svg?color=1a8cff&style=flat-square)](https://pypi.python.org/project/jupyter-scatter) [![build status](https://img.shields.io/github/actions/workflow/status/flekschas/jupyter-scatter/ci.yml?branch=main&color=1a8cff&style=flat-square)](https://github.com/flekschas/jupyter-scatter/actions/workflows/ci.yml) [![API docs](https://img.shields.io/badge/API-docs-1a8cff.svg?style=flat-square)](https://jupyter-scatter.dev) [![tutorial](https://img.shields.io/badge/SciPy_'23-tutorial-1a8cff.svg?style=flat-square)](https://github.com/flekschas/jupyter-scatter-tutorial) [![DOI](https://img.shields.io/badge/JOSS-10.21105/joss.07059-1a8cff.svg?style=flat-square)](https://doi.org/10.21105/joss.07059)
An interactive scatter plot widget for Jupyter Notebook, Lab, and Google Colab
that can handle [millions of points](#visualize-millions-of-data-points) and supports [view linking](#linking-scatter-plots).


![Demo](https://user-images.githubusercontent.com/932103/223292112-c9ca18b9-bc6b-4c3b-94ac-984960e8f717.gif)

Features

  • 🖱️ Interactive: Pan, zoom, and select data points interactively with your mouse or through the Python API.
  • 🚀 Scalable: Plot up to several million data points smoothly thanks to WebGL rendering.
  • 🔗 Interlinked: Synchronize the view, hover, and selection across multiple scatter plot instances.
  • Effective Defaults: Rely on Jupyter Scatter to choose perceptually effective point colors and opacity by default.
  • 📚 Friendly API: Enjoy a readable API that integrates deeply with Pandas DataFrames.
  • 🛠️ Integratable: Use Jupyter Scatter in your own widgets by observing its traitlets.

Why?

Imagine trying to explore a dataset of millions of data points as a 2D scatter plot. Besides plotting, the exploration typically involves three things: First, we want to interactively adjust the view (e.g., via panning & zooming) and the visual point encoding (e.g., the point color, opacity, or size). Second, we want to be able to select and highlight data points. And third, we want to compare multiple datasets or views of the same dataset (e.g., via synchronized interactions). The goal of jupyter-scatter is to support all three requirements and scale to millions of points.

How?

Internally, Jupyter Scatter uses regl-scatterplot for WebGL rendering, traitlets for two-way communication between the JavaScript front end and the IPython kernel, and anywidget for composing the widget.
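To make the two-way communication idea concrete, here is a toy, stdlib-only sketch of the observer pattern that traitlets provides. The `SyncedModel` class and its methods are hypothetical stand-ins, not the traitlets or anywidget API; in a real widget, the widget machinery keeps the state in sync between the kernel and the browser.

```python
from typing import Any, Callable

# Hypothetical toy model; NOT the traitlets or anywidget API.


class SyncedModel:
    """Holds named state and notifies observers when a value changes."""

    def __init__(self, **state: Any) -> None:
        self._state = dict(state)
        self._observers = {}  # name -> list of callbacks

    def observe(self, callback: Callable[[dict], None], name: str) -> None:
        # Register a callback that fires whenever `name` changes.
        self._observers.setdefault(name, []).append(callback)

    def get(self, name: str) -> Any:
        return self._state[name]

    def set(self, name: str, value: Any) -> None:
        old = self._state.get(name)
        if old == value:
            return  # only notify on actual changes
        self._state[name] = value
        change = {"name": name, "old": old, "new": value}
        for callback in self._observers.get(name, []):
            callback(change)


# "Kernel" side: record every selection the "browser" sends over.
model = SyncedModel(selection=[])
received = []
model.observe(lambda change: received.append(change["new"]), name="selection")

# Simulate the front end writing a lasso selection into the shared state.
model.set("selection", [3, 7, 42])
model.set("selection", [3, 7, 42])  # same value again: no second notification
print(model.get("selection"), received)
```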

Quick Start

Try out Jupyter Scatter with our one-liner. This requires uv.

```bash
uvx jupyter-scatter demo
```

Docs

Visit https://jupyter-scatter.dev for detailed documentation including examples and a complete API description.


Index

  1. Install
  2. Get Started
    1. Simplest Example
    2. Pandas DataFrame Example
    3. Advanced Example
    4. Functional API Example
    5. Linking Scatter Plots
    6. Visualize Millions of Data Points
    7. Google Colab
  3. Development
  4. Citation

Install

```bash
pip install jupyter-scatter
```

The default installation includes 99% of the features. To also get the remaining optional features, install Jupyter Scatter as follows:

```bash
pip install "jupyter-scatter[all]"
```

This includes the following additional features:

  1. Contour annotation with Seaborn
  2. Label positioning "largest_cluster" with HDBSCAN
  3. Progress bars with tqdm when precomputing labels via `label_placement.compute(show_progress=True)`
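If you are unsure whether the optional extras are present in your environment, a quick check like the following can help. It only asks Python whether a module is importable; the import names are assumptions based on the feature list above (HDBSCAN, for example, might be provided by the standalone `hdbscan` package or by scikit-learn, depending on the version).

```python
import importlib.util

# Assumed import names for the optional extras listed above; the exact
# packages pulled in by "jupyter-scatter[all]" may differ.
OPTIONAL_EXTRAS = {
    "seaborn": "contour annotation",
    "hdbscan": "'largest_cluster' label positioning",
    "tqdm": "progress bars when precomputing labels",
}


def available_extras():
    """Report which optional packages are importable, without importing them."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in OPTIONAL_EXTRAS
    }


for name, found in available_extras().items():
    status = "installed" if found else "missing"
    print(f"{name:<8} {status:<9} -> {OPTIONAL_EXTRAS[name]}")
```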

If you want to use Jupyter Scatter in JupyterLab <=2 you need to manually install it as an extension as follows:

```bash
jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-scatter
```

If you want to install Jupyter Scatter from source, make sure to have Node installed. While several versions might work, we're primarily testing against the Active LTS and Maintenance LTS releases.

For a minimal working example, take a look at test-environments.

Get Started

Tip: Visit jupyter-scatter.dev for details on all essential features of Jupyter Scatter and check out our full-blown tutorial from SciPy '23.

Simplest Example

In the simplest case, you can pass the x/y coordinates to the plot function as follows:

```python
import jscatter
import numpy as np

x = np.random.rand(500)
y = np.random.rand(500)

jscatter.plot(x, y)
```

Simplest scatter plot example

Pandas DataFrame Example

Say your data is stored in a Pandas dataframe like the following:

```python
import numpy as np
import pandas as pd

# Just some random float values; the last column is turned into groups below
data = np.random.rand(500, 4)
df = pd.DataFrame(data, columns=['mass', 'speed', 'pval', 'group'])

# Map the group column to the letters A-H and store it with the `category`
# dtype so jscatter recognizes it as categorical data. This will come in
# handy in the advanced example.
df['group'] = df['group'].map(lambda c: chr(65 + round(c * 7))).astype('category')
```

|   | mass | speed | pval | group |
|---|------|-------|------|-------|
| 0 | 0.13 | 0.27  | 0.51 | G     |
| 1 | 0.87 | 0.93  | 0.80 | B     |
| 2 | 0.10 | 0.25  | 0.25 | F     |
| 3 | 0.03 | 0.90  | 0.01 | G     |
| 4 | 0.19 | 0.78  | 0.65 | D     |

You can then visualize this data by referencing column names:

```python
jscatter.plot(data=df, x='mass', y='speed')
```

Pandas scatter plot example

Advanced Example

Often you want to customize the visual encoding, such as the point color, size, and opacity.

```python
jscatter.plot(
    data=df,
    x='mass',
    y='speed',
    size=8,  # static encoding
    color_by='group',  # data-driven encoding
    opacity_by='density',  # view-driven encoding
)
```

Advanced scatter plot example

In the above example, we chose a static point size of 8. In contrast, the point color is data-driven and assigned based on the categorical group value. The point opacity is view-driven and defined dynamically by the number of points currently visible in the view.
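To illustrate what "view-driven" means, the sketch below derives an opacity from the number of points inside the current view rectangle. The formula (a target count divided by the visible count, clamped to a minimum) is a hypothetical heuristic chosen for illustration, not the formula jscatter or regl-scatterplot actually uses; the point is only that the value changes as you pan and zoom, while the data stays the same.

```python
# Hypothetical heuristic, NOT jscatter's or regl-scatterplot's actual formula.


def visible_count(xs, ys, view):
    """Count points inside the view rectangle (x_min, x_max, y_min, y_max)."""
    x_min, x_max, y_min, y_max = view
    return sum(
        1 for x, y in zip(xs, ys) if x_min <= x <= x_max and y_min <= y <= y_max
    )


def density_opacity(n_visible, target=1_000, min_opacity=0.05):
    """Fewer visible points -> more opaque points, clamped to [min_opacity, 1]."""
    if n_visible <= 0:
        return 1.0
    return max(min_opacity, min(1.0, target / n_visible))


# A 100 x 100 grid of points in [0, 1).
grid = [(i / 100, j / 100) for i in range(100) for j in range(100)]
xs, ys = zip(*grid)

full_view = (0.0, 1.0, 0.0, 1.0)
zoomed_view = (0.0, 0.495, 0.0, 0.495)  # roughly the lower-left quarter

for view in (full_view, zoomed_view):
    n = visible_count(xs, ys, view)
    print(n, density_opacity(n))
```

Zooming into a quarter of the grid leaves a quarter of the points visible, so this heuristic makes each of them more opaque.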

Also notice how jscatter picks an appropriate color map by default based on the data type used for color encoding. In this example, jscatter uses the colorblind-safe color map from Okabe and Ito because the data type is categorical and there are fewer than 9 categories.
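The rule described above can be sketched as follows: with at most eight categories, each category gets one of the eight Okabe-Ito colors. The function name and the alphabetical ordering of categories are assumptions for illustration; jscatter's own implementation and its fallback for more categories may differ.

```python
# The eight colors of the Okabe-Ito palette.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def okabe_ito_mapping(values):
    """Assign one Okabe-Ito color per category, or return None if there are more than 8."""
    categories = sorted(set(values))  # assumption: categories ordered alphabetically
    if len(categories) > len(OKABE_ITO):
        return None  # too many categories; a different categorical map is needed
    return dict(zip(categories, OKABE_ITO))


print(okabe_ito_mapping(["B", "A", "B", "C"]))
```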

Important: in order for jscatter to recognize categorical data, the dtype of the corresponding column needs to be category!

You can, of course, customize the color map and many other parameters of the visual encoding as shown next.

Functional API Example

The flat API can get overwhelming when you want to customize a lot of properties. Therefore, jscatter provides a functional API that groups properties by type and exposes them via meaningfully named methods.

```python
scatter = jscatter.Scatter(data=df, x='mass', y='speed')
scatter.selection(df.query('mass < 0.5').index)
scatter.color(by='mass', map='plasma', order='reverse')
scatter.opacity(by='density')
scatter.size(by='pval', map=[2, 4, 6, 8, 10])
scatter.height(480)
scatter.background('black')
scatter.show()
```

Functional API scatter plot example
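One plausible way to read `scatter.size(by='pval', map=[2, 4, 6, 8, 10])` is that each data value is normalized and then assigned to one of the listed sizes. The sketch below assumes min-max normalization and equal-width bins; the function name is hypothetical, and jscatter's actual normalization and binning may differ.

```python
# Hypothetical sketch: jscatter's actual normalization and binning may differ.


def map_to_sizes(values, sizes):
    """Min-max normalize `values`, then pick a size from equal-width bins."""
    lo, hi = min(values), max(values)
    span = hi - lo
    n_bins = len(sizes)
    mapped = []
    for v in values:
        t = 0.0 if span == 0 else (v - lo) / span  # normalize to [0, 1]
        index = min(int(t * n_bins), n_bins - 1)  # t == 1.0 falls into the last bin
        mapped.append(sizes[index])
    return mapped


pvals = [0.0, 0.1, 0.5, 0.95, 1.0]
print(map_to_sizes(pvals, [2, 4, 6, 8, 10]))
```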

When you update properties dynamically, i.e., after having called scatter.show(), the plot updates automatically. For instance, try calling scatter.xy('speed', 'mass') and you will see how the points are mirrored along the diagonal.
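Why a mirror? Swapping which column goes on which axis maps every point (x, y) to (y, x), which is exactly the reflection across the line y = x. The small worked example below, using hypothetical (mass, speed) pairs, writes that reflection as a 2x2 matrix and checks it against a plain coordinate swap.

```python
# Reflection across the line y = x, written as a 2x2 matrix.
REFLECT_DIAGONAL = ((0, 1), (1, 0))


def apply(matrix, point):
    """Multiply a 2x2 matrix with a 2D point."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


# Hypothetical (mass, speed) pairs.
points = [(0.2, 0.9), (0.5, 0.1), (0.7, 0.7)]
mirrored = [apply(REFLECT_DIAGONAL, p) for p in points]

# Reflecting is the same as plotting speed on x and mass on y.
print(mirrored)
```

Points that lie on the diagonal, such as (0.7, 0.7), stay where they are, and applying the reflection twice restores the original layout.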

Moreover, all arguments are optional. If you specify arguments, the methods act as setters and change the properties. If you call a method without any arguments, it acts as a getter and returns the property (or properties). For example, scatter.selection() returns the currently selected points.
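The getter/setter convention can be sketched with a sentinel default, so that "no argument given" is distinguishable from an explicit `None`. The `ColorProperties` class below is a hypothetical stand-in that only mirrors the shape of `scatter.color(by=..., map=..., order=...)`; it is not jscatter's implementation, and the choice to update only the passed properties is an assumption of this sketch.

```python
_UNSET = object()  # sentinel: lets us tell "no argument" apart from None


class ColorProperties:
    """Hypothetical stand-in for a getter/setter method such as scatter.color()."""

    def __init__(self):
        self._props = {"by": None, "map": None, "order": None}

    # `map` shadows the builtin here only to mirror scatter.color(map=...).
    def color(self, by=_UNSET, map=_UNSET, order=_UNSET):
        given = {"by": by, "map": map, "order": order}
        updates = {key: value for key, value in given.items() if value is not _UNSET}
        if not updates:
            return dict(self._props)  # getter: return a copy of the current values
        self._props.update(updates)  # setter: change only the given properties


props = ColorProperties()
props.color(by="mass", map="plasma", order="reverse")  # acts as a setter
props.color(map="viridis")  # updates only the color map
print(props.color())  # acts as a getter
```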

Finally, the scatter plot is interactive and supports two-way communication. Hence, if you select some points with the lasso tool and then call scatter.selection(), you will get the current selection.
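At its core, a lasso selection is a point-in-polygon test. regl-scatterplot handles this in the browser; the Python sketch below only illustrates the general even-odd (ray casting) technique and is not the library's implementation. The function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count how often a ray to the right of (x, y) crosses the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lasso_select(points, lasso):
    """Return the indices of all points inside the lasso polygon."""
    return [i for i, (x, y) in enumerate(points) if point_in_polygon(x, y, lasso)]


lasso = [(0, 0), (1, 0), (1, 1), (0, 1)]  # a square lasso
pts = [(0.5, 0.5), (1.5, 0.5), (0.2, 0.8), (-0.1, 0.3)]
print(lasso_select(pts, lasso))
```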

Linking Scatter Plots

To explore multiple scatter plots and have their view, selection, and hover interactions linked, use jscatter.link().

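```python
# Assumes `embeddings` is a DataFrame with the pcaX/pcaY, tsneX/tsneY,
# umapX/umapY, and caeX/caeY columns, and `config` is a dict of shared
# plot settings (e.g., as in notebooks/linking.ipynb)
jscatter.link([
    jscatter.Scatter(data=embeddings, x='pcaX', y='pcaY', **config),
    jscatter.Scatter(data=embeddings, x='tsneX', y='tsneY', **config),
    jscatter.Scatter(data=embeddings, x='umapX', y='umapY', **config),
    jscatter.Scatter(data=embeddings, x='caeX', y='caeY', **config)
], rows=2)
```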

https://user-images.githubusercontent.com/932103/162584133-85789d40-04f5-428d-b12c-7718f324fb39.mp4

See notebooks/linking.ipynb for more details.
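Conceptually, linking means that an interaction in one plot is replayed in all the others. The sketch below illustrates this for the view only, using hypothetical `Plot` and `link_views` names; it is not how `jscatter.link()` is implemented, and the real function also synchronizes selection and hover. The equality check in `set_view` is what stops the propagation from bouncing back and forth forever.

```python
class Plot:
    """Hypothetical plot that only tracks its view rectangle."""

    def __init__(self, name):
        self.name = name
        self.view = (0.0, 1.0, 0.0, 1.0)  # (x_min, x_max, y_min, y_max)
        self._listeners = []

    def on_view_change(self, callback):
        self._listeners.append(callback)

    def set_view(self, view):
        if view == self.view:
            return  # unchanged: stop here, which also breaks update cycles
        self.view = view
        for callback in self._listeners:
            callback(view)


def link_views(plots):
    """Forward every view change of one plot to all the other plots."""
    for source in plots:
        for target in plots:
            if target is not source:
                source.on_view_change(target.set_view)


plots = [Plot(name) for name in ("pca", "tsne", "umap", "cae")]
link_views(plots)
plots[1].set_view((0.2, 0.4, 0.2, 0.4))  # e.g., zooming in on the t-SNE plot
print([(p.name, p.view) for p in plots])
```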

Visualize Millions of Data Points

With jupyter-scatter you can easily visualize and interactively explore datasets with millions of points.

In the following example, we visualize 5 million points generated with the Rössler attractor.

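```python
import jscatter
import numpy as np

# `roesslerAttractor` is a helper defined elsewhere (e.g., in
# notebooks/examples.ipynb) that returns points on the Rössler attractor
points = np.asarray(roesslerAttractor(5000000))
jscatter.plot(points[:,0], points[:,1], height=640)
```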

https://user-images.githubusercontent.com/932103/162586987-0b5313b0-befd-4bd1-8ef5-13332d8b15d1.mp4

See notebooks/examples.ipynb for more details.
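The `roesslerAttractor` helper is not part of jscatter. A minimal stand-in, assuming the classic Rössler parameters a = 0.2, b = 0.2, c = 5.7 and simple forward-Euler integration, could look like this. The name `roessler_attractor` and all defaults are hypothetical; the notebook's helper may use different parameters or a different integrator.

```python
def roessler_attractor(n, a=0.2, b=0.2, c=5.7, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Integrate the Rössler system with forward-Euler steps (hypothetical helper)."""
    x, y, z = start
    points = []
    for _ in range(n):
        # Rössler equations: dx/dt = -y - z, dy/dt = x + a*y, dz/dt = b + z*(x - c)
        dx = -y - z
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        points.append((x, y, z))
    return points


pts = roessler_attractor(10_000)
print(len(pts), pts[-1])
```

The pure-Python loop is slow for 5 million points; a compiled loop (for example via Numba) is a common way to speed it up. The result can then be converted with `np.asarray` and plotted as above.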

Google Colab

While jscatter is primarily developed for Jupyter Lab and Notebook, it also runs just fine in Google Colab. See jupyter-scatter-colab-test.ipynb for an example.

Development

Setting up a development environment

**Requirements:**

- [uv](https://astral.sh/uv) >= v0.4.0
- [Node](https://nodejs.org) ([Active LTS or Maintenance LTS release](https://nodejs.org/en/about/previous-releases))

**Installation:**

```bash
git clone https://github.com/flekschas/jupyter-scatter/ jupyter-scatter && cd jupyter-scatter
uv pip install -e ".[all]"
uv run jupyter-lab
```

**After changing Python code:** restart the kernel. Alternatively, you can enable auto-reloading via the `autoreload` extension. To do so, run the following code at the beginning of a notebook:

```py
%load_ext autoreload
%autoreload 2
```

**After changing JavaScript code:** run `cd js && npm run build`. Alternatively, you can enable anywidget's hot module reloading (HMR) as follows and run `npm run watch` to rebundle the JS code on the fly.

```py
%env ANYWIDGET_HMR=1
```

Setting up a test environment

Go to [test-environments](test-environments) and follow the instructions.

Running tests

Run `uv run pytest`.

Citation

If you use Jupyter Scatter in your research, please cite our JOSS paper:

```bibtex
@article{lekschas2024jupyter,
  title = {{Jupyter Scatter}: Interactive Exploration of Large-Scale Datasets},
  author = {Fritz Lekschas and Trevor Manz},
  journal = {Journal of Open Source Software},
  publisher = {The Open Journal},
  year = {2024},
  volume = {9},
  number = {101},
  pages = {7059},
  doi = {10.21105/joss.07059},
  url = {https://doi.org/10.21105/joss.07059},
}
```

Owner

  • Name: Fritz Lekschas
  • Login: flekschas
  • Kind: user
  • Location: Somerville, MA

Computer scientist researching visualization systems for large-scale exploration of biomedical data. Harvard CS PhD '21.

JOSS Publication

Jupyter Scatter: Interactive Exploration of Large-Scale Datasets
Published
September 10, 2024
Volume 9, Issue 101, Page 7059
Authors
Fritz Lekschas ORCID
Ozette Technologies, Seattle, WA, USA
Trevor Manz ORCID
Harvard Medical School, Boston, MA, USA
Editor
Fabian-Robert Stöter ORCID
Tags
Jupyter widget scatterplot 2D scatter interactive data visualization embedding plot WebGL

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Lekschas
  given-names: Fritz
  orcid: "https://orcid.org/0000-0001-8432-4835"
- family-names: Manz
  given-names: Trevor
  orcid: "https://orcid.org/0000-0001-7694-5164"
doi: 10.5281/zenodo.13391017
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Lekschas
    given-names: Fritz
    orcid: "https://orcid.org/0000-0001-8432-4835"
  - family-names: Manz
    given-names: Trevor
    orcid: "https://orcid.org/0000-0001-7694-5164"
  date-published: 2024-09-10
  doi: 10.21105/joss.07059
  issn: 2475-9066
  issue: 101
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 7059
  title: "Jupyter Scatter: Interactive Exploration of Large-Scale
    Datasets"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.07059"
  volume: 9
title: "Jupyter Scatter: Interactive Exploration of Large-Scale
  Datasets"

GitHub Events

Total
  • Create event: 19
  • Release event: 4
  • Issues event: 22
  • Watch event: 58
  • Delete event: 13
  • Issue comment event: 58
  • Push event: 59
  • Pull request review comment event: 11
  • Pull request review event: 13
  • Pull request event: 32
  • Fork event: 6
Last Year
  • Create event: 19
  • Release event: 4
  • Issues event: 22
  • Watch event: 58
  • Delete event: 13
  • Issue comment event: 58
  • Push event: 59
  • Pull request review comment event: 11
  • Pull request review event: 13
  • Pull request event: 32
  • Fork event: 6

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 353
  • Total Committers: 7
  • Avg Commits per committer: 50.429
  • Development Distribution Score (DDS): 0.173
Past Year
  • Commits: 48
  • Committers: 3
  • Avg Commits per committer: 16.0
  • Development Distribution Score (DDS): 0.167
Top Committers
Name Email Commits
Fritz Lekschas c****e@l****e 292
dependabot[bot] 4****] 46
Trevor Manz t****z@g****m 11
pablo-gar p****o@c****m 1
Sehi L'Yi s****i@g****m 1
Kurt McKee c****e@k****g 1
Dan Rosén d****2@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 69
  • Total pull requests: 164
  • Average time to close issues: 2 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 33
  • Total pull request authors: 11
  • Average comments per issue: 2.54
  • Average comments per pull request: 1.14
  • Merged pull requests: 148
  • Bot issues: 0
  • Bot pull requests: 61
Past Year
  • Issues: 19
  • Pull requests: 38
  • Average time to close issues: 4 days
  • Average time to close pull requests: 4 days
  • Issue authors: 8
  • Pull request authors: 4
  • Average comments per issue: 1.37
  • Average comments per pull request: 1.53
  • Merged pull requests: 33
  • Bot issues: 0
  • Bot pull requests: 13
Top Authors
Issue Authors
  • flekschas (14)
  • abast (13)
  • hadim (6)
  • hamelin (3)
  • jacowp357 (3)
  • manzt (2)
  • mjohnson11 (2)
  • arogozhnikov (2)
  • GeorgePearse (2)
  • InquilineKea (2)
  • drorbar (1)
  • lmcinnes (1)
  • jdonaldson (1)
  • armsp (1)
  • sergpolly (1)
Pull Request Authors
  • flekschas (80)
  • dependabot[bot] (62)
  • manzt (15)
  • codeanticode (2)
  • danr (2)
  • pablo-gar (1)
  • kurtmckee (1)
  • hamelin (1)
  • sehilyi (1)
  • faroit (1)
  • askartemir (1)
Top Labels
Issue Labels
bug (26) enhancement (16) question (6) unreproducible (4) documentation (2) help wanted (1) wontfix (1)
Pull Request Labels
dependencies (62) bug (19) javascript (18) enhancement (13) github_actions (11) documentation (4)

Packages

  • Total packages: 4
  • Total downloads:
    • npm 4 last-month
    • pypi 2,188 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 134
  • Total maintainers: 2
proxy.golang.org: github.com/flekschas/jupyter-scatter
  • Versions: 53
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 4 months ago
npmjs.org: jupyter-scatter

A scatter plot extension for Jupyter Notebook and Lab

  • Versions: 28
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 4 Last month
Rankings
Stargazers count: 5.0%
Forks count: 11.3%
Downloads: 11.7%
Average: 13.9%
Dependent packages count: 16.2%
Dependent repos count: 25.3%
Maintainers (1)
Last synced: 4 months ago
pypi.org: jupyter-scatter

An interactive scatter plot widget for Jupyter Notebook, Lab, and Google Colab that can handle millions of points and supports view linking

  • Versions: 52
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 2,054 Last month
Rankings
Dependent packages count: 9.6%
Average: 16.1%
Downloads: 16.7%
Dependent repos count: 21.9%
Maintainers (1)
Last synced: 4 months ago
pypi.org: jupyter-scatter-scsketch

An interactive scatter plot widget for Jupyter Notebook, Lab, and Google Colab that can handle millions of points and supports view linking (and adds support for directional search)

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 134 Last month
Rankings
Dependent packages count: 8.7%
Average: 28.7%
Dependent repos count: 48.8%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v1 composite
  • actions/setup-python v2 composite
.github/workflows/publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v1 composite
  • actions/setup-python v2 composite
js/package-lock.json npm
  • 672 dependencies
js/package.json npm
  • @jupyterlab/builder ^3.5.2 development
  • css-loader ^3.6.0 development
  • eslint ^8.30.0 development
  • eslint-config-prettier ^8.5.0 development
  • eslint-plugin-prettier ^4.2.1 development
  • lint-staged ^10.5.4 development
  • prettier ^2.8.1 development
  • pretty-quick ^3.1.3 development
  • rimraf ^3.0.2 development
  • style-loader ^1.3.0 development
  • webpack ^5.75.0 development
  • webpack-cli ^4.10.0 development
  • @jupyter-widgets/base ^1.1 || ^2 || ^3 || ^4 || ^5 || ^6
  • camera-2d-simple ~2.2.1
  • d3-axis ~3.0.0
  • d3-scale ~4.0.2
  • d3-selection ~3.0.0
  • dom-2d-camera ~2.2.3
  • gl-matrix ~3.3.0
  • lodash ~4.17.21
  • pub-sub-es ~2.0.1
  • regl ~2.1.0
  • regl-scatterplot ~1.4.2
docs/package-lock.json npm
  • @algolia/autocomplete-core 1.9.3 development
  • @algolia/autocomplete-plugin-algolia-insights 1.9.3 development
  • @algolia/autocomplete-preset-algolia 1.9.3 development
  • @algolia/autocomplete-shared 1.9.3 development
  • @algolia/cache-browser-local-storage 4.19.1 development
  • @algolia/cache-common 4.19.1 development
  • @algolia/cache-in-memory 4.19.1 development
  • @algolia/client-account 4.19.1 development
  • @algolia/client-analytics 4.19.1 development
  • @algolia/client-common 4.19.1 development
  • @algolia/client-personalization 4.19.1 development
  • @algolia/client-search 4.19.1 development
  • @algolia/logger-common 4.19.1 development
  • @algolia/logger-console 4.19.1 development
  • @algolia/requester-browser-xhr 4.19.1 development
  • @algolia/requester-common 4.19.1 development
  • @algolia/requester-node-http 4.19.1 development
  • @algolia/transporter 4.19.1 development
  • @babel/parser 7.22.14 development
  • @docsearch/css 3.5.2 development
  • @docsearch/js 3.5.2 development
  • @docsearch/react 3.5.2 development
  • @esbuild/android-arm 0.18.20 development
  • @esbuild/android-arm64 0.18.20 development
  • @esbuild/android-x64 0.18.20 development
  • @esbuild/darwin-arm64 0.18.20 development
  • @esbuild/darwin-x64 0.18.20 development
  • @esbuild/freebsd-arm64 0.18.20 development
  • @esbuild/freebsd-x64 0.18.20 development
  • @esbuild/linux-arm 0.18.20 development
  • @esbuild/linux-arm64 0.18.20 development
  • @esbuild/linux-ia32 0.18.20 development
  • @esbuild/linux-loong64 0.18.20 development
  • @esbuild/linux-mips64el 0.18.20 development
  • @esbuild/linux-ppc64 0.18.20 development
  • @esbuild/linux-riscv64 0.18.20 development
  • @esbuild/linux-s390x 0.18.20 development
  • @esbuild/linux-x64 0.18.20 development
  • @esbuild/netbsd-x64 0.18.20 development
  • @esbuild/openbsd-x64 0.18.20 development
  • @esbuild/sunos-x64 0.18.20 development
  • @esbuild/win32-arm64 0.18.20 development
  • @esbuild/win32-ia32 0.18.20 development
  • @esbuild/win32-x64 0.18.20 development
  • @jridgewell/sourcemap-codec 1.4.15 development
  • @types/web-bluetooth 0.0.17 development
  • @vue/compiler-core 3.3.4 development
  • @vue/compiler-dom 3.3.4 development
  • @vue/compiler-sfc 3.3.4 development
  • @vue/compiler-ssr 3.3.4 development
  • @vue/devtools-api 6.5.0 development
  • @vue/reactivity 3.3.4 development
  • @vue/reactivity-transform 3.3.4 development
  • @vue/runtime-core 3.3.4 development
  • @vue/runtime-dom 3.3.4 development
  • @vue/server-renderer 3.3.4 development
  • @vue/shared 3.3.4 development
  • @vueuse/core 10.4.1 development
  • @vueuse/integrations 10.4.1 development
  • @vueuse/metadata 10.4.1 development
  • @vueuse/shared 10.4.1 development
  • algoliasearch 4.19.1 development
  • ansi-sequence-parser 1.1.1 development
  • csstype 3.1.2 development
  • esbuild 0.18.20 development
  • estree-walker 2.0.2 development
  • focus-trap 7.5.2 development
  • fsevents 2.3.3 development
  • jsonc-parser 3.2.0 development
  • magic-string 0.30.3 development
  • mark.js 8.11.1 development
  • minisearch 6.1.0 development
  • nanoid 3.3.6 development
  • picocolors 1.0.0 development
  • postcss 8.4.31 development
  • preact 10.17.1 development
  • rollup 3.28.1 development
  • search-insights 2.8.0 development
  • shiki 0.14.3 development
  • source-map-js 1.0.2 development
  • tabbable 6.2.0 development
  • vite 4.4.9 development
  • vitepress 1.0.0-rc.10 development
  • vscode-oniguruma 1.7.0 development
  • vscode-textmate 8.0.0 development
  • vue 3.3.4 development
  • vue-demi 0.14.6 development
docs/package.json npm
  • vitepress ^1.0.0-rc.10 development
pyproject.toml pypi
  • anywidget >=0.2.0
  • ipython *
  • ipywidgets >=7.6,<9
  • matplotlib *
  • numpy *
  • pandas *
  • traittypes >=0.2.1
  • typing_extensions *