kangas

🦘 Explore multimedia datasets at scale

https://github.com/comet-ml/kangas

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ✓
    DOI references
    Found 3 DOI reference(s) in README
  • ✓
    Academic publication links
    Links to: zenodo.org
  • ✓
    Committers with academic emails
    1 of 10 committers (10.0%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (15.6%) to scientific vocabulary

Keywords

data-analysis data-exploration dataframe datagrid machine-learning
Last synced: 4 months ago

Repository

Basic Info
Statistics
  • Stars: 1,064
  • Watchers: 15
  • Forks: 52
  • Open Issues: 1
  • Releases: 44
Topics
data-analysis data-exploration dataframe datagrid machine-learning
Created about 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Kangas: Explore Multimedia Datasets at Scale 🦘

Kangas is a tool for exploring, analyzing, and visualizing large-scale multimedia data. It provides a straightforward Python API for logging large tables of data, along with an intuitive visual interface for performing complex queries against your dataset.

The key features of Kangas include:

  • Scalability. Kangas DataGrid, the fundamental class for representing datasets, can easily store millions of rows of data.
  • Performance. Group, sort, and filter across millions of data points in seconds with a simple, fast UI.
  • Interoperability. Any data, any environment. Kangas can run in a notebook or as a standalone app, both locally and remotely.
  • Integrated computer vision support. Visualize and filter bounding boxes, labels, and metadata without any extra setup.

You can access a live demo of Kangas at kangas.comet.com.

Getting Started

Kangas is available as a Python library and can be installed with pip: `pip install kangas`

Once installed, there are many ways to load or create a DataGrid.

Without writing any code, you can even download a DataGrid and begin exploring the data. At the console:

kangas server https://github.com/caleb-kaiser/kangas_examples/raw/master/coco-500.datagrid.zip

That's it!

In the next example, we load a publicly available DataGrid file, but the Kangas API also provides methods for ingesting CSVs, Pandas DataFrames, and for manually constructing a new DataGrid:

```python
import kangas as kg

# Load an existing DataGrid
dg = kg.read_datagrid("https://github.com/caleb-kaiser/kangas_examples/raw/master/coco-500.datagrid.zip")
```
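The other routes mentioned above, ingesting a CSV and constructing a DataGrid by hand, might look like the following sketch. It assumes `kg.DataGrid(name=..., columns=...)`, `DataGrid.append()`, and `kg.read_csv()` behave as described in the Kangas documentation. The sample rows, the file name `sample_runs.csv`, and the helper names `write_sample_csv` and `build_datagrids` are hypothetical and exist only for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical sample data for illustration only.
COLUMNS = ["hidden_layer_size", "loss"]
ROWS = [[8, 0.97], [16, 0.53], [64, 0.12]]


def write_sample_csv(path):
    """Write the sample rows to a CSV file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)
    return path


def build_datagrids(csv_path):
    """Build one DataGrid by hand and one from a CSV file.

    kangas is imported here (an assumption of this sketch) so that the
    CSV helper above can be used without Kangas installed.
    """
    import kangas as kg

    # Manual construction: name the grid, declare columns, append rows.
    manual = kg.DataGrid(name="sample-runs", columns=COLUMNS)
    for row in ROWS:
        manual.append(row)

    # CSV ingestion: the header line supplies the column names.
    from_csv = kg.read_csv(str(csv_path))
    return manual, from_csv


# Create the sample CSV in a temporary directory.
csv_path = write_sample_csv(Path(tempfile.mkdtemp()) / "sample_runs.csv")
```

With Kangas installed, `manual, from_csv = build_datagrids(csv_path)` should return two grids holding the same three rows.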

After your DataGrid is initialized, you can render it within the Kangas Viewer directly from Python:

```python
dg.show()
```

From the Kangas Viewer, you can group, sort, and filter data. In addition, Kangas will do its best to parse any metadata attached to your assets. For example, if you're using the COCO-500 DataGrid from the quickstart above, Kangas will automatically parse labels and scores for each image.

And voilà! You're now up and running with Kangas.

Pandas DataFrames

Kangas can also read Pandas DataFrame objects directly:

```python
import kangas as kg
import pandas as pd

df = pd.DataFrame({"hidden_layer_size": [8, 16, 64], "loss": [0.97, 0.53, 0.12]})
dg = kg.read_dataframe(df)
```

HuggingFace Datasets

HuggingFace's datasets can also be loaded into DataGrid directly because they use rows of dictionaries, and images are represented by PIL images. DataGrid will automatically convert PIL images into a Kangas Image:

```python
import kangas as kg
from datasets import load_dataset

dataset = load_dataset("beans", split="train")
dg = kg.DataGrid(dataset)
```

Parquet files

Note: You will need to have pyarrow installed to read parquet files.

```python
import kangas as kg

dg = kg.read_parquet("https://github.com/Teradata/kylo/raw/master/samples/sample-data/parquet/userdata5.parquet")
```

If you'd like to explore further, take a look at our example notebooks below:

Documentation

  1. Documentation Homepage
  2. Quickstart Notebook
  3. Integrations Notebook
  4. MNIST Classification Example

FAQ

Is Kangas ready for public use?

Kangas is currently in an open beta. We stress test Kangas heavily and often, and are confident in sharing it with the public. That being said, it is a very young project, and there will be bugs and rough edges. Additionally, new features will be added at a fast pace, so if you find a bug or have a request, please do not hesitate to open a ticket or start a discussion.

Does Kangas support _____ system?

Kangas can be run as a standalone application on newer versions of Windows, macOS, and most popular Linux distributions. In addition, Kangas can run remotely via Google Colab, or within any Jupyter notebook environment.

When should I use Kangas instead of _____?

Pandas

Kangas and Pandas are complementary tools. When you've wrangled your data into a Pandas DataFrame, Kangas can ingest that DataFrame via the DataGrid.read_dataframe() method, making it easy to visualize and explore your tabular data. Additionally, if your data is too large to process in Pandas or involves multimedia assets, Kangas is a strong alternative.

TensorBoard

TensorBoard is one of several tools (including Kangas' parent organization, Comet) that specialize in experiment management and monitoring. Like Kangas, it provides charting and visualizations out of the box, but it is specifically designed for analyzing training workflows. Kangas, in contrast, is designed to analyze any dataset. For example, even if you use a tool like TensorBoard for analyzing training runs, you may still use Kangas before training for exploratory data analysis, or for prediction analysis post-deployment.

What is Kangas' relationship with Comet?

Kangas is developed and maintained by the Research team at Comet. It began life as a prototype for Comet users who needed to visualize large computer vision datasets, and was later spun out into a standalone open source project. Kangas is and always will be free and open source software, and we are more than happy to accept community contributions.

Contributing

Kangas has only recently been released, and as such, we don't have much of a formal process for contributions. If you have an idea or would like to make a contribution, we recommend opening a ticket describing your proposed contribution so that we can collaborate directly. We love working with community contributors.

Owner

  • Name: Comet
  • Login: comet-ml
  • Kind: organization
  • Email: support@comet.com

Comet offers a self-hosted & cloud-based ML platform for the complete ML lifecycle. Trusted by over 150 enterprise customers, free to individuals and academics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use Kangas DataGrid, please feel free to cite it as below"
authors:
- family-names: "Blank"
  given-names: "Douglas"
  orcidid: "https://orcid.org/0000-0003-3538-8829"
title: "Kangas DataGrid: Explore multimedia datasets at scale"
version: 1.3.2
doi: 10.5281/zenodo.7410883
date-released: 2022-12-07
url: "https://github.com/comet-ml/kangas"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 25
  • Member event: 1
  • Issue comment event: 1
  • Push event: 3
  • Gollum event: 1
  • Fork event: 6
Last Year
  • Issues event: 2
  • Watch event: 25
  • Member event: 1
  • Issue comment event: 1
  • Push event: 3
  • Gollum event: 1
  • Fork event: 6

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 906
  • Total Committers: 10
  • Avg Commits per committer: 90.6
  • Development Distribution Score (DDS): 0.234
Past Year
  • Commits: 6
  • Committers: 2
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
Douglas Blank d****k@g****m 694
Caleb Kaiser c****b@c****u 173
Caleb Kaiser 4****r 14
DN6 d****r@g****m 11
Sid Mehta s****0@g****m 4
Mark Mayo m****k@t****z 4
Douglas Blank d****g@c****m 3
nerdyespresso 1****o 1
Kishan Savant k****7@g****m 1
Javier Cruz-Mota j****m@g****m 1
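The page does not define the Development Distribution Score. Assuming the common definition, one minus the share of commits made by the most active committer, the all-time figures above can be reproduced from the Top Committers table. This is a sketch under that assumption, not the indexer's actual code, and the function name `development_distribution_score` is hypothetical.

```python
# Commit counts copied from the "Top Committers" table above.
commits = [694, 173, 14, 11, 4, 4, 3, 1, 1, 1]


def development_distribution_score(commit_counts):
    """Return 1 minus the largest share of commits held by one committer.

    Assumed definition; the page itself does not state the formula.
    """
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("no commits to score")
    return 1 - max(commit_counts) / total


total_commits = sum(commits)
average = total_commits / len(commits)
dds = development_distribution_score(commits)

print(total_commits, round(average, 1), round(dds, 3))  # 906 90.6 0.234
```

Under the same assumption, the past-year score of 0.5 with 6 commits means the most active committer made 3 of them. Note that the table counts each email address separately, so one person can appear as more than one committer.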

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 15
  • Total pull requests: 100
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 10
  • Total pull request authors: 8
  • Average comments per issue: 5.4
  • Average comments per pull request: 0.18
  • Merged pull requests: 93
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • julie-puckett (3)
  • caleb-kaiser (3)
  • sanderfoobar (1)
  • RRighart (1)
  • datavistics (1)
  • cs-mshah (1)
  • jackperkins98 (1)
  • Khaihuyennguyen (1)
  • fnauman (1)
  • SilvanGondolin (1)
Pull Request Authors
  • dsblank (59)
  • caleb-kaiser (26)
  • DN6 (3)
  • sherpan (2)
  • NeoKish (1)
  • marksmayo (1)
  • nerdyespresso (1)
  • ja-bot (1)
Top Labels
Issue Labels
enhancement (2) bug (2) question (1)
Pull Request Labels
enhancement (8) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 717 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 3
  • Total versions: 96
  • Total maintainers: 1
pypi.org: kangas

Tool for exploring columnar data, including multimedia

  • Versions: 96
  • Dependent Packages: 0
  • Dependent Repositories: 3
  • Downloads: 717 Last month
Rankings
Stargazers count: 2.0%
Forks count: 6.2%
Average: 7.7%
Dependent repos count: 9.0%
Dependent packages count: 10.0%
Downloads: 11.1%
Maintainers (1)
Last synced: 4 months ago

Dependencies

examples/hugging_face/datagrids_with_hub/requirements.txt pypi
  • datasets *
  • huggingface_hub *
  • kangas *
  • transformers *
examples/hugging_face/object_detection/requirements.txt pypi
  • datasets *
  • kangas *
  • kornia *
  • torch *
  • torchvision *
  • transformers *
frontend/package.json npm
  • eslint ^8.25.0 development
  • eslint-config-next 12.3.1 development
  • eslint-config-prettier ^8.5.0 development
  • prettier ^2.7.1 development
  • @dnd-kit/core ^6.0.5
  • @dnd-kit/modifiers ^6.0.0
  • @dnd-kit/sortable ^7.0.1
  • @emotion/styled ^11.9.3
  • @material-ui/core ^4.12.4
  • @material-ui/icons ^4.11.3
  • @mui/icons-material ^5.10.9
  • @mui/material ^5.9.2
  • @react-hook/resize-observer ^1.2.6
  • axios ^0.27.2
  • highlight.js ^11.6.0
  • ms 2.1.3
  • next 12.2.1-canary.1
  • plotly.js ^2.13.2
  • react 18.1.0
  • react-async ^10.0.1
  • react-dom 18.1.0
  • react-intersection-observer ^9.4.0
  • react-plotly.js ^2.5.1
  • react-select ^5.4.0
  • react-table ^7.8.0
  • sharp ^0.31.0
  • use-debounce ^8.0.4
  • uuid ^9.0.0
  • wavesurfer.js ^6.2.0
frontend/yarn.lock npm
  • 648 dependencies
backend/setup.py pypi
  • Pillow *
  • matplotlib *
  • nodejs-bin ==16.15.1a4
  • numpy *
  • psutil *
  • requests *
  • scipy *
  • tornado *
  • tqdm *