https://github.com/vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file (found)
  • ✓ .zenodo.json file (found)
  • ○ DOI references
  • ○ Academic publication links
  • ✓ Committers with academic emails: 6 of 74 committers (8.1%) from academic institutions
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary

Keywords

bigdata data-science dataframe hdf5 machine-learning machinelearning memory-mapped-file pyarrow python tabular-data visualization

Keywords from Contributors

distributed parallel transformer closember mlops notebook tensor cryptocurrencies orchestration large-language-models
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: vaexio
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Homepage: https://vaex.io
  • Size: 133 MB
Statistics
  • Stars: 8,422
  • Watchers: 139
  • Forks: 600
  • Open Issues: 550
  • Releases: 4
Created over 11 years ago · Last pushed 6 months ago
Metadata Files
Readme, Changelog, Contributing, License, Security, Authors

README.md

What is Vaex?

Vaex is a high-performance Python library for lazy, out-of-core DataFrames (similar to Pandas) to visualize and explore big tabular datasets. It calculates statistics such as mean, sum, count and standard deviation on an N-dimensional grid at more than a billion (10^9) samples/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero memory copy policy and lazy computations for best performance (no memory wasted).
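"Statistics on an N-dimensional grid" means the statistic is computed separately for each cell of a binned grid. The pure-Python sketch below illustrates that idea; it is not how vaex computes it, since vaex uses compiled, parallel code. It calculates a per-cell mean of a value column over a 2D grid of x/y bins. The function name `binned_mean_2d` and its parameters are hypothetical. In vaex, this kind of query is expressed through the `binby`, `limits` and `shape` arguments of statistics methods such as `df.mean`.

```python
def binned_mean_2d(xs, ys, values, limits, shape):
    """Mean of `values` per cell of an (nx, ny) grid over ((xmin, xmax), (ymin, ymax)).

    Rows outside the half-open limits are skipped; empty cells yield None.
    """
    (xmin, xmax), (ymin, ymax) = limits
    nx, ny = shape
    sums = [[0.0] * ny for _ in range(nx)]
    counts = [[0] * ny for _ in range(nx)]
    for x, y, v in zip(xs, ys, values):
        if not (xmin <= x < xmax and ymin <= y < ymax):
            continue
        # Map each coordinate to a bin index; min() guards against float rounding at the edge.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        sums[i][j] += v
        counts[i][j] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else None for j in range(ny)]
        for i in range(nx)
    ]


# Four rows on a 2x2 grid covering [0, 2) x [0, 2).
grid = binned_mean_2d(
    xs=[0.5, 0.5, 1.5, 1.9],
    ys=[0.5, 0.5, 0.5, 1.5],
    values=[1.0, 3.0, 10.0, 7.0],
    limits=((0, 2), (0, 2)),
    shape=(2, 2),
)
print(grid)  # [[2.0, None], [10.0, 7.0]]
```

The work is a single pass over the rows with constant memory per cell. That property is what makes grid statistics scale to very large datasets.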

Installing

With pip:

    $ pip install vaex

Or with conda:

    $ conda install -c conda-forge vaex

For more details, see the documentation.

Key features

Instant opening of huge data files (memory mapping)

HDF5 and Apache Arrow are supported.

Read the documentation to learn how to efficiently convert your data from CSV files, Pandas DataFrames or other sources.

Lazy streaming from S3 is supported in combination with memory mapping.
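The stdlib-only sketch below shows why memory mapping makes opening near-instant; it is not how vaex reads HDF5 or Arrow files. It writes a column of float64 values to a raw binary file, maps the file with `mmap`, and reads a single row by byte offset. Only the pages that are touched get loaded by the operating system. The file layout (a headerless array of little-endian doubles) and the helper names `write_column` and `read_row` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # assumed layout: one little-endian float64 per row


def write_column(path, values):
    """Write values as a headerless, contiguous array of float64."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def read_row(mm, row):
    """Read one row straight from the mapping by byte offset."""
    return ITEM.unpack_from(mm, row * ITEM.size)[0]


path = os.path.join(tempfile.mkdtemp(), "x.bin")
write_column(path, [float(i) for i in range(100_000)])

with open(path, "rb") as f:
    # Mapping does not read the file; the OS loads pages only when they are touched.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        n_rows = len(mm) // ITEM.size
        value = read_row(mm, 12_345)

print(n_rows, value)  # 100000 12345.0
```

A fixed-width, contiguous layout is the key design choice. Because of it, row `k` of a column is always at offset `k * itemsize`, so no parsing is needed to reach it.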

Expression system

Don't waste memory or time on feature engineering: we (lazily) transform your data when needed.
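The idea behind lazy, virtual columns fits in a few lines of plain Python. This is a toy model, not vaex's expression engine. An `Expr` object only records an operation, and nothing is computed until `evaluate` is called on some data, for example one chunk. The class and method names are hypothetical. In vaex itself, the counterpart is assigning an expression such as `df['r'] = (df.x**2 + df.y**2)**0.5`, which adds a virtual column without allocating memory for its values.

```python
import operator


class Expr:
    """A lazily evaluated, column-wise expression (toy model, not vaex)."""

    def __init__(self, evaluate, text):
        self._evaluate = evaluate  # function: dict[str, list] -> list
        self.text = text           # readable form, similar to an expression string

    @classmethod
    def col(cls, name):
        return cls(lambda data: list(data[name]), name)

    def _combine(self, other, op, symbol):
        def evaluate(data):
            left = self._evaluate(data)
            if isinstance(other, Expr):
                right = other._evaluate(data)
            else:  # broadcast a Python scalar to every row
                right = [other] * len(left)
            return [op(a, b) for a, b in zip(left, right)]

        rtext = other.text if isinstance(other, Expr) else repr(other)
        return Expr(evaluate, f"({self.text} {symbol} {rtext})")

    def __add__(self, other):
        return self._combine(other, operator.add, "+")

    def __mul__(self, other):
        return self._combine(other, operator.mul, "*")

    def __pow__(self, other):
        return self._combine(other, operator.pow, "**")

    def evaluate(self, data):
        """Compute values only now, for the rows in `data` (e.g. one chunk)."""
        return self._evaluate(data)


x, y = Expr.col("x"), Expr.col("y")
r = (x ** 2 + y ** 2) ** 0.5  # builds the expression; computes nothing yet
print(r.text)                 # (((x ** 2) + (y ** 2)) ** 0.5)
print(r.evaluate({"x": [3.0, 6.0], "y": [4.0, 8.0]}))  # [5.0, 10.0]
```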

Out-of-core DataFrame

Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk and is streamed only when needed. This lets you postpone the moment you need a cluster.
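The out-of-core behaviour described above applies a filter while data is streamed, instead of producing a filtered copy. A generator that yields fixed-size chunks can illustrate this. It is a simplified, single-threaded stand-in for vaex's chunked evaluation. The names `iter_chunks` and `filtered_sum` and the default chunk size are hypothetical choices for this example.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield successive lists of at most `chunk_size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def filtered_sum(rows, predicate, chunk_size=1024):
    """Sum the rows matching `predicate`, holding at most one chunk in memory.

    The filter acts as a per-chunk mask; no filtered copy of the whole
    dataset is ever built.
    """
    total = 0
    for chunk in iter_chunks(rows, chunk_size):
        total += sum(v for v in chunk if predicate(v))
    return total


# range() is lazy, so the values are streamed rather than materialized up front.
print(filtered_sum(range(1_000_000), lambda v: v % 2 == 0))  # 249999500000
```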

Fast groupby / aggregations

Vaex implements parallelized, highly performant groupby operations, which are especially fast when using categories (over 1 billion rows/second).
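Grouping by categories is faster because of how categorical data can be represented. Each value becomes a small integer code, so aggregation indexes straight into an array of accumulators instead of hashing every key. The plain-Python sketch below illustrates that general technique; it is not vaex's compiled, parallel implementation. The helpers `ordinal_encode` and `groupby_sum_codes` are hypical names for this example. In vaex, a column can be given this kind of encoding with `df.ordinal_encode(...)` and aggregated with `df.groupby(...)`.

```python
def ordinal_encode(values):
    """Map each distinct value to a small integer code (first-seen order).

    Hashing happens once here; later aggregations only use the codes.
    """
    categories, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(categories)
            categories.append(v)
        codes.append(index[v])
    return categories, codes


def groupby_sum_codes(codes, amounts, n_categories):
    """Sum and count `amounts` per category code using plain array indexing."""
    sums = [0] * n_categories
    counts = [0] * n_categories
    for c, a in zip(codes, amounts):
        sums[c] += a
        counts[c] += 1
    return sums, counts


categories, codes = ordinal_encode(["nl", "us", "nl", "fr", "us", "nl"])
sums, counts = groupby_sum_codes(codes, [1, 2, 3, 4, 5, 6], len(categories))
print(dict(zip(categories, zip(sums, counts))))
# {'nl': (10, 3), 'us': (7, 2), 'fr': (4, 1)}
```

Because the number of categories is known up front, the accumulators are fixed-size arrays. This also makes it straightforward to split rows across threads and add the partial arrays together at the end.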

Fast and efficient join

Vaex doesn't copy/materialize the 'right' table when joining, saving gigabytes of memory. With sub-second joins on a billion rows, it's pretty fast!
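A join can avoid copying the right-hand table by computing only an index array that maps each left row to its matching right row. Right-hand values are then looked up through that index on demand. The pure-Python sketch below does a left join on a unique key this way. It illustrates the general technique under that uniqueness assumption and does not claim to mirror vaex's internals. The names `left_join_index` and `LazyJoinedColumn` are hypothetical. The vaex method itself is `DataFrame.join`, e.g. `df.join(other, on='key', how='left')`.

```python
def left_join_index(left_keys, right_keys):
    """For each left row, the index of the matching right row, or None.

    Assumes the keys in the right table are unique.
    """
    position = {key: i for i, key in enumerate(right_keys)}
    return [position.get(key) for key in left_keys]


class LazyJoinedColumn:
    """A right-table column viewed through the join index; values fetched on access."""

    def __init__(self, right_column, index):
        self.right_column = right_column  # referenced, never copied
        self.index = index

    def __len__(self):
        return len(self.index)

    def __getitem__(self, row):
        j = self.index[row]
        return None if j is None else self.right_column[j]


left_keys = ["a", "b", "c", "a"]
right_keys = ["c", "a"]
right_price = [30.0, 10.0]

index = left_join_index(left_keys, right_keys)
price = LazyJoinedColumn(right_price, index)
print(index)                                  # [1, None, 0, 1]
print([price[i] for i in range(len(price))])  # [10.0, None, 30.0, 10.0]
```

Only one integer per left row is stored, however many columns the right table has. The saving grows with the width of the right-hand table.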

More features

Contributing

See the contributing page.

Slack

Join the discussion in our Slack channel!

Learn more about Vaex

Owner

  • Name: vaex io
  • Login: vaexio
  • Kind: organization
  • Email: contact@vaex.io
  • Location: the Netherlands

Big data made simple. Visualization and exploration. Machine learning and deployment.

GitHub Events

Total
  • Issues event: 12
  • Watch event: 198
  • Issue comment event: 60
  • Push event: 2
  • Pull request review comment event: 7
  • Pull request review event: 10
  • Pull request event: 11
  • Fork event: 14
  • Create event: 1
Last Year
  • Issues event: 12
  • Watch event: 198
  • Issue comment event: 60
  • Push event: 2
  • Pull request review comment event: 7
  • Pull request review event: 10
  • Pull request event: 11
  • Fork event: 14
  • Create event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 3,357
  • Total Committers: 74
  • Avg Commits per committer: 45.365
  • Development Distribution Score (DDS): 0.139
Past Year
  • Commits: 11
  • Committers: 2
  • Avg Commits per committer: 5.5
  • Development Distribution Score (DDS): 0.455
Top Committers
Name (email): commits
  • Maarten A. Breddels (m****s@g****m): 2,889
  • Jovan Veljanoski (j****i@g****m): 299
  • Jovan (jv@r****k): 31
  • Bulat Yaminov (b****v@g****m): 10
  • Ben Epstein (b****n@w****u): 8
  • xdssio (3****o): 7
  • Kyle McEntush (s****s@g****m): 6
  • shareactor (y****n@s****o): 5
  • ddelange (1****e): 5
  • Steven Rieder (s****n@r****l): 5
  • Matthew Barber (q****t@g****m): 4
  • Kyle McEntush (k****h@i****m): 4
  • Nick Crews (n****s@g****m): 3
  • Sai Kiran (n****3@g****m): 3
  • Thomas Delteil (t****i@m****m): 3
  • marload (r****8@g****m): 3
  • Naohiro Heya (d****b@g****m): 3
  • franz.media (f****r@g****m): 3
  • yohplala (y****a): 2
  • Christian Laforte (c****e@a****m): 2
  • Chiao (c****n@g****m): 2
  • Dougal J. Sutherland (d****l@g****m): 2
  • Eduardo Balbinot (e****t@g****m): 2
  • Franz Wöllert (f****t@g****m): 2
  • Marco Paolini (m****i@g****m): 2
  • Meredith Durbin (m****n@g****m): 2
  • Ralf Gommers (r****s@g****m): 2
  • fsiola (f****a@g****m): 2
  • Alex V. Kotlar (a****r@b****u): 1
  • Alenka Frim (A****F): 1
  • and 44 more...

Issues and Pull Requests

Last synced: 5 months ago

All Time
  • Total issues: 154
  • Total pull requests: 137
  • Average time to close issues: 5 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 115
  • Total pull request authors: 26
  • Average comments per issue: 2.38
  • Average comments per pull request: 3.12
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 19
  • Pull requests: 23
  • Average time to close issues: 2 days
  • Average time to close pull requests: 2 days
  • Issue authors: 16
  • Pull request authors: 5
  • Average comments per issue: 0.26
  • Average comments per pull request: 3.52
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • Ben-Epstein (8)
  • NickCrews (8)
  • ashsharma96 (6)
  • mfouesneau (4)
  • schwingkopf (3)
  • honno (3)
  • myloe00 (2)
  • DougRzz (2)
  • grafail (2)
  • ddelange (2)
  • iisakkirotko (2)
  • khanfarhan10 (2)
  • meta-ks (2)
  • Piyush23Rai (2)
  • vignesh-bungee (2)
Pull Request Authors
  • maartenbreddels (40)
  • JovanVeljanoski (34)
  • ddelange (15)
  • xdssio (11)
  • Ben-Epstein (7)
  • EwoutH (4)
  • 2maz (3)
  • NickCrews (3)
  • Shashank1202 (2)
  • mgorny (2)
  • ghost (1)
  • And0k (1)
  • AlenkaF (1)
  • detayotella (1)
  • jaegglic (1)
Top Labels
Issue Labels
bug (3), feature-request (3), needed: more information (3), priority: low (2), performance (1), duplicate (1), good first issue (1), priority: medium (1)
Pull Request Labels
priority: high (10), priority: medium (8), bug (3), new-feature (3), enhancement (3), priority: low (1), major-addition (1), dependencies (1), github_actions (1)

Packages

  • Total packages: 8
  • Total downloads:
    • pypi 43,616 last-month
  • Total docker downloads: 13,450
  • Total dependent packages: 40
    (may contain duplicates)
  • Total dependent repositories: 150
    (may contain duplicates)
  • Total versions: 187
  • Total maintainers: 1
pypi.org: vaex

Out-of-Core DataFrames to visualize and explore big tabular datasets

  • Versions: 58
  • Dependent Packages: 24
  • Dependent Repositories: 90
  • Downloads: 23,357 Last month
  • Docker Downloads: 2,117
Rankings
Stargazers count: 0.3%
Dependent packages count: 0.6%
Downloads: 1.3%
Average: 1.4%
Dependent repos count: 1.6%
Forks count: 2.1%
Docker downloads count: 2.3%
Maintainers (1)
Last synced: 5 months ago
pypi.org: vaex-ml

Machine learning support for vaex

  • Versions: 34
  • Dependent Packages: 2
  • Dependent Repositories: 47
  • Downloads: 20,117 Last month
  • Docker Downloads: 11,212
Rankings
Stargazers count: 0.3%
Downloads: 1.3%
Average: 1.6%
Docker downloads count: 1.8%
Forks count: 2.1%
Dependent repos count: 2.1%
Dependent packages count: 2.2%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vaex-graphql

GraphQL support for accessing vaex DataFrame

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 9
  • Downloads: 123 Last month
  • Docker Downloads: 121
Rankings
Stargazers count: 0.3%
Forks count: 2.1%
Docker downloads count: 2.6%
Dependent repos count: 4.8%
Average: 6.2%
Dependent packages count: 10.1%
Downloads: 17.0%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/vaexio/vaex
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
conda-forge.org: vaex-core
  • Versions: 54
  • Dependent Packages: 10
  • Dependent Repositories: 1
Rankings
Stargazers count: 3.5%
Dependent packages count: 5.9%
Forks count: 6.7%
Average: 10.1%
Dependent repos count: 24.4%
Last synced: 6 months ago
conda-forge.org: vaex-viz
  • Versions: 16
  • Dependent Packages: 3
  • Dependent Repositories: 1
Rankings
Stargazers count: 3.5%
Forks count: 6.7%
Average: 12.6%
Dependent packages count: 15.6%
Dependent repos count: 24.4%
Last synced: 6 months ago
pypi.org: vaex-contrib

Community contributed modules to vaex

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 19 Last month
Rankings
Stargazers count: 0.3%
Forks count: 2.1%
Dependent packages count: 10.1%
Average: 15.4%
Dependent repos count: 21.5%
Downloads: 42.8%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: vaex-ml

Wrappers for various machine learning libraries to make them integrate into vaex.

  • Versions: 16
  • Dependent Packages: 1
  • Dependent Repositories: 1
Rankings
Stargazers count: 3.5%
Forks count: 6.7%
Average: 15.9%
Dependent repos count: 24.4%
Dependent packages count: 29.0%
Last synced: 6 months ago

Dependencies

.github/workflows/pythonpackage.yml actions
  • ./ci/actions/windll * composite
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/upload-artifact v2 composite
  • google-github-actions/setup-gcloud v0 composite
  • ifaxity/wait-on-action v1 composite
  • mamba-org/provision-with-micromamba main composite
  • maxim-lobanov/setup-xcode v1 composite
.github/workflows/wheel-universal.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v1 composite
.github/workflows/wheel.yml actions
  • ./ci/actions/windll * composite
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
  • actions/upload-artifact v1 composite
ci/actions/windll/action.yml actions
binder/requirements.txt pypi
  • healpy *
  • scipy *
  • vaex ==3.0.0
misc/experiments/setup.py pypi