https://github.com/rapidsai/cudf

cuDF - GPU DataFrame Library

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    7 of 296 committers (2.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

arrow cpp cuda cudf dask data-analysis data-science dataframe gpu pandas pydata python rapids

Keywords from Contributors

nvidia parallel-computing parallel-algorithm gpu-programming parallel-programming cuda-kernels nvidia-gpu modern-cpp tensor cuda-programming
Last synced: 6 months ago

Repository

cuDF - GPU DataFrame Library

Basic Info
Statistics
  • Stars: 9,157
  • Watchers: 159
  • Forks: 967
  • Open Issues: 1,112
  • Releases: 72
Created almost 9 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Codeowners

README.md

 cuDF - GPU DataFrames

📢 cuDF can now be used as a no-code-change accelerator for pandas! To learn more, see here!

cuDF (pronounced "KOO-dee-eff") is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF leverages libcudf, a blazing-fast C++/CUDA dataframe library, and the Apache Arrow columnar format to provide a GPU-accelerated pandas API.

You can import cudf directly and use it like pandas:

```python
import cudf

tips_df = cudf.read_csv("https://github.com/plotly/datasets/raw/master/tips.csv")
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby("size").tip_percentage.mean())
```

Or, you can use cuDF as a no-code-change accelerator for pandas, using cudf.pandas. cudf.pandas supports 100% of the pandas API, utilizing cuDF for supported operations and falling back to pandas when needed:

```python
%load_ext cudf.pandas  # pandas operations now use the GPU!

import pandas as pd

tips_df = pd.read_csv("https://github.com/plotly/datasets/raw/master/tips.csv")
tips_df["tip_percentage"] = tips_df["tip"] / tips_df["total_bill"] * 100

# display average tip by dining party size
print(tips_df.groupby("size").tip_percentage.mean())
```
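
The "use the accelerated path when supported, otherwise fall back" idea can be pictured with a toy dispatcher. This is only a conceptual sketch in plain Python and makes no claim about how cudf.pandas is implemented: `fast_sum`, `slow_sum`, and `with_fallback` are hypothetical names, with `fast_sum` standing in for a GPU path that handles only some inputs and `slow_sum` for the always-available pandas path.

```python
from typing import Callable, Sequence, Tuple


def fast_sum(values: Sequence) -> float:
    """Hypothetical accelerated path: only handles plain numbers."""
    total = 0.0
    for v in values:
        if isinstance(v, bool) or not isinstance(v, (int, float)):
            # Signal that this input is outside the supported subset.
            raise NotImplementedError(f"unsupported value: {v!r}")
        total += v
    return total


def slow_sum(values: Sequence) -> float:
    """Hypothetical fallback path: also accepts numeric strings."""
    return float(sum(float(v) for v in values))


def with_fallback(fast: Callable, slow: Callable) -> Callable[..., Tuple[float, str]]:
    """Wrap two implementations; report which one produced the result."""
    def call(*args, **kwargs):
        try:
            return fast(*args, **kwargs), "fast"
        except NotImplementedError:
            return slow(*args, **kwargs), "slow"
    return call


smart_sum = with_fallback(fast_sum, slow_sum)
print(smart_sum([1, 2, 3]))    # (6.0, 'fast')
print(smart_sum([1, "2", 3]))  # (6.0, 'slow')
```

The caller sees one function either way, which is the property the README describes: the same pandas code runs whether or not a given operation is supported on the GPU.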

Resources

See the RAPIDS install page for the most up-to-date information and commands for installing cuDF and other RAPIDS packages.

Installation

CUDA/GPU requirements

  • CUDA 12.0+ with a compatible NVIDIA driver
  • Volta architecture or better (Compute Capability >=7.0)

Pip

cuDF can be installed via pip from the NVIDIA Python Package Index. Be sure to select the appropriate cuDF package depending on the major version of CUDA available in your environment:

```bash
# CUDA 13
pip install cudf-cu13

# CUDA 12
pip install cudf-cu12
```
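
For scripted setups, the choice of package can be derived from the CUDA version string. The helper below is a minimal sketch, not part of cuDF: `cudf_pip_package` is a hypothetical name, and it maps only the two package names listed above.

```python
def cudf_pip_package(cuda_version: str) -> str:
    """Return the cuDF pip package name for a CUDA version such as '12.9'."""
    major = int(cuda_version.split(".")[0])
    # Only the package names given in the install instructions are mapped.
    packages = {12: "cudf-cu12", 13: "cudf-cu13"}
    if major not in packages:
        raise ValueError(f"no cuDF pip package listed for CUDA {cuda_version}")
    return packages[major]


print(cudf_pip_package("12.9"))  # cudf-cu12
print(cudf_pip_package("13.0"))  # cudf-cu13
```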

Conda

cuDF can be installed with conda (via miniforge) from the rapidsai channel:

```bash
# CUDA 13
conda install -c rapidsai -c conda-forge cudf=25.10 cuda-version=13.0

# CUDA 12
conda install -c rapidsai -c conda-forge cudf=25.10 cuda-version=12.9
```

We also provide nightly Conda packages built from the HEAD of our latest development branch.

Note: cuDF is supported only on Linux, and with Python versions 3.10 and later.

See the RAPIDS installation guide for more OS and version info.
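
The OS and Python requirements in the note can be checked before attempting an install. This is a minimal sketch using only the standard library; `check_cudf_host` is a hypothetical helper, and it does not inspect the GPU, driver, or CUDA version.

```python
import platform
import sys


def check_cudf_host(system: str, python_version: tuple) -> list:
    """Return a list of problems; an empty list means the OS and Python checks pass."""
    problems = []
    if system != "Linux":
        problems.append(f"cuDF is supported only on Linux, found {system}")
    major_minor = tuple(python_version[:2])
    if major_minor < (3, 10):
        found = ".".join(str(part) for part in major_minor)
        problems.append(f"Python 3.10 or later is required, found {found}")
    return problems


# Check the interpreter and OS this script is running on.
for problem in check_cudf_host(platform.system(), sys.version_info):
    print(problem)
```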

Build/Install from Source

See build instructions.

Contributing

Please see our guide for contributing to cuDF.

Owner

  • Name: RAPIDS
  • Login: rapidsai
  • Kind: organization

Open GPU Data Science

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 29,723
  • Total Committers: 296
  • Avg Commits per committer: 100.416
  • Development Distribution Score (DDS): 0.923
Past Year
  • Commits: 1,816
  • Committers: 82
  • Avg Commits per committer: 22.146
  • Development Distribution Score (DDS): 0.82
Top Committers
Name Email Commits
galipremsagar s****5@g****m 2,296
David Wendt d****t@n****m 2,171
Ashwin Srinath s****a 1,866
Jake Hemstad j****d@n****m 1,747
Ram (Ramakrishna Prabhu) r****p@n****m 1,004
Vukasin Milovanovic v****c@n****m 973
brandon-b-miller b****r@n****m 916
Keith Kraus k****s@n****m 875
Mark Harris m****s@n****m 860
Devavret Makkar d****r@n****m 843
Karthikeyan Natarajan k****n 828
Olivier Lapicque o****e@n****m 599
Vyas Ramasubramani v****r@n****m 595
ptaylor p****r@m****m 584
Conor Hoekstra c****t@o****m 572
Matthew Roeschke 1****e 560
rjzamora r****7@g****m 555
Siu Kwan Lam m****k@g****m 468
Bradley Dice b****e@b****m 433
Christopher Harris c****s@n****m 407
H. Thomson Comer t****m@g****m 398
Trevor Smith t****7@g****m 376
Nick Becker n****0@g****m 344
Jeremy Dyer j****4@g****m 340
Jaime Ieong j****g@n****m 328
Dave Baranec d****c@n****m 326
Mike Wendt m****t@m****m 315
Robert (Bobby) Evans b****y@a****g 302
Andrei Schaffer a****r@n****m 274
Tadahito Kobayashi t****i@n****m 271
and 266 more...
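
The all-time figures above can be reproduced from the listed counts. The average is total commits divided by total committers. For the Development Distribution Score, the sketch below assumes the definition "one minus the top committer's share of commits"; the page does not state this formula, but it matches the all-time value of 0.923. The past-year value cannot be checked the same way because the past-year top committer count is not shown.

```python
def average_commits(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 minus the top committer's share of all commits."""
    return 1 - top_commits / total_commits


# All-time figures from the page: 29,723 commits, 296 committers,
# and 2,296 commits by the top committer (galipremsagar).
print(f"{average_commits(29_723, 296):.3f}")                   # 100.416
print(f"{development_distribution_score(2_296, 29_723):.3f}")  # 0.923
```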

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1,731
  • Total pull requests: 6,264
  • Average time to close issues: 9 months
  • Average time to close pull requests: 19 days
  • Total issue authors: 325
  • Total pull request authors: 141
  • Average comments per issue: 2.2
  • Average comments per pull request: 2.54
  • Merged pull requests: 4,277
  • Bot issues: 1
  • Bot pull requests: 431
Past Year
  • Issues: 680
  • Pull requests: 3,627
  • Average time to close issues: 15 days
  • Average time to close pull requests: 7 days
  • Issue authors: 124
  • Pull request authors: 89
  • Average comments per issue: 0.96
  • Average comments per pull request: 2.42
  • Merged pull requests: 2,482
  • Bot issues: 1
  • Bot pull requests: 333
Top Authors
Issue Authors
  • vyasr (147)
  • galipremsagar (99)
  • wence- (95)
  • GregoryKimball (89)
  • Matt711 (74)
  • mroeschke (58)
  • ttnghia (57)
  • beckernick (47)
  • revans2 (44)
  • brandon-b-miller (43)
  • MarcoGorelli (41)
  • rjzamora (38)
  • bdice (34)
  • shwina (31)
  • abellina (29)
Pull Request Authors
  • mroeschke (926)
  • davidwendt (566)
  • Matt711 (456)
  • rapids-bot[bot] (429)
  • vyasr (337)
  • galipremsagar (330)
  • bdice (293)
  • wence- (241)
  • vuule (241)
  • rjzamora (204)
  • PointKernel (159)
  • mhaseeb123 (147)
  • brandon-b-miller (138)
  • shrshi (124)
  • ttnghia (98)
Top Labels
Issue Labels
  • feature request (771)
  • bug (710)
  • libcudf (391)
  • Python (220)
  • cuIO (197)
  • cuDF (Python) (167)
  • 0 - Backlog (150)
  • question (86)
  • Performance (80)
  • ? - Needs Triage (78)
  • cudf.polars (72)
  • cudf.pandas (70)
  • helps: Spark (60)
  • improvement (60)
  • Spark (56)
  • doc (53)
  • good first issue (41)
  • pylibcudf (31)
  • non-breaking (29)
  • 2 - In Progress (29)
  • CMake (28)
  • dask (26)
  • tests (20)
  • proposal (17)
  • 1 - On Deck (16)
  • Needs Triage (15)
  • strings (15)
  • cuDF (Java) (15)
  • Java (12)
  • 0 - Waiting on Author (12)
Pull Request Labels
  • non-breaking (4,924)
  • improvement (3,043)
  • libcudf (2,241)
  • Python (2,111)
  • bug (1,210)
  • 3 - Ready for Review (1,026)
  • CMake (1,014)
  • feature request (816)
  • cuDF (Python) (713)
  • pylibcudf (550)
  • cudf.polars (540)
  • 5 - Ready to Merge (536)
  • cuIO (323)
  • breaking (292)
  • cudf.pandas (291)
  • 2 - In Progress (227)
  • Java (223)
  • strings (174)
  • doc (166)
  • ci (162)
  • Performance (148)
  • conda (126)
  • cuDF (Java) (109)
  • Spark (107)
  • cudf-polars (79)
  • 5 - DO NOT MERGE (73)
  • DO NOT MERGE (65)
  • tests (61)
  • dask (61)
  • helps: Spark (56)

Packages

  • Total packages: 12
  • Total downloads:
    • pypi 76,720 last-month
  • Total docker downloads: 41,669
  • Total dependent packages: 47
    (may contain duplicates)
  • Total dependent repositories: 38
    (may contain duplicates)
  • Total versions: 231
  • Total maintainers: 2
repo1.maven.org: ai.rapids:cudf

This project provides Java bindings for cuDF, for processing large amounts of data on a GPU. It is still a work in progress, so some APIs may change until the 1.0 release.

  • Versions: 44
  • Dependent Packages: 17
  • Dependent Repositories: 32
  • Docker Downloads: 41,381
Rankings
Stargazers count: 3.6%
Dependent packages count: 3.9%
Dependent repos count: 4.1%
Average: 4.8%
Docker downloads count: 6.2%
Forks count: 6.3%
Last synced: 6 months ago
pypi.org: cudf-cu11

cuDF - GPU Dataframe

  • Versions: 31
  • Dependent Packages: 11
  • Dependent Repositories: 2
  • Downloads: 3,046 Last month
  • Docker Downloads: 144
Rankings
Dependent packages count: 2.4%
Downloads: 3.0%
Docker downloads count: 4.3%
Average: 5.3%
Dependent repos count: 11.5%
Maintainers (2)
Last synced: 6 months ago
pypi.org: dask-cudf-cu11

Utilities for Dask and cuDF interactions

  • Versions: 29
  • Dependent Packages: 4
  • Dependent Repositories: 2
  • Downloads: 1,193 Last month
  • Docker Downloads: 144
Rankings
Dependent packages count: 3.2%
Docker downloads count: 4.3%
Average: 7.6%
Downloads: 11.3%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/rapidsai/cudf
  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: cudf-cu12

cuDF - GPU Dataframe

  • Versions: 27
  • Dependent Packages: 12
  • Dependent Repositories: 1
  • Downloads: 20,129 Last month
Rankings
Dependent packages count: 3.2%
Downloads: 3.8%
Average: 9.5%
Dependent repos count: 21.5%
Maintainers (1)
Last synced: 6 months ago
pypi.org: dask-cudf-cu12

Utilities for Dask and cuDF interactions

  • Versions: 25
  • Dependent Packages: 3
  • Dependent Repositories: 1
  • Downloads: 14,937 Last month
Rankings
Dependent packages count: 10.1%
Downloads: 13.1%
Average: 14.9%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago
pypi.org: pylibcudf-cu12

pylibcudf - Python bindings for libcudf

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 18,814 Last month
Rankings
Dependent packages count: 10.3%
Average: 34.0%
Dependent repos count: 57.8%
Maintainers (2)
Last synced: 6 months ago
pypi.org: libcudf-cu11

cuDF - GPU Dataframe (C++)

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,733 Last month
Rankings
Dependent packages count: 10.3%
Average: 34.0%
Dependent repos count: 57.8%
Maintainers (2)
Last synced: 6 months ago
pypi.org: pylibcudf-cu11

pylibcudf - Python bindings for libcudf

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 1,589 Last month
Rankings
Dependent packages count: 10.3%
Average: 34.0%
Dependent repos count: 57.8%
Maintainers (2)
Last synced: 6 months ago
pypi.org: libcudf-cu12

cuDF - GPU Dataframe (C++)

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 9,723 Last month
Rankings
Dependent packages count: 10.3%
Average: 34.0%
Dependent repos count: 57.8%
Maintainers (2)
Last synced: 7 months ago
pypi.org: cudf-polars-cu11

Executor for polars using cudf

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 52 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.4%
Dependent repos count: 60.0%
Maintainers (2)
Last synced: 6 months ago
pypi.org: cudf-polars-cu12

Executor for polars using cudf

  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 5,504 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.4%
Dependent repos count: 60.0%
Maintainers (2)
Last synced: 6 months ago
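
The summary totals at the top of the Packages section appear to be plain sums of the per-package figures listed above. This is an observation from the numbers on this page, not a documented rule; the short check below copies the per-package values and adds them up.

```python
# Monthly PyPI downloads per package, copied from the listings above.
pypi_downloads = {
    "cudf-cu11": 3_046,
    "dask-cudf-cu11": 1_193,
    "cudf-cu12": 20_129,
    "dask-cudf-cu12": 14_937,
    "pylibcudf-cu12": 18_814,
    "libcudf-cu11": 1_733,
    "pylibcudf-cu11": 1_589,
    "libcudf-cu12": 9_723,
    "cudf-polars-cu11": 52,
    "cudf-polars-cu12": 5_504,
}

# Docker downloads are reported for only three of the packages.
docker_downloads = {"ai.rapids:cudf": 41_381, "cudf-cu11": 144, "dask-cudf-cu11": 144}

# Version counts for all 12 packages, in the order they are listed.
versions = [44, 31, 29, 26, 27, 25, 8, 7, 7, 8, 9, 10]

print(sum(pypi_downloads.values()))    # 76720
print(sum(docker_downloads.values()))  # 41669
print(sum(versions))                   # 231
```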

Dependencies

.github/workflows/add_to_project.yml actions
  • actions/add-to-project v0.3.0 composite
.github/workflows/build.yaml actions
.github/workflows/jni-docker-build.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v3 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
  • docker/setup-qemu-action v2 composite
.github/workflows/labeler.yml actions
  • actions/labeler main composite
.github/workflows/new-issues-to-triage-projects.yml actions
  • docker://takanabe/github-actions-automate-projects v0.0.1 composite
.github/workflows/pr.yaml actions
.github/workflows/test.yaml actions
java/pom.xml maven
  • org.slf4j:slf4j-api 1.7.30 compile
  • org.apache.arrow:arrow-vector 0.15.1 test
  • org.apache.hadoop:hadoop-common 3.2.4 test
  • org.apache.parquet:parquet-avro 1.10.0 test
  • org.junit.jupiter:junit-jupiter-api 5.4.2 test
  • org.junit.jupiter:junit-jupiter-params 5.4.2 test
  • org.mockito:mockito-core 2.25.0 test
  • org.slf4j:slf4j-simple 1.7.30 test
pyproject.toml pypi
python/cudf/pyproject.toml pypi
  • cachetools *
  • cubinlinker *
  • cuda-python >=11.7.1,<12.0a0
  • cupy-cuda11x >=12.0.0
  • fsspec >=0.6.0
  • numba >=0.57
  • numpy >=1.21
  • nvtx >=0.2.1
  • packaging *
  • pandas >=1.3,<1.6.0dev0
  • protobuf >=4.21,<5
  • ptxcompiler *
  • pyarrow ==12.*
  • rmm ==23.10.*
  • typing_extensions >=4.0.0
python/cudf/setup.py pypi
python/cudf_kafka/pyproject.toml pypi
  • cudf ==23.10.*
python/cudf_kafka/setup.py pypi
python/custreamz/pyproject.toml pypi
  • confluent-kafka >=1.9.0,<1.10.0a0
  • cudf ==23.10.*
  • cudf_kafka ==23.10.*
  • streamz *
python/custreamz/setup.py pypi
python/dask_cudf/pyproject.toml pypi
  • cudf ==23.10.*
  • cupy-cuda11x >=12.0.0
  • dask >=2023.7.1
  • distributed >=2023.7.1
  • fsspec >=0.6.0
  • numpy >=1.21
  • pandas >=1.3,<1.6.0dev0
python/dask_cudf/setup.py pypi