airflow-provider-vineyard

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

https://github.com/v6d-io/v6d

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: acm.org
  • Committers with academic emails
    3 of 43 committers (7.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.8%) to scientific vocabulary

Keywords

big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage

Keywords from Contributors

graph-computation graph-data graph-neural-networks gremlin
Last synced: 6 months ago

Repository

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)

Basic Info
  • Host: GitHub
  • Owner: v6d-io
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Homepage: https://v6d.io
  • Size: 19.4 MB
Statistics
  • Stars: 921
  • Watchers: 27
  • Forks: 127
  • Open Issues: 116
  • Releases: 145
Topics
big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage
Created over 5 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation Security Governance

README.rst


vineyard

an in-memory immutable data manager

|Vineyard CI| |Coverage| |Docs| |FAQ| |Discussion| |Slack| |License|
|CII Best Practices| |FOSSA| |PyPI| |crates.io| |Docker HUB|
|Artifact HUB| |ACM DL|

Vineyard (v6d) is an in-memory immutable data manager that offers **out-of-the-box high-level** abstractions and **zero-copy in-memory** sharing for distributed data in various big data tasks, such as graph analytics (e.g., `GraphScope`_), numerical computing (e.g., `Mars`_), and machine learning.

.. image:: https://v6d.io/_static/cncf-color.svg
   :width: 400
   :alt: Vineyard is a CNCF sandbox project

Vineyard is a `CNCF sandbox project`_ and owes its success to its community.

Table of Contents
-----------------

* `Overview <#what-is-vineyard>`_
* `Features of vineyard <#features>`_

  * `Efficient sharing for in-memory immutable data <#efficient-in-memory-immutable-data-sharing>`_
  * `Out-of-the-box high level data structures <#out-of-the-box-high-level-data-abstractions>`_
  * `Pipelining using stream <#stream-pipelining-for-enhanced-performance>`_
  * `I/O Drivers <#versatile-drivers-for-common-tasks>`_

* `Getting started with Vineyard <#try-vineyard>`_
* `Deploying on Kubernetes <#deploying-on-kubernetes>`_
* `Frequently asked questions <#faq>`_
* `Getting involved in our community <#get-involved>`_
* `Third-party dependencies <#acknowledgements>`_

What is vineyard
----------------

Vineyard is specifically designed to facilitate zero-copy data sharing among big data systems. To illustrate this, consider a typical machine learning task of `time series prediction with LSTM`_. This task can be broken down into several steps:

- First, we read the data from the file system as a ``pandas.DataFrame``.
- Next, we apply various preprocessing tasks, such as eliminating null values, to the dataframe.
- Once the data is preprocessed, we define the model and train it on the processed dataframe using PyTorch.
- Finally, we evaluate the performance of the model.
In a single-machine environment, pandas and PyTorch, despite being two distinct systems designed for different tasks, can share data efficiently with minimal overhead, because the whole end-to-end process runs within a single Python script.

.. image:: https://v6d.io/_static/vineyard_compare.png
   :alt: Comparing the workflow with and without vineyard

What if the input data is too large to be processed on a single machine? As depicted on the left side of the figure, a common approach is to store the data as tables in a distributed file system (e.g., HDFS) and replace ``pandas`` with ETL processes using SQL over a big data system such as Hive or Spark. To share the data with PyTorch, the intermediate results are typically saved back as tables on HDFS. However, this introduces several challenges for developers:

1. For the same task, users must program for multiple systems (SQL and Python).
2. Data can be polymorphic. Non-relational data, such as tensors, dataframes, and graphs/networks (in `GraphScope`_), are becoming increasingly common. Tables and SQL may not be the most efficient way to store, exchange, or process them, and transforming the data from/to "tables" between different systems can result in significant overhead.
3. Saving/loading the data to/from external storage incurs substantial memory-copy and I/O costs.

Vineyard addresses these issues by providing:

1. **In-memory** distributed data sharing in a **zero-copy** fashion, which avoids additional I/O costs by leveraging a shared memory manager derived from plasma.
2. Built-in **out-of-the-box high-level** abstractions to share distributed data with complex structures (e.g., distributed graphs) with minimal extra development cost, while eliminating transformation costs.

The right side of the above figure shows how vineyard can be integrated to address the same task in a big data context.
First, we use `Mars`_ (a tensor-based unified framework for large-scale data computation that scales NumPy, pandas, and scikit-learn) to preprocess the raw data, as in the single-machine solution, and store the preprocessed dataframe in vineyard.

+-------------+-----------------------------------------------------------------------------+
|             | .. code-block:: python                                                      |
|   single    |                                                                             |
|             |     data_csv = pd.read_csv('./data.csv', usecols=[1])                       |
+-------------+-----------------------------------------------------------------------------+
|             | .. code-block:: python                                                      |
|             |                                                                             |
|             |     import mars.dataframe as md                                             |
| distributed |     dataset = md.read_csv('hdfs://server/data_full', usecols=[1])           |
|             |     # after preprocessing, save the dataset to vineyard                     |
|             |     vineyard_distributed_tensor_id = dataset.to_vineyard()                  |
+-------------+-----------------------------------------------------------------------------+

Then, we modify the training phase to get the preprocessed data from vineyard. Here vineyard makes sharing distributed data between `Mars`_ and PyTorch as simple as using a local variable in the single-machine solution.

+-------------+-----------------------------------------------------------------------------+
|             | .. code-block:: python                                                      |
|   single    |                                                                             |
|             |     data_X, data_Y = create_dataset(dataset)                                |
+-------------+-----------------------------------------------------------------------------+
|             | .. code-block:: python                                                      |
|             |                                                                             |
|             |     client = vineyard.connect(vineyard_ipc_socket)                          |
| distributed |     dataset = client.get(vineyard_distributed_tensor_id).local_partition()  |
|             |     data_X, data_Y = create_dataset(dataset)                                |
+-------------+-----------------------------------------------------------------------------+

Finally, we execute the training phase in a distributed manner across the cluster.

This example shows that, with vineyard, the task in the big data context can be addressed with only minor adjustments to the single-machine solution.
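The snippets above call ``create_dataset`` without defining it. In LSTM time-series tutorials such a helper usually turns a series into sliding-window samples; the sketch below assumes that behaviour. The ``look_back`` window of 2 is a hypothetical default, and plain Python lists stand in for NumPy arrays so the sketch runs without extra dependencies.

```python
from typing import List, Sequence, Tuple


def create_dataset(
    series: Sequence[float], look_back: int = 2
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical sliding-window helper for time-series prediction.

    Each input sample holds ``look_back`` consecutive values, and the
    target is the value that immediately follows that window.
    """
    if look_back < 1:
        raise ValueError("look_back must be at least 1")
    data_x: List[List[float]] = []
    data_y: List[float] = []
    for i in range(len(series) - look_back):
        data_x.append(list(series[i:i + look_back]))  # input window
        data_y.append(series[i + look_back])           # next value as target
    return data_x, data_y


# Usage on a toy series standing in for the preprocessed column.
data_X, data_Y = create_dataset([1.0, 2.0, 3.0, 4.0, 5.0])
print(data_X)  # [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(data_Y)  # [3.0, 4.0, 5.0]
```

Because the helper only slices its input, the same call works whether ``dataset`` comes from a local file or from a vineyard partition, which is the point the two tables make.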
Compared to existing approaches, vineyard effectively eliminates I/O and transformation overheads.

Features
--------

Efficient In-Memory Immutable Data Sharing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Vineyard serves as an in-memory immutable data manager, enabling efficient data sharing across different systems via shared memory without additional overheads. By eliminating serialization/deserialization and I/O costs during data exchange between systems, Vineyard significantly improves performance.

Out-of-the-Box High-Level Data Abstractions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Computation frameworks often have their own data abstractions for high-level concepts. For example, tensors can be represented as ``torch.Tensor``, ``tf.Tensor``, ``mxnet.ndarray``, etc. Moreover, every `graph processing engine `_ has its own graph structure representation. This diversity of data abstractions complicates data sharing.

Vineyard addresses this issue by providing out-of-the-box high-level data abstractions over in-memory blobs, using hierarchical metadata to describe objects. Various computation systems can leverage these built-in high-level data abstractions to exchange data with other systems in a computation pipeline concisely and efficiently.

Stream Pipelining for Enhanced Performance
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A computation does not need to wait for all preceding results to arrive before starting its work. Vineyard provides a stream as a special kind of immutable data for pipelining scenarios. The preceding job can write immutable data chunk by chunk to Vineyard while maintaining data structure semantics. The succeeding job reads shared-memory chunks from Vineyard's stream without extra copy costs and starts processing them as they arrive. This overlap reduces the overall processing time and memory consumption.
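Vineyard's own stream API is not shown in this README, so the sketch below uses only Python's standard library (``threading`` and ``queue``) as an analogy for the pipelining idea: a bounded queue plays the role of the stream, tuples play the role of immutable chunks, and a sentinel object (an assumption of this sketch) marks the end of the stream. It shows the consumer starting work before the producer has finished; it does not model shared memory or zero-copy reads.

```python
import queue
import threading
from typing import List, Sequence, Tuple

# Sentinel marking the end of the stream (an assumption of this sketch).
_END_OF_STREAM = object()


def run_pipeline(chunks: Sequence[Tuple[int, ...]]) -> Tuple[List[int], List[str]]:
    """Stream immutable chunks from a producer thread to a consumer thread.

    Returns the per-chunk results computed by the consumer and an event log
    recording the order in which chunks were produced and consumed.
    """
    # A bounded queue stands in for the stream; maxsize=1 forces the two
    # threads to interleave instead of letting the producer finish first.
    stream: "queue.Queue[object]" = queue.Queue(maxsize=1)
    events: List[str] = []
    results: List[int] = []

    def producer() -> None:
        for i, chunk in enumerate(chunks):
            stream.put(chunk)  # blocks while the consumer is behind
            events.append(f"produce {i}")
        stream.put(_END_OF_STREAM)
        events.append("produce done")

    def consumer() -> None:
        i = 0
        while True:
            chunk = stream.get()
            if chunk is _END_OF_STREAM:
                break
            events.append(f"consume {i}")
            # Work on this chunk starts before the whole stream is written.
            results.append(sum(chunk))
            i += 1

    workers = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results, events


results, events = run_pipeline([(1, 2), (3, 4), (5, 6)])
print(results)  # [3, 7, 11]
```

The ``maxsize=1`` buffer is an arbitrary choice for illustration; a larger buffer would let the producer run further ahead at the cost of holding more chunks in memory at once.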
Versatile Drivers for Common Tasks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Many big data analytical tasks involve numerous boilerplate routines that are unrelated to the computation itself, such as various I/O adapters, data partition strategies, and migration jobs. Since data structure abstractions usually differ between systems, these routines cannot be easily reused.

Vineyard provides common manipulation routines for immutable data as drivers. In addition to sharing high-level data abstractions, Vineyard extends the capability of data structures with drivers, enabling out-of-the-box reusable routines for the boilerplate parts of computation jobs.

Try Vineyard
------------

Vineyard is available as a `python package`_ and can be installed using ``pip``:

.. code:: shell

   pip3 install vineyard

For comprehensive and up-to-date documentation, please visit https://v6d.io.

If you wish to build vineyard from source, please consult the `Installation`_ guide. For instructions on building and running unit tests locally, refer to the `Contributing`_ section.

After installation, you can start a vineyard instance using the following command:

.. code:: shell

   python3 -m vineyard

For further details on connecting to a locally deployed vineyard instance, please see the `Getting Started`_ guide.

Deploying on Kubernetes
-----------------------

Vineyard is designed to efficiently share immutable data between different workloads, making it a natural fit for cloud-native computing. By embracing cloud-native big data processing and Kubernetes, Vineyard enables efficient distributed data sharing in cloud-native environments while leveraging the scaling and scheduling capabilities of Kubernetes.

To manage all components of Vineyard within a Kubernetes cluster, we have developed the Vineyard Operator. For more information, please refer to the `Vineyard Operator`_ documentation.
FAQ
---

Vineyard shares many similarities with other open-source projects, yet it also has distinct features. We often receive the following questions about Vineyard:

* Q: Can clients access the data while the stream is being filled?

  Sharing one piece of data among multiple clients is a target scenario for Vineyard, as the data stored in Vineyard is *immutable*. Multiple clients can safely consume the same piece of data through memory sharing, without incurring extra costs or additional memory usage from copying data back and forth.

* Q: How does Vineyard avoid serialization/deserialization between systems in different languages?

  Vineyard provides high-level data abstractions (e.g., ndarrays, dataframes) that can be naturally shared between different processes, eliminating the need for serialization and deserialization between systems in different languages.

* ......

For more detailed information, please refer to our `FAQ`_ page.

Get Involved
------------

- Join the `CNCF Slack`_ and participate in the ``#vineyard`` channel for discussions and collaboration.
- Familiarize yourself with our `contribution guide`_ to understand the process of contributing to vineyard.
- If you encounter any bugs or issues, please report them by submitting a `GitHub issue`_ or start a conversation on `GitHub discussion`_.
- We welcome and appreciate your contributions! Submit them using pull requests.

Thank you in advance for your valuable contributions to vineyard!

Publications
------------

- Wenyuan Yu, Tao He, Lei Wang, Ke Meng, Ye Cao, Diwen Zhu, Sanhong Li, Jingren Zhou. `Vineyard: Optimizing Data Sharing in Data-Intensive Analytics `_. ACM SIG Conference on Management of Data (SIGMOD), industry, 2023. |ACM DL|.

If you use this software, please cite our paper using the following metadata:

.. code:: bibtex

   @article{yu2023vineyard,
     author = {Yu, Wenyuan and He, Tao and Wang, Lei and Meng, Ke and Cao, Ye and Zhu, Diwen and Li, Sanhong and Zhou, Jingren},
     title = {Vineyard: Optimizing Data Sharing in Data-Intensive Analytics},
     year = {2023},
     issue_date = {June 2023},
     publisher = {Association for Computing Machinery},
     address = {New York, NY, USA},
     volume = {1},
     number = {2},
     url = {https://doi.org/10.1145/3589780},
     doi = {10.1145/3589780},
     journal = {Proc. ACM Manag. Data},
     month = {jun},
     articleno = {200},
     numpages = {27},
     keywords = {data sharing, in-memory object store}
   }

Acknowledgements
----------------

We thank the following excellent open-source projects:

- `apache-arrow `_, a cross-language development platform for in-memory analytics.
- `boost-leaf `_, a lightweight C++ error augmentation framework.
- `cityhash `_, a family of hash functions for strings.
- `dlmalloc `_, Doug Lea's memory allocator.
- `etcd-cpp-apiv3 `_, a C++ API for etcd's v3 client API.
- `flat_hash_map `_, an efficient hashmap implementation.
- `gulrak/filesystem `_, an implementation of C++17 ``std::filesystem``.
- `libcuckoo `_, a high-performance, concurrent hash table.
- `mimalloc `_, a general-purpose allocator with excellent performance characteristics.
- `nlohmann/json `_, a JSON library for modern C++.
- `pybind11 `_, a library for seamless interoperability between C++11 and Python.
- `s3fs `_, a library providing a convenient Python filesystem interface for S3.
- `skywalking-infra-e2e `_, an end-to-end testing framework.
- `skywalking-swck `_, a Kubernetes operator for Apache SkyWalking.
- `wyhash `_, a C++ wrapper around wyhash and wyrand.
- `rax `_, an ANSI C radix tree implementation.
- `MurmurHash3 `_, a fast non-cryptographic hash function.

License
-------

**Vineyard** is distributed under `Apache License 2.0`_. Please note that third-party libraries may not have the same license as vineyard.

|FOSSA Status|

.. _Mars: https://github.com/mars-project/mars
.. _GraphScope: https://github.com/alibaba/GraphScope
.. _Installation: https://github.com/v6d-io/v6d/blob/main/docs/notes/developers/build-from-source.rst
.. _Contributing: https://github.com/v6d-io/v6d/blob/main/CONTRIBUTING.rst
.. _Getting Started: https://v6d.io/notes/getting-started.html
.. _Vineyard Operator: https://v6d.io/notes/cloud-native/vineyard-operator.html
.. _Apache License 2.0: https://github.com/v6d-io/v6d/blob/main/LICENSE
.. _contribution guide: https://github.com/v6d-io/v6d/blob/main/CONTRIBUTING.rst
.. _time series prediction with LSTM: https://github.com/L1aoXingyu/code-of-learn-deep-learning-with-pytorch/blob/master/chapter5_RNN/time-series/lstm-time-series.ipynb
.. _python package: https://pypi.org/project/vineyard/
.. _CNCF Slack: https://slack.cncf.io/
.. _GitHub issue: https://github.com/v6d-io/v6d/issues/new
.. _GitHub discussion: https://github.com/v6d-io/v6d/discussions/new
.. _FAQ: https://v6d.io/notes/faq.html
.. _CNCF sandbox project: https://www.cncf.io/sandbox-projects/

.. |Vineyard CI| image:: https://github.com/v6d-io/v6d/actions/workflows/build-test.yml/badge.svg
   :target: https://github.com/v6d-io/v6d/actions/workflows/build-test.yml
.. |Coverage| image:: https://codecov.io/gh/v6d-io/v6d/branch/main/graph/badge.svg
   :target: https://codecov.io/gh/v6d-io/v6d
.. |Docs| image:: https://img.shields.io/badge/docs-latest-brightgreen.svg
   :target: https://v6d.io
.. |FAQ| image:: https://img.shields.io/badge/-FAQ-blue?logo=Read%20The%20Docs
   :target: https://v6d.io/notes/faq.html
.. |Discussion| image:: https://img.shields.io/badge/Discuss-Ask%20Questions-blue?logo=GitHub
   :target: https://github.com/v6d-io/v6d/discussions
.. |Slack| image:: https://img.shields.io/badge/Slack-Join%20%23vineyard-purple?logo=Slack
   :target: https://slack.cncf.io/
.. |PyPI| image:: https://img.shields.io/pypi/v/vineyard?color=blue
   :target: https://pypi.org/project/vineyard
.. |crates.io| image:: https://img.shields.io/crates/v/vineyard.svg
   :target: https://crates.io/crates/vineyard
.. |Docker HUB| image:: https://img.shields.io/badge/docker-ready-blue.svg
   :target: https://hub.docker.com/u/vineyardcloudnative
.. |Artifact HUB| image:: https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/vineyard
   :target: https://artifacthub.io/packages/helm/vineyard/vineyard
.. |CII Best Practices| image:: https://bestpractices.coreinfrastructure.org/projects/4902/badge
   :target: https://bestpractices.coreinfrastructure.org/projects/4902
.. |FOSSA| image:: https://app.fossa.com/api/projects/git%2Bgithub.com%2Fv6d-io%2Fv6d.svg?type=shield
   :target: https://app.fossa.com/projects/git%2Bgithub.com%2Fv6d-io%2Fv6d?ref=badge_shield
.. |FOSSA Status| image:: https://app.fossa.com/api/projects/git%2Bgithub.com%2Fv6d-io%2Fv6d.svg?type=large
   :target: https://app.fossa.com/projects/git%2Bgithub.com%2Fv6d-io%2Fv6d?ref=badge_large
.. |License| image:: https://img.shields.io/github/license/v6d-io/v6d
   :target: https://github.com/v6d-io/v6d/blob/main/LICENSE
.. |ACM DL| image:: https://img.shields.io/badge/ACM%20DL-10.1145%2F3589780-blue
   :target: https://dl.acm.org/doi/10.1145/3589780

Owner

  • Name: v6d.io
  • Login: v6d-io
  • Kind: organization
  • Email: info@cncf.io

An in-memory immutable data manager.

Citation (CITATION.cff)

cff-version: 1.2.0
message: >-
  If you use this software, please cite our paper using the
  metadata from this file.
title: 'Vineyard: Optimizing Data Sharing in Data-Intensive Analytics'
authors:
  - given-names: Wenyuan
    family-names: Yu
    affiliation: Alibaba Group
  - given-names: Tao
    family-names: He
    affiliation: Alibaba Group
  - given-names: Lei
    family-names: Wang
    affiliation: Alibaba Group
  - given-names: Ke
    family-names: Meng
    affiliation: Alibaba Group
  - given-names: Ye
    family-names: Cao
    affiliation: Alibaba Group
  - given-names: Diwen
    family-names: Zhu
    affiliation: Alibaba Group
  - given-names: Sanhong
    family-names: Li
    affiliation: Alibaba Group
  - given-names: Jingren
    family-names: Zhou
    affiliation: Alibaba Group
license: Apache-2.0
identifiers:
  - type: doi
    value: 10.1145/3589780
repository-code: 'https://github.com/v6d-io/v6d'
url: 'https://v6d.io'
abstract: >-
  Modern data analytics and AI jobs become increasingly complex and involve
  multiple tasks performed on specialized systems. Sharing of intermediate
  data between different systems is often a significant bottleneck in such
  jobs. When the intermediate data is large, it is mostly exchanged through
  files in standard formats (e.g., CSV and ORC), causing high I/O and
  (de)serialization overheads. To solve these problems, we develop Vineyard,
  a high-performance, extensible, and cloud-native object store, trying to
  provide an intuitive experience for users to share data across systems in
  complex real-life workflows. Since different systems usually work on data
  structures (e.g., dataframes, graphs, hashmaps) with similar interfaces,
  and their computation logic is often loosely-coupled with how such interfaces
  are implemented over specific memory layouts, it enables Vineyard to conduct
  data sharing efficiently at a high level via memory mapping and method sharing.
  Vineyard provides an IDL named VCDL to facilitate users to register their
  own intermediate data types into Vineyard such that objects of the registered
  types can then be efficiently shared across systems in a polyglot workflow.
  As a cloud-native system, Vineyard is designed to work closely with Kubernetes,
  as well as achieve fault-tolerance and high performance in production
  environments. Evaluations on real-life datasets and data analytics jobs show
  that the above optimizations of Vineyard can significantly improve the end-to-end
  performance of data analytics jobs, by reducing their data-sharing time up
  to 68.4x.
preferred-citation:
  type: article
  title: 'Vineyard: Optimizing Data Sharing in Data-Intensive Analytics'
  authors:
  - given-names: Wenyuan
    family-names: Yu
    affiliation: Alibaba Group
  - given-names: Tao
    family-names: He
    affiliation: Alibaba Group
  - given-names: Lei
    family-names: Wang
    affiliation: Alibaba Group
  - given-names: Ke
    family-names: Meng
    affiliation: Alibaba Group
  - given-names: Ye
    family-names: Cao
    affiliation: Alibaba Group
  - given-names: Diwen
    family-names: Zhu
    affiliation: Alibaba Group
  - given-names: Sanhong
    family-names: Li
    affiliation: Alibaba Group
  - given-names: Jingren
    family-names: Zhou
    affiliation: Alibaba Group
  year: 2023
  journal: "Proc. ACM Manag. Data"
  doi: 10.1145/3589780
  month: 06
  volume: 1
  number: 2
  publisher:
    name: Association for Computing Machinery
  keywords:
  - data sharing
  - in-memory object store

GitHub Events

Total
  • Create event: 8
  • Release event: 2
  • Issues event: 29
  • Watch event: 83
  • Delete event: 4
  • Issue comment event: 172
  • Push event: 18
  • Pull request review event: 5
  • Pull request review comment event: 2
  • Pull request event: 26
  • Fork event: 7
Last Year
  • Create event: 8
  • Release event: 2
  • Issues event: 29
  • Watch event: 83
  • Delete event: 4
  • Issue comment event: 172
  • Push event: 18
  • Pull request review event: 5
  • Pull request review comment event: 2
  • Pull request event: 26
  • Fork event: 7

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 994
  • Total Committers: 43
  • Avg Commits per committer: 23.116
  • Development Distribution Score (DDS): 0.48
Top Committers
Name Email Commits
Tao He l****t@a****m 517
Tao He s****w@g****m 207
Ye Cao c****o@a****m 60
Siyuan Zhang s****y@a****m 30
Siyuan Zhang s****2@g****m 27
Diwen Zhu d****w@a****m 25
Rohan Gupta r****u@g****m 13
Liang Geng g****l@a****m 12
Ke Meng s****k@g****m 12
luoxiaojian l****1@a****m 9
Chaitravi Chalke 6****e@u****m 9
Weibin Zeng q****b@a****m 9
dependabot[bot] 4****]@u****m 6
Weibin Zeng w****n@g****m 6
Sijie 5****p@u****m 6
Zhang Lei z****9@s****n 4
DongZe Li 9****6@q****m 4
Pei Li 7****i@u****m 4
Yitao Wang 4****W@u****m 3
Jiang Shanshan u****e@o****m 3
linlih 3****h@u****m 3
Liang Geng p****g@g****m 2
ShiHao 1****x@u****m 2
liusitan s****g@o****m 2
Diwen Zhu d****u@g****m 1
HouliangQi n****n@1****m 1
Rayan y****n@1****m 1
Jingbo Xu x****7@g****m 1
Wenyuan Yu 1****0@q****m 1
Sutou Kouhei k****u@c****g 1
and 13 more...
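The "All Time" summary figures can be cross-checked against this table. The sketch below assumes DDS is defined as one minus the top committer's share of all commits; that definition is an assumption, but it reproduces the reported 0.48 only when each email address counts as a separate committer (merging Tao He's two addresses would give 1 - 724/994, roughly 0.27).

```python
def development_distribution_score(top_commits: int, total_commits: int) -> float:
    """Assumed DDS definition: 1 - (commits by the top committer / all commits)."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    return 1 - top_commits / total_commits


TOTAL_COMMITS = 994
TOTAL_COMMITTERS = 43
TOP_COMMITS = 517  # "Tao He", first row of the Top Committers table

dds = development_distribution_score(TOP_COMMITS, TOTAL_COMMITS)
avg_commits = TOTAL_COMMITS / TOTAL_COMMITTERS
print(f"DDS: {dds:.2f}")  # DDS: 0.48
print(f"Avg commits per committer: {avg_commits:.3f}")  # Avg commits per committer: 23.116
```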

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 300
  • Total pull requests: 428
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 7 days
  • Total issue authors: 38
  • Total pull request authors: 20
  • Average comments per issue: 1.54
  • Average comments per pull request: 0.94
  • Merged pull requests: 383
  • Bot issues: 0
  • Bot pull requests: 25
Past Year
  • Issues: 33
  • Pull requests: 30
  • Average time to close issues: 4 days
  • Average time to close pull requests: 8 days
  • Issue authors: 17
  • Pull request authors: 6
  • Average comments per issue: 3.91
  • Average comments per pull request: 0.63
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 9
Top Authors
Issue Authors
  • dashanji (90)
  • sighingnow (81)
  • vegetableysm (50)
  • qiranq99 (8)
  • fduxzbin (5)
  • Daniel-blue (4)
  • hsh258 (4)
  • acezen (3)
  • wuyueandrew (3)
  • wangsq (2)
  • 1076881851 (2)
  • cho-m (2)
  • ysbqiaqia (2)
  • raulcd (2)
  • chenrui333 (2)
Pull Request Authors
  • dashanji (247)
  • sighingnow (136)
  • vegetableysm (97)
  • dependabot[bot] (29)
  • siyuan0322 (14)
  • acezen (9)
  • zhuyi1159 (6)
  • songqing (4)
  • chenrui333 (3)
  • SighingSnow (3)
  • septicmk (3)
  • wuyueandrew (2)
  • pmokeev (2)
  • BSWANG (2)
  • Rajdeep1311 (2)
Top Labels
Issue Labels
enhancement (92) bug (71) kubernetes (47) llm (41) stale (35) component:vineyardd (22) component:python (19) documentation (18) component:graph (14) component:client (14) good first issue (13) contrib (12) dev-infra (12) performance (10) RDMA (9) component:io (9) proposal (9) upstream (8) priority:high (8) component:hive (7) component:go (3) community (3) question (3) blog (2) component:java (2) priority:medium (2) component:rust (1) needs-more-info (1) gpu (1) dependencies (1)
Pull Request Labels
dependencies (29) go (17) stale (15) bug (10) RDMA (10) java (9) component:vineyardd (4) llm (3) github_actions (2) documentation (1) kubernetes (1) contrib (1)

Packages

  • Total packages: 17
  • Total downloads:
    • pypi 16,427 last-month
    • cargo 18,567 total
  • Total dependent packages: 19
    (may contain duplicates)
  • Total dependent repositories: 37
    (may contain duplicates)
  • Total versions: 927
  • Total maintainers: 3
pypi.org: vineyard

An in-memory immutable data manager

  • Versions: 128
  • Dependent Packages: 12
  • Dependent Repositories: 17
  • Downloads: 7,295 Last month
  • Docker Downloads: 0
Rankings
Dependent packages count: 0.8%
Stargazers count: 2.3%
Average: 2.9%
Downloads: 3.0%
Dependent repos count: 3.5%
Docker downloads count: 3.6%
Forks count: 4.4%
Maintainers (2)
Last synced: 6 months ago
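The "Average" row in each Rankings list appears to be the plain arithmetic mean of the other percentile rankings in that list; this is inferred from the numbers, not from a documented definition. The sketch below checks it for the vineyard PyPI package above.

```python
from statistics import mean

# Percentile rankings listed above for the "vineyard" PyPI package,
# leaving out the reported "Average" row itself.
vineyard_rankings = {
    "Dependent packages count": 0.8,
    "Stargazers count": 2.3,
    "Downloads": 3.0,
    "Dependent repos count": 3.5,
    "Docker downloads count": 3.6,
    "Forks count": 4.4,
}

average = mean(vineyard_rankings.values())
print(f"Average: {average:.1f}%")  # Average: 2.9%
```

The same check holds for the other packages, for example vineyard-io (mean of 2.3, 2.4, 3.6, 4.0, 4.4 and 5.0 rounds to the listed 3.6%).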
pypi.org: vineyard-io

IO drivers for vineyard

  • Versions: 112
  • Dependent Packages: 3
  • Dependent Repositories: 13
  • Downloads: 846 Last month
  • Docker Downloads: 0
Rankings
Stargazers count: 2.3%
Dependent packages count: 2.4%
Docker downloads count: 3.6%
Average: 3.6%
Dependent repos count: 4.0%
Forks count: 4.4%
Downloads: 5.0%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/v6d-io/v6d/go/vineyard
  • Versions: 18
  • Dependent Packages: 1
  • Dependent Repositories: 1
Rankings
Stargazers count: 2.3%
Forks count: 2.6%
Average: 3.9%
Dependent repos count: 4.7%
Dependent packages count: 5.8%
Last synced: 6 months ago
proxy.golang.org: github.com/v6d-io/v6d/k8s
  • Versions: 113
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 1.5%
Forks count: 1.7%
Average: 4.9%
Dependent packages count: 7.0%
Dependent repos count: 9.3%
Last synced: 6 months ago
proxy.golang.org: github.com/v6d-io/v6d
  • Versions: 144
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: vineyard-kedro

Vineyard provider for kedro

  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 61 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.4%
Average: 9.2%
Dependent packages count: 10.0%
Dependent repos count: 11.6%
Downloads: 17.9%
Maintainers (1)
Last synced: 6 months ago
pypi.org: airflow-provider-vineyard

Vineyard provider for apache-airflow

  • Versions: 97
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 541 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.4%
Downloads: 9.2%
Average: 9.5%
Dependent packages count: 10.0%
Dependent repos count: 21.8%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-bdist

An in-memory immutable data manager

  • Versions: 43
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 4,403 Last month
  • Docker Downloads: 0
Rankings
Stargazers count: 2.3%
Forks count: 4.7%
Downloads: 5.3%
Dependent packages count: 6.6%
Average: 9.9%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-dask

Vineyard integration with Dask

  • Versions: 96
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 184 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.4%
Dependent packages count: 10.0%
Average: 10.2%
Downloads: 12.7%
Dependent repos count: 21.8%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-ml

Vineyard integration with machine learning frameworks

  • Versions: 98
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 2,770 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.4%
Dependent packages count: 10.0%
Average: 10.4%
Downloads: 13.5%
Dependent repos count: 21.8%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-migrate

Object migration drivers for vineyard

  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 82 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.4%
Dependent packages count: 10.1%
Average: 14.2%
Dependent repos count: 21.5%
Downloads: 32.5%
Maintainers (2)
Last synced: 6 months ago
pypi.org: vineyard-ray

Vineyard integration with Ray

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 6 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.7%
Dependent packages count: 6.6%
Average: 18.8%
Dependent repos count: 30.6%
Downloads: 49.9%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-pyspark

Vineyard integration with PySpark

  • Versions: 20
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 93 Last month
Rankings
Stargazers count: 2.3%
Forks count: 4.5%
Dependent packages count: 7.4%
Average: 20.9%
Dependent repos count: 69.2%
Maintainers (1)
Last synced: 6 months ago
crates.io: vineyard-datafusion

Vineyard Rust SDK: arrow datafusion integration for DataFrame

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,411 Total
Rankings
Stargazers count: 6.6%
Forks count: 6.8%
Dependent repos count: 30.3%
Dependent packages count: 31.6%
Average: 34.7%
Downloads: 98.1%
Maintainers (1)
Last synced: 6 months ago
crates.io: vineyard-polars

Vineyard Rust SDK: polars integration for DataFrame

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 6,690 Total
Rankings
Stargazers count: 6.6%
Forks count: 6.8%
Dependent repos count: 30.4%
Dependent packages count: 31.8%
Average: 34.7%
Downloads: 98.2%
Maintainers (1)
Last synced: 6 months ago
pypi.org: vineyard-llm

Vineyard llm kv cache

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 146 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.9%
Dependent repos count: 61.0%
Maintainers (1)
Last synced: 6 months ago
crates.io: vineyard

Vineyard Rust SDK: core library

  • Versions: 8
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 9,466 Total
Rankings
Dependent repos count: 28.9%
Dependent packages count: 34.0%
Average: 54.1%
Downloads: 99.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

Cargo.toml cargo
  • serde 1.0
  • serde_derive 1.0
  • serde_json 1.0
rust/vineyard/Cargo.toml cargo
  • arrow 5.0
  • dyn-clone 1.0
  • lazy_static 1.4.0
  • rand 0.8.0
  • serde 1.0
  • serde_derive 1.0
  • serde_json 1.0
go/vineyard/go.mod go
  • github.com/apache/arrow/go/arrow v0.0.0-20210806232545-fe0861f127cf
  • github.com/google/go-cmp v0.5.6
  • github.com/pkg/errors v0.8.1
  • golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1
  • gotest.tools/v3 v3.0.3
go/vineyard/go.sum go
  • 146 dependencies
k8s/go.mod go
  • github.com/go-logr/logr v0.4.0
  • github.com/googleapis/gnostic=>github.com/googleapis/gnostic v0.4.2
  • github.com/onsi/ginkgo v1.16.4
  • github.com/onsi/gomega v1.15.0
  • k8s.io/api v0.20.2
  • k8s.io/api=>k8s.io/api v0.19.11
  • k8s.io/apiextensions-apiserver=>k8s.io/apiextensions-apiserver v0.19.11
  • k8s.io/apimachinery v0.20.2
  • k8s.io/apimachinery=>k8s.io/apimachinery v0.19.11
  • k8s.io/apiserver=>k8s.io/apiserver v0.19.11
  • k8s.io/cli-runtime=>k8s.io/cli-runtime v0.19.11
  • k8s.io/client-go v0.20.2
  • k8s.io/client-go=>k8s.io/client-go v0.19.11
  • k8s.io/cloud-provider=>k8s.io/cloud-provider v0.19.11
  • k8s.io/cluster-bootstrap=>k8s.io/cluster-bootstrap v0.19.11
  • k8s.io/code-generator v0.19.11
  • k8s.io/code-generator=>k8s.io/code-generator v0.19.11
  • k8s.io/component-base=>k8s.io/component-base v0.19.11
  • k8s.io/cri-api=>k8s.io/cri-api v0.19.11
  • k8s.io/csi-translation-lib=>k8s.io/csi-translation-lib v0.19.11
  • k8s.io/klog/v2 v2.2.0
  • k8s.io/kube-aggregator=>k8s.io/kube-aggregator v0.19.11
  • k8s.io/kube-controller-manager=>k8s.io/kube-controller-manager v0.19.11
  • k8s.io/kube-proxy=>k8s.io/kube-proxy v0.19.11
  • k8s.io/kube-scheduler v0.19.11
  • k8s.io/kube-scheduler=>k8s.io/kube-scheduler v0.19.11
  • k8s.io/kubectl=>k8s.io/kubectl v0.19.11
  • k8s.io/kubelet=>k8s.io/kubelet v0.19.11
  • k8s.io/kubernetes v0.19.11
  • k8s.io/kubernetes=>k8s.io/kubernetes v1.19.11
  • k8s.io/legacy-cloud-providers=>k8s.io/legacy-cloud-providers v0.19.11
  • k8s.io/metrics=>k8s.io/metrics v0.19.11
  • k8s.io/sample-apiserver=>k8s.io/sample-apiserver v0.19.11
  • sigs.k8s.io/controller-runtime v0.8.3
k8s/go.sum go
  • 797 dependencies
java/core/pom.xml maven
  • org.projectlombok:lombok provided
  • ch.qos.logback:logback-classic
  • ch.qos.logback:logback-core
  • com.fasterxml.jackson.core:jackson-annotations
  • com.fasterxml.jackson.core:jackson-core
  • com.fasterxml.jackson.core:jackson-databind
  • com.github.jnr:jnr-posix
  • com.github.jnr:jnr-unixsocket
  • com.google.guava:guava
  • org.scijava:native-lib-loader
  • org.slf4j:slf4j-api
  • junit:junit test
java/modules/basic/pom.xml maven
  • org.projectlombok:lombok provided
  • ch.qos.logback:logback-classic
  • ch.qos.logback:logback-core
  • com.google.guava:guava
  • io.v6d.core:vineyard-core
  • org.apache.arrow:arrow-memory 5.0.0
  • org.apache.arrow:arrow-memory-core 5.0.0
  • org.apache.arrow:arrow-memory-netty 5.0.0
  • org.apache.arrow:arrow-memory-unsafe 5.0.0
  • org.apache.arrow:arrow-vector 5.0.0
  • org.apache.commons:commons-lang3
  • org.slf4j:slf4j-api
  • junit:junit test
java/modules/graph/pom.xml maven
  • junit:junit test
java/modules/pom.xml maven
  • io.v6d.core:vineyard-core 0.1-SNAPSHOT
java/pom.xml maven
  • org.projectlombok:lombok 1.18.20 provided
  • ch.qos.logback:logback-classic 1.2.9
  • ch.qos.logback:logback-core 1.2.9
  • com.fasterxml.jackson.core:jackson-annotations 2.12.6.1
  • com.fasterxml.jackson.core:jackson-core 2.12.6.1
  • com.fasterxml.jackson.core:jackson-databind 2.12.6.1
  • com.github.jnr:jnr-posix 3.1.7
  • com.github.jnr:jnr-unixsocket 0.38.8
  • com.google.guava:guava 31.0.1-jre
  • org.apache.commons:commons-lang3 3.12.0
  • org.scijava:native-lib-loader 2.3.5
  • org.slf4j:slf4j-api 1.7.32
  • junit:junit 4.13.1 test
packages-java/pom.xml maven
  • org.projectlombok:lombok 1.18.22 provided
  • com.alibaba.fastffi:annotation-processor 0.1
  • com.alibaba.fastffi:binding-generator 0.1
  • com.alibaba.fastffi:ffi 0.1
  • com.alibaba.fastffi:llvm4jni-runtime 0.1
  • com.google.guava:guava 31.0.1-jre
  • junit:junit 4.13.2 test
  • org.apache.arrow:arrow-memory-core 6.0.0 test
  • org.apache.arrow:arrow-memory-netty 6.0.0 test
  • org.apache.arrow:arrow-vector 6.0.0 test
setup.py pypi
  • argcomplete *
  • etcd-distro *
  • numpy >=0.18.5
  • pandas <1.0.0
  • pandas <1.2.0
  • pandas >=1.0.0
  • pickle5 *
  • psutil *
  • pyarrow *
  • setuptools *
  • shared-memory38 *
  • sortedcontainers *
  • treelib *
.github/workflows/build-archlinux-latest.yml actions
  • actions/checkout v3 composite
  • mxschmitt/action-tmate v2 composite
.github/workflows/build-centos-latest.yaml actions
  • actions/checkout v3 composite
  • mxschmitt/action-tmate v2 composite
.github/workflows/build-compatibility.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • mxschmitt/action-tmate v2 composite
.github/workflows/build-test.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • codecov/codecov-action v3 composite
  • mxschmitt/action-tmate v2 composite
  • sighingnow/action-tmate master composite
  • svenstaro/upload-release-action v2 composite
.github/workflows/build-vineyardd-and-wheels-linux.yaml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/upload-artifact v3 composite
  • svenstaro/upload-release-action v2 composite
.github/workflows/build-vineyardd-and-wheels-macos.yaml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • svenstaro/upload-release-action v2 composite
.github/workflows/codeball.yml actions
  • sturdy-dev/codeball-action main composite
.github/workflows/docs.yaml actions
  • actions-cool/maintain-one-comment v3 composite
  • actions/checkout v3 composite
  • netlify/actions/cli master composite
.github/workflows/release-latest.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • marvinpinto/action-automatic-releases latest composite
  • svenstaro/upload-release-action v2 composite
.github/workflows/rust-ci.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite
.github/workflows/vineyard-operator.yaml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-go v3 composite
  • docker/login-action v2 composite
k8s/Dockerfile docker
  • gcr.io/distroless/static nonroot build
  • golang 1.18 build
k8s/test/e2e/Dockerfile docker
  • ghcr.io/v6d-io/v6d/vineyard-python-dev latest build
python/vineyard/contrib/airflow/docker/Dockerfile docker
  • apache/airflow 2.3.2-python3.9 build
java/spark/pom.xml maven
  • org.apache.spark:spark-core_2.12 3.2.2 provided
  • org.apache.spark:spark-sql_2.12 3.2.2 provided
  • io.v6d.core:vineyard-core 0.1-SNAPSHOT
  • io.v6d.modules:vineyard-basic 0.1-SNAPSHOT
  • org.apache.arrow:arrow-memory
  • org.apache.arrow:arrow-memory-core
  • org.apache.arrow:arrow-memory-netty
  • org.apache.arrow:arrow-memory-unsafe
  • org.apache.arrow:arrow-vector
  • org.scala-lang:scala-library 2.12.15
  • org.scala-lang:scala-reflect 2.12.15
  • org.scalatestplus:scalatestplus-junit_2.12 1.0.0-M2
  • junit:junit test
  • org.scalatest:scalatest_2.12 3.2.14 test
docker/dev/build_scripts/requirements.txt pypi
  • argcomplete * development
  • black * development
  • breathe * development
  • docutils ==0.16 development
  • etcd-distro * development
  • flake8 * development
  • furo * development
  • isort * development
  • jinja2 >=3.0.0 development
  • libclang * development
  • nbsphinx * development
  • numpy >=1.18.5 development
  • pandas <1.0.0 development
  • pandas <1.2.0 development
  • pandas >=1.0.0 development
  • parsec * development
  • pickle5 * development
  • psutil * development
  • pygments >=2.4.1 development
  • pytest * development
  • pytest-benchmark * development
  • pytest-datafiles * development
  • setuptools * development
  • shared-memory38 * development
  • sortedcontainers * development
  • sphinx >=3.0.2 development
  • sphinx-copybutton * development
  • sphinx-panels * development
  • sphinxemoji * development
  • sphinxext-opengraph * development
  • treelib * development
  • wheel * development
requirements-dev.txt pypi
  • black * development
  • breathe * development
  • docutils ==0.16 development
  • flake8 * development
  • furo * development
  • isort * development
  • jinja2 >=3.0.0 development
  • libclang * development
  • nbsphinx * development
  • pygments >=2.4.1 development
  • pytest * development
  • pytest-benchmark * development
  • pytest-datafiles * development
  • sphinx >=3.0.2,<6 development
  • sphinx-copybutton * development
  • sphinx-panels * development
  • sphinxemoji * development
  • sphinxext-opengraph * development
requirements-kubernetes.txt pypi
  • kubernetes *
requirements-setup.txt pypi
  • libclang *
  • parsec *
  • setuptools *
  • wheel *
requirements.txt pypi
  • argcomplete *
  • etcd-distro *
  • numpy >=1.18.5
  • pandas <1.0.0
  • pandas <1.2.0
  • pandas >=1.0.0
  • pickle5 *
  • psutil *
  • pyarrow *
  • setuptools *
  • shared-memory38 *
  • sortedcontainers *
  • treelib *