https://github.com/OpenDCAI/DataFlow

Easy Data Preparation with latest LLMs-based Operators and Pipelines.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.9%) to scientific vocabulary

Keywords

data data-agent data-cleaning data-pipelines data-processing data-science data-synthesis gradio-interface llms operators quick-data-processing sglang-bankend vllm-backend
Last synced: 5 months ago

Repository

Statistics
  • Stars: 1,178
  • Watchers: 15
  • Forks: 77
  • Open Issues: 13
  • Releases: 7
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License

README-dev.md

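DataFlow-Preview Developer Documentation

Start by creating a clean runtime environment with python==3.10.

Then clone this repository and install it locally:
    pip install -e .

After installation, you can verify that it was installed correctly with the following commands:
    dataflow -v
    dataflow env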

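How to test the reasoning pipeline

The test entry point is currently /test/test_reasoning.py, which by default uses /dataflow/example/ReasoningPipeline/pipeline_math_short.json as sample input.

Export the key to the system as a global environment variable:
    export API_KEY=<your key>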
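
A script that depends on this variable can fail early with a clear message when it is missing. The following is a minimal sketch, not code from the repository: the helper name `require_api_key` is hypothetical, and only the variable name `API_KEY` comes from the step above.

```python
import os


def require_api_key(env=None, name="API_KEY"):
    """Return the key stored in `name`, or raise if it is unset or blank.

    `require_api_key` is a hypothetical helper for illustration; it is not
    part of DataFlow.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; run `export {name}=<your key>` first"
        )
    return value


if __name__ == "__main__":
    try:
        key = require_api_key()
    except RuntimeError as err:
        print(err)
    else:
        # Never print the key itself; report only that it was found.
        print(f"API_KEY found ({len(key)} characters)")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real process environment.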

Then change the working directory to /test and run the script directly to try out a very short pipeline:
    python test_reasoning.py

Owner

  • Name: OpenDCAI
  • Login: OpenDCAI
  • Kind: organization
  • Email: PKU_DCML@hotmail.com

Define the future of Data-centric AI together

GitHub Events

Total
  • Create event: 16
  • Issues event: 42
  • Release event: 5
  • Watch event: 571
  • Delete event: 5
  • Member event: 1
  • Issue comment event: 71
  • Push event: 132
  • Pull request review comment event: 30
  • Pull request review event: 48
  • Pull request event: 214
  • Fork event: 49
Last Year
  • Create event: 16
  • Issues event: 42
  • Release event: 5
  • Watch event: 571
  • Delete event: 5
  • Member event: 1
  • Issue comment event: 71
  • Push event: 132
  • Pull request review comment event: 30
  • Pull request review event: 48
  • Pull request event: 214
  • Fork event: 49

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 117
  • Average time to close issues: 3 days
  • Average time to close pull requests: about 10 hours
  • Total issue authors: 22
  • Total pull request authors: 27
  • Average comments per issue: 0.66
  • Average comments per pull request: 0.2
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 29
  • Pull requests: 117
  • Average time to close issues: 3 days
  • Average time to close pull requests: about 10 hours
  • Issue authors: 22
  • Pull request authors: 27
  • Average comments per issue: 0.66
  • Average comments per pull request: 0.2
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • SunnyHaze (7)
  • ElsaReedz (2)
  • ninjaX2o (1)
  • foxFallingSkies (1)
  • linzx3501 (1)
  • tpoisonooo (1)
  • samure1995 (1)
  • jansonheal (1)
  • miharakator (1)
  • ixeraby (1)
  • ackevjajameda (1)
  • miaode74 (1)
  • acie-1 (1)
  • haolpku (1)
  • lcysyzxdxc (1)
Pull Request Authors
  • haolpku (12)
  • zzy1127 (12)
  • ZhaoyangHan04 (11)
  • SunnyHaze (10)
  • Qmeiyi (8)
  • MOLYHECI (7)
  • wongzhenhao (7)
  • HeRunming (7)
  • scuuy (6)
  • TechNomad-ds (6)
  • gty1829 (5)
  • YqjMartin (4)
  • DeepMindLiuZhou (4)
  • leaderwolfpipi (3)
  • Yalin-Feng (2)
Top Labels
Issue Labels
bug (9) enhancement (8) question (4)
Pull Request Labels
enhancement (1)

Dependencies

requirements.txt pypi
  • PyYAML ==6.0.2
  • av ==12.3.0
  • decord ==0.6.0
  • einops ==0.8.0
  • fasttext ==0.9.3
  • filelock ==3.15.4
  • fsspec ==2024.6.1
  • ftfy ==6.2.3
  • google-api-core ==2.19.1
  • google-api-python-client ==2.140.0
  • google-auth ==2.33.0
  • google-auth-httplib2 ==0.2.0
  • googleapis-common-protos ==1.63.2
  • jsonargparse ==4.32.0
  • kenlm ==0.2.0
  • langkit ==0.0.33
  • loguru ==0.7.2
  • matplotlib ==3.9.2
  • multiprocess ==0.70.16
  • nltk ==3.8
  • numpy ==1.26.4
  • openai =1.44.1
  • pandas ==2.2.2
  • prettytable ==3.11.0
  • pyspark ==3.5.2
  • regex ==2024.7.24
  • safetensors ==0.4.4
  • scikit-learn ==1.5.1
  • scikit-video ==1.1.11
  • scipy ==1.13.1
  • sentencepiece ==0.2.0
  • setuptools ==72.1.0
  • timm ==1.0.8
  • torch ==2.4.0
  • torchvision ==0.19.0
  • tqdm ==4.66.5
  • transformers ==4.44.2
  • vendi-score ==0.0.3
  • vllm ==0.6.0
  • wget ==3.2
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/test.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v3 composite
pyproject.toml pypi
requirements-kbc.txt pypi
  • chonkie *
  • fairy-doc *
  • trafilatura *
requirements-muxi.txt pypi
  • accelerate *
  • addict *
  • aisuite *
  • appdirs *
  • colorlog *
  • datasets *
  • datasketch *
  • math_verify *
  • modelscope *
  • numpy <2.0.0
  • pytest *
  • rapidfuzz *
  • scipy *
  • torch *
  • tqdm *
  • transformers *
  • word2number *
requirements-text.txt pypi
  • bert_score *
  • datasketch *
  • fasttext ==0.9.3
  • filelock ==3.15.4
  • gdown *
  • gensim *
  • google-api-core ==2.19.1
  • google-api-python-client ==2.140.0
  • google-auth ==2.33.0
  • google-auth-httplib2 ==0.2.0
  • googleapis-common-protos ==1.63.2
  • hlepor *
  • kenlm ==0.3.0
  • langkit ==0.0.33
  • loguru ==0.7.2
  • matplotlib ==3.9.2
  • multiprocess ==0.70.16
  • nltk *
  • nptyping *
  • openai =
  • pot *
  • presidio_analyzer *
  • presidio_anonymizer *
  • prettytable ==3.11.0
  • pyspark ==3.5.2
  • sacrebleu *
  • sentencepiece ==0.2.0
  • simhash *
  • vendi-score ==0.0.3
  • wget ==3.2
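
Two of the files listed above pin the same package to different versions: kenlm is ==0.2.0 in requirements.txt and ==0.3.0 in requirements-text.txt. A check along the following lines could surface such mismatches. It is a minimal sketch under stated assumptions: the helper name `find_conflicting_pins` is hypothetical, the sample data is a hand-copied subset of the pins above, and none of it is a tool shipped with the repository.

```python
from collections import defaultdict


def find_conflicting_pins(files):
    """Map each package to its per-file exact pins when those pins disagree.

    `files` maps a file name to a list of requirement strings such as
    "kenlm==0.3.0". Only exact `==` pins are compared; anything else
    (unpinned names, version ranges) is ignored.
    """
    pins = defaultdict(dict)  # package -> {file name: pinned version}
    for file_name, requirements in files.items():
        for requirement in requirements:
            name, sep, version = requirement.partition("==")
            if sep and version.strip():
                pins[name.strip().lower()][file_name] = version.strip()
    return {
        package: by_file
        for package, by_file in pins.items()
        if len(set(by_file.values())) > 1
    }


# Hand-copied subset of the pins listed above (illustrative, not complete).
SAMPLE = {
    "requirements.txt": ["kenlm==0.2.0", "fasttext==0.9.3", "nltk==3.8"],
    "requirements-text.txt": ["kenlm==0.3.0", "fasttext==0.9.3", "nltk"],
}

if __name__ == "__main__":
    for package, by_file in find_conflicting_pins(SAMPLE).items():
        print(package, by_file)
```

With the sample data, only kenlm is reported: fasttext carries the same pin in both files, and nltk is pinned in only one of them.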