https://github.com/OpenDCAI/DataFlow
Easy Data Preparation with latest LLMs-based Operators and Pipelines.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (3.9%) to scientific vocabulary
Keywords
data
data-agent
data-cleaning
data-pipelines
data-processing
data-science
data-synthesis
gradio-interface
llms
operators
quick-data-processing
sglang-bankend
vllm-backend
Last synced: 5 months ago
Repository
Basic Info
- Host: GitHub
- Owner: OpenDCAI
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://OpenDCAI.github.io/DataFlow-Doc/
- Size: 74.4 MB
Statistics
- Stars: 1,178
- Watchers: 15
- Forks: 77
- Open Issues: 13
- Releases: 7
Topics
data
data-agent
data-cleaning
data-pipelines
data-processing
data-science
data-synthesis
gradio-interface
llms
operators
quick-data-processing
sglang-bankend
vllm-backend
Created over 1 year ago · Last pushed 6 months ago
Metadata Files
Readme
License
README-dev.md
DataFlow-Preview Development Documentation
First, create a clean Python 3.10 environment.
Then clone this repository and install it locally:
```shell
pip install -e .
```
After installing, you can verify that the installation succeeded with the following commands:
```shell
dataflow -v
dataflow env
```
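The environment, install and verification steps above can be collected into one script. This is a minimal sketch, assuming `python3.10` is on PATH and that the standard-library `venv` module is an acceptable way to get a clean environment (the README does not name a tool); `py_minor_version` and `setup_dataflow` are hypothetical helper names, not part of DataFlow.

```shell
#!/bin/sh
# Sketch of the setup and verification steps above. Assumptions: the
# interpreter (default python3.10) is on PATH, the standard-library venv
# module is used, and ".venv" is an illustrative directory name.

# Hypothetical helper: print the "major.minor" version of an interpreter.
py_minor_version() {
  "$1" -c 'import sys; print("%d.%d" % sys.version_info[:2])'
}

# Hypothetical helper: create the environment, clone, install, verify.
setup_dataflow() {
  py="${1:-python3.10}"
  # Refuse to continue unless the chosen interpreter really is 3.10.
  if [ "$(py_minor_version "$py")" != "3.10" ]; then
    echo "expected Python 3.10 from $py" >&2
    return 1
  fi
  "$py" -m venv .venv || return 1
  git clone https://github.com/OpenDCAI/DataFlow.git || return 1
  # Editable install into the venv; same package as `pip install -e .` run inside the repo.
  .venv/bin/python -m pip install -e ./DataFlow || return 1
  # The README's verification commands, run from the venv.
  .venv/bin/dataflow -v && .venv/bin/dataflow env
}
```

Usage: `setup_dataflow`, or `setup_dataflow python3` if `python3` already points to 3.10. Calling `.venv/bin/python -m pip` avoids depending on whichever `pip` happens to be first on PATH.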
How to test the reasoning pipeline
The test entry point is currently /test/test_reasoning.py, which by default uses /dataflow/example/ReasoningPipeline/pipeline_math_short.json as its sample input.
Export the key as a global environment variable:
```shell
export API_KEY=<your key>
```
Then switch the working directory to /test and run the script directly to try out a very short pipeline:
```shell
python test_reasoning.py
```
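A minimal wrapper sketch for the two steps above. The function names are hypothetical, and it assumes it is run from the repository root, where the test/ directory lives. It checks the exported environment with `printenv` rather than the shell variable, because an assignment made without `export` is not passed to the `python` child process.

```shell
#!/bin/sh
# Hypothetical wrapper around the reasoning-pipeline demo; assumes the
# current directory is the repository root containing test/.

# Fail early with a hint unless API_KEY is exported to child processes.
require_api_key() {
  if [ -z "$(printenv API_KEY)" ]; then
    echo "API_KEY is not exported; run: export API_KEY=<your key>" >&2
    return 1
  fi
}

run_reasoning_demo() {
  require_api_key || return 1
  # Subshell so the caller's working directory is left unchanged.
  ( cd test && python test_reasoning.py )
}
```

Usage: `export API_KEY=<your key>` followed by `run_reasoning_demo`.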
Owner
- Name: OpenDCAI
- Login: OpenDCAI
- Kind: organization
- Email: PKU_DCML@hotmail.com
- Repositories: 1
- Profile: https://github.com/OpenDCAI
Define the future of Data-centric AI together
GitHub Events
Total
- Create event: 16
- Issues event: 42
- Release event: 5
- Watch event: 571
- Delete event: 5
- Member event: 1
- Issue comment event: 71
- Push event: 132
- Pull request review comment event: 30
- Pull request review event: 48
- Pull request event: 214
- Fork event: 49
Last Year
- Create event: 16
- Issues event: 42
- Release event: 5
- Watch event: 571
- Delete event: 5
- Member event: 1
- Issue comment event: 71
- Push event: 132
- Pull request review comment event: 30
- Pull request review event: 48
- Pull request event: 214
- Fork event: 49
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 29
- Total pull requests: 117
- Average time to close issues: 3 days
- Average time to close pull requests: about 10 hours
- Total issue authors: 22
- Total pull request authors: 27
- Average comments per issue: 0.66
- Average comments per pull request: 0.2
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 29
- Pull requests: 117
- Average time to close issues: 3 days
- Average time to close pull requests: about 10 hours
- Issue authors: 22
- Pull request authors: 27
- Average comments per issue: 0.66
- Average comments per pull request: 0.2
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- SunnyHaze (7)
- ElsaReedz (2)
- ninjaX2o (1)
- foxFallingSkies (1)
- linzx3501 (1)
- tpoisonooo (1)
- samure1995 (1)
- jansonheal (1)
- miharakator (1)
- ixeraby (1)
- ackevjajameda (1)
- miaode74 (1)
- acie-1 (1)
- haolpku (1)
- lcysyzxdxc (1)
Pull Request Authors
- haolpku (12)
- zzy1127 (12)
- ZhaoyangHan04 (11)
- SunnyHaze (10)
- Qmeiyi (8)
- MOLYHECI (7)
- wongzhenhao (7)
- HeRunming (7)
- scuuy (6)
- TechNomad-ds (6)
- gty1829 (5)
- YqjMartin (4)
- DeepMindLiuZhou (4)
- leaderwolfpipi (3)
- Yalin-Feng (2)
Top Labels
Issue Labels
bug (9)
enhancement (8)
question (4)
Pull Request Labels
enhancement (1)
Dependencies
requirements.txt
pypi
- PyYAML ==6.0.2
- av ==12.3.0
- decord ==0.6.0
- einops ==0.8.0
- fasttext ==0.9.3
- filelock ==3.15.4
- fsspec ==2024.6.1
- ftfy ==6.2.3
- google-api-core ==2.19.1
- google-api-python-client ==2.140.0
- google-auth ==2.33.0
- google-auth-httplib2 ==0.2.0
- googleapis-common-protos ==1.63.2
- jsonargparse ==4.32.0
- kenlm ==0.2.0
- langkit ==0.0.33
- loguru ==0.7.2
- matplotlib ==3.9.2
- multiprocess ==0.70.16
- nltk ==3.8
- numpy ==1.26.4
- openai =1.44.1
- pandas ==2.2.2
- prettytable ==3.11.0
- pyspark ==3.5.2
- regex ==2024.7.24
- safetensors ==0.4.4
- scikit-learn ==1.5.1
- scikit-video ==1.1.11
- scipy ==1.13.1
- sentencepiece ==0.2.0
- setuptools ==72.1.0
- timm ==1.0.8
- torch ==2.4.0
- torchvision ==0.19.0
- tqdm ==4.66.5
- transformers ==4.44.2
- vendi-score ==0.0.3
- vllm ==0.6.0
- wget ==3.2
.github/workflows/python-publish.yml
actions
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/test.yml
actions
- actions/checkout v4 composite
- actions/setup-python v3 composite
pyproject.toml
pypi
requirements-kbc.txt
pypi
- chonkie *
- fairy-doc *
- trafilatura *
requirements-muxi.txt
pypi
- accelerate *
- addict *
- aisuite *
- appdirs *
- colorlog *
- datasets *
- datasketch *
- math_verify *
- modelscope *
- numpy <2.0.0
- pytest *
- rapidfuzz *
- scipy *
- torch *
- tqdm *
- transformers *
- word2number *
requirements-text.txt
pypi
- bert_score *
- datasketch *
- fasttext ==0.9.3
- filelock ==3.15.4
- gdown *
- gensim *
- google-api-core ==2.19.1
- google-api-python-client ==2.140.0
- google-auth ==2.33.0
- google-auth-httplib2 ==0.2.0
- googleapis-common-protos ==1.63.2
- hlepor *
- kenlm ==0.3.0
- langkit ==0.0.33
- loguru ==0.7.2
- matplotlib ==3.9.2
- multiprocess ==0.70.16
- nltk *
- nptyping *
- openai =
- pot *
- presidio_analyzer *
- presidio_anonymizer *
- prettytable ==3.11.0
- pyspark ==3.5.2
- sacrebleu *
- sentencepiece ==0.2.0
- simhash *
- vendi-score ==0.0.3
- wget ==3.2
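In this listing, `==` marks an exact pin and `*` marks no version constraint. The files do not always agree: `kenlm` is pinned to 0.2.0 in requirements.txt but to 0.3.0 in requirements-text.txt, so one environment cannot satisfy both files as listed. A minimal sketch for spotting such conflicts between two requirement files, assuming they use the usual pip `name==version` syntax; `exact_pins` and `pin_conflicts` are hypothetical helper names, and only lowercasing is applied to package names (no `_`/`-` normalization).

```shell
#!/bin/sh
# Sketch, assuming plain pip requirement lines such as "kenlm==0.2.0".
# Names are lowercased only; "_" vs "-" is not normalized.

# Hypothetical helper: print "name version" for each exact (==) pin, sorted.
exact_pins() {
  sed -n -E 's/^[[:space:]]*([A-Za-z0-9_.-]+)[[:space:]]*==[[:space:]]*([^[:space:];#]+).*$/\1 \2/p' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sort
}

# Hypothetical helper: list packages pinned exactly in both files but to
# different versions, as "name version_in_first version_in_second".
pin_conflicts() {
  first=$(mktemp) || return 1
  second=$(mktemp) || return 1
  exact_pins "$1" > "$first"
  exact_pins "$2" > "$second"
  # join pairs lines by package name; both inputs use the same C-locale order.
  LC_ALL=C join "$first" "$second" | awk '$2 != $3 { print $1, $2, $3 }'
  rm -f "$first" "$second"
}
```

Judging from this listing, `pin_conflicts requirements.txt requirements-text.txt` should report only `kenlm`; packages pinned in one file and unconstrained in the other (such as `nltk`) are not flagged.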