kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

https://github.com/kedro-org/kedro

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 4 of 254 committers (1.6%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.1%) to scientific vocabulary

Keywords

experiment-tracking hacktoberfest kedro machine-learning machine-learning-engineering mlops pipeline python

Keywords from Contributors

data-profilers datacleaner pipeline-testing agents observability model-management mlflow llmops llm-evaluation langchain

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: kedro-org
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://kedro.org
Size: 232 MB

Statistics

Stars: 10,505
Watchers: 104
Forks: 961
Open Issues: 162
Releases: 63

Created almost 7 years ago · Last pushed 6 months ago

Metadata Files

Readme, Contributing, License, Code of conduct, Citation, Codeowners, Security

README.md

Kedro

What is Kedro?

Kedro is an open-source Python framework hosted by the LF AI & Data Foundation.

How do I install Kedro?

To install Kedro from the Python Package Index (PyPI), run:

uv pip install kedro

It is also possible to install Kedro using conda:

conda install -c conda-forge kedro

Our Get Started guide contains full installation instructions, and includes how to set up Python virtual environments.

Installation from source

To access the latest Kedro version before its official release, install it from the main branch:

uv pip install git+https://github.com/kedro-org/kedro@main

What are the main features of Kedro?

| Feature | What is this? |
| -------------------- | ------------------------------------------------------------ |
| Project Template | A standard, modifiable and easy-to-use project template based on Cookiecutter Data Science. |
| Data Catalog | A series of lightweight data connectors used to save and load data across many different file formats and file systems, including local and network file systems, cloud object stores, and HDFS. The Data Catalog also includes data and model versioning for file-based systems. |
| Pipeline Abstraction | Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using Kedro-Viz. |
| Coding Standards | Test-driven development using pytest, well-documented code using Sphinx, linted code with support for ruff, and use of the standard Python logging library. |
| Flexible Deployment | Deployment strategies that include single or distributed-machine deployment, as well as additional support for deploying on Argo, Prefect, Kubeflow, AWS Batch, and Databricks. |
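To make the Data Catalog and Pipeline Abstraction rows more concrete, here is a minimal, standard-library-only sketch of the underlying idea: each node is a pure function that names the datasets it reads and the single dataset it writes, a catalog stores datasets by name, and the run order is inferred from those names with a topological sort. The `InMemoryCatalog`, `Node`, and `run_pipeline` names are hypothetical and this is not Kedro's implementation; Kedro's own building blocks live in `kedro.pipeline` (`node`, `pipeline`) and `kedro.io` (`DataCatalog`).

```python
from __future__ import annotations

from graphlib import TopologicalSorter
from typing import Any, Callable


class InMemoryCatalog:
    """Hypothetical stand-in for a data catalog: named datasets held in memory."""

    def __init__(self, initial: dict[str, Any] | None = None) -> None:
        self._data: dict[str, Any] = dict(initial or {})

    def load(self, name: str) -> Any:
        if name not in self._data:
            raise KeyError(f"Dataset {name!r} has not been saved yet")
        return self._data[name]

    def save(self, name: str, value: Any) -> None:
        self._data[name] = value


class Node:
    """Hypothetical node: a pure function, its input dataset names, and one output name."""

    def __init__(self, func: Callable[..., Any], inputs: list[str], output: str) -> None:
        self.func = func
        self.inputs = inputs
        self.output = output


def run_pipeline(nodes: list[Node], catalog: InMemoryCatalog) -> list[str]:
    """Run nodes in an order inferred from matching output and input names."""
    producers = {n.output: n for n in nodes}
    sorter = TopologicalSorter()
    for n in nodes:
        # A node depends on whichever nodes produce its inputs; other inputs
        # (such as raw data) are expected to be in the catalog already.
        sorter.add(n, *(producers[i] for i in n.inputs if i in producers))
    executed = []
    for n in sorter.static_order():  # raises graphlib.CycleError on cycles
        result = n.func(*(catalog.load(i) for i in n.inputs))
        catalog.save(n.output, result)
        executed.append(n.func.__name__)
    return executed


# Example functions (hypothetical): clean raw values, then summarise them.
def clean(raw: list[int | None]) -> list[int]:
    return [x for x in raw if x is not None]


def total(cleaned: list[int]) -> int:
    return sum(cleaned)


def report(cleaned: list[int], summed: int) -> str:
    return f"{len(cleaned)} rows, total {summed}"


catalog = InMemoryCatalog({"raw": [3, None, 4, 5]})
# Declared out of order on purpose: the run order comes from dataset names.
nodes = [
    Node(report, ["cleaned", "summed"], "report"),
    Node(total, ["cleaned"], "summed"),
    Node(clean, ["raw"], "cleaned"),
]
order = run_pipeline(nodes, catalog)
print(order)                   # ['clean', 'total', 'report']
print(catalog.load("report"))  # 3 rows, total 12
```

Because the order is derived from input and output names rather than list position, nodes can be declared in any order, and a circular dependency surfaces as a `graphlib.CycleError` instead of a silent misordering.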

How do I use Kedro?

The Kedro documentation first explains how to install Kedro and then introduces key Kedro concepts.

You can then review the spaceflights tutorial to build a Kedro project for hands-on experience.

For new and intermediate Kedro users, there's a comprehensive section on how to visualise Kedro projects using Kedro-Viz.

A pipeline visualisation generated using Kedro-Viz

Additional documentation explains how to work with Kedro and Jupyter notebooks, and there is a set of advanced user guides for key Kedro features. We also recommend the API reference documentation for further information.

Why does Kedro exist?

Kedro is built upon our collective best practices (and mistakes) from trying to deliver real-world ML applications that involve vast amounts of raw, unvetted data. We developed Kedro to achieve the following:

To address the main shortcomings of Jupyter notebooks, one-off scripts, and glue code by focusing on creating maintainable data science code
To enhance team collaboration when different team members have varied exposure to software engineering concepts
To increase efficiency, because applying concepts such as modularity and separation of concerns encourages the creation of reusable analytics code

Find out more about how Kedro can answer your use cases from the product FAQs on the Kedro website.

The humans behind Kedro

The Kedro product team and a number of open source contributors from across the world maintain Kedro.

Can I contribute?

Yes! We welcome all kinds of contributions. Check out our guide to contributing to Kedro.

Where can I learn more?

There is a growing community around Kedro. We encourage you to ask and answer technical questions on Slack and bookmark the Linen archive of past discussions.

We keep a list of technical FAQs in the Kedro documentation, and you can find a growing list of blog posts, videos and projects that use Kedro in the awesome-kedro GitHub repository. If you have created anything with Kedro, we'd love to include it on the list. Just make a PR to add it!

How can I cite Kedro?

If you're an academic, Kedro can also help you, for example, as a tool to solve the problem of reproducible research. Use the "Cite this repository" button on our repository to generate a citation from the CITATION.cff file.

Python version support policy

The core Kedro Framework supports all Python versions that are actively maintained by the CPython core team. When a Python version reaches end of life, support for that version is dropped from Kedro. This is not considered a breaking change.
The Kedro Datasets package follows the NEP 29 Python version support policy. This means that kedro-datasets generally drops support for a Python version before kedro does. This is because kedro-datasets has many dependencies that follow NEP 29, and the Kedro Framework's more conservative version support approach makes it hard to manage those dependencies properly.
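The NEP 29 rule can be expressed as a small date calculation: support every minor Python version released within the preceding 42 months, and at minimum the two latest minor versions. The sketch below only illustrates that arithmetic; the helper names are hypothetical, the table holds CPython's initial x.y.0 release dates, and it is not how the kedro-datasets maintainers necessarily decide when to drop a version.

```python
import calendar
from datetime import date

# Initial release dates of recent CPython minor versions (x.y.0).
PYTHON_RELEASES = {
    "3.8": date(2019, 10, 14),
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
    "3.12": date(2023, 10, 2),
    "3.13": date(2024, 10, 7),
}


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def nep29_supported(releases: dict[str, date], today: date,
                    window_months: int = 42, minimum: int = 2) -> list[str]:
    """Minor versions to support on `today` under a NEP 29-style window (hypothetical helper)."""
    released = sorted(
        (released_on, version)
        for version, released_on in releases.items()
        if released_on <= today
    )
    # Keep versions whose 42-month support window has not yet expired...
    supported = [v for r, v in released if add_months(r, window_months) > today]
    # ...but never fewer than the `minimum` most recent releases.
    if len(supported) < minimum:
        supported = [v for _, v in released[-minimum:]]
    return supported


print(add_months(PYTHON_RELEASES["3.8"], 42))              # 2023-04-14
print(nep29_supported(PYTHON_RELEASES, date(2024, 1, 1)))  # ['3.9', '3.10', '3.11', '3.12']
```

Under this rule Python 3.8 leaves the window 42 months after its October 2019 release, while the core framework, which follows CPython's own end-of-life schedule, keeps supporting it for longer.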

☕️ Kedro Coffee Chat 🔶

We appreciate our community and want to stay connected. For that, we offer a public Coffee Chat format where we share updates and cool stuff around Kedro once every two weeks and give you time to ask your questions live.

Check out the upcoming demo topics and dates at the Kedro Coffee Chat wiki page.

Follow our Slack announcement channel to see Kedro Coffee Chat announcements and access demo recordings.

Owner

Name: Kedro
Login: kedro-org
Kind: organization
Email: info@lfaidata.foundation
Location: United States of America

Website: https://kedro.org/
Repositories: 14
Profile: https://github.com/kedro-org

Kedro is an open-source Python framework for creating reproducible, maintainable and modular data science code. It is hosted in incubation in LF AI & Data.

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
- family-names: Alam
  given-names: Sajid
- family-names: Chan
  given-names: Nok Lam
- family-names: Couto
  given-names: Laura
- family-names: Dada
  given-names: Yetunde
  orcid: https://orcid.org/0000-0002-5273-954X
- family-names: Danov
  given-names: Ivan
- family-names: Datta
  given-names: Deepyaman
  orcid: https://orcid.org/0009-0006-7814-1446
- family-names: DeBold
  given-names: Tynan
- family-names: Gundaniya
  given-names: Jitendra
- family-names: Honoré-Rougé
  given-names: Yolan
- family-names: Kaiser
  given-names: Stephanie
- family-names: Kanchwala
  given-names: Rashida
- family-names: Katiyar
  given-names: Ankita
- family-names: Pilla
  given-names: Ravi Kumar
- family-names: Nguyen
  given-names: Huong
- family-names: Cano Rodríguez
  given-names: Juan Luis
  orcid: https://orcid.org/0000-0002-2187-161X
- family-names: Schwarzmann
  given-names: Joel
- family-names: Sorokin
  given-names: Dmitry
- family-names: Theisen
  given-names: Merel
- family-names: Zabłocki
  given-names: Marcin
- family-names: Brugman
  given-names: Simon
- family-names: Khaustova
  given-names: Elena
- family-names: Ko
  given-names: Elijah
title: Kedro
version: 1.0.0
date-released: 2025-07-22
url: https://github.com/kedro-org/kedro

Committers

Last synced: 9 months ago

All Time

Total Commits: 2,571
Total Committers: 254
Avg Commits per committer: 10.122
Development Distribution Score (DDS): 0.919

Past Year

Commits: 250
Committers: 46
Avg Commits per committer: 5.435
Development Distribution Score (DDS): 0.832

Top Committers

Name	Email	Commits
Lorena Bălan	l**n@q**m	208
Deepyaman Datta	d**a@u**u	158
Merel Theisen	4****t	151
Nok Lam Chan	n**n@q**m	149
Ankita Katiyar	1****r	108
Kiyohito Kunii (Kiyo)	8****o	105
Jo Stichbury	j**y@m**m	99
Lim Hoang	l**o@g**m	95
Andrii Ivaniuk	a**k@g**m	94
Antony Milne	4****B	90
Dmitrii Deriabin	4****B	83
Merel Theisen	4****B	83
Sajid Alam	9****B	83
dependabot[bot]	4****]	72
Yetunde Dada	4****a	64
Ahdra Merali	9****B	56
Zain Patel	z**l@q**m	51
ElenaKhaustova	1****a	49
Ivan Danov	i****v	41
Juan Luis Cano Rodríguez	j**o@m**m	41
Dmitry Sorokin	4****S	34
L. R. Couto	5****o	34
Zain Patel	5****B	32
Anton Kirilenko	a**o@q**m	32
Gordon Wrigley	g**y@q**m	27
LorenaBalanQB	4****B	23
CarolineMLynch	1****h	22
Jannic	3****r	22
Waylon Walker	w**n@w**m	20
Jiri Klein	4****n	19
and 224 more...
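This page does not define its Development Distribution Score (DDS). Assuming the common definition used by repository-statistics services, one minus the fraction of all commits made by the single most active committer, the all-time figures above can be reproduced: the top listed committer has 208 of 2,571 commits. The function name below is hypothetical.

```python
def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    if not 0 <= top_committer_commits <= total_commits:
        raise ValueError("top_committer_commits must be between 0 and total_commits")
    return 1 - top_committer_commits / total_commits


# All-time figures from this page: 2,571 commits, 254 committers, top committer 208.
dds = round(development_distribution_score(208, 2571), 3)
avg_commits = round(2571 / 254, 3)
print(dds, avg_commits)  # 0.919 10.122
```

The match with the reported 0.919 suggests that each committer identity is counted separately: names that appear twice in the list above (for example Merel Theisen) would otherwise combine into a larger top share and a lower score.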

Committer Domains (Top 20 + Academic)

quantumblack.com: 15, mckinsey.com: 3, utexas.edu: 1, waylonwalker.com: 1, python.ie: 1, me.com: 1, curalate.com: 1, da-robotteknik.se: 1, stem.com: 1, yahoo.com.hk: 1, ymail.com: 1, bnmerchant.com: 1, wazoku.com: 1, yamx.net: 1, okra.ai: 1, pascalbrokmeier.de: 1, hotmail.com.au: 1, qq.com: 1, well.ox.ac.uk: 1, artefact.com: 1, epfl.ch: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1,356
Total pull requests: 1,584
Average time to close issues: 6 months
Average time to close pull requests: 12 days
Total issue authors: 203
Total pull request authors: 105
Average comments per issue: 3.5
Average comments per pull request: 1.84
Merged pull requests: 1,136
Bot issues: 205
Bot pull requests: 92

Past Year

Issues: 429
Pull requests: 655
Average time to close issues: 16 days
Average time to close pull requests: 7 days
Issue authors: 77
Pull request authors: 49
Average comments per issue: 1.27
Average comments per pull request: 1.34
Merged pull requests: 445
Bot issues: 125
Bot pull requests: 32

Top Authors

Issue Authors

github-actions[bot] (205)
merelcht (190)
noklam (148)
astrojuanlu (117)
ElenaKhaustova (95)
stichbury (77)
ankatiyar (38)
AhdraMeraliQB (29)
deepyaman (24)
yetudada (20)
antonymilne (20)
amandakys (19)
DimedS (18)
yury-fedotov (15)
ravi-kumar-pilla (13)

Pull Request Authors

merelcht (228)
ElenaKhaustova (196)
noklam (162)
ankatiyar (155)
dependabot[bot] (91)
lrcouto (85)
DimedS (73)
deepyaman (60)
AhdraMeraliQB (55)
stichbury (52)
SajidAlamQB (51)
Huongg (42)
idanov (42)
astrojuanlu (36)
ravi-kumar-pilla (34)

Top Labels

Issue Labels

Issue: Feature Request (361), Component: Documentation 📄 (180), develop nightly build (91), main nightly build (89), Community (80), Issue: Bug Report 🐞 (59), Type: Parent Issue (54), Stage: Technical Design 🎨 (39), Component: IO (28), Component: CLI (25), Component: DevOps (20), Hacktoberfest (17), Bug Bash :bug: (14), Component: Configuration (14), good first issue (12), Stage: User Research :microscope: (12), Component: Testing (12), Component: Framework (12), Design: Research (11), Help Wanted :pray: (8), Component: Runners (7), Type: Technical DR 💾 (7), Type: User Research Synthesis :writing_hand: (5), support: needs more info (5), nightly build (5), Component: Jupyter/IPython (5), TD: implementation (4), Should we delete? (4), pinned (3), roadmap (3)

Pull Request Labels

dependencies (90), Component: Documentation 📄 (46), Community (23), Hacktoberfest (12), Bug Bash :bug: (6), Stage: Technical Design 🎨 (5), performance (5), automerge (4), TD: should we? (3), Issue: Feature Request (2), TD: implementation (1), develop nightly build (1)

Dependencies

.github/workflows/all-checks.yml actions

.github/workflows/docs-language-linter.yml actions

actions/checkout v3 composite
errata-ai/vale-action reviewdog composite

.github/workflows/docs-only-checks.yml actions

.github/workflows/e2e-tests.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v3 composite
microsoft/setup-msbuild v1 composite

.github/workflows/issues_metrics.yml actions

github/issue-metrics v2 composite
peter-evans/create-issue-from-file v4 composite

.github/workflows/lint.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v3 composite

.github/workflows/merge-gatekeeper.yml actions

upsidr/merge-gatekeeper v1 composite

.github/workflows/nightly-build.yml actions

jayqi/failed-build-issue-action v1 composite

.github/workflows/pip-compile.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v3 composite
microsoft/setup-msbuild v1 composite

.github/workflows/release-starters.yml actions

peter-evans/repository-dispatch v2 composite

.github/workflows/unit-tests.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v3 composite
microsoft/setup-msbuild v1 composite

tools/circleci/docker_build_img/Dockerfile docker

cimg/python 3.8 build

features/steps/test_starter/{{ cookiecutter.repo_name }}/pyproject.toml pypi

kedro/templates/project/{{ cookiecutter.repo_name }}/pyproject.toml pypi

pyproject.toml pypi

PyYAML >=4.2,<7.0
anyconfig >=0.10.0
attrs >=21.3
build >=0.7.0
cachetools >=4.1
click >=4.0
cookiecutter >=2.1.1,<3.0
dynaconf >=3.1.2,<4.0
fsspec >=2021.4
gitpython >=3.0
importlib-metadata >=3.6,<7.0; python_version >= '3.8'
importlib_metadata >=3.6,<5.0; python_version < '3.8'
importlib_resources >=1.3,<7.0
jmespath >=0.9.5
more_itertools >=8.14.0
omegaconf >=2.1.1
parse >=1.19.0
pip-tools >=6.5
pluggy >=1.0,<1.3
rich >=12.0,<14.0
rope >=0.21,<2.0
setuptools >=65.5.1
toml >=0.10.0
toposort >=1.5

tools/circleci/requirements.txt pypi

pip >=21.2
twine *

features/steps/test_plugin/pyproject.toml pypi

features/steps/test_starter/{{ cookiecutter.repo_name }}/requirements.txt pypi

black * test
ipython >=7.31.1,<8.0 test
ipython * test
jupyter * test
jupyterlab * test
jupyterlab_server >=2.11.1,<2.16.0 test
kedro * test
kedro-datasets * test
kedro-telemetry >=0.3.1 test
pytest * test
pytest-cov * test
pytest-mock >=1.7.1,<2.0 test

kedro/templates/project/{{ cookiecutter.repo_name }}/requirements.txt pypi

black *
ipython *
jupyter *
jupyterlab *
jupyterlab_server >=2.11.1,<2.16.0
kedro *
kedro-telemetry >=0.3.1
pytest *
pytest-cov *
pytest-mock >=1.7.1,<2.0
ruff *
traitlets <5.10.0