great_expectations

Always know what to expect from your data.

https://github.com/great-expectations/great_expectations

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    11 of 433 committers (2.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

cleandata data-engineering data-profilers data-profiling data-quality data-science data-unit-tests datacleaner datacleaning dataquality dataunittest eda exploratory-analysis exploratory-data-analysis exploratorydataanalysis mlops pipeline pipeline-debt pipeline-testing pipeline-tests

Keywords from Contributors

closember agents parsing meshing application fine-tuning hydrology standardization data-mining python39
Last synced: 6 months ago

Repository

Always know what to expect from your data.

Statistics
  • Stars: 10,690
  • Watchers: 86
  • Forks: 1,606
  • Open Issues: 91
  • Releases: 311
Topics
cleandata data-engineering data-profilers data-profiling data-quality data-science data-unit-tests datacleaner datacleaning dataquality dataunittest eda exploratory-analysis exploratory-data-analysis exploratorydataanalysis mlops pipeline pipeline-debt pipeline-testing pipeline-tests
Created over 8 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation Codeowners

README.md


About GX Core

GX Core combines the collective wisdom of thousands of community members with a proven track record in data quality deployments worldwide, wrapped into a super-simple package for data teams.

Its powerful technical tools start with Expectations: expressive and extensible unit tests for your data. Expectations foster collaboration by giving teams a common language to express data quality tests in an intuitive way. You can automatically generate documentation for each set of validation results, making it easy for everyone to stay on the same page. This not only simplifies your data quality processes, but helps preserve your organization’s institutional knowledge about its data.
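To make the idea of "unit tests for your data" concrete, here is a minimal plain-Python sketch of what an Expectation checks. It deliberately does not use the GX Core API: the helper `check_values_between`, its result keys, and the sample rows are all hypothetical and exist only for illustration.

```python
from typing import Any


def check_values_between(
    rows: list[dict[str, Any]], column: str, min_value: float, max_value: float
) -> dict[str, Any]:
    """Hypothetical helper (not GX API): test that every value in `column`
    lies within the inclusive range [min_value, max_value]."""
    unexpected = [
        row[column]
        for row in rows
        if row[column] is None or not (min_value <= row[column] <= max_value)
    ]
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Made-up sample data for illustration only.
trips = [
    {"passenger_count": 1},
    {"passenger_count": 4},
    {"passenger_count": 9},
]

# The "expectation": passenger counts should fall between 1 and 6.
result = check_values_between(trips, "passenger_count", min_value=1, max_value=6)
print(result)
```

The point of the sketch is the shape of the check: a declarative statement about the data that returns a structured pass/fail result, which is what makes validation results easy to document and share.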

Learn more about how data teams are using GX Core in our featured case studies.

Integration support policy

GX Core supports Python 3.9 through 3.12. Experimental support for Python 3.13 and later can be enabled by setting a GX_PYTHON_EXPERIMENTAL environment variable when installing great_expectations.
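As a rough illustration of this policy, the following standard-library sketch classifies the running interpreter against the stated range. The function name `python_support_status` and its return strings are hypothetical. Because the policy does not say which value `GX_PYTHON_EXPERIMENTAL` expects, the sketch only checks whether the variable is present, and it treats versions below 3.9 as unsupported as an assumption.

```python
import os
import sys

# Supported range stated in the policy above: Python 3.9 through 3.12.
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 12)


def python_support_status(version: tuple[int, int], env: dict[str, str]) -> str:
    """Classify a (major, minor) version against the stated support policy."""
    if version < MIN_SUPPORTED:
        # Assumption: anything older than the stated range is unsupported.
        return "unsupported"
    if version <= MAX_SUPPORTED:
        return "supported"
    # Assumption: the policy only says the variable must be set, not which
    # value it expects, so this sketch checks for presence alone.
    if "GX_PYTHON_EXPERIMENTAL" in env:
        return "experimental"
    return "experimental (GX_PYTHON_EXPERIMENTAL not set)"


current = sys.version_info[:2]
print(current, python_support_status(current, dict(os.environ)))
```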

For data sources and other integrations that GX supports, see the compatibility reference for additional information.

Get started

GX recommends deploying GX Core within a virtual environment. For more information about getting started with GX Core, see Introduction to GX Core.

  1. Run the following command in an empty base directory inside a Python virtual environment to install GX Core:

    ```bash title="Terminal input"
    pip install great_expectations
    ```

  2. Run the following Python code to import the great_expectations module and create a Data Context:

    ```python
    import great_expectations as gx

    context = gx.get_context()
    ```

Get support from GX and the community

The following support channels are listed in the order in which GX prioritizes them:

  1. Issues and PRs in the GX GitHub repository
  2. Questions posted to the GX Core Discourse forum
  3. Questions posted to the GX Slack community channel

Contribute

We deeply value the contributions of our community. We're now accepting PRs for bug fixes.

To ensure the long-term quality of the GX Core codebase, we're not yet ready to accept feature contributions to the parts of the codebase that don't have clear interfaces for extensions. We're actively working to increase the surface area for contributions. Thank you for being a crucial part of GX Core!

Levels of contribution readiness

🟢 Ready. Have a clear and public interface for extensions.

🟡 Partially ready. Case-by-case.

🔴 Not ready. Will accept contributions that fix existing bugs or workflows.

| GX Component         | Readiness          | Notes                                      |
| -------------------- | ------------------ | ------------------------------------------ |
| CredentialStore      | 🟢 Ready           |                                            |
| BatchDefinition      | 🟡 Partially ready | Formerly known as splitters                |
| Action               | 🟢 Ready           |                                            |
| DataSource           | 🔴 Not ready       | Includes MetricProvider and ExecutionEngine |
| DataContext          | 🔴 Not ready       | Also known as Configuration Stores         |
| DataAsset            | 🔴 Not ready       |                                            |
| Expectation          | 🔴 Not ready       |                                            |
| ValidationDefinition | 🔴 Not ready       |                                            |
| Checkpoint           | 🔴 Not ready       |                                            |
| CustomExpectations   | 🔴 Not ready       |                                            |
| Data Docs            | 🔴 Not ready       | Also known as Renderers                    |

Code of conduct

Everyone interacting in GX Core project codebases, Discourse forums, Slack channels, and email communications is expected to adhere to the GX Community Code of Conduct.

Owner

  • Name: Great Expectations Core
  • Login: great-expectations
  • Kind: organization
  • Email: info@greatexpectations.io
  • Location: United States of America

Revolutionizing the speed and integrity of data collaboration.

Citation (CITATION.cff)

abstract: Great Expectations is a shared, open standard for data quality. It helps
  data teams eliminate pipeline debt, through data testing, documentation, and profiling.
authors:
- family-names: Gong
  given-names: Abe
- family-names: Campbell
  given-names: James
- name: Great Expectations
  website: https://greatexpectations.io
  email: team@greatexpectations.io
cff-version: 1.2.0
identifiers:
- description: This is the collection of all archived snapshots of all versions of
    Great Expectations
  type: doi
  value: 10.5281/zenodo.5683574
keywords:
- data quality
- pipeline testing
- data testing
- pipeline debt
- data observability
- data monitoring
- data profiling
- data documentation
license: Apache-2.0
message: If you use this software, please cite it using these metadata.
repository-code: https://github.com/great-expectations/great_expectations
title: Great Expectations

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 11,411
  • Total Committers: 433
  • Avg Commits per committer: 26.353
  • Development Distribution Score (DDS): 0.832
Past Year
  • Commits: 886
  • Committers: 45
  • Avg Commits per committer: 19.689
  • Development Distribution Score (DDS): 0.83
Top Committers
Name Email Commits
James Campbell j****l@g****m 1,915
Abe a****g@g****m 1,230
Chetan Kini c****n@s****m 990
Robert Moses Lim r****m@g****m 767
Alex Sherstinsky a****y 661
Eugene Mandel e****e@s****m 511
Anthony Burdi a****y@g****o 482
Aylr A****r 379
Gabriel g****g@g****m 361
Nathan Farmer N****r 308
William Shin w****l@s****m 293
Bill Dirks b****l@g****o 266
Tyler Hoffman t****n@g****m 260
Rachel-Reverie 9****e 216
Rob Gray 1****k 160
kenwade4 9****4 134
Joshua Stauffer 6****r 130
ccnobbli c****i@n****u 96
William Shin w****l@g****o 94
Austin Ziech Robinson 4****r 81
Don Heppner d****r@g****m 74
ayirplm p****a@s****m 73
talagluck t****l@s****m 70
dependabot[bot] 4****] 66
anhollis a****s@n****u 59
Derek Martin 4****3 59
Christian Selig c****g@u****u 58
T Pham 2****m 57
Kristen Lavavej 3****j 51
Péter Szécsi s****4@s****u 50
and 403 more...
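
The "All Time" figures above can be reproduced from the listed counts. The sketch below assumes the common definition of the Development Distribution Score, DDS = 1 - (commits by the top committer / total commits). That definition is not stated on this page, but it matches the reported value.

```python
# Figures copied from the "All Time" statistics and the top-committer table above.
total_commits = 11_411
total_committers = 433
top_committer_commits = 1_915  # James Campbell

avg_commits = total_commits / total_committers

# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - top_committer_commits / total_commits

print(f"Avg commits per committer: {avg_commits:.3f}")
print(f"DDS: {dds:.3f}")
```

A DDS close to 1 means commits are spread across many contributors, while a value close to 0 means a single committer dominates the history.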

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 507
  • Total pull requests: 4,369
  • Average time to close issues: 7 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 372
  • Total pull request authors: 203
  • Average comments per issue: 2.54
  • Average comments per pull request: 2.04
  • Merged pull requests: 3,220
  • Bot issues: 0
  • Bot pull requests: 84
Past Year
  • Issues: 159
  • Pull requests: 1,466
  • Average time to close issues: 15 days
  • Average time to close pull requests: 7 days
  • Issue authors: 112
  • Pull request authors: 64
  • Average comments per issue: 1.1
  • Average comments per pull request: 2.0
  • Merged pull requests: 1,068
  • Bot issues: 0
  • Bot pull requests: 43
Top Authors
Issue Authors
  • kujaska (9)
  • data-han (7)
  • jmcorreia (7)
  • MarcelBeining (7)
  • victorgrcp (7)
  • Erua-chijioke (6)
  • Chr96er (6)
  • franciskuttivelil (5)
  • jschra (5)
  • leodrivera (5)
  • itaise (5)
  • satniks (4)
  • gerileka (4)
  • VolkovGeoPhy (4)
  • tyler-hoffman (4)
Pull Request Authors
  • cdkini (625)
  • tyler-hoffman (599)
  • Kilo59 (330)
  • NathanFarmer (296)
  • joshua-stauffer (295)
  • billdirks (292)
  • kwcanuck (184)
  • Rachel-Reverie (174)
  • klavavej (162)
  • Shinnnyshinshin (124)
  • anthonyburdi (110)
  • JessSaavedra (100)
  • deborahniesz (86)
  • dependabot[bot] (64)
  • TrangPham (54)
Top Labels
Issue Labels
community (75) devrel (61) bug (27) feature (24) triage (16) DevRel Triage (15) documentation (13) request-for-help (9) help wanted (8) feature-request (7) fluent-datasources (6) stale (6) dependencies (6) feature:optimization (5) core-engineering-queue (5) core-team-priority (4) stack:mssql (4) stack:snowflake (3) feature:new_backend (3) query-asset (3) stack:databricks (3) workaround_exists (3) stack:windows (2) dx (2) feature:integration (2) fix-in-prog (2) bounty-board (2) blocker bug (2) azure (2) in investigation (1)
Pull Request Labels
core (520) community (258) dx (197) devrel (132) dependencies (42) javascript (27) core-team (19) documentation (14) stack:snowflake (12) bug (11) platform (9) databricks-sql (8) cloud (7) fluent-datasources (6) triage (5) in-progress (4) maintenance (3) stack:mysql (2) performance (2) python (2) feature:slack_notification (1) question (1) investigation (1) DevRel Triage (1) community-supported (1) zep (1) help wanted (1) feature (1) hackathon-2022 (1) docs (1)

Packages

  • Total packages: 7
  • Total downloads:
    • pypi 22,659,103 last-month
  • Total docker downloads: 6,750,002
  • Total dependent packages: 62
    (may contain duplicates)
  • Total dependent repositories: 286
    (may contain duplicates)
  • Total versions: 1,033
  • Total maintainers: 11
pypi.org: great-expectations

Always know what to expect from your data.

  • Versions: 328
  • Dependent Packages: 58
  • Dependent Repositories: 284
  • Downloads: 22,133,714 Last month
  • Docker Downloads: 6,750,002
Rankings
Downloads: 0.1%
Dependent packages count: 0.3%
Stargazers count: 0.6%
Average: 0.7%
Docker downloads count: 0.7%
Dependent repos count: 0.9%
Forks count: 1.5%
Last synced: 6 months ago
pypi.org: great-expectations-experimental

Always know what to expect from your data.

  • Versions: 530
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 481,539 Last month
Rankings
Stargazers count: 0.3%
Downloads: 0.8%
Forks count: 1.1%
Average: 6.3%
Dependent packages count: 7.3%
Dependent repos count: 22.1%
Last synced: 6 months ago
proxy.golang.org: github.com/great-expectations/great_expectations
  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.5%
Average: 6.7%
Dependent repos count: 7.0%
Last synced: 6 months ago
conda-forge.org: great-expectations

Great Expectations helps teams save time and promote analytic integrity by offering a unique approach to automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.

  • Versions: 144
  • Dependent Packages: 2
  • Dependent Repositories: 1
Rankings
Stargazers count: 3.4%
Forks count: 4.2%
Average: 12.8%
Dependent packages count: 19.6%
Dependent repos count: 24.1%
Last synced: 6 months ago
pypi.org: great-expectations-cta

Always know what to expect from your data.

  • Versions: 2
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 29 Last month
Rankings
Stargazers count: 0.3%
Forks count: 1.2%
Dependent packages count: 2.9%
Average: 12.9%
Downloads: 29.3%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 6 months ago
pypi.org: acryl-great-expectations

Always know what to expect from your data.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 43,821 Last month
Rankings
Stargazers count: 0.7%
Forks count: 1.5%
Dependent packages count: 9.2%
Average: 15.8%
Dependent repos count: 51.9%
Maintainers (2)
Last synced: 6 months ago
anaconda.org: great-expectations

Great Expectations helps teams save time and promote analytic integrity by offering a unique approach to automated testing: pipeline tests. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. Software developers have long known that automated testing is essential for managing complex codebases. Great Expectations brings the same discipline, confidence, and acceleration to data science and engineering teams.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 45.9%
Average: 48.2%
Dependent repos count: 50.4%
Last synced: 6 months ago