great-expectations__great_expectations
https://github.com/swe-gym-raw/great-expectations__great_expectations
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SWE-Gym-Raw
- License: apache-2.0
- Language: Python
- Default Branch: 0.18.x
- Size: 218 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md

Great Expectations
Always know what to expect from your data.
What is GX?
Great Expectations (GX) helps data teams build a shared understanding of their data through quality testing, documentation, and profiling.
Data practitioners know that testing and documentation are essential for managing complex data pipelines. GX makes it possible for data science and engineering teams to quickly deploy extensible, flexible data quality testing into their data stacks. Its human-readable documentation makes the results accessible to technical and nontechnical users.
See Down with Pipeline Debt! for an introduction to our philosophy of pipeline data quality testing.
Key features
Seamless operation
GX fits into your existing tech stack, and can integrate with your CI/CD pipelines to add data quality exactly where you need it. Connect to and validate your data wherever it already is, so you can focus on honing your Expectation Suites to perfectly meet your data quality needs.
Start fast
Get useful results quickly, even for large data volumes. GX's Data Assistants provide curated Expectations for different domains, so you can accelerate your data discovery and rapidly deploy data quality checks throughout your pipelines. Auto-generated Data Docs keep your data quality documentation up to date.

Unified understanding
Expectations are GX's workhorse abstraction: each Expectation declares an expected state of the data. The Expectation library provides a flexible, extensible vocabulary for data quality, one that's human-readable and meaningful for technical and nontechnical users alike. Bundled into Expectation Suites, Expectations are the ideal tool for characterizing exactly what you expect from your data.
- expect_column_values_to_not_be_null
- expect_column_values_to_match_regex
- expect_column_values_to_be_unique
- expect_column_values_to_match_strftime_format
- expect_table_row_count_to_be_between
- expect_column_median_to_be_between
- ...and many more
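To make the idea of an Expectation concrete, here is a plain-Python sketch of checks modeled loosely on the Expectations named above. It is an analogue for illustration only, not GX's implementation or API: the function names, the list-of-dicts batch format, the result fields (`success`, `unexpected_count`, `observed_value`), and the rule that format and statistic checks skip missing values are all assumptions made for this example.

```python
import re
import statistics
from collections import Counter
from datetime import datetime
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Result = Dict[str, Any]


def _map_result(values: Sequence[Any], ok: Callable[[Any], bool]) -> Result:
    # Per-value check: every value that fails is counted as "unexpected".
    unexpected = [v for v in values if not ok(v)]
    return {"success": not unexpected, "unexpected_count": len(unexpected)}


def _non_null(rows: List[Row], column: str) -> List[Any]:
    # Assumption: format and statistic checks ignore missing values.
    return [r[column] for r in rows if r.get(column) is not None]


def not_null(rows: List[Row], column: str) -> Result:
    return _map_result([r.get(column) for r in rows], lambda v: v is not None)


def unique(rows: List[Row], column: str) -> Result:
    values = _non_null(rows, column)
    counts = Counter(values)
    # Every occurrence of a repeated value is flagged as unexpected.
    return _map_result(values, lambda v: counts[v] == 1)


def match_regex(rows: List[Row], column: str, pattern: str) -> Result:
    return _map_result(_non_null(rows, column), lambda v: re.search(pattern, str(v)) is not None)


def match_strftime_format(rows: List[Row], column: str, fmt: str) -> Result:
    def parses(value: Any) -> bool:
        try:
            datetime.strptime(str(value), fmt)
            return True
        except ValueError:
            return False

    return _map_result(_non_null(rows, column), parses)


def row_count_between(rows: List[Row], min_value: int, max_value: int) -> Result:
    n = len(rows)
    return {"success": min_value <= n <= max_value, "observed_value": n}


def median_between(rows: List[Row], column: str, min_value: float, max_value: float) -> Result:
    median = statistics.median(_non_null(rows, column))
    return {"success": min_value <= median <= max_value, "observed_value": median}


Suite = List[Tuple[str, Callable[..., Result], Dict[str, Any]]]


def validate(rows: List[Row], suite: Suite) -> Result:
    # A suite passes only if every one of its checks passes.
    results = {name: check(rows, **kwargs) for name, check, kwargs in suite}
    return {"success": all(r["success"] for r in results.values()), "results": results}


rows = [
    {"id": 1, "email": "a@example.com", "signup": "2024-01-05", "age": 31},
    {"id": 2, "email": "b@example.com", "signup": "2024-02-11", "age": 45},
    {"id": 2, "email": "not-an-email", "signup": "05/03/2024", "age": None},
]
suite: Suite = [
    ("id_not_null", not_null, {"column": "id"}),
    ("id_unique", unique, {"column": "id"}),
    ("email_format", match_regex, {"column": "email", "pattern": r"^[^@]+@[^@]+$"}),
    ("signup_format", match_strftime_format, {"column": "signup", "fmt": "%Y-%m-%d"}),
    ("row_count", row_count_between, {"min_value": 1, "max_value": 1000}),
    ("age_median", median_between, {"column": "age", "min_value": 18, "max_value": 65}),
]
report = validate(rows, suite)
```

Keeping each check declarative (a name plus parameters) is what lets a suite be stored, shared, and re-run against new batches; the procedural body of each function is just one possible way to evaluate it.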
Secure and transparent
GX doesn't ask you to trade security for insight. It processes your data in place, on your systems, so your security and governance procedures can maintain control at all times. And because GX's core is, and always will be, open source, it is fully transparent rather than a black box.
Data contracts support
Checkpoints are a transparent, central, and automatable mechanism for testing Expectations and evaluating your data quality. Every Checkpoint run produces human-readable Data Docs reporting the results. You can also configure Checkpoints to take Actions based on the results of the evaluation, like sending alerts and preventing low-quality data from moving further in your pipelines.
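The Checkpoint flow described above (validate a batch, then let Actions react to the outcome) can be sketched in plain Python as follows. `Checkpoint`, `CheckpointResult`, `notify_on_failure`, `halt_on_failure`, and `DataQualityError` are hypothetical names for this illustration, not GX's Checkpoint API; the alert action only records a message instead of contacting a real service.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class CheckpointResult:
    success: bool
    results: Dict[str, bool]


Action = Callable[[CheckpointResult], None]


class DataQualityError(RuntimeError):
    """Raised to stop a pipeline step when a batch fails validation."""


@dataclass
class Checkpoint:
    # Hypothetical stand-in: a named suite of checks plus actions run after each validation.
    suite: Dict[str, Callable[[List[Row]], bool]]
    actions: List[Action] = field(default_factory=list)

    def run(self, batch: List[Row]) -> CheckpointResult:
        results = {name: check(batch) for name, check in self.suite.items()}
        outcome = CheckpointResult(success=all(results.values()), results=results)
        # Actions run in order, so alerting happens before the gate can raise.
        for action in self.actions:
            action(outcome)
        return outcome


alerts: List[str] = []


def notify_on_failure(result: CheckpointResult) -> None:
    # Stand-in for an alerting action (e.g. a chat message); here it only records text.
    if not result.success:
        failed = sorted(name for name, ok in result.results.items() if not ok)
        alerts.append("Data quality checks failed: " + ", ".join(failed))


def halt_on_failure(result: CheckpointResult) -> None:
    # Raising keeps the failing batch from moving further down the pipeline.
    if not result.success:
        raise DataQualityError("batch rejected by checkpoint")


checkpoint = Checkpoint(
    suite={
        "ids_not_null": lambda rows: all(r.get("id") is not None for r in rows),
        "batch_not_empty": lambda rows: len(rows) > 0,
    },
    actions=[notify_on_failure, halt_on_failure],
)

good = checkpoint.run([{"id": 1}, {"id": 2}])

try:
    checkpoint.run([{"id": 1}, {"id": None}])
    rejected = False
except DataQualityError:
    rejected = True
```

In a CI/CD job the same idea usually reduces to exiting with a non-zero status when `success` is false, which CI systems treat as a failed step.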

Readable for collaboration
Everyone stays on the same page about your data quality with GX's inspectable, shareable, and human-readable Data Docs. You can publish Data Docs to the locations where you need them in a variety of formats, making it easy to integrate Data Docs into your existing data catalogs, dashboards, and other reporting and data governance tools.
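As a rough illustration of turning validation results into something a nontechnical reader can scan, the sketch below renders a list of results as a Markdown table. The result fields (`name`, `success`, `unexpected_count`) and `render_results_markdown` are assumptions for this example; it is not how GX builds Data Docs.

```python
from typing import Any, Dict, List


def _cell(value: Any) -> str:
    # Escape pipes so a value cannot break the Markdown table layout.
    return str(value).replace("|", "\\|")


def render_results_markdown(suite_name: str, results: List[Dict[str, Any]]) -> str:
    passed = sum(1 for r in results if r["success"])
    lines = [
        f"## Validation results: {suite_name}",
        "",
        "| Expectation | Status | Unexpected values |",
        "|---|---|---|",
    ]
    for r in results:
        status = "passed" if r["success"] else "failed"
        lines.append(f"| {_cell(r['name'])} | {status} | {r.get('unexpected_count', 0)} |")
    # Close with a one-line summary for readers who skip the table.
    lines += ["", f"{passed} of {len(results)} expectations passed."]
    return "\n".join(lines)


doc = render_results_markdown(
    "users",
    [
        {"name": "id_not_null", "success": True, "unexpected_count": 0},
        {"name": "email_format", "success": False, "unexpected_count": 3},
    ],
)
print(doc)
```

Rendering from structured results, rather than writing the report by hand, is what keeps generated documentation in step with the latest validation run.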

Quick start
To see Great Expectations in action on your own data:
You can install it using pip:

```shell
pip install great_expectations
```

and then run:

```python
import great_expectations as gx

context = gx.get_context()
```
(We recommend deploying within a virtual environment. If you're not familiar with pip, virtual environments, notebooks, or git, you may want to check out the Supporting Resources, which will teach you how to get up and running in minutes.)
For full documentation, visit https://docs.greatexpectations.io/.
If you need help, hop into our Slack channel; there are always contributors and other users there.
Integrations
Great Expectations works with the tools and systems that you're already using with your data, including:
| Integration | Notes |
|---|---|
| DataHub | Data Catalog |
| AWS Glue | Data Integration |
| Athena | Data Source |
| AWS Redshift | Data Source |
| AWS S3 | Data Source |
| BigQuery | Data Source |
| Databricks | Data Source |
| Deepnote | Collaborative data notebook |
| Google Cloud Platform (GCP) | Data Source |
| Microsoft Azure Blob Storage | Data Source |
| Microsoft SQL Server | Data Source |
| MySQL | Data Source |
| Pandas | Data Source |
| PostgreSQL | Data Source |
| Snowflake | Data Source |
| Spark | Data Source |
| SQLite | Data Source |
| Trino | Data Source |
| Apache Airflow | Orchestrator |
| Flyte | Orchestrator |
| Meltano | Orchestrator |
| Prefect | Orchestrator |
| ZenML | Orchestrator |
| Slack | Plugin |
| Jupyter Notebooks | Utility |
What is GX not?
Great Expectations is not a pipeline execution framework. Instead, it integrates seamlessly with DAG execution tools like Spark, Airflow, dbt, Prefect, Dagster, Kedro, and Flyte. GX carries out your data quality pipeline testing while these tools execute the pipelines.
Great Expectations is not a database or storage software. It processes your data in place, on your existing systems. Expectations and Validation Results that GX produces are metadata about your data.
Great Expectations is not a data versioning tool. If you want to bring your data itself under version control, check out tools like DVC, Quilt, and lakeFS.
Great Expectations is not a language-agnostic platform. Instead, it follows the philosophy of "take the compute to the data" by using the popular Python language to support native execution of Expectations in pandas, SQL (via SQLAlchemy), and Spark environments.
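One way to picture "take the compute to the data" is to translate a single check, here a null count for one column, into a query that the database runs itself, so only the aggregate leaves it. The sketch below uses Python's built-in `sqlite3` module rather than SQLAlchemy, and the helper names are hypothetical; it does not show how GX generates SQL internally.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


def count_nulls_in_database(conn: sqlite3.Connection, table: str, column: str) -> int:
    # The database evaluates the check; only a single integer comes back.
    # Identifiers are interpolated, so they must come from trusted configuration.
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
    (count,) = conn.execute(query).fetchone()
    return count


def count_nulls_in_python(rows: List[Tuple[Any, ...]], index: int) -> int:
    # The naive alternative: pull every row out, then check it locally.
    return sum(1 for row in rows if row[index] is None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, None)],
)

pushed_down = count_nulls_in_database(conn, "users", "email")
pulled_out = count_nulls_in_python(conn.execute("SELECT id, email FROM users").fetchall(), 1)
result: Dict[str, Any] = {"success": pushed_down == 0, "unexpected_count": pushed_down}
```

Both approaches give the same answer, but the pushed-down query moves one number instead of every row, which is the point of executing checks where the data already lives.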
Great Expectations is not exclusive to Python programming environments. It can be invoked from the command line without a Python environment. However, if you're working in another ecosystem, you may want to explore ecosystem-specific alternatives such as assertR (for R environments) or TFDV (for TensorFlow environments).
Who maintains Great Expectations?
Great Expectations OSS is under active development by GX Labs and the Great Expectations community.
What's the best way to get in touch with the Great Expectations team?
If you have questions, comments, or just want to have a good old-fashioned chat about data quality, please hop on our public Slack channel or post in our Discourse.
Can I contribute to the library?
Absolutely. Yes, please. See Contributing code, Contributing Expectations, Contributing packages, or Contribute to Great Expectations documentation, and please don't be shy with questions.
How do I stay up to date with Great Expectations?
You can get updates on everything GX with our email newsletter. Subscribe here!
Owner
- Name: SWE-Gym-Raw
- Login: SWE-Gym-Raw
- Kind: organization
- Email: jingmai@pku.edu.cn
- Repositories: 1
- Profile: https://github.com/SWE-Gym-Raw
GitHub Events
Total
- Push event: 1
- Create event: 305
Last Year
- Push event: 1
- Create event: 305
Dependencies
- yandex/clickhouse-server latest
- databricksruntime/python latest
- python 2.7 build
- linkchecker 1.0
- 258143015559.dkr.ecr.us-east-1.amazonaws.com/mercury/api latest
- 258143015559.dkr.ecr.us-east-1.amazonaws.com/mercury/provisioner 0.4.1
- postgres 13.7
- rabbitmq 3.10.20-management
- mcr.microsoft.com/mssql/server 2019-latest
- mysql 8.0.20
- postgres 15.1
- readthedocs/build 6.0
- bitnami/spark 3.3.2
- trinodb/trino latest
- python 3.9-slim build
- mcr.microsoft.com/mssql/server 2019-latest
- integration_test latest
- apache/airflow 2.6.2-python3.10 build
- @algolia/cache-browser-local-storage 4.13.0
- @algolia/cache-common 4.13.0
- @algolia/cache-in-memory 4.13.0
- @algolia/client-account 4.13.0
- @algolia/client-analytics 4.13.0
- @algolia/client-common 4.13.0
- @algolia/client-personalization 4.13.0
- @algolia/client-search 4.13.0
- @algolia/logger-common 4.13.0
- @algolia/logger-console 4.13.0
- @algolia/requester-browser-xhr 4.13.0
- @algolia/requester-common 4.13.0
- @algolia/requester-node-http 4.13.0
- @algolia/transporter 4.13.0
- algoliasearch 4.13.0
- dotenv 16.0.2
- node-fetch 2.6.7
- remove-markdown 0.5.0
- tr46 0.0.3
- webidl-conversions 3.0.1
- whatwg-url 5.0.0
- algoliasearch ^4.12.1
- dotenv ^16.0.2
- node-fetch ^2.6.7
- remove-markdown ^0.5.0
- @docusaurus/module-type-aliases 2.4.1 development
- @algolia/client-search 4.19.1
- @cmfcmf/docusaurus-search-local 0.11.0
- @docusaurus-terminology/parser 1.3.0
- @docusaurus-terminology/term 1.0.0
- @docusaurus/core 2.4.1
- @docusaurus/plugin-google-gtag 2.4.1
- @docusaurus/plugin-sitemap 2.4.1
- @docusaurus/preset-classic 2.4.1
- @docusaurus/theme-mermaid 2.4.1
- @docusaurus/theme-search-algolia 2.4.1
- @mdx-js/react 1.6.22
- clsx 1.2.1
- docusaurus-gtm-plugin 0.0.2
- docusaurus-plugin-sass 0.2.2
- plugin-image-zoom ataft/plugin-image-zoom
- react 16.14.0
- react-dom 16.14.0
- react-loadable 5.5.0
- react-router-dom 5.3.4
- react-select 4.3.1
- remark-code-import 0.3.0
- sass 1.56.1
- search-insights 2.2.3
- standard 17.1.0
- typescript 5.1.6
- webpack 5.88.2
- 1301 dependencies
- great_expectations *
- pyarrow *
- pyodbc >=4.0.30
- pytest *
- great_expectations *
- psycopg2-binary *
- pyarrow *
- pytest ==7.0.1
- sqlalchemy-redshift >=0.7.7
- dataprofiler *
- great_expectations *
- numpy *
- scikit-learn *
- tensorflow *
- Click >=7.1.2
- black ==23.10.1
- cookiecutter ==2.1.1
- mypy ==1.10.1
- pydantic >=1.0
- pytest >=5.3.5
- ruff ==0.5.7
- twine ==3.7.1
- wheel ==0.38.1
- aequitas *
- great_expectations *
- pandas *
- scikit_learn *
- setuptools *
- Shapely *
- geopandas *
- geopy *
- global_land_mask *
- great_expectations *
- pygeos *
- python_geohash *
- rtree *
- scikit_learn *
- scipy *
- setuptools *
- timezonefinder *
- uszipcode *
- arxiv *
- barcodenumber *
- blockcypher *
- coinaddrvalidator *
- cryptoaddress *
- cryptocompare *
- disposable_email_domains *
- geonamescache *
- great_expectations *
- gtin *
- holidays *
- ipwhois *
- isbnlib *
- langid *
- lxml *
- pgeocode *
- phonenumbers *
- price_parser *
- primefac *
- pwnedpasswords *
- py_moneyed *
- pycountry *
- pydnsbl *
- pyephem *
- python_dateutil *
- python_stdnum *
- pytz *
- pyvat *
- requests *
- schwifty *
- setuptools *
- simple_icd_10 *
- sympy *
- us *
- user_agents *
- yahoo_fin *
- zipcodes *
- great_expectations *
- pgeocode *
- setuptools *
- uszipcode *
- zipcodes *
- great_expectations *
- docstring-parser ==0.15 development
- myst-parser * development
- pydata-sphinx-theme ==0.11.0 development
- sphinx * development
- Babel ==2.9.1
- GitPython ==3.1.37
- Jinja2 ==2.11.3
- MarkupSafe ==1.1.1
- PyYAML ==5.4
- Pygments ==2.15.0
- Send2Trash ==1.8.0
- Sphinx ==2.4.4
- Unidecode ==1.1.1
- alabaster ==0.7.12
- altair ==4.1.0
- appdirs ==1.4.4
- appnope ==0.1.0
- argon2-cffi ==20.1.0
- astroid ==2.4.2
- attrs ==19.3.0
- autoapi ==2.0.1
- backcall ==0.2.0
- bleach ==3.3.0
- certifi ==2023.7.22
- cffi ==1.14.1
- chardet ==3.0.4
- click ==7.1.2
- decorator ==4.4.2
- defusedxml ==0.6.0
- docutils ==0.16
- entrypoints ==0.3
- gitdb ==4.0.5
- idna ==2.10
- imagesize ==1.2.0
- importlib-metadata ==1.7.0
- ipykernel ==5.3.4
- ipython ==8.12.0
- ipython-genutils ==0.2.0
- ipywidgets ==7.6.6
- jedi ==0.17.2
- jsonpatch ==1.26
- jsonpointer ==2.0
- jsonschema ==3.2.0
- jupyter-client ==6.1.6
- jupyter-core ==4.11.2
- lazy-object-proxy ==1.4.3
- mistune ==0.8.4
- nbconvert ==6.5.1
- nbformat ==5.8.0
- notebook ==6.5.4
- numpy ==1.22.0
- packaging ==20.4
- pandas ==1.0.5
- pandocfilters ==1.4.2
- parso ==0.7.1
- pathspec ==0.8.0
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prometheus-client ==0.8.0
- prompt-toolkit ==3.0.38
- ptyprocess ==0.6.0
- pycparser ==2.20
- pyparsing ==2.4.7
- pyrsistent ==0.16.0
- python-dateutil ==2.8.1
- pytz ==2020.1
- pyzmq ==19.0.2
- regex ==2020.7.14
- requests ==2.31.0
- ruamel.yaml ==0.16.10
- ruamel.yaml.clib ==0.2.0
- scipy ==1.10.0
- six ==1.15.0
- smmap ==3.0.4
- snowballstemmer ==2.0.0
- sphinx-autoapi ==1.4.0
- sphinx-gitstamp ==0.3.1
- sphinx-rtd-theme ==0.5.0
- sphinxcontrib-applehelp ==1.0.2
- sphinxcontrib-contentui ==0.2.5
- sphinxcontrib-devhelp ==1.0.2
- sphinxcontrib-htmlhelp ==1.0.3
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==1.0.3
- sphinxcontrib-serializinghtml ==1.1.4
- sybil ==1.4.0
- terminado ==0.8.3
- testpath ==0.4.4
- toml ==0.10.1
- toolz ==0.10.0
- tornado ==6.3.3
- traitlets ==5.9.0
- typed-ast ==1.4.1
- tzlocal ==2.1
- urllib3 ==1.26.18
- wcwidth ==0.2.5
- webencodings ==0.5.1
- widgetsnbextension ==3.5.1
- wrapt ==1.12.1
- zipp ==3.1.0
- arxiv * development
- barcodenumber * development
- blockcypher * development
- coinaddrvalidator * development
- cryptoaddress * development
- cryptocompare * development
- dataprofiler * development
- disposable_email_domains * development
- dnspython * development
- edtf_validate * development
- ephem * development
- geonamescache * development
- geopandas * development
- geopy * development
- global-land-mask * development
- gtin * development
- holidays * development
- ipwhois * development
- isbnlib * development
- langid >=1.1.6 development
- pgeocode * development
- phonenumbers * development
- price_parser * development
- primefac * development
- prophet * development
- pwnedpasswords * development
- py-moneyed * development
- pydnsbl * development
- pygeos * development
- pyogrio * development
- python-geohash * development
- python-stdnum * development
- pyvat * development
- rtree * development
- schwifty * development
- scikit-learn * development
- shapely * development
- simple_icd_10 * development
- sympy * development
- tensorflow * development
- timezonefinder * development
- us * development
- user_agents * development
- uszipcode * development
- yahoo_fin * development
- zipcodes * development
- docstring-parser ==0.15 development
- feather-format >=0.4.1 development
- pyarrow * development
- pyathena >=2.0.0,<3 development
- azure-identity >=1.10.0 development
- azure-keyvault-secrets >=4.0.0 development
- azure-storage-blob >=12.5.0 development
- gcsfs >=0.5.1 development
- google-cloud-bigquery >=3.3.6 development
- google-cloud-bigquery-storage >=2.20.0 development
- google-cloud-secret-manager >=1.0.0 development
- google-cloud-storage >=2.10.0 development
- google-cloud-storage >=1.28.0 development
- sqlalchemy-bigquery >=1.3.0 development
- clickhouse-sqlalchemy >=0.2.2 development
- pandas <2.2.0 development
- orjson >=3.9.7 development
- adr-tools-python ==1.0.3 development
- black ==23.10.1 development
- invoke >=2.0.0 development
- mypy ==1.10.1 development
- pre-commit >=2.21.0 development
- ruff ==0.5.7 development
- tomli >=2.0.1 development
- databricks-sql-connector >=2.0.0 development
- pyodbc >=4.0.30 development
- sqlalchemy-dremio ==1.2.1 development
- openpyxl >=3.0.7 development
- xlrd >=1.1.0,<2.0.0 development
- PyHive >=0.6.5 development
- thrift >=0.16.0 development
- thrift-sasl >=0.4.3 development
- boto3 >=1.17.106 development
- flaky >=3.7.0 development
- flask >=1.0.0 development
- freezegun >=0.3.15 development
- moto >=2.0.0,<3.0.0 development
- nbconvert >=5 development
- pact-python >=2.0.1 development
- pyfakefs >=4.5.1 development
- pytest >=6.2.0 development
- pytest-benchmark >=3.4.1 development
- pytest-cov >=2.8.1 development
- pytest-icdiff >=0.6 development
- pytest-mock >=3.8.2 development
- pytest-order >=0.9.5 development
- pytest-random-order >=1.0.4 development
- pytest-timeout >=2.1.0 development
- pytest-xdist >=3.3.1 development
- requirements-parser >=0.2.0 development
- responses >=0.23.1 development
- snapshottest ==0.6.0 development
- sqlalchemy >=1.4.0 development
- pyodbc >=4.0.30 development
- PyMySQL >=1.1.1 development
- pypd ==1.1.0 development
- psycopg2-binary >=2.7.6 development
- psycopg2-binary >=2.7.6 development
- sqlalchemy-redshift >=0.8.8 development
- pandas <2.2.0 development
- snowflake-connector-python >=2.5.0 development
- snowflake-connector-python >2.9.0 development
- snowflake-sqlalchemy >=1.2.3 development
- pyspark >=2.3.2 development
- sqlalchemy <2.0.0 development
- teradatasqlalchemy ==17.0.0.5 development
- jupyter * development
- jupyterlab * development
- matplotlib * development
- scikit-learn * development
- trino >=0.310.0, development
- sqlalchemy-vertica-python >=0.5.10 development
- pandas-stubs *
- types-PyYAML *
- types-decorator *
- types-jsonschema *
- types-protobuf *
- types-psycopg2 *
- types-pycurl *
- types-python-dateutil *
- types-pytz *
- types-requests *
- types-six *
- types-tabulate *
- types-tqdm *
- types-typed-ast *
- types-tzlocal *
- types-urllib3 *
- Click >=7.1.2
- Ipython >=7.16.3
- altair >=4.2.1,<5.0.0
- colorama >=0.4.3
- cryptography >=3.2
- ipywidgets >=7.5.1
- jinja2 >=2.10
- jsonpatch >=1.22
- jsonschema >=2.5.1
- makefun >=1.7.0,<2
- marshmallow >=3.7.1,<4.0.0
- mistune >=0.8.4
- nbformat >=5.0
- notebook >=6.4.10
- numpy >=1.20.3,<2.0.0
- numpy >=1.21.6,<2.0.0
- numpy >=1.22.4,<2.0.0
- packaging *
- pandas >=1.3.0
- pandas >=1.1.0
- pandas >=1.1.3
- pydantic >=1.9.2
- pyparsing >=2.4
- python-dateutil >=2.8.1
- pytz >=2021.3
- requests >=2.20
- scipy >=1.6.0
- tqdm >=4.59.0
- typing-extensions >=3.10.0.0
- tzlocal >=1.2
- urllib3 >=1.26