https://github.com/datafold/data-diff

Compare tables within or across databases

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.0%) to scientific vocabulary

Keywords

data data-diffing data-engineering data-quality data-quality-monitoring data-science database databricks-sql dataengineering dataquality dbt mysql oracle-database postgres postgresql python rdbms snowflake sql trino

Keywords from Contributors

data-wrangling
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: datafold
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Homepage: https://docs.datafold.com
  • Size: 3.98 MB
Statistics
  • Stars: 2,988
  • Watchers: 20
  • Forks: 294
  • Open Issues: 0
  • Releases: 63
Status: Archived
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct

README.md

⚠️ As of May 17, 2024, Datafold is no longer actively supporting or developing open source data-diff. We’re grateful to everyone who made contributions along the way. Please see our blog post for additional context on this decision.


data-diff: Compare datasets fast, within or across SQL databases

License

This project is licensed under the terms of the MIT License.

Owner

  • Name: Datafold
  • Login: datafold
  • Kind: organization
  • Email: hello@datafold.com
  • Location: San Francisco, CA

Data quality platform

GitHub Events

Total
  • Watch event: 65
  • Fork event: 31
Last Year
  • Watch event: 65
  • Fork event: 31

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 1,412
  • Total Committers: 62
  • Avg Commits per committer: 22.774
  • Development Distribution Score (DDS): 0.594
Past Year
  • Commits: 229
  • Committers: 20
  • Avg Commits per committer: 11.45
  • Development Distribution Score (DDS): 0.624
Top Committers
Name Email Commits
Erez Shinan e****t@g****m 573
Dan d****l@d****m 247
Sergey Vasilyev sv@d****m 104
Ilia Pinchuk p****a@g****m 94
Simon Eskildsen s****p@s****m 77
Valentin Khomutenko v****o@g****m 45
cfernhout f****l@g****m 29
Leo Folsom l****m@g****m 27
Kyle McNair 4****r 21
Sung Won Chung s****3@g****m 19
Gleb Mezhanskiy g****h 15
Sung Won Chung s****g@d****m 15
Doug Beatty d****y@d****m 12
Roderick Dunn r****n@w****m 10
Matthias Ekundayo m****s@m****m 10
Jardayn j****n@g****m 10
Daniel Palma d****y@g****m 9
Nicolás Aldecoa n****a@e****i 8
Alexey Mikhaylov a****y@d****m 6
Stefan Keidel s****l@l****e 6
Will Sweet 1****t 5
Sarad Mohanan s****n@g****m 5
vvkh v****h 4
Pierre Moizard p****d@M****l 4
Leo Folsom l****m@v****m 4
Dan Lawin d****n@g****m 4
atsumi a****a@g****m 3
Ivan Toriya i****a@g****m 3
danieldiamond d****1@g****m 3
Dave Connors d****s@f****m 2
and 32 more...
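
The all-time and past-year averages, and the all-time DDS, can be reproduced from the counts listed above. The sketch below assumes DDS is defined as one minus the top committer's share of all commits; this page does not state the definition, but the assumption matches the all-time figure. The function names are illustrative only.

```python
def avg_commits_per_committer(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 minus the top committer's share of commits."""
    return 1 - top_committer_commits / total_commits


# All-time figures: 1,412 commits by 62 committers; top committer has 573 commits.
all_time_avg = avg_commits_per_committer(1412, 62)
all_time_dds = development_distribution_score(573, 1412)

# Past-year figures: 229 commits by 20 committers.
past_year_avg = avg_commits_per_committer(229, 20)

print(f"All-time average:  {all_time_avg:.3f}")
print(f"All-time DDS:      {all_time_dds:.3f}")
print(f"Past-year average: {past_year_avg:.2f}")
```

The past-year DDS cannot be recomputed the same way, because the listing does not give the top committer's commit count for that period.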

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 189
  • Total pull requests: 238
  • Average time to close issues: 4 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 105
  • Total pull request authors: 40
  • Average comments per issue: 3.08
  • Average comments per pull request: 0.71
  • Merged pull requests: 174
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dlawin (33)
  • leoebfolsom (6)
  • sungchun12 (6)
  • kylemcnair (5)
  • JCZuurmond (5)
  • sar009 (5)
  • erezsh (4)
  • mimoyer21 (3)
  • MattDelac (2)
  • RoderickJDunn (2)
  • MiConnell (2)
  • EnCeT (2)
  • hamer101 (2)
  • saurasingh (2)
  • Schumpeterx (2)
Pull Request Authors
  • dlawin (72)
  • nolar (46)
  • sungchun12 (33)
  • vvkh (14)
  • leoebfolsom (12)
  • teraamp (10)
  • sar009 (9)
  • kylemcnair (8)
  • pik94 (6)
  • erezsh (4)
  • elliotgunn (4)
  • ivan-toriya (3)
  • pppsunil (3)
  • sebaap (3)
  • etnnth (2)
Top Labels
Issue Labels
  • bug (77)
  • enhancement (64)
  • stale (58)
  • triage (57)
  • --dbt (48)
  • stale_immune (35)
  • new-db-driver (19)
  • non-dbt (8)
  • good first issue (6)
  • awaiting_response (6)
  • cloud (5)
  • in-progress (5)
  • Linear (4)
  • help wanted (3)
  • sqeleton (2)
  • performance (2)
Pull Request Labels
  • stale (32)
  • --dbt (28)
  • bug (20)
  • enhancement (16)
  • cloud (5)
  • Linear (2)
  • new-db-driver (1)
  • stale_immune (1)
  • documentation (1)

Packages

  • Total packages: 5
  • Total downloads:
    • pypi 243,084 last-month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 148
  • Total maintainers: 10
pypi.org: data-diff

Command-line tool and Python library to efficiently diff rows across two different databases.

  • Versions: 75
  • Dependent Packages: 1
  • Dependent Repositories: 2
  • Downloads: 45,804 Last month
Rankings
Downloads: 1.0%
Stargazers count: 1.4%
Forks count: 3.7%
Average: 4.5%
Dependent packages count: 4.8%
Dependent repos count: 11.5%
Last synced: 5 months ago
proxy.golang.org: github.com/datafold/data-diff
  • Versions: 59
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 9.0%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 5 months ago
conda-forge.org: data-diff
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 8.7%
Average: 14.7%
Forks count: 20.8%
Last synced: 5 months ago
pypi.org: cz-data-diff

Command-line tool and Python library to efficiently diff rows across two different databases.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 11 Last month
Rankings
Stargazers count: 1.4%
Forks count: 3.9%
Dependent packages count: 7.3%
Average: 20.3%
Dependent repos count: 68.4%
Maintainers (2)
Last synced: 5 months ago
pypi.org: collate-data-diff

Command-line tool and Python library to efficiently diff rows across two different databases.

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 197,269 Last month
Rankings
Dependent packages count: 10.9%
Average: 36.1%
Dependent repos count: 61.3%
Maintainers (3)
Last synced: 5 months ago
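
The package totals above can be cross-checked against the per-package entries. The sketch below sums the listed monthly downloads and version counts, and assumes that each "Average" ranking is the arithmetic mean of the percentiles listed with it. The page does not state that definition; it is consistent with the pypi.org data-diff and proxy.golang.org entries. Variable names are illustrative.

```python
from statistics import mean

# Monthly PyPI downloads as listed per package.
monthly_downloads = {
    "data-diff": 45_804,
    "cz-data-diff": 11,
    "collate-data-diff": 197_269,
}
total_downloads = sum(monthly_downloads.values())

# Version counts as listed per registry entry.
versions = {
    "pypi.org: data-diff": 75,
    "proxy.golang.org: github.com/datafold/data-diff": 59,
    "conda-forge.org: data-diff": 6,
    "pypi.org: cz-data-diff": 3,
    "pypi.org: collate-data-diff": 5,
}
total_versions = sum(versions.values())

# Ranking percentiles as listed (excluding the "Average" line itself).
pypi_rankings = [1.0, 1.4, 3.7, 4.8, 11.5]  # downloads, stars, forks, dep. packages, dep. repos
golang_rankings = [9.0, 10.2]               # dep. packages, dep. repos

pypi_avg = mean(pypi_rankings)
golang_avg = mean(golang_rankings)

print(f"Total downloads: {total_downloads:,}")
print(f"Total versions:  {total_versions}")
print(f"pypi.org data-diff average ranking: {pypi_avg:.1f}%")
print(f"proxy.golang.org average ranking:   {golang_avg:.1f}%")
```

Because the listed percentiles are already rounded to one decimal place, a recomputed average for other entries may differ from the displayed value in the last digit.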

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/ci_full.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/formatter.yml actions
  • actions/checkout v2 composite
  • reviewdog/action-suggester v1 composite
  • rickstaa/action-black v1 composite
.github/workflows/label-update_awaiting-response-to-triage.yml actions
  • andymckay/labeler master composite
.github/workflows/label-update_stale-to-triage.yml actions
  • andymckay/labeler master composite
.github/workflows/stale.yml actions
  • actions/stale v5 composite
.github/workflows/triage_labels.yml actions
  • actions/github-script v6 composite
Dockerfile docker
  • python 3.10 build
docker-compose.yml docker
  • clickhouse/clickhouse-server 21.12.3.32
  • mysql oracle
  • postgres 14.1-alpine
  • trinodb/trino 389
  • vertica/vertica-ce 12.0.0-0
docs/requirements.txt pypi
  • data_diff *
  • enum-tools *
  • recommonmark *
  • sphinx-copybutton *
  • sphinx-gallery *
  • sphinx-rtd-theme *
  • sphinx_markdown_tables *
poetry.lock pypi
  • PyJWT 2.6.0
  • agate 1.6.3
  • arrow 1.2.3
  • asn1crypto 1.5.1
  • attrs 23.1.0
  • babel 2.11.0
  • backports.zoneinfo 0.2.1
  • certifi 2022.12.7
  • cffi 1.15.1
  • charset-normalizer 2.0.12
  • click 8.1.3
  • clickhouse-driver 0.2.5
  • colorama 0.4.4
  • commonmark 0.9.1
  • coverage 6.5.0
  • cryptography 36.0.2
  • dbt-core 1.2.6
  • dbt-extractor 0.4.1
  • dsnparse 0.1.15
  • duckdb 0.7.1
  • filelock 3.12.2
  • future 0.18.3
  • hologram 0.0.14
  • idna 3.4
  • importlib-metadata 4.13.0
  • importlib-resources 5.12.0
  • isodate 0.6.1
  • jaraco-classes 3.2.3
  • jeepney 0.8.0
  • jinja2 2.11.3
  • jsonschema 3.1.1
  • keyring 23.13.1
  • lark-parser 0.11.3
  • leather 0.3.4
  • logbook 1.5.3
  • markupsafe 2.0.1
  • mashumaro 2.9
  • minimal-snowplow-tracker 0.0.2
  • more-itertools 9.1.0
  • msgpack 1.0.4
  • mysql-connector-python 8.0.29
  • networkx 2.6.3
  • oracledb 1.3.2
  • oscrypto 1.3.0
  • packaging 21.3
  • parameterized 0.8.1
  • parsedatetime 2.4
  • preql 0.2.19
  • presto-python-client 0.8.3
  • prompt-toolkit 3.0.36
  • protobuf 4.22.3
  • psycopg2 2.9.5
  • pycparser 2.21
  • pycryptodomex 3.16.0
  • pydantic 1.10.12
  • pygments 2.15.1
  • pyodbc 4.0.39
  • pyopenssl 22.0.0
  • pyparsing 3.0.9
  • pyrsistent 0.19.3
  • python-dateutil 2.8.2
  • python-slugify 7.0.0
  • pytimeparse 1.1.8
  • pytz 2022.6
  • pytz-deprecation-shim 0.1.0.post0
  • pywin32-ctypes 0.2.0
  • pyyaml 6.0
  • requests 2.28.1
  • rich 12.0.1
  • runtype 0.2.7
  • secretstorage 3.3.3
  • setuptools 65.6.3
  • six 1.16.0
  • snowflake-connector-python 3.0.4
  • sortedcontainers 2.4.0
  • sqlparse 0.4.3
  • tabulate 0.9.0
  • text-unidecode 1.3
  • toml 0.10.2
  • trino 0.314.0
  • typing-extensions 4.7.1
  • tzdata 2022.7
  • tzlocal 4.2
  • unittest-parallel 1.5.3
  • urllib3 1.26.13
  • vertica-python 1.3.2
  • wcwidth 0.2.5
  • werkzeug 2.1.2
  • zipp 3.11.0
pyproject.toml pypi
  • clickhouse-driver * develop
  • cryptography * develop
  • dbt-core ^1.0.0 develop
  • duckdb ^0.7.0 develop
  • mysql-connector-python * develop
  • parameterized * develop
  • preql ^0.2.19 develop
  • presto-python-client * develop
  • psycopg2 * develop
  • snowflake-connector-python >=3.0.2,<4.0.0 develop
  • trino ^0.314.0 develop
  • unittest-parallel * develop
  • vertica-python * develop
  • click ^8.1
  • clickhouse-driver *
  • cryptography *
  • dbt-core ^1.0.0
  • dsnparse <0.2.0
  • duckdb *
  • keyring *
  • mysql-connector-python 8.0.29
  • oracledb *
  • preql ^0.2.19
  • presto-python-client *
  • psycopg2 *
  • pydantic 1.10.12
  • pyodbc ^4.0.39
  • python ^3.7.2
  • rich *
  • runtype ^0.2.6
  • snowflake-connector-python >=3.0.2,<4.0.0
  • tabulate ^0.9.0
  • toml ^0.10.2
  • trino ^0.314.0
  • typing-extensions >=4.0.1
  • urllib3 <2
  • vertica-python *