https://github.com/datafold/data-diff
Compare tables within or across databases
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (6.0%) to scientific vocabulary
Keywords
data
data-diffing
data-engineering
data-quality
data-quality-monitoring
data-science
database
databricks-sql
dataengineering
dataquality
dbt
mysql
oracle-database
postgres
postgresql
python
rdbms
snowflake
sql
trino
Keywords from Contributors
data-wrangling
Last synced: 5 months ago
Repository
Compare tables within or across databases
Basic Info
- Host: GitHub
- Owner: datafold
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://docs.datafold.com
- Size: 3.98 MB
Statistics
- Stars: 2,988
- Watchers: 20
- Forks: 294
- Open Issues: 0
- Releases: 63
Archived
Topics
data
data-diffing
data-engineering
data-quality
data-quality-monitoring
data-science
database
databricks-sql
dataengineering
dataquality
dbt
mysql
oracle-database
postgres
postgresql
python
rdbms
snowflake
sql
trino
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme
Contributing
License
Code of conduct
README.md
⚠️ As of May 17, 2024, Datafold is no longer actively supporting or developing open source data-diff. We’re grateful to everyone who made contributions along the way. Please see our blog post for additional context on this decision.
data-diff: Compare datasets fast, within or across SQL databases
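The speed claim rests on not transferring every row. As the project README describes it, data-diff splits a table into key-range segments, checksums each segment in both databases, and only subdivides segments whose checksums differ until it reaches the differing rows. The sketch below illustrates that idea in memory only. It assumes two Python dicts (primary key to row tuple) stand in for the two tables, uses MD5 over `repr` as the checksum, and uses a plain two-way split. `segment_checksum`, `row_count`, and `diff_segment` are hypothetical helpers, not data-diff's API.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for a database table: primary key -> row tuple.
Table = Dict[int, tuple]
DiffEntry = Tuple[str, int, tuple]


def segment_checksum(table: Table, lo: int, hi: int) -> str:
    """Checksum of all rows whose key is in [lo, hi), taken in key order.

    data-diff pushes this work down to each database; here it is computed
    locally (an assumption made for illustration).
    """
    h = hashlib.md5()
    for key in sorted(k for k in table if lo <= k < hi):
        h.update(repr((key, table[key])).encode())
    return h.hexdigest()


def row_count(table: Table, lo: int, hi: int) -> int:
    """Stand-in for a COUNT(*) over the key range [lo, hi)."""
    return sum(1 for k in table if lo <= k < hi)


def diff_segment(a: Table, b: Table, lo: int, hi: int, threshold: int = 4) -> List[DiffEntry]:
    """Return ('-', key, row) for rows missing or different in b, ('+', key, row) for b's side."""
    if segment_checksum(a, lo, hi) == segment_checksum(b, lo, hi):
        return []  # identical segment: no rows need to be transferred
    if max(row_count(a, lo, hi), row_count(b, lo, hi)) <= threshold or hi - lo <= 1:
        # Segment is small enough: "download" both sides and compare row by row.
        out: List[DiffEntry] = []
        for key in sorted(set(a).union(b)):
            if not lo <= key < hi:
                continue
            ra: Optional[tuple] = a.get(key)
            rb: Optional[tuple] = b.get(key)
            if ra != rb:
                if ra is not None:
                    out.append(("-", key, ra))
                if rb is not None:
                    out.append(("+", key, rb))
        return out
    # Checksums differ and the segment is large: bisect the key range and recurse.
    mid = (lo + hi) // 2
    return diff_segment(a, b, lo, mid, threshold) + diff_segment(a, b, mid, hi, threshold)
```

For example, if `b` is a copy of a 100-row `a` with row 7 deleted, row 42 changed, and row 150 added, `diff_segment(a, b, 0, 200)` reports only those three keys. Segments that match are skipped after a single checksum comparison. The two-way split and the small `threshold` are simplifications chosen to keep the example short.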
License
This project is licensed under the terms of the MIT License.
Owner
- Name: Datafold
- Login: datafold
- Kind: organization
- Email: hello@datafold.com
- Location: San Francisco, CA
- Website: https://datafold.com
- Twitter: datafoldcom
- Repositories: 10
- Profile: https://github.com/datafold
Data quality platform
GitHub Events
Total
- Watch event: 65
- Fork event: 31
Last Year
- Watch event: 65
- Fork event: 31
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Erez Shinan | e****t@g****m | 573 |
| Dan | d****l@d****m | 247 |
| Sergey Vasilyev | sv@d****m | 104 |
| Ilia Pinchuk | p****a@g****m | 94 |
| Simon Eskildsen | s****p@s****m | 77 |
| Valentin Khomutenko | v****o@g****m | 45 |
| cfernhout | f****l@g****m | 29 |
| Leo Folsom | l****m@g****m | 27 |
| Kyle McNair | 4****r | 21 |
| Sung Won Chung | s****3@g****m | 19 |
| Gleb Mezhanskiy | g****h | 15 |
| Sung Won Chung | s****g@d****m | 15 |
| Doug Beatty | d****y@d****m | 12 |
| Roderick Dunn | r****n@w****m | 10 |
| Matthias Ekundayo | m****s@m****m | 10 |
| Jardayn | j****n@g****m | 10 |
| Daniel Palma | d****y@g****m | 9 |
| Nicolás Aldecoa | n****a@e****i | 8 |
| Alexey Mikhaylov | a****y@d****m | 6 |
| Stefan Keidel | s****l@l****e | 6 |
| Will Sweet | 1****t | 5 |
| Sarad Mohanan | s****n@g****m | 5 |
| vvkh | v****h | 4 |
| Pierre Moizard | p****d@M****l | 4 |
| Leo Folsom | l****m@v****m | 4 |
| Dan Lawin | d****n@g****m | 4 |
| atsumi | a****a@g****m | 3 |
| Ivan Toriya | i****a@g****m | 3 |
| danieldiamond | d****1@g****m | 3 |
| Dave Connors | d****s@f****m | 2 |
| and 32 more... | ||
Committer Domains (Top 20 + Academic)
datafold.com: 5
sirupsen.com: 1
dbtlabs.com: 1
wealthsimple.com: 1
matthias-a2442.evbqa.com: 1
emilabs.ai: 1
lichtblick.de: 1
vendr.com: 1
fishtownanalytics.com: 1
hived.space: 1
cpan.org: 1
rebuy.com: 1
netlify.com: 1
clickzetta.com: 1
twilio.com: 1
transwarp.io: 1
coinlist.co: 1
bese.it: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 189
- Total pull requests: 238
- Average time to close issues: 4 months
- Average time to close pull requests: 12 days
- Total issue authors: 105
- Total pull request authors: 40
- Average comments per issue: 3.08
- Average comments per pull request: 0.71
- Merged pull requests: 174
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- dlawin (33)
- leoebfolsom (6)
- sungchun12 (6)
- kylemcnair (5)
- JCZuurmond (5)
- sar009 (5)
- erezsh (4)
- mimoyer21 (3)
- MattDelac (2)
- RoderickJDunn (2)
- MiConnell (2)
- EnCeT (2)
- hamer101 (2)
- saurasingh (2)
- Schumpeterx (2)
Pull Request Authors
- dlawin (72)
- nolar (46)
- sungchun12 (33)
- vvkh (14)
- leoebfolsom (12)
- teraamp (10)
- sar009 (9)
- kylemcnair (8)
- pik94 (6)
- erezsh (4)
- elliotgunn (4)
- ivan-toriya (3)
- pppsunil (3)
- sebaap (3)
- etnnth (2)
Top Labels
Issue Labels
bug (77)
enhancement (64)
stale (58)
triage (57)
--dbt (48)
stale_immune (35)
new-db-driver (19)
non-dbt (8)
good first issue (6)
awaiting_response (6)
cloud (5)
in-progress (5)
Linear (4)
help wanted (3)
sqeleton (2)
performance (2)
Pull Request Labels
stale (32)
--dbt (28)
bug (20)
enhancement (16)
cloud (5)
Linear (2)
new-db-driver (1)
stale_immune (1)
documentation (1)
Packages
- Total packages: 5
- Total downloads:
  - pypi: 243,084 last month
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 2 (may contain duplicates)
- Total versions: 148
- Total maintainers: 10
pypi.org: data-diff
Command-line tool and Python library to efficiently diff rows across two different databases.
- Homepage: https://github.com/datafold/data-diff
- Documentation: https://data-diff.readthedocs.io/
- License: MIT
- Latest release: 0.11.2 (published almost 2 years ago)
Rankings
Downloads: 1.0%
Stargazers count: 1.4%
Forks count: 3.7%
Average: 4.5%
Dependent packages count: 4.8%
Dependent repos count: 11.5%
Maintainers (5)
Last synced: 5 months ago
proxy.golang.org: github.com/datafold/data-diff
- Documentation: https://pkg.go.dev/github.com/datafold/data-diff#section-documentation
- License: mit
- Latest release: v0.11.1 (published about 2 years ago)
Rankings
Dependent packages count: 9.0%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 5 months ago
conda-forge.org: data-diff
- Homepage: https://github.com/datafold/data-diff
- License: MIT
- Latest release: 0.2.8 (published over 3 years ago)
Rankings
Stargazers count: 8.7%
Average: 14.7%
Forks count: 20.8%
Last synced: 5 months ago
pypi.org: cz-data-diff
Command-line tool and Python library to efficiently diff rows across two different databases.
- Homepage: https://github.com/datafold/data-diff
- Documentation: https://docs.datafold.com/reference/open_source/cli
- License: MIT
- Latest release: 0.0.4 (published about 2 years ago)
Rankings
Stargazers count: 1.4%
Forks count: 3.9%
Dependent packages count: 7.3%
Average: 20.3%
Dependent repos count: 68.4%
Last synced: 5 months ago
pypi.org: collate-data-diff
Command-line tool and Python library to efficiently diff rows across two different databases.
- Homepage: https://github.com/datafold/data-diff
- Documentation: https://collate-data-diff.readthedocs.io/
- License: MIT
- Latest release: 0.11.7 (published 6 months ago)
Rankings
Dependent packages count: 10.9%
Average: 36.1%
Dependent repos count: 61.3%
Maintainers (3)
Last synced: 5 months ago
Dependencies
.github/workflows/ci.yml
actions
- actions/checkout v3 composite
- actions/setup-python v3 composite
.github/workflows/ci_full.yml
actions
- actions/checkout v3 composite
- actions/setup-python v3 composite
.github/workflows/formatter.yml
actions
- actions/checkout v2 composite
- reviewdog/action-suggester v1 composite
- rickstaa/action-black v1 composite
.github/workflows/label-update_awaiting-response-to-triage.yml
actions
- andymckay/labeler master composite
.github/workflows/label-update_stale-to-triage.yml
actions
- andymckay/labeler master composite
.github/workflows/stale.yml
actions
- actions/stale v5 composite
.github/workflows/triage_labels.yml
actions
- actions/github-script v6 composite
Dockerfile
docker
- python 3.10 build
docker-compose.yml
docker
- clickhouse/clickhouse-server 21.12.3.32
- mysql oracle
- postgres 14.1-alpine
- trinodb/trino 389
- vertica/vertica-ce 12.0.0-0
docs/requirements.txt
pypi
- data_diff *
- enum-tools *
- recommonmark *
- sphinx-copybutton *
- sphinx-gallery *
- sphinx-rtd-theme *
- sphinx_markdown_tables *
poetry.lock
pypi
- PyJWT 2.6.0
- agate 1.6.3
- arrow 1.2.3
- asn1crypto 1.5.1
- attrs 23.1.0
- babel 2.11.0
- backports.zoneinfo 0.2.1
- certifi 2022.12.7
- cffi 1.15.1
- charset-normalizer 2.0.12
- click 8.1.3
- clickhouse-driver 0.2.5
- colorama 0.4.4
- commonmark 0.9.1
- coverage 6.5.0
- cryptography 36.0.2
- dbt-core 1.2.6
- dbt-extractor 0.4.1
- dsnparse 0.1.15
- duckdb 0.7.1
- filelock 3.12.2
- future 0.18.3
- hologram 0.0.14
- idna 3.4
- importlib-metadata 4.13.0
- importlib-resources 5.12.0
- isodate 0.6.1
- jaraco-classes 3.2.3
- jeepney 0.8.0
- jinja2 2.11.3
- jsonschema 3.1.1
- keyring 23.13.1
- lark-parser 0.11.3
- leather 0.3.4
- logbook 1.5.3
- markupsafe 2.0.1
- mashumaro 2.9
- minimal-snowplow-tracker 0.0.2
- more-itertools 9.1.0
- msgpack 1.0.4
- mysql-connector-python 8.0.29
- networkx 2.6.3
- oracledb 1.3.2
- oscrypto 1.3.0
- packaging 21.3
- parameterized 0.8.1
- parsedatetime 2.4
- preql 0.2.19
- presto-python-client 0.8.3
- prompt-toolkit 3.0.36
- protobuf 4.22.3
- psycopg2 2.9.5
- pycparser 2.21
- pycryptodomex 3.16.0
- pydantic 1.10.12
- pygments 2.15.1
- pyodbc 4.0.39
- pyopenssl 22.0.0
- pyparsing 3.0.9
- pyrsistent 0.19.3
- python-dateutil 2.8.2
- python-slugify 7.0.0
- pytimeparse 1.1.8
- pytz 2022.6
- pytz-deprecation-shim 0.1.0.post0
- pywin32-ctypes 0.2.0
- pyyaml 6.0
- requests 2.28.1
- rich 12.0.1
- runtype 0.2.7
- secretstorage 3.3.3
- setuptools 65.6.3
- six 1.16.0
- snowflake-connector-python 3.0.4
- sortedcontainers 2.4.0
- sqlparse 0.4.3
- tabulate 0.9.0
- text-unidecode 1.3
- toml 0.10.2
- trino 0.314.0
- typing-extensions 4.7.1
- tzdata 2022.7
- tzlocal 4.2
- unittest-parallel 1.5.3
- urllib3 1.26.13
- vertica-python 1.3.2
- wcwidth 0.2.5
- werkzeug 2.1.2
- zipp 3.11.0
pyproject.toml
pypi
- clickhouse-driver * develop
- cryptography * develop
- dbt-core ^1.0.0 develop
- duckdb ^0.7.0 develop
- mysql-connector-python * develop
- parameterized * develop
- preql ^0.2.19 develop
- presto-python-client * develop
- psycopg2 * develop
- snowflake-connector-python >=3.0.2,<4.0.0 develop
- trino ^0.314.0 develop
- unittest-parallel * develop
- vertica-python * develop
- click ^8.1
- clickhouse-driver *
- cryptography *
- dbt-core ^1.0.0
- dsnparse <0.2.0
- duckdb *
- keyring *
- mysql-connector-python 8.0.29
- oracledb *
- preql ^0.2.19
- presto-python-client *
- psycopg2 *
- pydantic 1.10.12
- pyodbc ^4.0.39
- python ^3.7.2
- rich *
- runtype ^0.2.6
- snowflake-connector-python >=3.0.2,<4.0.0
- tabulate ^0.9.0
- toml ^0.10.2
- trino ^0.314.0
- typing-extensions >=4.0.1
- urllib3 <2
- vertica-python *
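Several `pyproject.toml` constraints above use Poetry's caret syntax, such as `^0.2.19`, `^0.314.0`, `^8.1`, and `python ^3.7.2`. Under Poetry's documented caret rules, the leftmost non-zero version component is held fixed. So `^0.2.19` admits `>=0.2.19,<0.3.0`, while `^1.0.0` admits `>=1.0.0,<2.0.0`. The hypothetical helper `caret_bounds` below expands a caret constraint into those bounds. It illustrates the rule only and is not code from data-diff or Poetry.

```python
from typing import Tuple


def caret_bounds(spec: str) -> Tuple[str, str]:
    """Expand a Poetry-style caret constraint such as '^0.2.19' into (lower, upper).

    The result means: lower <= version < upper.
    """
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec!r}")
    given = [int(part) for part in spec[1:].split(".")]
    # Missing components in the lower bound are filled with zeros (e.g. ^8.1 -> 8.1.0).
    lower = given + [0] * (3 - len(given))
    # The first non-zero component is the one that may not change; if every given
    # component is zero, the last given component is the one that is locked.
    idx = next((i for i, v in enumerate(given) if v != 0), len(given) - 1)
    upper = given[: idx + 1]
    upper[idx] += 1
    upper += [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))
```

For example, `caret_bounds("^0.314.0")` returns `("0.314.0", "0.315.0")`, so the `trino` constraint effectively pins the 0.314.x series. In contrast, `caret_bounds("^8.1")` returns `("8.1.0", "9.0.0")` and allows any 8.x release of `click` from 8.1.0 on.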