Recent Releases of https://github.com/awslabs/deequ

https://github.com/awslabs/deequ - 2.0.12

What's Changed

  • Added Implementation of DQDL Rules and Execution
    • add implementation of DQDL rule execution by @happy-coral in https://github.com/awslabs/deequ/pull/620
    • Add implementation of outcome mapping in DeequOutcomeTranslator by @happy-coral in https://github.com/awslabs/deequ/pull/621
    • Add implementation for DQDL rules: CompletenessRule, IsCompleteRule, UniquenessRule, IsUniqueRule, ColumnCorrelationRule by @happy-coral in https://github.com/awslabs/deequ/pull/622
    • Add implementation for DQDL rules: DistinctValuesCount, Entropy, Mean, StandardDeviation, Sum, UniqueValueRatio by @happy-coral in https://github.com/awslabs/deequ/pull/624
    • Update README to describe DQDL support and add Java & Scala DQDL examples by @happy-coral in https://github.com/awslabs/deequ/pull/634
    • Add support for DQDL IsPrimaryKey rule by @happy-coral in https://github.com/awslabs/deequ/pull/635
    • Add support for DQDL ColumnLength rule by @eycho-am in https://github.com/awslabs/deequ/pull/636
  • Modify Histogram to be in descending frequency by @kyraman in https://github.com/awslabs/deequ/pull/630
  • Introduce HistogramBase for common histogram behavior by @kyraman in https://github.com/awslabs/deequ/pull/631
  • Modify maven publishing to use central portal by @eycho-am in https://github.com/awslabs/deequ/pull/633
  • Add support for DQDL CustomSql rule & Deequ CustomSql check by @happy-coral in https://github.com/awslabs/deequ/pull/632
  • fix(kll): Add SerDe Implementation for KLLSketch by @mdrakiburrahman in https://github.com/awslabs/deequ/pull/628
  • Updated version in pom.xml to 2.0.12-spark-3.5 by @eycho-am in https://github.com/awslabs/deequ/pull/637

New Contributors

  • @kyraman made their first contribution in https://github.com/awslabs/deequ/pull/630
  • @mdrakiburrahman made their first contribution in https://github.com/awslabs/deequ/pull/628

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.11...2.0.12

- Scala
Published by eycho-am 6 months ago

https://github.com/awslabs/deequ - 2.0.11

What's Changed

  • Add AnalyzerOptions to Analyzer serialize / deserialize logic by @kchaturvedi in https://github.com/awslabs/deequ/pull/597
  • Refine row count retrieval to skip redundant Size() scans by @lawofcycles in https://github.com/awslabs/deequ/pull/605
  • Updated version in pom.xml to 2.0.11-spark-3.5 by @eycho-am in https://github.com/awslabs/deequ/pull/615

New Contributors

  • @kchaturvedi made their first contribution in https://github.com/awslabs/deequ/pull/597
  • @lawofcycles made their first contribution in https://github.com/awslabs/deequ/pull/605

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.10...2.0.11

- Scala
Published by eycho-am 6 months ago

https://github.com/awslabs/deequ - 2.0.10

New Features

  • Are unique check by @eycho-am in https://github.com/awslabs/deequ/pull/599
  • add DQDL parser dependency by @happy-coral in https://github.com/awslabs/deequ/pull/603
  • scaffolding for checking data quality against DQDL rulesets by @happy-coral in https://github.com/awslabs/deequ/pull/604
  • Implement translation of rules and add converter for RowCount rule by @happy-coral in https://github.com/awslabs/deequ/pull/606

Maintenance / Fixes

  • feature/replace-rdd by @shriyavanvari in https://github.com/awslabs/deequ/pull/586
  • Adds a test to verify that Deequ's isContainedIn constraint correctly handles string values containing single quotes in the verification process. by @D-Minor in https://github.com/awslabs/deequ/pull/602

New Contributors

  • @shriyavanvari made their first contribution in https://github.com/awslabs/deequ/pull/586
  • @D-Minor made their first contribution in https://github.com/awslabs/deequ/pull/602
  • @happy-coral made their first contribution in https://github.com/awslabs/deequ/pull/603

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.9...2.0.10

- Scala
Published by eycho-am 10 months ago

https://github.com/awslabs/deequ - 2.0.9

Maintenance / Fixes

  • Fix row level bug when composing outcome https://github.com/awslabs/deequ/pull/594

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.8...2.0.9

- Scala
Published by eycho-am 10 months ago

https://github.com/awslabs/deequ - 2.0.8

New Features

  • Configurable RetainCompletenessRule by @zeotuan in https://github.com/awslabs/deequ/pull/564
  • Optional specification of instance name in CustomSQL analyzer metric. by @tylermcdaniel0 in https://github.com/awslabs/deequ/pull/569
  • Adding Wilson Score Confidence Interval Strategy by @zeotuan in https://github.com/awslabs/deequ/pull/567
  • CustomAggregator by @joshuazexter in https://github.com/awslabs/deequ/pull/572
  • Add commits from master branch to release/2.0.8-spark-3.5 by @eycho-am in https://github.com/awslabs/deequ/pull/587
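
For readers unfamiliar with the Wilson score interval named in the list above: it gives a confidence interval for a proportion (for example, the fraction of non-null values in a column) and behaves better than the normal approximation when the proportion is close to 0 or 1. The following is a minimal textbook sketch in Python, not Deequ's implementation; the function name `wilson_interval` and its parameters are hypothetical and chosen only for illustration.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for an observed proportion.

    p_hat: observed proportion in [0, 1] (e.g. completeness of a column)
    n:     number of observations the proportion is based on
    z:     standard normal quantile (1.96 corresponds to roughly 95%)

    Hypothetical helper for illustration; not part of the Deequ API.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")

    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre of the interval is pulled towards 0.5 for small n.
    center = (p_hat + z2 / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z2 / (4.0 * n * n)) / denom

    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # 90 of 100 sampled rows are non-null.
    print(wilson_interval(0.9, 100))
```

A rule that must hold on unseen data would typically use the lower bound of such an interval rather than the observed proportion itself, which is the general motivation for offering it as a confidence interval strategy.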

Maintenance / Fixes

  • fix typo by @bojackli in https://github.com/awslabs/deequ/pull/574
  • Fix performance of building row-level results by @marcantony in https://github.com/awslabs/deequ/pull/577

New Contributors

  • @joshuazexter made their first contribution in https://github.com/awslabs/deequ/pull/572
  • @bojackli made their first contribution in https://github.com/awslabs/deequ/pull/574

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.7...2.0.8

- Scala
Published by eycho-am 10 months ago

https://github.com/awslabs/deequ - 2.0.7

What's Changed

Upgrades

  • Add Spark 3.5 support by @jhchee in https://github.com/awslabs/deequ/pull/514

New Features

  • New type of MetricsRepository by @VenkataKarthikP:
    • Using Spark tables as the data source in https://github.com/awslabs/deequ/pull/518
  • Row Level Result Treatment Options by @eycho-am:
    • Uniqueness and Completeness in https://github.com/awslabs/deequ/pull/532
    • Minimum and Maximum in https://github.com/awslabs/deequ/pull/535
  • Anomaly Detection Changes by @zeotuan:
    • Add Daily Season with Hourly Interval to HoltWinter in https://github.com/awslabs/deequ/pull/546
  • New analyzers:
    • RatioOfSums by @scott-gunn in https://github.com/awslabs/deequ/pull/552
    • Column Count Analyzer and Check by @mentekid in https://github.com/awslabs/deequ/pull/555

Maintenance/Fixes

  • Fix Breeze dependency conflict in Anomaly Detection Spark 3.4+ by @zeotuan in https://github.com/awslabs/deequ/pull/545
  • Data Sync / DatasetMatch changes by @VenkataKarthikP:
    • add data synchronization test to verification Suite in https://github.com/awslabs/deequ/pull/526
    • support col match and change to DatasetMatch in https://github.com/awslabs/deequ/pull/529
  • Row level results fixes:
    • Add analyzerOption to add filteredRowOutcome for isPrimaryKey Check by @eycho-am in https://github.com/awslabs/deequ/pull/537
    • Fix bug in MinLength and MaxLength when NullBehavior.EmptyString by @eycho-am in https://github.com/awslabs/deequ/pull/538
    • [Min/Max] Apply filtered row behavior at the row level evaluation by @rdsharma26 in https://github.com/awslabs/deequ/pull/543
    • [MinLength/MaxLength] Apply filtered row behavior at the row level evaluation by @rdsharma26 in https://github.com/awslabs/deequ/pull/547
    • Fix for satisfies row level results bug by @rdsharma26 in https://github.com/awslabs/deequ/pull/553

New Contributors

  • @VenkataKarthikP made their first contribution in https://github.com/awslabs/deequ/pull/518
  • @scott-gunn made their first contribution in https://github.com/awslabs/deequ/pull/552

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.6...2.0.7

- Scala
Published by rdsharma26 over 1 year ago

https://github.com/awslabs/deequ - 2.0.6

What's Changed

  • NEW: Exact Quantile Check
    • Creation of Exact Quantile Check by @jmilis2000 in https://github.com/awslabs/deequ/pull/512
  • Data Synchronization/Matching fixes
    • Delegate to Spark for checking existence of columns in the given dataframes by @rdsharma26 in https://github.com/awslabs/deequ/pull/515
    • Verify that non key columns exist in each dataset by @rdsharma26 in https://github.com/awslabs/deequ/pull/517
  • Addition of tests
    • Test that exceptions within a check's constraints do not affect other… by @tylermcdaniel0 in https://github.com/awslabs/deequ/pull/516

New Contributors

  • @jmilis2000 made their first contribution in https://github.com/awslabs/deequ/pull/512
  • @tylermcdaniel0 made their first contribution in https://github.com/awslabs/deequ/pull/516

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.5...2.0.6

- Scala
Published by rdsharma26 over 2 years ago

https://github.com/awslabs/deequ - 2.0.5

What's Changed

  • Spark 3.4 Update
    • Add Spark 3.4 support by @jhchee in https://github.com/awslabs/deequ/pull/505
    • Update minor version for Spark 3.4 maven release by @eycho-am in https://github.com/awslabs/deequ/pull/513
  • NEW: Custom SQL analyzer
    • Custom SQL Analyzer by @mentekid in https://github.com/awslabs/deequ/pull/509
    • Fail when CustomSql has syntax errors by @mentekid in https://github.com/awslabs/deequ/pull/510
    • Fix CustomSQL test syntax by @eycho-am in https://github.com/awslabs/deequ/pull/511
  • Analyzer Improvements
    • Allow all DQ constraints to be generated from an Analyzer by @mentekid in https://github.com/awslabs/deequ/pull/508

New Contributors

  • @jhchee made their first contribution in https://github.com/awslabs/deequ/pull/505

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.4...2.0.5

- Scala
Published by rdsharma26 over 2 years ago

https://github.com/awslabs/deequ - 2.0.4

What's Changed

  • Row-Level Results:
    • MinLength by @eycho-am in https://github.com/awslabs/deequ/pull/465
    • Uniqueness by @eycho-am in https://github.com/awslabs/deequ/pull/471
    • ColumnValues by @zixianzh1 in https://github.com/awslabs/deequ/pull/476
    • ReferentialIntegrity by @rdsharma26 in https://github.com/awslabs/deequ/pull/466
    • [Experimental] DataSynchronization by @rdsharma26 in https://github.com/awslabs/deequ/pull/473
  • Referential Integrity:
    • Updated Referential Integrity to support multiple columns by @rdsharma26 in https://github.com/awslabs/deequ/pull/463
  • Constraints and Condition Changes:
    • Add population stability index (PSI) to distance methods by @bevhanno in https://github.com/awslabs/deequ/pull/480
    • Fix chi-square test conditions by @bevhanno in https://github.com/awslabs/deequ/pull/482
    • Missing Column Precondition for Compliance Check - issue fix 467 by @samarth-c1 in https://github.com/awslabs/deequ/pull/478
    • Addition of HasMax/HasMin/HasStandardDeviation/HasMean constraint suggestions by @rdsharma26 in https://github.com/awslabs/deequ/pull/489
    • Alternative aggregate functions to calculate histogram values. by @akalotkin in https://github.com/awslabs/deequ/pull/475
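
As background for the population stability index (PSI) entry in the list above: PSI compares two binned distributions by summing (actual - expected) * ln(actual / expected) over the bins, where both terms are bin proportions. The sketch below is a generic Python version under stated assumptions, not Deequ's implementation; the function name `population_stability_index` and the `eps` floor used to avoid division by zero are hypothetical choices made for this example.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float],
    actual: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """PSI between two histograms given as per-bin counts.

    expected: counts per bin for the reference data
    actual:   counts per bin for the new data (same bins, same order)
    eps:      assumed floor for bin proportions so empty bins do not
              produce log(0) or a division by zero

    Hypothetical helper for illustration; not part of the Deequ API.
    """
    if len(expected) != len(actual):
        raise ValueError("both histograms must have the same number of bins")

    expected_total = sum(expected)
    actual_total = sum(actual)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("histograms must contain at least one observation")

    psi = 0.0
    for expected_count, actual_count in zip(expected, actual):
        e = max(expected_count / expected_total, eps)
        a = max(actual_count / actual_total, eps)
        # Each term is non-negative: the difference and the log share a sign.
        psi += (a - e) * math.log(a / e)
    return psi


if __name__ == "__main__":
    print(population_stability_index([50, 50], [90, 10]))
```

Identical distributions give a PSI of 0, and the value grows as the two distributions diverge.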

New Contributors

  • @zixianzh1 made their first contribution in https://github.com/awslabs/deequ/pull/476
  • @samarth-c1 made their first contribution in https://github.com/awslabs/deequ/pull/478
  • @akalotkin made their first contribution in https://github.com/awslabs/deequ/pull/475

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.3...2.0.4

- Scala
Published by eycho-am over 2 years ago

https://github.com/awslabs/deequ - 2.0.3

What's Changed

  • Adding chi-square distance method for categorical variables by @bevhanno in https://github.com/awslabs/deequ/pull/444
  • [WIP] Row Level Results by @mentekid in https://github.com/awslabs/deequ/pull/451
  • [Experimental] Addition of dataset comparison utilities by @rdsharma26 in https://github.com/awslabs/deequ/pull/449

New Contributors

  • @rdsharma26 made their first contribution in https://github.com/awslabs/deequ/pull/447
  • @bevhanno made their first contribution in https://github.com/awslabs/deequ/pull/444
  • @mentekid made their first contribution in https://github.com/awslabs/deequ/pull/451

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.2...2.0.3

- Scala
Published by eycho-am almost 3 years ago

https://github.com/awslabs/deequ - 2.0.2

Adds Spark 3.3 compatibility.

What's Changed

  • Upgrade to Spark 3.3.0 by @eycho-am in https://github.com/awslabs/deequ/pull/442

New Contributors

  • @eycho-am made their first contribution in https://github.com/awslabs/deequ/pull/442

Full Changelog: https://github.com/awslabs/deequ/compare/2.0.1...2.0.2

- Scala
Published by shehzad-qureshi about 3 years ago

https://github.com/awslabs/deequ - 2.0.1

Adds Spark 3.2 compatibility.

- Scala
Published by TammoR about 4 years ago

https://github.com/awslabs/deequ - 2.0.0

Add Spark 3.1 compatibility.

Note: this version is no longer compatible with Spark 3.0 and earlier. For those Spark versions, use a previous release and the legacy-spark-3.0 branch instead.

- Scala
Published by lange-labs over 4 years ago

https://github.com/awslabs/deequ - Fix build setup to make artefact importable with maven/sbt

This release updates the build setup (i.e. the pom.xml and the publishing process) so that the artefacts published to Maven can now be imported using Maven or sbt. There are four branches associated with this new release:

  • for Spark 2.2: https://github.com/awslabs/deequ/tree/release/1.2.2-spark-2.2
  • for Spark 2.3: https://github.com/awslabs/deequ/tree/release/1.2.2-spark-2.3
  • for Spark 2.4: https://github.com/awslabs/deequ/tree/release/1.2.2-spark-2.4
  • for Spark 2.5: https://github.com/awslabs/deequ/tree/release/1.2.2-spark-2.5

- Scala
Published by twollnik almost 5 years ago

https://github.com/awslabs/deequ - 1.1.0

Changes to the build setup to support Spark 2.2.x to 2.4.x and 3.0.x. There is now one Maven release available per Spark version:

  • spark-3.0-scala-2.12
  • spark-2.4-scala-2.11
  • spark-2.3-scala-2.11
  • spark-2.2-scala-2.11

- Scala
Published by twollnik about 5 years ago

https://github.com/awslabs/deequ - 1.0.5

- Scala
Published by iamsteps over 5 years ago

https://github.com/awslabs/deequ - 1.0.4

Correct version in pom.xml

- Scala
Published by tdhd over 5 years ago

https://github.com/awslabs/deequ - 1.0.3

  • Histogram metrics backwards compatibility
  • support for Spark SQL case sensitivity
  • several bug fixes
  • added documentation

- Scala
Published by tdhd over 5 years ago

https://github.com/awslabs/deequ - 1.0.3-RC2

- Scala
Published by iamsteps almost 6 years ago

https://github.com/awslabs/deequ -

- Scala
Published by iamsteps about 6 years ago

https://github.com/awslabs/deequ - 1.0.2

- Scala
Published by iamsteps over 6 years ago

https://github.com/awslabs/deequ - 1.0.1

  • Spark 2.4 compatibility

- Scala
Published by iamsteps almost 7 years ago

https://github.com/awslabs/deequ - 1.0.0-rc5

  • Check-applicability result now contains all constraints and their applicabilities
  • Include metric in ConstraintResult

https://github.com/awslabs/deequ/pull/76

- Scala
Published by tdhd over 7 years ago

https://github.com/awslabs/deequ - 1.0.0-rc4

  • Anomaly detection with seasonal Holt-Winters method
  • Check applicability support for additional data types

- Scala
Published by tdhd over 7 years ago

https://github.com/awslabs/deequ - 1.0.0-rc3

Column profiling handles boolean histograms correctly

- Scala
Published by tdhd over 7 years ago

https://github.com/awslabs/deequ - 1.0.0-rc2

Spark 2.3 compatibility

- Scala
Published by sscdotopen over 7 years ago

https://github.com/awslabs/deequ - 1.0.0-rc1

A few additional convenience functions for our API.

- Scala
Published by sscdotopen over 7 years ago

https://github.com/awslabs/deequ - 1.0.0-RC0

Release candidate for deequ 1.0.

- Scala
Published by sscdotopen over 7 years ago

https://github.com/awslabs/deequ -

Test release for validating Maven publishing. DO NOT USE IN PRODUCTION.

- Scala
Published by sscdotopen over 7 years ago