Recent Releases of substrait
substrait - v0.72.0
0.72.0 (2025-05-04)
⚠ BREAKING CHANGES
- The direct output order of semi, anti, and mark joins no longer includes the invalid side (i.e., the right side of left-sided joins and the left side of right-sided joins).
- A single join raises a runtime error if there is more than one matching row, in keeping with its originally proposed usage (unnesting scalar subqueries).
Problems
- Semi-joins and anti-joins are one-sided: fields from the other side of the join are not valid. The Direct Output Order in the document is fine for other types of joins but not for one-sided joins.
- Single joins were proposed for unnesting scalar subqueries, where exactly one row is expected. The current documentation cites the paper and its behavior but relaxes that behavior, so an implementation can silently produce wrong results.
What does this PR do?
- Clarify the direct output order by explicitly covering semi, anti, and mark joins, and introduce Input Order (the previous Direct Output Order) so that all properties reference Input Order, reducing ambiguity.
- Single joins expect at most one row for each join key; otherwise, a runtime error is raised. This behavior can be extended in the future if such a generalization is justified by correct use cases.
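A minimal Python sketch of these semantics over in-memory rows follows. It is not a Substrait API: rows are plain tuples, column 0 is assumed to be the join key on both sides, and all function names are hypothetical. Semi, anti, and mark joins never emit right-side fields, and the single join fails instead of silently duplicating a left row.

```python
# Illustrative model only: rows are tuples and column 0 is the join key on both sides.
Row = tuple


def _right_keys(right: list[Row]) -> set:
    return {r[0] for r in right}


def left_semi_join(left: list[Row], right: list[Row]) -> list[Row]:
    """Left rows with at least one match; no right-side fields are output."""
    keys = _right_keys(right)
    return [l for l in left if l[0] in keys]


def left_anti_join(left: list[Row], right: list[Row]) -> list[Row]:
    """Left rows with no match; no right-side fields are output."""
    keys = _right_keys(right)
    return [l for l in left if l[0] not in keys]


def left_mark_join(left: list[Row], right: list[Row]) -> list[Row]:
    """Every left row plus a boolean mark column (NULL-aware marking omitted)."""
    keys = _right_keys(right)
    return [l + (l[0] in keys,) for l in left]


def left_single_join(left: list[Row], right: list[Row], right_width: int) -> list[Row]:
    """Every left row with its one matching right row, or NULLs (None) if none match.

    Raises RuntimeError when a left row matches more than one right row.
    """
    out = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if len(matches) > 1:
            raise RuntimeError(f"single join: {len(matches)} rows match key {l[0]!r}")
        out.append(l + (matches[0] if matches else (None,) * right_width))
    return out


left = [(1, "a"), (2, "b"), (3, "c")]
print(left_single_join(left, [(1, "x"), (3, "y")], right_width=2))
# [(1, 'a', 1, 'x'), (2, 'b', None, None), (3, 'c', 3, 'y')]
```

Passing a right input with two rows for key 3 to left_single_join raises RuntimeError rather than producing two output rows for (3, "c").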
Features
- add description field to types definition in schema (#811) (a4e3a82)
- clarify behavior and direct output order of joins (#803) (fe3f1c6)
Published by substrait-project-bot[bot] 10 months ago
substrait - v0.70.0
0.70.0 (2025-04-13)
⚠ BREAKING CHANGES
- Hash Equijoin no longer preserves ordering for inner joins
The original Property Maintenance description of the hash join operator was as follows:
Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated.
This holds ONLY for a very specific implementation of a hash join, for instance, when the build-side input fits completely in memory and the probe-side input is streamed in a single thread. It is also strange that INNER JOIN is specifically called out, because other joins can also preserve the order of the probe side (LEFT) when the build side (RIGHT) fits in memory.
Moreover, once an implementation moves beyond that simple case, this order-preserving claim quickly falls apart unless it does non-trivial work in the following scenarios:
- build does not fit in memory (i.e., spill to storage)
- parallel probe
So, in general, we should not say that a hash join preserves the order of the probe side. It may do so for a specific implementation under particular conditions, but that is optimization or hint territory.
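The two scenarios can be illustrated with a small Python sketch (hypothetical function names; rows are (key, value) tuples with the key in column 0). A single-threaded probe against an in-memory build table emits rows in probe order, while a partitioned, grace-style join of the kind used when the build side spills emits them grouped by partition instead.

```python
def in_memory_hash_join(probe, build):
    """Build a hash table on `build`, then stream `probe` once, in order, on one thread."""
    table = {}
    for b in build:
        table.setdefault(b[0], []).append(b)
    out = []
    for p in probe:
        for b in table.get(p[0], []):
            out.append((p, b))
    return out


def partitioned_hash_join(probe, build, num_partitions=2):
    """Grace-style join: split both sides by hash(key), then join partition by partition,
    as an implementation might do when the build side does not fit in memory."""
    probe_parts = [[] for _ in range(num_partitions)]
    build_parts = [[] for _ in range(num_partitions)]
    for p in probe:
        probe_parts[hash(p[0]) % num_partitions].append(p)
    for b in build:
        build_parts[hash(b[0]) % num_partitions].append(b)
    out = []
    for probe_part, build_part in zip(probe_parts, build_parts):
        out.extend(in_memory_hash_join(probe_part, build_part))
    return out


probe = [(1, "a"), (2, "b"), (3, "c")]
build = [(1, "x"), (2, "y"), (3, "z")]
print([p[0] for p, _ in in_memory_hash_join(probe, build)])    # [1, 2, 3]
print([p[0] for p, _ in partitioned_hash_join(probe, build)])  # [2, 1, 3]
```

In CPython, small integers hash to themselves, so the partition assignment here is deterministic. Both joins return the same rows; only the order differs, which is why order preservation is better treated as an optimization hint than as a property of the operator.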
Features
Published by substrait-project-bot[bot] 11 months ago
substrait - v0.64.0
0.64.0 (2025-01-12)
Features
- additional boolean comparison functions (#764) (2d8b1b6)
- introduce Iceberg table type using metadata file (#758) (7434e2f)
- run pytest in pr workflow to check function test coverage (#765) (7bfc37c)
Bug Fixes
- bump flake8 version to 7.0.0 (#768) (57770b6)
- update the doc to clarify that function names are case-sensitive (#757) (203e6e4)
Published by substrait-project-bot[bot] about 1 year ago
substrait - v0.63.0
0.63.0 (2024-12-15)
⚠ BREAKING CHANGES
- The encoding of FetchRel has changed in a strictly backwards incompatible way. The change involves transitioning offset and count from a standalone int64 field to a oneof structure, where the original int64 field is marked as deprecated, and a new field of Expression type is introduced. Using a oneof may cause ambiguity between unset and set-to-zero states in older messages. However, the fields are defined such that their logical meaning remains indistinguishable, ensuring consistency across encodings.
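A consumer that accepts both encodings can normalize the offset before execution, as in the Python sketch below. The dict keys offset (legacy int64) and offset_expr (new Expression), the literal shape {"literal": {"i64": ...}}, and the function name are assumptions modeled on a JSON-style plan, and only integer literals are handled. Because an unset offset and an offset of 0 both mean skipping no rows, the unset/zero ambiguity does not change the result.

```python
def resolve_fetch_offset(fetch: dict) -> int:
    """Normalize a FetchRel offset from either encoding to a plain integer.

    Hypothetical dict keys:
      "offset"      -> legacy int64 field (deprecated)
      "offset_expr" -> new Expression field; only i64 literals are handled here
    An unset offset and an offset of 0 both mean "skip no rows".
    """
    if "offset_expr" in fetch:
        literal = fetch["offset_expr"].get("literal")
        if literal is None or "i64" not in literal:
            raise NotImplementedError("only i64 literal offsets are handled in this sketch")
        return int(literal["i64"])  # JSON encodes int64 values as strings
    return int(fetch.get("offset", 0))


print(resolve_fetch_offset({"offset_expr": {"literal": {"i64": "10"}}}))  # 10
print(resolve_fetch_offset({}))                                          # 0
```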
Features
- add expression support for count and offset in the fetch operator (#748) (bd4b431)
- add simple linking to the examples (#702) (4c00b1c)
- support missing variants for regexp string functions (#750) (3410a3e)
Published by substrait-project-bot[bot] about 1 year ago
substrait - v0.61.0
0.61.0 (2024-11-17)
Features
- add substrait test files to go embedded fs (#740) (e3a7773)
- handle parsing of list arguments in func testcases (#737) (1f9c710)
- update operator to update a table (#734) (adb1079)
Bug Fixes
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.60.0
0.60.0 (2024-11-10)
Features
- add antlr grammar for test file format (#728) (752aa63)
- add CreateMode for CTAS in WriteRel (#715) (2e13d0b)
- update test file format to support aggregate functions (#736) (c18c0c1)
Bug Fixes
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.57.0
0.57.0 (2024-10-02)
⚠ BREAKING CHANGES
- This PR changes the definition of grouping sets in AggregateRel to consist of references into a list of grouping expressions instead of consisting of expressions directly.
With the previous definition, consumers had to deduplicate the expressions in the grouping sets in order to execute the query or even derive the output schema (which is problematic, as explained below). With this change, the responsibility of deduplicating expressions is now on the producer. Concretely, consumers are now expected to be simpler: The list of grouping expressions immediately provides the information needed to derive the output schema and the list of grouping sets explicitly and unambiguously provides the equality of grouping expressions. Producers now have to specify the grouping sets explicitly. If their internal representation of grouping sets consists of full grouping expressions (rather than references), then they must deduplicate these expressions according to their internal notion of expression equality in order to produce grouping sets consisting of references to these deduplicated expressions.
If the previous format is desired, it can be obtained from the new format by (1) deduplicating the grouping expressions (according to the previously applicable definition of expression equality), (2) re-establishing the duplicates using the emit clause, and (3) "dereferencing" the references in the grouping sets, i.e., by replacing each reference in the grouping sets with the expression it refers to.
The previous version was problematic because it required consumers to deduplicate the expressions from the grouping sets. This, in turn, required them to parse and understand 100% of these expressions even in cases where that understanding is otherwise optional, which runs counter to the general philosophy of allowing for simple-minded consumers. The new version avoids that problem and, thus, allows consumers to be
Features
- change grouping expressions in AggregateRel to references (#706) (65a7d38), closes #700
- clarify behaviour of SetRel operations (#708) (f796521)
- make substrait repo a go module (#712) (3dca9b5)
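The producer-side deduplication and the reverse "dereferencing" step described for v0.57.0 can be sketched in Python as follows. Expressions are modeled as strings, and string equality stands in for the producer's own notion of expression equality; the function names are hypothetical, and step (2), restoring duplicates through the emit clause, is not modeled.

```python
def to_references(grouping_sets):
    """Producer side: deduplicate expressions and rewrite each grouping set as indices
    into a single list of grouping expressions."""
    grouping_expressions = []
    index_of = {}
    ref_sets = []
    for grouping_set in grouping_sets:
        refs = []
        for expr in grouping_set:
            if expr not in index_of:
                index_of[expr] = len(grouping_expressions)
                grouping_expressions.append(expr)
            refs.append(index_of[expr])
        ref_sets.append(refs)
    return grouping_expressions, ref_sets


def dereference(grouping_expressions, ref_sets):
    """Step (3) of the reverse mapping: replace each reference with its expression."""
    return [[grouping_expressions[i] for i in refs] for refs in ref_sets]


exprs, refs = to_references([["a", "b"], ["a"], []])
print(exprs, refs)  # ['a', 'b'] [[0, 1], [0], []]
```

The consumer can derive the output schema from the grouping expressions list alone, without comparing expressions across grouping sets.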
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.54.0
0.54.0 (2024-08-11)
⚠ BREAKING CHANGES
- The encoding of IntervalDay literals has changed in a strictly backwards incompatible way. However, the logical meaning across encoding is maintained using a oneof. Moving a field into a oneof makes unset/set to zero unclear with older messages but the fields are defined such that the logical meaning of the two is indistinct. If neither microseconds nor precision is set, the value can be considered a precision 6 value. If you aren't using IntervalDay type, you will not need to make any changes.
- TypeExpression and Parameterized type protobufs (used to serialize output derivation) are updated to match the now compound nature of IntervalDay. If you use protobuf to serialize output derivations that refer to the IntervalDay type, you will need to rework that logic.
- JoinRel's type enum now has LEFT_SINGLE instead of SINGLE. Similarly, there are now LEFT_ANTI and LEFT_SEMI. Other values are available in all join type enums. This affects JSON and text formats only (binary plans -- the interoperable part of Substrait -- will still be compatible before and after this change).
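For IntervalDay, the precision rule means the sub-second part is an integer count of 10^-precision seconds, with a message that sets neither field treated as precision 6. The Python sketch below normalizes that part to nanoseconds; the parameter names microseconds, precision, and subseconds, the 0 to 9 precision range, and the function name are assumptions for illustration, not the protobuf API.

```python
def interval_day_subsecond_nanos(microseconds=None, precision=None, subseconds=0):
    """Return the sub-second part of an IntervalDay value in nanoseconds.

    microseconds: legacy (deprecated) field, always in units of 1e-6 s
    precision:    number of fractional-second digits that `subseconds` uses
    subseconds:   sub-second count in units of 10**-precision seconds
    """
    if microseconds is not None and precision is not None:
        raise ValueError("microseconds and precision are mutually exclusive (oneof)")
    if microseconds is not None:
        return microseconds * 1_000
    if precision is None:
        precision = 6  # neither field set: treat as a precision-6 value
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    return subseconds * 10 ** (9 - precision)


print(interval_day_subsecond_nanos(microseconds=1500))        # 1500000
print(interval_day_subsecond_nanos(precision=3, subseconds=5))  # 5000000
```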
Features
- add arithmetic function "power" with decimal type (#660) (9af2d66)
- add CSV (text) file support (#646) (5d49e04)
- add precision to IntervalDay and new IntervalCompound type (#665) (e41eff2), closes #664
- normalize the join types (#662) (bed84ec)
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.53.0
0.53.0 (2024-08-04)
⚠ BREAKING CHANGES
- PrecisionTimestamp(Tz) literal's value is now int64 instead of uint64
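A signed value matters because instants before 1970-01-01 are negative offsets from the Unix epoch, which a uint64 cannot represent. The Python sketch below assumes the value counts units of 10^-precision seconds since the epoch (the precision field is described under v0.52.0) and that precision ranges from 0 to 9; the function name is hypothetical, and digits finer than a microsecond are rounded down because datetime only resolves microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def precision_timestamp_to_datetime(value: int, precision: int) -> datetime:
    """Interpret `value` as a signed count of 10**-precision seconds since the Unix epoch."""
    if not 0 <= precision <= 9:
        raise ValueError("precision must be between 0 and 9")
    if precision <= 6:
        micros = value * 10 ** (6 - precision)
    else:
        # Floor division rounds toward the past, including for negative (pre-epoch) values.
        micros = value // 10 ** (precision - 6)
    return EPOCH + timedelta(microseconds=micros)


print(precision_timestamp_to_datetime(-1, 0))  # 1969-12-31 23:59:59+00:00
```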
Features
- add aggregate count functions with decimal return type (#670) (2aa516b)
- add arithmetic function "sqrt" and "factorial" with decimal type (#674) (e4f5b68)
- add arithmetic function for bitwise(AND/OR/XOR) operation with decimal arguments (#675) (a70cf72)
- add logarithmic functions with decimal type args (#669) (d9fb1e3)
- add precision timestamp datetime fn variants (#666) (60c93d2)
- clarify the meaning of plans (#616) (c1553df), closes #612 #613
Bug Fixes
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.52.0
0.52.0 (2024-07-14)
⚠ BREAKING CHANGES
- changes the message type for Literal PrecisionTimestamp and PrecisionTimestampTZ
The PrecisionTimestamp and PrecisionTimestampTZ literals were introduced
Bug Fixes
- include precision information in PrecisionTimestamp and PrecisionTimestampTZ literals (#659) (f9e5f9c), closes #594 /github.com/substrait-io/substrait/pull/594#discussion_r1471844566
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.51.0
0.51.0 (2024-07-07)
Features
- add "initcap" function (#656) (95bc6ba), closes /github.com/Blizzara/substrait/blob/70d1eb71623ca0754157dd5d87348bae51d420c4/extensions/functions_string.yaml#L1023
- add null input handling options for any_value (#652) (1890e6a)
- allow naming/aliasing relations (#649) (4cf8108), closes #648 #571
- define SetRel output nullability derivation (#558) (#654) (612123a)
Published by substrait-project-bot[bot] over 1 year ago
substrait - v0.49.0
0.49.0 (2024-05-23)
Features
Bug Fixes
- ci: pin conventional-changelog-conventionalcommits to 7.0.2 (#644) (9528bd2)
- specify a minimum length for the options of enum args (#642) (8e65af5), closes /github.com/substrait-io/substrait-rs/pull/185#discussion_r1603513149
Published by substrait-project-bot[bot] almost 2 years ago
substrait - v0.47.0
0.47.0 (2024-04-18)
Features
- add i64 variant for exp, ln, log10, log2 and logb functions (#628) (fef2253)
- allow FetchRel to specify a return of ALL results (#622) (#627) (37f43b4)
Bug Fixes
- index_in has wrong return type (#632) (4cd2089)
- use any1 instead of T in function extensions (#629) (0bddf68)
Published by substrait-project-bot[bot] almost 2 years ago
substrait - v0.44.0
0.44.0 (2024-03-03)
⚠ BREAKING CHANGES
- Adding a NULL option to on_domain_errors.
SQLite returns null for some inputs, such as negative infinity.
Features
- add extra option for on domain errors in log functions (#536) (cbec079)
- add ignore nulls options to concat function (#605) (55db05b)
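The effect of the on_domain_errors options on a logarithm can be sketched in Python as follows. The NULL value comes from this entry; NAN and ERROR as the other values, the function name ln_with_option, and returning negative infinity for an input of 0 are assumptions of the sketch. None models SQL NULL.

```python
import math


def ln_with_option(x: float, on_domain_error: str = "NAN"):
    """Natural logarithm with a configurable response to negative inputs.

    Only x < 0 (including -inf) is treated as a domain error here; x == 0
    returns -inf, which is an assumption of this sketch.
    """
    if x < 0:
        if on_domain_error == "NAN":
            return math.nan
        if on_domain_error == "NULL":
            return None  # models SQL NULL, as SQLite returns for such inputs
        if on_domain_error == "ERROR":
            raise ValueError(f"ln is undefined for {x}")
        raise ValueError(f"unknown on_domain_error value: {on_domain_error!r}")
    if x == 0:
        return -math.inf
    return math.log(x)  # also yields nan for a nan input and inf for inf


print(ln_with_option(float("-inf"), "NULL"))  # None
```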
Published by substrait-project-bot[bot] almost 2 years ago
substrait - v0.40.0
0.40.0 (2023-12-17)
⚠ BREAKING CHANGES
- The enum WriteRel::OutputMode had an option change from OUTPUT_MODE_MODIFIED_TUPLES to OUTPUT_MODE_MODIFIED_RECORDS.
- The message AggregateFunction.ReferenceRel has moved to ReferenceRel.
Features
Published by substrait-project-bot[bot] about 2 years ago
substrait - v0.39.0
0.39.0 (2023-11-26)
⚠ BREAKING CHANGES
- Map keys may be repeated.
- Map keys must not be NULL.
- The map key type may be nullable.
This is based on the current restrictions found in the wild.
DuckDB, Velox, Spark, and Acero all reject attempts to provide NULL as a key.
Although DuckDB specifically calls out that keys must be unique in its implementation, other implementations such as Velox and Acero do not require unique keys, so we cannot require map keys to be 1:1 with map values.
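Under these rules, a map value behaves more like a list of key/value pairs than a dictionary. The Python sketch below models a map that way and enforces only the non-NULL key rule; the function names are hypothetical, and returning the first match for a repeated key is an assumption, since the rules above do not say which value a lookup should return.

```python
def validate_map(entries):
    """Check a map modeled as a list of (key, value) pairs.

    Repeated keys are allowed; NULL (None) keys are rejected.
    """
    for key, _ in entries:
        if key is None:
            raise ValueError("map keys must not be NULL")
    return entries


def lookup_first(entries, key):
    """Return the value of the first entry with `key`, or None if absent (hypothetical policy)."""
    for k, v in entries:
        if k == key:
            return v
    return None


m = validate_map([("a", 1), ("a", 2), ("b", 3)])
print(lookup_first(m, "a"))  # 1
```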
Features
Documentation
Published by substrait-project-bot[bot] over 2 years ago
substrait - v0.35.0
0.35.0 (2023-10-01)
⚠ BREAKING CHANGES
- nullability of is_not_distinct_from has changed
- The minimum precision for floating point numbers is now mandated.
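is_not_distinct_from is NULL-safe equality: two NULLs are not distinct, and a NULL is distinct from any non-NULL value, so the comparison can always answer true or false. The Python sketch below (None models NULL; function names are hypothetical) contrasts it with ordinary SQL equality. Reading the nullability change as making the result non-nullable is an assumption based on these semantics.

```python
def is_not_distinct_from(a, b) -> bool:
    """NULL-safe equality: None models SQL NULL; the result itself is never NULL."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sql_equal(a, b):
    """Ordinary SQL '=' for contrast: any NULL input yields NULL (None)."""
    if a is None or b is None:
        return None
    return a == b


print(is_not_distinct_from(None, None), sql_equal(None, None))  # True None
```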
Features
- add approval guidelines for documentation updates (#553) (da4b32a)
- add geometric data types and functions (#543) (db52bbd)
- add geometry editor functions (#554) (727467c)
- adding geometry accessor functions (#552) (784fa9b)
- explicitly reference IEEE 754 and mandate precision as well as range (#449) (54e3d52), closes #447
Bug Fixes
Published by substrait-project-bot[bot] over 2 years ago
substrait - v0.32.0
0.32.0 (2023-08-21)
⚠ BREAKING CHANGES
- plans referencing functions using simple names (e.g. not vs not:bool) will no longer be valid.
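A compound name appends the short names of the argument types to the simple name after a colon, joining multiple arguments with underscores, as in not:bool or concat:vchar. The Python helper below is a hypothetical sketch with only a partial short-name table; the specification lists the full set of type short names.

```python
# Partial, illustrative mapping from type names to the short names used in signatures.
SHORT_NAMES = {
    "boolean": "bool",
    "i32": "i32",
    "i64": "i64",
    "fp64": "fp64",
    "string": "str",
    "varchar": "vchar",
}


def compound_name(name: str, arg_types: list[str]) -> str:
    """Build a compound function name such as 'not:bool' or 'add:i32_i32'."""
    shorts = [SHORT_NAMES[t] for t in arg_types]
    return f"{name}:{'_'.join(shorts)}"


print(compound_name("add", ["i32", "i32"]))  # add:i32_i32
```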
Features
- add ExchangeRel as a type in Rel (#518) (89b0c62)
- add expand rel (#368) (98380b0)
- add options to substring for start parameter being negative (#508) (281dc0f)
- add windowrel support in proto (#399) (bd14e0e)
- require compound functions names in extension references (#537) (2503beb)
Published by substrait-project-bot[bot] over 2 years ago
substrait - v0.29.0
0.29.0 (2023-04-23)
⚠ BREAKING CHANGES
- text: mark name and structure property of type extension item as required (#495)
Bug Fixes
- referenced simple extension in tutorial (set instead of string) (#494) (b5d7ed2)
- text: mark name and structure property of type extension item as required (#495) (7246102)
Published by substrait-project-bot[bot] almost 3 years ago
substrait - v0.27.0
0.27.0 (2023-03-26)
⚠ BREAKING CHANGES
- group argument added to regexp_match_substring function
Add regexp_match_substring_all function
Resolves https://github.com/substrait-io/substrait/issues/466
Features
Bug Fixes
- ci: fix link to conventional commits spec (#482) (45b4e48)
- remove duplication in simple extensions schema (#404) (b7df38d)
Published by substrait-project-bot[bot] almost 3 years ago
substrait - v0.25.0
0.25.0 (2023-02-26)
⚠ BREAKING CHANGES
- adding an interval to, or subtracting one from, a timestamptz now requires a time zone and returns a timestamptz
Bug Fixes
- correct return of temporal add and subtract and add timezone parameter (#337) (1b184cc)
- extension: fix typo in scalar function argument type (#445) (7d7ddf1)
Published by substrait-project-bot[bot] about 3 years ago
substrait - v0.20.0
0.20.0 (2022-11-20)
⚠ BREAKING CHANGES
- optional arguments are no longer allowed to be specified as a part of FunctionArgument messages. Instead they are now specified separately as part of the function invocation.
- optional arguments are now specified separately from required arguments in the YAML specification.
Co-authored-by: Benjamin Kietzman bengilgit@gmail.com
Features
- add best effort filter to read rel and clarify that the pre-masked schema should be used (#271) (4beff87)
- optional args are now specified separately from required args (#342) (bd29ea3)
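With options carried separately from required arguments, a consumer can resolve an option without disturbing argument positions. The Python sketch below picks the first value from a producer's ordered preference list that the consumer supports; the dict layout (arguments, options, name, preference) and the helper name choose_option are assumptions for illustration, not a confirmed wire format.

```python
def choose_option(invocation: dict, option_name: str, supported: set, default: str) -> str:
    """Return the first preferred value the consumer supports for `option_name`.

    Options live apart from positional arguments, so they never shift argument indices.
    Falls back to `default` when the producer did not specify the option.
    """
    for option in invocation.get("options", []):
        if option["name"] == option_name:
            for value in option["preference"]:
                if value in supported:
                    return value
            raise ValueError(f"no supported value for option {option_name!r}")
    return default


invocation = {
    "arguments": [{"value": "x"}, {"value": "y"}],
    "options": [{"name": "overflow", "preference": ["SATURATE", "ERROR"]}],
}
print(choose_option(invocation, "overflow", {"ERROR", "SILENT"}, "SILENT"))  # ERROR
```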
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.14.0
0.14.0 (2022-09-11)
⚠ BREAKING CHANGES
- option argument added to std_dev and variance aggregate functions
Features
- add bool_and and bool_or aggregate functions (#314) (52fa523)
- add corr and mode aggregation functions (#296) (96b13d7)
- add median and count_distinct aggregation functions (#278) (9be62e5)
- add population option to variance and standard deviation functions (#295) (c47fffa)
- add quantile aggregate function (#279) (de6bc9f)
- add string_agg aggregate function (#297) (fbe5e09)
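The population option selects between sample and population statistics, which differ only in the divisor (n - 1 versus n). Python's statistics module provides both; the option values SAMPLE and POPULATION and the wrapper names are assumptions of this sketch.

```python
import math
import statistics


def variance(values, distribution="SAMPLE"):
    """Variance with a sample (divide by n - 1) or population (divide by n) option."""
    if distribution == "SAMPLE":
        return statistics.variance(values)
    if distribution == "POPULATION":
        return statistics.pvariance(values)
    raise ValueError(f"unknown distribution option: {distribution!r}")


def std_dev(values, distribution="SAMPLE"):
    """Standard deviation as the square root of the chosen variance."""
    return math.sqrt(variance(values, distribution))


print(variance([1, 2, 3, 4], "POPULATION"))  # 1.25
```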
Bug Fixes
- mark string_agg aggregate as being sensitive to input order (#312) (683faaa)
- naming: add missing arg names in functions_arithmetic.yaml (#315) (d433a06)
- naming: add missing arg names in functions_datetime.yaml (#318) (b7347d1)
- naming: add missing arg names in functions_logarithmic.yaml and functions_set.yaml (#319) (1c14d27)
- naming: add/replace arg names in functions_boolean.yaml (#317) (809a2f4)
- revert addition of count_distinct aggregate function (#311) (90d7c0d)
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.13.0
0.13.0 (2022-09-04)
⚠ BREAKING CHANGES
- nullability behavior of is_nan, is_finite, and is_infinite has changed
- compound name for concat has changed to concat:str and concat:vchar (one argument) to make it 1+ variadic
Features
- add center function (#282) (7697d39)
- add coalesce function (#301) (63c5da0)
- add dwrf file format (#304) (0f7c2ea)
- add exp function (#299) (7ed31f6)
- add factorial scalar function (#300) (a4d6f35)
- add hyperbolic functions (#290) (4252824)
- add log1p function (#273) (55e8275)
- add regexp_match_substring, regexp_strpos, and regexp_count_substring (#293) (6b8191f)
- add regexp_replace function (#281) (433d049)
- add string transform functions (#267) (ff2f7f1)
- clarify behavior of is_null, is_not_null, is_nan, is_finite, and is_infinite for nulls (#285) (cb25124)
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.9.0
0.9.0 (2022-07-31)
⚠ BREAKING CHANGES
- arithmetic: Options SILENT, SATURATE, ERROR are no longer valid for use with floating point arguments to add, subtract, multiply or divide
- function argument bindings were open to interpretation before, and were often produced incorrectly; therefore, this change semantically shifts some responsibilities from the consumers to the producers.
- the grouping set index column now only exists if there is more than one grouping set.
- Existing plans that are modeling cast with the cast function (as opposed to the cast expression) will no longer be valid. All producers/consumers should use the cast expression type.
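For integer arguments, the overflow options still apply. The Python sketch below adds two i32 values under each option; the function name is hypothetical, and modeling SILENT as two's-complement wraparound is an assumption, since the option only implies that no error is raised, not a particular result.

```python
I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def add_i32(a: int, b: int, overflow: str = "ERROR") -> int:
    """Add two i32 values, handling overflow according to the option."""
    result = a + b
    if I32_MIN <= result <= I32_MAX:
        return result
    if overflow == "ERROR":
        raise OverflowError(f"i32 overflow: {a} + {b}")
    if overflow == "SATURATE":
        return I32_MAX if result > I32_MAX else I32_MIN
    if overflow == "SILENT":
        # One possible unchecked result: two's-complement wraparound.
        return (result - I32_MIN) % 2**32 + I32_MIN
    raise ValueError(f"unknown overflow option: {overflow!r}")


print(add_i32(I32_MAX, 1, "SATURATE"))  # 2147483647
print(add_i32(I32_MAX, 1, "SILENT"))    # -2147483648
```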
Features
- add functions for arithmetic, rounding, logarithmic, and string transformations (#245) (f7c5da5)
- add standard deviation functions (#257) (1339534)
- add string containment functions (#256) (d6b9b34)
- add string trimming and padding functions (#248) (8a8f65d)
- add trigonometry functions (#241) (d83d566)
- add variance function (#263) (b6c3772)
- arithmetic: add abs and sign to scalar function extensions (#244) (1b9a45f)
- support window functions (#224) (4b2072a)
Bug Fixes
- message: commit lint issue (#250) (34ec8f5)
- removes cast function definition (#253) (66a3476), closes #88 #152
- specify how function arguments are to be bound (#231) (d4cfbe0)
Documentation
Code Refactoring
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.5.0
0.5.0 (2022-06-12)
⚠ BREAKING CHANGES
- The substrait/ReadRel/LocalFiles/format field is deprecated. This will cause a hard break in compatibility. Newer consumers will not be able to read older files. Older consumers will not be able to read newer files. One should now express format concepts using the file_format oneof field.
Co-authored-by: Jacques Nadeau jacques@apache.org
Features
- add aggregate function min/max support (#219) (48b6b12)
- add Arrow and Orc file formats (#169) (43be00a)
- support nullable and non-default variation user-defined types (#217) (5851b02)
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.4.0
0.4.0 (2022-06-05)
⚠ BREAKING CHANGES
- there was an accidental inclusion of a binary not function with unspecified behavior. This function was removed. Use the unary not function to return the complement of an input argument.
Bug Fixes
Published by substrait-project-bot[bot] over 3 years ago
substrait - v0.3.0
0.3.0 (2022-05-22)
Features
- define APPROX_COUNT_DISTINCT in new yaml for approximate aggregate functions (#204) (8e206b9)
- literals for extension types (#197) (296c266)
- support fractional seconds for interval_day literals (#199) (129e52f)
Published by substrait-project-bot[bot] almost 4 years ago