Recent Releases of rsample

rsample - rsample 1.3.1

  • The new internal_calibration_split() function and its methods for various resamples are intended for use in tune: they create an internal split of the analysis set so that the preprocessor and model are fit on one part and the post-processor on the other part (#483, #488, #489, #569, #575, #577, #582).

  • New accessor function calibration() for the calibration set of an internal calibration split (#581).

- R
Published by hfrick 11 months ago

rsample - rsample 1.3.0

  • Bootstrap intervals via int_pctl(), int_t(), and int_bca() now allow for more flexible grouping (#465).

  • Errors and warnings are now styled via cli (#499, #502). Largely done by @PriKalra (#523, #526, #528, #530, #531, #532), @Dpananos (#516, #517, #529), and @JamesHWade (#518) as part of the tidyverse dev day.

  • rolling_origin() is now superseded by sliding_window(), sliding_index(), and sliding_period() which provide more flexibility and control (@nmercadeb, #524).

  • The deprecation of validation_split(), validation_time_split(), and group_validation_split() has been moved to the next level so that they now warn.
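
As a point of reference for the bootstrap interval functions mentioned above, here is a minimal sketch of a percentile interval with int_pctl(). It shows only the basic workflow, not the more flexible grouping added in this release. The helper mean_mpg() and the column name stats are hypothetical names chosen for illustration; the statistics column is assumed to hold one tibble per resample with term and estimate columns.

```r
library(rsample)

set.seed(42)
boots <- bootstraps(mtcars, times = 1000)

# Hypothetical helper: returns a one-row tibble with the `term` and
# `estimate` columns that the int_*() functions expect.
mean_mpg <- function(split) {
  tibble::tibble(term = "mean_mpg", estimate = mean(analysis(split)$mpg))
}

# Compute the statistic on the analysis set of every bootstrap resample.
boots <- dplyr::mutate(boots, stats = lapply(splits, mean_mpg))

# 95% percentile interval for the bootstrapped mean.
ci <- int_pctl(boots, stats, alpha = 0.05)
ci
```

int_t() and int_bca() follow the same pattern but additionally need the apparent resample, so bootstraps() would be called with apparent = TRUE for those.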

Bug fixes

  • vfold_cv() now utilizes the breaks argument correctly for repeated cross-validation (@ZWael, #471).

  • Grouped resampling functions now work with an explicit strata = NULL instead of strata being either a name or missing (#485).

Breaking changes

  • vfold_cv() and clustering_cv() now error on implicit leave-one-out cross-validation (@seb09, #527).

  • The class of grouped MC splits is now group_mc_split instead of grouped_mc_split, aligning it with the other grouped splits (#478).

  • The rsplit objects of an apparent() split now have the correct class inheritance structure. The order is now apparent_split and then rsplit rather than the other way around (#477).

Documentation improvements

  • Improved documentation and formatting: function names are now more easily identifiable through either () at the end or being links to the function documentation (@brshallo, #521).

  • Fixed example for nested_cv() (@seb09, #520).

  • Formatting improvement: package names are no longer in backticks (@agmurray, #525).

  • Improved documentation for initial_split() and friends (@laurabrianna, #519).

  • Removed trailing space in printing of mc_cv() objects (@ccani007, #464).

- R
Published by hfrick about 1 year ago

rsample - rsample 1.2.1

  • nested_cv() no longer errors if outside is a long call (#459, #461).

  • The validation_set class now has its own pretty() method (#456).

- R
Published by hfrick over 2 years ago

rsample - rsample 1.2.0

  • The new initial_validation_split(), along with variants initial_validation_time_split() and group_initial_validation_split(), generates a three-way split of the data into training, validation, and test sets. With the new validation_set(), this can be turned into an rset object for tuning (#403, #446).

  • validation_split(), validation_time_split(), and group_validation_split() have been soft-deprecated in favor of the new functions implementing a 3-way split (initial_validation_split(), initial_validation_time_split(), and group_initial_validation_split()) (#449).

  • Functions which don't use the ellipsis ... now enforce empty dots (#429).

  • make_splits() gained an example in the documentation (@AngelFelizR, #432).

  • training(), testing(), analysis(), and assessment() are now S3 generics with methods for rsplit objects. Previously they manually required the input to be an rsplit object (#384).

  • The int_*() functions are now S3 generics and have corresponding methods for class bootstraps (#435).

  • The underlying mechanics of data splitting were changed so that Surv objects maintain their class. This change affects the row names of the resulting objects; they are reindexed from one instead of being a subset of the original row names (#443).

  • rsample does not re-export gather() anymore (#451).
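
The three-way split described above can be sketched as follows. This is a minimal illustration on the built-in mtcars data, assuming proportions of 60% training and 20% validation, with the remainder used for testing; the object names are arbitrary.

```r
library(rsample)

set.seed(123)

# prop gives the training and validation proportions;
# whatever is left over becomes the test set.
split <- initial_validation_split(mtcars, prop = c(0.6, 0.2))

train <- training(split)
val   <- validation(split)
test  <- testing(split)

# Turn the split into an rset with a single resample
# (training set for analysis, validation set for assessment).
val_rset <- validation_set(split)
val_rset
```

The resulting val_rset can be passed wherever an rset is expected, for example as the resamples in a tuning function.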

- R
Published by hfrick almost 3 years ago

rsample - rsample 1.1.1

  • All grouped resampling functions (group_vfold_cv(), group_mc_cv(), group_initial_split(), group_validation_split(), and group_bootstraps()) now support stratification. Strata must be constant within each group (@mikemahoney218, #317, #360, #363, #364, #365).

  • Added a new function, clustering_cv(), for blocked cross-validation in various predictor spaces. This is a very flexible function, taking arguments to both distance_function and cluster_function, allowing it to be used for spatial clustering as well as potentially phylogenetic and other forms of clustering (@mikemahoney218, #351).

  • bootstraps() and group_bootstraps() now warn if resampling returns any empty assessment sets. Previously, bootstraps() was silent while group_bootstraps() errored (@mikemahoney218, #356, #357).

  • The assessment set of validation_time_split() now also contains the lagged observations (#376).

  • The new helper get_rsplit() lets you conveniently access the rsplit objects inside an rset object (@mikemahoney218, #399).

  • The result of initial_time_split() now has its own subclass "initial_time_split", in addition to existing classes (#397).

  • The dependency on the ellipsis package has been removed (#393).

  • Removed an overly strict test in preparation for dplyr 1.1.0 (#380).

- R
Published by hfrick over 3 years ago

rsample - rsample 1.1.0

  • rset objects now include all parameters used to create them as attributes (#329).

  • Objects returned by sliding functions now have an index attribute, where appropriate, containing the column name used as an index (#329).

  • Objects returned by permutations() now have a permutes attribute containing the column name used for permutation (#329).

  • Added breaks and pool as attributes to all functions which support stratification (#329).

  • Changed the "strata" attribute on rset objects so that it is now either a character vector identifying the column used to stratify the data, or not present (set to NULL) if stratification was not used (#329).

  • Added a new function, reshuffle_rset(), which takes an rset object and generates a new version of it using the same arguments but the current random seed (#79, #329).

  • Added arguments to control how group_vfold_cv() combines groups. Use balance = "groups" to assign (roughly) the same number of groups to each fold, or balance = "observations" to assign (roughly) the same number of observations to each fold.

  • Added a repeats argument to group_vfold_cv() (#330).

  • Added new functions for grouped resampling: group_mc_cv() (#313), group_initial_split() and group_validation_split() (#315), and group_bootstraps() (#316).

  • Added a new function, reverse_splits(), to swap analysis and assessment splits (#319, #284).

  • Improved the error thrown when calling assessment() on a perm_split object created by permutations() (#321, #322).
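
To illustrate the balance and repeats arguments of group_vfold_cv() listed above, here is a small sketch. The toy data frame dat, with 12 groups of deliberately unequal size, is made up for this example.

```r
library(rsample)

set.seed(1)

# Hypothetical toy data: group "a" has 1 row, "b" has 2, ..., "l" has 12.
dat <- data.frame(
  id = rep(letters[1:12], times = 1:12),
  x  = seq_len(78)
)

# Roughly the same number of groups in each fold.
folds_groups <- group_vfold_cv(dat, group = id, v = 4, balance = "groups")

# Roughly the same number of observations in each fold.
folds_obs <- group_vfold_cv(dat, group = id, v = 4, balance = "observations")

# Assessment-set sizes show the difference between the two strategies.
sapply(folds_groups$splits, function(s) nrow(assessment(s)))
sapply(folds_obs$splits, function(s) nrow(assessment(s)))

# Repeated grouped cross-validation.
folds_rep <- group_vfold_cv(dat, group = id, v = 4, repeats = 2)
```

In both strategies every group lands in exactly one assessment set per repeat, so rows from the same group never appear on both sides of a split.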

- R
Published by juliasilge almost 4 years ago

rsample - rsample 1.0.0

  • Fixed how nested_cv() handles call objects so variables in the environment can be used when specifying resampling schemes (#81).

  • Updated to testthat 3e (#280) and added better checking for vfold_cv() (#293).

  • Finally removed the gather() method for rset objects. Use tidyr::pivot_longer() instead (#280).

  • Changed initial_split() to avoid calling tidyselect twice on strata (#296). This fix stops initial_split() from generating messages like:

    Note: Using an external vector in selections is ambiguous.
    i Use `all_of(strata)` instead of `strata` to silence this message.
    i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

  • Added better printing methods for initial split objects.

- R
Published by juliasilge about 4 years ago

rsample - rsample 0.1.1

  • Updated documentation on stratified sampling (#245).

  • Changed make_splits() to an S3 generic, with the original functionality as a method for lists and a new method for data frames that allows users to create a split from existing analysis and assessment sets (@LiamBlake, #246).

  • Added validation_time_split() for a single validation sample taking the first samples for training (@mine-cetinkaya-rundel, #256).

  • Escalated the deprecation of the gather() method for rset objects to a hard deprecation. Use tidyr::pivot_longer() instead (#257).

  • Changed resample "fingerprint" to hash the indices only rather than the entire resample result (including the data object). This is much faster and will still ensure the same resample for the same original data object (#259).
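
The two make_splits() methods mentioned above can be sketched like this. The row ranges used to carve up mtcars are arbitrary and chosen only for illustration.

```r
library(rsample)

# Data frame method: build a split from existing analysis and
# assessment sets.
train_df <- mtcars[1:24, ]
test_df  <- mtcars[25:32, ]
split_df <- make_splits(train_df, assessment = test_df)

# List method: build a split from row indices into one data set.
idx <- list(analysis = 1:24, assessment = 25:32)
split_idx <- make_splits(idx, data = mtcars)

nrow(analysis(split_df))
nrow(assessment(split_idx))
```

Both objects are rsplit objects, so analysis() and assessment() (or training() and testing()) work on them as usual.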

- R
Published by juliasilge over 4 years ago

rsample - rsample 0.1.0

  • Fixed how mc_cv(), initial_split(), and validation_split() use the prop argument to first compute the assessment indices, rather than the analysis indices. This is a minor but breaking change in some situations; the previous implementation could cause an inconsistency in the sizes of the generated analysis and assessment sets when compared to how prop is documented to function (#217, @issactoast).

  • Fixed problem with creation of apparent() (#223) and caret2rsample() (#232) resamples.

  • Re-licensed package from GPL-2 to MIT. See consent from copyright holders here.

  • Attempts to stratify on a Surv object now error more informatively (#230).

  • Exposed pool argument from make_strata() in user-facing resampling functions (#229).

  • Deprecated the gather() method for rset objects in favor of tidyr::pivot_longer() (#233).

  • Fixed bug in make_strata() for numeric variables with NA values (@brian-j-smith, #236).

- R
Published by juliasilge about 5 years ago

rsample - rsample 0.0.9

  • New rset_reconstruct(), a developer tool to ease creation of new rset subclasses (#210).

  • Added permutations(), a function for creating permutation resamples by performing column-wise shuffling (@mattwarkentin, #198).

  • Fixed an issue where empty assessment sets couldn't be created by make_splits() (#188).

  • rset objects now contain a "fingerprint" attribute that can be used to check whether two objects use the same resamples.

  • The reg_intervals() function is a convenience function for lm(), glm(), survreg(), and coxph() models (#206).

  • A few internal functions were exported so that rsample-adjacent packages can use the same underlying code.

  • The obj_sum() method for rsplit objects was updated (#215).

  • Changed the inheritance structure for rsplit objects from specific to general and simplified the methods for the complement() generic (#216).
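
A short sketch of the permutation resamples added in this release, using mtcars. The choice of mpg as the shuffled column and of 10 permutations is arbitrary.

```r
library(rsample)

set.seed(2024)

# Shuffle the mpg column 10 times; the other columns are left alone.
perms <- permutations(mtcars, permute = mpg, times = 10)

# The analysis set of each split holds the data with mpg permuted.
first_perm <- analysis(perms$splits[[1]])
head(first_perm[, c("mpg", "wt")])
```

Because only the selected column is shuffled, each analysis set contains the same mpg values as the original data, just in a different order relative to the remaining columns.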

- R
Published by juliasilge over 5 years ago

rsample - rsample 0.0.8

  • New manual_rset() for constructing rset objects manually from custom rsplits (tidymodels/tune#273).

  • Three new time based resampling functions have been added: sliding_window(), sliding_index(), and sliding_period(), which have more flexibility than the pre-existing rolling_origin().

  • Corrected alpha parameter handling for bootstrap CI functions (#179, #184).
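
As a rough illustration of the sliding functions introduced above, the sketch below uses sliding_window() on a made-up ten-row series. The data frame series and the window sizes are assumptions for the example only.

```r
library(rsample)

# Hypothetical toy series with ten ordered observations.
series <- data.frame(t = 1:10, y = (1:10)^2)

# Each analysis set covers the current row plus the 4 rows before it;
# each assessment set covers the 2 rows that follow.
resamples <- sliding_window(series, lookback = 4, assess_stop = 2)

first <- resamples$splits[[1]]
analysis(first)$t    # rows used for fitting
assessment(first)$t  # rows used for evaluation
```

sliding_index() and sliding_period() work along the same lines but define the windows relative to an index column, such as a date, rather than by row position.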

- R
Published by topepo almost 6 years ago

rsample - rsample 0.0.7

  • Lowered the threshold for pooling strata to 10% (from 15%) (#149).

  • The print() methods for rsplit and val_split objects were adjusted to show "<Analysis/Assess/Total>" and "<Training/Validation/Total>", respectively.

  • The drinks, attrition, and two_class_dat data sets were removed. They are in the modeldata package.

  • Compatibility with dplyr 1.0.0.

- R
Published by topepo about 6 years ago

rsample - rsample 0.0.6

  • Added validation_set() for making a single resample.

  • Corrected the tidy method for bootstraps (#115).

  • Changes for the upcoming tibble release.

  • Exported constructors for rset and split objects (#40).

  • initial_time_split() and rolling_origin() now have a lag parameter that ensures that previous data are available so that lagged variables can be calculated (#135, #136).

- R
Published by topepo about 6 years ago

rsample - 0.0.5 CRAN release

- R
Published by topepo almost 7 years ago