Recent Releases of The drake R package
The drake R package - igraph patch
Version 7.13.11
- Compatibility with `igraph` >= 2.1.2.
Scientific Software - Peer-reviewed
- R
Published by wlandau over 1 year ago
The drake R package - CRAN patch
Version 7.13.10
- Remove environment locking, cf. https://github.com/r-lib/rlang/issues/1705.
- Export S3 methods.
- Avoid `memo_expr()` because it causes errors on R-devel.
Published by wlandau about 2 years ago
The drake R package - Avoid `is.R()`
Version 7.13.9
- Avoid `is.R()`.
Published by wlandau over 2 years ago
The drake R package - clustermq CRAN check note
Version 7.13.8
- Fix CRAN note.
Published by wlandau over 2 years ago
The drake R package - CRAN patch
Version 7.13.7
- Fix test for CRAN.
Published by wlandau over 2 years ago
The drake R package - clustermq 0.9.1 compat
Version 7.13.6
- Migrate to the new interface in `clustermq` 0.9.0 (@mschubert).
Published by wlandau over 2 years ago
The drake R package - CRAN patch
Version 7.13.5
- Always pass a character vector to `rm()` and `remove()`.
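As a minimal base R sketch of the pattern this entry names: `rm()` accepts the names of the objects to delete as a character vector through its `list` argument. The environment `env` and the object names are illustrative assumptions, not taken from the `drake` source.

```r
# Illustrative only: `env`, `x`, and `y` are hypothetical names.
env <- new.env()
assign("x", 1, envir = env)
assign("y", 2, envir = env)

# Pass the object names as a character vector via `list`.
to_remove <- c("x", "y")
rm(list = to_remove, envir = env)

# Both objects are gone from `env`.
remaining <- ls(env)
```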
Published by wlandau about 3 years ago
The drake R package - Fix HTML docs
Version 7.13.4
- Fix HTML documentation files.
Published by wlandau almost 4 years ago
The drake R package - rlang compat
Version 7.13.3
- Improve error messages from static code analysis of malformed code (#1371, @billdenney).
- Handle invalid language objects in commands (#1372, @gorgitko).
- Do not lock namespaces (#1373, @gorgitko).
- Compatibility with `rlang` PR 1255.
Published by wlandau over 4 years ago
The drake R package - Document succession to targets.
Version 7.13.2
- Update the SLURM `batchtools` template file so that it can be brewed (#1359, @pat-s).
- Change the start-up message to a tip about `targets`.
Published by wlandau about 5 years ago
The drake R package - Add NOTICE and inst/NOTICE
- Add files `NOTICE` and `inst/NOTICE` to more explicitly credit code included from other open source projects. (Previously `drake` only took a decentralized approach: comments in the source to indicate the project, package, and GitHub link (if available) to any code borrowed.)
Published by wlandau-lilly over 5 years ago
The drake R package - Level separation
Version 7.13.0
Bug fixes
- Avoid checking printed output in test of testing infrastructure.
- Use `dsl_sym()` instead of `as.symbol()` when constructing commands for `combine()` (#1340, @vkehayas).
New features
- Add a new `level_separation` argument to `vis_drake_graph()` and `render_drake_graph()` to control the aspect ratio of `visNetwork` graphs (#1303, @matthewstrasiotto, @matthiasgomolka, @robitalec).
Published by wlandau over 5 years ago
The drake R package - Nomenclature
Version 7.12.7
- Deprecate `caching = "master"` in favor of `caching = "main"`.
- Improve the error message when a valid plan is not supplied (#1334, @robitalec).
Published by wlandau over 5 years ago
The drake R package - Patch
Version 7.12.6
Bug fixes
- Fix defunct functions error message when using namespace (#1310, @malcolmbarrett).
- Preserve names of list elements in `.data` in DSL (#1323, @shirdekel).
- Use `identical()` to compare file hashes (#1324, @shirdekel).
- Set `seed = TRUE` in `future::future()`.
- Manually relay warnings when `parallelism = "clustermq"` and `caching = "worker"` (@richardbayes).
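The entry on `identical()` does not state the motivation, so the following is only a general base R illustration with made-up hash values: `==` is vectorized and propagates `NA`, whereas `identical()` always returns a single `TRUE` or `FALSE`, which is usually safer inside `if()`.

```r
# Hypothetical hash values; a missing hash is represented here as NA.
old_hash <- NA_character_
new_hash <- "a1b2c3"

# `==` propagates NA, which would make `if (old_hash == new_hash)` fail.
by_equals <- old_hash == new_hash

# `identical()` always yields exactly one TRUE or FALSE.
by_identical <- identical(old_hash, new_hash)

# `==` is also vectorized, so a length mismatch yields several values.
lengths_differ <- c("a1b2c3", "d4e5f6") == "a1b2c3"
```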
Enhancements
- Make logs more machine-readable by sanitizing messages and preventing race conditions (#1331, @Plebejer).
Published by wlandau over 5 years ago
The drake R package - Safeguards and worker logs
Bug fixes
- Sanitize empty symbols in language columns (#1299, @odaniel1).
- Handle cases where `NROW()` throws an error (#1300, `julian-tagell` on Stack Overflow).
- Prohibit dynamic branching over non-branching dynamic files (#1302, @djbirke).
Enhancements
- Transition to updated `lifecycle` that does not require badges to be in `man/figures`.
- Improve error message for empty dynamic grouping variables (#1308, @saadaslam).
- Expose the `log_worker` argument of `clustermq::workers()` to `make()` and `drake_config()` (#1305, @billdenney, @mschubert).
- Set `as.is` to `TRUE` in `utils::type.convert()` (#1309, @bbolker).
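For context on the `as.is` entry, here is a small base R sketch of what the argument controls. The input vectors are made up, and where `drake` calls `utils::type.convert()` internally is not shown here.

```r
raw <- c("a", "b", "a")

# With as.is = TRUE, strings that are not numbers stay character.
kept <- utils::type.convert(raw, as.is = TRUE)

# With as.is = FALSE, the same strings become a factor.
converted <- utils::type.convert(raw, as.is = FALSE)

# Strings that look like whole numbers are still converted to integer.
numbers <- utils::type.convert(c("1", "2"), as.is = TRUE)
```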
Published by wlandau almost 6 years ago
The drake R package - CRAN hotfix
Fixes a documentation-related warning in the last submission (7.12.3).
Published by wlandau-lilly almost 6 years ago
The drake R package - Lockfile bugfix and better-looking graphs
Version 7.12.3
Bug fixes
- `cached_planned()` and `cached_unplanned()` now work with non-standard cache locations (#1268, @Plebejer).
- Set `use_cache` to `FALSE` more often (#1257, @Plebejer).
- Use namespaced function calls in mtcars example instead of loading packages.
- Replace the `iris` dataset with the `airquality` dataset in all documentation, examples, and tests (#1271).
- Assign functions created with `code_to_function()` to the proper environment (#1275, @robitalec).
- Store tracebacks as character vectors and restrict the contents of error objects to try to prevent accidental storage of large data from the environment (#1276, @billdenney).
- Strongly depend on `tidyselect` (#1274, @dernst).
- Avoid `txtq` lockfiles (#1232, #1239, #1280, @danwwilson, @pydupont, @mattwarkentin).
New features
- Add a new `drake_script()` function to write `_drake.R` files for `r_make()` (#1282).
Enhancements
- Deprecate `expose_imports()` in favor of `make(envir = getNamespace("yourPackage"))` (#1286, @mvarewyck).
- Suppress the message recommending `r_make()` if `getOption("drake_r_make_message")` is `FALSE` (#1238, @januz).
- Improve the appearance of the `visNetwork` graph by using the hierarchical layout with `visEdges(smooth = list(type = "cubicBezier", forceDirection = TRUE))` (#1289, @mstr3336).
Published by wlandau almost 6 years ago
The drake R package - Important reproducibility bugfix for dynamic branching
Bug fixes
- Invalidate old sub-targets when finalizing a dynamic target (@richardbayes). Solves a major reproducibility bug (#1260).
- Prevent `splice_inner()` from dropping formal arguments shared by `c()` (#1262, @bart1).
Published by wlandau about 6 years ago
The drake R package - Fully custom target names in static branching
Version 7.12.1
Bug fixes
- Repair `subtarget_hashes.cross()` for crosses on a single grouping variable.
- Repair dynamic `group()` used with specialized formats (#1236, @adamaltmejd).
- Enforce `tidyselect` >= 1.0.0.
New features
- Allow user-defined target names in static branching with the `.names` argument (#1240, @maciejmotyka, @januz).
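A minimal sketch of custom target names in static branching, assuming `drake` >= 7.12.1 is installed. The target name `result`, the command, and the chosen names are illustrative, and `.names` is assumed to take one name per generated target.

```r
library(drake)

# Without `.names`, drake would derive target names from the grouping
# variable; here each generated target gets an explicit name instead.
plan <- drake_plan(
  result = target(
    sqrt(x),
    transform = map(x = c(4, 9), .names = c("root_four", "root_nine"))
  )
)

target_names <- plan$target
```

Inspecting `plan` shows one row per generated target, with its name and command.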
Enhancements
- Do not analyze dependencies of calls to `drake_plan()` (#1237, @januz).
- Make the error message for a locked cache paste-able in Windows (#1243, @billdenney).
- Prevent stack traces from accidentally storing large amounts of data (#1253, @sclewis23).
Published by wlandau about 6 years ago
The drake R package - Mostly bug fixes and guard rails
Version 7.12.0
Bug fixes
- Ensure up-to-date sub-targets are skipped even if the dynamic parent does not get a chance to finalize (#1209, #1211, @psadil, @kendonB).
- Restrict static transforms so they only use the upstream part of the plan (#1199, #1200, @bart1).
- Correctly match the names and values of dynamic `cross()` sub-targets (#1204, @psadil). Expansion order is the same, but names are correctly matched now.
- Stop trying to remove `file_out()` files in `clean()`, even when `garbage_collection` is `TRUE` (#521, @the-Hull).
- Fix `keep_going = TRUE` for formatted targets (#1206).
- Use the correct variable names in logger helper (`progress_bar` instead of `progress`) so that `drake` works without the `progress` package (#1208, @mbaccou).
- Avoid conflict between formats and upstream dynamic targets (#1210, @psadil).
- Always compute trigger metadata up front because recovery keys need it.
- Deprecate and remove hasty mode and custom parallel backends (#1222).
- Compartmentalize fixed runtime parameters in `config$settings` (#965).
New features
- Add new functions `drake_done()` and `drake_cancelled()` (#1205).
Speedups
- Avoid reading build times of dynamic sub-targets in `drake_graph_info()` (#1207).
Enhancements
- Show an empty progress bar just before targets start to build when `verbose` is `2` (#1203, @kendonB).
- Deprecate the `jobs` argument of `clean()`.
- Show an informative error message for empty dynamic grouping variables (#1212, @kendonB).
- Throw error messages if users supply dynamic targets to `drake_build()` or `drake_debug()` (#1214, @kendonB).
- Log the sub-target name and index of the failing sub-target in the metadata of the sub-target and its parent (#1214, @kendonB).
- Shorten the call stack in error metadata.
- Deprecate and remove custom schedulers (#1222).
- Deprecate `hasty_build` (#1222).
- Migrate constant runtime parameters to `config$settings` (#965).
- Warn the user if `file_in()`/`file_out()`/`knitr_in()` files are not literal strings (#1229).
- Prohibit `file_out()` and `knitr_in()` in imported functions (#1229).
- Prohibit `knitr_in()` in dynamic branching (#1229).
- Improve the help file of `target()`.
- Deprecate and rename progress functions to avoid potential name conflicts (`progress()` => `drake_progress()`, `running()` => `drake_running()`, `failed()` => `drake_failed()`) (#1205).
Published by wlandau about 6 years ago
The drake R package - Dynamic files
Version 7.11.0
Bug fixes
- Sanitize internal S3 classes for target storage (#1159, @rsangole).
- Bump `digest` version to require 0.6.21 (#1166, @boshek).
- Actually store output file sizes in metadata.
- Use the `depend` trigger to toggle invalidation from dynamic-only dependencies, including the `max_expand` argument of `make()`.
- Repair `session_info` argument parsing (and reduce calls to `utils::sessionInfo()` in tests).
- Ensure compatibility with `tibble` 3.0.0.
New features
- Allow dynamic files with `target(format = "file")` (#1168, #1127).
- Implement dynamic `max_expand` on a target-by-target basis via `target()` (#1175, @kendonB).
Enhancements
- Assert dependencies of formats at the very beginning of `make()`, not in `drake_config()` (#1156).
- In `make(verbose = 2)`, remove the spinner and use a progress bar to track how many targets are done so far.
- Reduce logging of utility functions.
- Improve the aesthetics of console messages using `cli` (optional package).
- Deprecate `console_log_file` in favor of `log_make` as an argument to `make()` and `drake_config()`.
- Immediately relay warnings and messages in `"loop"` and `"future"` parallel backends (#400).
- Warn when converting trailing dots (#1147).
- Warn about imports with trailing dots on Windows (#1147).
- Allow user-defined caches for the `loadd()` RStudio addin through the new `rstudio_drake_cache` global option (#1169, @joelnitta).
- Change dynamic target finalization message to "finalize" instead of "aggregate" (#1176, @kendonB).
- Describe the limits of `recoverable()`, e.g. dynamic branching + dynamic files.
- Throw an error instead of a warning in `drake_plan()` if a grouping variable is undefined or invalid (#1182, @kendonB).
- Add a rigorous S3 framework for static code analysis objects of type `drake_deps` and `drake_deps_ht` (#1183).
- Use `rlang::trace_back()` to make `diagnose()$error$calls` nicer (#1198).
Published by wlandau over 6 years ago
The drake R package - Down with drake_config()!
Version 7.10.0
Unavoidable but minor breaking changes
These changes invalidate some targets in some workflows, but they are necessary bug fixes.
- Remove spurious local variables detected in `$<-()` and `@<-()` (#1144).
- Avoid target names with trailing dots (#1147, @Plebejer).
Bug fixes
- Handle unequal list columns in `bind_plans()` (#1136, @jennysjaarda).
- Handle non-vector sub-targets in dynamic branching (#1138).
- Handle calls in `analyze_assign()` (#1119, @jennysjaarda).
- Restore correct environment locking (#1143, @kuriwaki).
- Log `"running"` progress of dynamic targets.
- Log dynamic targets as failed if a sub-target fails (#1158).
New features
- Add a new `"fst_tbl"` format for large `tibble` targets (#1154, @kendonB).
- Add a new `format` argument to `make()`, an optional custom storage format for targets without an explicit `target(format = ...)` in the plan (#1124).
- Add a new `lock_cache` argument to `make()` to optionally suppress cache locking (#1129). (It can be annoying to interrupt `make()` repeatedly and unlock the cache manually every time.)
- Add new functions `cancel()` and `cancel_if()` to cancel targets mid-build (#1131).
- Add a new `subtarget_list` argument to `loadd()` and `readd()` to optionally load a dynamic target as a list of sub-targets (#1139, @MilesMcBain).
- Prohibit dynamic `file_out()` (#1141).
Enhancements
- Check for illegal formats early on at the `drake_config()` level (#1156, @MilesMcBain).
- Smoothly deprecate the `config` argument in all user-side functions (#1118, @vkehayas). Users can now supply the plan and other `make()` arguments directly, without bothering with `drake_config()`. Now, you only need to call `drake_config()` in the `_drake.R` file for `r_make()` and friends. Old code with `config` objects should still work. Affected functions:
  - `make()`
  - `outdated()`
  - `drake_build()`
  - `drake_debug()`
  - `recoverable()`
  - `missed()`
  - `deps_target()`
  - `deps_profile()`
  - `drake_graph_info()`
  - `vis_drake_graph()`
  - `sankey_drake_graph()`
  - `drake_graph()`
  - `text_drake_graph()`
  - `predict_runtime()`. Needed to rename the `targets` argument to `targets_predict` and `jobs` to `jobs_predict`.
  - `predict_workers()`. Same argument name changes as `predict_runtime()`.
- Because of #1118, the only remaining user-side purpose of `drake_config()` is to serve functions `r_make()` and friends.
- Document the limitations of grouping variables (#1128).
- Handle the `@` operator. For example, in the static code analysis of `x@y`, do not register `y` as a dependency (#1130, @famuvie).
- Remove superfluous/incorrect information about imports from the output of `deps_profile()` (#1134, @kendonB).
- Append hashes to `deps_target()` output (#1134, @kendonB).
- Add S3 class and pretty print method for `drake_meta_()` objects.
- Use call stacks instead of environment inheritance to power `drake_envir()` and `id_chr()` (#1132).
- Allow `drake_envir()` to select the environment with imports (#882).
- Improve visualization labels for dynamic targets: clarify that the listed runtime is a total runtime over all sub-targets and list the number of sub-targets.
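The deprecation of the `config` argument (#1118) can be pictured with the sketch below, which assumes `drake` >= 7.10.0 and `storr` are installed. The plan contents and the in-memory cache are illustrative choices, not part of the release notes.

```r
library(drake)

# A tiny illustrative plan: `rows` depends on `data`.
plan <- drake_plan(
  data = head(airquality),
  rows = nrow(data)
)

# An in-memory cache keeps this example from writing a `.drake/` folder.
cache <- storr::storr_environment()

# Supply the plan and other arguments directly; no drake_config() needed.
make(plan, cache = cache, session_info = FALSE)

# Other user-side functions accept the same arguments.
stale <- outdated(plan, cache = cache)
```

Right after a successful `make()`, `outdated()` should report no targets, so `stale` is expected to be empty.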
Published by wlandau over 6 years ago
The drake R package - Speedups and better dynamic branching
Version 7.9.0
Breaking changes in dynamic branching
- Embrace the `vctrs` paradigm and its type stability for dynamic branching (#1105, #1106).
- Accept `target` as a symbol by default in `read_trace()`. Required for the trace to make sense in #1107.
Bug fixes
- Repair reference to custom HPC resources in the `"future"` backend (#1083, @jennysjaarda).
- Properly copy data when importing targets from one cache into another (#1120, @brendanf).
- Prevent dynamic vector sizes from conflicting with file sizes in metadata.
New features
- Add a new `log_build_times` argument to `make()` and `drake_config()`. Allows users to disable the recording of build times. Produces a speedup of up to 20% on Macs (#1078).
- Implement cache locking to prohibit concurrent calls to `make()`, `outdated(make_imports = TRUE)`, `recoverable(make_imports = TRUE)`, `vis_drake_graph(make_imports = TRUE)`, `clean()`, etc. on the same cache.
- Add a new `format` trigger to invalidate targets when the specialized data format changes (#1104, @kendonB).
- Add new functions `cached_planned()` and `cached_unplanned()` to help selectively clean workflows with dynamic targets (#1110, @kendonB).
- Add S3 classes and pretty print methods for `drake_config()` objects and `analyze_code()` objects.
- Add a new `"qs"` format (#1121, @kendonB).
Speedups
- Avoid setting seeds for imports (#1086, @adamkski).
- Avoid working directly with POSIXct times (#1086, @adamkski).
- Avoid excessive calls to `%||%` (`%|||%` is faster) (#1089, @billdenney).
- Remove `%||NA` due to slowness (#1089, @billdenney).
- Use hash tables to speed up `is_dynamic()` and `is_subtarget()` (#1089, @billdenney).
- Use `getVDigest()` instead of `digest()` (#1089, #1092, https://github.com/eddelbuettel/digest/issues/139#issuecomment-561870289, @eddelbuettel, @billdenney).
- Pre-compute `backtick` and `.deparseOpts()` to speed up `deparse()` (#1086, https://stackoverflow.com/users/516548/g-grothendieck, @adamkski).
- Pre-compute which targets exist in advance (#1095).
- Avoid gratuitous cache interactions and data frame operations in `build_times()` (#1098).
- Use `mget_hash()` in `progress()` (#1098).
- Get target progress info only once in `drake_graph_info()` (#1098).
- Speed up the retrieval of old metadata in `outdated()` (#1098).
- In `make()`, avoid checking for nonexistent metadata for missing targets.
- Reduce logging in `drake_config()`.
Enhancements
- Write a complete project structure in `use_drake()` (#1097, @lorenzwalthert, @tjmahr).
- Add a minor logger note to say how many dynamic sub-targets are registered at a time (#1102, @kendonB).
- Handle dependencies that are dynamic targets but not declared as such for the current target (#1107).
- Internally, the "layout" data structure is now called the "workflow specification", or "spec" for short. The spec is `drake`'s interpretation of the plan. In the plan, all the dependency relationships among targets and files are implicit. In the spec, they are all explicit. We get from the plan to the spec using static code analysis, e.g. `analyze_code()`.
Published by wlandau over 6 years ago
The drake R package - Dynamic branching
Version 7.8.0
Bug fixes
- Prevent `drake::drake_plan(x = target(...))` from throwing an error if `drake` is not loaded (#1039, @mstr3336).
- Move the `transformations` lifecycle badge to the proper location in the docstring (#1040, @jeroen).
- Prevent `readd()` / `loadd()` from turning an imported function into a target (#1067).
- Align in-memory `disk.frame` targets with their stored values (#1077, @brendanf).
New features
- Implement dynamic branching (#685).
- Add a new `subtargets()` function to get the cached names of the sub-targets of a dynamic target.
- Add new `subtargets` arguments to `loadd()` and `readd()` to retrieve specific sub-targets from a parent dynamic target.
- Add new `get_trace()` and `read_trace()` functions to help track which values of grouping variables go into the making of dynamic sub-targets.
- Add a new `id_chr()` function to get the name of the target while `make()` is running.
- Implement `plot(plan)` (#1036).
- `vis_drake_graph()`, `drake_graph_info()`, and `render_drake_graph()` now take arguments that allow behavior to be defined upon selection of nodes (#1031, @mstr3336).
- Add a new `max_expand` argument to `make()` and `drake_config()` to scale down dynamic branching (#1050, @hansvancalster).
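To make the dynamic branching features listed above more concrete, here is a minimal sketch assuming a recent `drake` (7.10.0 or later, so that the plan can be passed directly to `make()`) and `storr`. The target names and the in-memory cache are illustrative.

```r
library(drake)

# `roots` is a dynamic target: one sub-target per element of `values`,
# created while make() runs rather than when the plan is written.
plan <- drake_plan(
  values = c(1, 4, 9),
  roots = target(sqrt(values), dynamic = map(values))
)

# An in-memory cache keeps this example from writing a `.drake/` folder.
cache <- storr::storr_environment()
make(plan, cache = cache, session_info = FALSE)

# Read the aggregated value and the cached names of the sub-targets.
all_roots <- readd(roots, cache = cache)
sub_names <- subtargets(roots, cache = cache)
```

Each name in `sub_names` identifies one sub-target, which can also be read individually through the `subtargets` argument of `readd()`.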
Enhancements
- Document transformation functions in a way that avoids having to create true functions (#979).
- Avoid always invalidating the memoized layout when we set the knitr hash.
- Change the names of environments in `drake_config()` objects.
- Assert that `prework` is a language object, list of language objects, or character vector (#1 at pat-s/multicore-debugging on GitHub, @pat-s).
- Use an environment instead of a list for `config$layout`. Supports internal modifications by reference. Required for #685.
- Clean up the code of the parallel backends.
- Make `dynamic` a formal argument of `target()`.
- Always lock/unlock the environment target by target, allowing informative error messages to appear more readily (#1062, @PedramNavid).
- Automatically ignore `storr`s and decorated `storr`s (#1071).
- Speed up memory management by avoiding a call to `setdiff()` and avoiding `names(config$envir_targets)`.
Published by wlandau over 6 years ago
The drake R package - disk.frame and code_to_function()
Version 7.7.0
Bug fixes
- Take the sum instead of the max in `dir_size()`. Incurs rehashing for some workflows, but should not invalidate any targets.
New features
- Add a new `which_clean()` function to preview which targets will be invalidated by `clean()` (#1014, @pat-s).
- Add serious import and export methods for the decorated `storr` (#1015, @billdenney, @noamross).
- Add a new `"diskframe"` format for larger-than-memory data (#1004, @xiaodaigh).
- Add a new `drake_tempfile()` function to help with the `"diskframe"` format. It makes sure we are not copying large datasets across different physical storage media (#1004, @xiaodaigh).
- Add new function `code_to_function()` to allow for parsing script-based workflows into functions so `drake_plan()` can begin to manage the workflow and track dependencies (#994, @thebioengineer).
Published by wlandau over 6 years ago
The drake R package - Continuing with efficient data formats
Version 7.6.2
Bug fixes
- Remove README.md from CRAN altogether. Also remove all links from the news and vignette. The links trigger too many CRAN notes, which made the automated checks too brittle.
- Serialize formats that need serialization (like "keras") before sending the data from HPC workers to the master process (#989).
- Check for custom-formatted files when checking checksums.
- Force fst-formatted targets to plain data frames. Same goes for the new "fst_dt" format.
- Change the meaning and behavior of `max_expand` in `drake_plan()`. `max_expand` is now the maximum number of targets produced by `map()`, `split()`, and `cross()`. For `cross()`, this reduces the number of targets (less cumbersome) and makes the subsample of targets more representative of the complete grid. It also ensures consistent target naming when `.id` is `FALSE` (#1002). Note: `max_expand` is not for production workflows anyway, so this change does not break anything important. Unfortunately, we do lose the speed boost in `drake_plan()` originally due to `max_expand`, but `drake_plan()` is still fast, so that is not so bad.
- Drop specialized formats of `NULL` targets (#998).
- Prevent false grouping variables from partially tagging along in `cross()` (#1009). The same fix should apply to `map()` and `split()` too.
- Respect graph topology when recovering old grouping variables for `map()` (#1010).
New features
- Add a new "fst_dt" format for `fst`-powered saving of `data.table` objects.
- Support a custom "caching" column of the plan to select master vs worker caching for each target individually (#988).
- Make `transform` a formal argument of `target()` so that users do not have to type "transform =" all the time in `drake_plan()` (#993).
- Migrate the documentation website from `ropensci.github.io/drake` to `docs.ropensci.org/drake`.
Enhancements
- Document the HPC limitations of `target(format = "keras")` (#989).
- Remove the now-superfluous vignette.
- Wrap up console and text file logging functionality into a reference class (#964).
- Deprecate the `verbose` argument in various caching functions. The location of the cache is now only printed in `make()`. This made the previous feature easier to implement.
- Carry forward nested grouping variables in `combine()` (#1008).
- Improve the encapsulation of hash tables in the decorated `storr` (#968).
Published by wlandau almost 7 years ago
The drake R package - CRAN hotfix
Fix broken README links.
Published by wlandau almost 7 years ago
The drake R package - Big data formats
Version 7.6.0
New features
- Support specialized data storage via a decorated cache and the `format` argument of `target()` (#971). This allows users to leverage faster ways to save and load targets, such as `write_fst()` for data frames and `save_model_hdf5()` for Keras models. It also improves memory efficiency because it prevents `storr` from making a serialized in-memory copy of large data objects.
- Add `tidyselect` functionality for `...` in `progress()`, analogous to `loadd()`, `build_times()`, and `clean()`.
- Support S3 for user-defined generics (#959). If the generic `do_stuff()` and the method `stuff.your_class()` are defined in `envir`, and if `do_stuff()` has a call to `UseMethod("stuff")`, then `drake`'s code analysis will detect `stuff.your_class()` as a dependency of `do_stuff()`.
- Add authentication support for `file_in()` URLs. Requires the new `curl_handles` argument of `make()` and `drake_config()` (#981).
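The S3 entry above can be pictured with plain base R, reusing the names `do_stuff()` and `stuff.your_class()` from the entry. The object `obj` and the return string are illustrative. This only demonstrates the dispatch relationship that the code analysis is said to detect, not how `drake` detects it.

```r
# The generic: it dispatches on the name "stuff", not on its own name.
do_stuff <- function(x) {
  UseMethod("stuff")
}

# The method that UseMethod("stuff") selects for class "your_class".
stuff.your_class <- function(x) {
  "handled by stuff.your_class()"
}

# A hypothetical object of that class.
obj <- structure(list(), class = "your_class")
result <- do_stuff(obj)
```

Because `do_stuff()` reaches `stuff.your_class()` only through `UseMethod()`, the method name never appears literally in the body of the generic, so an analysis of global symbols alone would not list it as a dependency.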
Bug fixes
- Make `drake_plan(transform = slice())` understand `.id` and grouping variables (#963).
- Repair `clean(garbage_collection = TRUE, destroy = TRUE)`. Previously it destroyed the cache before trying to collect garbage.
- Ensure that `r_make()` passes informative error messages back to the calling process (#969).
- Avoid downloading full contents of URLs when rehashing (#982).
- Retain upstream grouping variables of `map()` and `cross()` on topologically side-by-side targets (#983).
- Manually enforce the correct ordering in `dsl_left_outer_join()` so `cross()` selects the right combinations of existing targets (#986). This bug was probably introduced in the solution to #983.
- Make the output of `progress()` more consistent, less dependent on whether `tidyselect` is installed.
Enhancements
- Document DSL keywords as if they were true functions: `target()`, `map()`, `split()`, `cross()`, and `combine()` (#979).
- Do garbage collection between the unloading and loading phases of memory management.
- Keep `file_out()` files in `clean()` unless `garbage_collection` is `TRUE`. That way, `make(recover = TRUE)` is a true "undo button" for `clean()`. `clean(garbage_collection = TRUE)` still removes data in the cache, as well as any `file_out()` files from targets currently being cleaned.
- The menu in `clean()` only appears if `garbage_collection` is `TRUE`. Also, this menu is added to `rescue_cache(garbage_collection = TRUE)`.
- Reorganize the internal code files and functions to make development easier.
- Move the history inside the cache folder `.drake/`. The old `.drake_history/` folder was awkward. Old histories are migrated during `drake_config()` and `drake_history()`.
- Add lifecycle badges to exported functions.
Published by wlandau almost 7 years ago
The drake R package - CRAN hotfix
- Eliminate accidental creations of `.drake_history/` in `plan_to_code()`, `plan_to_notebook()`, and the help file examples. Should fix the note at https://win-builder.r-project.org/incomingpretest/drake7.5.120190721153755/Debian/00check.log.
- Repair long examples.
Published by wlandau almost 7 years ago
The drake R package - History, provenance, and recovery
Version 7.5.0
New features
- Add automated data recovery (#945). This is still experimental and disabled by default. Requires `make(recover = TRUE)`.
- Add new functions `recoverable()` and `r_recoverable()` to show targets that are outdated but recoverable via `make(recover = TRUE)`.
- Track the history and provenance of targets, viewable with `drake_history()`. Powered by `txtq` (#918, #920).
- Add a new `no_deps()` function, similar to `ignore()`. `no_deps()` suppresses dependency detection but still tracks changes to the literal code (#910).
- Add a new "autoclean" memory strategy (#917).
- Export `transform_plan()`.
- Allow a custom `seed` column of `drake` plans to set custom seeds (#947).
- Add a new `seed` trigger to optionally ignore changes to the target seed (#947).
Enhancements
- In `drake_plan()`, interpret custom columns as non-language objects (#942).
- Suggest and assert `clustermq` >= 0.8.8.
- Log the target name in a special column in the console log file (#909).
- Rename the "memory" memory strategy to "preclean" (with deprecation; #917).
- Deprecate `ensure_workers` in `drake_config()` and `make()`.
- Warn when the user supplies additional arguments to `make()` after `config` is already supplied.
- Prevent users from running `make()` from inside the cache (#927).
- Add `CITATION` file with JOSS paper.
- In `deps_profile()`, include the seed and change the names.
- Allow the user to set a different seed in `make()`. All this does is invalidate old targets.
- Use `set_hash()` and `get_hash()` in `storr` to double the speed of progress tracking.
Bug fixes
- In the static code analysis for dependency detection, ignore list elements referenced with `$` (#938).
- Minor: handle strings embedded in language objects (#934).
- Minor: supply `xxhash64` as the default hash algorithm for non-`storr` hashing if the driver does not have a hash algorithm.
Published by wlandau almost 7 years ago
The drake R package - Data splitting, URL tracking, and advanced memory management
Version 7.4.0
Mildly breaking changes
These changes are technically breaking changes, but they should only affect advanced users.
- `rescue_cache()` no longer returns a value.
Bug fixes
- Restore compatibility with `clustermq` (#898). Suggest version >= 0.8.8 but allow 0.8.7 as well.
- Ensure `drake` recomputes `config$layout` when `knitr` reports change (#887).
- Do not rehash large imported files every `make()` (#878).
- Repair parsing of long tidy eval inputs in the DSL (#878).
- Clear up cache confusion when a custom cache exists adjacent to the default cache (#883).
- Accept targets as symbols in `r_drake_build()`.
- Log progress during `r_make()` (#889).
- Repair `expose_imports()`: do not do the `environment<-` trick unless the object is a non-primitive function.
- Use different static analyses of `assign()` vs `delayedAssign()`.
- Fix a superfluous code analysis warning incurred by multiple `file_in()` files and other strings (#896).
- Make `ignore()` work inside `loadd()`, `readd()`, `file_in()`, `file_out()`, and `knitr_in()`.
New features
- Add experimental support for URLs in `file_in()` and `file_out()`. `drake` now treats `file_in()`/`file_out()` files as URLs if they begin with "http://", "https://", or "ftp://". The fingerprint is a concatenation of the ETag and last-modified timestamp. If neither can be found or if there is no internet connection, `drake` throws an error.
- Implement new memory management strategies `"unload"` and `"none"`, which do not attempt to load a target's dependencies from memory (#897).
- Allow users to give each target its own memory strategy (#897).
- Add `drake_slice()` to help split data across multiple targets. Related: #77, #685, #833.
- Introduce a new `drake_cache()` function, which is now recommended instead of `get_cache()` (#883).
- Introduce a new `r_deps_target()` function.
- Add RStudio addins for `r_make()`, `r_vis_drake_graph()`, and `r_outdated()` (#892).
Published by wlandau about 7 years ago
The drake R package - max_expand and text_drake_graph()
Version 7.3.0
Bug fixes
- Accommodate `rlang`'s new interpolation operator `{{`, which was causing `make()` to fail when `drake_plan()` commands are enclosed in curly braces (#864).
- Move "`config$lock_envir <- FALSE`" from `loop_build()` to `backend_loop()`. This makes sure `config$envir` is correctly locked in `make(parallelism = "clustermq")`.
- Convert factors to characters in the optional `.data` argument of `map()` and `cross()` in the DSL.
- In the DSL of `drake_plan()`, repair `cross(.data = !!args)`, where `args` is an optional data frame of grouping variables.
- Handle trailing slashes in `file_in()`/`file_out()` directories for Windows (#855).
- Make `.id_chr` work with `combine()` in the DSL (#867).
- Do not try `make_spinner()` unless the version of `cli` is at least 1.1.0.
New features
- Add functions `text_drake_graph()` (and `r_text_drake_graph()` and `render_text_drake_graph()`). Uses text art to print a dependency graph to the terminal window. Handy for when users SSH into remote machines without X Window support.
- Add a new `max_expand` argument to `drake_plan()`, an optional upper bound on the lengths of grouping variables for `map()` and `cross()` in the DSL. Comes in handy when you have a massive number of targets and you want to test on a miniature version of your workflow before you scale up to production.
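As a rough sketch of the `max_expand` use case described above, assuming `drake` is installed: the same static `map()` is expanded once in full and once capped at two targets. The target name `root`, the command, and the values are illustrative.

```r
library(drake)

# Full plan: one target per value of the grouping variable `n`.
full <- drake_plan(
  root = target(sqrt(n), transform = map(n = c(1, 4, 9, 16, 25)))
)

# Miniature plan for testing: cap the expansion at two targets.
small <- drake_plan(
  root = target(sqrt(n), transform = map(n = c(1, 4, 9, 16, 25))),
  max_expand = 2
)

n_full <- nrow(full)
n_small <- nrow(small)
```

Which of the original targets survive in the miniature plan is up to `drake`, so the sketch only compares row counts.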
Enhancements
- Delay the initialization of `clustermq` workers for as long as possible. Before launching them, build/check targets locally until we reach an outdated target with `hpc` equal to `FALSE`. In other words, if no targets actually require `clustermq` workers, no workers get created.
- In `make(parallelism = "future")`, reset the `config$sleep()` backoff interval whenever a new target gets checked.
- Add a "done" message to the console log file when the workflow has completed.
- Replace `CodeDepends` with a base R solution in `code_to_plan()`. Fixes a CRAN note.
- The DSL (transformations in `drake_plan()`) is no longer experimental.
- The `callr` API (`r_make()` and friends) is no longer experimental.
- Deprecate the wildcard/text-based functions for creating plans: `evaluate_plan()`, `expand_plan()`, `map_plan()`, `gather_plan()`, `gather_by()`, `reduce_plan()`, `reduce_by()`.
- Change some deprecated functions to defunct: `deps()`, `max_useful_jobs()`, and `migrate_drake_project()`.
Published by wlandau about 7 years ago
The drake R package - Improved visuals
Version 7.2.0
drake version 7.2.0 is being released early in order to ensure compatibility with development testthat, re https://github.com/ropensci/drake/issues/849.
Mildly breaking changes
- In the DSL (e.g. `drake_plan(x = target(..., transform = map(...)))`), avoid inserting extra dots in target names when the grouping variables are character vectors (#847). Target names come out much nicer this way, but those name changes will invalidate some targets (i.e. they need to be rebuilt with `make()`).
Bug fixes
- Use `config$jobs_preprocess` (local jobs) in several places where `drake` was incorrectly using `config$jobs` (meant for targets).
- Allow `loadd(x, deps = TRUE, config = your_config)` to work even if `x` is not cached (#830). Required disabling `tidyselect` functionality when `deps` is `TRUE`. There is a new note in the help file about this, and an informative console message prints out on `loadd(deps = TRUE, tidyselect = TRUE)`. The default value of `tidyselect` is now `!deps`.
- Minor: avoid printing messages and warnings twice to the console (#829).
- Ensure compatibility with `testthat` >= 2.0.1.9000.
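The `tidyselect = !deps` default mentioned above relies on R's lazy evaluation of default arguments. The following base R sketch shows the pattern with a hypothetical function `load_targets()`; it is not drake's actual code.

```r
# Hypothetical stand-in for a loader function (not drake's implementation).
# The default of `tidyselect` is evaluated lazily, so it follows `deps`.
load_targets <- function(deps = FALSE, tidyselect = !deps) {
  if (deps && tidyselect) {
    message("tidyselect is disabled when deps is TRUE")
    tidyselect <- FALSE
  }
  list(deps = deps, tidyselect = tidyselect)
}

default_call <- load_targets()                             # tidyselect on
deps_call <- load_targets(deps = TRUE)                     # tidyselect off
both_call <- load_targets(deps = TRUE, tidyselect = TRUE)  # message, then off
```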
New features
- In `drake_plan()` transformations, allow the user to refer to a target's own name using a special `.id_chr` symbol, which is treated like a character string.
- Add a `transparency` argument to `drake_ggraph()` and `render_drake_ggraph()` to disable transparency in the rendered graph. Useful for R installations without transparency support.
Enhancements
- Use a custom layout to improve node positions and aspect ratios of `vis_drake_graph()` and `drake_ggraph()` displays. Only activated in `vis_drake_graph()` when there are at least 10 nodes distributed in both the vertical and horizontal directions.
- Allow nodes to be dragged both vertically and horizontally in `vis_drake_graph()` and `render_drake_graph()`.
- Prevent dots from showing up in target names when you supply grouping variables to transforms in `drake_plan()` (#847).
- Do not keep `drake` plans (`drake_plan()`) inside `drake_config()` objects. When other bottlenecks are removed, this will reduce the burden on memory (re #800).
- Do not retain the `targets` argument inside `drake_config()` objects. This is to reduce memory consumption.
- Deprecate the `layout` and `direction` arguments of `vis_drake_graph()` and `render_drake_graph()`. Direction is now always left to right and the layout is always Sugiyama.
- Write the cache log file in CSV format (now `drake_cache.csv` by default) to avoid issues with spaces (e.g. entry names with spaces in them, such as "file report.Rmd").
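To see why CSV helps with entry names that contain spaces, consider this small base R sketch. The column names and hash values are made up for illustration and are not taken from a real cache log.

```r
# Illustrative log entries; names and hashes are invented for this example.
log <- data.frame(
  name = c("file report.Rmd", "my_target"),
  hash = c("abc123", "def456"),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(log, path, row.names = FALSE)

# CSV quoting keeps "file report.Rmd" in one field despite the space.
restored <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)
```

A whitespace-delimited file would split such a name into two fields unless it were quoted consistently.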
Published by wlandau about 7 years ago
The drake R package - Maintenance release
Version 7.1.0
Bug fixes
- In `drake` 7.0.0, if you run `make()` in interactive mode and respond to the menu prompt with an option other than `1` or `2`, targets will still build.
- Make sure file outputs show up in `drake_graph()`. The bug came from `append_output_file_nodes()`, a utility function of `drake_graph_info()`.
- Repair `r_make(r_fn = callr::r_bg())` re https://github.com/ropensci/drake/issues/799.
- Allow `drake_ggraph()` and `sankey_drake_graph()` to work when the graph has no edges.
New features
- Add a new `use_drake()` function to write the `make.R` and `_drake.R` files from the main example. Does not write other supporting scripts.
- With an optional logical `hpc` column in your `drake_plan()`, you can now select which targets to deploy to HPC and which to run locally.
- Add a `list` argument to `build_times()`, just like `loadd()`.
- Add a new RStudio addin: 'loadd target at cursor' which can be bound to a keyboard shortcut to load the target identified by the symbol at the cursor position to the global environment.
Enhancements
- `file_in()` and `file_out()` can now handle entire directories, e.g. `file_in("your_folder_of_input_data_files")` and `file_out("directory_with_a_bunch_of_output_files")`.
- Send less data from `config` to HPC workers.
- Improve `drake_ggraph()`: hide node labels by default and render the arrows behind the nodes.
- Print an informative error message when the user supplies a `drake` plan to the `config` argument of a function.
- By default, use gray arrows and a black-and-white background with no gridlines.
- For the `map()` and `cross()` transformations in the DSL, prevent the accidental sorting of targets by name. Needed `merge(sort = FALSE)` in `dsl_left_outer_join()`.
- Simplify verbosity. The `verbose` argument of `make()` now takes values 0, 1, and 2, and maximum verbosity in the console prints targets, retries, failures, and a spinner. The console log file, on the other hand, dumps maximally verbose runtime info regardless of the `verbose` argument.
- In previous versions, functions generated with `f <- Rcpp::cppFunction(...)` did not stay up to date from session to session because the addresses corresponding to anonymous pointers were showing up in `deparse(f)`. Now, `drake` ignores those pointers, and `Rcpp` functions compiled inline appear to stay up to date. This problem was more of an edge case than a bug.
- Prepend time stamps with sub-second times to the lines of the console log file.
- In `drake_plan()`, deprecate the `tidy_evaluation` argument in favor of the new and more concise `tidy_eval`. To preserve back compatibility for now, if you supply a non-`NULL` value to `tidy_evaluation`, it overwrites `tidy_eval`.
- Reduce the object size of `drake_config()` objects by assigning the closure of `config$sleep` to `baseenv()`.
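The back-compatibility rule for `tidy_evaluation` and `tidy_eval` follows a common deprecation pattern. Below is a base R sketch of that pattern using a hypothetical function `make_plan()`; drake's own code may differ.

```r
# Hypothetical function showing the deprecation pattern (not drake's code).
make_plan <- function(tidy_eval = TRUE, tidy_evaluation = NULL) {
  if (!is.null(tidy_evaluation)) {
    warning("`tidy_evaluation` is deprecated; use `tidy_eval` instead.",
            call. = FALSE)
    # A non-NULL value of the old argument overrides the new one.
    tidy_eval <- tidy_evaluation
  }
  tidy_eval
}

new_style <- make_plan(tidy_eval = FALSE)
old_style <- suppressWarnings(make_plan(tidy_evaluation = FALSE))
```

Defaulting the old argument to `NULL` lets the function tell "not supplied" apart from an explicit `FALSE`.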
Published by wlandau about 7 years ago
The drake R package - drake transformed
Version 7.0.0
Breaking changes
- The enhancements that increase cache access speed also invalidate targets in old projects. Workflows built with drake <= 6.2.1 will need to run from scratch again.
- In `drake` plans, the `command` and `trigger` columns are now lists of language objects instead of character vectors. `make()` and friends still work if you have character columns, but the default output of `drake_plan()` has changed to this new format.
- All parallel backends (`parallelism` argument of `make()`) except "clustermq" and "future" are removed. A new "loop" backend covers local serial execution.
- A large amount of deprecated functionality is now defunct, including several functions (`built()`, `find_project()`, `imported()`, and `parallel_stages()`; full list here) and the single-quoted file API.
- Set the default value of `lock_envir` to `TRUE` in `make()` and `drake_config()`. So `make()` will automatically quit in error if the act of building a target tries to change upstream dependencies.
- `make()` no longer returns a value. Users will need to call `drake_config()` separately to get the old return value of `make()`.
- Require the `jobs` argument to be of length 1 (`make()` and `drake_config()`). To parallelize the imports and other preprocessing steps, use `jobs_preprocess`, also of length 1.
- Get rid of the "kernels" `storr` namespace. As a result, `drake` is faster, but users will no longer be able to load imported functions using `loadd()` or `readd()`.
- In `target()`, users must now explicitly name all the arguments except `command`, e.g. `target(f(x), trigger = trigger(condition = TRUE))` instead of `target(f(x), trigger(condition = TRUE))`.
- Fail right away in `bind_plans()` when the result has duplicated target names. This makes `drake`'s API more predictable and helps users catch malformed workflows earlier.
- `loadd()` only loads targets listed in the plan. It no longer loads imports or file hashes.
- The return values of `progress()`, `deps_code()`, `deps_target()`, and `predict_workers()` are now data frames.
- Change the default value of `hover` to `FALSE` in visualization functions. Improves speed.
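As a rough illustration of the difference between character commands and language objects, the base R sketch below converts text commands into calls. It is a generic example and does not reproduce drake's internal conversion.

```r
# Commands stored as text, as in older plans.
commands_chr <- c("mean(x)", "summary(lm(y ~ x, data = d))")

# The same commands as a list of language objects.
commands_lang <- lapply(commands_chr, function(cmd) parse(text = cmd)[[1]])

# Language objects can be inspected without any string handling.
first_call <- commands_lang[[1]]
function_name <- as.character(first_call[[1]])
```

Parsing once up front means later code analysis can walk the call tree directly instead of re-parsing strings.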
Bug fixes
- Allow `bind_plans()` to work with lists of plans (`bind_plans(list(plan1, plan2))` was returning `NULL` in `drake` 6.2.0 and 6.2.1).
- Ensure that `get_cache(path = "non/default/path", search = FALSE)` looks for the cache in `"non/default/path"` instead of `getwd()`.
- Remove strict dependencies on package `tibble`.
- Pass the correct data structure to `ensure_loaded()` in `meta.R` and `triggers.R` when ensuring the dependencies of the `condition` and `change` triggers are loaded.
- Require a `config` argument to `drake_build()` and `loadd(deps = TRUE)`.
New features
- Introduce a new experimental domain-specific language for generating large plans (#233). Details here.
- Implement a `lock_envir` argument to safeguard reproducibility. See this thread for a demonstration of the problem solved by `make(lock_envir = TRUE)`. More discussion: #619, #620.
- The new `from_plan()` function allows the users to reference custom plan columns from within commands. Changes to values in these columns do not invalidate targets.
- Add a menu prompt (https://github.com/ropensci/drake/pull/762) to safeguard against `make()` pitfalls in interactive mode (https://github.com/ropensci/drake/issues/761). Appears once per session. Disable with `options(drake_make_menu = FALSE)`.
- Add new API functions `r_make()`, `r_outdated()`, etc. to run `drake` functions more reproducibly in a clean session. See the help file of `r_make()` for details.
- `progress()` gains a `progress` argument for filtering results. For example, `progress(progress = "failed")` will report targets that failed.
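The kind of protection that `lock_envir` aims for can be illustrated with base R's `lockEnvironment()`. This is a sketch of the general mechanism under the assumption that a locked environment is what stops commands from changing upstream objects; it is not a copy of drake's internals.

```r
envir <- new.env()
assign("data", 1:3, envir = envir)

# Lock the environment and its existing bindings.
lockEnvironment(envir, bindings = TRUE)

# A command that tries to overwrite an upstream object now fails.
outcome <- tryCatch(
  {
    assign("data", 4:6, envir = envir)
    "modified"
  },
  error = function(e) "blocked"
)
```

Failing loudly here is the point: a silent overwrite would leave downstream targets built from data that no longer matches the plan.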
Enhancements
- Large speed boost: move away from `storr`'s key mangling in favor of `drake`'s own encoding of file paths and namespaced functions for `storr` keys.
- Exclude symbols `.`, `..`, and `.gitignore` from being target names (consequence of the above).
- Use only one hash algorithm per `drake` cache, which the user can set with the `hash_algorithm` argument of `new_cache()`, `storr::storr_rds()`, and various other cache functions. Thus, the concepts of a "short hash algorithm" and "long hash algorithm" are deprecated, and the functions `long_hash()`, `short_hash()`, `default_long_hash_algo()`, `default_short_hash_algo()`, and `available_hash_algos()` are deprecated. Caches are still back-compatible with `drake` > 5.4.0 and <= 6.2.1.
- Allow the `magrittr` dot symbol to appear in some commands sometimes.
- Deprecate the `fetch_cache` argument in all functions.
- Remove packages `DBI` and `RSQLite` from "Suggests".
- Define a special `config$eval <- new.env(parent = config$envir)` for storing built targets and evaluating commands in the plan. Now, `make()` no longer modifies the user's environment. This move is a long-overdue step toward purity.
- Remove dependency on the `codetools` package.
- Deprecate and remove the `session` argument of `make()` and `drake_config()`. Details: https://github.com/ropensci/drake/issues/623#issue-391894088.
- Deprecate the `graph` and `layout` arguments to `make()` and `drake_config()`. The change simplifies the internals, and memoization allows us to do this.
- Warn the user if running `make()` in a subdirectory of the `drake` project root (determined by the location of the `.drake` folder in relation to the working directory).
- In the code analysis, explicitly prohibit targets from being dependencies of imported functions.
- Increase options for the `verbose` argument, including the option to print execution and total build times.
- Separate the building of targets from the processing of imports. Imports are processed with rudimentary staged parallelism (`mclapply()` or `parLapply()`, depending on the operating system).
- Ignore the imports when it comes to build times. Functions `build_times()`, `predict_runtime()`, etc. focus on only the targets.
- Deprecate many API functions, including `plan_analyses()`, `plan_summaries()`, `analysis_wildcard()`, `cache_namespaces()`, `cache_path()`, `check_plan()`, `dataset_wildcard()`, `drake_meta()`, `drake_palette()`, `drake_tip()`, `recover_cache()`, `cleaned_namespaces()`, `target_namespaces()`, `read_drake_config()`, `read_drake_graph()`, and `read_drake_plan()`.
- Deprecate `target()` as a user-side function. From now on, it should only be called from within `drake_plan()`.
- `drake_envir()` now throws an error, not a warning, if called in the incorrect context. Should be called only inside commands in the user's `drake` plan.
- Replace `*expr*()` `rlang` functions with their `*quo*()` counterparts. We still keep `rlang::expr()` in the few places where we know the expressions need to be evaluated in `config$eval`.
- The `prework` argument to `make()` and `drake_config()` can now be an expression (language object) or list of expressions. Character vectors are still acceptable.
- At the end of `make()`, print messages about triggers etc. only if `verbose >= 2L`.
- Deprecate and rename `in_progress()` to `running()`.
- Deprecate and rename `knitr_deps()` to `deps_knitr()`.
- Deprecate and rename `dependency_profile()` to `deps_profile()`.
- Deprecate and rename `predict_load_balancing()` to `predict_workers()`.
- Deprecate `this_cache()` and defer to `get_cache()` and `storr::storr_rds()` for simplicity.
- Change the default value of `hover` to `FALSE` in visualization functions. Improves speed. Also a breaking change.
- Deprecate `drake_cache_log_file()`. We recommend using `make()` with the `cache_log_file` argument to create the cache log. This way ensures that the log is always up to date with `make()` results.
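The `new.env(parent = ...)` idea behind `config$eval` can be sketched in base R as follows. The environment and function names here are stand-ins chosen for illustration, not drake's real objects.

```r
# Stand-in for the user's environment, holding an imported function.
user_envir <- new.env()
assign("double_it", function(x) 2 * x, envir = user_envir)

# Child environment for built targets; it sees `double_it` via its parent.
eval_envir <- new.env(parent = user_envir)

# "Build" a target by evaluating its command and storing the value
# in the child environment only.
command <- quote(double_it(21))
assign("answer", eval(command, envir = eval_envir), envir = eval_envir)
```

Because the result is assigned only in the child environment, the user's environment stays exactly as it was before the build.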
Published by wlandau over 7 years ago
The drake R package - CRAN hotfix
Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0. Chiefly, in CRAN's Debian R-devel (2018-12-10) check platform, errors of the form "length > 1 in coercion to logical" occurred when either argument to && or || was not of length 1 (e.g. nzchar(letters) && length(letters)). In addition to fixing these errors, version 6.2.1 also removes a problematic link from the vignette.
For more information, please see the release notes of version 6.2.0.
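For readers unfamiliar with the `&&` issue described above, here is one way to write such a condition so that each operand has length 1. This is a generic base R sketch and not necessarily the exact change made in 6.2.1.

```r
x <- letters

# Risky: `nzchar(x) && length(x)` passes a length-26 vector to `&&`.
# Safer: collapse each side to a single TRUE/FALSE before combining.
all_nonempty <- all(nzchar(x)) && length(x) > 0
```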
Published by wlandau over 7 years ago
The drake R package - Faster, leaner, and compatible with tibble 2.0.0
New features
- Add a `sep` argument to `gather_by()`, `reduce_by()`, `reduce_plan()`, `evaluate_plan()`, `expand_plan()`, `plan_analyses()`, and `plan_summaries()`. Allows the user to set the delimiter for generating new target names.
- Expose a `hasty_build` argument to `make()` and `drake_config()`. Here, the user can set the function that builds targets in "hasty mode" (`make(parallelism = "hasty")`).
- Add a new `drake_envir()` function that returns the environment where `drake` builds targets. Can only be accessed from inside the commands in the workflow plan data frame. The primary use case is to allow users to remove individual targets from memory at predetermined build steps.
Bug fixes
- Ensure compatibility with `tibble` 2.0.0.
- Stop returning `0s` from `predict_runtime(targets_only = TRUE)` when some targets are outdated and others are not.
- Remove `sort(NULL)` warnings from `create_drake_layout()`. (Affects R-3.3.x.)
Enhancements
- Large speed boost: reduce repeated calls to `parse()` in `code_dependencies()`.
- Large speed boost: change the default value of `memory_strategy` (previously `pruning_strategy`) to `"speed"` (previously `"lookahead"`).
- Compute a special data structure in `drake_config()` (`config$layout`) just to store the code analysis results. This is an intermediate structure between the workflow plan data frame and the graph. It will help clean up the internals in future development.
- Improve memoized preprocessing: deparse all the functions in the environment so the memoization does not react to spurious changes in R internals. Related: #345.
- Use the `label` argument to `future()` inside `make(parallelism = "future")`. That way, job names are target names by default if `job.name` is used correctly in the `batchtools` template file.
- Remove strict dependencies on packages `dplyr`, `evaluate`, `fs`, `future`, `magrittr`, `parallel`, `R.utils`, `stats`, `stringi`, `tidyselect`, and `withr`.
- Remove package `rprojroot` from "Suggests".
- Deprecate the `force` argument in all functions except `make()` and `drake_config()`.
- Change the name of `prune_envir()` to `manage_memory()`.
- Deprecate and rename the `pruning_strategy` argument to `memory_strategy` (`make()` and `drake_config()`).
- Print warnings and messages to the `console_log_file` in real time (#588).
- Use HTML line breaks in `vis_drake_graph()` hover text to display commands in the `drake` plan more elegantly.
- Speed up `predict_load_balancing()` and remove its reliance on internals that will go away in 2019 via #561.
- Remove support for the `worker` column of `config$plan` in `predict_runtime()` and `predict_load_balancing()`. This functionality will go away in 2019 via #561.
- Change the names of the return value of `predict_load_balancing()` to `time` and `workers`.
- Bring the documentation of `predict_runtime()` and `predict_load_balancing()` up to date.
- Deprecate `drake_session()` and rename to `drake_get_session_info()`.
- Deprecate the `timeout` argument in the API of `make()` and `drake_config()`. A value of `timeout` can be still passed to these functions without error, but only the `elapsed` and `cpu` arguments impose actual timeouts now.
Published by wlandau over 7 years ago
The drake R package - map_plan() and other niceties
Version 6.1.0
New features
- Add a new `map_plan()` function to easily create a workflow plan data frame to execute a function call over a grid of arguments.
- Add a new `plan_to_code()` function to turn `drake` plans into generic R scripts. New users can use this function to better understand the relationship between plans and code, and unsatisfied customers can use it to disentangle their projects from `drake` altogether. Similarly, `plan_to_notebook()` generates an R notebook from a `drake` plan.
- Add a new `drake_debug()` function to run a target's command in debug mode. Analogous to `drake_build()`.
- Add a `mode` argument to `trigger()` to control how the `condition` trigger factors into the decision to build or skip a target. See `?trigger` for details.
- Add a new `sleep` argument to `make()` and `drake_config()` to help the master process consume fewer resources during parallel processing.
- Enable the `caching` argument for the `"clustermq"` and `"clustermq_staged"` parallel backends. Now, `make(parallelism = "clustermq", caching = "master")` will do all the caching with the master process, and `make(parallelism = "clustermq", caching = "worker")` will do all the caching with the workers. The same is true for `parallelism = "clustermq_staged"`.
- Add a new `append` argument to `gather_plan()`, `gather_by()`, `reduce_plan()`, and `reduce_by()`. The `append` argument controls whether the output includes the original `plan` in addition to the newly generated rows.
- Add new functions `load_main_example()`, `clean_main_example()`, and `clean_mtcars_example()`.
- Add a `filter` argument to `gather_by()` and `reduce_by()` in order to restrict what we gather even when `append` is `TRUE`.
- Add a hasty mode: `make(parallelism = "hasty")` skips all of `drake`'s expensive caching and checking. All targets run every single time and you are responsible for saving results to custom output files, but almost all the by-target overhead is gone.
Bug fixes
- Ensure commands in the plan are re-analyzed for dependencies when new imports are added (https://github.com/ropensci/drake/issues/548). Was a bug in version 6.0.0 only.
- Call `path.expand()` on the `file` argument to `render_drake_graph()` and `render_sankey_drake_graph()`. That way, tildes in file paths no longer interfere with the rendering of static image files. Compensates for https://github.com/wch/webshot.
- Skip tests and examples if the required "Suggests" packages are not installed.
- Stop checking for non-standard columns. Previously, warnings about non-standard columns were incorrectly triggered by `evaluate_plan(trace = TRUE)` followed by `expand_plan()`, `gather_plan()`, `reduce_plan()`, `gather_by()`, or `reduce_by()`. The more relaxed behavior also gives users more options about how to construct and maintain their workflow plan data frames.
- Use checksums in `"future"` parallelism to make sure files travel over network file systems before proceeding to downstream targets.
- Refactor and clean up checksum code.
- Skip more tests and checks if `visNetwork` is not installed.
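The `path.expand()` fix listed above can be pictured with a one-line base R example. The file path is hypothetical, and the expanded result depends on the home directory of the machine running the code.

```r
# Hypothetical output path that starts with a tilde.
file <- "~/figures/graph.png"

# Expand the tilde to the user's home directory before passing the path on.
expanded <- path.expand(file)
```

Tools that do not go through a shell generally treat `~` as a literal character, which is why expanding it first helps.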
Enhancements
- Stop earlier in `make_targets()` if all the targets are already up to date.
- Improve the documentation of the `seed` argument in `make()` and `drake_config()`.
- Set the default `caching` argument of `make()` and `drake_config()` to `"master"` rather than `"worker"`. The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
- Allow the `condition` trigger to evaluate to non-logical values as long as those values can be coerced to logicals.
- Require that the `condition` trigger evaluate to a vector of length 1.
- Keep non-standard columns in `drake_plan_source()`.
- `make(verbose = 4)` now prints to the console when a target is stored.
- `gather_by()` and `reduce_by()` now gather/reduce everything if no columns are specified.
- Change the default parallelization of the imports. Previously, `make(jobs = 4)` was equivalent to `make(jobs = c(imports = 4, targets = 4))`. Now, `make(jobs = 4)` is equivalent to `make(jobs = c(imports = 1, targets = 4))`. See issue 553 for details.
- Add a console message for building the priority queue when `verbose` is at least 2.
- Condense `load_mtcars_example()`.
- Deprecate the `hook` argument of `make()` and `drake_config()`.
- In `gather_by()` and `reduce_by()`, do not exclude targets with all `NA` gathering variables.
Published by wlandau over 7 years ago
The drake R package - Major release: proper clustermq support and reduced overhead in make()
Breaking changes
For the sake of reproducibility and speed, drake version 6.0.0 is more discerning in how it detects dependencies:
- Targets in the plan.
- Functions and objects in the environment.
- Objects and functions from packages that are explicitly namespaced with `::` and `:::`.
In other words, there is a clearer line between what drake detects and what it does not. And it no longer dives into packages or parent environments automatically by default. The old approach
- Made workflows more brittle (likely to fall out of date).
- Was categorically inferior to `packrat` in terms of package reproducibility.
Unfortunately, the change also puts old workflows out of date. Sorry for the inconvenience.
Other breaking changes that put old projects out of date:
- Avoid serialization in `digest()` wherever possible. This puts old `drake` projects out of date, but it improves speed.
- Require R version >= 3.3.0 rather than >= 3.2.0. Tests and checks still run fine on 3.3.0, but the required version of the `stringi` package no longer compiles on 3.2.0.
Bug fixes
- In the call to `unlink()` in `clean()`, set `recursive` and `force` to `FALSE`. This should prevent the accidental deletion of whole directories.
- Previously, `clean()` deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
- `loadd(not_a_target)` no longer loads every target in the cache.
- Exclude each target from its own dependency metadata in the "deps" `igraph` vertex attribute (fixes https://github.com/ropensci/drake/issues/503).
- Detect inline code dependencies in `knitr_in()` file code chunks.
- Remove more calls to `sort(NULL)` that caused warnings in R 3.3.3.
- Fix a bug on R 3.3.3 where `analyze_loadd()` was sometimes quitting with "Error: attempt to set an attribute on NULL".
- Do not call `digest::digest(file = TRUE)` on directories. Instead, set hashes of directories to `NA`. Users should still not use directories as file dependencies.
- If files are declared as dependencies of custom triggers ("condition" and "change"), include them in `vis_drake_graph()`. Previously, these files were missing from the visualization, but actual workflows worked just fine. Ref: https://stackoverflow.com/questions/52121537/trigger-notification-from-report-generation-in-r-drake-package
- Work around mysterious `codetools` failures in R 3.3 (add a `tryCatch()` statement in `find_globals()`).
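The effect of `recursive = FALSE` in `unlink()`, mentioned in the first bug fix above, can be checked with a short base R sketch. The directory here is a throwaway created under `tempdir()`.

```r
# Create a scratch directory with one file in it.
dir <- tempfile("scratch_dir")
dir.create(dir)
file.create(file.path(dir, "data.txt"))

# With recursive = FALSE, unlink() leaves directories in place.
unlink(dir, recursive = FALSE, force = FALSE)
survived <- dir.exists(dir)

# Clean up the scratch directory explicitly.
unlink(dir, recursive = TRUE)
```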
New features
- Add a proper `clustermq`-based parallel backend: `make(parallelism = "clustermq")`.
- `evaluate_plan(trace = TRUE)` now adds a `*_from` column to show the origins of the evaluated targets. Try `evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE)`.
- Add functions `gather_by()` and `reduce_by()`, which gather on custom columns in the plan (or columns generated by `evaluate_plan(trace = TRUE)`) and append the new targets to the previous plan.
- Expose the `template` argument of `clustermq` functions (e.g. `Q()` and `workers()`) as an argument of `make()` and `drake_config()`.
- Add a new `code_to_plan()` function to turn R scripts and R Markdown reports into workflow plan data frames.
- Add a new `drake_plan_source()` function, which generates lines of code for a `drake_plan()` call. This `drake_plan()` call produces the plan passed to `drake_plan_source()`. The main purpose is visual inspection (we even have syntax highlighting via `prettycode`) but users may also save the output to a script file for the sake of reproducibility or simple reference.
- Deprecate `deps_targets()` in favor of a new `deps_target()` function (singular) that behaves more like `deps_code()`.
Enhancements
- Smooth the edges in `vis_drake_graph()` and `render_drake_graph()`.
- Make hover text slightly more readable in `vis_drake_graph()` and `render_drake_graph()`.
- Align hover text properly in `vis_drake_graph()` using the "title" node column.
- Optionally collapse nodes into clusters with `vis_drake_graph(collapse = TRUE)`.
- Improve `dependency_profile()` to show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last `make()`.
- Choose more appropriate places to check that the `txtq` package is installed.
- Improve the help files of `loadd()` and `readd()`, giving specific usage guidance in prose.
- Memoize all the steps of `build_drake_graph()` and print to the console the ones that execute.
- Skip some tests if `txtq` is not installed.
Published by wlandau over 7 years ago
The drake R package - Interim development release: proper clustermq support and memoized preprocessing
- Add a proper `clustermq`-based parallel backend: `make(parallelism = "clustermq")`.
- Smooth the edges in `vis_drake_graph()` and `render_drake_graph()`.
- Make hover text slightly more readable in `vis_drake_graph()` and `render_drake_graph()`.
- Align hover text properly in `vis_drake_graph()` using the "title" node column.
- Optionally collapse nodes into clusters with `vis_drake_graph(collapse = TRUE)`.
- Improve `dependency_profile()` to show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last `make()`.
- Choose more appropriate places to check that the `txtq` package is installed.
- Expose the `template` argument of `clustermq` functions (e.g. `Q()` and `workers()`) as an argument of `make()` and `drake_config()`.
- Improve the help files of `loadd()` and `readd()`, giving specific usage guidance in prose.
- Bugfix: `loadd(not_a_target)` no longer loads every target in the cache.
- Bugfix: exclude each target from its own dependency metadata in the "deps" `igraph` vertex attribute (fixes https://github.com/ropensci/drake/issues/503).
- Add a new `code_to_plan()` function to turn R scripts and R Markdown reports into workflow plan data frames.
- Add a new `drake_plan_source()` function, which generates lines of code for a `drake_plan()` call. This `drake_plan()` call produces the plan passed to `drake_plan_source()`. The main purpose is visual inspection (we even have syntax highlighting via `prettycode`) but users may also save the output to a script file for the sake of reproducibility or simple reference.
- Memoize all the steps of `build_drake_graph()` and print to the console the ones that execute.
Published by wlandau almost 8 years ago
The drake R package - Flexible triggers
- Overhaul the interface for triggers and add new trigger types ("condition" and "change").
- Offload `drake`'s code examples to this repository and make `drake_example()` and `drake_examples()` download examples from there.
- Optionally show output files in graph visualizations. See the `show_output_files` argument to `vis_drake_graph()` and friends.
- Repair output file checksum operations for distributed backends like `"clustermq_staged"` and `"future_lapply"`.
- Internally refactor the `igraph` attributes of the dependency graph to allow for smarter dependency/memory management during `make()`.
- Enable `vis_drake_graph()` and `sankey_drake_graph()` to save static image files via `webshot`.
- Deprecate `static_drake_graph()` and `render_static_drake_graph()` in favor of `drake_ggraph()` and `render_drake_ggraph()`.
- Add a `columns` argument to `evaluate_plan()` so users can evaluate wildcards in columns other than the `command` column of `plan`.
- Name the arguments of `target()` so users do not have to (explicitly).
- Lay the groundwork for a special pretty print method for workflow plan data frames.
Published by wlandau almost 8 years ago
The drake R package - Node clusters, Sankey diagrams, and clustermq_staged parallelism
- Allow multiple output files per command.
- Add Sankey diagram visuals: `sankey_drake_graph()` and `render_sankey_drake_graph()`.
- Add `static_drake_graph()` and `render_static_drake_graph()` for `ggplot2`/`ggraph` static graph visualizations.
- Add `group` and `clusters` arguments to `vis_drake_graph()`, `static_drake_graph()`, and `drake_graph_info()` to optionally condense nodes into clusters.
- Implement a `trace` argument to `evaluate_plan()` to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
- Rename the `always_rename` argument to `rename` in `evaluate_plan()`.
- Add a `rename` argument to `expand_plan()`.
- Implement `make(parallelism = "clustermq_staged")`, a `clustermq`-based staged parallelism backend (see https://github.com/ropensci/drake/pull/452).
- Implement `make(parallelism = "future_lapply_staged")`, a `future`-based staged parallelism backend (see https://github.com/ropensci/drake/pull/450).
- Depend on `codetools` rather than `CodeDepends` for finding global variables.
- Detect `loadd()` and `readd()` dependencies in `knitr` reports referenced with `knitr_in()` inside imported functions. Previously, this feature was only available in explicit `knitr_in()` calls in commands.
- Skip more tests on CRAN. White-list tests instead of blacklisting them in order to try to keep check time under the official 10-minute cap.
- Disallow wildcard names to grep-match other wildcard names or any replacement values. This will prevent careless mistakes and confusion when generating `drake_plan()`s.
- Prevent persistent workers from hanging when a target fails.
- Move the example template files to https://github.com/ropensci/drake/tree/master/inst/hpctemplatefiles.
- Deprecate `drake_batchtools_tmpl_file()` in favor of `drake_hpc_template_file()` and `drake_hpc_template_files()`.
- Add a `garbage_collection` argument to `make()`. If `TRUE`, `gc()` is called after every new build of a target.
- Remove redundant calls to `sanitize_plan()` in `make()`.
- Change `tracked()` to accept only a `drake_config()` object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
- Move visualization and hpc package dependencies to "Suggests:" rather than "Imports:" in the `DESCRIPTION` file.
- Allow processing of codeless `knitr` reports without warnings.
Published by wlandau almost 8 years ago
The drake R package - Intermediate mini-release
This release comes right before an implementation of #283. Changelog:
- Add Sankey diagram visuals: `sankey_drake_graph()` and `render_sankey_drake_graph()`.
- Add `static_drake_graph()` and `render_static_drake_graph()` for `ggplot2`/`ggraph` static graph visualizations.
- Add `group` and `clusters` arguments to `vis_drake_graph()`, `static_drake_graph()`, and `drake_graph_info()` to optionally condense nodes into clusters.
- Implement a `trace` argument to `evaluate_plan()` to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
- Rename the `always_rename` argument to `rename` in `evaluate_plan()`.
- Add a `rename` argument to `expand_plan()`.
- Implement `make(parallelism = "clustermq_staged")`, a `clustermq`-based staged parallelism backend (see https://github.com/ropensci/drake/pull/452).
- Implement `make(parallelism = "future_lapply_staged")`, a `future`-based staged parallelism backend (see https://github.com/ropensci/drake/pull/450).
- Depend on `codetools` rather than `CodeDepends` for finding global variables.
- Detect `loadd()` and `readd()` dependencies in `knitr` reports referenced with `knitr_in()` inside imported functions. Previously, this feature was only available in explicit `knitr_in()` calls in commands.
- Skip more tests on CRAN. White-list tests instead of blacklisting them in order to try to keep check time under the official 10-minute cap.
- Disallow wildcard names to grep-match other wildcard names or any replacement values. This will prevent careless mistakes and confusion when generating `drake_plan()`s.
- Prevent persistent workers from hanging when a target fails.
- Move the example template files to https://github.com/ropensci/drake/tree/master/inst/hpctemplatefiles.
- Deprecate `drake_batchtools_tmpl_file()` in favor of `drake_hpc_template_file()` and `drake_hpc_template_files()`.
- Add a `garbage_collection` argument to `make()`. If `TRUE`, `gc()` is called after every new build of a target.
- Remove redundant calls to `sanitize_plan()` in `make()`.
- Change `tracked()` to accept only a `drake_config()` object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
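As a rough illustration of what "finding global variables" with `codetools` means, here is a minimal sketch. It only calls `codetools::findGlobals()` directly; how `drake` uses the result internally is not shown, and the names `scale_by`, `shift_value`, and `rescale` are hypothetical.

```r
library(codetools)

# Hypothetical user function: `shift_value` and `rescale` are never
# defined inside it, so they are candidates for tracked dependencies.
scale_by <- function(x) {
  centered <- x - shift_value
  rescale(centered)
}

# merge = FALSE separates global functions from global variables.
globals <- findGlobals(scale_by, merge = FALSE)
print(globals$variables)
print(globals$functions)
```

Local assignments such as `centered` are not reported, which is what makes this kind of static analysis usable for dependency detection.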
Published by wlandau almost 8 years ago
The drake R package - Parallel computing improvements
- Sequester staged parallelism in backends `"mclapply_staged"` and `"parLapply_staged"`. For the other `lapply`-like backends, `drake` uses persistent workers and a master process. In the case of `"future_lapply"` parallelism, the master process is a separate background process called by `Rscript`.
- Remove the appearance of staged parallelism from single-job `make()`'s. (Previously, there were "check" messages and a call to `staged_parallelism()`.)
- Remove uncontained remnants of staged parallelism internals.
- Allow different parallel backends for imports vs targets. For example, `make(parallelism = c(imports = "mclapply_staged", targets = "mclapply"))`.
- Fix a bug in environment pruning. Previously, dependencies of downstream targets were being dropped from memory in `make(jobs = 1)`. Now, they are kept in memory until no downstream target needs them (for `make(jobs = 1)`).
- Improve `predict_runtime()`. It now takes a more sensible approach to predicting runtimes with multiple jobs and is likely to be more accurate.
- Calls to `make()` no longer leave targets in the user's environment.
- Attempt to fix a Solaris CRAN check error. The test at https://github.com/ropensci/drake/blob/b4dbddb840d2549621b76bcaa46c344b0fd2eccc/tests/testthat/test-edge-cases.R#L3 was previously failing on CRAN's Solaris machine (R 3.5.0). In the test, one of the threads deliberately quits in error, and the R/Solaris installation did not handle this properly. The test should work now because it no longer uses any parallelism.
- Deprecate the `imports_only` argument to `make()` and `drake_config()` in favor of `skip_targets`.
- Deprecate `migrate_drake_project()`.
- Deprecate `max_useful_jobs()`.
- For non-distributed parallel backends, stop waiting for all the imports to finish before the targets begin.
- Add an `upstream_only` argument to `failed()` so users can list failed targets that do not have any failed dependencies. Naturally accompanies `make(keep_going = TRUE)`.
- Add an RStudio R Markdown template compatible with https://krlmlr.github.io/drake-pitch/.
- Remove `plyr` as a dependency.
- Handle duplicated targets better in `drake_plan()` and `bind_plans()`.
- Add a true function `target()` to help create drake plans with custom columns.
- In `drake_gc()`, clean out disruptive files in `storr`s with mangled keys (re: https://github.com/ropensci/drake/issues/198).
- Move all the vignettes to the up and coming user manual: https://ropenscilabs.github.io/drake-manual/
- Rename the "basic example" to the "mtcars example".
- Deprecate `load_basic_example()` in favor of `load_mtcars_example()`.
- Refocus the `README.md` file on the main example rather than the mtcars example.
- Use a `README.Rmd` file to generate `README.md`.
- Add function `deps_targets()`.
- Deprecate function `deps()` in favor of `deps_code()`.
- Add a `pruning_strategy` argument to `make()` and `drake_config()` so the user can decide how `drake` keeps non-import dependencies in memory when it builds a target.
- Add optional custom (experimental) "workers" and "priorities" columns to the `drake` plans to help users customize scheduling.
- Add a `makefile_path` argument to `make()` and `drake_config()` to avoid potential conflicts between user-side custom `Makefile`s and the one written by `make(parallelism = "Makefile")`.
- Document batch mode for long workflows in the HPC guide.
- Add a `console` argument to `make()` and `drake_config()` so users can redirect console output to a file.
- Make it easier for the user to find out where a target in the cache came from: `show_source()`, `readd(show_source = TRUE)`, `loadd(show_source = TRUE)`.
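The idea behind the `upstream_only` argument to `failed()` can be sketched in base R: among the failed targets, keep only those whose own dependencies did not fail. This is a conceptual illustration, not `drake`'s implementation; the `deps` list and the target names are made up.

```r
# Hypothetical dependency map: each target lists its direct dependencies.
deps <- list(
  a = character(0),
  b = "a",
  c = "b",
  d = character(0)
)

# Hypothetical set of targets that failed during a keep-going run.
failed_targets <- c("a", "b", "d")

# Keep the failures that have no failed dependency of their own.
# These are the root causes worth inspecting first.
upstream_failures <- Filter(
  function(target) !any(deps[[target]] %in% failed_targets),
  failed_targets
)
print(upstream_failures)
```

Here `b` is filtered out because its dependency `a` also failed, so fixing `a` is the natural first step.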
Published by wlandau about 8 years ago
The drake R package - Intermediate development release
This is a checkpoint/fallback release just before #369 is solved. #369 is a sensitive issue with major changes to the parallel scheduling algorithm, so there may be bugs despite our best efforts to test.
Published by wlandau about 8 years ago
The drake R package - CRAN hotfix
- In R 3.5.0, the `!!` operator from tidyeval and `rlang` is parsed differently than in R <= 3.4.4. This change broke one of the tests in `tests/testthat/tidy-eval.R`. The main purpose of `drake`'s 5.1.2 release is to fix the broken test.
- Fix an elusive `R CMD check` error from building the pdf manual with LaTeX.
- In `drake_plan()`, allow users to customize target-level columns using `target()` inside the commands.
- Add a new `bind_plans()` function to concatenate the rows of drake plans and then sanitize the aggregate plan.
- Add an optional `session` argument to tell `make()` to build targets in a separate, isolated master R session. For example, `make(session = callr::r_vanilla)`.
Published by wlandau about 8 years ago
The drake R package - Minor release: new file API, tidyselect, and internal fixes
Version 5.1.0
- Add a `reduce_plan()` function to do pairwise reductions on collections of targets.
- Forcibly exclude the dot (`.`) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects `.` and sometimes does not.
- Use `ignore()` to optionally ignore pieces of workflow plan commands and/or imported functions. Use `ignore(some_code)` to
  - Force `drake` to not track dependencies in `some_code`, and
  - Ignore any changes in `some_code` when it comes to deciding which targets are out of date.
- Force `drake` to only look for imports in environments inheriting from `envir` in `make()` (plus explicitly namespaced functions).
- Force `loadd()` to ignore foreign imports (imports not explicitly found in `envir` when `make()` last imported them).
- Reduce default verbosity. Only targets are printed out by default. Verbosity levels are integers ranging from 0 through 4.
- Change `loadd()` so that only targets (not imports) are loaded if the `...` and `list` arguments are empty.
- Add a check to `drake_plan()` for duplicate targets.
- Add a `.gitignore` file containing `"*"` to the default `.drake/` cache folder every time `new_cache()` is called. This means the cache will not be automatically committed to git. Users need to remove the `.gitignore` file to allow unforced commits, and then subsequent `make()`s on the same cache will respect the user's wishes and not add another `.gitignore`. This only works for the default cache. It is not supported for manual `storr`s.
- Add a new experimental `"future"` backend with a manual scheduler.
- Implement `dplyr`-style `tidyselect` functionality in `loadd()`, `clean()`, and `build_times()`. For `build_times()`, there is an API change: for `tidyselect` to work, we needed to insert a new `...` argument as the first argument of `build_times()`.
- Deprecate the single-quoting API for files. Users should now use formal API functions in their commands:
  - `file_in()` for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target).
  - `file_out()` for output file targets (ignored if used in imported functions).
  - `knitr_in()` for `knitr`/`rmarkdown` reports. This tells `drake` to look inside the source file for target dependencies in code chunks (explicitly referenced with `loadd()` and `readd()`). Treated as a `file_in()` if used in imported functions.
- Change `drake_plan()` so that it automatically fills in any target names that the user does not supply. Also, any `file_out()`s become the target names automatically (double-quoted internally).
- Make `read_drake_plan()` (rather than an empty `drake_plan()`) the default `plan` argument in all functions that accept a `plan`.
- Add support for active bindings: `loadd(..., lazy = "bind")`. That way, when you have a target loaded in one R session and hit `make()` in another R session, the target in your first session will automatically update.
- Use tibbles for workflow plan data frames and the output of `dataframes_graph()`.
- Return warnings, errors, and other context of each build, all wrapped up with the usual metadata. `diagnose()` will take on the role of returning this metadata.
- Deprecate the `read_drake_meta()` function in favor of `diagnose()`.
- Add a new `expose_imports()` function to optionally force `drake` to detect deeply nested functions inside specific packages.
- Move the "quickstart.Rmd" vignette to "example-basic.Rmd". The so-called "quickstart" didn't end up being very quick, and it was all about the basic example anyway.
- Move `drake_build()` to be an exclusively user-side function.
- Add a `replace` argument to `loadd()` so that objects already in the user's environment need not be replaced.
- When the graph is cyclic, print out all the cycles.
- Prune self-referential loops (and duplicate edges) from the workflow graph. That way, recursive functions are allowed.
- Add a `seed` argument to `make()`, `drake_config()`, and `load_basic_example()`. Also hard-code a default seed of `0`. That way, the pseudo-randomness in projects should be reproducible across R sessions.
- Cache the pseudo-random seed at the time the project is created and use that seed to build targets until the cache is destroyed.
- Add a new `drake_read_seed()` function to read the seed from the cache. Its examples illustrate what `drake` is doing to try to ensure reproducible random numbers.
- Evaluate the quasiquotation operator `!!` for the `...` argument to `drake_plan()`. Suppress this behavior using `tidy_evaluation = FALSE` or by passing commands through the `list` argument.
- Preprocess workflow plan commands with `rlang::expr()` before evaluating them. That means you can use the quasiquotation operator `!!` in your commands, and `make()` will evaluate them according to the tidy evaluation paradigm.
- Restructure `drake_example("basic")`, `drake_example("gsp")`, and `drake_example("packages")` to demonstrate how to set up the files for serious `drake` projects. More guidance was needed in light of this issue.
- Improve the examples of `drake_plan()` in the help file (`?drake_plan`).
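The active-binding behavior described for `loadd(..., lazy = "bind")` can be approximated in base R. The sketch below is an analogy rather than `drake`'s code: an `.rds` file stands in for the cache, and `cache_file`, `env`, and `x` are hypothetical names.

```r
# A temporary file plays the role of the cache entry for target `x`.
cache_file <- tempfile(fileext = ".rds")
saveRDS(1, cache_file)

env <- new.env()

# Every access to `env$x` re-reads the file, so the value follows the cache.
makeActiveBinding("x", function() readRDS(cache_file), env)

first <- env$x

# Stand-in for another R session rebuilding the target.
saveRDS(2, cache_file)

second <- env$x
print(c(first, second))
```

The trade-off is that each access pays the cost of a read, which is why such bindings are typically opt-in.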
Version 5.0.0
- Transfer `drake` to rOpenSci: https://github.com/ropensci/drake
- Several functions now require an explicit `config` argument, which you can get from `drake_config()` or `make()`. Examples:
  - `outdated()`
  - `missed()`
  - `rate_limiting_times()`
  - `predict_runtime()`
  - `vis_drake_graph()`
  - `dataframes_graph()`
- Always process all the imports before building any targets. This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
- Major speed improvement: dispense with internal inventories and rely on `cache$exists()` instead.
- Let the user define a trigger for each target to customize when `make()` decides to build targets.
- Document triggers and other debugging/testing tools in the new debug vignette.
- Restructure the internals of the `storr` cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of `storr` namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
- Use `formatR::tidy_source()` instead of `parse()` in `tidy_command()` (originally `tidy()` in `R/dependencies.R`). Previously, `drake` was having problems with an edge case: as a command, the literal string `"A"` was interpreted as the symbol `A` after tidying. With `tidy_source()`, literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
- Speed up `clean()` by refactoring the cache inventory and using light parallelism.
- Implement `rescue_cache()`, exposed to the user and used in `clean()`. This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
- Change the default `cpu` and `elapsed` arguments of `make()` to `NULL`. This solves an elusive bug in how drake imposes timeouts.
- Allow users to set target-level timeouts (overall, cpu, and elapsed) with columns in the workflow plan data frame.
- Document timeouts and retries in the new debug vignette.
- Add a new `graph` argument to functions `make()`, `outdated()`, and `missed()`.
- Export a new `prune_graph()` function for igraph objects.
- Delete long-deprecated functions `prune()` and `status()`.
- Deprecate and rename functions:
  - `analyses()` => `plan_analyses()`
  - `as_file()` => `as_drake_filename()`
  - `backend()` => `future::plan()`
  - `build_graph()` => `build_drake_graph()`
  - `check()` => `check_plan()`
  - `config()` => `drake_config()`
  - `evaluate()` => `evaluate_plan()`
  - `example_drake()` => `drake_example()`
  - `examples_drake()` => `drake_examples()`
  - `expand()` => `expand_plan()`
  - `gather()` => `gather_plan()`
  - `plan()`, `workflow()`, `workplan()` => `drake_plan()`
  - `plot_graph()` => `vis_drake_graph()`
  - `read_config()` => `read_drake_config()`
  - `read_graph()` => `read_drake_graph()`
  - `read_plan()` => `read_drake_plan()`
  - `render_graph()` => `render_drake_graph()`
  - `session()` => `drake_session()`
  - `summaries()` => `plan_summaries()`
- Disallow `output` and `code` as names in the workflow plan data frame. Use `target` and `command` instead. This naming switch has been formally deprecated for several months prior.
- Deprecate the `..analysis..` and `..dataset..` wildcards in favor of `analysis__` and `dataset__`, respectively. The new wildcards are stylistically better and pass linting checks.
- Add new functions `drake_quotes()`, `drake_unquote()`, and `drake_strings()` to remove the silly dependence on the `eply` package.
- Add a `skip_safety_checks` flag to `make()` and `drake_config()`. Increases speed.
- In `sanitize_plan()`, remove rows with blank targets `""`.
- Add a `purge` argument to `clean()` to optionally remove all target-level information.
- Add a `namespace` argument to `cached()` so users can inspect individual `storr` namespaces.
- Change `verbose` to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
- Add a new `next_stage()` function to report the targets to be made in the next parallelizable stage.
- Add a new `session_info` argument to `make()`. Apparently, `sessionInfo()` is a bottleneck for small `make()`s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
- Add a new `log_progress` argument to `make()` to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
- Add an optional `namespace` argument to `loadd()` and `readd()`. You can now load and read from non-default `storr` namespaces.
- Add `drake_cache_log()`, `drake_cache_log_file()`, and `make(..., cache_log_file = TRUE)` as options to track changes to targets/imports in the drake cache.
- Detect knitr code chunk dependencies in response to commands with `rmarkdown::render()`, not just `knit()`.
- Add a new general best practices vignette to clear up misconceptions about how to use `drake` properly.
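For readers unfamiliar with how a "deprecate and rename" step usually looks in R, the following sketch shows the common base-R pattern with `.Deprecated()`. It is a generic illustration, not code from `drake`; `old_name()` and `new_name()` are hypothetical functions.

```r
# Hypothetical replacement function.
new_name <- function(x) x * 2

# Hypothetical deprecated alias: warn, then forward to the replacement.
old_name <- function(x) {
  .Deprecated("new_name")
  new_name(x)
}

# Call the old function, record that a warning was raised, and muffle it
# so the script keeps running quietly.
warned <- FALSE
value <- withCallingHandlers(
  old_name(21),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(value)
```

Existing scripts keep working during the transition, while the warning nudges users toward the new name.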
Published by wlandau about 8 years ago
The drake R package - Intermediate release: last version of drake that does not support tidy evaluation
Published by wlandau over 8 years ago
The drake R package - First release under rOpenSci
TL;DR: this is the first release in which drake is part of rOpenSci. Relative to 4.4.0, this release has major changes to cache internals, user-level function names, and documentation.
- Transfer `drake` to rOpenSci: https://github.com/ropensci/drake
- Several functions now require an explicit `config` argument, which you can get from `drake_config()` or `make()`. Examples:
  - `outdated()`
  - `missed()`
  - `rate_limiting_times()`
  - `predict_runtime()`
  - `vis_drake_graph()`
  - `dataframes_graph()`
- Always process all the imports before building any targets. This is part of the solution to #168: if imports and targets are processed together, the full power of parallelism is taken away from the targets. Also, the way parallelism happens is now consistent for all parallel backends.
- Major speed improvement: dispense with internal inventories and rely on `cache$exists()` instead.
- Let the user define a trigger for each target to customize when `make()` decides to build targets.
- Document triggers and other debugging/testing tools in the new debug vignette.
- Restructure the internals of the `storr` cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of `storr` namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.0.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
- Use `formatR::tidy_source()` instead of `parse()` in `tidy_command()` (originally `tidy()` in `R/dependencies.R`). Previously, `drake` was having problems with an edge case: as a command, the literal string `"A"` was interpreted as the symbol `A` after tidying. With `tidy_source()`, literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
- Speed up `clean()` by refactoring the cache inventory and using light parallelism.
- Implement `rescue_cache()`, exposed to the user and used in `clean()`. This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
- Change the default `cpu` and `elapsed` arguments of `make()` to `NULL`. This solves an elusive bug in how drake imposes timeouts.
- Allow users to set target-level timeouts (overall, cpu, and elapsed) with columns in the workflow plan data frame.
- Document timeouts and retries in the new debug vignette.
- Add a new `graph` argument to functions `make()`, `outdated()`, and `missed()`.
- Export a new `prune_graph()` function for igraph objects.
- Delete long-deprecated functions `prune()` and `status()`.
- Deprecate and rename functions:
  - `analyses()` => `plan_analyses()`
  - `as_file()` => `as_drake_filename()`
  - `backend()` => `future::plan()`
  - `build_graph()` => `build_drake_graph()`
  - `check()` => `check_plan()`
  - `config()` => `drake_config()`
  - `evaluate()` => `evaluate_plan()`
  - `example_drake()` => `drake_example()`
  - `examples_drake()` => `drake_examples()`
  - `expand()` => `expand_plan()`
  - `gather()` => `gather_plan()`
  - `plan()`, `workflow()`, `workplan()` => `drake_plan()`
  - `plot_graph()` => `vis_drake_graph()`
  - `read_config()` => `read_drake_config()`
  - `read_graph()` => `read_drake_graph()`
  - `read_plan()` => `read_drake_plan()`
  - `render_graph()` => `render_drake_graph()`
  - `session()` => `drake_session()`
  - `summaries()` => `plan_summaries()`
- Disallow `output` and `code` as names in the workflow plan data frame. Use `target` and `command` instead. This naming switch has been formally deprecated for several months prior.
- Deprecate the `..analysis..` and `..dataset..` wildcards in favor of `analysis__` and `dataset__`, respectively. The new wildcards are stylistically better and pass linting checks.
- Add new functions `drake_quotes()`, `drake_unquote()`, and `drake_strings()` to remove the silly dependence on the `eply` package.
- Add a `skip_safety_checks` flag to `make()` and `drake_config()`. Increases speed.
- In `sanitize_plan()`, remove rows with blank targets `""`.
- Add a `purge` argument to `clean()` to optionally remove all target-level information.
- Add a `namespace` argument to `cached()` so users can inspect individual `storr` namespaces.
- Change `verbose` to numeric: 0 = print nothing, 1 = print progress on imports only, 2 = print everything.
- Add a new `next_stage()` function to report the targets to be made in the next parallelizable stage.
- Add a new `session_info` argument to `make()`. Apparently, `sessionInfo()` is a bottleneck for small `make()`s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
- Add a new `log_progress` argument to `make()` to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
- Add an optional `namespace` argument to `loadd()` and `readd()`. You can now load and read from non-default `storr` namespaces.
- Add `drake_cache_log()`, `drake_cache_log_file()`, and `make(..., cache_log_file = TRUE)` as options to track changes to targets/imports in the drake cache.
- Detect knitr code chunk dependencies in response to commands with `rmarkdown::render()`, not just `knit()`.
- Add a new general best practices vignette to clear up misconceptions about how to use `drake` properly.
Published by wlandau over 8 years ago
The drake R package - Another intermediate release before version 5
Version 4.4.1.9002 is not back compatible with version 4.4.1.9001 because the cache internals were refactored again to solve #154. Anyone relying on the development version for current projects may need to use packrat with this release of 4.4.1.9001 to avoid having to rerun projects from scratch.
Published by wlandau-lilly over 8 years ago
The drake R package - Intermediate development release on the way to version 5
Enough infrastructure changes are about to happen on the way to 5.0.0 to have a tag here, just to be safe.
Published by wlandau-lilly over 8 years ago
The drake R package - `future`-powered parallelism, examples for clusters, subgraph visualization, and a lot more speed
- Extend `plot_graph()` to display subcomponents. Check out arguments `from`, `mode`, `order`, and `subset`. The graphing vignette has demonstrations.
- Add `"future_lapply"` parallelism: parallel backends supported by the future and future.batchtools packages. See `?backend` for examples and the parallelism vignette for an introductory tutorial. More advanced instruction can be found in the `future` and `future.batchtools` packages themselves.
- Cache diagnostic information of targets that fail and retrieve diagnostic info with `diagnose()`.
- Add an optional `hook` argument to `make()` to wrap around `build()`. That way, users can more easily control the side effects of distributed jobs. For example, to redirect error messages to a file in `make(..., parallelism = "Makefile", jobs = 2, hook = my_hook)`, `my_hook` should be something like `function(code){withr::with_message_sink("messages.txt", code)}`.
- Remove console logging for "parLapply" parallelism. `drake` was previously using the `outfile` argument for PSOCK clusters to generate output that could not be caught by `capture.output()`. It was a hack that should have been removed before.
- If 'verbose' is 'TRUE' and all targets are already up to date (nothing to build), then `make()` and `outdated()` print "All targets are already up to date" to the console.
- Add new examples in 'inst/examples', most of them demonstrating how to use the `"future_lapply"` backends.
- New support for timeouts and retries when it comes to building targets.
- Failed targets are now recorded during the build process. You can see them in `plot_graph()` and `progress()`. Also see the new `failed()` function, which is similar to `in_progress()`.
- Speed up the overhead of `parLapply` parallelism. The downside to this fix is that `drake` has to be properly installed. It should not be loaded with `devtools::load_all()`. The speedup comes from lightening the first `clusterExport()` call in `run_parLapply()`. Previously, we exported every single individual `drake` function to all the workers, which created a bottleneck. Now, we just load `drake` itself in each of the workers, which works because `build()` and `do_prework()` are exported.
- Change default value of `overwrite` to `FALSE` in `load_basic_example()`.
- Warn when overwriting an existing `report.Rmd` in `load_basic_example()`.
- Tell the user the location of the cache using a console message. Happens on every call to `get_cache(..., verbose = TRUE)`.
- Increase efficiency of internal preprocessing via `lightly_parallelize()` and `lightly_parallelize_atomic()`. Now, processing happens faster, and only over the unique values of a vector.
- Add a new `storr` namespace called `imports` to be used in `is_imported()`. That way, the whole object need not be read to `clean()` is. `clean()` is much faster and safer.
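The `hook` item above gives a `withr`-based example. A base-R equivalent of that kind of wrapper might look like the sketch below, which diverts messages to a file while the wrapped code runs. It is an assumption-laden illustration: `log_hook()` and its `path` argument are hypothetical, and the sketch does not call `make()`.

```r
# Hypothetical hook: evaluate `code` while messages go to a log file.
log_hook <- function(code, path) {
  con <- file(path, open = "wt")
  sink(con, type = "message")
  on.exit({
    sink(type = "message")  # restore the default message stream
    close(con)
  })
  # Lazy evaluation: `code` is only run here, inside the sink.
  force(code)
}

log_path <- tempfile(fileext = ".txt")
result <- log_hook({
  message("building target")
  1 + 1
}, path = log_path)

logged <- readLines(log_path)
print(logged)
```

Because R evaluates arguments lazily, the block passed as `code` does not run until `force(code)` is reached, which is what lets a hook control side effects around it.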
Published by wlandau-lilly over 8 years ago
The drake R package - Reproducible random numbers, knitr awareness, more docs, minor bugfixes
Most critically of all, this release readies drake for the upcoming testthat version 2.0.0.
Published by wlandau-lilly over 8 years ago
The drake R package - Predict runtimes, externalize caches/hashes
There are several improvements to code style and performance. In addition, there are new features such as cache/hash externalization and runtime prediction. See the new storage and timing vignettes for details. This release has automated checks for back-compatibility with existing projects, and I also did manual back-compatibility checks on serious projects.
Published by wlandau-lilly over 8 years ago
The drake R package - Fixes plus build times
This is mainly a bugfix release, but it also contains @dapperjapper's build_times(), @AlexAxthelm's testing with tempdir(), and a TON of linting by @AlexAxthelm.
Published by wlandau-lilly almost 9 years ago
The drake R package - Fix vignettes
...so they render well on CRAN.
Published by wlandau-lilly almost 9 years ago
The drake R package - Interactive visualizations with plot_graph()!
Plus useful utilities like deps(), max_useful_jobs(), and in_progress(). Also MUCH improved documentation and examples in the help files of every user-side function.
Published by wlandau-lilly almost 9 years ago
The drake R package - Fixes issues 35, 36, 37, and 39
This major release fixes some internal environment and scoping problems in versions 2.X.X. Most users will see no change. However, behavior will change in the edge cases described in issues #35, #36, #37, and #39, which is why this is a major version bump instead of a mere patch.
Published by wlandau-lilly about 9 years ago
The drake R package - patch: packages and workflow plans
- Warn rather than quit in error when one of the user's packages fails to load.
- Allow `drake::make()`, `drake::check()` etc. when drake is not actually loaded/attached.
- Sanitize workflow plan data frames.
Published by wlandau-lilly about 9 years ago
The drake R package - patch
Fixed some (mostly interface-related) bugs and idiosyncrasies in version 2.1.0. Version 2.1.1 is cleaner and back-compatible, and it should work better with projects with hundreds of targets or more.
Published by wlandau-lilly about 9 years ago
The drake R package - Parallel computing with parLapply
- Parallel computing with parLapply.
- Improvements to the documentation, including a "caution" vignette.
- The `tracked()` function, which lists the objects in a workflow that are reproducibly tracked.
Published by wlandau-lilly about 9 years ago
The drake R package - Tiny fixes in documentation and a test.
The actual package wasn't the problem, but the docs and testing were:
- The quickstart vignette called `rmarkdown::render()`, which created CRAN package check warnings for Solaris. I commented out `my_render()` in the workflow data frame in `vignettes/quickstart.Rmd` and `inst/examples/basic/basic.R`.
- The intermediate file test was intermittently failing because of #4, which is a known issue, but not a show-stopper. Reducing the size of the input test files (toggling a comment in `debug.R`) fixed the test. All I did was move a hash symbol from one line to the previous one in `debug.R`.
Published by wlandau-lilly over 9 years ago
The drake R package - Infrastructure cleanup
- Infrastructure: I reimplemented most of the main functionality with cleaner code, having learned from many lessons and surprises from initial development.
- Robustness: I am using safer mechanisms to search for dependencies and map out user-side workflows so that drake is less likely to fail on edge cases.
- Documentation: I refactored the vignettes, and I baked an example of a common use case into the code.
- High-performance computing: distributed computing with Makefiles requires expensive overhead, so I added a low-overhead single-node alternative (using parallel::mclapply()).
Published by wlandau-lilly over 9 years ago