Recent Releases of reproducible
reproducible - v2.1.0
reproducible 2.1.0
New
- new family of functions that are called inside `postProcessTo` that use `sf::gdal_utils` directly. These are still experimental and will only be activated with `options("reproducible.gdalwarp" = TRUE)`
- the default for "touches" in `gdalMask` has changed. It is now equivalent to `terra::mask(..., touches = TRUE)`, using `"-wo CUTLINE_ALL_TOUCHED=TRUE"`
- `gdalProject` now uses 2 threads, setting `"-wo NUM_THREADS=2"`; can be changed by user with `options("reproducible.gdalwarpThreads" = X)`; see `?reproducibleOptions`
- `gdal*` functions now address `datatype` issues
- `gdal*` defaults to `FLT8S` if `datatype` not passed
- `makeRelative`, `makeAbsolute` and similar have been created to ease many issues encountered in `preProcess`
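The two options named in this list are ordinary R options, so they can be set with base `options()` before calling `postProcessTo`. A minimal sketch; the thread count of 4 is an arbitrary illustrative value, not a recommendation from the package:

```r
# Opt in to the experimental GDAL-based functions (only active when set to TRUE)
options("reproducible.gdalwarp" = TRUE)

# Override the default of 2 warp threads; 4 is an arbitrary example value
options("reproducible.gdalwarpThreads" = 4)

# Inspect what is currently set for this session
getOption("reproducible.gdalwarp")
getOption("reproducible.gdalwarpThreads")
```

Setting these in a project `.Rprofile` would apply them to every session, at the cost of making the experimental code path the default for that project.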
Changes
- `showSimilar` (e.g., `options(reproducible.showSimilar = 1)`) now preferentially shows the most recent item in cache if there are several with equivalent matching.
- overhaul of messaging in `Cache` and `prepInputs` families; functions are highlighted with a different colour; indent level reflects nesting of both `Cache` and `prepInputs`, so it is easier to identify which message goes with which function call.
- `preProcess` is a lot faster now for large numbers of files; uses `CHECKSUMS` more effectively and fewer times
- `retry` now captures its `expr` so it doesn't need a `quote`; is like `try` now.
- `showSimilar` mechanisms now return the most recent, if there are >1 similar that are equivalently similar
- if a user is having troubles with `googledrive` for e.g., large files on spotty connections, instructions for using `gdown` are provided
- `showCache`, `clearCache` now have extra arguments `fun`, `cacheId`, and `...` now can take any arbitrary `tag = value` pair. The `cacheId` argument will be very fast if `useDBI()` is `FALSE`.
- `.wrap` and `.unwrap` can now deal with `SpatVectorCollection` (a `terra` class that does not have a `wrap`/`unwrap` method in `terra`)
- ALTREP digesting when using `spooky` or `fastdigest` was not stable for `integers` and `factors`. There is now a workaround in `.robustDigest` that stabilizes these by expanding them from their ALTREP representation first. Since they will be saved and recovered anyway, this will have little effect.
- `.wrap` and `.unwrap` are becoming more mature and can handle many more classes effectively. Methods can still be written, if needed.
Testing
- lots of testing with `cacheSaveFormat = "qs"`, which previously was not reliable, especially for environments. With all recent changes to `.wrap` and `.unwrap`, these appear stable now and should be able to be used for `environments`.
Bugfixes
- `switchDataType` can now correctly switch between `gdal` formats and `terra`
- fixes for many messages that were imprecise or missing
- R
Published by achubaty almost 2 years ago
reproducible - v2.0.12
reproducible 2.0.12
- re-submission after removal from CRAN
Published by achubaty almost 2 years ago
reproducible - v2.0.9
reproducible 2.0.9
Enhancements
- new function `isUpdated()` to determine whether a cached object has been updated;
- `makeRelative()` is now exported for use downstream (e.g., `SpaDES.core`);
- new functions `getRelative()` and `normPathRel()` for improved symlink handling (#362);
- messaging is improved for `Cache` with the function named instead of just `cacheId`
- messaging for `prepInputs`: minor changes
- more edge cases for `Checksums` dealt with, so fewer unneeded downloads
- `wrapSpatRaster` (`wrap` for file-backed `SpatRaster` objects) fixes for more edge cases
- `postProcessTo` can now use `sf::gdal_utils` for the case where `from` is a gridded object and `to` is a polygon vector. This appears to be between 2x and 10x faster in tests.
- `postProcessTo` does a pre-crop (with buffer) to make the `projectTo` faster. When both `from` and `to` are vector objects, this pre-crop appears to create slivers in some cases. This step is now skipped for these cases.
- `Cache` can now deal with unnamed functions, e.g., `Cache((function(x) x)(1))`. It will be referred to as "headless".
- `terra` would fail if internet was unavailable, even when internet is not necessary, due to needing to retrieve projection information. Many cases where this happens will now divert to use `sf`.
- `Cache` can now skip calculating `objSize`, which can take a non-trivial amount of time for large, complicated objects; see `reproducibleOptions()`
Bug fixes
- `Filenames` for some classes returned `""`; now returns `NULL` so character vectors are only pointers to files
- Cache on a terra object that writes file to disk, when `quick` argument is specified, was failing, always creating the same object; fixed with #PR368
- `useDBI` was incorrectly used if a user had set the option prior to package loading. Now works as expected.
- several other minor
- `preProcess` deals better with more cases of nested paths in archives.
- more edge cases corrected for `inputPaths`
Published by achubaty over 2 years ago
reproducible - v2.0.8
reproducible 2.0.8
Enhancements
- minor formatting changes
Bug fixes
- only use character strings when comparing `getRVersion() <= "XXX"`
- fixes for `assessDataType` for categorical (factor) `Raster` and `SpatRaster`
Published by achubaty over 2 years ago
reproducible - v2.0.7
reproducible 2.0.7
Enhancements
- Address change in `round` with `R > 4.3.1`; it is now a primitive that does method dispatch. Failure was identified with unit tests, by Luke Tierney who was making the change in `base::round`.
Bug fixes
- several identified and fixed (PRs by Ceres Barros, notably, PRs #341, #342, #343). These fix a missing argument in a `.unwrap` call, and a missing check in `preProcess`, when `targetFilePath` was `NULL`.
- minor documentation updates
Published by achubaty over 2 years ago
reproducible - v2.0.5
reproducible 2.0.5
Enhancements
- Updates of `Copy` & new `.wrap`, `.unwrap` generics and methods to wrap classes that don't save well to disk as is. This uses a name similar to `terra::wrap`, but with slight differences internally to allow for `SpatRaster` objects that are file-backed and must have their files moved when they are unwrapped.
- `loadFiles` updated for more cases
- convert to using `withr` throughout testing for cleaning up
- more methods for `Filename` added, including for `Path` class
- `Cache(..., useCloud = TRUE)` had many cases that were not working; known cases are now working. Also, files from file-backed cases are now placed inside the `cacheOutputs` folder rather than inside a separate folder (used to be "rasters")
Bugfixes
- several small fixes for edge cases
Dependency changes
- none
Published by achubaty over 2 years ago
reproducible - v2.0.4
reproducible 2.0.4
Enhancements
- `reproducible.useFuture` now defaults to `"multisession"`
- updated tests to deal with `data.table` development branch (#314)
- removed all use of `data.table::setattr` to deal with "modified compiler constants" issue that was detected during CRAN checks at https://github.com/kalibera/cran-checks/blob/master/rcnst/results/reproducible/00check.log
Bugfixes
- `preProcess` failed when `googledrive` url filename could be found, but `destinationPath` was not `"."`
- `normPath` had different behaviour on *nix-alikes and Windows. Now it is the same.
- Issue #316 -- `SpatRaster` objects if saved to a specific, non relative (to `getwd()`) path would not be recovered correctly.
- Several other Issues that addressed edge cases for `prepInputs` and family.
Continuous integration
- Improvements with testing on GitHub Actions
Published by achubaty over 2 years ago
reproducible - v2.0.2
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 2.0.2
Enhancements
- new optional backend for `Cache` via `options(reproducible.useDBI = FALSE)` is single data files with the same `basename` as the cached object, i.e., with the same `cacheId` in the file name. This is a replacement for `RSQLite` and will likely become the default in the next release. This approach makes cloud caching easier as all metadata are available in small binary files for each cached object. This is simpler, faster and creates far fewer package dependencies (now 11 recursive; before 27 recursive). If a user has DBI and RSQLite installed, then the backend will default to use these currently, i.e., the previous behaviour. The user can change the backend without loss of Cache data.
- moved `raster` and `sp` to `Suggests`; no more internal functions use these. User can still work with `Raster` and `sp` class objects as before.
- `preProcess` can now handle google docs files, if `type = ...` is passed.
- `postProcess` now uses `terra` and `sf` internally (with #253) throughout the family of `postProcess` functions. The previous `*Input` and `*Output` functions now redirect to the new `*To*` functions. These are faster, more stable, and cover vastly more cases than the previous `*Inputs` family. The old backends no longer work as before.
- minor functions to assist with transition from `raster` to `terra`: `maxFn`, `minFn`, `rasterRead`
- `.dealWithClass` and `.dealWithClassOnRecovery` are now exported generics, with several methods here, notably, list, environment, default
- other miscellaneous changes to deal with `raster` to `terra` transition (e.g. `studyAreaName` can deal with `SpatVector`)
- `prepInputs` now deals correctly with archives that have sub-folder structure, in all examples and tests, esp. #181.
- `prepInputs` can now deal with `.gdb` files. Though, it is limited to `sf` out of the box, so e.g., Raster layers inside `gdb` files are not supported (yet?). User can pass `fun = NA` to not try to load it, but at least have the `.gdb` file locally on disk.
- `hardLinkOrCopy` now uses `linkOrCopy(symlink = FALSE)`; more cases dealt with especially nested directory structures that do not exist in the `to`.
- many GitHub issues closed after transition to using `terra` and `sf`.
- `preProcess` had multiple changes. The following now work: archives with subfolders, archives with subfolders with identical basenames (different dirnames), gdb files, other files where `targetFile` is a directory.
- ~40 issues were closed with current release.
- code coverage now approaching 85%
- substantial changes to `preProcess` for minor efficiency gains, edge cases, code cleaning
- new function `CacheGeo` that weaves together `prepInputs` and `Cache` to create a geo-spatial caching. See help and examples.
- `maskTo` now allows `touches` arg for `terra::mask`
- `Spatial` class is also "fixed" in `fixErrorsIn`
- `prepInputs` and `preProcess` now capture `dlFun`, so user can pass unquoted `dlFun`
- `Copy` method for `SpatRaster`, with and without file-backing
- `Cache(..., useCloud = TRUE)` reworked so appears to be more robust than previously.
- `maskTo` now works even if `to` is larger than `from`
- `netCDF` works with `prepInputs`; thanks to user nbsmokee with PR #300.
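Selecting the file-based backend described in the first item of this list is a matter of setting one option. A minimal sketch using only base R and the option name given in these notes; it assumes the option is set before any `Cache` call is made in the session:

```r
# Use the file-based backend: one small metadata file per cached object,
# named after the object's cacheId
options(reproducible.useDBI = FALSE)

# To return to the DBI/RSQLite backend (needs DBI and RSQLite installed):
# options(reproducible.useDBI = TRUE)

getOption("reproducible.useDBI")
```

Per the notes, switching between the two does not discard existing cache entries.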
Dependency changes
- no spatial packages are automatically installed any more; to work with `prepInputs` and family, the user will have to install `terra` and `sf` at a minimum.
- `terra`, `sf` are in `Suggests`
- removed entirely: `fasterize`, `fpCompare`, `magrittr`
- moved to `Suggests`: `raster`, `sp`, `rlang`
- A normal (minimal) install of `reproducible` no longer installs `DBI`, nor does it use `RSQLite`. All cache repository database files will be individual binary files in the `cacheOutputs` folder. If a user has `DBI` and a `SQLite` engine, then the previous behaviour will be used.
Defunct
- `reproducible.useNewDigestAlgorithm` is no longer an option as the old algorithms do not work reliably.
Defunct and removed
- removed `assessDataTypeGDAL()`, `clearStubArtifacts()`,
- removed non-exported `digestRasterLayer2()`; `evalArgsOnly()`; `.getSourceURL()`; `.getTargetCRS()`; `.checkSums()`, `.groupedMessage()`; `.checkForAuxililaryFiles()`
- `option("reproducible.polygonShortcut")` removed
Non exported function changes
- `.basename` renamed to `basename2`
Bugfixes
- `Cache` was incorrectly dealing with `environment` and `environment-like` objects. Since some objects, e.g., `Spat*` objects in `terra`, must be wrapped prior to saving, environments must be scanned for these classes of objects prior to saving. This previously only occurred for `list` objects;
- When working with revdep `SpaDES.core`, there were some cases where the `Cache` was failing as it could not find the module name;
- during transition from `postProcess` (using `raster` and `sp`) to `postProcessTo`, some cases were falling through the cracks; these have been addressed.
Published by achubaty almost 3 years ago
reproducible - v1.2.16
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.16
Dependency changes
- none
Enhancements
- `Cache` now captures the first argument passed to it without evaluating it, so `Cache(rnorm(1))` now works as expected.
- As a result of previous, `Cache` now works with base pipe `|>` (with R >= 4.1).
- Due to some internal changes in the way arguments are evaluated and digested, there may be some cache entries that will be rerun. However, in simple cases of `FUN` passed to `Cache`, there should be no problems with previous cache databases being successfully recovered.
- Added more unit tests
- Reworked `Cache` internals so that digesting is more accurate, as the correct methods for functions are more accurately found, objects within functions are more precisely evaluated.
- Improved documentation:
  - Examples were reworked, replaced, improved;
  - All user-facing exported functions and methods now have complete documentation;
  - Added `()` in DESCRIPTION for functions;
  - Added `\value` in `.Rd` files for exported methods (structure, the class, the output meaning);
  - Remove commented code in examples.
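Why capturing the first argument unevaluated is enough to make the base pipe work can be seen without the package at all. The sketch below uses a hypothetical stand-in, `capture_call()`, which is not part of `reproducible`; it only returns its argument as an unevaluated call, which is the behaviour the notes describe for `Cache`. It needs R >= 4.1 to parse because of `|>`.

```r
# Hypothetical stand-in for Cache(): returns its first argument unevaluated
capture_call <- function(x) substitute(x)

direct <- capture_call(rnorm(1))
piped <- rnorm(1) |> capture_call()

# The native pipe is rewritten by the parser, so both forms produce the same call
identical(direct, piped)
identical(quote(rnorm(1) |> capture_call()), quote(capture_call(rnorm(1))))
```

Because `x |> f()` is turned into `f(x)` at parse time, a function that inspects its first argument without evaluating it sees exactly the same expression either way.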
Bug fixes
- `postProcess` now also checks resolution when assessing whether to project
- `prepInputs` has an internal `Cache` call for loading the object into memory; this was incorrectly evaluating all files if there were more than one file downloaded and extracted. This resulted in cases, e.g. shapefiles, being considered identical if they had the identical geometries, even if their data were different. This is fixed now as it uses the digest of all files extracted.
Deprecated and defunct
- remove defunct argument `digestPathContent` from `Cache`
- `options("reproducible.useGDAL")` is now deprecated; the package is moving towards `terra`.
Published by achubaty about 3 years ago
reproducible - v1.2.11
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.11
Dependency changes
- none
Enhancements
- none
Bug fixes
- fix tests for `postProcessTerra` to deal with changes in GDAL/PROJ/GEOS (#253; @rsbivand)
- fixed issue with masking
Published by achubaty over 3 years ago
reproducible - v1.2.10
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.10
Dependency changes
- Drop support for R 3.6 (#230)
- remove `gdalUtilities`, `gdalUtils`, and `rgeos` from `Suggests`
- Added minimum versions of `raster` and `terra`, because previous versions were causing collisions.
Enhancements
- all direct calls to GDAL are removed: only `terra` and `sf` are used throughout
- `prepInputs` can now take `fun` as a quoted expression on `x`, the object loaded by `dlFun` in `preProcess`
- `preProcess` arg `dlFun` can now be a quoted expression
- changes to the internals and outputs of `objSize`; now is primarily a wrapper around `lobstr::obj_size`, but has an option to get more detail for lists and environments.
- `.robustDigest` now deals explicitly with numerics, which digest differently on different OSs. Namely, they get rounded prior to digesting. Through trial and error, it was found that setting `options("reproducible.digestDigits" = 7)` was sufficient for all known cases. Rounding to deeper than 7 decimal places was insufficient. There are also new methods for `language`, `integer`, `data.frame` (which does each column one at a time to address the numeric issue)
- New version of `postProcess` called `postProcessTerra`. This will eventually replace `postProcess` as it is much faster in all cases and has a simpler code base thanks to the fantastic work of Robert Hijmans (`terra`) and all the upstream work that `terra` relies on
- Minor message updates, especially for "adding to memoised copy...". The three dots made it seem like it was taking a long time, when in reality it is instantaneous and is the last thing that happens in the `Cache` call. If there is a delay after this message, then it is the code following the `Cache` call that is (silently) slow.
- `retry` can now return a named list for the `exprBetween`, which allows for more than one object to be modified between retries.
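The rounding step described for `.robustDigest` can be illustrated in base R. This is a sketch of the idea only, not the package's implementation: two doubles that differ in their last bits stand in for the same computation run on two operating systems, and rounding to the number of digits in `reproducible.digestDigits` (assumed to be 7 if the option is unset) makes them bit-identical, so any hash of them would agree.

```r
# Two values that are equal on paper but differ in their trailing bits
x1 <- 0.1 + 0.2
x2 <- 0.3

identical(x1, x2)   # FALSE: the raw doubles differ, so their digests would too

# Round first, using the option named above (falling back to 7 if it is unset)
digits <- getOption("reproducible.digestDigits", 7)
identical(round(x1, digits), round(x2, digits))
```

With the default of 7 digits the second comparison is `TRUE`. The trade-off is that values differing only beyond the seventh decimal place are deliberately treated as the same input.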
Bug fixes
- `.robustDigest` was removing Cache attributes from objects under many conditions, when it should have left them there. It is unclear what the issues were, as this would likely not have impacted `Cache`. Now these attributes are left on.
- `data.table` objects appear to not be recovered correctly from disk (e.g., from Cache repository). We have added `data.table::copy` when recovering from Cache repository
- `clearCache` and `cc` did not correctly remove file-backed raster files (when not clearing whole CacheRepo); this may have resulted in a proliferation of files, each a filename with an underscore and a new higher number. This fix should eliminate this problem.
- deal with development versions of GDAL in `getGDALVersion()` (#239)
- fix issue with `maskInputs()` when not passing `rasterToMatch`.
- fix issue with `isna.SpatialFix` when using `postProcess.quosure`
Published by achubaty over 3 years ago
reproducible - v1.2.8
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.8
Dependency changes
- `lwgeom` now a suggested package
Enhancements
- `terra` class objects can now be correctly saved and recovered by `Cache`
- `fixErrors` can now distinguish `testValidity = NA`, meaning don't fix errors; `testValidity = FALSE`, meaning run buffering, which fixes many errors, but don't test whether there are any invalid polygons first (maybe slow); and `testValidity = TRUE`, meaning test for validity, then if some are invalid, run buffer.
- Change default option to `reproducible.useNewDigestAlgorithm = 2` which will have user visible changes. To keep old behaviour, set `options(reproducible.useNewDigestAlgorithm = 1)`
- minor changes to messaging when `options(reproducible.showSimilar)` is set. It is now more compact e.g., 3 lines instead of 5.
- added `sf` methods to `studyAreaName`
Bug fixes
- A small, but very impactful bug that created false positive `Cache` returns; i.e., a 2nd time through a Cache would return a cached copy, when some of the arguments were different. It occurred when the differences were in unnamed arguments only.
Published by achubaty over 4 years ago
reproducible - v1.2.7
Known issues: https://github.com/PredictiveEcology/reproducible/issues
Version 1.2.7
reproducible will be slowly changing the defaults for vector GIS datasets from the sp package to the sf package.
There is a large user-visible change that will come (in the next release), which will cause prepInputs to read .shp files with sf::st_read instead of raster::shapefile, as it is much faster. To change now, set options("reproducible.shapefileRead" = "sf::st_read")
Enhancements
- default `fun` in `prepInputs` for shapefiles (`.shp`) is now `sf::st_read` if the system has `sf` installed. This can be overridden with `options("reproducible.shapefileRead" = "raster::shapefile")`, and this is indicated with a message at the moment this is occurring, as it will cause different behaviour.
- `quick` argument in `Cache` can now be a character vector, allowing individual character arguments to be digested as character vectors and others to be digested as files located at the specified path as represented by the character vector.
- `objSize` previously included objects in `namespaces`, `baseenv` and `emptyenv`, so it was generally too large. Now uses the same criteria as `pryr::object_size`
- improvements with messaging when `unzip` missing (thanks to C. Barros #202)
- while unzipping, will also search for `7z.exe` on Windows if the object is larger than 2GB, if can't find `unzip`.
- `fun` argument in `prepInputs` and family can now be a quoted expression.
- `archive` argument in `prepInputs` can now be `NA` which means to treat the file downloaded not as an archive, even if it has a `.zip` file extension
- many minor improvements to functioning of esp. `prepInputs`
- speed improvements during `postProcess` especially for very large objects (>5GB tested). Previously, it was running many `fixErrors` calls; now only calls `fixErrors` on fail of the proximate call (e.g., st_crop or whatever)
- `retry` now has a new argument `exprBetween` to allow for doing something after the fail (for example, if an operation fails, e.g., `st_crop`, then run `fixErrors`, then return back to `st_crop` for the retry)
- `Cache` now has MUCH better nested levels detection, with messaging... and control of how deep the Caching goes seems good, via useCache = 2 will only Cache 2 levels in...
- `archive` argument in `prepInputs` family can now be NA ... meaning do not try to unzip even if it is a `.zip` file or other standard archive extension
- `gdb.zip` files (e.g., a file with a .zip extension, but that should not be opened with an unzip-type program) can now be opened with `prepInputs(url = "whateverUrl", archive = NA, fun = "sf::st_read")`
- `fun` argument in `prepInputs` can now be a quoted function call.
- `preProcess` now does a better job with large archives that can't be correctly handled with the default `zip` and `unzip` with R, by trying `system2` calls to possible `7z.exe` or other options on Linux-alikes.
Bug fixes
- `Copy` generic no longer has `fileBackedDir` argument. It is now passed through with the `...`. This was creating a bug with some cases where `fileBackedDir` was not being correctly executed.
- `fixErrors()` now better handles `sf` polygons with mixed geometries that include points.
- inadvertent deleting of file-backed rasters in multi-filed stacks during `Cache`
- `writeOutputs.Raster` attempted to change `datatype` of `Raster` class objects using the setReplacement `dataType<-`, without subsequently writing to disk via `writeRaster`. This created bad values in the `Raster*` object. This now performs a `writeRaster` if there is a `datatype` passed to `writeOutputs` e.g., through `prepInputs` or `postProcess`.
- `updateSlotFilename` has many more tests.
- `prepInputs(..., fun = NA)` now is the correct specification for "do not load object into R". This essentially replicates `preProcess` with same arguments.
- several minor bugfixes
- `Copy` did not correctly copy `RasterStack`s when some of the `RasterLayer` objects were in memory, some on disk; `raster::fromDisk` returned `FALSE` in those cases, so `Copy` didn't occur on the file-backed layer files. Using `Filenames` instead to determine if there are any files that need copying.
Published by achubaty almost 5 years ago
reproducible - v1.2.6
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.6
Enhancements
- Optional (and may be default soon) -- An update to the internal digesting for file-backed Rasters that should be substantially faster, with a smaller disk footprint. Set using `options("reproducible.useNewDigestAlgorithm" = 2)`
- changed default of `options("reproducible.polygonShortcut" = FALSE)` as there were still too many edge cases that were not covered.
Bug fix
- `RasterStack` objects with a single file (thus acting like a `RasterBrick`) are now handled correctly by `Cache` and `prepInputs` families, especially with new `options("reproducible.useNewDigestAlgorithm" = 2)`, though in tests, it worked with default also
- Fix issue #185, RSQLite now uses a RNG during dbAppend; this affected 2 tests.
Published by achubaty about 5 years ago
reproducible - v1.2.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.2.1
New features
- harmonized message colours that are user adjustable via options: `reproducible.messageColourPrepInputs` for all `prepInputs` functions; `reproducible.messageColourCache` for all `Cache` functions; and `reproducible.messageColourQuestion` for questions that require user input. Defaults are `cyan`, `blue` and `green` respectively. These are user-visible colour changes.
- improved messaging for `Cache` cases where a `file.link` is used instead of saving.
- with improved messaging, now `options(reproducible.verbose = 0)` will turn off almost all messaging.
- `postProcess` and family now have `filename2 = NULL` as the default, so not saved to disk. This is a change.
- `verbose` is now an argument throughout, whose default is `getOption("reproducible.verbose")`, which is set by default to `1`. Thus, individual function calls can be more or less verbose, or the whole session via option.
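The pattern behind the `verbose` argument (a per-call argument whose default reads a session option) can be sketched in base R. The function `say()` below is hypothetical and exists only to show the mechanism; the fallback of `1` mirrors the default stated in these notes.

```r
# Hypothetical function following the pattern described above:
# the argument default is looked up from the session option at call time
say <- function(msg, verbose = getOption("reproducible.verbose", 1)) {
  if (verbose > 0) message(msg)
  invisible(verbose)
}

# Quieten the whole session ...
options(reproducible.verbose = 0)
v_session <- say("silenced by the session option")

# ... but make one call verbose again by passing the argument explicitly
v_call <- say("shown for this call only", verbose = 1)
```

Because the default is evaluated on each call, changing the option mid-session takes effect immediately, without redefining any function.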
Bug fixes
- `RasterStack` objects were not correctly saved to disk under some conditions in `postProcess` - fixed
- several minor
Published by achubaty over 5 years ago
reproducible - v1.1.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.1.1
New features
- none
Dependency changes
- none
bug fixes
- fix CRAN test failure when `file.link` does not succeed.
Published by achubaty almost 6 years ago
reproducible - v1.1.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.1.0
New features
- begin to accommodate changes in GDAL/PROJ and associated updates to other spatial packages. More updates are expected as other spatial packages (namely `raster`) are updated.
- can now change `options('reproducible.cacheSaveFormat')` on the fly; cache will look for the file by `cacheId` and write it using `options('reproducible.cacheSaveFormat')`. If it is in another format, Cache will load it and resave it with the new format. Experimental still.
- new `Copy` methods for `refClass` objects, `SQLite` and moved `environment` method into `ANY` as it would be dispatched for unknown classes that inherit from `environment`, of which there are many and this should be intercepted
- `Require` can now handle minimum version numbers, e.g., `Require("bit (>=1.1-15.2)")`; this can be worked into downstream tools. Still experimental.
- Cache will do `file.link` or `file.symlink` if an existing Cache entry with identical output exists and it is large (currently `1e6` bytes); this will save disk space.
- Cache database now has tags for elapsed time of "digest", "original call", and "subsequent recovery from file", `elapsedTimeDigest`, `elapsedTimeFirstRun`, and `elapsedTimeLoad`, respectively.
- Better management of temporary files in package and tests, e.g., during downloading (`preProcess`). Includes 2 new functions, `tempdir2` and `tempfile2` for use with `reproducible` package
- New option: `reproducible.tempPath`, which is used for the new control of temporary files. Defaults to `file.path(tempdir(), "reproducible")`. This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned
- Copying or moving of Cache directories now works automatically if using default `drv` and `conn`; user may need to manually call `movedCache` if cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
- Cache now treats file-backed Rasters as though they had a relative path instead of their absolute path. This means that Cache directories can be copied from one location to another and the file-backed `Raster*` will have their filenames updated on the fly during a Cache recovery. User doesn't need to do anything.
- `postProcess` now will perform simple tests and skip `cropInputs` and `projectInputs` with a message if it can, rather than using `Cache` to "skip". This should speed up `postProcess` in many cases.
- messaging with `Cache` has changed. Now, `cacheId` is shown in all cases, making it easier to identify specific items in the cache.
- Automatically cleanup temporary (intermediate) raster files (with #110).
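The disk-saving idea in the `file.link` item above (point a new entry at an existing identical file rather than writing it again) can be sketched with base R file utilities. The helper `link_or_copy()` is hypothetical, not the package's code, and the fallback to a plain copy is an assumption about sensible behaviour when a hard link cannot be created, for example across file systems.

```r
# Hypothetical helper: try a hard link first, fall back to a full copy
link_or_copy <- function(from, to) {
  linked <- suppressWarnings(file.link(from, to))
  if (!linked) file.copy(from, to) else TRUE
}

# Demonstrate with a small temporary file standing in for a cached object
src <- tempfile(fileext = ".rds")
dst <- tempfile(fileext = ".rds")
saveRDS(seq_len(10), src)

link_or_copy(src, dst)
identical(readRDS(src), readRDS(dst))
```

A hard link costs no extra disk space because both names refer to the same data on disk, which is why it only pays off for large outputs.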
Dependency changes
- none
bug fixes
- `Copy` only creates a temporary directory for filebacked rasters; previously any `Copy` command was creating a temporary directory, regardless of whether it was needed
- `cropInputs.spatialObjects` had a bug when object was a large non-Raster class.
- `cropInputs` may have failed due to "self intersection" error when x was a `SpatialPolygons*` object; now catches error, runs `fixErrors` and retries `crop`. Great reprex by @tati-micheletti. Fixed in commit `89e652ef111af7de91a17a613c66312c1b848847`.
- `Filenames` bugfix related to `RasterBrick`
- `prepInputs` does a better job of keeping all temporary files in a temporary folder; and cleans up after itself better.
- `prepInputs` now will not show message that it is loading object into R if `fun = NULL` (#135).
Published by achubaty almost 6 years ago
reproducible - v1.0.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 1.0.0
New features
- This version is not backwards-compatible out of the box. To maintain backwards compatibility, set: `options("reproducible.useDBI" = FALSE)`
- A new backend was introduced that uses `DBI` package directly, without `archivist`. This has much improved speed.
- New option: `options("reproducible.cacheSaveFormat")`. This can be either `rds` (default) or `qs`. All cached objects will be saved with this format. Previously it was `rda`.
- Cache objects can now be saved with `qs::qsave`. In many cases, this has much improved speed and file sizes compared to `rds`; however, testing across a wide range of conditions will occur before it becomes the default.
- Changed default behaviour for memoising ... because `Cache` is now much faster, the default is to turn memoising off, via `options("reproducible.useMemoise" = FALSE)`. In cases of large objects, memoising should still be faster, so user can still activate it, setting the option to `TRUE`.
- Much better SQLite database handling for concurrent write attempts. Tested with dozens of write attempts per second by 3 cores with abundant locked database occurrences.
- `postProcess` arg `useGDAL` can now take `"force"` as the default behaviour is to not use GDAL if the problem can fit into RAM and `sf` or `raster` tools will be faster than `GDAL` tools
- `useCloud` argument in `Cache` and family has slightly modified functionality (see `?Cache` new section `useCloud`) and now has more tests including edge cases, such as `useCloud = TRUE, useCache = 'overwrite'`. The cloud version now will also follow the `"overwrite"` command.
Dependency changes
- deprecating `archivist`; moved to Suggests.
- removed imports for `bitops`, `dplyr`, `fasterize`, `flock`, `git2r`, `lubridate`, `RcppArmadillo`, `RCurl` and `tidyselect`. Some of these went to Suggests.
bug fixes
- `postProcess` calls that use GDAL made more robust (including #93).
- Several minor edge cases were detected and fixed.
Published by achubaty about 6 years ago
reproducible - v0.2.10
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.10
Dependency changes
- made compatible with `googledrive` v 1.0.0 (#119)
New features
- `pkgDep2`, a new convenience function to get the dependencies of the "first order" dependencies.
- `useCache`, used in many functions (incl `Cache`, `postProcess`) can now be numeric, a qualitative indicator of "how deep" nested `Cache` calls should set `useCache = TRUE` -- implemented as 1 or 2 in `postProcess` currently. See `?Cache`
bug fixes
- `pkgDep` was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 second to 0.1 second for `pkgDep("reproducible")`)
- improved `retry` to use exponential backoff when attempting to access online resources (#121)
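Exponential backoff, as mentioned for `retry`, means doubling the wait after each failed attempt so that a struggling remote service is not hammered. The sketch below is a generic base R illustration with a hypothetical helper, `retry_backoff()`; its argument names and delays are assumptions for the example and do not reflect the signature of the package's `retry`.

```r
# Hypothetical helper: call f() until it succeeds, doubling the wait each time
retry_backoff <- function(f, retries = 5, base_delay = 0.01) {
  for (attempt in seq_len(retries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    # Wait base_delay, then 2x, 4x, ... before the next attempt
    if (attempt < retries) Sys.sleep(base_delay * 2^(attempt - 1))
  }
  stop(result)   # re-signal the last error once all attempts are used up
}

# A stand-in for an unreliable download: fails twice, then succeeds
attempts <- 0
flaky <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) stop("temporary failure")
  "ok"
}

out <- retry_backoff(flaky)
```

Here `flaky()` succeeds on its third call, so `out` is `"ok"` after two short waits. In real use the base delay would be on the order of seconds rather than hundredths of a second.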
Published by achubaty over 6 years ago
reproducible - v0.2.9
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.9
New features
- Cache has 2 new arguments, `useCloud` and `cloudFolderID`. This is a new approach to cloud caching. It has been tested with file backed RasterLayer, RasterStack and RasterBrick and all normal R objects. It will not work for any other class of disk-backed files, e.g., `ff` or `bigmatrix`, nor is it likely to work for R6 class objects.
- Slowly deprecating cloudCache and family of functions in favour of a new approach using arguments to `Cache`, i.e., `useCloud` and `cloudFolderID`
- `downloadData` from GoogleDrive now protects against HTTP2 error by capturing error and retrying. This is a curl issue for interrupted connections.
Bug fixes
- fixes for `rcnst` errors on R-devel, tested using `devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))`
- other minor improvements, including fixes for #115
Published by achubaty over 6 years ago
reproducible - v0.2.8
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.8
New features
- new functions for accessing specific items from the `cacheRepo`: `getArtifact`, `getCacheId`, `getUserTags`
- `cloudSyncCache` has more options that are implemented and many unitTests
bug fixes
- `prepInputs` wasn't correctly passing `useCache`
- `cropInputs` was reprojecting extent of y as a time saving approach, but this was incorrect if `studyArea` is a `SpatialPolygon` that is not close to filling the extent. It now reprojects `studyArea` directly which will be slower, but correct. -- fixes issue #93
- other minor improvements
Published by achubaty almost 7 years ago
reproducible - v0.2.7
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.7
New features
- CHECKSUMS.txt should now be ordered consistently across operating systems (note: base::order will not succeed in doing this --> now using .orderDotsUnderscoreFirst)
- cloudSyncCache has a new argument: cacheIds. Now user can control entries by cacheId, so can delete/upload individual objects by cacheId
- Experimental support within the postProcess family for sf class objects
bug fixes
- mostly minor
- cloudCache bugfixes for more cases
- R
Published by achubaty about 7 years ago
reproducible - v0.2.6
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.6
Dependency changes
- remove tibble from Imports as it's no longer being used
New features
- remove %>% pipe that was long ago deprecated. User should use %C% if they want a pipe that is Cache-aware. See examples.
- Full rewrite of all options descriptions now in reproducible, see ?reproducibleOptions
- now cacheRepo and options("reproducible.cachePath") can take a vector of paths. Similar to how .libPaths() works for libraries, Cache will search first in the first entry in the cacheRepo, then the second etc. until it finds an entry. It will only write to the first entry.
- new value for the option: options("reproducible.useCache" = "devMode"). The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cacheRepo, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cacheRepo (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental.
- change to how hashes are calculated. This will cause existing caches to not work correctly. To allow a user to keep old behaviour (during a transition period), the "old" algorithm can be used, with options("reproducible.useNewDigestAlgorithm" = FALSE). There is a message of this change on package load.
- add experimental cloud* functions, especially cloudCache which allows sharing of Cache among collaborators. Currently only works with googledrive
- updated assessDataType to consolidate assessDataTypeGDAL and assessDataType into single function (#71, @ianmseddy)
- cc: new function -- a shortcut for some commonly used options for clearCache()
- added experimental capacity for prepInputs to handle .rar archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
- remove fastdigest::fastdigest as it does not return the identical hash across operating systems
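The "search every path in order, write only to the first" behaviour described for a vector of cache paths can be sketched with plain files. The helpers cacheGet and cachePut below are hypothetical and use .rds files for simplicity; they are not how Cache stores entries.

```r
# Hypothetical sketch: read from the first cache directory that has the key,
# but always write new entries to the first directory only.
cacheGet <- function(key, cachePaths) {
  for (p in cachePaths) {
    f <- file.path(p, paste0(key, ".rds"))
    if (file.exists(f)) {
      return(readRDS(f))
    }
  }
  NULL  # not found in any path
}

cachePut <- function(key, value, cachePaths) {
  dir.create(cachePaths[1], recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, file.path(cachePaths[1], paste0(key, ".rds")))
  invisible(value)
}

# Usage: a project-level cache searched before a shared, pre-populated one
root <- tempfile("cacheDemo")
paths <- file.path(root, c("projectCache", "sharedCache"))
dir.create(paths[2], recursive = TRUE)
saveRDS(42, file.path(paths[2], "a.rds"))

fromShared <- cacheGet("a", paths)  # found in the second path
cachePut("b", 1:3, paths)           # written to the first path only
```

This mirrors the .libPaths() analogy: later entries act as read-only fallbacks.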
Bug fixes
- prepInputs on GIS objects that don't use raster::raster to load object were skipping postProcess. Fixed.
- under some circumstances, prepInputs would cause virtually all entries in CHECKSUMS.txt to be deleted. 2 cases where this happened were identified and corrected.
- data.table class objects would give an error sometimes due to use of attr(DT). Internally, attributes are now added with data.table::setattr to deal with this.
- calling gdalwarp from postProcess now correctly matches extent (#73, @tati-micheletti)
- files from url that have unknown extension are now guessed by preProcess (#92, @tati-micheletti)
- R
Published by achubaty about 7 years ago
reproducible - v0.2.5
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.5
Dependency changes
- Added remotes to Imports and removed devtools
New features
- New value possible for options(reproducible.useCache = 'overwrite'), which allows use of Cache in cases where the function call has an entry in the cacheRepo; it will purge that entry and add the output of the current call instead.
- New options reproducible.inputPaths (default NULL) and reproducible.inputPathsRecursive (default FALSE), which will be used in prepInputs as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows the use of local copies of files in (an)other location(s) instead of downloading them. If the local location does not have the required files, it will proceed to download, so there is little cost in setting this option. If files do exist on the local system, the function will attempt to use a hardlink before making a copy.
- dlGoogle() now sets options(httr_oob_default = TRUE) if using RStudio Server.
- Files in CHECKSUMS now sorted alphabetically.
- Checksums can now have a CHECKSUMS.txt file located in a different place than the destinationPath
- Attempt to select raster resampling method based on raster type if no method supplied (#63, @ianmseddy)
- projectInputs
  - new function assessDataTypeGDAL, used in postProcess, to identify smallest datatype for large Raster* objects passed to GDAL system call
  - when masking and reprojecting large Raster objects, enact gdalwarp system call if raster::canProcessInMemory(x,4) = FALSE for faster and memory-safe processing
  - better handling of various data types in Raster objects, including factor rasters
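The "hardlink before making a copy" behaviour noted for reproducible.inputPaths can be illustrated with base R's file.link and file.copy. The helper linkOrCopy is a hypothetical name for this sketch and is not the package's internal code.

```r
# Hypothetical sketch: try a hard link first (cheap, no data duplicated);
# fall back to a regular copy if linking fails (e.g., across file systems).
linkOrCopy <- function(from, to) {
  ok <- suppressWarnings(file.link(from, to))
  if (!isTRUE(ok)) {
    ok <- file.copy(from, to, overwrite = FALSE)
  }
  invisible(isTRUE(ok))
}

# Usage: place a local copy of a file at the destination
src <- tempfile(fileext = ".txt")
writeLines("example data", src)
dest <- tempfile(fileext = ".txt")
linkOrCopy(src, dest)
```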
Bug fixes
- Work around internally inside extractFromArchive for large (>2GB) zip files. According to the R help manual, unzip fails for zip files >2GB. This uses a system call if the zip file is too large and fails using base::unzip.
- Work around for raster::getData issues.
- Speed up of Cache() when deeply nested, due to grep(sys.calls(), ...) that would take long and hang.
- Bugfix for preProcess(url = NULL) (#65, @tati-micheletti)
- Improved memory performance of clearCache (#67), especially for large Raster objects that are stored as binary R files (i.e., .rda)
- Other minor bugfixes
Other changes
- Deal with new raster package changes in development version of raster package
- Added checks for floating point number issues in raster resolutions produced by raster::projectRaster
- .robustDigest now does not include Cache-added attributes
- Additional tests for preProcess() (#68, @tati-micheletti)
- Many new unit tests written, which caught several minor bugs
- R
Published by achubaty over 7 years ago
reproducible - v0.2.3
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.3
- fix and skip downloading test on CRAN
- R
Published by achubaty over 7 years ago
reproducible - v0.2.2
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.2
Dependency changes
- Add future to Suggests.
New features
- new option on non-Windows OSs to use future for Cache saving to SQLite database, via options("reproducible.futurePlan"), if the future package is installed. This is FALSE by default.
- If a do.call function is Cached, previously, it would be labelled in the database as do.call. Now it attempts to extract the actual function being called by the do.call. Messaging is similarly changed.
- new option reproducible.ask, logical, indicating whether clearCache should ask for deletions when in an interactive session
- prepInputs, preProcess and downloadFile now have dlFun, to pass a custom function for downloading (e.g., "raster::getData")
- prepInputs will automatically use readRDS if the file is a .rds.
- prepInputs will return a list if fun = "base::load", with a message; can still pass an envir to obtain standard behaviour of base::load.
- clearCache - new argument ask.
- new function assessDataType, used in postProcess, to identify smallest datatype for Raster* objects, if user does not pass an explicit datatype in prepInputs or postProcess (#39, @CeresBarros).
Bug fixes
- fix problems with tests introduced by recent git2r update (@stewid, #36).
- .prepareRasterBackedFile -- now will append an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the .rda file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was deleted, therefore, it would cause the other one to break as its file-backing would be missing.
- options were wrongly pointing to spades.XXX and should have been reproducible.XXX.
- copyFile did not perform correctly under all cases; now better handling of these cases, often sending to file.copy (slower, but more reliable)
- extractFromArchive needed a new Checksum function call under some circumstances
- several other minor bug fixes.
- extractFromArchive -- when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
- prepInputs -- arguments that were same as Cache were not being correctly passed internally to Cache, and if wrapped in Cache, it was not passed into prepInputs. Fixed.
- .prepareFileBackedRaster was failing in some cases (specifically if it was inside a do.call) (#40, @CeresBarros).
- Cache was failing under some cases of Cache(do.call, ...). Fixed.
- Cache -- when arguments to Cache were the same as the arguments in FUN, Cache would "take" them. Now, they are correctly passed to the FUN.
- preProcess -- writing to checksums may have produced a warning if CHECKSUMS.txt was not present. Now it does not.
- numerous other minor bugfixes
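The file-name collision fix described for file-backed Raster objects amounts to choosing a name that is not already taken. A minimal base R sketch follows; the helper nextAvailableName and the "_1", "_2" suffix format are assumptions for illustration, not the package's actual naming scheme.

```r
# Hypothetical sketch: if `path` already exists, add an incremented numeric
# suffix before the extension until an unused file name is found.
nextAvailableName <- function(path) {
  if (!file.exists(path)) {
    return(path)
  }
  ext <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(path)
  i <- 1
  repeat {
    candidate <- if (nzchar(ext)) {
      paste0(stem, "_", i, ".", ext)
    } else {
      paste0(stem, "_", i)
    }
    if (!file.exists(candidate)) {
      return(candidate)
    }
    i <- i + 1
  }
}

# Usage: two cached objects that would otherwise share one backing file
demoDir <- tempfile("rasterDemo")
dir.create(demoDir)
first <- file.path(demoDir, "elev.tif")
file.create(first)
second <- nextAvailableName(first)   # ends in "elev_1.tif"
file.create(second)
third <- nextAvailableName(first)    # ends in "elev_2.tif"
```

Because each cached object gets its own backing file, deleting one entry no longer breaks another.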
Minor changes
- most tests now use a standardized approach to attaching libraries, creating objects, paths, enabling easier, error-resistant test building
- R
Published by achubaty over 7 years ago
reproducible - v0.2.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.1
New features
- new functions:
  - convertPaths and convertRasterPaths to assist with renaming moved files.
- prepInputs -- new features
  - alsoExtract now has more options (NULL, NA, "similar") and defaults to extracting all files in an archive (NULL).
  - skips postProcess altogether if no studyArea or rasterToMatch. Previously, this would invoke Cache even if there was nothing to postProcess.
Bug fixes
- copyFile correctly handles directory names containing spaces.
- makeMemoisable fixed to handle additional edge cases.
- other minor bug fixes.
- R
Published by achubaty over 7 years ago
reproducible - v0.2.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.2.0
New features
- new functions:
  - prepInputs to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
  - postProcess which is a wrapper for sequences of several other new functions (cropInputs, fixErrors, projectInputs, maskInputs, writeOutputs, and determineFilename)
  - downloadFile can handle Google Drive and ftp/http(s) files
  - zipCache and mergeCache
  - compareNA does comparisons with NA as a possible value e.g., compareNA(c(1,NA), c(2, NA)) returns FALSE, TRUE
- Cache -- new features:
  - new arguments showSimilar, verbose which can help with debugging
  - new argument useCache which allows turning caching on and off at a high level (e.g., options("useCache"))
  - new argument cacheId which allows user to hard code a result from a Cache
  - deprecated arguments: digestPathContent --> quick, compareRasterFileLength --> length
  - Cache arguments now propagate inward to nested Cache function calls, unless explicitly set on the inner functions
  - more precise messages provided upon each use
  - many more userTags added automatically to cache entries so much more powerful searching via showCache(userTags="something")
- clearCache and showCache now give messages and require user intervention if a request to clearCache would delete large quantities of data
- memoise::memoise now used on 3rd run through an identical Cache call, dramatically speeding up in most cases
- new options: reproducible.cachePath, reproducible.quick, reproducible.useMemoise, reproducible.useCache, reproducible.useragent, reproducible.verbose
- asPath has a new argument indicating how deep the path should be considered when included in caching (only relevant when quick = TRUE)
- New vignette on using Cache
- Cache is parallel-safe, meaning there are tryCatch around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
- bug fixes, unit tests, more imports for packages e.g., stats
- updates for R 3.6.0 compact storage of sequence vectors
- experimental pipes (%>%, %C%) and assign %<%
- several performance enhancements
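The NA-aware comparison described for compareNA can be sketched in a few lines of base R. The function compareNASketch below is a hypothetical stand-in written for illustration; it is not the package's source, but it reproduces the result quoted in the notes for that example.

```r
# Hypothetical sketch of an NA-aware element-wise comparison:
# two NAs count as equal; an NA against a non-NA value counts as not equal.
compareNASketch <- function(v1, v2) {
  same <- (v1 == v2) | (is.na(v1) & is.na(v2))
  same[is.na(same)] <- FALSE
  same
}

res <- compareNASketch(c(1, NA), c(2, NA))
print(res)
```

Unlike `==`, this never returns NA, so the result can be used directly in if() or all().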
- R
Published by achubaty over 7 years ago
reproducible - v0.1.4
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.1.4
- Cached pipe operator %C% -- use to begin a pipe sequence, e.g., Cache() %C% ...
- Cache arg sideEffect can now be a path
- Cache arg digestPathContent default changed from FALSE (was for speed) to TRUE (for content accuracy)
- New function, searchFull, which shows the full search path, known alternatively as "scope", or "binding environments". It is where R will search for a function when requested by a user.
- Uses memoise::memoise for several functions (loadFromLocalRepo, pkgDep, package_dependencies, available.packages) for speed -- this increases memory use in exchange for speed
- New Require function
  - attempts to create a lighter weight package reproducibility chain. This function is usable in a reproducible workflow: it includes both installing and loading of packages, it can maintain version numbers, and uses smart caching for speed. In tests, it can evaluate whether 20 packages and their dependencies (~130 packages) are installed and loaded quickly (i.e., if all TRUE, ~0.1 seconds). This is much slower than running require on those 20 packages, but require does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
  - can accept uncommented name, if length 1.
- remove dplyr from Imports
- Add RCurl to Imports
- change name of digestRaster to .digestRaster
- R
Published by achubaty about 8 years ago
reproducible - v0.1.3
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.1.3
- fix R CMD check errors on Solaris that were not previously resolved
- R
Published by achubaty over 8 years ago
reproducible - v0.1.2
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.1.2
- fix Solaris check errors
- fix bug in digestRaster affecting in-memory rasters
- move rgdal to Suggests
- R
Published by achubaty over 8 years ago
reproducible - v0.1.1
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.1.1
- cleanup examples and do run them (per CRAN)
- add tests to ensure all exported (non-dot) functions have examples
- R
Published by achubaty over 8 years ago
reproducible - v0.1.0
Known issues: https://github.com/PredictiveEcology/reproducible/issues
version 0.1.0
- A new package, which takes all caching utilities out of the SpaDES package.
- R
Published by achubaty over 8 years ago