Recent Releases of Curifactory
Curifactory - v0.19.0
Added
- A `--no-report` flag to suppress output report generation.
- A `--paths` flag to print out all involved artifact file paths in the cache for a run.
- More informative error handling when loading a metadata file, should it fail to parse.
Scientific Software - Peer-reviewed
- Python
Published by WarmCyan 7 months ago
Curifactory - v0.18.0
Added
- An `ImageReporter` for adding any generated and saved images into the output report.
- A `LatexTableReporter` for adding a LaTeX string version of a dataframe to the report.
Published by WarmCyan over 1 year ago
Curifactory - v0.17.1
Added
- Link to the output log in generated reports.
Changed
- `--print-params` output is now conditioned on `--verbose`: whether specifying a hash directly or the flag by itself, the `_DRY_REPS` will be included when `--verbose` is specified and removed when not.
Fixed
- Excessive "no run info" warnings from caching when running an experiment notebook.
- `run_experiment` incorrectly handling a `param_files` of `None`.
Published by WildfireXIII about 2 years ago
Curifactory - v0.17.0
Added
- Templating/keyword formatting for cacher path overrides. This allows overriding cacher paths (at the expense of no longer tracking them automatically) to specify paths outside of the cache folder, or to include parameters directly in the filename, etc.
- `PathRef` cacher, a special type of cacher that allows exclusively passing around paths and short-circuiting directly based on that path's existence (as opposed to the `FileReferenceCacher`, which saves a file containing the path), rather than handling saving/loading itself.
- `--hashes` debugging flag; when specified, it prints out the hash and name of each parameter set passed into an experiment and then exits.
- `--print-params` debugging flag; when specified, it prints out the full string representation of each parameter set passed into an experiment, or, if at least the first few characters of a hash are specified, it prints out the corresponding parameter set hash from the `params_registry.json`. Note that both this and the `--hashes` flag are temporary debugging tools until the CLI gets broken out into subcommands, where they may become part of a separate command.
Fixed
- `--notebook` managers not using modified experiment cache paths.
- Manager maps are disabled after a `run_experiment` call, so managers used in live contexts (e.g. notebooks) may continue to run stages after the experiment has completed.
- Experiments generating multiple reports instead of just once and linking/copying the folders as necessary.
Removed
- Old `ExperimentArgs` references and associated deprecation warnings.
Published by WildfireXIII over 2 years ago
Curifactory - v0.16.1
Fixed
- Accidental singleton cacher objects in stage decorators causing all DAG-mode reproduction artifacts to always show as the artifacts from the first record.
Published by WildfireXIII over 2 years ago
Curifactory - v0.16.0
Added
- Optional dependency `curifactory[h5]` (pytables, for h5 pandas cacher) to setup.
- Ability to configure whether non-curifactory logs are silenced with `--all-loggers` flag.
Changed
- Repr for Lazy objects, so OutputSignatureErrors don't just list pointer addresses.
- Procedures initialized without an artifact manager don't auto-create one. Instead, the `procedure.run()` function now optionally takes a manager and records list.
Fixed
- Lazy instance cached from previous run not displaying correct preview in detailed report map.
- Experiment run spewing out command error if running from non-git-repo. (Single line warning is now displayed instead.)
- Raising InputSignatureError for potentially unrelated TypeErrors raised within stages.
- Completer parsing for experiments and parameters on MacOS.
- `generate_report()` calls inside an experiment `run()` breaking in map mode.
- Fallback package report CSS not being used if report path has no style.css.
Published by WildfireXIII over 2 years ago
Curifactory - v0.15.1
Added
- Hash dry representation output to params registry, to help debug hashing.
Fixed
- Spacing issue around parameter set list in generated notebook.
- Extra metadata not grabbed in save_metadata if metadata had already been collected.
Published by WildfireXIII over 2 years ago
Curifactory - v0.15.0
The args -> params naming convention change will eventually cause breaking changes (currently args references should just trigger a deprecation warning.) See the migration guide for details on how to remove: https://ornl.github.io/curifactory/latest/migration.html
Added
- `PandasCacher` as a more generalized variant of `PandasCsvCacher` and `PandasJsonCacher`, supporting many more of the IO types pandas supports.
Changed
- `args.ExperimentArgs` to `params.ExperimentParameters` (former still exists with deprecation warning.)
- `Record.args` to `Record.params` (former still exists with deprecation warning.)
- Organization in examples directory.
Fixed
- None extension for cacher not correctly handled in get_path.
- Generated experiment notebook not referencing the correct cache path for artifacts on store full runs.
- `set_logging_prefix` incorrectly handling global logging scope (which can lead to recursion errors.)
Published by WildfireXIII over 2 years ago
Curifactory - v0.14.2
Fixed
- DAG mapping incorrectly handling stages with missing inputs in state.
Published by WildfireXIII over 2 years ago
Curifactory - v0.14.1
Fixed
- DAG never adding a stage with no outputs to the execution list.
Published by WildfireXIII over 2 years ago
Curifactory - v0.14.0
DAG-based execution of stages is finally here!
Note that there are breaking API changes in this release, please see the migration guide:
Added
- DAG representation of experiment, this is created and analyzed during the experiment mapping phase. The DAG is used to more intelligently determine which stages need to execute, based on which outputs are ever actually needed for the final experiment outputs (leaf nodes).
- `--map` CLI flag, this runs the mapping phase of the experiment and then exits, printing out the experiment DAG and showing which artifacts it found in cache and the run name that generated them.
- `inputs` to aggregate stage decorator. This acts similarly to `inputs` on a regular stage, except these input artifacts are searched for in the list of records the aggregate is running across, rather than the aggregate's own record. It is also not a requirement that the requested artifact exist in every passed record (though it will throw a warning on any records where it doesn't exist.) Similar to `stage`, each input needs to have a corresponding argument (with the same name as in the string) in the function definition. The artifacts for each input will be passed as a dictionary, where the values are the artifacts, and the keys are the records they come from. Note that while you can technically have `None` as the inputs and still access each record's state, in order for the DAG to compute properly, you must specify each needed state artifact in the inputs (or use the `--no-dag` flag listed below.)
- `stage_cachers` list to record, at the beginning of every stage this will contain references to the initialized cachers for that stage - this can be used to get output path information.
- `-n` CLI flag shorthand for `--names`
- `--params` CLI flag long form of `-p`
- `RawJupyterNotebookCacher`, which takes a list of cells of raw strings of python code and stores them as a notebook. This is useful for exporting an interactive analysis with each experiment run.
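
The leaf-driven execution decision described in the DAG entry above can be illustrated with a small standalone sketch. It does not use Curifactory's API or reproduce its implementation: the `Stage` tuple, the `stages_to_run` function, and the artifact names are all hypothetical, and the sketch ignores stages that have no outputs.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    """Hypothetical stand-in for a stage: a name plus artifact names."""
    name: str
    inputs: tuple
    outputs: tuple


def stages_to_run(stages, leaf_outputs, cached):
    """Walk backwards from the leaf artifacts and collect the stages whose
    outputs are needed and are not already available in the cache."""
    producer = {out: s for s in stages for out in s.outputs}
    seen = set()
    pending = list(leaf_outputs)
    while pending:
        artifact = pending.pop()
        if artifact in cached or artifact not in producer:
            continue  # found in cache (or produced elsewhere): nothing to run
        stage = producer[artifact]
        if stage.name in seen:
            continue
        seen.add(stage.name)
        pending.extend(stage.inputs)  # this stage's inputs are now needed too
    # Report in definition order so upstream stages come first.
    return [s.name for s in stages if s.name in seen]


stages = [
    Stage("load", (), ("data",)),
    Stage("clean", ("data",), ("clean_data",)),
    Stage("train", ("clean_data",), ("model",)),
    Stage("plot", ("data",), ("figure",)),
]

nothing_cached = stages_to_run(stages, ["model"], cached=set())
clean_cached = stages_to_run(stages, ["model"], cached={"clean_data"})
```

In this sketch, `plot` is never selected because `figure` is not a requested leaf, and when `clean_data` is already cached only `train` needs to run, since nothing upstream of the cached artifact is required.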
Changed
- `--no-map` CLI flag to `--no-dag`, which disables both the mapping phase and the DAG analysis/DAG-based execution determination. This returns curifactory to its regular stage-by-stage cache short-circuit determination. NOTE: if any weird bugs are encountered, or if `inputs` isn't set on aggregate stages, it's advisable to use this flag.
- `--parallel-mode` flag to `--parallel-safe`
Fixed
- Record copy not also containing a copy of the state artifact representations.
- Wrong progress bar updating if multiple records/args had the same hash
Published by WildfireXIII over 2 years ago
Curifactory - v0.13.2
Fixed
- Docker module incorrectly using the `run_command` function.
- Experiment passing in a cutoff run folder to the docker command.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.13.1
Fixed
- Reportables that implement render using the old `name` instead of `qualified_name`, causing unintended figure image overwrites.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.13.0
Added
- Check for `get_params()` functions that aren't returning lists.
Changed
- An aggregate stage that is not explicitly given a set of records now takes manager records minus the record containing the currently running aggregate stage.
Fixed
- Record `make_copy` adding the new record to the artifact manager twice.
- Reportables ToC in report not correctly using the qualified names when cached reportables found.
- `LinePlotReporter` not adding a legend when dictionaries provided for both `x` and `y`.
- Potential error when collecting metadata if manager run info doesn't have "status".
Published by WildfireXIII almost 3 years ago
Curifactory - v0.12.0
Added
- Bash/zsh tab-completion via `argcomplete`. (This requires installing `argcomplete` outside of the environment and adding a line to your shell's rc file in order to use. You can run `curifactory completion [--bash|--zsh]` to add the line, or just run `curifactory completion` for instructions.)
- `resolve` option to `Lazy` outputs - this allows not automatically loading the object on an input to the stage, directly providing the lazy instance instead. This allows delaying the loading, or simply getting the path of the object to deal with in some other way (e.g. passing to an external command.)
Fixed
- `experiment ls` incorrectly handling curifactory configurations with experiment/param modules located in subdirectories.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.11.1
Fixed
- `--names` flag incorrectly checking existence of parameterset name.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.11.0
Added
- Curifactory submodules to top level import, so separately importing submodules is no longer necessary.
Changed
- Minimum python version to 3.9.
- Parameterset `name` to be ignored by hashing mechanism.
Fixed
- No longer using backported package `importlib_resources` that wasn't in the setup.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.10.1
Fixed
- Hash computation not correctly handling sub-dataclasses recursively.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.10.0
Added
- Metadata output for every cached artifact. Alongside every output cache file will be a `[file]_metadata.json`, containing information about the run that generated it, the parameters, and previous stages run in the same record.
- `track` parameter to cachers, indicating whether the output files should be copied into a full store run folder or not. (It is true by default.)
- Optional cacher prefixes, which replace the first part of a cached filepath name (normally the experiment name) with the provided prefix. This allows cross-experiment caching (use with care!)
- Optional cacher subdir, which places output files into the specified subdirectory in the cache/run folder (allows better organization, e.g. Kedro's data engineering convention of 01raw, 02intermediate, etc.)
- Allowing exact path overrides to be used by a cacher, making it cleaner to use them on the fly/outside of stages.
- `--version` flag on the `curifactory` command.
Changed
- Full store cached files are now placed into an `artifacts/` subdirectory of the run folder.
- `PickleCacher`'s extension is now correctly set to `.pkl` (we aren't actually running gzip on it.)
- Full store runs no longer call a cacher's `save` function a second time with a new path, instead relying on `Record`'s path tracking to simply copy the cached files into the full store folder at the end of a stage.
- Cachers' path mechanism - rather than expecting a cacher's `set_path` to be called beforehand, `save` and `load` should call the cacher's `get_path()`.
- The default cachers' `save()` functions return the path that was saved to.
- `--name` flag to `--prefix` to make it more consistent with caching terminology.
Fixed
- Reportable names doubling when loading from cache.
- Silent execution when no parametersets provided or a requested parameterset name wasn't found, (now errors and exits.)
Breaking changes notes
- `Cacheable.set_path` no longer exists, any custom cachers that override it should remove this function.
- Any `self.path` references in custom cachers' `save` and `load` should be replaced with `self.get_path([optional_suffix])`, and this should be done for any paths the cacher writes out (rather than calling `get_path`, modifying the resulting path, and writing to that modified path)
- Any previous cached files from the `PickleCacher` will no longer be used as the extension has changed
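
As a rough illustration of the migration notes above, the following standalone sketch shows a custom cacher that routes every file it writes through `get_path()`. It does not import Curifactory: `StandInCacheable` is a hypothetical stand-in for the real base class, and the way it builds a path from a base name, suffix, and extension is an assumption made only so the example runs on its own.

```python
import json
import os
import tempfile


class StandInCacheable:
    """Hypothetical stand-in for the cacher base class, NOT Curifactory's
    implementation: it only mimics a get_path() with an optional suffix."""

    def __init__(self, base_path, extension=".json"):
        self.base_path = base_path
        self.extension = extension

    def get_path(self, suffix=None):
        # Assumed layout for this sketch: <base><suffix><extension>
        return f"{self.base_path}{suffix or ''}{self.extension}"


class TwoFileCacher(StandInCacheable):
    """Custom cacher in the post-0.10.0 style: no set_path override, no
    self.path, and every file written goes through get_path()."""

    def save(self, obj):
        with open(self.get_path(), "w") as f:
            json.dump(obj["values"], f)
        with open(self.get_path("_labels"), "w") as f:
            json.dump(obj["labels"], f)
        return self.get_path()  # mirror the default cachers: return the saved path

    def load(self):
        with open(self.get_path()) as f:
            values = json.load(f)
        with open(self.get_path("_labels")) as f:
            labels = json.load(f)
        return {"values": values, "labels": labels}


with tempfile.TemporaryDirectory() as tmp:
    cacher = TwoFileCacher(os.path.join(tmp, "example"))
    saved_path = cacher.save({"values": [1, 2], "labels": ["a", "b"]})
    loaded = cacher.load()
```

The second file is requested with `get_path("_labels")` rather than by editing the string returned from `get_path()`, which is the pattern the notes above recommend.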
Published by WildfireXIII almost 3 years ago
Curifactory - v0.9.3
Fixed
- Lack of proper html escaping of args dump in output reports.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.9.2
Fixed
- String hash representation not recursively getting a string hash representation from any parameter sub-dataclasses.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.9.1
Changed
- String representation of hashed arguments will include the actual value of the parameters in `IGNORED_PARAMS` for reporting purposes.
Fixed
- Distributed run detection not checking `RANK` env variable.
Published by WildfireXIII almost 3 years ago
Curifactory - v0.9.0
Added
- Git init prompt on `curifactory init`, if run in a folder that doesn't contain a `.git`
Changed
- Argument hashing to allow user to specify `hash_representations` on their parameter dataclasses. This allows them to (optionally) provide a function for each individual parameter that will return a custom value to be hashed rather than simply the default string representation. This also allows completely ignoring parameters as part of their hash, by setting their hashing function to `None`. (#5)
- Arguments whose value is `None` are not included as part of the hash. (#5)
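
The hashing rules in the two entries above can be sketched without Curifactory itself. In the example below only the name `hash_representations` comes from the changelog; the `ExampleParams` dataclass, the `sketch_hash` function, the way the dictionary is attached to the dataclass, and the use of MD5 are assumptions chosen for illustration, not the library's implementation.

```python
import hashlib
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ExampleParams:
    """Hypothetical parameter set used only for this sketch."""
    learning_rate: float = 0.01
    layers: tuple = (64, 64)
    log_dir: str = "logs/"
    seed: Optional[int] = None
    hash_representations: dict = field(
        default_factory=lambda: {
            "layers": lambda value: len(value),  # hash a custom value instead
            "log_dir": None,  # a None function: leave this parameter out
        },
        repr=False,
    )


def sketch_hash(params):
    """Hash every parameter except ignored ones and None-valued ones."""
    parts = []
    for f in fields(params):
        if f.name == "hash_representations":
            continue
        value = getattr(params, f.name)
        if value is None:
            continue  # None-valued parameters are not part of the hash
        if f.name in params.hash_representations:
            rep_fn = params.hash_representations[f.name]
            if rep_fn is None:
                continue  # explicitly ignored parameter
            value = rep_fn(value)
        parts.append(f"{f.name}={value!r}")
    return hashlib.md5(";".join(parts).encode()).hexdigest()


base_hash = sketch_hash(ExampleParams())
moved_logs_hash = sketch_hash(ExampleParams(log_dir="elsewhere/"))
new_rate_hash = sketch_hash(ExampleParams(learning_rate=0.1))
```

Under these assumptions, changing `log_dir` leaves the hash unchanged because its hashing function is `None`, while changing `learning_rate` produces a different hash.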
Fixed
- Store full distributed run creating a full store folder for every distributed process. Store entire run now auto disabled on all non-rank-zero distributed processes
- `curifactory init` not extracting `debug.py` (#6)
Published by WildfireXIII almost 3 years ago
Curifactory - v0.8.2
Fixed
- Arg hashes and combo hashes attempting to write to parameters registry while in `--parallel-mode`.
Removed
- Old dataclasses dependency. (Only used pre 3.6.)
Published by WildfireXIII about 3 years ago
Curifactory - v0.8.1
Added
- Ability to revert to plain log output (instead of rich logging handler) with `--plain`.
Changed
- Rich progress bars are no longer used by default. They can be enabled with the `--progress` CLI flag.
Fixed
- Bug where the end of an experiment attempts to stop a rich progress bar even if one had not been started.
Published by WildfireXIII over 3 years ago
Curifactory - v0.8.0
Added
- Command flag to regenerate the report index, useful when importing runs from another machine. (Run via `experiment reports --update`.)
- Experiment mapping step before execution - steps through experiment code without executing stage bodies and records a list of all records and the stages they call.
- Warn if calling a stage from within another stage - this breaks experiment mapping.
- Rich library dependency, terminal logging is now fancy with colors and progress bars!
- Logging notification if distributed run detected.
- Display control flags: `--no-color`, `--quiet`.
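
The mapping step listed above (stepping through experiment code without executing stage bodies) can be pictured with a small standalone sketch. Everything in it is hypothetical: `SketchManager`, `sketch_stage`, and `run_experiment_sketch` are illustrative names, and the real decorator and manager work differently in detail.

```python
import functools


class SketchManager:
    """Hypothetical stand-in for an artifact manager."""

    def __init__(self):
        self.map_mode = False
        self.stage_map = []  # (record name, stage name) pairs seen while mapping


def sketch_stage(func):
    """Decorator: in map mode, record the call and skip the stage body."""

    @functools.wraps(func)
    def wrapper(manager, record_name, executed):
        if manager.map_mode:
            manager.stage_map.append((record_name, func.__name__))
            return
        func(manager, record_name, executed)

    return wrapper


@sketch_stage
def load_data(manager, record_name, executed):
    executed.append("load_data")


@sketch_stage
def train(manager, record_name, executed):
    executed.append("train")


def run_experiment_sketch(manager, executed):
    for record_name in ("params_a", "params_b"):
        load_data(manager, record_name, executed)
        train(manager, record_name, executed)


manager = SketchManager()
executed = []
manager.map_mode = True
run_experiment_sketch(manager, executed)  # mapping pass: bodies are skipped
mapped_only = list(executed)
manager.map_mode = False
run_experiment_sketch(manager, executed)  # real pass: bodies run
```

Because the mapping pass never enters a stage body, a stage called from inside another stage's body would not show up in `stage_map` in this sketch, which is consistent with the warning about nested stage calls in the list above.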
Changed
- Improved CLI help messages.
Published by WildfireXIII over 3 years ago
Curifactory - v0.7.0
Added
- Paths returned from record's `get_path` and `get_dir` are now tracked and copied into a store full run.
- `PandasCSVCacher` and `PandasJSONCacher` argument dictionaries to pass into pandas to/read calls.
- Dirty git worktree warning in output log and indicator to output reports.
Changed
- Args hashes are now set from within the record constructor to avoid edge cases where hashes changed and broke aggregate hashing.
- Aggregate combo hashes are set on the record directly now, done to track a combo hash throughout a record's lifespan that started with an aggregate, without breaking the aggregate record's potential args hash as well.
Fixed
- `PandasCSVCacher` `read_csv` not appropriately handling index column by default.
Published by WildfireXIII over 3 years ago
Curifactory - v0.6.3
Added
- Project `curifactory` command tests
Changed
- Project init .gitignore handling to check for/add a blank line before adding the curifactory section
Fixed
- Notebook experiment folder not being created on init and from experiment.
- Notebook path config value not being used on notebook write.
Published by WildfireXIII almost 4 years ago
Curifactory - v0.6.2
Added
- Newsgroups example experiment code (see `examples/minimal/experiments/newsgroups.py`.)
Changed
- Parallel mode will automatically be set in a distributed pytorch scenario for all processes that aren't local/node rank 0.
Fixed
- Parallel mode causing crash if global args indices are not specified.
Published by WildfireXIII about 4 years ago
Curifactory - v0.6.1
Changed
- Minimum python version to 3.8.
Fixed
- Parallel runs with reportable caching crashing due to attempts to pickle references containing `multiprocessing.Lock` instances.
Published by WildfireXIII about 4 years ago
Curifactory - v0.6.0
Added
- `FileReferenceCacher` for storing lists of referenced file paths without keeping file contents in memory.
- Automatic reportable caching. Reportables of stages that short-circuit will now reload and display in the report.
- Misc example project folder for experiments demonstrating various curifactory features.
Changed
- Improve getting started documentation.
- Cacheables are now given a copy of the current record by the stages. This can be used to access the current argset and even directly get record state within the save/load implementation.
Fixed
- Missing files in git for example projects.
Published by WildfireXIII about 4 years ago
Curifactory - v0.5.1
Fixed
- `.dockerignore` not correctly included in package data.
- Setup documentation URL.
Changed
- Auto-redirect for docs index.
Published by WildfireXIII about 4 years ago
Curifactory - v0.5.0
First open source release!
Added
- `__version__` attribute to package init.
- `curifactory init` runnable to create default project structure and files
- `get_path` and `get_dir` functions directly on `Record` instances for use in stage code. Note that these functions currently DO NOT keep track of usage, so whatever is stored at these paths doesn't get copied via store-full yet.
- Example/tutorial notebook number 0, introducing the four primary underlying components of curifactory.
- Example/tutorial notebook number 1, introducing caching, lazy objects, and reporting.
- BSD 3-clause license
Changed
- Significant documentation updates.
- Cleaner minimal example experiment.
Fixed
- Using the (non-windows) `resource` module without an OS check in the `aggregate` decorator.
- Args dump in reports not making any string conversions html-safe (replacing `<` and `>` with their `&xx;` equivalents.)
- `--notes` flag not working without an inline message.
Published by WildfireXIII about 4 years ago