Recent Releases of Curifactory

Curifactory - v0.19.0

Added

  • A --no-report flag to suppress output report generation
  • A --paths flag to print out all involved artifact file paths in the cache for a run.
  • More informative error handling during loading a metadata file should it fail to parse.

Scientific Software - Peer-reviewed - Python
Published by WarmCyan 7 months ago

Curifactory - v0.18.0

Added

  • An ImageReporter for adding any generated and saved images into the output report.
  • A LatexTableReporter for adding a latex string version of a dataframe in the report.

Scientific Software - Peer-reviewed - Python
Published by WarmCyan over 1 year ago

Curifactory - v0.17.1

Added

  • Link to the output log in generated reports.

Changed

  • --print-params output is now conditioned on --verbose: whether specifying a hash directly or the flag by itself, the _DRY_REPS will be included when --verbose is specified and removed when not.

Fixed

  • Excessive "no run info" warnings from caching when running an experiment notebook.
  • run_experiment incorrectly handling a param_files of None.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 2 years ago

Curifactory - v0.17.0

Added

  • Templating/keyword formating for cacher path overrides. This allows overriding cacher paths (at the expense of automatically not tracking them) to specify paths outside of the cache folder or directly including parameters in the filename etc.
  • PathRef cacher, a special type of cacher that allows exclusively passing around paths and short-circuiting directly based on that path's existence (as opposed to the FileReferenceCacher which saves a file containing the path), rather than handling saving/loading itself.
  • --hashes debugging flag, when specified it prints out the hash and name of each parameter set passed into an experiment and then exits.
  • --print-params debugging flag, when specified it prints out the full string representation of each parameter set passed into an experiment, or, if at least the first few characters of a hash are specified, it prints out the corresponding parameter set hash from the params_registry.json. Note that both this and the --hashes flag are temporary debugging tools until the CLI gets broken out into subcommands, where they may become part of a separate command.

Fixed

  • --notebook manager's not using modified experiment cache paths.
  • Manager maps are disabled after a run_experiment call, so managers used in live contexts (e.g. notebooks) may continue to run stages after the experiment has completed.
  • Experiments generating multiple reports instead of just once and linking/copying the folders as necessary.

Removed

  • Old ExperimentArgs references and associated deprecation warnings.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.16.1

Fixed

  • Accidental singleton cacher objects in stage decorators causing all DAG-mode reproduction artifacts to always show as the artifacts from the first record.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.16.0

Added

  • Optional dependency curifactory[h5] (pytables, for h5 pandas cacher) to setup.
  • Ability to configure whether non-curifactory logs are silenced with --all-loggers flag.

Changed

  • Repr for Lazy objects, so OutputSignatureErrors don't just list pointer addresses.
  • Procedures initialized without an artifact manager don't auto-create one. Instead, the procedure.run() function now optionally takes a manager and records list.

Fixed

  • Lazy instance cached from previous run not displaying correct preview in detailed report map.
  • Experiment run spewing out command error if running from non-git-repo. (Single line warning is now displayed instead.)
  • Raising InputSignatureError for potentially unrelated TypeErrors raised within stages.
  • Completer parsing for experiments and parameters on MacOS.
  • generate_report() calls inside an experiment run() breaking in map mode.
  • Fallback package report CSS not being used if report path has no style.css.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.15.1

Added

  • Hash dry representation output to params registry, to help debug hashing.

Fixed

  • Spacing issue around parameter set list in generated notebook.
  • Extra metadata not grabbed in save_metadata if metadata had already been collected.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.15.0

The args -> params naming convention change will eventually cause breaking changes (currently args references should just trigger a deprecation warning.) See the migration guide for details on how to remove: https://ornl.github.io/curifactory/latest/migration.html

Added

  • PandasCacher as a more generalized variant of PandasCsvCacher and PandasJsonCacher, supporting much more of the IO types pandas supports.

Changed

  • args.ExperimentArgs to params.ExperimentParameters (former still exists with deprecation warning.)
  • Record.args to Record.params (former still exists with deprecation warning.)
  • Organization in examples directory.

Fixed

  • None extension for cacher not correctly handled in get_path.
  • Generated experiment notebook not reference correct cache path for artifacts on store full runs.
  • set_logging_prefix incorrectly handling global logging scope (which can lead to recursion errors.)

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.14.2

Fixed

  • DAG mapping incorrectly handling stages with missing inputs in state.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.14.1

Fixed

  • DAG never adding a stage with no outputs to the execution list.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.14.0

DAG-based execution of stages is finally here!

Note that there are breaking API changes in this release, please see the migration guide:

Added

  • DAG representation of experiment, this is created and analyzed during the experiment mapping phase. The DAG is used to more intelligently determine which stages need to execute, based on which outputs are ever actually needed for the final experiment outputs (leaf nodes).
  • --map CLI flag, this runs the mapping phase of the experiment and then exits, printing out the experiment DAG and showing which artifacts it found in cache and the run name that generated them.
  • inputs to aggregate stage decorator. This acts similarly to inputs on a regular stage, except these input artifacts are searched for in the list of records the aggregate is running across, rather than the aggregate's own record. It is also not a requirement that the requested artifact exist in every passed record (though it will throw a warning on any records where it doesn't exist.) Similar to stage, each input needs to have a corresponding argument (with the same name as in the string) in the function definition. The artifacts for each input will be passed as a dictionary, where the values are the artifacts, and the keys are the records they come from. Note that while you can technically have None as the inputs and still access each record's state, in order for the DAG to compute properly, you must specify each needed state artifact in the inputs. (or use the --no-dag flag listed below.)
  • stage_cachers list to record, at the beginning of every stage this will contain references to the initialized cachers for that stage - this can be used to get output path information.
  • -n CLI flag shorthand for --names
  • --params CLI flag long form of -p
  • RawJupyterNotebookCacher, which takes a list of cells of raw strings of python code and stores them as a notebook. This is useful for exporting an interactive analysis with each experiment run.

Changed

  • --no-map CLI flag to --no-dag, which disables both the mapping phase and the DAG analysis/DAG-based execution determination. This returns curifactory to its regular stage-by-stage cache short-circuit determination. NOTE: if any weird bugs are encountered, or if inputs isn't set on aggregate stages, it's advisable to use this flag.
  • --parallel-mode flag to --parallel-safe

Fixed

  • Record copy not also containing a copy of the state artifact representations.
  • Wrong progress bar updating if multiple records/args had the same hash

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 2 years ago

Curifactory - v0.13.2

Fixed

  • Docker module incorrectly using the run_command function.
  • Experiment passing in a cutoff run folder to the docker command.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.13.1

Fixed

  • Reportables that implement render using the old name instead of qualified_name, causing unintended figure image overwrites.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.13.0

Added

  • Check for get_params() functions that aren't returning lists.

Changed

  • An aggregate stage that is not explicitly given a set of records now takes manager records minus the record containing the currently running aggregate stage.

Fixed

  • Record make_copy adding the new record to the artifact manager twice.
  • Reportables ToC in report not correctly using the qualified names when cached reportables found.
  • LinePlotReporter not adding a legend when dictionaries provided for both x and y.
  • Potential error when collecting metadata if manager run info doesn't have "status".

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.12.0

Added

  • Bash/zsh tab-completion via argcomplete. (This requires installing argcomplete outside of the environment and adding a line to your shell's rc file in order to use. You can run curifactory completion [--bash|--zsh] to add the line, or just run curifactory completion for instructions.)
  • resolve option to Lazy outputs - this allows not automatically loading the object on an input to the stage, directly providing the lazy instance instead. This allows delaying the loading, or simply getting the path of the object to deal with in some other way (e.g. passing to an external command.)

Fixed

  • experiment ls incorrectly handling curifactory configurations with experiment/param modules located in subdirectories.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.11.1

Fixed

  • --names flag incorrectly checking existence of parameterset name.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.11.0

Added

  • Curifactory submodules to top level import, so separately importing submodules is no longer necessary.

Changed

  • Minimum python version to 3.9.
  • Parameterset name to be ignored by hashing mechanism.

Fixed

  • No longer using backported package importlib_resources that wasn't in the setup.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.10.1

Fixed

  • Hash computation not correctly handling sub-dataclasses recursively.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.10.0

Added

  • Metadata output for every cached artifact. Alongside every output cache file will be a [file]_metadata.json, containing information about the run that generated it, the parameters, and previous stages run in the same record.
  • track parameter to cachers, indicating whether the output files should be copied into a full store run folder or not. (It is true by default.)
  • Optional cacher prefixes, which replaces the first part of a cached filepath name (normally the experiment name) with the provided prefix. This allows cross-experiment caching (use with care!)
  • Optional cacher subdir, which places output files into the specified subdirectory in the cache/run folder (allows better organization, e.g. Kedro's data engineering convention of 01raw, 02intermediate, etc.)
  • Allowing exact path overrides to be used by a cacher, making it cleaner to use them on the fly/outside of stages.
  • --version flag on the curifactory command.

Changed

  • Full store cached files are now placed into an artifacts/ subdirectory of the run folder.
  • PickleCacher's extension is now correctly set to .pkl (we aren't actually running gzip on it.)
  • Full store runs no longer call a cacher's save function a second time with a new path, instead relying on Record's path tracking to simply copy the cached files into the full store folder at the end of a stage.
  • Cachers' path mechanism - rather than expecting a cacher's set_path to be called beforehand, save and load should call the cacher's get_path().
  • The default cachers' save() functions return the path that was saved to.
  • --name flag to --prefix to make it more consistent to caching terminology.

Fixed

  • Reportable names doubling when loading from cache.
  • Silent execution when no parametersets provided or a requested parameterset name wasn't found, (now errors and exits.)

Breaking changes notes

  • Cacheable.set_path no longer exists, any custom cachers that override it should remove this function.
  • Any self.path references in custom cachers save and load should replace it with self.get_path([optional_suffix]), and do this for any paths it writes out (rather than calling get_path, modifying the resulting path, and writing to that modified path)
  • Any previous cached files from the PickleCacher will no longer be used as the extension has changed

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.9.3

Fixed

  • Lack of proper html escaping of args dump in output reports.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.9.2

Fixed

  • String hash representation not recursively getting a string hash representation from any parameter sub-dataclasses.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.9.1

Changed

  • String representation of hashed arguments will include the actual value of the parameters in IGNORED_PARAMS for reporting purposes.

Fixed

  • Distributed run detection not checking RANK env variable.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.9.0

Added

  • Git init prompt on curifactory init, if run in a folder that doesn't contain a .git

Changed

  • Argument hashing to allow user to specify hash_representations on their parameter dataclasses. This allows them to (optionally) provide a function for each individual parameter that will return a custom value to be hashed rather than simply the default string representation. This also allows completely ignoring parameters as part of their hash, by setting their hashing function to None. (#5)
  • Arguments whose value is None are not included as part of the hash. (#5)

Fixed

  • Store full distributed run creating a full store folder for every distributed process. Store entire run now auto disabled on all non-rank-zero distributed processes
  • curifactory init not extracting debug.py (#6)

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 3 years ago

Curifactory - v0.8.2

Fixed

  • Arg hashes and combo hashes attempting to write to parameters registry while in --parallel-mode.

Removed

  • Old dataclasses dependency. (Only used pre 3.6.)

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 3 years ago

Curifactory - v0.8.1

Added

  • Ability to revert to plain log output (instead of rich logging handler) with --plain.

Changed

  • Rich progress bars are no longer used by default. They can be enabled with the --progress CLI flag.

Fixed

  • Bug where the end of an experiment attempts to stop a rich progress bar even if one had not been started.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 3 years ago

Curifactory - v0.8.0

Added

  • Command flag to regenerate the report index, useful for when importing run from another machine. (run via experiment reports --update.)
  • Add experiment mapping step before execution - steps through experiment code without executing stage bodies and records a list of all records and stages they call.
  • Warn if calling a stage from within another stage - this breaks experiment mapping.
  • Rich library dependency, terminal logging is now fancy with colors and progress bars!
  • Logging notification if distributed run detected.
  • Display control flags: --no-color, --quiet.

Changed

  • Improved CLI help messages.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 3 years ago

Curifactory - v0.7.0

Added

  • Paths returned from record's get_path and get_dir are now tracked and copied into a store full run.
  • PandasCSVCacher and PandasJSONCacher argument dictionaries to pass into pandas to/read calls.
  • Dirty git worktree warning in output log and indicator to output reports.

Changed

  • Args hashes are now set from within the record constructor to avoid edge cases where hashes changed and broke aggregate hashing.
  • Aggregate combo hashes are set on the record directly now, done to track a combo hash throughout a record's lifespan that started with an aggregate, without breaking the aggregate record's potential args hash as well.

Fixed

  • PandasCSVCacher read_csv not appropriately handling index column by default.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII over 3 years ago

Curifactory - v0.6.3

Added

  • Project curifactory command tests

Changed

  • Project init .gitignore handling to check for/add a blank line before adding the curifactory section

Fixed

  • Notebook experiment folder not being created on init and from experiment.
  • Notebook path config value not being used on notebook write.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII almost 4 years ago

Curifactory - v0.6.2

Added

  • Newsgroups example experiment code (see examples/minimal/experiments/newsgroups.py.)

Changed

  • Parallel mode will automatically be set in a distributed pytorch scenario for all processes that aren't local/node rank 0.

Fixed

  • Parallel mode causing crash if global args indices are not specified.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 4 years ago

Curifactory - v0.6.1

Changed

  • Minimum python version to 3.8.

Fixed

  • Parallel runs with reportable caching crashing due to attempts to pickle references containing multiprocessing.Lock instances.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 4 years ago

Curifactory - v0.6.0

Added

  • FileReferenceCacher for storing lists of referenced file paths without keeping file contents in memory.
  • Automatic reportable caching. Reportables of stages that short-circuit will now reload and display in the report.
  • Misc example project folder for experiments demonstrating various curifactory features.

Changed

  • Improve getting started documentation.
  • Cacheables are now given a copy of the current record by the stages. This can be used to access the current argset and even directly get record state within the save/load implementation.

Fixed

  • Missing files in git for example projects.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 4 years ago

Curifactory - v0.5.1

Fixed

  • .dockerignore not correctly included in package data.
  • Setup documentation URL.

Changed

  • Auto-redirect for docs index.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 4 years ago

Curifactory - v0.5.0

First open source release!

Added

  • __version__ attribute to package init.
  • curifactory init runnable to create default project structure and files
  • get_path and get_dir functions directly on Record instances for use in stage code. Note that these functions currently DO NOT keep track of usage, so whatever is stored at these paths doesn't get copied via store-full yet.
  • Example/tutorial notebook number 0, introducing the four primary underlying components of curifactory.
  • Example/tutorial notebook number 1, introducing caching, lazy objects, and reporting.
  • BSD 3-clause license

Changed

  • Significant documentation updates.
  • Cleaner minimal example experiment.

Fixed

  • Using the (non-windows) resource module without an OS check in the aggregate decorator.
  • Args dump in reports not making any string conversions html-safe (replacing < and > with their &xx; equivalents.)
  • --notes flag not working without an inline message.

Scientific Software - Peer-reviewed - Python
Published by WildfireXIII about 4 years ago