mitosis

Reproduce Machine Learning experiments easily.

https://github.com/jacob-stevens-haas/mitosis

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Jacob-Stevens-Haas
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 300 KB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 2
  • Open Issues: 20
  • Releases: 5
Created over 3 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Overview

Mitosis is an experiment runner. It handles administrative tasks to decrease the mental overhead of collaboration:
  • Creating a CLI for your experiment
  • Recording commit information
  • Tracking parameterization, as well as parameter names (e.g. "low-noise")
  • Storing logs
  • Generating HTML visuals
  • Pickling result data

The virtuous consequence of these checks and organization is a faster workflow, a more rigorous scientific method, and reduced mental overhead of collaboration.

Trivial Example

Hypothesis: the maximum value of a sine wave is equal to its amplitude.

sine_experiment/__init__.py

import numpy as np
import matplotlib.pyplot as plt

name = "sine-exp"
lookup_dict = {"frequency": {"fast": 10, "slow": 1}}

def run(amplitude, frequency):
    """Determine if the maximum value of the sine function equals ``amplitude``"""
    x = np.arange(0, 10, .05)
    y = amplitude * np.sin(frequency * x)
    err = np.abs(max(y) - amplitude)
    plt.title("What's the maximum value of a sine wave?")
    plt.plot(x, y, label="trial data")
    plt.plot(x, amplitude * np.ones_like(x), label="expected")
    plt.legend()
    return {"main": err, "data": y}

pyproject.toml

[tool.mitosis.steps]
my_exp = ["sine_experiment:run", "sine_experiment:lookup_dict"]

Commit these changes to a repository. After installing sine_experiment as a Python package, run the following on the command line:

mitosis my_exp --param my_exp.frequency=slow --eval-param my_exp.amplitude=4

Mitosis will run sine_experiment.run() and save all output as an HTML file in a subdirectory. It will also track the parameters and results. If you later change the variant named "slow" to set frequency=2, mitosis will raise a RuntimeError, preventing you from running a trial. To run sine_experiment with a different parameter value, give that variant a new name.

Eval parameters, like "amplitude" in the example, behave differently. Rather than being looked up in lookup_dict, they are evaluated directly.
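The variant-locking rule can be pictured with a small sketch. This is not mitosis's implementation: VariantRegistry and its check method are hypothetical names, and an in-memory dict stands in for whatever mitosis actually persists between trials.

```python
import pickle


class VariantRegistry:
    """Hypothetical stand-in for a store of previously seen variants."""

    def __init__(self):
        # Maps (parameter name, variant name) -> pickled value from first use.
        self._seen = {}

    def check(self, param, variant, value):
        """Record a variant on first use; refuse a changed definition later."""
        key = (param, variant)
        blob = pickle.dumps(value)
        if key not in self._seen:
            self._seen[key] = blob
        elif self._seen[key] != blob:
            raise RuntimeError(
                f"Variant {variant!r} of parameter {param!r} was redefined; "
                "give the new value a new name."
            )


registry = VariantRegistry()
registry.check("frequency", "slow", 1)  # first trial locks in slow=1
registry.check("frequency", "slow", 1)  # same definition, allowed

try:
    registry.check("frequency", "slow", 2)  # changed definition, rejected
except RuntimeError as exc:
    message = str(exc)
```

Comparing pickled bytes is only one possible way to detect a change; it is used here because the README says lookup parameters should be pickleable.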

Use

Philosophically, an experiment is any run of code that aims to convince someone of something. In code, mitosis treats an experiment as a callable (or a sequence of callables).

Using mitosis involves registering experiments in pyproject.toml, naming interesting parameters, running experiments on the command line, and browsing results.

Registration

mitosis uses the tool.mitosis.steps table of pyproject.toml to learn which Python callables are experiment steps and where to look up named parameter values. It uses a syntax evocative of entry points:

[tool.mitosis.steps]
my_exp = ["sine_experiment:run", "sine_experiment:lookup_dict"]
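For readers unfamiliar with the entry-point style, a "module:attribute" reference can be turned into a Python object roughly as follows. This is a minimal sketch using only the standard library; resolve is a hypothetical helper, not a mitosis function, and mitosis may resolve references differently.

```python
from importlib import import_module


def resolve(reference):
    """Resolve a "package.module:attribute" string to a Python object."""
    module_name, sep, attr_path = reference.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {reference!r}")
    obj = import_module(module_name)
    # Support dotted attributes such as "module:Class.method".
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


# Standard-library references keep the example runnable anywhere.
sqrt = resolve("math:sqrt")
join = resolve("os:path.join")
```

With the trivial example installed, resolve("sine_experiment:run") and resolve("sine_experiment:lookup_dict") would return the step function and its lookup dictionary.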

Experiment steps must be callables that return a dictionary. The returned dictionary is required to have a key "main". Every step except the final one must also have a key "data", whose value is passed as the first argument of the subsequent step. If the key "metrics" is present, it will be displayed prominently in the HTML output.
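A two-step experiment that follows these rules might look like the sketch below. The step names generate and evaluate are illustrative, and run_chain is a hypothetical driver that only mimics the data hand-off described above; it is not how mitosis itself executes steps.

```python
import math


def generate(amplitude, frequency):
    """First step: sample a sine wave and pass it on under "data"."""
    xs = [0.05 * i for i in range(200)]
    ys = [amplitude * math.sin(frequency * x) for x in xs]
    return {"main": max(ys), "data": ys}


def evaluate(data, amplitude):
    """Final step: receives the previous step's "data" as its first argument."""
    err = abs(max(data) - amplitude)
    return {"main": err, "metrics": {"abs_error": err}}


def run_chain(steps):
    """Hypothetical driver: call each step, forwarding "data" to the next."""
    data_args = ()
    result = None
    for func, kwargs in steps:
        result = func(*data_args, **kwargs)
        if "main" not in result:
            raise KeyError(f'{func.__name__} must return a "main" key')
        data_args = (result["data"],) if "data" in result else ()
    return result


final = run_chain([
    (generate, {"amplitude": 4, "frequency": 1}),
    (evaluate, {"amplitude": 4}),
])
```

Only the first step returns "data", since nothing follows the final step; the final step adds "metrics" so that the error would be highlighted in the output.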

Developer note: when building an experiment step, a static type for it is available at mitosis._typing.ExpRun.

CLI

The basic invocation lists the steps along with the values of any parameters for each step.

mitosis [OPTION...] step [steps...] [[-p step.lookup_param=key...]
    [-e step.eval_param=val...]]...

Some nuance:
  • --debug waives many of the reproducibility checks mitosis does. It allows you to run experiments in a dirty git repository (or no repository) and will neither save results in the experimental database, nor increment the trials counter, nor verify or lock in the definitions of any variants. It will, however, create the output notebook. It also changes the experiment log level from INFO to DEBUG.
  • Lookup parameters can be nearly any Python object that is pickleable. Tracking can be turned off for a parameter that either isn't pickleable (e.g. a lambda function) or isn't important to track (e.g. which GPU to run on). This works for both eval and lookup parameters by adding a + to the parameter, e.g. -e +jax_playground.gpu_id=1.
  • Eval parameters that are strings need quotation marks escaped from the shell (e.g. -e smoothing.kernel=\"rbf\").
  • -e and -p are short forms of --eval-param and --param (lookup param).
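The difference between lookup and eval parameters, and the reason string eval parameters need their own quotation marks, can be illustrated with a short sketch. parse_arg is a hypothetical helper, and ast.literal_eval is used here as a conservative stand-in for evaluation; mitosis's actual parsing and evaluation may differ.

```python
import ast


def parse_arg(arg):
    """Split "step.param=value" (optionally prefixed with "+") into parts."""
    tracked = not arg.startswith("+")  # a leading "+" means "do not track"
    target, _, value = arg.lstrip("+").partition("=")
    step, _, param = target.partition(".")
    return step, param, value, tracked


lookup_dict = {"frequency": {"fast": 10, "slow": 1}}

# Lookup parameter: the CLI supplies a variant name, the dict supplies the value.
_, param, key, _ = parse_arg("my_exp.frequency=slow")
frequency = lookup_dict[param][key]

# Eval parameter: the CLI string itself is evaluated as a Python literal.
_, _, raw, _ = parse_arg("my_exp.amplitude=4")
amplitude = ast.literal_eval(raw)

# A string value must arrive with its quotes intact, hence the shell escaping.
_, _, raw_kernel, _ = parse_arg('smoothing.kernel="rbf"')
kernel = ast.literal_eval(raw_kernel)

# The "+" prefix marks a parameter as untracked.
untracked = parse_arg("+jax_playground.gpu_id=1")
```

Without the escaped quotes the shell would pass the bare word rbf, which is a name rather than a string literal and so cannot be evaluated to a string.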

Results

Trials are saved in trials/ (or whatever is passed after -F). Each trial has a key of pseudorandom bytes, which is appended to the names of its metadata folder and its HTML output file.

There are two obviously useful things to do after an experiment:
  • View the HTML file. python -m http.server is helpful to browse results.
  • Load the data with mitosis.load_trial_data().

Beyond this, the metadata mitosis writes to disk is useful for troubleshooting or reproducing experiments, but no facility yet exists to browse or compare experiments.

API

Mitosis is primarily intended as a command line program, so mitosis --help has the syntax documentation. There is only one intentionally public part of the API: mitosis.load_trial_data().

Owner

  • Name: Jacob Stevens-Haas
  • Login: Jacob-Stevens-Haas
  • Kind: user
  • Location: Seattle, WA
  • Company: University of Washington

Ph.D. student at University of Washington Applied Math. Formerly Lead Scientist at Booz Allen Hamilton.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: mitosis
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jacob
    family-names: Stevens-Haas
    email: jacob.stevens.haas@gmail.com
    orcid: 'https://orcid.org/0000-0003-4142-5550'
    affiliation: University of Washington
repository-code: 'https://github.com/Jacob-Stevens-Haas/mitosis'
url: 'https://pypi.org/projects/mitosis'
abstract: >-
  A package designed to manage and visualize experiments,
  tracking changes across different commits,
  parameterizations, and random seed.
keywords:
  - experiments
  - reproducibility
license: MIT
commit: 8785ed4785b59ff77e43888b3561c58cf0a7bbef
version: 0.5.1
date-released: '2024-04-26'

GitHub Events

Total
  • Issues event: 6
  • Watch event: 2
  • Delete event: 9
  • Issue comment event: 2
  • Push event: 33
  • Pull request event: 5
  • Create event: 11
Last Year
  • Issues event: 6
  • Watch event: 2
  • Delete event: 9
  • Issue comment event: 2
  • Push event: 33
  • Pull request event: 5
  • Create event: 11

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 44
  • Total pull requests: 20
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 10 hours
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.05
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 7
  • Average time to close issues: about 24 hours
  • Average time to close pull requests: about 23 hours
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.17
  • Average comments per pull request: 0.14
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Jacob-Stevens-Haas (39)
  • yb6599 (3)
  • MalachiteWind (1)
Pull Request Authors
  • Jacob-Stevens-Haas (34)
  • yb6599 (1)
Top Labels
Issue Labels
enhancement (11) bug (4) good first issue (4)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 129 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 24
  • Total maintainers: 1
pypi.org: mitosis

Reproduce Machine Learning experiments easily

  • Versions: 24
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 129 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 11.5%
Average: 28.6%
Downloads: 64.2%
Maintainers (1)
Last synced: 8 months ago

Dependencies

pyproject.toml pypi
  • GitPython *
  • ipykernel *
  • nbclient *
  • nbconvert *
  • nbformat *
  • numpy *
  • pandas *
  • papermill *
  • sqlalchemy *
.github/workflows/release.yaml actions