https://github.com/cheind/pl-git-callback
A PyTorch-Lightning callback to increase model reproducibility through enforcing consistent git repository states upon training and validation.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
pl-git-callback
This library provides a PyTorch-Lightning callback to increase model reproducibility through enforcing specific git repository states upon training and validation.
Problem
Reproducibility is key to the scientific approach. For ML, reproducibility is an equally important concern [1], since small differences in implementation (e.g. varying random seeds or model initialisation) can lead to highly divergent benchmark results.
PyTorch-Lightning already includes mechanisms to increase reproducibility, but to our knowledge it does not yet provide a mechanism to ensure that a model conforms to a specific code base. One is free to change the source code of a model (to a certain extent) without actually breaking existing checkpoints.
Contribution
This callback is designed to increase reproducibility at the source code level. First, it ensures that the code repository is in a clean state before training, i.e. that there are no uncommitted changes. Second, it injects commit information into the checkpoints created during training so that you can better track the associated source code revision. Finally, it ensures that loaded checkpoints are compatible with the current state of the repository.
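The three steps described above can be sketched without Lightning, since a checkpoint is ultimately a dictionary. The following is a minimal illustration under stated assumptions, not the library's implementation: it shells out to the `git` command line rather than using GitPython, and the helper names and the `git_info` checkpoint key are hypothetical.

```python
import subprocess


def read_repo_state(git_dir="."):
    """Return (commit_sha, porcelain_status) for the repository at git_dir."""

    def run(*args):
        # `git -C <path>` runs the command as if started in <path>.
        result = subprocess.run(
            ["git", "-C", git_dir, *args],
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout

    return run("rev-parse", "HEAD").strip(), run("status", "--porcelain")


def is_clean(porcelain_status):
    """`git status --porcelain` prints one line per modified or untracked
    file, so empty output means there is nothing uncommitted."""
    return porcelain_status.strip() == ""


def inject_commit(checkpoint, commit_sha):
    """Store the commit in the checkpoint dict (hypothetical key name)."""
    checkpoint["git_info"] = {"commit": commit_sha}
    return checkpoint


def matches_commit(checkpoint, commit_sha):
    """True if the checkpoint records exactly the given commit."""
    return checkpoint.get("git_info", {}).get("commit") == commit_sha
```

In a callback, the first two helpers would plausibly run before training starts, `inject_commit` when a checkpoint is saved, and `matches_commit` when one is loaded.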
Usage
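```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint
from torch.utils.data import DataLoader

from plgitcallback import GitCommitCallback

# The following will ensure a clean git repo before
# training starts and also inject repo information to
# checkpoints.
cb = GitCommitCallback(git_dir=".", mode="strict")

# Default PyTorch-Lightning pipeline.
# MyModel, ds_train and ds_val are placeholders for your own
# LightningModule and datasets.
model = MyModel()
cp = ModelCheckpoint(filename="mymodel", mode="min", monitor="val_loss")
trainer = pl.Trainer(max_epochs=1, callbacks=[cb, cp])
trainer.fit(model, DataLoader(ds_train), DataLoader(ds_val))
```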
Install
pip install git+https://github.com/cheind/pl-git-callback.git#egg=pl-git-callback[dev]
Operation modes
The callback currently operates in either strict or relaxed
mode. In strict mode, a detected commit inconsistency raises an
exception, whereas in relaxed mode it only issues a warning.
Hence, in strict mode training stops when, for example,
uncommitted changes are detected.
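The difference between the two modes can be illustrated with a small dispatcher. This is only a sketch of the behaviour described above: the exception class and function name are hypothetical, and the actual exception and warning types used by the callback may differ.

```python
import warnings


class GitInconsistencyError(RuntimeError):
    """Hypothetical exception type used only in this sketch."""


def report_inconsistency(message, mode="strict"):
    """Raise in strict mode, warn in relaxed mode."""
    if mode == "strict":
        # Raising here aborts the surrounding training run.
        raise GitInconsistencyError(message)
    if mode == "relaxed":
        # Training continues; the problem is only reported.
        warnings.warn(message, stacklevel=2)
        return
    raise ValueError(f"unknown mode: {mode!r}")
```

Relaxed mode suits exploratory work where you still want a visible reminder, while strict mode is the safer default for runs whose results you intend to report.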
Logging
This callback injects git commit info into model checkpoints and
writes a git_info.json file to the trainer's log directory.
To extract GitStatus information from a PyTorch-Lightning
checkpoint file, see the function plgitcallback.gitstatus_from_lightning_checkpoint.
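To make the idea of a `git_info.json` side file concrete, here is a minimal sketch of writing and reading such a file with the standard library. The field names (`commit`, `dirty`) and both helper functions are assumptions for illustration; the schema the callback actually writes may differ.

```python
import json
import tempfile
from pathlib import Path


def write_git_info(log_dir, commit_sha, dirty):
    """Write a small JSON record of the repository state into log_dir."""
    path = Path(log_dir) / "git_info.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical schema: the commit hash plus a dirty flag.
    path.write_text(json.dumps({"commit": commit_sha, "dirty": dirty}, indent=2))
    return path


def read_git_info(log_dir):
    """Load the record written by write_git_info."""
    return json.loads((Path(log_dir) / "git_info.json").read_text())


# Round trip in a temporary directory standing in for the log dir.
with tempfile.TemporaryDirectory() as tmp:
    write_git_info(tmp, "abc123", dirty=False)
    info = read_git_info(tmp)
    print(info["commit"], info["dirty"])
```

Keeping the record next to the logs means the code revision can be looked up without loading a checkpoint.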
Open Issues
- PyTorch-Lightning seems not to call `on_load_checkpoint` when only validating/testing a model, hence bypassing the comparison logic of `GitCommitCallback`. See https://github.com/PyTorchLightning/pytorch-lightning/issues/8550
Testing
Run all tests via
```bash
pytest
```
References
[1] https://sites.google.com/view/icml-reproducibility-workshop/icml2017
Owner
- Name: Christoph Heindl
- Login: cheind
- Kind: user
- Location: Austrian area
- Website: https://cheind.github.io/
- Repositories: 88
- Profile: https://github.com/cheind
I am a computer scientist working at the interface of perception, robotics and deep learning.
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- GitPython *
- pytorch-lightning *
- black * development
- flake8 * development
- pytest * development