ml.recipes

Increase citations, ease review & collaboration. A collection of "easy wins" to make machine learning in research reproducible. This tutorial focuses on basics that work, getting you 90% of the way to top-tier reproducibility.

https://github.com/jesperdramsch/ml.recipes

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.1%) to scientific vocabulary

Keywords

jupyter-book machine-learning ml4science python reproducibility science software-sustainability

Keywords from Contributors

interpretability interactive mesh distribution sequences generic projection optim hacking data-manager

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: JesperDramsch
License: mit
Language: HTML
Default Branch: main
Homepage: https://ml.recipes/
Size: 12.1 MB

Statistics

Stars: 74
Watchers: 6
Forks: 17
Open Issues: 0
Releases: 3

Topics

jupyter-book machine-learning ml4science python reproducibility science software-sustainability

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme Contributing License Citation

README.md

Increase citations, ease review & collaboration

A collection of "easy wins" to make machine learning in research reproducible.

This book aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. Reproducibility makes it easy for fellow researchers to use and iterate on a publication, which increases citations of the published work. Appropriate validation techniques and higher code quality speed up the review process during publication and help avoid rejection due to deficiencies in the methodology. Making models, code, and possibly data available increases the visibility of the work and enables easier collaboration on future work.

This book focuses on basics that work, getting you 90% of the way to top-tier reproducibility.

Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, the breadth of applications is matched by the wide variation in the quality of contributions.

Making machine learning applications reproducible has an outsized impact compared to the small amount of additional work it requires with existing Python libraries.

Data

This tutorial uses the Palmer Penguins dataset.

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

Artwork by @allison_horst


Usage

Building the book

If you'd like to develop and/or build the Increase citations, ease review & collaboration book, you should:

1. Clone this repository.
2. Run pip install -r requirements.txt (ideally within a virtual environment).
3. (Optional) Edit the book's source files located in the book/ directory.
4. (Optional) Use Jupytext to sync the content between python_scripts and book/notebooks, which enables readable diffs.
5. Run jupyter-book clean book/ to remove any existing builds.
6. Run jupyter-book build book/.

A fully rendered HTML version of the book will be built in book/_build/html/.
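The clean-and-rebuild steps above can also be scripted. Below is a minimal sketch. It assumes you run it from the repository root after installing requirements.txt. The helper names (build_commands, html_output_dir, build_book) are hypothetical and are not part of the repository.

```python
import subprocess
from pathlib import Path

# Hypothetical helper, not part of the repository: it wraps the documented
# "clean" and "build" steps. Assumes it is called from the repository root
# and that jupyter-book (from requirements.txt) is already installed.


def build_commands(book_dir: str = "book/") -> list[list[str]]:
    """Return the documented rebuild steps as argument lists, in order."""
    return [
        ["jupyter-book", "clean", book_dir],  # remove any existing builds
        ["jupyter-book", "build", book_dir],  # render the book to HTML
    ]


def html_output_dir(book_dir: str = "book/") -> Path:
    """Location of the rendered HTML, as described above."""
    return Path(book_dir) / "_build" / "html"


def build_book(book_dir: str = "book/", dry_run: bool = False) -> list[list[str]]:
    """Run (or, with dry_run=True, only print) each step; stop on the first failure."""
    commands = build_commands(book_dir)
    for cmd in commands:
        print("$", " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Calling build_book(dry_run=True) prints the two commands without running them. Calling build_book() runs them in sequence. Using check=True keeps a failed clean from silently leading into a build.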

Jupytext

This repo uses Jupytext (see the Jupytext documentation). The idea and implementation of using Jupytext here were copied from the EuroSciPy 2019 scikit-learn tutorial. Thanks for the great work!

To synchronize the notebooks and the Python scripts (based on file timestamps; only the content of input cells is modified in the notebooks):

$ jupytext --sync notebooks/*.ipynb

or simply use:

$ make sync
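Syncing is based on file timestamps: in our understanding, the inputs are taken from whichever file in a pair was modified most recently. The sketch below illustrates only that comparison. It is not Jupytext's implementation. The function names and the 01_data file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def newest_of_pair(notebook: Path, script: Path) -> Path:
    """Return whichever existing file of a notebook/script pair was modified last.

    Illustration only: Jupytext's real sync logic does more than this.
    """
    existing = [p for p in (notebook, script) if p.exists()]
    if not existing:
        raise FileNotFoundError(f"neither {notebook} nor {script} exists")
    return max(existing, key=lambda p: p.stat().st_mtime)


def demo() -> str:
    """Create a throwaway pair where the script is newer and report the winner."""
    with tempfile.TemporaryDirectory() as tmp:
        nb = Path(tmp) / "notebooks" / "01_data.ipynb"  # hypothetical names
        py = Path(tmp) / "python_scripts" / "01_data.py"
        for path in (nb, py):
            path.parent.mkdir(parents=True)
            path.write_text("")
        os.utime(nb, (1_000, 1_000))  # (atime, mtime): notebook edited earlier
        os.utime(py, (2_000, 2_000))  # script edited later
        return newest_of_pair(nb, py).name
```

In this simplified model, demo() reports the script as the newer file. Its input cells would therefore be the ones carried over into the notebook.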

If you create a new notebook, you need to set up the text files it will be paired with:

$ jupytext --set-formats notebooks//ipynb,python_scripts//auto:percent notebooks/*.ipynb

or simply use:

$ make format
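The format string passed to --set-formats pairs each notebook in notebooks/ with a percent-format script of the same name in python_scripts/. Below is a minimal sketch of that mapping for a flat folder layout. It assumes the notebooks use a Python kernel, so auto resolves to the .py extension. The helper names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping mirroring "notebooks//ipynb,python_scripts//auto:percent"
# for a flat layout. Assumption: Python kernels, so "auto" means ".py".
NOTEBOOK_DIR = "notebooks"
SCRIPT_DIR = "python_scripts"


def paired_script(notebook: str) -> PurePosixPath:
    """notebooks/<name>.ipynb -> python_scripts/<name>.py (same parent folder)."""
    nb = PurePosixPath(notebook)
    if nb.suffix != ".ipynb" or nb.parent.name != NOTEBOOK_DIR:
        raise ValueError(f"{notebook} is not a notebook in {NOTEBOOK_DIR}/")
    return nb.parent.parent / SCRIPT_DIR / nb.with_suffix(".py").name


def paired_notebook(script: str) -> PurePosixPath:
    """python_scripts/<name>.py -> notebooks/<name>.ipynb (the inverse mapping)."""
    py = PurePosixPath(script)
    if py.suffix != ".py" or py.parent.name != SCRIPT_DIR:
        raise ValueError(f"{script} is not a script in {SCRIPT_DIR}/")
    return py.parent.parent / NOTEBOOK_DIR / py.with_suffix(".ipynb").name
```

Because the two functions are inverses, every notebook has exactly one text counterpart. That one-to-one pairing is what makes the scripts usable for line-based diffs.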

To render all the notebooks (slow to run, so only do this from time to time):

$ make render


Hosting the book

Please see the Jupyter Book documentation to discover options for deploying a book online using services such as GitHub, GitLab, or Netlify.

For GitHub and GitLab deployment specifically, the cookiecutter-jupyter-book template includes templates for, and information about, optional continuous integration (CI) workflow files that deploy books online automatically. For example, if you chose github for the include_ci cookiecutter option, your book template was created with a GitHub Actions workflow file. Once pushed to GitHub, this workflow renders your book, pushes it to the gh-pages branch of your repo, and hosts it on GitHub Pages whenever a push or pull request is made to the main branch.

Contributors

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Credits

This project is created using the excellent open source Jupyter Book project and the executablebooks/cookiecutter-jupyter-book template. Notebooks are synced with scripts using jupytext for version control.


Owner

Name: Jesper Dramsch
Login: JesperDramsch
Kind: user
Location: Bonn
Company: @ECMWF

Website: dramsch.net
Repositories: 101
Profile: https://github.com/JesperDramsch

Scientist for Machine Learning. 🦾 No step on snek. 🐍 You miss 99% of the benchmarks you don't overfit on.

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'If you use ML.recipes, please cite it as below.'
authors:
    - family-names: Dramsch
      given-names: Jesper Sören
      orcid: https://orcid.org/0000-0001-8273-905X
    - family-names: Maggio
      given-names: Valerio
      orcid: https://orcid.org/0000-0003-4824-893X
title: 'ML Recipes – Increase citations, ease review & foster collaboration'
version: PyData-Global-2022
identifiers:
    - type: doi
      value: 10.5281/zenodo.10381234
date-released: 2022-12-03
url: 'https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial'
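Because CITATION.cff is plain YAML, the same metadata can be turned into a BibTeX entry for LaTeX users. The sketch below copies the fields from the file above by hand and formats a biblatex @software entry. Classic BibTeX users may prefer @misc. The to_bibtex helper and the citation key are hypothetical. Dedicated tools such as cffconvert already exist for this conversion.

```python
# Fields copied by hand from the CITATION.cff above.
CFF = {
    "title": "ML Recipes – Increase citations, ease review & foster collaboration",
    "authors": [("Dramsch", "Jesper Sören"), ("Maggio", "Valerio")],
    "version": "PyData-Global-2022",
    "doi": "10.5281/zenodo.10381234",
    "date_released": "2022-12-03",
    "url": "https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial",
}


def to_bibtex(cff: dict, key: str = "dramsch2022mlrecipes") -> str:
    """Format the CFF fields as a biblatex @software entry (hypothetical key)."""
    fields = {
        "author": " and ".join(f"{family}, {given}" for family, given in cff["authors"]),
        "title": cff["title"].replace("&", r"\&"),  # "&" is special in (La)TeX
        "version": cff["version"],
        "date": cff["date_released"],
        "doi": cff["doi"],
        "url": cff["url"],
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@software{{{key},\n{body}\n}}"


print(to_bibtex(CFF))
```

The DOI field is the most important part to keep. It resolves to the archived Zenodo release even if the repository URL changes.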

GitHub Events

Total

Watch event: 4
Delete event: 2
Issue comment event: 1
Push event: 18
Pull request event: 8
Create event: 7

Last Year

Watch event: 4
Delete event: 2
Issue comment event: 1
Push event: 18
Pull request event: 8
Create event: 7

Committers

Last synced: 11 months ago

All Time

Total Commits: 77
Total Committers: 3
Avg Commits per committer: 25.667
Development Distribution Score (DDS): 0.065

Past Year

Commits: 5
Committers: 2
Avg Commits per committer: 2.5
Development Distribution Score (DDS): 0.4

Top Committers

Name	Email	Commits
Jesper Dramsch	j**r@d**t	72
dependabot[bot]	4****]	3
Valerio Maggio	l****o	2
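The two summary figures under "All Time" can be reproduced from the Top Committers table. Below is a minimal sketch. It assumes the Development Distribution Score is one minus the top committer's share of all commits. That definition is our assumption, but it matches the all-time figures listed above.

```python
def commit_stats(commits_per_committer: list[int]) -> tuple[float, float]:
    """Return (average commits per committer, development distribution score).

    Assumption: DDS = 1 - (commits by the top committer / total commits).
    """
    total = sum(commits_per_committer)
    average = total / len(commits_per_committer)
    dds = 1 - max(commits_per_committer) / total
    return round(average, 3), round(dds, 3)


# All-time commit counts from the Top Committers table
print(commit_stats([72, 3, 2]))  # (25.667, 0.065)
```

A DDS close to 0 means one person made almost all of the commits. Here, 72 of 77 commits come from one committer.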

Committer Domains (Top 20 + Academic)

dramsch.net: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: about 7 hours
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.29
Merged pull requests: 7
Bot issues: 0
Bot pull requests: 3

Past Year

Issues: 0
Pull requests: 4
Average time to close issues: N/A
Average time to close pull requests: about 9 hours
Issue authors: 0
Pull request authors: 2
Average comments per issue: 0
Average comments per pull request: 0.25
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 3


Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (6)
JesperDramsch (2)

Top Labels

Issue Labels

Pull Request Labels

dependencies (6) github_actions (6) CI/CD (5) sync (2)

Dependencies

environment.yml pypi

surgeon-pytorch >=0.0.4

requirements.txt pypi

jupyter >=1.0.0
matplotlib >=3.5.3
numpy >=1.22.3
pandas >=1.5.1
pandera >=0.13.4
pydantic >=1.10.2
pytorch >=1.13.0
scikit-learn >=1.1.3
seaborn >=0.12.1
shap >=0.41.0
surgeon-pytorch >=0.0.4
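To check a local environment against the minimum versions listed above, the standard library's importlib.metadata is enough for a rough check. This is a sketch under stated assumptions. Version comparison is naive: it uses numeric components only and ignores pre-release tags. The packaging library handles the general case. The pytorch entry is looked up as torch, the distribution name PyTorch uses on PyPI. The check, at_least and version_tuple helpers are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements.txt listing above.
REQUIREMENTS = [
    "jupyter >=1.0.0", "matplotlib >=3.5.3", "numpy >=1.22.3",
    "pandas >=1.5.1", "pandera >=0.13.4", "pydantic >=1.10.2",
    "pytorch >=1.13.0", "scikit-learn >=1.1.3", "seaborn >=0.12.1",
    "shap >=0.41.0", "surgeon-pytorch >=0.0.4",
]

# PyTorch is installed from PyPI under the distribution name "torch".
ALIASES = {"pytorch": "torch"}


def version_tuple(text: str) -> tuple[int, ...]:
    """Naive parse keeping leading numeric parts: "2.1.0+cpu" -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group().split(".")) if match else ()


def at_least(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so "1.13" counts as "1.13.0"."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check(requirements: list[str], lookup=version) -> dict[str, str]:
    """Map each requirement name to "ok", "too old (<installed>)", or "missing"."""
    report = {}
    for line in requirements:
        name, spec = line.split()
        minimum = spec.removeprefix(">=")
        try:
            installed = lookup(ALIASES.get(name, name))
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if at_least(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check(REQUIREMENTS).items():
        print(f"{name:<16} {status}")
```

The lookup parameter defaults to importlib.metadata.version. You can swap it for a fake function when testing the report without installing anything.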