ml.recipes

Increase citations, ease review & collaboration. A collection of "easy wins" to make machine learning in research reproducible. This tutorial focuses on basics that work, getting you 90% of the way to top-tier reproducibility.

https://github.com/jesperdramsch/ml.recipes

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords

jupyter-book machine-learning ml4science python reproducibility science software-sustainability

Keywords from Contributors

interpretability interactive mesh distribution sequences generic projection optim hacking data-manager
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: JesperDramsch
  • License: MIT
  • Language: HTML
  • Default Branch: main
  • Homepage: https://ml.recipes/
  • Size: 12.1 MB
Statistics
  • Stars: 74
  • Watchers: 6
  • Forks: 17
  • Open Issues: 0
  • Releases: 3
Topics
jupyter-book machine-learning ml4science python reproducibility science software-sustainability
Created over 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme, Contributing, License, Citation

README.md

Increase citations, ease review & collaboration

A collection of "easy wins" to make machine learning in research reproducible.

This book aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. Reproducibility makes it easy for fellow researchers to use and iterate on a publication, increasing citations of the published work. Appropriate validation techniques and higher code quality accelerate the review process during publication and help avoid rejection due to deficiencies in the methodology. Making models, code, and possibly data available increases the visibility of the work and enables easier collaboration on future work.

This book focuses on basics that work, getting you 90% of the way to top-tier reproducibility.

Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, the breadth of applications is as vast as the range in quality of the contributions.

This work to make machine learning applications reproducible has an outsized impact compared to the limited additional work that is required using existing Python libraries.

Data

This tutorial uses the Palmer Penguins dataset.

Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.

Artwork by @allison_horst

Usage

Building the book

If you'd like to develop and/or build the "Increase citations, ease review & collaboration" book, you should:

  1. Clone this repository
  2. Run pip install -r requirements.txt (it is recommended you do this within a virtual environment)
  3. (Optional) Edit the book's source files located in the book/ directory
  4. (Optional) Use Jupytext to sync the content between python_scripts and book/notebooks, which enables diffs
  5. Run jupyter-book clean book/ to remove any existing builds
  6. Run jupyter-book build book/

A fully-rendered HTML version of the book will be built in book/_build/html/.
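
The build steps above can also be scripted. The following is a minimal sketch, not part of the repository: the function names `build_commands` and `build_book` are hypothetical, and the script defaults to a dry run so that nothing is installed or built unless you ask for it. It calls pip as `python -m pip`, an assumption made here so that the active interpreter's pip is used.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(book_dir="book", requirements="requirements.txt"):
    """Return the documented build steps as argument lists, in order."""
    return [
        # Step 2: install the dependencies (ideally inside a virtual environment).
        [sys.executable, "-m", "pip", "install", "-r", requirements],
        # Step 5: remove any existing builds.
        ["jupyter-book", "clean", f"{book_dir}/"],
        # Step 6: build the book.
        ["jupyter-book", "build", f"{book_dir}/"],
    ]


def build_book(book_dir="book", dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    Returns the directory where the rendered HTML is expected.
    """
    for argv in build_commands(book_dir):
        print("$", " ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)
    return Path(book_dir) / "_build" / "html"


if __name__ == "__main__":
    print("HTML output expected in", build_book(dry_run=True))
```

Run it from the repository root and pass `dry_run=False` to `build_book` to actually execute the commands.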

Jupytext

This repo uses Jupytext; see the Jupytext documentation.

The idea and implementation for Jupytext were copied from the EuroSciPy 2019 scikit-learn tutorial. Thanks for the great work!

To synchronize the notebooks and the Python scripts (based on file timestamps; only the content of input cells is modified in the notebooks):

$ jupytext --sync notebooks/*.ipynb

or simply use:

$ make sync

If you create a new notebook, you need to set up the text files it is going to be paired with:

$ jupytext --set-formats notebooks//ipynb,python_scripts//auto:percent notebooks/*.ipynb

or simply use:

$ make format
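
To illustrate what the pairing format above means, here is a minimal sketch of the mapping from a notebook path to its paired script. It is not Jupytext's own code: the helper `paired_script` and the notebook name `intro.ipynb` are hypothetical, it assumes that `auto` resolves to `.py` for Python notebooks, and it only covers notebooks sitting directly in notebooks/.

```python
from pathlib import PurePosixPath


def paired_script(notebook, notebooks_dir="notebooks",
                  scripts_dir="python_scripts", ext=".py"):
    """Map notebooks/<name>.ipynb to python_scripts/<name>.py.

    Sketch of the pairing notebooks//ipynb,python_scripts//auto:percent,
    assuming 'auto' resolves to '.py' for a Python notebook.
    """
    nb = PurePosixPath(notebook)
    if nb.suffix != ".ipynb":
        raise ValueError(f"not a notebook: {notebook}")
    if nb.parent != PurePosixPath(notebooks_dir):
        raise ValueError(f"expected a file directly in {notebooks_dir}/")
    # Same file name, different folder and extension.
    return str(PurePosixPath(scripts_dir) / nb.with_suffix(ext).name)


# Hypothetical notebook name, used only for illustration.
print(paired_script("notebooks/intro.ipynb"))  # python_scripts/intro.py
```

The paired script is the file that shows up in diffs, which is why the repository keeps both folders in sync.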

To render all the notebooks (only needed from time to time, as it is slow to run):

$ make render

Hosting the book

Please see the Jupyter Book documentation to discover options for deploying a book online using services such as GitHub, GitLab, or Netlify.

For GitHub and GitLab deployment specifically, the cookiecutter-jupyter-book template includes templates for, and information about, optional continuous integration (CI) workflow files that help deploy books online automatically. For example, if you chose github for the include_ci cookiecutter option, your book template was created with a GitHub Actions workflow file. Once pushed to GitHub, this workflow automatically renders the book, pushes it to the gh-pages branch of your repo, and hosts it on GitHub Pages whenever a push or pull request is made to the main branch.

Contributors

We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.

Credits

This project was created using the excellent open source Jupyter Book project and the executablebooks/cookiecutter-jupyter-book template. Notebooks are synced with scripts using Jupytext for version control.

Owner

  • Name: Jesper Dramsch
  • Login: JesperDramsch
  • Kind: user
  • Location: Bonn
  • Company: @ECMWF

Scientist for Machine Learning. 🦾 No step on snek. 🐍 You miss 99% of the benchmarks you don't overfit on.

Citation (CITATION.cff)

cff-version: 1.2.0
message: 'If you use this ML.recipes, please cite it as below.'
authors:
    - family-names: Dramsch
      given-names: Jesper Sören
      orcid: https://orcid.org/0000-0001-8273-905X
    - family-names: Maggio
      given-names: Valerio
      orcid: https://orcid.org/0000-0003-4824-893X
title: 'ML Recipes – Increase citations, ease review & foster collaboration '
version: PyData-Global-2022
identifiers:
    - type: doi
      value: 10.5281/zenodo.10381234
date-released: 2022-12-03
url: 'https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial'
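
As one way to use the metadata above, the sketch below turns the listed fields into a BibTeX entry. The field values are copied from the CITATION.cff shown here; the helper `to_bibtex`, the citation key `mlrecipes`, and the choice of the `@software` entry type are assumptions made for illustration, not something the project prescribes.

```python
# Fields copied from the CITATION.cff above (ORCID iDs omitted for brevity).
CITATION = {
    "title": "ML Recipes – Increase citations, ease review & foster collaboration ",
    "authors": [
        {"family-names": "Dramsch", "given-names": "Jesper Sören"},
        {"family-names": "Maggio", "given-names": "Valerio"},
    ],
    "version": "PyData-Global-2022",
    "doi": "10.5281/zenodo.10381234",
    "date-released": "2022-12-03",
    "url": "https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial",
}


def to_bibtex(meta, key="mlrecipes"):
    """Format the metadata as a BibTeX entry (hypothetical key and entry type)."""
    authors = " and ".join(
        f"{a['family-names']}, {a['given-names']}" for a in meta["authors"]
    )
    # Trim stray whitespace and escape '&', which is special in LaTeX.
    title = meta["title"].strip().replace("&", r"\&")
    fields = [
        ("author", authors),
        ("title", title),
        ("year", meta["date-released"][:4]),
        ("version", meta["version"]),
        ("doi", meta["doi"]),
        ("url", meta["url"]),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@software{{{key},\n{body}\n}}"


print(to_bibtex(CITATION))
```

In practice you would read the fields from the CITATION.cff file with a YAML parser rather than hard-coding them; they are inlined here to keep the sketch self-contained.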

GitHub Events

Total
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 18
  • Pull request event: 8
  • Create event: 7
Last Year
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 18
  • Pull request event: 8
  • Create event: 7

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 77
  • Total Committers: 3
  • Avg Commits per committer: 25.667
  • Development Distribution Score (DDS): 0.065
Past Year
  • Commits: 5
  • Committers: 2
  • Avg Commits per committer: 2.5
  • Development Distribution Score (DDS): 0.4
Top Committers
Name             Email           Commits
Jesper Dramsch   j****r@d****t   72
dependabot[bot]  4****]          3
Valerio Maggio   l****o          2
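
The all-time figures can be checked against the per-committer counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. This page does not state that definition, but it reproduces the listed values. The function name is hypothetical.

```python
def development_distribution_score(commit_counts):
    """Return 1 minus the top committer's share of all commits.

    Assumed definition; this page does not spell it out.
    """
    total = sum(commit_counts)
    if total == 0:
        raise ValueError("no commits")
    return 1 - max(commit_counts) / total


# All-time commits per committer, from the Top Committers table.
all_time = [72, 3, 2]

print(sum(all_time))                                       # 77
print(round(sum(all_time) / len(all_time), 3))             # 25.667
print(round(development_distribution_score(all_time), 3))  # 0.065
```

A score close to 0 means that a single committer accounts for nearly all commits, as is the case here.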

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 7
  • Average time to close issues: N/A
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.29
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 9 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.25
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (6)
  • JesperDramsch (2)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (6)
  • github_actions (6)
  • CI/CD (5)
  • sync (2)

Dependencies

environment.yml pypi
  • surgeon-pytorch >=0.0.4
requirements.txt pypi
  • jupyter >=1.0.0
  • matplotlib >=3.5.3
  • numpy >=1.22.3
  • pandas >=1.5.1
  • pandera >=0.13.4
  • pydantic >=1.10.2
  • pytorch >=1.13.0
  • scikit-learn >=1.1.3
  • seaborn >=0.12.1
  • shap >=0.41.0
  • surgeon-pytorch >=0.0.4