ml.recipes
Increase citations, ease review & collaboration: a collection of "easy wins" to make machine learning in research reproducible. This tutorial focuses on basics that work, getting you 90% of the way to top-tier reproducibility.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JesperDramsch
- License: mit
- Language: HTML
- Default Branch: main
- Homepage: https://ml.recipes/
- Size: 12.1 MB
Statistics
- Stars: 74
- Watchers: 6
- Forks: 17
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
Increase citations, ease review & collaboration
A collection of "easy wins" to make machine learning in research reproducible.
This book aims to provide easy ways to increase the quality of scientific contributions that use machine learning methods. Reproducibility makes it easy for fellow researchers to use and iterate on a publication, which increases citations of the published work. Appropriate validation techniques and better code quality speed up the review process during publication and help avoid rejection due to deficiencies in the methodology. Making models, code and possibly data available increases the visibility of the work and makes future collaboration easier.
This book focuses on basics that work, getting you 90% of the way to top-tier reproducibility.
Every scientific conference has seen a massive uptick in applications that use some type of machine learning. Whether it’s a linear regression using scikit-learn, a transformer from Hugging Face, or a custom convolutional neural network in JAX, these applications vary as widely in quality as they do in kind.
Making machine learning applications reproducible has an outsized impact compared with the small amount of extra work it requires when using existing Python libraries.
Data
This tutorial uses the Palmer Penguins dataset.
Data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.
Artwork by @allison_horst
Usage
Building the book
If you'd like to develop and/or build the Increase citations, ease review & collaboration book, you should:
- Clone this repository
- Run pip install -r requirements.txt (it is recommended you do this within a virtual environment)
- (Optional) Edit the book's source files located in the book/ directory
- (Optional) Jupytext syncs the content between python_scripts and book/notebooks to enable diffs.
- Run jupyter-book clean book/ to remove any existing builds
- Run jupyter-book build book/
A fully-rendered HTML version of the book will be built in book/_build/html/.
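The build steps above can also be chained in a small Python script so they always run in the same order. This is a minimal sketch, not a script shipped with the repository: the function name build_book and the dry_run flag are hypothetical, and it assumes the script is run from the repository root with jupyter-book already on the PATH once requirements are installed.

```python
"""Hypothetical helper that runs the book-build steps in order (not part of the repo)."""
import subprocess
import sys

# Commands mirror the steps listed above. Using "sys.executable -m pip" installs
# into the interpreter (e.g. virtual environment) that runs this script.
BUILD_STEPS = [
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    ["jupyter-book", "clean", "book/"],
    ["jupyter-book", "build", "book/"],
]


def build_book(dry_run: bool = False) -> list[str]:
    """Run each build step, stopping at the first failure.

    With dry_run=True the commands are only collected and returned, not executed.
    """
    executed = []
    for cmd in BUILD_STEPS:
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run after a failed one.
            subprocess.run(cmd, check=True)
    return executed
```

Calling build_book(dry_run=True) prints nothing and runs nothing; it simply returns the three command lines, which is handy for checking the order before a real build.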
Jupytext
This repo uses Jupytext (see the Jupytext documentation).
The idea and implementation of this Jupytext setup were copied from the EuroSciPy 2019 scikit-learn tutorial. Thanks for the great work!
To synchronize the notebooks and the Python scripts (based on file timestamps; only the content of input cells is modified in the notebooks):
$ jupytext --sync notebooks/*.ipynb
or simply use:
$ make sync
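The timestamp rule behind syncing can be illustrated with a short sketch: whichever file of a notebook/script pair was modified most recently is treated as the source for the input cells. This is a simplified illustration of the idea, assuming a plain "newest file wins" rule; it is not Jupytext's actual implementation, and the function name sync_source is hypothetical.

```python
import os
from pathlib import Path


def sync_source(notebook: Path, script: Path) -> Path:
    """Pick which file of a paired notebook/script should drive the sync.

    Simplified "newest file wins" rule; a sketch of the idea, not Jupytext's code.
    """
    if not script.exists():
        return notebook  # nothing paired yet: the notebook is the only source
    if not notebook.exists():
        return script
    # Compare modification times; a tie favours the notebook.
    if os.path.getmtime(script) > os.path.getmtime(notebook):
        return script
    return notebook
```

In practice this means editing a script in python_scripts and then running the sync updates the inputs of the matching notebook, while the notebook's outputs are left alone.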
If you create a new notebook, you need to set up the text file it will be paired with:
$ jupytext --set-formats notebooks//ipynb,python_scripts//auto:percent notebooks/*.ipynb
or simply use:
$ make format
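The format string notebooks//ipynb,python_scripts//auto:percent pairs each notebook in notebooks/ with a script of the same name in python_scripts/, written in the "percent" format where every cell starts with a "# %%" marker. The sketch below mimics that mapping for the flat layout used by the command above. It assumes "auto" resolves to ".py" for a Python kernel and omits the metadata header real Jupytext output carries; both function names are hypothetical.

```python
from pathlib import PurePosixPath


def paired_script_path(notebook_path: str, script_ext: str = ".py") -> str:
    """Map notebooks/<name>.ipynb to python_scripts/<name><script_ext>."""
    path = PurePosixPath(notebook_path)
    # Only the flat layout matched by notebooks/*.ipynb is handled here.
    if path.parent != PurePosixPath("notebooks") or path.suffix != ".ipynb":
        raise ValueError(f"expected notebooks/<name>.ipynb, got {notebook_path!r}")
    return str(PurePosixPath("python_scripts") / path.with_suffix(script_ext).name)


def to_percent_script(cells: list[tuple[str, str]]) -> str:
    """Render (cell_type, source) pairs as a simplified percent-format script.

    Code cells start with '# %%'; markdown cells start with '# %% [markdown]'
    and have every line commented out so the file stays valid Python.
    """
    chunks = []
    for cell_type, source in cells:
        if cell_type == "markdown":
            body = "\n".join("# " + line if line else "#" for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        else:
            chunks.append("# %%\n" + source)
    return "\n\n".join(chunks) + "\n"
```

Because the paired script is plain Python with comment markers, a diff of python_scripts/ shows only the changed inputs, which is why the repo keeps both forms.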
To render all the notebooks (do this occasionally, as it is slow to run):
$ make render
Hosting the book
Please see the Jupyter Book documentation to discover options for deploying a book online using services such as GitHub, GitLab, or Netlify.
For GitHub and GitLab deployment specifically, the cookiecutter-jupyter-book includes templates for, and information about, optional continuous integration (CI) workflow files to help easily and automatically deploy books online with GitHub or GitLab. For example, if you chose github for the include_ci cookiecutter option, your book template was created with a GitHub actions workflow file that, once pushed to GitHub, automatically renders and pushes your book to the gh-pages branch of your repo and hosts it on GitHub Pages when a push or pull request is made to the main branch.
Contributors
We welcome and recognize all contributions. You can see a list of current contributors in the contributors tab.
Credits
This project is created using the excellent open source Jupyter Book project and the executablebooks/cookiecutter-jupyter-book template. Notebooks are synced with scripts using jupytext for version control.
Owner
- Name: Jesper Dramsch
- Login: JesperDramsch
- Kind: user
- Location: Bonn
- Company: @ECMWF
- Website: dramsch.net
- Repositories: 101
- Profile: https://github.com/JesperDramsch
Scientist for Machine Learning. 🦾 No step on snek. 🐍 You miss 99% of the benchmarks you don't overfit on.
Citation (CITATION.cff)
cff-version: 1.2.0
message: 'If you use this ML.recipes, please cite it as below.'
authors:
- family-names: Dramsch
given-names: Jesper Sören
orcid: https://orcid.org/0000-0001-8273-905X
- family-names: Maggio
given-names: Valerio
orcid: https://orcid.org/0000-0003-4824-893X
title: 'ML Recipes – Increase citations, ease review & foster collaboration '
version: PyData-Global-2022
identifiers:
- type: doi
value: 10.5281/zenodo.10381234
date-released: 2022-12-03
url: 'https://github.com/JesperDramsch/ml-for-science-reproducibility-tutorial'
GitHub Events
Total
- Watch event: 4
- Delete event: 2
- Issue comment event: 1
- Push event: 18
- Pull request event: 8
- Create event: 7
Last Year
- Watch event: 4
- Delete event: 2
- Issue comment event: 1
- Push event: 18
- Pull request event: 8
- Create event: 7
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jesper Dramsch | j****r@d****t | 72 |
| dependabot[bot] | 4****] | 3 |
| Valerio Maggio | l****o | 2 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 7
- Average time to close issues: N/A
- Average time to close pull requests: about 7 hours
- Total issue authors: 0
- Total pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.29
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 9 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.25
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Pull Request Authors
- dependabot[bot] (6)
- JesperDramsch (2)
Dependencies
- surgeon-pytorch >=0.0.4
- jupyter >=1.0.0
- matplotlib >=3.5.3
- numpy >=1.22.3
- pandas >=1.5.1
- pandera >=0.13.4
- pydantic >=1.10.2
- pytorch >=1.13.0
- scikit-learn >=1.1.3
- seaborn >=0.12.1
- shap >=0.41.0
- surgeon-pytorch >=0.0.4