mlr3pipelines

Dataflow Programming for Machine Learning in R

https://github.com/mlr-org/mlr3pipelines

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 2 of 28 committers (7.1%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary

Keywords

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing r r-package stacking

Keywords from Contributors

learners hyperparameter-optimization survival-analysis predictive-modeling multilabel-classification feature-selection mlr hyperparameter-tuning hyperparameters-optimization imbalance-correction

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: mlr-org
License: lgpl-3.0
Language: R
Default Branch: master
Homepage: https://mlr3pipelines.mlr-org.com/
Size: 23.7 MB

Statistics

Stars: 144
Watchers: 17
Forks: 28
Open Issues: 122
Releases: 9

Created over 8 years ago · Last pushed 6 months ago

README.Rmd

---
output: github_document
---

# mlr3pipelines 

Package website: [release](https://mlr3pipelines.mlr-org.com/) | [dev](https://mlr3pipelines.mlr-org.com/dev/)

Dataflow Programming for Machine Learning in R.


[![r-cmd-check](https://github.com/mlr-org/mlr3pipelines/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3pipelines/actions/workflows/r-cmd-check.yml)
[![CRAN](https://www.r-pkg.org/badges/version/mlr3pipelines)](https://cran.r-project.org/package=mlr3pipelines)
[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)


```{r, include = FALSE}
knitr::opts_chunk$set(
  cache = FALSE,
  collapse = TRUE,
  comment = "#>"
)
set.seed(8008135)
library("paradox")
library("mlr3")
library("mlr3pipelines")
library("mlr3learners")
lgr::get_logger("mlr3")$set_threshold("warn")
```

## What is `mlr3pipelines`?


Watch our "WhyR 2020" webinar presentation on YouTube for an introduction! Find the slides [here](https://raw.githubusercontent.com/mlr-org/mlr-outreach/main/2020_whyr/slides.pdf).

[![WhyR 2020 mlr3pipelines](https://img.youtube.com/vi/4r8K3GO5wk4/0.jpg)](https://www.youtube.com/watch?v=4r8K3GO5wk4)

**`mlr3pipelines`** is a [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming) toolkit for machine learning in R utilising the **[mlr3](https://github.com/mlr-org/mlr3)** package. Machine learning workflows can be written as directed "Graphs" that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the **[mlr3tuning](https://github.com/mlr-org/mlr3tuning)** package, it is even possible to simultaneously optimize parameters of multiple processing units.

At its core, *mlr3pipelines* is about defining single data and model manipulation steps as "PipeOps":

```{r}
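# Rotate the features onto their principal components
pca        = po("pca")
# Keep the 50% of features with the highest variance
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
# Wrap a classification tree so it can act as the final pipeline step
learner_po = po("learner", learner = lrn("classif.rpart"))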
```
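To make the idea of a PipeOp more concrete, the following sketch trains a single PCA step on its own. It assumes the built-in `iris` task, and the variable names are illustrative rather than taken from the package. A PipeOp takes a list of inputs and returns a list of outputs, so the task is wrapped in `list()` and the result is unwrapped with `[[1]]`.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")          # illustrative choice of task
pca_op = po("pca")

# Training fits the rotation, stores it in $state, and returns the transformed task
train_out = pca_op$train(list(task))[[1]]
train_out$feature_names     # principal components replace the original features

# Prediction reuses the stored state instead of refitting
predict_out = pca_op$predict(list(task))[[1]]
```

Keeping the fitted state inside the PipeOp is what lets the same preprocessing be applied consistently to new data at prediction time.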

These PipeOps can then be combined to define machine learning pipelines. A pipeline can be wrapped in a `GraphLearner` that behaves like any other `Learner` in `mlr3`.

```{r}
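# Chain the PipeOps into a linear Graph with the %>>% operator
graph = pca %>>% filter %>>% learner_po
# Wrap the Graph so it can be used wherever a Learner is expected
glrn = GraphLearner$new(graph)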
```
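As a minimal sketch of what "behaves like any other `Learner`" means in practice, the example below trains a graph learner on one part of a task and predicts on the rest. The pipeline (scaling, PCA, decision tree), the `iris` task, and the 80/20 split are assumptions made for illustration; `as_learner()` is used here as a shorthand for wrapping the graph.

```r
library("mlr3")
library("mlr3pipelines")

# Illustrative pipeline: scale, rotate, then fit a classification tree
example_graph = po("scale") %>>% po("pca") %>>%
  po("learner", learner = lrn("classif.rpart"))
example_glrn = as_learner(example_graph)

task = tsk("iris")
split = partition(task, ratio = 0.8)   # hypothetical 80/20 train/test split

# The usual Learner methods apply to the whole pipeline at once
example_glrn$train(task, row_ids = split$train)
pred = example_glrn$predict(task, row_ids = split$test)
pred$score(msr("classif.ce"))          # classification error on the held-out rows
```

All preprocessing steps are fitted on the training rows only and then applied to the test rows, which avoids leaking information from the test set into the model.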

This learner can be used for resampling, benchmarking, and even tuning.

```{r}
# Evaluate the whole pipeline with cross-validation (10 folds by default)
resample(tsk("iris"), glrn, rsmp("cv"))
```
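Tuning works in a similar way. The sketch below is one possible setup, assuming the **mlr3tuning** package is installed: it tunes a preprocessing parameter and a learner parameter at the same time. Inside a graph learner, parameters are addressed as `<PipeOp id>.<parameter>`; the ranges, the random search tuner, and the budget of five evaluations are arbitrary choices for illustration.

```r
library("mlr3")
library("mlr3pipelines")
library("mlr3tuning")
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

tune_glrn = as_learner(
  po("pca") %>>% po("learner", learner = lrn("classif.rpart"))
)

# Mark one parameter of each PipeOp for tuning (illustrative ranges)
tune_glrn$param_set$values$pca.rank. = to_tune(1, 4)
tune_glrn$param_set$values$classif.rpart.cp = to_tune(0.001, 0.1)

instance = tune(
  tnr("random_search"),
  task = tsk("iris"),
  learner = tune_glrn,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  term_evals = 5
)

instance$result   # best configuration found within the small budget
```

Because both parameters live in the same search space, the tuner evaluates combinations of PCA rank and tree complexity jointly rather than one step at a time.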

## Feature Overview

Single computational steps can be represented as so-called **PipeOps**, which can then be connected with directed edges in a **Graph**. The scope of *mlr3pipelines* is still growing; currently supported features are:

* Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
* Task subsampling for speed and outcome class imbalance handling
* *mlr3* *Learner* operations for prediction and stacking
* Simultaneous path branching (data going both ways)
* Alternative path branching (data going one specific way, controlled by hyperparameters)
* Ensemble methods and aggregation of predictions
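
The two kinds of branching listed above can be sketched as follows. This is an illustrative example rather than a canonical recipe: the choice of PCA and scaling as the two paths, and the `iris` task, are assumptions. In the simultaneous case, the data is sent down both paths and the results are merged; in the alternative case, a hyperparameter selects which single path is active.

```r
library("mlr3")
library("mlr3pipelines")

# Simultaneous branching: PCA features and the untouched original
# features are computed side by side and then joined column-wise
both = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
out_both = both$train(tsk("iris"))[[1]]
out_both$feature_names

# Alternative branching: only the selected path processes the data
paths = c("pca", "scale")
alt = po("branch", options = paths) %>>%
  gunion(list(po("pca"), po("scale"))) %>>%
  po("unbranch", options = paths)

# The active path is an ordinary hyperparameter and could also be tuned
alt$param_set$values$branch.selection = "scale"
out_alt = alt$train(tsk("iris"))[[1]]
out_alt$feature_names
```

Since `branch.selection` is a regular hyperparameter, the choice between preprocessing strategies can be handed to a tuner in the same way as any numeric parameter.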

## Documentation

A good way to get started with `mlr3pipelines` is to read the following two chapters of the mlr3 book:

* [Sequential Pipelines](https://mlr3book.mlr-org.com/chapters/chapter7/sequential_pipelines.html)
* [Non-Sequential Pipelines and Tuning](https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html)

## Bugs, Questions, Feedback

*mlr3pipelines* is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an "issue" about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a "minimum working example" that showcases the behaviour (but don't worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

## Citing mlr3pipelines

If you use mlr3pipelines, please cite our [JMLR article](https://jmlr.org/papers/v22/21-0281.html):

```{r echo = FALSE, comment = ""}
toBibtex(citation("mlr3pipelines"))
```

## Similar Projects

A predecessor to this package is the [*mlrCPO* package](https://github.com/mlr-org/mlrCPO), which works with *mlr* 2.x. Other packages that provide, to varying degrees, preprocessing functionality or a domain-specific language for machine learning are the *[caret](https://github.com/topepo/caret)* package, the related *[recipes](https://recipes.tidymodels.org/)* project, and the *[dplyr](https://github.com/tidyverse/dplyr)* package.

Owner

Name: mlr-org
Login: mlr-org
Kind: organization
Location: Munich, Germany

Website: https://mlr-org.com
Repositories: 80
Profile: https://github.com/mlr-org

GitHub Events

Total

Create event: 52
Release event: 4
Issues event: 76
Watch event: 7
Delete event: 41
Issue comment event: 106
Push event: 454
Pull request review event: 57
Pull request review comment event: 65
Pull request event: 100
Fork event: 2

Last Year

Create event: 52
Release event: 4
Issues event: 76
Watch event: 7
Delete event: 41
Issue comment event: 106
Push event: 454
Pull request review event: 57
Pull request review comment event: 65
Pull request event: 100
Fork event: 2

Committers

Last synced: 9 months ago

All Time

Total Commits: 2,755
Total Committers: 28
Avg Commits per committer: 98.393
Development Distribution Score (DDS): 0.513

Past Year

Commits: 508
Committers: 9
Avg Commits per committer: 56.444
Development Distribution Score (DDS): 0.445

Top Committers

Name	Email	Commits
mb706	m**r@m**m	1,341
pfistfl	p**f@g**m	346
kenomersmannPC	a**r@g**m	282
Michel Lang	m**g@g**m	207
sumny	l**h@w**e	184
pat-s	p**z@g**m	70
Sebastian Fischer	s**r@g**m	69
Zygmunt Zawadzki	z**t@z**l	53
Bernd Bischl	b**l@g**t	47
susanne-207	d**e@g**m	40
Travis		30
Marc Becker	m**r@p**e	14
Maximilian Muecke	m**n@g**m	10
Stefan Coors	s**s@g**t	10
Lona	l**s@g**m	10
dependabot[bot]	4****]	9
github-actions[bot]	4****]	9
Janek Thomas	j**s@w**e	5
Patrick Rockenschaub	p**5@u**k	5
Vitaly Polisky	v**y@p**e	3
ZackBarry	z**3@g**m	2
Jakob Richter	c**e@j**e	2
GitHub	n**y@g**m	2
Carson Zhang	c**4@g**m	1
Darío Hereñú	m**a@g**m	1
Michael Chirico	m**4@g**m	1
damirpolat	d**v@u**u	1
RustyLongbow	f**p@h**m	1

Committer Domains (Top 20 + Academic)

gmx.net: 2, uwyo.edu: 1, github.com: 1, jakob-r.de: 1, polisky.me: 1, ucl.ac.uk: 1, posteo.de: 1, zstat.pl: 1, mb706.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 193
Total pull requests: 199
Average time to close issues: over 1 year
Average time to close pull requests: about 1 month
Total issue authors: 48
Total pull request authors: 14
Average comments per issue: 1.59
Average comments per pull request: 0.68
Merged pull requests: 146
Bot issues: 0
Bot pull requests: 12

Past Year

Issues: 51
Pull requests: 103
Average time to close issues: about 2 months
Average time to close pull requests: 22 days
Issue authors: 15
Pull request authors: 8
Average comments per issue: 0.76
Average comments per pull request: 0.6
Merged pull requests: 76
Bot issues: 0
Bot pull requests: 7

Top Authors

Issue Authors

mb706 (81)
sebffischer (17)
advieser (11)
mllg (10)
bblodfon (7)
pfistfl (6)
m-muecke (6)
berndbischl (4)
invain1218 (3)
nipnipj (3)
be-marc (3)
pat-s (3)
FlorianPargent (2)
jpconnel (2)
a-hanf (1)

Pull Request Authors

advieser (82)
mb706 (72)
sebffischer (33)
m-muecke (30)
dependabot[bot] (21)
mllg (8)
pat-s (6)
lona-k (5)
sumny (5)
be-marc (4)
cxzhang4 (2)
MichaelChirico (2)
JHarrisonEcoEvo (1)
pfistfl (1)
damirpolat (1)

Top Labels

Issue Labels

workshop (33), Status: Needs Design (16), Status: Needs Discussion (13), Type: New PipeOp (12), Tag: POFU (12), Status: Contrib (prepared) (9), Type: Enhancement (9), Tag: Graph Transparency (9), Type: Bug (9), Priority: Low (7), Type: Documentation (6), Priority: Medium (6), Status: Contrib (unprepared) (5), Priority: High (4), multipredict (3), Priority: Critical (3), Type: UI Revision (2), Status: Blocked (2), Breaking Changes (2), Status: Available (2), Type: Maintenance (2), Effort: Simple (2), Status: On Hold (1), Status: In Progress (1), Status: Needs Tests (1), Type: Question (1), Tag: NLP (1), predict time state change (1), feature_info_propagation (1), importance (1)

Pull Request Labels

dependencies (21), Status: Review Needed (6), Status: Needs Discussion (5), Status: Blocked (4), Status: Completed (1), Status: Revision Needed (1), Status: On Hold (1), github_actions (1), workshop (1)

Packages

Total packages: 2
Total downloads:
- cran 6,314 last-month
Total docker downloads: 42,239

Total dependent packages: 17
(may contain duplicates)
Total dependent repositories: 34
(may contain duplicates)
Total versions: 57
Total maintainers: 1

proxy.golang.org: github.com/mlr-org/mlr3pipelines

Documentation: https://pkg.go.dev/github.com/mlr-org/mlr3pipelines#section-documentation
License: lgpl-3.0
Latest release: v0.9.0
published 7 months ago

Versions: 28
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.5%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

cran.r-project.org: mlr3pipelines

Preprocessing Operators and Pipelines for 'mlr3'

Homepage: https://mlr3pipelines.mlr-org.com
Documentation: http://cran.r-project.org/web/packages/mlr3pipelines/mlr3pipelines.pdf
License: LGPL-3
Latest release: 0.9.0
published 7 months ago

Versions: 29
Dependent Packages: 17
Dependent Repositories: 34
Downloads: 6,314 Last month
Docker Downloads: 42,239

Rankings

Stargazers count: 3.2%

Forks count: 3.3%

Dependent packages count: 3.8%

Dependent repos count: 4.5%

Downloads: 7.8%

Average: 8.1%

Docker downloads count: 26.0%

Maintainers (1)

mlr.developer@mb706.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.1.0 depends
R6 * imports
backports * imports
checkmate * imports
data.table * imports
digest * imports
lgr * imports
mlr3 >= 0.6.0 imports
mlr3misc >= 0.9.0 imports
paradox * imports
withr * imports
GenSA * suggests
MASS * suggests
NMF * suggests
bbotk >= 0.3.0 suggests
bestNormalize * suggests
evaluate * suggests
fastICA * suggests
future * suggests
ggplot2 * suggests
glmnet * suggests
igraph * suggests
kernlab * suggests
kknn * suggests
knitr * suggests
lme4 * suggests
methods * suggests
mlbench * suggests
mlr3filters >= 0.1.1 suggests
mlr3learners * suggests
mlr3measures * suggests
nloptr * suggests
quanteda * suggests
rmarkdown * suggests
rpart * suggests
smotefamily * suggests
stopwords * suggests
testthat * suggests
visNetwork * suggests
vtreat * suggests