mlr3pipelines
Dataflow Programming for Machine Learning in R
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 28 committers (7.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Keywords
bagging
data-science
dataflow-programming
ensemble-learning
machine-learning
mlr3
pipelines
preprocessing
r
r-package
stacking
Keywords from Contributors
learners
hyperparameter-optimization
survival-analysis
predictive-modeling
multilabel-classification
feature-selection
mlr
hyperparameter-tuning
hyperparameters-optimization
imbalance-correction
Last synced: 6 months ago
Repository
Dataflow Programming for Machine Learning in R
Basic Info
- Host: GitHub
- Owner: mlr-org
- License: lgpl-3.0
- Language: R
- Default Branch: master
- Homepage: https://mlr3pipelines.mlr-org.com/
- Size: 23.7 MB
Statistics
- Stars: 144
- Watchers: 17
- Forks: 28
- Open Issues: 122
- Releases: 9
Topics
bagging
data-science
dataflow-programming
ensemble-learning
machine-learning
mlr3
pipelines
preprocessing
r
r-package
stacking
Created over 8 years ago · Last pushed 6 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
Owner
- Name: mlr-org
- Login: mlr-org
- Kind: organization
- Location: Munich, Germany
- Website: https://mlr-org.com
- Repositories: 80
- Profile: https://github.com/mlr-org
GitHub Events
Total
- Create event: 52
- Release event: 4
- Issues event: 76
- Watch event: 7
- Delete event: 41
- Issue comment event: 106
- Push event: 454
- Pull request review event: 57
- Pull request review comment event: 65
- Pull request event: 100
- Fork event: 2
Last Year
- Create event: 52
- Release event: 4
- Issues event: 76
- Watch event: 7
- Delete event: 41
- Issue comment event: 106
- Push event: 454
- Pull request review event: 57
- Pull request review comment event: 65
- Pull request event: 100
- Fork event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| mb706 | m****r@m****m | 1,341 |
| pfistfl | p****f@g****m | 346 |
| kenomersmannPC | a****r@g****m | 282 |
| Michel Lang | m****g@g****m | 207 |
| sumny | l****h@w****e | 184 |
| pat-s | p****z@g****m | 70 |
| Sebastian Fischer | s****r@g****m | 69 |
| Zygmunt Zawadzki | z****t@z****l | 53 |
| Bernd Bischl | b****l@g****t | 47 |
| susanne-207 | d****e@g****m | 40 |
| Travis | | 30 |
| Marc Becker | m****r@p****e | 14 |
| Maximilian Muecke | m****n@g****m | 10 |
| Stefan Coors | s****s@g****t | 10 |
| Lona | l****s@g****m | 10 |
| dependabot[bot] | 4****] | 9 |
| github-actions[bot] | 4****] | 9 |
| Janek Thomas | j****s@w****e | 5 |
| Patrick Rockenschaub | p****5@u****k | 5 |
| Vitaly Polisky | v****y@p****e | 3 |
| ZackBarry | z****3@g****m | 2 |
| Jakob Richter | c****e@j****e | 2 |
| GitHub | n****y@g****m | 2 |
| Carson Zhang | c****4@g****m | 1 |
| Darío Hereñú | m****a@g****m | 1 |
| Michael Chirico | m****4@g****m | 1 |
| damirpolat | d****v@u****u | 1 |
| RustyLongbow | f****p@h****m | 1 |
Committer Domains (Top 20 + Academic)
gmx.net: 2
uwyo.edu: 1
github.com: 1
jakob-r.de: 1
polisky.me: 1
ucl.ac.uk: 1
posteo.de: 1
zstat.pl: 1
mb706.com: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 193
- Total pull requests: 199
- Average time to close issues: over 1 year
- Average time to close pull requests: about 1 month
- Total issue authors: 48
- Total pull request authors: 14
- Average comments per issue: 1.59
- Average comments per pull request: 0.68
- Merged pull requests: 146
- Bot issues: 0
- Bot pull requests: 12
Past Year
- Issues: 51
- Pull requests: 103
- Average time to close issues: about 2 months
- Average time to close pull requests: 22 days
- Issue authors: 15
- Pull request authors: 8
- Average comments per issue: 0.76
- Average comments per pull request: 0.6
- Merged pull requests: 76
- Bot issues: 0
- Bot pull requests: 7
Top Authors
Issue Authors
- mb706 (81)
- sebffischer (17)
- advieser (11)
- mllg (10)
- bblodfon (7)
- pfistfl (6)
- m-muecke (6)
- berndbischl (4)
- invain1218 (3)
- nipnipj (3)
- be-marc (3)
- pat-s (3)
- FlorianPargent (2)
- jpconnel (2)
- a-hanf (1)
Pull Request Authors
- advieser (82)
- mb706 (72)
- sebffischer (33)
- m-muecke (30)
- dependabot[bot] (21)
- mllg (8)
- pat-s (6)
- lona-k (5)
- sumny (5)
- be-marc (4)
- cxzhang4 (2)
- MichaelChirico (2)
- JHarrisonEcoEvo (1)
- pfistfl (1)
- damirpolat (1)
Top Labels
Issue Labels
workshop (33)
Status: Needs Design (16)
Status: Needs Discussion (13)
Type: New PipeOp (12)
Tag: POFU (12)
Status: Contrib (prepared) (9)
Type: Enhancement (9)
Tag: Graph Transparency (9)
Type: Bug (9)
Priority: Low (7)
Type: Documentation (6)
Priority: Medium (6)
Status: Contrib (unprepared) (5)
Priority: High (4)
multipredict (3)
Priority: Critical (3)
Type: UI Revision (2)
Status: Blocked (2)
Breaking Changes (2)
Status: Available (2)
Type: Maintenance (2)
Effort: Simple (2)
Status: On Hold (1)
Status: In Progress (1)
Status: Needs Tests (1)
Type: Question (1)
Tag: NLP (1)
predict time state change (1)
feature_info_propagation (1)
importance (1)
Pull Request Labels
dependencies (21)
Status: Review Needed (6)
Status: Needs Discussion (5)
Status: Blocked (4)
Status: Completed (1)
Status: Revision Needed (1)
Status: On Hold (1)
github_actions (1)
workshop (1)
Packages
- Total packages: 2
- Total downloads: cran 6,314 last-month
- Total docker downloads: 42,239
- Total dependent packages: 17 (may contain duplicates)
- Total dependent repositories: 34 (may contain duplicates)
- Total versions: 57
- Total maintainers: 1
proxy.golang.org: github.com/mlr-org/mlr3pipelines
- Documentation: https://pkg.go.dev/github.com/mlr-org/mlr3pipelines#section-documentation
- License: lgpl-3.0
- Latest release: v0.9.0 (published 7 months ago)
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
cran.r-project.org: mlr3pipelines
Preprocessing Operators and Pipelines for 'mlr3'
- Homepage: https://mlr3pipelines.mlr-org.com
- Documentation: http://cran.r-project.org/web/packages/mlr3pipelines/mlr3pipelines.pdf
- License: LGPL-3
- Latest release: 0.9.0 (published 7 months ago)
Rankings
Stargazers count: 3.2%
Forks count: 3.3%
Dependent packages count: 3.8%
Dependent repos count: 4.5%
Downloads: 7.8%
Average: 8.1%
Docker downloads count: 26.0%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.1.0 depends
- R6 * imports
- backports * imports
- checkmate * imports
- data.table * imports
- digest * imports
- lgr * imports
- mlr3 >= 0.6.0 imports
- mlr3misc >= 0.9.0 imports
- paradox * imports
- withr * imports
- GenSA * suggests
- MASS * suggests
- NMF * suggests
- bbotk >= 0.3.0 suggests
- bestNormalize * suggests
- evaluate * suggests
- fastICA * suggests
- future * suggests
- ggplot2 * suggests
- glmnet * suggests
- igraph * suggests
- kernlab * suggests
- kknn * suggests
- knitr * suggests
- lme4 * suggests
- methods * suggests
- mlbench * suggests
- mlr3filters >= 0.1.1 suggests
- mlr3learners * suggests
- mlr3measures * suggests
- nloptr * suggests
- quanteda * suggests
- rmarkdown * suggests
- rpart * suggests
- smotefamily * suggests
- stopwords * suggests
- testthat * suggests
- visNetwork * suggests
- vtreat * suggests
# mlr3pipelines

Package website: [release](https://mlr3pipelines.mlr-org.com/) | [dev](https://mlr3pipelines.mlr-org.com/dev/)
Dataflow Programming for Machine Learning in R.
[R CMD check status](https://github.com/mlr-org/mlr3pipelines/actions/workflows/r-cmd-check.yml)
[CRAN package page](https://cran.r-project.org/package=mlr3pipelines)
[Stack Overflow questions tagged mlr3](https://stackoverflow.com/questions/tagged/mlr3)
[Mattermost chat invite](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)
```{r, include = FALSE}
knitr::opts_chunk$set(
  cache = FALSE,
  collapse = TRUE,
  comment = "#>"
)
set.seed(8008135)
library("paradox")
library("mlr3")
library("mlr3pipelines")
library("mlr3learners")
lgr::get_logger("mlr3")$set_threshold("warn")
```
## What is `mlr3pipelines`?
Watch our "WhyR 2020" webinar presentation on YouTube for an introduction! Find the slides [here](https://raw.githubusercontent.com/mlr-org/mlr-outreach/main/2020_whyr/slides.pdf).
[Watch the webinar on YouTube](https://www.youtube.com/watch?v=4r8K3GO5wk4)
**`mlr3pipelines`** is a [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming) toolkit for machine learning in R utilising the **[mlr3](https://github.com/mlr-org/mlr3)** package. Machine learning workflows can be written as directed "Graphs" that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the **[mlr3tuning](https://github.com/mlr-org/mlr3tuning)** package, it is even possible to simultaneously optimize parameters of multiple processing units.
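As a rough sketch of what "parameters of multiple processing units" means in practice: a `GraphLearner` exposes the hyperparameters of all its steps in one parameter set, with each parameter ID prefixed by the ID of the step it belongs to. The pipeline and the tuning ranges below are illustrative assumptions, not recommendations; the resulting search space is what a tuner from *mlr3tuning* would then optimize over.

```r
library(mlr3)
library(mlr3pipelines)
library(paradox)

# Illustrative pipeline (an assumption): variance filter followed by a decision tree.
graph = po("filter", filter = mlr3filters::flt("variance")) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Parameter IDs are prefixed with the ID of the PipeOp they belong to.
# The ranges are arbitrary values chosen for illustration.
glrn$param_set$values$variance.filter.frac = to_tune(0.25, 1)
glrn$param_set$values$classif.rpart.cp = to_tune(0.001, 0.1)

# A single search space now spans both the preprocessing and the model step.
search_space = glrn$param_set$search_space()
print(search_space$ids())
```

Marking values with `to_tune()` keeps the search space next to the learner definition, so the same object can be handed to a tuner without a separately maintained parameter set.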
In principle, *mlr3pipelines* is about defining individual data and model manipulation steps as "PipeOps":
```{r}
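# Each po() call constructs one PipeOp
pca = po("pca")
# Keep the 50% of features with the highest variance
filter = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
# Wrap an mlr3 Learner so that it can be used as a pipeline step
learner_po = po("learner", learner = lrn("classif.rpart"))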
```
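A single PipeOp can also be used on its own, which may help when exploring what a step does. The following is a minimal sketch, assuming the `iris` task that ships with *mlr3*; the variable names are chosen for illustration only.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pca = po("pca")

# PipeOps take and return lists, with one element per input or output channel.
train_out = pca$train(list(task))[[1]]
print(train_out$feature_names)

# Once trained, the PipeOp applies the stored transformation at prediction time.
predict_out = pca$predict(list(task))[[1]]
print(pca$is_trained)
```

Both calls return a `Task` whose features are the principal components, so the output can be passed directly to another step.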
These PipeOps can then be combined to define machine learning pipelines. A pipeline can be wrapped in a `GraphLearner` that behaves like any other `Learner` in `mlr3`.
```{r}
graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
```
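Because a `GraphLearner` follows the usual `Learner` interface, it can be trained and used for prediction directly. The sketch below rebuilds the same pipeline so that it runs on its own; the train/test split is an arbitrary assumption made for illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Hold out every fifth row for testing (an arbitrary split for illustration).
test_ids = seq(5, task$nrow, by = 5)
train_ids = setdiff(task$row_ids, test_ids)

# All steps are fitted on the training rows only and re-applied when predicting.
glrn$train(task, row_ids = train_ids)
prediction = glrn$predict(task, row_ids = test_ids)
print(prediction$score(msr("classif.acc")))
```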
This learner can be used for resampling, benchmarking, and even tuning.
```{r}
resample(tsk("iris"), glrn, rsmp("cv"))
```
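Benchmarking works the same way as for any other learner. As a hedged sketch, the pipeline can be compared against a plain decision tree on the same task; the choice of baseline, the three folds, and the classification error measure are assumptions made for this example.

```r
library(mlr3)
library(mlr3pipelines)

# Reduce log output, as in the setup chunk of this README.
lgr::get_logger("mlr3")$set_threshold("warn")

graph = po("pca") %>>%
  po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5) %>>%
  po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Compare the pipeline with an unpreprocessed decision tree on the same folds.
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# One row per learner, with the error averaged over the folds.
scores = bmr$aggregate(msr("classif.ce"))
print(scores[, c("learner_id", "classif.ce")])
```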
## Feature Overview
Single computational steps can be represented as so-called **PipeOps**, which can then be connected with directed edges in a **Graph**. The scope of *mlr3pipelines* is still growing; currently supported features are:
* Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
* Task subsampling for speed and outcome class imbalance handling
* *mlr3* *Learner* operations for prediction and stacking
* Simultaneous path branching (data going both ways)
* Alternative path branching (data going one specific way, controlled by hyperparameters)
* Ensemble methods and aggregation of predictions
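The two kinds of branching listed above can be sketched as follows. This is an illustrative example on the `iris` task, not a recommended pipeline: the first graph sends the data through PCA and an unchanged copy at the same time and joins the results, while the second graph sends the data down exactly one of the two paths, chosen by the `branch.selection` hyperparameter.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Simultaneous branching: the data flows through both paths, and the
# resulting features are combined into one task.
both = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
combined = both$train(task)[[1]]
print(combined$feature_names)

# Alternative branching: only the selected path is executed.
choices = c("pca", "nop")
either = po("branch", options = choices) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = choices)
either$param_set$values$branch.selection = "nop"
selected = either$train(task)[[1]]
print(selected$feature_names)
```

Since `branch.selection` is an ordinary hyperparameter, the choice of path can be tuned together with the parameters of the steps themselves.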
## Documentation
A good way to get into `mlr3pipelines` is to read the following two vignettes:
* [Sequential Pipelines](https://mlr3book.mlr-org.com/chapters/chapter7/sequential_pipelines.html)
* [Non-Sequential Pipelines and Tuning](https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html)
## Bugs, Questions, Feedback
*mlr3pipelines* is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an "issue" about it on the GitHub page!
In case of problems or bugs, it is often helpful if you provide a "minimal working example" that showcases the behaviour (but don't worry about this if the bug is obvious).
Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.
## Citing mlr3pipelines
If you use mlr3pipelines, please cite our [JMLR article](https://jmlr.org/papers/v22/21-0281.html):
```{r echo = FALSE, comment = ""}
toBibtex(citation("mlr3pipelines"))
```
## Similar Projects
A predecessor to this package is the [*mlrCPO* package](https://github.com/mlr-org/mlrCPO), which works with *mlr* 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a machine learning domain-specific language are the *[caret](https://github.com/topepo/caret)* package, the related *[recipes](https://recipes.tidymodels.org/)* project, and the *[dplyr](https://github.com/tidyverse/dplyr)* package.