mlr3pipelines

Dataflow Programming for Machine Learning in R

https://github.com/mlr-org/mlr3pipelines

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 28 committers (7.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

bagging data-science dataflow-programming ensemble-learning machine-learning mlr3 pipelines preprocessing r r-package stacking

Keywords from Contributors

learners hyperparameter-optimization survival-analysis predictive-modeling multilabel-classification feature-selection mlr hyperparameter-tuning hyperparameters-optimization imbalance-correction
Last synced: 6 months ago

Repository

Statistics
  • Stars: 144
  • Watchers: 17
  • Forks: 28
  • Open Issues: 122
  • Releases: 9
Created over 8 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---

# mlr3pipelines 

Package website: [release](https://mlr3pipelines.mlr-org.com/) | [dev](https://mlr3pipelines.mlr-org.com/dev/)

Dataflow Programming for Machine Learning in R.


[![r-cmd-check](https://github.com/mlr-org/mlr3pipelines/actions/workflows/r-cmd-check.yml/badge.svg)](https://github.com/mlr-org/mlr3pipelines/actions/workflows/r-cmd-check.yml)
[![CRAN](https://www.r-pkg.org/badges/version/mlr3pipelines)](https://cran.r-project.org/package=mlr3pipelines)
[![StackOverflow](https://img.shields.io/badge/stackoverflow-mlr3-orange.svg)](https://stackoverflow.com/questions/tagged/mlr3)
[![Mattermost](https://img.shields.io/badge/chat-mattermost-orange.svg)](https://lmmisld-lmu-stats-slds.srv.mwn.de/mlr_invite/)


```{r, include = FALSE}
knitr::opts_chunk$set(
  cache = FALSE,
  collapse = TRUE,
  comment = "#>"
)
set.seed(8008135)
library("paradox")
library("mlr3")
library("mlr3pipelines")
library("mlr3learners")
lgr::get_logger("mlr3")$set_threshold("warn")
```

## What is `mlr3pipelines`?


Watch our "WhyR 2020" webinar presentation on YouTube for an introduction! Find the slides [here](https://raw.githubusercontent.com/mlr-org/mlr-outreach/main/2020_whyr/slides.pdf).

[![WhyR 2020 mlr3pipelines](https://img.youtube.com/vi/4r8K3GO5wk4/0.jpg)](https://www.youtube.com/watch?v=4r8K3GO5wk4)

**`mlr3pipelines`** is a [dataflow programming](https://en.wikipedia.org/wiki/Dataflow_programming) toolkit for machine learning in R utilising the **[mlr3](https://github.com/mlr-org/mlr3)** package. Machine learning workflows can be written as directed "Graphs" that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the **[mlr3tuning](https://github.com/mlr-org/mlr3tuning)** package, it is even possible to simultaneously optimize parameters of multiple processing units.

In principle, *mlr3pipelines* is about defining individual data and model manipulation steps as "PipeOps":

```{r}
pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
```
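As a minimal sketch of what a single PipeOp does on its own (using the built-in `iris` task purely for illustration), a PipeOp can be trained and applied directly. Inputs and outputs are lists with one element per channel; the variable names `trained` and `predicted` are only illustrative.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")
pca = po("pca")

# PipeOps take and return lists, one element per input/output channel.
trained = pca$train(list(task))
trained$output$feature_names

# Training stores what was learned in `$state`; `$predict()` reuses it
# to transform new data in the same way.
predicted = pca$predict(list(task))
predicted$output$head(3)
```

Calling `$train()` before `$predict()` matters here: the PipeOp only becomes usable for prediction once its state has been set.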

These PipeOps can then be combined to define machine learning pipelines, which can be wrapped in a `GraphLearner` that behaves like any other `Learner` in `mlr3`.

```{r}
graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
```

This learner can be used for resampling, benchmarking, and even tuning.

```{r}
resample(tsk("iris"), glrn, rsmp("cv"))
```
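The following sketch illustrates why joint tuning is possible: a `GraphLearner` exposes the hyperparameters of all its PipeOps in one parameter set, with each ID prefixed by the ID of the PipeOp it belongs to. The pipeline here (scaling, PCA, decision tree) and the chosen values are assumptions made for illustration, not recommendations.

```r
library("mlr3")
library("mlr3pipelines")

graph = po("scale") %>>% po("pca") %>>% po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Hyperparameter IDs are prefixed with the ID of the owning PipeOp.
glrn$param_set$ids()

# Set one preprocessing and one model hyperparameter on the same object.
glrn$param_set$values$pca.rank. = 2
glrn$param_set$values$classif.rpart.cp = 0.05

# Evaluate the configured pipeline with 3-fold cross-validation.
rr = resample(tsk("iris"), glrn, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.ce"))
```

Because the whole pipeline is refit inside each resampling fold, the preprocessing steps only ever see the training portion of the data.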

## Feature Overview

Single computational steps can be represented as so-called **PipeOps**, which can then be connected with directed edges in a **Graph**. The scope of *mlr3pipelines* is still growing; currently supported features are:

* Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
* Task subsampling for speed and outcome class imbalance handling
* *mlr3* *Learner* operations for prediction and stacking
* Simultaneous path branching (data going both ways)
* Alternative path branching (data going one specific way, controlled by hyperparameters)
* Ensemble methods and aggregation of predictions
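The two kinds of branching listed above can be sketched as follows. This is a small illustration on the `iris` task, not a recommended pipeline; the option names in `choices` and the variable names are arbitrary.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("iris")

# Simultaneous branching: both paths run, and their outputs are merged.
both = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
merged = both$train(task)[[1]]
merged$feature_names  # principal components plus the original features

# Alternative branching: only the path chosen by `branch.selection` runs.
choices = c("pca", "nothing")
alt = po("branch", choices) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", choices) %>>%
  po("learner", learner = lrn("classif.rpart"))
alt_lrn = GraphLearner$new(alt)
alt_lrn$param_set$values$branch.selection = "nothing"
alt_lrn$train(task)
```

Since `branch.selection` is an ordinary hyperparameter, the choice of path can be tuned in the same way as any other parameter of the pipeline.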

## Documentation

A good way to get started with `mlr3pipelines` is to read the following two chapters of the mlr3 book:

* [Sequential Pipelines](https://mlr3book.mlr-org.com/chapters/chapter7/sequential_pipelines.html)
* [Non-Sequential Pipelines and Tuning](https://mlr3book.mlr-org.com/chapters/chapter8/non-sequential_pipelines_and_tuning.html)

## Bugs, Questions, Feedback

*mlr3pipelines* is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an "issue" about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a "minimal working example" that showcases the behaviour (but don't worry about this if the bug is obvious).

Please understand that the resources of the project are limited: responses may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

## Citing mlr3pipelines

If you use mlr3pipelines, please cite our [JMLR article](https://jmlr.org/papers/v22/21-0281.html):

```{r echo = FALSE, comment = ""}
toBibtex(citation("mlr3pipelines"))
```

## Similar Projects

A predecessor to this package is the [*mlrCPO*](https://github.com/mlr-org/mlrCPO) package, which works with *mlr* 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a domain-specific language for machine learning are the *[caret](https://github.com/topepo/caret)* package, the related *[recipes](https://recipes.tidymodels.org/)* project, and the *[dplyr](https://github.com/tidyverse/dplyr)* package.

Owner

  • Name: mlr-org
  • Login: mlr-org
  • Kind: organization
  • Location: Munich, Germany

GitHub Events

Total
  • Create event: 52
  • Release event: 4
  • Issues event: 76
  • Watch event: 7
  • Delete event: 41
  • Issue comment event: 106
  • Push event: 454
  • Pull request review event: 57
  • Pull request review comment event: 65
  • Pull request event: 100
  • Fork event: 2
Last Year
  • Create event: 52
  • Release event: 4
  • Issues event: 76
  • Watch event: 7
  • Delete event: 41
  • Issue comment event: 106
  • Push event: 454
  • Pull request review event: 57
  • Pull request review comment event: 65
  • Pull request event: 100
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 2,755
  • Total Committers: 28
  • Avg Commits per committer: 98.393
  • Development Distribution Score (DDS): 0.513
Past Year
  • Commits: 508
  • Committers: 9
  • Avg Commits per committer: 56.444
  • Development Distribution Score (DDS): 0.445
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| mb706 | m****r@m****m | 1,341 |
| pfistfl | p****f@g****m | 346 |
| kenomersmannPC | a****r@g****m | 282 |
| Michel Lang | m****g@g****m | 207 |
| sumny | l****h@w****e | 184 |
| pat-s | p****z@g****m | 70 |
| Sebastian Fischer | s****r@g****m | 69 |
| Zygmunt Zawadzki | z****t@z****l | 53 |
| Bernd Bischl | b****l@g****t | 47 |
| susanne-207 | d****e@g****m | 40 |
| Travis | | 30 |
| Marc Becker | m****r@p****e | 14 |
| Maximilian Muecke | m****n@g****m | 10 |
| Stefan Coors | s****s@g****t | 10 |
| Lona | l****s@g****m | 10 |
| dependabot[bot] | 4****] | 9 |
| github-actions[bot] | 4****] | 9 |
| Janek Thomas | j****s@w****e | 5 |
| Patrick Rockenschaub | p****5@u****k | 5 |
| Vitaly Polisky | v****y@p****e | 3 |
| ZackBarry | z****3@g****m | 2 |
| Jakob Richter | c****e@j****e | 2 |
| GitHub | n****y@g****m | 2 |
| Carson Zhang | c****4@g****m | 1 |
| Darío Hereñú | m****a@g****m | 1 |
| Michael Chirico | m****4@g****m | 1 |
| damirpolat | d****v@u****u | 1 |
| RustyLongbow | f****p@h****m | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 193
  • Total pull requests: 199
  • Average time to close issues: over 1 year
  • Average time to close pull requests: about 1 month
  • Total issue authors: 48
  • Total pull request authors: 14
  • Average comments per issue: 1.59
  • Average comments per pull request: 0.68
  • Merged pull requests: 146
  • Bot issues: 0
  • Bot pull requests: 12
Past Year
  • Issues: 51
  • Pull requests: 103
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 22 days
  • Issue authors: 15
  • Pull request authors: 8
  • Average comments per issue: 0.76
  • Average comments per pull request: 0.6
  • Merged pull requests: 76
  • Bot issues: 0
  • Bot pull requests: 7
Top Authors
Issue Authors
  • mb706 (81)
  • sebffischer (17)
  • advieser (11)
  • mllg (10)
  • bblodfon (7)
  • pfistfl (6)
  • m-muecke (6)
  • berndbischl (4)
  • invain1218 (3)
  • nipnipj (3)
  • be-marc (3)
  • pat-s (3)
  • FlorianPargent (2)
  • jpconnel (2)
  • a-hanf (1)
Pull Request Authors
  • advieser (82)
  • mb706 (72)
  • sebffischer (33)
  • m-muecke (30)
  • dependabot[bot] (21)
  • mllg (8)
  • pat-s (6)
  • lona-k (5)
  • sumny (5)
  • be-marc (4)
  • cxzhang4 (2)
  • MichaelChirico (2)
  • JHarrisonEcoEvo (1)
  • pfistfl (1)
  • damirpolat (1)
Top Labels
Issue Labels
  • workshop (33)
  • Status: Needs Design (16)
  • Status: Needs Discussion (13)
  • Type: New PipeOp (12)
  • Tag: POFU (12)
  • Status: Contrib (prepared) (9)
  • Type: Enhancement (9)
  • Tag: Graph Transparency (9)
  • Type: Bug (9)
  • Priority: Low (7)
  • Type: Documentation (6)
  • Priority: Medium (6)
  • Status: Contrib (unprepared) (5)
  • Priority: High (4)
  • multipredict (3)
  • Priority: Critical (3)
  • Type: UI Revision (2)
  • Status: Blocked (2)
  • Breaking Changes (2)
  • Status: Available (2)
  • Type: Maintenance (2)
  • Effort: Simple (2)
  • Status: On Hold (1)
  • Status: In Progress (1)
  • Status: Needs Tests (1)
  • Type: Question (1)
  • Tag: NLP (1)
  • predict time state change (1)
  • feature_info_propagation (1)
  • importance (1)
Pull Request Labels
  • dependencies (21)
  • Status: Review Needed (6)
  • Status: Needs Discussion (5)
  • Status: Blocked (4)
  • Status: Completed (1)
  • Status: Revision Needed (1)
  • Status: On Hold (1)
  • github_actions (1)
  • workshop (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 6,314 last-month
  • Total docker downloads: 42,239
  • Total dependent packages: 17
    (may contain duplicates)
  • Total dependent repositories: 34
    (may contain duplicates)
  • Total versions: 57
  • Total maintainers: 1
proxy.golang.org: github.com/mlr-org/mlr3pipelines
  • Versions: 28
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
cran.r-project.org: mlr3pipelines

Preprocessing Operators and Pipelines for 'mlr3'

  • Versions: 29
  • Dependent Packages: 17
  • Dependent Repositories: 34
  • Downloads: 6,314 Last month
  • Docker Downloads: 42,239
Rankings
Stargazers count: 3.2%
Forks count: 3.3%
Dependent packages count: 3.8%
Dependent repos count: 4.5%
Downloads: 7.8%
Average: 8.1%
Docker downloads count: 26.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.1.0 depends
  • R6 * imports
  • backports * imports
  • checkmate * imports
  • data.table * imports
  • digest * imports
  • lgr * imports
  • mlr3 >= 0.6.0 imports
  • mlr3misc >= 0.9.0 imports
  • paradox * imports
  • withr * imports
  • GenSA * suggests
  • MASS * suggests
  • NMF * suggests
  • bbotk >= 0.3.0 suggests
  • bestNormalize * suggests
  • evaluate * suggests
  • fastICA * suggests
  • future * suggests
  • ggplot2 * suggests
  • glmnet * suggests
  • igraph * suggests
  • kernlab * suggests
  • kknn * suggests
  • knitr * suggests
  • lme4 * suggests
  • methods * suggests
  • mlbench * suggests
  • mlr3filters >= 0.1.1 suggests
  • mlr3learners * suggests
  • mlr3measures * suggests
  • nloptr * suggests
  • quanteda * suggests
  • rmarkdown * suggests
  • rpart * suggests
  • smotefamily * suggests
  • stopwords * suggests
  • testthat * suggests
  • visNetwork * suggests
  • vtreat * suggests