arules

Mining Association Rules and Frequent Itemsets with R

https://github.com/mhahsler/arules

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

arules association-rules cran frequent-itemsets r
Last synced: 6 months ago

Repository

Mining Association Rules and Frequent Itemsets with R

Basic Info
Statistics
  • Stars: 195
  • Watchers: 14
  • Forks: 44
  • Open Issues: 2
  • Releases: 26
Created over 10 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---

```{r echo=FALSE, results = 'asis'}
pkg <- "arules"

source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg, anaconda = "r-arules", stackoverflow = "arules")
```

## Introduction

The arules package family for R provides the infrastructure for representing,
manipulating and analyzing transaction data and patterns
using [frequent itemsets and association rules](https://en.wikipedia.org/wiki/Association_rule_learning).
The package also provides a wide range of 
[interest measures](https://mhahsler.github.io/arules/docs/measures) and mining algorithms, including the code of
Christian Borgelt's popular and efficient C implementations of the association mining algorithms [Apriori](https://borgelt.net/apriori.html) and [Eclat](https://borgelt.net/eclat.html). In addition, the following mining algorithms are
available via [fim4r](https://borgelt.net/fim4r.html):

* Apriori
* Eclat
* Carpenter
* FPgrowth
* IsTa 
* RElim 
* SaM
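The level-wise search that Apriori performs can be illustrated with a short base-R sketch. This is not how arules works internally, because `apriori()` calls Borgelt's optimized C code. The toy transactions and the helper names `itemset_support()`, `itemset_key()`, and `apriori_sketch()` are hypothetical and exist only for this example. The sketch works in three steps. It builds candidate k-itemsets by joining frequent (k-1)-itemsets that share a prefix. It then prunes any candidate that has an infrequent subset (downward closure). Finally, it keeps the candidates that reach the minimum support.

```{r}
# Hypothetical toy data: each list element is one transaction (a set of items).
trans_list <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Support: fraction of transactions that contain every item of `itemset`.
itemset_support <- function(itemset, db) {
  sum(vapply(db, function(t) all(itemset %in% t), logical(1))) / length(db)
}

# String key for a sorted itemset, used for membership tests.
itemset_key <- function(s) paste(s, collapse = ",")

# Level-wise (Apriori-style) enumeration of all frequent itemsets.
apriori_sketch <- function(db, min_supp) {
  is_frequent <- function(s) itemset_support(s, db) >= min_supp
  # Level 1: frequent single items, kept as sorted one-element vectors.
  current <- Filter(is_frequent, as.list(sort(unique(unlist(db)))))
  result <- current
  k <- 2
  while (length(current) > 1) {
    # Join step: combine two frequent (k-1)-itemsets sharing the first k-2 items.
    candidates <- list()
    for (i in seq_along(current)) {
      for (j in seq_along(current)) {
        a <- current[[i]]
        b <- current[[j]]
        if (i < j && identical(a[seq_len(k - 2)], b[seq_len(k - 2)])) {
          candidates[[length(candidates) + 1]] <- sort(union(a, b))
        }
      }
    }
    # Prune step: every (k-1)-subset of a candidate must itself be frequent.
    frequent_keys <- vapply(current, itemset_key, character(1))
    all_subsets_frequent <- function(cand) {
      all(vapply(seq_along(cand),
                 function(m) itemset_key(cand[-m]) %in% frequent_keys,
                 logical(1)))
    }
    candidates <- Filter(all_subsets_frequent, candidates)
    # Count step: keep the candidates that reach the minimum support.
    current <- Filter(is_frequent, candidates)
    result <- c(result, current)
    k <- k + 1
  }
  data.frame(
    itemset = vapply(result, itemset_key, character(1)),
    support = vapply(result, itemset_support, numeric(1), db = db)
  )
}

res <- apriori_sketch(trans_list, min_supp = 0.6)
res
```

With `min_supp = 0.6`, the candidate `bread,diapers,milk` passes the prune step but appears in only 2 of 5 transactions, so it is dropped. Lowering the threshold to 0.4 would admit it, which shows how the minimum support controls the size of the result.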

Code examples can be found in
[Chapter 5 of the web book R Companion for Introduction to Data
Mining](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html).

```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2)
```

## Packages

### arules core packages

* [arules](https://cran.r-project.org/package=arules): arules base package with data structures, mining algorithms (Apriori and Eclat), and interest measures.
* [arulesViz](https://github.com/mhahsler/arulesViz): Visualization of association rules. 
* [arulesCBA](https://github.com/ianstenbit/arulesCBA): Classification algorithms based on association rules (includes CBA).  
* [arulesSequences](https://cran.r-project.org/package=arulesSequences): Mining frequent sequences (cSPADE).

### Other related packages

Additional mining algorithms 

* [arulesNBMiner](https://github.com/mhahsler/arulesNBMiner): Mining NB-frequent itemsets and NB-precise rules.
* [fim4r](https://borgelt.net/fim4r.html): Provides fast implementations for several mining algorithms. An interface function called `fim4r()` is provided in `arules`.
* [opusminer](https://cran.r-project.org/package=opusminer): OPUS Miner algorithm for finding the top-k productive, non-redundant itemsets. Call `opus()` with `format = 'itemsets'`.
* [RKEEL](https://cran.r-project.org/package=RKEEL): Interface to KEEL's association rule mining algorithms.
* [RSarules](https://cran.r-project.org/package=RSarules): Mining algorithm which randomly samples association rules with one pre-chosen item as the consequent from a transaction dataset.


In-database analytics

* [ibmdbR](https://cran.r-project.org/package=ibmdbR): IBM in-database analytics for R can calculate association rules from a database table.
* [rfml](https://cran.r-project.org/package=rfml): Mine frequent itemsets or association rules using a MarkLogic server. 

Interface

* [rattle](https://cran.r-project.org/package=rattle): Provides a graphical user interface for association rule mining.
* [pmml](https://cran.r-project.org/package=pmml): Generates PMML (predictive model markup language) for association rules.

Classification 

* [arc](https://cran.r-project.org/package=arc): Alternative CBA implementation. 
* [inTrees](https://cran.r-project.org/package=inTrees): Interpret Tree Ensembles. Provides functions for extracting, measuring, and pruning rules, selecting a compact rule set, and summarizing rules into a learner.
* [rCBA](https://cran.r-project.org/package=rCBA): Alternative CBA implementation.
* [qCBA](https://cran.r-project.org/package=qCBA): Quantitative Classification by Association Rules.
* [sbrl](https://cran.r-project.org/package=sbrl): Scalable Bayesian rule lists algorithm for classification.

Outlier Detection

* [fpmoutliers](https://cran.r-project.org/package=fpmoutliers): Frequent Pattern Mining Outliers.

Recommendation/Prediction

* [recommenderlab](https://github.com/mhahsler/recommenderlab): Supports creating predictions using association rules.


```{r echo=FALSE, results = 'asis'}
pkg_usage(pkg)
```

```{r echo=FALSE, results = 'asis'}
pkg_install(pkg)
```

## Usage

Load the package and mine some association rules.
```{r }
library("arules")
data("IncomeESL")

trans <- transactions(IncomeESL)
trans

rules <- apriori(trans, supp = 0.1, conf = 0.9, target = "rules")
```

Inspect the rules with the highest lift.
```{r }
inspect(head(rules, n = 3, by = "lift"))
```
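The support, confidence, and lift columns in this output follow the standard definitions. For a rule X => Y, support is supp(X ∪ Y), confidence is supp(X ∪ Y) / supp(X), and lift is confidence / supp(Y). The base-R sketch below recomputes these values by hand for the rule {a} => {b} on a small incidence matrix. The matrix and the helper `support_of()` are hypothetical and illustrative only, and the sketch does not use arules.

```{r}
# Hypothetical incidence matrix: rows are transactions, columns are items.
m <- matrix(
  c(TRUE,  TRUE,  FALSE,
    TRUE,  TRUE,  TRUE,
    TRUE,  TRUE,  FALSE,
    FALSE, FALSE, TRUE,
    FALSE, TRUE,  TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("a", "b", "c"))
)

# Fraction of transactions (rows) that contain all of the given items.
support_of <- function(items) mean(apply(m[, items, drop = FALSE], 1, all))

lhs <- "a"
rhs <- "b"
rule_support <- support_of(c(lhs, rhs))          # supp({a, b}) = 3/5 = 0.6
confidence   <- rule_support / support_of(lhs)   # 0.6 / 0.6 = 1
lift         <- confidence / support_of(rhs)     # 1 / 0.8 = 1.25
c(support = rule_support, confidence = confidence, lift = lift)
```

A lift above 1 means that `a` and `b` co-occur more often than would be expected if they were independent. For rules mined with arules, the same quantities are stored in the rules' quality data and can be retrieved with `quality(rules)`.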

## Using arules with tidyverse

`arules` works seamlessly with [tidyverse](https://www.tidyverse.org/). For example: 

* `dplyr` can be used for cleaning and preparing the transactions.
* `transactions()` and other functions accept tibbles as input.
* Functions in arules can be connected with the pipe operator `|>`.
* [arulesViz](https://github.com/mhahsler/arulesViz) provides visualizations based on `ggplot2`.

For example, we can remove the ethnic information column before creating transactions and then mine and inspect rules.
```{r }
library("tidyverse")
library("arules")
data("IncomeESL")

trans <- IncomeESL |>
  select(-`ethnic classification`) |>
  transactions()
rules <- trans |>
  apriori(
    supp = 0.1, conf = 0.9, target = "rules",
    control = list(verbose = FALSE)
  )
rules |>
  head(3, by = "lift") |>
  as("data.frame") |>
  as_tibble()
```
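Transaction data often arrives in long format, with one row per (transaction ID, item) pair, and needs to be regrouped before mining. The base-R sketch below shows this preparation step. The data frame `long` and its column names `tid` and `item` are hypothetical. The sketch does not load arules, so it runs without the package installed.

```{r}
# Hypothetical long-format purchase data: one row per (transaction, item) pair.
long <- data.frame(
  tid  = c(1, 1, 2, 2, 2, 3, 3),
  item = c("bread", "milk", "bread", "beer", "bread", "milk", "eggs")
)

# A transaction is a set, so duplicate (tid, item) rows carry no information.
long <- unique(long)

# Group items by transaction ID: a named list with one character vector each.
trans_list <- split(long$item, long$tid)

# Sort items within each transaction for a stable, readable representation.
trans_list <- lapply(trans_list, sort)
trans_list
```

With arules loaded, a list of character vectors like this can be coerced with `as(trans_list, "transactions")`.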

## Using arules from Python

`arules` and `arulesViz` can now be used directly from Python with the Python
package [`arulespy`](https://pypi.org/project/arulespy/), available from PyPI.

## Support

Please report bugs [here on GitHub.](https://github.com/mhahsler/arules/issues)
Questions should be posted on [Stack Overflow and tagged with arules](https://stackoverflow.com/questions/tagged/arules).


## References

* Michael Hahsler. [ARULESPY: Exploring association rules and frequent itemsets in 
  Python.](http://dx.doi.org/10.48550/arXiv.2305.15263) arXiv:2305.15263 [cs.DB], May 2023.
* Michael Hahsler. [An R Companion for Introduction to Data Mining: Chapter 5](https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/association-analysis-basic-concepts-and-algorithms.html), 2021, URL: https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/
* Michael Hahsler. [A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules](https://mhahsler.github.io/arules/docs/measures), 2015, URL: https://mhahsler.github.io/arules/docs/measures.
* Michael Hahsler, Sudheer Chelluboina, Kurt Hornik, and Christian Buchta. [The arules R-package ecosystem: Analyzing interesting patterns from large transaction datasets.](https://jmlr.csail.mit.edu/papers/v12/hahsler11a.html) _Journal of Machine Learning Research,_ 12:1977-1981, 2011.
* Michael Hahsler, Bettina Grün, and Kurt Hornik. [arules - A Computational Environment for Mining Association Rules and Frequent Item Sets.](https://dx.doi.org/10.18637/jss.v014.i15) _Journal of Statistical Software,_ 14(15), 2005.

Owner

  • Name: Michael Hahsler
  • Login: mhahsler
  • Kind: user
  • Location: Dallas, TX
  • Company: SMU

I develop packages for AI, ML, and Data Science.

GitHub Events

Total
  • Create event: 4
  • Release event: 2
  • Issues event: 3
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 12
  • Push event: 13
  • Pull request event: 4
  • Fork event: 2
Last Year
  • Create event: 4
  • Release event: 2
  • Issues event: 3
  • Watch event: 4
  • Delete event: 2
  • Issue comment event: 12
  • Push event: 13
  • Pull request event: 4
  • Fork event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 478
  • Total Committers: 4
  • Avg Commits per committer: 119.5
  • Development Distribution Score (DDS): 0.006
Past Year
  • Commits: 27
  • Committers: 1
  • Avg Commits per committer: 27.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Michael Hahsler (m****l@h****t): 475 commits
  • igorkf (4****f): 1 commit
  • Makh2018 (6****8): 1 commit
  • Ian Johnson (i****n): 1 commit

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 74
  • Total pull requests: 12
  • Average time to close issues: about 2 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 55
  • Total pull request authors: 7
  • Average comments per issue: 2.61
  • Average comments per pull request: 3.5
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 3
  • Average time to close issues: 6 days
  • Average time to close pull requests: 2 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 2.0
  • Average comments per pull request: 4.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sjain777 (7)
  • mhahsler (4)
  • leih123 (3)
  • mytarmail (2)
  • javiercoh (2)
  • cornejom (2)
  • kliegr (2)
  • gdbassett (2)
  • clcazer (2)
  • jasperDD (2)
  • vrodriguezf (2)
  • galadrielbriere (1)
  • bachnguyen-tomo (1)
  • g-cloud9 (1)
  • petamva (1)
Pull Request Authors
  • makhloufledmi (3)
  • mhahsler (3)
  • ianstenbit (2)
  • MichaelChirico (2)
  • igorkf (1)
  • smurfit89 (1)
  • ArnoCo (1)
Top Labels
Issue Labels
question (25) bug (24) invalid (5) enhancement (4) help wanted (2) unconfirmed (1)
Pull Request Labels
bug (2)

Packages

  • Total packages: 2
  • Total downloads:
    • cran: 8,971 last month
  • Total docker downloads: 43,430
  • Total dependent packages: 35
    (may contain duplicates)
  • Total dependent repositories: 76
    (may contain duplicates)
  • Total versions: 98
  • Total maintainers: 1
cran.r-project.org: arules

Mining Association Rules and Frequent Itemsets

  • Versions: 85
  • Dependent Packages: 35
  • Dependent Repositories: 75
  • Downloads: 8,971 Last month
  • Docker Downloads: 43,430
Rankings
Forks count: 1.8%
Dependent packages count: 2.3%
Stargazers count: 2.3%
Downloads: 2.5%
Dependent repos count: 2.7%
Average: 6.2%
Docker downloads count: 25.8%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-arules
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Dependent repos count: 24.4%
Forks count: 27.4%
Stargazers count: 27.9%
Average: 32.8%
Dependent packages count: 51.6%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • Matrix >= 1.4 depends
  • R >= 4.0.0 depends
  • generics * imports
  • graphics * imports
  • methods * imports
  • stats * imports
  • utils * imports
  • XML * suggests
  • arulesCBA * suggests
  • arulesViz * suggests
  • pmml * suggests
  • proxy * suggests
  • testthat * suggests