stream
A framework for data stream modeling and associated data mining tasks such as clustering and classification. - R Package
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 6 committers (16.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.6%) to scientific vocabulary
Keywords
data-stream-clustering
datastream
stream-mining
Last synced: 6 months ago
Repository
Statistics
- Stars: 41
- Watchers: 5
- Forks: 8
- Open Issues: 1
- Releases: 11
- Created: over 10 years ago
- Last pushed: 12 months ago
Metadata Files
README.Rmd
---
output: github_document
---
```{r echo=FALSE, results = 'asis'}
pkg <- 'stream'
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
```
## Introduction
The package provides support for modeling and simulating data streams, as well as an extensible framework for implementing, interfacing, and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it integrates seamlessly with the large existing infrastructure provided by R. The package provides:
* **Stream Sources:** streaming from files, databases, in-memory data, URLs, pipes,
  socket connections and several data stream generators, including
  dynamic streams with concept drift.
* **Stream Processing** with filters (convolution, scaling, exponential moving average, ...)
* **Stream Aggregation:** sampling, windowing.
* **Stream Clustering:** **BICO**, **BIRCH**, **D-Stream**, **DBSTREAM**, and **evoStream**.
* **Stream Outlier Detection** based on **D-Stream** and **DBSTREAM**.
* **Stream Classification** with **DecisionStumps**, **HoeffdingTree**, **NaiveBayes**
and **Ensembles** (streamMOA via RMOA).
* **Stream Regression** with **Perceptron**, **FIMTDD**, **ORTO**, ... (streamMOA via RMOA).
* **Stream Mining Evaluation** with prequential error estimation.
Additional packages in the stream family are:
* [streamConnect](https://github.com/mhahsler/streamConnect): Connect stream mining
components using sockets and web services.
* [streamMOA](https://github.com/mhahsler/streamMOA): Interface to clustering
algorithms implemented in the [MOA](https://moa.cms.waikato.ac.nz/) framework.
The package interfaces clustering algorithms like **DenStream**, **ClusTree**,
**CluStream** and **MCOD**.
The package also provides an interface to [RMOA](https://github.com/jwijffels/RMOA) for
MOA's stream classifiers and stream regression models.
* [rEMM](https://github.com/mhahsler/rEMM): Provides implementations of
**threshold nearest neighbor clustering** (tNN) and
**Extensible Markov Model** (EMM) for modelling temporal relationships between clusters.
```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2)
pkg_install(pkg)
```
## Usage
```{r echo=FALSE}
options(digits = 3)
```
Load the package, create a random data stream with 3 Gaussian clusters and 10\% noise, and scale the data to z-scores.
```{r stream}
library("stream")
set.seed(2000)
stream <- DSD_Gaussians(k = 3, d = 2, noise = .1) %>% DSF_Scale()
get_points(stream, n = 5)
plot(stream)
```
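For reference, scaling to z-scores means subtracting each dimension's mean and dividing by its standard deviation. The following base R sketch shows only that arithmetic on a small fixed matrix. The names `zscore_points` and `zscore()` are illustrative and not part of stream, and how `DSF_Scale()` estimates these statistics from a stream is not shown here.

```{r zscore_sketch}
# Toy data: 4 points in 2 dimensions (hypothetical values for illustration).
zscore_points <- cbind(c(1, 2, 3, 4), c(10, 20, 30, 40))

# Center each column by its mean, then divide by its standard deviation.
zscore <- function(x) {
  centered <- sweep(x, 2, colMeans(x), "-")
  sweep(centered, 2, apply(x, 2, sd), "/")
}

zscore_scaled <- zscore(zscore_points)
colMeans(zscore_scaled)      # column means after scaling
apply(zscore_scaled, 2, sd)  # column standard deviations after scaling
```

On a fixed, fully available matrix this matches base R's `scale()`. A stream filter cannot see all points up front, so it has to work with estimates instead.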
Cluster a stream of 1000 points using D-Stream, which estimates point density in grid cells.
```{r Dstream}
dsc <- DSC_DStream(gridsize = .1)
update(dsc, stream, 1000)
plot(dsc, stream, grid = TRUE)
```
```{r Dstream_eval}
evaluate_static(dsc, stream, n = 100)
```
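To make the idea of density in grid cells concrete, the base R sketch below counts how many points fall into each cell of a regular grid. It is a simplified illustration of the grid idea only, not the D-Stream algorithm as implemented in the package. `grid_counts()` and `grid_points` are hypothetical names introduced for this example.

```{r grid_density_sketch}
# Count how many points fall into each cell of a regular grid.
grid_counts <- function(points, gridsize) {
  cells <- floor(points / gridsize)                  # integer cell coordinates
  cell_id <- apply(cells, 1, paste, collapse = ":")  # one label per point
  table(cell_id)
}

# Three toy points: the first two share a cell, the third is in a neighbor cell.
grid_points <- rbind(c(0.01, 0.02), c(0.05, 0.09), c(0.15, 0.02))
grid_density <- grid_counts(grid_points, gridsize = 0.1)
grid_density
```

A smaller `gridsize` gives finer cells with fewer points each, which is the trade-off the `gridsize` argument controls in the example above.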
Detect outliers using DBSTREAM, which uses micro-clusters with a given radius.
```{r DSOutlier_DBSTREAM}
dso <- DSOutlier_DBSTREAM(r = .1)
update(dso, stream, 1000)
plot(dso, stream)
```
```{r DSO_eval}
evaluate_static(dso, stream, n = 100, measure = c("numPoints", "noiseActual", "noisePredicted", "noisePrecision"))
```
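The role of the radius can be illustrated with a minimal base R sketch: each point joins the nearest existing micro-cluster if it lies within radius `r`, and otherwise opens a new one. This is a deliberately simplified assumption-laden illustration. DBSTREAM itself is more involved (for example, it also moves centers and fades weights over time), and `assign_by_radius()` and `radius_points` are hypothetical names, not part of stream.

```{r radius_sketch}
# Assign each point to the nearest existing center within radius r,
# or open a new micro-cluster at the point if none is close enough.
# Simplification: centers stay fixed once created.
assign_by_radius <- function(points, r) {
  centers <- matrix(numeric(0), ncol = ncol(points))
  labels <- integer(nrow(points))
  for (i in seq_len(nrow(points))) {
    p <- points[i, ]
    if (nrow(centers) > 0) {
      # Euclidean distance from p to every existing center
      d <- sqrt(rowSums(sweep(centers, 2, p, "-")^2))
      nearest <- which.min(d)
      if (d[nearest] <= r) {
        labels[i] <- nearest
        next
      }
    }
    centers <- rbind(centers, p, deparse.level = 0)
    labels[i] <- nrow(centers)
  }
  list(centers = centers, labels = labels)
}

radius_points <- rbind(c(0, 0), c(0.05, 0), c(1, 1))
radius_result <- assign_by_radius(radius_points, r = 0.1)
radius_result$labels

# In this sketch, micro-clusters that absorbed only one point are
# treated as outlier candidates.
which(tabulate(radius_result$labels) == 1)
```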
Prepare complete stream processing pipelines that can be run using a single `update()` call.
```{r pipeline}
pipeline <- DSD_Gaussians(k = 3, d = 2, noise = .1) %>%
  DSF_Scale() %>%
  DST_Runner(DSC_DStream(gridsize = .1))
pipeline
update(pipeline, n = 500)
pipeline$dst
```
## Acknowledgments
The development of the stream package was supported in part by NSF IIS-0948893, NSF CMMI 1728612, and NIH R21HG005912.
## References
* Michael Hahsler, Matthew Bolaños, and John Forrest. [stream: An extensible framework for data stream clustering research with R.](https://dx.doi.org/10.18637/jss.v076.i14) _Journal of Statistical Software,_ 76(14), February 2017.
* [stream package vignette](https://cran.r-project.org/package=stream/vignettes/stream.pdf) with complete examples.
* [stream reference manual](https://cran.r-project.org/package=stream/stream.pdf)
Owner
- Name: Michael Hahsler
- Login: mhahsler
- Kind: user
- Location: Dallas, TX
- Company: SMU
- Website: http://michael.hahsler.net
- Repositories: 32
- Profile: https://github.com/mhahsler
I develop packages for AI, ML, and Data Science.
GitHub Events
Total
- Release event: 1
- Watch event: 4
- Push event: 1
- Create event: 1
Last Year
- Release event: 1
- Watch event: 4
- Push event: 1
- Create event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Hahsler | m****l@h****t | 171 |
| Maximilian Muecke | m****n@g****m | 10 |
| Matthias Carnein | M****n | 4 |
| Dennis Assenmacher | d****r@w****e | 2 |
| dinarior | d****r@g****m | 1 |
| Dalibor Krleža | 3****a | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 11
- Total pull requests: 19
- Average time to close issues: about 1 month
- Average time to close pull requests: 14 days
- Total issue authors: 9
- Total pull request authors: 5
- Average comments per issue: 1.45
- Average comments per pull request: 2.58
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 3 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- MatthiasCarnein (2)
- m-muecke (2)
- mhahsler (1)
- YashSunidhi (1)
- ozlempoyraz (1)
- mik3hall (1)
- dinarior (1)
- Dennis1989 (1)
- shimon166 (1)
Pull Request Authors
- m-muecke (18)
- MatthiasCarnein (5)
- dkrleza (2)
- Dennis1989 (1)
- dinarior (1)
Top Labels
Issue Labels
question (4)
bug (3)
Packages
- Total packages: 1
- Total downloads: 1,198 last month (cran)
- Total dependent packages: 3
- Total dependent repositories: 8
- Total versions: 28
- Total maintainers: 1
cran.r-project.org: stream
Infrastructure for Data Stream Mining
- Homepage: https://github.com/mhahsler/stream
- Documentation: http://cran.r-project.org/web/packages/stream/stream.pdf
- License: GPL-3
- Latest release: 2.0-3 (published 12 months ago)
Rankings
Forks count: 7.9%
Stargazers count: 8.4%
Dependent repos count: 10.5%
Dependent packages count: 10.9%
Average: 11.9%
Downloads: 21.8%
Dependencies
DESCRIPTION
cran
| Package | Version | Type |
|---|---|---|
| R | >= 3.5.0 | depends |
| magrittr | * | depends |
| methods | * | depends |
| proxy | >= 0.4 | depends |
| MASS | * | imports |
| Rcpp | >= 0.11.4 | imports |
| clue | * | imports |
| cluster | * | imports |
| clusterGeneration | * | imports |
| dbscan | >= 1.0 | imports |
| fpc | * | imports |
| grDevices | * | imports |
| graphics | * | imports |
| mlbench | * | imports |
| stats | * | imports |
| utils | * | imports |
| DBI | * | suggests |
| RSQLite | * | suggests |
| animation | * | suggests |
| dplyr | * | suggests |
| knitr | * | suggests |
| rJava | * | suggests |
| testthat | * | suggests |