Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.4%) to scientific vocabulary
Keywords
clustering
data-stream
sequence-analysis
Last synced: 6 months ago
Repository
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme
Changelog
README.Rmd
---
output: github_document
---
```{r echo=FALSE, results = 'asis'}
pkg <- 'rEMM'
source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
```
Implements TRACDS (Temporal Relationships
between Clusters for Data Streams), a generalization of
the Extensible Markov Model (EMM),
to model transition probabilities in sequence data. TRACDS adds a temporal or order model
to data stream clustering by superimposing a dynamically adapting
Markov Chain. Also provides an implementation of EMM (TRACDS on top of tNN
data stream clustering).
Interface classes DSC_tNN and DSC_EMM for the [stream package](https://github.com/mhahsler/stream) are provided.
```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2L)
pkg_install(pkg)
```
## Usage
We use an artificial dataset with a mixture of four cluster components. Points are generated using a fixed sequence
<1,2,1,3,4> through the four clusters. The lines in the plot below indicate the sequence.
```{r example_data}
library(rEMM)
data("EMMsim")
plot(EMMsim_train, pch = NA)
lines(EMMsim_train, col = "gray")
points(EMMsim_train, pch = EMMsim_sequence_train)
```
EMM recovers the components and the sequence information. We use EMM and then recluster the found structure assuming
that we know that there are 4 components. The graph below represents a Markov model of the found sequence.
```{r example_model}
emm <- EMM(threshold = 0.1, measure = "euclidean")
build(emm, EMMsim_train)
emmc <- recluster_hclust(emm, k = 4, method = "average")
plot(emmc)
```
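To make the idea of a Markov model over clusters more concrete, the following base R sketch counts transitions between consecutive cluster assignments and normalizes each row into transition probabilities. It is only an illustration under stated assumptions: the `assignments` vector is made up for this example, and the code does not use or reproduce rEMM's internal implementation.

```r
# Hypothetical cluster assignments for a stream of points, in arrival order.
# The values follow the repeating pattern 1, 2, 1, 3, 4 (illustrative only).
assignments <- c(1, 2, 1, 3, 4, 1, 2, 1, 3, 4)
k <- 4  # number of clusters (assumed known here)

# Count how often cluster `from` is directly followed by cluster `to`.
counts <- matrix(0, nrow = k, ncol = k)
for (i in seq_len(length(assignments) - 1)) {
  from <- assignments[i]
  to <- assignments[i + 1]
  counts[from, to] <- counts[from, to] + 1
}

# Normalize each row so that it sums to 1 (rows without outgoing
# transitions are left as all zeros).
row_totals <- rowSums(counts)
P <- counts / ifelse(row_totals > 0, row_totals, 1)

print(P)
```

In this toy example, cluster 1 is followed by cluster 2 and by cluster 3 equally often, so both transitions receive probability 0.5, while every other observed transition has probability 1.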
We can now score new sequences (we use a test sequence created in the same way as the training data) by calculating the product of the transition probabilities in the model. A high score indicates that the new sequence is consistent with the learned model.
```{r}
score(emmc, EMMsim_test)
```
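The idea of multiplying transition probabilities along a sequence can be sketched in base R as follows. This is a simplified illustration, not the computation performed by `score()`: the transition matrix `P` and the helper `sequence_product()` are hypothetical names introduced only for this example.

```r
# Hypothetical transition matrix for four states (rows: from, columns: to).
P <- matrix(0, nrow = 4, ncol = 4)
P[1, 2] <- 0.5
P[1, 3] <- 0.5
P[2, 1] <- 1
P[3, 4] <- 1
P[4, 1] <- 1

# Hypothetical helper: product of the transition probabilities
# along a sequence of state indices.
sequence_product <- function(P, states) {
  from <- states[-length(states)]  # all states except the last
  to <- states[-1]                 # all states except the first
  prod(P[cbind(from, to)])         # look up each (from, to) pair
}

# A sequence that follows the learned pattern.
sequence_product(P, c(1, 2, 1, 3, 4, 1))

# A sequence containing a transition that was never observed (1 -> 4).
sequence_product(P, c(1, 4, 1))
```

The first call multiplies 0.5, 1, 0.5, 1 and 1 and therefore yields 0.25; the second yields 0 because the transition from state 1 to state 4 has probability 0 in this toy matrix.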
# References
* Michael Hahsler and Margaret H. Dunham.
[rEMM: Extensible Markov model for data stream clustering in R.](http://dx.doi.org/10.18637/jss.v035.i05)
_Journal of Statistical Software,_ 35(5):1-31, 2010.
* Michael Hahsler and Margaret H. Dunham.
[Temporal structure learning for clustering massive data
streams in real-time](https://doi.org/10.1137/1.9781611972818.57).
In _SIAM Conference on Data Mining (SDM11),_ pages 664--675. SIAM, April 2011.
# Acknowledgements
Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from
the National Human Genome Research Institute.
Owner
- Name: Michael Hahsler
- Login: mhahsler
- Kind: user
- Location: Dallas, TX
- Company: SMU
- Website: http://michael.hahsler.net
- Repositories: 32
- Profile: https://github.com/mhahsler
I develop packages for AI, ML, and Data Science.
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michael Hahsler | m****l@h****t | 26 |
Committer Domains (Top 20 + Academic)
hahsler.net: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads: 726 last month (cran)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 16
- Total maintainers: 1
cran.r-project.org: rEMM
Extensible Markov Model for Modelling Temporal Relationships Between Clusters
- Homepage: https://github.com/mhahsler/rEMM
- Documentation: http://cran.r-project.org/web/packages/rEMM/rEMM.pdf
- License: GPL-2
- Latest release: 1.2.1 (published almost 2 years ago)
Rankings
Forks count: 28.8%
Dependent packages count: 29.8%
Stargazers count: 31.7%
Average: 32.1%
Downloads: 34.9%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 2.10.0 depends
- igraph * depends
- proxy >= 0.4 depends
- MASS * imports
- cluster * imports
- clusterGeneration * imports
- methods * imports
- stats * imports
- stream * imports
- utils * imports
- Rgraphviz * suggests
- graph * suggests
- testthat * suggests