Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Keywords

clustering data-stream sequence-analysis
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mhahsler
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 536 KB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 4 years ago · Last pushed 7 months ago
Metadata Files

README.Rmd

---
output: github_document
---

```{r echo=FALSE, results = 'asis'}
pkg <- 'rEMM'

source("https://raw.githubusercontent.com/mhahsler/pkg_helpers/main/pkg_helpers.R")
pkg_title(pkg)
```


Implements TRACDS (Temporal Relationships between Clusters for Data Streams),
a generalization of the Extensible Markov Model (EMM), to model transition
probabilities in sequence data. TRACDS adds a temporal or order model to data
stream clustering by superimposing a dynamically adapting Markov chain. The
package also provides an implementation of EMM (TRACDS on top of tNN data
stream clustering).

The package provides the interface classes `DSC_tNN` and `DSC_EMM` for the [stream package](https://github.com/mhahsler/stream).


```{r echo=FALSE, results = 'asis'}
pkg_citation(pkg, 2L)
pkg_install(pkg)
```

## Usage

We use an artificial dataset with a mixture of four cluster components. Points are generated using a fixed sequence
<1,2,1,3,4> through the four clusters. The gray lines in the plot below indicate the sequence.

```{r example_data}
library(rEMM)

data("EMMsim")

plot(EMMsim_train, pch = NA)
lines(EMMsim_train, col = "gray")
points(EMMsim_train, pch = EMMsim_sequence_train)
```
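To make the generating process concrete, the following base R sketch draws points by cycling through the fixed sequence. It is only an illustration: the cluster centers, the noise level `sd = 0.05`, and the names `centers`, `pattern`, `labels`, and `pts` are assumptions and do not reproduce how `EMMsim` was actually created.

```r
set.seed(42)  # make the illustration reproducible

# Hypothetical 2-D cluster centers (one row per cluster); not the EMMsim centers.
centers <- rbind(c(0.2, 0.2), c(0.8, 0.2), c(0.2, 0.8), c(0.8, 0.8))

# Cycle through the fixed sequence <1,2,1,3,4> to assign a cluster to each point.
pattern <- c(1, 2, 1, 3, 4)
labels <- rep(pattern, length.out = 50)

# Each point is its cluster center plus Gaussian noise (sd is an assumed value).
noise <- matrix(rnorm(2 * length(labels), sd = 0.05), ncol = 2)
pts <- centers[labels, ] + noise

head(labels, 10)
# [1] 1 2 1 3 4 1 2 1 3 4
```

Plotting `pts` with `lines(pts, col = "gray")` would give a picture similar in spirit to the one above.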

EMM recovers the components and the sequence information. We build an EMM and then recluster the discovered structure,
assuming that we know there are four components. The graph below represents a Markov model of the recovered sequence.

```{r example_model}
emm <- EMM(threshold = 0.1, measure = "euclidean")
build(emm, EMMsim_train)
emmc <- recluster_hclust(emm, k = 4, method = "average")
plot(emmc)
```
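The Markov model is built from the transitions between consecutive cluster assignments. The base R sketch below shows that idea on a hypothetical label sequence `states`. It is not rEMM's implementation, which maintains the clustering and the chain incrementally as data arrive; the names `states`, `counts`, and `P_hat` are introduced here for illustration only.

```r
# Hypothetical sequence of cluster assignments following the pattern <1,2,1,3,4>.
states <- rep(c(1, 2, 1, 3, 4), times = 20)

# Count transitions between consecutive states.
from <- factor(head(states, -1), levels = 1:4)
to <- factor(tail(states, -1), levels = 1:4)
counts <- table(from, to)

# Normalize each row to obtain estimated transition probabilities.
P_hat <- prop.table(counts, margin = 1)
round(P_hat, 2)
```

In this sketch, state 1 moves to states 2 and 3 with probability 0.5 each, while every other state has a single successor.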

We can now score new sequences by calculating the product of the transition probabilities in the model. Here we use a test sequence created in the same way as the training data, and the high score indicates that it follows the same pattern.

```{r}
score(emmc, EMMsim_test)
```
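The idea of scoring by a product of transition probabilities can be sketched in base R. The transition matrix `P` and the helper `score_product()` below are hypothetical and exist only for this illustration; rEMM's `score()` first maps data points to clusters and may normalize the result, so its value can differ from this raw product.

```r
# Hypothetical 4-state transition matrix (rows sum to 1); values are illustrative.
P <- matrix(c(
  0.0, 0.5, 0.5, 0.0,
  1.0, 0.0, 0.0, 0.0,
  0.0, 0.0, 0.0, 1.0,
  1.0, 0.0, 0.0, 0.0
), nrow = 4, byrow = TRUE)

# Multiply the probabilities of all consecutive transitions in a state sequence.
score_product <- function(P, states) {
  from <- head(states, -1)
  to <- tail(states, -1)
  prod(P[cbind(from, to)])
}

score_product(P, c(1, 2, 1, 3, 4, 1))  # follows the pattern
# [1] 0.25
score_product(P, c(1, 4, 2, 3))        # contains transitions the model never saw
# [1] 0
```

A sequence that uses a transition with probability zero gets a score of zero, which is why a raw product is a strict measure of fit.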

## References
* Michael Hahsler and Margaret H. Dunham.
  [rEMM: Extensible Markov model for data stream clustering in R](https://doi.org/10.18637/jss.v035.i05).
  _Journal of Statistical Software,_ 35(5):1-31, 2010.
* Michael Hahsler and Margaret H. Dunham.
  [Temporal structure learning for clustering massive data streams in real-time](https://doi.org/10.1137/1.9781611972818.57).
  In _SIAM Conference on Data Mining (SDM11),_ pages 664-675. SIAM, April 2011.

## Acknowledgements
    
Development of this package was supported in part by NSF IIS-0948893 and R21HG005912 from
the National Human Genome Research Institute.

Owner

  • Name: Michael Hahsler
  • Login: mhahsler
  • Kind: user
  • Location: Dallas, TX
  • Company: SMU

I develop packages for AI, ML, and Data Science.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 26
  • Total Committers: 1
  • Avg Commits per committer: 26.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Michael Hahsler (m****l@h****t): 26 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • cran 726 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
  • Total maintainers: 1
cran.r-project.org: rEMM

Extensible Markov Model for Modelling Temporal Relationships Between Clusters

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 726 Last month
Rankings
Forks count: 28.8%
Dependent packages count: 29.8%
Stargazers count: 31.7%
Average: 32.1%
Downloads: 34.9%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 2.10.0 depends
  • igraph * depends
  • proxy >= 0.4 depends
  • MASS * imports
  • cluster * imports
  • clusterGeneration * imports
  • methods * imports
  • stats * imports
  • stream * imports
  • utils * imports
  • Rgraphviz * suggests
  • graph * suggests
  • testthat * suggests