mixR

mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data - Published in JOSS (2022)

https://github.com/garybaylor/mixr

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 14 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary

Scientific Fields

Engineering Computer Science - 40% confidence
Last synced: 6 months ago

Repository

R package for fitting finite mixture models for both raw and binned data

Basic Info
  • Host: GitHub
  • Owner: GaryBAYLOR
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Size: 18 MB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog License

README.md


mixR: An R package for finite mixture modeling for both raw and binned data

Why mixR?

The R programming language provides a rich collection of packages for building and analyzing finite mixture models, which are widely used in unsupervised learning tasks such as model-based clustering and density estimation. For example:

  • mclust can be used to build Gaussian mixture models with different covariance structures
  • mixtools implements parametric and non-parametric mixture models as well as mixtures of Gaussian regressions
  • flexmix provides a general framework for finite mixtures of regression models
  • mixdist fits mixture models for grouped and conditional data (also called binned data)

To our knowledge, almost all R packages for finite mixture models are designed to use raw data as the modeling input; mixdist is the exception. However, the popular model selection methods based on information criteria or the bootstrap likelihood ratio test (McLachlan, 1987; Feng & McCulloch, 1996; Yu & Harvill, 2019) are not implemented in mixdist.

mixR is a package that aims to bridge this gap and to unify the interface for finite mixture modeling for both raw and binned data.

Installation

For the stable version (pre-compiled for Windows and OS X), install from CRAN:

```r
install.packages('mixR')
```

To get the latest development version from GitHub:

```r
install.packages('devtools')
devtools::install_github('garybaylor/mixR')
```

Examples

  • Fitting a normal mixture model

```r
library(mixR)

# generate data from a Normal mixture model
set.seed(102)
x1 = rmixnormal(1000, c(0.3, 0.7), c(-2, 3), c(2, 1))

# fit a Normal mixture model
mod1 = mixfit(x1, ncomp = 2)

# plot the fitted model
plot(mod1)

# fit a Normal mixture model (equal variance)
mod1_ev = mixfit(x1, ncomp = 2, ev = TRUE)
```

  • Fitting a Weibull mixture model

```r
# generate data from a Weibull mixture model
x2 = rmixweibull(1000, c(0.4, 0.6), c(0.6, 1.3), c(0.1, 0.1))
mod2_weibull = mixfit(x2, family = 'weibull', ncomp = 2)
```

  • Fitting a mixture model with binned data

```r
head(Stamp2)
##     lower  upper freq
## 1  0.0595 0.0605    1
## 5  0.0635 0.0645    2
## 6  0.0645 0.0655    1
## 7  0.0655 0.0665    1
## 9  0.0675 0.0685    1
## 10 0.0685 0.0695    7

mod_binned = mixfit(Stamp2, ncomp = 7, family = 'weibull')
plot(mod_binned)

# data binned from numeric data
x1_binned = bin(x1, seq(min(x1), max(x1), length = 30))
mod1_binned = mixfit(x1_binned, ncomp = 2)
```
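The three-column layout shown by `head(Stamp2)` (lower bound, upper bound, frequency) can be reproduced from raw data with base R alone. The following is a minimal sketch of that layout, not mixR's own `bin()` implementation; the simulated data, the break points, and the decision to drop empty bins are all assumptions made for illustration.

```r
set.seed(1)
x <- rnorm(200)  # simulated raw observations (illustrative only)

# Equal-width break points that cover the full range of x
brks <- seq(floor(min(x)), ceiling(max(x)), length.out = 11)

# Count observations per bin; hist() returns one count per interval
counts <- hist(x, breaks = brks, plot = FALSE)$counts

# One row per bin: lower bound, upper bound, frequency
binned <- data.frame(
  lower = head(brks, -1),
  upper = tail(brks, -1),
  freq  = counts
)

# Assumption: drop empty bins, so that only observed intervals remain
binned <- binned[binned$freq > 0, ]
```

Every observation falls into exactly one bin, so `sum(binned$freq)` equals `length(x)`.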

  • Mixture model selection by BIC

```r
# Selecting the best g for Normal mixture model
s_normal = select(x2, ncomp = 2:6)

# Selecting the best g for Weibull mixture model
s_weibull = select(x2, ncomp = 2:6, family = 'weibull')

plot(s_weibull)
plot(s_normal)
```
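For readers unfamiliar with the criterion, BIC trades goodness of fit against the number of free parameters, and the candidate with the smallest value is preferred. The sketch below uses the textbook definition for a univariate normal mixture. The helper names and the log-likelihood values are hypothetical, and the parameter count is the standard one, which may not match how mixR counts parameters internally.

```r
# Free parameters in a g-component univariate normal mixture:
# (g - 1) mixing weights + g means + g standard deviations
# (or one shared standard deviation when ev = TRUE).
n_params_normal_mixture <- function(g, ev = FALSE) {
  (g - 1) + g + if (ev) 1 else g
}

# BIC = -2 * logLik + k * log(n); smaller is better.
bic_normal_mixture <- function(loglik, n, g, ev = FALSE) {
  -2 * loglik + n_params_normal_mixture(g, ev) * log(n)
}

# Hypothetical log-likelihoods for g = 2, 3, 4 fitted to n = 1000 observations
logliks <- c(`2` = -2105.4, `3` = -2103.9, `4` = -2103.1)

bics <- mapply(
  function(ll, g) bic_normal_mixture(ll, n = 1000, g = g),
  logliks, as.integer(names(logliks))
)

# The number of components with the smallest BIC
best_g <- as.integer(names(which.min(bics)))
```

In this made-up example the small gains in log-likelihood from adding components do not offset the penalty term, so the two-component model is selected.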

  • Mixture model selection by bootstrap likelihood ratio test (LRT)

```r
b1 = bs.test(x1, ncomp = c(2, 3))
plot(b1, main = 'Bootstrap LRT for Normal Mixture Models (g = 2 vs g = 3)')
b1$pvalue

b2 = bs.test(x2, ncomp = c(2, 4))
plot(b2, main = 'Bootstrap LRT for Normal Mixture Models (g = 2 vs g = 4)')
b2$pvalue
```

For more examples, please check the vignette An Introduction to mixR.
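The idea behind a parametric bootstrap LRT can be sketched in base R for the simplest case, one versus two normal components. This is an illustration under stated assumptions, not the code behind `bs.test`: the helper functions, the crude median-split initialisation, the small number of bootstrap replicates `B`, and the simulated data are all made up for the example.

```r
# Log-likelihood of a single normal at its MLE (variance uses 1/n)
loglik_normal <- function(x) {
  s <- sqrt(mean((x - mean(x))^2))
  sum(dnorm(x, mean(x), s, log = TRUE))
}

# Log-likelihood of a two-component normal mixture fitted by a basic EM
loglik_mix2 <- function(x, max_iter = 200, tol = 1e-6) {
  lo <- x[x <= median(x)]; hi <- x[x > median(x)]  # crude starting split
  p <- 0.5; mu <- c(mean(lo), mean(hi)); s <- c(sd(lo), sd(hi))
  old <- -Inf
  for (i in seq_len(max_iter)) {
    d1 <- p * dnorm(x, mu[1], s[1])
    d2 <- (1 - p) * dnorm(x, mu[2], s[2])
    ll <- sum(log(d1 + d2))
    if (ll - old < tol) break
    old <- ll
    r <- d1 / (d1 + d2)                            # E-step: responsibilities
    p <- mean(r)                                   # M-step: weight, means, sds
    mu <- c(sum(r * x) / sum(r), sum((1 - r) * x) / sum(1 - r))
    s <- c(sqrt(sum(r * (x - mu[1])^2) / sum(r)),
           sqrt(sum((1 - r) * (x - mu[2])^2) / sum(1 - r)))
    s <- pmax(s, 1e-3)                             # guard against collapse
  }
  ll
}

# LRT statistic: twice the gain in log-likelihood from the larger model
lrt_stat <- function(x) 2 * (loglik_mix2(x) - loglik_normal(x))

set.seed(42)
x <- c(rnorm(100, -3), rnorm(100, 3))              # clearly bimodal sample
w0 <- lrt_stat(x)                                  # observed statistic

# Simulate from the fitted null model (one normal) and recompute the statistic
B <- 20
null_sd <- sqrt(mean((x - mean(x))^2))
w1 <- replicate(B, lrt_stat(rnorm(length(x), mean(x), null_sd)))

# Bootstrap p-value: share of simulated statistics at least as large as w0
p_value <- mean(w1 >= w0)
```

A small `p_value` is evidence against the smaller model. In practice `B` would be much larger than 20, and a more careful initialisation would be used.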

Contributor Code of Conduct

Everyone is welcome to contribute to the project by reporting issues, posting feature requests, updating documentation, submitting pull requests, or contacting the project maintainer directly. To maintain a friendly atmosphere and to collaborate in a fun and productive way, we expect contributors to abide by the Contributor Code of Conduct.

Citation

Yu, Y. (2022). mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data. Journal of Open Source Software, 7(69), 4031, https://doi.org/10.21105/joss.04031

BibTeX information

    @article{Yu2022,
      doi = {10.21105/joss.04031},
      url = {https://doi.org/10.21105/joss.04031},
      year = {2022},
      publisher = {The Open Journal},
      volume = {7},
      number = {69},
      pages = {4031},
      author = {Youjiao Yu},
      title = {mixR: An R package for Finite Mixture Modeling for Both Raw and Binned Data},
      journal = {Journal of Open Source Software}
    }

Owner

  • Name: cookie_monster
  • Login: GaryBAYLOR
  • Kind: user
  • Location: Bay Area

Data Scientist

GitHub Events

Total
  • Issues event: 1
Last Year
  • Issues event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 69
  • Total Committers: 4
  • Avg Commits per committer: 17.25
  • Development Distribution Score (DDS): 0.043
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Youjiao (Gary) Yu j****o@g****m 66
soodoku g****7@g****m 1
RetoSchmucki r****7@g****m 1
Xiaozhen Han x****n@X****l 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 10
  • Total pull requests: 29
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 1 day
  • Total issue authors: 3
  • Total pull request authors: 3
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.07
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 2 months
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • soodoku (8)
  • DzmitryGB (1)
  • welch16 (1)
Pull Request Authors
  • GaryBAYLOR (27)
  • RetoSchmucki (2)
  • soodoku (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • Rcpp >= 1.0.6 imports
  • ggplot2 >= 3.3.3 imports
  • graphics * imports
  • stats * imports
  • knitr * suggests
  • mockery * suggests
  • rmarkdown * suggests
  • testthat >= 3.0.0 suggests