molic
molic: An R package for multivariate outlier detection in contingency tables - Published in JOSS (2019)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Keywords
categorical-data
contingency-tables
decomposable-graphical-models
high-dimensional-data
outlier-detection
Scientific Fields
Engineering
Computer Science - 60% confidence
Last synced: 6 months ago
Repository
Multivariate Outlier Detection in Contingency Tables
Basic Info
- Host: GitHub
- Owner: mlindsk
- License: gpl-3.0
- Language: R
- Default Branch: master
- Size: 14.1 MB
Statistics
- Stars: 6
- Watchers: 0
- Forks: 6
- Open Issues: 0
- Releases: 0
Created almost 7 years ago · Last pushed almost 4 years ago
Metadata Files
Readme
Changelog
Contributing
License
README.Rmd
---
title: "molic: Multivariate OutLIerdetection In Contingency tables"
output:
  github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
[Build status](https://github.com/mlindsk/molic/actions)
[CRAN status](https://cran.r-project.org/package=molic)
[JOSS paper](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8)
[Zenodo DOI](https://zenodo.org/badge/latestdoi/177729633)
## About molic
An **R** package for outlier detection in contingency tables (i.e., categorical data) using decomposable graphical models (DGMs): models in which the underlying association between all variables can be depicted by an undirected graph. **molic** is designed to work with the undirected decomposable graphs returned by `fit_graph` in the [ess](https://github.com/mlindsk/ess) package. Compute-intensive procedures are implemented in C++ via [Rcpp](http://www.rcpp.org/) for better run-time performance.
## Installation
You can install the current stable release of the package from GitHub using the `devtools` package:
```{r, eval = FALSE}
devtools::install_github("mlindsk/molic", build_vignettes = FALSE)
```
## Articles
- [The Outlier Model](https://mlindsk.github.io/molic/articles/outlier_intro.html): The "behind the scenes" of the outlier model.
- [Detecting Skin Diseases](https://mlindsk.github.io/molic/articles/dermatitis.html): An example of using the outlier model to detect skin diseases.
- [Outlier Detection in Genetic Data](https://mlindsk.github.io/molic/articles/genetic_example.html): An example of how to conduct an outlier analysis in genetic data.
## Example of Usage
```{r}
library(dplyr)
library(molic)
library(ess) # For the fit_graph function
set.seed(7) # For reproducibility
```
Select the psoriasis patients from the `derma` data and drop the class variable `ES`:
```{r}
d <- derma %>%
  filter(ES == "psoriasis") %>%
  select(-ES) %>%
  as_tibble()
```
Fitting the interaction graph
```{r}
g <- fit_graph(d, trace = FALSE) # see package ess for details
plot(g, vertex.size = 15)
```
This plot shows how the variables are associated in the psoriasis class; see [ess](https://github.com/mlindsk/ess) for more information about `fit_graph`. The outlier model exploits this structure instead of assuming independence between all variables, an assumption the graph clearly contradicts. The graph may look very different for classes other than psoriasis.
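To make the point about independence concrete, here is a minimal base R sketch that checks whether two categorical variables can plausibly be treated as independent. The data are simulated (hypothetical, not taken from `derma`), and the likelihood-ratio test shown is a generic illustration; it is not necessarily the criterion `fit_graph` uses to select edges.

```r
# Hypothetical toy data: two binary variables that agree about 80% of the time
set.seed(1)
n <- 500
a <- sample(c("0", "1"), n, replace = TRUE)
flip <- runif(n) < 0.2
b <- ifelse(flip, ifelse(a == "0", "1", "0"), a)

# Two-way contingency table of observed counts
tab <- table(a, b)

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Likelihood-ratio statistic G^2 = 2 * sum(obs * log(obs / expected)),
# skipping empty cells, which contribute zero
nz <- tab > 0
g2 <- 2 * sum(tab[nz] * log(tab[nz] / expected[nz]))

# Asymptotic chi-squared reference with (rows - 1) * (cols - 1) degrees of freedom
p_value <- pchisq(g2, df = (nrow(tab) - 1) * (ncol(tab) - 1), lower.tail = FALSE)
p_value
```

A small p-value indicates that independence is untenable for this pair, which is the kind of association an edge in the fitted graph represents.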
## Example 1 - Testing which observations within the psoriasis class are outliers
We start by fitting an outlier model that takes advantage of the fitted graph `g`, which holds information about the psoriasis patients. The print method prints information about the distribution of the (deviance) test statistic.
```{r}
m1 <- fit_outlier(d, g)
print(m1)
```
Notice that `m1` is of class 'outlier'. This means that the procedure has tested which observations _within_ the data are outliers. This method is most often simply referred to as outlier detection. The outliers, at a 5% significance level, can now be extracted as follows:
```{r}
outs <- outliers(m1)
douts <- d[which(outs), ]
douts
```
The following plot shows the distribution of the test statistic, corresponding to the information retrieved using the print method. As an analogy, think of a simple t-test, where the test statistic follows a t-distribution: to reach a conclusion about the hypothesis, one finds the critical value and checks whether the observed test statistic falls above or below it.
```{r}
plot(m1)
```
Retrieving the observed test statistics for the individual observations:
```{r}
x1 <- douts[1, ] %>% unlist() # an outlier
x2 <- d[1, ] %>% unlist() # an inlier
dev1 <- deviance(m1, x1) # falls within the critical region in the plot (the red area)
dev2 <- deviance(m1, x2) # falls within the acceptable region in the plot
dev1
dev2
```
Retrieving the p-values:
```{r}
pval(m1, dev1)
pval(m1, dev2)
```
## Example 2 - Testing if a new observation is an outlier
An observation from class chronic dermatitis:
```{r}
z <- derma %>%
filter(ES == "chronic dermatitis") %>%
select(-ES) %>%
slice(1) %>%
unlist()
```
Test whether `z` is an outlier in the psoriasis class:
```{r}
m2 <- fit_outlier(d, g, z)
print(m2)
plot(m2)
```
Notice that `m2` is of class 'novelty'. The term _novelty detection_ is sometimes used in the literature when the goal is to verify whether a new, unseen observation is an outlier in a homogeneous dataset. Retrieving the test statistic and p-value for `z`:
```{r}
dz <- deviance(m2, z)
pval(m2, dz)
```
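The idea of novelty detection can be sketched end to end on a toy problem in base R. Everything here is an assumption made for illustration: the data are simulated, the "model" is a plain two-way table with add-one smoothing rather than a decomposable graphical model, and `toy_deviance` is a hypothetical stand-in for the statistic molic computes. See the linked article on the outlier model for the actual method.

```r
set.seed(42)

# Hypothetical homogeneous class: two binary variables that agree 99% of the time
n <- 1000
x <- sample(c("a", "b"), n, replace = TRUE)
y <- ifelse(runif(n) < 0.99, x, ifelse(x == "a", "b", "a"))

# Estimated cell probabilities; add-one smoothing avoids log(0) (an assumption)
counts <- table(x, y)
probs <- (counts + 1) / sum(counts + 1)

# Deviance-type statistic of one observation: -2 * log(estimated cell probability)
toy_deviance <- function(obs) -2 * log(probs[obs[["x"]], obs[["y"]]])

# Reference distribution: statistics of observations simulated from the fitted table
idx <- sample(length(probs), 5000, replace = TRUE, prob = as.vector(probs))
sim_stats <- -2 * log(as.vector(probs)[idx])

# Two new observations: one typical for the class, one unusual
z_typical <- c(x = "a", y = "a")
z_unusual <- c(x = "a", y = "b")

# Empirical p-values: share of simulated statistics at least as large as the observed one
p_typical <- mean(sim_stats >= toy_deviance(z_typical))
p_unusual <- mean(sim_stats >= toy_deviance(z_unusual))
c(typical = p_typical, unusual = p_unusual)
```

An observation that lands in a rarely seen cell gets a large statistic and a small p-value, so it would be flagged as not belonging to the class.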
## How To Cite
If you want to cite the **outlier method**, please use:
```latex
@article{lindskououtlier,
  title={Outlier Detection in Contingency Tables Using Decomposable Graphical Models},
  author={Lindskou, Mads and Svante Eriksen, Poul and Tvedebrink, Torben},
  journal={Scandinavian Journal of Statistics},
  publisher={Wiley Online Library},
  doi={10.1111/sjos.12407},
  year={2019}
}
```
If you want to cite the **molic** package, please use:
```latex
@software{lindskoumolic,
  author    = {Mads Lindskou},
  title     = {{molic: An R package for multivariate outlier
                detection in contingency tables}},
  month     = oct,
  year      = 2019,
  publisher = {Journal of Open Source Software},
  doi       = {10.21105/joss.01665},
  url       = {https://doi.org/10.21105/joss.01665}
}
```
Owner
- Login: mlindsk
- Kind: user
- Repositories: 5
- Profile: https://github.com/mlindsk
JOSS Publication
molic: An R package for multivariate outlier detection in contingency tables
Published
October 10, 2019
Volume 4, Issue 42, Page 1665
Authors
Tags
Rcpp, outlier detection, contingency tables, graphical models, decomposable graphs
GitHub Events
Total
- Fork event: 1
Last Year
- Fork event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Mads | m****s@m****k | 206 |
| Charlotte Soneson | c****n@g****m | 2 |
| Yihui Xie | x****e@y****e | 1 |
Committer Domains (Top 20 + Academic)
yihui.name: 1
math.aau.dk: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 29
- Average time to close issues: about 1 hour
- Average time to close pull requests: about 1 hour
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 0.0
- Average comments per pull request: 0.14
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jdeligt (1)
Pull Request Authors
- mlindsk (26)
- csoneson (2)
- yihui (1)
Packages
- Total packages: 1
- Total downloads: 203 last month (CRAN)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
- Total maintainers: 1
cran.r-project.org: molic
Multivariate Outlier Detection in Contingency Tables
- Homepage: https://github.com/mlindsk/molic
- Documentation: http://cran.r-project.org/web/packages/molic/molic.pdf
- License: GPL-3
- Status: removed
- Latest release: 2.0.3 (published over 4 years ago)
Rankings
Forks count: 11.3%
Stargazers count: 21.1%
Dependent packages count: 29.8%
Average: 33.0%
Dependent repos count: 35.5%
Downloads: 67.6%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- Rcpp * imports
- doParallel * imports
- ess * imports
- foreach * imports
- ggplot2 * imports
- ggridges * imports
- dplyr * suggests
- igraph * suggests
- knitr * suggests
- pander * suggests
- rmarkdown * suggests
- testthat * suggests
