molic

molic: An R package for multivariate outlier detection in contingency tables - Published in JOSS (2019)

https://github.com/mlindsk/molic

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

categorical-data contingency-tables decomposable-graphical-models high-dimensional-data outlier-detection

Scientific Fields

Engineering Computer Science - 60% confidence

Repository

Multivariate Outlierdetection In Contingency Tables

Basic Info
  • Host: GitHub
  • Owner: mlindsk
  • License: gpl-3.0
  • Language: R
  • Default Branch: master
  • Size: 14.1 MB
Statistics
  • Stars: 6
  • Watchers: 0
  • Forks: 6
  • Open Issues: 0
  • Releases: 0
Created almost 7 years ago · Last pushed almost 4 years ago
Metadata Files
Readme Changelog Contributing License

README.Rmd

---
title: "molic: Multivariate OutLIerdetection In Contingency tables"
output:
  github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

[![R build status](https://github.com/mlindsk/molic/workflows/R-CMD-check/badge.svg)](https://github.com/mlindsk/molic/actions)
[![](https://www.r-pkg.org/badges/version/molic?color=green)](https://cran.r-project.org/package=molic)
[![status](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8/status.svg)](https://joss.theoj.org/papers/9fa65ced7bf3db01343d68b4488196d8)
[![DOI](https://zenodo.org/badge/177729633.svg)](https://zenodo.org/badge/latestdoi/177729633)

## About molic

An **R** package for outlier detection in contingency tables (i.e. categorical data) using decomposable graphical models (DGMs), that is, models in which the association structure between the variables can be depicted by an undirected decomposable graph. **molic** is designed to work with the undirected decomposable graphs returned by `fit_graph` in the [ess](https://github.com/mlindsk/ess) package. Compute-intensive procedures are implemented in C++ via [Rcpp](http://www.rcpp.org/) for better run-time performance.

## Installation

You can install the current stable release of the package by using the `devtools` package: 

```{r, eval = FALSE}
devtools::install_github("mlindsk/molic", build_vignettes = FALSE)
```

## Articles

 - [The Outlier Model](https://mlindsk.github.io/molic/articles/outlier_intro.html): The "behind the scenes" of the outlier model.
 - [Detecting Skin Diseases](https://mlindsk.github.io/molic/articles/dermatitis.html): An example of using the outlier model to detect skin diseases. 
 - [Outlier Detection in Genetic Data](https://mlindsk.github.io/molic/articles/genetic_example.html): An example of how to conduct an outlier analysis in genetic data.


## Example of Usage

```{r}
library(dplyr)
library(molic)
library(ess)   # For the fit_graph function
set.seed(7)    # For reproducibility
```

Psoriasis patients

```{r}
d <- derma %>%
  filter(ES == "psoriasis") %>%
  select(-ES) %>%
  as_tibble()
```

Fitting the interaction graph

```{r}
g <- fit_graph(d, trace = FALSE) # see package ess for details
plot(g, vertex.size = 15) 
```

This plot shows how the variables are associated in the psoriasis class; see [ess](https://github.com/mlindsk/ess) for more information about `fit_graph`. The outlier model exploits this structure instead of assuming independence between all variables (an assumption the graph clearly contradicts). The graph may look very different for classes other than psoriasis.
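To make the point about independence concrete, here is a minimal base R sketch on hypothetical toy data (not `derma`). It checks whether two categorical variables are associated using a Pearson chi-squared test. This is only an illustration of what "association" means in a contingency table; it is not how `fit_graph` selects edges, and the block is a display-only `r` fence so it does not disturb the seeded results in this README.

```r
# Toy data (hypothetical): b copies a about 70% of the time,
# so the two categorical variables are clearly associated.
set.seed(1)
n      <- 500
levels <- c("0", "1", "2")
a      <- sample(levels, n, replace = TRUE)
copy   <- runif(n) < 0.7
b      <- ifelse(copy, a, sample(levels, n, replace = TRUE))

tab  <- table(a, b)        # 3 x 3 contingency table
test <- chisq.test(tab)    # Pearson chi-squared test of independence
test$p.value               # very small: independence is rejected
```

A model that treated `a` and `b` as independent would misjudge how surprising a given combination of levels is, which is the situation the fitted graph is meant to avoid.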

## Example 1 - Testing which observations within the psoriasis class are outliers

We start by fitting an outlier model that takes advantage of the fitted graph `g`, which holds information about the psoriasis patients. The print method shows information about the distribution of the (deviance) test statistic.

```{r}
m1 <- fit_outlier(d, g)
print(m1)
```

Notice that `m1` is of class 'outlier'. This means that the procedure has tested which observations _within_ the data are outliers; this is what is usually meant by outlier detection. The outliers, at a 5% significance level, can now be extracted as follows:

```{r}
outs  <- outliers(m1)
douts <- d[which(outs), ]
douts
```

The following plot shows the distribution of the test statistic, corresponding to the information retrieved using the print method. Think of a simple t-test, where the test statistic follows a t-distribution: to decide on the hypothesis, one finds the critical value and checks whether the observed test statistic falls above or below it.

```{r}
plot(m1) 
```
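The decision rule described above can be sketched in base R. This is a generic illustration under stated assumptions, not the internals of **molic**: the reference distribution is a hypothetical stand-in drawn from a chi-squared distribution, and large values of the statistic are assumed to indicate outliers (the red upper tail in the plot).

```r
set.seed(1)
# Hypothetical stand-in for a simulated null distribution of a test
# statistic; chi-squared draws are used only to get a right-skewed sample.
sim_stats <- rchisq(10000, df = 10)

alpha    <- 0.05
critical <- unname(quantile(sim_stats, probs = 1 - alpha))

# Flag an observation when its statistic exceeds the critical value,
# i.e. when it lands in the upper 5% tail of the reference distribution.
is_outlier <- function(stat) stat > critical

is_outlier(c(5, 40))  # FALSE TRUE
```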

Retrieving the observed test statistics for the individual observations:

```{r}
x1   <- douts[1, ] %>% unlist() # an outlier
x2   <- d[1, ] %>% unlist()     # an inlier
dev1 <- deviance(m1, x1) # falls within the critical region in the plot (the red area)
dev2 <- deviance(m1, x2) # falls within the acceptable region in the plot
dev1
dev2
```

Retrieving the p-values:

```{r}
pval(m1, dev1)
pval(m1, dev2)
```
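As a rough sketch of what a p-value means when the reference distribution is available only as a sample, the base R snippet below computes an empirical upper-tail p-value. The simulated statistics and the helper `empirical_pval` are hypothetical and introduced only for illustration; whether `pval` is computed exactly this way is an assumption.

```r
set.seed(1)
# Hypothetical simulated null statistics (stand-in data).
sim_stats <- rchisq(10000, df = 10)

# Empirical p-value: the share of simulated statistics that are at least
# as large as the observed one.
empirical_pval <- function(stat, sims) mean(sims >= stat)

p_far     <- empirical_pval(40, sim_stats)  # far in the tail: close to 0
p_central <- empirical_pval(5,  sim_stats)  # central value: large p-value
c(p_far, p_central)
```

An observation is declared an outlier at the 5% level when its p-value is below 0.05, which matches comparing the statistic with the 95% quantile of the same sample.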

## Example 2 - Testing if a new observation is an outlier
 
An observation from class chronic dermatitis: 

```{r}
z <- derma %>%
  filter(ES == "chronic dermatitis") %>%
  select(-ES) %>%
  slice(1) %>%
  unlist()
```

Test whether `z` is an outlier in the psoriasis class:

```{r}
m2 <- fit_outlier(d, g, z)
print(m2)
plot(m2)
```

Notice that `m2` is of class 'novelty'. The term _novelty detection_ is sometimes used in the literature when the goal is to verify whether a new, unseen observation is an outlier in a homogeneous dataset. Retrieving the test statistic and p-value for `z`:

```{r}
dz <- deviance(m2, z)
pval(m2, dz)
```

## How To Cite

If you want to cite the **outlier method**, please use:

```latex
@article{lindskououtlier,
  title={Outlier Detection in Contingency Tables Using Decomposable Graphical Models},
  author={Lindskou, Mads and Eriksen, Poul Svante and Tvedebrink, Torben},
  journal={Scandinavian Journal of Statistics},
  publisher={Wiley Online Library},
  doi={10.1111/sjos.12407},
  year={2019}
}
```

If you want to cite the **molic** package, please use:

```latex
@software{lindskoumolic,
  author       = {Mads Lindskou},
  title        = {{molic: An R package for multivariate outlier 
                   detection in contingency tables}},
  month        = oct,
  year         = 2019,
  publisher    = {Journal of Open Source Software},
  doi          = {10.21105/joss.01665},
  url          = {https://doi.org/10.21105/joss.01665}
}
```


Owner

  • Login: mlindsk
  • Kind: user

JOSS Publication

molic: An R package for multivariate outlier detection in contingency tables
Published
October 10, 2019
Volume 4, Issue 42, Page 1665
Authors
Mads Lindskou ORCID
Department of Mathematical Sciences, Aalborg University, Denmark; Section of Forensic Genetics, Department of Forensic Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
Editor
Charlotte Soneson ORCID
Tags
Rcpp outlier detection contingency tables graphical models decomposable graphs

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 209
  • Total Committers: 3
  • Avg Commits per committer: 69.667
  • Development Distribution Score (DDS): 0.014
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Mads m****s@m****k 206
Charlotte Soneson c****n@g****m 2
Yihui Xie x****e@y****e 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 29
  • Average time to close issues: about 1 hour
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.14
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jdeligt (1)
Pull Request Authors
  • mlindsk (26)
  • csoneson (2)
  • yihui (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 203 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
cran.r-project.org: molic

Multivariate Outlier Detection in Contingency Tables

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 203 Last month
Rankings
Forks count: 11.3%
Stargazers count: 21.1%
Dependent packages count: 29.8%
Average: 33.0%
Dependent repos count: 35.5%
Downloads: 67.6%

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • Rcpp * imports
  • doParallel * imports
  • ess * imports
  • foreach * imports
  • ggplot2 * imports
  • ggridges * imports
  • dplyr * suggests
  • igraph * suggests
  • knitr * suggests
  • pander * suggests
  • rmarkdown * suggests
  • testthat * suggests