https://github.com/akcochrane/mvnclean

Cleaning utilities for multivariate normal data

Repository

Basic Info
  • Host: GitHub
  • Owner: akcochrane
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 19.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.Rmd

---
output:
  md_document:
    variant: markdown_github
---



# MVNclean
```{r setup, echo=FALSE}
library(MVNclean)

## how to include zenodo: [![DOI](https://zenodo.org/badge/425863127.svg)](https://zenodo.org/badge/latestdoi/425863127)
```

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview of MVNclean

Outliers are a pervasive problem in many forms of data analysis, and these problematic observations can be even more insidious as the number of variables increases.
Outliers may be univariate, multivariate, or both ([Leys et al., 2018](https://doi.org/10.1016/j.jesp.2017.09.011)).

If you use these functions in published work, please cite this package using the [Zenodo DOI](INSERT_LINK). Much appreciated!

## Function introductions

### YeoJohn: Yeo-Johnson Transformation

In cases wherein variables have univariate skew, monotonic transformations such as the Yeo-Johnson
transformation, which uses a single parameter `lambda`, can be applied. The function `YeoJohn` finds the
optimal `lambda` for minimizing univariate skew on a trimmed vector (e.g., after removing the highest and
lowest 10% of values), and applies that `lambda` to the entire untrimmed vector.
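
The idea described above can be sketched in base R. This is a minimal illustration and not the package's implementation: the names `yeo_johnson`, `skewness`, and `yj_trimmed`, the `trim` and `interval` arguments, the moment-based skewness estimate, and the use of `optimize()` are all assumptions made for the example.

```r
# Yeo-Johnson transform of a numeric vector for a given lambda
yeo_johnson <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(x[pos])
  }
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log1p(-x[!pos])
  }
  out
}

# Moment-based sample skewness
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Hypothetical helper: choose lambda on the trimmed vector,
# then apply that lambda to the whole untrimmed vector
yj_trimmed <- function(x, trim = 0.1, interval = c(-3, 5)) {
  bounds <- quantile(x, c(trim, 1 - trim), names = FALSE)
  x_trim <- x[x >= bounds[1] & x <= bounds[2]]
  fit <- optimize(function(l) abs(skewness(yeo_johnson(x_trim, l))), interval)
  list(lambda = fit$minimum, transformed = yeo_johnson(x, fit$minimum))
}

set.seed(1)
x <- rexp(500)        # right-skewed example data
res <- yj_trimmed(x)
res$lambda            # lambda chosen on the trimmed values
```

Estimating `lambda` on the trimmed vector keeps a handful of extreme values from dominating the skew estimate, while the transformation itself is still applied to every observation.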

### RUW: Robust Univariate Winsorization

When variables have univariate outliers, winsorization replaces the outlying values with values that are
putatively "in-distribution." The function `RUW` applies univariate winsorization to a vector using
robust estimates of center (median) and dispersion (asymmetric median absolute deviation).
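
As a rough illustration of winsorizing with a median and an asymmetric median absolute deviation (MAD), the base R sketch below clamps values to fences placed `k` scaled MADs below and above the median. The function name `winsorize_robust`, the default `k = 3`, the 1.4826 consistency constant, and the choice to clamp to the fence are assumptions for the example; the rule used by `RUW` may differ.

```r
# Hypothetical helper: clamp values that lie more than k asymmetric MADs
# from the median. Assumes x has no missing values.
winsorize_robust <- function(x, k = 3) {
  med <- median(x)
  # Asymmetric MAD: separate scale estimates below and above the median
  mad_lo <- 1.4826 * median(med - x[x <= med])
  mad_hi <- 1.4826 * median(x[x >= med] - med)
  lower <- med - k * mad_lo
  upper <- med + k * mad_hi
  pmin(pmax(x, lower), upper)
}

x <- c(2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3, 1.7, 2.0, 15)
w <- winsorize_robust(x)
w  # the value 15 is pulled down to the upper fence; the rest are unchanged
```

Computing the scale separately on each side of the median lets the fences adapt to skewed distributions, so a long right tail does not widen the lower fence.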

### RMW: Robust Multivariate Winsorization

When a dataset's outliers are multivariate rather than univariate, they can be
difficult to detect. The function `RMW` identifies multivariate outliers using robust Mahalanobis
distance and then moves these outliers toward the multivariate centroid.
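
One way to picture this step is the sketch below, which is not the package's code. It assumes a minimum covariance determinant estimate from `MASS::cov.rob()` as the robust center and covariance, a chi-square quantile as the outlier cutoff, and shrinking each flagged row along the line to the centroid until it sits at the cutoff distance. The name `winsorize_mv` and the `prob` argument are hypothetical.

```r
library(MASS)  # provides cov.rob() and mvrnorm()

# Hypothetical helper: flag rows by robust Mahalanobis distance and
# pull flagged rows toward the robust centroid
winsorize_mv <- function(X, prob = 0.975) {
  X <- as.matrix(X)
  rob <- MASS::cov.rob(X, method = "mcd")
  d2 <- mahalanobis(X, center = rob$center, cov = rob$cov)
  cutoff <- qchisq(prob, df = ncol(X))
  # Scale factor per row: 1 for inliers, sqrt(cutoff / d2) for outliers
  shrink <- ifelse(d2 > cutoff, sqrt(cutoff / d2), 1)
  centered <- sweep(X, 2, rob$center)
  cleaned <- sweep(centered * shrink, 2, rob$center, "+")
  list(cleaned = cleaned, outlier = d2 > cutoff)
}

set.seed(42)
X <- MASS::mvrnorm(200, mu = c(0, 0), Sigma = matrix(c(1, 0.8, 0.8, 1), 2))
X[1, ] <- c(2, -2)  # not extreme on either axis alone, but against the correlation
res <- winsorize_mv(X)
res$outlier[1]      # whether the planted row was flagged
res$cleaned[1, ]    # its position after being moved toward the centroid
```

The planted row shows why univariate checks can miss such points: each coordinate is within a plausible range, and only the combination is unusual given the positive correlation.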

### MVNclean: Multivariate Normal Cleaning

The function `MVNclean` applies the pipeline of `YeoJohn`, `RUW`, and `RMW` to a dataset.

## Installing the package

The R package `devtools` provides an easy way to install packages from GitHub.

```r
devtools::install_github('akcochrane/MVNclean')
```

Owner

  • Name: Aaron Cochrane
  • Login: akcochrane
  • Kind: user

Researcher of visual cognition and learning at the University of Geneva.

GitHub Events

Total
  • Push event: 1
  • Create event: 2
Last Year
  • Push event: 1
  • Create event: 2

Dependencies

DESCRIPTION cran