https://github.com/akcochrane/mvnclean
Cleaning utilities for multivariate normal data
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary
Last synced: 9 months ago
Repository
Basic Info
- Host: GitHub
- Owner: akcochrane
- License: mit
- Language: R
- Default Branch: main
- Size: 19.5 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created over 1 year ago
· Last pushed over 1 year ago
Metadata Files
Readme
License
README.Rmd
---
output:
md_document:
variant: markdown_github
---
# MVNclean
```{r setup,echo=F}
library(MVNclean)
## how to include zenodo: [](https://zenodo.org/badge/latestdoi/425863127)
```
[License: MIT](https://opensource.org/licenses/MIT)
## Overview of MVNclean
Outliers are a pervasive problem in many forms of data analysis, and these problematic observations can be even more insidious as the number of variables increases.
Outliers may be univariate, multivariate, or both ([Leys et al., 2018](https://doi.org/10.1016/j.jesp.2017.09.011)).
If you use the functions in published work, please cite this package using the [Zenodo DOI](INSERT_LINK). Much appreciated!
## Function introductions
### YeoJohn: Yeo-Johnson Transformation
In cases wherein variables have univariate skew, monotonic transformations such as the Yeo-Johnson
transformation, which uses a single parameter `lambda`, can be applied. The function `YeoJohn` finds the
optimal `lambda` for minimizing univariate skew on a trimmed vector (e.g., after removing the highest and
lowest 10% of values), and applies that `lambda` to the entire untrimmed vector.
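
The idea described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `yeo_johnson`, `sample_skew`, and `fit_lambda`, the search interval for `lambda`, and the use of squared sample skewness as the objective are all assumptions made for the example, and the sketch assumes the input has no missing values.

```r
# Yeo-Johnson transform of a numeric vector for a given lambda
yeo_johnson <- function(x, lambda) {
  out <- numeric(length(x))
  pos <- x >= 0
  # Non-negative values: Box-Cox-like transform of (x + 1)
  if (abs(lambda) > 1e-8) {
    out[pos] <- ((x[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(x[pos])
  }
  # Negative values: mirrored transform with exponent (2 - lambda)
  if (abs(lambda - 2) > 1e-8) {
    out[!pos] <- -(((1 - x[!pos])^(2 - lambda) - 1) / (2 - lambda))
  } else {
    out[!pos] <- -log1p(-x[!pos])
  }
  out
}

# Moment-based sample skewness
sample_skew <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Find lambda on a trimmed copy of x (hypothetical helper; the default
# trim of 10% per tail follows the text, the interval is an assumption)
fit_lambda <- function(x, trim = 0.10, interval = c(-3, 3)) {
  bounds <- quantile(x, c(trim, 1 - trim))
  x_trim <- x[x >= bounds[1] & x <= bounds[2]]
  objective <- function(l) sample_skew(yeo_johnson(x_trim, l))^2
  optimize(objective, interval = interval)$minimum
}

set.seed(1)
x <- rexp(500)                       # right-skewed example data
lambda_hat <- fit_lambda(x)          # estimated on the trimmed vector
x_new <- yeo_johnson(x, lambda_hat)  # applied to the full vector
```

Estimating `lambda` on the trimmed vector keeps a handful of extreme values from dominating the skewness estimate, while the transformation itself is still applied to every observation.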
### RUW: Robust Univariate Winsorization
In cases wherein variables have univariate outliers, winsorization replaces outlying values with values
that are putatively "in-distribution." The function `RUW` applies this univariate
winsorization to a vector using robust estimates of center (median) and dispersion (asymmetric median absolute
deviation).
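
As a rough sketch of this kind of winsorization in base R: the function name `winsorize_robust`, the cutoff of 3 scaled deviations, and the choice to clamp values exactly at the bounds are assumptions for illustration and may differ from what `RUW` actually does. The sketch assumes no missing values.

```r
# Clamp values lying more than `cutoff` scaled MADs from the median,
# estimating dispersion separately below and above the median
winsorize_robust <- function(x, cutoff = 3) {
  med <- median(x)
  # 1.4826 makes the MAD comparable to a standard deviation under normality
  mad_lo <- 1.4826 * median(med - x[x <= med])  # spread of the lower half
  mad_hi <- 1.4826 * median(x[x >= med] - med)  # spread of the upper half
  lower <- med - cutoff * mad_lo
  upper <- med + cutoff * mad_hi
  pmin(pmax(x, lower), upper)
}

x <- c(1, 2, 3, 4, 5, 100)
w <- winsorize_robust(x)  # the value 100 is pulled in; 1 through 5 are unchanged
```

Using separate lower and upper deviations lets the bounds follow an asymmetric distribution instead of forcing the same tolerance on both tails.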
### RMW: Robust Multivariate Winsorization
In cases where a dataset's outliers are multivariate rather than univariate, these outliers can be
difficult to detect. The function `RMW` identifies multivariate outliers using robust Mahalanobis
distance and then moves these outliers toward the multivariate centroid.
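
One way to picture this step is the sketch below. It is not the package's code: the function name `shrink_mv_outliers`, the minimum covariance determinant estimator from `MASS::cov.rob`, the chi-square cutoff at 0.975, and the rule of shrinking outliers onto the cutoff ellipsoid are all assumptions chosen for the example.

```r
library(MASS)  # recommended package distributed with R; provides cov.rob()

shrink_mv_outliers <- function(X, prob = 0.975) {
  X <- as.matrix(X)
  # Robust estimates of the centroid and covariance matrix
  fit <- MASS::cov.rob(X, method = "mcd")
  # Squared robust Mahalanobis distance of each row
  d2 <- mahalanobis(X, center = fit$center, cov = fit$cov)
  cutoff <- qchisq(prob, df = ncol(X))
  # Rows beyond the cutoff are scaled back onto the cutoff ellipsoid;
  # all other rows keep a scale of 1
  scale <- ifelse(d2 > cutoff, sqrt(cutoff / d2), 1)
  centered <- sweep(X, 2, fit$center)
  sweep(centered * scale, 2, fit$center, "+")
}

set.seed(2)
z <- rnorm(200)
X <- cbind(a = z, b = 0.8 * z + 0.6 * rnorm(200))  # two correlated variables
X[1, ] <- c(2, -2)  # unremarkable on each variable alone, unusual jointly
X_clean <- shrink_mv_outliers(X)
```

The first row is within two standard deviations on each variable, so a univariate check would not flag it, but it runs against the positive correlation and therefore has a large Mahalanobis distance.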
### MVNclean: Multivariate Normal Cleaning
The function `MVNclean` applies the pipeline of `YeoJohn`, `RUW`, and `RMW` to a dataset.
## Installing the package
The R package `devtools` provides an easy way to install packages from GitHub.
```
devtools::install_github('akcochrane/MVNclean')
```
Owner
- Name: Aaron Cochrane
- Login: akcochrane
- Kind: user
- Website: aaron-cochrane.me
- Repositories: 2
- Profile: https://github.com/akcochrane
Researcher of visual cognition and learning at the University of Geneva.
GitHub Events
Total
- Push event: 1
- Create event: 2
Last Year
- Push event: 1
- Create event: 2
Dependencies
DESCRIPTION
cran