https://github.com/agnesdeng/miae
miae: multiple imputation through autoencoders
Repository
miae: multiple imputation through autoencoders
Basic Info
- Host: GitHub
- Owner: agnesdeng
- License: gpl-3.0
- Language: R
- Default Branch: main
- Size: 2.27 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
- Created: almost 4 years ago
- Last pushed: about 1 year ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
echo = TRUE,
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
tidy.opts = list(width.cutoff = 60), tidy = TRUE,
dpi = 150, fig.asp = 0.5, fig.width = 7, fig.retina = 1,
out.width = "95%",
warning = FALSE, message = FALSE
)
```
# miae
**miae** is an R package for multiple imputation through autoencoders, built with **torch**. It is currently under development.
## 1. Installation
You can install the current development version of miae from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("agnesdeng/miae")
```
```{r}
# mixgb provides the nhanes3_newborn dataset
library(mixgb)
# vismi provides visual diagnostic plots for imputed data
library(vismi)
# miae performs multiple imputation through autoencoders
library(miae)
```
## 2. Multiple imputation with a denoising autoencoder with dropout
```{r}
#load the nhanes3_newborn dataset from the R package mixgb
data("nhanes3_newborn")
str(nhanes3_newborn)
colSums(is.na(nhanes3_newborn))
```
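Before imputing, it can help to see how the missing values are spread across variables and rows. The following is a minimal base R sketch of such a summary. It uses the built-in `airquality` dataset as a stand-in so that it runs without any extra packages; the same calls should apply to `nhanes3_newborn`. The names `dat` and `miss.summary` are illustrative only.

``` r
# Stand-in data: airquality ships with base R and contains missing values
dat <- airquality

# Count and proportion of missing entries per variable
miss.summary <- data.frame(
  variable = names(dat),
  n.missing = colSums(is.na(dat)),
  prop.missing = round(colMeans(is.na(dat)), 3),
  row.names = NULL
)

# Variables with the most missing values first
miss.summary[order(-miss.summary$n.missing), ]

# Proportion of rows with at least one missing value
mean(!complete.cases(dat))
```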
```{r,eval=FALSE}
# use the default settings and save the fitted model
midae.default <- midae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE,
                       path = file.path(tempdir(), "midaemodel.pt"))
```
```{r}
# use customized settings
params <- list(input.dropout = 0.1, hidden.dropout = 0.3,
               optimizer = "adamW", learning.rate = 0.0001,
               encoder.structure = c(128, 64, 32),
               decoder.structure = c(32, 64, 128),
               scaler = "robust",
               act = "elu",
               init.weight = "he.normal.elu.dropout")
midae.data <- midae(data = nhanes3_newborn, m = 5,
                    categorical.encoding = "onehot", device = "cpu",
                    epochs = 10, batch.size = 32,
                    subsample = 1, early.stopping.epochs = 1,
                    dae.params = params, pmm.type = NULL,
                    save.model = FALSE,
                    path = file.path(tempdir(), "midaemodel.pt"))
```
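The `input.dropout` setting suggests that inputs are randomly corrupted during training, which is what makes an autoencoder "denoising": the network sees a corrupted row and is trained to reconstruct the clean one. The base R sketch below illustrates that corruption step only. It assumes inverted dropout (surviving values rescaled by `1 / (1 - p)`, as torch dropout layers do in training mode); whether miae applies dropout in exactly this way is not stated in this README, and `corrupt_input()` is a hypothetical helper, not part of the package.

``` r
# Hypothetical helper: apply inverted dropout to a numeric matrix
corrupt_input <- function(x, p = 0.1) {
  stopifnot(is.matrix(x), is.numeric(x), p >= 0, p < 1)
  # 1 = keep the entry, 0 = drop it; each entry is dropped with probability p
  keep <- matrix(rbinom(length(x), size = 1, prob = 1 - p), nrow = nrow(x))
  # Rescale the kept entries so the expected value of each input is unchanged
  x * keep / (1 - p)
}

set.seed(2023)
x.clean <- matrix(rnorm(12), nrow = 3)
x.noisy <- corrupt_input(x.clean, p = 0.1)

# A denoising autoencoder would take x.noisy as input and x.clean as the target
x.noisy
```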
```{r, eval= FALSE}
# obtain the fifth imputed dataset
midae.data[[5]]
```
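Each element of the returned list is one completed dataset, so a typical workflow runs the same analysis on all `m` datasets and pools the results. The sketch below shows Rubin's rules for pooling a mean in base R. The toy imputations are random draws from the observed values, a simple stand-in that is not how miae imputes, and `pool_mean()` is a hypothetical helper rather than a package function.

``` r
set.seed(1)

# Toy variable with missing values (stand-in for a column of nhanes3_newborn)
x <- c(4.1, NA, 5.3, 6.0, NA, 4.8, 5.5, NA, 5.0, 4.6)

# Stand-in imputer: fill each NA with a random draw from the observed values.
# This only produces a list of m completed datasets to pool over.
m <- 5
imputation.list <- lapply(seq_len(m), function(i) {
  filled <- x
  filled[is.na(x)] <- sample(x[!is.na(x)], sum(is.na(x)), replace = TRUE)
  data.frame(x = filled)
})

# Hypothetical helper: pool the mean of one variable with Rubin's rules
pool_mean <- function(imputation.list, var.name) {
  m <- length(imputation.list)
  # Point estimate from each completed dataset
  est <- vapply(imputation.list, function(d) mean(d[[var.name]]), numeric(1))
  # Squared standard error of the mean within each completed dataset
  within <- vapply(imputation.list,
                   function(d) var(d[[var.name]]) / nrow(d), numeric(1))
  u.bar <- mean(within)             # average within-imputation variance
  b <- var(est)                     # between-imputation variance
  total <- u.bar + (1 + 1 / m) * b  # total variance
  c(estimate = mean(est), within = u.bar, between = b, se = sqrt(total))
}

pooled <- pool_mean(imputation.list, "x")
pooled
```

The between-imputation term is what distinguishes multiple imputation from a single fill-in: it carries the uncertainty about the missing values into the pooled standard error.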
```{r}
# show the imputed values for missing entries in the variable "BMPHEAD"
show_var(imputation.list = midae.data, var.name = "BMPHEAD", original.data = nhanes3_newborn)
plot2D(imputation.list = midae.data, var.x = "BMPHEAD", var.y = "BMPRECUM", original.data = nhanes3_newborn)
```
## 3. Multiple imputation with a variational autoencoder
```{r,eval=FALSE}
# use the default settings and save the fitted model
mivae.default <- mivae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE,
                       path = file.path(tempdir(), "mivaemodel.pt"))
```
```{r}
# use customized settings
params <- list(beta = 0.95,
               optimizer = "adamW", learning.rate = 0.0001,
               encoder.structure = c(128, 64, 32),
               decoder.structure = c(32, 64, 128),
               scaler = "robust",
               act = "elu",
               init.weight = "he.normal.elu")
mivae.data <- mivae(data = nhanes3_newborn, m = 5,
                    categorical.encoding = "onehot", device = "cpu",
                    epochs = 10, batch.size = 32,
                    subsample = 1,
                    vae.params = params, pmm.type = NULL,
                    save.model = FALSE,
                    path = file.path(tempdir(), "mivaemodel.pt"))
```
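This README does not define the `beta` entry in `vae.params`. In the beta-VAE formulation, a parameter of that name weights the KL divergence term against the reconstruction loss, and the sketch below assumes that meaning. It is a base R illustration of the objective for a diagonal Gaussian encoder, not the package's implementation; `kl_std_normal()` and `vae_loss()` are hypothetical names.

``` r
# KL divergence between N(mu, diag(exp(logvar))) and N(0, I),
# summed over the latent dimensions
kl_std_normal <- function(mu, logvar) {
  stopifnot(length(mu) == length(logvar))
  -0.5 * sum(1 + logvar - mu^2 - exp(logvar))
}

# Assumed objective: reconstruction loss plus beta times the KL term
vae_loss <- function(recon.loss, mu, logvar, beta = 1) {
  recon.loss + beta * kl_std_normal(mu, logvar)
}

# The KL term vanishes when the encoder output equals the prior
kl_std_normal(mu = c(0, 0), logvar = c(0, 0))

# A smaller beta puts less weight on matching the prior
vae_loss(recon.loss = 2, mu = 1, logvar = 0, beta = 0.95)
```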
```{r}
plot2D(imputation.list = mivae.data, var.x = "BMPHEAD", var.y = "BMPRECUM", original.data = nhanes3_newborn, shape = TRUE)
```
## 4. Impute new data using a saved imputation model
```{r}
set.seed(2023)
n <- nrow(nhanes3_newborn)
idx <- sample(1:n, size = round(0.7 * n), replace = FALSE)
train.data <- nhanes3_newborn[idx, ]
test.data <- nhanes3_newborn[-idx, ]
```
```{r}
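# reuse `params` from section 3; save.model = TRUE keeps the fitted model
# so that it can be used to impute new data
mivae.obj <- mivae(data = train.data, m = 5,
                   categorical.encoding = "onehot", device = "cpu",
                   epochs = 10, batch.size = 32,
                   vae.params = params, pmm.type = NULL,
                   save.model = TRUE,
                   path = file.path(tempdir(), "mivaemodel.pt"))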
```
```{r}
mivae.newdata <- impute_new(object = mivae.obj, newdata = test.data, m = 5)
```
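After imputing new data, a quick sanity check is to confirm that every completed dataset has the same shape as the input, contains no remaining missing values, and leaves the observed entries untouched. The sketch below assumes that `impute_new()` returns a list of `m` data frames, as `midae()` does above; this README does not confirm that. `check_imputations()` is a hypothetical helper, and toy data is used so that the block runs on its own.

``` r
# Hypothetical helper: returns one TRUE/FALSE per imputed dataset
check_imputations <- function(imputation.list, original.data) {
  stopifnot(is.list(imputation.list), length(imputation.list) > 0)
  vapply(imputation.list, function(imp) {
    imp <- as.data.frame(imp)
    # Same shape as the input and no missing values left
    if (!identical(dim(imp), dim(original.data)) || anyNA(imp)) {
      return(FALSE)
    }
    # Observed entries must be carried over unchanged, column by column
    all(mapply(function(new, old) {
      observed <- !is.na(old)
      isTRUE(all.equal(new[observed], old[observed]))
    }, imp, original.data))
  }, logical(1))
}

# Toy stand-ins for test.data and for the list of imputed datasets
toy <- data.frame(a = c(1, NA, 3), b = c(NA, 2, 2))
toy.imputed <- list(
  data.frame(a = c(1, 2, 3), b = c(2, 2, 2)),   # complete and consistent
  data.frame(a = c(1, NA, 3), b = c(2, 2, 2))   # still has a missing value
)

check_imputations(toy.imputed, toy)
```

With the real objects, the call would be `check_imputations(mivae.newdata, test.data)`.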
Owner
- Name: (Agnes) Yongshi Deng
- Login: agnesdeng
- Kind: user
- Location: New Zealand
- Repositories: 3
- Profile: https://github.com/agnesdeng
Statistics PhD student at the University of Auckland
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- data.table * imports
- ggplot2 * imports
- magrittr * imports
- rlang * imports
- stats * imports
- tibble * imports
- tidyr * imports
- torch * imports
- torchopt * imports
- utils * imports
- RColorBrewer * suggests
- knitr * suggests
- rmarkdown * suggests