https://github.com/agnesdeng/miae

miae: multiple imputation through autoencoders

https://github.com/agnesdeng/miae

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

miae: multiple imputation through autoencoders

Basic Info
  • Host: GitHub
  • Owner: agnesdeng
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 2.27 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  tidy.opts = list(width.cutoff = 60), tidy = TRUE,
  dpi = 150, fig.asp = 0.5, fig.width = 7, fig.retina = 1,
  out.width = "95%",
  warning = FALSE, message = FALSE
)
```

# miae




**miae** is an R package for multiple imputation through autoencoders built with **Torch**. It's currently under development.


## 1. Installation

You can install the current development version of miae from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
#devtools::install_github("agnesdeng/miae")
```

``` {r}
#To load the nhanes3_newborn dataset from the R package mixgb
library(mixgb)
#To obtain visualization diagnostics plot
library(vismi)
#Multiple imputation through autoencoder
library(miae)
```

## 2. Multiple imputation with denoising autoencoder with dropout
```{r}
#load the nhanes3_newborn dataset from the R package mixgb
data("nhanes3_newborn")
str(nhanes3_newborn)
colSums(is.na(nhanes3_newborn))
```

```{r,eval=FALSE}
#use default setting
midae.default <- midae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE, path = file.path (tempdir ( ), "midaemodel.pt"))
)
```

```{r}
#use customized setting
params<-list(input.dropout = 0.1, hidden.dropout = 0.3,
             optimizer = "adamW", learning.rate = 0.0001, encoder.structure = c(128, 64, 32), decoder.structure = c(32, 64, 128),
             scaler = "robust",
             act = "elu",
             init.weight = "he.normal.elu.dropout")

midae.data<- midae(data = nhanes3_newborn, m = 5, 
                       categorical.encoding = "onehot", device = "cpu", 
                       epochs = 10, batch.size = 32, 
                       subsample = 1, early.stopping.epochs = 1,
                       dae.params = params, pmm.type = NULL, 
                       save.model = FALSE, path = file.path (tempdir ( ) , "midaemodel.pt"))

```

```{r, eval= FALSE}
# obtain the fifth imputed dataset
midae.data[[5]]
``` 


```{r}
# show the imputed values for missing entries in the variable "BMPHEAD"
show_var(imputation.list = midae.data, var.name = "BMPHEAD", original.data = nhanes3_newborn)
plot2D(imputation.list = midae.data,  var.x = "BMPHEAD", var.y = "BMPRECUM", original.data = nhanes3_newborn)
```

## 3. Multiple imputation with variational autoencoder

```{r,eval=FALSE}
#use default setting
mivae.default <- mivae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE, path = file.path (tempdir ( ), "mivaemodel.pt"))
)
```

```{r}
#use customized setting
params<-list(beta = 0.95, 
             optimizer = "adamW", learning.rate = 0.0001, encoder.structure = c(128, 64, 32), decoder.structure = c(32, 64, 128),
             scaler = "robust",
             act = "elu",
             init.weight = "he.normal.elu")

mivae.data<- mivae(data = nhanes3_newborn, m = 5, 
                   categorical.encoding = "onehot", device = "cpu", 
                   epochs = 10, batch.size = 32, 
                   subsample =1, 
                   vae.params = params, pmm.type = NULL, 
                   save.model = FALSE, path = file.path (tempdir ( ), "mivaemodel.pt"))

```
```{r}
plot2D(imputation.list = mivae.data,  var.x = "BMPHEAD", var.y = "BMPRECUM", original.data = nhanes3_newborn, shape = T)
```


## 4. Impute new data using a saved imputation model
```{r}
set.seed(2023)
n <- nrow(nhanes3_newborn)
idx <- sample(1:n, size = round(0.7 * n), replace = FALSE)
train.data <- nhanes3_newborn[idx, ]
test.data <- nhanes3_newborn[-idx, ]
```


```{r}
mivae.obj<- mivae(data = train.data, m = 5, 
                   categorical.encoding = "onehot", device = "cpu", 
                   epochs = 10, batch.size = 32, 
                   vae.params = params, pmm.type = NULL, 
                   save.model = TRUE, path = file.path (tempdir ( ), "mivaemodel.pt"))
```

```{r}
mivae.newdata <- impute_new(object = mivae.obj, newdata = test.data, m = 5)
```

Owner

  • Name: (Agnes) Yongshi Deng
  • Login: agnesdeng
  • Kind: user
  • Location: New Zealand

Statistics PhD student at the University of Auckland

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • data.table * imports
  • ggplot2 * imports
  • magrittr * imports
  • rlang * imports
  • stats * imports
  • tibble * imports
  • tidyr * imports
  • torch * imports
  • torchopt * imports
  • utils * imports
  • RColorBrewer * suggests
  • knitr * suggests
  • rmarkdown * suggests