https://github.com/agnesdeng/miae

miae: multiple imputation through autoencoders

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: agnesdeng
License: gpl-3.0
Language: R
Default Branch: main
Size: 2.27 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 4 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  tidy.opts = list(width.cutoff = 60), tidy = TRUE,
  dpi = 150, fig.asp = 0.5, fig.width = 7, fig.retina = 1,
  out.width = "95%",
  warning = FALSE, message = FALSE
)
```

# miae




**miae** is an R package for multiple imputation through autoencoders built with **Torch**. It's currently under development.


## 1. Installation

You can install the current development version of miae from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("agnesdeng/miae")
```

``` {r}
# provides the nhanes3_newborn dataset
library(mixgb)
# visual diagnostics for imputed data
library(vismi)
# multiple imputation through autoencoders
library(miae)
```

## 2. Multiple imputation with a denoising autoencoder with dropout
```{r}
#load the nhanes3_newborn dataset from the R package mixgb
data("nhanes3_newborn")
str(nhanes3_newborn)
colSums(is.na(nhanes3_newborn))
```

```{r,eval=FALSE}
# use the default settings
midae.default <- midae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE,
                       path = file.path(tempdir(), "midaemodel.pt"))
```

```{r}
# use customized settings
params <- list(input.dropout = 0.1, hidden.dropout = 0.3,
               optimizer = "adamW", learning.rate = 0.0001,
               encoder.structure = c(128, 64, 32),
               decoder.structure = c(32, 64, 128),
               scaler = "robust",
               act = "elu",
               init.weight = "he.normal.elu.dropout")

midae.data <- midae(data = nhanes3_newborn, m = 5,
                    categorical.encoding = "onehot", device = "cpu",
                    epochs = 10, batch.size = 32,
                    subsample = 1, early.stopping.epochs = 1,
                    dae.params = params, pmm.type = NULL,
                    save.model = FALSE,
                    path = file.path(tempdir(), "midaemodel.pt"))

```
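
The `categorical.encoding = "onehot"` argument above suggests that factor columns are expanded into indicator columns before training. The following is a minimal base R sketch of that idea, not miae's internal code: the toy data and the helper `onehot_column()` are hypothetical and exist only for illustration.

```r
# Toy data frame with two factor columns (hypothetical values,
# not taken from nhanes3_newborn)
toy <- data.frame(
  sex  = factor(c("F", "M", "F")),
  race = factor(c("a", "b", "c"))
)

# Hypothetical helper: expand one factor into a 0/1 indicator matrix
# with one column per level
onehot_column <- function(x, name) {
  mm <- model.matrix(~ x - 1, data = data.frame(x = x))
  colnames(mm) <- paste0(name, "_", levels(x))
  mm
}

# Encode every column and bind the indicator blocks side by side
onehot <- do.call(cbind, unname(Map(onehot_column, toy, names(toy))))
print(onehot)
```

Each row of `onehot` contains exactly one 1 per original factor, so an autoencoder sees a purely numeric input matrix.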

```{r, eval= FALSE}
# obtain the fifth imputed dataset
midae.data[[5]]
``` 
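
Because the result is indexed like a list of `m` imputed datasets, an analysis is usually run on each dataset and the results are then combined. The sketch below pools the mean of one variable with Rubin's rules in base R. It is an illustration under assumptions: `imputed.list` is simulated stand-in data rather than miae output, and `pool_mean()` is a hypothetical helper, not part of miae.

```r
# Simulated stand-in for a list of m imputed datasets
# (hypothetical data, not produced by midae())
set.seed(1)
m <- 5
imputed.list <- lapply(seq_len(m), function(i) {
  data.frame(BMPHEAD = rnorm(100, mean = 45, sd = 2))
})

# Hypothetical helper: pool the mean of one variable with Rubin's rules
pool_mean <- function(imputation.list, var.name) {
  m <- length(imputation.list)
  # point estimate and its sampling variance within each dataset
  est <- vapply(imputation.list,
                function(d) mean(d[[var.name]]), numeric(1))
  within <- vapply(imputation.list,
                   function(d) var(d[[var.name]]) / nrow(d), numeric(1))
  q.bar <- mean(est)     # pooled point estimate
  u.bar <- mean(within)  # average within-imputation variance
  b <- var(est)          # between-imputation variance
  total <- u.bar + (1 + 1 / m) * b
  c(estimate = q.bar, se = sqrt(total))
}

pooled <- pool_mean(imputed.list, "BMPHEAD")
print(pooled)
```

The same pattern applies to regression coefficients: fit the model on each imputed dataset, then pool the estimates and variances in the same way.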


```{r}
# show the imputed values for missing entries in the variable "BMPHEAD"
show_var(imputation.list = midae.data, var.name = "BMPHEAD",
         original.data = nhanes3_newborn)
# plot the imputed values of "BMPHEAD" against "BMPRECUM"
plot2D(imputation.list = midae.data, var.x = "BMPHEAD", var.y = "BMPRECUM",
       original.data = nhanes3_newborn)
```

## 3. Multiple imputation with a variational autoencoder

```{r,eval=FALSE}
# use the default settings
mivae.default <- mivae(data = nhanes3_newborn, m = 5,
                       save.model = TRUE,
                       path = file.path(tempdir(), "mivaemodel.pt"))
```

```{r}
# use customized settings
params <- list(beta = 0.95,
               optimizer = "adamW", learning.rate = 0.0001,
               encoder.structure = c(128, 64, 32),
               decoder.structure = c(32, 64, 128),
               scaler = "robust",
               act = "elu",
               init.weight = "he.normal.elu")

mivae.data <- mivae(data = nhanes3_newborn, m = 5,
                    categorical.encoding = "onehot", device = "cpu",
                    epochs = 10, batch.size = 32,
                    subsample = 1,
                    vae.params = params, pmm.type = NULL,
                    save.model = FALSE,
                    path = file.path(tempdir(), "mivaemodel.pt"))

```
```{r}
plot2D(imputation.list = mivae.data, var.x = "BMPHEAD", var.y = "BMPRECUM",
       original.data = nhanes3_newborn, shape = TRUE)
```


## 4. Impute new data using a saved imputation model
```{r}
set.seed(2023)
n <- nrow(nhanes3_newborn)
idx <- sample(1:n, size = round(0.7 * n), replace = FALSE)
train.data <- nhanes3_newborn[idx, ]
test.data <- nhanes3_newborn[-idx, ]
```


```{r}
# train on the training split and save the model;
# params is the VAE settings list defined in Section 3
mivae.obj <- mivae(data = train.data, m = 5,
                   categorical.encoding = "onehot", device = "cpu",
                   epochs = 10, batch.size = 32,
                   vae.params = params, pmm.type = NULL,
                   save.model = TRUE,
                   path = file.path(tempdir(), "mivaemodel.pt"))
```

```{r}
mivae.newdata <- impute_new(object = mivae.obj, newdata = test.data, m = 5)
```

Owner

Name: (Agnes) Yongshi Deng
Login: agnesdeng
Kind: user
Location: New Zealand

Repositories: 3
Profile: https://github.com/agnesdeng

Statistics PhD student at the University of Auckland

GitHub Events

Total

Push event: 1

Last Year

Push event: 1

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
data.table * imports
ggplot2 * imports
magrittr * imports
rlang * imports
stats * imports
tibble * imports
tidyr * imports
torch * imports
torchopt * imports
utils * imports
RColorBrewer * suggests
knitr * suggests
rmarkdown * suggests
