rmidas

R package for missing-data imputation with deep learning

https://github.com/midasverse/rmidas

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary

Keywords

deep-learning imputation-methods neural-network r reticulate tensorflow
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: MIDASverse
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 24.4 MB
Statistics
  • Stars: 34
  • Watchers: 4
  • Forks: 5
  • Open Issues: 13
  • Releases: 6
Created over 5 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: 
  github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

# rMIDAS 


[![CRAN status](https://www.r-pkg.org/badges/version/rMIDAS)](https://cran.r-project.org/package=rMIDAS/)
[![lifecycle](https://img.shields.io/badge/lifecycle-maturing-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html)
[![Last-changedate](https://img.shields.io/badge/last%20change-`r gsub('-', '--', Sys.Date())`-yellowgreen.svg)](https://github.com/MIDASverse/rMIDAS/commits/master/)
[![R-CMD-check-Linux](https://github.com/MIDASverse/rMIDAS/actions/workflows/testlinux.yml/badge.svg)](https://github.com/MIDASverse/rMIDAS/actions/workflows/testlinux.yml)
[![R-CMD-check-macOS](https://github.com/MIDASverse/rMIDAS/actions/workflows/testmacos.yml/badge.svg)](https://github.com/MIDASverse/rMIDAS/actions/workflows/testmacos.yml)
[![R-CMD-check-Windows](https://github.com/MIDASverse/rMIDAS/actions/workflows/testwindows.yml/badge.svg)](https://github.com/MIDASverse/rMIDAS/actions/workflows/testwindows.yml)


## Overview

**rMIDAS** is an R package for accurate and efficient multiple imputation using deep learning methods. The package provides a simplified workflow for imputing and then analyzing data:

* `convert()` carries out all necessary preprocessing steps
* `train()` constructs and trains a MIDAS imputation model
* `complete()` generates multiple completed datasets from the trained model
* `combine()` runs regression analysis across the completed datasets, following Rubin's combination rules
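
As a rough illustration of how the four functions above chain together, the sketch below wraps them in a single helper. The helper name `impute_and_pool`, its arguments, and the hyperparameter values are hypothetical. The argument names passed to the rMIDAS functions follow the package demo as best we can tell, so please check them against the function documentation before relying on them.

```r
# Hypothetical helper chaining the four rMIDAS steps.
# Nothing is executed until the function is called, and calling it
# requires rMIDAS plus a working Python environment.
impute_and_pool <- function(df, bin_cols, cat_cols, formula, m = 10L) {
  # 1. Pre-process: encode binary/categorical columns, rescale numeric ones
  conv <- rMIDAS::convert(df, bin_cols = bin_cols, cat_cols = cat_cols,
                          minmax_scale = TRUE)

  # 2. Train the imputation model (illustrative hyperparameters only)
  fit <- rMIDAS::train(conv, training_epochs = 20L,
                       layer_structure = c(128, 128),
                       input_drop = 0.75, seed = 89L)

  # 3. Draw m completed datasets from the trained model
  completed <- rMIDAS::complete(fit, m = m)

  # 4. Fit the regression on each dataset and pool the results
  rMIDAS::combine(formula, completed)
}

# Example call with made-up column names:
# impute_and_pool(my_data, bin_cols = "sex", cat_cols = "workclass",
#                 formula = "income ~ age + sex")
```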

**rMIDAS** is based on the Python package [MIDASpy](https://github.com/MIDASverse/MIDASpy).

### Efficient handling of large data

rMIDAS also incorporates several features to streamline and improve the efficiency of multiple imputation analysis:

* Optimisation for large datasets using the `data.table` and `mltools` packages
* Automatic reversing of all pre-processing steps prior to analysis
* Built-in regression function based on `glm` (applying Rubin’s combination rules)
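
For readers unfamiliar with Rubin's combination rules, the base-R sketch below pools a single coefficient across completed datasets. It is not the code used by `combine()`; the function name `pool_rubin` and the input values are made up for illustration, and the sketch omits the degrees-of-freedom calculation.

```r
# Pool one coefficient across m completed datasets using Rubin's rules.
# `estimates`: the coefficient estimated on each completed dataset
# `variances`: the squared standard error of that coefficient on each dataset
pool_rubin <- function(estimates, variances) {
  stopifnot(length(estimates) == length(variances), length(estimates) > 1)
  m <- length(estimates)
  q_bar   <- mean(estimates)        # pooled point estimate
  within  <- mean(variances)        # average within-imputation variance
  between <- stats::var(estimates)  # between-imputation variance
  total   <- within + (1 + 1 / m) * between
  list(estimate = q_bar, std_error = sqrt(total))
}

# Made-up estimates and variances from m = 3 completed datasets
pooled <- pool_rubin(estimates = c(1, 2, 3), variances = c(0.5, 0.5, 0.5))
pooled$estimate
pooled$std_error
```

The total variance adds the between-imputation spread, inflated by `1 + 1/m`, to the average within-imputation variance, so the pooled standard error reflects uncertainty about the missing values.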

### Background and suggested citations

For more information on MIDAS, the method underlying the software, see:

Lall, Ranjit, and Thomas Robinson. 2022. "The MIDAS Touch: Accurate and Scalable Missing-Data Imputation with Deep Learning." _Political Analysis_ 30, no. 2: 179-196. [Published version](https://ranjitlall.github.io/assets/pdf/Lall%20and%20Robinson%202022%20PA.pdf). [Accepted version](http://eprints.lse.ac.uk/108170/1/Lall_Robinson_PA_Forthcoming.pdf).

Lall, Ranjit, and Thomas Robinson. 2023. "Efficient Multiple Imputation for Diverse Data in Python and R: MIDASpy and rMIDAS." _Journal of Statistical Software_. [Accepted version](https://ranjitlall.github.io/assets/pdf/jss4379.pdf) (in press).

## Installation

rMIDAS is available on [CRAN](https://cran.r-project.org/package=rMIDAS). To install the package in R, you can use the following code:

```{r, eval = FALSE}
install.packages("rMIDAS")
```

To install the latest development version, use the following code:

```{r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("MIDASverse/rMIDAS")
```

Note that rMIDAS uses the [reticulate](https://github.com/rstudio/reticulate) package to interface with Python. When the package is first loaded, it asks whether to set up a Python environment and its dependencies automatically. Users who prefer to set up the environment and dependencies manually, or who use rMIDAS in headless mode, can specify a Python binary using `set_python_env()` (examples below). Currently, Python versions 3.6 to 3.10 are supported. A custom Python environment also requires the following dependencies:

  * matplotlib
  * numpy
  * pandas
  * scikit-learn
  * scipy
  * statsmodels
  * tensorflow (<2.12.0)
  * tensorflow-addons (<0.20.0)
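
Before pointing rMIDAS at a custom interpreter, it can help to confirm that its version falls in the supported range. The helper below is a hypothetical convenience function in base R, not part of rMIDAS; it only encodes the 3.6 to 3.10 range stated above.

```r
# Hypothetical helper: TRUE if a Python version string is within the
# range this README lists as supported (3.6 up to and including 3.10.x).
python_version_supported <- function(version) {
  v <- numeric_version(version)
  v >= numeric_version("3.6") && v < numeric_version("3.11")
}

python_version_supported("3.9.7")   # inside the supported range
python_version_supported("3.11.0")  # newer than the supported range
```

The version string could come from running `python --version` in a terminal, or, as an assumption worth checking against the reticulate documentation, from `reticulate::py_discover_config()$version`.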

A custom Python installation must be set *before* training a model or imputing data. To manually set up a Python environment:

```{r, eval = FALSE}
library(rMIDAS)
# Decline the automatic setup

# Point to a Python binary
set_python_env(x = "path/to/python/binary")

# Or point to a virtualenv binary
set_python_env(x = "virtual_env", type = "virtualenv")

# Or point to a conda environment
set_python_env(x = "conda_env", type = "conda")

# Now run rMIDAS::train() and rMIDAS::complete()...

```

You can also download the [`rmidas-env.yml`](https://github.com/MIDASverse/rMIDAS/blob/master/rmidas-env.yml) conda environment file from this repository to set up all dependencies in a new conda environment. To do so, download the .yml file, navigate to the download directory in your console and run:
```{bash, eval=FALSE}
conda env create -f rmidas-env.yml
```

Then, prior to training a MIDAS model, make sure to load this environment in R:
```{r, eval=FALSE}
# First load the rMIDAS package
library(rMIDAS)
# Decline the automatic setup

set_python_env(x = "rmidas", type = "conda")
```

*Note*: **reticulate** only allows you to set a Python binary once per R session, so if you wish to switch to a different Python binary, or have already run `train()` or `convert()`, you will need to restart or terminate R prior to using `set_python_env()`.
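
One way to catch this situation early is to check whether Python has already been initialised before calling `set_python_env()`. The wrapper below is a hypothetical sketch, not part of rMIDAS. It assumes that `reticulate::py_available(initialize = FALSE)` reports an already-initialised session, which is our understanding of reticulate's behaviour.

```r
# Hypothetical guard: refuse to switch interpreters once Python is running.
# Calling it requires the reticulate and rMIDAS packages.
set_env_if_possible <- function(x, type = NULL) {
  if (reticulate::py_available(initialize = FALSE)) {
    stop("Python is already initialised in this session; ",
         "restart R before calling set_python_env().")
  }
  if (is.null(type)) {
    rMIDAS::set_python_env(x = x)               # plain path to a Python binary
  } else {
    rMIDAS::set_python_env(x = x, type = type)  # e.g. "virtualenv" or "conda"
  }
}

# Example call, mirroring the conda example above:
# set_env_if_possible("rmidas", type = "conda")
```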

## Vignettes (including simple example)

**rMIDAS** is packaged with three vignettes:

1. [`vignette("imputation_demo", "rMIDAS")`](https://github.com/MIDASverse/rMIDAS/blob/master/vignettes/imputation_demo.md) demonstrates the basic workflow and capacities of **rMIDAS**
2. [`vignette("custom_python_versions", "rMIDAS")`](https://github.com/MIDASverse/rMIDAS/blob/master/vignettes/custom_python_versions.md) provides detailed guidance on configuring Python binaries and environments, including some troubleshooting tips
3. [`vignette("use_server", "rMIDAS")`](https://github.com/MIDASverse/rMIDAS/blob/master/vignettes/use-server.md) provides guidance for running **rMIDAS** in headless mode

An additional example that showcases rMIDAS's core functionality can be found [here](https://github.com/MIDASverse/rMIDAS/blob/master/examples/rmidas_demo.md).


## Contributing to rMIDAS

Interested in contributing to **rMIDAS**? We are looking to hire a research assistant to work part-time (flexibly) to help us build out new features and integrate our software with existing machine learning pipelines. You would be paid the standard research assistant rate at the University of Oxford. To apply, please send your CV (or a summary of relevant skills/experience) to .

## Getting help

rMIDAS is still in development, and we may not have caught all bugs. If you come across any difficulties, or have any suggestions for improvements, please raise an issue [here](https://github.com/MIDASverse/MIDASpy/issues).

Owner

  • Name: MIDASverse
  • Login: MIDASverse
  • Kind: organization

MIDAS: A deep learning method for missing-data imputation

GitHub Events

Total
  • Issues event: 4
  • Watch event: 2
Last Year
  • Issues event: 4
  • Watch event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 149
  • Total Committers: 6
  • Avg Commits per committer: 24.833
  • Development Distribution Score (DDS): 0.738
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| tsrobinson | t****n@p****k | 39 |
| ranjitlall | 3****l | 33 |
| Tom Robinson | t****r@t****e | 22 |
| Tom Robinson | t****r@p****n | 22 |
| Tom Robinson | t****4@g****m | 20 |
| edvinskis | e****s@s****l | 13 |
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 30
  • Total pull requests: 4
  • Average time to close issues: 4 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 12
  • Total pull request authors: 2
  • Average comments per issue: 1.6
  • Average comments per pull request: 0.5
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 0.25
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tsrobinson (13)
  • nick-youngblut (4)
  • aeggers (3)
  • go-bayes (1)
  • MichaelChirico (1)
  • hsmallbone (1)
  • MitjaCernko (1)
  • ihameed11 (1)
  • pdwaggoner (1)
  • itchyshin (1)
  • andreacaflisch (1)
  • melondonkey (1)
  • michaelbcerny (1)
Pull Request Authors
  • edvinskis (3)
  • tsrobinson (1)
Top Labels
Issue Labels
  • enhancement (14)
  • documentation (8)
  • fix pending (6)
  • bug (2)
  • help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 315 last-month
  • Total docker downloads: 41,971
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 8
  • Total maintainers: 1
cran.r-project.org: rMIDAS

Multiple Imputation with Denoising Autoencoders

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 315 Last month
  • Docker Downloads: 41,971
Rankings
Stargazers count: 11.0%
Forks count: 11.3%
Average: 24.8%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Downloads: 36.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.6.0 depends
  • data.table * depends
  • mltools * depends
  • reticulate * depends
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/check-r-package v1 composite
  • r-lib/actions/setup-r v1 composite
  • r-lib/actions/setup-r-dependencies v1 composite