Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.2%) to scientific vocabulary
Keywords
Repository
R environment - fast imputations :dragon:
Basic Info
- Host: GitHub
- Owner: Polkas
- Language: R
- Default Branch: main
- Homepage: https://polkas.github.io/miceFast/
- Size: 11.8 MB
Statistics
- Stars: 20
- Watchers: 2
- Forks: 2
- Open Issues: 2
- Releases: 2
Topics
Metadata Files
README.md
miceFast 
Author: Maciej Nasinski
Check the miceFast website for more details
Overview
miceFast provides fast methods for imputing missing data, leveraging an object-oriented programming paradigm and optimized linear algebra routines.
The package includes convenient helper functions compatible with data.table, dplyr, and other popular R packages.
Major speed improvements occur when:
- Using a grouping variable, where the data is automatically sorted by group, significantly reducing computation time.
- Performing multiple imputations, by evaluating the underlying quantitative model only once for multiple draws.
- Running Predictive Mean Matching (PMM), thanks to presorting and binary search.
For performance details, see performance_validity.R in the extdata folder.
It is recommended to read the Advanced Usage Vignette.
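The second speed-up listed above (evaluating the model once for multiple draws) can be illustrated in base R. This is a minimal sketch and not miceFast's own code: it assumes an ordinary least-squares model with Gaussian noise added to the predictions, and the helper name `impute_draws` is hypothetical.

```r
# Hypothetical helper, base R only: fit the regression once, then reuse it
# for every imputation draw (stochastic regression imputation).
impute_draws <- function(y, X, n_draws = 5) {
  obs <- !is.na(y)
  X_obs <- X[obs, , drop = FALSE]
  X_mis <- X[!obs, , drop = FALSE]

  # The expensive part: solve the normal equations a single time.
  beta <- solve(crossprod(X_obs), crossprod(X_obs, y[obs]))
  resid <- y[obs] - X_obs %*% beta
  sigma <- sqrt(sum(resid^2) / (sum(obs) - ncol(X)))

  # Predictions for the missing rows are also computed only once.
  pred <- drop(X_mis %*% beta)

  # Each draw only adds fresh noise to the shared predictions.
  replicate(n_draws, {
    out <- y
    out[!obs] <- pred + rnorm(length(pred), sd = sigma)
    out
  }, simplify = FALSE)
}

# Simulated data (an assumption for illustration, not a package dataset)
set.seed(1)
n <- 200
X <- cbind(Intercept = 1, x1 = rnorm(n))
y <- drop(X %*% c(2, 3)) + rnorm(n)
y[sample(n, 40)] <- NA

draws <- impute_draws(y, X, n_draws = 3)
sapply(draws, mean)
```

The cost of the linear solve is paid once, so each additional draw costs only a vector of random numbers. A loop that refits the model for every draw repeats the solve each time.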
Installation
You can install miceFast from CRAN:
```r
install.packages("miceFast")
```
Or install the development version from GitHub:
```r
install.packages("devtools")
devtools::install_github("polkas/miceFast")
```
Quick Example
Below is a short demonstration. See the vignette for advanced usage and best practices.
```r
library(miceFast)

set.seed(1234)
data(air_miss)

# Visualize the NA structure
upset_NA(air_miss, 6)

# Simple and naive fill
imputed_data <- naive_fill_NA(air_miss)

# Compare with other packages:
# Hmisc
library(Hmisc)
data.frame(Map(function(x) Hmisc::impute(x, "random"), air_miss))

# mice
library(mice)
mice::complete(mice::mice(air_miss, printFlag = FALSE))
```
Loop example
Multiple imputations are performed in a loop where a continuous variable is imputed using a Bayesian linear model (lm_bayes) that incorporates relevant predictors and weights for robust estimation. Simultaneously, a categorical variable is imputed using linear discriminant analysis (LDA) augmented with a randomly generated ridge penalty.
```r
library(dplyr)

# Define a function that performs the imputation on the dataset
impute_data <- function(data) {
  data %>%
    mutate(
      # Impute the continuous variable using lm_bayes
      Solar_R_imp = fill_NA(
        x = .,
        model = "lm_bayes",
        posit_y = "Solar.R",
        posit_x = c("Wind", "Temp", "Intercept"),
        w = weights # assuming 'weights' is a column in data
      ),
      # Impute the categorical variable using lda with a random ridge parameter
      Ozone_chac_imp = fill_NA(
        x = .,
        model = "lda",
        posit_y = "Ozone_chac",
        posit_x = c("Wind", "Temp"),
        ridge = runif(1, 0, 50)
      )
    )
}

# Set seed for reproducibility
set.seed(123456)

# Run the imputation process 3 times using replicate()
# This returns a list of imputed datasets.
res <- replicate(n = 3, expr = impute_data(air_miss), simplify = FALSE)

# Check results: Calculate the mean of the imputed Solar.R values in each dataset
means_imputed <- lapply(res, function(x) mean(x$Solar_R_imp, na.rm = TRUE))
print(means_imputed)

# Check results: Tabulate the imputed categorical variable for each dataset
tables_imputed <- lapply(res, function(x) table(x$Ozone_chac_imp))
print(tables_imputed)
```
Key Features
- Object-Oriented Interface via miceFast objects (Rcpp modules).
- Convenient Helpers:
  - fill_NA(): Single imputation (lda, lm_pred, lm_bayes, lm_noise).
  - fill_NA_N(): Multiple imputations (pmm, lm_bayes, lm_noise).
  - VIF(): Variance Inflation Factor calculations.
  - naive_fill_NA(): Automatic naive imputations.
  - compare_imp(): Compare original vs. imputed values.
  - upset_NA(): Visualize NA structure using UpSetR.
Quick Reference Table:
| Function | Description |
|-----------------|-----------------------------------------------------------------------------|
| new(miceFast) | Creates an OOP instance with numerous imputation methods (see the vignette). |
| fill_NA() | Single imputation: lda, lm_pred, lm_bayes, lm_noise. |
| fill_NA_N() | Multiple imputations (N repeats): pmm, lm_bayes, lm_noise. |
| VIF() | Computes Variance Inflation Factors. |
| naive_fill_NA() | Performs automatic, naive imputations. |
| compare_imp() | Compares imputations vs. original data. |
| upset_NA() | Visualizes NA structure using an UpSet plot. |
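As background for the VIF() row, variance inflation factors can be obtained from a single matrix inverse rather than from one auxiliary regression per predictor. The base R sketch below illustrates that identity only. It is not the package's implementation, the helper names `vif_from_inverse` and `vif_from_regressions` are hypothetical, and it uses the built-in `airquality` columns without missing values.

```r
# Hypothetical helper: VIFs from one matrix inverse.
vif_from_inverse <- function(X) {
  Z <- scale(X)                       # center and scale each column
  R <- crossprod(Z) / (nrow(Z) - 1)   # Z'Z / (n - 1) is the correlation matrix
  diag(solve(R))                      # VIF_j is the j-th diagonal element
}

# Reference: the textbook definition 1 / (1 - R_j^2),
# which needs one regression per predictor.
vif_from_regressions <- function(X) {
  sapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  })
}

# Wind, Temp and Month contain no missing values in airquality
X <- as.matrix(airquality[, c("Wind", "Temp", "Month")])
v1 <- vif_from_inverse(X)
v2 <- vif_from_regressions(X)

# The two approaches should agree up to numerical tolerance
all.equal(unname(v1), v2)
```

The inverse-based version does the same work as the regression loop with one call to `solve()`, which is where the saving comes from when there are many predictors.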
Performance Highlights
Benchmark testing (on R 4.4.3, macOS M3 Pro, optimized BLAS and LAPACK) shows miceFast can significantly reduce computation time, especially in these scenarios:
- Linear Discriminant Analysis (LDA): ~5x faster.
- Grouping Variable Imputations: ~10x faster (and can exceed 100x in some edge cases).
- Multiple Imputations: ~x * (number of multiple imputations) faster, since the model is computed only once.
- Variance Inflation Factors (VIF): ~5x faster, because we only compute the inverse of X'X.
- Predictive Mean Matching (PMM): ~3x faster, thanks to presorting and binary search.

For performance details, see performance_validity.R in the extdata folder.
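The presorting and binary search idea behind the PMM figure can be sketched in base R. This is a simplified illustration under stated assumptions, not the package's algorithm: it picks only the single nearest donor (PMM usually samples among several close donors), the helper name `pmm_nearest` is hypothetical, and the data are simulated.

```r
# Hypothetical helper: nearest-donor lookup on presorted predictions.
# findInterval() performs a binary search on a sorted vector.
pmm_nearest <- function(pred_obs, y_obs, pred_mis) {
  ord <- order(pred_obs)              # presort the donors once
  p_sorted <- pred_obs[ord]
  y_sorted <- y_obs[ord]
  n <- length(p_sorted)

  # Index of the largest sorted prediction <= target (0 if there is none)
  lo <- findInterval(pred_mis, p_sorted)
  left <- pmax(lo, 1L)
  right <- pmin(lo + 1L, n)

  # Pick whichever neighbour is closer to the target prediction
  use_right <- abs(p_sorted[right] - pred_mis) < abs(p_sorted[left] - pred_mis)
  y_sorted[ifelse(use_right, right, left)]
}

set.seed(42)
pred_obs <- rnorm(1000)                        # predictions for observed rows
y_obs <- pred_obs + rnorm(1000, sd = 0.1)      # observed outcomes (donors)
pred_mis <- rnorm(50)                          # predictions for missing rows

fast <- pmm_nearest(pred_obs, y_obs, pred_mis)

# Brute-force reference: scan every donor for every missing row
slow <- sapply(pred_mis, function(p) y_obs[which.min(abs(pred_obs - p))])
all.equal(fast, slow)
```

The brute-force version scans all donors for each missing value, while the sorted version pays for one sort and then a logarithmic-time lookup per missing value.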
Owner
- Name: Maciej Nasinski
- Login: Polkas
- Kind: user
- Location: Warsaw Poland
- Company: @insightsengineering
- Repositories: 5
- Profile: https://github.com/Polkas
Maciej Nasinski - Data Scientist
GitHub Events
Total
- Issues event: 1
- Watch event: 4
- Delete event: 2
- Push event: 52
- Pull request event: 6
- Create event: 2
Last Year
- Issues event: 1
- Watch event: 4
- Delete event: 2
- Push event: 52
- Pull request event: 6
- Create event: 2
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Maciej Nasinski | n****j@g****m | 104 |
| ol-oxy | o****a@g****m | 1 |
| Maciej Nasinski | m****i@a****l | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 17
- Total pull requests: 13
- Average time to close issues: 5 months
- Average time to close pull requests: about 1 hour
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 0.59
- Average comments per pull request: 0.15
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Polkas (15)
- sebastian-fox (2)
Pull Request Authors
- Polkas (12)
- ol-oxy (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 404 last month
- Total docker downloads: 21,154
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 12
- Total maintainers: 1
cran.r-project.org: miceFast
Fast Imputations Using 'Rcpp' and 'Armadillo'
- Homepage: https://github.com/Polkas/miceFast
- Documentation: http://cran.r-project.org/web/packages/miceFast/miceFast.pdf
- License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
- Latest release: 0.8.5 (published about 1 year ago)
Rankings
Maintainers (1)
Dependencies
- R >= 3.6.0 depends
- Rcpp >= 0.12.12 imports
- data.table * imports
- methods * imports
- UpSetR * suggests
- dplyr * suggests
- ggplot2 * suggests
- knitr * suggests
- magrittr * suggests
- mice * suggests
- pacman * suggests
- rmarkdown * suggests
- testthat * suggests
- actions/checkout v2 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- actions/cache v1 composite
- actions/checkout v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite