ebTobit

Empirical Bayesian Estimation of Possibly Censored Gaussian Matrices

https://github.com/barbehenna/ebtobit

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.4%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: barbehenna
  • Language: R
  • Default Branch: main
  • Size: 110 KB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

README.md

Empirical Bayesian Estimation of Censored Gaussian (Tobit) Matrices


What is it?

An R package for denoising censored Gaussian means with empirical Bayes $g$-modeling. The general model is as follows:

$$ \theta_i \overset{\text{iid}}{\sim} g \quad (\theta_i \in \mathbb{R}^p) $$

$$ X_{ij} \mid \theta_{ij} \overset{\text{indep.}}{\sim} N(\theta_{ij}, \sigma^2) $$

$$ L_{ij} \leq X_{ij} \leq R_{ij} $$

The data is represented with matrices:

$$ \theta = \begin{bmatrix} \theta_{11} & \dots & \theta_{1p} \\ \theta_{21} & \dots & \theta_{2p} \\ \vdots & \ddots & \vdots \\ \theta_{n1} & \dots & \theta_{np} \end{bmatrix} \qquad X = \begin{bmatrix} X_{11} & \dots & X_{1p} \\ X_{21} & \dots & X_{2p} \\ \vdots & \ddots & \vdots \\ X_{n1} & \dots & X_{np} \end{bmatrix} $$

$$ L = \begin{bmatrix} L_{11} & \dots & L_{1p} \\ L_{21} & \dots & L_{2p} \\ \vdots & \ddots & \vdots \\ L_{n1} & \dots & L_{np} \end{bmatrix} \qquad R = \begin{bmatrix} R_{11} & \dots & R_{1p} \\ R_{21} & \dots & R_{2p} \\ \vdots & \ddots & \vdots \\ R_{n1} & \dots & R_{np} \end{bmatrix} $$
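As a minimal sketch of this setup in base R, the code below simulates one small data set from the model in matrix form. The two-point prior `support`, the choice $\sigma = 1$, and the censoring rule (about half of the entries recorded only to the enclosing unit interval) are illustrative assumptions, not package defaults.

```r
set.seed(3)
n <- 5; p <- 3

# Hypothetical prior g: each row theta_i equals one of two support points in R^p
support <- rbind(rep(0, p), rep(3, p))
theta <- support[sample(1:2, n, replace = TRUE), , drop = FALSE]  # n x p matrix of means

# X_ij | theta_ij ~ N(theta_ij, sigma^2) independently, here with sigma = 1
X <- theta + matrix(stats::rnorm(n * p), n, p)

# Hypothetical censoring: roughly half of the entries are only known to lie in
# [floor(X), floor(X) + 1]; the remaining entries are observed exactly (L == R)
coarse <- matrix(stats::runif(n * p) < 0.5, n, p)
L <- ifelse(coarse, floor(X), X)
R <- ifelse(coarse, floor(X) + 1, X)
```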

The bounds $L_{ij}$ and $R_{ij}$ are assumed to be known. When $L_{ij} = R_{ij}$, there is a direct (noisy) measurement of $\theta_{ij}$; when $L_{ij} < R_{ij}$, there is a censored measurement of $\theta_{ij}$. This structure is commonly referred to as partially interval-censored data, and it allows for any combination of observed measurements and left-, right-, and interval-censored measurements (left censoring corresponds to $L_{ij} = -\infty$ and right censoring to $R_{ij} = \infty$).
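A small, hypothetical base-R helper makes this encoding explicit: it labels each $(L_{ij}, R_{ij})$ pair by measurement type, treating infinite bounds as one-sided censoring and $(-\infty, \infty)$ as a measurement that carries no information. The name `censoring_type` and the label strings are illustrative only and are not part of the package.

```r
# Hypothetical helper: label each measurement by its censoring type.
# Infinite bounds encode one-sided censoring; L = -Inf, R = Inf carries no information.
censoring_type <- function(L, R) {
  stopifnot(length(L) == length(R), all(L <= R))
  ifelse(L == R, "observed",
    ifelse(is.infinite(L) & is.infinite(R), "missing",
      ifelse(is.infinite(L), "left-censored",
        ifelse(is.infinite(R), "right-censored", "interval-censored"))))
}

# One made-up example of each type
censoring_type(L = c(2.3, -Inf, 4.0, 1.0, -Inf),
               R = c(2.3,  0.5, Inf, 3.0,  Inf))
# returns c("observed", "left-censored", "right-censored", "interval-censored", "missing")
```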

We use a Tobit likelihood for each measurement:

$$ P(L, R \mid \theta) = \begin{cases} \phi_{\sigma}(L - \theta) & L = R \\ \Phi_{\sigma}(R - \theta) - \Phi_{\sigma}(L - \theta) & L < R \end{cases} $$

where $\phi_{\sigma}$ and $\Phi_{\sigma}$ denote the density and CDF of $N(0, \sigma^2)$. The Gaussian density is used when there is a direct measurement (i.e., $L = X = R$), and a Gaussian probability is used when there is a censored measurement (i.e., $L < R$).
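The two cases map directly onto base R's `dnorm` and `pnorm`. The sketch below is a hypothetical helper `tobit_lik` (not a package function) that evaluates this likelihood for vectors of bounds, assuming the noise standard deviation `sigma` is known.

```r
# Hypothetical helper: Tobit likelihood P(L, R | theta) for each measurement.
# sigma is the (assumed known) noise standard deviation.
tobit_lik <- function(L, R, theta, sigma = 1) {
  stopifnot(all(L <= R))
  ifelse(
    L == R,
    stats::dnorm(L - theta, sd = sigma),                                       # direct measurement: density
    stats::pnorm(R - theta, sd = sigma) - stats::pnorm(L - theta, sd = sigma)  # censored: probability of [L, R]
  )
}

tobit_lik(L = 1.2, R = 1.2, theta = 1)   # density of N(1, 1) evaluated at 1.2
tobit_lik(L = 0, R = Inf, theta = 0)     # 0.5: half of N(0, 1) lies above 0
```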

What does it do?

This package provides an `ebTobit` object (Empirical Bayes model with Tobit likelihood) that estimates the prior $g$ over a user-specified grid `gr` and then computes the posterior mean or posterior $\ell_1$ medoid as the estimate of $\theta$. In one dimension, the $\ell_1$ medoid is the median. By default, `gr` is set using the exemplar method, so the grid consists of the maximum likelihood estimates of the $\theta_{ij}$. When the censoring interval is finite, the maximum likelihood estimate of $\theta_{ij}$ is $0.5 (L_{ij} + R_{ij})$.
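The midpoint claim can be checked numerically. The sketch below maximizes the interval likelihood $\Phi(R - \theta) - \Phi(L - \theta)$ with `stats::optimize` (taking $\sigma = 1$), then builds an exemplar-style grid from the element-wise midpoints of small made-up bound matrices. Treating each row of midpoints as one grid point is an assumption about how the exemplar grid is laid out, not a description of the package internals.

```r
# Interval-censored likelihood with sigma = 1
interval_lik <- function(theta, L, R) stats::pnorm(R - theta) - stats::pnorm(L - theta)

# Maximize over theta for the interval [0, 1]; the optimum should be near 0.5
opt <- stats::optimize(interval_lik, interval = c(-5, 5), L = 0, R = 1, maximum = TRUE)
opt$maximum

# Exemplar-style grid (sketch): element-wise MLEs 0.5 * (L + R), one grid point per row
Lmat <- matrix(c(0, 2, 1, 3), nrow = 2)   # hypothetical 2 x 2 lower bounds
Rmat <- matrix(c(1, 2, 1, 5), nrow = 2)   # hypothetical 2 x 2 upper bounds
gr <- 0.5 * (Lmat + Rmat)
gr   # rows (0.5, 1) and (2, 4)
```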

Suppose $p = 1$ and there is no censoring. Then the basic usage is:

```r
library(ebTobit)

# create noisy measurements
n <- 100
t <- sample(c(0, 5), size = n, replace = TRUE, prob = c(0.8, 0.2))
x <- t + stats::rnorm(n)

# fit g-model with default prior grid
res1 <- ebTobit(x)

# measure performance of estimated posterior mean
mean((t - fitted(res1))^2)
```
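The fit above treats $g$ as a discrete distribution on the grid. To make that step concrete, here is a minimal base-R sketch of the same idea for $p = 1$ without censoring: it uses `x` itself as an exemplar grid, estimates the grid weights with plain EM iterations, and returns the posterior mean. It assumes $\sigma = 1$ (as in the simulated data) and a fixed iteration count; it is an illustration, not the package's implementation, and the package's default solver may differ.

```r
set.seed(1)
n <- 100
t <- sample(c(0, 5), size = n, replace = TRUE, prob = c(0.8, 0.2))
x <- t + stats::rnorm(n)

# Exemplar grid: with no censoring, the MLE of each theta_i is x_i itself
gr <- x
# A[i, k] = N(0, 1) density of x_i - gr[k], i.e. the likelihood of x_i if theta_i = gr[k]
A <- stats::dnorm(outer(x, gr, "-"))

# EM for the prior weights w[k] = g({gr[k]}), starting from uniform weights
w <- rep(1 / length(gr), length(gr))
for (iter in 1:500) {
  post <- sweep(A, 2, w, "*")    # unnormalized posterior: A[i, k] * w[k]
  post <- post / rowSums(post)   # each row becomes a posterior distribution over the grid
  w <- colMeans(post)            # EM update: average posterior mass per grid point
}

# Posterior mean E[theta_i | x_i] under the estimated prior
post_mean <- as.vector(A %*% (w * gr)) / as.vector(A %*% w)
mean((t - post_mean)^2)   # compare with mean((t - x)^2) for the unshrunk MLE
```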

Next we can look at a more complicated example with $p = 10$:

```r
library(ebTobit)

# create noisy measurements (i.i.d. Gamma(5, 1) means)
n <- 1000; p <- 10
t <- matrix(stats::rgamma(n*p, shape = 5, rate = 1), n, p)
x <- t + matrix(stats::rnorm(n*p), n, p)

# assume we can't accurately measure x < 1 but we know theta > 0
L <- ifelse(x < 1, 0, x)
R <- ifelse(x < 1, 1, x)

# fit g-model with default prior grid
res2 <- ebTobit(x)
res3 <- ebTobit(L, R)

# observe that the censoring affects the fitted range
range(fitted(res2))
range(fitted(res3))

# fit censored data with a different grid (large and random, not the MLE)
res4 <- ebTobit(
  L = L,
  R = R,
  gr = sapply(1:p, function(j) stats::runif(1e+4, min = min(L[,j]), max = max(R[,j]))),
  algorithm = "EM"
)

# compute the posterior mean and L1 medoid for new data
# (predictions can also be based on partially interval-censored observations)
y <- matrix(stats::rexp(5*p, rate = 0.5), 5, p)
predict(res4, y)                       # posterior mean
predict(res4, y, method = "L1mediod")  # posterior L1 medoid
```
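In one dimension the $\ell_1$ medoid reduces to a median; in higher dimensions it is a single grid point. The sketch below assumes a definition (the grid point minimizing the posterior expected $\ell_1$ distance to the other grid points) that may not match the package's exact implementation of `method = "L1mediod"`. It uses a made-up posterior to contrast the medoid with the posterior mean when a small amount of mass sits far away.

```r
# Made-up posterior over K = 4 grid points in R^2 (one grid point per row of gr)
gr <- matrix(c(0, 1, 2, 10,
               0, 1, 2, 10), ncol = 2)
wpost <- c(0.3, 0.3, 0.3, 0.1)   # posterior weights, summing to one

# Posterior mean: posterior-weighted average of the grid points
post_mean <- colSums(gr * wpost)   # (1.9, 1.9)

# Assumed l1 medoid: the grid point with the smallest posterior expected l1 distance
D <- as.matrix(stats::dist(gr, method = "manhattan"))  # D[m, k] = sum(abs(gr[m, ] - gr[k, ]))
risk <- as.vector(D %*% wpost)     # expected l1 loss of reporting gr[m, ]
medoid <- gr[which.min(risk), ]    # (1, 1)
```

The far-away point at (10, 10) pulls the mean toward it, while the medoid stays at a grid point in the bulk of the posterior mass.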

How do I install it?

This package is available on CRAN and can be installed with `install.packages("ebTobit")`. It can also be installed directly from GitHub:

```r
remotes::install_github("barbehenna/ebTobit")
```

Data

This R package also includes a real bile acid data.frame taken directly from Lei et al. (2018) (https://doi.org/10.1096/fj.201700055R) via https://github.com/WandeRum/GSimp (https://doi.org/10.1371/journal.pcbi.1005973). The bile acid data contains measurements of 34 bile acids for 198 patients; no missing values are present in the data. In our modeling, we assume the bile acid values are independent log-normal measurements.

```r
data(BileAcid, package = "ebTobit") # load the bile acid data
```
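Because the bile acid values are modeled as log-normal, the Gaussian model above applies on the log scale. The minimal sketch below uses synthetic data (not `BileAcid`) and a hypothetical detection limit `lod` to show how a log transform plus left censoring yields bounds `L` and `R` of the same form as in the censored example above; the detection limit is an illustrative assumption, since the bile acid data have no missing values.

```r
# Synthetic stand-in for a concentration matrix (NOT the BileAcid data)
set.seed(2)
conc <- matrix(stats::rlnorm(6 * 3, meanlog = 1, sdlog = 1), nrow = 6, ncol = 3)

# Under the log-normal assumption, log(conc) follows the Gaussian model above
x <- log(conc)

# Hypothetical detection limit: a value below lod is only known to lie in (-Inf, log(lod)]
lod <- 1.5
L <- ifelse(conc < lod, -Inf, x)
R <- ifelse(conc < lod, log(lod), x)
```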

Who wrote it?

Alton Barbehenn and Sihai Dave Zhao

What license?

GPL (>= 3)

Owner

  • Name: Alton Barbehenn
  • Login: barbehenna
  • Kind: user
  • Company: Department of Statistics, University of Illinois Urbana-Champaign

Citation (CITATION.cff)

cff-version: 1.2.0
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Alton
    family-names: Barbehenn
    orcid: 'https://orcid.org/0009-0000-3364-7204'
  - given-names: Sihai Dave
    family-names: Zhao
title: 'ebTobit: Empirical Bayesian Tobit Matrix Estimation'
version: 1.0.1
doi: 10.48550/arXiv.2306.07239
date-released: 2023-06-15
repository-code: 'https://github.com/barbehenna/ebTobit'
license: GPL-3.0


Packages

  • Total packages: 1
  • Total downloads:
    • cran 201 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: ebTobit

Empirical Bayesian Tobit Matrix Estimation

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 201 Last month
Rankings
  • Forks count: 28.2%
  • Dependent packages count: 28.3%
  • Stargazers count: 31.3%
  • Dependent repos count: 36.9%
  • Average: 37.5%
  • Downloads: 62.8%
Maintainers (1)