tcensReg

tcensReg is a package written to obtain maximum likelihood estimates from a truncated normal distribution with censoring.

https://github.com/williazo/tcensreg

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.0%) to scientific vocabulary

Keywords

censored-data censreg contrast-sensitivity limited-dependent-variable maximum-likelihood-estimates mle normal-estimation truncation truncreg visual-quality
Last synced: 6 months ago

Repository

tcensReg is a package written to obtain maximum likelihood estimates from a truncated normal distribution with censoring.

Basic Info
  • Host: GitHub
  • Owner: williazo
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 1.65 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Topics
censored-data censreg contrast-sensitivity limited-dependent-variable maximum-likelihood-estimates mle normal-estimation truncation truncreg visual-quality
Created over 7 years ago · Last pushed about 5 years ago
Metadata Files
Readme License

README.md

Maximum Likelihood Estimation of a Truncated Normal Distribution with Censored Data


The goal of this package is to estimate parameters from a linear model when the data come from a truncated normal distribution with censoring. Maximum likelihood estimates are returned. Multiple optimization methods are available, with the default set as conjugate gradient. This package can also return maximum likelihood estimates for truncated-only or censored-only data, similar to the truncreg and censReg packages.
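For intuition, the log-likelihood being maximized can be sketched from standard truncated-normal results (this is a reconstruction from the model description above, not copied from the package or manuscript): with lower truncation at a and left-censoring at the detection limit nu,

```latex
\ell(\mu, \sigma) =
  \sum_{i:\, y_i = \nu} \log\!\left[
    \frac{\Phi\!\left(\frac{\nu - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}
         {1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}
  \right]
  + \sum_{i:\, y_i > \nu} \log\!\left[
    \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mu}{\sigma}\right)}
         {1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right)}
  \right]
```

where phi and Phi denote the standard normal density and distribution function. The truncated-only and censored-only special cases drop the corresponding terms.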

For details on the methods, simulation performance, and an application to visual quality testing from clinical trials for intraocular lenses, see the corresponding open-access manuscript at doi:10.1186/s12874-020-01032-9.

Installation

You can install tcensReg from CRAN for the stable release or install the GitHub version for the active development version:

``` r
# stable CRAN version
install.packages("tcensReg")

# or the active development GitHub version
install.packages("devtools")
devtools::install_github("williazo/tcensReg")
```

Example 1: Single Population

A common situation where this type of problem arises is when a natural truncation is imposed by the structure of the data. For instance, several applications have an implied zero truncation, such as product lifetimes, age, or detection thresholds. To show how to implement the functions within the package, I will demonstrate a simple simulation example.

Assume that we have observations from an underlying truncated normal distribution. In our case we will assume a zero-truncated model by setting a=0. We generate this truncated normal data below and refer to it as y_star.

``` r
library(tcensReg)
mu <- 0.5
sigma <- 0.5
a <- 0

set.seed(032420)

# generate random values from the truncated normal distribution
# using the tcensReg function rtnorm
y_star <- rtnorm(n = 1000, mu = mu, sd = sigma, a = a)

# note that the lower bound will always be non-negative
round(range(y_star), 3)
```

## [1] 0.004 2.217

Next, we can imagine a scenario where we have an imprecise measurement of y_star leading to censoring. In our case we assume that values below a limit of detection, nu, are censored. This creates a random variable y.

In the example below we set our limit of detection as nu=0.25.

``` r
nu <- 0.25
y <- ifelse(y_star <= nu, nu, y_star)

# calculating the proportion of censored observations
sum(y == nu) / length(y)
```

## [1] 0.199

``` r
# collecting the uncensored and censored data together
dt <- data.frame(y_star, y)
```

We can observe the histogram and density plot for the uncensored data, which shows the zero-truncation.

We can then compare this to the censored observations below.
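Neither figure is rendered in this text version. A minimal base-R sketch that reproduces the comparison, using rejection sampling from rnorm as a stand-in for tcensReg's rtnorm (so it runs without the package installed):

```r
# stand-in data generation: rejection sampling in place of tcensReg::rtnorm
set.seed(032420)
draws <- rnorm(1e4, mean = 0.5, sd = 0.5)
y_star <- head(draws[draws > 0], 1000)   # zero-truncated values
nu <- 0.25
y <- ifelse(y_star <= nu, nu, y_star)    # censor below the detection limit

# side-by-side histograms: zero-truncation on the left,
# censoring spike at nu on the right
op <- par(mfrow = c(1, 2))
hist(y_star, breaks = 30, main = "Uncensored (y_star)", xlab = "y_star")
hist(y, breaks = 30, main = "Censored (y)", xlab = "y")
par(op)
```

The censored histogram shows a point mass at nu where all values below the detection limit pile up.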

We can then estimate the mean, mu, and standard deviation, sigma, using y with the tcensReg package as shown below.

``` r
# loading the package
library(tcensReg)

t_mod <- tcensReg(y ~ 1, data = dt, a = 0, v = 0.25)
summary(t_mod)
```

## 
## Call:
## tcensReg(formula = y ~ 1, a = 0, v = 0.25, data = dt)
## 
## Assumed Distribution:
## Truncated Normal with Censoring
## 
## Count of Observations:
##    Total Observations Censored Observations 
##                  1000                   199 
## 
## 
## Coefficients:
##             Estimate Std. Error t value
## (Intercept)   0.4562     0.0359 12.7024
## sigma         0.5405     0.0142 38.0508
## 
## Log Likelihood: -710.7033
## Information Criterion: AIC=1425.4066 BIC=1435.2221
## Optimization Method: CG
## Psuedo R2: 0 method - mckelvey_zavoina

By default the coefficients are returned along with the log likelihood and other fit criteria. Note that the pseudo R2 in the case of an intercept-only model is exactly equal to zero.

``` r
names(t_mod)
```

##  [1] "theta"             "convergence"       "initial_ll"       
##  [4] "final_ll"          "var_cov"           "method"           
##  [7] "info_criteria"     "model_matrix"      "call"             
## [10] "n_count"           "latent_assumption"

Note that this object contains the parameter estimates theta, the convergence code, initial/final log-likelihood values, the variance-covariance matrix, the optimization method, information criteria, the design matrix used by the model, the formula call, counts of total/censored observations, and the latent distributional assumption.

Comparing the estimates to the true values, we see that tcensReg is approximately unbiased.

``` r
# tcensReg model
output <- tcensReg(y ~ 1, data = dt, a = a, v = nu)

# extracting the point estimates
tcensReg_est <- coef(output) # this returns sigma rather than log sigma

# OLS model
lm_output <- lm(y ~ 1, data = dt)
lm_est <- c(coef(lm_output), summary(lm_output)$sigma)

# censored-only model, i.e., Tobit model
cens_output <- tcensReg(y ~ 1, data = dt, v = nu)
```

## Warning: `a` is not specified indicating no truncation

``` r
cens_est <- coef(cens_output)

results_df <- data.frame(rbind(c(mu, sigma), t(tcensReg_est), lm_est, t(cens_est)))
names(results_df) <- c("mu", "sigma")
row.names(results_df) <- c("Truth", "tcensReg", "Normal MLE", "Tobit")
results_df$mu_bias <- results_df$mu - mu
results_df$sigma_bias <- results_df$sigma - sigma

knitr::kable(results_df, format = "markdown", digits = 4)
```

|            |     mu |  sigma | mu_bias | sigma_bias |
| :--------- | -----: | -----: | ------: | ---------: |
| Truth      | 0.5000 | 0.5000 |  0.0000 |     0.0000 |
| tcensReg   | 0.4562 | 0.5405 | -0.0438 |     0.0405 |
| Normal MLE | 0.6685 | 0.3841 |  0.1685 |    -0.1159 |
| Tobit      | 0.6166 | 0.4595 |  0.1166 |    -0.0405 |

Other methods result in significant bias for both mu and sigma.
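The direction of these biases has a simple intuition: zero-truncation removes the lower tail, so a naive normal fit overstates the mean and understates the standard deviation. A quick self-contained check (again using rejection sampling from rnorm as a stand-in for rtnorm):

```r
# zero-truncated draws from N(0.5, 0.5^2) via rejection sampling
set.seed(1)
draws <- rnorm(1e5, mean = 0.5, sd = 0.5)
y_trunc <- draws[draws > 0]

# naive moments: the mean lands above mu = 0.5 and the sd below sigma = 0.5
mean(y_trunc)
sd(y_trunc)
```

Censoring at nu pulls in the opposite direction but does not cancel the truncation bias, which is why the joint truncated-with-censoring likelihood is needed.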

Example 2: Two Population Model with Separate Variances

As an extension of the single-population model above, we can imagine two independent truncated normal random variables that have common censoring and truncation values but different standard deviations.

We can simulate the underlying truncated normal distributions Y1_star and Y2_star similarly to Y above, except now we allow them to have separate means and variances.

For this example we let mu_1=0.5, mu_2=1, sigma_1=0.25, sigma_2=2, and a=0.

``` r
mu_1 <- 0.5
mu_2 <- 1
sigma_1 <- 0.25
sigma_2 <- 2
a <- 0

set.seed(032420)
y1_star <- rtnorm(1000, mu = mu_1, sd = sigma_1, a = a)
y2_star <- rtnorm(1000, mu = mu_2, sd = sigma_2, a = a)
df <- data.frame(y_star = c(y1_star, y2_star),
                 group = c(rep("Population 1", length(y1_star)),
                           rep("Population 2", length(y2_star))))
```

Plotting each of these uncensored population densities, we can see the difference in shape based on the underlying parameter selection.


Then censoring each observation at nu, we are left with Y1 and Y2. Again, we let nu=0.25.

``` r
nu <- 0.25
df$y <- ifelse(df$y_star <= nu, nu, df$y_star)
```

We can then fit our model with separate variances for each group using the command tcensReg_sepvar as shown below.

``` r
mod_result <- tcensReg_sepvar(y ~ group, a = a, v = nu, group_var = "group",
                              method = "maxLik", data = df)
mod_result
```

## $theta
##       (Intercept) groupPopulation 2        log_sigma1        log_sigma2 
##         0.4933551         0.6659116        -1.3281990         0.6546893 
## 
## $convergence
## [1] 2
## 
## $initial_ll
## [1] -2013.046
## 
## $final_ll
## [1] -1933.59
## 
## $var_cov
##                     (Intercept) groupPopulation 2    log_sigma1    log_sigma2
## (Intercept)        9.384525e-05     -9.384525e-05 -9.865163e-05 -1.179984e-14
## groupPopulation 2 -9.384525e-05      2.475291e-02  9.865163e-05 -6.214006e-03
## log_sigma1        -9.865163e-05      9.865163e-05  8.721167e-04  1.240418e-14
## log_sigma2        -1.179984e-14     -6.214006e-03  1.240418e-14  2.211993e-03
## 
## $method
## [1] "maxLik"

``` r
sepvar_est <- mod_result$theta
mu_1_est <- sepvar_est[1]
mu_2_est <- sum(sepvar_est[1:2])
sigma_1_est <- exp(sepvar_est[3])
sigma_2_est <- exp(sepvar_est[4])

results_df <- data.frame(rbind(c(mu_1, mu_2, sigma_1, sigma_2),
                               c(mu_1_est, mu_2_est, sigma_1_est, sigma_2_est)))
names(results_df) <- c("mu_1", "mu_2", "sigma_1", "sigma_2")
row.names(results_df) <- c("Truth", "tcensReg")
results_df$mu1_pct_bias <- paste0(round(((results_df$mu_1 - mu_1) / mu_1) * 100, 2), "%")
results_df$mu2_pct_bias <- paste0(round(((results_df$mu_2 - mu_2) / mu_2) * 100, 2), "%")
results_df$sigma1_pct_bias <- paste0(round(((results_df$sigma_1 - sigma_1) / sigma_1) * 100, 2), "%")
results_df$sigma2_pct_bias <- paste0(round(((results_df$sigma_2 - sigma_2) / sigma_2) * 100, 2), "%")

knitr::kable(results_df, format = "markdown", digits = 4)
```

|          |   mu_1 |   mu_2 | sigma_1 | sigma_2 | mu1_pct_bias | mu2_pct_bias | sigma1_pct_bias | sigma2_pct_bias |
| :------- | -----: | -----: | ------: | ------: | :----------- | :----------- | :-------------- | :-------------- |
| Truth    | 0.5000 | 1.0000 |   0.250 |  2.0000 | 0%           | 0%           | 0%              | 0%              |
| tcensReg | 0.4934 | 1.1593 |   0.265 |  1.9245 | -1.33%       | 15.93%       | 5.98%           | -3.77%          |

Performance Comparison: Censored-Only and Truncated-Only

Note that tcensReg can also estimate parameters in the censored-only or truncated-only cases. We show below that, by using analytic values in the tcensReg implementation, our method is faster than the alternative estimation procedures while providing better variance estimates. With a small set of covariates and p << n, we can use the Newton-Raphson method of optimization, which is computationally fast with few covariates.

``` r
library(microbenchmark)

# testing the censored-only regression
library(censReg)
cens <- microbenchmark(
  tcensReg_method = tcensReg(y ~ 1, data = dt, v = nu, method = "Newton"),
  censReg_method = censReg(y ~ 1, left = nu, data = dt)
)
knitr::kable(summary(cens), format = "markdown", digits = 4)
```

| expr            |     min |      lq |    mean |  median |      uq |      max | neval | cld |
| :-------------- | ------: | ------: | ------: | ------: | ------: | -------: | ----: | :-- |
| tcensReg_method |  4.8881 |  5.0943 |  5.9877 |  5.2487 |  5.7981 |  14.1300 |   100 | a   |
| censReg_method  | 13.7873 | 14.5127 | 19.2472 | 18.2919 | 21.4700 | 103.8775 |   100 | b   |

``` r
# point estimates are equivalent
tcensReg_est <- as.numeric(tcensReg(y ~ 1, data = dt, v = nu, method = "Newton")$theta)
censReg_est <- as.numeric(coef(censReg(y ~ 1, left = nu, data = dt)))
all.equal(tcensReg_est, censReg_est)
```

## [1] TRUE

``` r
# testing the truncated-only regression
library(truncreg)
trunc <- microbenchmark(
  tcensReg_method = tcensReg(y_star ~ 1, data = dt, a = a, method = "Newton"),
  truncreg_method = truncreg(y_star ~ 1, point = a, data = dt)
)
knitr::kable(summary(trunc), format = "markdown", digits = 4)
```

| expr            |     min |      lq |    mean |  median |      uq |     max | neval | cld |
| :-------------- | ------: | ------: | ------: | ------: | ------: | ------: | ----: | :-- |
| tcensReg_method |  9.4831 |  9.6810 | 11.3402 |  9.9553 | 12.1703 | 19.8827 |   100 | a   |
| truncreg_method | 16.2908 | 16.7846 | 19.4056 | 17.5601 | 21.6932 | 35.6788 |   100 | b   |

``` r
tcensReg_est <- as.numeric(tcensReg(y_star ~ 1, data = dt, a = a, method = "Newton")$theta)

# note truncreg returns sigma not log_sigma, so we need to exponentiate our value
tcensReg_est[2] <- exp(tcensReg_est[2])
truncreg_est <- as.numeric(coef(truncreg(y_star ~ 1, point = a, data = dt)))
all.equal(tcensReg_est, truncreg_est)
```

## [1] "Mean relative difference: 0.0003643991"

In the comparisons above we are using an intercept-only model, but in general we expect that interest lies in understanding how a set of covariates affects the mean response. So to test the sensitivity and speed as the number of covariates approaches n, we can generate independent random variables X and fit the regression model of Y or Y_star.

We can compare the censored-only and truncated-only performance with 20 predictors, i.e., p=20. To illustrate some of the other available optimization methods, we will set method to BFGS, which is a quasi-Newton optimization method.

``` r
# number of predictors
p <- 20
X <- NULL
for (i in seq_len(p)) {
  X_i <- rnorm(n = length(y))
  X <- cbind(X, X_i)
}
colnames(X) <- paste0("var", seq_len(p))
dt <- data.frame(y, X)

# testing the censored-only regression with 20 covariates
cens <- microbenchmark(
  tcensReg_method = tcensReg(y ~ ., data = dt, v = nu, method = "BFGS"),
  censReg_method = censReg(y ~ ., left = nu, data = dt)
)
knitr::kable(summary(cens), format = "markdown", digits = 4)
```

| expr            |      min |       lq |     mean |   median |       uq |      max | neval | cld |
| :-------------- | -------: | -------: | -------: | -------: | -------: | -------: | ----: | :-- |
| tcensReg_method | 218.2029 | 223.9493 | 234.8069 | 226.8067 | 230.3986 | 366.2588 |   100 | a   |
| censReg_method  | 345.9823 | 362.4582 | 386.9384 | 368.1871 | 373.6284 | 530.3745 |   100 | b   |

``` r
# point estimates are equivalent
tcensReg_est <- as.numeric(tcensReg(y ~ ., data = dt, v = nu, method = "BFGS")$theta)
censReg_est <- as.numeric(coef(censReg(y ~ ., left = nu, data = dt)))
all.equal(tcensReg_est, censReg_est)
```

## [1] "Mean relative difference: 8.089508e-05"

``` r
# testing the truncated-only regression with 20 covariates
trunc <- microbenchmark(
  tcensReg_method = tcensReg(y_star ~ ., data = dt, a = a, method = "BFGS"),
  truncreg_method = truncreg(y_star ~ ., point = a, data = dt)
)
knitr::kable(summary(trunc), format = "markdown", digits = 4)
```

| expr            |      min |       lq |     mean |   median |       uq |      max | neval | cld |
| :-------------- | -------: | -------: | -------: | -------: | -------: | -------: | ----: | :-- |
| tcensReg_method | 202.3384 | 207.2465 | 215.1095 | 210.5763 | 213.9136 | 363.9193 |   100 | a   |
| truncreg_method | 461.7276 | 470.2840 | 490.9210 | 475.7646 | 480.7120 | 637.6223 |   100 | b   |

``` r
tcensReg_est <- as.numeric(tcensReg(y_star ~ ., data = dt, a = a, method = "BFGS")$theta)

# note truncreg returns sigma not log_sigma, so we exponentiate the last parameter value
tcensReg_est[length(tcensReg_est)] <- exp(tcensReg_est[length(tcensReg_est)])
truncreg_est <- as.numeric(coef(truncreg(y_star ~ ., point = a, data = dt)))
all.equal(tcensReg_est, truncreg_est)
```

## [1] "Mean relative difference: 2.993568e-05"

Owner

  • Name: Justin Williams
  • Login: williazo
  • Kind: user
  • Location: Los Angeles, CA
  • Company: Los Angeles Dodgers

Senior Quantitative Analyst

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 162
  • Total Committers: 1
  • Avg Commits per committer: 162.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Justin Williams w****o@u****u 162

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
cran.r-project.org: tcensReg

MLE of a Truncated Normal Distribution with Censored Data

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 0
Rankings
Stargazers count: 28.5%
Forks count: 28.8%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 42.5%
Downloads: 89.7%
Last synced: over 2 years ago

Dependencies

DESCRIPTION cran
  • R >= 3.3 depends
  • Rdpack * imports
  • maxLik * imports
  • stats * imports
  • censReg * suggests
  • future.apply * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • microbenchmark * suggests
  • rmarkdown * suggests
  • testthat >= 2.1.0 suggests
  • tictoc * suggests
  • truncreg * suggests
  • viridis * suggests