npdrderiv

Nonparametric Doubly Robust Inference on Dose-Response Curve and its Derivative

https://github.com/zhangyk8/npdrderiv

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Keywords

dose-response-function doubly-robust-inference kernel-smoothing
Last synced: 10 months ago · JSON representation

Repository

Nonparametric Doubly Robust Inference on Dose-Response Curve and its Derivative

Basic Info
  • Host: GitHub
  • Owner: zhangyk8
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 1.08 GB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
dose-response-function doubly-robust-inference kernel-smoothing
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

Nonparametric Doubly Robust Inference on Dose-Response Curve and its Derivative

This repository contains Python3 code for implementing the regression adjustment (RA), inverse probability weighting (IPW), and doubly robust (DR) estimators of dose-response curve and its derivative with and without the positivity condition.

Requirements

File Descriptions

Some high-level descriptions of our Python scripts are as follows:

  • npDoseResponseDerivDR.py (Key file): This file contains the implementations of the RA, IPW, and DR estimators of the derivative of a dose-response curve with and without the positivity condition.
  • npDoseResponseDR.py (Key file): This file contains the implementations of the RA, IPW, and DR estimators of the dose-response curve under the positivity condition.
  • rbf.py (Key auxiliary file): This file contains the implementations of common kernel functions.
  • utils1.py (Key auxiliary file): This file contains the utility functions for the main functions for implementing our proposed methods.
  • Case Study -- Job Corps program (Final).ipynb: This Jupyter Notebook contains detailed code for applying our proposed DR estimators to the Job Corps data, replicating the finite-difference method proposed by Colangelo and Lee (2020), and creating the comparative plot. (Reproducing Figure 4 in the arxiv version of our paper.)
  • JobCorpsData_Final.py: This file contains the code of applying our proposed DR estimators to the Job Corps data (for parallel slurm jobs).
  • Plotting for Simulation 1 Without Positivity (L=1 or L=5).ipynb: These two Jupyter Notebooks contain the code for plotting the simulation results when the data model violates the positivity condition. (Reproducing Figures 5 and 6 in the arxiv version of our paper.)
  • Plotting for Simulation 2 (L=1 or L=5).ipynb: These two Jupyter Notebooks contain the code for plotting the simulation results when the data model satisfies the positivity condition. (Reproducing Figures 2 and 3 in the arxiv version of our paper.)
  • Sim_Nopos1.py and SimNopos1inner.py: These two files contain the code for applying our (bias-corrected) IPW and DR estimators when the data model violates the positivity condition (for parallel slurm jobs).
  • Simulation2nonsepReplicate.py and Simulation2selfnorm.py: These two files contain the code for applying our proposed estimators and replicating the finite-difference method proposed by Colangelo and Lee (2020) under the data model that satisfies the positivity condition (for parallel slurm jobs).
  • SynthesizeNoposRes.py, SynthesizeRes2Replicate.py, and SynthesizeRes2selfnorm.py: These files are used to synthesize the outputs from the parallel slurm jobs.

1. Problem Setup

The dose-response curve is defined as , where is the potential outcome that would have been observed under treatment level . Accordingly, the derivative function of the dose-response curve is defined as .

2. Nonparametric Inference Under the Positivity Condition

Under some regularity conditions and the positivity condition (see Section 2 of our paper), the dose-response curve and its derivative can be identified as and , where is the conditional mean outcome (or regression) function of the outcome variable given the treatment and the covariate vector .

For nonparametric estimation of the dose-response curve with observed data , there are three major strategies:

  • Regression Adjustment (RA) Estimator:

where is a (consistent) estimator of the conditional mean outcome function .

  • Inverse Probability Weighting (IPW) Estimator:

where is a (consistent) estimator of the conditional density .

  • Doubly Robust (DR) Estimator:

where and are (consistent) estimators of and , respectively. The DR estimator is not only robust to the misspecifications of the conditional mean outcome and conditional density models but also asymptotically normal as

where is the bias term that shrinks quadratically to 0 with respect to is the asymptotic variance term that can be estimated by

More details can be found in Section 2.1 and Section D of our paper.

For nonparametric estimation of the derivative function , we also consider three similar strategies:

  • Regression Adjustment (RA) Estimator:

where is a (consistent) estimator of .

  • Inverse Probability Weighting (IPW) Estimator:

where is a kernel function with , is a (consistent) estimator of the conditional density .

  • Doubly Robust (DR) Estimator:

where and are (consistent) estimators of and , respectively. Again, our proposed DR estimator of is not only robust to the misspecifications of the conditional mean outcome and conditional density models but also asymptotically normal as

where is the bias term that shrinks quadratically to 0 with respect to is the asymptotic variance term that can be estimated by

More details can be found in Section 3 of our paper.

3. Nonparametric Inference Without the Positivity Condition

Both the dose-response curve and its derivative are unidentifiable in general when the positivity condition is violated; see Section 4.1 in our paper. To identify and estimate them, we assume an additive structure on the potential outcome so that

These formulas lead to the RA estimator of as:

The above IPW and DR estimators will give rise to inconsistent estimates of without the positivity even when the additive model holds true. We proposed our bias-corrected IPW and DR estimators of as:

where , and are (consistent) estimators of , the joint density , and the interior conditional density respectively. We also prove that this bias-corrected DR estimator is not only robust to the misspecifications of the conditional mean outcome and conditional density models but also asymptotically normal as

More details can be found in Section 5 of our paper.

4. Example Code

Estimation of Dose-Response Curve and its Derivative Under Positivity

```bash import numpy as np import scipy.stats import matplotlib.pyplot as plt from npDoseResponseDR import DRCurve from npDoseResponseDerivDR import DRDerivCurve, NeurNet from sklearn.neural_network import MLPRegressor

rho = 0.5 # correlation between adjacent Xs d = 20 # Dimension of the confounding variables n = 2000

Sigma = np.zeros((d,d)) + np.eye(d) for i in range(d): for j in range(i+1, d): if (j < i+2) or (j > i+d-2): Sigma[i,j] = rho Sigma[j,i] = rho sig = 1

np.random.seed(123)

Data generating process

Xsim = np.random.multivariatenormal(mean=np.zeros(d), cov=Sigma, size=n) nu = np.random.randn(n) eps = np.random.randn(n) theta = 1/(np.linspace(1, d, d)**2)

Tsim = scipy.stats.norm.cdf(3*np.dot(Xsim, theta)) + 3nu/4 - 1/2 Y_sim = 1.2Tsim + Tsim*2 + T_simXsim[:,0] + 1.2*np.dot(Xsim, theta) + eps*np.sqrt(0.5+ scipy.stats.norm.cdf(Xsim[:,0])) Xdat = np.columnstack([Tsim, Xsim]) tqry = np.linspace(-2, 2, 41)

Choice of the bandwidth parameter

h = 4np.std(T_sim)n**(-1/5)

RA estimator of m(t)

regmod = MLPRegressor(hiddenlayersizes=(10,), activation='relu', learningrate='adaptive', learningrateinit=0.1, randomstate=1, maxiter=200) mestra5 = DRCurve(Y=Ysim, X=Xdat, teval=tqry, est="RA", mu=regmod, L=5, h=None, kern="epanechnikov", printbw=False)

IPW estimator of m(t)

regrnn2 = MLPRegressor(hiddenlayersizes=(20,), activation='relu', learningrate='adaptive', learningrateinit=0.1, randomstate=1, maxiter=200) mestipw5 = DRCurve(Y=Ysim, X=Xdat, teval=tqry, est="IPW", mu=None, condTStype='kde', condTSmod=regrnn2, tau=0.001, L=5, h=h, kern="epanechnikov", hcond=None, selfnorm=True, printbw=True)

DR estimator of m(t)

mestdr5, sdestdr5 = DRCurve(Y=Ysim, X=Xdat, teval=tqry, est="DR", mu=regmod, condTStype='kde', condTSmod=regrnn2, tau=0.001, L=5, h=h, kern="epanechnikov", hcond=None, selfnorm=True, print_bw=True)

RA estimator of \theta(t)

thetara5 = DRDerivCurve(Y=Ysim, X=Xdat, teval=tqry, est="RA", betamod=NeurNet, niter=1000, lr=0.01, L=5, printbw=False)

IPW estimator of \theta(t)

thetaipw5 = DRDerivCurve(Y=Ysim, X=Xdat, teval=tqry, est="IPW", betamod=None, condTStype='kde', condTSmod=regrnn2, tau=0.001, L=5, h=h, kern="epanechnikov", hcond=None, selfnorm=True, printbw=True)

DR estimator of \theta(t)

thetadr5, thetasd5 = DRDerivCurve(Y=Ysim, X=Xdat, teval=tqry, est="DR", betamod=NeurNet, niter=1000, lr=0.01, condTStype='kde', condTSmod=regrnn2, tau=0.001, L=5, h=h, kern="epanechnikov", hcond=None, selfnorm=True, printbw=True)

Visualization of the estimators

plt.rcParams.update({'font.size': 16}) plt.figure(figsize=(12,5)) plt.subplot(1, 2, 1) plt.plot(tqry, 1.2*tqry + tqry**2, linewidth=4, color='red', label=r'True $m(t)$') plt.plot(tqry, mestra5, linewidth=2, color='darkorange', label=r'$\widehat{m}{\mathrm{RA}}(t)$') plt.plot(tqry, mestipw5, linewidth=2, alpha=0.8, color='darkgreen', label=r'$\widehat{m}{\mathrm{IPW}}(t)$') plt.plot(tqry, mestdr5, linewidth=2, alpha=0.8, color='blue', label=r'$\widehat{m}{\mathrm{DR}}(t)$') plt.fillbetween(tqry, mestdr5 - sdestdr5*scipy.stats.norm.ppf(0.975), mestdr5 + sdest_dr5*scipy.stats.norm.ppf(0.975), color='blue', alpha=.3, label='95% pointwise CIs') plt.legend(fontsize=13) plt.xlabel('Treatment value t') plt.ylabel(r'Dose-response curve $m(t)$ or $\widehat{m}(t)$')

plt.subplot(1, 2, 2) plt.plot(tqry, 1.2 + 2*tqry, linewidth=4, color='red', label=r'True $\theta(t)$') plt.plot(tqry, thetara5, linewidth=2, color='darkorange', label=r'$\widehat{\theta}{\mathrm{RA}}(t)$') plt.plot(tqry, thetaipw5, linewidth=2, alpha=0.8, color='darkgreen', label=r'$\widehat{\theta}{\mathrm{IPW}}(t)$') plt.plot(tqry, thetadr5, linewidth=2, alpha=0.8, color='blue', label=r'$\widehat{\theta}{\mathrm{DR}}(t)$') plt.fillbetween(tqry, thetadr5 - thetasd5*scipy.stats.norm.ppf(0.975), thetadr5 + theta_sd5*scipy.stats.norm.ppf(0.975), color='blue', alpha=.3, label='95% pointwise CIs') plt.legend(fontsize=13) plt.xlabel('Treatment value t') plt.ylabel(r'Derivative curve $\theta(t)$ or $\widehat{\theta}(t)$')

plt.tightlayout() plt.savefig('./Figures/estillust_pos.png') ```


Fig 1. Illustrative plots of RA, IPW, and DR estimators of a dose-response curve and its derivative under positivity.

Dose-Response Curve Derivative Estimation Without Positivity

```bash import numpy as np import scipy.stats import matplotlib.pyplot as plt from npDoseResponseDerivDR import NeurNet, RADRDerivBC, IPWDRDerivBC, DRDRDerivBC

n = 2000 np.random.seed(123)

Data generating process

Xsim = 2*np.random.rand(n) - 1 Tsim = np.sin(np.piX_sim) + np.random.rand(n)0.6 - 0.3 Ysim = Tsim2 + T_sim3 + 10*Xsim + np.random.normal(loc=0, scale=1, size=n) Xdat = np.concatenate([Tsim.reshape(-1,1), Xsim.reshape(-1,1)], axis=1)

Query points

t_qry = np.linspace(-0.8, 0.8, 41)

Choice of the bandwidth parameter

h = 2np.std(T_sim)n**(-1/5)

Bias-corrected RA estimator of \theta(t)

thetaCRA5 = RADRDerivBC(Y=Ysim, X=Xdat, teval=tqry, mu=NeurNet, L=5, niter=1000, lr=0.01, hbar=None, kernT_bar="gaussian")

Bias-corrected IPW estimator of \theta(t)

thetaCIPW5, condTS5 = IPWDRDerivBC(Y=Ysim, X=Xdat, teval=tqry, L=5, h=h, kern='epanechnikov', b=None, selfnorm=True, thresval=0.85)

Bias-corrected DR estimator of \theta(t)

thetaCDR5, thetaCsd5 = DRDRDerivBC(Y=Ysim, X=Xdat, teval=tqry, mu=NeurNet, L=5, h=h, kern='epanechnikov', niter=1000, lr=0.01, b=None, thresval=0.85, self_norm=False)

plt.rcParams.update({'font.size': 16}) plt.figure(figsize=(6,5)) plt.plot(tqry, 2*tqry + 3(t_qry2), linewidth=4, color='red', label=r'True $\theta(t)$') plt.plot(tqry, thetaCRA5, linewidth=2, alpha=0.8, color='darkorange', label=r'$\widehat{\theta}{\mathrm{C,RA}}(t)$') plt.plot(tqry, thetaCIPW5, linewidth=2, alpha=0.8, color='darkgreen', label=r'$\widehat{\theta}{\mathrm{C,IPW}}(t)$') plt.plot(tqry, thetaCDR5, linewidth=2, alpha=0.8, color='blue', label=r'$\widehat{\theta}{\mathrm{C,DR}}(t)$') plt.fillbetween(tqry, thetaCDR5 - thetaCsd5scipy.stats.norm.ppf(0.975), thetaCDR5 + thetaCsd5*scipy.stats.norm.ppf(0.975), color='blue', alpha=.3, label='95% pointwise CIs') plt.legend(fontsize=13) plt.xlabel('Treatment value t') plt.ylabel(r'Derivative curve $\theta(t)$ or $\widehat{\theta}C(t)$') plt.tightlayout() plt.savefig('./Figures/estillustnopos.png') ```


Fig 2. Illustrative plot of bias-corrected RA, IPW, and DR estimators of the derivative of a dose-response curve without positivity.

Additional References

  • K. Colangelo and Y.-Y. Lee (2020). Double debiased machine learning nonparametric inference with continuous treatments. arXiv preprint arXiv:2004.03036.

Owner

  • Name: Yikun Zhang
  • Login: zhangyk8
  • Kind: user
  • Location: Guangzhou, China / Seattle, USA
  • Company: University of Washington, Seattle

GitHub Events

Total
  • Watch event: 2
  • Push event: 4
  • Public event: 1
Last Year
  • Watch event: 2
  • Push event: 4
  • Public event: 1