Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jhatamyar
- Language: R
- Default Branch: main
- Size: 80.1 KB
Statistics
- Stars: 13
- Watchers: 2
- Forks: 8
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
MLDID
The MLDID package computes average and conditional average treatment effects on the treated (ATT and CATT). The estimates can then be used to study the drivers of treatment effect heterogeneity in detail. See:
Hatamyar, J., Kreif, N., Rocha, R., Huber, M. Machine Learning for Staggered Difference-in-Differences and Dynamic Treatment Effect Heterogeneity (2023) Arxiv
Installation
You can install the package using
```
install.packages("devtools")
devtools::install_github("jhatamyar/MLDID")
```
Basics
This code demonstrates the basic functionality of the package using the data on minimum wage and county-level employment from Callaway & Sant'Anna (2021). Note that the data are used here for demonstration purposes only. A vignette with more detailed usage is forthcoming.
```R
library(MLDID)
library(did) ## to get the data
data(mpdta)
data <- mpdta

# There is only one covariate in the data, so generate a noisy extra one,
# as MLDID requires more than one covariate
data$noise <- rnorm(nrow(mpdta))
```
The Group-Time ATT and CATT estimator
The function MLDID() performs the main work of the package, implementing Algorithm 1 in the paper up to the aggregation step. The nuisance functions can be estimated using either rlearner or causal forest, with the option to use the SuperLearner to estimate delta. Setting the tune_penalty parameter to TRUE enables cross-validated parameter tuning for causal forest and the rlasso.
```R
att_gt.mp <- MLDID(outcome = 'lemp',
                   group = 'first.treat',
                   time = 'year',
                   id_name = 'countyreal',
                   data,
                   xformla = ~lpop + noise,
                   tune_penalty = FALSE
                   # optional model choices, commented out here:
                   #nu_model = "cf",
                   #sigma_model = "cf",
                   #delta_model = "SL",
                   )
```
The att_gt.mp object holds the estimated ATT and CATT for each group-time, as well as other information needed in post-processing steps.
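To make the idea of a group-time ATT concrete, here is a minimal base-R sketch on simulated data. It computes an unconditional 2x2 difference-in-differences for one group and period against never-treated units. It is not the package's machine-learning estimator: the helpers `simulate_panel()` and `att_gt_simple()` are hypothetical names invented for this illustration, and only the convention that `first.treat = 0` marks never-treated units is taken from mpdta.

```R
# Minimal sketch: unconditional group-time ATT via a 2x2 difference-in-differences.
# simulate_panel() and att_gt_simple() are hypothetical helpers, not part of MLDID.
set.seed(1)

simulate_panel <- function(n_units = 300, years = 2003:2007) {
  # first.treat = 0 marks never-treated units, as in mpdta
  g <- sample(c(0, 2004, 2006), n_units, replace = TRUE)
  x <- rnorm(n_units)
  panel <- expand.grid(id = seq_len(n_units), year = years)
  panel$first.treat <- g[panel$id]
  panel$x <- x[panel$id]
  treated <- panel$first.treat > 0 & panel$year >= panel$first.treat
  # Common time trend, unit-level covariate, and a true effect of 1 once treated
  panel$y <- 0.5 * panel$x + 0.1 * (panel$year - 2003) + 1 * treated +
    rnorm(nrow(panel), sd = 0.1)
  panel
}

att_gt_simple <- function(panel, g, t) {
  base <- g - 1  # last pre-treatment period for group g
  wide <- merge(
    panel[panel$year == t, c("id", "first.treat", "y")],
    panel[panel$year == base, c("id", "y")],
    by = "id", suffixes = c("_t", "_base")
  )
  dy <- wide$y_t - wide$y_base  # long difference from the base period to t
  # Change for group g minus change for never-treated units
  mean(dy[wide$first.treat == g]) - mean(dy[wide$first.treat == 0])
}

panel <- simulate_panel()
att_2004_2005 <- att_gt_simple(panel, g = 2004, t = 2005)
att_2004_2005  # should be close to the simulated effect of 1
```

The sketch ignores covariates entirely, whereas MLDID() conditions on the covariates in xformla through its nuisance models.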
Aggregating the estimates to Dynamic (time-to-event) time
The functions dynamic_attgt() and dynamic_cates() perform the aggregation steps in Equations 20 and 21 of the paper. The dynamic_cates() function can also aggregate the estimated robust scores.
```R
# aggregate the ATTs
ATT.dynamic.MP <- dynamic_attgt(att_gt.mp)

# print the dynamic estimates and the DRDID version as in Callaway & Sant'Anna (2021)
ATT.dynamic.MP[["dynamic.att.e"]]
ATT.dynamic.MP[["dynamic.att.e.csa"]]

# aggregate the CATTs and scores
cates.dynamic.MP <- dynamic_cates(att_gt.mp, type = "cates")
scores.dynamic.MP <- dynamic_cates(att_gt.mp, type = "scores")
```
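The logic of aggregating to event time can be sketched in a few lines of base R. The example below assumes a simple group-size-weighted average of group-time ATTs at each event time e = t - g, in the spirit of Callaway & Sant'Anna (2021); the exact weights in the paper's equations may differ. The numbers are made up, and `aggregate_event_time()` is a hypothetical helper, not a package function.

```R
# Toy group-time estimates; the numbers are invented for illustration only.
att_gt <- data.frame(
  group = c(2004, 2004, 2004, 2006, 2006),
  time  = c(2004, 2005, 2006, 2006, 2007),
  att   = c(0.10, 0.20, 0.30, 0.05, 0.15),
  n     = c(100, 100, 100, 300, 300)  # number of units in each group
)

# Event time e = periods elapsed since first treatment
att_gt$e <- att_gt$time - att_gt$group

# Hypothetical helper: group-size-weighted mean of ATT(g, t) within each event time
aggregate_event_time <- function(df) {
  by_e <- split(df, df$e)
  data.frame(
    e   = as.numeric(names(by_e)),
    att = vapply(by_e, function(d) weighted.mean(d$att, d$n), numeric(1)),
    row.names = NULL
  )
}

event_study <- aggregate_event_time(att_gt)
event_study
```

At e = 0 the larger 2006 group dominates the average, which is why the aggregated estimate sits closer to 0.05 than to 0.10.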
Heterogeneity Analysis
The functions BLP_eventtimes(), CLAN_glhtest() and CLAN_ttest() allow for inference on the estimated CATTs and scores. The dynamic CATTs/scores must first be appended to the original data according to event-time using the function het_prep(). The function BLP_summary() provides a readable output of the coefficients for each time period and their standard errors.
```R
# prepare the data for analysis. Note that depending on the size of your data this may also be slow
het.data.MP <- het_prep(att_gt.mp, cates.dynamic.MP)

# create a list of variables to test for heterogeneity; these must be strings
affected.mp <- c("lpop", "noise")

# formulate them for the input - note that here you could also just use "lpop + noise"
# as the argument for the functions instead of "affected_str.mp"
affected_str.mp <- paste(affected.mp, collapse = " + ")

# Run the BLP regressions: if error, reduce nperiods.
BLP.bye.MP <- BLP_eventtimes(data = het.data.MP, nperiods = 3, rhs_formula = affected_str.mp)

# summarize the BLP output
BLP_summary(BLP.bye.MP, affected.mp)

# you can also check the regression output at each event-time e
# using BLP.bye.MP[[e+1]][["coefficients"]]:
BLP.bye.MP[[1]][["coefficients"]] ## will be output for e=0
BLP.bye.MP[[2]][["coefficients"]] ## output for e=1

# Run the CLANs two ways, first using the glh test as in Chernozhukov et al (2018),
# then a simple t-test of means of the most/least affected groups
CLAN.glh.cates.MP <- CLAN_glhtest(het.data.MP, affected = affected.mp)
CLAN.ttest.cates.MP <- CLAN_ttest(het.data.MP, affected = affected.mp)

# note there is no post-processing function for CLANs as of this update, but one is forthcoming
# you can check the output by printing the object. Each matrix represents an event time.
CLAN.glh.cates.MP

# You can also visualize the BLP coefficients - we recommend interpreting with caution,
# as ideally the lpop variable should be discretized:
plotBLP(BLP.bye.MP, affected.mp)
```
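For intuition about what the two kinds of analysis look for, the following base-R sketch runs simplified versions on simulated data for a single event time. It assumes a plain linear projection of effect estimates on covariates for the BLP-style step and a Welch t-test between the top and bottom quintiles for the CLAN-style step. The package functions may use different specifications, and the simulated `cate` column is hypothetical.

```R
set.seed(2)
n <- 500
het <- data.frame(lpop = rnorm(n), noise = rnorm(n))

# Hypothetical unit-level effect estimates for one event time:
# by construction they rise with lpop and do not depend on noise
het$cate <- 0.5 + 0.3 * het$lpop + rnorm(n, sd = 0.1)

# BLP-style step: linear projection of the estimated effects on the covariates
blp_fit <- lm(cate ~ lpop + noise, data = het)
blp_coef <- summary(blp_fit)$coefficients
blp_coef

# CLAN-style step: compare covariate means between the most and least affected units
cut_hi <- quantile(het$cate, 0.8)
cut_lo <- quantile(het$cate, 0.2)
most   <- het[het$cate >= cut_hi, ]
least  <- het[het$cate <= cut_lo, ]

clan_lpop <- t.test(most$lpop, least$lpop)  # Welch two-sample t-test
clan_lpop$estimate
clan_lpop$p.value
```

In this simulation the lpop coefficient should come out near 0.3 and the noise coefficient near zero, and the most affected units should have a clearly higher mean lpop than the least affected ones.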
Troubleshooting
If you experience issues with this package:
- Ensure that there are no missing values (NA)
- Ensure that you are using at least two covariates in the model formula, as the ML methods may require two dimensions
- Check for a small number of observations: if there are very few per group-time, methods may not converge
- Ensure that you are using a balanced panel (work is ongoing to allow for unbalanced panels to be used)
- Check whether setting t_func = F resolves the issue
- Please create a GitHub "issue" if problems persist!
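Several of the checks above can be automated before calling the estimator. The following is a minimal base-R sketch; `check_panel()` is a hypothetical helper written for this illustration, not a function provided by MLDID, and the toy data only borrow column names from the example above.

```R
# Hypothetical pre-flight check for panel data; not part of the MLDID package.
check_panel <- function(df, id, time, covariates) {
  cols <- c(id, time, covariates)
  counts <- table(df[[id]])                    # observations per unit
  n_periods <- length(unique(df[[time]]))      # distinct time periods
  list(
    no_missing     = !any(is.na(df[cols])),
    # balanced: every unit appears exactly once in every period
    balanced       = all(counts == n_periods) &&
                     !any(duplicated(df[c(id, time)])),
    two_covariates = length(covariates) >= 2
  )
}

# Toy balanced panel: 4 units observed in 3 years
toy <- expand.grid(countyreal = 1:4, year = 2003:2005)
toy$lpop  <- seq_len(nrow(toy)) / 10
toy$noise <- rev(toy$lpop)

ok  <- check_panel(toy, "countyreal", "year", c("lpop", "noise"))
bad <- check_panel(toy[-1, ], "countyreal", "year", c("lpop", "noise"))  # drop one row

ok$balanced
bad$balanced
```

Dropping a single unit-year row is enough to make the panel unbalanced, which the second call detects.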
References
Callaway, B. and P. H. Sant’Anna (2021). Difference-in-differences with multiple time periods. In: Journal of Econometrics 225.2, pp. 200–230.
Chernozhukov, V., M. Demirer, E. Duflo, and I. Fernandez-Val (2018). Generic machine learning inference on heterogeneous treatment effects in randomized experiments, with an application to immunization in India. Tech. rep. National Bureau of Economic Research.
Wager, S. and S. Athey (2018). Estimation and inference of heterogeneous treatment effects using random forests. In: Journal of the American Statistical Association 113.523, pp. 1228–1242.
This work was generously funded by the UK Medical Research Council (Grant #: MR/T04487X/1)
Owner
- Name: Julia Hatamyar
- Login: jhatamyar
- Kind: user
- Company: University of York
- Repositories: 1
- Profile: https://github.com/jhatamyar
Research Fellow at University of York, Centre for Health Economics - causal ML
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Hatamyar
given-names: Julia
orcid: https://orcid.org/0000-0003-4145-1265
title: "MLDID: Machine Learning for Staggered Difference-in-Differences"
version: 1.1.0
date-released: 2023-12-10
GitHub Events
Total
- Issues event: 1
- Watch event: 7
- Issue comment event: 2
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 7
- Issue comment event: 2
- Fork event: 1
Dependencies
- DRDID * imports
- Matrix * imports
- SuperLearner * imports
- dplyr * imports
- ggplot2 * imports
- glmnet * imports
- grf * imports
- magrittr * imports
- matrixStats * imports
- stats * imports