ptetools

generic tools for causal inference with panel data

https://github.com/bcallaway11/ptetools

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

generic tools for causal inference with panel data

Basic Info
Statistics
  • Stars: 11
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Citation

README.Rmd

---
output: github_document
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  warning = FALSE,
  message = FALSE
)
```

```{r, echo=FALSE, results="hide", warning=FALSE, message=FALSE}
library(ggplot2)
library(dplyr)
library(ppe)
library(ptetools)
data(covid_data)
```
# Panel Treatment Effects Tools (ptetools) Package ptetools

## Structure of panel data causal inference problems

The `ptetools` package compartmentalizes the steps needed to implement estimators of group-time average treatment effects (and their aggregations) in order to make it easier to apply the same sorts of arguments outside of their "birthplace" in the literature on difference-in-differences.

Essentially, the idea is that many panel data causal inference problems involve steps such as:

1. Defining an identification strategy (e.g., difference-in-differences)

2. Defining a notion of a group (e.g., based on treatment timing)

3. Looping over groups and time periods

4. Organizing the data so that the correct data show up for each group and time period

5. Computing group-time average treatment effects (or other parameters that are local to a group and a time period)

6. Aggregating group-time average treatment effect parameters (e.g., into an event study or an overall average treatment effect parameter)

Many of these steps are common across different panel data causal inference settings.  For example, you could implement a difference-in-differences identification strategy or a change-in-changes identification strategy with all of the same steps as above except for replacing step 1.

The idea of the `ptetools` package is to re-use as much code/infrastructure as possible when developing new approaches to panel data causal inference.  For example, `ptetools` sits as the "backend" for several other packages including: `ife`, `contdid`, and parts of `qte`.

## How ptetools works

The main function is called `pte`.  The most important parameters that it takes in are `subset_fun` and `attgt_fun`.  These are functions that the user should pass to `pte`.

`subset_fun` takes in the overall data, a group, a time period, and possibly other arguments and returns a `data.frame` containing the relevant subset of the data, an outcome, and whether or not a unit should be considered to be in the treated or comparison group for that group/time.  There is one example of a relevant subset function provided in the package: [the `two_by_two_subset` function](https://github.com/bcallaway11/ptetools/blob/master/R/subset_functions.R). This function takes an original dataset, subsets it into pre- and post-treatment periods and denotes treated and untreated units.  This particular subset is perhaps the most common/important one for thinking about treatment effects with panel data, and this function can be reused across applications.

The other main function is `attgt_fun`.  This function should be able to take in the correct subset of data, possibly along with other arguments to the function, and report an *ATT* for that subset.  With minor modification, this function should be available for most any sort of treatment effects application --- for example, if you can solve the baseline 2x2 case in difference in differences, you should use that function here, and the `ptetools` package will take care of dealing with the variation in treatment timing.

If `attgt_fun` returns an influence function, then the `ptetools` package will also conduct inference using the multiplier bootstrap (which is fast) and produce uniform confidence bands (which adjust for multiple testing).

The default output of `pte` is an overall treatment effect on the treated (i.e., across all groups that participate in the treatment in any time period) and and event study.  More aggregations are possible, but these seem to be the leading cases; aggregations of group-time average treatment effects are discussed at length in [Callaway and Sant'Anna (2021)](https://doi.org/10.1016/j.jeconom.2020.12.001).

Below are several examples of how the `ptetools` package can be used to implement an identification strategy with a very small amount of new code.

## Example 1: Difference in differences

The [`did` package](https://bcallaway11.github.io/did/), which is based on [Callaway and Sant'Anna (2021)](https://doi.org/10.1016/j.jeconom.2020.12.001), includes estimates of group-time average treatment effects, *ATT(g,t)*, based on a difference in differences identification strategy.  The following example demonstrates that it is easy to compute group-time average treatment effects using difference in differences using the `ptetools` package.  [*Note:* This is definitely not the recommended way of doing this as there is very little error handling, etc. here, but it is rather a proof of concept.  You should use the `did` package for this case.]

This example reproduces DID estimates of the effect of the minimum wage on employment using data from the `did` package.

```{r}
library(did)
data(mpdta)
did_res <- pte(
  yname = "lemp",
  gname = "first.treat",
  tname = "year",
  idname = "countyreal",
  data = mpdta,
  setup_pte_fun = setup_pte,
  subset_fun = two_by_two_subset,
  attgt_fun = did_attgt,
  xformla = ~lpop
)

summary(did_res)
ggpte(did_res)
```

What's most interesting here, is that the only "new" code that needs to be write is in [the `did_attgt` function](https://github.com/bcallaway11/ptetools/blob/master/R/attgt_functions.R).  You will see that this is a very small amount of code.

## Example 2: Policy Evaluation during a Pandemic

As a next example, consider trying to estimate effects of COVID-19 related policies during a pandemic.  The estimates below are for the effects of state-level shelter-in-place orders during the early part of the pandemic.

The data for this example comes from the `ppe` package which can be loaded by running
```{r eval=FALSE}
devtools::install_github("bcallaway11/ppe")
library(ppe)
data(covid_data)
```

[Callaway and Li (2021)](https://arxiv.org/abs/2105.06927) argue that a particular unconfoundedness-type strategy is more appropriate in this context than DID-type strategies due to the spread of COVID-19 cases being highly nonlinear.  However, they still deal with the challenge of variation in treatment timing.  Therefore, it is still useful to think about group-time average treatment effects, but the DID strategy should be replaced with their particular unconfoundedness type assumption.

The `ptetools` package is particularly useful here.

```{r}
# formula for covariates
xformla <- ~ current + I(current^2) + region + totalTestResults
```

```{r echo=FALSE, results=FALSE, warning=FALSE}
# drop some data as in paper
trim_id_list <- lapply(c(10, 15, 20, 25, 30),
  did::trimmer,
  tname = "time.period",
  idname = "state_id",
  gname = "group",
  xformla = xformla,
  data = covid_data,
  control_group = "nevertreated",
  threshold = 0.95
)
time_id_list <- unlist(trim_id_list)

# unique(subset(covid_data, state_id %in% time_id_list)$state)
covid_data2 <- subset(covid_data, !(state_id %in% time_id_list))
```


```{r}
covid_res <- pte(
  yname = "positive",
  gname = "group",
  tname = "time.period",
  idname = "state_id",
  data = covid_data2,
  setup_pte_fun = setup_pte_basic,
  subset_fun = two_by_two_subset,
  attgt_fun = covid_attgt,
  xformla = xformla,
  max_e = 21,
  min_e = -10
)

summary(covid_res)
ggpte(covid_res) + ylim(c(-1000, 1000))
```

What's most interesting is just how little code needs to be written here.  The only new code required is the `ppe::covid_attgt` function which is [available here](https://github.com/bcallaway11/ppe/blob/master/R/covid_attgt.R), and, as you can see, this is very simple.


## Example 3: Empirical Bootstrap

The code above used the multiplier bootstrap.  The great thing about the multiplier bootstrap is that it's fast.  But in order to use it, you have to work out the influence function for the estimator of *ATT(g,t)*.  Although I pretty much always end up doing this, it can be tedious, and it can be nice to get a working version of the code for a project going before working out the details on the influence function.

The `ptetools` package can be used with the empirical bootstrap.  There are a few limitations.  First, it's going to be substantially slower.  Second, this code just reports pointwise confidence intervals.  However, this basically is set up to fit into my typical workflow, and I see this as a way to get preliminary results.

Let's demonstrate it.  To do this, consider the same setup as in Example 1, but where no influence function is returned.  Let's write the code for this:
```{r}
# did with no influence function
did_attgt_noif <- function(gt_data, xformla, ...) {
  # call original function
  did_gt <- did_attgt(gt_data, xformla, ...)

  # remove influence function
  did_gt$inf_func <- NULL

  did_gt
}
```

Now, we can show the same sorts of results as above
```{r}
did_res_noif <- pte(
  yname = "lemp",
  gname = "first.treat",
  tname = "year",
  idname = "countyreal",
  data = mpdta,
  setup_pte_fun = setup_pte,
  subset_fun = two_by_two_subset,
  attgt_fun = did_attgt_noif, # this is only diff.
  xformla = ~lpop
)

summary(did_res_noif)
ggpte(did_res_noif)
```

What's exciting about this is just how little new code needs to be written.

Owner

  • Login: bcallaway11
  • Kind: user
  • Location: Athens, GA
  • Company: University of Georgia

Brantly Callaway, Department of Economics, University of Georgia

Citation (CITATION.cff)

# --------------------------------------------
# CITATION file created with {cffr} R package
# See also: https://docs.ropensci.org/cffr/
# --------------------------------------------
 
cff-version: 1.2.0
message: 'To cite package "ptetools" in publications use:'
type: software
license: GPL-3.0-only
title: 'ptetools: Panel Treatment Effects Tools'
version: 1.0.0
abstract: Generic code for estimating treatment effects with panel data. The idea
  is to break into separate steps organizing the data, looping over groups and time
  periods, computing group-time average treatment effects, and aggregating group-time
  average treatment effects. Often, one is able to implement a new identification/estimation
  procedure by simply replacing the step on estimating group-time average treatment
  effects. See several different examples of this approach in the package documentation.
authors:
- family-names: Callaway
  given-names: Brantly
  email: brantly.callaway@uga.edu
preferred-citation:
  type: manual
  title: Panel Treatment Effects Tools
  authors:
  - family-names: Callaway
    given-names: Brantly
    email: brantly.callaway@uga.edu
  year: '2025'
  notes: R package version 1.0.0
  url: https://github.com/bcallaway11/ptetools
repository-code: https://github.com/bcallaway11/ptetools
url: https://github.com/bcallaway11/ptetools
contact:
- family-names: Callaway
  given-names: Brantly
  email: brantly.callaway@uga.edu
references:
- type: software
  title: Matrix
  abstract: 'Matrix: Sparse and Dense Matrix Classes and Methods'
  notes: Imports
  url: https://Matrix.R-forge.R-project.org
  repository: https://CRAN.R-project.org/package=Matrix
  authors:
  - family-names: Bates
    given-names: Douglas
    orcid: https://orcid.org/0000-0001-8316-9503
  - family-names: Maechler
    given-names: Martin
    email: mmaechler+Matrix@gmail.com
    orcid: https://orcid.org/0000-0002-8685-9910
  - family-names: Jagan
    given-names: Mikael
    orcid: https://orcid.org/0000-0002-3542-2938
  year: '2025'
  doi: 10.32614/CRAN.package.Matrix

GitHub Events

Total
  • Watch event: 9
  • Push event: 37
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 9
  • Push event: 37
  • Fork event: 1
  • Create event: 2