synthnn

A Python package for estimating treatment effects using Synthetic Nearest Neighbors

https://github.com/rivkalipko/synthnn

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: rivkalipko
License: mit
Language: Python
Default Branch: main
Size: 811 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed 10 months ago

Metadata Files

Readme, License, Citation

synthnn

A Python package for causal inference with panel data. It implements synthetic nearest neighbors (SNN), a causal matrix completion method that imputes treated units’ counterfactual outcomes from weighted nearest neighbors in a low-rank subspace learned from pre-treatment data.
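
To make the idea more concrete, here is a minimal sketch of a simplified relative of SNN: principal component regression on donor units. The treated unit's pre-treatment outcomes are regressed on the donors' pre-treatment outcomes after truncating the donor matrix to its top singular values, and the fitted weights are applied to the donors' post-treatment outcomes. This is not the package's implementation: the helper `impute_counterfactual`, the fixed `rank` argument, and the noiseless toy data are assumptions for illustration, and the sketch omits the neighbor selection and thresholding that SNN adds.

```python
import numpy as np

def impute_counterfactual(donors_pre, donors_post, treated_pre, rank):
    """Hypothetical helper: principal component regression on donor units.

    donors_pre:  (T_pre, N) donor outcomes before treatment
    donors_post: (T_post, N) donor outcomes after treatment
    treated_pre: (T_pre,) treated unit's outcomes before treatment
    """
    # Truncated SVD keeps only the top `rank` components of the donor matrix.
    u, s, vt = np.linalg.svd(donors_pre, full_matrices=False)
    u_k, s_k, vt_k = u[:, :rank], s[:rank], vt[:rank, :]
    # Least-squares weights via the pseudo-inverse of the rank-k approximation.
    weights = vt_k.T @ ((u_k.T @ treated_pre) / s_k)
    # Apply the weights to the donors' post-treatment outcomes.
    return donors_post @ weights

# Toy panel driven by a single latent time factor (rank 1), without noise.
rng = np.random.default_rng(0)
time_factor = np.linspace(1.0, 2.0, 12)         # 8 pre + 4 post periods
donor_loadings = rng.uniform(0.5, 1.5, size=5)  # 5 donor units
treated_loading = 0.8

donors = np.outer(time_factor, donor_loadings)
treated_untreated_path = time_factor * treated_loading

counterfactual = impute_counterfactual(
    donors[:8], donors[8:], treated_untreated_path[:8], rank=1
)
```

Because the toy data are exactly rank 1, the imputed path matches the treated unit's untreated path in the last four periods; with real data the match is only approximate and depends on the chosen rank.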

Features

Flexible Panel Data Support — Works with both simultaneous and staggered treatment adoption.
Multiple Inference Methods — Jackknife, bootstrap, and Fisher-style placebo tests for uncertainty quantification.
Built-in Visualization — Gap plots and observed vs. counterfactual comparisons.
Customizable Imputation — Fully configurable parameters to match your data’s characteristics.
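
As an illustration of the difference between simultaneous and staggered adoption, the sketch below builds a long-format treatment indicator in which each unit switches on at its own date. The adoption dates, the helper `build_panel`, and the `event_time` field are hypothetical; only the column names `Unit`, `Time`, and `W` follow the examples in this README.

```python
# Hypothetical adoption dates: unit -> first treated period (None = never treated).
adoption = {"A": 2005, "B": 2008, "C": None}
periods = range(2003, 2011)

def build_panel(adoption, periods):
    """Return long-format rows with a 0/1 treatment column W and event time."""
    rows = []
    for unit, start in adoption.items():
        for t in periods:
            treated = start is not None and t >= start
            rows.append({
                "Unit": unit,
                "Time": t,
                "W": int(treated),
                # Event time is measured relative to each unit's own adoption date.
                "event_time": None if start is None else t - start,
            })
    return rows

panel = build_panel(adoption, periods)

# Recover each unit's first treated period from the indicator alone.
first_treated = {
    unit: min(
        (r["Time"] for r in panel if r["Unit"] == unit and r["W"] == 1),
        default=None,
    )
    for unit in adoption
}
```

Under simultaneous adoption every treated unit would share one start date, so event time and calendar time differ only by a constant shift.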

Installation

```bash
pip install synthnn
```

Quick Start

```python
import pandas as pd
from synthnn import SNN

# Load your panel data
df = pd.read_csv("your_panel_data.csv")

# Initialize and fit the SNN model
model = SNN(
    unit_col="Unit",
    time_col="Time",
    outcome_col="Y",
    treat_col="W",
    variance_type="bootstrap",
    resamples=500,
    alpha=0.05
)
model.fit(df)
model.summary()

# Visualize results
model.plot("gap")             # Average treatment effect on the treated (ATT) over time
model.plot("counterfactual")  # Observed vs. counterfactual
```

Full Example — Replicating Abadie et al. (2010)

This example reproduces the well-known California tobacco control study. Data: prop99.csv in the demos folder.

```python
import pandas as pd
from synthnn import SNN

# 1. Load the data from Abadie et al. (2010)
df0 = pd.read_csv("prop99.csv", low_memory=False)

df = (
    df0
    .query("TopicDesc == 'The Tax Burden on Tobacco' "
           "and SubMeasureDesc == 'Cigarette Consumption (Pack Sales Per Capita)'")
    .loc[:, ["LocationDesc", "Year", "Data_Value"]]
    .rename(columns={
        "LocationDesc": "Unit",
        "Year": "Time",
        "Data_Value": "Y"
    })
)

# Drop territories & aggregate rows (keep 50 states)
bad_units = ["District of Columbia", "United States", "Guam",
             "Puerto Rico", "American Samoa", "Virgin Islands"]
df = df[~df["Unit"].isin(bad_units)]

# 2. Define the treatment indicator
df["W"] = ((df["Unit"] == "California") & (df["Time"] >= 1989)).astype(int)

# 3. Fit Synthetic-Nearest-Neighbors
model = SNN(
    unit_col="Unit",
    time_col="Time",
    outcome_col="Y",
    treat_col="W",
    variance_type="bootstrap",
    resamples=100,
    alpha=0.05
)

model.fit(df)

# 4. Inspect results
model.summary()

# 5. Plot the gap between treated and counterfactual
model.plot(
    title="SNN replication of Abadie et al. (2010)",
    xlabel="Event Time (0 = 1989)",
    ylabel="ATT (packs per-capita)"
).write_image("gap.png")

# 6. Plot observed vs counterfactual paths
model.plot(
    plot_type="counterfactual",
    title="Observed vs Synthetic California",
    xlabel="Event Time (0 = 1989)",
    ylabel="Cigarette Consumption (packs per-capita)"
).write_image("counterfactual.png")

# 7. Same as before but with calendar time on the x-axis,
#    only post-treatment periods, and custom colors
model.plot(
    plot_type="counterfactual",
    calendar_time=True,
    xrange=(1989, 2014),
    title="Observed vs Synthetic California: Post-Treatment Periods",
    xlabel="Year",
    ylabel="Cigarette Consumption (packs per-capita)",
    counterfactual_color="#406B34",  # green
    observed_color="#ff7f0e"         # orange
).write_image("graphics.png")

# 8. Inference using the placebo test (only works if there is exactly one treated unit)
model_pc = SNN(unit_col="Unit", time_col="Time", outcome_col="Y", treat_col="W",
               variance_type="placebo", alpha=0.05)
model_pc.fit(df)
model_pc.summary()

# 9. Plot the results, displaying the paths of the placebo treated units
#    against the actual treated unit
model_pc.plot(show_placebos=True,
              title="Placebo Test for Inference",
              xlabel="Event Time (0 = 1989)",
              ylabel="ATT (packs per capita)").write_image("placebo.png")
```

Output

```plaintext
============================================================
SNN Estimation Results
============================================================

--- Overall ATT ---
estimate  method     se     p_value  ci_lower  ci_upper
-28.25    bootstrap  2.032  0        -32.07    -24.03

--- ATT by Event Time (Post-Treatment) ---
event_time  att     N_units  se     p_value    ci_lower  ci_upper  method
0           -14.2   1        1.651  0          -17.06    -11.28    bootstrap
1           -15.15  1        2.077  3.015e-13  -18.75    -11.43    bootstrap
2           -22.02  1        2.089  0          -26.16    -18.22    bootstrap
3           -22.12  1        2.184  0          -26.15    -18.05    bootstrap
4           -25.27  1        1.959  0          -28.55    -21.33    bootstrap
5           -29.18  1        2.129  0          -32.97    -25       bootstrap
6           -31.54  1        2.052  0          -35.08    -27.1     bootstrap
7           -31.75  1        2.054  0          -35.6     -27.29    bootstrap
8           -32.37  1        2.207  0          -36.2     -28.41    bootstrap
9           -32.8   1        2.035  0          -36.08    -28.68    bootstrap
10          -35.09  1        2.144  0          -38.64    -31.03    bootstrap
11          -35.74  1        2.196  0          -39.74    -31.06    bootstrap
12          -36.65  1        2.301  0          -41.26    -31.28    bootstrap
13          -37.07  1        2.291  0          -41.5     -31.68    bootstrap
14          -37.75  1        3.217  0          -44.07    -31.11    bootstrap
15          -34.89  1        3.052  0          -40.54    -27.46    bootstrap
16          -33.71  1        3.303  0          -39.55    -26.32    bootstrap
17          -31.7   1        3.097  0          -37.31    -25.12    bootstrap
18          -30.94  1        3.264  0          -36.9     -23.89    bootstrap
19          -27.91  1        2.687  0          -32.99    -22.78    bootstrap
20          -26.63  1        2.583  0          -31.33    -21.51    bootstrap
21          -23.79  1        2.254  0          -27.74    -19.66    bootstrap
22          -22.49  1        2.131  0          -26.36    -18.57    bootstrap
23          -21.83  1        2.042  0          -25.58    -18.39    bootstrap
24          -21.35  1        2.044  0          -24.94    -17.73    bootstrap
25          -20.63  1        1.895  0          -24.19    -17.52    bootstrap
============================================================

============================================================
SNN Estimation Results
============================================================

--- Overall ATT ---
estimate  placebo_p  placebo_rank
-28.25    0.08       4

Placebo Fisher p-value: 0.08 (rank 4/50)

--- ATT by Event Time (Post-Treatment) ---
event_time  att     N_units  placebo_p
0           -14.2   1        0.2
1           -15.15  1        0.22
2           -22.02  1        0.12
3           -22.12  1        0.12
4           -25.27  1        0.08
5           -29.18  1        0.06
6           -31.54  1        0.06
7           -31.75  1        0.06
8           -32.37  1        0.06
9           -32.8   1        0.04
10          -35.09  1        0.04
11          -35.74  1        0.04
12          -36.65  1        0.04
13          -37.07  1        0.06
14          -37.75  1        0.1
15          -34.89  1        0.12
16          -33.71  1        0.1
17          -31.7   1        0.14
18          -30.94  1        0.14
19          -27.91  1        0.14
20          -26.63  1        0.2
21          -23.79  1        0.2
22          -22.49  1        0.18
23          -21.83  1        0.18
24          -21.35  1        0.16
25          -20.63  1        0.12
============================================================
```
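
As a reading aid for the output above: with a single treated unit, the reported overall ATT of -28.25 coincides with the simple average of the 26 post-treatment event-time ATTs. The check below uses only the rounded values printed in the output. Whether the package aggregates this way in general, for example with several treated units, is an assumption that the output does not confirm.

```python
from statistics import mean

# Event-time ATTs copied from the output above (event times 0 through 25).
att_by_event_time = [
    -14.2, -15.15, -22.02, -22.12, -25.27, -29.18, -31.54, -31.75, -32.37,
    -32.8, -35.09, -35.74, -36.65, -37.07, -37.75, -34.89, -33.71, -31.7,
    -30.94, -27.91, -26.63, -23.79, -22.49, -21.83, -21.35, -20.63,
]

# Simple average over post-treatment periods.
overall_att = mean(att_by_event_time)
print(f"{overall_att:.2f}")  # prints -28.25
```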

Plots

Parameters

General

unit_col, time_col, outcome_col, treat_col (str) — Column names for unit ID, time, outcome, and treatment indicator.
variance_type (str) — Inference method:
- "jackknife" — Leave-one-unit-out resampling
- "bootstrap" (default) — Block bootstrap on units
- "placebo" — Fisher randomization test (only when exactly one treated unit)
resamples (int) — Bootstrap resamples (default: 500)
alpha (float) — Significance level for confidence intervals (default: 0.05)
snn_params (dict) — Parameters for the SyntheticNearestNeighbors imputer.
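
The three `variance_type` options correspond to standard resampling ideas, which the sketch below illustrates on a generic list of unit-level values using only the standard library. The function names, the toy numbers, and the use of a simple mean as the statistic are assumptions for illustration; they are not the package's internals, which resample whole units of the panel and refit the estimator.

```python
import random
from statistics import mean

def jackknife_se(values, statistic=mean):
    """Leave-one-unit-out standard error of `statistic`."""
    n = len(values)
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    return ((n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)) ** 0.5

def bootstrap_ci(values, statistic=mean, resamples=500, alpha=0.05, seed=0):
    """Percentile interval from resampling units with replacement."""
    rng = random.Random(seed)
    draws = sorted(
        statistic(rng.choices(values, k=len(values))) for _ in range(resamples)
    )
    lower = draws[int((alpha / 2) * resamples)]
    upper = draws[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper

def placebo_p_value(treated_effect, placebo_effects):
    """Fisher-style p-value: rank of the treated effect among all units."""
    all_effects = [treated_effect] + list(placebo_effects)
    rank = sum(abs(e) >= abs(treated_effect) for e in all_effects)
    return rank / len(all_effects)

# Toy unit-level effects (made up for this sketch).
unit_effects = [-3.1, -2.4, -2.9, -3.6, -2.2, -3.0, -2.7, -3.3]
se = jackknife_se(unit_effects)
ci_lower, ci_upper = bootstrap_ci(unit_effects)

# One treated unit and 49 placebo units, 3 of which show a larger effect,
# so the treated unit ranks 4th of 50.
placebos = [-30.0, -35.0, -40.0] + [1.0] * 46
p = placebo_p_value(-28.25, placebos)
print(p)  # prints 0.08
```

With one treated unit and 49 placebo units, the smallest attainable p-value under this ranking scheme is 1/50 = 0.02, which is why placebo p-values in such a setting move in steps of 0.02.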

SNN Parameters (`snn_params`)

n_neighbors (int) — Number of nearest neighbors (default: 1)
weights (str) — 'uniform' or 'distance'
random_splits (bool) — Use random splits in the algorithm
max_rank (int) — Maximum rank for low-rank approximation
spectral_t, linear_span_eps, subspace_eps (float) — Algorithm thresholds (default: 0.1)
min_value, max_value (float) — Bounds for imputed values
verbose (bool) — Print progress.
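
A sketch of how these options might be collected into a `snn_params` dictionary. The keys are the parameter names listed above; the values are illustrative choices rather than recommendations, and only the stated defaults (`n_neighbors` of 1 and thresholds of 0.1) come from this README.

```python
# Illustrative settings; keys follow the parameter list above.
snn_params = {
    "n_neighbors": 1,        # documented default
    "weights": "uniform",    # the other documented option is "distance"
    "random_splits": False,  # illustrative value
    "max_rank": 3,           # illustrative cap on the low-rank approximation
    "spectral_t": 0.1,       # documented default threshold
    "linear_span_eps": 0.1,  # documented default threshold
    "subspace_eps": 0.1,     # documented default threshold
    "min_value": 0.0,        # illustrative bound, e.g. for a non-negative outcome
    "verbose": False,        # illustrative value
}

# The dictionary would then be passed to the model constructor, for example:
# SNN(unit_col="Unit", time_col="Time", outcome_col="Y", treat_col="W",
#     snn_params=snn_params)
```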

Plot Parameters

plot_type — "gap" or "counterfactual"
calendar_time (bool) — Use calendar time (for simultaneous adoption only)
xrange (tuple) — (min, max) for x-axis
title, xlabel, ylabel (str) — Labels
figsize (tuple) — (width, height)
color, observed_color, counterfactual_color, placebo_color (str) — Plot colors
placebo_opacity (float) — Opacity for placebo lines (default: 0.25)
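
The relationship between event time and calendar time behind `calendar_time` and `xrange` can be illustrated with the California example, where event time 0 corresponds to 1989. The helper names below are hypothetical, the window is assumed to be inclusive, and the snippet does not call the package.

```python
TREATMENT_YEAR = 1989  # event time 0 in the California example

def to_calendar_time(event_time, start=TREATMENT_YEAR):
    """Convert an event-time index to a calendar year."""
    return start + event_time

def within_xrange(values, xrange):
    """Keep x-axis values inside an inclusive (min, max) window."""
    lo, hi = xrange
    return [v for v in values if lo <= v <= hi]

# A few pre-treatment periods plus event times 0 through 25.
event_times = range(-5, 26)
years = [to_calendar_time(e) for e in event_times]

# Restricting to (1989, 2014) keeps exactly the post-treatment periods.
post_treatment_years = within_xrange(years, (1989, 2014))
```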

Output Attributes

After fitting, the model exposes:

overall_att_ — Overall ATT with inference statistics
att_by_event_time_ — ATT series by event time
att_by_time_ — ATT series by calendar time
individual_effects_ — Unit-level effects
counterfactual_event_df_ — Observed vs. counterfactual (event time)
counterfactual_df_ — Observed vs. counterfactual (calendar time)

Requirements

pandas, numpy, scipy, plotly, scikit-learn

Acknowledgments

The implementation in this package adapts and builds upon the code from the syntheticNN repository by Dennis Shen.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, you can cite it as below.

    @software{synthnn,
      author = {Lipkovitz, Rivka},
      month = jun,
      title = {{synthnn: a Python package for estimating treatment effects using Synthetic Nearest Neighbors}},
      url = {https://github.com/rivkalipko/synthnn},
      year = {2025}
    }

Please also consider citing the authors of the original paper:

Agarwal, A., Dahleh, M., Shah, D., & Shen, D. (2023, July). Causal matrix completion. In The thirty sixth annual conference on learning theory (pp. 3821-3826). PMLR.

Owner

Name: Rivka Lipkovitz
Login: rivkalipko
Kind: user

Website: https://rivka.me
Repositories: 1
Profile: https://github.com/rivkalipko

Citation (CITATION.cff)

cff-version: 1.1.5
message: "If you use this software in your research, please cite it as below."
authors:
- family-names: "Lipkovitz"
  given-names: "Rivka"
  orcid: "https://orcid.org/0000-0002-0273-664X"
title: "synthnn: a Python package implementing the synthetic nearest neighbors estimator for panel data causal inference."
date-released: 2025-08-13
url: "https://github.com/rivkalipko/synthnn"

GitHub Events

Total

Push event: 5
Fork event: 1

Last Year

Push event: 5
Fork event: 1

Packages

Total packages: 1
Total downloads:
- pypi 326 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

pypi.org: synthnn

A Python package for estimating treatment effects using Synthetic Nearest Neighbors.

Homepage: https://github.com/rivkalipko/synthnn
Documentation: https://synthnn.readthedocs.io/
License: MIT License Copyright (c) 2025 Rivka Lipkovitz Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Latest release: 1.1.2
published 10 months ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 326 Last month

Rankings

Dependent packages count: 9.0%

Average: 29.7%

Dependent repos count: 50.5%

Maintainers (1)

lipkovitz

Last synced: 10 months ago

synthnn

Science Score: 44.0%
