birdie

Bayesian Instrumental Regression for Disparity Estimation

https://github.com/corymccartan/birdie

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: science.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.2%) to scientific vocabulary

Keywords

imputation r racial-disparities statistics
Last synced: 6 months ago

Repository

Statistics
  • Stars: 6
  • Watchers: 4
  • Forks: 3
  • Open Issues: 1
  • Releases: 5
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-"
)
set.seed(5118)
```

# **BIRDiE**: Estimating disparities when race is not observed 


[![R-CMD-check](https://github.com/CoryMcCartan/birdie/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/CoryMcCartan/birdie/actions/workflows/R-CMD-check.yaml)
[![CRAN_Status_Badge](https://www.r-pkg.org/badges/version-last-release/birdie)](https://cran.r-project.org/package=birdie)
![CRAN downloads](https://cranlogs.r-pkg.org/badges/grand-total/birdie)


Bayesian Improved Surname Geocoding (BISG) is a simple model that predicts
individual race based on last names and addresses.  Although its predictions
are informative, they are not perfect, and the resulting measurement error can
bias downstream analyses.

Bayesian Instrumental Regression for Disparity Estimation (BIRDiE) is a class of
Bayesian models for accurately estimating conditional distributions by race, 
using BISG probabilities as inputs. 
This package implements BIRDiE as described in [McCartan, Fisher, Goldin, Ho, and Imai (2025)](https://doi.org/10.1080/01621459.2025.2526695).
It also implements standard BISG and an improved measurement-error BISG model as described 
in [Imai, Olivella, and Rosenman (2022)](https://www.science.org/doi/full/10.1126/sciadv.adc9824).

## Installation

You can install the latest version of the package from CRAN with:

``` r
install.packages("birdie")
```

You can also install the development version with:

``` r
# install.packages("remotes")
remotes::install_github("CoryMcCartan/birdie")
```

## Basic Usage

A basic analysis has two steps.
First, you compute BISG probability estimates with the `bisg()` or `bisg_me()` functions (or using any other probabilistic race prediction tool).
Then, you estimate the distribution of an outcome variable by race using the `birdie()` function.

```{r}
library(birdie)

data(pseudo_vf)

head(pseudo_vf)
```

To compute BISG probabilities, you provide the last name and (optionally) geography variables as part of a formula.

```{r}
r_probs = bisg(~ nm(last_name) + zip(zip), data=pseudo_vf)

head(r_probs)
```
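
As a rough illustration of what a BISG probability represents, the sketch below applies Bayes' rule by hand for a single hypothetical individual. It assumes the standard BISG setup, in which surname and location are treated as independent given race. All numbers and variable names are made up for illustration; this is not how `bisg()` is implemented internally.

```r
# Hypothetical inputs for one person (all values invented for illustration).
# P(race | surname): racial composition of people sharing this surname.
p_race_given_surname = c(white = 0.60, black = 0.25, hisp = 0.08,
                         asian = 0.02, other = 0.05)
# P(location | race): share of each group's population living in this ZIP code.
p_geo_given_race = c(white = 0.0002, black = 0.0010, hisp = 0.0004,
                     asian = 0.0001, other = 0.0003)

# Bayes' rule, assuming surname and location are independent given race:
# P(race | surname, location) is proportional to
# P(race | surname) * P(location | race).
unnormalized = p_race_given_surname * p_geo_given_race
p_race = unnormalized / sum(unnormalized)

round(p_race, 3)
```

In this toy case the location information shifts the most likely group away from the one the surname alone would suggest, which is why the geography terms in the formula matter.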

Computing regression estimates requires specifying a model structure.
Here, we'll use a Categorical-Dirichlet regression model that lets the
relationship between turnout and race vary by ZIP code.
This is the "no-pooling" model from McCartan et al. (2025).
We'll use Gibbs sampling for inference, which will also let us capture the uncertainty in our estimates.

```{r}
fit = birdie(r_probs, turnout ~ proc_zip(zip), data=pseudo_vf, 
             family=cat_dir(), algorithm="gibbs")

print(fit)
```
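
To give some intuition for the Gibbs sampling step, here is a minimal base-R sketch of a Gibbs sampler for a much simpler toy problem: two groups, a binary outcome, and no geographic variation. The simulated data, the Beta(1, 1) prior, and every variable name are assumptions made for illustration. It is not the package's implementation, only the general idea of alternating between imputing group membership and updating the outcome rates.

```r
set.seed(1)
n = 2000

# Simulated, calibrated group probabilities (stand-ins for BISG output).
p_a = runif(n, 0.05, 0.95)
prior = cbind(a = p_a, b = 1 - p_a)

# Simulated truth: group membership, then an outcome that depends only on group.
is_a = rbinom(n, 1, p_a) == 1
true_theta = c(a = 0.6, b = 0.3)
y = rbinom(n, 1, ifelse(is_a, true_theta["a"], true_theta["b"]))

n_iter = 500
theta = c(a = 0.5, b = 0.5)  # starting values for P(y = 1 | group)
draws = matrix(NA_real_, n_iter, 2, dimnames = list(NULL, c("a", "b")))

for (t in seq_len(n_iter)) {
  # Step 1: impute each person's group given the outcome and current rates.
  lik_a = ifelse(y == 1, theta["a"], 1 - theta["a"])
  lik_b = ifelse(y == 1, theta["b"], 1 - theta["b"])
  post_a = prior[, "a"] * lik_a /
    (prior[, "a"] * lik_a + prior[, "b"] * lik_b)
  r_is_a = runif(n) < post_a

  # Step 2: update each group's outcome rate from its conjugate Beta posterior.
  theta["a"] = rbeta(1, 1 + sum(y[r_is_a]), 1 + sum(1 - y[r_is_a]))
  theta["b"] = rbeta(1, 1 + sum(y[!r_is_a]), 1 + sum(1 - y[!r_is_a]))
  draws[t, ] = theta
}

# Posterior means after discarding the first 100 iterations as burn-in.
post_mean = colMeans(draws[-(1:100), ])
post_mean
```

Because every iteration yields a full draw of the outcome rates, the spread of `draws` gives a direct measure of uncertainty, which is the practical benefit of Gibbs sampling noted above.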

The `proc_zip()` function fills in missing ZIP codes, among other things.
We can extract the estimated conditional distributions with `coef()`.
We can also get updated BISG probabilities that additionally condition on turnout using `fitted()`.
Additional functions allow us to extract a tidy version of our estimates (`tidy()`)
and visualize the estimated distributions (`plot()`).

```{r}
coef(fit)

head(fitted(fit))

tidy(fit)

plot(fit)
```
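
The updated probabilities can be understood as a Bayes' rule update: the BISG probability for each group is reweighted by how likely the observed outcome is within that group. The sketch below works through this for one hypothetical voter. The probabilities, the four-group coding, and the helper `update_probs()` are all invented for illustration and are not part of the package.

```r
# Hypothetical BISG probabilities for one voter.
bisg_prior = c(white = 0.50, black = 0.30, hisp = 0.15, other = 0.05)
# Hypothetical estimates of P(turnout = "yes" | race).
p_yes_given_race = c(white = 0.40, black = 0.20, hisp = 0.30, other = 0.30)

# Hypothetical helper: reweight prior probabilities by the outcome likelihood
# and renormalize so the result sums to one.
update_probs = function(prior, lik) {
  post = prior * lik
  post / sum(post)
}

post_yes = update_probs(bisg_prior, p_yes_given_race)      # voter turned out
post_no = update_probs(bisg_prior, 1 - p_yes_given_race)   # voter did not

round(rbind(prior = bisg_prior, voted = post_yes, did_not_vote = post_no), 3)
```

Groups with a higher estimated turnout rate gain probability mass for a voter who turned out and lose it for one who did not.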

A more detailed introduction to the method and software package can be found 
on the [Get Started](https://corymccartan.com/birdie/articles/birdie.html) page.

Owner

  • Name: Cory McCartan
  • Login: CoryMcCartan
  • Kind: user
  • Company: New York University

Faculty Fellow at NYU's Center for Data Science, working on computational social science problems and open-source R software.

GitHub Events

Total
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 8
  • Push event: 6
  • Pull request event: 1
  • Fork event: 2
Last Year
  • Issues event: 3
  • Watch event: 1
  • Issue comment event: 8
  • Push event: 6
  • Pull request event: 1
  • Fork event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 145
  • Total Committers: 3
  • Avg Commits per committer: 48.333
  • Development Distribution Score (DDS): 0.014
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Cory McCartan c****n@g****m 143
Kosuke Imai i****i@h****u 1
Jeroen Ooms j****s@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 14
  • Total pull requests: 6
  • Average time to close issues: 11 days
  • Average time to close pull requests: 6 days
  • Total issue authors: 7
  • Total pull request authors: 3
  • Average comments per issue: 1.57
  • Average comments per pull request: 0.33
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 1
  • Average time to close issues: 23 days
  • Average time to close pull requests: about 6 hours
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 1.25
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • CoryMcCartan (8)
  • lecy (1)
  • ericmanning (1)
  • mjia002 (1)
  • davidmcclendon (1)
  • jcha1997 (1)
  • maxwellpalmer (1)
Pull Request Authors
  • kosukeimai (4)
  • CoryMcCartan (1)
  • jeroen (1)
Top Labels
Issue Labels
feature request (6), documentation (1), meta (1), bug (1), question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 265 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: birdie

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 265 Last month
Rankings
Dependent packages count: 28.7%
Dependent repos count: 35.4%
Average: 50.0%
Downloads: 86.0%
Maintainers (1)
Last synced: 6 months ago