CausalTables.jl

CausalTables.jl: Simulating and storing data for statistical causal inference in Julia - Published in JOSS (2025)

https://github.com/salbalkus/causaltables.jl

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

A new type of Table to store and simulate data for causal inference in Julia.

Statistics
  • Stars: 16
  • Watchers: 3
  • Forks: 2
  • Open Issues: 5
  • Releases: 16
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

CausalTables.jl

Build Status Coverage Status License: MIT JOSS Status Aqua QA

A common interface for processing and simulating data for causal inference in Julia.

Causal inference is the process of estimating, from data, the effect of a treatment variable on an outcome variable -- typically in the presence of confounders. The goal of CausalTables.jl is to simplify the development of statistical causal inference methods in Julia. To this end, the package provides two sets of tools:

  1. The CausalTable, a Tables.jl-compliant data structure that wraps a table of data with labels of the causes of relevant variables, denoted via a type of directed acyclic graph (DAG). Users can call existing functions to easily intervene on treatment variables, identify common subsets of variables (confounders, mediators, instruments, etc.) or use causal labels in other ways -- all while still allowing the data to be used with other Julia packages that accept Tables.jl data structures.
  2. The StructuralCausalModel interface, which allows users to encode a Structural Causal Model (SCM), a sequence of conditional distributions where each distribution can depend (causally) on any of the previous ones. This supports simulating data from arbitrary causal structures, extracting ground-truth distributions conditional on the data generated in previous steps, and approximating common ground-truth estimands such as the average treatment effect or policy effect.

What sets this package apart? CausalTables.jl provides a common interface for manipulating tabular data for causal inference. While packages like CausalInference.jl focus only on causal graphs and discovery algorithms, the CausalTable interface provides utility functions to clean and manipulate practical datasets for input into statistical estimators. The simulation capabilities of CausalTables.jl are similar to those of probabilistic programming languages like Turing.jl or Gen.jl; however, unlike these packages, CausalTables.jl lets users extract the true conditional distributions of relevant variables in closed form after the data have been generated. This makes it easy to extract parameters like ground-truth ("oracle") conditional means or propensity scores, which are often helpful for testing whether an estimator is behaving as intended.

Installation

CausalTables.jl can be installed using the Julia package manager. From the Julia REPL, type ] to enter the Pkg REPL mode and run

pkg> add CausalTables

Running a simulation with CausalTables.jl

To simulate data, one must first define a StructuralCausalModel (SCM). An SCM is composed of a DataGeneratingProcess, which is a sequence of random variables, along with labels for treatment, response, and confounder variables. For example, the following code defines an SCM with a binary treatment $A$, a continuous confounder $W$, and a continuous response $Y$. The @dgp macro constructs a DataGeneratingProcess object according to the simple syntax name ~ distribution, where distribution is a Distribution object from Distributions.jl. This object can also be a function of parameters defined outside of the macro. More advanced syntax is detailed in the documentation.

```
using CausalTables
using Distributions

dgp(a, b; σ2 = 1) = @dgp(
    W ~ Beta(a, b),
    A ~ (@. Bernoulli(W)),
    Y ~ (@. Normal(A + W, σ2))
)

scm = StructuralCausalModel(
    dgp(2, 2; σ2 = 2);
    treatment = :A,
    response = :Y,
    causes = (A = [:W], Y = [:A, :W])
)
```
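Written out as conditional distributions, the example above corresponds to the following factorization. This is only a restatement of the code in mathematical notation; the one point worth flagging is that Distributions.jl parameterizes Normal(μ, σ) by its standard deviation, so the keyword named σ2 acts as a standard deviation here.

$$
W \sim \mathrm{Beta}(a, b), \qquad
A \mid W \sim \mathrm{Bernoulli}(W), \qquad
Y \mid A, W \sim \mathrm{Normal}(A + W,\ \sigma_2)
$$

$$
p(w, a, y) = p(w)\, p(a \mid w)\, p(y \mid a, w)
$$

In the SCM defined above, $a = b = 2$ and $\sigma_2 = 2$, so the conditional variance of $Y$ is $4$.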

Once a StructuralCausalModel is defined, one can then draw a randomly-generated CausalTable according to the SCM using the rand function:

```
ctbl = rand(scm, 100)

CausalTable
┌──────────┬───────┬───────────┐
│        W │     A │         Y │
│  Float64 │  Bool │   Float64 │
├──────────┼───────┼───────────┤
│ 0.715179 │  true │    5.5174 │
│ 0.267457 │ false │   1.16403 │
│ 0.563615 │  true │  -3.89226 │
│ 0.777111 │  true │   4.98015 │
│    ⋮     │   ⋮   │     ⋮     │
│ 0.481617 │  true │   5.87858 │
│   0.2251 │ false │   -1.5951 │
│ 0.214866 │  true │ -0.733905 │
│ 0.548646 │  true │  -1.37903 │
└──────────┴───────┴───────────┘
                 92 rows omitted
Summaries: NamedTuple()
Arrays: NamedTuple()
```

A CausalTable is a Table with a causal structure, such as labels for treatment, response, and causes. In addition to implementing the standard Tables.jl interface, CausalTables.jl also provides extra functions to make working with causal data easier. See the documentation for more information.
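As a rough illustration of wrapping existing data rather than simulated data, the sketch below builds a CausalTable from a NamedTuple of vectors and then reads it back through generic Tables.jl calls. The toy values are hypothetical, and the constructor keywords are an assumption that mirrors the StructuralCausalModel call shown earlier; the package documentation has the authoritative signature. It also assumes Tables.jl is available in the active environment.

```julia
using CausalTables
using Tables

# Hypothetical toy data; a NamedTuple of equal-length vectors is a valid Tables.jl source
tbl = (W = [0.2, 0.5, 0.7, 0.4, 0.9],
       A = [false, true, true, false, true],
       Y = [0.1, 1.8, 1.5, 0.6, 2.2])

# Assumption: CausalTable accepts the same labeling keywords as StructuralCausalModel
ctbl = CausalTable(tbl; treatment = :A, response = :Y,
                   causes = (A = [:W], Y = [:A, :W]))

# Generic Tables.jl calls, which any Tables.jl-compliant structure should support
cols = Tables.columntable(ctbl)
println(Tables.istable(ctbl))
println(sum(cols.Y) / length(cols.Y))  # sample mean of the response column
```

Because only the Tables.jl interface is used after construction, the same lines should apply to any other package that consumes Tables.jl sources.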

Given an SCM, it is also possible to approximate the "ground truth" value of a variety of relevant causal estimands from this SCM, including counterfactual means (cfmean), as well as average treatment effects (ate) and average policy effects (ape). For example, the ground truth average treatment effect for this SCM can be approximated like so:

```
ate(scm)

(μ = 1.0, eff_bound = 24.423232546851047)
```
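For this particular SCM the reported μ can be checked by hand. The short derivation below uses the conditional mean $E[Y \mid A, W] = A + W$ from the example DGP and the fact that $W$ is the only confounder:

$$
\mathrm{ATE} = E\big[E[Y \mid A = 1, W]\big] - E\big[E[Y \mid A = 0, W]\big] = E[1 + W] - E[W] = 1
$$

The README does not define eff_bound. If it is taken to be the nonparametric efficiency bound for the ATE (an assumption), a similar calculation applies. The conditional treatment effect is constant, so its variance term vanishes, the conditional variance of $Y$ is $\sigma^2 = 4$, and the propensity score is $W$:

$$
E\left[\frac{\sigma^2}{W} + \frac{\sigma^2}{1 - W}\right]
= \sigma^2\, E\left[\frac{1}{W(1 - W)}\right]
= 4 \int_0^1 \frac{6\,w(1 - w)}{w(1 - w)}\,dw
= 24
$$

Here $6\,w(1 - w)$ is the Beta(2, 2) density. The printed value of about 24.42 is close to 24, which is consistent with the output being an approximation.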

Alternatively, one can compute the ground truth of low-level statistical functionals, such as conditional means or propensity scores, for use in downstream analyses.

```
propensity(scm, ctbl, :A)

100-element Vector{Float64}:
 0.7151793080118533
 0.7325427650946469
 ⋮
 0.2148661580024375
 0.5486463146032539
```
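The printed values can be cross-checked against the table shown earlier. The sketch below assumes that propensity returns the probability of the observed treatment value given its causes, which for A ~ Bernoulli(W) is W when A is true and 1 - W when A is false. It uses only base Julia; the helper observed_propensity is a hypothetical name, not part of the package, and the inputs are the rounded values copied from the first two and last two rows of the table.

```julia
# Rows 1, 2, 99 and 100 of the printed CausalTable (values rounded as displayed)
W = [0.715179, 0.267457, 0.214866, 0.548646]
A = [true, false, true, true]

# Bernoulli(W) probability mass at the observed treatment value
# (hypothetical helper, not a CausalTables.jl function)
observed_propensity(w, a) = a ? w : 1 - w

p = observed_propensity.(W, A)
println(p)
```

Up to the rounding of the displayed table, these agree with the four values printed by propensity, including the second row, where the treatment is false and the result is 1 - W.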

See the documentation for more information and tutorials.

Community Guidelines

If you find a bug, have a feature request, or otherwise experience any issues with this software package, please open an issue on the issue tracker. If you would like to contribute code to the software yourself, we encourage you to open a pull request. We welcome all contributions, including bug fixes, documentation improvements, and new features.

Owner

  • Name: Salvador Balkus
  • Login: salbalkus
  • Kind: user
  • Location: Boston, MA
  • Company: Harvard University

PhD student in Biostatistics at the Harvard T.H. Chan School of Public Health. Former @UMDBigDataClub president.

JOSS Publication

CausalTables.jl: Simulating and storing data for statistical causal inference in Julia
Published
February 24, 2025
Volume 10, Issue 106, Page 7580
Authors
Salvador V. Balkus ORCID
Department of Biostatistics, Harvard T.H. Chan School of Public Health
Nima S. Hejazi ORCID
Department of Biostatistics, Harvard T.H. Chan School of Public Health
Editor
Oskar Laverny ORCID
Tags
statistics causal inference tables

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Balkus
  given-names: Salvador V.
  orcid: "https://orcid.org/0000-0003-4695-833X"
- family-names: Hejazi
  given-names: Nima S.
  orcid: "https://orcid.org/0000-0002-7127-2789"
contact:
- family-names: Balkus
  given-names: Salvador V.
  orcid: "https://orcid.org/0000-0003-4695-833X"
doi: 10.5281/zenodo.14867116
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Balkus
    given-names: Salvador V.
    orcid: "https://orcid.org/0000-0003-4695-833X"
  - family-names: Hejazi
    given-names: Nima S.
    orcid: "https://orcid.org/0000-0002-7127-2789"
  date-published: 2025-02-24
  doi: 10.21105/joss.07580
  issn: 2475-9066
  issue: 106
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 7580
  title: "CausalTables.jl: Simulating and storing data for statistical
    causal inference in Julia"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.07580"
  volume: 10
title: "CausalTables.jl: Simulating and storing data for statistical
  causal inference in Julia"

GitHub Events

Total
  • Create event: 25
  • Release event: 13
  • Issues event: 15
  • Watch event: 13
  • Delete event: 11
  • Issue comment event: 56
  • Push event: 190
  • Pull request event: 26
  • Fork event: 1
Last Year
  • Create event: 25
  • Release event: 13
  • Issues event: 15
  • Watch event: 13
  • Delete event: 11
  • Issue comment event: 56
  • Push event: 190
  • Pull request event: 26
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 364
  • Total Committers: 3
  • Avg Commits per committer: 121.333
  • Development Distribution Score (DDS): 0.03
Past Year
  • Commits: 191
  • Committers: 3
  • Avg Commits per committer: 63.667
  • Development Distribution Score (DDS): 0.058
Top Committers
Name Email Commits
Salvador Balkus s****s@g****m 353
CompatHelper Julia c****y@j****g 8
Nima Hejazi nh@n****g 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 44
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 3.24
  • Average comments per pull request: 0.07
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 15
Past Year
  • Issues: 8
  • Pull requests: 30
  • Average time to close issues: 4 days
  • Average time to close pull requests: 1 day
  • Issue authors: 3
  • Pull request authors: 2
  • Average comments per issue: 6.38
  • Average comments per pull request: 0.07
  • Merged pull requests: 28
  • Bot issues: 0
  • Bot pull requests: 7
Top Authors
Issue Authors
  • salbalkus (16)
  • vanAmsterdam (1)
  • JuliaTagBot (1)
Pull Request Authors
  • salbalkus (35)
  • github-actions[bot] (26)

Packages

  • Total packages: 1
  • Total downloads:
    • julia 5 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
juliahub.com: CausalTables

A new type of Table to store and simulate data for causal inference in Julia.

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 5 Total
Rankings
Dependent repos count: 3.2%
Downloads: 3.3%
Average: 7.6%
Dependent packages count: 16.3%
Last synced: 6 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/checkout v4 composite
  • codecov/codecov-action v1 composite
  • julia-actions/cache v1 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia v1 composite
.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite
.github/workflows/documentation.yml actions
  • actions/checkout v4 composite
  • julia-actions/setup-julia v1 composite