Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: neurodata
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Size: 9.96 MB
Statistics
  • Stars: 4
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

Causal Effect Detection and Correction

Overview

Batch effects, undesirable sources of variance across multiple experiments, present significant challenges for scientific and clinical discoveries. Specifically, batch effects can (i) produce spurious signals and/or (ii) obscure genuine signals, contributing to the ongoing reproducibility crisis. Typically, batch effects are modeled as classical, rather than causal, statistical effects. This model choice renders the methods unable to differentiate between biological or experimental sources of variability, leading to unnecessary false positive and negative effect detections and over-confidence. We formalize batch effects as causal effects to address these concerns, and augment existing batch effect detection and correction approaches with causal machinery. Simulations illustrate that our causal approaches mitigate spurious findings and reveal otherwise obscured signals as compared to non-causal approaches. Applying our causal methods to a large neuroimaging mega-study reveals instances where prior art confidently asserts that the data do not support the presence of batch effects when we expect to detect them. On the other hand, our causal methods correctly discern that there exists irreducible confounding in the data, so it is unclear whether differences are due to batches or not. This work therefore provides a framework for understanding the potential capabilities and limitations of analysis of multi-site data using causal machinery.
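To make the confounding problem concrete, here is a toy simulation in base R. It is only a sketch under stated assumptions and is not the algorithm implemented in causalBatch: the covariate ranges, the linear outcome model, and all variable names are made up for illustration. Two batches have different covariate distributions but no true batch effect, so any difference in batch means comes from covariate imbalance alone. Restricting the comparison to the region where the covariate distributions overlap shows how a causal view can remove the spurious signal.

```r
set.seed(1)
n <- 2000  # samples per batch

# Covariate (e.g., age rescaled to [0, 1]): batch 0 skews low, batch 1 skews
# high, and the two batches overlap only on [0.4, 0.6].
batch <- rep(c(0, 1), each = n)
x <- c(runif(n, 0, 0.6), runif(n, 0.4, 1))

# The outcome depends on the covariate only, so the true batch effect is zero.
y <- 2 * x + rnorm(2 * n, sd = 0.5)

# Naive comparison: difference in batch means, ignoring the covariate.
naive_diff <- mean(y[batch == 1]) - mean(y[batch == 0])

# Overlap-restricted comparison: keep only samples whose covariate value is
# plausible in both batches, then compare batch means.
in_overlap <- x >= 0.4 & x <= 0.6
overlap_diff <- mean(y[batch == 1 & in_overlap]) -
  mean(y[batch == 0 & in_overlap])

cat(sprintf("naive: %.2f, overlap-restricted: %.2f\n",
            naive_diff, overlap_diff))
```

In this setup the naive difference should sit far from zero while the overlap-restricted difference should sit near zero. If the batches shared no covariate range at all, no such comparison would be possible, which is the "no answer is better than a wrong answer" situation.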

Repo Contents

  • R: R package code.
  • docs: usage of the causalBatch package on many real and simulated data examples for scientific articles.
  • man: package manual for help in R session.
  • tests: R unit tests written using the testthat package.
  • vignettes: R vignettes for R session html help pages.

System Requirements

Hardware Requirements

The causalBatch package requires only a standard computer with enough RAM to support the operations defined by a user. For minimal performance, this will be a computer with about 2 GB of RAM. For optimal performance, we recommend a computer with the following specs:

RAM: 16+ GB
CPU: 4+ cores, 3+ GHz/core

The runtimes below were generated using a computer with the recommended specs (16 GB RAM, 4 cores at 3 GHz) and an internet connection speed of 100 Mbps.

Software Requirements

OS Requirements

The development version of the package has been tested on Mac operating systems, specifically on the following systems:

Linux:
Mac OSX: Ventura 13.1
Windows:

Before setting up the causalBatch package, users should have R version 4.2.0 or higher, and several packages set up from CRAN.
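As a quick sanity check, the R version requirement can be verified from an R session before installing anything. This is a minimal sketch using base R only; the variable names are illustrative.

```r
# Minimum R version stated above.
min_r_version <- "4.2.0"

# getRversion() returns the running R version as a comparable version object.
r_is_supported <- getRversion() >= min_r_version

if (r_is_supported) {
  message("R ", as.character(getRversion()),
          " meets the minimum of ", min_r_version)
} else {
  warning("R ", as.character(getRversion()),
          " is older than the required ", min_r_version)
}
```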

Installation Guide

Stable Release

The stable release of the package is available on CRAN, and can be installed from R as:

install.packages('causalBatch')

Development Version

Package dependencies

Users should install the following packages prior to installing causalBatch, from an R terminal:

```
install.packages(c('cdcsis', 'MatchIt', 'nnet', 'dplyr', 'tidyverse', 'magrittr'))

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version = "3.18")

BiocManager::install("sva")
```

which will install in about 1 minute on a machine with the recommended specs.

The causalBatch package functions with all packages in their latest versions as they appeared on CRAN on January 22, 2024. The specific versions are:

sva=3.50.0
cdcsis=2.0.3
tidyverse=2.0.0
dplyr=1.1.4
MatchIt=4.5.5
nnet=7.3.19
magrittr=2.0.3

If you encounter a problem that you believe is tied to software versioning, please open an Issue.
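Before filing a versioning Issue, it may help to compare your installed package versions against the tested ones. The sketch below uses base R only; `compare_versions` is a hypothetical helper written for this README and is not part of causalBatch.

```r
# Versions listed in this README as the tested configuration.
tested_versions <- c(
  sva = "3.50.0", cdcsis = "2.0.3", tidyverse = "2.0.0", dplyr = "1.1.4",
  MatchIt = "4.5.5", nnet = "7.3.19", magrittr = "2.0.3"
)

# Hypothetical helper (not part of causalBatch): for each package, report
# whether it is installed and how its version compares to the tested one.
compare_versions <- function(tested) {
  status <- vapply(names(tested), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (!nzchar(system.file(package = pkg))) {
      return("not installed")
    }
    installed <- packageVersion(pkg)
    wanted <- package_version(tested[[pkg]])
    if (installed == wanted) {
      "same as tested"
    } else if (installed > wanted) {
      "newer than tested"
    } else {
      "older than tested"
    }
  }, character(1))
  data.frame(package = names(tested), tested = unname(tested),
             status = unname(status), row.names = NULL)
}

print(compare_versions(tested_versions))
```

Including the printed table in an Issue makes it easier to tell whether a version mismatch is a plausible cause.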

Package Installation

From an R session, type:

```
require(devtools)

# install causalBatch with the vignettes
install_github('neurodata/causalbatch', build_vignettes=TRUE, force=TRUE)

require(causalBatch)

# view one of the basic vignettes
vignette("causal_simulations", package="causalBatch")
```

The package should take approximately 40 seconds to install with vignettes on a recommended computer.

Demo

For interactive demos of the functions, please check out the vignettes built into the package. They can be accessed as follows:

require(causalBatch)
vignette("causal_simulations", package="causalBatch")
vignette("causal_balancing", package="causalBatch")
vignette("causal_cdcorr", package="causalBatch")
vignette("causal_ccombat", package="causalBatch")

Results and figure reproduction

See Batch Effects Paper for instructions to reproduce figures from Bridgeford et al. (2025).

Citation

For usage of the package and associated manuscript, please cite according to the enclosed citation.bib.

Owner

  • Name: neurodata
  • Login: neurodata
  • Kind: organization
  • Email: admin@neurodata.io
  • Location: everywhere

Citation (citation.bib)

@article{Bridgeford2025Jan,
	author = {Bridgeford, Eric W. and Powell, Michael and Kiar, Gregory and Noble, Stephanie and Chung, Jaewon and Panda, Sambit and Lawrence, Ross and Xu, Ting and Milham, Michael and Caffo, Brian and Vogelstein, Joshua T.},
	title = {{When no answer is better than a wrong answer: A causal perspective on batch effects}},
	journal = {Imaging Neuroscience},
	volume = {3},
	year = {2025},
	month = jan,
	publisher = {MIT Press},
	doi = {10.1162/imag_a_00458}
}

@article{Bridgeford2023Jul,
	author = {Bridgeford, Eric W. and Chung, Jaewon and Gilbert, Brian and Panda, Sambit and Li, Adam and Shen, Cencheng and Badea, Alexandra and Caffo, Brian and Vogelstein, Joshua T.},
	title = {{Learning sources of variability from high-dimensional observational studies}},
	journal = {ArXiv e-prints},
	year = {2023},
	month = jul,
	eprint = {2307.13868},
	doi = {10.48550/arXiv.2307.13868}
}

@article{Lopez2017Aug,
	author = {Lopez, Michael J. and Gutman, Roee},
	title = {{Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas}},
	journal = {Statistical Science},
	volume = {32},
	number = {3},
	pages = {432--454},
	year = {2017},
	month = aug,
	issn = {0883-4237},
	publisher = {Institute of Mathematical Statistics},
	doi = {10.1214/17-STS612}
}

GitHub Events

Total
  • Watch event: 3
  • Push event: 30
Last Year
  • Watch event: 3
  • Push event: 30

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: 7 days
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ebridge2 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 249 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
cran.r-project.org: causalBatch

Causal Batch Effects

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 249 Last month
Rankings
Dependent packages count: 28.2%
Dependent repos count: 36.2%
Average: 49.7%
Downloads: 84.7%
Maintainers (1)
Last synced: 7 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.2.0 depends
  • MatchIt * imports
  • cdcsis * imports
  • dplyr * imports
  • magrittr * imports
  • nnet * imports
  • sva * imports
  • covr * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • ks * suggests
  • parallel * suggests
  • rmarkdown * suggests
  • roxygen2 * suggests
  • testthat >= 3.0.0 suggests
  • tidyr * suggests
Dockerfile docker
  • bioconductor/bioconductor_docker devel build