https://github.com/aiandglobaldevelopmentlab/causalimages-software

causalimages: An R package for performing causal inference with image and image sequence data

Keywords

biomedical-image-analysis causal-inference computer-vision machine-learning

Last synced: 9 months ago · JSON representation

Repository

causalimages: An R package for performing causal inference with image and image sequence data

Basic Info

Host: GitHub
Owner: AIandGlobalDevelopmentLab
License: other
Language: R
Default Branch: main
Homepage: https://arxiv.org/pdf/2310.00233
Size: 192 MB

Statistics

Stars: 6
Watchers: 5
Forks: 3
Open Issues: 4
Releases: 2

Topics

biomedical-image-analysis causal-inference computer-vision machine-learning

Created about 3 years ago · Last pushed 11 months ago

Metadata Files

Readme License

`causalimages`: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images

Additional tutorials: Image-based De-confounding | Image/Video Representations | Building tfrecord corpus

Replication data: Heterogeneity Paper | De-confounding Paper

Beta package version: GitHub.com/cjerzak/causalimages-software

Stable package version: GitHub.com/AIandGlobalDevelopmentLab/causalimages-software

What is causalimages?

Causal inference has entered a new stage where novel data sources are being integrated into the study of cause and effect. Image information is a particularly promising data stream in this context: it is widely available and richly informative in social science and bio-medical contexts.

This package, causalimages, enables causal analysis with images. For example, the function, AnalyzeImageHeterogeneity, performs the image-based treatment effect heterogeneity decomposition described in Jerzak, Johansson, and Daoud (2023). This function can be used, for example, to determine which neighborhoods are most responsive to an anti-poverty intervention using earth observation data from, e.g., satellites. In the bio-medical domain, this function could be used to model the kinds of patients who would be most responsive to interventions on the basis of pre-treatment diagnostic imaging. See References for a link to replication data for the image heterogeneity paper; see this tutorial for a walkthrough using the replication data.

The function, AnalyzeImageConfounding, performs the image-based deconfounding analysis described in Jerzak, Johansson, and Daoud (2023+). This function can be used, for example, to control for confounding factors correlated with both neighborhood wealth and aid decisions in observational studies of development. In the bio-medical context, this function could be used to control for confounding variables captured via diagnostic imaging in order to improve observational inference.

Package Installation

From within R, you may download via the devtools package. In particular, use

devtools::install_github(repo = "cjerzak/causalimages-software/causalimages")

Then, to load the software, use library( causalimages )

Pipeline

Use of causalimages generally follows the following pipeline. Steps 1 and 2 will be necessary for all downstream tasks.

1. Build package backend. This establishes the necessary modules, including JAX and Equinox, used in the causal image modeling. We attempt to establish GPU acceleration where that hardware is available. For tutorial, see tutorials/BuildBackend_Tutorial.R for more information. You can try using conda="auto" or finding the correct path to the conda executable by typing where conda in the terminal: causalimages::BuildBackend(conda = "/Users/cjerzak/miniforge3/bin/python")

If you prefer to manually install the backend, create a conda environment and install the Python packages used by causalimages. The commands below replicate what BuildBackend() performs under the hood (Python 3.10 or newer is recommended):

bash conda create -n CausalImagesEnv python=3.11 conda activate CausalImagesEnv python3 -m pip install tensorflow tensorflow-metal optax equinox jmp tensorflow_probability python3 -m pip install jax-metal

2. Write TfRecord. Next, you will need to write a TfRecord representation of your image or image sequence corpus. This function converts your image corpus into efficient float16 representations for fast reading of the images into memory for model training and output generation. For a tutorial, see tutorials/CausalImage_TfRecordFxns.R ```

see:

?causalimages::WriteTfRecord ```

3. Generate image representations for downstream tasks. You sometimes will only want to extract representations of your image or image sequence corpus. In that case, you'll use GetImageRepresentations(). For tutorial, see tutorials/ExtractImageRepresentations_Tutorial.R. ```

for help, see:

?causalimages::GetImageRepresentations ```

4. Perform causal image analysis. Finally, you may also want to perform a causal analysis using the image or image sequence data. For a tutorial on image-based treatment effect heterogeneity, see tutorials/AnalyzeImageHeterogeneity_Tutorial.R. For a tutorial on image-based confounding analysis, see tutorials/AnalyzeImageConfounding_Tutorial.R. ```

for help, see also:

?causalimages::AnalyzeImageHeterogeneity ?causalimages::AnalyzeImageConfounding ```

Image Heterogeneity Tutorial

The most up-to-date tutorials are found in the tutorials folder of this GitHub. Here, we also provide an abbreviated tutorial using an image heterogeneity analysis.

Load in Tutorial Data

After we've loaded in the package, we can get started running an analysis. Let's read in the tutorial data so we can explore its structure: ```

outcome, treatment, and covariate information:

summary( obsW <- causalimages::obsW ) # treatment vector summary( obsY <- causalimages::obsY ) # outcome vector summary( LongLat <- causalimages::LongLat ) # long-lat coordinates for each unit summary( X <- causalimages::X ) # other covariates

image information:

dim( FullImageArray <- causalimages::FullImageArray ) # dimensions of the full image array in memory head( KeysOfImages <- causalimages::KeysOfImages ) # unique image keys associated with the images in FullImageArray head( KeysOfObservations <- causalimages::KeysOfObservations ) # image keys of observations to be associated to images via KeysOfImages We can also analyze the images that we'll use in this analysis.

plot the second band of the third image

causalimages::image2(FullImageArray[3,,,2])

plot the first band of the first image

causalimages::image2(FullImageArray[1,,,1]) ``We're using rather small image bricks around each long/lat coordinate so that this tutorial code is memory efficient. In practice, your images will be larger and you'll usually have to read them in from disk (with those instructions outlined in theacquireImageFxn` function that you'll specify). We have an example of that approach later in the tutorial.

Writing image corpus to `tfrecord`

One important part of the image analysis pipeline is writing the image corpus tfrecord file for efficient model training. You will use the causalimages::WriteTfRecord function, which takes as an input another function, acquireImageFxn, as an argument which we use for extracting all the images and writing them to the tfrecord. There are two ways that you can approach this: (1) you may store all images in R's memory, or you may (2) save images on your hard drive and read them in when needed while generating the tfrecord. The second option will be more common for large images.

You must write your acquireImageFxn to take in a single argument: keys. - keys (a positional argument) is a character or numeric vector. Each value of keys refers to a unique image object that will be read in. If each observation has a unique image associated with it, perhaps imageKeysOfUnits = 1:nObs. In the example we'll use, multiple observations map to the same image.

When Loading All Images in Memory

In this tutorial, we have all the images in memory in the FullImageArray array. We can write an acquireImageFxn function like so: ``` acquireImageFromMemory <- function(keys){ # here, the function input keys # refers to the unit-associated image keys m_ <- FullImageArray[match(keys, KeysOfImages),,,]

# if keys == 1, add the batch dimension so output dims are always consistent # (here in image case, dims are batch by height by width by channel) if(length(keys) == 1){ m_ <- array(m,dim = c(1L,dim(m)[1],dim(m)[2],dim(m)[3])) } return( m_ ) }

OneImage <- acquireImageFromMemory(sample(KeysOfObservations,1)) dim( OneImage )

ImageSample <- acquireImageFromMemory(sample(KeysOfObservations,10)) dim( ImageSample )

plot image: it's always a good idea

to check the images through extensive sanity checks

such as your comparing satellite image representation

against those from OpenStreetMaps or Google Earth.

causalimages::image2( ImageSample[3,,,1] ) ```

Now, let's write the tfrecord: ``` causalimages::WriteTfRecord( file = "~/Downloads/CausalIm.tfrecord", uniqueImageKeys = unique( KeysOfObservations ), acquireImageFxn = acquireImageFromMemory )

Note: You first may need to call causalimages::BuildBackend() to build the backend (done only once)

```

When Reading in Images from Disk

For most applications of large-scale causal image analysis, we won't be able to read whole set of images into R's memory. Instead, we will specify a function that will read images from somewhere on your hard drive. You can also experiment with other methods---as long as you can specify a function that returns an image when given the appropriate imageKeysOfUnits value, you should be fine. See tutorials/AnalyzeImageHeterogeneity_Tutorial.R for a full example.

Analyzing the Sample Data

Now that we've established some understanding of the data and written the acquireImageFxn, we are ready to proceed with the initial use of the causal image decomposition.

Note: The images used here are heavily clipped to keep this tutorial fast; the model parameters chosen here are selected to make training rapid too. The function output here should therefore not be interpreted too seriously.

``` ImageHeterogeneityResults <- causalimages::AnalyzeImageHeterogeneity( # data inputs obsW = obsW, obsY = obsY, imageKeysOfUnits = KeysOfObservations, file = "~/Downloads/CausalIm.tfrecord", # this points to the tfrecord X = X,

      # inputs to control where visual results are saved as PDF or PNGs 
      # (these image grids are large and difficult to display in RStudio's interactive mode)
      plotResults = T,
      figuresPath = "~/Downloads/CausalImagesTutorial",

      # other modeling options
      kClust_est = 2,
      nSGD = 400L, # make this larger for full applications
      batchSize = 16L)

```

Visual Results

Upon completion, AnalyzeImageHeterogeneity will save several images from the analysis to the location figuresPath. The figuresTag will be appended to these images to keep track of results from different analyses. Currently, these images include the following: - The image results with .pdf name starting, VisualizeHeteroReal_variational_minimal_uncertainty, which plots the images having great uncertainty in the cluster probabilities. - The image results with .pdf name starting, VisualizeHeteroReal_variational_minimal_mean; these plots display the images having the highest probabilities for each associated cluster. - Finally, one output .pdf has name starting HeteroSimTauDensityRealDataFig, and plots the estimated distributions over image-level treatment effects for the various clusters. Overlap of these distributions is to be expected, since the quantity is computed at the image (not some aggregate) level.

Numerical Results

We can also examine some of the numerical results contained in the ImageHeterogeneityResults output. ```

image type treatment effect cluster means

ImageHeterogeneityResults$clusterTaus_mean

image type treatment effect cluster standard deviations

ImageHeterogeneityResults$clusterTaus_sd

per image treatment effect cluster probability means

ImageHeterogeneityResults$clusterProbs_mean

per image treatment effect cluster probability standard deviations

ImageHeterogeneityResults$clusterProbs_sd ```

Pointers

Here are a few tips for using the AnalyzeImageHeterogeneity function: - If the cluster probabilities are very extreme (all 0 or 1), try increasing nSGD, simplifying the model structure (e.g., making nFilters, nDepthHidden_conv, or nDepthHidden_dense smaller), or increasing the number of Monte Carlo interations in the Variational Inference training (increase nMonte_variational). - For satellite data, images that show up as pure dark blue are centered around a body of water.

Acknowledgements

We thank James Bailie, Devdatt Dubhashi, Felipe Jordan, Mohammad Kakooei, Eagon Meng, Xiao-Li Meng, and Markus Pettersson for valuable feedback on this project. We thank Xiaolong Yang for excellent research assistance. Special thanks to Cindy Conlin for being our intrepid first package user and for many invaluable suggestions for improvement.

References

[1.] Connor T. Jerzak, Fredrik Johansson, Adel Daoud. Image-based Treatment Effect Heterogeneity. Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), Proceedings of Machine Learning Research (PMLR), 213: 531-552, 2023. [Article PDF] [Summary PDF] [Replication Data] [Replication Data Tutorial]

@article{JJD-Heterogeneity, title={Image-based Treatment Effect Heterogeneity}, author={Jerzak, Connor T. and Fredrik Johansson and Adel Daoud}, journal={Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), Proceedings of Machine Learning Research (PMLR)}, year={2023}, volume={213}, pages={531-552} }

[2.] Connor T. Jerzak, Fredrik Johansson, Adel Daoud. Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities. ArXiv Preprint, 2023. arxiv.org/pdf/2301.12985.pdf [Replication Data] @article{JJD-Confounding, title={Integrating Earth Observation Data into Causal Inference: Challenges and Opportunities}, author={Jerzak, Connor T. and Fredrik Johansson and Adel Daoud}, journal={ArXiv Preprint}, year={2023}, volume={}, pages={}, publisher={} }

[3.] Connor T. Jerzak, Adel Daoud. CausalImages: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images. ArXiv Preprint, 2023. arxiv.org/pdf/2301.12985.pdf @article{JerDao2023, title={CausalImages: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images}, author={Jerzak, Connor T. and Adel Daoud}, journal={ArXiv Preprint}, year={2023}, volume={}, pages={}, publisher={} }

Owner

Name: AIandGlobalDevelopmentLab
Login: AIandGlobalDevelopmentLab
Kind: organization

Repositories: 1
Profile: https://github.com/AIandGlobalDevelopmentLab

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 9
Watch event: 2
Issue comment event: 22
Push event: 2
Fork event: 1

Last Year

Create event: 1
Release event: 1
Issues event: 9
Watch event: 2
Issue comment event: 22
Push event: 2
Fork event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 10
Total pull requests: 0
Average time to close issues: 3 months
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 4.4
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 9
Pull requests: 0
Average time to close issues: 1 day
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 4.44
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

https://github.com/aiandglobaldevelopmentlab/causalimages-software

Science Score: 49.0%

Keywords

Repository

Basic Info

Statistics

Topics

Metadata Files

README.md

causalimages: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images

What is causalimages?

Package Installation

Pipeline

see:

for help, see:

for help, see also:

Image Heterogeneity Tutorial

Load in Tutorial Data

outcome, treatment, and covariate information:

image information:

plot the second band of the third image

plot the first band of the first image

Writing image corpus to tfrecord

When Loading All Images in Memory

plot image: it's always a good idea

to check the images through extensive sanity checks

such as your comparing satellite image representation

against those from OpenStreetMaps or Google Earth.

Note: You first may need to call causalimages::BuildBackend() to build the backend (done only once)

When Reading in Images from Disk

Analyzing the Sample Data

Visual Results

Numerical Results

image type treatment effect cluster means

image type treatment effect cluster standard deviations

per image treatment effect cluster probability means

per image treatment effect cluster probability standard deviations

Pointers

Acknowledgements

References

Owner

GitHub Events

Total

Last Year

Issues and Pull Requests

All Time

Past Year

Top Authors

Issue Authors

Pull Request Authors

Top Labels

Issue Labels

Pull Request Labels

Dependencies

`causalimages`: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images

Writing image corpus to `tfrecord`