bicausality

A framework to infer causality on binary data using techniques in frequent pattern mining and estimation statistics. Given a set of individual vectors S={x} where x(i) is a realization value of binary variable i, the framework infers empirical causal relations of binary variables i,j from S in the form of a causal graph G=(V,E).

https://github.com/darkeyes/bicausality

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary

Keywords

binary-variable causal-inference estimation-statistics exploratory-data-analysis frequent-pattern-mining
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: DarkEyes
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 61.3 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed about 2 years ago
Metadata Files
Readme License

README.md

BiCausality: Binary Causality Inference Framework


A framework to infer causality on binary data using techniques in frequent pattern mining and estimation statistics. Given a set of individual vectors S={x} where x(i) is a realization value of binary variable i, the framework infers empirical causal relations of binary variables i,j from S in the form of a causal graph G=(V,E), where V is a set of nodes representing binary variables and there is an edge from i to j in E if variable i causes j. The framework determines the dependency among variables and analyzes confounding factors before deciding whether i causes j.

Note: The causal relations inferred by this work are not real causal relations; they are empirical causal relations that need to be validated. Our main goal is to develop an exploratory data analysis tool that pinpoints possible causal relations, supporting researchers before validation in field studies that establish the real causal relations.

Installation

To install the newest version from GitHub, run the following command in an R console.

```r
remotes::install_github("DarkEyes/BiCausality")
```

This requires the "remotes" package to be installed before installing BiCausality.

Example: Inferred binary causal graph from simulation

In the first step, we generate a simulated dataset as input.

```r
seedN <- 2022

n <- 200  # 200 individuals
d <- 10   # 10 variables
mat <- matrix(nrow = n, ncol = d)  # the input of the framework

# Simulate binary data from a binomial distribution where the probability of a value being 1 is 0.5.
for (i in seq(n)) {
  set.seed(seedN + i)
  mat[i, ] <- rbinom(n = d, size = 1, prob = 0.5)
}

mat[, 1] <- mat[, 2] | mat[, 3]  # 1 is caused by 2 and 3
mat[, 4] <- mat[, 2] | mat[, 5]  # 4 is caused by 2 and 5
mat[, 6] <- mat[, 1] | mat[, 4]  # 6 is caused by 1 and 4
```
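Because the data are simulated, the intended direct causal edges are known in advance. The sketch below encodes them as a ground-truth matrix, assuming the same orientation as the package's adjacency matrix (row i, column j is 1 when i causes j). The name `trueE` is hypothetical and not part of BiCausality. Indirect paths such as 2 → 1 → 6 are deliberately not entered as edges.

```r
d <- 10  # number of variables, as in the simulation
trueE <- matrix(0, nrow = d, ncol = d)

# Row = cause, column = effect; one line per structural equation above
trueE[c(2, 3), 1] <- 1  # mat[,1] <- mat[,2] | mat[,3]
trueE[c(2, 5), 4] <- 1  # mat[,4] <- mat[,2] | mat[,5]
trueE[c(1, 4), 6] <- 1  # mat[,6] <- mat[,1] | mat[,4]

trueE
```

After running the inference, `all((resC$CausalGRes$Ehat != 0) == (trueE != 0))` would return TRUE only when the inferred edges match the simulated ones exactly.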

We use the following function to infer, for each pair of variables X and Y, whether X causes Y.

```r
# Run the function
library(BiCausality)
resC <- BiCausality::CausalGraphInferMainFunc(mat = mat, CausalThs = 0.1, nboot = 50, IndpThs = 0.05)
```

The adjacency matrix of the resulting directed causal graph is below:

```r
resC$CausalGRes$Ehat
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    0    0    0    0    0    1    0    0    0     0
 [2,]    1    0    0    1    0    0    0    0    0     0
 [3,]    1    0    0    0    0    0    0    0    0     0
 [4,]    0    0    0    0    0    1    0    0    0     0
 [5,]    0    0    0    1    0    0    0    0    0     0
 [6,]    0    0    0    0    0    0    0    0    0     0
 [7,]    0    0    0    0    0    0    0    0    0     0
 [8,]    0    0    0    0    0    0    0    0    0     0
 [9,]    0    0    0    0    0    0    0    0    0     0
[10,]    0    0    0    0    0    0    0    0    0     0
```

A non-zero value in element Ehat[i,j] means that i causes j. For example, Ehat[2,1] = 1 implies that node 2 causes node 1, which is correct since node 1 has nodes 2 and 3 as its causes.
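To read the edges off the matrix programmatically, base R's `which(..., arr.ind = TRUE)` returns the (row, column) index of every non-zero element. So that the sketch runs on its own, it rebuilds the matrix printed above by hand; with a real result, `resC$CausalGRes$Ehat` would take its place.

```r
# Ehat as printed above (row = cause, column = effect)
Ehat <- matrix(0, nrow = 10, ncol = 10)
Ehat[cbind(c(2, 3, 2, 5, 1, 4), c(1, 1, 4, 4, 6, 6))] <- 1

# Every non-zero entry (i, j) is an inferred edge i -> j
idx <- which(Ehat != 0, arr.ind = TRUE)
edges <- data.frame(cause = idx[, "row"], effect = idx[, "col"])
edges <- edges[order(edges$effect, edges$cause), ]

paste(edges$cause, "->", edges$effect)
# "2 -> 1" "3 -> 1" "2 -> 4" "5 -> 4" "1 -> 6" "4 -> 6"
```

The six edges correspond one-to-one with the three structural equations used in the simulation.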

The directed causal graph can also be plotted using the code below.

```r
library(igraph)
net <- graph_from_adjacency_matrix(resC$CausalGRes$Ehat, weighted = NULL)
plot(net, edge.arrow.size = 0.3, vertex.size = 20, vertex.color = '#D4C8E9', layout = layout_with_kk)
```

[Figure: plot of the inferred directed causal graph]

To see further information about the causal relation between variables 2 and 1, we can use the command below.

Note that the odd difference between X and Y, denoted oddDiff(X,Y), is defined as P(X=1,Y=1)P(X=0,Y=0) - P(X=0,Y=1)P(X=1,Y=0). If X tends to take the same value as Y (X is directly proportional to Y), oddDiff(X,Y) is positive. If X tends to be the inverse of Y, oddDiff(X,Y) is negative. If X and Y have no association, oddDiff(X,Y) is close to zero. Because the four joint probabilities sum to 1, oddDiff(X,Y) always lies between -1/4 and 1/4.
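To make the definition concrete, the sketch below computes the empirical odd difference of two 0/1 vectors, estimating each joint probability as a sample proportion. `odd_diff` is a hypothetical helper written for illustration; it is not a function exported by BiCausality.

```r
# Hypothetical helper (not part of BiCausality): empirical oddDiff(X, Y) for 0/1 vectors
odd_diff <- function(x, y) {
  stopifnot(length(x) == length(y))
  p11 <- mean(x == 1 & y == 1)  # P(X = 1, Y = 1)
  p00 <- mean(x == 0 & y == 0)  # P(X = 0, Y = 0)
  p01 <- mean(x == 0 & y == 1)  # P(X = 0, Y = 1)
  p10 <- mean(x == 1 & y == 0)  # P(X = 1, Y = 0)
  p11 * p00 - p01 * p10
}

x <- c(1, 1, 0, 0)
odd_diff(x, x)               # 0.25: X identical to Y (the upper bound)
odd_diff(x, 1 - x)           # -0.25: X is the inverse of Y (the lower bound)
odd_diff(x, c(1, 0, 1, 0))   # 0: no association

# The four equally likely combinations of variables 2 and 3 in the simulation
x2 <- c(0, 0, 1, 1)
x3 <- c(0, 1, 0, 1)
x1 <- as.integer(x2 | x3)    # variable 1 is caused by variables 2 and 3
odd_diff(x2, x1)             # 0.125
```

The value 0.125 is the population odd difference between variables 2 and 1 under the simulation; the bootstrap mean reported as `$Signmean` below is a finite-sample estimate of it.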

```r
resC$CausalGRes$causalInfo[['2,1']]
```

With Y as variable 1 and X as variable 2, the results are below.

```r
# This value is the 95% (percentile) confidence interval of P(Y=1|X=1).
$CDirConfValInv
 2.5% 97.5% 
    1     1 

# This value is the 95% (percentile) confidence interval of |P(Y=1|X=1) - P(X=1|Y=1)|.
$CDirConfInv
     2.5%     97.5% 
0.3217322 0.4534494 

# This value is the mean of |P(Y=1|X=1) - P(X=1|Y=1)|.
$CDirmean
[1] 0.3787904

# Test with the null hypothesis that |P(Y=1|X=1) - P(X=1|Y=1)| is less than
# or equal to the value of the parameter "CausalThs", and the alternative
# hypothesis that |P(Y=1|X=1) - P(X=1|Y=1)| is greater than "CausalThs".
$testRes2

    Wilcoxon signed rank test with continuity correction

data:  abs(bCausalDirDist)
V = 1275, p-value = 3.893e-10
alternative hypothesis: true location is greater than 0.1

# Test with the null hypothesis that |oddDiff(X,Y)| is less than or equal to
# the value of the parameter "IndpThs", and the alternative hypothesis
# that |oddDiff(X,Y)| is greater than "IndpThs".
$testRes1

    Wilcoxon signed rank test with continuity correction

data:  abs(bSignDist)
V = 1275, p-value = 3.894e-10
alternative hypothesis: true location is greater than 0.05

# If the test above rejects the null hypothesis at the significance level
# alpha (default alpha = 0.05), then "sign" is 1; otherwise, it is 0.
$sign
[1] 1

# This value is the 95% (percentile) confidence interval of oddDiff(X,Y).
$SignConfInv
      2.5%      97.5% 
0.08670325 0.13693900 

# This value is the mean of oddDiff(X,Y).
$Signmean
[1] 0.1082242
```
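The fields above summarize bootstrap replicates (the `nboot` argument) and one-sided Wilcoxon signed rank tests against the thresholds `CausalThs` and `IndpThs`. The sketch below imitates that procedure on freshly simulated data so it runs on its own. It assumes individuals are resampled with replacement, which may differ from the package's internal resampling, and all variable names (`X`, `Y`, `boot`, `causal_dir`, `odd_dist`, `sign_val`) are illustrative rather than BiCausality internals. It should give values of the same order as the output above, not identical ones.

```r
set.seed(2022)
n <- 200
x2 <- rbinom(n, size = 1, prob = 0.5)
x3 <- rbinom(n, size = 1, prob = 0.5)
X <- x2                   # X = variable 2
Y <- as.integer(x2 | x3)  # Y = variable 1, caused by variables 2 and 3

# Empirical oddDiff(X, Y) from sample proportions
odd_diff <- function(x, y) {
  mean(x == 1 & y == 1) * mean(x == 0 & y == 0) -
    mean(x == 0 & y == 1) * mean(x == 1 & y == 0)
}

nboot <- 50; CausalThs <- 0.1; IndpThs <- 0.05; alpha <- 0.05

# Assumed scheme: resample individuals with replacement, recompute both statistics
boot <- replicate(nboot, {
  idx <- sample.int(n, n, replace = TRUE)
  x <- X[idx]; y <- Y[idx]
  c(dir = mean(y[x == 1]) - mean(x[y == 1]),  # P(Y=1|X=1) - P(X=1|Y=1)
    odd = odd_diff(x, y))
})
causal_dir <- abs(boot["dir", ])
odd_dist <- boot["odd", ]

# 95% percentile intervals and means (cf. CDirConfInv, CDirmean, SignConfInv, Signmean)
print(quantile(causal_dir, c(0.025, 0.975))); print(mean(causal_dir))
print(quantile(odd_dist, c(0.025, 0.975))); print(mean(odd_dist))

# One-sided tests (cf. testRes2, testRes1); exact = FALSE uses the normal
# approximation with continuity correction
test_dir <- wilcox.test(causal_dir, mu = CausalThs, alternative = "greater", exact = FALSE)
test_odd <- wilcox.test(abs(odd_dist), mu = IndpThs, alternative = "greater", exact = FALSE)

# 1 when the dependency test rejects H0 at level alpha (cf. sign)
sign_val <- as.integer(test_odd$p.value < alpha)
print(test_dir); print(test_odd); print(sign_val)
```

Since variable 2 equal to 1 forces variable 1 to be 1, P(Y=1|X=1) is exactly 1 in every replicate, while P(X=1|Y=1) hovers around 2/3; this asymmetry is what pushes the direction statistic well above `CausalThs`.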

Citation

Amornbunchornvej, Chainarong, Navaporn Surasvadi, Anon Plangprasopchok, and Suttipong Thajchayapong. "Framework for inferring empirical causal graphs from binary data to support multidimensional poverty analysis." Heliyon 9, no. 5 (2023): e15947. https://doi.org/10.1016/j.heliyon.2023.e15947 arXiv.


Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • cran 244 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
cran.r-project.org: BiCausality

Binary Causality Inference Framework

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 244 Last month
Rankings
Forks count: 21.9%
Stargazers count: 28.5%
Dependent packages count: 29.8%
Average: 35.0%
Dependent repos count: 35.5%
Downloads: 59.4%
Maintainers (1)
Last synced: about 1 year ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • igraph * suggests
  • knitr * suggests
  • markdown * suggests
  • rmarkdown * suggests