https://github.com/cwwhitney/hill_climbing_bic

Validation of expert models with hill climbing mathematical optimization technique and Bayesian information criterion

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: CWWhitney
Language: TeX
Default Branch: master
Size: 1.62 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

README.Rmd

---
output: github_document
bibliography: refs/hill_climbing.bib
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Hill Climbing as test for causal model

We apply causal inference techniques, expert-elicited probabilities, and optimization algorithms to improve decision-making for interventions aimed at enhancing livelihoods through agroforestry. We use a hill-climbing algorithm to learn the structure of a Bayesian Network (BN) from observed data. The observed data contain information from publications, each contributing to various factors that may influence agroforestry systems and livelihoods in different regions. The goal is to use these data to infer the network structure that best explains the dependencies among the variables in the dataset. See the details in `hill_climbing.R`.

We aim to build a predictive decision model that connects causal relationships between planting trees on farms and farmer livelihoods. The process has three main steps:

1. Search the available literature to define the key causal relationships related to planting trees on farms. A Directed Acyclic Graph (DAG) is constructed to represent these relationships visually, incorporating factors such as costs, benefits, risks, and their impact on livelihoods.
2. Translate the causal model into a BN. Literature results are used to fill in the Conditional Probability Tables (CPTs) for each node in the BN, providing the probability distributions based on available data.
3. Gather data (even if minimal) to test and refine the constructed BN. The hill-climbing algorithm is applied to optimize the model by adjusting it to best fit the observed data, with the aim of maximizing the model's predictive accuracy and identifying the most likely causal relationships.

The work aims to demonstrate a robust and adaptable decision model.

```{r plot_dagitty}
# Load the DAG definition
source("functions/dagitty_tree_planting.R")
# Plot the DAG
plot(dag)
```

We build the same graph in `bnlearn` for use in that environment. See all the custom CPTs in `model_in_bnlearn.R`. 

```{r plot_network_structure}
source("functions/model_in_bnlearn.R")
plot(network_structure)
```

## Perform inference 

Calculate the probability of "Livelihoods" being "Improved" given trees on the farm, represented here by the evidence `TreeDiversity == "Yes"`.

```{r inference_result}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (TreeDiversity == "Yes"))
```
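
Because `cpquery` estimates the probability by sampling, its result varies slightly from run to run. The sketch below shows one way to check such an estimate against an exact calculation. It assumes a hypothetical three-node chain (`TreeDiversity`, `Timber`, `Livelihoods`) with made-up CPT values; it is not the model defined in `model_in_bnlearn.R`.

```r
library(bnlearn)

# Hypothetical three-node chain, not the full model from model_in_bnlearn.R.
toy_dag <- model2network("[TreeDiversity][Timber|TreeDiversity][Livelihoods|Timber]")

yes_no <- c("Yes", "No")
outcomes <- c("Improved", "Not Improved")

# Illustrative CPTs only (assumed values); each column sums to 1.
toy_fitted <- custom.fit(toy_dag, dist = list(
  TreeDiversity = matrix(c(0.6, 0.4), ncol = 2, dimnames = list(NULL, yes_no)),
  Timber = matrix(c(0.8, 0.2, 0.3, 0.7), ncol = 2,
                  dimnames = list(Timber = yes_no, TreeDiversity = yes_no)),
  Livelihoods = matrix(c(0.75, 0.25, 0.40, 0.60), ncol = 2,
                       dimnames = list(Livelihoods = outcomes, Timber = yes_no))
))

# Exact value: sum over the two states of Timber given TreeDiversity == "Yes".
exact <- 0.8 * 0.75 + 0.2 * 0.40

# Sampling-based estimate; a fixed seed and a larger n make it more stable.
set.seed(1)
estimate <- cpquery(toy_fitted,
                    event = (Livelihoods == "Improved"),
                    evidence = (TreeDiversity == "Yes"),
                    n = 1e5)

c(exact = exact, estimate = estimate)
```

Fixing the seed and raising `n` are the two simplest ways to make repeated queries comparable across runs.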

To validate our Bayesian Network, we can perform several tests to ensure that the model behaves as expected and that the conditional dependencies between the nodes are correctly represented. 

### Test for inconsistent Evidence

Here we introduce evidence that contradicts the dependency structure to check how the system responds. The query asks for the probability of `TreeDiversity == "No"` given the evidence `Firewood == "Yes"`. Because this combination conflicts with the dependencies in `bn_fitted`, the query should return a very low or zero probability (for each iteration of the model).

```{r inconsistent_inference}
cpquery(bn_fitted, event = (TreeDiversity == "No"), evidence = (Firewood == "Yes"))
```

### Query for Node Probabilities

Test the probability distributions of individual nodes given various evidence. For example, given evidence for Benefits, check the conditional probability distribution for Livelihoods.

Example for Livelihoods:

```{r inference_Livelihoods}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (Benefits == "High"))
```

This should return the probability of improved livelihoods given that benefits are high.

### Sensitivity Analysis

Perform a sensitivity analysis to understand how changes in one or more variables affect the results. For example, change the probability of Firewood or Timber and see how it affects the probability of Livelihoods.

```{r sensitivity_analysis}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (Timber == "Yes"))

```
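
To illustrate what changing the probability of a parent node does, the following base R sketch varies P(Timber = "Yes") and recomputes P(Livelihoods = "Improved") with the law of total probability. The two conditional probabilities are assumed values for illustration, and the calculation treats `Timber` as the only parent of `Livelihoods`, which is a simplification of the full network.

```r
# Hypothetical conditional probabilities of improved livelihoods (assumed values).
p_improved_given_timber <- c(Yes = 0.75, No = 0.40)

# Candidate values for P(Timber = "Yes").
p_timber_yes <- seq(0, 1, by = 0.25)

# Law of total probability for a single binary parent.
p_improved <- p_timber_yes * p_improved_given_timber[["Yes"]] +
  (1 - p_timber_yes) * p_improved_given_timber[["No"]]

sensitivity <- data.frame(p_timber_yes, p_improved)
sensitivity
```

The steeper the change in `p_improved` across the rows, the more sensitive the outcome is to that parent.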

### Simulation and Comparison with Expected Results

Generate synthetic data based on the network structure and compare it with expected or known results.

```{r simulated_data}
# Simulate 1000 samples
simulated_data <- rbn(bn_fitted, n = 1000)
head(simulated_data)
```

Calculate the observed distribution of 'Livelihoods'.

```{r observed_Livelihoods}
observed_Livelihoods <- table(simulated_data$Livelihoods) / nrow(simulated_data)

observed_Livelihoods
```

Save the expectation for 'Livelihoods'. 

```{r expected_Livelihoods}
expected_Livelihoods <- c("Improved" = 0.7, "Not Improved" = 0.3)
```

Compare the observed distribution with the expected one. 

```{r observed_expected_compare}
# Align the expected values with the observed levels by name
data.frame(
  "Observed" = observed_Livelihoods,
  "Expected" = expected_Livelihoods[names(observed_Livelihoods)]
)
```
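
A side-by-side table does not say whether a difference is larger than sampling noise would produce. One possible check, sketched here in base R, is a chi-squared goodness-of-fit test. The sample below is drawn with `sample()` as a stand-in for `simulated_data$Livelihoods` so that the example runs on its own; the level names and expected proportions follow the chunk above.

```r
set.seed(42)

# Stand-in for simulated_data$Livelihoods (assumption: two levels, 1000 draws).
outcome_levels <- c("Improved", "Not Improved")
sim_livelihoods <- factor(
  sample(outcome_levels, size = 1000, replace = TRUE, prob = c(0.7, 0.3)),
  levels = outcome_levels
)

expected <- c("Improved" = 0.7, "Not Improved" = 0.3)
observed_counts <- table(sim_livelihoods)

# Goodness-of-fit test of the observed counts against the expected proportions.
fit_test <- chisq.test(observed_counts, p = expected[names(observed_counts)])
fit_test$p.value
```

A small p-value would suggest that the simulated distribution differs from the expectation by more than chance alone.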

Calculate the distribution of 'Timber' given 'TreeDiversity' (the same approach works for other node relationships).

```{r observed_Timber_given_TreeDiversity}
# Column-wise proportions, so each TreeDiversity column sums to 1: P(Timber | TreeDiversity)
prop.table(table(simulated_data$Timber, simulated_data$TreeDiversity), margin = 2)
```

Visualize Livelihoods results. 

```{r simulated_data_plot}
library(ggplot2)

# group = 1 makes prop the share of all rows; without it every bar is 100%
ggplot(simulated_data, aes(x = Livelihoods)) +
  geom_bar(aes(y = after_stat(prop), group = 1), stat = "count") +
  scale_y_continuous(labels = scales::percent) +
  ggtitle("Distribution of Livelihoods in Simulated Data")
```

### Hill-climbing algorithm

Learn the structure of a Bayesian network using a hill-climbing algorithm `hc`. 

The observations for the model are based on reports from the literature:

Agroforestry introduces both initial and ongoing costs, including planting, labor, and pruning, which can constrain adoption, particularly for resource-poor farmers. External factors, such as market access, extension services, and credit availability, significantly influence these costs, as demonstrated in studies on farmer-managed natural regeneration in Niger (@haglund_dry_2011) and agroforestry practices in sloping lands of Asia and the Pacific (@craswell_agroforestry_1997). Bureaucratic inefficiencies and limited market alternatives also add to these challenges (@akter_agroforestry_2022).

Despite the costs, agroforestry provides substantial benefits to farmers. It enhances access to food, timber, fuelwood, and fodder, directly improving livelihood capitals, as observed in tropical moist forests in Bangladesh (@akter_agroforestry_2022). Agroforestry supports biodiversity conservation, soil fertility, and carbon sequestration, making it a key strategy for climate change mitigation and adaptation in Sub-Saharan Africa (@verchot_climate_2007; @bogale_sustainability_2023). In Ethiopia, smallholder farmers benefit from improved productivity and diversified income streams through agroforestry (@amare_agroforestry_2019), while in Kenya, agroforestry can increase household food security, particularly in regions prone to wildlife crop raiding (@quandt_agroforestry_2021). Agroforestry systems also provide resilience to environmental stressors by diversifying income sources and creating favorable microclimates (@ngango_does_2024; @bishaw_farmers_2013).

However, farmers can also face challenges such as reduced crop yields due to competition for water, nutrients, and light, as well as exposure to fluctuating market conditions (@do_adapting_2024; @akter_agroforestry_2022). These risks are compounded by adoption barriers, including insecure land tenure and lack of institutional support (@hughes_assessing_2020; @johansson_mapping_2013).

The interplay of costs, benefits, and risks ultimately determines the impact of agroforestry on livelihoods. While high costs and risks can hinder adoption and sustainability, the benefits—such as increased resilience, economic returns, and ecosystem services—can offset these challenges (@quandt_agroforestry_2021; @awazi_agroforestry_2020). External factors like cooperative memberships and extension services significantly shape the outcomes of agroforestry systems (@ngango_does_2024; @bishaw_farmers_2013). Research in Bangladesh highlights agroforestry’s positive impacts on livelihoods despite systemic inefficiencies (@akter_agroforestry_2022), while studies in Ethiopia and Kenya demonstrate agroforestry's role in reducing livelihood risks and enhancing resilience to environmental stress (@amare_agroforestry_2019; @bishaw_farmers_2013). In Cameroon, agroforestry has been shown to mitigate conflict between farmers and pastoralists, promoting social and economic stability (@awazi_agroforestry_2020). Similarly, agroforestry practices in Tanzania reveal that social and ecological factors, such as tree survival rates and community perceptions, influence the sustainability of these systems (@johansson_mapping_2013).

Agroforestry’s potential to address multiple livelihood and environmental challenges is clear, but its success depends on targeted policy interventions to reduce costs, mitigate risks, and enhance benefits, ensuring equitable access and scalability across diverse contexts.

```{r hill_climbing}
# Example with hill climbing (using bnlearn)
library(bnlearn)
```

We used the score-based hill-climbing algorithm (`hc`) from `bnlearn` to learn the structure of a Bayesian network. The input was the observed data from the publications, with missing values (NA) for unobserved nodes.
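
Score-based learning means that each candidate structure receives a score and the search keeps the changes that improve it. The sketch below illustrates this with the Bayesian information criterion (BIC). It uses `learning.test`, an example dataset shipped with `bnlearn`, as a stand-in for the literature observations. In `bnlearn` the BIC is scaled so that higher values indicate a better fit.

```r
library(bnlearn)

# Example discrete data shipped with bnlearn (stand-in for observed_data).
data(learning.test)

# Starting point with no arcs, and the structure found by hill climbing.
empty_net <- empty.graph(names(learning.test))
learned_net <- hc(learning.test, score = "bic")

# Score both structures against the same data.
score_empty <- score(empty_net, learning.test, type = "bic")
score_learned <- score(learned_net, learning.test, type = "bic")

c(empty = score_empty, learned = score_learned)
```

Since hill climbing starts from the empty graph by default and only accepts score-improving changes, the learned structure should never score lower than the empty one.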

```{r observed_data}
source("data/observed_data.R")
```

Convert all the character columns from our observations into factors for the hill climbing. 

```{r convert_columns}
# Convert character columns to factors
factor_columns <- c(
  "TreeDiversity", "Timber", "Firewood", "Fruit", "Market", "Shade",
  "Habitat", "ExternalRisks", "Costs", "Benefits", "Livelihoods"
)
observed_data[factor_columns] <- lapply(observed_data[factor_columns], as.factor)
```

Learn a structure from the literature observations alone and plot it.

```{r fitted_model}
source("functions/model_in_bnlearn.R")
# x in hc = the observations alone
fitted_model <- hc(observed_data)
plot(fitted_model)
```

Plot the model learned from both our model structure and the literature, using the original network structure as the `start` argument. `network_structure` is an object of class `bn`; it holds the DAG that we use to initialize the `hc` algorithm.

```{r hill_climbing_model}
# x in hc = the observations
# start = the original network structure
hill_climbing_model <- hc(x = observed_data, start = network_structure)
plot(hill_climbing_model)
```
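
A plot alone makes it hard to see how far the learned structure has moved from the starting one. As a sketch of one way to quantify this, the example below counts shared and differing arcs with `compare()` and computes the structural Hamming distance with `shd()`. It again uses the `learning.test` data shipped with `bnlearn`, and the structure documented for that dataset plays the role of the expert model; neither is part of this project.

```r
library(bnlearn)

# Example data and its documented structure, used as a stand-in "expert" model.
data(learning.test)
expert_net <- model2network("[A][C][F][B|A][D|A:C][E|B:F]")

# Hill climbing initialized at the expert structure.
learned_net <- hc(learning.test, start = expert_net)

# Arcs shared with the expert model (tp), added (fp), and missing (fn).
arc_comparison <- compare(target = expert_net, current = learned_net)
arc_comparison

# Structural Hamming distance between the two structures.
structure_distance <- shd(learned_net, expert_net)
structure_distance
```

With the objects from this document, the same calls would take `network_structure` as the target and `hill_climbing_model` as the current network.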

# References

Owner

Name: Cory Whitney
Login: CWWhitney
Kind: user
Location: Bonn, Germany
Company: University of Bonn

Website: https://www.zef.de/index.php?id=2232&tx_zefportal_staff[ref]=2252&tx_zefportal_staff[uid]=1799&no_cache=1
Twitter: human_ecologist
Repositories: 42
Profile: https://github.com/CWWhitney

Holistic and collaborative research processes related to decision theory, human ecology, ethno- botany/biology/ecology.

GitHub Events

Total

Issues event: 1
Push event: 10
Create event: 2

Last Year

Issues event: 1
Push event: 10
Create event: 2

Committers

Last synced: 12 months ago

All Time

Total Commits: 40
Total Committers: 1
Avg Commits per committer: 40.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 40
Committers: 1
Avg Commits per committer: 40.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
CWWhitney	w**y@g**m	40

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Science Score: 26.0%
