https://github.com/cwwhitney/hill_climbing_bic
Validation of expert models with hill climbing mathematical optimization technique and Bayesian information criterion
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.1%) to scientific vocabulary
Last synced: 6 months ago
·
JSON representation
Repository
Validation of expert models with hill climbing mathematical optimization technique and Bayesian information criterion
Basic Info
- Host: GitHub
- Owner: CWWhitney
- Language: TeX
- Default Branch: master
- Size: 1.62 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Created about 1 year ago
· Last pushed 11 months ago
Metadata Files
Readme
README.Rmd
---
output: github_document
bibliography: refs/hill_climbing.bib
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# Hill Climbing as test for causal model
We apply causal inference techniques, expert-elicited probabilities, and optimization algorithms, to improve decision-making for interventions aimed at enhancing livelihoods through agroforestry. We use a hill-climbing algorithm to learn the structure of a Bayesian Network (BN) based on observed data. The observed data contains information from publications, each contributing to various factors that may influence agroforestry systems and livelihoods in different regions. The goal is to use this data to infer the best network structure that best explains the dependencies among the variables in the dataset. See the details in `hill_climbing.R`.
We aim to build a predictive decision model that connects causal relationships between planting trees on farms and farmer livelihoods. The process has three main steps: 1. searching available literature to define the key causal relationships related to planting trees on farms. A Directed Acyclic Graph (DAG) will be constructed to visually represent these relationships, incorporating various factors like costs, benefits, risks, and their impact on livelihoods, 2. the causal model will be translated into a BN. Literature results will be used to fill in the Conditional Probability Tables (CPTs) for each node in the BN, providing the probability distributions based on available data, 3. once the Bayesian Network is constructed, data will be gathered (even if minimal) to test and refine the model. The hill climbing algorithm will be applied to optimize the model's parameters by adjusting them to best fit the observed data. The optimization process will focus on maximizing the model’s predictive accuracy and identifying the most likely causal relationships. The work demonstrates a robust and adaptable decision model.
```{r plot_dagitty}
source("functions/dagitty_tree_planting.R")
# Plot the DAG
plot(dag)
```
We build the same graph in `bnlearn` for use in that environment. See all the custom CPTs in `model_in_bnlearn.R`.
```{r plot_network_structure}
source("functions/model_in_bnlearn.R")
plot(network_structure)
```
## Perform inference
Calculate the probability of "Livelihoods" being "Improved" given "Trees on Farm".
```{r inference_result}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (TreeDiversity == "Yes"))
```
To validate our Bayesian Network, we can perform several tests to ensure that the model behaves as expected and that the conditional dependencies between the nodes are correctly represented.
### Test for inconsistent Evidence
Here we introduce evidence that contradicts the dependency structure to check for the system response. A node conditioned on one state `TreeDiversity == "No"`, but the evidence `Firewood == "Yes"` conflicts with `bn_fitted`, it should return a very low or zero probability (for each iteration of the model).
```{r inconsistent_inference}
cpquery(bn_fitted, event = (TreeDiversity == "No"), evidence = (Firewood == "Yes"))
```
### Query for Node Probabilities
Test the probability distributions of individual nodes given various evidence. For example, given evidence for Market, check the conditional probability distribution for Livelihoods.
Example for Livelihoods:
```{r inference_Livelihoods}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (Benefits == "High"))
```
This should return the probability of improved livelihoods given that the market is high.
### Sensitivity Analysis
Perform a sensitivity analysis to understand how changes in one or more variables affect the results. For example, change the probability of Firewood or Timber and see how it affects the probability of Livelihoods.
```{r sensitivity_analysis}
cpquery(bn_fitted, event = (Livelihoods == "Improved"), evidence = (Timber == "Yes"))
```
### Simulation and Comparison with Expected Results
Generate synthetic data based on the network structure and compare it with expected or known results.
```{r simulated_data}
# Simulate 1000 samples
simulated_data <- rbn(bn_fitted, n = 1000)
head(simulated_data)
```
Calculate the observed distribution of 'Livelihoods'.
```{r observed_Livelihoods}
observed_Livelihoods <- table(simulated_data$Livelihoods) / nrow(simulated_data)
observed_Livelihoods
```
Save the expectation for 'Livelihoods'.
```{r expected_Livelihoods}
expected_Livelihoods <- c("Improved" = 0.7, "Not Improved" = 0.3)
```
Compare the observed distribution with the expected one.
```{r observed_expected_compare}
data.frame(
"Observed" = observed_Livelihoods,
"Expected" = expected_Livelihoods
)
```
Calculate the distribution of 'Timber' given 'TreeDiversity' (example for other node relationships too).
```{r observed_Timber_given_TreeDiversity}
table(simulated_data$Timber, simulated_data$TreeDiversity) / nrow(simulated_data)
```
Visualize Livelihoods results.
```{r simulated_data_plot}
library(ggplot2)
ggplot(simulated_data, aes(x = Livelihoods)) +
geom_bar(aes(y = after_stat(prop)), stat = "count") +
scale_y_continuous(labels = scales::percent) +
ggtitle("Distribution of Livelihoods in Simulated Data")
```
### Hill-climbing algorithm
Learn the structure of a Bayesian network using a hill-climbing algorithm `hc`.
The observations for the are based on reports from the literature:
Agroforestry introduces both initial and ongoing costs, including planting, labor, and pruning, which can constrain adoption, particularly for resource-poor farmers. External factors, such as market access, extension services, and credit availability, significantly influence these costs, as demonstrated in studies on farmer-managed natural regeneration in Niger (@haglund_dry_2011) and agroforestry practices in sloping lands of Asia and the Pacific (@craswell_agroforestry_1997). Bureaucratic inefficiencies and limited market alternatives also add to these challenges (@akter_agroforestry_2022).
Despite the costs, agroforestry provides substantial benefits to farmers. It enhances access to food, timber, fuelwood, and fodder, directly improving livelihood capitals, as observed in tropical moist forests in Bangladesh (@akter_agroforestry_2022). Agroforestry supports biodiversity conservation, soil fertility, and carbon sequestration, making it a key strategy for climate change mitigation and adaptation in Sub-Saharan Africa (@verchot_climate_2007; @bogale_sustainability_2023). In Ethiopia, smallholder farmers benefit from improved productivity and diversified income streams through agroforestry (@amare_agroforestry_2019), while in Kenya, agroforestry can increase household food security, particularly in regions prone to wildlife crop raiding (@quandt_agroforestry_2021). Agroforestry systems also provide resilience to environmental stressors by diversifying income sources and creating favorable microclimates (@ngango_does_2024; @bishaw_farmers_2013).
However, farmers can also face challenges such as reduced crop yields due to competition for water, nutrients, and light, as well as exposure to fluctuating market conditions (@do_adapting_2024; @akter_agroforestry_2022). These risks are compounded by adoption barriers, including insecure land tenure and lack of institutional support (@hughes_assessing_2020; @johansson_mapping_2013).
The interplay of costs, benefits, and risks ultimately determines the impact of agroforestry on livelihoods. While high costs and risks can hinder adoption and sustainability, the benefits—such as increased resilience, economic returns, and ecosystem services—can offset these challenges (@quandt_agroforestry_2021; @awazi_agroforestry_2020). External factors like cooperative memberships and extension services significantly shape the outcomes of agroforestry systems (@ngango_does_2024; @bishaw_farmers_2013). Research in Bangladesh highlights agroforestry’s positive impacts on livelihoods despite systemic inefficiencies (@akter_agroforestry_2022), while studies in Ethiopia and Kenya demonstrate agroforestry's role in reducing livelihood risks and enhancing resilience to environmental stress (@amare_agroforestry_2019; @bishaw_farmers_2013). In Cameroon, agroforestry has been shown to mitigate conflict between farmers and pastoralists, promoting social and economic stability (@awazi_agroforestry_2020). Similarly, agroforestry practices in Tanzania reveal that social and ecological factors, such as tree survival rates and community perceptions, influence the sustainability of these systems (@johansson_mapping_2013).
Agroforestry’s potential to address multiple livelihood and environmental challenges is clear, but its success depends on targeted policy interventions to reduce costs, mitigate risks, and enhance benefits, ensuring equitable access and scalability across diverse contexts.
```{r hill_climbing}
# Example with hill climbing (using bnlearn)
library(bnlearn)
```
We used the score-based structure learning algorithm from `bnlearn` to learn the structure of a Bayesian network using a hill-climbing algorithm. We used the observed data from the publications with some missing values (NA) for unobserved nodes.
```{r observed_data}
source("data/observed_data.R")
```
Convert all the character columns from our observations into factors for the hill climbing.
```{r convert_columns}
# Convert character columns to factors
observed_data$TreeDiversity <- as.factor(observed_data$TreeDiversity)
observed_data$Timber <- as.factor(observed_data$Timber)
observed_data$Firewood <- as.factor(observed_data$Firewood)
observed_data$Fruit <- as.factor(observed_data$Fruit)
observed_data$Market <- as.factor(observed_data$Market)
observed_data$Shade <- as.factor(observed_data$Shade)
observed_data$Habitat <- as.factor(observed_data$Habitat)
observed_data$ExternalRisks <- as.factor(observed_data$ExternalRisks)
observed_data$Costs <- as.factor(observed_data$Costs)
observed_data$Benefits <- as.factor(observed_data$Benefits)
observed_data$Livelihoods <- as.factor(observed_data$Livelihoods)
```
Plot the fitted model with the data from the papers only.
```{r fitted_model}
source("functions/model_in_bnlearn.R")
# x in hc = the observations alone
fitted_model <- hc(observed_data)
plot(fitted_model)
```
Plot the model based on both our model structure and the literature when we use the original network structure as a `start`. This is a `class bn` object. It shows DAG and we use it to initialize the `hc` algorithm.
```{r hill_climbing_model}
# x in hc = the observations
# start = the original network structure
hill_climbing_model <- hc(x= observed_data, start = network_structure)
plot(hill_climbing_model)
```
# References
Owner
- Name: Cory Whitney
- Login: CWWhitney
- Kind: user
- Location: Bonn, Germany
- Company: University of Bonn
- Website: https://www.zef.de/index.php?id=2232&tx_zefportal_staff[ref]=2252&tx_zefportal_staff[uid]=1799&no_cache=1
- Twitter: human_ecologist
- Repositories: 42
- Profile: https://github.com/CWWhitney
Holistic and collaborative research processes related to decision theory, human ecology, ethno- botany/biology/ecology.
GitHub Events
Total
- Issues event: 1
- Push event: 10
- Create event: 2
Last Year
- Issues event: 1
- Push event: 10
- Create event: 2
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- CWWhitney (1)
Pull Request Authors
Top Labels
Issue Labels
documentation (1)
enhancement (1)
help wanted (1)