stable-tree-algorithm-for-suicide-risk-identification-in-youth-experiencing-homelessness-yeh-
Implementation of stable decision tree algorithm based on novel distance metric in "Improving Stability in Decision Tree Models" (Bertimas, 2023)
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 22
- Releases: 0
Metadata Files
README.md
Stable Decision Tree Method for Predicting Suicidal Ideation for At-Risk Homeless Youth
This project implements the stable decision tree algorithm based on the method outlined in "Improving Stability in Decision Tree Models"[^1], which presents a unique distance metric for heuristic-based decision trees as a measure of stability. The algorithm produces a Pareto optimal set from which a single final optimal tree is selected according to an objective function targeting a chosen metric (AUC, distance, combined, etc.). Our implementation attempts to improve upon previous work[^2] in creating an effective method to identify suicide risk among youth experiencing homelessness (YEH). The dataset used in this implementation makes a unique contribution by considering social network features as well as individual factors in building risk profiles.
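As a rough illustration of the selection step described above, the sketch below picks a final tree from a Pareto-optimal set under three objective functions (AUC-maximizing, distance-minimizing, and a weighted combination). The dictionary shape, function names, and weighting scheme are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch (not the repository's API): selecting a final tree
# from a Pareto-optimal set by different objective functions.
# Each candidate carries an AUC (higher is better) and an average
# distance to a reference set of trees (lower is more stable).

candidates = [
    {"index": 0, "auc": 0.82, "distance": 0.40},
    {"index": 1, "auc": 0.79, "distance": 0.25},
    {"index": 2, "auc": 0.75, "distance": 0.10},
]

def pareto_front(trees):
    """Keep trees not dominated in both AUC and distance."""
    front = []
    for t in trees:
        dominated = any(
            o["auc"] >= t["auc"] and o["distance"] <= t["distance"]
            and (o["auc"] > t["auc"] or o["distance"] < t["distance"])
            for o in trees
        )
        if not dominated:
            front.append(t)
    return front

def select(front, objective="combined", alpha=0.5):
    """Pick one tree: maximize AUC, minimize distance, or a weighted mix."""
    if objective == "auc":
        return max(front, key=lambda t: t["auc"])
    if objective == "distance":
        return min(front, key=lambda t: t["distance"])
    # combined: trade accuracy against stability with weight alpha
    return max(front, key=lambda t: alpha * t["auc"] - (1 - alpha) * t["distance"])

front = pareto_front(candidates)
best = select(front, objective="combined", alpha=0.5)
```

With these example numbers no candidate dominates another, so all three survive to the front; the `alpha` weight then decides how far the combined objective leans toward stability.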
The distance metric implementation used in the code may be found as a reference below.[^3]
We reproduce the original aggregation metrics from the Bertimas paper, comparing different Pareto optimal tree selection strategies and a downsampled variant. Results can be found in the Experiments folder.
A write-up of the project and experiment design can be found here: docs/Bertimas-Report-Final.pdf
[^1]: Improving Stability in Decision Tree Models
[^3]: Path Distance Metric Repository from Stable Decision Tree Algorithm
## Commands to run
```bash
uv run src/StableTree/main.py --group-name FINALaggregateoutput --option experiment --datasets data/DataSetCombinedSISNIBaselineFE.csv data/DataSetCombinedSISNIBaselineFE.csv data/breast_cancer.csv --labels suicidea suicattempt target
uv run src/StableTree/main.py --group-name finalaggregateoutputalldatasets --option plot --datasets data/DataSetCombinedSISNIBaselineFE.csv data/breastcancer.csv
```
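The CLI surface implied by the commands above could be sketched with `argparse`; only the flag names come from the commands shown, while choices, defaults, and help text are assumptions.

```python
import argparse

# Hypothetical reconstruction of the CLI implied by the commands above;
# the flag names are from the README, everything else is assumed.
parser = argparse.ArgumentParser(prog="StableTree")
parser.add_argument("--group-name", required=True,
                    help="name used to group output files for one run")
parser.add_argument("--option", choices=["experiment", "plot"],
                    help="run experiments or plot aggregated results")
parser.add_argument("--datasets", nargs="+",
                    help="one CSV per experiment; a dataset may repeat with different labels")
parser.add_argument("--labels", nargs="*",
                    help="target column for each dataset, in order")

args = parser.parse_args([
    "--group-name", "demo", "--option", "experiment",
    "--datasets", "data/breast_cancer.csv", "--labels", "target",
])
```

Note how `nargs="+"` lets `--datasets` take several paths in one flag, matching the way the example commands pass the same dataset twice with different labels.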
## Setup Python & environment
```bash
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# graphviz binaries for pydotplus
brew install graphviz
# cd to source directory
cd suicide_project
uv venv                    # only the first time
source .venv/bin/activate
uv sync
uv run run8.py
uv run run9.py
```
## Terminal Example:
```
Running for dataset DataSetCombinedSISNIBaseline_FE with seed 42
==================================================
ds_name DataSetCombinedSISNIBaselineFE.csv
Experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFEsuicidea - Seed: 42 - Dataset: DataSetCombinedSISNIBaselineFE
Number of samples in the full dataset: 586
Number of samples in the training set: 726
Number of samples in the test set: 242
Shape of training set: (726, 56)
Shape of random split: (363, 56), (363,)
Number of trees in T0: 20
Number of trees in T: 20
Computing average tree distances || 20/20 [100%] in 20.7s (0.96/s)
Number of distances computed: 20
Average AUC score: 0.821854723038044
Number of Pareto optimal trees: 7
Frequencies of top 2 common features: [[('traumasum', 70.0), ('fight', 20.0)], [('harddruglife', 45.0), ('exchange', 15.0)], [('LEAFNODE', 25.0), ('harddruglife', 20.0)]]
Selected stability-accuracy trade-off final tree index: 1
Stability-accuracy tree depth: 4, nodes: 23
Selected AUC maximizing tree index: 1
AUC-maximizing tree depth: 4, nodes: 23
Selected distance minimizing tree index: 15
Distance-minimizing tree depth: 11, nodes: 79
Completed experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFE_suicidea
```
Owner
- Login: mishkin101
- Kind: user
- Repositories: 2
- Profile: https://github.com/mishkin101
GitHub Events
Total
- Push event: 17
- Create event: 1
Last Year
- Push event: 17
- Create event: 1
Dependencies
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- alive-progress >=3.2.0
- gurobipy >=12.0.1
- imbalanced-learn >=0.13.0
- itables >=2.3.0
- joblib >=1.4.2
- jupyter-cache >=1.0.1
- jupyterlab-rise >=0.43.1
- matplotlib >=3.10.1
- mkdocs-jupyter >=0.25.1
- mkdocs-material >=9.6.12
- notebook >=7.3.3
- numpy >=2.2.4
- pandas >=2.2.3
- pydotplus >=2.0.2
- scikit-learn >=1.6.1
- six >=1.17.0
- skimpy >=0.0.18
- tabulate >=0.9.0
- tqdm >=4.67.1
- numpy *
- pandas *
- scikit-learn *
- scipy *
- 155 dependencies