stable-tree-algorithm-for-suicide-risk-identification-in-youth-experiencing-homelessness-yeh-

Implementation of the stable decision tree algorithm based on the novel distance metric in "Improving Stability in Decision Tree Models" (Bertsimas, 2023)

https://github.com/mishkin101/stable-tree-algorithm-for-suicide-risk-identification-in-youth-experiencing-homelessness-yeh-

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.6%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Implementation of the stable decision tree algorithm based on the novel distance metric in "Improving Stability in Decision Tree Models" (Bertsimas, 2023)

Basic Info

Host: GitHub
Owner: mishkin101
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 162 MB

Statistics

Stars: 2
Watchers: 2
Forks: 0
Open Issues: 22
Releases: 0

Created about 1 year ago · Last pushed 12 months ago

Metadata Files


Stable Decision Tree Method for Predicting Suicidal Ideation for At-Risk Homeless Youth

This project implements the stable decision tree algorithm based on the method outlined in "Improving Stability in Decision Tree Models"[^1], which presents a distance metric for heuristic-based decision trees as a measure of stability. The algorithm produces a Pareto optimal set of trees, from which a single final tree is selected according to an objective function targeting the metric to optimize (AUC, distance, combined, etc.). Our implementation attempts to improve upon previous work[^2] in creating an effective method to identify suicide risk among youth experiencing homelessness (YEH). The dataset used in this implementation makes a distinct contribution by considering social network features alongside individual factors in building risk profiles.
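To make the Pareto selection step concrete, here is a minimal sketch, not the repository's code. It assumes each candidate tree has already been scored with an average distance (lower is more stable) and an AUC (higher is better). The function names `pareto_front` and `select_tree`, the toy scores, and the min-max normalised form of the "combined" objective are all illustrative assumptions.

```python
from typing import List, Sequence


def pareto_front(distances: Sequence[float], aucs: Sequence[float]) -> List[int]:
    """Return indices of trees that no other tree dominates.

    Tree j dominates tree i if it is at least as good on both criteria
    (distance <= and AUC >=) and strictly better on at least one.
    """
    front = []
    for i, (d_i, a_i) in enumerate(zip(distances, aucs)):
        dominated = any(
            d_j <= d_i and a_j >= a_i and (d_j < d_i or a_j > a_i)
            for j, (d_j, a_j) in enumerate(zip(distances, aucs))
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


def select_tree(distances, aucs, front, objective="combined") -> int:
    """Pick a single tree index from the Pareto front.

    The 'combined' rule below (min-max normalise both scores on the front,
    then maximise AUC minus distance) is an assumption for illustration.
    """
    if objective == "auc":
        return max(front, key=lambda i: aucs[i])
    if objective == "distance":
        return min(front, key=lambda i: distances[i])
    if objective == "combined":
        d_lo, d_hi = min(distances[i] for i in front), max(distances[i] for i in front)
        a_lo, a_hi = min(aucs[i] for i in front), max(aucs[i] for i in front)

        def norm(x, lo, hi):
            # Guard against a degenerate front where all values are equal.
            return (x - lo) / (hi - lo) if hi > lo else 0.0

        return max(
            front,
            key=lambda i: norm(aucs[i], a_lo, a_hi) - norm(distances[i], d_lo, d_hi),
        )
    raise ValueError(f"unknown objective: {objective}")


# Made-up scores for five hypothetical trees.
distances = [0.30, 0.10, 0.25, 0.40, 0.12]
aucs = [0.82, 0.70, 0.80, 0.81, 0.78]

front = pareto_front(distances, aucs)
print(front)                                          # indices on the front
print(select_tree(distances, aucs, front, "auc"))
print(select_tree(distances, aucs, front, "distance"))
print(select_tree(distances, aucs, front, "combined"))
```

In this toy data, tree 3 is excluded because tree 0 has both a lower distance and a higher AUC. The three objectives can select different trees from the same front, which is the behaviour the selection strategies are meant to compare.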

The distance metric implementation used in the code may be found as a reference below.[^3]

We reproduce the original aggregation metrics from the Bertsimas paper, comparing different Pareto optimal tree selection strategies and a downsampled variant. Results can be found in the Experiments folder.

A write-up of the project and experiment design can be found here: docs/Bertimas-Report-Final.pdf

[^1]: Improving Stability in Decision Tree Models

[^2]: "Getting to the Root of the Problem: A Decision-Tree Analysis for Suicide Risk Among Young People Experiencing Homelessness"

[^3]: Path Distance Metric Repository from Stable Decision Tree Algorithm

Commands to run

```bash
uv run src/StableTree/main.py --group-name FINALaggregateoutput --option experiment --datasets data/DataSetCombinedSISNIBaselineFE.csv data/DataSetCombinedSISNIBaselineFE.csv data/breast_cancer.csv --labels suicidea suicattempt target

uv run src/StableTree/main.py --group-name finalaggregateoutputalldatasets --option plot --datasets data/DataSetCombinedSISNIBaselineFE.csv data/breastcancer.csv
```

Setup python & env

1. Install uv and Graphviz:
   - `curl -LsSf https://astral.sh/uv/install.sh | sh`
   - `brew install graphviz` (Graphviz binaries for pydotplus)
2. cd to the source directory and run the file(s) using uv:
   - `cd suicide_project`
   - `uv venv` (only first time)
   - `source /bin/activate`
   - `uv sync`
   - `uv run run8.py`
   - `uv run run9.py`

## Terminal Example:

Running for dataset DataSetCombinedSISNIBaseline_FE with seed 42

================================================== dsnameDataSetCombinedSISNIBaselineFE.csv

Experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFEsuicidea - Seed: 42 - Dataset: DataSetCombinedSISNIBaselineFE

Number of samples in the full dataset: 586

Number of samples in the training set: 726

Number of samples in the test set: 242

Shape of training set: (726, 56)

Shape of random split: (363, 56), (363,)

Number of trees in T0: 20

Number of trees in T: 20

Computing average tree distances || 20/20 [100%] in 20.7s (0.96/s)

Number of distances computed: 20

Average AUC score: 0.821854723038044

Number of Pareto optimal trees: 7

Frequenicies of top 2 common features: [[('traumasum', 70.0), ('fight', 20.0)], [('harddruglife', 45.0), ('exchange', 15.0)], [('LEAFNODE', 25.0), ('harddruglife', 20.0)]]

Selected stability-accuracy trade-off final tree index: 1

Stability-accuracy tree depth: 4, nodes: 23

Selected AUC maximizing tree index: 1

AUC-maximizing tree depth: 4, nodes: 23

Selected distance minimizing tree index: 15

Distance-minimizing tree depth: 11, nodes: 79

Completed experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFE_suicidea
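The feature-frequency line in the output above lists, for each of three positions, the two most common entries and a percentage. One plausible reading, which is an assumption and not confirmed by the source, is that each inner list corresponds to a depth level and each percentage is the share of trees using that feature there, with `LEAFNODE` marking trees that have already terminated. The sketch below illustrates that reading on made-up data. The helper `top_features_by_depth` and the one-feature-per-depth tree representation are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def top_features_by_depth(
    paths: Sequence[Sequence[str]], top_k: int = 2
) -> List[List[Tuple[str, float]]]:
    """For each depth, return the top_k most common features with percentages.

    `paths` holds one sequence per tree: the split feature at depth 0, 1, ...
    A tree shorter than the deepest one is counted as "LEAFNODE" at the
    depths it does not reach. This representation is a simplifying assumption.
    """
    n_trees = len(paths)
    max_depth = max(len(p) for p in paths)
    result = []
    for level in range(max_depth):
        counts = Counter(
            p[level] if level < len(p) else "LEAFNODE" for p in paths
        )
        result.append(
            [(name, 100.0 * c / n_trees) for name, c in counts.most_common(top_k)]
        )
    return result


# Made-up collection of eight hypothetical trees (feature names borrowed
# from the log purely for illustration).
paths = (
    [["traumasum", "harddruglife"]] * 5
    + [["traumasum", "exchange"]]
    + [["fight"]] * 2
)

freqs = top_features_by_depth(paths)
print(freqs)
```

A summary of this kind shows how consistently a collection of trees agrees on its early splits, which complements the distance-based stability measure.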


Owner

Login: mishkin101
Kind: user

Repositories: 2
Profile: https://github.com/mishkin101

GitHub Events

Total

Push event: 17
Create event: 1

Last Year

Push event: 17
Create event: 1

Dependencies

.github/workflows/ci.yml actions

actions/cache v4 composite
actions/checkout v4 composite
actions/setup-python v5 composite

pyproject.toml pypi

alive-progress >=3.2.0
gurobipy >=12.0.1
imbalanced-learn >=0.13.0
itables >=2.3.0
joblib >=1.4.2
jupyter-cache >=1.0.1
jupyterlab-rise >=0.43.1
matplotlib >=3.10.1
mkdocs-jupyter >=0.25.1
mkdocs-material >=9.6.12
notebook >=7.3.3
numpy >=2.2.4
pandas >=2.2.3
pydotplus >=2.0.2
scikit-learn >=1.6.1
six >=1.17.0
skimpy >=0.0.18
tabulate >=0.9.0
tqdm >=4.67.1

src/dt-distance/setup.py pypi

numpy *
pandas *
scikit-learn *
scipy *

suicide_project/dt_distance_repo/setup.py pypi

numpy *
pandas *
scikit-learn *
scipy *

uv.lock pypi

155 dependencies
