stable-tree-algorithm-for-suicide-risk-identification-in-youth-experiencing-homelessness-yeh-

Implementation of the stable decision tree algorithm based on the novel distance metric in "Improving Stability in Decision Tree Models" (Bertsimas, 2023)

https://github.com/mishkin101/stable-tree-algorithm-for-suicide-risk-identification-in-youth-experiencing-homelessness-yeh-

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: mishkin101
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 162 MB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 22
  • Releases: 0
Created 12 months ago · Last pushed 9 months ago
Metadata Files
Readme Citation

README.md

Stable Decision Tree Method for Predicting Suicidal Ideation for At-Risk Homeless Youth

This project implements the stable decision tree algorithm outlined in "Improving Stability in Decision Tree Models"[^1], which presents a unique distance metric for heuristic-based decision trees as a measure of stability. The algorithm produces a Pareto optimal set of trees, from which a single final tree is selected according to an objective function targeting the metric to optimize (AUC, distance, combined, etc.). Our implementation attempts to improve upon previous work[^2] in creating an effective method to identify suicide risk among youth experiencing homelessness (YEH). The dataset used in this implementation makes a unique contribution by including social network features alongside individual factors when building risk profiles.
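The selection step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repository: the `TreeScore` record, the function names, and the min-max normalised "combined" objective are all assumptions made for the example. It only assumes that each candidate tree has an average distance (lower means more stable) and an AUC (higher means more accurate).

```python
from typing import List, NamedTuple


class TreeScore(NamedTuple):
    """Hypothetical record holding one candidate tree's two objectives."""
    index: int       # position of the tree in the candidate set
    distance: float  # average distance to the other trees (lower = more stable)
    auc: float       # AUC on held-out data (higher = more accurate)


def pareto_front(scores: List[TreeScore]) -> List[TreeScore]:
    """Keep every tree that no other tree beats on both distance and AUC."""
    front = []
    for s in scores:
        dominated = any(
            o.distance <= s.distance and o.auc >= s.auc
            and (o.distance < s.distance or o.auc > s.auc)
            for o in scores
        )
        if not dominated:
            front.append(s)
    return front


def _normalise(values: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def select_tree(front: List[TreeScore], strategy: str = "combined") -> TreeScore:
    """Pick one tree from the Pareto front according to the chosen objective."""
    if strategy == "auc":
        return max(front, key=lambda s: s.auc)
    if strategy == "distance":
        return min(front, key=lambda s: s.distance)
    if strategy == "combined":
        # Assumed trade-off: normalised AUC minus normalised distance.
        aucs = _normalise([s.auc for s in front])
        dists = _normalise([s.distance for s in front])
        best = max(range(len(front)), key=lambda i: aucs[i] - dists[i])
        return front[best]
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up scores, for illustration only.
example = [
    TreeScore(0, 0.30, 0.80),
    TreeScore(1, 0.20, 0.82),
    TreeScore(2, 0.10, 0.70),
    TreeScore(3, 0.25, 0.78),
    TreeScore(4, 0.12, 0.78),
]
front = pareto_front(example)
for name in ("auc", "distance", "combined"):
    print(name, select_tree(front, name).index)
```

The three strategies loosely mirror the "AUC maximizing", "distance minimizing", and "stability-accuracy trade-off" selections that appear in the terminal example further down; the exact objective used for the trade-off in the project may differ from the one assumed here.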

The distance metric implementation used in the code is referenced below.[^3]

We reproduce the original aggregation metrics from the Bertsimas paper, comparing different Pareto optimal tree selection strategies and a downsampled variant. Results can be found in the Experiments folder.

A write-up of the project and experiment design can be found here: docs/Bertimas-Report-Final.pdf

[^1]: "Improving Stability in Decision Tree Models"

[^2]: "Getting to the Root of the Problem: A Decision-Tree Analysis for Suicide Risk Among Young People Experiencing Homelessness"

[^3]: Path Distance Metric Repository from Stable Decision Tree Algorithm

Commands to run

```bash
uv run src/StableTree/main.py --group-name FINALaggregateoutput --option experiment --datasets data/DataSetCombinedSISNIBaselineFE.csv data/DataSetCombinedSISNIBaselineFE.csv data/breast_cancer.csv --labels suicidea suicattempt target

uv run src/StableTree/main.py --group-name finalaggregateoutputalldatasets --option plot --datasets data/DataSetCombinedSISNIBaselineFE.csv data/breastcancer.csv
```
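In the experiment command above, each entry in `--datasets` appears to line up positionally with an entry in `--labels` (the same CSV is listed twice, once per target column). The sketch below shows one way such flags could be parsed and paired with `argparse`. It is an assumption-laden illustration, not the contents of `src/StableTree/main.py`: the helper names `build_parser` and `pair_datasets_with_labels` are hypothetical, and only the four flag names are taken from the commands.

```python
import argparse
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting the flags seen in the commands above."""
    parser = argparse.ArgumentParser(description="Stable tree experiments (sketch)")
    parser.add_argument("--group-name", required=True,
                        help="name used to group the outputs of one run")
    parser.add_argument("--option", required=True,
                        help="mode to run, e.g. 'experiment' or 'plot'")
    parser.add_argument("--datasets", nargs="+", required=True,
                        help="one or more CSV paths")
    parser.add_argument("--labels", nargs="*", default=[],
                        help="target column for each dataset, in the same order")
    return parser


def pair_datasets_with_labels(datasets: List[str],
                              labels: List[str]) -> List[Tuple[str, str]]:
    """Match each dataset to its label column by position (assumed convention)."""
    if len(datasets) != len(labels):
        raise ValueError("expected exactly one label per dataset")
    return list(zip(datasets, labels))


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.option == "experiment":
        for path, label in pair_datasets_with_labels(args.datasets, args.labels):
            print(f"would run experiment on {path} with target column {label}")
```

Pairing by position keeps the command line flat, at the cost of requiring the same dataset path to be repeated when it is used with more than one target column.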

Setup python & env

  1. Install uv and Graphviz:
     `curl -LsSf https://astral.sh/uv/install.sh | sh`
     `brew install graphviz` (Graphviz binaries for pydotplus)

  2. Change to the source directory and run the file(s) using uv:
     `cd suicide_project`
     `uv venv` (only the first time)
     `source /bin/activate`
     `uv sync`
     `uv run run8.py`
     `uv run run9.py`

## Terminal Example:

Running for dataset DataSetCombinedSISNIBaseline_FE with seed 42

================================================== dsnameDataSetCombinedSISNIBaselineFE.csv

Experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFEsuicidea - Seed: 42 - Dataset: DataSetCombinedSISNIBaselineFE

Number of samples in the full dataset: 586

Number of samples in the training set: 726

Number of samples in the test set: 242

Shape of training set: (726, 56)

Shape of random split: (363, 56), (363,)

Number of trees in T0: 20

Number of trees in T: 20

Computing average tree distances || 20/20 [100%] in 20.7s (0.96/s)

Number of distances computed: 20

Average AUC score: 0.821854723038044

Number of Pareto optimal trees: 7

Frequenicies of top 2 common features: [[('traumasum', 70.0), ('fight', 20.0)], [('harddruglife', 45.0), ('exchange', 15.0)], [('LEAFNODE', 25.0), ('harddruglife', 20.0)]]

Selected stability-accuracy trade-off final tree index: 1

Stability-accuracy tree depth: 4, nodes: 23

Selected AUC maximizing tree index: 1

AUC-maximizing tree depth: 4, nodes: 23

Selected distance minimizing tree index: 15

Distance-minimizing tree depth: 11, nodes: 79

Completed experiment: experiment20250501134848seed42DataSetCombinedSISNIBaselineFE_suicidea

References:

Owner

  • Login: mishkin101
  • Kind: user

GitHub Events

Total
  • Push event: 17
  • Create event: 1
Last Year
  • Push event: 17
  • Create event: 1

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v4 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
pyproject.toml pypi
  • alive-progress >=3.2.0
  • gurobipy >=12.0.1
  • imbalanced-learn >=0.13.0
  • itables >=2.3.0
  • joblib >=1.4.2
  • jupyter-cache >=1.0.1
  • jupyterlab-rise >=0.43.1
  • matplotlib >=3.10.1
  • mkdocs-jupyter >=0.25.1
  • mkdocs-material >=9.6.12
  • notebook >=7.3.3
  • numpy >=2.2.4
  • pandas >=2.2.3
  • pydotplus >=2.0.2
  • scikit-learn >=1.6.1
  • six >=1.17.0
  • skimpy >=0.0.18
  • tabulate >=0.9.0
  • tqdm >=4.67.1
src/dt-distance/setup.py pypi
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
suicide_project/dt_distance_repo/setup.py pypi
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
uv.lock pypi
  • 155 dependencies