charmark-biomarker-discovery

Python toolkit for character-level Markov modeling and linguistic biomarker discovery in dementia

https://github.com/jkevin2010/charmark-biomarker-discovery

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 11 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: JKEVIN2010
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 199 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

CharMark: Unmasking Cognitive Decline in Everyday Speech

Imagine a future where your casual conversation, describing a picture or chatting with a friend, could reveal the earliest whispers of cognitive change.
CharMark brings that future a step closer. It’s a lightweight Python toolkit that turns plain text transcripts into interpretable “fingerprints” of your speech, helping researchers and clinicians spot subtle signs of dementia before they become unmissable.


Why CharMark?

Traditional speech analyses dive into audio waves or complex neural nets—powerful, but often opaque. CharMark asks a different question:

“At the level of single characters and pauses, how does language flow?”

By building a first-order Markov chain from your transcript, then computing its steady-state probabilities, CharMark captures micro-patterns—like hesitation in pauses or recurring letter sequences—that can serve as early digital biomarkers of cognitive decline.
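The idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not CharMark's actual API: the function names `transition_matrix` and `steady_state`, the 27-symbol alphabet, the use of the space character as a stand-in for a pause marker, and the small smoothing pseudocount are all assumptions made for the example.

```python
import numpy as np

def transition_matrix(text, alphabet, smoothing=1e-3):
    """Row-stochastic first-order transition matrix over `alphabet`.

    `smoothing` is a small pseudocount (an assumption of this sketch) that
    keeps every row valid and makes the chain irreducible.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    counts = np.full((len(alphabet), len(alphabet)), smoothing)
    symbols = [ch for ch in text if ch in index]
    # Count each adjacent pair (current character -> next character).
    for a, b in zip(symbols, symbols[1:]):
        counts[index[a], index[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def steady_state(P):
    """Stationary distribution pi satisfying pi @ P = pi.

    Uses the left eigenvector of P whose eigenvalue is closest to 1.
    """
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

# Hypothetical alphabet: lowercase letters plus space as a pause proxy.
alphabet = "abcdefghijklmnopqrstuvwxyz "
text = "the boy is um taking a cookie from the jar"
P = transition_matrix(text.lower(), alphabet)
pi = steady_state(P)  # one interpretable probability per character
```

Each entry of `pi` is the long-run share of time the chain spends on one character, which is what makes the resulting feature vector readable character by character.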


Key Features

  • Clean & Simple: Load your CSV of transcripts, call a few methods, and get back matrices of interpretable features.
  • Interpretable: Character-level probabilities, Kolmogorov-Smirnov tests, PCA plots, and network visualizations lay bare why two groups differ.
  • End-to-End Pipeline: From raw text cleaning to unsupervised clustering and Lasso-based validation, everything’s in one place.
  • Lightweight: No massive datasets or GPUs required—just Python and your transcripts.
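As a rough illustration of the interpretability tools listed above, the sketch below runs a two-sample Kolmogorov-Smirnov test per character and projects the feature matrix with PCA. It uses SciPy and scikit-learn directly rather than CharMark's own methods, and the data are synthetic random numbers generated only for this example; the group names, feature names, and values are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = ["a", "e", "t", " "]

# Synthetic steady-state features (rows = transcripts, columns = characters).
# Only the space column is shifted between the two made-up groups.
control = rng.normal(loc=[0.08, 0.12, 0.09, 0.18], scale=0.01, size=(30, 4))
patient = rng.normal(loc=[0.08, 0.12, 0.09, 0.21], scale=0.01, size=(30, 4))

# One KS test per character: does its distribution differ between groups?
results = {}
for j, name in enumerate(features):
    res = ks_2samp(control[:, j], patient[:, j])
    results[name] = (res.statistic, res.pvalue)

# Two-dimensional PCA projection of all transcripts for plotting.
X = np.vstack([control, patient])
coords = PCA(n_components=2).fit_transform(X)
```

Because each feature is a single character, a small p-value in `results` points directly at the symbol whose usage differs, and `coords` can be scattered with group labels to inspect separation visually.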

Quickstart

```bash
git clone https://github.com/jkevin2010/charmark-biomarker-discovery.git
cd charmark-biomarker-discovery
pip install -r requirements.txt
```

Example Notebook

Want to see CharMark in action?
Check out example.ipynb for a step-by-step demo using a toy dataset.

This notebook walks you through:
1. Creating a sample CSV
2. Extracting steady-state features
3. Running PCA, KS tests, and Lasso validation
4. Visualizing the character transition network
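For readers who want a feel for these steps before opening the notebook, here is a compact stand-alone sketch that uses only the packages in requirements.txt. It does not reproduce the notebook or CharMark's API: the column names `text` and `label`, the toy sentences, the helper names, the Lasso `alpha`, and the 0.2 edge threshold are all invented for illustration. The PCA and KS steps are left out to keep it short.

```python
import io

import networkx as nx
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Step 1: a toy CSV. Column names and sentences are invented for illustration.
csv_text = """text,label
the boy is taking a cookie,0
the girl is laughing at him,0
the mother is washing dishes,0
um the the boy uh cookie,1
uh she is um the water,1
um uh the the sink uh,1
"""
df = pd.read_csv(io.StringIO(csv_text))

ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
N = len(ALPHABET)

def transition_matrix(text, smoothing=1e-3):
    """Row-normalized character transition counts with a small pseudocount."""
    counts = np.full((N, N), smoothing)
    symbols = [ch for ch in text.lower() if ch in INDEX]
    for a, b in zip(symbols, symbols[1:]):
        counts[INDEX[a], INDEX[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def steady_state(P):
    """Solve pi @ P = pi together with sum(pi) = 1 by least squares."""
    A = np.vstack([P.T - np.eye(N), np.ones(N)])
    b = np.zeros(N + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

# Step 2: one steady-state feature vector per transcript.
X = np.array([steady_state(transition_matrix(t)) for t in df["text"]])
y = df["label"].to_numpy(dtype=float)

# Step 3: Lasso keeps a sparse subset of characters (alpha chosen arbitrarily).
model = Lasso(alpha=0.05).fit(StandardScaler().fit_transform(X), y)
selected = [ALPHABET[i] for i in np.flatnonzero(model.coef_)]

# Step 4: directed graph of the stronger transitions in the pooled text.
P_all = transition_matrix(" ".join(df["text"]))
G = nx.DiGraph()
for i, j in zip(*np.nonzero(P_all >= 0.2)):
    G.add_edge(ALPHABET[i], ALPHABET[j], weight=float(P_all[i, j]))
```

The graph `G` can be passed to `nx.draw` with matplotlib to get a picture of the transition network, and `selected` lists the characters with non-zero Lasso coefficients on this toy data.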


Cite Us

If you use CharMark in your research, please cite:

Mekulu K, Aqlan F, Yang H (2025). CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia. Preprint.
DOI: 10.21203/rs.3.rs-6391300/v1

BibTeX:

```bibtex
@misc{Mekulu2025CharMark,
  author = {Kevin Mekulu and Faisal Aqlan and Hui Yang},
  title  = {CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia},
  year   = {2025},
  doi    = {10.21203/rs.3.rs-6391300/v1},
  url    = {https://doi.org/10.21203/rs.3.rs-6391300/v1},
  note   = {Preprint on Research Square}
}
```

Owner

  • Name: Kevin Mekulu
  • Login: JKEVIN2010
  • Kind: user

Risk more

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the following:"
authors:
  - family-names: Mekulu
    given-names: Kevin
  - family-names: Aqlan
    given-names: Faisal
  - family-names: Yang
    given-names: Hui
title: "CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia"
version: "0.1.0"
doi: "10.21203/rs.3.rs-6391300/v1"
date-released: "2025-05-02"
url: "https://doi.org/10.21203/rs.3.rs-6391300/v1"

GitHub Events

Total
  • Release event: 1
  • Push event: 9
Last Year
  • Release event: 1
  • Push event: 9

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 22
  • Total Committers: 1
  • Avg Commits per committer: 22.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 22
  • Committers: 1
  • Avg Commits per committer: 22.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Kevin Mekulu | j****j@g****m | 22 |

Issues and Pull Requests

Last synced: 7 months ago


Dependencies

requirements.txt pypi
  • matplotlib >=3.5.0
  • networkx >=2.6.0
  • numpy >=1.21.0
  • pandas >=1.3.0
  • scikit-learn >=1.0.0
  • scipy >=1.7.0