charmark-biomarker-discovery
Python toolkit for character-level Markov modeling and linguistic biomarker discovery in dementia
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 11 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JKEVIN2010
- License: MIT
- Language: Jupyter Notebook
- Default Branch: main
- Size: 199 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
CharMark: Unmasking Cognitive Decline in Everyday Speech
Imagine a future where your casual conversation, describing a picture or chatting with a friend, could reveal the earliest whispers of cognitive change.
CharMark brings that future a step closer. It’s a lightweight Python toolkit that turns plain-text transcripts into interpretable “fingerprints” of speech, helping researchers and clinicians spot subtle signs of dementia before they become obvious.
Why CharMark?
Traditional speech analyses rely on audio waveforms or complex neural networks: powerful, but often opaque. CharMark asks a different question:
“At the level of single characters and pauses, how does language flow?”
CharMark builds a first-order Markov chain over the characters of your transcript and then computes the chain's steady-state probabilities. These probabilities capture micro-patterns, such as hesitation in pauses or recurring letter sequences, that can serve as early digital biomarkers of cognitive decline.
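To make the idea concrete, here is a minimal sketch of a character-level first-order Markov chain and its steady-state distribution, written with NumPy only. It is not CharMark's actual implementation: the function names, the 27-symbol alphabet (26 letters plus a space standing in for a pause), and the additive smoothing constant are all assumptions made for illustration.

```python
import numpy as np

# Assumption: 26 lowercase letters plus a space that stands in for a pause.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def clean(text):
    """Lowercase, map anything outside ALPHABET to a space, collapse space runs."""
    kept = "".join(c if c in ALPHABET else " " for c in text.lower())
    return " ".join(kept.split())


def transition_matrix(text, smoothing=1e-3):
    """Row-stochastic first-order transition matrix over ALPHABET.

    The small additive smoothing (an assumption) keeps every row valid and
    guarantees a unique steady state even for characters never observed.
    """
    index = {c: i for i, c in enumerate(ALPHABET)}
    counts = np.full((len(ALPHABET), len(ALPHABET)), smoothing)
    for a, b in zip(text, text[1:]):
        counts[index[a], index[b]] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)


def steady_state(P):
    """Solve pi @ P = pi subject to sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


# Hypothetical one-line transcript, for illustration only.
text = clean("The boy is, uh, reaching for the cookie jar.")
P = transition_matrix(text)
pi = steady_state(P)
features = dict(zip(ALPHABET, pi))  # one steady-state probability per symbol
```

Using a fixed alphabet means every transcript yields a feature vector of the same length, so transcripts can be stacked into a matrix for later group comparisons.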
Key Features
- Clean & Simple: Load your CSV of transcripts, call a few methods, and get back matrices of interpretable features.
- Interpretable: Character-level probabilities, Kolmogorov-Smirnov tests, PCA plots, and network visualizations lay bare why two groups differ.
- End-to-End Pipeline: From raw text cleaning to unsupervised clustering and Lasso-based validation, everything’s in one place.
- Lightweight: No massive datasets or GPUs required, just Python and your transcripts.
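The analysis steps named above (Kolmogorov-Smirnov tests, PCA, and Lasso) can be sketched with SciPy and scikit-learn, both listed as dependencies. The example below does not use CharMark's own API, which is not shown here. The feature matrix is synthetic placeholder data rather than real steady-state features, and the group sizes, shift, and `alpha` value are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic placeholder data, NOT real transcripts: 30 samples per group,
# 5 features, with only feature 0 shifted between the two groups.
n_per_group, n_features = 30, 5
X = rng.normal(size=(2 * n_per_group, n_features))
y = np.repeat([0, 1], n_per_group)
X[y == 1, 0] += 3.0

# Two-sample KS test per feature: does its distribution differ by group?
p_values = np.array([
    ks_2samp(X[y == 0, j], X[y == 1, j]).pvalue for j in range(n_features)
])

# PCA projection to two components, e.g. for a scatter plot colored by group.
coords = PCA(n_components=2).fit_transform(X)

# Lasso regression on the 0/1 group label; the L1 penalty drives the
# coefficients of uninformative features toward zero.
lasso = Lasso(alpha=0.05).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of features Lasso kept
```

Features that show both a small KS p-value and a nonzero Lasso coefficient are natural candidates for closer inspection, which is the sense in which the two methods can validate each other.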
Quickstart
```bash
git clone https://github.com/jkevin2010/charmark-biomarker-discovery.git
cd charmark-biomarker-discovery
pip install -r requirements.txt
```
Example Notebook
Want to see CharMark in action?
Check out example.ipynb for a step-by-step demo using a toy dataset.
This notebook walks you through:
1. Creating a sample CSV
2. Extracting steady-state features
3. Running PCA, KS tests, and Lasso validation
4. Visualizing the character transition network
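As a rough illustration of the first and last steps, the sketch below builds a tiny in-memory CSV with pandas and turns one transcript into a weighted character transition graph with networkx. It is independent of the notebook: the column names, the toy sentences, and the `transition_graph` helper are hypothetical.

```python
import io
from collections import Counter

import networkx as nx
import pandas as pd

# Hypothetical toy CSV; the column names are assumptions for illustration.
csv_text = (
    "id,group,transcript\n"
    "1,group_a,the boy is on the stool\n"
    "2,group_b,the the boy uh is is on\n"
)
df = pd.read_csv(io.StringIO(csv_text))


def transition_graph(text):
    """Directed graph whose edge weights are first-order transition probabilities."""
    pair_counts = Counter(zip(text, text[1:]))
    out_totals = Counter()
    for (src, _), n in pair_counts.items():
        out_totals[src] += n
    graph = nx.DiGraph()
    for (src, dst), n in pair_counts.items():
        graph.add_edge(src, dst, weight=n / out_totals[src])
    return graph


G = transition_graph(df.loc[0, "transcript"])
```

With matplotlib installed, `nx.draw_networkx(G)` renders the graph; scaling edge widths by the `weight` attribute makes the dominant transitions easier to see.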
Cite Us
If you use CharMark in your research, please cite:
Mekulu K, Aqlan F, Yang H (2025). CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia. Preprint.
DOI: 10.21203/rs.3.rs-6391300/v1
BibTeX:
```bibtex
@misc{Mekulu2025CharMark,
  author = {Kevin Mekulu and Faisal Aqlan and Hui Yang},
  title  = {CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia},
  year   = {2025},
  doi    = {10.21203/rs.3.rs-6391300/v1},
  url    = {https://doi.org/10.21203/rs.3.rs-6391300/v1},
  note   = {Preprint on Research Square}
}
```
Owner
- Name: Kevin Mekulu
- Login: JKEVIN2010
- Kind: user
- Repositories: 1
- Profile: https://github.com/JKEVIN2010
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite the following:"
authors:
  - family-names: Mekulu
    given-names: Kevin
  - family-names: Aqlan
    given-names: Faisal
  - family-names: Yang
    given-names: Hui
title: "CharMark: Character-Level Markov Modeling to Detect Linguistic Signs of Dementia"
version: "0.1.0"
doi: "10.21203/rs.3.rs-6391300/v1"
date-released: "2025-05-02"
url: "https://doi.org/10.21203/rs.3.rs-6391300/v1"
GitHub Events
Total
- Release event: 1
- Push event: 9
Last Year
- Release event: 1
- Push event: 9
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kevin Mekulu | j****j@g****m | 22 |
Issues and Pull Requests
Last synced: 7 months ago
Dependencies
- matplotlib >=3.5.0
- networkx >=2.6.0
- numpy >=1.21.0
- pandas >=1.3.0
- scikit-learn >=1.0.0
- scipy >=1.7.0