charm-study
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: meduri-ruthwick
- Default Branch: main
- Size: 42 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
🧬 ChARM – Chromatin Accessibility Retrospective Model
A machine learning model for predicting chromatin accessibility from genomic sequence features
Developed by Meduri Ruthwick & Dr. Umashankar Singh | HoMeCell Lab, IIT Gandhinagar
Powered by PARAM Ananta Supercomputing Cluster
📌 Overview
ChARM (Chromatin Accessibility Retrospective Model) is a Random Forest-based machine learning model trained on ATAC-seq data from human HEK293T cells to predict open chromatin regions using only DNA sequence features. This approach allows us to explore chromatin accessibility across 105 vertebrate genomes, especially for organisms where experimental methods like ATAC-seq are not feasible.
🎯 Purpose
Understanding open chromatin across all vertebrates is challenging due to experimental constraints. ChARM enables:
- Comparative epigenomic analysis without experimental data
- Insights into chromatin evolution and accessibility landscapes
- In silico prioritization of accessible genomic regions for validation
🧠 Features Used in Final Model
After experimenting with 80+ feature sets, the final model uses:
- GC Skew
- CpG occurrences
- TFBS motif occurrences (only motifs with >50% GC)

Trained on ~34,000 sequences (balanced between class 0 and class 1)
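As a rough illustration of the three feature families named above, the sketch below computes them from a raw DNA string. It is not the ChARM implementation: the function names are hypothetical, GC skew is assumed to follow the common definition (G - C) / (G + C), and motif matching is simplified to exact string matches on one strand, whereas a real TFBS scan would typically use position weight matrices.

```python
def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); returns 0.0 if the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")


def motif_gc_fraction(motif: str) -> float:
    """Fraction of G/C bases in a motif consensus string."""
    m = motif.upper()
    return (m.count("G") + m.count("C")) / len(m) if m else 0.0


def motif_count(seq: str, motif: str) -> int:
    """Count overlapping exact matches of `motif` in `seq` (single strand)."""
    s, m = seq.upper(), motif.upper()
    if not m:
        return 0
    return sum(1 for i in range(len(s) - len(m) + 1) if s.startswith(m, i))


def feature_vector(seq: str, motifs: list[str]) -> list[float]:
    """Assumed layout: [GC skew, CpG count, one count per GC-rich motif]."""
    gc_rich = [m for m in motifs if motif_gc_fraction(m) > 0.5]
    return [gc_skew(seq), float(cpg_count(seq))] + [
        float(motif_count(seq, m)) for m in gc_rich
    ]
```

For example, `feature_vector("ACGCGTGGGCGG", ["GGGCGG", "TATAAA"])` keeps only the GC-rich motif `GGGCGG` and drops `TATAAA`, mirroring the ">50% GC" filter described in the README.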
⚙️ Model Specifications
- Algorithm: Random Forest Classifier
- Library: scikit-learn
- Hyperparameter Tuning: GridSearchCV
- Parameters searched:
  - n_estimators: 100, 200, 300
  - max_depth: None, 10, 20, 30
  - min_samples_split: 2, 5, 10
  - min_samples_leaf: 1, 2, 4
  - max_features: sqrt, log2
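The search space listed above could be wired into scikit-learn roughly as follows. This is a sketch only: the grid values come from the README, but the number of folds, the scoring metric, and the random seed are assumptions, since the README does not state them. The scikit-learn import is kept inside the helper so the grid itself can be inspected without the library installed.

```python
from math import prod

# Values as listed in the README's search space.
PARAM_GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

# Number of hyperparameter combinations GridSearchCV would evaluate per fold.
N_CANDIDATES = prod(len(values) for values in PARAM_GRID.values())


def build_search(cv=5, scoring="roc_auc", random_state=0):
    """Return an unfitted GridSearchCV over PARAM_GRID.

    cv, scoring and random_state are illustrative defaults (assumptions),
    not settings documented for ChARM.
    """
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    return GridSearchCV(
        estimator=RandomForestClassifier(random_state=random_state),
        param_grid=PARAM_GRID,
        scoring=scoring,
        cv=cv,
        n_jobs=-1,  # use all available cores
    )
```

With this grid, an exhaustive search covers 216 combinations, so a 5-fold search would fit 1,080 forests before the final refit. Calling `build_search().fit(X, y)` on a feature matrix `X` and 0/1 labels `y` would then expose `best_params_` and `best_estimator_`.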
📊 Performance
- ROC AUC: 0.85
- PR AUC: 0.86
- Validated across 11 independent human cell lines
- Key Visualizations:
  - Confusion Matrix
  - Feature Importance
  - Permutation Feature Importance
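To make the two headline metrics concrete, here is a small dependency-free sketch of how ROC AUC and a PR-curve summary can be computed from true labels and predicted probabilities. It is illustrative only: PR AUC is assumed here to mean average precision (the step-wise summary used by scikit-learn's `average_precision_score`), and the README does not say which variant was reported. In practice `sklearn.metrics.roc_auc_score` and `average_precision_score` would be the usual choice.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Average precision: sum of precision * recall gain over score thresholds."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(y_score, y_true), key=lambda pair: -pair[0])
    ap, tp, prev_recall, i = 0.0, 0, 0.0, 0
    while i < len(ranked):
        j = i
        # Treat all samples sharing a score as a single threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / j)  # tp / j is the precision
        prev_recall = recall
        i = j
    return ap
```

The pairwise ROC AUC is quadratic in the number of samples, which is fine for a toy check but not for a full evaluation set.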
🌐 Applications
- Predicted putative ATAC-like enriched regions (pAERs) across 105 vertebrate genomes
- In-depth analysis performed for primate genomes
- Enables comparative genomics and functional region identification in species without epigenomic data
📥 Input & Output
Input:
- BED file of genomic regions (e.g., ATAC peaks, summit regions)
Output:
- Prediction label (open/closed)
- Prediction probability
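The input and output described above can be sketched as a BED reader plus a probability-to-label step. The names `Region`, `parse_bed`, and `label_from_probability` are hypothetical, and the 0.5 decision threshold is an assumption (it matches the default behaviour of a scikit-learn classifier's `predict`, but the README does not state the cutoff used).

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # BED start: 0-based, inclusive
    end: int    # BED end: exclusive


def parse_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield regions from BED lines, skipping blanks, comments and headers."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns, got: {line!r}")
        yield Region(fields[0], int(fields[1]), int(fields[2]))


def label_from_probability(p_open: float, threshold: float = 0.5) -> str:
    """Map a predicted open-chromatin probability to 'open' or 'closed'."""
    return "open" if p_open >= threshold else "closed"
```

Reading a file is then `with open(path) as fh: regions = list(parse_bed(fh))`, after which each region's sequence would be extracted from the genome and scored by the model.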
📜 License & Access
This repository describes the model and its applications. Code and models will be available upon request. [Academic purposes only]
👤 Authors
- Meduri Ruthwick, PhD Scholar, IIT Gandhinagar
- Dr. Umashankar Singh, Associate Professor, IIT Gandhinagar
🔬 HoMeCell Lab – Department of Biological Sciences and Engineering
Indian Institute of Technology Gandhinagar, Gujarat, India
📣 Contact
For code access or collaboration inquiries:
📧 meduri.ruthwick@iitgn.ac.in | usingh@iitgn.ac.in
🔗 https://github.com/ruthwick
📌 Notes
This model is part of an ongoing thesis project and has not yet been published. Please cite appropriately when referencing ChARM.
Owner
- Name: Meduri Ruthwick
- Login: meduri-ruthwick
- Kind: user
- Location: Gandhinagar, Gujarat
- Website: https://sympart.github.io/sympart/aboutme.html
- Repositories: 1
- Profile: https://github.com/meduri-ruthwick
Citation (CITATION.cff)
cff-version: 1.2.0
title: "ChARM: Chromatin Accessibility Retrospective Model [Unpublished work]"
authors:
  - family-names: Meduri
    given-names: Ruthwick
    orcid: "0000-0003-2403-2712"
  - family-names: Singh
    given-names: Umashankar
    orcid: "0000-0001-8578-8201"
date-released: 2025-07-10
GitHub Events
Total
- Member event: 1
- Push event: 3
Last Year
- Member event: 1
- Push event: 3