Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: meduri-ruthwick
  • Default Branch: main
  • Size: 42 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • Citation

README.md

🧬 ChARM – Chromatin Accessibility Retrospective Model

A machine learning model for predicting chromatin accessibility from genomic sequence features
Developed by Meduri Ruthwick & Dr. Umashankar Singh | HoMeCell Lab, IIT Gandhinagar
Powered by PARAM Ananta Supercomputing Cluster



📌 Overview

ChARM (Chromatin Accessibility Retrospective Model) is a Random Forest-based machine learning model trained on ATAC-seq data from human HEK293T cells to predict open chromatin regions using only DNA sequence features. This approach allows us to explore chromatin accessibility across 105 vertebrate genomes, especially for organisms where experimental methods like ATAC-seq are not feasible.


🎯 Purpose

Understanding open chromatin across all vertebrates is challenging due to experimental constraints. ChARM enables:

  • Comparative epigenomic analysis without experimental data
  • Insights into chromatin evolution and accessibility landscapes
  • In silico prioritization of accessible genomic regions for validation


🧠 Features Used in Final Model

After experimenting with 80+ feature sets, the final model uses:

  • GC skew
  • CpG occurrences
  • TFBS motif occurrences (only motifs with >50% GC)

The model was trained on ~34,000 sequences with balanced classes (0/1).
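
The three feature families can be illustrated with a minimal Python sketch. This is not the ChARM feature pipeline, which is not published in this repository. It assumes GC skew is defined as (G − C) / (G + C), that CpG occurrences are counts of the CG dinucleotide, and that TFBS motifs are given as exact consensus strings. The function names and the example motifs are hypothetical.

```python
from typing import Dict, List


def gc_skew(seq: str) -> float:
    """GC skew = (G - C) / (G + C); returns 0.0 if the sequence has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cpg_count(seq: str) -> int:
    """Number of CG dinucleotides on the given strand."""
    # "CG" cannot overlap with itself, so str.count is exact here.
    return seq.upper().count("CG")


def gc_fraction(motif: str) -> float:
    """Fraction of G/C bases in a motif string."""
    m = motif.upper()
    return (m.count("G") + m.count("C")) / len(m) if m else 0.0


def count_occurrences(seq: str, motif: str) -> int:
    """Count exact matches of motif in seq, allowing overlapping matches."""
    s, m = seq.upper(), motif.upper()
    if not m:
        return 0
    count, start = 0, s.find(m)
    while start != -1:
        count += 1
        start = s.find(m, start + 1)
    return count


def sequence_features(seq: str, motifs: List[str]) -> Dict[str, float]:
    """Build one feature row; only motifs with >50% GC content are kept."""
    gc_rich = [m for m in motifs if gc_fraction(m) > 0.5]
    feats = {"gc_skew": gc_skew(seq), "cpg_count": float(cpg_count(seq))}
    for m in gc_rich:
        feats[f"motif_{m}"] = float(count_occurrences(seq, m))
    return feats


# Hypothetical sequence and motif list, for illustration only.
example = sequence_features("GGGCGGACGCGTATAAA", ["GGGCGG", "TATAAA"])
print(example)
```

In this sketch the AT-rich motif is dropped by the GC filter before counting, so it never becomes a feature column.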


⚙️ Model Specifications

  • Algorithm: Random Forest Classifier
  • Library: scikit-learn
  • Hyperparameter Tuning: GridSearchCV
  • Parameters searched:
    • n_estimators: 100, 200, 300
    • max_depth: None, 10, 20, 30
    • min_samples_split: 2, 5, 10
    • min_samples_leaf: 1, 2, 4
    • max_features: sqrt, log2
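
The search space above could be expressed in scikit-learn roughly as follows. The parameter values are taken from the list. The scoring metric (`roc_auc`), the 5-fold cross-validation, the `random_state`, and the helper name `build_search` are assumptions, since the README does not state them.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid

# Grid values as listed in the README.
PARAM_GRID = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}


def build_search(cv: int = 5, scoring: str = "roc_auc") -> GridSearchCV:
    """Return an unfitted grid search; cv and scoring are assumed values."""
    return GridSearchCV(
        estimator=RandomForestClassifier(random_state=0),
        param_grid=PARAM_GRID,
        scoring=scoring,
        cv=cv,
        n_jobs=-1,
    )


n_candidates = len(ParameterGrid(PARAM_GRID))
search = build_search()
print(f"{n_candidates} candidate settings to evaluate")

# Calling search.fit(X_train, y_train) would run the search; afterwards
# search.best_params_ and search.best_estimator_ hold the selected model.
```

The grid contains 3 × 4 × 3 × 3 × 2 = 216 parameter combinations, so an exhaustive search with 5-fold cross-validation trains 1,080 forests.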

📊 Performance

  • ROC AUC: 0.85
  • PR AUC: 0.86
  • Validated across 11 independent human cell lines
  • Key Visualizations:
    • Confusion Matrix
    • Feature Importance
    • Permutation Feature Importance
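
The metrics and visualizations listed above can be computed with scikit-learn as in the sketch below. It runs on synthetic data from `make_classification`, not on ChARM data, so the numbers it prints are unrelated to the reported scores. Average precision is used here as the PR AUC summary, which is an assumption, as the README does not say how PR AUC was computed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a three-feature, balanced binary problem.
X, y = make_classification(
    n_samples=400, n_features=3, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
pred = model.predict(X_test)

roc_auc = roc_auc_score(y_test, proba)
pr_auc = average_precision_score(y_test, proba)  # one common PR AUC summary
cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted class

# Impurity-based importance comes from the fitted forest itself.
impurity_importance = model.feature_importances_

# Permutation importance: drop in score when one feature is shuffled.
perm = permutation_importance(
    model, X_test, y_test, scoring="roc_auc", n_repeats=5, random_state=0
)

print(f"ROC AUC: {roc_auc:.2f}, PR AUC: {pr_auc:.2f}")
print(cm)
print(impurity_importance, perm.importances_mean)
```

Impurity-based importance is computed from the training data, while permutation importance is measured on held-out data, which is one reason to report both.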

🌐 Applications

  • Predicted putative ATAC-like enriched regions (pAERs) across 105 vertebrate genomes
  • In-depth analysis performed for primate genomes
  • Enables comparative genomics and functional region identification in species without epigenomic data

📥 Input & Output

Input:
- BED file of genomic regions (e.g., ATAC peaks, summit regions)

Output:
- Prediction label (open/closed)
- Prediction probability
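
As a sketch of this input/output contract, the Python below parses the first three BED columns and formats one output row per region. The exact output layout of ChARM is not documented here, so the tab-separated row format, the 0.5 decision threshold, and the names `Region`, `parse_bed`, and `format_prediction` are assumptions. The probabilities in the usage example are placeholders, not model output.

```python
from typing import Iterable, Iterator, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # BED starts are 0-based
    end: int    # BED ends are exclusive


def parse_bed(lines: Iterable[str]) -> Iterator[Region]:
    """Yield regions from BED lines, skipping blank, comment, and header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield Region(chrom, int(start), int(end))


def format_prediction(region: Region, probability: float, threshold: float = 0.5) -> str:
    """Format one output row; the threshold of 0.5 is an assumed default."""
    label = "open" if probability >= threshold else "closed"
    return f"{region.chrom}\t{region.start}\t{region.end}\t{label}\t{probability:.3f}"


# Hypothetical BED content and placeholder probabilities, for illustration only.
bed_text = """track name=example_peaks
chr1\t1000\t1500\tpeak_1
chr2\t200\t700\tpeak_2
"""
regions = list(parse_bed(bed_text.splitlines()))
placeholder_probabilities = [0.91, 0.12]
rows = [format_prediction(r, p) for r, p in zip(regions, placeholder_probabilities)]
print("\n".join(rows))
```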


📜 License & Access

This repository describes the model and its applications. Code and trained models will be made available upon request, for academic purposes only.


👤 Authors

  • Meduri Ruthwick, PhD Scholar, IIT Gandhinagar
  • Dr. Umashankar Singh, Associate Professor, IIT Gandhinagar

🔬 HoMeCell Lab – Department of Biological Sciences and Engineering

Indian Institute of Technology Gandhinagar, Gujarat, India


📣 Contact

For code access or collaboration inquiries:
📧 meduri.ruthwick@iitgn.ac.in | usingh@iitgn.ac.in
🔗 https://github.com/ruthwick


📌 Notes

This model is part of an ongoing thesis project and has not yet been published. Please cite appropriately when referencing ChARM.


Owner

  • Name: Meduri Ruthwick
  • Login: meduri-ruthwick
  • Kind: user
  • Location: Gandhinagar, Gujarat

Citation (CITATION.cff)

cff-version: 1.2.0
title: "ChARM: Chromatin Accessibility Retrospective Model [Unpublished work]"
authors:
  - family-names: Meduri
    given-names: Ruthwick
    orcid: "0000-0003-2403-2712"
  - family-names: Singh
    given-names: Umashankar
    orcid: "0000-0001-8578-8201"
date-released: 2025-07-10

GitHub Events

Total
  • Member event: 1
  • Push event: 3
Last Year
  • Member event: 1
  • Push event: 3