https://github.com/dhslab/peakachu-cohort
Wraps Peakachu’s train and score_genome functions in a parallelized CLI subcommand enabling scalable analysis across cohorts without manual per-sample commands.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: dhslab
- Language: Python
- Default Branch: master
- Size: 1.95 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Peakachu Cohort Analysis Toolkit
This project extends the capabilities of the Peakachu loop caller (Salameh et al., Nat Commun 11, 3428 (2020)) to enable scalable analysis of chromatin loops across cohorts of samples.
Core Features
The primary goal is to automate and streamline the analysis of multiple Hi-C datasets (.hic/.cool files), identify differential looping patterns, and facilitate interactive visualization.
Key functionalities include:
Batch Loop Calling:
- Automates Peakachu's train and score_genome functions across multiple samples and resolutions (e.g., 5kb, 10kb).
- Parallelizes analysis for efficient processing of large cohorts.
Intensity Extraction:
- Retrieves raw and CLR-normalized contact counts for all predicted loops.
- Provides quantitative data essential for downstream differential analysis.
CTCF Overlap Annotation:
- Annotates loops by intersecting anchor regions with provided CTCF ChIP-seq peak files (BED format).
- Helps prioritize biologically relevant, CTCF-mediated loops.
Differential Comparison:
- Performs statistical comparisons (e.g., fold-change, Wilcoxon test) of loop intensities between defined groups (e.g., mutant vs. wild-type).
- Identifies loops with significant changes associated with experimental conditions.
HiGlass Integration:
- Generates configuration files to visualize predicted loops and intensity tracks within the HiGlass interactive genome browser.
- Packages outputs for easy loading and exploration.
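The CTCF overlap annotation described above amounts to an interval intersection between loop anchors and peak intervals. The following is a minimal sketch of that idea, not the toolkit's implementation: the names `overlaps` and `annotate_loop`, the tuple layouts, and the half-open (BED-style) coordinate convention are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]   # (chrom, start, end), half-open, BED-style
Loop = Tuple[Interval, Interval]  # (anchor1, anchor2)


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals on the same chromosome intersect."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def annotate_loop(loop: Loop, peaks: List[Interval]) -> Dict[str, bool]:
    """Flag which anchors of a loop overlap at least one peak (hypothetical helper)."""
    a1, a2 = loop
    hit1 = any(overlaps(a1, p) for p in peaks)
    hit2 = any(overlaps(a2, p) for p in peaks)
    return {"anchor1_ctcf": hit1, "anchor2_ctcf": hit2, "both_ctcf": hit1 and hit2}


# Toy data: one loop with 10 kb anchors and two made-up peaks.
peaks = [("chr1", 1_004_000, 1_004_400), ("chr2", 500, 900)]
loop = (("chr1", 1_000_000, 1_010_000), ("chr1", 1_500_000, 1_510_000))
annotation = annotate_loop(loop, peaks)
print(annotation)
```

The linear scan over all peaks keeps the sketch short. For genome-wide peak sets, a sorted per-chromosome index or an interval tree would likely be a better fit.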
Getting Started
Installation
This toolkit is designed as a Python package. You can install it directly from this repository using pip:
```bash
pip install git+https://github.com/your-username/peakachu-cohort-analysis.git
```
Or, after cloning the repository:
```bash
cd peakachu-cohort-analysis
pip install .
```
We recommend using a dedicated virtual environment (e.g., conda or venv). Ensure you have Python 3.8 or higher.
Configuration
The main workflow is driven by a configuration file, typically named config.yaml. This file specifies input data locations, analysis parameters, and group definitions for comparisons.
Here's an example structure:
```yaml
# config.yaml example

output_dir: ./results/cohort_analysis
resolutions: [5000, 10000]  # Resolutions in bp (e.g., 5kb, 10kb)

# --- Input Data ---
hic_files:  # List of .hic or .cool files
  - /path/to/sample1.hic
  - /path/to/sample2.mcool::/resolutions/5000  # Specify resolution for multi-res coolers
  - /path/to/sample3.hic
  # ... more samples

ctcf_peaks:  # Optional: BED file with CTCF peaks for annotation
  - /path/to/ctcf_peaks.bed

# --- Peakachu Parameters ---
peakachu_model: /path/to/pretrained/peakachu_model.pkl  # Optional: Use a pre-trained model
peakachu_params:  # Parameters passed to Peakachu score_genome
  min_dist: 10000
  max_dist: 3000000
  # ... other peakachu parameters

# --- Cohort & Group Definitions ---
samples:  # Define metadata and group assignment for each sample
  sample1:
    group: 'wildtype'
    # Add other metadata if needed
  sample2:
    group: 'mutant'
  sample3:
    group: 'wildtype'
  # ... map sample names (from hic_files base names) to groups

groups:  # Define the groups for comparison
  - wildtype
  - mutant

# --- Differential Analysis ---
differential_params:
  method: 'wilcoxon'   # 'foldchange' or 'wilcoxon'
  pseudocount: 1       # For fold-change calculation
  fdr_threshold: 0.05  # Significance threshold

# --- HiGlass Configuration ---
higlass_options:
  server: 'http://localhost:8888/api/v1'  # Your HiGlass server API endpoint
  track_color_range: ['#FFFFFF', '#FF0000']  # Color range for intensity tracks
```
Adjust the paths and parameters according to your specific dataset and analysis goals.
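Because the samples section is keyed by names derived from the hic_files base names, a small sketch may help show how the two sections line up. This is an illustration under assumptions, not the toolkit's code: the helpers `sample_name` and `group_samples` are hypothetical, the config is written as an already-parsed dict (the shape a YAML loader would return), and the rule "base name without extension, with any `::` cooler suffix dropped" is a guess at the intended mapping.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Already-parsed config, mirroring the hic_files / samples / groups sections.
config = {
    "hic_files": [
        "/path/to/sample1.hic",
        "/path/to/sample2.mcool::/resolutions/5000",
        "/path/to/sample3.hic",
    ],
    "samples": {
        "sample1": {"group": "wildtype"},
        "sample2": {"group": "mutant"},
        "sample3": {"group": "wildtype"},
    },
    "groups": ["wildtype", "mutant"],
}


def sample_name(uri: str) -> str:
    """Derive a sample name from a file path or cooler URI (assumed rule)."""
    path = uri.split("::", 1)[0]     # drop the in-file group of a cooler URI
    return PurePosixPath(path).stem  # base name without its extension


def group_samples(cfg: dict) -> Dict[str, List[str]]:
    """Map each declared group to its input URIs, failing on unknown names."""
    grouped: Dict[str, List[str]] = {g: [] for g in cfg["groups"]}
    for uri in cfg["hic_files"]:
        name = sample_name(uri)
        if name not in cfg["samples"]:
            raise KeyError(f"no entry under 'samples' for {name!r}")
        group = cfg["samples"][name]["group"]
        if group not in grouped:
            raise ValueError(f"group {group!r} is not listed under 'groups'")
        grouped[group].append(uri)
    return grouped


grouped = group_samples(config)
print(grouped)
```

Validating the mapping up front like this surfaces a misspelled sample or group name before any long-running step starts.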
Basic Usage
The primary way to run the analysis is via the main script (e.g., run_cohort_analysis.py), providing the configuration file:
```bash
python run_cohort_analysis.py --config config.yaml
```
This command will execute the following steps based on the configuration:
- Run Peakachu score_genome for each sample and resolution.
- Extract loop intensities (raw and normalized).
- Annotate loops with CTCF overlap (if provided).
- Perform differential analysis between specified groups.
- Generate HiGlass configuration files for visualization.
Results will be saved in the directory specified by output_dir in the config.yaml file.
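The first step listed above, one scoring job per sample and resolution, is a fan-out over the Cartesian product of the two lists. The sketch below shows that pattern with Python's standard `concurrent.futures` module; it is not the toolkit's implementation. The function `score_one` is a hypothetical placeholder that only returns the output path it would write (the file naming is also an assumption), where a real job would invoke Peakachu.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import PurePosixPath
from typing import List, Tuple


def score_one(job: Tuple[str, int]) -> str:
    """Placeholder job: return the path a real scoring run might write to."""
    hic_file, resolution = job
    stem = PurePosixPath(hic_file).stem
    return f"results/{stem}.{resolution}bp.loops.bedpe"


def run_all(hic_files: List[str], resolutions: List[int], workers: int = 4) -> List[str]:
    """Run score_one for every (file, resolution) pair, a few at a time."""
    jobs = list(product(hic_files, resolutions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of `jobs`, not completion order.
        return list(pool.map(score_one, jobs))


outputs = run_all(["/path/to/sample1.hic", "/path/to/sample3.hic"], [5000, 10000])
for path in outputs:
    print(path)
```

Threads are enough when each job mostly waits on an external command. For CPU-bound work done in Python itself, `ProcessPoolExecutor` offers the same interface.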
Development Roadmap
See scripts/prd.txt for details on the development plan, including MVP requirements and future enhancements.
Owner
- Name: Code and Software from David Spencer's lab
- Login: dhslab
- Kind: organization
- Email: dspencerlab@gmail.com
- Location: United States of America
- Website: davidspencerlab.org
- Twitter: dspencerlab
- Repositories: 6
- Profile: https://github.com/dhslab
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3
Dependencies
- 260 dependencies
- @anthropic-ai/sdk ^0.39.0
- boxen ^8.0.1
- chalk ^4.1.2
- cli-table3 ^0.6.5
- commander ^11.1.0
- cors ^2.8.5
- dotenv ^16.3.1
- express ^4.21.2
- fastmcp ^1.20.5
- figlet ^1.8.0
- fuse.js ^7.0.0
- gradient-string ^3.0.0
- helmet ^8.1.0
- inquirer ^12.5.0
- jsonwebtoken ^9.0.2
- lru-cache ^10.2.0
- openai ^4.89.0
- ora ^8.2.0