https://github.com/dhslab/peakachu-cohort

Wraps Peakachu’s train and score_genome functions in a parallelized CLI subcommand enabling scalable analysis across cohorts without manual per-sample commands.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.0%) to scientific vocabulary

Last synced: 8 months ago

Repository

Basic Info

Host: GitHub
Owner: dhslab
Language: Python
Default Branch: master
Size: 1.95 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

Peakachu Cohort Analysis Toolkit

This project extends the capabilities of the Peakachu loop caller (Salameh et al., Nat Commun 11, 3428 (2020)) to enable scalable analysis of chromatin loops across cohorts of samples.

Core Features

The primary goal is to automate and streamline the analysis of multiple Hi-C datasets (.hic/.cool files), identify differential looping patterns, and facilitate interactive visualization.

Key functionalities include:

Batch Loop Calling:
- Automates Peakachu's train and score_genome functions across multiple samples and resolutions (e.g., 5kb, 10kb).
- Parallelizes analysis for efficient processing of large cohorts.
Intensity Extraction:
- Retrieves raw and CLR-normalized contact counts for all predicted loops.
- Provides quantitative data essential for downstream differential analysis.
CTCF Overlap Annotation:
- Annotates loops by intersecting anchor regions with provided CTCF ChIP-seq peak files (BED format).
- Helps prioritize biologically relevant, CTCF-mediated loops.
Differential Comparison:
- Performs statistical comparisons (e.g., fold-change, Wilcoxon test) of loop intensities between defined groups (e.g., mutant vs. wild-type).
- Identifies loops with significant changes associated with experimental conditions.
HiGlass Integration:
- Generates configuration files to visualize predicted loops and intensity tracks within the HiGlass interactive genome browser.
- Packages outputs for easy loading and exploration.
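
As a rough illustration of the CTCF overlap annotation listed above, the sketch below counts how many of a loop's two anchors intersect at least one peak from a BED file. It is not the toolkit's implementation: the function names (`parse_bed`, `overlaps_any`, `ctcf_anchor_count`) and the six-field loop tuple are assumptions made for this example, and coordinates are treated as 0-based, half-open intervals as in BED.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse BED lines into {chrom: sorted list of (start, end)}."""
    peaks = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end = line.split()[:3]
        peaks[chrom].append((int(start), int(end)))
    return {chrom: sorted(ivs) for chrom, ivs in peaks.items()}


def overlaps_any(peaks, chrom, start, end):
    """Return True if [start, end) overlaps any peak on chrom."""
    for p_start, p_end in peaks.get(chrom, []):
        if p_start >= end:
            break  # peaks are sorted by start, so no later peak can overlap
        if p_end > start:
            return True
    return False


def ctcf_anchor_count(peaks, loop):
    """Number of loop anchors (0, 1, or 2) that overlap a CTCF peak."""
    chrom1, s1, e1, chrom2, s2, e2 = loop
    return int(overlaps_any(peaks, chrom1, s1, e1)) + int(
        overlaps_any(peaks, chrom2, s2, e2)
    )


# Hypothetical inputs, for illustration only.
bed_lines = ["chr1\t10500\t10700", "chr1\t52000\t52300", "chr2\t100\t200"]
loops = [
    ("chr1", 10000, 15000, "chr1", 50000, 55000),
    ("chr1", 20000, 25000, "chr1", 50000, 55000),
    ("chr2", 5000, 10000, "chr2", 20000, 25000),
]
peaks = parse_bed(bed_lines)
counts = [ctcf_anchor_count(peaks, loop) for loop in loops]
print(counts)
```

The linear scan per anchor is adequate for a small example; for genome-wide peak sets an interval index or a tool such as bedtools would likely be a better fit.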

Getting Started

Installation

This toolkit is designed as a Python package. You can install it directly from this repository using pip:

```bash
pip install git+https://github.com/your-username/peakachu-cohort-analysis.git
```

Or, after cloning the repository:

```bash
cd peakachu-cohort-analysis
pip install .
```

We recommend using a dedicated virtual environment (e.g., conda or venv). Ensure you have Python 3.8 or higher.

Configuration

The main workflow is driven by a configuration file, typically named config.yaml. This file specifies input data locations, analysis parameters, and group definitions for comparisons.

Here's an example structure:

```yaml
# config.yaml example

output_dir: ./results/cohortanalysis
resolutions: [5000, 10000]  # Resolutions in bp (e.g., 5kb, 10kb)

# --- Input Data ---
hic_files:  # List of .hic or .cool files
  - /path/to/sample1.hic
  - /path/to/sample2.mcool::/resolutions/5000  # Specify resolution for multi-res coolers
  - /path/to/sample3.hic
  # ... more samples

ctcfpeaks:  # Optional: BED file with CTCF peaks for annotation
  - /path/to/ctcfpeaks.bed

# --- Peakachu Parameters ---
peakachumodel: /path/to/pretrained/peakachumodel.pkl  # Optional: Use a pre-trained model
peakachuparams:  # Parameters passed to Peakachu score_genome
  mindist: 10000
  maxdist: 3000000
  # ... other peakachu parameters

# --- Cohort & Group Definitions ---
samples:  # Define metadata and group assignment for each sample
  sample1:
    group: 'wildtype'
    # Add other metadata if needed
  sample2:
    group: 'mutant'
  sample3:
    group: 'wildtype'
  # ... map sample names (from hic_files base names) to groups

groups:  # Define the groups for comparison
  - wildtype
  - mutant

# --- Differential Analysis ---
differentialparams:
  method: 'wilcoxon'  # 'foldchange' or 'wilcoxon'
  pseudocount: 1  # For fold-change calculation
  fdrthreshold: 0.05  # Significance threshold

# --- HiGlass Configuration ---
higlassoptions:
  server: 'http://localhost:8888/api/v1'  # Your HiGlass server API endpoint
  trackcolor_range: ['#FFFFFF', '#FF0000']  # Color range for intensity tracks
```

Adjust the paths and parameters according to your specific dataset and analysis goals.
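
The configuration notes that sample names are taken from the base names of the `hic_files` entries and then mapped to groups. One way this mapping might work is sketched below, using a plain dictionary in place of the parsed YAML. The helper names `sample_name` and `group_samples` are hypothetical, and the handling of the `::` cooler URI suffix is an assumption about how such entries would be resolved.

```python
from pathlib import PurePosixPath


def sample_name(entry):
    """Drop an optional cooler URI suffix ('::/resolutions/N') and the extension."""
    path = entry.split("::", 1)[0]
    return PurePosixPath(path).stem


def group_samples(hic_files, samples):
    """Return {group: [sample names]}, failing early on unmapped files."""
    groups = {}
    for entry in hic_files:
        name = sample_name(entry)
        if name not in samples:
            raise KeyError(f"no entry in 'samples' for {name!r} (from {entry!r})")
        groups.setdefault(samples[name]["group"], []).append(name)
    return groups


# Stand-in for the parsed configuration file (values mirror the example config).
config = {
    "hic_files": [
        "/path/to/sample1.hic",
        "/path/to/sample2.mcool::/resolutions/5000",
        "/path/to/sample3.hic",
    ],
    "samples": {
        "sample1": {"group": "wildtype"},
        "sample2": {"group": "mutant"},
        "sample3": {"group": "wildtype"},
    },
}

grouped = group_samples(config["hic_files"], config["samples"])
print(grouped)
```

Raising on an unmapped file surfaces configuration mistakes before any long-running loop calling starts.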

Basic Usage

The primary way to run the analysis is via the main script (e.g., run_cohort_analysis.py), providing the configuration file:

```bash
python run_cohort_analysis.py --config config.yaml
```

This command will execute the following steps based on the configuration:

1. Run Peakachu score_genome for each sample and resolution.
2. Extract loop intensities (raw and normalized).
3. Annotate loops with CTCF overlap (if provided).
4. Perform differential analysis between specified groups.
5. Generate HiGlass configuration files for visualization.
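
To make the differential analysis step more concrete, here is a minimal sketch of two pieces it could involve: a log2 fold-change with a pseudocount, and a Benjamini-Hochberg adjustment applied to per-loop p-values before comparing against an FDR threshold. The function names are hypothetical, the choice of Benjamini-Hochberg is an assumption (the README only mentions a significance threshold), and the p-values are taken as given. In practice they could come from a rank-sum test such as `scipy.stats.ranksums`.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 of (mean(group_b) + pseudocount) / (mean(group_a) + pseudocount)."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical loop intensities for one loop in two groups.
wildtype = [3.0, 3.0]
mutant = [7.0, 7.0]
lfc = log2_fold_change(wildtype, mutant, pseudocount=1.0)

# Hypothetical per-loop p-values from some upstream test.
pvals = [0.01, 0.04, 0.03, 0.5]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]
print(lfc, qvals, significant)
```

The pseudocount keeps the ratio finite when a loop has zero counts in one group, at the cost of shrinking fold-changes for low-intensity loops.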

Results will be saved in the directory specified by output_dir in the config.yaml file.
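
A possible way to fan out one job per sample and resolution and to organize results under the output directory is sketched below. Everything here is illustrative: the directory layout, the `loops.bedpe` file name, and the `call_loops` worker are assumptions, and the worker is a placeholder that only formats a string rather than invoking Peakachu.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import PurePosixPath


def output_path(output_dir, sample, resolution):
    """Hypothetical layout: <output_dir>/<sample>/<resolution>/loops.bedpe."""
    return PurePosixPath(output_dir) / sample / str(resolution) / "loops.bedpe"


def call_loops(job):
    """Placeholder worker; a real one would run loop calling for this job."""
    sample, resolution, out = job
    return f"{sample}@{resolution} -> {out}"


def run_batch(output_dir, samples, resolutions, max_workers=4):
    """Run one job per (sample, resolution) pair and return results in job order."""
    jobs = [
        (sample, res, output_path(output_dir, sample, res))
        for sample, res in product(samples, resolutions)
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_loops, jobs))  # map preserves input order


results = run_batch("results/cohort", ["sample1", "sample2"], [5000, 10000])
for line in results:
    print(line)
```

Threads are used only to keep the example light. CPU-bound loop calling would more plausibly run in separate processes, for example via `ProcessPoolExecutor` or one subprocess per job.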

Development Roadmap

See scripts/prd.txt for details on the development plan, including MVP requirements and future enhancements.

Owner

Name: Code and Software from David Spencer's lab
Login: dhslab
Kind: organization
Email: dspencerlab@gmail.com
Location: United States of America

Website: davidspencerlab.org
Twitter: dspencerlab
Repositories: 6
Profile: https://github.com/dhslab

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Dependencies

package-lock.json npm

260 dependencies

package.json npm

@anthropic-ai/sdk ^0.39.0
boxen ^8.0.1
chalk ^4.1.2
cli-table3 ^0.6.5
commander ^11.1.0
cors ^2.8.5
dotenv ^16.3.1
express ^4.21.2
fastmcp ^1.20.5
figlet ^1.8.0
fuse.js ^7.0.0
gradient-string ^3.0.0
helmet ^8.1.0
inquirer ^12.5.0
jsonwebtoken ^9.0.2
lru-cache ^10.2.0
openai ^4.89.0
ora ^8.2.0
