https://github.com/dhslab/peakachu-cohort

Wraps Peakachu’s train and score_genome functions in a parallelized CLI subcommand enabling scalable analysis across cohorts without manual per-sample commands.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: dhslab
  • Language: Python
  • Default Branch: master
  • Size: 1.95 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 9 months ago
Metadata Files
Readme

README.md

Peakachu Cohort Analysis Toolkit

This project extends the capabilities of the Peakachu loop caller (Salameh et al., Nat Commun 11, 3428 (2020)) to enable scalable analysis of chromatin loops across cohorts of samples.

Core Features

The primary goal is to automate and streamline the analysis of multiple Hi-C datasets (.hic/.cool files), identify differential looping patterns, and facilitate interactive visualization.

Key functionalities include:

  1. Batch Loop Calling:

    • Automates Peakachu's train and score_genome functions across multiple samples and resolutions (e.g., 5kb, 10kb).
    • Parallelizes analysis for efficient processing of large cohorts.
  2. Intensity Extraction:

    • Retrieves raw and CLR-normalized contact counts for all predicted loops.
    • Provides quantitative data essential for downstream differential analysis.
  3. CTCF Overlap Annotation:

    • Annotates loops by intersecting anchor regions with provided CTCF ChIP-seq peak files (BED format).
    • Helps prioritize biologically relevant, CTCF-mediated loops.
  4. Differential Comparison:

    • Performs statistical comparisons (e.g., fold-change, Wilcoxon test) of loop intensities between defined groups (e.g., mutant vs. wild-type).
    • Identifies loops with significant changes associated with experimental conditions.
  5. HiGlass Integration:

    • Generates configuration files to visualize predicted loops and intensity tracks within the HiGlass interactive genome browser.
    • Packages outputs for easy loading and exploration.
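The differential comparison described above can be illustrated with a minimal sketch. It assumes a log2 fold change of group means with a pseudocount, followed by Benjamini-Hochberg adjustment of per-loop p-values. The function names, the choice of Benjamini-Hochberg, and the sample values are illustrative assumptions, not the toolkit's confirmed implementation. The p-values themselves could come from a rank-sum test such as `scipy.stats.ranksums`.

```python
import math
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean intensities; the pseudocount avoids division by zero."""
    return math.log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical intensities of one loop across the samples of each group.
mutant = [30.0, 34.0, 29.0]
wildtype = [14.0, 16.0, 15.0]
lfc = log2_fold_change(mutant, wildtype, pseudocount=1.0)

# Hypothetical per-loop p-values (one per loop), adjusted for multiple testing.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [q <= 0.05 for q in adjusted]
```

Adjusting across all loops at once matters because a cohort comparison tests thousands of loops, so unadjusted p-values would yield many false positives.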

Getting Started

Installation

This toolkit is designed as a Python package. You can install it directly from this repository using pip:

```bash
pip install git+https://github.com/your-username/peakachu-cohort-analysis.git

# Or, after cloning the repository:
cd peakachu-cohort-analysis
pip install .
```

We recommend using a dedicated virtual environment (e.g., conda or venv). Ensure you have Python 3.8 or higher.

Configuration

The main workflow is driven by a configuration file, typically named config.yaml. This file specifies input data locations, analysis parameters, and group definitions for comparisons.

Here's an example structure:

```yaml
# config.yaml example

output_dir: ./results/cohort_analysis
resolutions: [5000, 10000]  # Resolutions in bp (e.g., 5kb, 10kb)

# --- Input Data ---
hic_files:  # List of .hic or .cool files
  - /path/to/sample1.hic
  - /path/to/sample2.mcool::/resolutions/5000  # Specify resolution for multi-res coolers
  - /path/to/sample3.hic
  # ... more samples

ctcf_peaks:  # Optional: BED file with CTCF peaks for annotation
  - /path/to/ctcf_peaks.bed

# --- Peakachu Parameters ---
peakachu_model: /path/to/pretrained/peakachu_model.pkl  # Optional: Use a pre-trained model
peakachu_params:  # Parameters passed to Peakachu score_genome
  min_dist: 10000
  max_dist: 3000000
  # ... other peakachu parameters

# --- Cohort & Group Definitions ---
samples:  # Define metadata and group assignment for each sample
  sample1:
    group: 'wildtype'
    # Add other metadata if needed
  sample2:
    group: 'mutant'
  sample3:
    group: 'wildtype'
  # ... map sample names (from hic_files base names) to groups

groups:  # Define the groups for comparison
  - wildtype
  - mutant

# --- Differential Analysis ---
differential_params:
  method: 'wilcoxon'  # 'foldchange' or 'wilcoxon'
  pseudocount: 1  # For fold-change calculation
  fdr_threshold: 0.05  # Significance threshold

# --- HiGlass Configuration ---
higlass_options:
  server: 'http://localhost:8888/api/v1'  # Your HiGlass server API endpoint
  track_color_range: ['#FFFFFF', '#FF0000']  # Color range for intensity tracks
```

Adjust the paths and parameters according to your specific dataset and analysis goals.
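Because sample names are mapped from the base names of the `hic_files` entries, a quick consistency check before a long run can catch mismatches early. The sketch below is an illustration only: `sample_name` and `check_config` are hypothetical helpers, not part of the toolkit, and the dictionary stands in for what a YAML loader such as `yaml.safe_load` would return.

```python
from pathlib import PurePosixPath


def sample_name(hic_entry):
    """Derive a sample name from a .hic path or a cooler URI (path::group)."""
    file_path = hic_entry.split("::", 1)[0]  # drop the cooler group suffix, if any
    return PurePosixPath(file_path).stem


def check_config(config):
    """Return a list of problems; an empty list means the mapping is consistent."""
    problems = []
    known_groups = set(config.get("groups", []))
    samples = config.get("samples", {})
    for entry in config.get("hic_files", []):
        name = sample_name(entry)
        if name not in samples:
            problems.append(f"{name}: no entry under 'samples'")
            continue
        group = samples[name].get("group")
        if group not in known_groups:
            problems.append(f"{name}: group {group!r} is not listed under 'groups'")
    return problems


# Stand-in for the parsed configuration file (values are placeholders).
config = {
    "hic_files": [
        "/path/to/sample1.hic",
        "/path/to/sample2.mcool::/resolutions/5000",
        "/path/to/sample3.hic",
    ],
    "samples": {
        "sample1": {"group": "wildtype"},
        "sample2": {"group": "mutant"},
        "sample3": {"group": "wildtype"},
    },
    "groups": ["wildtype", "mutant"],
}

problems = check_config(config)
```

An empty `problems` list means every input file has a sample entry whose group is declared under `groups`.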

Basic Usage

The primary way to run the analysis is via the main script (e.g., run_cohort_analysis.py), providing the configuration file:

```bash
python run_cohort_analysis.py --config config.yaml
```

This command will execute the following steps based on the configuration:

  1. Run Peakachu score_genome for each sample and resolution.
  2. Extract loop intensities (raw and normalized).
  3. Annotate loops with CTCF overlap (if provided).
  4. Perform differential analysis between specified groups.
  5. Generate HiGlass configuration files for visualization.

Results will be saved in the directory specified by output_dir in the config.yaml file.
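Step 1 runs once per sample and resolution, so the jobs are independent and can be fanned out in parallel. The following sketch shows one way to do that with the standard library. Here `score_one` is a hypothetical placeholder that only builds an output file name; a real job would invoke Peakachu at that point, and the naming scheme is an assumption. Threads are used on the assumption that each job mostly waits on an external process.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score_one(sample, resolution):
    """Placeholder for one loop-calling job; returns a hypothetical output name."""
    return f"{sample}_{resolution // 1000}kb.loops.bedpe"


def run_cohort(samples, resolutions, max_workers=4):
    """Run every (sample, resolution) job and map each job to its output."""
    jobs = list(product(samples, resolutions))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the submitted jobs.
        outputs = pool.map(lambda job: score_one(*job), jobs)
        return dict(zip(jobs, outputs))


results = run_cohort(["sample1", "sample2"], [5000, 10000])
```

Keying the results by `(sample, resolution)` keeps later steps, such as intensity extraction, able to look up the matching loop file for each input.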

Development Roadmap

See scripts/prd.txt for details on the development plan, including MVP requirements and future enhancements.

Owner

  • Name: Code and Software from David Spencer's lab
  • Login: dhslab
  • Kind: organization
  • Email: dspencerlab@gmail.com
  • Location: United States of America

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3

Dependencies

package-lock.json npm
  • 260 dependencies
package.json npm
  • @anthropic-ai/sdk ^0.39.0
  • boxen ^8.0.1
  • chalk ^4.1.2
  • cli-table3 ^0.6.5
  • commander ^11.1.0
  • cors ^2.8.5
  • dotenv ^16.3.1
  • express ^4.21.2
  • fastmcp ^1.20.5
  • figlet ^1.8.0
  • fuse.js ^7.0.0
  • gradient-string ^3.0.0
  • helmet ^8.1.0
  • inquirer ^12.5.0
  • jsonwebtoken ^9.0.2
  • lru-cache ^10.2.0
  • openai ^4.89.0
  • ora ^8.2.0