patformm

A small tool that takes the MM tags of Biomodal BAMs and converts them to PAT format.

https://github.com/jackieduckie/patformm

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.1%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jackieduckie
  • Language: Python
  • Default Branch: main
  • Size: 38.1 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Citation

README.md

patformm

A tool for converting methylation information from Biomodal BAM files (MM tags) to PAT format for methylation analysis.

Description

patformm processes BAM files containing methylation information in MM tags (e.g., MM:Z:C+C.,1,23;) and converts them to PAT format through a two-step process:

  1. parse_mm_tags: Extracts methylation information from BAM files
  2. calculate_cpos: Processes methylation patterns and converts to CpG indices

Core Components

parse_mm_tags

Extracts essential information from BAM files:

  • Chromosome
  • Start position
  • CIGAR string
  • MM tag
  • Sequence
  • Strand information (reverse/forward)

    patformm parse_mm_tags --threads 8 -o output.bed input.bam
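As a rough illustration of the fields listed above, the sketch below pulls them out of a single SAM-formatted text line (such as one printed by `samtools view`) using only the standard library. The function name `extract_fields` and the dictionary keys are hypothetical and not taken from patformm's source. The column positions and the 0x10 reverse-strand flag bit follow the SAM specification.

```python
def extract_fields(sam_line):
    """Extract chromosome, start, CIGAR, MM tag, sequence and strand
    from one tab-separated SAM record (hypothetical helper)."""
    cols = sam_line.rstrip("\n").split("\t")
    flag = int(cols[1])
    # Optional tags start at column 12; the modification tag is "MM"
    # (older files may use the lowercase-second-letter form "Mm").
    mm_tag = next(
        (c for c in cols[11:] if c.startswith(("MM:Z:", "Mm:Z:"))),
        None,
    )
    return {
        "chrom": cols[2],
        "start": int(cols[3]),          # 1-based leftmost position in SAM
        "cigar": cols[5],
        "mm": mm_tag,
        "seq": cols[9],
        "strand": "-" if flag & 0x10 else "+",
    }


if __name__ == "__main__":
    line = "read1\t16\tchr1\t10469\t60\t4M\t*\t0\t0\tCGCG\tIIII\tMM:Z:C+C.,0;"
    print(extract_fields(line))
```

Whether the tool itself reads records this way or through a BAM library is not stated in this README.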

calculate_cpos

Processes the extracted information to generate PAT format:

  • Maps read positions to reference positions using CIGAR strings
  • Identifies methylated C positions from MM tags
  • Converts to CpG indices using a reference CpG map
  • Handles both forward and reverse strands
  • Supports chunked processing for memory efficiency

    patformm calculate_cpos --threads 8 --chunk-size 1000000 -o output.pat input.bed
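The CIGAR-based mapping from read positions to reference positions can be sketched as follows. This is a minimal illustration under the standard SAM rules for which operations consume the read and the reference. The function name `read_to_ref_positions` is hypothetical and is not claimed to match patformm's internals.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_ref_positions(start, cigar):
    """Return a list with one entry per read base: the reference
    position it aligns to, or None for inserted/soft-clipped bases."""
    ref = start
    positions = []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":            # consumes both read and reference
            positions.extend(range(ref, ref + n))
            ref += n
        elif op in "IS":           # consumes read only
            positions.extend([None] * n)
        elif op in "DN":           # consumes reference only
            ref += n
        # H and P consume neither read nor reference
    return positions


if __name__ == "__main__":
    print(read_to_ref_positions(100, "2S3M1I2M2D2M"))
```

Read bases inside a deletion have no entry at all, which is one reason a downstream pattern may need a placeholder for reference CpGs the read skips over.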

Implementation Details

MM Tag Processing

  • Parses MM tags in format MM:Z:C+C,<positions>
  • Tracks methylated C positions in reads
  • Handles both forward (C) and reverse (G) strand methylation
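To make the MM tag decoding concrete, here is a minimal sketch based on the SAM optional-tags convention, where each number is the count of candidate bases to skip before the next modified one. It assumes a single modification entry in the tag. The function name `modified_positions` is hypothetical, and the reverse-strand handling shown (counting G from the end of the stored sequence) follows the SAM specification rather than anything confirmed about patformm's code.

```python
def modified_positions(seq, mm_tag, reverse=False):
    """Return sorted 0-based indices in `seq` of bases marked as
    modified by the first entry of an MM tag (hypothetical helper)."""
    # "MM:Z:C+C.,1,23;" -> "C+C.,1,23"
    body = mm_tag.split(":")[-1].split(";")[0]
    fields = body.split(",")
    base = fields[0][0]                      # canonical base, e.g. "C"
    skips = [int(x) for x in fields[1:] if x]

    if reverse:
        # Stored SEQ is reverse-complemented: look for the complement
        # base and count from the 3' end of the stored sequence.
        comp = {"A": "T", "C": "G", "G": "C", "T": "A"}[base]
        candidates = [i for i in range(len(seq) - 1, -1, -1) if seq[i] == comp]
    else:
        candidates = [i for i, b in enumerate(seq) if b == base]

    hits = []
    ptr = -1
    for skip in skips:
        ptr += skip + 1                      # skip `skip` candidates, take the next
        if ptr >= len(candidates):
            break                            # tag points past the read; stop
        hits.append(candidates[ptr])
    return sorted(hits)


if __name__ == "__main__":
    print(modified_positions("ACGTCCGACG", "MM:Z:C+C.,1,1;"))
```

For the forward-strand call above, the C bases sit at indices 1, 4, 5 and 8, so skipping one and taking the next twice selects indices 4 and 8.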

CpG Index Mapping

  • Uses a preloaded CpG reference map (CpG.bed.hg38.gz)
  • Maps genomic positions to CpG indices
  • Handles deletions and gaps with '.' notation
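The lookup from genomic positions to CpG indices, including the '.' placeholder, might look like the sketch below. It assumes the reference map has been loaded into a dictionary keyed by (chromosome, position). The tiny table here is illustrative only, and the name `build_pattern` is hypothetical. How patformm actually loads and queries CpG.bed.hg38.gz is not described in this README.

```python
def build_pattern(cpg_index, chrom, covered, methylated):
    """Return (first_cpg_index, pattern) for one read, or None if the
    read covers no known CpG site (hypothetical helper).

    covered    -- reference positions the read aligns to
    methylated -- subset of those positions called methylated
    """
    calls = {}
    for pos in covered:
        idx = cpg_index.get((chrom, pos))
        if idx is not None:
            calls[idx] = "C" if pos in methylated else "T"
    if not calls:
        return None
    first, last = min(calls), max(calls)
    # CpG sites between first and last that the read does not cover
    # (e.g. inside a deletion) are written as '.'.
    pattern = "".join(calls.get(i, ".") for i in range(first, last + 1))
    return first, pattern


if __name__ == "__main__":
    # Illustrative index only; a real map would be read from the reference file.
    cpg_index = {
        ("chr1", 10469): 1,
        ("chr1", 10471): 2,
        ("chr1", 10484): 3,
        ("chr1", 10489): 4,
    }
    covered = {10469, 10470, 10471, 10489}
    methylated = {10469, 10489}
    print(build_pattern(cpg_index, "chr1", covered, methylated))
```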

Output Format

PAT format with:

    chromosome  first_cpg_index  methylation_pattern  count
    chr1        1234             CT.C                 5

Where:

  • methylation_pattern: C=methylated, T=unmethylated, .=missing/invalid
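The count column implies that identical (chromosome, first CpG index, pattern) records are collapsed into one line. A minimal sketch of that aggregation is shown below. The function name `to_pat_lines`, the tab separator, and the plain tuple sort order are assumptions made for illustration, not details confirmed by this README.

```python
from collections import Counter


def to_pat_lines(records):
    """Collapse per-read (chrom, first_cpg_index, pattern) tuples into
    PAT-style lines with a trailing count (hypothetical helper)."""
    counts = Counter(records)
    return [
        f"{chrom}\t{index}\t{pattern}\t{n}"
        for (chrom, index, pattern), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    reads = [
        ("chr1", 1234, "CT.C"),
        ("chr1", 1234, "CT.C"),
        ("chr1", 1230, "TT"),
    ]
    for line in to_pat_lines(reads):
        print(line)
```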

Performance Features

  • Parallel processing support
  • Chunked file processing
  • Memory-efficient design
  • Temporary file handling in scratch space
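One way to combine chunked reading with parallel workers is sketched below. It keeps only one chunk of lines in memory at a time and spreads the lines of that chunk across a thread pool. The names `iter_chunks` and `process_in_chunks` are hypothetical, and threads are used here purely for simplicity. The README does not say whether patformm uses threads or processes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield successive lists of at most `chunk_size` items."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def process_in_chunks(lines, line_worker, chunk_size=1_000_000, threads=8):
    """Apply `line_worker` to every line, one chunk at a time,
    yielding results in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for chunk in iter_chunks(lines, chunk_size):
            # Only the current chunk is held in memory.
            yield from pool.map(line_worker, chunk)


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e"]
    print(list(process_in_chunks(data, str.upper, chunk_size=2, threads=2)))
```

Because `process_in_chunks` is a generator, results can be written to the output file as they arrive instead of being accumulated in a list.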

Requirements

  • Python 3.8+
  • samtools
  • wgbstools
  • Reference files:
    • CpG.bed.hg38.gz (CpG position index)

Usage Example

```bash
# Step 1: Parse MM tags
patformm parse_mm_tags --threads 8 -o sample.bed sample.bam

# Step 2: Calculate CpG positions
patformm calculate_cpos --threads 8 --chunk-size 1000000 -o sample.pat sample.bed
```

Notes

  • Large BAM files are processed in chunks to manage memory usage
  • Supports multi-threaded processing for improved performance

Owner

  • Name: Ruining Dong
  • Login: jackieduckie
  • Kind: user
  • Location: Melbourne
  • Company: @umccr @UMCCR-RADIO-Lab

Semi-professional dog patter

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: patformm
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ruining
    family-names: Dong
    name-suffix: Dr
    orcid: 'https://orcid.org/0000-0003-1433-0484'
repository-code: 'https://github.com/jackieduckie/patformm'

GitHub Events

Total
  • Release event: 1
  • Push event: 8
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 8
  • Create event: 1