patformm
A little tool that takes the MM tags of Biomodal BAM files and converts them to PAT format.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: jackieduckie
- Language: Python
- Default Branch: main
- Size: 38.1 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
patformm
A tool for converting methylation information from Biomodal BAM files (MM tags) to PAT format for methylation analysis.
Description
patformm processes BAM files containing methylation information in MM tags (e.g., MM:Z:C+C.,1,23;) and converts them to PAT format through a two-step process:
- parse_mm_tags: Extracts methylation information from BAM files
- calculate_cpos: Processes methylation patterns and converts to CpG indices
Core Components
parse_mm_tags
Extracts essential information from BAM files:
- Chromosome
- Start position
- CIGAR string
- MM tag
- Sequence
- Strand information (reverse/forward)
```bash
patformm parse_mm_tags --threads 8 -o output.bed input.bam
```
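To make the extracted fields concrete, here is a minimal sketch that pulls them out of one SAM-format text line (the kind `samtools view` prints). It is not patformm's actual code: the function name `extract_fields` and the dictionary keys are hypothetical, the example record is made up, and strand is inferred from the 0x10 FLAG bit as defined in the SAM specification.

```python
def extract_fields(sam_line):
    """Return the fields needed downstream from one SAM text record."""
    cols = sam_line.rstrip("\n").split("\t")
    flag = int(cols[1])
    # Optional tags start at column 12; pick out the MM tag if present.
    mm_tag = next((c for c in cols[11:] if c.startswith("MM:Z:")), None)
    return {
        "chrom": cols[2],
        "start": int(cols[3]),          # 1-based leftmost position
        "cigar": cols[5],
        "mm_tag": mm_tag,
        "sequence": cols[9],
        "strand": "-" if flag & 0x10 else "+",
    }


# Hypothetical record, for illustration only.
record = (
    "read1\t16\tchr1\t10468\t60\t8M\t*\t0\t0\t"
    "ACGTCGAC\tIIIIIIII\tMM:Z:C+C.,1,0;"
)
fields = extract_fields(record)
print(fields["chrom"], fields["start"], fields["strand"], fields["mm_tag"])
```

Reads without an MM tag yield `mm_tag = None` in this sketch, so a caller could skip them before writing the intermediate file.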
calculate_cpos
Processes the extracted information to generate PAT format:
- Maps read positions to reference positions using CIGAR strings
- Identifies methylated C positions from MM tags
- Converts to CpG indices using a reference CpG map
- Handles both forward and reverse strands
- Supports chunked processing for memory efficiency
```bash
patformm calculate_cpos --threads 8 --chunk-size 1000000 -o output.pat input.bed
```
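The CIGAR-based mapping step can be sketched as follows. This is an illustrative version based on the standard SAM rules for which operations consume read and reference bases; `read_to_ref_positions` is a hypothetical name and may not match how patformm implements the step.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_ref_positions(start, cigar):
    """Map each read base index to a reference position (None if unaligned).

    `start` is the leftmost reference position of the alignment.
    """
    mapping = []            # mapping[i] = reference position of read base i
    ref_pos = start
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":     # consumes both read and reference
            mapping.extend(range(ref_pos, ref_pos + length))
            ref_pos += length
        elif op in "IS":    # consumes read only (insertion, soft clip)
            mapping.extend([None] * length)
        elif op in "DN":    # consumes reference only (deletion, skip)
            ref_pos += length
        # H and P consume neither read nor reference
    return mapping


print(read_to_ref_positions(100, "3M1I2M2D2M"))
# [100, 101, 102, None, 103, 104, 107, 108]
```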
Implementation Details
MM Tag Processing
- Parses MM tags in the format MM:Z:C+C,<positions>
- Tracks methylated C positions in reads
- Handles both forward (C) and reverse (G) strand methylation
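As a rough illustration of the MM tag handling described above, the sketch below decodes a tag into read positions. It assumes the SAM specification's semantics, where each number is the count of unmodified bases of that type to skip before the next modified one, and where reverse-strand reads are walked from the right over the complementary base (G). Whether patformm interprets the numbers exactly this way is not stated here, and `methylated_positions` is a hypothetical name.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def methylated_positions(seq, mm_tag, reverse=False):
    """Return sorted 0-based read indices of modified bases (first MM entry only)."""
    entry = mm_tag.split(":")[-1].split(";")[0]     # e.g. "C+C.,1,0"
    header, *deltas = entry.split(",")
    base = header[0]                                # canonical base, e.g. "C"
    skips = [int(d) for d in deltas if d]

    if reverse:
        # Assumption: stored SEQ is reverse-complemented, so walk it
        # right to left and look for the complementary base.
        target = COMPLEMENT[base]
        candidates = [i for i in range(len(seq) - 1, -1, -1) if seq[i] == target]
    else:
        candidates = [i for i, b in enumerate(seq) if b == base]

    positions = []
    idx = -1
    for skip in skips:
        idx += skip + 1             # skip `skip` candidates, take the next one
        positions.append(candidates[idx])
    return sorted(positions)


print(methylated_positions("ACGTCGAC", "MM:Z:C+C.,1,0;"))              # [4, 7]
print(methylated_positions("ACGTCGAC", "MM:Z:C+C.,0;", reverse=True))  # [5]
```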
CpG Index Mapping
- Uses a preloaded CpG reference map (CpG.bed.hg38.gz)
- Maps genomic positions to CpG indices
- Handles deletions and gaps with '.' notation
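The index mapping and '.' gap filling can be sketched like this. The three-column layout (chromosome, position, CpG index) is an assumption about the reference map, the coordinates are illustrative, and the names `load_cpg_map` and `to_cpg_pattern` are hypothetical. A small in-memory list stands in for the gzipped file.

```python
def load_cpg_map(lines):
    """Build {(chrom, position): cpg_index} from 'chrom pos index' lines."""
    cpg_map = {}
    for line in lines:
        chrom, pos, index = line.split()[:3]
        cpg_map[(chrom, int(pos))] = int(index)
    return cpg_map


def to_cpg_pattern(chrom, calls, cpg_map):
    """Convert {ref_position: 'C' or 'T'} into (first_cpg_index, pattern).

    CpG sites between the first and last covered index that have no call
    are written as '.'. Returns None if no call lands on a known CpG.
    """
    by_index = {}
    for pos, state in calls.items():
        idx = cpg_map.get((chrom, pos))
        if idx is not None:
            by_index[idx] = state
    if not by_index:
        return None
    first, last = min(by_index), max(by_index)
    pattern = "".join(by_index.get(i, ".") for i in range(first, last + 1))
    return first, pattern


# Illustrative reference entries, assumed layout: chrom, position, index.
reference = ["chr1\t10469\t1", "chr1\t10471\t2", "chr1\t10484\t3", "chr1\t10489\t4"]
cpg_map = load_cpg_map(reference)
print(to_cpg_pattern("chr1", {10469: "C", 10484: "T", 10489: "C"}, cpg_map))
# (1, 'C.TC')
```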
Output Format
PAT format with:
chromosome first_cpg_index methylation_pattern count
chr1 1234 CT.C 5
Where:
- methylation_pattern: C=methylated, T=unmethylated, .=missing/invalid
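For illustration, identical per-read patterns can be collapsed into PAT lines with a count as sketched below. Tab separation and the plain tuple sort order are assumptions, and `to_pat_lines` is a hypothetical helper rather than part of patformm.

```python
from collections import Counter


def to_pat_lines(read_patterns):
    """Collapse (chrom, first_cpg_index, pattern) tuples into PAT lines."""
    counts = Counter(read_patterns)
    lines = []
    # Sorting the tuples orders by chromosome name, then CpG index, then pattern.
    for (chrom, first_idx, pattern), n in sorted(counts.items()):
        lines.append(f"{chrom}\t{first_idx}\t{pattern}\t{n}")
    return lines


reads = [("chr1", 1234, "CT.C")] * 5 + [("chr1", 1240, "TT")]
for line in to_pat_lines(reads):
    print(line)
```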
Performance Features
- Parallel processing support
- Chunked file processing
- Memory-efficient design
- Temporary file handling in scratch space
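Chunked processing in general can be sketched with a small generator that yields fixed-size batches of lines, so only one batch is held in memory at a time. This shows the general technique only; `iter_chunks` is a hypothetical name, and how patformm splits work across threads or writes its temporary files is not shown here.

```python
from itertools import islice


def iter_chunks(lines, chunk_size):
    """Yield lists of up to `chunk_size` items from any iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for an open file handle: any iterable of lines works.
sizes = [len(chunk) for chunk in iter_chunks(range(10), 4)]
print(sizes)  # [4, 4, 2]
```

Each chunk could then be handed to a worker and the per-chunk results concatenated in order.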
Requirements
- Python 3.8+
- samtools
- wgbstools
- Reference files:
  - CpG.bed.hg38.gz (CpG position index)
Usage Example
```bash
# Step 1: Parse MM tags
patformm parse_mm_tags --threads 8 -o sample.bed sample.bam
# Step 2: Calculate CpG positions
patformm calculate_cpos --threads 8 --chunk-size 1000000 -o sample.pat sample.bed
```
Notes
- Large BAM files are processed in chunks to manage memory usage
- Supports multi-threaded processing for improved performance
Owner
- Name: Ruining Dong
- Login: jackieduckie
- Kind: user
- Location: Melbourne
- Company: @umccr @UMCCR-RADIO-Lab
- Repositories: 2
- Profile: https://github.com/jackieduckie
Semi-professional dog patter
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: patformm
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Ruining
    family-names: Dong
    name-suffix: Dr
    orcid: 'https://orcid.org/0000-0003-1433-0484'
repository-code: 'https://github.com/jackieduckie/patformm'
GitHub Events
Total
- Release event: 1
- Push event: 8
- Create event: 1
Last Year
- Release event: 1
- Push event: 8
- Create event: 1