pymaftools

https://github.com/xu62u4u6/pymaftools

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.3%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: xu62u4u6
License: mit
Language: Python
Default Branch: main
Homepage: https://pypi.org/project/pymaftools/
Size: 2.99 MB

Statistics

Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 3

Created over 1 year ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License Citation

pymaftools

pymaftools is a Python package designed to handle and analyze MAF (Mutation Annotation Format) files. It provides utilities for working with mutation data, including the MAF and PivotTable classes for data manipulation, and functions for visualizing mutation data with oncoplots.

Features

MAF Class: A utility to load, parse, and manipulate MAF files.
PivotTable Class: A custom pivot table implementation for summarizing mutation frequencies and sorting genes and samples.
Oncoplot: Generate oncoplot visualizations with mutation data and frequencies.
LollipopPlot: Visualize mutation positions along protein sequences with optional domain annotation.
Boxplot with Statistical Testing: Generate comparative boxplots with integrated statistical tests (e.g., Wilcoxon, t-test) for group-wise mutation metrics.
Similarity Metrics: Compute similarity between samples or cohorts based on mutation profiles (e.g., Jaccard index, cosine similarity).
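
To make the two similarity metrics named above concrete, here is a minimal sketch in plain Python. It is independent of pymaftools' own API: the function names `jaccard_index` and `cosine_similarity` and the example gene sets are hypothetical and chosen only for illustration.

```python
import math

def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty profiles
    return len(a & b) / len(union)

def cosine_similarity(u: list, v: list) -> float:
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sets of mutated genes for two samples
sample_a = {"TP53", "EGFR", "KRAS"}
sample_b = {"TP53", "KRAS", "STK11", "KEAP1"}

# Binary mutation vectors over the union of genes
genes = sorted(sample_a | sample_b)
vec_a = [int(g in sample_a) for g in genes]
vec_b = [int(g in sample_b) for g in genes]

jaccard = jaccard_index(sample_a, sample_b)  # 2 shared genes / 5 genes in total
cosine = cosine_similarity(vec_a, vec_b)     # 2 / (sqrt(3) * sqrt(4))
print(f"Jaccard: {jaccard:.3f}, cosine: {cosine:.3f}")
```

Both metrics ignore genes that are unmutated in both samples, which is usually desirable for sparse mutation profiles.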

Requirements

pymaftools requires Python 3.10 or higher and the following dependencies:

Core Dependencies

pandas (>2.0) - Data manipulation and analysis
numpy - Numerical computing
matplotlib - Basic plotting and visualization
seaborn - Statistical data visualization
scipy - Scientific computing and statistics
networkx - Graph algorithms and network analysis
scikit-learn - Machine learning algorithms
statsmodels - Statistical modeling and hypothesis testing
statannotations - Statistical annotations for plots
requests - HTTP library for API calls
beautifulsoup4 - HTML/XML parsing
tqdm - Progress bars

All dependencies will be automatically installed when you install pymaftools using pip or conda.

Installation

Create a Conda Environment

```bash
conda create -n pymaftools python=3.10
```

Using GitHub (for the latest version) ✅ Recommended

To install directly from GitHub (if you want the latest changes):

```bash
pip install git+https://github.com/xu62u4u6/pymaftools.git
```

Using pip (from PyPI)

You can install the stable version of pymaftools directly from PyPI using pip:

```bash
pip install pymaftools
```

Usage

Importing the Package

```python
from pymaftools import *
```

Getting Started

```python
# Load MAF files
maf_case1 = MAF.read_maf("case1.maf")
maf_case2 = MAF.read_maf("case2.maf")
all_case_maf = MAF.merge_mafs([maf_case1, maf_case2])

# If there is no sample_ID column in the MAF file, add it
all_case_maf["sample_ID"] = all_case_maf["tumor_sample_barcode"]

# Filter to keep only nonsynonymous mutations
filtered_all_case_maf = all_case_maf.filter_maf(MAF.nonsynonymous_types)

# Convert to pivot table (features x samples table, mutation classification as values)
pivot_table = filtered_all_case_maf.to_pivot_table()

# Inspect PivotTable structure
print(pivot_table)                   # check pivot table
print(pivot_table.feature_metadata)  # check feature metadata (genes/mutations)
print(pivot_table.sample_metadata)   # check sample metadata

# Process and sort the pivot table
sorted_pivot_table = (pivot_table
    .add_freq()                      # Calculate mutation frequencies
    .sort_features(by="freq")        # Sort features by frequency
    .sort_samples_by_mutations()     # Sort samples by mutation patterns
    .calculate_TMB(capture_size=50)  # Calculate tumor mutation burden
)

# Create basic oncoplot using method chaining
oncoplot = (OncoPlot(sorted_pivot_table.head(50))
    .set_config(figsize=(15, 10),
                width_ratios=[20, 2, 2])  # heatmap, frequency bar, classification bar
    .mutation_heatmap()
    .plot_freq()
    .plot_bar()
    .save("oncoplot.png", dpi=300)
)
```
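
The quantities computed in this workflow can be illustrated without the library. The sketch below uses plain Python and assumes the common definitions: mutation frequency as the fraction of samples carrying a mutation in a gene, and tumor mutation burden (TMB) as mutations per megabase of captured sequence. The records, sample names, and the interpretation of the capture size as megabases are assumptions for illustration, and pymaftools' internal implementation may differ.

```python
from collections import defaultdict

# Hypothetical MAF-like records: (sample ID, gene, mutation classification)
records = [
    ("S1", "TP53", "Missense_Mutation"),
    ("S1", "EGFR", "Missense_Mutation"),
    ("S2", "TP53", "Nonsense_Mutation"),
    ("S3", "KRAS", "Missense_Mutation"),
    ("S3", "TP53", "Frame_Shift_Del"),
]
samples = ["S1", "S2", "S3", "S4"]  # S4 carries no mutation

# Pivot: gene -> {sample -> classification} (features x samples)
pivot = defaultdict(dict)
for sample, gene, classification in records:
    pivot[gene][sample] = classification

# Frequency: fraction of samples with a mutation in each gene
freq = {gene: len(by_sample) / len(samples) for gene, by_sample in pivot.items()}

# Sort genes from most to least frequently mutated
sorted_genes = sorted(freq, key=freq.get, reverse=True)

# TMB: number of mutations per sample divided by the capture size (assumed to be in Mb)
capture_size_mb = 50
mutation_counts = {s: 0 for s in samples}
for sample, _gene, _classification in records:
    mutation_counts[sample] += 1
tmb = {s: n / capture_size_mb for s, n in mutation_counts.items()}

print(sorted_genes[0], freq[sorted_genes[0]])
print(tmb)
```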

Create Mutation Oncoplot with Sample Metadata

```python
import pandas as pd

# Load and process data
LUAD_maf = MAF.read_csv("data/WES/LUAD_all_case_maf.csv")
LUSC_maf = MAF.read_csv("data/WES/LUSC_all_case_maf.csv")
all_case_maf = MAF.merge_mafs([LUAD_maf, LUSC_maf])

# Filter maf and convert to table
freq = 0.1
table = (all_case_maf
    .filter_maf(all_case_maf.nonsynonymous_types)
    .to_pivot_table()
)

# Load sample metadata
all_sample_metadata = pd.read_csv("data/all_sample_metadata.csv")

# Get case_ID (case1_T -> case1, T) and concat sample_metadata using case_ID
table.sample_metadata[["case_ID", "sample_type"]] = table.columns.to_series().str.rsplit("_", n=1).apply(pd.Series)
table.sample_metadata = pd.merge(
    table.sample_metadata.reset_index(),
    all_sample_metadata,
    left_on="case_ID",
    right_on="case_ID",
).set_index(["sample_ID"])

# Add frequency calculations for different groups
table = table.add_freq(
    groups={"LUAD": table.subset(samples=table.sample_metadata.subtype == "LUAD"),
            "ASC": table.subset(samples=table.sample_metadata.subtype == "ASC"),
            "LUSC": table.subset(samples=table.sample_metadata.subtype == "LUSC")}
)

# Filter and sort table
table = (table.filter_by_freq(freq)
    .sort_features(by="freq")
    .sort_samples_by_group(group_col="subtype",
                           group_order=["LUAD", "ASC", "LUSC"], top=10)
)

# Setup color mappings
# `cm` is not defined in this snippet; it is assumed to be a ColorManager instance (see FAQ 2)
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

# Create oncoplot with method chaining
oncoplot = (OncoPlot(table)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 14),
                width_ratios=[25, 3, 0, 2])  # main heatmap, freq heatmap, flexible region, legend region
    .mutation_heatmap()
    # .plot_freq(freq_columns=["freq", "LUAD_freq", "ASC_freq", "LUSC_freq"])
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)  # or like {"subtype": {"LUAD": orange, "ASC": green, "LUSC": blue}, }
    .plot_all_legends()
    .save("mutation_oncoplot.tiff", dpi=300)
)
```
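
Two steps in this example are easy to get wrong: splitting sample IDs such as `case1_T` into a case ID and a sample type, and computing frequencies per group. The following plain-Python sketch shows the logic under stated assumptions. The sample IDs, subtype annotation, and set of mutated samples are made up, and the code does not use pymaftools.

```python
# Hypothetical sample IDs of the form "<case_ID>_<sample_type>"
sample_ids = ["case1_T", "case2_T", "case_3_T", "case4_T"]

# rsplit with maxsplit=1 splits on the last underscore only,
# so case IDs that themselves contain underscores stay intact
split_ids = {sid: tuple(sid.rsplit("_", 1)) for sid in sample_ids}

# Hypothetical subtype annotation keyed by case ID
subtype_by_case = {"case1": "LUAD", "case2": "LUAD", "case_3": "LUSC", "case4": "LUSC"}

# Hypothetical set of samples carrying a TP53 mutation
tp53_mutated = {"case1_T", "case_3_T", "case4_T"}

# Collect the samples that belong to each subtype
group_members = {}
for sid, (case_id, _sample_type) in split_ids.items():
    group_members.setdefault(subtype_by_case[case_id], []).append(sid)

# Group-wise frequency: mutated samples in the group / group size
group_freq = {
    subtype: sum(sid in tp53_mutated for sid in members) / len(members)
    for subtype, members in group_members.items()
}

print(split_ids["case_3_T"])
print(group_freq)
```

Splitting from the right is the safer choice whenever the case ID may contain the separator character.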

Create Numeric CNV Oncoplot

```python
# Create numeric heatmap for CNV data using method chaining
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

oncoplot = (OncoPlot(CNV_gene_cosmic)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 10),
                width_ratios=[25, 1, 0, 3])  # main heatmap, freq heatmap, flexible region, legend region
    .numeric_heatmap(yticklabels=False, cmap="coolwarm", vmin=-2, vmax=2)
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)
    .plot_all_legends()
    .save("cnv_oncoplot.tiff", dpi=600)
)
```

Create Lollipop plot

```python
# Read MAF file
maf = MAF.read_csv(YOUR_MAF_PATH)
gene = "EGFR"  # gene name
AA_length, mutations_data = maf.get_protein_info(gene)  # get protein length and mutations data
domains_data, refseq_ID = MAF.get_domain_info(gene, AA_length)  # search for domain data matching the protein length

# Create LollipopPlot object
plot = LollipopPlot(
    protein_name=gene,
    protein_length=AA_length,
    domains=domains_data,
    mutations=mutations_data
)
plot.plot()
```
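
A lollipop plot needs, for each amino-acid position, the number of mutations observed there. As a rough sketch of how such counts could be derived, the code below parses protein changes written in short HGVSp notation (for example `p.L858R`). The input list, the regular expression, and the helper `position_of` are assumptions for illustration and do not describe how `get_protein_info` works internally.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical protein changes in short HGVSp notation
protein_changes = [
    "p.L858R", "p.L858R", "p.T790M",
    "p.G719A", "p.E746_A750del", "p.L858R",
]

# Capture the first residue number after "p." and the reference residue letters
POSITION_PATTERN = re.compile(r"^p\.[A-Za-z*]+(\d+)")

def position_of(change: str) -> Optional[int]:
    """Return the first affected residue position, or None if unparseable."""
    match = POSITION_PATTERN.match(change)
    return int(match.group(1)) if match else None

# Count mutations per position, skipping strings that could not be parsed
counts = Counter(
    pos for pos in map(position_of, protein_changes) if pos is not None
)

for pos, n in sorted(counts.items()):
    print(pos, n)
```

For changes that span several residues, such as deletions, this sketch uses the start position only.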

FAQ

1. How to adjust font sizes in OncoPlot?

You can adjust font sizes for different components using the following parameters:

```python
# Adjust y-axis gene name font size
oncoplot = OncoPlot(pivot_table, ytick_fontsize=12)

# Adjust font size in heatmaps
oncoplot.mutation_heatmap(ytick_fontsize=10)
oncoplot.numeric_heatmap(ytick_fontsize=8)

# Adjust annotation font size in frequency plot
oncoplot.plot_freq(annot_fontsize=10)
```

2. How to customize color mappings?

You can use the ColorManager to register and retrieve custom colors for different mutation types and categorical variables:

```python
from pymaftools.plot.ColorManager import ColorManager

# Get the color manager instance
color_manager = ColorManager()

# Register custom mutation type colors
custom_mutation_colors = {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
    "Frame_Shift_Del": "#45B7D1"
}
color_manager.register_cmap("custom_mutations", custom_mutation_colors)

# Register custom categorical colors
custom_categorical_colors = {
    "LUAD": "orange",
    "LUSC": "blue",
    "ASC": "green"
}
color_manager.register_cmap("subtype", custom_categorical_colors)

# Use registered colors in plots
mutation_cmap = color_manager.get_cmap("custom_mutations")
subtype_cmap = color_manager.get_cmap("subtype", alpha=0.7)  # You can set alpha value here

oncoplot.mutation_heatmap(cmap_dict=mutation_cmap)
oncoplot.plot_categorical_metadata(cmap_dict={"subtype": subtype_cmap})
```
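
The register-then-retrieve pattern with an optional alpha value can be sketched in a few lines of plain Python. This is not the ColorManager implementation: the `registry` dictionary and the helpers `register_colors`, `get_colors`, and `hex_to_rgba` are hypothetical names used only to show the idea, and the sketch handles `#RRGGBB` strings only.

```python
def hex_to_rgba(hex_color: str, alpha: float = 1.0) -> tuple:
    """Convert '#RRGGBB' to an (r, g, b, a) tuple with channels in [0, 1]."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected '#RRGGBB', got {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (r, g, b, alpha)

# Hypothetical registry: colormap name -> {category -> hex color}
registry = {}

def register_colors(name: str, mapping: dict) -> None:
    """Store a copy of the category-to-color mapping under the given name."""
    registry[name] = dict(mapping)

def get_colors(name: str, alpha: float = 1.0) -> dict:
    """Return the registered mapping with every color converted to RGBA."""
    return {
        category: hex_to_rgba(color, alpha)
        for category, color in registry[name].items()
    }

register_colors("custom_mutations", {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
})
colors = get_colors("custom_mutations", alpha=0.7)
print(colors["Missense_Mutation"])
```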

3. How to control visualization parameters in similarity analysis?

```python
# Control heatmap display options
result = SimilarityMatrix.analyze_similarity(
    table=table,
    method="jaccard",
    title="Similarity Analysis",
    groups=groups,
    group_order=["Group1", "Group2", "Group3"],
    layout="horizontal",             # or "grid"
    heatmap_show_only_x_ticks=True,  # Show only x-axis labels
    heatmap_annot=False,             # Don't show numeric annotations
    save_dir="./output"
)
```

4. How to handle performance issues with large datasets?

For large datasets, consider the following strategies:

```python
# Filter before analysis
filtered_table = (pivot_table
    .filter_by_freq(0.1)  # Keep only features with frequency > 10%
    .head(100)            # Take only top 100 features
)

# Do downstream analysis on filtered_table
```
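
The effect of filtering before analysis can be shown on a small frequency mapping. The sketch below is plain Python with made-up frequencies. The helper `filter_top` is hypothetical, and the strict `>` comparison follows the comment in the snippet above rather than a confirmed detail of `filter_by_freq`.

```python
# Hypothetical gene -> mutation frequency mapping
freq = {"TP53": 0.62, "KRAS": 0.31, "EGFR": 0.18, "STK11": 0.10, "BRAF": 0.04}

def filter_top(freq_by_gene: dict, threshold: float, top_n: int) -> list:
    """Keep genes with frequency strictly above threshold, then the top_n most frequent."""
    kept = {g: f for g, f in freq_by_gene.items() if f > threshold}
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked[:top_n]

selected = filter_top(freq, threshold=0.1, top_n=2)
print(selected)
```

Downstream steps such as similarity matrices or plotting then operate on far fewer rows.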

5. How to save and load analysis results?

Use SQLite format for saving both Cohort and PivotTable data:

```python
# Save PivotTable to SQLite
pivot_table.to_sqlite("analysis_results.db")

# Load PivotTable from SQLite
loaded_table = PivotTable.read_sqlite("analysis_results.db")

# Save Cohort to SQLite
cohort.to_sqlite("cohort_data.db")

# Load Cohort from SQLite
loaded_cohort = Cohort.read_sqlite("cohort_data.db")

# Save figures (supports multiple formats)
oncoplot.save("oncoplot.png", dpi=300)
oncoplot.save("oncoplot.tiff", dpi=300)
```
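
For readers unfamiliar with SQLite persistence, here is a minimal round-trip sketch using Python's standard `sqlite3` module. The long-format `mutations` table, the helper names `save_pivot` and `load_pivot`, and the example data are assumptions for illustration. The layout that pymaftools' `to_sqlite` actually writes may be different.

```python
import sqlite3

# Hypothetical pivot table: gene -> {sample -> classification}
pivot = {
    "TP53": {"S1": "Missense_Mutation", "S2": "Nonsense_Mutation"},
    "KRAS": {"S2": "Missense_Mutation"},
}

def save_pivot(conn: sqlite3.Connection, table: dict) -> None:
    """Write the table in long format: one row per (gene, sample) pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mutations ("
        "gene TEXT, sample TEXT, classification TEXT, "
        "PRIMARY KEY (gene, sample))"
    )
    rows = [(g, s, c) for g, by_sample in table.items() for s, c in by_sample.items()]
    conn.executemany("INSERT OR REPLACE INTO mutations VALUES (?, ?, ?)", rows)
    conn.commit()

def load_pivot(conn: sqlite3.Connection) -> dict:
    """Rebuild the nested dictionary from the long-format rows."""
    table = {}
    query = "SELECT gene, sample, classification FROM mutations"
    for gene, sample, classification in conn.execute(query):
        table.setdefault(gene, {})[sample] = classification
    return table

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist to disk
save_pivot(conn, pivot)
loaded = load_pivot(conn)
conn.close()

print(loaded == pivot)
```

A long format keeps the schema fixed regardless of how many samples the cohort contains.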

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

xu62u4u6

Owner

Name: xu62u4u6
Login: xu62u4u6
Kind: user
Location: Kaohsiung

Repositories: 1
Profile: https://github.com/xu62u4u6

A Kaohsiung native studying life sciences in Keelung. Loves the sea. Loves dreaming up wild ideas and writing code.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below:"
title: "pymaftools: A Python Toolkit for MAF File Analysis"
authors:
  - family-names: Liu
    given-names: Ding-Yang
    affiliation: "National Yang Ming Chiao Tung University"
    orcid: https://orcid.org/0009-0003-1978-6330
date-released: 2025-07-17
version: 0.2.3
url: https://github.com/xu62u4u6/pymaftools

GitHub Events

Total

Release event: 3
Watch event: 5
Push event: 77
Create event: 5

Last Year

Release event: 3
Watch event: 5
Push event: 77
Create event: 5

Dependencies

requirements.txt pypi

matplotlib ==3.9.2
munkres ==1.1.4

setup.py pypi

matplotlib *
numpy *
pandas *
seaborn *
