pymaftools

pymaftools is a Python package designed to handle and analyze MAF (Mutation Annotation Format) files. It provides utilities for working with mutation data, including the MAF and PivotTable classes for data manipulation, and functions for visualizing mutation data with oncoplots.

Features

  • MAF Class: A utility to load, parse, and manipulate MAF files.
  • PivotTable Class: A custom pivot table implementation for summarizing mutation frequencies and sorting genes and samples.
  • Oncoplot: Generate oncoplot visualizations with mutation data and frequencies.
  • LollipopPlot: Visualize mutation positions along protein sequences with optional domain annotation.
  • Boxplot with Statistical Testing: Generate comparative boxplots with integrated statistical tests (e.g., Wilcoxon, t-test) for group-wise mutation metrics.
  • Similarity Metrics: Compute similarity between samples or cohorts based on mutation profiles (e.g., Jaccard index, cosine similarity).
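
As a rough illustration of the two similarity metrics named above, the sketch below computes them in plain Python for two samples, each described by its set of mutated genes. The sample contents and the function names `jaccard` and `cosine` are hypothetical and are not part of the pymaftools API; the library may compute these quantities differently (for example on a full features x samples matrix).

```python
import math


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index: shared mutated genes divided by all mutated genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity of the two binary (mutated / not mutated) vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical samples, each represented by its set of mutated genes
sample_1 = {"TP53", "EGFR", "KRAS"}
sample_2 = {"TP53", "KRAS", "STK11", "KEAP1"}

print(jaccard(sample_1, sample_2))           # 2 shared / 5 total = 0.4
print(round(cosine(sample_1, sample_2), 3))  # 2 / sqrt(3 * 4) = 0.577
```

For binary profiles, both metrics depend only on the overlap and the set sizes, which is why sets are enough for this sketch.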

Requirements

pymaftools requires Python 3.10 or higher and the following dependencies:

Core Dependencies

  • pandas (>2.0) - Data manipulation and analysis
  • numpy - Numerical computing
  • matplotlib - Basic plotting and visualization
  • seaborn - Statistical data visualization
  • scipy - Scientific computing and statistics
  • networkx - Graph algorithms and network analysis
  • scikit-learn - Machine learning algorithms
  • statsmodels - Statistical modeling and hypothesis testing
  • statannotations - Statistical annotations for plots
  • requests - HTTP library for API calls
  • beautifulsoup4 - HTML/XML parsing
  • tqdm - Progress bars

All dependencies will be automatically installed when you install pymaftools using pip or conda.
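
If you want to confirm that an environment meets these requirements, a small check along the following lines may help. It uses only the Python standard library; the helper name `missing_packages` is hypothetical, and the package list is simply copied from the dependency list above.

```python
import sys
from importlib import metadata


def missing_packages(names: list[str]) -> list[str]:
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Core dependencies as listed in this README
core = [
    "pandas", "numpy", "matplotlib", "seaborn", "scipy", "networkx",
    "scikit-learn", "statsmodels", "statannotations", "requests",
    "beautifulsoup4", "tqdm",
]

print("Python >= 3.10:", sys.version_info >= (3, 10))
print("Missing packages:", missing_packages(core))
```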

Installation

Create a Conda Environment

```bash
conda create -n pymaftools python=3.10
```

Using GitHub (for the latest version) ✅ Recommended

To install directly from GitHub (if you want the latest changes):

```bash
pip install git+https://github.com/xu62u4u6/pymaftools.git
```

Using pip (from PyPI)

You can install the stable release of pymaftools directly from PyPI using pip:

```bash
pip install pymaftools
```

Usage

Importing the Package

```python
from pymaftools import *
```

Getting Started

```python
# Load MAF files
maf_case1 = MAF.read_maf("case1.maf")
maf_case2 = MAF.read_maf("case2.maf")
all_case_maf = MAF.merge_mafs([maf_case1, maf_case2])

# If the MAF file has no sample_ID column, add it
all_case_maf["sample_ID"] = all_case_maf["tumor_sample_barcode"]

# Filter to keep only nonsynonymous mutations
filtered_all_case_maf = all_case_maf.filter_maf(MAF.nonsynonymous_types)

# Convert to pivot table (features x samples table, mutation classification as values)
pivot_table = filtered_all_case_maf.to_pivot_table()

# Inspect PivotTable structure
print(pivot_table)                   # check pivot table
print(pivot_table.feature_metadata)  # check feature metadata (genes/mutations)
print(pivot_table.sample_metadata)   # check sample metadata

# Process and sort the pivot table
sorted_pivot_table = (
    pivot_table
    .add_freq()                      # Calculate mutation frequencies
    .sort_features(by="freq")        # Sort features by frequency
    .sort_samples_by_mutations()     # Sort samples by mutation patterns
    .calculate_TMB(capture_size=50)  # Calculate tumor mutation burden
)

# Create basic oncoplot using method chaining
oncoplot = (
    OncoPlot(sorted_pivot_table.head(50))
    .set_config(figsize=(15, 10), width_ratios=[20, 2, 2])  # heatmap, frequency bar, classification bar
    .mutation_heatmap()
    .plot_freq()
    .plot_bar()
    .save("oncoplot.png", dpi=300)
)
```
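
To make the frequency and TMB steps more concrete, here is a minimal plain-Python sketch of what such calculations could look like on a tiny features x samples table. It assumes that frequency means the fraction of samples carrying a mutation in a gene, and that TMB means mutations per megabase with the capture size given in Mb. These definitions, the toy data, and the helper names `mutation_freq` and `tmb` are assumptions for illustration; pymaftools may define or count them differently.

```python
# Hypothetical features x samples table: gene -> {sample: mutation class or None}
table = {
    "TP53": {"case1": "Missense_Mutation", "case2": "Nonsense_Mutation", "case3": None},
    "EGFR": {"case1": "Missense_Mutation", "case2": None, "case3": None},
    "KRAS": {"case1": None, "case2": None, "case3": None},
}


def mutation_freq(row: dict) -> float:
    """Fraction of samples in which the gene carries any mutation."""
    return sum(value is not None for value in row.values()) / len(row)


def tmb(table: dict, sample: str, capture_size_mb: float) -> float:
    """Mutated genes in one sample per megabase of captured sequence (assumed definition)."""
    n_mutations = sum(row[sample] is not None for row in table.values())
    return n_mutations / capture_size_mb


freq = {gene: mutation_freq(row) for gene, row in table.items()}
ranked_genes = sorted(freq, key=freq.get, reverse=True)  # most frequently mutated first

print(ranked_genes)                # ['TP53', 'EGFR', 'KRAS']
print(tmb(table, "case1", 50.0))   # 2 mutations / 50 Mb = 0.04
```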

Create Mutation Oncoplot with Sample Metadata

```python
import pandas as pd

# Load and process data
LUAD_maf = MAF.read_csv("data/WES/LUAD_all_case_maf.csv")
LUSC_maf = MAF.read_csv("data/WES/LUSC_all_case_maf.csv")
all_case_maf = MAF.merge_mafs([LUAD_maf, LUSC_maf])

# Filter MAF and convert to table
freq = 0.1
table = (
    all_case_maf
    .filter_maf(all_case_maf.nonsynonymous_types)
    .to_pivot_table()
)

# Load sample metadata
all_sample_metadata = pd.read_csv("data/all_sample_metadata.csv")

# Get case_ID (case1_T -> case1, T) and join sample_metadata on case_ID
table.sample_metadata[["case_ID", "sample_type"]] = (
    table.columns.to_series().str.rsplit("_", n=1).apply(pd.Series)
)
table.sample_metadata = pd.merge(
    table.sample_metadata.reset_index(),
    all_sample_metadata,
    left_on="case_ID",
    right_on="case_ID",
).set_index(["sample_ID"])

# Add frequency calculations for different groups
table = table.add_freq(
    groups={
        "LUAD": table.subset(samples=table.sample_metadata.subtype == "LUAD"),
        "ASC": table.subset(samples=table.sample_metadata.subtype == "ASC"),
        "LUSC": table.subset(samples=table.sample_metadata.subtype == "LUSC"),
    }
)

# Filter and sort table
table = (
    table
    .filter_by_freq(freq)
    .sort_features(by="freq")
    .sort_samples_by_group(group_col="subtype", group_order=["LUAD", "ASC", "LUSC"], top=10)
)

# Set up color mappings
# NOTE: `cm` is not defined in this snippet; it is assumed to be a ColorManager instance (see FAQ 2)
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

# Create oncoplot with method chaining
oncoplot = (
    OncoPlot(table)
    .set_config(
        categorical_columns=categorical_columns,
        figsize=(30, 14),
        width_ratios=[25, 3, 0, 2],  # main heatmap, freq heatmap, flexible region, legend region
    )
    .mutation_heatmap()
    .plot_freq(freq_columns=["freq", "LUAD_freq", "ASC_freq", "LUSC_freq"])
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)  # or like {"subtype": {"LUAD": orange, "ASC": green, "LUSC": blue}, }
    .plot_all_legends()
    .save("mutation_oncoplot.tiff", dpi=300)
)
```
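
The case/sample-type split in the example above can be sketched on its own in plain Python. This assumes sample IDs of the form `case1_T`, where the text after the last underscore is the sample type; the helper name `split_sample_id` and the example IDs are hypothetical.

```python
def split_sample_id(sample_id: str) -> tuple[str, str]:
    """Split 'case1_T' into ('case1', 'T') at the last underscore."""
    case_id, sep, sample_type = sample_id.rpartition("_")
    if not sep:
        raise ValueError(f"no underscore in sample ID: {sample_id!r}")
    return case_id, sample_type


print(split_sample_id("case1_T"))       # ('case1', 'T')
print(split_sample_id("LUAD_case2_N"))  # ('LUAD_case2', 'N')
```

Splitting at the last underscore rather than the first keeps case IDs that themselves contain underscores intact.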

Create Numeric CNV Oncoplot

```python
# Create numeric heatmap for CNV data using method chaining
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

oncoplot = (
    OncoPlot(CNV_gene_cosmic)
    .set_config(
        categorical_columns=categorical_columns,
        figsize=(30, 10),
        width_ratios=[25, 1, 0, 3],  # main heatmap, freq heatmap, flexible region, legend region
    )
    .numeric_heatmap(yticklabels=False, cmap="coolwarm", vmin=-2, vmax=2)
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)
    .plot_all_legends()
    .save("cnv_oncoplot.tiff", dpi=600)
)
```

Create Lollipop Plot

```python
# Read MAF file
maf = MAF.read_csv(YOUR_MAF_PATH)
gene = "EGFR"  # gene name
AA_length, mutations_data = maf.get_protein_info(gene)  # get protein length and mutations data
domains_data, refseq_ID = MAF.get_domain_info(gene, AA_length)  # search for domain data matching the protein length

# Create LollipopPlot object
plot = LollipopPlot(
    protein_name=gene,
    protein_length=AA_length,
    domains=domains_data,
    mutations=mutations_data,
)
plot.plot()
```

FAQ

1. How to adjust font sizes in OncoPlot?

You can adjust font sizes for different components using the following parameters:

```python
# Adjust y-axis gene name font size
oncoplot = OncoPlot(pivot_table, ytick_fontsize=12)

# Adjust font size in heatmaps
oncoplot.mutation_heatmap(ytick_fontsize=10)
oncoplot.numeric_heatmap(ytick_fontsize=8)

# Adjust annotation font size in frequency plot
oncoplot.plot_freq(annot_fontsize=10)
```

2. How to customize color mappings?

You can use the ColorManager to register and retrieve custom colors for different mutation types and categorical variables:

```python
from pymaftools.plot.ColorManager import ColorManager

# Get the color manager instance
color_manager = ColorManager()

# Register custom mutation type colors
custom_mutation_colors = {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
    "Frame_Shift_Del": "#45B7D1",
}
color_manager.register_cmap("custom_mutations", custom_mutation_colors)

# Register custom categorical colors
custom_categorical_colors = {
    "LUAD": "orange",
    "LUSC": "blue",
    "ASC": "green",
}
color_manager.register_cmap("subtype", custom_categorical_colors)

# Use registered colors in plots
mutation_cmap = color_manager.get_cmap("custom_mutations")
subtype_cmap = color_manager.get_cmap("subtype", alpha=0.7)  # You can set the alpha value here

oncoplot.mutation_heatmap(cmap_dict=mutation_cmap)
oncoplot.plot_categorical_metadata(cmap_dict={"subtype": subtype_cmap})
```
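
For readers unfamiliar with the `alpha` argument, the following plain-Python sketch shows what applying an alpha value to a hex color amounts to: the color becomes an RGBA tuple whose last component is the opacity. The helper `hex_to_rgba` is hypothetical and only illustrates the idea; it does not describe how ColorManager stores or returns colors internally.

```python
def hex_to_rgba(color: str, alpha: float = 1.0) -> tuple[float, float, float, float]:
    """Convert '#RRGGBB' to an (r, g, b, a) tuple with components in [0, 1]."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected a color like '#RRGGBB', got {color!r}")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (r, g, b, alpha)


print(hex_to_rgba("#FF0000", alpha=0.7))  # (1.0, 0.0, 0.0, 0.7)
```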

3. How to control visualization parameters in similarity analysis?

```python
# Control heatmap display options
result = SimilarityMatrix.analyze_similarity(
    table=table,
    method="jaccard",
    title="Similarity Analysis",
    groups=groups,
    group_order=["Group1", "Group2", "Group3"],
    layout="horizontal",               # or "grid"
    heatmap_show_only_x_ticks=True,    # Show only x-axis labels
    heatmap_annot=False,               # Don't show numeric annotations
    save_dir="./output",
)
```

4. How to handle performance issues with large datasets?

For large datasets, consider the following strategies:

```python
# Filter before analysis
filtered_table = (
    pivot_table
    .filter_by_freq(0.1)  # Keep only features with frequency > 10%
    .head(100)            # Take only top 100 features
)

# Do downstream analysis on filtered_table
```

5. How to save and load analysis results?

Use SQLite format for saving both Cohort and PivotTable data:

```python
# Save PivotTable to SQLite
pivot_table.to_sqlite("analysis_results.db")

# Load PivotTable from SQLite
loaded_table = PivotTable.read_sqlite("analysis_results.db")

# Save Cohort to SQLite
cohort.to_sqlite("cohort_data.db")

# Load Cohort from SQLite
loaded_cohort = Cohort.read_sqlite("cohort_data.db")

# Save figures (supports multiple formats)
oncoplot.save("oncoplot.png", dpi=300)
oncoplot.save("oncoplot.tiff", dpi=300)
```
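
To show the general idea of a SQLite round trip independently of pymaftools, here is a minimal sketch using only Python's standard `sqlite3` module. The table name `mutations`, its columns, and the helpers `save_rows` and `load_rows` are hypothetical; the schema that pymaftools actually writes is not described in this README and may look quite different.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def save_rows(path: str, rows: list[tuple[str, str, str]]) -> None:
    """Write (gene, sample, classification) rows to a SQLite file, replacing old rows."""
    with closing(sqlite3.connect(path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS mutations "
                "(gene TEXT, sample TEXT, classification TEXT)"
            )
            conn.execute("DELETE FROM mutations")
            conn.executemany("INSERT INTO mutations VALUES (?, ?, ?)", rows)


def load_rows(path: str) -> list[tuple[str, str, str]]:
    """Read all rows back, ordered by gene and sample."""
    with closing(sqlite3.connect(path)) as conn:
        query = "SELECT gene, sample, classification FROM mutations ORDER BY gene, sample"
        return conn.execute(query).fetchall()


# Hypothetical mutation records
rows = [
    ("TP53", "case1_T", "Missense_Mutation"),
    ("EGFR", "case2_T", "Frame_Shift_Del"),
]

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    save_rows(db_path, rows)
    restored = load_rows(db_path)

print(restored == sorted(rows))  # True
```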

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

xu62u4u6

Owner

  • Name: xu62u4u6
  • Login: xu62u4u6
  • Kind: user
  • Location: Kaohsiung

A Kaohsiung native studying life sciences in Keelung. Loves the sea. Loves wild ideas and writing code.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below:"
title: "pymaftools: A Python Toolkit for MAF File Analysis"
authors:
  - family-names: Liu
    given-names: Ding-Yang
    affiliation: "National Yang Ming Chiao Tung University"
    orcid: https://orcid.org/0009-0003-1978-6330
date-released: 2025-07-17
version: 0.2.3
url: https://github.com/xu62u4u6/pymaftools

Dependencies

requirements.txt pypi
  • matplotlib ==3.9.2
  • munkres ==1.1.4
setup.py pypi
  • matplotlib *
  • numpy *
  • pandas *
  • seaborn *