pymaftools
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: xu62u4u6
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://pypi.org/project/pymaftools/
- Size: 2.99 MB
Statistics
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
pymaftools
pymaftools is a Python package designed to handle and analyze MAF (Mutation Annotation Format) files. It provides utilities for working with mutation data, including the MAF and PivotTable classes for data manipulation, and functions for visualizing mutation data with oncoplots.
Features
- MAF Class: A utility to load, parse, and manipulate MAF files.
- PivotTable Class: A custom pivot table implementation for summarizing mutation frequencies and sorting genes and samples.
- Oncoplot: Generate oncoplot visualizations with mutation data and frequencies.
- LollipopPlot: Visualize mutation positions along protein sequences with optional domain annotation.
- Boxplot with Statistical Testing: Generate comparative boxplots with integrated statistical tests (e.g., Wilcoxon, t-test) for group-wise mutation metrics.
- Similarity Metrics: Compute similarity between samples or cohorts based on mutation profiles (e.g., Jaccard index, cosine similarity).
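To make the similarity metrics above concrete, here is a minimal sketch in plain Python that does not use pymaftools at all. The sample gene sets and the helper names `jaccard_index` and `cosine_similarity` are hypothetical and only illustrate the two formulas; the package's own implementation may differ.

```python
import math

def jaccard_index(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine_similarity(u: list, v: list) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

# Hypothetical sets of mutated genes for two samples
sample_a = {"TP53", "KRAS", "EGFR"}
sample_b = {"TP53", "EGFR", "STK11", "KEAP1"}

# Binary mutation vectors over the union of genes (1 = mutated)
genes = sorted(sample_a | sample_b)
vec_a = [1 if g in sample_a else 0 for g in genes]
vec_b = [1 if g in sample_b else 0 for g in genes]

jaccard = jaccard_index(sample_a, sample_b)   # 2 shared genes out of 5 total
cosine = cosine_similarity(vec_a, vec_b)      # 2 / (sqrt(3) * sqrt(4))
print(f"Jaccard: {jaccard:.3f}, cosine: {cosine:.3f}")
```

Jaccard only looks at presence or absence, while cosine similarity also works on weighted or numeric profiles, which is why both are commonly offered side by side.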
Requirements
pymaftools requires Python 3.10 or higher and the following dependencies:
Core Dependencies
- pandas (>2.0) - Data manipulation and analysis
- numpy - Numerical computing
- matplotlib - Basic plotting and visualization
- seaborn - Statistical data visualization
- scipy - Scientific computing and statistics
- networkx - Graph algorithms and network analysis
- scikit-learn - Machine learning algorithms
- statsmodels - Statistical modeling and hypothesis testing
- statannotations - Statistical annotations for plots
- requests - HTTP library for API calls
- beautifulsoup4 - HTML/XML parsing
- tqdm - Progress bars
All dependencies will be automatically installed when you install pymaftools using pip or conda.
Installation
Create a Conda Environment
bash
conda create -n pymaftools python=3.10
Using GitHub (for the latest version) ✅ Recommended
To install directly from GitHub (if you want the latest changes):
bash
pip install git+https://github.com/xu62u4u6/pymaftools.git
Using pip (from PyPI)
You can install the stable version of pymaftools directly from PyPI using pip:
bash
pip install pymaftools
Usage
Importing the Package
python
from pymaftools import *
Getting Started
```python
# Load MAF files
maf_case1 = MAF.read_maf("case1.maf")
maf_case2 = MAF.read_maf("case2.maf")
all_case_maf = MAF.merge_mafs([maf_case1, maf_case2])

# If the MAF file has no sample_ID column, add it
all_case_maf["sample_ID"] = all_case_maf["tumor_sample_barcode"]

# Filter to keep only nonsynonymous mutations
filtered_all_case_maf = all_case_maf.filter_maf(MAF.nonsynonymous_types)

# Convert to pivot table (features x samples table, mutation classification as values)
pivot_table = filtered_all_case_maf.to_pivot_table()

# Inspect PivotTable structure
print(pivot_table)                   # check pivot table
print(pivot_table.feature_metadata)  # check feature metadata (genes/mutations)
print(pivot_table.sample_metadata)   # check sample metadata

# Process and sort the pivot table
sorted_pivot_table = (pivot_table
    .add_freq()                      # Calculate mutation frequencies
    .sort_features(by="freq")        # Sort features by frequency
    .sort_samples_by_mutations()     # Sort samples by mutation patterns
    .calculate_TMB(capture_size=50)  # Calculate tumor mutation burden
)

# Create basic oncoplot using method chaining
oncoplot = (OncoPlot(sorted_pivot_table.head(50))
    .set_config(figsize=(15, 10),
                width_ratios=[20, 2, 2])  # heatmap, frequency bar, classification bar
    .mutation_heatmap()
    .plot_freq()
    .plot_bar()
    .save("oncoplot.png", dpi=300)
)
```
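The workflow above builds a features x samples table, ranks genes by mutation frequency, and computes tumor mutation burden. The following plain-Python sketch shows what those three quantities mean on a toy dataset. It is not pymaftools code: the records are made up, and it assumes that frequency is the fraction of samples carrying a mutation in a gene and that TMB is the mutation count per sample divided by a capture size given in megabases.

```python
from collections import Counter, defaultdict

# Hypothetical MAF-like records: (gene, sample, variant classification)
records = [
    ("TP53", "case1_T", "Missense_Mutation"),
    ("TP53", "case2_T", "Nonsense_Mutation"),
    ("KRAS", "case1_T", "Missense_Mutation"),
    ("EGFR", "case3_T", "In_Frame_Del"),
]
samples = sorted({sample for _, sample, _ in records})

# Features x samples table: table[gene][sample] -> variant classification
table = defaultdict(dict)
for gene, sample, classification in records:
    table[gene][sample] = classification

# Mutation frequency per gene (assumed: fraction of samples with a mutation)
freq = {gene: len(by_sample) / len(samples) for gene, by_sample in table.items()}

# Sort genes by descending frequency, breaking ties alphabetically
genes_by_freq = sorted(freq, key=lambda g: (-freq[g], g))

# TMB per sample (assumed: mutation count / capture size in Mb)
capture_size_mb = 50
mutation_counts = Counter(sample for _, sample, _ in records)
tmb = {sample: mutation_counts[sample] / capture_size_mb for sample in samples}

print(genes_by_freq)
print(tmb)
```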
Create Mutation Oncoplot with Sample Metadata
```python
import pandas as pd

# Load and process data
LUAD_maf = MAF.read_csv("data/WES/LUAD_all_case_maf.csv")
LUSC_maf = MAF.read_csv("data/WES/LUSC_all_case_maf.csv")
all_case_maf = MAF.merge_mafs([LUAD_maf, LUSC_maf])

# Filter maf and convert to table
freq = 0.1
table = (all_case_maf
    .filter_maf(all_case_maf.nonsynonymous_types)
    .to_pivot_table()
)

# Load sample metadata
all_sample_metadata = pd.read_csv("data/all_sample_metadata.csv")

# Get case_ID (case1_T -> case1, T) and join sample_metadata using case_ID
table.sample_metadata[["case_ID", "sample_type"]] = (
    table.columns.to_series().str.rsplit("_", n=1).apply(pd.Series)
)
table.sample_metadata = pd.merge(
    table.sample_metadata.reset_index(),
    all_sample_metadata,
    left_on="case_ID",
    right_on="case_ID",
).set_index(["sample_ID"])

# Add frequency calculations for different groups
table = table.add_freq(
    groups={"LUAD": table.subset(samples=table.sample_metadata.subtype == "LUAD"),
            "ASC": table.subset(samples=table.sample_metadata.subtype == "ASC"),
            "LUSC": table.subset(samples=table.sample_metadata.subtype == "LUSC")}
)

# Filter and sort table
table = (table.filter_by_freq(freq)
    .sort_features(by="freq")
    .sort_samples_by_group(group_col="subtype",
                           group_order=["LUAD", "ASC", "LUSC"], top=10)
)

# Setup color mappings
# cm is assumed here to be a ColorManager instance (see FAQ 2)
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

# Create oncoplot with method chaining
oncoplot = (OncoPlot(table)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 14),
                width_ratios=[25, 3, 0, 2])  # main heatmap, freq heatmap, flexible region, legend region
    .mutation_heatmap()
    .plot_freq(freq_columns=["freq", "LUAD_freq", "ASC_freq", "LUSC_freq"])
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)  # or like {"subtype": {"LUAD": "orange", "ASC": "green", "LUSC": "blue"}}
    .plot_all_legends()
    .save("mutation_oncoplot.tiff", dpi=300)
)
```
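The metadata step above splits each sample ID into a case ID and a sample type, then joins per-case metadata onto the samples. A minimal plain-Python sketch of that idea follows. The IDs and metadata values are hypothetical; the point is that splitting once from the right keeps case IDs intact even when they contain underscores themselves.

```python
# Hypothetical sample IDs; note that "case_2_T" has an underscore in the case ID
sample_ids = ["case1_T", "case1_N", "case_2_T"]

# Split once from the right: "case_2_T" -> ("case_2", "T")
parsed = {sid: tuple(sid.rsplit("_", 1)) for sid in sample_ids}

# Hypothetical per-case metadata, keyed by case ID
case_metadata = {
    "case1": {"subtype": "LUAD", "sex": "F"},
    "case_2": {"subtype": "LUSC", "sex": "M"},
}

# Join: every sample inherits the metadata of its case
sample_metadata = {
    sid: {"case_ID": case_id, "sample_type": sample_type,
          **case_metadata.get(case_id, {})}
    for sid, (case_id, sample_type) in parsed.items()
}

for sid, meta in sample_metadata.items():
    print(sid, meta)
```

In pandas, `Series.str.rsplit("_", n=1, expand=True)` produces the same two columns directly.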

Create Numeric CNV Oncoplot
```python
# Create numeric heatmap for CNV data using method chaining
# CNV_gene_cosmic is assumed to be a previously prepared table of gene-level CNV values
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}

oncoplot = (OncoPlot(CNV_gene_cosmic)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 10),
                width_ratios=[25, 1, 0, 3])  # main heatmap, freq heatmap, flexible region, legend region
    .numeric_heatmap(yticklabels=False, cmap="coolwarm", vmin=-2, vmax=2)
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)
    .plot_all_legends()
    .save("cnv_oncoplot.tiff", dpi=600)
)
```

Create Lollipop Plot
```python
# Read MAF file
maf = MAF.read_csv(YOUR_MAF_PATH)
gene = "EGFR"  # gene name
AA_length, mutations_data = maf.get_protein_info(gene)  # get protein length and mutations data
domains_data, refseq_ID = MAF.get_domain_info(gene, AA_length)  # search for domain data matching the protein length

# Create LollipopPlot object
plot = LollipopPlot(
    protein_name=gene,
    protein_length=AA_length,
    domains=domains_data,
    mutations=mutations_data
)
plot.plot()
```
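A lollipop plot needs, for each amino-acid position, the number of mutations observed there, plus optional domain intervals for annotation. The sketch below shows one way to derive those counts in plain Python from protein-change strings in the style of the MAF `HGVSp_Short` column. The input values, the regular expression, and the domain interval are all hypothetical, and this is not how pymaftools necessarily prepares its `mutations` argument.

```python
import re
from collections import Counter

# Hypothetical protein changes for one gene
protein_changes = ["p.L858R", "p.L858R", "p.T790M", "p.G719A", "p.E746_A750del"]

# Capture the first residue number after the "p." prefix
POSITION = re.compile(r"^p\.[A-Za-z*]+?(\d+)")

def first_position(change: str):
    """Return the first amino-acid position in a protein change, or None."""
    match = POSITION.match(change)
    return int(match.group(1)) if match else None

# Mutation count per position: the height of each lollipop
positions = [first_position(change) for change in protein_changes]
counts = Counter(p for p in positions if p is not None)

# Hypothetical domain interval: (name, start, end), inclusive
domains = [("Example domain", 700, 1000)]

def domain_of(position: int):
    """Return the name of the domain containing the position, if any."""
    for name, start, end in domains:
        if start <= position <= end:
            return name
    return None

for position in sorted(counts):
    print(position, counts[position], domain_of(position))
```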

FAQ
1. How to adjust font sizes in OncoPlot?
You can adjust font sizes for different components using the following parameters:
```python
# Adjust y-axis gene name font size
oncoplot = OncoPlot(pivot_table, ytick_fontsize=12)

# Adjust font size in heatmaps
oncoplot.mutation_heatmap(ytick_fontsize=10)
oncoplot.numeric_heatmap(ytick_fontsize=8)

# Adjust annotation font size in frequency plot
oncoplot.plot_freq(annot_fontsize=10)
```
2. How to customize color mappings?
You can use the ColorManager to register and retrieve custom colors for different mutation types and categorical variables:
```python
from pymaftools.plot.ColorManager import ColorManager

# Get the color manager instance
color_manager = ColorManager()

# Register custom mutation type colors
custom_mutation_colors = {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
    "Frame_Shift_Del": "#45B7D1"
}
color_manager.register_cmap("custom_mutations", custom_mutation_colors)

# Register custom categorical colors
custom_categorical_colors = {
    "LUAD": "orange",
    "LUSC": "blue",
    "ASC": "green"
}
color_manager.register_cmap("subtype", custom_categorical_colors)

# Use registered colors in plots
mutation_cmap = color_manager.get_cmap("custom_mutations")
subtype_cmap = color_manager.get_cmap("subtype", alpha=0.7)  # You can set alpha value here

oncoplot.mutation_heatmap(cmap_dict=mutation_cmap)
oncoplot.plot_categorical_metadata(cmap_dict={"subtype": subtype_cmap})
```
3. How to control visualization parameters in similarity analysis?
```python
# Control heatmap display options
result = SimilarityMatrix.analyze_similarity(
    table=table,
    method="jaccard",
    title="Similarity Analysis",
    groups=groups,
    group_order=["Group1", "Group2", "Group3"],
    layout="horizontal",             # or "grid"
    heatmap_show_only_x_ticks=True,  # Show only x-axis labels
    heatmap_annot=False,             # Don't show numeric annotations
    save_dir="./output"
)
```
4. How to handle performance issues with large datasets?
For large datasets, consider the following strategies:
```python
# Filter before analysis
filtered_table = (pivot_table
    .filter_by_freq(0.1)  # Keep only features with frequency > 10%
    .head(100)            # Take only top 100 features
)

# Do downstream analysis on filtered_table
```
5. How to save and load analysis results?
Use SQLite format for saving both Cohort and PivotTable data:
```python
# Save PivotTable to SQLite
pivot_table.to_sqlite("analysis_results.db")

# Load PivotTable from SQLite
loaded_table = PivotTable.read_sqlite("analysis_results.db")

# Save Cohort to SQLite
cohort.to_sqlite("cohort_data.db")

# Load Cohort from SQLite
loaded_cohort = Cohort.read_sqlite("cohort_data.db")

# Save figures (supports multiple formats)
oncoplot.save("oncoplot.png", dpi=300)
oncoplot.save("oncoplot.tiff", dpi=300)
```
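For readers unfamiliar with SQLite as a storage format, the round trip above can be pictured with Python's built-in `sqlite3` module. This is only an illustrative sketch: the long-format `mutations` table and its column names are assumptions made for the example and do not describe the schema pymaftools actually writes.

```python
import sqlite3

# Hypothetical features x samples data: matrix[gene][sample] -> classification
matrix = {
    "TP53": {"case1_T": "Missense_Mutation", "case2_T": "Nonsense_Mutation"},
    "KRAS": {"case1_T": "Missense_Mutation"},
}

# Use an in-memory database here; pass a file path to persist to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mutations (gene TEXT, sample TEXT, classification TEXT)")

# Save: flatten the nested dict into one row per (gene, sample) pair
rows = [(gene, sample, classification)
        for gene, by_sample in matrix.items()
        for sample, classification in by_sample.items()]
conn.executemany("INSERT INTO mutations VALUES (?, ?, ?)", rows)
conn.commit()

# Load: rebuild the nested dict from the stored rows
loaded = {}
for gene, sample, classification in conn.execute(
        "SELECT gene, sample, classification FROM mutations"):
    loaded.setdefault(gene, {})[sample] = classification
conn.close()

print(loaded == matrix)
```

A single `.db` file can hold several tables, which is one reason SQLite suits objects that bundle a data matrix with feature and sample metadata.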
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
xu62u4u6
Owner
- Name: xu62u4u6
- Login: xu62u4u6
- Kind: user
- Location: Kaohsiung
- Repositories: 1
- Profile: https://github.com/xu62u4u6
A Kaohsiung native studying life sciences in Keelung. Loves the sea. Loves wild ideas and writing code.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below:"
title: "pymaftools: A Python Toolkit for MAF File Analysis"
authors:
- family-names: Liu
given-names: Ding-Yang
affiliation: "National Yang Ming Chiao Tung University"
orcid: https://orcid.org/0009-0003-1978-6330
date-released: 2025-07-17
version: 0.2.3
url: https://github.com/xu62u4u6/pymaftools
GitHub Events
Total
- Release event: 3
- Watch event: 5
- Push event: 77
- Create event: 5
Last Year
- Release event: 3
- Watch event: 5
- Push event: 77
- Create event: 5
Dependencies
- matplotlib ==3.9.2
- munkres ==1.1.4
- matplotlib *
- numpy *
- pandas *
- seaborn *