msianalyzer
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: NagelLabHub
- License: mit
- Language: Python
- Default Branch: main
- Size: 3.79 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MSIanalyzer
MSIanalyzer is a flexible pipeline for high-resolution analysis and visualization of Microsatellite Instability (MSI) and tandem repeat regions from sequencing reads.
Key Features
- Universal Repeat Region Analysis – Supports user-defined primers targeting any repetitive motif region.
- Per-Read Repeat Count Quantification – High-resolution repeat unit counting for accurate MSI profiling.
- Detailed Indel Characterization – Precise detection and annotation of interruptions in repeat sequences.
- Customized Pileup Visualizations – Clear visual summaries of coverage, repeats, indels, and genomic context.
- Optimized for Nanopore Data – Designed to tolerate higher error rates and fully leverage ONT long-read sequencing.
- Cluster-Aware Statistical Analysis – Incorporates read clustering to enhance detection of sample-level differences.
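To make the per-read repeat counting idea concrete, here is a minimal sketch of counting repeat units in a single read while tolerating short interruptions. It is an illustration only and not MSIanalyzer's actual implementation: the function name `count_repeat_units`, the `max_gap` parameter, and the toy reads are all assumptions made for this example.

```python
import re
from collections import Counter


def count_repeat_units(read: str, motif: str, max_gap: int = 1) -> tuple[int, int]:
    """Return (repeat_units, interruptions) for the main motif tract in a read.

    Hypothetical helper for illustration. The longest perfect run of the
    motif is used as an anchor, then the tract is extended left and right
    across neighbouring runs separated by at most `max_gap` non-motif bases.
    """
    # All perfect runs of the motif, as (start, end) coordinates.
    pattern = f"(?:{re.escape(motif)})+"
    runs = [(m.start(), m.end()) for m in re.finditer(pattern, read)]
    if not runs:
        return 0, 0

    # Anchor: index of the longest perfect run (first one wins on ties).
    anchor = max(range(len(runs)), key=lambda i: runs[i][1] - runs[i][0])

    # Extend while the gap to the neighbouring run is small enough.
    left = right = anchor
    while left > 0 and runs[left][0] - runs[left - 1][1] <= max_gap:
        left -= 1
    while right < len(runs) - 1 and runs[right + 1][0] - runs[right][1] <= max_gap:
        right += 1

    units = sum((end - start) // len(motif) for start, end in runs[left:right + 1])
    interruptions = right - left  # one interruption between each pair of joined runs
    return units, interruptions


# Toy reads (made up): one interrupted and one uninterrupted poly-A tract.
reads = ["TTGC" + "AAAAA" + "G" + "AAAAA" + "CGT", "TTGC" + "A" * 10 + "CGT"]
profile = Counter(count_repeat_units(read, "A")[0] for read in reads)
print(profile)
```

Collecting the per-read counts into a `Counter` gives a read-level repeat-length distribution for a sample, which is the kind of profile that between-sample comparisons would operate on.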
Installation
```bash
pip install git+https://github.com/NagelLabHub/MSIanalyzer.git
```
Quick Usage Example
A ready-to-run Google Colab notebook is available that demonstrates an example run of MSIanalyzer using the built-in example data.
To run MSIanalyzer via the command line, use the following examples from the example folder:
```bash
# Run analysis on a single marker (without or with sample comparison)
msianalyzer run-marker BAT25 example_marker.json
msianalyzer run-marker BAT25 example_marker.json --run-tests

# Run batch analysis on all markers in the JSON (use '--threads' for parallel processing)
msianalyzer run-batch example_marker.json

# Generate pileup visualization for FASTQ file(s)
msianalyzer pileup fastqexample/BVSBWG3500x.fastq hg38/chr4BAT25.fa
```
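If you prefer to drive the command line from Python, for example to loop over a handful of markers, the sketch below builds the same `run-marker` invocations shown above. The helper names (`run_marker_command`, `run_markers`) and the `dry_run` switch are hypothetical, and `BAT26` is only an assumed marker name: it must match an entry in your own marker JSON.

```python
import shlex
import subprocess


def run_marker_command(marker, config_path, run_tests=False):
    """Build the argument list for `msianalyzer run-marker` (hypothetical helper)."""
    cmd = ["msianalyzer", "run-marker", marker, config_path]
    if run_tests:
        cmd.append("--run-tests")  # flag taken from the README example
    return cmd


def run_markers(markers, config_path, run_tests=False, dry_run=True):
    """Print (dry run) or execute one run-marker command per marker."""
    commands = [run_marker_command(m, config_path, run_tests) for m in markers]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


# "BAT26" is an assumption; use the marker names defined in your JSON file.
commands = run_markers(["BAT25", "BAT26"], "example_marker.json", run_tests=True)
```

With `dry_run=True` nothing is executed, so the commands can be checked before a real run. For many markers, the built-in `run-batch` subcommand is likely the simpler route.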
Citation
If you use this software, please cite:
Ting Zhai, Daniel J. Laverty, Zachary D. Nagel (2025). MSIanalyzer: Targeted Nanopore Sequencing Enables Single Nucleotide Resolution Analysis of Microsatellite Instability Diversity. bioRxiv 2025.06.26.661510. https://doi.org/10.1101/2025.06.26.661510
Owner
- Name: Nagel Lab
- Login: NagelLabHub
- Kind: organization
- Repositories: 1
- Profile: https://github.com/NagelLabHub
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata below."
title: "MSIanalyzer: Targeted Nanopore Sequencing Enables Single Nucleotide Resolution Analysis of Microsatellite Instability Diversity"
authors:
  - family-names: Zhai
    given-names: Ting
    email: tingzhai@hsph.harvard.edu
    affiliation: Harvard T.H. Chan School of Public Health
  - family-names: Laverty
    given-names: Daniel J.
    affiliation: Harvard T.H. Chan School of Public Health
  - family-names: Nagel
    given-names: Zachary D.
    affiliation: Harvard T.H. Chan School of Public Health
abstract: "We present a targeted sequencing-based pipeline that profiles microsatellite instability (MSI) at single-nucleotide resolution. Targeted amplicons from the five widely studied Bethesda panel microsatellite loci were sequenced using Oxford Nanopore Technology in two microsatellite unstable colorectal cancer cell lines (HCT15, HCT116), two microsatellite stable cancer cell lines (TK6, U2OS), and two peripheral blood mononuclear cell samples from healthy donors. An anchor-extension algorithm was developed to capture repeat motifs while allowing interruptions, using a threshold informed by platform-specific error. Cluster-aware Dirichlet-multinomial and beta-binomial tests were applied for between-sample comparisons while accounting for read-level clustering within samples. The algorithm revealed distinct repeat profiles in HCT15 and HCT116 compared to other cell types and uncovered allelic diversity across samples at different MSI loci. Our approach complements existing short tandem repeat callers by preserving read-level diversity and delivering targeted, quantitative MSI calls with potential applications in mechanistic research and clinical assay development."
keywords:
  - microsatellite instability
  - nanopore sequencing
  - single nucleotide resolution
  - MSI
  - colorectal cancer
  - bioinformatics
type: software
date-released: 2025-06-28
version: "1.0.0"
license: MIT
repository-code: https://github.com/NagelLabHub/msianalyzer
url: https://www.biorxiv.org/content/10.1101/2025.06.26.661510v1
doi: 10.1101/2025.06.26.661510
GitHub Events
Total
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Public event: 2
- Push event: 7
Last Year
- Issues event: 1
- Watch event: 1
- Issue comment event: 1
- Public event: 2
- Push event: 7
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- handoko12u (1)
Dependencies
- adjustText >=1.0
- biopython >=1.83
- joblib >=1.4
- matplotlib >=3.9
- numpy >=2.0
- pandas >=2.2
- pyarrow >=14
- pysam >=0.22
- python-Levenshtein >=0.25
- scikit-bio >=0.6.3
- scikit-learn >=1.5
- scipy >=1.13
- seaborn >=0.13
- statsmodels >=0.14
- typer >=0.9
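To check whether an existing environment already satisfies these minimum versions, a small standard-library script along the following lines can help. This is a sketch rather than part of MSIanalyzer: the helper names are hypothetical, the version comparison only looks at leading numeric components (so it is cruder than a full PEP 440 comparison), and only a few of the listed dependencies are included for brevity.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Subset of the minimum versions listed above; extend as needed.
MINIMUMS = {
    "numpy": "2.0",
    "pandas": "2.2",
    "pysam": "0.22",
    "scikit-bio": "0.6.3",
    "typer": "0.9",
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'missing', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


for package, status in check_dependencies().items():
    print(f"{package}: {status}")
```

In practice `pip` resolves these constraints during installation, so a check like this is mainly useful when diagnosing a shared or pre-built environment.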