gapmm2
gapmm2: gapped alignment using minimap2 (align transcripts to genome)
Repository
gapmm2: gapped alignment using minimap2 (align transcripts to genome)
Basic Info
- Host: GitHub
- Owner: nextgenusfs
- License: bsd-2-clause
- Language: Python
- Default Branch: main
- Size: 82 KB
Statistics
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 7
Metadata Files
README.md
gapmm2: gapped alignment using minimap2
This tool is a wrapper for minimap2 that runs spliced/gapped alignment, i.e., aligning transcripts to a genome. You are probably thinking that minimap2 already does this with the -x splice --cs options, and you are correct. However, there are instances where the terminal exons are missing from stock minimap2 alignments. This tool detects alignments that have unaligned terminal exons and uses edlib to find the terminal exon positions. It then updates the PAF output with the corrected information.
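To make the idea concrete, here is a minimal sketch of a terminal exon search. It is not gapmm2's implementation: it swaps edlib for an exact `str.rfind` so that it runs without dependencies, and the function name `find_left_terminal_exon`, its parameters, and the toy genome are all assumptions made for illustration. The search window is bounded by a maximum intron length, which is one plausible way to keep the search local.

```python
def find_left_terminal_exon(genome, aln_start, unaligned, max_intron=500):
    """Locate an unaligned query prefix upstream of an aligned block.

    genome     -- reference sequence (plus strand) as a string
    aln_start  -- 0-based reference start of the existing alignment
    unaligned  -- query bases left unaligned at the terminal end
    max_intron -- how far upstream to search (hypothetical default)
    Returns a 0-based half-open (start, end) tuple, or None if not found.
    """
    window_start = max(0, aln_start - max_intron - len(unaligned))
    window = genome[window_start:aln_start]
    # rfind prefers the hit closest to the aligned block. An edit-distance
    # aligner such as edlib would also tolerate mismatches; this does not.
    hit = window.rfind(unaligned)
    if hit == -1:
        return None
    start = window_start + hit
    return (start, start + len(unaligned))


# Toy reference: 20 bp flank, 13 bp exon, 54 bp intron (GT...AG), aligned block.
exon = "ACGTTGCAAGCTA"
genome = "T" * 20 + exon + "GT" + "C" * 50 + "AG" + "GATTACAGATTACA"

print(find_left_terminal_exon(genome, 87, exon))                 # (20, 33)
print(find_left_terminal_exon(genome, 87, exon, max_intron=10))  # None
```

The second call shows why the window size matters: with a 10 bp limit the 54 bp intron puts the exon out of reach.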
Rationale
We can pull out a gene model in GFF3 format that has a short 5' terminal exon:
```
scaffold_9 funannotate gene 408904 409621 . - . ID=OPO1_006919;
scaffold_9 funannotate mRNA 408904 409621 . - . ID=OPO1_006919-T1;Parent=OPO1_006919;product=hypothetical protein;
scaffold_9 funannotate exon 409609 409621 . - . ID=OPO1_006919-T1.exon1;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409320 409554 . - . ID=OPO1_006919-T1.exon2;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409090 409255 . - . ID=OPO1_006919-T1.exon3;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 408904 409032 . - . ID=OPO1_006919-T1.exon4;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409609 409621 . - 0 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409320 409554 . - 2 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409090 409255 . - 1 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 408904 409032 . - 0 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
```
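As a quick check on how short that terminal exon is, the exon lengths can be computed from the GFF3 records. This is only an illustrative sketch: the helper name `exon_lengths` is made up here, and it splits on whitespace because the excerpt is space-separated, whereas real GFF3 files are tab-delimited.

```python
GFF3_EXONS = """\
scaffold_9 funannotate exon 409609 409621 . - . ID=OPO1_006919-T1.exon1;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409320 409554 . - . ID=OPO1_006919-T1.exon2;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409090 409255 . - . ID=OPO1_006919-T1.exon3;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 408904 409032 . - . ID=OPO1_006919-T1.exon4;Parent=OPO1_006919-T1;
"""


def exon_lengths(gff3_text):
    """Return the length of every exon feature, in file order."""
    lengths = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split()  # whitespace split suits this excerpt only
        if cols[2] == "exon":
            start, end = int(cols[3]), int(cols[4])
            lengths.append(end - start + 1)  # GFF3 is 1-based, inclusive
    return lengths


lengths = exon_lengths(GFF3_EXONS)
print(lengths)       # [13, 235, 166, 129]
print(sum(lengths))  # 543
```

The first exon is only 13 bp, and the four exons together account for the full 543 bp transcript.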
If we then map this transcript against the genome, we get the following PAF alignment.
```
$ minimap2 -x splice --cs genome.fasta cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1 543 13 543 - scaffold_9 658044 408903 409554 530 530 60 NM:i:0 ms:i:530 AS:i:466 nn:i:0 ts:A:+ tp:A:P cm:i:167 s1:i:510 s2:i:0 de:f:0 rl:i:0 cs:Z::129~ct57ac:166~ct64ac:235
```
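The PAF record above already hints at the problem: the query is 543 bp long, but the alignment only starts at query position 13. A small sketch of reading those columns follows. The names `PafHit`, `parse_paf_line`, and `unaligned_ends` are hypothetical helpers, not part of gapmm2, and the whitespace split is a convenience for this excerpt (PAF is tab-delimited).

```python
from typing import NamedTuple


class PafHit(NamedTuple):
    qname: str
    qlen: int
    qstart: int
    qend: int
    strand: str
    tname: str
    tstart: int
    tend: int
    tags: dict


def parse_paf_line(line):
    """Parse the fixed PAF columns plus the optional TAG:TYPE:VALUE fields."""
    cols = line.split()
    tags = {}
    for field in cols[12:]:
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return PafHit(cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4],
                  cols[5], int(cols[7]), int(cols[8]), tags)


def unaligned_ends(hit):
    """Number of query bases left unaligned at the start and end."""
    return hit.qstart, hit.qlen - hit.qend


line = ("OPO1_006919-T1 543 13 543 - scaffold_9 658044 408903 409554 530 530 60 "
        "NM:i:0 ms:i:530 AS:i:466 nn:i:0 ts:A:+ tp:A:P cm:i:167 s1:i:510 s2:i:0 "
        "de:f:0 rl:i:0 cs:Z::129~ct57ac:166~ct64ac:235")
hit = parse_paf_line(line)
print(unaligned_ends(hit))  # (13, 0)
print(hit.tags["cs"])       # :129~ct57ac:166~ct64ac:235
```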
The --cs flag in minimap2 can be used to parse the coordinates (below) and you can see we are missing the 5' exon.
```
cs2coords(408903, 13, 543, '-', ':129~ct57ac:166~ct64ac:235')
([(409320, 409554), (409090, 409255), (408904, 409032)],
```
So if we run this same alignment with gapmm2 we are able to properly align the 5' terminal exon.
```
$ gapmm2 genome.fa cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1 543 0 543 - scaffold_9 658044 408903 409621 543 543 60 tp:A:P ts:A:+ NM:i:0 cs:Z::129~ct57ac:166~ct64ac:235~ct54ac:13
```
```
cs2coords(408903, 0, 543, '-', ':129~ct57ac:166~ct64ac:235~ct54ac:13')
([(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)]
```
Usage:
gapmm2 can be run as a command line script:
```
$ gapmm2
usage: gapmm2 [-o] [-f] [-t] [-m] [-i] [-d] [-h] [--version] reference query

gapmm2: gapped alignment with minimap2. Performs minimap2/mappy alignment with splice options and refines terminal alignments with edlib.

Positional arguments:
  reference            reference genome (FASTA)
  query                transcipts in FASTA or FASTQ

Optional arguments:
  -o , --out           output in PAF format (default: stdout)
  -f , --out-format    output format paf,gff3
  -t , --threads       number of threads to use with minimap2 (default: 3)
  -m , --min-mapq      minimum map quality value (default: 1)
  -i , --max-intron    max intron length, controls terminal search space (default: 500)
  -d, --debug          write some debug info to stderr (default: False)

Help:
  -h, --help           Show this help message and exit
  --version            Show program's version number and exit
```
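If you want to drive the command line tool from a script, one option is to assemble the argument list in Python. The sketch below uses only the flags listed in the help text above; the helper names `build_gapmm2_cmd` and `run_gapmm2` and the file names are assumptions for illustration, and actually running the command requires gapmm2 on your `PATH`.

```python
import subprocess


def build_gapmm2_cmd(reference, query, out=None, out_format=None,
                     threads=None, min_mapq=None, max_intron=None):
    """Build an argv list for gapmm2; options left as None are omitted."""
    cmd = ["gapmm2"]
    options = [("-o", out), ("-f", out_format), ("-t", threads),
               ("-m", min_mapq), ("-i", max_intron)]
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd + [reference, query]


def run_gapmm2(cmd):
    """Execute the command; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


cmd = build_gapmm2_cmd("genome.fa", "cds-transcripts.fa",
                       out="aln.gff3", out_format="gff3", threads=4)
print(" ".join(cmd))  # gapmm2 -o aln.gff3 -f gff3 -t 4 genome.fa cds-transcripts.fa
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.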
Python API
gapmm2 can also be used as a Python module. The module provides several functions for working with spliced alignments:
aligner function
The main function for aligning transcripts to a genome. It can write an output file in either PAF or GFF3. It returns a dictionary with alignment statistics.
```python
from gapmm2.align import aligner

# Align transcripts to the genome and write the result as GFF3
stats = aligner('genome.fa', 'transcripts.fa', out_fmt="gff3", output="output.gff3")
print(stats)
# {'n': 6926, 'low-mapq': 0, 'refine-left': 409, 'refine-right': 63}
```
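The returned dictionary is easy to turn into a short report. The sketch below assumes that `n` is the total number of alignments and that the other keys are counts out of that total; that reading is an interpretation of the example output, not something documented here, and `summarize` is a hypothetical helper.

```python
def summarize(stats):
    """Format each counter as a count and a percentage of stats['n']."""
    n = stats["n"]
    lines = []
    for key in ("low-mapq", "refine-left", "refine-right"):
        count = stats.get(key, 0)
        pct = 100.0 * count / n if n else 0.0
        lines.append(f"{key}: {count} ({pct:.2f}%)")
    return lines


# Values copied from the example output above
stats = {'n': 6926, 'low-mapq': 0, 'refine-left': 409, 'refine-right': 63}
for line in summarize(stats):
    print(line)
```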
cs2coords function
This function parses the cs tag from minimap2 (the difference string, which is distinct from the CIGAR string) and converts it to genomic coordinates, identifying exons, introns, and other alignment features.
```python
from gapmm2.align import cs2coords

# Arguments: reference start, query start, query length, strand, cs string
result = cs2coords(408903, 0, 543, '-', ':129~ct57ac:166~ct64ac:235~ct54ac:13')
print(result)
# ([(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)], [(0, 13), (13, 248), (248, 414), (414, 543)], 0, 0, True)
```
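To show how exon coordinates can fall out of a cs string, here is a simplified, standalone sketch. It is not the `cs2coords` implementation and returns only the exon list: the function name `cs_to_exons` and its reduced signature are assumptions. It walks the cs operations along the reference, closes an exon at every intron (`~`) operation, and reverses the list for minus-strand alignments so the exons come out in transcript order.

```python
import re

# cs operations: :N match, =SEQ match (long form), *xy substitution,
# +seq insertion, -seq deletion, ~xxNyy intron
CS_OP = re.compile(r"(:[0-9]+|=[A-Za-z]+|\*[a-z]{2}|\+[a-z]+|-[a-z]+|~[a-z]{2}[0-9]+[a-z]{2})")


def cs_to_exons(ref_start, strand, cs):
    """Return 1-based inclusive exon coordinates from a cs string.

    ref_start is the 0-based reference start from the PAF record.
    """
    exons = []
    pos = ref_start
    exon_start = ref_start
    for op in CS_OP.findall(cs):
        kind = op[0]
        if kind == ":":
            pos += int(op[1:])
        elif kind in "=-":
            pos += len(op) - 1       # consumes reference bases
        elif kind == "*":
            pos += 1                 # a single substituted base
        elif kind == "~":
            exons.append((exon_start + 1, pos))
            pos += int(op[3:-2])     # skip the intron
            exon_start = pos
        # "+" (insertion) consumes query bases only
    exons.append((exon_start + 1, pos))
    if strand == "-":
        exons.reverse()
    return exons


print(cs_to_exons(408903, "-", ":129~ct57ac:166~ct64ac:235~ct54ac:13"))
# [(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)]
```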
Installation
You can install gapmm2 using pip:
```bash
pip install gapmm2
```
Or you can install the latest development version directly from GitHub:
```bash
pip install git+https://github.com/nextgenusfs/gapmm2.git
```
You can also install from conda:
```bash
conda install -c bioconda gapmm2
```
Dependencies
Gapmm2 requires the following Python packages:
- mappy (Python bindings for minimap2)
- edlib (for sequence alignment)
- natsort (for natural sorting)
These dependencies will be automatically installed when you install gapmm2 using pip or conda. Note that I've recently seen some segmentation faults from mappy, so as of v25.4.13 gapmm2 runs minimap2 directly instead of mappy if minimap2 is installed.
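If you want to check which of the two would be available on your system, a sketch like the following works with the standard library alone. It only illustrates the fallback described above and is not gapmm2's own detection logic; `pick_backend` is a hypothetical helper.

```python
import importlib.util
import shutil


def pick_backend(which=shutil.which, find_spec=importlib.util.find_spec):
    """Prefer the minimap2 executable, then the mappy module, else None."""
    if which("minimap2"):
        return "minimap2"
    if find_spec("mappy") is not None:
        return "mappy"
    return None


print(pick_backend())
```

The lookup functions are passed in as parameters so the behaviour can be exercised without installing either tool.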
Development
Testing
Gapmm2 includes a test suite that can be run using pytest. To run the tests, first install pytest:
```bash
pip install pytest pytest-cov
```
Then run the tests from the root directory of the repository:
```bash
python -m pytest tests/ --cov=gapmm2
```
Code Formatting
This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).
To set up pre-commit:
- Install pre-commit:

```bash
pip install pre-commit
```

- Install the git hooks:

```bash
pre-commit install
```

- (Optional) Run against all files:

```bash
pre-commit run --all-files
```
After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.
Owner
- Name: Jon Palmer
- Login: nextgenusfs
- Kind: user
- Location: Palo Alto, CA
- Repositories: 9
- Profile: https://github.com/nextgenusfs
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'gapmm2: gapped alignment using minimap2'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - family-names: Palmer
    given-names: Jonathan Mark
    email: nextgenusfs@gmail.com
    orcid: 'https://orcid.org/0000-0003-0929-3658'
    affiliation: Independent Researcher
repository-code: 'https://github.com/nextgenusfs/gapmm2'
keywords:
  - genome
  - alignment
  - minimap2
  - gapped alignment
  - alignment refinement
license: BSD-2-Clause
version: 25.8.12
date-released: '2025-08-16'
Packages
- Total packages: 1
- Total downloads: pypi, 110 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 7
- Total maintainers: 1
pypi.org: gapmm2
gapmm2: gapped alignment using minimap2
- Homepage: https://github.com/nextgenusfs/gapmm2
- Documentation: https://gapmm2.readthedocs.io/
- License: BSD-2-Clause
- Latest release: 25.8.12 (published 8 months ago)
Dependencies
- actions/checkout v4 composite
- actions/download-artifact v4 composite
- actions/setup-python v5 composite
- actions/upload-artifact v4 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- codecov/codecov-action v5 composite
- edlib *
- mappy *
- natsort *