gapmm2

gapmm2: gapped alignment using minimap2 (align transcripts to genome)

https://github.com/nextgenusfs/gapmm2

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: nextgenusfs
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: main
  • Size: 82 KB
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 7
Created about 4 years ago · Last pushed 8 months ago

README.md

gapmm2: gapped alignment using minimap2

This tool is a wrapper for minimap2 that runs spliced/gapped alignment, i.e. aligning transcripts to a genome. You are probably saying that minimap2 already does this with the -x splice --cs options, and you are correct. However, there are instances where the terminal exons are missing from stock minimap2 alignments. gapmm2 detects alignments that have unaligned terminal exons and uses edlib to find the terminal exon positions. It then writes the refined coordinates to the PAF output.
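To make the refinement idea concrete, here is a deliberately simplified sketch. It is not gapmm2's implementation: it swaps edlib's approximate search for an exact substring search, and the function name `find_terminal_exon` and its parameters are hypothetical. It looks for the unaligned tail of a transcript in a window downstream of the last aligned target position, bounded by a maximum intron length (the CLI default for --max-intron is 500).

```python
from typing import Optional, Tuple


def find_terminal_exon(
    target: str, tend: int, tail: str, max_intron: int = 500
) -> Optional[Tuple[int, int]]:
    """Locate an exact copy of `tail` downstream of `tend`.

    `tend` is the 0-based, end-exclusive target position where the existing
    alignment stops. Returns a 0-based, end-exclusive (start, end) interval,
    or None if the tail is not found within the search window.
    Assumption: exact matching stands in for edlib's edit-distance search.
    """
    # The tail must fit entirely inside [tend, tend + max_intron + len(tail)).
    window_end = min(len(target), tend + max_intron + len(tail))
    idx = target.find(tail, tend, window_end)
    if idx == -1:
        return None
    return idx, idx + len(tail)


# Synthetic target: 20 aligned bases, a 54 bp "intron" (GT...AG), then the tail.
tail = "TTGACCTGA"
target = "A" * 20 + "GT" + "C" * 50 + "AG" + tail + "A" * 10

hit = find_terminal_exon(target, tend=20, tail=tail)
print(hit)  # (74, 83)

# A search space that is too small does not reach the terminal exon.
print(find_terminal_exon(target, tend=20, tail=tail, max_intron=10))  # None
```

A real refinement step also has to tolerate mismatches, which is why an edit-distance aligner such as edlib is the better fit for this job than exact matching.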

Rationale

We can pull out a gene model in GFF3 format that has a short 5' terminal exon:

scaffold_9 funannotate gene 408904 409621 . - . ID=OPO1_006919;
scaffold_9 funannotate mRNA 408904 409621 . - . ID=OPO1_006919-T1;Parent=OPO1_006919;product=hypothetical protein;
scaffold_9 funannotate exon 409609 409621 . - . ID=OPO1_006919-T1.exon1;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409320 409554 . - . ID=OPO1_006919-T1.exon2;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 409090 409255 . - . ID=OPO1_006919-T1.exon3;Parent=OPO1_006919-T1;
scaffold_9 funannotate exon 408904 409032 . - . ID=OPO1_006919-T1.exon4;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409609 409621 . - 0 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409320 409554 . - 2 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 409090 409255 . - 1 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;
scaffold_9 funannotate CDS 408904 409032 . - 0 ID=OPO1_006919-T1.cds;Parent=OPO1_006919-T1;

If we then map this transcript against the genome, we get the following PAF alignment.

$ minimap2 -x splice --cs genome.fasta cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1 543 13 543 - scaffold_9 658044 408903 409554 530 530 60 NM:i:0 ms:i:530 AS:i:466 nn:i:0 ts:A:+ tp:A:P cm:i:167 s1:i:510 s2:i:0 de:f:0 rl:i:0 cs:Z::129~ct57ac:166~ct64ac:235
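The problem is already visible in the leading PAF columns: the query is 543 bases long, but only the interval 13 to 543 is aligned. The sketch below reads those columns and reports how many query bases are left unaligned at each end. The `PafHit` type and the helper names are hypothetical and are not part of gapmm2.

```python
from typing import NamedTuple, Tuple


class PafHit(NamedTuple):
    """The leading PAF columns needed to spot unaligned query ends."""
    qname: str
    qlen: int
    qstart: int  # 0-based
    qend: int    # end-exclusive
    strand: str
    tname: str
    tstart: int  # 0-based
    tend: int    # end-exclusive


def parse_paf(line: str) -> PafHit:
    # PAF is tab-delimited; split() also copes with space-separated copies.
    f = line.split()
    return PafHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                  int(f[7]), int(f[8]))


def unaligned_query_ends(hit: PafHit) -> Tuple[int, int]:
    """Return (bases unaligned at query start, bases unaligned at query end)."""
    return hit.qstart, hit.qlen - hit.qend


line = ("OPO1_006919-T1 543 13 543 - scaffold_9 658044 408903 409554 530 530 60 "
        "NM:i:0 ms:i:530 AS:i:466 nn:i:0 ts:A:+ tp:A:P cm:i:167 s1:i:510 s2:i:0 "
        "de:f:0 rl:i:0 cs:Z::129~ct57ac:166~ct64ac:235")
hit = parse_paf(line)
print(unaligned_query_ends(hit))  # (13, 0)
```

Because this hit is on the minus strand, the 13 unaligned bases at the query start correspond to sequence past the high-coordinate end of the target interval, which is where the missing 409609-409621 exon sits.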

The cs tag that minimap2 emits with the --cs flag can be parsed into coordinates (below), and you can see that the 5' exon is missing.

```
>>> cs2coords(408903, 13, 543, '-', ':129~ct57ac:166~ct64ac:235')
([(409320, 409554), (409090, 409255), (408904, 409032)],
```

So if we run this same alignment with gapmm2 we are able to properly align the 5' terminal exon.

$ gapmm2 genome.fa cds-transcripts.fa | grep 'OPO1_006919'
OPO1_006919-T1 543 0 543 - scaffold_9 658044 408903 409621 543 543 60 tp:A:P ts:A:+ NM:i:0 cs:Z::129~ct57ac:166~ct64ac:235~ct54ac:13

```
>>> cs2coords(408903, 0, 543, '-', ':129~ct57ac:166~ct64ac:235~ct54ac:13')
([(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)]
```

Usage:

gapmm2 can be run as a command line script:

```
$ gapmm2
usage: gapmm2 [-o] [-f] [-t] [-m] [-i] [-d] [-h] [--version] reference query

gapmm2: gapped alignment with minimap2. Performs minimap2/mappy alignment with splice options and refines terminal alignments with edlib.

Positional arguments:
  reference             reference genome (FASTA)
  query                 transcripts in FASTA or FASTQ

Optional arguments:
  -o , --out            output in PAF format (default: stdout)
  -f , --out-format     output format paf,gff3
  -t , --threads        number of threads to use with minimap2 (default: 3)
  -m , --min-mapq       minimum map quality value (default: 1)
  -i , --max-intron     max intron length, controls terminal search space (default: 500)
  -d, --debug           write some debug info to stderr (default: False)

Help:
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit
```

Python API

gapmm2 can also be used as a Python module. The module provides several functions for working with spliced alignments:

aligner function

The main function for aligning transcripts to a genome. It can write an output file in either PAF or GFF3. It returns a dictionary with alignment statistics.

```python
>>> from gapmm2.align import aligner
>>> stats = aligner('genome.fa', 'transcripts.fa', out_fmt="gff3", output="output.gff3")
>>> stats
{'n': 6926, 'low-mapq': 0, 'refine-left': 409, 'refine-right': 63}
```
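The returned dictionary is easy to summarize. The sketch below works on a copy of the example values above rather than calling gapmm2, and it assumes that `n` is the total number of alignments and that the other keys count alignments in each category; the source does not spell out these meanings. The `summarize` helper is hypothetical.

```python
from typing import Dict


def summarize(stats: Dict[str, int]) -> Dict[str, float]:
    """Express each counter as a fraction of the total number of alignments."""
    total = stats["n"]
    return {
        key: (value / total if total else 0.0)
        for key, value in stats.items()
        if key != "n"
    }


# Example values copied from the aligner output shown above.
stats = {"n": 6926, "low-mapq": 0, "refine-left": 409, "refine-right": 63}

fractions = summarize(stats)
for key, frac in fractions.items():
    print(f"{key}: {frac:.1%}")
```

The two refine counters are reported separately rather than summed, since a single alignment could in principle be refined at both ends.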

cs2coords function

This function parses the cs tag (the difference string that minimap2 emits with --cs, not the CIGAR string) and converts it to genomic coordinates, identifying exons, introns, and other alignment features.

```python
>>> from gapmm2.align import cs2coords
>>> cs2coords(408903, 0, 543, '-', ':129~ct57ac:166~ct64ac:235~ct54ac:13')
([(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)],
 [(0, 13), (13, 248), (248, 414), (414, 543)],
 0,
 0,
 True)
```
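The first two elements of that result are consistent with each other, which can be checked without gapmm2. The sketch below takes the exon coordinates from the example (1-based, inclusive, in transcript order) and derives the 0-based, end-exclusive query intervals by accumulating exon lengths. It assumes an alignment with no insertions or deletions, as in this example with NM:i:0; the variable names are illustrative.

```python
from itertools import accumulate

# Genomic exon coordinates from the cs2coords example (transcript order).
exons = [(409609, 409621), (409320, 409554), (409090, 409255), (408904, 409032)]

# Inclusive coordinates, so each length is end - start + 1.
lengths = [end - start + 1 for start, end in exons]

# Running totals give the end of each exon on the query; starts are shifted by one.
ends = list(accumulate(lengths))
starts = [0] + ends[:-1]
query_intervals = list(zip(starts, ends))

print(lengths)          # [13, 235, 166, 129]
print(query_intervals)  # [(0, 13), (13, 248), (248, 414), (414, 543)]
```

The lengths sum to 543, the full query length, which confirms that the refined alignment covers the whole transcript.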

Installation

You can install gapmm2 using pip:

$ pip install gapmm2

Or you can install the latest development version directly from GitHub:

$ pip install git+https://github.com/nextgenusfs/gapmm2.git

You can also install from conda:

$ conda install -c bioconda gapmm2

Dependencies

Gapmm2 requires the following Python packages:

  • mappy (Python bindings for minimap2)
  • edlib (for sequence alignment)
  • natsort (for natural sorting)

These dependencies will be automatically installed when you install gapmm2 using pip or conda. Note that I've recently seen some segmentation faults from mappy, so as of v25.4.13 gapmm2 will run minimap2 directly instead of mappy if minimap2 is installed.

Development

Testing

Gapmm2 includes a test suite that can be run using pytest. To run the tests, first install pytest:

$ pip install pytest pytest-cov

Then run the tests from the root directory of the repository:

$ python -m pytest tests/ --cov=gapmm2

Code Formatting

This project uses pre-commit to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).

To set up pre-commit:

  1. Install pre-commit:

$ pip install pre-commit

  2. Install the git hooks:

$ pre-commit install

  3. (Optional) Run against all files:

$ pre-commit run --all-files

After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.

Owner

  • Name: Jon Palmer
  • Login: nextgenusfs
  • Kind: user
  • Location: Palo Alto, CA

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'gapmm2: gapped alignment using minimap2'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - family-names: Palmer
    given-names: Jonathan Mark
    email: nextgenusfs@gmail.com
    orcid: 'https://orcid.org/0000-0003-0929-3658'
    affiliation: Independent Researcher
repository-code: 'https://github.com/nextgenusfs/gapmm2'
keywords:
  - genome
  - alignment
  - minimap2
  - gapped alignment
  - alignment refinement
license: BSD-2-Clause
version: 25.8.12
date-released: '2025-08-16'

GitHub Events

Total
  • Create event: 3
  • Issues event: 2
  • Release event: 3
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 10
Last Year
  • Create event: 3
  • Issues event: 2
  • Release event: 3
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 10

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 8
  • Total Committers: 2
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.25
Top Committers
Name Email Commits
Jon Palmer n****s@g****m 6
Jon Palmer n****s@g****m 2

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 2
  • Total pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 2
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nextgenusfs (1)
  • martinmau1 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 110 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
pypi.org: gapmm2

gapmm2: gapped alignment using minimap2

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 110 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.8%
Average: 23.4%
Stargazers count: 25.0%
Forks count: 29.8%
Downloads: 30.6%
Maintainers (1)
Last synced: 8 months ago

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • codecov/codecov-action v5 composite
pyproject.toml pypi
  • edlib *
  • mappy *
  • natsort *