moni

MONI: A Pangenomic Index for Finding MEMs

https://github.com/maxrossi91/moni

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary

Keywords

matching-statistics maximal-exact-matches r-index

Repository

Basic Info
  • Host: GitHub
  • Owner: maxrossi91
  • License: gpl-3.0
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 7.78 MB
Statistics
  • Stars: 37
  • Watchers: 2
  • Forks: 9
  • Open Issues: 2
  • Releases: 4
Created almost 5 years ago · Last pushed about 1 year ago

README.md

MONI

```console
  __  __  ____  _   _ _____
 |  \/  |/ __ \| \ | |_   _|
 | \  / | |  | |  \| | | |
 | |\/| | |  | | . ` | | |
 | |  | | |__| | |\  |_| |_
 |_|  |_|\____/|_| \_|_____|
                              ver 0.2.2
A Pangenomics Index for Finding MEMs.
```

MONI index uses the prefix-free parsing of the text [2][3] to build the Burrows-Wheeler Transform (BWT) of the reference genomes, the suffix array (SA) samples at the beginning and at the end of each run of the BWT, and the threshold positions of [1].
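To make the components named above concrete, here is a minimal sketch that builds the BWT of a tiny text by sorting suffixes and then records, for each run of equal characters in the BWT, the suffix array values at the first and last position of the run. This is only an illustration: MONI itself uses prefix-free parsing rather than suffix sorting, the thresholds are not computed here, and the function names and the `$` terminator are assumptions made for this example.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_runs_with_samples(text):
    """Return the BWT of text + '$' and its runs.

    Each run is a tuple (char, run_length, sa_at_run_start, sa_at_run_end).
    The '$' terminator is assumed to be absent from the input and to sort
    before every other character.
    """
    assert "$" not in text
    text += "$"
    sa = suffix_array(text)
    # BWT[k] is the character preceding the k-th smallest suffix
    # (text[-1] wraps around to the terminator for the suffix at 0).
    bwt = "".join(text[i - 1] for i in sa)

    runs = []
    start = 0
    for k in range(1, len(bwt) + 1):
        # Close the current run at the end of the BWT or when the character changes.
        if k == len(bwt) or bwt[k] != bwt[start]:
            runs.append((bwt[start], k - start, sa[start], sa[k - 1]))
            start = k
    return bwt, runs


if __name__ == "__main__":
    bwt, runs = bwt_runs_with_samples("banana")
    print(bwt)
    for run in runs:
        print(run)
```

For `banana`, the BWT is `annb$aa`, which has five runs. Only two suffix array values per run are kept, so the sampled structure grows with the number of runs rather than with the text length.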

How to get MONI

Docker

MONI is available on docker:

```console
docker pull maxrossi91/moni:v0.2.2
docker run maxrossi91/moni:v0.2.2 moni -h
```

If using Singularity:

```console
singularity pull moni_sif docker://maxrossi91/moni:v0.2.2
./moni_sif moni --help
```

Install Packages

We provide MONI as a .deb package:

```console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2_amd64.deb
sudo dpkg -i moni_v0.2.2_amd64.deb
moni -h
```

We provide MONI as a Linux .sh installer:

```console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.sh
chmod +x moni_v0.2.2-Linux.sh
./moni_v0.2.2-Linux.sh
moni -h
```

We provide MONI as a pre-compiled .tar.gz:

```console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.tar.gz
tar -xzvf moni_v0.2.2-Linux.tar.gz
moni_v0.2.2-Linux/bin/moni -h
```

Compile and install

Install prerequisite packages

```console
apt-get update
apt-get install -y build-essential cmake git python3 zlib1g-dev
```

Download

```console
git clone https://github.com/maxrossi91/moni
```

Compile

```console
cd moni
mkdir build
cd build; cmake -DCMAKE_INSTALL_PREFIX=<path/to/install/prefix> ..
make
```

Replace <path/to/install/prefix> with your preferred install path. If not specified, the install path defaults to /usr/bin.

Install

```console
make install
```

Construction of the index:

```
usage: moni build [-h] -r REFERENCE [-w WSIZE] [-p MOD] [-t THREADS] [-k] [-v] [-f] [--moni-ms] [--spumoni]
  -h, --help                            show this help message and exit
  -r REFERENCE, --reference REFERENCE   reference file name (default: None)
  -o OUTPUT, --output OUTPUT            output directory path (default: same as reference)
  -w WSIZE, --wsize WSIZE               sliding window size (default: 10)
  -p MOD, --mod MOD                     hash modulus (default: 100)
  -t THREADS, --threads THREADS         number of helper threads (default: 0)
  -k                                    keep temporary files (default: False)
  -v                                    verbose (default: False)
  -f                                    read fasta (default: False)
  -g GRAMMAR, --grammar GRAMMAR         select the grammar [plain, shaped] (default: plain)
```
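The `-w` and `-p` options are the sliding window size and the hash modulus of prefix-free parsing [2][3]. As a rough sketch of how these two parameters interact, the code below slides a window of `w` characters over the text and ends a phrase whenever the window hash is 0 modulo `p`, with consecutive phrases overlapping by `w` characters. The hash function, the function names, and the handling of the text boundaries are simplifying assumptions for illustration; they are not taken from the MONI sources.

```python
def window_hash(window, base=256, prime=1_000_000_007):
    """Deterministic polynomial hash of a window (illustrative choice)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % prime
    return h


def prefix_free_parse(text, w=10, p=100):
    """Split text into overlapping phrases.

    A phrase ends at a window whose hash is 0 modulo p (a trigger window)
    or at the end of the text. Consecutive phrases overlap by w characters.
    """
    if len(text) <= w:
        return [text]
    phrases = []
    start = 0
    for end in range(w, len(text) + 1):
        is_trigger = window_hash(text[end - w:end]) % p == 0
        # Require more than w characters so each phrase adds new text.
        if (is_trigger and end - start > w) or end == len(text):
            phrases.append(text[start:end])
            start = end - w
    return phrases


def reassemble(phrases, w):
    """Undo the parse by dropping the w-character overlap of later phrases."""
    return phrases[0] + "".join(ph[w:] for ph in phrases[1:])


if __name__ == "__main__":
    text = "GATTACA" * 20
    phrases = prefix_free_parse(text, w=4, p=3)
    assert reassemble(phrases, 4) == text
    print(len(phrases), "phrases,", len(set(phrases)), "distinct")
```

In this model, a larger modulus makes trigger windows rarer and phrases longer on average, and repetitive inputs tend to yield few distinct phrases.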

Computing the matching statistics with MONI:

```
usage: moni ms [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS]
  -h, --help                        show this help message and exit
  -i INDEX, --index INDEX           reference index base name (default: None)
  -p PATTERN, --pattern PATTERN     the input query (default: None)
  -o OUTPUT, --output OUTPUT        output directory path (default: .)
  -t THREADS, --threads THREADS     number of helper threads (default: 1)
  -g GRAMMAR, --grammar GRAMMAR     select the grammar [plain, shaped] (default: plain)
```

Computing the MEMs with MONI:

```
usage: moni mems [-h] -i INDEX -p PATTERN [-o OUTPUT] [-e] [-s] [-t THREADS]
  -h, --help                        show this help message and exit
  -i INDEX, --index INDEX           reference index base name (default: None)
  -p PATTERN, --pattern PATTERN     the input query (default: None)
  -o OUTPUT, --output OUTPUT        output directory path (default: .)
  -e, --extended-output             output MEM occurrence in the reference (default: False)
  -s, --sam-output                  output MEM in a SAM formatted file. (default: False)
  -t THREADS, --threads THREADS     number of helper threads (default: 1)
  -g GRAMMAR, --grammar GRAMMAR     select the grammar [plain, shaped] (default: plain)
```

Computing the MEM extension with MONI and ksw2:

```
usage: moni extend [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS] [-b BATCH] [-g GRAMMAR] [-L EXTL] [-A SMATCH] [-B SMISMATCH] [-O GAPO] [-E GAPE]

optional arguments:
  -h, --help                            show this help message and exit
  -i INDEX, --index INDEX               reference index folder (default: None)
  -p PATTERN, --pattern PATTERN         the input query (default: None)
  -o OUTPUT, --output OUTPUT            output directory path (default: .)
  -t THREADS, --threads THREADS         number of helper threads (default: 1)
  -b BATCH, --batch BATCH               number of reads per thread batch (default: 100)
  -g GRAMMAR, --grammar GRAMMAR         select the grammar [plain, shaped] (default: plain)
  -L EXTL, --extl EXTL                  length of reference substring for extension (default: 100)
  -A SMATCH, --smatch SMATCH            match score value (default: 2)
  -B SMISMATCH, --smismatch SMISMATCH   mismatch penalty value (default: 4)
  -O GAPO, --gapo GAPO                  coma separated gap open penalty values (default: 4,13)
  -E GAPE, --gape GAPE                  coma separated gap extension penalty values (default: 2,1)
```
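The `-O` and `-E` options each take two comma-separated values, which is how a two-piece affine gap model is usually parameterized in ksw2: a gap of length k costs the smaller of the two affine costs. The sketch below evaluates that cost and an ungapped score with the default values listed above. It assumes that MONI passes these values to ksw2 unchanged; the function names are made up for this example.

```python
def gap_cost(length, gap_open=(4, 13), gap_extend=(2, 1)):
    """Cost of a gap of `length` under a two-piece affine model.

    Each (open, extend) pair defines one affine cost open + length * extend;
    the cheaper of the two applies.
    """
    return min(o + length * e for o, e in zip(gap_open, gap_extend))


def ungapped_score(query, target, match=2, mismatch=4):
    """Score two equal-length strings column by column, without gaps."""
    return sum(match if q == t else -mismatch for q, t in zip(query, target))


if __name__ == "__main__":
    for k in (1, 9, 10):
        print(k, gap_cost(k))
    print(ungapped_score("ACGT", "ACGA"))
```

With the defaults, the first pair is cheaper for short gaps and the second for long ones: the two costs meet at length 9 (both 22), and from length 10 onward the second pair wins.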

Example

Build the index for SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder

```console
moni build -r data/SARS-CoV2/SARS-CoV2.1k.fa.gz -o sars-cov2 -f
```

It produces three files in the current folder: sars-cov2.plain.slp, sars-cov2.thrbv.ms, and sars-cov2.idx. They contain, respectively, the grammar, the run-length encoded BWT (RLBWT) and the thresholds, and the starting position and name of each FASTA sequence in the reference file.

Compute the matching statistics of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder

```console
moni ms -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
```

It produces two output files in the current folder, reads.lengths and reads.pointers, which store the lengths and the positions of the matching statistics of the reads against the reference in a FASTA-like format.
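As a reference point for what the lengths and pointers mean, the following brute-force sketch computes matching statistics directly from the definition: for each position i of the pattern, the length of the longest prefix of pattern[i:] that occurs in the text, and one text position where it occurs. It is quadratic and only meant for tiny inputs; MONI computes the same quantities from the index instead. The function name, the 0-based positions, and the pointer value 0 for empty matches are assumptions of this example.

```python
def matching_statistics(text, pattern):
    """Brute-force matching statistics of pattern against text.

    Returns (lengths, pointers): lengths[i] is the length of the longest
    prefix of pattern[i:] occurring in text, and pointers[i] is the leftmost
    0-based text position of such an occurrence (0 if the length is 0).
    """
    lengths, pointers = [], []
    for i in range(len(pattern)):
        best_len, best_pos = 0, 0
        for j in range(len(text)):
            # Extend the match starting at pattern[i] and text[j].
            k = 0
            while (i + k < len(pattern) and j + k < len(text)
                   and pattern[i + k] == text[j + k]):
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        lengths.append(best_len)
        pointers.append(best_pos)
    return lengths, pointers


if __name__ == "__main__":
    lengths, pointers = matching_statistics("GATTACAT", "TACAG")
    print(lengths)   # [4, 3, 2, 1, 1]
    print(pointers)  # [3, 4, 5, 1, 0]
```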

Compute the MEMs of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder

```console
moni mems -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
```

It produces one output file in the current folder, reads.mems, which stores the MEMs reported as pairs of position and length in a FASTA-like format.
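MEMs can be read off the matching statistics lengths. A match starting at pattern position i is already as long as possible to the right, and it cannot be extended to the left exactly when the length at i is not smaller than the length at i - 1. The sketch below applies that rule to a list of lengths and returns (position, length) pairs with 0-based positions in the pattern. It is a minimal illustration under these assumptions, not a copy of MONI's implementation or output format.

```python
def mems_from_lengths(lengths):
    """Derive MEMs from matching statistics lengths.

    Position i starts a MEM if its match is non-empty and either i is the
    first position or lengths[i] >= lengths[i - 1]. Otherwise the match at
    i is contained in the longer match that starts at i - 1.
    """
    mems = []
    for i, length in enumerate(lengths):
        if length > 0 and (i == 0 or length >= lengths[i - 1]):
            mems.append((i, length))
    return mems


if __name__ == "__main__":
    # Lengths of "TACAG" against "GATTACAT", computed by hand.
    print(mems_from_lengths([4, 3, 2, 1, 1]))  # [(0, 4), (4, 1)]
```

In the example, the matches at positions 1 to 3 are suffixes of the match at position 0, so only `TACA` at position 0 and `G` at position 4 are reported.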

Compute the MEM extension of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder

```console
moni extend -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
```

It produces one output file in the current folder, reads.sam, which stores the MEM extensions in SAM format.

External resources

Citation

If you use this tool in an academic setting, please cite the following papers:

@article{RossiOLGB21,
author      = { Massimiliano Rossi and 
                Marco Oliva and
                Ben Langmead and
                Travis Gagie and
                Christina Boucher},
title       = {MONI: A Pangenomics Index for Finding Maximal Exact Matches},
booktitle   = {Research in Computational Molecular Biology - 25th Annual 
                International Conference, {RECOMB} 2021, Padova, Italy},
journal     = {Journal of Computational Biology},
volume      = {29},
number      = {2},
pages       = {169--187},
year        = {2022},
publisher   = {Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…}
}

Authors

Theoretical results:

  • Christina Boucher
  • Travis Gagie
  • Ben Langmead
  • Massimiliano Rossi

Implementation:

Experiments

Why "MONI"?

Moni is the Finnish word for multi.

References

[1] Hideo Bannai, Travis Gagie, and Tomohiro I, "Refining the r-index", Theoretical Computer Science, 812 (2020), pp. 96–108.

[2] Christina Boucher, Travis Gagie, Alan Kuhnle and Giovanni Manzini, "Prefix-Free Parsing for Building Big BWTs", In Proc. of the 18th International Workshop on Algorithms in Bioinformatics (WABI 2018).

[3] Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun, "Prefix-free parsing for building big BWTs", Algorithms for Molecular Biology 14, no. 1 (2019): 13.

Owner

  • Name: Massimiliano Rossi
  • Login: maxrossi91
  • Kind: user
  • Company: University of Florida

Postdoc at the University of Florida, in the Boucher Lab.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rossi"
  given-names: "Massimiliano"
  orcid: "https://orcid.org/0000-0002-3012-1394"
title: "MONI: A Pangenomic Index for Finding Maximal Exact Matches"
url: "https://github.com/maxrossi91/moni"
preferred-citation:
  type: journal-paper
  authors:
  - family-names: "Rossi"
    given-names: "Massimiliano"
    orcid: "https://orcid.org/0000-0002-3012-1394"
  - family-names: "Oliva"
    given-names: "Marco"
    orcid: "https://orcid.org/0000-0003-0525-3114"
  - family-names: "Langmead"
    given-names: "Ben"
    orcid: "https://orcid.org/0000-0003-2437-1976"
  - family-names: "Gagie"
    given-names: "Travis"
    orcid: "https://orcid.org/0000-0003-3689-327X"
  - family-names: "Boucher"
    given-names: "Christina"
    orcid: "https://orcid.org/0000-0001-9509-9725"
  doi: 10.1089/cmb.2021.0290
  journal: "Journal of Computational Biology"
  start: 169  # First page number
  end: 187 # Last page number
  title: "MONI: A Pangenomic Index for Finding Maximal Exact Matches"
  year: 2022
  volume: 29
  number: 2

GitHub Events

Total
  • Push event: 1
  • Pull request event: 2
  • Create event: 1
Last Year
  • Push event: 1
  • Pull request event: 2
  • Create event: 1