MONI: A Pangenomic Index for Finding MEMs
MONI
console
 __  __  ____  _   _ _____
|  \/  |/ __ \| \ | |_   _|
| \  / | |  | |  \| | | |
| |\/| | |  | | . ` | | |
| |  | | |__| | |\  |_| |_
|_|  |_|\____/|_| \_|_____|
ver 0.2.2
A Pangenomics Index for Finding MEMs.
The MONI index uses prefix-free parsing of the text [2][3] to build the Burrows-Wheeler Transform (BWT) of the reference genomes, the suffix array (SA) samples at the beginning and at the end of each run of the BWT, and the threshold positions of [1].
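To make the run-based components concrete, here is a minimal Python sketch on a toy text. It is not MONI's construction (which relies on prefix-free parsing and scales to large collections): it sorts suffixes naively, and the function names are hypothetical. The sample stored here is the suffix array value at each run boundary, which is an assumption about the convention. Thresholds are not computed.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_with_run_samples(text):
    """Return the BWT of text + '$' and one tuple per BWT run:
    (run character, SA value at run start, SA value at run end)."""
    t = text + "$"  # '$' sorts before A, C, G, T in ASCII
    sa = suffix_array(t)
    # BWT[k] is the character preceding suffix SA[k]; t[-1] is '$' when SA[k] == 0.
    bwt = "".join(t[i - 1] for i in sa)
    runs = []
    start = 0
    for k in range(1, len(bwt) + 1):
        # Close the current run at the end of the BWT or when the character changes.
        if k == len(bwt) or bwt[k] != bwt[start]:
            runs.append((bwt[start], sa[start], sa[k - 1]))
            start = k
    return bwt, runs


bwt, runs = bwt_with_run_samples("ATATAT")
print(bwt)   # TTT$AAA
print(runs)  # [('T', 6, 2), ('$', 0, 0), ('A', 5, 1)]
```

The repetitive toy text yields only three runs for seven BWT characters, which is why storing samples per run rather than per position pays off on collections of similar genomes.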
How to get MONI
Docker
MONI is available on Docker:
console
docker pull maxrossi91/moni:v0.2.2
docker run maxrossi91/moni:v0.2.2 moni -h
If using Singularity:
console
singularity pull moni_sif docker://maxrossi91/moni:v0.2.2
./moni_sif moni --help
Install Packages
We provide MONI as a .deb package:
console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2_amd64.deb
sudo dpkg -i moni_v0.2.2_amd64.deb
moni -h
We provide MONI as a Linux .sh installer:
console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.sh
chmod +x moni_v0.2.2-Linux.sh
./moni_v0.2.2-Linux.sh
moni -h
We provide MONI as a pre-compiled .tar.gz archive:
console
wget https://github.com/maxrossi91/moni/releases/download/v0.2.2/moni_v0.2.2-Linux.tar.gz
tar -xzvf moni_v0.2.2-Linux.tar.gz
moni_v0.2.2-Linux/bin/moni -h
Compile and install
Install prerequisite packages
console
apt-get update
apt-get install -y build-essential cmake git python3 zlib1g-dev
Download
console
git clone https://github.com/maxrossi91/moni
Compile
console
cd moni
mkdir build
cd build; cmake -DCMAKE_INSTALL_PREFIX=<path/to/install/prefix> ..
make
Replace <path/to/install/prefix> with your preferred install path. If not specified, the install path is /usr/bin by default.
Install
console
make install
Construction of the index:
```
usage: moni build [-h] -r REFERENCE [-w WSIZE] [-p MOD] [-t THREADS] [-k] [-v] [-f] [--moni-ms] [--spumoni]
-h, --help show this help message and exit
-r REFERENCE, --reference REFERENCE
reference file name (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: same as reference)
-w WSIZE, --wsize WSIZE
sliding window size (default: 10)
-p MOD, --mod MOD
hash modulus (default: 100)
-t THREADS, --threads THREADS
number of helper threads (default: 0)
-k keep temporary files (default: False)
-v verbose (default: False)
-f read fasta (default: False)
-g GRAMMAR, --grammar GRAMMAR
select the grammar plain, shaped
```
Computing the matching statistics with MONI:
usage: moni ms [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS]
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index base name (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
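As an illustration of what `moni ms` computes, the following Python sketch derives matching statistics by brute force: for each pattern position, the length of the longest prefix of the remaining pattern that occurs in the reference, plus one position where it occurs. This is only a definition-level sketch with a hypothetical function name. MONI obtains the same lengths from the index without scanning the text, and the pointer it reports for a given position may differ from the leftmost occurrence chosen here.

```python
def matching_statistics(text, pattern):
    """Brute-force matching statistics of pattern against text.

    Returns (lengths, pointers). lengths[i] is the length of the longest
    prefix of pattern[i:] that occurs in text; pointers[i] is the leftmost
    text position of such an occurrence, or -1 when lengths[i] is 0.
    """
    lengths, pointers = [], []
    for i in range(len(pattern)):
        best_len, best_pos = 0, -1
        length = 1
        while i + length <= len(pattern):
            pos = text.find(pattern[i:i + length])
            if pos < 0:
                break  # a longer prefix cannot occur either
            best_len, best_pos = length, pos
            length += 1
        lengths.append(best_len)
        pointers.append(best_pos)
    return lengths, pointers


lengths, pointers = matching_statistics("GATTACA", "TTAGAT")
print(lengths)   # [3, 2, 1, 3, 2, 1]
print(pointers)  # [2, 3, 1, 0, 1, 2]
```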
Computing the MEMs with MONI:
usage: moni mems [-h] -i INDEX -p PATTERN [-o OUTPUT] [-e] [-s] [-t THREADS]
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index base name (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-e, --extended-output
output MEM occurrence in the reference (default: False)
-s, --sam-output
output MEM in a SAM formatted file. (default: False)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-g GRAMMAR, --grammar GRAMMAR
select the grammar [plain, shaped] (default: plain)
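MEMs can be read off the matching statistics lengths. The sketch below assumes the standard characterization: a match starting at pattern position i with length lengths[i] is already right-maximal, and it is left-maximal exactly when i is 0 or lengths[i] >= lengths[i-1]. The function name is hypothetical, and whether MONI uses this exact rule internally is an assumption.

```python
def mems_from_lengths(lengths):
    """Return MEMs as (pattern position, length) pairs.

    lengths[i] is the matching statistics length at pattern position i.
    Since lengths[i] >= lengths[i-1] - 1 always holds, a drop of exactly one
    means the match at i is the tail of the match at i-1, so it is skipped.
    """
    mems = []
    for i, length in enumerate(lengths):
        if length > 0 and (i == 0 or length >= lengths[i - 1]):
            mems.append((i, length))
    return mems


# Lengths of the pattern TTAGAT against the text GATTACA (computed by hand).
print(mems_from_lengths([3, 2, 1, 3, 2, 1]))  # [(0, 3), (3, 3)]
```

In the example, the two MEMs are TTA at pattern position 0 and GAT at pattern position 3.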
Computing the MEM extension with MONI and ksw2:
```
usage: moni extend [-h] -i INDEX -p PATTERN [-o OUTPUT] [-t THREADS] [-b BATCH] [-g GRAMMAR] [-L EXTL] [-A SMATCH] [-B SMISMATCH] [-O GAPO] [-E GAPE]
optional arguments:
-h, --help show this help message and exit
-i INDEX, --index INDEX
reference index folder (default: None)
-p PATTERN, --pattern PATTERN
the input query (default: None)
-o OUTPUT, --output OUTPUT
output directory path (default: .)
-t THREADS, --threads THREADS
number of helper threads (default: 1)
-b BATCH, --batch BATCH
number of reads per thread batch (default: 100)
-g GRAMMAR, --grammar GRAMMAR
select the grammar plain, shaped
-L EXTL, --extl EXTL
length of reference substring for extension (default: 100)
-A SMATCH, --smatch SMATCH
match score value (default: 2)
-B SMISMATCH, --smismatch SMISMATCH
mismatch penalty value (default: 4)
-O GAPO, --gapo GAPO
coma separated gap open penalty values (default: 4,13)
-E GAPE, --gape GAPE
coma separated gap extension penalty values (default: 2,1)
```
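The usage text does not say how the two comma-separated gap values are combined. The sketch below assumes the dual affine convention documented for minimap2, which is built on ksw2: a gap of length k costs min{O1 + k*E1, O2 + k*E2}. The function name is hypothetical and the defaults are copied from the `-O` and `-E` options above.

```python
def dual_affine_gap_cost(k, gapo=(4, 13), gape=(2, 1)):
    """Cost of a gap of length k under an assumed dual affine model.

    gapo and gape mirror the -O and -E defaults. Each (open, extend) pair
    defines one affine cost; the cheaper of the two applies.
    """
    if k <= 0:
        return 0
    return min(o + k * e for o, e in zip(gapo, gape))


for k in (1, 9, 20):
    print(k, dual_affine_gap_cost(k))
# 1 6
# 9 22
# 20 33
```

Under this assumption the first pair (4, 2) is cheaper for short gaps and the second pair (13, 1) takes over for gaps longer than 9, so long gaps are penalized less steeply.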
Example
Build the index for SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder
console
moni build -r data/SARS-CoV2/SARS-CoV2.1k.fa.gz -o sars-cov2 -f
It produces three files in the current folder: sars-cov2.plain.slp, sars-cov2.thrbv.ms, and sars-cov2.idx. They contain, respectively, the grammar; the run-length encoded BWT (RLBWT) and the thresholds; and the starting position and name of each FASTA sequence in the reference file.
Compute the matching statistics of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder
console
moni ms -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces two output files reads.lengths and reads.pointers in the current folder which store the lengths and the positions of the matching statistics of the reads against the reference in a fasta-like format.
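A small Python sketch for loading such files follows. The exact layout is an assumption: each record is taken to be a `>` header line carrying the read name, followed by one or more lines of whitespace-separated integers. The function name and the sample records are hypothetical, so check a real output file before relying on this.

```python
def parse_fasta_like_ints(lines):
    """Parse fasta-like records of integers into {read name: [values]}.

    Assumed layout: a '>' header line, then whitespace-separated integers
    on the following line(s) until the next header.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].extend(int(token) for token in line.split())
    return records


# Hypothetical content in the assumed layout.
sample = [">read_1", "3 2 1 3 2 1", ">read_2", "0 5 4"]
print(parse_fasta_like_ints(sample))
# {'read_1': [3, 2, 1, 3, 2, 1], 'read_2': [0, 5, 4]}
```

Because the function accepts any iterable of lines, an open file handle for reads.lengths or reads.pointers can be passed directly.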
Compute the MEMs of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder
console
moni mems -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces one output file reads.mems in the current folder, which stores the MEMs reported as pairs of position and length in a fasta-like format.
Compute the MEM extension of reads.fastq.gz against SARS-CoV2.1k.fa.gz in the data/SARS-CoV2 folder
console
moni extend -i sars-cov2 -p data/SARS-CoV2/reads.fastq.gz -o reads
It produces one output file reads.sam in the current folder which stores the information of the MEM extensions in SAM format.
External resources
- Big-BWT
- sdsl-lite
- klib
- ksw2
- r-index
- pfp-thresholds
- bigrepair
- shaped_slp
Citation
If you use this tool in an academic setting, please cite the following paper:
@article{RossiOLGB21,
author = { Massimiliano Rossi and
Marco Oliva and
Ben Langmead and
Travis Gagie and
Christina Boucher},
title = {MONI: A Pangenomic Index for Finding Maximal Exact Matches},
booktitle = {Research in Computational Molecular Biology - 25th Annual
International Conference, {RECOMB} 2021, Padova, Italy},
journal = {Journal of Computational Biology},
volume = {29},
number = {2},
pages = {169--187},
year = {2022},
publisher = {Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…}
}
Authors
Theoretical results:
- Christina Boucher
- Travis Gagie
- Ben Langmead
- Massimiliano Rossi
Implementation:
Experiments
Why "MONI"?
Moni is the Finnish word for multi.
References
[1] Hideo Bannai, Travis Gagie, and Tomohiro I, "Refining the r-index", Theoretical Computer Science, 812 (2020), pp. 96–108
[2] Christina Boucher, Travis Gagie, Alan Kuhnle and Giovanni Manzini, "Prefix-Free Parsing for Building Big BWTs", In Proc. of the 18th International Workshop on Algorithms in Bioinformatics (WABI 2018).
[3] Christina Boucher, Travis Gagie, Alan Kuhnle, Ben Langmead, Giovanni Manzini, and Taher Mun. "Prefix-free parsing for building big BWTs.", Algorithms for Molecular Biology 14, no. 1 (2019): 13.
Owner
- Name: Massimiliano Rossi
- Login: maxrossi91
- Kind: user
- Company: University of Florida
- Website: https://maxrossi91.com
- Twitter: maxrossi91
- Repositories: 13
- Profile: https://github.com/maxrossi91
Postdoc at the University of Florida, in the Boucher Lab.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Rossi"
given-names: "Massimiliano"
orcid: "https://orcid.org/0000-0002-3012-1394"
title: "MONI: A Pangenomic Index for Finding Maximal Exact Matches"
url: "https://github.com/maxrossi91/moni"
preferred-citation:
type: journal-paper
authors:
- family-names: "Rossi"
given-names: "Massimiliano"
orcid: "https://orcid.org/0000-0002-3012-1394"
- family-names: "Oliva"
given-names: "Marco"
orcid: "https://orcid.org/0000-0003-0525-3114"
- family-names: "Langmead"
given-names: "Ben"
orcid: "https://orcid.org/0000-0003-2437-1976"
- family-names: "Gagie"
given-names: "Travis"
orcid: "https://orcid.org/0000-0003-3689-327X"
- family-names: "Boucher"
given-names: "Christina"
orcid: "https://orcid.org/0000-0001-9509-9725"
doi: 10.1089/cmb.2021.0290
journal: "Journal of Computational Biology"
start: 169 # First page number
end: 187 # Last page number
title: "MONI: A Pangenomic Index for Finding Maximal Exact Matches"
year: 2022
volume: 29
number: 2