katka2

Metagenomic classifier using maximal exact matches with KATKA kernels and minimizer digests

https://github.com/draessld/katka2

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: draessld
  • License: cc-by-4.0
  • Language: C++
  • Default Branch: main
  • Size: 5.56 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Citation

README.md

MEM finding tool for taxonomic classification

Brief description

Software for finding the leftmost and rightmost occurrences of each maximal exact match (MEM) of a read with respect to a text, as described in Draesslerova, D., Ahmed, O., Gagie, T., Holub, J., Langmead, B., Manzini, G., & Navarro, G. Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests.
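To make the notion of a MEM concrete, here is a minimal C++ sketch that computes MEMs naively from matching statistics: ms[i] is the length of the longest prefix of pattern[i..] that occurs in the text, and pattern[i..i+ms[i]-1] is a MEM unless it can be extended one character to the left. The names Mem, matchingStatistics and findMems are hypothetical, and the brute-force search is only an illustration of the definition, not how this tool finds MEMs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation: a MEM occupies pattern[start..end],
// 0-based and inclusive.
struct Mem {
    std::size_t start;
    std::size_t end;
};

// Naive matching statistics: ms[i] is the length of the longest prefix of
// pattern[i..] that occurs somewhere in text. Very slow; illustration only.
std::vector<std::size_t> matchingStatistics(const std::string& text,
                                            const std::string& pattern) {
    std::vector<std::size_t> ms(pattern.size(), 0);
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        std::size_t len = 0;
        // If a prefix does not occur in text, no longer prefix can occur either.
        while (i + len < pattern.size() &&
               text.find(pattern.substr(i, len + 1)) != std::string::npos) {
            ++len;
        }
        ms[i] = len;
    }
    return ms;
}

// pattern[i..i+ms[i]-1] cannot be extended to the right by construction.
// It can be extended to the left exactly when ms[i-1] == ms[i] + 1, so every
// other position with ms[i] > 0 starts a MEM.
std::vector<Mem> findMems(const std::string& text, const std::string& pattern) {
    assert(pattern.find('$') == std::string::npos);  // patterns never contain '$'
    const std::vector<std::size_t> ms = matchingStatistics(text, pattern);
    std::vector<Mem> mems;
    for (std::size_t i = 0; i < ms.size(); ++i) {
        if (ms[i] == 0) continue;                        // pattern[i] absent from text
        if (i > 0 && ms[i - 1] == ms[i] + 1) continue;  // left-extendable, not maximal
        mems.push_back({i, i + ms[i] - 1});
    }
    return mems;
}
```

For example, findMems("ACGT", "CGTTACG") reports the MEMs CGT, T and ACG at pattern positions [0,2], [3,3] and [4,6].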

The implementation takes as input a single string or a concatenation of strings separated by the $ character, for example:

```
GATTACAT$AGATACAT$GATACAT$GATTAGAT$GATTAGATA
```
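A minimal sketch of how such a concatenation can be split back into its strings, and how a text position can be mapped to the index of the string (genome) that contains it by binary search over the separator positions. The function names and the 0-based numbering of genomes are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a '$'-separated concatenation back into its component strings.
std::vector<std::string> splitGenomes(const std::string& text) {
    std::vector<std::string> parts;
    std::size_t begin = 0;
    for (;;) {
        const std::size_t sep = text.find('$', begin);
        if (sep == std::string::npos) {
            parts.push_back(text.substr(begin));  // last string has no trailing '$'
            return parts;
        }
        parts.push_back(text.substr(begin, sep - begin));
        begin = sep + 1;
    }
}

// Collects the positions of all '$' separators, in increasing order.
std::vector<std::size_t> separatorPositions(const std::string& text) {
    std::vector<std::size_t> seps;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '$') seps.push_back(i);
    }
    return seps;
}

// 0-based index of the string containing text[pos]: the number of
// separators strictly before pos, found by binary search.
std::size_t genomeOf(const std::vector<std::size_t>& seps, std::size_t pos) {
    assert(std::is_sorted(seps.begin(), seps.end()));
    const auto it = std::lower_bound(seps.begin(), seps.end(), pos);
    return static_cast<std::size_t>(it - seps.begin());
}
```

For the example text above, separatorPositions returns {8, 17, 25, 34}, so text position 9 lies in genome 1 (AGATACAT).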

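The output has the following form, one line per pattern:

```
pattern timeinms #Mems [startposition,endposition]{leftmostgenome,rightmostgenome} [startposition,endposition]{leftmostgenome,rightmostgenome} ...
pattern timeinms #Mems [startposition,endposition]{leftmostgenome,rightmostgenome} [startposition,endposition]{leftmostgenome,rightmostgenome} ...
...
```

Each line lists the pattern, the time in milliseconds and the number of MEMs, followed by one entry per MEM giving its start and end positions and the leftmost and rightmost genomes in which it occurs.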
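The following sketch shows how one output entry could be derived for a single MEM: it naively locates the MEM's leftmost and rightmost occurrences in the $-separated text, converts each to a genome index by counting separators, and prints [start,end]{leftmost,rightmost}. It assumes 0-based genome numbering in order of appearance and 0-based inclusive pattern positions; the helper names are hypothetical, and the tool itself may number or format entries differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: 0-based index of the genome containing text[pos],
// i.e. the number of '$' separators before pos.
std::size_t genomeIndexAt(const std::string& text, std::size_t pos) {
    return static_cast<std::size_t>(
        std::count(text.begin(), text.begin() + static_cast<std::ptrdiff_t>(pos), '$'));
}

// Naively finds the genomes holding the leftmost and rightmost occurrences
// of mem in the '$'-separated text.
std::pair<std::size_t, std::size_t> leftmostRightmostGenome(const std::string& text,
                                                            const std::string& mem) {
    assert(!mem.empty() && mem.find('$') == std::string::npos);
    const std::size_t first = text.find(mem);  // leftmost occurrence
    const std::size_t last = text.rfind(mem);  // rightmost occurrence
    assert(first != std::string::npos && "a MEM must occur in the text");
    return {genomeIndexAt(text, first), genomeIndexAt(text, last)};
}

// Formats one entry of an output line as [start,end]{leftmost,rightmost}.
std::string formatMemEntry(std::size_t start, std::size_t end,
                           std::size_t leftGenome, std::size_t rightGenome) {
    std::ostringstream out;
    out << '[' << start << ',' << end << "]{" << leftGenome << ',' << rightGenome << '}';
    return out.str();
}
```

On the example input GATTACAT$AGATACAT$GATACAT$GATTAGAT$GATTAGATA, the string GATA first occurs in genome 1 (AGATACAT) and last in genome 4 (GATTAGATA).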

Compile

You need the Boost library and SDSL installed on your system. Please use this version of SDSL.

The project uses CMake to generate the Makefile. Create a build folder in the main folder and build the project:

```
$ mkdir build
$ cd build
$ cmake ..
$ make
```

Run

To build the index, run

```
$ ./index-build ../src/tests/test.txt
```

This command creates the required data structures for the text file and stores them using the file name as a prefix.

To locate all MEMs of the patterns in a pattern file, which should contain one pattern per line, run

```
$ ./index-locate ../src/tests/test/test -P../src/tests/test.pattern
```

To search for a single pattern, use the option -p<pattern> instead:

```
$ ./index-locate ../src/tests/test/test -p<pattern>
```

The results are printed on the standard output.
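A pattern file for -P holds one pattern per line. Below is a minimal C++ sketch of reading such a file into memory; readPatterns is a hypothetical name, and dropping empty lines and trailing carriage returns is our assumption rather than documented behavior of index-locate.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one pattern per line. Trailing '\r' (Windows line endings) and empty
// lines are dropped; this filtering is an assumption for illustration.
std::vector<std::string> readPatterns(std::istream& in) {
    std::vector<std::string> patterns;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) patterns.push_back(line);
    }
    assert(!in.bad() && "I/O error while reading patterns");
    return patterns;
}
```

In practice it would be called with a std::ifstream opened on a file such as ../src/tests/test.pattern.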

Creating KATKA kernels

To create a KATKA kernel with parameter k, run

```
$ ./kernelize -i../src/tests/test.txt -k<k>
```

to kernelize the full text in the file, or

```
$ ./kernelize -s<pattern> -k<k>
```

to kernelize a single pattern.
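As we understand the KATKA approach, a kernel keeps, for every distinct k-mer, its leftmost and rightmost occurrences in the reference text, since those occurrences lie in the leftmost and rightmost genomes containing that k-mer. The sketch below shows only that marking step: it flags the characters covered by the leftmost or rightmost occurrence of each distinct k-mer, skipping k-mers that span a $ separator. It is a simplified illustration under these assumptions, not the construction implemented by ./kernelize, and markLeftmostRightmost is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Marks text positions covered by the leftmost or rightmost occurrence of
// some distinct k-mer. k-mers containing the '$' separator are ignored.
// Simplified illustration only; not the kernelize implementation.
std::vector<bool> markLeftmostRightmost(const std::string& text, std::size_t k) {
    assert(k > 0);
    std::vector<bool> covered(text.size(), false);
    if (text.size() < k) return covered;
    // k-mer -> (first start position, last start position)
    std::unordered_map<std::string, std::pair<std::size_t, std::size_t>> occ;
    for (std::size_t i = 0; i + k <= text.size(); ++i) {
        std::string kmer = text.substr(i, k);
        if (kmer.find('$') != std::string::npos) continue;
        auto it = occ.find(kmer);
        if (it == occ.end()) {
            occ.emplace(std::move(kmer), std::make_pair(i, i));
        } else {
            it->second.second = i;  // a later occurrence becomes the rightmost one
        }
    }
    for (const auto& entry : occ) {
        for (std::size_t j = 0; j < k; ++j) {
            covered[entry.second.first + j] = true;
            covered[entry.second.second + j] = true;
        }
    }
    return covered;
}
```

For AAAAA with k = 2, the 2-mer AA occurs first at position 0 and last at position 3, so only the middle character stays unmarked.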

Creating minimizer digests

To create a minimizer digest with parameter w, run either

```
$ ./minimizer_digest -i../src/tests/test.txt -w<w>
```

to digest the full text in the file, or

```
$ ./minimizer_digest -s<pattern> -w<w>
```

to digest a single pattern.
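A minimal sketch of a minimizer digest: for every window of w consecutive k-mers, pick the smallest k-mer and record its start position, merging consecutive windows that pick the same position. Ordering k-mers lexicographically with leftmost tie-breaking is an assumption made for illustration; ./minimizer_digest may use a different order, such as a hash. The function names are hypothetical, and k is kept as a parameter even though the tool currently fixes it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Start positions of the minimizers: for each window of w consecutive
// k-mers, the lexicographically smallest k-mer (leftmost on ties).
// Consecutive windows selecting the same position are recorded once.
std::vector<std::size_t> minimizerPositions(const std::string& s, std::size_t k,
                                            std::size_t w) {
    assert(k > 0 && w > 0);
    std::vector<std::size_t> picked;
    if (s.size() < k + w - 1) return picked;  // not even one full window
    const std::size_t numKmers = s.size() - k + 1;
    for (std::size_t start = 0; start + w <= numKmers; ++start) {
        std::size_t best = start;
        for (std::size_t j = start + 1; j < start + w; ++j) {
            // Only a strictly smaller k-mer replaces the current minimum.
            if (s.compare(j, k, s, best, k) < 0) best = j;
        }
        if (picked.empty() || picked.back() != best) picked.push_back(best);
    }
    return picked;
}

// The digest as the sequence of selected k-mers.
std::vector<std::string> minimizerDigest(const std::string& s, std::size_t k,
                                         std::size_t w) {
    std::vector<std::string> digest;
    for (std::size_t pos : minimizerPositions(s, k, w)) digest.push_back(s.substr(pos, k));
    return digest;
}
```

With k = 3 and w = 2, GATTACA has the 3-mers GAT, ATT, TTA, TAC and ACA, and its digest under this ordering is ATT, TAC, ACA.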

Note: so far, the k-mer length used for minimizers is fixed at k = 3.

Experiments

The experiments described in the article were run on the SILVA dataset, available here. A total of 1000 samples were taken in the range 1100-2100 from the 9118 reference files. The main result of the project is shown in the following graph.

[Figure: graph of the main result]

Owner

  • Name: Dominika Draesslerová
  • Login: draessld
  • Kind: user
  • Location: Prague

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Draesslerova"
  given-names: "Dominika"
- family-names: "Ahmed"
  given-names: "Omar"
- family-names: "Gagie"
  given-names: "Travis"
- family-names: "Holub"
  given-names: "Jan"
- family-names: "Langmead"
  given-names: "Ben"
- family-names: "Manzini"
  given-names: "Giovanni"
- family-names: "Navarro"
  given-names: "Gonzalo"
title: "Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests"
version: 1.0.0
doi: 10.4230/LIPIcs.SEA.2024.10
date-released: 2024-07-11
url: "https://github.com/draessld/KATKA2/tree/main"

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Push event: 1
  • Fork event: 1