katka2
Metagenomic classifier using maximal exact matches with KATKA kernels and minimizer digests
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Metagenomic classifier using maximal exact matches with KATKA kernels and minimizer digests
Basic Info
- Host: GitHub
- Owner: draessld
- License: cc-by-4.0
- Language: C++
- Default Branch: main
- Size: 5.56 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MEM finding tool for taxonomic classification
Brief description
Software for finding the leftmost and rightmost positions of each MEM (maximal exact match) of a read with respect to the text, as described in Draesslerova, D., Ahmed, O., Gagie, T., Holub, J., Langmead, B., Manzini, G., & Navarro, G. Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests.
The implementation takes as input a single string or a concatenation of strings separated by the $ character, for example:
GATTACAT$AGATACAT$GATACAT$GATTAGAT$GATTAGATA
It produces output of the form:
```
pattern timeinms #Mems [startposition,endposition]{leftmostgenome,rightmostgenome} [startposition,endposition]{leftmostgenome,rightmostgenome} ...
pattern timeinms #Mems [startposition,endposition]{leftmostgenome,rightmostgenome} [startposition,endposition]{leftmostgenome,rightmostgenome} ...
...
```
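To make the meaning of the `[startposition,endposition]{leftmostgenome,rightmostgenome}` entries concrete, here is a minimal brute-force sketch. It is not the index-based method used by this tool. The names `Mem`, `split_genomes`, `longest_match` and `naive_mems` are hypothetical, and the conventions (0-based, inclusive pattern positions; genomes numbered from 0 in left-to-right order; patterns that do not contain `$`) are assumptions made only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One MEM of the pattern (all conventions here are assumptions of this sketch).
struct Mem {
    std::size_t start;      // 0-based start position in the pattern
    std::size_t end;        // 0-based inclusive end position in the pattern
    std::size_t leftmost;   // index of the first genome containing the MEM
    std::size_t rightmost;  // index of the last genome containing the MEM
};

// Split a '$'-separated concatenation into its individual genomes.
std::vector<std::string> split_genomes(const std::string& text) {
    std::vector<std::string> genomes;
    std::size_t begin = 0;
    while (true) {
        const std::size_t sep = text.find('$', begin);
        if (sep == std::string::npos) {
            genomes.push_back(text.substr(begin));
            break;
        }
        genomes.push_back(text.substr(begin, sep - begin));
        begin = sep + 1;
    }
    return genomes;
}

// Length of the longest prefix of pattern[i..] that occurs in some genome.
std::size_t longest_match(const std::vector<std::string>& genomes,
                          const std::string& pattern, std::size_t i) {
    std::size_t len = 0;
    while (i + len < pattern.size()) {
        const std::string candidate = pattern.substr(i, len + 1);
        bool found = false;
        for (const auto& g : genomes) {
            if (g.find(candidate) != std::string::npos) { found = true; break; }
        }
        if (!found) break;
        ++len;
    }
    return len;
}

// Report every MEM of the pattern with the first and last genome containing it.
std::vector<Mem> naive_mems(const std::string& text, const std::string& pattern) {
    const std::vector<std::string> genomes = split_genomes(text);
    std::vector<Mem> mems;
    std::size_t prev = 0;  // longest match length at the previous pattern position
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const std::size_t len = longest_match(genomes, pattern, i);
        // The match at i is left-maximal unless the match at i-1 already covers it.
        if (len > 0 && (i == 0 || prev <= len)) {
            const std::string mem = pattern.substr(i, len);
            Mem m{i, i + len - 1, 0, 0};
            bool seen = false;
            for (std::size_t g = 0; g < genomes.size(); ++g) {
                if (genomes[g].find(mem) == std::string::npos) continue;
                if (!seen) { m.leftmost = g; seen = true; }
                m.rightmost = g;
            }
            mems.push_back(m);
        }
        prev = len;
    }
    return mems;
}
```

With the example text above, calling `naive_mems(text, "GATTAG")` yields a single MEM covering the whole pattern, found only in the last two genomes. The brute-force search is quadratic or worse and is only meant to illustrate the output semantics.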
Compile
You need the Boost library and SDSL installed on your system. Please use this version of SDSL.
The project uses CMake to generate the Makefile. Create a build folder in the main folder:
$ mkdir build
$ cd build; cmake ..
$ make
Run
To build the index, run
$ ./index-build ../src/tests/test.txt
This command creates the required data structures for the text file and stores them using the filename as a prefix.
The patterns file should contain one pattern per line. To search for a single pattern, use the option -p<pattern>.
To locate all MEMs of the pattern(s) with respect to the indexed text, run
$ ./index-locate ../src/tests/test/test -P../src/tests/test.pattern
$ ./index-locate ../src/tests/test/test -p<pattern>
Results are printed to standard output.
Creating KATKA kernels
To create a KATKA kernel with parameter k, run either
$ ./kernelize -i../src/tests/test.txt -k<k>
to kernelize the full text in the file, or
$ ./kernelize -s<pattern> -k<k>
to kernelize a single pattern.
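As a rough illustration of what kernelizing does: the cited paper's abstract describes a KATKA kernel as discarding the characters that are not in the first or last occurrence of any k-tuple. The sketch below follows that description only. The function names `kernel_mask` and `kernel_string` are hypothetical, the `$` separator is treated like any other character, and how `kernelize` itself marks or encodes the discarded gaps is not covered here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Mark every character covered by the first or last occurrence of some k-mer.
// Assumption of this sketch: texts shorter than k keep nothing.
std::vector<bool> kernel_mask(const std::string& text, std::size_t k) {
    std::vector<bool> keep(text.size(), false);
    if (k == 0 || text.size() < k) return keep;

    // For each distinct k-mer, remember (first start, last start).
    std::unordered_map<std::string, std::pair<std::size_t, std::size_t>> occ;
    for (std::size_t i = 0; i + k <= text.size(); ++i) {
        const std::string kmer = text.substr(i, k);
        auto it = occ.find(kmer);
        if (it == occ.end()) {
            occ.emplace(kmer, std::make_pair(i, i));
        } else {
            it->second.second = i;  // positions are visited in increasing order
        }
    }

    // Keep the k characters of each first and each last occurrence.
    for (const auto& entry : occ) {
        for (std::size_t j = 0; j < k; ++j) {
            keep[entry.second.first + j] = true;
            keep[entry.second.second + j] = true;
        }
    }
    return keep;
}

// Concatenate the kept characters (gaps are simply dropped in this sketch).
std::string kernel_string(const std::string& text, std::size_t k) {
    const std::vector<bool> keep = kernel_mask(text, k);
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (keep[i]) out.push_back(text[i]);
    }
    return out;
}
```

For a highly repetitive input such as `AAAAAA` with k=2, only the first and last occurrence of `AA` survive, so the middle characters are dropped. A text in which every k-mer is unique is kept in full.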
Creating minimizer digests
To create minimizer digests with parameter w, run either
$ ./minimizer_digest -i../src/tests/test.txt -w<w>
to digest the full text in the file, or
$ ./minimizer_digest -s<pattern> -w<w>
to digest a single pattern.
Note: the minimizer parameter k is currently fixed at k=3.
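The following is a minimal sketch of a minimizer digest, not the code used by `minimizer_digest`. It assumes that a window consists of w consecutive k-mers, that k-mers are compared lexicographically, and that ties go to the leftmost k-mer; the tool may use a different ordering (for example a hash) or window definition. The function name and signature are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the sequence of minimizers of `text`, one entry per selected position.
// Assumptions: window = w consecutive k-mers, lexicographic order,
// leftmost k-mer wins ties; k defaults to 3 as in the note above.
std::vector<std::string> minimizer_digest(const std::string& text,
                                          std::size_t w, std::size_t k = 3) {
    std::vector<std::string> digest;
    if (k == 0 || w == 0 || text.size() < k + w - 1) return digest;

    const std::size_t num_kmers = text.size() - k + 1;
    std::size_t last_pos = text.size();  // sentinel: nothing selected yet

    for (std::size_t win = 0; win + w <= num_kmers; ++win) {
        // Find the smallest k-mer in the current window.
        std::size_t best = win;
        for (std::size_t j = win + 1; j < win + w; ++j) {
            if (text.compare(j, k, text, best, k) < 0) best = j;
        }
        // Consecutive windows often share a minimizer; record each position once.
        if (best != last_pos) {
            digest.push_back(text.substr(best, k));
            last_pos = best;
        }
    }
    return digest;
}
```

For example, with `GATTACAT` and w=2 the windows select `ATT`, `ATT`, `TAC`, `ACA`, `ACA`, so the digest is `ATT TAC ACA`. Larger values of w select fewer minimizers and therefore give a shorter digest.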
Experiments
The experiments described in the article were run on the SILVA dataset, available here. 1000 samples were taken in the range 1100-2100 from a total of 9118 reference files. The main result of the project is shown in the following graph.

Owner
- Name: Dominika Draesslerová
- Login: draessld
- Kind: user
- Location: Prague
- Repositories: 1
- Profile: https://github.com/draessld
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Draesslerova"
    given-names: "Dominika"
  - family-names: "Ahmed"
    given-names: "Omar"
  - family-names: "Gagie"
    given-names: "Travis"
  - family-names: "Holub"
    given-names: "Jan"
  - family-names: "Langmead"
    given-names: "Ben"
  - family-names: "Manzini"
    given-names: "Giovanni"
  - family-names: "Navarro"
    given-names: "Gonzalo"
title: "Taxonomic classification with maximal exact matches in KATKA kernels and minimizer digests"
version: 1.0.0
doi: 10.4230/LIPIcs.SEA.2024.10
date-released: 2024-07-11
url: "https://github.com/draessld/KATKA2/tree/main"
GitHub Events
Total
- Watch event: 1
- Push event: 1
- Fork event: 1
Last Year
- Watch event: 1
- Push event: 1
- Fork event: 1