nthash

Fast hash function for DNA/RNA sequences

https://github.com/bcgsc/nthash

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

bioinformatics bloom-filter genomics hash hash-algorithm hash-methods k-mer-hashing
Last synced: 4 months ago

Repository

Fast hash function for DNA/RNA sequences

Basic Info
Statistics
  • Stars: 103
  • Watchers: 19
  • Forks: 13
  • Open Issues: 4
  • Releases: 6
Topics
bioinformatics bloom-filter genomics hash hash-algorithm hash-methods k-mer-hashing
Created over 10 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

ntHash is an efficient rolling hash function for k-mers and spaced seeds.
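To make "rolling hash" concrete, here is a minimal, self-contained sketch of a cyclic-polynomial rolling hash over k-mers. It is an illustration under stated assumptions, not ntHash's implementation: the per-base constants in `base_seed` are arbitrary placeholders, and the helper names (`base_seed`, `rotl`, `hash_kmer`, `roll_kmer`, `rolling_hashes`) are hypothetical. The point it shows is that each new k-mer hash is derived from the previous one in constant time instead of being recomputed from scratch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-base constants; NOT the values used by ntHash.
inline std::uint64_t base_seed(char base) {
    switch (base) {
        case 'A': return 0x9E3779B97F4A7C15ULL;
        case 'C': return 0xC2B2AE3D27D4EB4FULL;
        case 'G': return 0x165667B19E3779F9ULL;
        case 'T': return 0x27D4EB2F165667C5ULL;
        default:  return 0;  // unknown bases contribute nothing in this sketch
    }
}

// Rotate a 64-bit word left by r bits (r is reduced modulo 64).
inline std::uint64_t rotl(std::uint64_t x, unsigned r) {
    r %= 64;
    return r == 0 ? x : (x << r) | (x >> (64 - r));
}

// Hash the k-mer starting at pos from scratch: XOR of each base's constant,
// rotated by that base's distance from the end of the k-mer.
std::uint64_t hash_kmer(const std::string& seq, std::size_t pos, unsigned k) {
    assert(pos + k <= seq.size());
    std::uint64_t h = 0;
    for (unsigned i = 0; i < k; ++i) {
        h ^= rotl(base_seed(seq[pos + i]), k - 1 - i);
    }
    return h;
}

// Slide the window by one base in O(1): rotate the previous hash,
// cancel the outgoing base, and mix in the incoming base.
std::uint64_t roll_kmer(std::uint64_t prev, char out, char in, unsigned k) {
    return rotl(prev, 1) ^ rotl(base_seed(out), k) ^ base_seed(in);
}

// Hashes of all k-mers of seq, computed with one full hash plus rolls.
std::vector<std::uint64_t> rolling_hashes(const std::string& seq, unsigned k) {
    std::vector<std::uint64_t> out;
    if (k == 0 || seq.size() < k) return out;
    std::uint64_t h = hash_kmer(seq, 0, k);
    out.push_back(h);
    for (std::size_t i = 1; i + k <= seq.size(); ++i) {
        h = roll_kmer(h, seq[i - 1], seq[i + k - 1], k);
        out.push_back(h);
    }
    return out;
}
```

Calling `rolling_hashes("TGACTGATCGAGTCGTACTAG", 5)` yields one value per 5-mer, and each rolled value equals what `hash_kmer` computes from scratch at the same position.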

Installation

Make sure Meson is installed on the system.

Download the repo (either from the releases section or by cloning it with git clone https://github.com/bcgsc/ntHash). Set up Meson in an arbitrary directory (e.g. build) by running the following command in the project's root (include --prefix=PREFIX to set the installation prefix to PREFIX):

```shell
meson setup --buildtype=release --prefix=PREFIX build
```

Then, install the project and its dependencies using:

```shell
meson install -C build
```

This will install include/nthash and lib/libnthash.a to the installation prefix.

Usage

To use ntHash in a C++ project:

  • Import ntHash in the code using #include <nthash/nthash.hpp>
  • Access ntHash classes from the nthash namespace
  • Add the include directory (pass -IPREFIX/include to the compiler)
  • Link the code with libnthash.a (i.e. pass -LPREFIX/lib -lnthash to the compiler, where PREFIX is the installation prefix)
  • Compile your code with -std=c++17 (and preferably -O3) enabled

Refer to docs for more information.

Examples

Generally, the nthash::NtHash and nthash::SeedNtHash classes are used for hashing sequences:

```C++
nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5);  // 1 hash per 5-mer
while (nth.roll()) {
    // use nth.hashes() for canonical hashes
    //     nth.get_forward_hash() for forward strand hashes
    //     nth.get_reverse_hash() for reverse strand hashes
}
```
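The distinction between forward, reverse, and canonical hashes can be illustrated without the library. The sketch below is only a conceptual stand-in: it uses `std::hash` in place of ntHash, and it assumes the canonical value is the minimum of the two strand hashes, which is one common convention and not necessarily how ntHash combines them. The names `complement_base`, `reverse_complement`, `StrandHashes`, and `strand_hashes` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Watson-Crick complement of a single base; anything else maps to 'N'.
char complement_base(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// The same k-mer as read from the opposite strand.
std::string reverse_complement(const std::string& kmer) {
    std::string rc(kmer.rbegin(), kmer.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complement_base);
    return rc;
}

struct StrandHashes {
    std::uint64_t forward;
    std::uint64_t reverse;
    std::uint64_t canonical;
};

// Stand-in for the three values: std::hash replaces ntHash here, and the
// canonical value is ASSUMED to be min(forward, reverse) for illustration.
StrandHashes strand_hashes(const std::string& kmer) {
    assert(!kmer.empty());
    std::hash<std::string> h{};
    const std::uint64_t fwd = h(kmer);
    const std::uint64_t rev = h(reverse_complement(kmer));
    return {fwd, rev, std::min(fwd, rev)};
}
```

In this sketch `strand_hashes("TGACT")` and `strand_hashes("AGTCA")` swap their forward and reverse values but share the same canonical value, which is the property that makes canonical hashes useful when the strand of a read is unknown.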

```C++
std::vector<std::string> seeds = {"10101", "11011"};
nthash::SeedNtHash nth("TGACTGATCGAGTCGTACTAG", seeds, 3, 5);
while (nth.roll()) {
    // nth.hashes()[0] = "T#A#T"'s first hash
    // nth.hashes()[1] = "T#A#T"'s second hash
    // nth.hashes()[2] = "T#A#T"'s third hash
    // nth.hashes()[3] = "TG#CT"'s first hash
}
```
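The comments in the spaced-seed example imply two things: a seed's 0 positions are ignored (written as `#`), and the hashes are laid out seed by seed. The following self-contained sketch spells both out. `apply_seed` and `hash_index` are hypothetical helpers written for this illustration, not part of the ntHash API, and the index formula is inferred from the comments above rather than taken from the library's documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Mask a k-mer with a spaced seed: positions where the seed has '1' keep
// their base, positions where it has '0' are replaced by '#'.
std::string apply_seed(const std::string& kmer, const std::string& seed) {
    assert(kmer.size() == seed.size());
    std::string masked(kmer);
    for (std::size_t i = 0; i < seed.size(); ++i) {
        if (seed[i] == '0') masked[i] = '#';
    }
    return masked;
}

// Position of a hash in a flat array that stores all hashes of seed 0
// first, then all hashes of seed 1, and so on (layout inferred from the
// comments in the example above).
std::size_t hash_index(std::size_t seed_idx, std::size_t hash_idx,
                       std::size_t hashes_per_seed) {
    assert(hash_idx < hashes_per_seed);
    return seed_idx * hashes_per_seed + hash_idx;
}
```

For the first 5-mer `TGACT`, `apply_seed` with `10101` gives `T#A#T` and with `11011` gives `TG#CT`; with 3 hashes per seed, `hash_index(1, 0, 3)` is 3, matching the last comment in the example.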

For developers

If you would like to contribute to the development of ntHash, after forking/cloning the repo, create the build directory without the release flag:

meson setup build

Compile the code, tests, and benchmarking script using:

meson compile -C build

If compilation is successful, libnthash.a will be available in the build folder. The benchmarking script is also compiled as the bench binary file in build.

Before sending a PR, make sure that:

  • tests pass by running meson test in the project directory
  • code is formatted properly by running ninja clang-format in the build folder (requires clang-format to be available)
  • coding standards have been met: running ninja clang-tidy-check in build should return no errors (requires clang-tools to be installed)
  • documentation is up-to-date by running ninja docs in build (requires doxygen)

Publications

Parham Kazemi, Johnathan Wong, Vladimir Nikolić, Hamid Mohamadi, René L Warren, and Inanç Birol. ntHash2: recursive spaced seed hashing for nucleotide sequences. Bioinformatics (2022) 38 (20): 4812-4813. https://doi.org/10.1093/bioinformatics/btac564

Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397

Owner

  • Name: BC Cancer Canada's Michael Smith Genome Sciences Centre
  • Login: bcgsc
  • Kind: organization
  • Email: rwarren@bcgsc.ca
  • Location: Vancouver, BC, Canada

Sequencing centre for genomics and bioinformatics research. NOTE 01/2019: FTP now closed. Cited data available: http://www.bcgsc.ca/downloads/supplementary

Citation (CITATION.bib)

@article{10.1093/bioinformatics/btac564,
    author = {Kazemi, Parham and Wong, Johnathan and Nikolić, Vladimir and Mohamadi, Hamid and Warren, René L and Birol, Inanç},
    title = "{ntHash2: recursive spaced seed hashing for nucleotide sequences}",
    journal = {Bioinformatics},
    volume = {38},
    number = {20},
    pages = {4812-4813},
    year = {2022},
    month = {08},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac564},
    url = {https://doi.org/10.1093/bioinformatics/btac564},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/38/20/4812/46535020/btac564.pdf},
}

@article{doi:10.1093/bioinformatics/btw397,
author = {Mohamadi, Hamid and Chu, Justin and Vandervalk, Benjamin P. and Birol, Inanc},
title = {ntHash: recursive nucleotide hashing},
journal = {Bioinformatics},
volume = {32},
number = {22},
pages = {3492-3494},
year = {2016},
doi = {10.1093/bioinformatics/btw397},
URL = {http://dx.doi.org/10.1093/bioinformatics/btw397},
eprint = {/oup/backfile/Content_public/Journal/bioinformatics/32/22/10.1093_bioinformatics_btw397/3/btw397.pdf}
}

GitHub Events

Total
  • Watch event: 7
  • Member event: 1
Last Year
  • Watch event: 7
  • Member event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 19
  • Total pull requests: 15
  • Average time to close issues: 10 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 14
  • Total pull request authors: 8
  • Average comments per issue: 4.63
  • Average comments per pull request: 1.33
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • VeryAmazed (2)
  • rsharris (2)
  • JustinChu (2)
  • ishmeals (1)
  • Lyannic (1)
  • mohamadi (1)
  • yanlifeng (1)
  • marekkokot (1)
  • yoshihikosuzuki (1)
  • shenwei356 (1)
  • luizirber (1)
  • sjackman (1)
  • davisem (1)
  • rica01 (1)
Pull Request Authors
  • jwcodee (6)
  • parham-k (3)
  • ishmeals (2)
  • mirounga (1)
  • yoshihikosuzuki (1)
  • lcoombe (1)
  • JustinChu (1)
  • johnlees (1)
Top Labels
Issue Labels
bug (3) enhancement (3) help wanted (1)
Pull Request Labels
enhancement (2)