Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.8%) to scientific vocabulary
Repository
Fast hash function for DNA/RNA sequences
Basic Info
- Host: GitHub
- Owner: bcgsc
- License: mit
- Language: C++
- Default Branch: master
- Homepage: http://bcgsc.github.io/ntHash/
- Size: 12.3 MB
Statistics
- Stars: 103
- Watchers: 19
- Forks: 13
- Open Issues: 4
- Releases: 6
Metadata Files
README.md

ntHash is an efficient rolling hash function for k-mers and spaced seeds.
Installation
Make sure Meson is installed on the system.
Download the repo (either from the releases section or clone it using git clone https://github.com/bcgsc/ntHash). Set up Meson in an arbitrary directory (e.g. build) by running the following command in the project's root (use --prefix=PREFIX to set the installation prefix to PREFIX):
```shell
meson setup --buildtype=release --prefix=PREFIX build
```
Then, install the project and its dependencies using:
```shell
meson install -C build
```
This will install include/nthash and lib/libnthash.a to the installation prefix.
Usage
To use ntHash in a C++ project:
- Import ntHash in the code using #include <nthash/nthash.hpp>
- Access ntHash classes from the nthash namespace
- Add the include directory (pass -IPREFIX/include to the compiler)
- Link the code with libnthash.a (i.e. pass -LPREFIX/lib -lnthash to the compiler, where PREFIX is the installation prefix)
- Compile your code with -std=c++17 (and preferably -O3) enabled
Refer to docs for more information.
Examples
Generally, the nthash::NtHash and nthash::SeedNtHash classes are used for hashing sequences:
```cpp
#include <nthash/nthash.hpp>

int main() {
  nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5); // 1 hash per 5-mer
  while (nth.roll()) {
    // use nth.hashes() for canonical hashes
    // nth.get_forward_hash() for forward strand hashes
    // nth.get_reverse_hash() for reverse strand hashes
  }
  return 0;
}
```
```cpp
#include <nthash/nthash.hpp>
#include <string>
#include <vector>

int main() {
  std::vector<std::string> seeds = {"10101", "11011"};
  nthash::SeedNtHash nth("TGACTGATCGAGTCGTACTAG", seeds, 3, 5); // 3 hashes per seed, 5-mers
  while (nth.roll()) {
    // nth.hashes()[0] = "T#A#T"'s first hash
    // nth.hashes()[1] = "T#A#T"'s second hash
    // nth.hashes()[2] = "T#A#T"'s third hash
    // nth.hashes()[3] = "TG#CT"'s first hash
  }
  return 0;
}
```
For developers
If you would like to contribute to the development of ntHash, after forking/cloning the repo, create the build directory without the release flag:
```shell
meson setup build
```
Compile the code, tests, and benchmarking script using:
```shell
meson compile -C build
```
If compilation is successful, libnthash.a will be available in the build folder. The benchmarking script is also compiled as the bench binary file in build.
Before sending a PR, make sure that:
- tests pass by running meson test in the project directory
- code is formatted properly by running ninja clang-format in the build folder (requires clang-format to be available)
- coding standards have been met, i.e. running ninja clang-tidy-check in build returns no errors (requires clang-tools to be installed)
- documentation is up-to-date by running ninja docs in build (requires doxygen)
Publications
Parham Kazemi, Johnathan Wong, Vladimir Nikolić, Hamid Mohamadi, René L Warren, and Inanç Birol. ntHash2: recursive spaced seed hashing for nucleotide sequences. Bioinformatics (2022) 38 (20): 4812-4813. doi:10.1093/bioinformatics/btac564
Hamid Mohamadi, Justin Chu, Benjamin P Vandervalk, and Inanc Birol. ntHash: recursive nucleotide hashing. Bioinformatics (2016) 32 (22): 3492-3494. doi:10.1093/bioinformatics/btw397
Owner
- Name: BC Cancer Canada's Michael Smith Genome Sciences Centre
- Login: bcgsc
- Kind: organization
- Email: rwarren@bcgsc.ca
- Location: Vancouver, BC, Canada
- Website: http://www.bcgsc.ca/
- Repositories: 86
- Profile: https://github.com/bcgsc
Sequencing centre for genomics and bioinformatics research. NOTE 01/2019: FTP now closed. Cited data available: http://www.bcgsc.ca/downloads/supplementary
Citation (CITATION.bib)
@article{10.1093/bioinformatics/btac564,
author = {Kazemi, Parham and Wong, Johnathan and Nikolić, Vladimir and Mohamadi, Hamid and Warren, René L and Birol, Inanç},
title = "{ntHash2: recursive spaced seed hashing for nucleotide sequences}",
journal = {Bioinformatics},
volume = {38},
number = {20},
pages = {4812-4813},
year = {2022},
month = {08},
issn = {1367-4803},
doi = {10.1093/bioinformatics/btac564},
url = {https://doi.org/10.1093/bioinformatics/btac564},
eprint = {https://academic.oup.com/bioinformatics/article-pdf/38/20/4812/46535020/btac564.pdf},
}
@article{doi:10.1093/bioinformatics/btw397,
author = {Mohamadi, Hamid and Chu, Justin and Vandervalk, Benjamin P. and Birol, Inanc},
title = {ntHash: recursive nucleotide hashing},
journal = {Bioinformatics},
volume = {32},
number = {22},
pages = {3492-3494},
year = {2016},
doi = {10.1093/bioinformatics/btw397},
URL = {http://dx.doi.org/10.1093/bioinformatics/btw397},
eprint = {/oup/backfile/Content_public/Journal/bioinformatics/32/22/10.1093_bioinformatics_btw397/3/btw397.pdf}
}
GitHub Events
Total
- Watch event: 7
- Member event: 1
Last Year
- Watch event: 7
- Member event: 1
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 19
- Total pull requests: 15
- Average time to close issues: 10 months
- Average time to close pull requests: 15 days
- Total issue authors: 14
- Total pull request authors: 8
- Average comments per issue: 4.63
- Average comments per pull request: 1.33
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- VeryAmazed (2)
- rsharris (2)
- JustinChu (2)
- ishmeals (1)
- Lyannic (1)
- mohamadi (1)
- yanlifeng (1)
- marekkokot (1)
- yoshihikosuzuki (1)
- shenwei356 (1)
- luizirber (1)
- sjackman (1)
- davisem (1)
- rica01 (1)
Pull Request Authors
- jwcodee (6)
- parham-k (3)
- ishmeals (2)
- mirounga (1)
- yoshihikosuzuki (1)
- lcoombe (1)
- JustinChu (1)
- johnlees (1)