kmer-counter
Count kmers with a more efficient (faster) hash table
Statistics
- Stars: 24
- Watchers: 2
- Forks: 5
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
kmer-counter
Compilation
Run make to build the kmer-counter binary.
This has been compiled under Ubuntu 18.04.4, Cygwin 3.1.4, and Mac OS X 10.15.3, using the GCC/glibc and Clang toolchains contemporary with those releases.
Usage
Run kmer-counter --help for a list of options.
There are two ways to use kmer-counter:
- You can provide a single-line FASTA input and write counts to standard output, e.g.:
```
$ ./kmer-counter --fasta --k=6 sequences.fa
foo CGTTAA:1 TTAACG:1
bar TTCTTA:1 TAGGGC:1 AAATTC:1 GTGGAA:1 AACTTC:1 ...
...
```
- For a more complex use case, you can provide a four-column BED file with the interval's genomic sequence in the fourth column (i.e., ID field), along with the number k for the k-mers you want to count, an offset value for mer-keys (explained below), and a results directory to write results, e.g.:
$ ./kmer-counter --bed --k=6 --offset=12195 --results-dir="6mers" intervals.bed4
The above example generates 6-mers of the sequences from the file intervals.bed4.
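To make "generates 6-mers" concrete, here is a minimal Python sketch of sliding-window k-mer counting. It is an illustration only, not the tool's implementation (kmer-counter is a compiled binary built around a custom hash table). The function name `count_kmers` is hypothetical, and the sketch assumes forward-strand counting with no special handling of reverse complements or ambiguous bases, which this README does not specify.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of `sequence` (forward strand only)."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # A sequence of length n has n - k + 1 overlapping windows of length k;
    # if the sequence is shorter than k, the range is empty and nothing is counted.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    counts = count_kmers("TTCTTAGGGC", 6)
    # Print in a "mer:count" style similar to the FASTA-mode output shown earlier.
    print(" ".join(f"{mer}:{n}" for mer, n in counts.items()))
```

A `Counter` is a plain dictionary underneath, so it stands in here for the faster hash table that the real tool uses.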
The results are stored in a folder called 6mers, which contains two files: count.bed and map.txt.
The first file, count.bed, is a BED file of the intervals from intervals.bed4, in which the fourth column contains space-delimited pairs of a mer-key and the number of times that key is seen. Mer-keys are numbers that begin at the offset value provided on the command line.
The second file, map.txt, contains a tab-delimited pairing of each mer and its mer-key, as found in count.bed.
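The relationship between mers, mer-keys, and the offset can be sketched as follows. This is a hedged illustration under stated assumptions, not the tool's actual logic. The function `assign_mer_keys` is hypothetical. Numbering keys in order of first appearance is an assumption, as is the `key:count` rendering of the fourth column. The README states only that keys start at the offset and that the column holds space-delimited key and count pairs.

```python
from collections import Counter


def assign_mer_keys(sequences, k, offset):
    """Return (per-sequence {key: count} dicts, {mer: key} map), keys starting at `offset`."""
    mer_to_key = {}
    per_sequence = []
    for seq in sequences:
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        keyed = {}
        for mer, n in counts.items():
            # Assumed numbering: a new mer gets the next unused integer,
            # in order of first appearance; a known mer keeps its key.
            key = mer_to_key.setdefault(mer, offset + len(mer_to_key))
            keyed[key] = n
        per_sequence.append(keyed)
    return per_sequence, mer_to_key


if __name__ == "__main__":
    per_seq, mapping = assign_mer_keys(["ACGTAC", "CGTACG"], k=4, offset=12195)
    for keyed in per_seq:
        # Assumed rendering of the fourth column of count.bed.
        print(" ".join(f"{key}:{n}" for key, n in keyed.items()))
    for mer, key in mapping.items():
        # Tab-delimited mer / mer-key pairing, as described for map.txt.
        print(f"{mer}\t{key}")
```

Replacing each mer string with a compact integer keeps the per-interval column short, while map.txt preserves the information needed to translate keys back to sequences.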
Notes
I am using a hash table implementation from Emil Ernerfeldt. A discussion about performance characteristics compared with the C++ STL std::unordered_map is available from the author.
Owner
- Name: Alex Reynolds
- Login: alexpreynolds
- Kind: user
- Location: Seattle, WA USA
- Company: Altius Institute for Biomedical Sciences
- Website: bitsumma.com
- Repositories: 92
- Profile: https://github.com/alexpreynolds
Pug caregiver, curler, cyclist, gardener, beginning French scholar
Citation (CITATION.cff)
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Reynolds"
    given-names: "Alex"
title: "kmer-counter"
version: 1.0.0
date-released: Sep 14, 2017
url: "https://github.com/alexpreynolds/kmer-counter"
GitHub Events
Total
- Issues event: 2
- Issue comment event: 1
Last Year
- Issues event: 2
- Issue comment event: 1
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 12
- Total pull requests: 1
- Average time to close issues: 5 days
- Average time to close pull requests: about 2 hours
- Total issue authors: 11
- Total pull request authors: 1
- Average comments per issue: 3.0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 18 days
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mmcguffi (2)
- J-I-P (1)
- scchess (1)
- dulunar (1)
- yangkl96 (1)
- qianjia (1)
- diego-rt (1)
- liyan910117 (1)
- wbvguo (1)
- arivers (1)
- suchapalaver (1)
Pull Request Authors
- WillardFord (2)