genomap
A Rust library for storing generic genomic data by sorted chromosome name
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.7%) to scientific vocabulary
Repository
Statistics
- Stars: 17
- Watchers: 2
- Forks: 2
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
A simple Rust library for storing data indexed by a chromosome name
genomap is a small library for storing a key-value map between chromosome
names and some generic data in a GenomeMap. Since in nearly every case we
want chromosomes to be sorted by their names, GenomeMap maintains an internal
sorted set of keys. GenomeMap uses a specialized chromosome name sorting
function that should properly sort autosomes and sex chromosomes, handle
Drosophila chromosome names (e.g. 2L and 2R), etc. Please file a GitHub
issue if the sort order is not what you'd anticipate.
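To make the idea of a chromosome-aware sort concrete, here is a minimal sketch of one possible ordering. It is not genomap's actual sorting function: the names `chrom_key`, `compare_chroms`, and `sorted_chroms` are hypothetical, and the rule used (numbered chromosomes first by number and arm suffix, then named chromosomes alphabetically) is an assumption chosen for illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical sort key: (class, number, suffix).
/// Class 0 holds names that start with a number; class 1 holds everything else.
fn chrom_key(name: &str) -> (u8, u32, String) {
    // Strip an optional "chr" prefix so "chr2" and "2" are keyed alike.
    let core = name.strip_prefix("chr").unwrap_or(name);
    // Split into a leading run of digits and the rest (e.g. "2L" -> "2" and "L").
    let digits_end = core
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(core.len());
    let (digits, suffix) = core.split_at(digits_end);
    match digits.parse::<u32>() {
        Ok(n) => (0, n, suffix.to_string()),
        // No leading number (e.g. "X", "Y"): sort after numbered names.
        Err(_) => (1, 0, core.to_string()),
    }
}

/// Compare two chromosome names by their hypothetical keys.
fn compare_chroms(a: &str, b: &str) -> Ordering {
    chrom_key(a).cmp(&chrom_key(b))
}

/// Return the names as owned strings, sorted with `compare_chroms`.
fn sorted_chroms(names: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = names.iter().map(|s| s.to_string()).collect();
    v.sort_by(|a, b| compare_chroms(a, b));
    v
}

fn main() {
    let human = ["chrX", "chr10", "chr2", "chr1", "chrY"];
    println!("{:?}", sorted_chroms(&human));
    let fly = ["X", "3R", "2R", "4", "2L", "3L"];
    println!("{:?}", sorted_chroms(&fly));
}
```

A plain lexicographic sort would place chr10 before chr2, which is the main problem a numeric-aware key like this avoids.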
Internally, the data stored in a genomap::GenomeMap<T> lives in a Vec<T>. The
type also maintains a sorted list of chromosome names, plus forward and reverse
lookup tables that associate each position in the Vec with its chromosome's name.
Below is a code example:
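```rust
use genomap::GenomeMap;

// NOTE: the construction lines were truncated in this copy; the i32 value
// type and the inserted values below are assumed for illustration.
let mut sm: GenomeMap<i32> = GenomeMap::new();
sm.insert("chr1", 1).unwrap();
sm.insert("chr2", 2).unwrap();

// get a reference to a value by name
println!("{:?}", sm.get("chr1"));

// iterate through name/values
for (name, value) in sm.iter() {
    println!("{} -> {}", name, value);
}

// get the index for a chromosome name
let index = sm.get_index_by_name("chr1").unwrap();
assert_eq!(index, 0);

// get a name by index
assert_eq!(sm.get_name_by_index(index).unwrap(), "chr1");
```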
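The internal layout described before the example (values in a Vec, a sorted list of names, and forward and reverse lookup tables) could be sketched roughly as follows. This is an illustrative stand-in using only the standard library, not genomap's implementation: `SketchMap` and its methods are hypothetical names, indices are assumed to follow insertion order, and plain lexicographic order stands in for the chromosome-aware sort.

```rust
use std::collections::HashMap;

/// Illustrative stand-in for the described layout (not genomap's code).
struct SketchMap<T> {
    values: Vec<T>,                   // the stored data
    index_of: HashMap<String, usize>, // forward lookup: name -> position in `values`
    name_of: HashMap<usize, String>,  // reverse lookup: position -> name
    sorted_names: Vec<String>,        // names kept in sorted order
}

impl<T> SketchMap<T> {
    fn new() -> Self {
        SketchMap {
            values: Vec::new(),
            index_of: HashMap::new(),
            name_of: HashMap::new(),
            sorted_names: Vec::new(),
        }
    }

    /// Insert a value; returns its index, or None if the name already exists.
    fn insert(&mut self, name: &str, value: T) -> Option<usize> {
        if self.index_of.contains_key(name) {
            return None;
        }
        let idx = self.values.len();
        self.values.push(value);
        self.index_of.insert(name.to_string(), idx);
        self.name_of.insert(idx, name.to_string());
        // Find the insertion point that keeps `sorted_names` ordered.
        let pos = self
            .sorted_names
            .binary_search_by(|n| n.as_str().cmp(name))
            .unwrap_or_else(|p| p);
        self.sorted_names.insert(pos, name.to_string());
        Some(idx)
    }

    /// Look up a value by name: one hash lookup plus one Vec index.
    fn get(&self, name: &str) -> Option<&T> {
        self.index_of.get(name).map(|&i| &self.values[i])
    }

    /// Look up a value directly by its usize index.
    fn get_by_index(&self, idx: usize) -> Option<&T> {
        self.values.get(idx)
    }

    /// Recover the name stored for an index.
    fn name_by_index(&self, idx: usize) -> Option<&str> {
        self.name_of.get(&idx).map(|s| s.as_str())
    }

    /// Iterate over (name, value) pairs in sorted name order.
    fn iter(&self) -> impl Iterator<Item = (&str, &T)> + '_ {
        self.sorted_names
            .iter()
            .map(move |n| (n.as_str(), &self.values[self.index_of[n]]))
    }
}

fn main() {
    let mut m = SketchMap::new();
    m.insert("chr2", 20);
    m.insert("chr1", 10);
    for (name, value) in m.iter() {
        println!("{} -> {}", name, value);
    }
    println!("{:?} {:?}", m.get("chr2"), m.get_by_index(1));
    println!("{:?}", m.name_by_index(0));
}
```

Keeping the sorted names separate from the value storage means iteration order never depends on insertion order, while indexed access stays a plain Vec lookup.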
In Rust, working with non-Copy types, such as a String chromosome name
key, can necessitate generic lifetime annotations whenever a struct borrows
the key. This can clutter code and increase complexity significantly. To
prevent this, genomap has O(1) access by a usize index, so a chromosome name
index can be stored in structs rather than the String key.
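The trade-off can be illustrated with two small structs. This is a sketch under assumptions: `VariantByName`, `VariantByIndex`, and `chrom_name` are hypothetical names, and a plain slice of names stands in for the map's index-to-name lookup.

```rust
/// Borrowing the chromosome name forces a lifetime parameter onto the struct
/// and onto every type or function that holds one.
#[derive(Debug)]
struct VariantByName<'a> {
    chrom: &'a str,
    pos: u64,
}

/// Storing a usize index instead: no lifetime parameter, and the struct is Copy.
#[derive(Clone, Copy, Debug, PartialEq)]
struct VariantByIndex {
    chrom_index: usize,
    pos: u64,
}

/// Resolve the name only when it is needed (e.g. for output).
/// `names` is a stand-in for an index-to-name table.
fn chrom_name<'n>(names: &'n [String], v: VariantByIndex) -> Option<&'n str> {
    names.get(v.chrom_index).map(|s| s.as_str())
}

fn main() {
    let names = vec!["chr1".to_string(), "chr2".to_string()];

    let by_name = VariantByName { chrom: &names[1], pos: 12_345 };
    println!("{:?}", by_name);

    let by_index = VariantByIndex { chrom_index: 1, pos: 12_345 };
    let copy = by_index; // cheap copy; no borrow of `names` involved
    println!("{:?} on {:?}", copy, chrom_name(&names, by_index));
}
```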
Performance
Multiple creation and access benchmarks are available in
benches/comparison.rs. Here are a few highlights.
For creation time, GenomeMap is about 20% slower than the fastest structure
(FnvHashMap). But this cost is incurred once, and the absolute difference
(under 6 µs) is insignificant.
| Data structure | Time      | Factor |
|----------------|-----------|--------|
| FnvHashMap     | 28.217 µs | 1.000  |
| IndexMap       | 29.844 µs | 1.058  |
| BTreeMap       | 29.401 µs | 1.041  |
| HashMap        | 29.420 µs | 1.043  |
| GenomeMap      | 33.913 µs | 1.202  |
GenomeMap has the second-fastest sorted access times (it uses FnvHashMap's
hasher internally, but there's one additional constant-time lookup operation).
| Data structure | Time      | Factor |
|----------------|-----------|--------|
| FnvHashMap     | 68.555 ns | 1.00   |
| GenomeMap      | 198.55 ns | 2.89   |
| IndexMap       | 237.47 ns | 3.46   |
| HashMap        | 336.32 ns | 4.91   |
| BTreeMap       | 567.95 ns | 8.28   |
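The Factor column in the tables above appears to be each time divided by the fastest entry's time. A small sketch of that arithmetic follows; `relative_factors` is a hypothetical helper, and because it works from the rounded times shown, recomputed values may differ from the table in the last digit.

```rust
/// Divide every time by the smallest time in the slice.
/// Returns (name, factor) pairs in the input order.
fn relative_factors(times: &[(&str, f64)]) -> Vec<(String, f64)> {
    // Smallest time is the baseline (factor 1.0).
    let fastest = times
        .iter()
        .map(|&(_, t)| t)
        .fold(f64::INFINITY, f64::min);
    times
        .iter()
        .map(|&(name, t)| (name.to_string(), t / fastest))
        .collect()
}

fn main() {
    // Creation times in microseconds, taken from the first table.
    let creation = [
        ("FnvHashMap", 28.217),
        ("IndexMap", 29.844),
        ("BTreeMap", 29.401),
        ("HashMap", 29.420),
        ("GenomeMap", 33.913),
    ];
    for (name, factor) in relative_factors(&creation) {
        println!("{:<12} {:.3}", name, factor);
    }
}
```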
Owner
- Name: Vince Buffalo
- Login: vsbuffalo
- Kind: user
- Location: Berkeley, CA
- Company: UC Berkeley
- Website: http://vincebuffalo.com
- Repositories: 129
- Profile: https://github.com/vsbuffalo
Evolutionary geneticist at UC Berkeley, former bioinformatician. ♥s probability, statistics. Author of book Bioinformatics Data Skills.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Buffalo
    given-names: Vince
    orcid: https://orcid.org/0000-0003-4510-1609
title: "genomap"
version: 0.2.5
identifiers:
  - type: doi
    value: 10.5281/zenodo.10653719
date-released: 2024-02-13
Packages
- Total packages: 1
- Total downloads: 8,237 (cargo)
- Total dependent packages: 2
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
crates.io: genomap
A small library for storing generic genomic data indexed by a chromosome.
- Documentation: https://docs.rs/genomap/
- License: MIT
- Latest release: 0.2.6 (published about 2 years ago)
Dependencies
- actions/checkout v3 composite
- criterion 0.5.1 development
- fnv 1.0.7 development
- indexmap 2.2.2 development
- rand 0.8.5 development
- fnv 1.0.7
- thiserror 1.0.56