genomap

A Rust library for storing generic genomic data by sorted chromosome name

https://github.com/vsbuffalo/genomap

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: vsbuffalo
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Homepage:
  • Size: 28.3 KB
Statistics
  • Stars: 17
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

A simple Rust library for storing data indexed by a chromosome name

genomap is a small library for storing a key-value map between chromosome names and some generic data in a GenomeMap. Since in nearly every case we want chromosomes to be sorted by their names, GenomeMap maintains an internal sorted set of keys. GenomeMap uses a specialized chromosome-name sorting function that should properly sort autosomes and sex chromosomes, handle Drosophila chromosome names (e.g. 2L and 2R), and so on. Please file a GitHub issue if the sort order is not what you expect.
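To illustrate the kind of ordering described above, here is a minimal sketch of a chromosome-aware sort key. It is not genomap's actual sorting function: `chrom_key` and `sort_chromosomes` are hypothetical names, and the rules (strip an optional `chr` prefix, order numbered chromosomes numerically, then X, Y, and mitochondrial names, then everything else alphabetically) are assumptions made for this example.

```rust
/// Hypothetical sort key, NOT genomap's implementation.
/// Returns (class, number, suffix) so that tuples compare in the desired order.
fn chrom_key(name: &str) -> (u8, u32, String) {
    // Assumption: an optional "chr" prefix does not affect ordering.
    let s = name.strip_prefix("chr").unwrap_or(name);

    // Count leading ASCII digits (one byte each, so this is also a byte offset).
    let digits = s.chars().take_while(|c| c.is_ascii_digit()).count();
    if digits > 0 {
        // Numbered chromosomes sort numerically, then by suffix ("2L" < "2R").
        let number = s[..digits].parse::<u32>().unwrap_or(u32::MAX);
        return (0, number, s[digits..].to_string());
    }

    // Assumption: X, then Y, then mitochondrial, then any other name.
    let class = match s {
        "X" => 1,
        "Y" => 2,
        "M" | "MT" => 3,
        _ => 4,
    };
    (class, 0, s.to_string())
}

/// Sort chromosome names in place using the sketch key above.
pub fn sort_chromosomes(names: &mut [String]) {
    names.sort_by_cached_key(|n| chrom_key(n));
}
```

A plain lexicographic sort would place `chr10` before `chr2`; splitting the name into a numeric part and a suffix is one common way to avoid that.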

Internally, the data in a genomap::GenomeMap<T> is stored in a Vec<T>. The type also maintains a sorted list of chromosome names, plus forward and reverse lookup tables that associate each position in the Vec with the chromosome's name.
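The layout described above can be sketched with the standard library alone. This is a simplified stand-in, not genomap's source: `SketchMap` and its method names are hypothetical, values are assumed to be stored in insertion order so that indices stay stable, and plain lexicographic order stands in for the library's chromosome-aware sort.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the layout described above (not genomap's code).
pub struct SketchMap<T> {
    values: Vec<T>,                  // data, assumed to be in insertion order
    forward: HashMap<String, usize>, // forward table: name -> position in `values`
    reverse: Vec<String>,            // reverse table: position -> name
    sorted: Vec<String>,             // names kept in sorted order for iteration
}

impl<T> SketchMap<T> {
    pub fn new() -> Self {
        SketchMap {
            values: Vec::new(),
            forward: HashMap::new(),
            reverse: Vec::new(),
            sorted: Vec::new(),
        }
    }

    /// Insert a value, returning its stable index; duplicate names are rejected.
    pub fn insert(&mut self, name: &str, value: T) -> Result<usize, String> {
        if self.forward.contains_key(name) {
            return Err(format!("duplicate chromosome: {}", name));
        }
        let index = self.values.len();
        self.values.push(value);
        self.forward.insert(name.to_string(), index);
        self.reverse.push(name.to_string());
        // Assumption: lexicographic order replaces the chromosome-aware sort.
        let pos = self.sorted.partition_point(|n| n.as_str() < name);
        self.sorted.insert(pos, name.to_string());
        Ok(index)
    }

    /// Look up by name: one hash lookup, then one Vec access.
    pub fn get(&self, name: &str) -> Option<&T> {
        self.forward.get(name).map(|&i| &self.values[i])
    }

    pub fn index_of(&self, name: &str) -> Option<usize> {
        self.forward.get(name).copied()
    }

    pub fn name_of(&self, index: usize) -> Option<&str> {
        self.reverse.get(index).map(|s| s.as_str())
    }

    /// Iterate over (name, value) pairs in sorted name order.
    pub fn iter(&self) -> impl Iterator<Item = (&str, &T)> + '_ {
        self.sorted
            .iter()
            .map(move |n| (n.as_str(), &self.values[self.forward[n]]))
    }
}
```

In this sketch, appending to the value vector means an index handed out earlier is never invalidated by later inserts, while the separate sorted list is only consulted when iterating.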

Below is a code example:

```rust
use genomap::GenomeMap;

let mut sm: GenomeMap<i32> = GenomeMap::new();
sm.insert("chr1", 1).unwrap();
sm.insert("chr2", 2).unwrap();

// get a reference to a value by name
println!("{:?}", sm.get("chr1"));

// iterate through name/values
for (name, value) in sm.iter() {
    println!("{} -> {}", name, value);
}

// get the index for a chromosome name
let index = sm.get_index_by_name("chr1").unwrap();
assert_eq!(index, 0);

// get a name by index
assert_eq!(sm.get_name_by_index(index).unwrap(), "chr1");
```

In Rust, working with non-Copy types, such as a String chromosome name key, can necessitate generic lifetime annotations. This can clutter code and increase complexity significantly. To avoid this, genomap provides O(1) access by a usize index, so a chromosome's index can be stored in structs instead of its String key.
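The trade-off can be sketched as follows. The types and the `resolve` helper are hypothetical and use a plain slice of names as a stand-in for the map; with genomap itself, the index would presumably come from `get_index_by_name` and the name from `get_name_by_index`, as in the example above.

```rust
/// Index-based record: no lifetime parameter, and cheap to copy.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GenomicRange {
    pub chrom: usize, // index of the chromosome, not its name
    pub start: u64,
    pub end: u64,
}

/// Name-based record: borrowing the name forces a lifetime onto the struct
/// and onto every type that contains it.
#[derive(Debug, PartialEq)]
pub struct NamedRange<'a> {
    pub chrom: &'a str,
    pub start: u64,
    pub end: u64,
}

/// Hypothetical helper: resolve the index to a name only where it is needed.
/// `names` is a stand-in for the map's index-to-name lookup.
pub fn resolve<'a>(names: &'a [String], r: GenomicRange) -> Option<NamedRange<'a>> {
    names.get(r.chrom).map(|name| NamedRange {
        chrom: name.as_str(),
        start: r.start,
        end: r.end,
    })
}
```

With this arrangement, the lifetime appears only at the point where a name is actually looked up, and collections of `GenomicRange` values can be stored and passed around without borrowing from the name table.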

Performance

Multiple creation and access benchmarks are available in benches/comparison.rs. Here are a few highlights. For creation time, GenomeMap is about 20% slower than the fastest alternative (FnvHashMap). However, this cost is incurred only once, and the absolute difference is insignificant.

| Data structure | Time      | Factor |
|----------------|-----------|--------|
| FnvHashMap     | 28.217 µs | 1.000  |
| IndexMap       | 29.844 µs | 1.058  |
| BTreeMap       | 29.401 µs | 1.041  |
| HashMap        | 29.420 µs | 1.043  |
| GenomeMap      | 33.913 µs | 1.202  |

GenomeMap has the second-fastest sorted access time (it uses FnvHashMap's hasher internally, but performs one additional constant-time lookup).

| Data structure | Time      | Factor |
|----------------|-----------|--------|
| FnvHashMap     | 68.555 ns | 1.00   |
| GenomeMap      | 198.55 ns | 2.89   |
| IndexMap       | 237.47 ns | 3.46   |
| HashMap        | 336.32 ns | 4.91   |
| BTreeMap       | 567.95 ns | 8.28   |

Owner

  • Name: Vince Buffalo
  • Login: vsbuffalo
  • Kind: user
  • Location: Berkeley, CA
  • Company: UC Berkeley

Evolutionary geneticist at UC Berkeley, former bioinformatician. ♥s probability, statistics. Author of book Bioinformatics Data Skills.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Buffalo
    given-names: Vince
    orcid: https://orcid.org/0000-0003-4510-1609
title: "genomap"
version: 0.2.5
identifiers:
  - type: doi
    value: 10.5281/zenodo.10653719
date-released: 2024-02-13

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 8,237 total
  • Total dependent packages: 2
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
crates.io: genomap

A small library for storing generic genomic data indexed by a chromosome.

  • Versions: 6
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 8,237 Total
Rankings
Dependent repos count: 29.4%
Forks count: 30.4%
Dependent packages count: 34.6%
Stargazers count: 35.6%
Average: 42.8%
Downloads: 84.1%
Maintainers (1)
Last synced: 7 months ago

Dependencies

.github/workflows/rust.yml actions
  • actions/checkout v3 composite
Cargo.toml cargo
  • criterion 0.5.1 development
  • fnv 1.0.7 development
  • indexmap 2.2.2 development
  • rand 0.8.5 development
  • fnv 1.0.7
  • thiserror 1.0.56