https://github.com/databio/gtars

Performance-critical tools to manipulate, analyze, and process genomic interval data, primarily focused on building tools for geniml, our genomic machine learning Python package.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (20.0%) to scientific vocabulary

Keywords from Contributors

bioinformatics bioinformatics-pipeline ngs pipeline-submission-engine
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 9
  • Watchers: 7
  • Forks: 5
  • Open Issues: 46
  • Releases: 23
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

gtars is a Rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our Python package, geniml, a library for machine learning on genomic intervals. It can also be used as a standalone library for working with genomic intervals. For more information, see the public-facing documentation (under construction).
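To make "genomic interval data" concrete, here is a minimal sketch of an interval and an overlap check. The `Region` struct and its methods are hypothetical names chosen for illustration and are not taken from the gtars API; the sketch assumes BED-style half-open coordinates (0-based start, exclusive end).

```rust
/// Hypothetical interval type for illustration; not gtars's own struct.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Region {
    pub chr: String,
    pub start: u32, // 0-based, inclusive (assumed BED-style convention)
    pub end: u32,   // exclusive
}

impl Region {
    pub fn new(chr: &str, start: u32, end: u32) -> Self {
        Region { chr: chr.to_string(), start, end }
    }

    /// True when both regions are on the same chromosome and share at least one base.
    pub fn overlaps(&self, other: &Region) -> bool {
        self.chr == other.chr && self.start < other.end && other.start < self.end
    }
}
```

With half-open coordinates, two regions that merely touch (one ends where the other starts) do not count as overlapping.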

gtars consists of four components:

  1. A Rust library crate.
  2. A command-line interface, written in Rust.
  3. A Python package that provides Python bindings to the Rust library.
  4. An R package that provides R bindings to the Rust library.

Repository organization (for developers)

This repository is a work in progress and still in early development. It is organized as follows:

  1. The main gtars Rust package (in subfolder /gtars), which contains two crates:
     A. A Rust library crate (/gtars/lib.rs) that provides functions, traits, and structs for working with genomic interval data.
     B. A Rust binary crate (/gtars/main.rs), a small wrapper command-line interface for the library crate.
  2. Python bindings (in /bindings/python), which consist of a Rust package with a library crate (no binary crate) and a Python package.
  3. R bindings (in /bindings/r), which consist of an R package.

Installation

To install gtars, you must have the Rust toolchain installed. You can install it by following the official installation instructions.

You can build the binary locally using `cargo build --release`. This creates a binary at `target/release/gtars`, which you can add to your path or run directly.

Usage

gtars provides several tools, which you can use in three ways.

1. From R/Python

Using bindings, you can call some gtars functions from within R or Python.

2. From the CLI

To see the tools available from the CLI, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.
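If you want to call the CLI from another Rust program, the two help invocations above can be sketched with the standard library alone. The function names `help_args` and `gtars_help` are hypothetical, and the sketch assumes a `gtars` binary is on your PATH; if it is not, `gtars_help` returns an I/O error rather than panicking.

```rust
use std::io;
use std::process::Command;

/// Build the argument list for `gtars --help` or `gtars <tool> --help`.
pub fn help_args(tool: Option<&str>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(name) = tool {
        args.push(name.to_string());
    }
    args.push("--help".to_string());
    args
}

/// Run the `gtars` binary (assumed to be on PATH) and return its help text.
pub fn gtars_help(tool: Option<&str>) -> io::Result<String> {
    let output = Command::new("gtars").args(help_args(tool)).output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Passing `None` requests the top-level help, while `Some("<tool>")` requests the help for that tool.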

3. As a rust library

You can link gtars as a library in your Rust project. To do so, add the following to your Cargo.toml file:

    [dependencies]
    gtars = { git = "https://github.com/databio/gtars" }

Testing

To run the tests, run `cargo test`.

Refget tests

The default tests for this module are designed to run quickly on tiny FASTA files. To test against a full-scale FASTA file, use test_loading_large_fasta_file. This is a large test that is ignored by default, so it does not run in a typical cargo test. To run just this large test on a FASTA file, try something like this:

    FASTA_PATH=tests/data/subset.fa.gz cargo test tests::test_loading_large_fasta_file -- --nocapture --ignored
    FASTA_PATH=`refgenie seek test/fasta` cargo test tests::test_loading_large_fasta_file -- --nocapture --ignored
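The commands above rely on two mechanisms: an `#[ignore]` attribute that keeps the test out of a plain `cargo test`, and a `FASTA_PATH` environment variable that selects the input file. The sketch below shows one way such a test could be wired up. It is an assumption about the general pattern, not a copy of the gtars test; the helper `resolve_fasta_path` and the fallback path are hypothetical.

```rust
use std::path::PathBuf;

/// Hypothetical fallback; the real test may use a different default or none at all.
const DEFAULT_FASTA: &str = "tests/data/subset.fa.gz";

/// Pick the FASTA file: a non-empty explicit value wins, otherwise use the default.
pub fn resolve_fasta_path(from_env: Option<String>) -> PathBuf {
    match from_env {
        Some(p) if !p.trim().is_empty() => PathBuf::from(p.trim()),
        _ => PathBuf::from(DEFAULT_FASTA),
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::env;

    #[test]
    #[ignore] // skipped by plain `cargo test`; selected with `-- --ignored`
    fn test_loading_large_fasta_file() {
        let path = resolve_fasta_path(env::var("FASTA_PATH").ok());
        // Printed output is only shown when the test runs with `--nocapture`.
        println!("Loading large FASTA file: {}", path.display());
        assert!(path.exists(), "FASTA file not found: {}", path.display());
    }
}
```

Keeping the path resolution in a small pure function makes it possible to check the fallback logic without touching the process environment.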

Contributing

New internal library crate tools

If you'd like to add a new tool, you can do so by creating a new module within the src folder.

New public library crate tools

If you want the tool to be available to users of gtars, add it to the gtars library crate as well. To do so, add the following to src/lib.rs:

    pub mod <tool_name>;
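As a rough illustration of what a public module exposes, the sketch below declares a module inline so that it is self-contained. The module name `my_tool` and the function `interval_len` are hypothetical placeholders; in the layout described above, the module body would live in its own file under src and only the `pub mod` line would appear in src/lib.rs.

```rust
// Contents of a hypothetical src/lib.rs. The module is written inline here;
// in a real crate it would be a separate file declared with `pub mod my_tool;`.
pub mod my_tool {
    /// Length of a half-open interval [start, end); zero if end <= start.
    pub fn interval_len(start: u32, end: u32) -> u32 {
        end.saturating_sub(start)
    }
}
```

Because both the module and the function are `pub`, a downstream crate could reach the function through the crate root, for example as `gtars::my_tool::interval_len` under these assumed names.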

New binary crate tools

Finally, if you want to have command-line functionality, you can add it to the gtars binary crate. This requires two steps:

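  1. Create a new CLI using clap inside the interfaces module of src/cli.rs:

```rust
use clap::Command;

pub fn make_new_tool_cli() -> Command {
    // "new-tool" is a placeholder name; define your arguments on this Command
    Command::new("new-tool")
}
```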

  2. Write your logic in a wrapper function. This will live inside the functions module of src/cli.rs:

```rust
// top of file:
use std::error::Error;
use tool_name::{ ... }

// inside the module:
pub fn new_tool_wrapper() -> Result<(), Box<dyn Error>> {
    // your logic here
    Ok(())
}
```
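The wrapper pattern from the second step can be sketched without clap to show how errors flow from library code to the CLI layer. Everything here is hypothetical and for illustration only: the `EmptyInput` error, the `total_width` stand-in for library logic, and the wrapper's `regions` parameter are not part of gtars.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical library-level error for the new tool.
#[derive(Debug)]
pub struct EmptyInput;

impl fmt::Display for EmptyInput {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "no input regions were provided")
    }
}

impl Error for EmptyInput {}

/// Stand-in for the tool's library logic: total bases covered by (start, end) pairs.
pub fn total_width(regions: &[(u32, u32)]) -> Result<u32, EmptyInput> {
    if regions.is_empty() {
        return Err(EmptyInput);
    }
    Ok(regions.iter().map(|(s, e)| e.saturating_sub(*s)).sum())
}

/// CLI-facing wrapper: call the library, print the result, box any error.
pub fn new_tool_wrapper(regions: &[(u32, u32)]) -> Result<(), Box<dyn Error>> {
    let width = total_width(regions)?; // `?` converts EmptyInput into Box<dyn Error>
    println!("total width: {width}");
    Ok(())
}
```

Returning `Box<dyn Error>` from the wrapper lets each tool keep its own error type in the library crate while the binary crate handles all of them uniformly.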

Please make sure you update the changelog and bump the version number in Cargo.toml when you add a new tool.

VSCode users

If you are using VSCode, list each Cargo.toml under rust-analyzer.linkedProjects in the settings file inside the .vscode folder, so that rust-analyzer can link everything together:

    {
      "rust-analyzer.linkedProjects": [
        "./vocab/Cargo.toml",
        "./Cargo.toml",
        "./new-tool/Cargo.toml"
      ]
    }

Owner

  • Name: Databio
  • Login: databio
  • Kind: organization
  • Location: University of Virginia

Solving problems in computational biology

GitHub Events

Total
  • Create event: 72
  • Release event: 28
  • Issues event: 93
  • Watch event: 5
  • Delete event: 54
  • Issue comment event: 226
  • Push event: 547
  • Pull request review event: 118
  • Pull request review comment event: 136
  • Pull request event: 102
  • Fork event: 4
Last Year
  • Create event: 72
  • Release event: 28
  • Issues event: 93
  • Watch event: 5
  • Delete event: 54
  • Issue comment event: 226
  • Push event: 547
  • Pull request review event: 118
  • Pull request review comment event: 136
  • Pull request event: 102
  • Fork event: 4

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,183
  • Total Committers: 8
  • Avg Commits per committer: 147.875
  • Development Distribution Score (DDS): 0.535
Past Year
  • Commits: 754
  • Committers: 7
  • Avg Commits per committer: 107.714
  • Development Distribution Score (DDS): 0.459
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| Nathan LeRoy | N****7@g****m | 550 |
| Donald Campbell | 1****r | 540 |
| Khoroshevskyi | s****0@g****m | 55 |
| nsheff | n****f | 18 |
| Sam Park | s****0@l****m | 11 |
| Ziyang "Claude" Hu | 3****u | 4 |
| Edward Chen | e****5@g****m | 4 |
| Gert Hulselmans | g****s@k****e | 1 |
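The all-time committer figures above can be reproduced with a short calculation. The sketch assumes that the Development Distribution Score is defined as 1 minus the top committer's share of all commits; that definition is not stated on this page, but it matches the reported numbers. The function names are hypothetical.

```rust
/// Mean commits per committer.
pub fn avg_commits(total_commits: u32, committers: u32) -> f64 {
    f64::from(total_commits) / f64::from(committers)
}

/// DDS, assuming the definition 1 - (top committer's commits / total commits).
pub fn dds(top_committer_commits: u32, total_commits: u32) -> f64 {
    1.0 - f64::from(top_committer_commits) / f64::from(total_commits)
}
```

For the all-time numbers, `avg_commits(1183, 8)` gives 147.875 and `dds(550, 1183)` gives roughly 0.535, in line with the values listed.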

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 75
  • Total pull requests: 113
  • Average time to close issues: 3 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 9
  • Total pull request authors: 9
  • Average comments per issue: 1.28
  • Average comments per pull request: 2.06
  • Merged pull requests: 87
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 61
  • Pull requests: 97
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Issue authors: 8
  • Pull request authors: 8
  • Average comments per issue: 0.7
  • Average comments per pull request: 1.61
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • donaldcampbelljr (27)
  • nleroy917 (25)
  • ClaudeHu (13)
  • khoroshevskyi (3)
  • nsheff (3)
  • sanghoonio (1)
  • gtzheng (1)
  • ljmills (1)
  • saanikat (1)
Pull Request Authors
  • donaldcampbelljr (45)
  • nleroy917 (35)
  • khoroshevskyi (16)
  • nsheff (5)
  • sstadick (4)
  • ClaudeHu (2)
  • ghuls (2)
  • sanghoonio (2)
  • edward9065 (2)
Top Labels
Issue Labels
  • bug (12)
  • enhancement (12)
  • likely solved (12)
  • uniwig (12)
  • brainstorming (6)
  • new tool (4)
  • igd (4)
  • question (3)
  • help wanted (2)
  • tokenizers (2)
  • documentation (2)
  • wontfix (2)
  • low priority (1)
  • good first issue (1)
  • scatrs (1)
  • AIList (1)
Pull Request Labels

Dependencies

.github/workflows/CI.yml actions
  • PyO3/maturin-action v1 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
bindings/Cargo.toml cargo
gtars/Cargo.toml cargo
  • rstest 0.18.2 development
  • tempfile 3.8.1 development
  • anyhow 1.0.82
  • bytes 1.6.0
  • clap 4.4.7
  • flate2 1.0.28
  • rust-lapper 1.1.0
  • serde ^1.0
  • serde_yaml ^0.9
bindings/pyproject.toml pypi