https://github.com/databio/gtars
Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml - our genomic machine learning python package.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (20.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: databio
- License: bsd-2-clause
- Language: Rust
- Default Branch: master
- Homepage: https://docs.bedbase.org/gtars/
- Size: 1.71 MB
Statistics
- Stars: 9
- Watchers: 7
- Forks: 5
- Open Issues: 46
- Releases: 23
Metadata Files
README.md
gtars is a rust crate that provides a set of tools for working with genomic interval data. Its primary goal is to provide processors for our python package, geniml, a library for machine learning on genomic intervals. However, it can be used as a standalone library for working with genomic intervals as well. For more information, see the public-facing documentation (under construction).
gtars provides these things:
- A rust library crate.
- A command-line interface, written in rust.
- A Python package that provides Python bindings to the rust library.
- An R package that provides R bindings to the rust library.
Repository organization (for developers)
This repository is a work in progress and still in early development. It is organized as follows:
- The main gtars rust package (in subfolder `/gtars`), which contains two crates:
  - A. A rust library crate (`/gtars/lib.rs`) that provides functions, traits, and structs for working with genomic interval data.
  - B. A rust binary crate (in `/gtars/main.rs`), a small wrapper command-line interface for the library crate.
- Python bindings (in `/bindings/python`), which consist of a rust package with a library crate (no binary crate) and a Python package.
- R bindings (in `/bindings/r`), which consist of an R package.
Installation
To install gtars, you must have the rust toolchain installed. You can install it by following the instructions.
You can build the binary locally using `cargo build --release`. This creates a binary at `target/release/gtars`. You can then add it to your `PATH` or run it directly.
Usage
gtars provides several useful tools. There are 3 ways to use gtars.
1. From R/Python
Using bindings, you can call some gtars functions from within R or Python.
2. From the CLI
To see the tools available from the CLI, run `gtars --help`. To see the help for a specific tool, run `gtars <tool> --help`.
3. As a rust library
You can link gtars as a library in your rust project. To do so, add the following to your Cargo.toml file:
```toml
[dependencies]
gtars = { git = "https://github.com/databio/gtars" }
```
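The gtars API itself is not reproduced here. As a rough, std-only illustration of the kind of genomic interval data such a library works with, the sketch below parses BED-style lines and checks two intervals for overlap. The `Region` struct and the `parse_bed_line` and `overlaps` functions are hypothetical names invented for this example, and the half-open coordinate convention is an assumption borrowed from the BED format.

```rust
// Hypothetical types for illustration only; these are NOT part of the gtars API.

/// A genomic interval with BED-style coordinates (0-based start, exclusive end).
#[derive(Debug, Clone, PartialEq)]
struct Region {
    chrom: String,
    start: u32,
    end: u32,
}

/// Parse the first three tab-separated fields of a BED line.
/// Extra columns are ignored; malformed lines yield `None`.
fn parse_bed_line(line: &str) -> Option<Region> {
    let mut fields = line.trim_end().split('\t');
    let chrom = fields.next()?.to_string();
    let start: u32 = fields.next()?.parse().ok()?;
    let end: u32 = fields.next()?.parse().ok()?;
    if chrom.is_empty() || start > end {
        return None;
    }
    Some(Region { chrom, start, end })
}

/// Two regions overlap when they share a chromosome and at least one base.
fn overlaps(a: &Region, b: &Region) -> bool {
    a.chrom == b.chrom && a.start < b.end && b.start < a.end
}

fn main() {
    let a = parse_bed_line("chr1\t100\t200").expect("valid BED line");
    let b = parse_bed_line("chr1\t150\t300\tpeak_1").expect("valid BED line");
    println!("{:?} overlaps {:?}: {}", a, b, overlaps(&a, &b));
}
```

With half-open coordinates, two intervals that merely touch (one ends where the other starts) are not reported as overlapping.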
Testing
To run the tests, run `cargo test`.
Refget tests
The default tests for this module are designed to run quickly on tiny FASTA files.
To run the test on a full-scale FASTA file, see `test_loading_large_fasta_file`.
This is a large test that is ignored by default, so it doesn't run with a plain `cargo test`.
To run just this large test on a FASTA file, try something like this:
```bash
FASTA_PATH=tests/data/subset.fa.gz cargo test tests::test_loading_large_fasta_file -- --nocapture --ignored
FASTA_PATH=`refgenie seek test/fasta` cargo test tests::test_loading_large_fasta_file -- --nocapture --ignored
```
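For readers unfamiliar with the pattern, an ignored test that takes its input from an environment variable could look roughly like the sketch below. It is not the repository's actual test: `fasta_path_from_env`, `count_fasta_records`, and `counts_records_in_large_fasta` are hypothetical names, and the sketch assumes an uncompressed FASTA file (a `.fa.gz` input like the one above would need a decompression step first).

```rust
use std::env;
use std::io::{self, BufRead};
use std::path::PathBuf;

/// Read the FASTA location from the FASTA_PATH environment variable, if set.
fn fasta_path_from_env() -> Option<PathBuf> {
    env::var_os("FASTA_PATH").map(PathBuf::from)
}

/// Count FASTA records by counting header lines, which start with '>'.
fn count_fasta_records<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        if line?.starts_with('>') {
            count += 1;
        }
    }
    Ok(count)
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::fs::File;
    use std::io::BufReader;

    // Hypothetical large test: `#[ignore]` keeps it out of a plain `cargo test`;
    // it only runs when `--ignored` is passed after `--`.
    #[test]
    #[ignore]
    fn counts_records_in_large_fasta() {
        let path = fasta_path_from_env().expect("set FASTA_PATH to a FASTA file");
        let reader = BufReader::new(File::open(path).expect("open FASTA file"));
        assert!(count_fasta_records(reader).expect("read FASTA file") > 0);
    }
}

fn main() {
    // Small in-memory input, so the sketch runs without any file on disk.
    let tiny = ">chr1\nACGT\n>chr2\nGGCC\n";
    println!("records: {}", count_fasta_records(tiny.as_bytes()).unwrap());
    println!("FASTA_PATH set: {}", fasta_path_from_env().is_some());
}
```

The `--nocapture` flag in the commands above only affects whether the test's printed output is shown; it is `--ignored` that selects tests marked `#[ignore]`.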
Contributing
New internal library crate tools
If you'd like to add a new tool, you can do so by creating a new module within the `src` folder.
New public library crate tools
If you want this to be available to users of gtars, you can add it to the gtars library crate as well. To do so, add the following to `src/lib.rs`:
```rust
pub mod <tool_name>;
```
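As a small illustration of what a public module declaration makes possible, the sketch below defines a module inline and calls one of its functions from outside. The module name `new_tool` and the function `interval_len` are hypothetical and chosen only for this example; in a real crate the module body would normally sit in its own file rather than inline.

```rust
// In a real crate this module would live in its own file (for example
// src/new_tool.rs) and be exposed from src/lib.rs with `pub mod new_tool;`.
// It is written inline here so the sketch compiles as a single file.
pub mod new_tool {
    /// Hypothetical helper: length of a half-open interval [start, end).
    /// Returns 0 instead of underflowing when `end` is smaller than `start`.
    pub fn interval_len(start: u32, end: u32) -> u32 {
        end.saturating_sub(start)
    }
}

fn main() {
    // Code outside the module reaches the function through the module path.
    println!("{}", new_tool::interval_len(100, 250));
}
```

Only items marked `pub` inside the module are reachable this way, so the module and the function both need the keyword.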
New binary crate tools
Finally, if you want to have command-line functionality, you can add it to the gtars binary crate. This requires two steps:
- Create a new `cli` using `clap` inside the `interfaces` module of `src/cli.rs`:

```rust
pub fn make_new_tool_cli() -> Command {

}
```
- Write your logic in a wrapper function. This will live inside the `functions` module of `src/cli.rs`:

```rust
// top of file:
use tool_name::{ ... }

// inside the module:
pub fn new_tool_wrapper() -> Result<(), Box<dyn Error>> {
    // your logic here
}
```

Please make sure you update the changelog and bump the version number in `Cargo.toml` when you add a new tool.
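The wrapper step described above can be sketched without `clap` to show the shape of the error handling. This is a minimal, std-only illustration under stated assumptions: the `count_lines` tool logic, the `&str` parameter on `new_tool_wrapper`, and the `regions.bed` file name are all hypothetical, and argument parsing is left out entirely.

```rust
use std::error::Error;
use std::fs;

/// Hypothetical tool logic: count the non-empty lines in a text file.
fn count_lines(path: &str) -> Result<usize, Box<dyn Error>> {
    // `?` converts the io::Error into Box<dyn Error> and returns early.
    let text = fs::read_to_string(path)?;
    Ok(text.lines().filter(|l| !l.trim().is_empty()).count())
}

/// Hypothetical wrapper: receives already-parsed arguments, runs the tool,
/// and reports the result. Errors are passed up to the caller.
pub fn new_tool_wrapper(input: &str) -> Result<(), Box<dyn Error>> {
    let n = count_lines(input)?;
    println!("{input}: {n} non-empty lines");
    Ok(())
}

fn main() {
    // The binary's entry point decides how to present a failure to the user.
    if let Err(e) = new_tool_wrapper("regions.bed") {
        eprintln!("error: {e}");
    }
}
```

Returning `Result<(), Box<dyn Error>>` lets the wrapper use `?` on any error type that implements `Error`, which keeps the tool logic free of printing and exit-code concerns.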
VSCode users
If you are using VSCode, make sure you link each `Cargo.toml` in the settings file inside the `.vscode` folder, so that rust-analyzer can link it all together:
```json
{
  "rust-analyzer.linkedProjects": [
    "./vocab/Cargo.toml",
    "./Cargo.toml",
    "./new-tool/Cargo.toml"
  ]
}
```
Owner
- Name: Databio
- Login: databio
- Kind: organization
- Location: University of Virginia
- Website: https://databio.org
- Repositories: 88
- Profile: https://github.com/databio
Solving problems in computational biology
GitHub Events
Total
- Create event: 72
- Release event: 28
- Issues event: 93
- Watch event: 5
- Delete event: 54
- Issue comment event: 226
- Push event: 547
- Pull request review event: 118
- Pull request review comment event: 136
- Pull request event: 102
- Fork event: 4
Last Year
- Create event: 72
- Release event: 28
- Issues event: 93
- Watch event: 5
- Delete event: 54
- Issue comment event: 226
- Push event: 547
- Pull request review event: 118
- Pull request review comment event: 136
- Pull request event: 102
- Fork event: 4
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nathan LeRoy | N****7@g****m | 550 |
| Donald Campbell | 1****r | 540 |
| Khoroshevskyi | s****0@g****m | 55 |
| nsheff | n****f | 18 |
| Sam Park | s****0@l****m | 11 |
| Ziyang "Claude" Hu | 3****u | 4 |
| Edward Chen | e****5@g****m | 4 |
| Gert Hulselmans | g****s@k****e | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 75
- Total pull requests: 113
- Average time to close issues: 3 months
- Average time to close pull requests: 16 days
- Total issue authors: 9
- Total pull request authors: 9
- Average comments per issue: 1.28
- Average comments per pull request: 2.06
- Merged pull requests: 87
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 61
- Pull requests: 97
- Average time to close issues: about 2 months
- Average time to close pull requests: 9 days
- Issue authors: 8
- Pull request authors: 8
- Average comments per issue: 0.7
- Average comments per pull request: 1.61
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- donaldcampbelljr (27)
- nleroy917 (25)
- ClaudeHu (13)
- khoroshevskyi (3)
- nsheff (3)
- sanghoonio (1)
- gtzheng (1)
- ljmills (1)
- saanikat (1)
Pull Request Authors
- donaldcampbelljr (45)
- nleroy917 (35)
- khoroshevskyi (16)
- nsheff (5)
- sstadick (4)
- ClaudeHu (2)
- ghuls (2)
- sanghoonio (2)
- edward9065 (2)
Dependencies
- PyO3/maturin-action v1 composite
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- rstest 0.18.2 development
- tempfile 3.8.1 development
- anyhow 1.0.82
- bytes 1.6.0
- clap 4.4.7
- flate2 1.0.28
- rust-lapper 1.1.0
- serde ^1.0
- serde_yaml ^0.9