https://github.com/alejandrogzi/to-trans

A high-performance exon/CDS spliced transcriptome builder from fasta + GTF/GFF


Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: alejandrogzi
  • License: mit
  • Language: Rust
  • Default Branch: master
  • Homepage:
  • Size: 177 KB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md


to-trans

A high-performance exon/CDS spliced transcriptome builder from fasta + GTF/GFF. This is a command-line tool written in Rust that builds a transcriptome from a genome (.fa) and a gene model (.gtf/.gff).
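A minimal sketch of the core splicing step, in std-only Rust, is shown below. It is not to-trans's actual code. It assumes that the chromosome sequence is already in memory, that exon (or CDS) coordinates are GTF-style 1-based and inclusive, and that minus-strand transcripts are reverse-complemented after concatenation. `Exon`, `reverse_complement` and `splice` are hypothetical names.

```rust
/// Hypothetical exon/CDS record: 1-based, inclusive coordinates as in GTF/GFF.
#[derive(Debug, Clone, Copy)]
pub struct Exon {
    pub start: usize,
    pub end: usize,
}

/// Reverse-complement a DNA sequence (A<->T, C<->G, case preserved; other bytes such as N kept).
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'T' => b'A',
            b'C' => b'G',
            b'G' => b'C',
            b'a' => b't',
            b't' => b'a',
            b'c' => b'g',
            b'g' => b'c',
            other => other,
        })
        .collect()
}

/// Splice the exons of one transcript out of its chromosome sequence.
/// Exons are sorted by genomic start and concatenated; the result is
/// reverse-complemented for '-' strand transcripts. Returns None if any
/// exon falls outside the chromosome.
pub fn splice(chrom: &[u8], exons: &[Exon], strand: char) -> Option<Vec<u8>> {
    let mut sorted = exons.to_vec();
    sorted.sort_by_key(|e| e.start);
    let mut seq = Vec::new();
    for e in &sorted {
        if e.start == 0 || e.end < e.start || e.end > chrom.len() {
            return None;
        }
        // 1-based inclusive [start, end] becomes 0-based half-open [start - 1, end).
        seq.extend_from_slice(&chrom[e.start - 1..e.end]);
    }
    if strand == '-' {
        seq = reverse_complement(&seq);
    }
    Some(seq)
}
```

For example, exons at 1-3 and 7-9 of `AAACCCGGGTTT` splice to `AAAGGG` on the plus strand and to `CCCTTT` on the minus strand.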

Usage

```plaintext
High-performance transcriptome builder from fasta + GTF/GFF

Usage: to-trans --fasta --gtf [OPTIONS]

Arguments:
  -f, --fasta      Path to your .fa file
  -g, --gtf        Path to your .gtf/.gff file

Options:
  -m, --mode       Feature to extract from GTF/GFF file (exon or CDS) [default: exon]
  -o, --out        Path to output file [default: transcriptome.fa]
  -t, --threads    Number of threads [default: max ncpus]
  -h, --help       Print help
  -V, --version    Print version
```
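The `--mode` option selects which feature type (column 3 of a GTF line) gets spliced. The std-only sketch below shows one way to apply that filter and extract the transcript each record belongs to. It assumes GTF-style attributes such as `transcript_id "T1";`. GFF3 files, which use `Parent=` instead, are not handled here, and `Feature` and `parse_gtf_line` are hypothetical names rather than to-trans's API.

```rust
/// Hypothetical record for one GTF feature line.
#[derive(Debug, PartialEq)]
pub struct Feature {
    pub chrom: String,
    pub start: usize,
    pub end: usize,
    pub strand: char,
    pub transcript_id: String,
}

/// Parse one tab-separated GTF line, keeping it only if column 3 equals `mode`
/// ("exon" or "CDS"). Returns None for comments, other feature types, or malformed lines.
pub fn parse_gtf_line(line: &str, mode: &str) -> Option<Feature> {
    if line.starts_with('#') {
        return None;
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 9 || cols[2] != mode {
        return None;
    }
    let start = cols[3].parse::<usize>().ok()?;
    let end = cols[4].parse::<usize>().ok()?;
    let strand = cols[6].chars().next()?;
    // Column 9 looks like: gene_id "G1"; transcript_id "T1";
    let transcript_id = cols[8]
        .split(';')
        .map(str::trim)
        .find_map(|kv| kv.strip_prefix("transcript_id "))
        .map(|v| v.trim_matches('"').to_string())?;
    Some(Feature { chrom: cols[0].to_string(), start, end, strand, transcript_id })
}
```

Grouping the returned records by `transcript_id` (for instance in a `HashMap<String, Vec<Feature>>`) would yield the per-transcript exon or CDS lists that are then spliced.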

crate: https://crates.io/crates/to-trans

What's new in v0.2.0

  • to-trans is now ~2-3 s faster than the previous release.
  • A parallel approach is now the main algorithm for assembling transcript sequences.
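The release notes do not describe the parallel strategy in detail. As an assumption, a common pattern is to split the transcripts into one contiguous chunk per thread (cf. the `--threads` option) and assemble each chunk independently. The sketch below uses only `std::thread::scope` (Rust 1.63+); `par_map` is a hypothetical helper. to-trans lists `scoped_threadpool` among its dependencies, which suggests it may use that crate instead; this sketch does not reproduce it.

```rust
use std::thread;

/// Apply `work` to every job using up to `threads` scoped worker threads.
/// Jobs are split into contiguous chunks, one per thread, and results keep the input order.
pub fn par_map<T, R, F>(jobs: &[T], threads: usize, work: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let threads = threads.max(1); // treat 0 as 1 to avoid division by zero
    // Ceiling division so every job lands in some chunk; chunks() panics on size 0.
    let chunk_size = ((jobs.len() + threads - 1) / threads).max(1);
    let work = &work; // shared reference that each thread copies
    thread::scope(|s| {
        let handles: Vec<_> = jobs
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(work).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order preserves the original job order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

In a transcriptome builder, `jobs` would be the per-transcript exon lists and `work` a closure that splices one transcript. A "max ncpus" default could be obtained from `std::thread::available_parallelism()`.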

Upcoming work

to-trans is intended to grow over time, expanding its options and capabilities. Features planned for the next release include intron extraction, length-based transcriptomes, and chromosome-specific builds, among others.

Install/Build

to install to-trans, do:

  1. get Rust: curl https://sh.rustup.rs -sSf | sh on Unix, or see the Rust installation page for other options
  2. run cargo install to-trans (make sure ~/.cargo/bin is in your $PATH before running it)

to build to-trans, do:

  1. get rust (as described above)
  2. run git clone https://github.com/alejandrogzi/to-trans.git && cd to-trans
  3. run cargo run --release -- --fasta <FASTA> --gtf <GTF/GFF> --mode <MODE> --out <OUTPUT>

by default to-trans uses exon mode and sends the output to ./transcriptome.fa
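The result is a multi-FASTA file with one record per transcript. A minimal writer is sketched below. Using the transcript ID as the header and wrapping the sequence at a fixed width are assumptions about the format, not a description of to-trans's exact output, and `write_fasta` is a hypothetical name.

```rust
use std::io::{self, Write};

/// Write one FASTA record: a ">id" header followed by the sequence
/// wrapped at `width` bases per line (a width of 0 is treated as 1).
pub fn write_fasta<W: Write>(out: &mut W, id: &str, seq: &[u8], width: usize) -> io::Result<()> {
    writeln!(out, ">{}", id)?;
    for line in seq.chunks(width.max(1)) {
        out.write_all(line)?;
        out.write_all(b"\n")?;
    }
    Ok(())
}
```

In practice `out` would be a `std::io::BufWriter` wrapping `std::fs::File::create("transcriptome.fa")?`, so that many small writes are batched into few system calls.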

Benchmark

Note that this benchmark is outdated: to-trans is now ~2-3 s faster. For the human genome/GTF, to-trans takes 6 seconds to build a complete transcriptome, approximately 3x faster than GffRead.

Apart from a few species, such as human (GRCh38) or mouse (GRCm39), that have transcriptomes available, most of the animal kingdom lacks a pre-built file of transcript sequences. This becomes a problem when working at the transcript/isoform level.

Compared to GffRead (1), a GFF/GTF utility with a vast range of capabilities, to-trans builds a complete transcriptome about 2x faster, without needing to index the input genome. On the human gene model, to-trans takes at most 8 s, while GffRead takes up to 15 s even with an index (.fai) already available. For the dog, a species without transcript sequences offered in public databases, to-trans takes 3.5 s, compared with 6 s and 12 s for GffRead on indexed and non-indexed genomes, respectively.
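One way to avoid a .fai index, sketched below under that assumption, is to read the whole genome sequentially into memory once, keyed by sequence name. This trades memory (roughly one byte per base, about 3 GB for human) for simplicity. to-trans lists `seq_io` among its dependencies, which suggests it uses that crate for FASTA parsing; this std-only sketch does not, and `read_genome` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Stream a FASTA file into memory, keyed by the first word of each header.
/// No .fai index is needed: the file is read once, sequentially.
pub fn read_genome<R: BufRead>(reader: R) -> io::Result<HashMap<String, Vec<u8>>> {
    let mut genome = HashMap::new();
    let mut name: Option<String> = None;
    let mut seq = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let line = line.trim_end(); // tolerate trailing whitespace and \r\n endings
        if let Some(header) = line.strip_prefix('>') {
            // Store the previous record before starting a new one.
            if let Some(prev) = name.take() {
                genome.insert(prev, std::mem::take(&mut seq));
            }
            name = Some(header.split_whitespace().next().unwrap_or("").to_string());
        } else if name.is_some() {
            // Sequence lines before the first header are ignored.
            seq.extend_from_slice(line.as_bytes());
        }
    }
    if let Some(prev) = name {
        genome.insert(prev, seq);
    }
    Ok(genome)
}
```

A file would typically be passed as `std::io::BufReader::new(std::fs::File::open(path)?)`.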

to-trans offers a novel option for building a transcriptome from a genome and a gene model efficiently. It delivers high performance without requiring dedicated environments or intricate dependencies, and can easily be integrated into workflows/pipelines.

References

  1. Pertea G, Pertea M. GFF Utilities: GffRead and GffCompare [version 1; peer review: 3 approved]. F1000Research 2020, 9:304. https://doi.org/10.12688/f1000research.23297.1 (code: https://github.com/gpertea/gffread)

Owner

  • Name: Alejandro Gonzales-Irribarren
  • Login: alejandrogzi
  • Kind: user


Packages

  • Total packages: 1
  • Total downloads:
    • cargo 2,445 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
crates.io: to-trans

A high-performance transcriptome builder from fasta + GTF/GFF

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 2,445 Total
Rankings
Dependent repos count: 30.8%
Dependent packages count: 36.1%
Average: 55.0%
Downloads: 98.2%
Maintainers (1)
Last synced: 10 months ago

Dependencies

Cargo.lock cargo
  • anstream 0.6.4
  • anstyle 1.0.4
  • anstyle-parse 0.2.2
  • anstyle-query 1.0.0
  • anstyle-wincon 3.0.1
  • buffer-redux 1.0.0
  • cfg-if 1.0.0
  • clap 4.4.8
  • clap_builder 4.4.8
  • clap_derive 4.4.7
  • clap_lex 0.6.0
  • colorchoice 1.0.0
  • crossbeam-utils 0.8.16
  • csv 1.3.0
  • csv-core 0.1.11
  • fuchsia-cprng 0.1.1
  • heck 0.4.1
  • itoa 1.0.9
  • libc 0.2.150
  • memchr 2.6.4
  • proc-macro2 1.0.69
  • quote 1.0.33
  • rand 0.4.6
  • rand_core 0.3.1
  • rand_core 0.4.2
  • rdrand 0.4.0
  • remove_dir_all 0.5.3
  • ryu 1.0.15
  • safemem 0.3.3
  • scoped_threadpool 0.1.9
  • seq_io 0.3.2
  • serde 1.0.192
  • serde_derive 1.0.192
  • strsim 0.10.0
  • syn 2.0.39
  • tempdir 0.3.7
  • thiserror 1.0.50
  • thiserror-impl 1.0.50
  • unicode-ident 1.0.12
  • utf8parse 0.2.1
  • winapi 0.3.9
  • winapi-i686-pc-windows-gnu 0.4.0
  • winapi-x86_64-pc-windows-gnu 0.4.0
  • windows-sys 0.48.0
  • windows-targets 0.48.5
  • windows_aarch64_gnullvm 0.48.5
  • windows_aarch64_msvc 0.48.5
  • windows_i686_gnu 0.48.5
  • windows_i686_msvc 0.48.5
  • windows_x86_64_gnu 0.48.5
  • windows_x86_64_gnullvm 0.48.5
  • windows_x86_64_msvc 0.48.5
Cargo.toml cargo