Fasten

Fasten: a toolkit for streaming operations on fastq files - Published in JOSS (2024)

https://github.com/lskatz/fasten

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bioinformatics fastq-files rust
Last synced: 6 months ago

Repository

👷 Fasten toolkit, for streaming operations on fastq files

Basic Info
  • Host: GitHub
  • Owner: lskatz
  • License: MIT
  • Language: Rust
  • Default Branch: master
  • Homepage:
  • Size: 13.8 MB
Statistics
  • Stars: 79
  • Watchers: 4
  • Forks: 7
  • Open Issues: 4
  • Releases: 16
Topics
bioinformatics fastq-files rust
Created about 8 years ago · Last pushed 11 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Fasten


A powerful manipulation suite for interleaved fastq files. Executables read from stdin and write to stdout, and they are compatible with the interleaved fastq format. This makes it much easier to perform streaming operations using unix pipes.
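Because every executable streams interleaved fastq, the core loop is always the same: read four lines for R1, read four lines for R2, process the pair, and move on. The Rust sketch below illustrates that loop. It assumes strict four-line records with mates adjacent, and the names `Record`, `next_record`, and `for_each_pair` are hypothetical. They are not Fasten's own API.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// One FASTQ entry in the standard four-line layout (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,   // line 1, begins with '@'
    seq: String,  // line 2
    qual: String, // line 4 (line 3 is the '+' separator)
}

/// Pulls the next line, or reports which part of a record is missing.
fn required<I: Iterator<Item = io::Result<String>>>(lines: &mut I, what: &str) -> io::Result<String> {
    lines.next().unwrap_or_else(|| {
        Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("truncated FASTQ record: missing {}", what),
        ))
    })
}

/// Reads one record; Ok(None) means the input ended cleanly between records.
fn next_record<I: Iterator<Item = io::Result<String>>>(lines: &mut I) -> io::Result<Option<Record>> {
    let id = match lines.next() {
        Some(line) => line?,
        None => return Ok(None),
    };
    let seq = required(lines, "sequence")?;
    required(lines, "'+' separator")?;
    let qual = required(lines, "quality")?;
    Ok(Some(Record { id, seq, qual }))
}

/// Streams interleaved input two records at a time (R1, then R2), handing
/// each pair to `f` without holding the whole file in memory.
fn for_each_pair<R, F>(reader: R, mut f: F) -> io::Result<usize>
where
    R: BufRead,
    F: FnMut(&Record, &Record) -> io::Result<()>,
{
    let mut lines = reader.lines();
    let mut pairs = 0;
    while let Some(r1) = next_record(&mut lines)? {
        // In interleaved input, the mate of R1 must be the very next record.
        let r2 = next_record(&mut lines)?.ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, format!("{} has no mate", r1.id))
        })?;
        f(&r1, &r2)?;
        pairs += 1;
    }
    Ok(pairs)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    // Pass-through filter: echo every pair to stdout, still interleaved.
    let n = for_each_pair(stdin.lock(), |r1, r2| {
        for r in [r1, r2] {
            writeln!(out, "{}\n{}\n+\n{}", r.id, r.seq, r.qual)?;
        }
        Ok(())
    })?;
    out.flush()?;
    eprintln!("{} pairs streamed", n);
    Ok(())
}
```

Only one pair is in memory at a time, which is what lets a tool like this sit in the middle of a unix pipe.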

Synopsis

read metrics

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
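All four columns above can be computed in a single streaming pass. This sketch makes two assumptions: quality is Phred+33 encoded, and avgQual is the mean over all bases. fasten_metrics may define avgQual differently. The `Metrics` type and `metrics_from` function are hypothetical.

```rust
use std::io::{self, BufRead};

/// Running totals behind the four columns printed by fasten_metrics.
#[derive(Debug, Default)]
struct Metrics {
    total_length: u64,
    num_reads: u64,
    total_qual: u64, // sum of per-base Phred scores
}

impl Metrics {
    fn add(&mut self, seq: &str, qual: &str) {
        self.total_length += seq.len() as u64;
        self.num_reads += 1;
        // Assumption: Phred+33 encoding (Sanger / Illumina 1.8+).
        self.total_qual += qual.bytes().map(|b| u64::from(b.saturating_sub(33))).sum::<u64>();
    }

    fn avg_read_length(&self) -> f64 {
        if self.num_reads == 0 { 0.0 } else { self.total_length as f64 / self.num_reads as f64 }
    }

    /// Assumption: mean over all bases, not a mean of per-read means.
    fn avg_qual(&self) -> f64 {
        if self.total_length == 0 { 0.0 } else { self.total_qual as f64 / self.total_length as f64 }
    }
}

/// Streams four-line FASTQ, keeping only the running totals in memory.
fn metrics_from<R: BufRead>(reader: R) -> io::Result<Metrics> {
    let mut m = Metrics::default();
    let mut seq = String::new();
    for (i, line) in reader.lines().enumerate() {
        let line = line?;
        match i % 4 {
            1 => seq = line,         // sequence line
            3 => m.add(&seq, &line), // quality line completes the record
            _ => {}                  // header and '+' lines carry no metrics
        }
    }
    Ok(m)
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    let m = metrics_from(stdin.lock())?;
    println!("totalLength\tnumReads\tavgReadLength\tavgQual");
    println!("{}\t{}\t{}\t{}", m.total_length, m.num_reads, m.avg_read_length(), m.avg_qual());
    Ok(())
}
```

Piping fastq into the compiled sketch prints the same header row as fasten_metrics. Whether the values match depends on the assumptions above.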

read cleaning

$ cat testdata/R1.fastq testdata/R2.fastq | \
    fasten_shuffle | \
    fasten_clean --paired-end --min-length 2 | \
    gzip -c > cleaned.shuffled.fastq.gz

$ zcat cleaned.shuffled.fastq.gz | fasten_metrics | column -t
totalLength  numReads  avgReadLength  avgQual
800          8         100            19.53875
# No reads were filtered by cleaning with --min-length 2
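With --paired-end, a filter has to keep or drop each pair as a unit. Otherwise a dropped R1 would leave its R2 paired with the next read's R1. Below is a minimal sketch of a pair-aware length check. It assumes a pair is kept only when both mates reach the minimum length, which may differ from fasten_clean's exact rule. `Mate` and `keep_pair` are hypothetical names.

```rust
/// A mate reduced to the fields a length filter needs (hypothetical type).
struct Mate<'a> {
    id: &'a str,
    seq: &'a str,
}

/// Decide for the pair as a whole so the output stays interleaved.
/// Assumption: both mates must reach `min_length`.
fn keep_pair(r1: &Mate<'_>, r2: &Mate<'_>, min_length: usize) -> bool {
    r1.seq.len() >= min_length && r2.seq.len() >= min_length
}

fn main() {
    let pairs = [
        (Mate { id: "@p1/1", seq: "ACGTACGT" }, Mate { id: "@p1/2", seq: "TTGG" }),
        (Mate { id: "@p2/1", seq: "A" }, Mate { id: "@p2/2", seq: "CCCCCC" }),
    ];
    // With min_length = 2, only p1 survives because p2/1 is a single base.
    for (r1, r2) in pairs.iter().filter(|(a, b)| keep_pair(a, b, 2)) {
        println!("kept {} and {}", r1.id, r2.id);
    }
}
```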

Installation

Installation from source

Fasten is written in the Rust programming language. More information about Rust, including how to install it and its package manager, cargo, can be found at rust-lang.org.

Clone the repository and build it with cargo:

git clone https://github.com/lskatz/fasten.git
cd fasten
cargo build --release
export PATH=$PATH:$(pwd)/target/release

All executables will be in the directory fasten/target/release.

Note: the Makefile provides some helper targets, including:

  • make all: make all of the following
    • make release: install fast (optimized) executables
    • make debug: install executables quickly (although the executables will not be optimized)
    • make fasten/doc: compile the latest documentation
  • make clean: uninstall local binaries

Installation without git

You can also install Fasten straight from https://crates.io using the following command:

cargo install fasten

Detailed information on how this works can be found in The Cargo Book at https://doc.rust-lang.org/cargo/commands/cargo-install.html.

General usage

All scripts accept the following parameters, read uncompressed fastq from stdin, and print uncompressed fastq to stdout. All paired-end fastq input must be interleaved, and paired-end output is also written interleaved, except when deshuffling with fasten_shuffle.

  • --help
  • --numcpus Not currently implemented; not all scripts will take advantage of numcpus.
  • --paired-end Input reads are interleaved paired-end
  • --verbose Print more status messages
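The following sketch shows interleaving and deinterleaving, with records reduced to their IDs. It assumes that all R1 records arrive before all R2 records. That assumption comes from the Synopsis, where `cat testdata/R1.fastq testdata/R2.fastq` is piped into fasten_shuffle. The function names are hypothetical and do not describe fasten_shuffle's internals.

```rust
/// Interleave two mate lists: R1[0], R2[0], R1[1], R2[1], ...
/// Returns None if the lists differ in length (some read lacks a mate).
fn interleave<T: Clone>(r1: &[T], r2: &[T]) -> Option<Vec<T>> {
    if r1.len() != r2.len() {
        return None;
    }
    Some(r1.iter().zip(r2).flat_map(|(a, b)| [a.clone(), b.clone()]).collect())
}

/// Deinterleave: even positions are R1, odd positions are R2.
fn deinterleave<T: Clone>(records: &[T]) -> Option<(Vec<T>, Vec<T>)> {
    if records.len() % 2 != 0 {
        return None;
    }
    let r1: Vec<T> = records.iter().step_by(2).cloned().collect();
    let r2: Vec<T> = records.iter().skip(1).step_by(2).cloned().collect();
    Some((r1, r2))
}

/// Assumption: all R1 records come first, then all R2 records,
/// so the midpoint of the input separates the two files.
fn interleave_concatenated<T: Clone>(records: &[T]) -> Option<Vec<T>> {
    if records.len() % 2 != 0 {
        return None;
    }
    let (r1, r2) = records.split_at(records.len() / 2);
    interleave(r1, r2)
}

fn main() {
    // Record IDs stand in for whole FASTQ records.
    let concatenated = ["@a/1", "@b/1", "@a/2", "@b/2"];
    let shuffled = interleave_concatenated(&concatenated).expect("even record count");
    println!("{:?}", shuffled); // ["@a/1", "@a/2", "@b/1", "@b/2"]
    let (r1, r2) = deinterleave(&shuffled).expect("even record count");
    println!("{:?} {:?}", r1, r2); // ["@a/1", "@b/1"] ["@a/2", "@b/2"]
}
```

Splitting concatenated input at its midpoint means every record must be buffered before the first pair can be written. Filters that only ever see one interleaved pair at a time do not have that cost.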

Documentation

Please see the inline documentation at https://lskatz.github.io/fasten/fasten

This documentation was built with cargo doc --no-deps

Other documentation

  • Some wrapper scripts are noted in the scripts page.

Contributing

Instructions for how to contribute can be found in CONTRIBUTING.md.

Fasten script descriptions

All executables read and write in the fastq format except fasten_convert.

|Executable|Description|
|-------------------|-----------|
|fasten_clean|Trims and cleans a fastq file.|
|fasten_convert|Converts between different sequence formats like fastq, sam, and fasta.|
|fasten_straighten|Converts any fastq file to a standard four-line-per-entry format.|
|fasten_metrics|Prints basic read metrics.|
|fasten_pe|Determines paired-endedness based on read IDs.|
|fasten_randomize|Randomizes reads from input.|
|fasten_combine|Combines identical reads and updates quality scores.|
|fasten_kmer|Counts kmers.|
|fasten_normalize|Normalizes read depth by using kmer counting.|
|fasten_sample|Downsamples reads.|
|fasten_shuffle|Shuffles or deshuffles paired-end reads.|
|fasten_validate|Validates your reads (deprecated in favor of fasten_inspect and fasten_repair).|
|fasten_inspect|Adds information to read IDs, such as seqlength.|
|fasten_repair|Repairs corrupted reads.|
|fasten_quality_filter|Transforms nucleotides to "N" if the quality is low.|
|fasten_trim|Blunt-end trims reads and/or removes adapter sequences.|
|fasten_replace|Finds and replaces using regex.|
|fasten_mutate|Introduces random mutations.|
|fasten_regex|Filters for reads using regex.|
|fasten_progress|Adds progress reporting to any place in the pipeline.|
|fasten_sort|Sorts fastq entries.|
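The masking described for fasten_quality_filter can be sketched as follows. The Phred+33 encoding and the threshold of 20 in `main` are illustrative assumptions. They are not fasten_quality_filter's documented defaults, and `mask_low_quality` is a hypothetical name.

```rust
/// Replace each base whose Phred score is below `min_qual` with 'N'.
/// Assumption: Phred+33 quality encoding.
/// Returns None when the sequence and quality strings differ in length.
fn mask_low_quality(seq: &str, qual: &str, min_qual: u8) -> Option<String> {
    if seq.len() != qual.len() {
        return None;
    }
    Some(
        seq.chars()
            .zip(qual.bytes())
            .map(|(base, q)| if q.saturating_sub(33) < min_qual { 'N' } else { base })
            .collect(),
    )
}

fn main() {
    // Under Phred+33, '#' encodes Phred 2 and 'I' encodes Phred 40.
    let masked = mask_low_quality("ACGT", "I#I#", 20).expect("lengths match");
    println!("{}", masked); // ANGN
}
```

Masking keeps every read at its original length, so mates stay aligned in an interleaved stream. Trimming would instead shorten reads.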

Etymology

Many of these scripts were inspired by the fastx toolkit, and I wanted to call this project fasty, but that was already the name of a bioinformatics program. Therefore I cycled through other letters of the alphabet and came across "N." So you can pronounce this project like "Fast-N", or in a way that indicates you are securing your analysis by "fasten"ing it (with a silent T).

Citation


To cite, please refer to Katz et al., (2024). Fasten: a toolkit for streaming operations on fastq files. Journal of Open Source Software, 9(94), 6030, https://doi.org/10.21105/joss.06030

Owner

  • Name: Lee Katz
  • Login: lskatz
  • Kind: user
  • Location: Atlanta, GA
  • Company: CDC (work) + personal projects

JOSS Publication

Fasten: a toolkit for streaming operations on fastq files
Published
February 12, 2024
Volume 9, Issue 94, Page 6030
Authors
Lee S. Katz
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, United States of America; Center for Food Safety, University of Georgia, Griffin, GA, United States of America
John Phan
General Dynamics Information Technology Inc., Atlanta, GA, United States of America
Henk C. den Bakker
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, United States of America
Editor
Kelly Rowland
Tags
command line fastq manipulation interleaved fastq

GitHub Events

Total
  • Create event: 7
  • Issues event: 2
  • Release event: 3
  • Watch event: 3
  • Delete event: 5
  • Push event: 16
  • Pull request event: 9
Last Year
  • Create event: 7
  • Issues event: 2
  • Release event: 3
  • Watch event: 3
  • Delete event: 5
  • Push event: 16
  • Pull request event: 9

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 362
  • Total Committers: 4
  • Avg Commits per committer: 90.5
  • Development Distribution Score (DDS): 0.019
Past Year
  • Commits: 12
  • Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
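The committer figures above are consistent with the common definition of DDS: one minus the top committer's share of all commits. This is an assumption about how this page computes it, checked against the numbers listed:

```latex
\mathrm{DDS} = 1 - \frac{c_{\mathrm{top}}}{c_{\mathrm{total}}}
\qquad
\text{all time: } 1 - \frac{355}{362} = \frac{7}{362} \approx 0.019,
\qquad
\text{past year: } 1 - \frac{12}{12} = 0
```

A score near 0 means almost every commit comes from one person. The all-time average of 90.5 commits per committer is simply 362 / 4.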
Top Committers
|Name|Email|Commits|
|---|---|---|
|Lee Katz - Aspen|g****2@c****v|355|
|Andrea Telatin, M1|1****n|5|
|Roderick Bovee|r****e@g****m|1|
|John Phan|j****n@g****m|1|
Committer Domains (Top 20 + Academic)
cdc.gov: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 28
  • Average time to close issues: 8 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 8
  • Total pull request authors: 4
  • Average comments per issue: 1.94
  • Average comments per pull request: 0.14
  • Merged pull requests: 27
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 10
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bovee (4)
  • lskatz (4)
  • telatin (2)
  • jianshu93 (2)
  • ghuls (1)
  • kapsakcj (1)
  • sharkLoc (1)
  • chrisgulvik (1)
Pull Request Authors
  • lskatz (22)
  • telatin (8)
  • jhphan (1)
  • bovee (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 15,331 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 1
crates.io: fasten

A set of scripts to run basic analysis on fastq files

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 15,331 Total
Rankings
Stargazers count: 17.2%
Forks count: 20.3%
Average: 25.9%
Downloads: 28.9%
Dependent repos count: 29.3%
Dependent packages count: 33.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

Cargo.toml cargo
.github/workflows/basic.yml actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
.github/workflows/benchmark.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite
  • conda-incubator/setup-miniconda v2 composite
.github/workflows/docker.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
  • docker/login-action f054a8b539a109f9f41c372932f1ae047eff08c9 composite
  • docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/shell.yml.bak actions
  • actions-rs/cargo v1 composite
  • actions-rs/toolchain v1 composite
  • actions/checkout v2 composite
Dockerfile docker
  • alpine 3.14 build
  • rust 1.59.0-alpine3.14 build