asperasragetter

AsperaSRAgetter provides an easy way to download sequencing data from ENA by using Aspera.

https://github.com/runjiaji/asperasragetter

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

AsperaSRAgetter provides an easy way to download sequencing data from ENA by using Aspera.

Basic Info
  • Host: GitHub
  • Owner: RunJiaJi
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 153 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

AsperaSRAgetter


AsperaSRAgetter provides an easy way to download sequencing data (fastq.gz format) from European Nucleotide Archive (ENA) by using Aspera.

Installation

AsperaSRAgetter is distributed on PyPI and can be installed with pip. It depends on Aspera-CLI to retrieve sequencing data from ENA; installing Aspera-CLI with Conda is recommended.

```shell
# You may create a new environment for AsperaSRAgetter, but this is optional
conda create -n AsperaSRAgetter python=3.10
conda activate AsperaSRAgetter

# Install AsperaSRAgetter using pip
pip install AsperaSRAgetter

# Install Aspera-CLI using conda
conda install -c hcc aspera-cli
```

Workflow

AsperaSRAgetter first queries the ENA filereport API for the fastq.gz file report of each accession. Second, the MD5 hash value and FTP URL of each fastq.gz file are parsed from the report. Finally, each FTP URL is passed to the Aspera transfer command ascp to download the fastq.gz file.
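The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's actual source: the function names are hypothetical, the endpoint and the `fastq_ftp` / `fastq_md5` field names are assumed from the public ENA Portal API, and the `ascp` flags and the `era-fasp@fasp.sra.ebi.ac.uk` host follow the form commonly shown in ENA's Aspera instructions. The options AsperaSRAgetter really passes may differ.

```python
import csv
import io

# Assumed ENA Portal API endpoint for file reports; check the ENA documentation.
FILEREPORT_URL = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_params(accession):
    """Query parameters requesting FASTQ URLs and MD5 hashes for an accession."""
    return {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }


def parse_report(tsv_text):
    """Return (ftp_url, md5) pairs from a file report given as TSV text."""
    pairs = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # Paired-end runs list several files, separated by semicolons.
        urls = [u for u in row["fastq_ftp"].split(";") if u]
        md5s = [m for m in row["fastq_md5"].split(";") if m]
        pairs.extend(zip(urls, md5s))
    return pairs


def ascp_command(ftp_url, ssh_key, outdir):
    """Build an ascp argument list for one file (flags are typical, not confirmed)."""
    if ftp_url.startswith("ftp://"):
        ftp_url = ftp_url[len("ftp://"):]
    # "ftp.sra.ebi.ac.uk/vol1/..." becomes "era-fasp@fasp.sra.ebi.ac.uk:vol1/..."
    remote_path = ftp_url.split("/", 1)[1]
    source = "era-fasp@fasp.sra.ebi.ac.uk:" + remote_path
    return ["ascp", "-QT", "-l", "300m", "-P", "33001", "-i", ssh_key, source, outdir]
```

The report text could be fetched with `requests.get(FILEREPORT_URL, params=filereport_params(accession)).text`, and each resulting command run with `subprocess.run(cmd, check=True)`.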

The file reports are stored as a .tsv table that serves as a record of the download process.

The MD5 hash values of all files are saved in a .md5 file, which users can use to verify the integrity of the downloaded files.
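Verifying downloads against the saved hashes might look like the sketch below. The layout of the .md5 file is not specified here, so this assumes the common `md5sum` convention of one `<hash>  <filename>` pair per line; the function names are hypothetical and not part of AsperaSRAgetter.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(md5_file, data_dir):
    """Map each file name listed in md5_file to True (hash matches) or False."""
    results = {}
    for line in Path(md5_file).read_text().splitlines():
        if not line.strip():
            continue
        # Assumed md5sum-style line: "<hash>  <filename>"
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")
        target = Path(data_dir) / name
        results[name] = target.is_file() and md5_of(target) == expected.lower()
    return results
```

If the file does follow the `md5sum` layout, running `md5sum -c` on it from the download directory should give an equivalent check.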

[Figure: workflow]

Usage

The command name of AsperaSRAgetter is sragetter. It accepts either a single SRA accession or a TXT file containing multiple accessions (see the usage example below). Note that users must provide the path to the Aspera-CLI public key authentication file (normally ENVIRONMENT_PATH/etc/asperaweb_id_dsa.openssh).
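To locate the key file programmatically, a small helper along these lines may be useful. It is a sketch that assumes Aspera-CLI was installed into the active Conda environment and that the key sits under `etc/` as described above; the helper name and the `CONDA_PREFIX` fallback are illustrative assumptions, not part of AsperaSRAgetter.

```python
import os
import sys
from pathlib import Path

# File name of the key shipped with the Aspera command line client.
KEY_NAME = "asperaweb_id_dsa.openssh"


def default_key_path(prefix=None):
    """Return the expected key location under an environment prefix."""
    # Fall back to the active Conda environment, then to the Python prefix.
    prefix = prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    return Path(prefix) / "etc" / KEY_NAME
```

Checking `default_key_path().is_file()` before starting a download gives an early, readable error if the key is somewhere else.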

```bash
usage: sragetter [-h] [-v] [-acc ACCESSION | -f FILE] -ssh SSH_KEY -o OUTDIR

options:
  -h, --help            show this help message and exit
  -v, --version         Show SRAdownloader version number and exit
  -acc ACCESSION, --accession ACCESSION
                        SRA data accession
  -f FILE, --file FILE  TXT file with multiple SRA accessions
  -ssh SSH_KEY, --ssh-key SSH_KEY
                        Public key authentication file provided by Aspera
                        command line client download package as the
                        'asperaweb_id_dsa.openssh' file
  -o OUTDIR, --outdir OUTDIR
                        Path to store the downloaded SRA data

# Usage

# Download with one accession:
$ sragetter --accession sra_accession --ssh-key ssh_key_path.openssh --outdir outdir_path

# Download with TXT file containing multiple accessions:
$ sragetter --file sra_accessions.txt --ssh-key ssh_key_path.openssh --outdir outdir_path
```
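For readers who want to wrap or imitate this interface, the help text above corresponds roughly to the following `argparse` definition. This is a reconstruction from the usage message only, not the package's source: the function name is hypothetical, the version string is a placeholder, and the real parser may validate its arguments differently.

```python
import argparse


def build_parser():
    """Recreate the documented sragetter options (reconstructed from the help text)."""
    parser = argparse.ArgumentParser(prog="sragetter")
    parser.add_argument(
        "-v", "--version", action="version",
        version="%(prog)s (version placeholder)",
    )
    # -acc and -f are alternatives: at most one of them may be given.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("-acc", "--accession", help="SRA data accession")
    source.add_argument("-f", "--file", help="TXT file with multiple SRA accessions")
    parser.add_argument(
        "-ssh", "--ssh-key", required=True,
        help="Public key authentication file of the Aspera command line client",
    )
    parser.add_argument(
        "-o", "--outdir", required=True,
        help="Path to store the downloaded SRA data",
    )
    return parser
```

Calling `build_parser().parse_args(["-acc", "SRR000001", "-ssh", "key.openssh", "-o", "out"])` yields a namespace with `accession`, `file`, `ssh_key` and `outdir` attributes; the accession here is only a placeholder.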

Contact

If you have any questions about using AsperaSRAgetter, feel free to open an issue or contact me at jirunjia@gmail.com.

Owner

  • Name: RunjiaJi
  • Login: RunJiaJi
  • Kind: user
  • Location: BeiJing, China
  • Company: IGGCAS

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Ji
    given-names: Runjia
    orcid: "https://orcid.org/0000-0001-7316-9251"
title: "AsperaSRAgetter: a python package to download sequencing data (fastq.gz format) from European Nucleotide Archive (ENA) by using Aspera-CLI."
version: 2.1
date-released: 2023-3-12
url: https://github.com/RunJiaJi/AsperaSRAgetter

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 27 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: asperasragetter

AsperaSRAgetter provides an easy way to download sequencing data from ENA by using Aspera.

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 27 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 17.8%
Average: 24.9%
Forks count: 30.5%
Dependent repos count: 30.6%
Stargazers count: 39.1%
Maintainers (1)
Last synced: 11 months ago

Dependencies

setup.py pypi
  • pandas *
  • requests *