grabseqs
A utility for easy downloading of reads from next-gen sequencing repositories like NCBI SRA
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary
Repository
Statistics
- Stars: 106
- Watchers: 6
- Forks: 14
- Open Issues: 8
- Releases: 20
README.md
grabseqs
A utility that simplifies bulk downloading of data from next-generation sequencing repositories such as NCBI SRA and MG-RAST.
iMicrobe is currently not supported; we are working to remedy this (2025/08/14).
Install
Install grabseqs and all dependencies via conda:
conda install grabseqs -c louiejtaylor -c bioconda -c conda-forge
Or with pip (and install the non-Python dependencies yourself):
pip install grabseqs
Note: If you're using SRA data, after you've installed sra-tools, run vdb-config -i and turn off local file caching unless you want extra copies of the downloaded sequences taking up space (read more here).
Quick start
Download all samples from a single SRA Project:
grabseqs sra SRP#######
Or any combination of projects (S/ERP), runs (S/ERR), BioProjects (PRJNA):
grabseqs sra SRR######## ERP####### PRJNA######## ERR########
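When mixing accession types in one call, it can help to sanity-check the IDs first. The following is a minimal sketch, not grabseqs' own parsing logic: the function name `classify_accession` and the regular expressions are assumptions based only on the prefixes named above (S/ERP, S/ERR, PRJNA).

```python
import re

# Prefix patterns are an illustrative assumption based on the accession
# styles named in this README; grabseqs' own parsing may differ.
_PATTERNS = {
    "project": re.compile(r"^[SE]RP\d+$"),
    "run": re.compile(r"^[SE]RR\d+$"),
    "bioproject": re.compile(r"^PRJNA\d+$"),
}


def classify_accession(accession: str) -> str:
    """Return 'project', 'run', 'bioproject', or 'unknown' for an ID."""
    cleaned = accession.strip().upper()
    for kind, pattern in _PATTERNS.items():
        if pattern.match(cleaned):
            return kind
    return "unknown"


if __name__ == "__main__":
    # Placeholder IDs only; substitute real accessions.
    for acc in ["SRP0000001", "ERR00000001", "PRJNA00000001", "mgp00001"]:
        print(acc, "->", classify_accession(acc))
```

Anything reported as `unknown` is worth double-checking before it is passed to `grabseqs sra`.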
If you'd like to do a dry run and just get a list of samples that will be downloaded, pass -l:
grabseqs sra -l SRP########
Similar syntax works for MG-RAST:
grabseqs mgrast mgp##### mgm#######
Detailed usage
See the grabseqs FAQ for detailed troubleshooting tips.
Fun options:
grabseqs sra -t 10 -m metadata.csv -o proj/ -r 3 SRP#######
(translation: use 10 threads, save metadata to proj/metadata.csv, download to the dir proj/, retry failed downloads 3x, get all samples from SRP#######)
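If you drive grabseqs from a script, the same options can be assembled programmatically. This is a sketch under stated assumptions: `build_sra_command` and `run_sra_download` are hypothetical helpers, not part of grabseqs, and they use only the `-t`, `-m`, `-o`, `-r`, and `-l` flags documented in this README.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_sra_command(ids: Sequence[str],
                      threads: Optional[int] = None,
                      metadata: Optional[str] = None,
                      outdir: Optional[str] = None,
                      retries: Optional[int] = None,
                      list_only: bool = False) -> List[str]:
    """Assemble a `grabseqs sra` argument list from documented flags."""
    if not ids:
        raise ValueError("at least one accession is required")
    cmd = ["grabseqs", "sra"]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if metadata is not None:
        cmd += ["-m", metadata]
    if outdir is not None:
        cmd += ["-o", outdir]
    if retries is not None:
        cmd += ["-r", str(retries)]
    if list_only:
        cmd.append("-l")
    return cmd + list(ids)


def run_sra_download(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder accession; printing only, nothing is downloaded here.
    cmd = build_sra_command(["SRP0000001"], threads=10,
                            metadata="metadata.csv", outdir="proj/", retries=3)
    print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems in paths and file names.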
If you'd like to pass your own arguments to fasterq-dump to get data in a slightly different format, you can do so like this:
grabseqs sra SRP####### -r 0 --custom_fqdump_args="--split-spot --progress"
Full usage:
grabseqs sra [-h] [-m METADATA] [-o OUTDIR] [-r RETRIES] [-t THREADS]
             [-f] [-l] [--no_parsing] [--parse_run_ids]
             [--use_fastq_dump]
             id [id ...]

positional arguments:
  id                    One or more BioProject, ERR/SRR or ERP/SRP number(s)

optional arguments:
  -h, --help            show this help message and exit
  -m METADATA           filename in which to save SRA metadata (.csv format,
                        relative to OUTDIR)
  -o OUTDIR             directory in which to save output. created if it
                        doesn't exist
  -r RETRIES            number of times to retry download
  -t THREADS            threads to use (for fasterq-dump/pigz)
  -f                    force re-download of files
  -l                    list (but do not download) samples to be grabbed
  --parse_run_ids       parse SRR/ERR identifiers (do not pass straight to
                        fasterq-dump)
  --custom_fqdump_args CUSTOM_FQD_ARGS
                        "string" containing args to pass to fastq-dump
  --use_fastq_dump      use legacy fastq-dump instead of fasterq-dump (no
                        multithreaded downloading)
Downloads .fastq.gz files to OUTDIR (or the working directory if not specified). If the -m flag is passed, saves metadata to OUTDIR with filename METADATA in csv format.
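After a run finishes, a quick way to confirm what landed in OUTDIR is to list the `.fastq.gz` files and load the metadata file. The sketch below assumes only what is stated above (reads end in `.fastq.gz`, metadata is a CSV inside OUTDIR). The helper names are hypothetical, and no particular metadata column names are assumed, since those depend on the repository.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def list_fastq_files(outdir: str) -> List[str]:
    """Return the sorted names of .fastq.gz files directly inside outdir."""
    return sorted(path.name for path in Path(outdir).glob("*.fastq.gz"))


def read_metadata(outdir: str, metadata: str) -> List[Dict[str, str]]:
    """Read OUTDIR/METADATA as CSV and return one dict per row."""
    with open(Path(outdir) / metadata, newline="") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    # Demo on a throwaway directory; the file names and columns are made up.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "sample_1.fastq.gz").touch()
        (Path(tmp) / "metadata.csv").write_text("id,note\nsample,placeholder\n")
        print(list_fastq_files(tmp))
        print(read_metadata(tmp, "metadata.csv"))
```

For real output, point both functions at the directory passed to `-o` and the filename passed to `-m`.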
Similar options are available for downloading from MG-RAST:
grabseqs mgrast [-h] [-m METADATA] [-o OUTDIR] [-r RETRIES]
                [-t THREADS] [-f] [-l]
                rastid [rastid ...]
Troubleshooting
See the grabseqs FAQ for detailed troubleshooting tips. If the FAQs don't fix your problem, feel free to open an issue!
Dependencies
- Python 3 (external packages required: requests, requests-html, pandas, fake-useragent)
- sra-tools>3.2
- pigz
- wget
If you use conda (on Linux), these will be installed for you!
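If you installed with pip, you may want to confirm that the non-Python tools are on your PATH before starting a large download. This is a minimal pre-flight sketch, independent of any checks grabseqs performs itself. The helper names are hypothetical, and it assumes `fasterq-dump` (shipped with sra-tools), `pigz`, and `wget` are the executables to look for. It does not check version numbers.

```python
import shutil
from typing import Callable, Dict, List, Optional, Sequence

# Executables assumed to correspond to the dependencies listed above.
REQUIRED_TOOLS = ("fasterq-dump", "pigz", "wget")


def find_tools(tools: Sequence[str] = REQUIRED_TOOLS,
               which: Callable[[str], Optional[str]] = shutil.which
               ) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {tool: which(tool) for tool in tools}


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS,
                  which: Callable[[str], Optional[str]] = shutil.which
                  ) -> List[str]:
    """Return the tools that could not be found, in the order given."""
    found = find_tools(tools, which)
    return [tool for tool in tools if found[tool] is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

The `which` parameter exists only so the lookup can be swapped out in tests; in normal use the default `shutil.which` is sufficient.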
Grabseqs runs on Mac or Linux. We've tested on these specific OSes:
Linux (conda or pip):
- CentOS 6, 7, and 8
- Debian 9 and 10
- Ubuntu 16.04, 18.04, and 19.10
- Red Hat Enterprise 6, 7, and 8
- SUSE Enterprise 12 and 15
Mac (pip):
- MacOS 10.14
Grabseqs has been tested and works with the following versions of the Python dependencies (though these are neither minimal nor pinned version numbers):
- requests 2.22.0
- pandas>2
Citation
If you use grabseqs in your work, please cite:
Louis J Taylor, Arwa Abbas, Frederic D Bushman. "grabseqs: Simple downloading of reads and metadata from multiple next-generation sequencing data repositories." Bioinformatics, (2020), btaa167, https://doi.org/10.1093/bioinformatics/btaa167
Please also cite the researchers who generated the data (and the repository, if appropriate)!
Changelog
1.0.0 (2025-08-14)
- Added a walk-through for adding a new repo using template.py
- Better handling for invalid SRA accession numbers
- Update endpoint for NCBI for SRA downloads
- Temporarily remove iMicrobe--needs rewrite to use a different tool
0.7.0 (2020-01-29)
- Allow users to pass custom args to fast(er)q-dump
- Minor re-writes of download handling code for easier readability
0.6.1 (2019-12-20)
- Validate compressed files (fix #8 and #34)
0.6.0 (2019-12-12)
- Gracefully handle incomplete or missing dependencies
- Major rewrite of test suite
0.5.2 (2019-12-05)
- Improvements to work with multiple versions of Python 3
0.5.1 (2019-11-23)
- Hotfix handling outdated versions of sra-tools
0.5.0 (2019-04-11)
- Metadata available for all sources in .csv format
History
This project grew out of, and incorporates code from, hisss; many thanks to ArwaAbbas for helping make this work!
Owner
- Name: Louis J Taylor
- Login: louiejtaylor
- Kind: user
- Twitter: Louviridae
- Repositories: 4
- Profile: https://github.com/louiejtaylor
Hunting viruses at the University of Pennsylvania.
GitHub Events
Total
- Create event: 1
- Release event: 1
- Issues event: 1
- Watch event: 1
- Issue comment event: 2
- Push event: 15
Last Year
- Create event: 1
- Release event: 1
- Issues event: 1
- Watch event: 1
- Issue comment event: 2
- Push event: 15
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| louiejtaylor | l****r@g****m | 168 |
| Louis J Taylor | l****r | 72 |
| Louis | l****r@g****m | 15 |
| ArwaAbbas | a****a@g****m | 3 |
| Meagan Rubel | r****l@c****l | 2 |
Issues and Pull Requests
Last synced: about 2 years ago
All Time
- Total issues: 39
- Total pull requests: 18
- Average time to close issues: about 1 month
- Average time to close pull requests: about 5 hours
- Total issue authors: 13
- Total pull request authors: 2
- Average comments per issue: 1.28
- Average comments per pull request: 0.0
- Merged pull requests: 18
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- louiejtaylor (27)
- a-h-b (1)
- guanxiangliang (1)
- EisenRa (1)
- ElDeveloper (1)
- Damianyangyang (1)
- dutchscientist (1)
- ressy (1)
- cdiener (1)
- xsq2022 (1)
- nsheff (1)
- sejsong (1)
- ArwaAbbas (1)
Pull Request Authors
- louiejtaylor (17)
- ArwaAbbas (1)
Packages
- Total packages: 1
- Total downloads: pypi 101 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 11
- Total maintainers: 1
pypi.org: grabseqs
Easily download reads from next-gen sequencing repositories like NCBI SRA
- Homepage: https://github.com/louiejtaylor/grabseqs
- Documentation: https://grabseqs.readthedocs.io/
- License: MIT License
- Latest release: 0.7.0 (published about 6 years ago)
Dependencies
- pandas
- pigz
- python >3
- requests
- sra-tools >2.9
- wget
- argparse *
- pandas *
- requests *
- requests-html *