py-pyfastaq

Python3 scripts to manipulate FASTA and FASTQ files

https://github.com/sanger-pathogens/fastaq

Keywords

bioinformatics genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing

Keywords from Contributors

bioinformatics-pipeline

Last synced: 6 months ago

Repository

Python3 scripts to manipulate FASTA and FASTQ files

Basic Info

Host: GitHub
Owner: sanger-pathogens
License: other
Language: Python
Default Branch: master
Size: 371 KB

Statistics

Stars: 71
Watchers: 17
Forks: 21
Open Issues: 10
Releases: 0

Topics

bioinformatics genomics global-health infectious-diseases next-generation-sequencing pathogen research sequencing

Created over 12 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License

Fastaq

Manipulate FASTA and FASTQ files

Introduction

Python3 script to manipulate FASTA and FASTQ (and other format) files, plus API for developers

Installation

There are several ways to install Fastaq; details are provided below. If you encounter an issue when installing Fastaq, please contact your local system administrator. If you encounter a bug, please log it on the GitHub issues page or email us at path-help@sanger.ac.uk.

Pip install

Install from PyPI:

```bash
pip3 install pyfastaq
```

Or pip install the latest development version directly from this repo:

```bash
pip3 install git+https://github.com/sanger-pathogens/Fastaq.git
```

From source

If you want to edit the codebase, clone this repo and install in editable mode.

```bash
# Clone and install from this repository:
git clone https://github.com/sanger-pathogens/Fastaq.git && cd Fastaq && pip install -e ".[tests]"
```

Running the tests

The tests can be run from the top-level directory:

```bash
pytest tests
```

Runtime dependencies

These must be available in your path at run time:

* samtools 0.1.19
* gzip
* gunzip

Usage

The installation will put a single script called fastaq in your path. The usage is:

fastaq <command> [options]

Key points:

* To list the available commands and brief descriptions, just run `fastaq`.
* Use `fastaq command -h` or `fastaq command --help` to get a longer description and the usage of that command.
* The type of input file is automatically detected. Currently supported: FASTA, FASTQ, GFF3, EMBL, GBK, Phylip.
* `fastaq` only manipulates sequences (and quality scores if present), so annotation is ignored where present in the input.
* Input and output files can be gzipped. An input file is assumed to be gzipped if its name ends with .gz. To gzip an output file, just name it with .gz at the end.
* You can use a minus sign for a filename to use stdin or stdout, so commands can be piped together. See the example below.
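The filename conventions in the key points (a `.gz` suffix means gzipped, a minus sign means stdin or stdout) can be sketched with the Python standard library alone. This is only an illustration of the convention, not Fastaq's own code; the helper names `open_for_reading` and `open_for_writing` are hypothetical.

```python
import gzip
import sys


def open_for_reading(filename):
    """Return a text-mode handle: stdin for "-", gzip for *.gz, plain otherwise."""
    if filename == "-":
        return sys.stdin
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename, "r")


def open_for_writing(filename):
    """Return a text-mode handle: stdout for "-", gzip for *.gz, plain otherwise."""
    if filename == "-":
        return sys.stdout
    if filename.endswith(".gz"):
        return gzip.open(filename, "wt")
    return open(filename, "w")
```

Deciding on the filename suffix keeps the behaviour predictable for the caller: naming an output `out.fasta.gz` is enough to get compressed output.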

Examples

Reverse complement all sequences in a file:

```bash
fastaq reverse_complement in.fastq out.fastq
```

Reverse complement all sequences in a gzipped file, then translate each sequence:

```bash
fastaq reverse_complement in.fastq.gz - | fastaq translate - out.fasta
```
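For readers unfamiliar with the operation, here is a minimal standard-library sketch of what reverse complementing a read involves. It is not Fastaq's implementation: the function names are hypothetical, and only A, C, G, T and N are handled (other IUPAC codes pass through unchanged), which may differ from what the `reverse_complement` command does.

```python
# Complement table for upper- and lower-case bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the whole string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_fastq(seq, qual):
    """For a FASTQ record the quality string is reversed to stay aligned with the bases."""
    return reverse_complement(seq), qual[::-1]
```

Reversing the quality string alongside the sequence keeps each quality score attached to the base it describes.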

Available commands

| Command               | Description                                                          |
|-----------------------|----------------------------------------------------------------------|
| acgtn_only            | Replace every non acgtnACGTN with an N                               |
| add_indels            | Deletes or inserts bases at given position(s)                        |
| caf_to_fastq          | Converts a CAF file to FASTQ format                                  |
| capillary_to_pairs    | Converts file of capillary reads to paired and unpaired files        |
| chunker               | Splits sequences into equal sized chunks                             |
| count_sequences       | Counts the sequences in input file                                   |
| deinterleave          | Splits interleaved paired file into two separate files               |
| enumerate_names       | Renames sequences in a file, calling them 1,2,3... etc               |
| expand_nucleotides    | Makes every combination of degenerate nucleotides                    |
| fasta_to_fastq        | Convert FASTA and .qual to FASTQ                                     |
| filter                | Filter sequences to get a subset of them                             |
| get_ids               | Get the ID of each sequence                                          |
| get_seq_flanking_gaps | Gets the sequences flanking gaps                                     |
| interleave            | Interleaves two files, output is alternating between fwd/rev reads   |
| make_random_contigs   | Make contigs of random sequence                                      |
| merge                 | Converts multi sequence file to a single sequence                    |
| replace_bases         | Replaces all occurrences of one letter with another                  |
| reverse_complement    | Reverse complement all sequences                                     |
| scaffolds_to_contigs  | Creates a file of contigs from a file of scaffolds                   |
| search_for_seq        | Find all exact matches to a string (and its reverse complement)      |
| sequence_trim         | Trim exact matches to a given string off the start of every sequence |
| sort_by_name          | Sorts sequences in lexicographical (name) order                      |
| sort_by_size          | Sorts sequences in length order                                      |
| split_by_base_count   | Split multi sequence file into separate files                        |
| strip_illumina_suffix | Strips /1 or /2 off the end of every read name                       |
| to_fake_qual          | Make fake quality scores file                                        |
| to_fasta              | Converts a variety of input formats to nicely formatted FASTA format |
| to_mira_xml           | Create an xml file from a file of reads, for use with Mira assembler |
| to_orfs_gff           | Writes a GFF file of open reading frames                             |
| to_perfect_reads      | Make perfect paired reads from reference                             |
| to_random_subset      | Make a random sample of sequences (and optionally mates as well)     |
| to_tiling_bam         | Make a BAM file of reads uniformly spread across the input reference |
| to_unique_by_id       | Remove duplicate sequences, based on their names. Keep longest seqs  |
| translate             | Translate all sequences in input nucleotide sequences                |
| trim_Ns_at_end        | Trims all Ns at the start/end of all sequences                       |
| trim_contigs          | Trims a set number of bases off the end of every contig              |
| trim_ends             | Trim fixed number of bases of start and/or end of every sequence     |
| version               | Print version number and exit                                        |

For developers

Here is a template for counting the sequences in a FASTA or FASTQ file:

```python
from pyfastaq import sequences

# infile is the path to a FASTA or FASTQ file (optionally gzipped)
seq_reader = sequences.file_reader(infile)
count = 0
for seq in seq_reader:
    count += 1
print(count)
```

There are plenty of further examples in tasks.py. Detection of the input file type, and of whether the file is gzipped, is automatic. See help(sequences) for the various methods already defined in the classes Fasta and Fastq.
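The counting template extends naturally to other per-record summaries. The sketch below also totals the number of bases. It assumes each record yielded by `sequences.file_reader` exposes its bases as a string attribute named `seq`; check `help(sequences)` to confirm this for your installed version. The function names `summarise` and `summarise_file` are hypothetical.

```python
def summarise(records):
    """Return (number of records, total bases) for any iterable of records.

    Assumption: each record has a string attribute called `seq`.
    """
    count = 0
    total_bases = 0
    for record in records:
        count += 1
        total_bases += len(record.seq)
    return count, total_bases


def summarise_file(infile):
    """Apply summarise() to a sequence file read with pyfastaq."""
    # Imported here so summarise() itself can be used without pyfastaq installed.
    from pyfastaq import sequences

    return summarise(sequences.file_reader(infile))
```

Keeping the loop separate from the file reader means the summary logic can be exercised with plain Python objects before pointing it at real data.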

License

Fastaq is free software, licensed under GPLv3.

Feedback/Issues

Please report any issues to the issues page or email path-help@sanger.ac.uk.

Owner

Name: Pathogen Informatics, Wellcome Sanger Institute
Login: sanger-pathogens
Kind: organization
Location: Hinxton, Cambs., UK

Website: http://www.sanger.ac.uk/science/groups/pathogen-informatics
Repositories: 54
Profile: https://github.com/sanger-pathogens

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 3
Watch event: 1
Member event: 1
Issue comment event: 6
Push event: 1
Pull request event: 4
Fork event: 3

Last Year

Create event: 1
Release event: 1
Issues event: 3
Watch event: 1
Member event: 1
Issue comment event: 6
Push event: 1
Pull request event: 4
Fork event: 3

Committers

Last synced: over 2 years ago

All Time

Total Commits: 221
Total Committers: 14
Avg Commits per committer: 15.786
Development Distribution Score (DDS): 0.208

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
martinghunt	m**t@g**m	175
Martin Hunt	m**2@s**k	15
Jorge Soares	j**s@g**m	6
vaofford	v**1@s**k	4
Sascha Steinbiss	s**a@s**e	3
Sara Sjunnebo	s**4@s**k	3
Michael Hall	m**8@g**m	3
Tim Stickland	t**4@s**k	3
Tim Stickland	t**d@g**m	2
nds	n**s@s**k	2
andrewjpage	a**e@g**m	2
Michael Hall	m**l@m**h	1
Martin Aslett	m**a@s**k	1
martinghunt	m****t	1
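The all-time committer figures are consistent with simple arithmetic on the commit counts in the table above. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page, but it reproduces the reported values. The function name `commit_stats` is hypothetical.

```python
def commit_stats(commits_per_committer):
    """Return (total commits, average per committer, DDS), rounded to 3 places.

    Assumption: DDS = 1 - (commits by the top committer / total commits).
    """
    total = sum(commits_per_committer)
    average = total / len(commits_per_committer)
    dds = 1 - max(commits_per_committer) / total
    return total, round(average, 3), round(dds, 3)


# Commit counts copied from the Top Committers table, one entry per row.
counts = [175, 15, 6, 4, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1]
print(commit_stats(counts))
```

Each table row is treated as a separate committer, so the same person appearing under two email addresses is counted twice, as the table itself does.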

Committer Domains (Top 20 + Academic)

sanger.ac.uk: 6
mbh.sh: 1
steinbiss.name: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 10
Total pull requests: 78
Average time to close issues: 2 months
Average time to close pull requests: 10 days
Total issue authors: 9
Total pull request authors: 15
Average comments per issue: 0.4
Average comments per pull request: 0.09
Merged pull requests: 74
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 2
Average time to close issues: 4 months
Average time to close pull requests: 14 days
Issue authors: 2
Pull request authors: 2
Average comments per issue: 0.5
Average comments per pull request: 2.5
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mmokrejs (2)
evolighting (1)
ChrisHIV (1)
idwProxima (1)
TheChymera (1)
UzmaBasit (1)
martinghunt (1)
fgvieira (1)
galud27 (1)

Pull Request Authors

martinghunt (56)
trstickland (3)
satta (3)
ssjunnebo (3)
emollier (2)
mbhall88 (2)
Adamtaranto (2)
nds (2)
vaofford (1)
andreyto (1)
andrewjpage (1)
aslett1 (1)
js21 (1)
mmokrejs (1)
sanjaymsh (1)

Top Labels

Issue Labels

invalid (1)

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- pypi 4,524 last-month
Total docker downloads: 6,888

Total dependent packages: 1
(may contain duplicates)
Total dependent repositories: 42
(may contain duplicates)
Total versions: 29
Total maintainers: 3

pypi.org: pyfastaq

Script to manipulate FASTA and FASTQ files, plus API for developers.

Documentation: https://pyfastaq.readthedocs.io/
License: GPLv3
Latest release: 3.18.0
published 11 months ago

Versions: 22
Dependent Packages: 0
Dependent Repositories: 42
Downloads: 4,524 Last month
Docker Downloads: 6,888

Rankings

Docker downloads count: 1.2%

Dependent repos count: 2.3%

Average: 6.8%

Stargazers count: 8.4%

Forks count: 8.6%

Dependent packages count: 10.0%

Downloads: 10.2%

Maintainers (2)

sanger-pathogens martinghunt

Last synced: 6 months ago

spack.io: py-pyfastaq

Manipulate FASTA and FASTQ files.

Homepage: https://github.com/sanger-pathogens/Fastaq
License: []
Latest release: 3.17.0
published about 2 years ago

Versions: 7
Dependent Packages: 1
Dependent Repositories: 0

Rankings

Dependent repos count: 0.0%

Average: 28.6%

Dependent packages count: 57.3%

Maintainers (1)

adamjstewart

Last synced: 6 months ago

py-pyfastaq

Science Score: 10.0%
