SeqPanther

SeqPanther: Sequence manipulation and mutation statistics toolset - Published in JOSS (2023)

https://github.com/codemeleon/seqpanther

Scientific Fields

Biology Life Sciences - 63% confidence


Repository

Variant Caller and Patcher

Basic Info

Host: GitHub
Owner: codemeleon
License: gpl-3.0
Language: Python
Default Branch: master
Size: 3.23 MB

Statistics

Stars: 1
Watchers: 2
Forks: 4
Open Issues: 1
Releases: 1

Created over 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme Contributing License

SeqPanther

SeqPanther is a Python application that provides a suite of tools to interrogate the circumstances under which certain mutations occur or are missed, and enables the user to modify the consensus as needed. The tool is applicable to non-segmented bacterial and viral genomes where reads are mapped to a reference sequence. SeqPanther generates detailed reports of mutations identified within a genomic segment or positions of interest, including visualization of the genome coverage and depth. The tool is particularly useful in the examination of multiple next-generation sequencing (NGS) short-read samples. Additionally, we have integrated Seqpatcher (Singh et al.), which supports the merging of Sanger sequences, or a consensus thereof, into their respective NGS consensus.

SeqPanther consists of the following set of commands:

codoncounter: performs variant calling, generates nucleotide statistics at variant sites, and reports the impact of nucleotide changes on amino acids in the translated proteins.
cc2ns (codon counter 2 nucleotide substitution): transforms the codoncounter output into a format in which the user can select variants to integrate into the reference or assemblies using the nucsubs command. The user is encouraged to inspect and edit the output from codoncounter, keeping only the desired changes, before running the cc2ns command.
nucsubs: accepts as input the changes file generated by cc2ns and modifies the consensus sequences accordingly, i.e. adds or changes the mutations as specified in the file.
seqpatcher: integrates Sanger sequences covering the missing regions of an incomplete assembly into that assembly. This command is a modification of the SeqPatcher tool.

Operating system compatibility

Command-line application for Unix and macOS (OS X).

Dependencies

The tool relies on several external open-source programs and Python modules, as listed below:

External Tools

python>=3.7

Samtools

For sorting bam files and indexing them.
conda install -c bioconda samtools

Bcftools

For parsing variant calling and generating consensus sequences.
conda install -c bioconda bcftools to install.

Muscle (v. 3.8.31)

To perform multiple sequence alignment.
conda install -c bioconda muscle=3.8.31 to install.

BLAT

To query sequence location in the genome.
conda install -c bioconda ucsc-blat to install.

MAFFT

To align consensus sequence against the reference.
conda install -c bioconda mafft to install.
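Before running SeqPanther, it can help to confirm that the external tools are on the PATH. The following is a minimal sketch and not part of SeqPanther; the executable names (samtools, bcftools, muscle, blat, mafft) are assumed to match the conda packages listed above.

```python
import shutil

# Assumption: each conda package above installs an executable with this name.
REQUIRED_TOOLS = ["samtools", "bcftools", "muscle", "blat", "mafft"]


def find_tools(names=REQUIRED_TOOLS):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=REQUIRED_TOOLS):
    """Return the names of executables that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

This only checks that the executables can be found; it does not verify versions such as Muscle 3.8.31.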

Installation

Option 1: Clone repo and install locally

git clone https://github.com/codemeleon/seqPanther
cd seqPanther
pip install .

Option 2: Install directly from Git

To install directly from the GitHub repo, run the command:

pip install git+https://github.com/codemeleon/seqPanther.git

Usages

SeqPanther contains four commands, which are listed in the help menu. To view the help menu, type seqpanther --help or just seqpanther.

codoncounter

Help for this command is available via seqpanther codoncounter or seqpanther codoncounter --help.

cc2ns

Help for this command is available via seqpanther cc2ns or seqpanther cc2ns --help.

nucsubs

You might need to convert BAM files to consensus sequences before running seqpanther nucsubs. Consensus sequences can be generated using the following commands.

samtools index <sorted_bamfile>
bcftools mpileup -f <reference_fasta> <sorted_bamfile> | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > <sorted_bamfile>.fq
seqtk seq -aQ64 <sorted_bamfile>.fq > <sorted_bamfile>.fasta

Help for this command is available via seqpanther nucsubs or seqpanther nucsubs --help.
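The last step of the consensus commands above converts FASTQ to FASTA. As an illustration of what that conversion involves, here is a minimal Python sketch. It is not the project's own converter, and the function names are hypothetical. It assumes the FASTQ records may be wrapped over several lines, so the quality block is consumed by length rather than by looking for the next '@'.

```python
def read_fastq(lines):
    """Yield (name, sequence) pairs from FASTQ lines; records may be wrapped."""
    lines = iter(lines)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        name = line[1:]
        seq_parts = []
        # Sequence lines run until the '+' separator line.
        for line in lines:
            line = line.strip()
            if line.startswith("+"):
                break
            seq_parts.append(line)
        sequence = "".join(seq_parts)
        # Quality lines may start with '@' or '+', so consume them by length.
        qual_len = 0
        while qual_len < len(sequence):
            chunk = next(lines, None)
            if chunk is None:
                raise ValueError(f"truncated FASTQ record: {name}")
            qual_len += len(chunk.strip())
        yield name, sequence


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text wrapped at `width` columns."""
    out = []
    for name, seq in records:
        out.append(f">{name}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = "@sample1\nACGTAC\nGT\n+\nIIIIII\nII\n"
    print(to_fasta(read_fastq(example.splitlines())), end="")
```

The sketch drops quality values entirely and does not mask low-quality bases.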

SeqPatcher

Help for this command is available via seqpanther seqpatcher or seqpanther seqpatcher --help.

Example

A sample dataset of five BAM files from South African SARS-CoV-2 samples can be downloaded from the SeqPanther Zenodo page. The BAM files were generated by mapping reads against the reference (NC_045512.2) using BWA; the resulting SAM files were converted to BAM and sorted using samtools.
Reference sequences in FASTA format and genome annotation GFF files were downloaded from NCBI Genome Fasta and NCBI Genome GFF respectively. The downloaded files need to be uncompressed.
Consensus sequences can be calculated from the BAM files using the commands bcftools mpileup -f GCF_009858895.2_ASM985889v3_genomic.fna bam/K032258-consensus_alignment_sorted.bam | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > K032258-consensus_alignment_sorted.fastq and python fastq2fasta.py -i K032258-consensus_alignment_sorted.fastq -o consensus/K032258-consensus_alignment_sorted.fasta. The fastq2fasta.py script can be downloaded from the project repo.
Store the downloaded BAM files in a folder, e.g. one called bam.
To generate a summary of the codon, nucleotide and indel changes in the Spike gene of these samples, use the command seqpanther codoncounter -bam bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384. This will generate the summaries for all the files in the bam folder.

The command will generate four outputs in the current folder: sub_output.csv, containing details of the nucleotide substitutions; indel_output.csv, containing details of the indel events; codon_output.csv, containing details of the codon changes; and output.pdf, a plot of genome depth and breadth of coverage annotated with the positions of mutations and indels.
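As a quick sanity check on the -coor_range value used for the Spike gene, the arithmetic can be sketched as follows. This assumes the range is 1-based and inclusive at both ends, as in GFF coordinates; how SeqPanther itself interprets the range is not stated here, and the helper names are hypothetical.

```python
def parse_coor_range(text):
    """Parse a 'start-end' string into a pair of integers."""
    start, end = (int(part) for part in text.split("-"))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {text!r}")
    return start, end


def span_length(start, end):
    """Number of positions in a 1-based, inclusive range (an assumption)."""
    return end - start + 1


start, end = parse_coor_range("21563-25384")
length = span_length(start, end)
codons, remainder = divmod(length, 3)

if __name__ == "__main__":
    print(f"{length} nucleotides = {codons} codons, remainder {remainder}")
```

Under that assumption the range spans 3822 nucleotides, i.e. 1274 complete codons with no remainder, which is what one would expect for a range that covers a whole coding sequence including its stop codon.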

If you only want to generate the results for a single BAM file, run the command as seqpanther codoncounter -bam ./bam/K032282-consensus_alignment_sorted.bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384, replacing the BAM file name with that of your own BAM file.
Outputs can be explored using a text file reader (for the text files) and a PDF reader (e.g. Adobe Reader) for the PDFs. An example command to view the text files would be: cat sub_output.csv | sed 's/,/ ,/g' | column -t -s, | less -S. The user needs to explore those files and remove the changes they do not want integrated. A text editor of your choice, e.g. bbedit or notepad++, can be used to edit the files.
If there are mutations that you want to change, convert the outputs from codoncounter to the format required by the nucsubs command by running seqpanther cc2ns -s sub_output.csv -i indel_output.csv -o changes. This generates a CSV file for each sample in the ./changes folder.
Then run seqpanther nucsubs -i NC_045512.2 -r NC_045512.2.fasta -c consensus -t changes -o results to integrate the relevant changes into the consensus sequences. The output will be generated in a folder named results.
Example data for seqpanther seqpatcher is provided in the examples/seqpatcher folder of this project.
To run seqpanther seqpatcher on the example data, use seqpanther seqpatcher -s examples/seqpatcher/ab1 -a examples/seqpatcher/assemblies -o Results -t mmf.csv -O sanger.fasta -g 10 -3 True -x del. It will generate consensus sequences from the Sanger trace files and integrate them into the NGS consensus. The mmf.csv file tells the application whether the Sanger data were paired ab1, single ab1, or a single FASTA, and sanger.fasta contains all Sanger sequences.
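The codoncounter, cc2ns and nucsubs steps of this example can be collected into one script. The sketch below only builds and prints the command lines; the function name and its parameters are hypothetical, the options are copied from the example commands, and passing the indel table to the -i option of cc2ns is an assumption. The manual review of the CSV files between codoncounter and cc2ns still has to happen by hand.

```python
import shlex


def build_pipeline(bam, ref_id, ref_fasta, gff, coor_range,
                   nucsubs_ref, consensus_dir,
                   changes_dir="changes", results_dir="results"):
    """Return the three seqpanther command lines as argument lists."""
    codoncounter = ["seqpanther", "codoncounter", "-bam", bam, "-rid", ref_id,
                    "-ref", ref_fasta, "-gff", gff, "-coor_range", coor_range]
    # Assumption: -s takes the substitution table and -i the indel table.
    cc2ns = ["seqpanther", "cc2ns", "-s", "sub_output.csv",
             "-i", "indel_output.csv", "-o", changes_dir]
    nucsubs = ["seqpanther", "nucsubs", "-i", ref_id, "-r", nucsubs_ref,
               "-c", consensus_dir, "-t", changes_dir, "-o", results_dir]
    return [codoncounter, cc2ns, nucsubs]


def render(command):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in command)


if __name__ == "__main__":
    commands = build_pipeline(
        bam="bam",
        ref_id="NC_045512.2",
        ref_fasta="GCF_009858895.2_ASM985889v3_genomic.fna",
        gff="GCF_009858895.2_ASM985889v3_genomic.gff",
        coor_range="21563-25384",
        nucsubs_ref="NC_045512.2.fasta",
        consensus_dir="consensus",
    )
    for command in commands:
        # To execute instead of print: subprocess.run(command, check=True)
        print(render(command))
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems if a path contains spaces.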

Warning

Recursive use of newly generated consensus sequences might result in an incorrect final sequence due to the integration of indel events.
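One plausible way this can go wrong is a coordinate shift: once an indel has been integrated, positions downstream of it no longer line up with the original reference. The toy example below illustrates that effect only; it is not SeqPanther's implementation, and the helper functions and sequences are made up.

```python
def apply_insertion(seq, pos, inserted):
    """Insert `inserted` immediately after 1-based position `pos`."""
    return seq[:pos] + inserted + seq[pos:]


def apply_substitution(seq, pos, base):
    """Replace the base at 1-based position `pos` with `base`."""
    return seq[:pos - 1] + base + seq[pos:]


reference = "ACGTACGTAC"

# First pass: a 2-base insertion after reference position 3.
first_pass = apply_insertion(reference, 3, "TT")

# Second pass: a substitution recorded at reference position 8.
# Applied naively to the already patched sequence, it hits the wrong base.
naive = apply_substitution(first_pass, 8, "G")

# Correcting for the 2 inserted bases restores the intended target.
shifted = apply_substitution(first_pass, 8 + 2, "G")

if __name__ == "__main__":
    print("reference :", reference)
    print("first pass:", first_pass)
    print("naive     :", naive)
    print("shifted   :", shifted)
```

In this toy case the naive second pass changes a base two positions upstream of the intended one, which is why changes are safer to apply once against the original consensus.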

Bug reporting

To report a bug, request support or propose a new feature, please open an issue.

Licence

GNU GPL v3.0

Citation

If you use this software please cite:

SeqPanther: Sequence manipulation and mutation statistics toolset. James Emmanuel San, Stephanie Van Wyk, Houriiyah Tegally, Simeon Eche, Eduan Wilkinson, Aquillah M. Kanzi, Tulio de Oliveira, Anmol M. Kiran. bioRxiv 2023. doi: 10.1101/2023.01.26.525629

Zenodo:

Owner

Name: Anmol Kiran
Login: codemeleon
Kind: user

Repositories: 5
Profile: https://github.com/codemeleon

JOSS Publication

SeqPanther: Sequence manipulation and mutation statistics toolset

Published

July 20, 2023

DOI

10.21105/joss.05305

Volume 8, Issue 87, Page 5305

Authors

James Emmanuel San

KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa

Stephanie van Wyk

Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa

Houriiyah Tegally

KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa

Simeon Eche

Yale University School of Medicine, New Haven, Connecticut, United States of America

Eduan Wilkinson

KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa

Aquillah M. Kanzi

KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa

Tulio de Oliveira

KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa, Department of Global Health, University of Washington, Seattle, WA, United States of America

Anmol M. Kiran

School of Biochemistry and Cell Biology, University College Cork, Cork, T12 XF62, Ireland

Editor

Kelly Rowland

Committers

Last synced: 7 months ago

All Time

Total Commits: 159
Total Committers: 6
Avg Commits per committer: 26.5
Development Distribution Score (DDS): 0.239

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
codemeleon	a**n@g**m	121
jsan4christ	s**s@g**m	29
Christian Brueffer	c**n@b**o	5
C. Titus Brown	t**s@i**g	2
Kevin Mattheus Moerman	K****n	1
Kelly Rowland	k**d@g**m	1
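The all-time summary figures can be reproduced from the per-committer counts in the table. This sketch assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is not stated on this page.

```python
# Commit counts copied from the Top Committers table.
commits = {
    "codemeleon": 121,
    "jsan4christ": 29,
    "Christian Brueffer": 5,
    "C. Titus Brown": 2,
    "Kevin Mattheus Moerman": 1,
    "Kelly Rowland": 1,
}

total_commits = sum(commits.values())
average_per_committer = total_commits / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total_commits

if __name__ == "__main__":
    print(f"Total commits: {total_commits}")
    print(f"Avg commits per committer: {average_per_committer:.1f}")
    print(f"DDS: {dds:.3f}")
```

Under that assumption the computed values agree with the reported totals: 159 commits, 26.5 commits per committer and a DDS of 0.239.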

Committer Domains (Top 20 + Academic)

idyll.org: 1 brueffer.io: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 8
Total pull requests: 12
Average time to close issues: 23 days
Average time to close pull requests: 18 days
Total issue authors: 4
Total pull request authors: 6
Average comments per issue: 1.63
Average comments per pull request: 0.67
Merged pull requests: 12
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

ctb (4)
cbrueffer (2)
kellyrowland (1)
Cricetinae-hamster (1)

Pull Request Authors

cbrueffer (5)
jsan4christ (3)
kellyrowland (1)
Kevin-Mattheus-Moerman (1)
codemeleon (1)

Science Score: 93.0%