SeqPanther
SeqPanther: Sequence manipulation and mutation statistics toolset - Published in JOSS (2023)
Repository
Variant Caller and Patcher
Basic Info
- Host: GitHub
- Owner: codemeleon
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 3.23 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 4
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
SeqPanther
SeqPanther is a Python application that provides a suite of tools to interrogate the circumstances under which certain mutations occur or are missed, and enables the user to modify the consensus as needed. The tool is applicable to non-segmented bacterial and viral genomes where reads are mapped to a reference sequence. SeqPanther generates detailed reports of mutations identified within a genomic segment or at positions of interest, including visualization of genome coverage and depth. The tool is particularly useful for examining multiple next-generation sequencing (NGS) short-read samples. Additionally, we have integrated SeqPatcher (Singh et al.), which supports merging Sanger sequences, or a consensus thereof, into their respective NGS consensus.

SeqPanther consists of the following set of commands:
- codoncounter: performs variant calling, generates nucleotide statistics at variant sites, and reports the impact of nucleotide changes on amino acids in the translated proteins.
- cc2ns (codon counter 2 nucleotide substitution): transforms the codoncounter output into a format in which the user can select the variants to integrate into the reference or assemblies. The user is encouraged to inspect and edit the output from codoncounter to keep only the desired changes before running the cc2ns command.
- nucsubs: accepts as input the changes file generated by cc2ns and modifies the consensus sequences accordingly, i.e. adds or changes the mutations as specified in the file.
- seqpatcher: integrates Sanger sequencing of missing regions of an incomplete assembly into the assembly. This command is a modification of the SeqPatcher tool.
Operating system compatibility
Unix and OS X Commandline Application.
Dependencies
The tool relies on multiple external open-source programs and Python modules, as listed below:
External Tools
- python>=3.7
- Samtools
  - For sorting bam files and indexing them.
  - `conda install -c bioconda samtools` to install.
- Bcftools
  - For parsing variant calling and generating consensus sequences.
  - `conda install -c bioconda bcftools` to install.
- Muscle (v. 3.8.31)
  - To perform multiple sequence alignment.
  - `conda install -c bioconda muscle=3.8.31` to install.
- BLAT
  - To query sequence location in the genome.
  - `conda install -c bioconda ucsc-blat` to install.
- MAFFT
  - To align consensus sequence against the reference.
  - `conda install -c bioconda mafft` to install.
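Before running SeqPanther it can help to confirm that the external tools listed above are reachable on `PATH`. The snippet below is a minimal sketch, not part of SeqPanther: the helper name `find_missing_tools` is hypothetical, and the executable names are assumed to be the usual binary names installed by the Bioconda packages above.

```python
import shutil

# Assumption: these are the executable names provided by the Bioconda
# packages listed above; adjust them if your installation differs.
REQUIRED_TOOLS = ["samtools", "bcftools", "muscle", "blat", "mafft"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools: " + ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.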
Installation
Option 1: Clone repo and install locally
git clone https://github.com/codemeleon/seqPanther
cd seqPanther
pip install .
Option 2: Install directly from Git
To install directly from the GitHub repo, run the command:
pip install git+https://github.com/codemeleon/seqPanther.git
Usage
SeqPanther provides four commands, which are listed in the help menu. To view the help menu, run `seqpanther --help` or just `seqpanther`.
codoncounter
Help for this command is available by running `seqpanther codoncounter` or `seqpanther codoncounter --help`.
cc2ns
Help for this command is available by running `seqpanther cc2ns` or `seqpanther cc2ns --help`.
nucsubs
You might need to convert BAM files to consensus sequences before running `seqpanther nucsubs`. Consensus sequences can be generated using the following commands:
samtools index <sorted_bamfile>
bcftools mpileup -f <reference_fasta> <sorted_bamfile> | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > <sorted_bamfile>.fq
seqtk seq -aQ64 <sorted_bamfile>.fq > <sorted_bamfile>.fasta
Help for this command is available by running `seqpanther nucsubs` or `seqpanther nucsubs --help`.
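The consensus-generation steps above can be scripted when several BAM files need processing. The following is a rough sketch rather than anything shipped with SeqPanther: the function names are hypothetical, the output naming simply mirrors the `<sorted_bamfile>.fq` and `<sorted_bamfile>.fasta` pattern shown above, and the last step assumes the intended seqtk invocation is `seqtk seq -aQ64`.

```python
import shlex
import subprocess


def consensus_commands(sorted_bam, reference_fasta):
    """Build the three shell commands for one sorted BAM file."""
    bam = shlex.quote(sorted_bam)
    ref = shlex.quote(reference_fasta)
    fastq = shlex.quote(sorted_bam + ".fq")
    fasta = shlex.quote(sorted_bam + ".fasta")
    return [
        f"samtools index {bam}",
        f"bcftools mpileup -f {ref} {bam} | bcftools call -c --ploidy 1 "
        f"| vcfutils.pl vcf2fq > {fastq}",
        f"seqtk seq -aQ64 {fastq} > {fasta}",
    ]


def run_consensus(sorted_bam, reference_fasta, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in consensus_commands(sorted_bam, reference_fasta):
        print(command)
        if not dry_run:
            # shell=True is required for the pipes and redirections.
            # check=True raises if the last stage of a command fails.
            subprocess.run(command, shell=True, check=True)
```

Calling `run_consensus("sample_sorted.bam", "reference.fasta")` only prints the commands, which makes it easy to review them before passing `dry_run=False`.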
SeqPatcher
Help for this command is available by running `seqpanther seqpatcher` or `seqpanther seqpatcher --help`.
Example
A sample dataset of five BAM files from South African SARS-CoV-2 samples can be downloaded from the SeqPanther Zenodo page. The BAM files were generated by mapping reads against the reference (NC_045512.2) using BWA; the resulting SAM files were converted to BAM and sorted using samtools.
Reference sequences in FASTA format and genome annotation GFF files were downloaded from NCBI Genome Fasta and NCBI Genome GFF respectively. The downloaded files need to be uncompressed.
Consensus sequences can be calculated from the BAM files using the command
bcftools mpileup -f GCF_009858895.2_ASM985889v3_genomic.fna bam/K032258-consensus_alignment_sorted.bam | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > K032258-consensus_alignment_sorted.fastq
and
python fastq2fasta.py -i K032258-consensus_alignment_sorted.fastq -o consensus/K032258-consensus_alignment_sorted.fasta
`fastq2fasta.py` can be downloaded from the project repo.
Store the downloaded BAM files in a folder, e.g. called `bam`.
To generate a summary of the codon, nucleotide and indel changes in the Spike gene of these samples, use the command
seqpanther codoncounter -bam bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384
This will generate the summaries for all the files in the `bam` folder.
The command will generate four outputs in the current folder including: sub_output.csv containing details of the nucleotide substitutions, indel_output.csv containing details of the indel events, codon_output.csv containing details of the codon changes and output.pdf which is a plot of genome depth and breadth of coverage annotated with the positions with mutations and indels.
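As a quick sanity check after a run, the presence of the four outputs can be verified and the CSV files previewed from Python. This is only an illustrative sketch: the helper names are hypothetical, and because the column layout of the CSV files is not described here, the preview makes no assumption about column names and just aligns whatever it finds.

```python
import csv
from pathlib import Path

# File names as listed in the text above; output.pdf is only checked for presence.
EXPECTED_OUTPUTS = [
    "sub_output.csv",
    "indel_output.csv",
    "codon_output.csv",
    "output.pdf",
]


def missing_outputs(folder="."):
    """Return the expected output files that are absent from `folder`."""
    return [name for name in EXPECTED_OUTPUTS if not (Path(folder) / name).is_file()]


def preview_csv(path, max_rows=5):
    """Return the header plus up to `max_rows` rows as aligned text lines."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        rows = [row for _, row in zip(range(max_rows + 1), reader)]
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    rows = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]
    return [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
```

For example, `print("\n".join(preview_csv("sub_output.csv")))` shows the first few substitution records in aligned columns.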
If you only want to generate the results for a single BAM file, run the command as
seqpanther codoncounter -bam ./bam/K032282-consensus_alignment_sorted.bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384
replacing the BAM file name with that of your own BAM file.
Outputs can be explored using a text file reader (for the text files) and a PDF reader (e.g. Adobe Reader) for the PDFs. An example command to view the text files would be:
cat sub_output.csv | sed 's/,/ ,/g' | column -t -s, | less -S
The user needs to explore those files and remove the changes they do not want to be integrated. A text editor of your choice, e.g. BBEdit or Notepad++, can be used to edit the files.
If you decide that certain mutations need to be changed, convert the outputs from `codoncounter` to the format required by the `nucsubs` command by running
seqpanther cc2ns -s sub_output.csv -i sub_output.csv -o changes
It generates a CSV file for each sample in the `./changes` folder.
Then execute seqpanther as follows:
seqpanther nucsubs -i NC_045512.2 -r NC_045512.2.fasta -c consensus -t changes -o results
to integrate the relevant changes into the consensus sequences. The output will be generated in a folder named `results`.
Example data for `seqpanther seqpatcher` is provided in the `examples/seqpatcher` folder of this project.
To run `seqpanther seqpatcher` on the example data, use
seqpanther seqpatcher -s examples/seqpatcher/ab1 -a examples/seqpatcher/assemblies -o Results -t mmf.csv -O sanger.fasta -g 10 -3 True -x del
It will generate consensus sequences from the Sanger trace files and integrate them into the NGS consensus. The mmf.csv file tells the application whether the Sanger data were paired or single ab1 files, or in a single FASTA file, and sanger.fasta contains all Sanger sequences.
Warning
Recursive use of newly generated consensus sequences might result in an incorrect final sequence due to the integration of indel events.
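One plausible reading of this warning is that an integrated indel shifts the coordinates of every downstream base, so changes recorded against the original coordinates no longer point at the same positions in the patched sequence. The toy example below illustrates that effect on a short string; it is not SeqPanther's implementation, and the helper names and sequences are made up for illustration.

```python
def apply_substitution(sequence, position, expected_ref, alt):
    """Replace the base at a 1-based position after checking the reference base."""
    index = position - 1
    if sequence[index] != expected_ref:
        raise ValueError(
            f"expected {expected_ref} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + alt + sequence[index + 1:]


def apply_insertion(sequence, position, inserted):
    """Insert bases immediately after a 1-based position."""
    return sequence[:position] + inserted + sequence[position:]


original = "ACGTACGT"

# First round: insert "TT" after position 2.
# Every base downstream of the insertion moves two positions to the right.
patched = apply_insertion(original, 2, "TT")

# A substitution recorded against the original coordinates (G at position 7)
# applies cleanly to the original sequence...
apply_substitution(original, 7, "G", "A")

# ...but in the patched sequence that G now sits at position 9, so reusing
# position 7 would touch a different base. The reference check catches this.
try:
    apply_substitution(patched, 7, "G", "A")
except ValueError as error:
    print("Coordinate mismatch:", error)
```

Without a reference-base check like the one in `apply_substitution`, the second round would silently alter the wrong base, which is why changes are best applied once against the coordinates they were called on.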
Bug reporting
To report a bug, request support or propose a new feature, please open an issue.
Licence
GNU GPL v3.0
Citation
If you use this software please cite:
SeqPanther: Sequence manipulation and mutation statistics toolset. James Emmanuel San, Stephanie Van Wyk, Houriiyah Tegally, Simeon Eche, Eduan Wilkinson, Aquillah M. Kanzi, Tulio de Oliveira, Anmol M. Kiran. bioRxiv 2023. doi: 10.1101/2023.01.26.525629
Zenodo:
Owner
- Name: Anmol Kiran
- Login: codemeleon
- Kind: user
- Repositories: 5
- Profile: https://github.com/codemeleon
JOSS Publication
SeqPanther: Sequence manipulation and mutation statistics toolset
Authors
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa, Department of Global Health, University of Washington, Seattle, WA, United States of America
Tags
Bioinformatics, sequence analysis, NGS, codon, amino acid substitution, nucleotide substitution, INDEL
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| codemeleon | a****n@g****m | 121 |
| jsan4christ | s****s@g****m | 29 |
| Christian Brueffer | c****n@b****o | 5 |
| C. Titus Brown | t****s@i****g | 2 |
| Kevin Mattheus Moerman | K****n | 1 |
| Kelly Rowland | k****d@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 8
- Total pull requests: 12
- Average time to close issues: 23 days
- Average time to close pull requests: 18 days
- Total issue authors: 4
- Total pull request authors: 6
- Average comments per issue: 1.63
- Average comments per pull request: 0.67
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ctb (4)
- cbrueffer (2)
- kellyrowland (1)
- Cricetinae-hamster (1)
Pull Request Authors
- cbrueffer (5)
- jsan4christ (3)
- kellyrowland (1)
- Kevin-Mattheus-Moerman (1)
- codemeleon (1)
Dependencies
- actions/checkout v3 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
- biopython >=1.80
- click >=7.1.2
- matplotlib >=3.6.2
- numpy >=1.22.1
- pandas >=1.5.2
- pyfaidx >=0.6.3.1
- pysam >=0.18.0