SeqPanther

SeqPanther: Sequence manipulation and mutation statistics toolset - Published in JOSS (2023)

https://github.com/codemeleon/seqpanther

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Biology Life Sciences - 63% confidence
Last synced: 4 months ago

Repository

Variant Caller and Patcher

Basic Info
  • Host: GitHub
  • Owner: codemeleon
  • License: gpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 3.23 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 4
  • Open Issues: 1
  • Releases: 1
Created over 3 years ago · Last pushed about 2 years ago
Metadata Files
Readme Contributing License

README.md

SeqPanther

SeqPanther is a Python application that provides a suite of tools to interrogate the circumstances under which certain mutations occur or are missed, and enables the user to modify the consensus as needed. The tool is applicable to non-segmented bacterial and viral genomes where reads are mapped to a reference sequence. SeqPanther generates detailed reports of mutations identified within a genomic segment or at positions of interest, including visualization of genome coverage and depth. The tool is particularly useful for examining multiple next-generation sequencing (NGS) short-read samples. Additionally, we have integrated Seqpatcher (Singh et al.), which supports merging Sanger sequences, or a consensus thereof, into their respective NGS consensus.

Seqpanther workflow, genome coverage plot and codon count report

SeqPanther consists of the following set of commands:

  • codoncounter: performs variant calling, generates nucleotide statistics at variant sites, and reports the impact of nucleotide changes on amino acids in the translated proteins.

  • cc2ns (codon counter 2 nucleotide substitution): converts codoncounter output into a format in which the user can select the variants to integrate into the reference or assemblies with the nucsubs command. The user is encouraged to inspect and edit the output from codoncounter, keeping only the desired changes, before running the cc2ns command.

  • nucsubs: accepts the changes file generated by cc2ns as input and modifies the consensus sequences accordingly, i.e. adds or changes the mutations specified in the file.

  • seqpatcher: integrates Sanger sequences of missing regions into an incomplete assembly. This command is a modification of the SeqPatcher tool.

Operating system compatibility

Command-line application for Unix and OS X.

Dependencies

The tool relies on several external open-source programs and Python modules, as listed below:

External Tools

  • python>=3.7
  1. Samtools
    • For sorting BAM files and indexing them.
    • conda install -c bioconda samtools to install.
  2. Bcftools
    • For parsing variant calls and generating consensus sequences.
    • conda install -c bioconda bcftools to install.
  3. Muscle (v. 3.8.31)
    • To perform multiple sequence alignment.
    • conda install -c bioconda muscle=3.8.31 to install.
  4. BLAT
    • To query sequence location in the genome.
    • conda install -c bioconda ucsc-blat to install.
  5. MAFFT
    • To align consensus sequence against the reference.
    • conda install -c bioconda mafft to install.
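
Before running SeqPanther it can be useful to confirm that the external tools are reachable on the PATH. The following is a minimal sketch, not part of SeqPanther: the helper name `find_missing_tools` is hypothetical, and the executable names are assumptions based on the conda packages listed above.

```python
import shutil

# Executable names are assumptions derived from the conda package names above.
REQUIRED_TOOLS = ["samtools", "bcftools", "muscle", "blat", "mafft"]


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing external tools:", ", ".join(missing))
    else:
        print("All external tools found on PATH.")
```

The `which` parameter exists only so that the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.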

Installation

Option 1: Clone repo and install locally

  1. git clone https://github.com/codemeleon/seqPanther
  2. cd seqPanther
  3. pip install .

Option 2: Install directly from Git

To install directly from the GitHub repo, run the command:

pip install git+https://github.com/codemeleon/seqPanther.git

Usage

SeqPanther contains four commands, which are listed in the help menu. To view the help menu, type seqpanther --help or just seqpanther.

codoncounter

Help for this command is available via seqpanther codoncounter or seqpanther codoncounter --help.

cc2ns

Help for this command is available via seqpanther cc2ns or seqpanther cc2ns --help.

nucsubs

You may need to convert BAM files to consensus sequences before running seqpanther nucsubs. Consensus sequences can be generated using the following commands.

  • samtools index <sorted_bamfile>
  • bcftools mpileup -f <reference_fasta> <sorted_bamfile> | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > <sorted_bamfile>.fq
  • seqtk seq -aQ64 <sorted_bamfile>.fq > <sorted_bamfile>.fasta
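
The three commands above can be assembled per sample in a small script. This is a sketch under stated assumptions rather than part of SeqPanther: the function name `consensus_commands` and the file names in the demo are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def consensus_commands(reference_fasta, sorted_bam):
    """Build the shell commands that turn a sorted BAM into a consensus FASTA."""
    ref = shlex.quote(reference_fasta)
    bam = shlex.quote(sorted_bam)
    fastq = shlex.quote(sorted_bam + ".fq")
    fasta = shlex.quote(sorted_bam + ".fasta")
    return [
        f"samtools index {bam}",
        f"bcftools mpileup -f {ref} {bam} | bcftools call -c --ploidy 1 "
        f"| vcfutils.pl vcf2fq > {fastq}",
        # seqtk takes the `seq` subcommand once, followed by its options.
        f"seqtk seq -aQ64 {fastq} > {fasta}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for command in consensus_commands("reference.fasta", "sample_sorted.bam"):
        print(command)
```

Each returned string could be passed to `subprocess.run(command, shell=True, check=True)`; quoting with `shlex.quote` keeps paths containing spaces intact.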

Help for this command is available via seqpanther nucsubs or seqpanther nucsubs --help.

seqpatcher

Help for this command is available via seqpanther seqpatcher or seqpanther seqpatcher --help.

Example

  1. A sample dataset of five BAM files from South African SARS-CoV-2 samples can be downloaded from the SeqPanther Zenodo page. The reads were mapped against the reference (NC_045512.2) using BWA, and the resulting SAM files were converted to BAM and sorted using samtools.

  2. Reference sequences in FASTA format and genome annotation GFF files were downloaded from NCBI Genome Fasta and NCBI Genome GFF respectively. The downloaded files need to be uncompressed.

  3. Consensus sequences can be calculated from the BAM files using the command bcftools mpileup -f GCF_009858895.2_ASM985889v3_genomic.fna bam/K032258-consensus_alignment_sorted.bam | bcftools call -c --ploidy 1 | vcfutils.pl vcf2fq > K032258-consensus_alignment_sorted.fastq, followed by python fastq2fasta.py -i K032258-consensus_alignment_sorted.fastq -o consensus/K032258-consensus_alignment_sorted.fasta. The fastq2fasta.py script can be downloaded from the project repo.

  4. Store the downloaded BAM files in a folder, e.g. called bam.

  5. To generate a summary of the codon, nucleotide and indel changes in the Spike gene of these samples, use the command seqpanther codoncounter -bam bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384. This will generate the summaries for all the files in the bam folder.

The command will generate four outputs in the current folder: sub_output.csv containing details of the nucleotide substitutions, indel_output.csv containing details of the indel events, codon_output.csv containing details of the codon changes, and output.pdf, which is a plot of genome depth and breadth of coverage annotated with the positions of mutations and indels.
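
The codoncounter invocation above can also be assembled in a script, for example to run one BAM file at a time. The sketch below is an illustration only: the helper name `codoncounter_command` is hypothetical, the flags are copied from the example command, and the check that coordinates are 1-based with the start not exceeding the end is an assumption rather than documented behaviour.

```python
def codoncounter_command(bam, ref_id, reference, gff, start, end):
    """Build the argument list for one `seqpanther codoncounter` run."""
    # Assumption: coordinates are 1-based and inclusive, as in GFF files.
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {start}-{end}")
    return [
        "seqpanther", "codoncounter",
        "-bam", str(bam),
        "-rid", ref_id,
        "-ref", str(reference),
        "-gff", str(gff),
        "-coor_range", f"{start}-{end}",
    ]


if __name__ == "__main__":
    command = codoncounter_command(
        "bam/K032282-consensus_alignment_sorted.bam",
        "NC_045512.2",
        "GCF_009858895.2_ASM985889v3_genomic.fna",
        "GCF_009858895.2_ASM985889v3_genomic.gff",
        21563,
        25384,
    )
    # Printed only; pass the list to subprocess.run(command, check=True) to execute.
    print(" ".join(command))
```

Because the outputs described above are written to the current folder under fixed names, per-sample runs may need separate working directories to avoid overwriting earlier results.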

  6. If you only want to generate the results for a single BAM file, run the command as seqpanther codoncounter -bam ./bam/K032282-consensus_alignment_sorted.bam -rid NC_045512.2 -ref GCF_009858895.2_ASM985889v3_genomic.fna -gff GCF_009858895.2_ASM985889v3_genomic.gff -coor_range 21563-25384, replacing the BAM file name with that of your own BAM file.

  7. Outputs can be explored using a text file reader (for the CSV files) and a PDF reader (e.g. Adobe Reader) for the PDF. An example command to view the CSV files is: cat sub_output.csv | sed 's/,/ ,/g' | column -t -s, | less -S. Explore these files and remove the changes that you do not want integrated. Any text editor, e.g. BBEdit or Notepad++, can be used to edit the files.

  8. If you decide that certain mutations need to be changed, convert the outputs from codoncounter to the format required by the nucsubs command by running seqpanther cc2ns -s sub_output.csv -i indel_output.csv -o changes. This generates a CSV file for each sample in the ./changes folder.

  9. Then run seqpanther nucsubs -i NC_045512.2 -r NC_045512.2.fasta -c consensus -t changes -o results to integrate the selected changes into the consensus sequences. The output is written to a folder named results.

  10. Example data for seqpanther seqpatcher is provided in the examples/seqpatcher folder of this project.

  11. To run seqpanther seqpatcher on the example data, use seqpanther seqpatcher -s examples/seqpatcher/ab1 -a examples/seqpatcher/assemblies -o Results -t mmf.csv -O sanger.fasta -g 10 -3 True -x del. It generates consensus sequences from the Sanger trace files and integrates them into the NGS consensus. The mmf.csv file tells the application whether the Sanger data were paired or single ab1 files, or a single FASTA file, and sanger.fasta contains all the Sanger sequences.
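
For the output-inspection step in the list above, the CSV files can also be previewed from Python instead of the cat, sed and column pipeline. This is a minimal sketch using only the standard library: the function name `format_csv_preview` is hypothetical, and no assumption is made about the column names that codoncounter writes.

```python
import csv
import io


def format_csv_preview(handle, max_rows=10):
    """Return the header and first `max_rows` data rows as column-aligned text."""
    rows = []
    for index, row in enumerate(csv.reader(handle)):
        if index > max_rows:  # index 0 is the header row
            break
        rows.append(row)
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [row + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in padded
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up rows; real codoncounter column names are not assumed here.
    demo = io.StringIO("col_a,col_b\n1,long value\n22,x\n")
    print(format_csv_preview(demo))
```

To preview a real file, open it with `open("sub_output.csv", newline="")` and pass the handle to the function.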

Warning

Recursive use of newly generated consensus sequences might result in an incorrect final sequence due to the integration of indel events.

Bug reporting

To report a bug, request support or propose a new feature, please open an issue.

Licence

GNU GPL v3.0

Citation

If you use this software please cite:

SeqPanther: Sequence manipulation and mutation statistics toolset. James Emmanuel San, Stephanie Van Wyk, Houriiyah Tegally, Simeon Eche, Eduan Wilkinson, Aquillah M. Kanzi, Tulio de Oliveira, Anmol M. Kiran. bioRxiv 2023. doi: 10.1101/2023.01.26.525629

Owner

  • Name: Anmol Kiran
  • Login: codemeleon
  • Kind: user

JOSS Publication

SeqPanther: Sequence manipulation and mutation statistics toolset
Published
July 20, 2023
Volume 8, Issue 87, Page 5305
Authors
James Emmanuel San ORCID
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
Stephanie van Wyk ORCID
Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
Houriiyah Tegally ORCID
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
Simeon Eche ORCID
Yale University School of Medicine, New Haven, Connecticut, United States of America
Eduan Wilkinson ORCID
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa
Aquillah M. Kanzi ORCID
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa
Tulio de Oliveira ORCID
KwaZulu Natal Research and Innovation Sequencing Platform, KRISP, University of KwaZulu Natal, Durban, South Africa, Centre for Epidemic Response and Innovation, CERI, University of Stellenbosch, Stellenbosch, South Africa, Department of Global Health, University of Washington, Seattle, WA, United States of America
Anmol M. Kiran ORCID
School of Biochemistry and Cell Biology, University College Cork, Cork, T12 XF62, Ireland
Editor
Kelly Rowland ORCID
Tags
Bioinformatics sequence analysis NGS codon amino acid substitution nucleotide substitution INDEL

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 159
  • Total Committers: 6
  • Avg Commits per committer: 26.5
  • Development Distribution Score (DDS): 0.239
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
codemeleon a****n@g****m 121
jsan4christ s****s@g****m 29
Christian Brueffer c****n@b****o 5
C. Titus Brown t****s@i****g 2
Kevin Mattheus Moerman K****n 1
Kelly Rowland k****d@g****m 1
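
The summary figures above can be reproduced from the per-committer counts. The sketch below assumes the common definition of the Development Distribution Score, namely one minus the share of commits made by the most active committer; that definition is an assumption and is not stated on this page.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (commits by the top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0  # no commits in the period
    return 1 - max(commit_counts) / total


# Commit counts taken from the Top Committers list above.
commits = [121, 29, 5, 2, 1, 1]

print(sum(commits))                                       # 159
print(round(sum(commits) / len(commits), 1))              # 26.5
print(round(development_distribution_score(commits), 3))  # 0.239
```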
Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 8
  • Total pull requests: 12
  • Average time to close issues: 23 days
  • Average time to close pull requests: 18 days
  • Total issue authors: 4
  • Total pull request authors: 6
  • Average comments per issue: 1.63
  • Average comments per pull request: 0.67
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ctb (4)
  • cbrueffer (2)
  • kellyrowland (1)
  • Cricetinae-hamster (1)
Pull Request Authors
  • cbrueffer (5)
  • jsan4christ (3)
  • kellyrowland (1)
  • Kevin-Mattheus-Moerman (1)
  • codemeleon (1)

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
setup.py pypi
  • biopython >=1.80
  • click >=7.1.2
  • matplotlib >=3.6.2
  • numpy >=1.22.1
  • pandas >=1.5.2
  • pyfaidx >=0.6.3.1
  • pysam >=0.18.0
environment.yml pypi