aminonaut

Software to identify and analyse nullomer peptides from UniProt data

https://github.com/stevenshave/aminonaut

Keywords

count-peptides find-nullomer-motifs nullomer-peptides nullomers unique-peptides

Repository

Basic Info
  • Host: GitHub
  • Owner: stevenshave
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 55.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created almost 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

NullomerPeptides

Software to identify and analyse nullomer peptides from UniProt (Swiss-Prot) XML data

This software is an output of the Leverhulme funded project: "Understanding the biological ramifications of a ‘forbidden’ peptide nullomer."

The programs

To generate counts of the peptides observed in nature, identify the nullomer peptides, and then search for motifs, first obtain the most recent UniProt Swiss-Prot release as XML. Then run count_peptides.py followed by find_nullomer_motifs.py, following the usage information below. The tools in this package support compressed input and output, inferred from the filename: files ending in .gz are treated as gzip compressed.
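The filename-based compression handling described above can be sketched as follows. This is a minimal illustration and not the package's own code; the helper name `open_text` is hypothetical.

```python
import gzip


def open_text(filename, mode="r"):
    """Open a file as text, using gzip when the name ends in .gz.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of the package.
    """
    if str(filename).endswith(".gz"):
        # gzip.open defaults to binary mode, so request text mode explicitly
        return gzip.open(filename, mode + "t", encoding="utf-8")
    return open(filename, mode, encoding="utf-8")
```

With a helper like this, the same reading and writing code can serve both plain and compressed files, for example `with open_text("counts.txt.gz", "w") as handle: ...`.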

count_peptides.py

This program reads an XML file such as the UniProt (Swiss-Prot) release and extracts the protein sequences found between <sequence> tags. For a given peptide length, it then counts the occurrences of each unique peptide of that length in the file.

count_peptides.py was used to identify peptide nullomers in the "Understanding the biological ramifications of a ‘forbidden’ peptide nullomer" project.
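The counting step can be illustrated with a short sketch. It assumes that peptides are taken as overlapping windows along each sequence and that every occurrence is counted; whether the program counts in exactly this way is an assumption, and the names `SEQUENCE_RE` and `count_peptides` are hypothetical. It also works on XML text already held in memory, which a real Swiss-Prot release would be too large for.

```python
import re
from collections import Counter

# Matches the text between <sequence ...> and </sequence>; self-closing
# <sequence .../> elements carry no residues and are not matched.
SEQUENCE_RE = re.compile(r"<sequence[^>]*>([^<]+)</sequence>")


def count_peptides(xml_text, length):
    """Count every overlapping peptide of `length` in the XML sequences."""
    counts = Counter()
    for match in SEQUENCE_RE.finditer(xml_text):
        # Sequences may be wrapped across lines, so drop all whitespace
        sequence = "".join(match.group(1).split())
        for start in range(len(sequence) - length + 1):
            counts[sequence[start:start + length]] += 1
    return counts


xml = '<entry><sequence length="5">MKTMK</sequence></entry>'
print(count_peptides(xml, 2))
```

Here the 2-mer MK is counted twice, while KT and TM are each counted once.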

Usage

```
python count_peptides.py

usage: count_peptides.py [-h] [-c OUTPUT_CUTOFF] [--version] uniprot_input_file output_filename peptide_lengths
count_peptides.py: error: the following arguments are required: uniprot_input_file, output_filename, peptide_lengths
```

Extended usage

```
python .\count_peptides.py -h
usage: count_peptides.py [-h] [-c OUTPUT_CUTOFF] [--version] uniprot_input_file output_filename peptide_lengths

positional arguments:
  uniprot_input_file    File containing sequences
  output_filename       File to output sequence counts to
  peptide_lengths       Length of peptides

optional arguments:
  -h, --help            show this help message and exit
  -c OUTPUT_CUTOFF, --output_cutoff OUTPUT_CUTOFF
                        Dont output sequences apearing more than this many times
  --version             show program's version number and exit
```

To generate 2,3,4,5, and 6-mer peptide counts, the following commands may be used:

python count_peptides.py uniprotswissprot.xml.gz petides-2mers.txt 2

python count_peptides.py uniprotswissprot.xml.gz petides-3mers.txt 3

python count_peptides.py uniprotswissprot.xml.gz petides-4mers.txt 4

python count_peptides.py uniprotswissprot.xml.gz petides-5mers.txt.gz 5

python count_peptides.py uniprotswissprot.xml.gz petides-6mers.txt.gz 6

Note that we give the 5-mer and 6-mer output files a .gz extension so that they are written gzip compressed. This is because of the large number of possible 5-mer and 6-mer peptides (20^5 = 3,200,000 and 20^6 = 64,000,000).
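To make the size of the peptide space concrete, the sketch below enumerates every possible peptide of a given length and yields those absent from a set of observed peptides, which is the defining property of a nullomer. It assumes an alphabet of the 20 standard amino acids; how the package treats non-standard residue codes is not covered here, and the names `AMINO_ACIDS` and `find_nullomers` are hypothetical.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_nullomers(observed, length):
    """Yield each possible peptide of `length` that is not in `observed`."""
    for letters in product(AMINO_ACIDS, repeat=length):
        peptide = "".join(letters)
        if peptide not in observed:
            yield peptide


# Toy example: pretend every 2-mer except CW and WW has been observed
all_2mers = {a + b for a in AMINO_ACIDS for b in AMINO_ACIDS}
observed = all_2mers - {"CW", "WW"}
print(list(find_nullomers(observed, 2)))
```

Because the generator walks all 20^length candidates, memory use stays flat, but the running time grows by a factor of 20 with each extra residue.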

find_nullomer_motifs.py and find_nullomer_motifs_forward_and_backwards.py

These programs read the output of count_peptides.py: essentially CSV files in which the first column holds the unique peptide sequence and the second holds the number of times that peptide was found. With this read in, they construct motifs of a set length and query how many nullomers each motif covers. For example, having identified CQWW, we may theorise that the motif C..W, where a dot is any amino acid, is enriched within the 5-mers. We may use find_nullomer_motifs.py to enumerate all possible motifs and query the previously generated datasets, counting how many nullomers match each motif. find_nullomer_motifs.py matches motifs in the forward direction only, so C..W would match CQWW but not WWQC. find_nullomer_motifs_forward_and_backwards.py matches in both directions, so the motif C..W would match both CQWW and WWQC.
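The forward and forward-and-backward matching described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: motifs are treated as regular expressions in which "." matches any residue, a motif may match at any offset within a peptide, and the function names are hypothetical.

```python
import re
from itertools import product

# Assumption: only the 20 standard amino acids are considered
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def enumerate_motifs(length):
    """Yield every motif of `length` built from residues and the '.' wildcard."""
    for symbols in product(AMINO_ACIDS + ".", repeat=length):
        yield "".join(symbols)


def motif_matches(motif, peptide, both_directions=False):
    """Return True if `motif` occurs in `peptide`, optionally also reversed."""
    pattern = re.compile(motif)
    if pattern.search(peptide):
        return True
    # Backward matching is modelled as searching the reversed peptide
    return both_directions and pattern.search(peptide[::-1]) is not None


def count_covered(motif, nullomers, both_directions=False):
    """Count how many nullomers the motif matches."""
    return sum(motif_matches(motif, p, both_directions) for p in nullomers)


nullomers = ["CQWW", "WWQC", "AAAA"]
print(count_covered("C..W", nullomers))
print(count_covered("C..W", nullomers, both_directions=True))
```

In the forward-only case C..W covers CQWW alone; with both directions enabled it also covers WWQC, in line with the behaviour described for the two programs.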

Usage

python find_nullomer_motifs.py
usage: find_nullomer_motifs.py [-h] [--version] input_filename output_filename motif_length
find_nullomer_motifs.py: error: the following arguments are required: input_filename, output_filename, motif_length

Extended usage

```
python find_nullomer_motifs.py -h
usage: find_nullomer_motifs.py [-h] [--version] input_filename output_filename motif_length

positional arguments:
  input_filename   File containing peptides and counts comma separated
  output_filename  File to output motif hit counts to
  motif_length     File containing sequences

optional arguments:
  -h, --help       show this help message and exit
  --version        show program's version number and exit
```

To generate motifs and their nullomer coverage, we can use the following commands. To identify nullomer motifs of length 4 within the 5-mer peptides:

python find_nullomer_motifs.py petides-5mers.txt.gz motifs-4mersIn5mers.txt 4

To identify nullomer motifs of length 3 within the 6-mer peptides:

python find_nullomer_motifs.py petides-6mers.txt.gz motifs-3mersIn6mers.txt 3

Requirements

  • Python (>=3.6)
  • Numpy (>=1.15.0)

Owner

  • Name: Steven Shave
  • Login: stevenshave
  • Kind: user
  • Company: University of Edinburgh, GSK

Cheminformatics, Data Science, ML and Drug Discovery at the University of Edinburgh & GSK

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Aminonaut
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Steven
    family-names: Shave
    email: s.shave@ed.ac.uk
    affiliation: University of Edinburgh
    orcid: 'https://orcid.org/0000-0001-6996-3663'
repository-code: 'https://github.com/stevenshave/Aminonaut'
url: 'https://sites.google.com/view/nullomerpeptides/software'
abstract: >-
  Software to identify and analyse nullomer peptides from
  UniProt (Swiss-Prot) XML data. This software is an output
  of the Leverhulme funded project: "Understanding the
  biological ramifications of a ‘forbidden’ peptide
  nullomer."
keywords:
  - Nullomers
  - NulloPs
  - Nullomer peptides
license: MIT
commit: 9eb4477acb6218fba75882a3a39e4129d211f213
version: 1.0.0
date-released: '2022-04-02'
