aminonaut
Software to identify and analyse nullomer peptides from UniProt data
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.4%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
NullomerPeptides
Software to identify and analyse nullomer peptides from UniProt (Swiss-Prot) XML data
This software is an output of the Leverhulme funded project: "Understanding the biological ramifications of a ‘forbidden’ peptide nullomer."
The programs
To count the peptides observed in nature, identify the nullomer peptides and then perform motif searching, first obtain the most recent UniProt Swiss-Prot release as XML. Then run the count_peptides.py and find_nullomer_motifs.py programs in succession, using the commands below. The software tools in this package support compressed input and output, inferred from the filename: files ending in .gz are treated as gzip compressed.
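Below is a minimal sketch of the filename-based compression handling described above. It assumes a simple suffix check. The helper `open_text` is a hypothetical name used for illustration, not the package's own code. It switches to Python's standard `gzip` module whenever a path ends in `.gz`.

```python
import gzip
import os
import tempfile
from typing import IO


def open_text(path: str, mode: str = "rt") -> IO[str]:
    """Open a text file for reading ('rt') or writing ('wt').

    Hypothetical helper: names ending in '.gz' are handled with gzip,
    anything else is opened as plain text.
    """
    if mode not in ("rt", "wt"):
        raise ValueError("this sketch only supports 'rt' and 'wt'")
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding="utf-8")
    return open(path, mode, encoding="utf-8")


# Usage: round-trip a small file through gzip, chosen purely by its suffix.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "counts.txt.gz")
    with open_text(demo_path, "wt") as fh:
        fh.write("CQWW,0\n")
    with open_text(demo_path) as fh:
        print(fh.read(), end="")  # prints CQWW,0
```

Keying the behaviour on the suffix keeps callers unaware of compression. The same code path then serves both small plain-text outputs and large compressed ones.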
count_peptides.py
This program reads an XML file, such as the UniProt (Swiss-Prot) release, and extracts the protein sequences found between <sequence> tags. For a chosen peptide length, it then counts how many times each unique peptide of that length occurs in the file.
count_peptides.py was used to identify peptide nullomers in the "Understanding the biological ramifications of a ‘forbidden’ peptide nullomer" project.
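The extraction and counting steps can be sketched as follows. The helper names `extract_sequences` and `count_kmers` are hypothetical. The sketch makes several assumptions:

- the whole XML text fits in memory;
- a regular expression is used rather than a full XML parser;
- every overlapping window of length k is counted.

UniProt XML can also contain self-closing `<sequence .../>` elements, for example in isoform annotations, so the pattern skips those.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Match <sequence ...>RESIDUES</sequence>, but not self-closing tags such as
# <sequence type="displayed"/>, via the (?<!/) lookbehind before '>'.
SEQUENCE_RE = re.compile(r"<sequence\b[^>]*(?<!/)>(.*?)</sequence>", re.DOTALL)


def extract_sequences(xml_text: str) -> Iterator[str]:
    """Yield each sequence found between <sequence> tags, whitespace removed."""
    for match in SEQUENCE_RE.finditer(xml_text):
        yield "".join(match.group(1).split())


def count_kmers(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping peptide of length k across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        # Sequences shorter than k contribute nothing (empty range).
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```

A full Swiss-Prot XML file is several gigabytes once decompressed. A streaming approach, such as reading line by line or `xml.etree.ElementTree.iterparse`, would be a safer choice for real data than loading one string.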
Usage
```
python count_peptides.py
usage: count_peptides.py [-h] [-c OUTPUT_CUTOFF] [--version] uniprot_input_file output_filename peptide_lengths
count_peptides.py: error: the following arguments are required: uniprot_input_file, output_filename, peptide_lengths
```
Extended usage
```
python .\count_peptides.py -h
usage: count_peptides.py [-h] [-c OUTPUT_CUTOFF] [--version] uniprot_input_file output_filename peptide_lengths

positional arguments:
  uniprot_input_file    File containing sequences
  output_filename       File to output sequence counts to
  peptide_lengths       Length of peptides

optional arguments:
  -h, --help            show this help message and exit
  -c OUTPUT_CUTOFF, --output_cutoff OUTPUT_CUTOFF
                        Dont output sequences apearing more than this many times
  --version             show program's version number and exit
```
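The `-c/--output_cutoff` option is described as suppressing sequences that appear more than a given number of times. Below is a minimal sketch of such a filter. It writes `peptide,count` rows, the comma-separated layout the motif tools read. `write_counts` is a hypothetical name, and the sorted row order is an assumption rather than documented behaviour.

```python
import csv
import io
from collections import Counter
from typing import Optional, TextIO


def write_counts(counts: Counter, out: TextIO, output_cutoff: Optional[int] = None) -> int:
    """Write 'peptide,count' rows in sorted order, skipping any peptide whose
    count exceeds output_cutoff (when a cutoff is given). Returns rows written."""
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for peptide, count in sorted(counts.items()):
        if output_cutoff is not None and count > output_cutoff:
            continue  # seen too often, so it is left out of the output
        writer.writerow([peptide, count])
        written += 1
    return written


# Example: with a cutoff of 1, only peptides seen at most once are written.
buffer = io.StringIO()
write_counts(Counter({"MK": 2, "KV": 1, "WW": 0}), buffer, output_cutoff=1)
print(buffer.getvalue(), end="")  # prints "KV,1" then "WW,0"
```

The function takes an already-open text handle, so the caller decides whether the destination is a plain file, a gzip stream or an in-memory buffer.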
To generate 2-, 3-, 4-, 5- and 6-mer peptide counts, the following commands may be used:
python count_peptides.py uniprot_swissprot.xml.gz petides-2mers.txt 2
python count_peptides.py uniprot_swissprot.xml.gz petides-3mers.txt 3
python count_peptides.py uniprot_swissprot.xml.gz petides-4mers.txt 4
python count_peptides.py uniprot_swissprot.xml.gz petides-5mers.txt.gz 5
python count_peptides.py uniprot_swissprot.xml.gz petides-6mers.txt.gz 6
Note that we give the 5- and 6-mer output files names ending in .gz, so that they are written gzip compressed. This is because of the large number of possible unique 5- and 6-mer peptides (20^5 = 3,200,000 and 20^6 = 64,000,000).
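Each position in a k-mer can hold any of the 20 standard amino acids, so there are 20^k possible peptides. Nullomers are the possible peptides that are never observed. The sketch below enumerates that space with `itertools.product` and reports the unobserved peptides. `find_nullomers` is a hypothetical helper, and it ignores non-standard residue codes such as X or U. For k = 6 the pure-Python loop over 64,000,000 candidates will be slow.

```python
from collections import Counter
from itertools import product
from typing import List

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def find_nullomers(counts: Counter, k: int) -> List[str]:
    """Return every k-mer over the 20 standard residues that never occurs in counts."""
    nullomers = []
    for residues in product(AMINO_ACIDS, repeat=k):
        peptide = "".join(residues)
        if counts.get(peptide, 0) == 0:
            nullomers.append(peptide)
    return nullomers


# The search space grows as 20**k: from 400 dimers up to 64,000,000 hexamers.
for k in range(2, 7):
    print(k, len(AMINO_ACIDS) ** k)
```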
find_nullomer_motifs.py and find_nullomer_motifs_forward_and_backwards.py
These programs read the output of the count_peptides.py program. The files are essentially CSV: the first column contains a unique peptide sequence and the second contains the number of times that peptide was found. Having read this in, the programs construct motifs of a set length and query how many nullomers each motif covers. For example, having identified CQWW, we may theorise that the motif C..W, where a dot is any amino acid, is enriched within the 5-mers. We may use find_nullomer_motifs.py to enumerate all possible motifs and query the previously generated datasets, counting how many nullomers match each motif. find_nullomer_motifs.py matches motifs in the forwards direction only, so C..W would match CQWW but not WWQC. find_nullomer_motifs_forward_and_backwards.py matches in both directions, so the motif C..W would match both CQWW and WWQC.
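The matching rules above can be sketched with Python's `re` module, in which `.` already means "any character". The names `enumerate_motifs`, `motif_matches` and `count_motif_hits` are hypothetical. Two details are assumptions rather than documented behaviour:

- a motif may match at any offset within a longer peptide, as when 4-residue motifs are searched in 5-mers;
- motifs made entirely of wildcards are skipped during enumeration.

```python
import re
from itertools import product
from typing import Iterable, Iterator

# The 20 standard amino acids plus '.', the "any residue" wildcard.
MOTIF_ALPHABET = "ACDEFGHIKLMNPQRSTVWY."


def enumerate_motifs(length: int) -> Iterator[str]:
    """Yield every motif of the given length, skipping all-wildcard motifs."""
    for symbols in product(MOTIF_ALPHABET, repeat=length):
        motif = "".join(symbols)
        if motif.strip("."):  # keep only motifs with at least one fixed residue
            yield motif


def motif_matches(motif: str, peptide: str, both_directions: bool = False) -> bool:
    """True if the motif occurs at any position in the peptide.

    With both_directions=True the reversed peptide is checked as well.
    """
    pattern = re.compile(motif)
    if pattern.search(peptide):
        return True
    return both_directions and pattern.search(peptide[::-1]) is not None


def count_motif_hits(motif: str, nullomers: Iterable[str], both_directions: bool = False) -> int:
    """Count how many nullomers the motif covers."""
    return sum(motif_matches(motif, p, both_directions) for p in nullomers)


nullomers = ["CQWW", "WWQC", "AAAA"]
print(count_motif_hits("C..W", nullomers))                        # 1
print(count_motif_hits("C..W", nullomers, both_directions=True))  # 2
```

Checking the reversed peptide is equivalent to checking the reversed motif, so the forward-and-backward variant needs no second set of motifs.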
Usage
python find_nullomer_motifs.py
usage: find_nullomer_motifs.py [-h] [--version] input_filename output_filename motif_length
find_nullomer_motifs.py: error: the following arguments are required: input_filename, output_filename, motif_length
Extended usage
```
python find_nullomer_motifs.py -h
usage: find_nullomer_motifs.py [-h] [--version] input_filename output_filename motif_length

positional arguments:
  input_filename   File containing peptides and counts comma separated
  output_filename  File to output motif hit counts to
  motif_length     File containing sequences

optional arguments:
  -h, --help       show this help message and exit
  --version        show program's version number and exit
```
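The input_filename argument expects comma-separated `peptide,count` rows. Below is a sketch of loading such a file, with gzip handling chosen by filename. The function names are hypothetical. Treating a count of zero as marking a nullomer is an assumption about the file contents.

```python
import csv
import gzip
from typing import Dict, Iterable, List


def parse_peptide_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'peptide,count' CSV lines into a {peptide: count} dict."""
    counts: Dict[str, int] = {}
    for row in csv.reader(lines):
        if row:  # ignore blank lines
            counts[row[0]] = int(row[1])
    return counts


def read_peptide_counts(path: str) -> Dict[str, int]:
    """Read a counts file, decompressing on the fly if the name ends in .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as fh:
        return parse_peptide_counts(fh)


def nullomers_from_counts(counts: Dict[str, int]) -> List[str]:
    """Assumption: a nullomer is a peptide whose recorded count is zero."""
    return sorted(p for p, c in counts.items() if c == 0)
```

Separating parsing from file opening lets the parser be tested on in-memory lines. The reader can then stay a thin wrapper around plain or gzip files.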
To generate motifs and their nullomer coverage, the following commands can be used. To identify nullomer motifs of length 4 within the 5-mer peptides:
python find_nullomer_motifs.py petides-5mers.txt.gz motifs-4mersIn5mers.txt 4
To identify nullomer motifs of length 3 within the 6-mer peptides:
python find_nullomer_motifs.py petides-6mers.txt.gz motifs-3mersIn6mers.txt 3
Requirements
- Python (>=3.6)
- NumPy (>=1.15.0)
Owner
- Name: Steven Shave
- Login: stevenshave
- Kind: user
- Company: University of Edinburgh, GSK
- Repositories: 2
- Profile: https://github.com/stevenshave
Cheminformatics, Data Science, ML and Drug Discovery at the University of Edinburgh & GSK
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Aminonaut
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Steven
    family-names: Shave
    email: s.shave@ed.ac.uk
    affiliation: University of Edinburgh
    orcid: 'https://orcid.org/0000-0001-6996-3663'
repository-code: 'https://github.com/stevenshave/Aminonaut'
url: 'https://sites.google.com/view/nullomerpeptides/software'
abstract: >-
  Software to identify and analyse nullomer peptides from
  UniProt (Swiss-Prot) XML data. This software is an output
  of the Leverhulme funded project: "Understanding the
  biological ramifications of a ‘forbidden’ peptide
  nullomer."
keywords:
  - Nullomers
  - NulloPs
  - Nullomer peptides
license: MIT
commit: 9eb4477acb6218fba75882a3a39e4129d211f213
version: 1.0.0
date-released: '2022-04-02'