proteotools

A simple Python package which lets you programmatically run a few proteomics search engines (Comet, X! Tandem, MS-GF+) and use compiled Trans-Proteomic Pipeline (TPP) binaries without needing to compile the entire pipeline.

https://github.com/kevinkovalchik/proteotools

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: kevinkovalchik
  • Language: Python
  • Default Branch: master
  • Size: 45.9 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed about 4 years ago

README.md

proteotools

A simple Python package which lets you programmatically convert Thermo raw files using ThermoRawFileParser, run a few proteomics search engines (Comet, X! Tandem, MS-GF+), and use Trans-Proteomic Pipeline (TPP) binaries without needing to compile them on your computer, all from within Python!

Note that this does not run the entire TPP. For example, you don't get to use the fancy GUI or anything. It is simply a way to run compiled TPP tools such as PeptideProphet, InterProphetParser, idconvert, etc., without needing to compile the entire pipeline on your computer.

TPP tools are run from a Singularity image, so an installation of Singularity is required. ThermoRawFileParser runs in Linux using Mono, so you need that too.

Requirements

proteotools only runs on Linux. Mostly the reason for this (in fact, probably entirely) is that I have not made it check the OS and select the appropriate downloads and binaries. I use Ubuntu and I wrote this for myself. If anyone wishes to use it on another OS, please raise an issue and I'll check it out.

proteotools requires Singularity to be installed on the computer. See below.

ThermoRawFileParser requires Mono.
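The requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; it assumes the executables are named `singularity` and `mono` and are on your `PATH`, which may not hold for every installation (the function name `check_prerequisites` is made up for this example and is not part of proteotools).

```python
import platform
import shutil


def check_prerequisites() -> dict:
    """Report whether the external requirements appear to be available."""
    return {
        # proteotools is only expected to work on Linux
        "linux": platform.system() == "Linux",
        # assumption: the Singularity executable is called "singularity"
        "singularity": shutil.which("singularity") is not None,
        # assumption: the Mono runtime executable is called "mono"
        "mono": shutil.which("mono") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```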

Installation

Singularity

You will need Singularity installed on your computer to use most of the functionality of proteotools: https://sylabs.io/singularity. Apologies to Docker users. Perhaps one day I will create a Docker recipe for the TPP, and then everyone can be happy.

proteotools

I haven't put anything up on PyPI yet, so to install proteotools you will need to do the following:
```commandline
cd /path/to/proteotools/repository/after/you/download/it/
pip install .
```

The first time you run proteotools you will need to have it install the available search engines and download the TPP image from the Singularity library. From a running Python interpreter, do this:
```python
from proteotools.software import download_all

download_all()
```

This will download Comet, MS-GF+, X! Tandem, and the TPP Singularity image and install them in ~/.proteotools_software. Note that the installation process isn't very considerate, and will happily overwrite any conflicting files in ~/.proteotools_software, if that directory already exists for some weird reason. If you have already run download_all() and you run it again, it is going to go through the whole process again. But it doesn't take that long unless your internet connection is visiting from 1998.
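If you would rather not repeat the download every time a script starts, one option is to look at `~/.proteotools_software` first. This is only a sketch: `needs_download` and `download_if_needed` are hypothetical helpers, not part of proteotools, and "directory exists and is not empty" is a crude heuristic that cannot tell a complete installation from a partial one.

```python
from pathlib import Path

# Location named in the README; everything else here is an assumption.
SOFTWARE_DIR = Path.home() / ".proteotools_software"


def needs_download(software_dir: Path = SOFTWARE_DIR) -> bool:
    """Return True if the software directory is missing or empty."""
    return not software_dir.is_dir() or not any(software_dir.iterdir())


def download_if_needed(software_dir: Path = SOFTWARE_DIR) -> bool:
    """Call download_all() only when the directory looks unpopulated."""
    if needs_download(software_dir):
        # Imported lazily so the check itself works without proteotools.
        from proteotools.software import download_all
        download_all()
        return True
    return False
```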

Usage

Now that things are ready to go, you can...

search!

It is assumed that you have some understanding of the search engines and can set up the appropriate parameter files by yourself.
```python
import proteotools.search as search
from pathlib import Path

fasta = '/path/to/my/favorite/fastafile.fasta'
mzml_files = list(Path('/path/to/a/directory/with/ms_files').glob('*.mzML'))

cometparams = '/path/to/an/appropriate/comet.params'
search.comet(parameterfile=cometparams, fasta=fasta, *mzml_files)

tandemparams = '/path/to/an/appropriate/xtandeminputparameter.xml'
# no taxonomy.xml required! (thanks to pyteomics.pepxmltk)
search.tandem(parameterfile=tandemparams, fasta=fasta, *mzml_files)

msgfparams = '/path/to/an/appropriate/MSGFPlusparameters.txt'
search.msgfplus(parameterfile=msgfparams,
                fasta=fasta,
                *mzml_files,
                decoyprefix='rev',  # Should be changed to whatever you use
                convert_to_pepxml=True,  # Default is True. Of course, we should all be using mzid files instead.
                )
```
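Since a search can run for a while, it may be worth confirming that the inputs exist before starting one. The helper below is a hypothetical sketch (it is not part of proteotools) that checks the FASTA and parameter files and gathers the mzML files from a directory.

```python
from pathlib import Path


def collect_search_inputs(fasta, parameter_file, mzml_dir):
    """Return sorted mzML paths after checking that all inputs exist."""
    for label, path in (("FASTA", fasta), ("parameter file", parameter_file)):
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} not found: {path}")
    # Same glob pattern as the search example, sorted for a stable order.
    mzml_files = sorted(Path(mzml_dir).glob("*.mzML"))
    if not mzml_files:
        raise FileNotFoundError(f"no .mzML files in {mzml_dir}")
    return mzml_files
```

The returned list can be unpacked into any of the search calls in the same way as the list built with `glob` in the example.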

validate!

tpp.run_prophets runs InteractParser to fix common pepXML problems, then PeptideProphetParser and InterProphetParser. There are a few parameters hardcoded in there, so if you want more control see the next section.
```python
import proteotools.tpp as tpp
from pathlib import Path

pepxmlfiles = list(Path('/path/to/the/folder/with/the/searchresults').glob('*.pepXML'))

tpp.run_prophets(pepxmlfiles=pepxmlfiles,
                 fasta=fasta,  # the same FASTA file used for the searches
                 decoytag='rev',
                 enzyme='trypsin',  # needed by InteractParser to correct enzyme names in pepXML files
                 peptideprophetflags=('ZERO', 'NONPARAM', 'DECOYPROBS'),  # extra flags for PeptideProphet. see below for more details.
                 iprophetflags=None,  # same idea as peptideprophetflags
                 iprophetoutfilename='interact-iproph.pepXML',  # the final file
                 threads=16,  # How many threads iProphet gets to use
                 iprophetminprob=0,  # minimum output probability for iProphet
                 mzmldirectory='/path/to/a/directory/with/the/searched/msfiles',  # where the original mzML files are located
                 # skips any files that start with "interact-", because you are probably
                 # trying out different parameters and left the output files from a
                 # previous run hanging around. I'm not aware that recursive PSM
                 # validation is a helpful thing.
                 skipexistinginteractpepxmls=True,
                 maxpeptiderank=1  # the max peptide rank to leave in there, if the search engine reports, e.g., the top 5 hits
                 )
```
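The comment about skipping files that start with "interact-" describes a simple filename filter. The sketch below shows that idea on a plain list of paths; it illustrates the described behaviour only, and `drop_interact_outputs` is a hypothetical name, not the function proteotools uses internally.

```python
from pathlib import Path


def drop_interact_outputs(pepxml_files):
    """Keep only pepXML files whose names do not start with 'interact-'."""
    return [
        Path(p) for p in pepxml_files
        if not Path(p).name.startswith("interact-")
    ]


files = [
    "/results/sample1.pepXML",
    "/results/interact-sample1.pepXML",  # left over from a previous run
    "/results/interact-iproph.pepXML",   # previous iProphet output
]
print([p.name for p in drop_interact_outputs(files)])
```

Filtering like this keeps earlier validation output from being fed back into the prophets when you rerun with different parameters.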

run any TPP binary!

This is a simple example of running Tandem2XML. But you should be able to run any of the compiled TPP binaries.
```python
tpp.run_tool(tool='Tandem2XML',  # because as much as we should be using mzid files instead of pepXML, I would say pepXML are preferable to tandem XML files.
             command='/path/to/my/results/search_results.t.xml /path/to/my/results/desired_pepXML_search_results.pepXML',
             path_to_bind='/path/to/my/results'  # this is important. it tells Singularity that it has permission to access this directory.
             )
```
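To see why `path_to_bind` matters, it helps to look at how a containerised tool is typically invoked: Singularity only exposes host directories that are bound into the container, for example with `singularity exec --bind`. The sketch below assembles such a command line. It is an illustration under assumptions, not the code proteotools runs: `build_singularity_command` and the image path are hypothetical.

```python
import shlex


def build_singularity_command(image, tool, command, path_to_bind):
    """Assemble an argv list for `singularity exec` with one bound directory."""
    return [
        "singularity", "exec",
        "--bind", str(path_to_bind),  # make the host directory visible inside the container
        str(image),                   # hypothetical path to the TPP image
        tool,
        *shlex.split(command),        # split the argument string the way a shell would
    ]


argv = build_singularity_command(
    image="/path/to/tpp_image.sif",  # assumption: wherever the image was installed
    tool="Tandem2XML",
    command="/path/to/my/results/in.t.xml /path/to/my/results/out.pepXML",
    path_to_bind="/path/to/my/results",
)
print(" ".join(argv))
```

If the input and output files live outside the bound directory, the tool inside the container will not be able to see them.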

get help on any TPP binary!

A convenience function to get the help output for a particular TPP tool.
```python
tpp.tool_help(tool='Tandem2XML')
```
The above will output:
```commandline
Usage: /usr/local/tpp/bin/Tandem2XML [USEDESC] [SCANOFFSETxx] <input-file> [<output-file>]
OPTIONS:
    DESCOFF: Don't Use Spectrum Descriptions for Naming Spectra in PepXML, (Default: parse scan number from the description)
    INDEXOFFxx: TPP assumes scans start at 1; older X!Tandem results had scan index starts at 0, add xx when converting to pepXML (Default: 0)
```

License information

Proteotools is released under the MIT license. Be sure you are aware of the licenses used by the other software tools of which Proteotools makes use (ThermoRawFileParser, Comet, MS-GF+, X! Tandem, Trans-Proteomic Pipeline).

Citing Proteotools

If you find Proteotools useful for your research, please find relevant information for citing it in the CITATION.cff file in this repository or below:
```text
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Proteotools
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Kevin
    family-names: Kovalchik
    email: kevin.kovalchik@gmail.com
    affiliation: 'CHU Sainte-Justine, Université de Montréal'
    orcid: 'https://orcid.org/0000-0002-2541-7721'
```

Owner

  • Name: Kevin Kovalchik
  • Login: kevinkovalchik
  • Kind: user
  • Location: Montreal, QC

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Proteotools
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Kevin
    family-names: Kovalchik
    email: kevin.kovalchik@gmail.com
    affiliation: 'CHU Sainte-Justine, Université de Montréal'
    orcid: 'https://orcid.org/0000-0002-2541-7721'

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 35
  • Total Committers: 2
  • Avg Commits per committer: 17.5
  • Development Distribution Score (DDS): 0.2
Top Committers
Name            Email           Commits
kevinkovalchik  k****k@g****m   28
kevin           k****j@g****m   7
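The summary figures above can be reproduced from the per-committer counts. This sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, although it does give the reported 0.2.

```python
# Commit counts taken from the Top Committers table.
commits_per_committer = {"kevinkovalchik": 28, "kevin": 7}

total = sum(commits_per_committer.values())
average = total / len(commits_per_committer)

# Assumed definition: DDS = 1 - (commits by top committer / total commits)
dds = 1 - max(commits_per_committer.values()) / total

print(total, average, round(dds, 2))  # prints: 35 17.5 0.2
```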

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 21 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 9
  • Total maintainers: 1
pypi.org: proteotools

A simple Python package which lets you programmatically convert Thermo raw files using ThermoRawFileParser, run a few proteomics search engines (Comet, X! Tandem, MS-GF+), and use compiled Trans-Proteomic Pipeline (TPP) binaries without needing to compile the entire pipeline.

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 21 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 21.6%
Average: 24.6%
Downloads: 29.5%
Forks count: 29.8%
Stargazers count: 31.9%
Maintainers (1)
Last synced: 8 months ago

Dependencies

requirements.txt pypi
  • pyteomics.pepxmltk *
setup.py pypi
  • pyteomics.pepxmltk *