https://github.com/bigbio/py-pgatk

Python tools for proteogenomics analysis toolkit


Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary

Keywords

ensembl mass-spectrometry proteogenomics proteogenomics-analysis-toolkit proteomics python vcf

Keywords from Contributors

labels
Last synced: 5 months ago

Repository

Python tools for proteogenomics analysis toolkit

Basic Info
  • Host: GitHub
  • Owner: bigbio
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Size: 125 MB
Statistics
  • Stars: 13
  • Watchers: 5
  • Forks: 12
  • Open Issues: 9
  • Releases: 19
Topics
ensembl mass-spectrometry proteogenomics proteogenomics-analysis-toolkit proteomics python vcf
Created almost 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog License

README.md

ProteoGenomics Analysis Toolkit


pypgatk is a Python library that is part of the ProteoGenomics Analysis Toolkit. It provides bioinformatics tools for proteogenomics data analysis.

Requirements:

The requirements depend on how you want to install the package (you need one of the following):

  • pip: if installation goes through pip, you will require Python3 and pip3 installed.
  • Bioconda: if installation goes through Bioconda, you will require that conda is installed and configured to use bioconda channels.
  • Docker container: to use pypgatk from its docker container you will need Docker installed.
  • Source code: to use and install from the source code directly, you will need to have git, Python3 and pip.

Installation

pip

You can install pypgatk with pip:

pip install pypgatk
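To confirm from Python that the installation worked, one option is to query the installed distribution's version with the standard library's importlib.metadata (Python 3.8+). This is a minimal sketch: it assumes only that the distribution is named pypgatk, as in the pip command above, and it does not touch any pypgatk API. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pypgatk")
    if found is None:
        print("pypgatk is not installed in this environment")
    else:
        print(f"pypgatk {found} is installed")
```

The same check works inside a virtualenv or conda environment, because it inspects whichever interpreter runs the script.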

Bioconda

You can install pypgatk with Bioconda (if you haven't already, first set up conda and configure the bioconda channel, as explained in the Bioconda documentation):

conda install pypgatk

Available as a container

pypgatk is also available preinstalled in a Docker container. Choose one of the available tags (listed for quay.io/biocontainers/pypgatk) and substitute it for <tag> in the call below:

docker pull quay.io/biocontainers/pypgatk:<tag>

NOTE: BioContainers images do not have a latest tag, so a docker pull/run without an explicit tag will fail. For example, a valid call for version 0.0.2 is:

docker run -it quay.io/biocontainers/pypgatk:0.0.2--py_0

Inside the container, you can either use the Python interactive shell or the command line version (see below).

Use latest source code

Alternatively, to get the latest version, clone this repository, change into its directory and run pip3 install .:

```
git clone https://github.com/bigbio/py-pgatk
cd py-pgatk

# optional: create and activate a virtualenv for pypgatk before installing
python3 -m venv .venv
source .venv/bin/activate

pip3 install .
```

Usage

pypgatk combines multiple modules and tools into one framework. All commands are accessible through the command-line tool pypgatk_cli.py.

The library provides multiple commands to download, translate and generate protein sequence databases from reference and mutation genome databases.
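Translation is the core step when turning nucleotide sequences into a protein database. As a generic illustration of what a three-frame translation does (reading codons starting at offsets 0, 1 and 2 of the forward strand), here is a plain-Python sketch using the standard genetic code. It is not pypgatk's threeframe-translation implementation; the function names are hypothetical, and for real work Biopython (a pypgatk dependency) provides Seq.translate.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order,
# first base varying slowest; '*' marks a stop codon.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def three_frame_translation(dna: str) -> List[str]:
    """Return the forward-strand translations starting at offsets 0, 1 and 2."""
    return [translate(dna[frame:]) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(three_frame_translation("ATGGCCTAAGG")):
        print(frame, protein)
```

For "ATGGCCTAAGG" the three frames give "MA*", "WPK" and "GLR"; trailing bases that do not form a full codon are ignored.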

```
$: pypgatk_cli -h

Usage: pypgatk [OPTIONS] COMMAND [ARGS]...

  This is the main tool that give access to all commands and options provided by the pypgatk

Options:
  --version   Show the version and exit.
  -h, --help  Show this message and exit.

Commands:
  cbioportal-downloader    Command to download the the cbioportal studies
  cbioportal-to-proteindb  Command to translate cbioportal mutation data into proteindb
  cosmic-downloader        Command to download the cosmic mutation database
  cosmic-to-proteindb      Command to translate Cosmic mutation data into proteindb
  dnaseq-to-proteindb      Generate peptides based on DNA sequences
  ensembl-check            Command to check ensembl database for stop codons, gaps
  ensembl-downloader       Command to download the ensembl information
  generate-decoy           Create decoy protein sequences using multiple methods DecoyPYrat, Reverse/Shuffled Proteins.
  generate-deeplc          Generate input for deepLC tool from idXML,mzTab or consensusXML
  msrescore-configuration  Command to generate the msrescore configuration file from idXML
  peptide-class-fdr        Command to compute the Peptide class FDR
  threeframe-translation   Command to perform 3'frame translation
  vcf-to-proteindb         Generate peptides based on DNA variants VCF files
```
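Two of these commands rest on the target-decoy approach: generate-decoy builds decoy protein sequences (the help text mentions DecoyPYrat and reversed or shuffled proteins), and peptide-class-fdr computes a false discovery rate. The sketch below shows the generic reverse and shuffle ideas plus a simple target-decoy FDR estimate (decoys divided by targets above a score threshold) in plain Python. It is not pypgatk's implementation: the DECOY_ prefix, the function names and the exact FDR formula are illustrative assumptions, and DecoyPYrat itself is more involved than a plain reversal.

```python
import random
from typing import Dict, List, Tuple

DECOY_PREFIX = "DECOY_"  # assumed accession prefix, for illustration only


def reverse_decoys(proteins: Dict[str, str]) -> Dict[str, str]:
    """Create one decoy per target protein by reversing its sequence."""
    return {DECOY_PREFIX + acc: seq[::-1] for acc, seq in proteins.items()}


def shuffle_decoys(proteins: Dict[str, str], seed: int = 42) -> Dict[str, str]:
    """Create one decoy per target by shuffling its residues (seeded, reproducible)."""
    rng = random.Random(seed)
    decoys = {}
    for acc, seq in proteins.items():
        residues = list(seq)
        rng.shuffle(residues)
        decoys[DECOY_PREFIX + acc] = "".join(residues)
    return decoys


def target_decoy_fdr(psms: List[Tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR as (#decoys / #targets) among PSMs scoring >= threshold.

    Each PSM is a (score, is_decoy) pair; higher scores are assumed better.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0
```

A class-specific FDR, as the name peptide-class-fdr suggests, would apply the same kind of estimate separately within each peptide class (for example canonical versus non-canonical peptides) rather than over all PSMs at once.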

Full Documentation

https://pgatk.readthedocs.io/en/latest/pypgatk.html

Cite as

Husen M Umer, Enrique Audain, Yafeng Zhu, Julianus Pfeuffer, Timo Sachsenberg, Janne Lehtiö, Rui M Branca, Yasset Perez-Riverol. Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides. Bioinformatics, Volume 38, Issue 5, 1 March 2022, Pages 1470–1472. https://doi.org/10.1093/bioinformatics/btab838

Owner

  • Name: BigBio Stack
  • Login: bigbio
  • Kind: organization
  • Email: proteomicsstack@gmail.com
  • Location: Cambridge, UK

Provide big data solutions Bioinformatics

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 460
  • Total Committers: 7
  • Avg Commits per committer: 65.714
  • Development Distribution Score (DDS): 0.293
Top Committers
  • ypriverol (y****l@g****m): 325 commits
  • husensofteng (h****g@g****m): 117 commits
  • Yafeng Zhu (y****u@Y****l): 10 commits
  • Husen M. Umer (h****g@u****m): 4 commits
  • enriquea (e****n@g****m): 2 commits
  • dependabot[bot] (4****]@u****m): 1 commit
  • Husen M. Umer (u****m@k****l): 1 commit
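The committer statistics above are internally consistent, which a little arithmetic confirms. This sketch assumes the common definition of the Development Distribution Score used by package-index services, DDS = 1 - (commits by the top committer / total commits); the commit counts are copied from the Top Committers entries above.

```python
# Commit counts copied from the Top Committers entries, largest first.
commit_counts = [325, 117, 10, 4, 2, 1, 1]

total = sum(commit_counts)
avg_per_committer = total / len(commit_counts)
# Assumed definition: DDS = 1 - (top committer's commits / total commits)
dds = 1 - max(commit_counts) / total

print(total)                        # 460
print(round(avg_per_committer, 3))  # 65.714
print(round(dds, 3))                # 0.293
```

A DDS near 0 means one person wrote almost everything; here roughly 71% of commits come from a single committer.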

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 44
  • Total pull requests: 40
  • Average time to close issues: 7 months
  • Average time to close pull requests: 16 days
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 2.32
  • Average comments per pull request: 0.6
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ypriverol (21)
  • husensofteng (17)
  • enriquea (2)
  • mpage21 (1)
  • GeorgesBed (1)
  • AlirezaShokrollahi (1)
  • OmerRonen (1)
Pull Request Authors
  • ypriverol (26)
  • husensofteng (9)
  • DongdongdongW (6)
  • dependabot[bot] (6)
  • enriquea (1)
Top Labels
Issue Labels
enhancement (17) bug (5) help wanted (3)
Pull Request Labels
dependencies (6) bug (1) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 104 last month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 21
  • Total maintainers: 1
pypi.org: pypgatk

Python tools for proteogenomics

  • Versions: 21
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 104 Last month
  • Docker Downloads: 0
Rankings
Docker downloads count: 1.7%
Dependent packages count: 10.1%
Forks count: 10.5%
Average: 14.4%
Stargazers count: 17.7%
Dependent repos count: 21.6%
Downloads: 25.0%
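The Average entry matches the arithmetic mean of the six individual rankings listed. This is a quick check under that assumption (the page does not state how the average is computed):

```python
# Percentile rankings copied from the list above.
rankings = {
    "Docker downloads count": 1.7,
    "Dependent packages count": 10.1,
    "Forks count": 10.5,
    "Stargazers count": 17.7,
    "Dependent repos count": 21.6,
    "Downloads": 25.0,
}

# Assumed: "Average" is the plain arithmetic mean of the six rankings.
average = sum(rankings.values()) / len(rankings)
print(f"{average:.1f}%")  # 14.4%
```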
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/pythonapp.yml actions
  • EndBug/add-and-commit v7 composite
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • conda-incubator/setup-miniconda v2.1.1 composite
  • heinrichreimer/github-changelog-generator-action v2.2 composite
.github/workflows/pythonpackage.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v1 composite
  • conda-incubator/setup-miniconda v2.1.1 composite
.github/workflows/pythonpublish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
Dockerfile docker
  • biocontainers/biocontainers debian-stretch-backports build
requirements.txt pypi
  • Click ==7.0
  • PyYAML ==5.1.2
  • biopython ==1.73
  • gffutils ==0.10.1
  • numpy *
  • pandas *
  • pybedtools *
  • pyopenms *
  • pyteomics ==4.4.2
  • ratelimit ==2.2.1
  • requests ==2.21.0
  • simplejson ==3.16.0
setup.py pypi
  • Click ==7.0
  • PyYAML ==5.1.2
  • biopython ==1.73
  • gffutils ==0.10.1
  • numpy *
  • pandas *
  • pybedtools *
  • pyopenms *
  • pyteomics ==4.4.2
  • ratelimit ==2.2.1
  • requests ==2.21.0
  • simplejson ==3.16.0