pssmgen

Generates consistent PSSM and/or PDB files for protein-protein complexes

https://github.com/deeprank/pssmgen

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, zenodo.org
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DeepRank
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 737 KB
Statistics
  • Stars: 20
  • Watchers: 2
  • Forks: 5
  • Open Issues: 0
  • Releases: 5
Created over 7 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation Zenodo

README.md

PSSMGen

| Fair-software.nl Recommendations | Badges |
|:-|:-:|
| 1. Code Repository | GitHub URL |
|   | GitHub |
| 2. License | License |
| 3. Community Registry | Research Software Directory |
|   | PyPI |
| 4. Enable Citation | DOI |
| 5. Code Quality Checklist | CII best practices |
| Code Analysis | Codacy Badge |


PSSMGen: Generates Consistent PSSM and/or PDB Files for Protein-Protein Complexes

Install

  1. Make sure BLAST is installed and its database is available on your machine. Otherwise, install BLAST and download its databases by following the BLAST guide. To calculate PSSMs, the recommended database is the non-redundant protein sequence database nr (i.e. the nr.*.tar.gz files from the FTP site).
  2. Install PSSMGen with pip install PSSMGen.
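
Before running anything, it can help to confirm that a psiblast executable is reachable. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_psiblast` is hypothetical and is not part of PSSMGen.

```python
import shutil

def find_psiblast(executable="psiblast"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)

path = find_psiblast()
if path is None:
    print("psiblast not found on PATH; install BLAST first")
else:
    print(f"psiblast found at {path}")
```

If psiblast lives outside PATH, its full path can be passed to PSSMGen directly, as in the configuration example further down.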

Requirements for file structures and names

PSSMGen is geared toward computing the PSSM files for all models of a particular protein-protein complex.

File structures

This tool assumes your files have the following structure:

    workdir
        |_ pdb
        |_ fasta
        |_ pssm_raw
        |_ pssm
        |_ pdb_nonmatch

  • workdir is your working directory for one specific protein-protein complex.
  • pdb folder contains the PDB files (consistent PDB files)
  • fasta folder contains the protein sequence FASTA files. The code can generate the FASTA files by extracting sequences from the PDB file, or you can manually create this folder and put customised FASTA files there.
  • pssm_raw folder stores the PSSM files. The code can automatically generate them, or you can manually create this folder and put customised PSSM files there.
  • pssm folder stores consistent PSSM files, whose sequences are aligned with those of PDB files. This folder and its files are created automatically.
  • pdb_nonmatch folder stores the inconsistent PDB files, while the related consistent PDB files are in the pdb folder. This folder and its files are created automatically.
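
As a quick sanity check of this layout, the following sketch reports which of the five folders exist under a working directory. It only uses the standard library; `describe_workdir` and `EXPECTED_FOLDERS` are hypothetical names introduced for illustration, not part of PSSMGen.

```python
import tempfile
from pathlib import Path

# Folder names taken from the layout described above.
EXPECTED_FOLDERS = ("pdb", "fasta", "pssm_raw", "pssm", "pdb_nonmatch")

def describe_workdir(workdir):
    """Map each expected folder name to True if it exists under workdir."""
    root = Path(workdir)
    return {name: (root / name).is_dir() for name in EXPECTED_FOLDERS}

# Demo on a throwaway directory that holds only the two input folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("pdb", "fasta"):
        (Path(tmp) / name).mkdir()
    status = describe_workdir(tmp)
    for name, present in status.items():
        print(f"{name:13s} {'present' if present else 'missing'}")
```

Only pdb (and optionally fasta or pssm_raw) needs to exist up front; the remaining folders are described above as being created by the tool.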

File names

The code assumes you follow these naming rules for the different file types:
  • PDB files: caseID_*.chainID.pdb
  • FASTA files: caseID.chainID.fasta
  • PSSM files: caseID.chainID.pssm, caseID_*.chainID.pdb.pssm
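
To make the caseID.chainID.extension pattern concrete, here is a small sketch that splits raw FASTA and PSSM file names into their parts with a regular expression. The helper `parse_raw_name` is hypothetical and only covers the two-dot form (for example 7CEI.A.fasta); it is not taken from PSSMGen.

```python
import re

# caseID.chainID.fasta or caseID.chainID.pssm; neither ID may contain a dot.
RAW_NAME = re.compile(r"^(?P<case>[^.]+)\.(?P<chain>[^.]+)\.(?P<ext>fasta|pssm)$")

def parse_raw_name(filename):
    """Return (caseID, chainID, extension) or raise ValueError on a mismatch."""
    match = RAW_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return match.group("case"), match.group("chain"), match.group("ext")

print(parse_raw_name("7CEI.A.fasta"))
print(parse_raw_name("7CEI.B.pssm"))
```

Names with an extra component, such as the mapped caseID_*.chainID.pdb.pssm files, are deliberately rejected by this pattern.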

Examples

Here are some examples for the complex 7CEI. The file structure and input files should look like:

    7CEI
    ├── pdb
    │   ├── 7CEI_1w.pdb
    │   ├── 7CEI_2w.pdb
    │   └── 7CEI_3w.pdb
    └── fasta
        ├── 7CEI.A.fasta
        └── 7CEI.B.fasta

Calculate PSSM with given FASTA files

```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM(work_dir='7CEI')

# set psiblast executable, database and other psiblast parameters (the values shown are the defaults)
gen.configure(blast_exe='/home/software/blast/bin/psiblast',
              database='/data/DBs/blast_dbs/nr_v20180204/nr',
              num_threads=4, evalue=0.0001, comp_based_stats='T',
              max_target_seqs=2000, num_iterations=3, outfmt=7,
              save_each_pssm=True, save_pssm_after_last_round=True)

# generate raw PSSM files by running BLAST with the FASTA files
gen.get_pssm(fasta_dir='fasta', out_dir='pssm_raw', run=True, save_all_psiblast_output=True)
```

The code will automatically create the pssm_raw folder to store the generated PSSM files.
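
The configuration parameters correspond to psiblast command-line options of the same names. As a rough illustration, the sketch below assembles such a command as an argument list without running it. The function `build_psiblast_command` and the output file naming are assumptions made for this example; PSSMGen may construct its call differently.

```python
def build_psiblast_command(query, out_prefix, blast_exe="psiblast", database="nr",
                           num_threads=4, evalue=0.0001, comp_based_stats="T",
                           max_target_seqs=2000, num_iterations=3, outfmt=7):
    """Assemble a psiblast argument list (hypothetical helper, nothing is executed)."""
    return [
        blast_exe,
        "-query", query,
        "-db", database,
        "-num_threads", str(num_threads),
        "-evalue", str(evalue),
        "-comp_based_stats", comp_based_stats,
        "-max_target_seqs", str(max_target_seqs),
        "-num_iterations", str(num_iterations),
        "-outfmt", str(outfmt),
        "-save_each_pssm",                 # flag: keep the PSSM of every iteration
        "-save_pssm_after_last_round",     # flag: keep the PSSM of the final round
        "-out_ascii_pssm", f"{out_prefix}.pssm",
        "-out", f"{out_prefix}.out",
    ]

cmd = build_psiblast_command("fasta/7CEI.A.fasta", "pssm_raw/7CEI.A")
print(" ".join(cmd))
```

A list like this could be handed to `subprocess.run` once the executable and database paths point at a real BLAST installation.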

Map PSSM files to PDB files to get consistent PSSM and PDB files

After getting the raw PSSMs from the previous example, you can map them to the PDB files to get consistent PSSM and PDB files as follows:

```python
# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm(pssm_dir='pssm_raw', pdb_dir='pdb', out_dir='pssm', chain=('A','B'))

# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb(pdbpssm_dir='pssm', pdb_dir='pdb', pdbnonmatch_dir='pdb_nonmatch')
```

The code will automatically create the pssm and pdb_nonmatch folders and related files.
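
To give a feel for what "consistent" means here, the sketch below pairs residue positions of a PSSM sequence with those of a PDB chain sequence. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real sequence alignment; this is not the method PSSMGen uses, and the two sequences are made-up toy data.

```python
from difflib import SequenceMatcher

def map_positions(pssm_seq, pdb_seq):
    """Return (pssm_index, pdb_index) pairs for residues matched in both sequences."""
    matcher = SequenceMatcher(a=pssm_seq, b=pdb_seq, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs

# Toy data: the PDB chain lacks the first two and the last residue of the full sequence.
pssm_seq = "MKTAYIAKQR"
pdb_seq = "TAYIAKQ"
pairs = map_positions(pssm_seq, pdb_seq)
print(pairs[0], pairs[-1], len(pairs))
```

Rows of the raw PSSM whose positions have no partner in the PDB sequence would be the ones dropped or flagged when the two are made consistent.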

Extract FASTA files from PDB file

If the FASTA files are not provided, you can also generate them from the PDB file.

The file structure and input files should look like:

    7CEI
    └── pdb
        ├── 7CEI_1w.pdb
        ├── 7CEI_2w.pdb
        └── 7CEI_3w.pdb

```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# extract FASTA files from the reference pdb file.
# if pdb_ref is not set, the code will randomly select one pdb as reference.
gen.get_fasta(pdb_dir='pdb', pdb_ref='7CEI_1w.pdb', chain=('A','B'), out_dir='fasta')
```

The code will automatically create the fasta and pssm_raw folders for FASTA files and raw PSSM files, respectively.
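
For readers curious about what extracting a sequence from a PDB file involves, here is a minimal standard-library sketch that reads the CA atoms of one chain from fixed-column ATOM records. The helpers `chain_sequence` and `ca_line` are hypothetical, the demo records carry no coordinates, and PSSMGen's own extraction may work differently.

```python
# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def chain_sequence(pdb_lines, chain_id):
    """Build the one-letter sequence of one chain from the CA atoms of ATOM records."""
    residues = []
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 26:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain_id:
            continue
        res_key = line[22:27]  # residue number plus insertion code
        if res_key in seen:    # skip alternate locations of the same residue
            continue
        seen.add(res_key)
        residues.append(THREE_TO_ONE.get(line[17:20], "X"))  # X marks unknown residues
    return "".join(residues)

def ca_line(serial, res_name, chain_id, res_seq):
    """Format a minimal CA-only ATOM record (coordinates omitted for brevity)."""
    return f"ATOM  {serial:5d}  CA  {res_name} {chain_id}{res_seq:4d}"

demo = [
    ca_line(1, "MET", "A", 1),
    ca_line(2, "LYS", "A", 2),
    ca_line(3, "THR", "A", 3),
    ca_line(4, "GLY", "B", 1),
]
print(chain_sequence(demo, "A"))  # MKT
print(chain_sequence(demo, "B"))  # G
```

With a real file, the lines would come from `open(path)` and the resulting string could be written out as a caseID.chainID.fasta file.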

Use existing PSSM files to get consistent PSSM and PDB files

You can provide raw PSSM files instead of calculating them.

The file structure and input files should look like:

    7CEI
    ├── pdb
    │   ├── 7CEI_1w.pdb
    │   ├── 7CEI_2w.pdb
    │   └── 7CEI_3w.pdb
    └── pssm_raw
        ├── 7CEI.A.pssm
        └── 7CEI.B.pssm

```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# map PSSM and PDB to get consistent files
gen.map_pssm()

# write consistent PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb()
```

Owner

  • Name: DeepRank
  • Login: DeepRank
  • Kind: organization

Citation (CITATION.cff)

# YAML 1.2
---
abstract: "Generates consistent PSSM and/or PDB files for protein-protein complexes"

authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Renaud
    given-names: Nicolas
    orcid: "https://orcid.org/0000-0001-9589-2694"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Geng
    given-names: Cunliang
    orcid: "https://orcid.org/0000-0002-1409-8358"
cff-version: "1.1.0"

keywords:
  - pssm
  - pdb
  - "protein-protein complex"
  - docking
  - bioinformatics
  - CAPRI
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/DeepRank/PSSMGen"
title: PSSMGen
...

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 79
  • Total Committers: 5
  • Avg Commits per committer: 15.8
  • Development Distribution Score (DDS): 0.253
Top Committers
| Name | Email | Commits |
|:-|:-|-:|
| Cunliang Geng | c****g@e****l | 59 |
| Nicolas Renaud | n****d@t****l | 15 |
| Jurriaan H. Spaaks | j****s@e****l | 2 |
| CunliangGeng | c****g@u****l | 2 |
| dependabot-preview[bot] | 2****]@u****m | 1 |

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 14
  • Total pull requests: 9
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 6
  • Total pull request authors: 4
  • Average comments per issue: 1.93
  • Average comments per pull request: 0.67
  • Merged pull requests: 6
  • Bot issues: 1
  • Bot pull requests: 7
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • MohamedKhaled20-eng (4)
  • FarzanehParizi (3)
  • LilySnow (3)
  • CunliangGeng (2)
  • dependabot-preview[bot] (1)
  • SimonKitSangChu (1)
Pull Request Authors
  • github-actions[bot] (6)
  • CunliangGeng (1)
  • jspaaks (1)
  • dependabot-preview[bot] (1)
Top Labels
Issue Labels
stale (3) enhancement (1)
Pull Request Labels
autorelease: pending (3) autorelease: tagged (3) stale (1) dependencies (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 48 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 4
  • Total maintainers: 1
pypi.org: pssmgen

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 48 Last month
Rankings
Dependent packages count: 10.1%
Dependent repos count: 11.6%
Forks count: 14.3%
Average: 14.9%
Stargazers count: 15.3%
Downloads: 23.5%
Maintainers (1)
Last synced: 7 months ago

Dependencies

setup.py pypi
  • biopython *
  • numpy *
  • pdb2sql *
  • scipy *