pssmgen
Generates consistent PSSM and/or PDB files for protein-protein complexes
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, zenodo.org
- ✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.8%) to scientific vocabulary
Repository
Generates consistent PSSM and/or PDB files for protein-protein complexes
Basic Info
Statistics
- Stars: 20
- Watchers: 2
- Forks: 5
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
PSSMGen
PSSMGen: Generates Consistent PSSM and/or PDB Files for Protein-Protein Complexes
Install
- Make sure BLAST is installed and its database is available on your machine. Otherwise, install BLAST and download its databases by following the BLAST guide. To calculate PSSMs, the recommended database is the non-redundant protein sequences database `nr` (i.e. the `nr.*.tar.gz` files from the FTP site).
- Install PSSMGen with `pip install PSSMGen`.
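Before running anything, it can help to confirm that a `psiblast` executable is reachable. The snippet below is a minimal sketch that only uses the Python standard library; the helper name `find_psiblast` is hypothetical and not part of PSSMGen.

```python
import shutil


def find_psiblast(name="psiblast"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_psiblast()
    if path is None:
        # BLAST may still be installed elsewhere; in that case use its full path.
        print("psiblast was not found on PATH")
    else:
        print(f"psiblast found at {path}")
```

If the executable is installed outside `PATH`, its full path can be given to PSSMGen directly, as in the `configure` call shown in the examples.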
Requirements for file structures and names
PSSMGen is geared toward computing the pssm files for all models of a particular protein-protein complex.
File structures
This tool assumes your files have the following structure:
workdir
|_ pdb
|_ fasta
|_ pssm_raw
|_ pssm
|_ pdb_nonmatch
- `workdir` is your working directory for one specific protein-protein complex.
- The `pdb` folder contains the PDB files (consistent PDB files).
- The `fasta` folder contains the protein sequence FASTA files. The code can generate the FASTA files by extracting sequences from the PDB file, or you can manually create this folder and put customised FASTA files there.
- The `pssm_raw` folder stores the PSSM files. The code can automatically generate them, or you can manually create this folder and put customised PSSM files there.
- The `pssm` folder stores consistent PSSM files, whose sequences are aligned with those of the PDB files. This folder and its files are created automatically.
- The `pdb_nonmatch` folder stores the inconsistent PDB files, while the related consistent PDB files are in the `pdb` folder. This folder and its files are created automatically.
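To see which of these folders already exist for a given complex, a small check like the one below can be used. It is a sketch based only on the folder names listed above; `check_workdir` and `EXPECTED_DIRS` are hypothetical names, not part of PSSMGen.

```python
import tempfile
from pathlib import Path

# Folder names taken from the layout described above.
EXPECTED_DIRS = ("pdb", "fasta", "pssm_raw", "pssm", "pdb_nonmatch")


def check_workdir(workdir):
    """Map each expected sub-folder name to True if it exists under workdir."""
    root = Path(workdir)
    return {name: (root / name).is_dir() for name in EXPECTED_DIRS}


if __name__ == "__main__":
    # Demonstrate on a throwaway directory that only contains 'pdb'.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pdb").mkdir()
        for name, present in check_workdir(tmp).items():
            print(f"{name:<13}{'found' if present else 'missing'}")
```

Missing folders are not necessarily an error, since several of them are created automatically.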
File names
The code assumes you follow the naming rules for different file types:
- PDB files: `caseID_*.chainID.pdb`
- FASTA files: `caseID.chainID.fasta`
- PSSM files: `caseID.chainID.pssm`, `caseID_*.chainID.pdb.pssm`
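As an illustration of the `caseID.chainID.fasta` and `caseID.chainID.pssm` conventions, the sketch below splits such a name into its parts. The function `parse_name` and its regular expression are assumptions made for this example; PSSMGen may validate names differently.

```python
import re

# Matches names such as '7CEI.A.fasta' or '7CEI.B.pssm' (assumed pattern).
NAME_PATTERN = re.compile(r"^(?P<case>[^.]+)\.(?P<chain>[^.]+)\.(?P<ext>fasta|pssm)$")


def parse_name(filename):
    """Return (caseID, chainID, extension), or None if the name does not match."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    return match.group("case"), match.group("chain"), match.group("ext")


if __name__ == "__main__":
    for name in ("7CEI.A.fasta", "7CEI.B.pssm", "7CEI_1w.pdb"):
        print(name, "->", parse_name(name))
```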
Examples
Here are some examples for the complex 7CEI.
The file structure and input files should look like
7CEI
├── pdb
│   ├── 7CEI_1w.pdb
│   ├── 7CEI_2w.pdb
│   └── 7CEI_3w.pdb
└── fasta
    ├── 7CEI.A.fasta
    └── 7CEI.B.fasta
Calculate PSSM with given FASTA files
```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM(work_dir='7CEI')

# set the psiblast executable, database and other psiblast parameters
# (the values shown here are the defaults)
gen.configure(blast_exe='/home/software/blast/bin/psiblast',
              database='/data/DBs/blastdbs/nrv20180204/nr',
              num_threads=4, evalue=0.0001, comp_based_stats='T',
              max_target_seqs=2000, num_iterations=3, outfmt=7,
              save_each_pssm=True, save_pssm_after_last_round=True)

# generate raw PSSM files by running BLAST with the FASTA files
gen.get_pssm(fasta_dir='fasta', out_dir='pssm_raw', run=True,
             save_all_psiblast_output=True)
```
The code will automatically create pssm_raw folder to store the generated PSSM files.
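The parameter names in the example mirror `psiblast` command-line options. The sketch below shows how such settings could translate into a `psiblast` argument list. It illustrates the command-line options only and is not PSSMGen's internal implementation; the helper `build_psiblast_cmd` and the file names in the demo are hypothetical.

```python
def build_psiblast_cmd(query, out_pssm, blast_exe="psiblast", database="nr",
                       num_threads=4, evalue=0.0001, comp_based_stats="T",
                       max_target_seqs=2000, num_iterations=3, outfmt=7):
    """Build (but do not run) a psiblast argument list for one FASTA file."""
    return [
        blast_exe,
        "-query", query,
        "-db", database,
        "-num_threads", str(num_threads),
        "-evalue", str(evalue),
        "-comp_based_stats", comp_based_stats,
        "-max_target_seqs", str(max_target_seqs),
        "-num_iterations", str(num_iterations),
        "-outfmt", str(outfmt),
        "-save_each_pssm",              # keep the PSSM of every iteration
        "-save_pssm_after_last_round",  # also keep the final PSSM
        "-out_ascii_pssm", out_pssm,    # write the PSSM as plain text
    ]


if __name__ == "__main__":
    cmd = build_psiblast_cmd("fasta/7CEI.A.fasta", "pssm_raw/7CEI.A.pssm")
    print(" ".join(cmd))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting.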
Map PSSM files to PDB files to get consistent PSSM and PDB files
After getting the raw PSSMs from the previous example, you can map them to the PDB files to get consistent PSSM and PDB files as follows:
```python
# map PSSM and PDB to get consistent/mapped PSSM files
gen.map_pssm(pssm_dir='pssm_raw', pdb_dir='pdb', out_dir='pssm', chain=('A','B'))

# write consistent/mapped PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb(pdbpssm_dir='pssm', pdb_dir='pdb', pdbnonmatch_dir='pdb_nonmatch')
```
The code will automatically create pssm and pdb_nonmatch folders and related files.
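Conceptually, mapping means pairing each residue in the PSSM sequence with the matching residue in the PDB sequence. The sketch below illustrates that idea with the standard library's `difflib`; it is a simplified stand-in, not the alignment method PSSMGen uses, and `map_positions` is a hypothetical name.

```python
from difflib import SequenceMatcher


def map_positions(pssm_seq, pdb_seq):
    """Return (pssm_index, pdb_index) pairs for residues in matching blocks."""
    matcher = SequenceMatcher(None, pssm_seq, pdb_seq, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical residues in both sequences.
        for offset in range(block.size):
            pairs.append((block.a + offset, block.b + offset))
    return pairs


if __name__ == "__main__":
    # The PDB sequence lacks the first residue of the PSSM sequence.
    for pssm_idx, pdb_idx in map_positions("MKTAYIAK", "KTAYIAK"):
        print(pssm_idx, pdb_idx)
```

Residues of the PSSM sequence that have no partner in the PDB sequence simply do not appear in the returned pairs.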
Extract FASTA files from PDB file
If the FASTA files are not provided, you can also generate them from the PDB file.
The file structure and input files should look like
7CEI
└── pdb
    ├── 7CEI_1w.pdb
    ├── 7CEI_2w.pdb
    └── 7CEI_3w.pdb
```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# extract FASTA files from the reference PDB file;
# if pdb_ref is not set, the code will randomly select one PDB file as reference
gen.get_fasta(pdb_dir='pdb', pdb_ref='7CEI_1w.pdb', chain=('A','B'), out_dir='fasta')
```
The code will automatically create `fasta` and `pssm_raw` folders for FASTA files and raw PSSM files, respectively.
Use existing PSSM files to get consistent PSSM and PDB files
You can provide raw PSSM files instead of calculating them.
The file structure and input files should look like
7CEI
├── pdb
│   ├── 7CEI_1w.pdb
│   ├── 7CEI_2w.pdb
│   └── 7CEI_3w.pdb
└── pssm_raw
    ├── 7CEI.A.pssm
    └── 7CEI.B.pssm
```python
from pssmgen import PSSM

# initiate the PSSM object
gen = PSSM('7CEI')

# map PSSM and PDB to get consistent files
gen.map_pssm()

# write consistent PDB files and move inconsistent ones to another folder for backup
gen.get_mapped_pdb()
```
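When supplying your own raw PSSM files, it can be useful to check that they parse as expected. The sketch below reads rows from a PSSM text file, assuming the common ASCII layout written by `psiblast -out_ascii_pssm` (position, residue, then 20 whitespace-separated scores). The function `parse_ascii_pssm` is hypothetical, and files with a different layout will need a different parser.

```python
def parse_ascii_pssm(lines):
    """Return a list of (position, residue, scores) from ASCII PSSM lines.

    Assumes each data row starts with an integer position and a residue
    letter, followed by at least 20 whitespace-separated integer scores.
    Header and footer lines are skipped.
    """
    rows = []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 22 and tokens[0].isdigit():
            scores = [int(tok) for tok in tokens[2:22]]
            rows.append((int(tokens[0]), tokens[1], scores))
    return rows


if __name__ == "__main__":
    header = "           A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V"
    row = "    1 M   " + " ".join(["-1"] * 20) + " " + " ".join(["0"] * 20) + " 0.10 0.05"
    for position, residue, scores in parse_ascii_pssm([header, row]):
        print(position, residue, len(scores))
```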
Owner
- Name: DeepRank
- Login: DeepRank
- Kind: organization
- Repositories: 19
- Profile: https://github.com/DeepRank
Citation (CITATION.cff)
# YAML 1.2
---
abstract: "Generates consistent PSSM and/or PDB files for protein-protein complexes"
authors:
  -
    affiliation: "Netherlands eScience Center"
    family-names: Renaud
    given-names: Nicolas
    orcid: "https://orcid.org/0000-0001-9589-2694"
  -
    affiliation: "Netherlands eScience Center"
    family-names: Geng
    given-names: Cunliang
    orcid: "https://orcid.org/0000-0002-1409-8358"
cff-version: "1.1.0"
keywords:
  - pssm
  - pdb
  - "protein-protein complex"
  - docking
  - bioinformatics
  - CAPRI
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/DeepRank/PSSMGen"
title: PSSMGen
...
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 79
- Total Committers: 5
- Avg Commits per committer: 15.8
- Development Distribution Score (DDS): 0.253
Top Committers
| Name | Email | Commits |
|---|---|---|
| Cunliang Geng | c****g@e****l | 59 |
| Nicolas Renaud | n****d@t****l | 15 |
| Jurriaan H. Spaaks | j****s@e****l | 2 |
| CunliangGeng | c****g@u****l | 2 |
| dependabot-preview[bot] | 2****]@u****m | 1 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 14
- Total pull requests: 9
- Average time to close issues: 4 months
- Average time to close pull requests: about 2 months
- Total issue authors: 6
- Total pull request authors: 4
- Average comments per issue: 1.93
- Average comments per pull request: 0.67
- Merged pull requests: 6
- Bot issues: 1
- Bot pull requests: 7
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- MohamedKhaled20-eng (4)
- FarzanehParizi (3)
- LilySnow (3)
- CunliangGeng (2)
- dependabot-preview[bot] (1)
- SimonKitSangChu (1)
Pull Request Authors
- github-actions[bot] (6)
- CunliangGeng (1)
- jspaaks (1)
- dependabot-preview[bot] (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 48 last-month
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 4
- Total maintainers: 1
pypi.org: pssmgen
Generates consistent PSSM and/or PDB files for protein-protein complexes
- Homepage: https://github.com/DeepRank/PSSMGen
- Documentation: https://pssmgen.readthedocs.io/
- License: Apache Software License 2.0
- Latest release: 0.1.2 (published about 5 years ago)
Dependencies
- biopython *
- numpy *
- pdb2sql *
- scipy *