protein-detective
Python package to detect proteins in EM density maps.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: haddocking
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://www.bonvinlab.org/protein-detective/
- Size: 25.3 MB
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 10
- Releases: 4
Metadata Files
README.md
protein-detective
Python package to detect proteins in EM density maps.
It uses
- the UniProt SPARQL endpoint to search for proteins and their measured or predicted 3D structures.
- powerfit to fit a protein structure into an Electron Microscopy (EM) density map.
An example workflow:
mermaid
graph LR;
search{Search UniprotKB} --> |uniprot_accessions|fetchpdbe{Retrieve PDBe}
search{Search UniprotKB} --> |uniprot_accessions|fetchad{Retrieve AlphaFold}
fetchpdbe -->|mmcif_files| residuefilter{Filter on nr residues + write chain A}
fetchad -->|pdb_files| densityfilter{Filter out low confidence}
residuefilter -->|pdb_files| powerfit
densityfilter -->|pdb_files| powerfit
powerfit -->|*/solutions.out| solutions{Best scoring solutions}
solutions -->|dataframe| fitmodels{Fit models}
Install
shell
pip install protein-detective
Or to use the latest development version:
pip install git+https://github.com/haddocking/protein-detective.git
Usage
The main entry point is the protein-detective command line tool which has multiple subcommands to perform actions.
To use it programmatically, see the notebooks and API documentation.
Search Uniprot for structures
shell
protein-detective search \
--taxon-id 9606 \
--reviewed \
--subcellular-location-uniprot nucleus \
--subcellular-location-go GO:0005634 \
--molecular-function-go GO:0003677 \
--limit 100 \
./mysession
(GO:0005634 is "Nucleus" and GO:0003677 is "DNA binding")
In the ./mysession directory, you will find a session.db file, which is a DuckDB database with the search results.
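To give a feel for what a search against the UniProt SPARQL endpoint can look like, here is a minimal sketch that only builds a query string for the taxon, reviewed and GO filters used above. The function name `build_search_query` and the exact query shape are assumptions made for illustration; the query that protein-detective actually sends may differ, and the sketch matches GO terms exactly rather than including their descendants.

```python
def build_search_query(taxon_id: int, go_terms: list[str], limit: int,
                       reviewed: bool = True) -> str:
    """Build a SPARQL query for UniProt entries (illustrative sketch only)."""
    # GO identifiers such as "GO:0003677" map to OBO PURLs ending in GO_0003677.
    go_patterns = "\n".join(
        "  ?protein up:classifiedWith "
        f"<http://purl.obolibrary.org/obo/{term.replace(':', '_')}> ."
        for term in go_terms
    )
    # Restrict to reviewed (Swiss-Prot) entries when requested.
    reviewed_pattern = "  ?protein up:reviewed true .\n" if reviewed else ""
    return (
        "PREFIX up: <http://purl.uniprot.org/core/>\n"
        "PREFIX taxon: <http://purl.uniprot.org/taxonomy/>\n"
        "SELECT DISTINCT ?protein\n"
        "WHERE {\n"
        "  ?protein a up:Protein .\n"
        f"  ?protein up:organism taxon:{taxon_id} .\n"
        f"{reviewed_pattern}"
        f"{go_patterns}\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_search_query(9606, ["GO:0005634", "GO:0003677"], 100))
```

The printed query can be pasted into the UniProt SPARQL web interface to try it out.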
To retrieve a bunch of structures
shell
protein-detective retrieve ./mysession
In the ./mysession directory, you will find mmCIF files from PDBe and PDB files from AlphaFold DB.
To filter AlphaFold structures on confidence
Filter AlphaFold DB structures based on confidence. An entry is kept when the number of its residues with a confidence score above the threshold falls within the requested range. For each kept entry, a PDB file containing only those residues is also written.
shell
protein-detective confidence-filter \
--confidence-threshold 50 \
--min-residues 100 \
--max-residues 1000 \
./mysession
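The idea behind this filter can be sketched in a few lines of plain Python. AlphaFold DB models store the per-residue confidence (pLDDT) in the B-factor column of the PDB file, so the sketch reads that column, keeps atoms of residues above the threshold, and rejects the entry when the number of kept residues is outside the requested range. The function name `filter_by_confidence` is hypothetical, and this is not the implementation used by protein-detective.

```python
from typing import Optional


def filter_by_confidence(pdb_text: str, threshold: float,
                         min_residues: int, max_residues: int) -> Optional[str]:
    """Return PDB text with only confident residues, or None if rejected."""
    kept_lines = []
    kept_residues = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed-width PDB columns: chain (22), residue number (23-26),
        # insertion code (27) and B-factor (61-66). In AlphaFold DB
        # models the B-factor column holds the pLDDT confidence.
        plddt = float(line[60:66])
        if plddt > threshold:
            kept_lines.append(line)
            kept_residues.add((line[21], line[22:27]))
    # Reject the entry when too few or too many residues pass the threshold.
    if not min_residues <= len(kept_residues) <= max_residues:
        return None
    return "\n".join(kept_lines + ["END"]) + "\n"
```

With the options from the command above, this would be called as `filter_by_confidence(text, 50, 100, 1000)`.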
To prune PDBe files
Make PDBe files smaller by keeping only the first chain of the found UniProt entry and renaming it to chain A.
shell
protein-detective prune-pdbs \
--min-residues 100 \
--max-residues 1000 \
./mysession
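As a rough illustration of the pruning step, the sketch below keeps the atoms of one chain and relabels that chain as A. For brevity it operates on PDB-format text, whereas the PDBe downloads are mmCIF files that would need an mmCIF-aware parser; the function name `keep_chain_as_a` is hypothetical and the real command may work differently.

```python
def keep_chain_as_a(pdb_text: str, chain: str) -> str:
    """Keep ATOM records of `chain` only and relabel that chain as 'A'."""
    out = []
    for line in pdb_text.splitlines():
        # Column 22 (index 21) of a PDB ATOM record holds the chain identifier.
        if line.startswith("ATOM") and line[21] == chain:
            out.append(line[:21] + "A" + line[22:])
    return "\n".join(out + ["END"]) + "\n"
```

A residue-count check like the one behind `--min-residues` and `--max-residues` could then be applied to the returned text before writing it to disk.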
Powerfit
Rotate and translate the prepared structures to fit and score them into the EM density map using powerfit.
shell
protein-detective powerfit run ../powerfit-tutorial/ribosome-KsgA.map 13 docs/session1
This will use dask-distributed to run powerfit for each structure in parallel on multiple CPU cores or GPUs.
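The fan-out of one job per structure can be pictured with the standard library alone. The sketch below is a stand-in that uses a thread pool and subprocesses, not dask-distributed, and the helper name `run_all` is hypothetical; it only shows the pattern of running independent command lines with a cap on how many run at once.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands: list[str], jobs: int) -> list[int]:
    """Run each command line as a subprocess, at most `jobs` at a time."""
    def run_one(command: str) -> int:
        # shlex.split avoids invoking a shell; each command is one independent job.
        return subprocess.run(shlex.split(command), check=False).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() preserves input order, so exit codes line up with commands.
        return list(pool.map(run_one, commands))
```

Threads are sufficient here because each worker only waits for its subprocess to finish; the heavy computation happens in the child processes.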
Run powerfits on Slurm
You can use [dask-jobqueue](https://jobqueue.dask.org/en/latest/) to run the powerfits on a Slurm deployment on multiple machines on a shared filesystem.
In one terminal, start the Dask cluster with
```shell
pip install dask-jobqueue
python3
```
```python
from dask_jobqueue import SLURMCluster
cluster = SLURMCluster(cores=8, processes=4, memory="16GB", queue="normal")
print(cluster.scheduler_address)
# Prints something like: 'tcp://192.168.1.1:34059'
# Keep this Python process running until powerfits are done
```
In a second terminal, run the powerfits on the Dask cluster with
```shell
protein-detective powerfit run ../powerfit-tutorial/ribosome-KsgA.map 13 docs/session1 --scheduler-address tcp://192.168.1.1:34059
```
How to run efficiently
Powerfit is quickest on GPU, but can also run on CPU. To run powerfits on a GPU you can use the `--gpu` …
Alternatively run powerfit yourself
You can use `protein-detective powerfit commands` to print the commands. The commands can then be run in whatever way you prefer, for example sequentially, with [GNU parallel](https://www.gnu.org/software/parallel/), or as a [Slurm array job](https://slurm.schedmd.com/job_array.html).
For example, to run with parallel and 4 slots:
```shell
protein-detective powerfit commands ../powerfit-tutorial/ribosome-KsgA.map 13 docs/session1 > commands.txt
parallel --jobs 4 < commands.txt
```
To print the top 10 solutions to the terminal, you can use:
shell
protein-detective powerfit report docs/session1
Outputs something like:
powerfit_run_id,structure,rank,cc,fishz,relz,translation,rotation,pdb_id,pdb_file,uniprot_acc
10,A8MT69_pdb4e45.ent_B2A,1,0.432,0.463,10.091,227.18:242.53:211.83,0.0:1.0:1.0:0.0:0.0:1.0:1.0:0.0:0.0,4E45,docs/session1/single_chain/A8MT69_pdb4e45.ent_B2A.pdb,A8MT69
10,A8MT69_pdb4ne5.ent_B2A,1,0.423,0.452,10.053,227.18:242.53:214.9,0.0:-0.0:-0.0:-0.604:0.797:0.0:0.797:0.604:0.0,4NE5,docs/session1/single_chain/A8MT69_pdb4ne5.ent_B2A.pdb,A8MT69
...
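If you want to post-process this output yourself, the CSV can be read with the standard library. The sketch below converts the numeric columns, splits the colon-separated translation and rotation fields into tuples of floats, and sorts by the cc column. The function name `parse_report` is hypothetical, and no assumption is made here about how the nine rotation values are laid out as a matrix.

```python
import csv
import io


def parse_report(csv_text: str) -> list[dict]:
    """Parse report CSV text into dicts with numeric scores."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "structure": row["structure"],
            "rank": int(row["rank"]),
            "cc": float(row["cc"]),
            # Colon-separated fields: 3 translation values, 9 rotation values.
            "translation": tuple(float(v) for v in row["translation"].split(":")),
            "rotation": tuple(float(v) for v in row["rotation"].split(":")),
            "uniprot_acc": row["uniprot_acc"],
        })
    # Highest cc score first.
    return sorted(rows, key=lambda r: r["cc"], reverse=True)
```

For example, redirect the report to a file and pass its contents to `parse_report` to pick the best-scoring structure per UniProt accession.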
To generate model PDB files rotated/translated to PowerFit solutions, you can use:
shell
protein-detective powerfit fit-models docs/session1
Contributing
For development information and contribution guidelines, please see CONTRIBUTING.md.
Owner
- Name: HADDOCK
- Login: haddocking
- Kind: organization
- Location: Utrecht, The Netherlands
- Website: http://bonvinlab.org
- Repositories: 55
- Profile: https://github.com/haddocking
Computational Structural Biology Group @ Utrecht University
Citation (CITATION.cff)
# YAML 1.2
---
cff-version: 1.2.0
title: protein detective
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - affiliation: "Netherlands eScience Center"
    family-names: Verhoeven
    given-names: Stefan
    orcid: "https://orcid.org/0000-0002-5821-2060"
  - given-names: Anna
    family-names: Engel
    affiliation: '@UtrechtUniversity'
    orcid: "https://orcid.org/0009-0002-1806-7951"
  - given-names: Alexandre
    family-names: Bonvin
    affiliation: '@UtrechtUniversity'
    orcid: "https://orcid.org/0000-0001-7369-1322"
repository-code: https://github.com/haddocking/protein-detective
identifiers:
  - description: Latest version of software
    type: doi
    value: 10.5281/zenodo.15632658
GitHub Events
Total
- Create event: 17
- Release event: 1
- Issues event: 23
- Delete event: 15
- Member event: 2
- Issue comment event: 20
- Push event: 74
- Pull request review comment event: 3
- Pull request review event: 3
- Pull request event: 29
Last Year
- Create event: 17
- Release event: 1
- Issues event: 23
- Delete event: 15
- Member event: 2
- Issue comment event: 20
- Push event: 74
- Pull request review comment event: 3
- Pull request review event: 3
- Pull request event: 29
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 19
- Total pull requests: 23
- Average time to close issues: 8 days
- Average time to close pull requests: 6 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 1.16
- Average comments per pull request: 0.09
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 19
- Pull requests: 23
- Average time to close issues: 8 days
- Average time to close pull requests: 6 days
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 1.16
- Average comments per pull request: 0.09
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sverhoeven (18)
- rvhonorato (1)
Pull Request Authors
- sverhoeven (21)
Packages
- Total packages: 1
- Total downloads: pypi 20 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 4
- Total maintainers: 1
pypi.org: protein-detective
Deduce the protein from an EM density
- Homepage: https://github.com/haddocking/protein-detective
- Documentation: https://www.bonvinlab.org/protein-detective/
- License: apache-2.0
- Latest release: 0.3.1 (published 6 months ago)