https://github.com/bartongroup/fragsys

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 6 DOI reference(s) in README
✓ Academic publication links: Links to nature.com, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.1%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: bartongroup
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 23.7 MB

Statistics

Stars: 5
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 2

Created about 3 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

FRAGSYS

This repository contains the fragment screening analysis pipeline (FRAGSYS) used for the analysis in our manuscript Classification of likely functional class for ligand binding sites identified from fragment screening.

Our pipeline for the analysis of binding sites, FRAGSYS, can be executed from the jupyter notebook running_fragsys.ipynb. The input for this pipeline is a table containing a series of PDB codes and their respective UniProt accession identifiers.

Installation

For complete installation instructions refer here.

Pipeline methodology

To run FRAGSYS, refer to the Jupyter notebook running_fragsys.ipynb. You can run the pipeline interactively in a notebook by calling main(main_dir, prot, panddas) in the appropriate environment, varalign_env.

Here, main_dir is the directory where the output will be saved, prot is the query protein, and panddas is a pandas dataframe that must contain at least two columns, entry_uniprot_accession and pdb_id, for all protein structures in the data set.
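As a minimal sketch of the input described above, the snippet below builds a small panddas dataframe and checks that it has the two required columns before the pipeline is called. The PDB codes are illustrative placeholders, the helper check_panddas is hypothetical, and both the import path `from fragsys_main import main` and the use of a UniProt accession for prot are assumptions based on the descriptions in this README. See running_fragsys.ipynb for the authors' actual usage.

```python
import pandas as pd

# Columns the README says panddas must contain.
REQUIRED_COLUMNS = ("entry_uniprot_accession", "pdb_id")


def check_panddas(panddas):
    """Hypothetical helper: raise if a required column is missing."""
    missing = [c for c in REQUIRED_COLUMNS if c not in panddas.columns]
    if missing:
        raise ValueError(f"panddas is missing columns: {missing}")
    return panddas


# Illustrative input: one row per structure (placeholder PDB codes).
panddas = check_panddas(pd.DataFrame({
    "entry_uniprot_accession": ["P0DTD1", "P0DTD1"],
    "pdb_id": ["5r7y", "5r7z"],
}))

main_dir = "./fragsys_output"  # assumed output directory
prot = "P0DTD1"                # assumed: query protein given as UniProt accession

# Assumed import path; run inside varalign_env with FRAGSYS configured.
# from fragsys_main import main
# main(main_dir, prot, panddas)
```

The call to main is left commented out because it needs the third-party tools listed under Dependencies to be installed and configured.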

For another example, check this other notebook where we ran FRAGSYS for the main protease (MPro) of SARS-CoV-2 (P0DTD1).

For each structural segment of each protein in panddas, FRAGSYS will:
1. Download biological assemblies from PDBe
2. Structurally superimpose structures using STAMP
3. Get accessibility and secondary structure elements from DSSP via ProIntVar
4. Map PDB residues to UniProt using SIFTS
5. Obtain protein-ligand interactions by running Arpeggio
6. Cluster ligands into binding sites using OC
7. Generate visualisation scripts for UCSF Chimera
8. Generate a multiple sequence alignment (MSA) with jackhmmer
9. Calculate the Shenkin divergence score [1]
10. Calculate missense enrichment scores with VarAlign
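To make step 9 concrete, here is a minimal sketch of the Shenkin divergence score for a single alignment column, using the published definition of 6 × 2^S, where S is the Shannon entropy (in bits) of the residue frequencies in the column [1]. The function name shenkin_score is hypothetical, and ignoring gap characters is an assumption. FRAGSYS may treat gaps or normalise the score differently.

```python
import math
from collections import Counter


def shenkin_score(column, gap_chars="-."):
    """Shenkin divergence of one alignment column: 6 * 2**S, S in bits.

    Assumption: gap characters are dropped before counting residues.
    """
    residues = [aa for aa in column.upper() if aa not in gap_chars]
    if not residues:
        raise ValueError("column contains only gaps")
    total = len(residues)
    # Shannon entropy of the observed residue frequencies (base 2).
    entropy = -sum(
        (n / total) * math.log2(n / total)
        for n in Counter(residues).values()
    )
    return 6 * 2 ** entropy


print(shenkin_score("AAAAAAAA"))  # fully conserved column → 6.0
print(shenkin_score("AAAACCCC"))  # two equally frequent residues → 12.0
```

Under this definition the score ranges from 6 for a fully conserved column to 120 for a column in which all 20 amino acids are equally frequent.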

The final output of the pipeline consists of multiple tables for each structural segment collating the results from the different steps of the analysis for each residue, and for the defined ligand binding sites. These data include relative solvent accessibility (RSA), angles, secondary structure, PDB/UniProt residue number, alignment column, column occupancy, divergence score, missense enrichment score, p-value, etc.

These tables are concatenated into master tables, with data for all 37 structural segments, which form the input for the analyses carried out in the analysis notebooks.

Refer to notebook 15 to predict RSA cluster labels for your binding sites of interest.

Dependencies

The pipeline, as well as the whole of the analysis, is run interactively in a series of Jupyter notebooks, found in the analysis folder.

Third-party dependencies for these notebooks include:
- Arpeggio (GNU GPL v3.0 License)
- DSSP (Boost Software License)
- Hmmer (BSD-3 Clause License)
- OC
- STAMP (GNU GPL v3.0 License)
- ProIntVar (MIT License)
- ProteoFAV (MIT License)
- VarAlign (MIT License)

Other standard Python libraries:
- Biopython (BSD 3-Clause License)
- Keras (Apache v2.0 License)
- Matplotlib (PSF License)
- Numpy (BSD 3-Clause License)
- Pandas (BSD 3-Clause License)
- Scipy (BSD 3-Clause License)
- Seaborn (BSD 3-Clause License)
- Scikit-learn (BSD 3-Clause License)
- Tensorflow (Apache v2.0 License)

For more information on the dependencies, refer to the .yml files in the envs directory. To install all the dependencies, refer to the installation manual.

Files

Apart from the INSTALL, LICENSE and README files, there are 5 other files in this repository's main directory: two Python libraries, a configuration file and two notebooks.
+ fragsys_config.txt contains the default parameters to run FRAGSYS and is read by fragsys.py.
+ fragsys.py contains all the functions, lists and dictionaries needed to run the pipeline.
+ fragsys_main.py contains the main FRAGSYS function, where all functions in fragsys.py are called. This script represents the pipeline itself.
+ running_fragsys.ipynb is the notebook where the pipeline is executed in an interactive way.
+ running_fragsys_for_MPRO.ipynb.ipynb is the notebook where the pipeline is executed in an interactive way for a case study of SARS-CoV-2 MPro.

Directories

There are 6 directories in this repository: scripts, envs, input, analysis, results and figs.

`scripts`

This directory contains clean_pdb.py, a Python script taken from here. This script is used to pre-process the PDB files before running Arpeggio on them.

`envs`

The envs folder contains four .yml files describing the necessary packages and dependencies for the different parts of the pipeline and analysis.
+ arpeggio_env contains Arpeggio.
+ deeplearningenv contains the packages necessary to do the machine learning in notebooks 11 and 12.
+ main_env supports all analysis notebooks, with the exception of notebooks 11 and 12, in which the machine learning models are executed.
+ varalign_env is needed to run FRAGSYS.

Citation

If you use FRAGSYS, please cite:

Utgés, J.S. et al. Classification of likely functional class for ligand binding sites identified from fragment screening. Commun Biol 7, 320 (2024). https://doi.org/10.1038/s42003-024-05970-8

References

[1] Shenkin PS, Erman B, Mastrandrea LD. Information-theoretical entropy as a measure of sequence variability. Proteins. 1991; 11(4):297–313. https://doi.org/10.1002/prot.340110408 PMID: 1758884.
