fingernat-ml

Data accompanying the manuscript on SIFts- and ML-based methods in Virtual Screening for RNA binding ligands.

https://github.com/filipspl/fingernat-ml

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.1%) to scientific vocabulary

Keywords

benchmark rna rna-ligand virtual-screening
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: filipsPL
  • License: cc0-1.0
  • Default Branch: main
  • Homepage:
  • Size: 66.4 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 3
Topics
benchmark rna rna-ligand virtual-screening
Created over 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation Zenodo

README.md

Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA

Data accompanying the manuscript on SIFts- and ML-based methods in virtual screening for RNA-binding ligands: "Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA" (see How to cite below for the full reference).


Repository content

All data reside in the data directory.

Quick start

CSV files with fingerprints and the binding/activity class (0 = non-binder, 1 = binder) are stored in data/4-SIFts+activity-csv.
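A minimal sketch of loading one of these CSV files into a feature matrix and a label vector, using only the Python standard library. The label column name (`activity`) and the assumption that every other column, except those passed in `skip_cols`, is a numeric fingerprint feature are guesses; check them against the actual header. The function name `load_sifts_csv` is hypothetical.

```python
import csv


def load_sifts_csv(path, label_col="activity", skip_cols=()):
    """Read a SIFts+activity CSV into features X, labels y and feature names.

    Assumptions (check against the real header):
    - the binding class (0 = non-binder, 1 = binder) is in column `label_col`;
    - every other column not listed in `skip_cols` holds a numeric SIFt value.
    """
    X, y = [], []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        feature_names = [
            col for col in reader.fieldnames
            if col != label_col and col not in skip_cols
        ]
        for row in reader:
            # int(float(...)) accepts both "1" and "1.0"
            y.append(int(float(row[label_col])))
            X.append([float(row[col]) for col in feature_names])
    return X, y, feature_names
```

The resulting `X` and `y` lists can be passed to most ML libraries (scikit-learn estimators, for example, accept lists of lists). Any identifier column has to be listed in `skip_cols`, otherwise `float()` will fail on it.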

0-activity_datasets

Datasets with active and inactive ligands for five RNA targets. Each subdirectory contains a CSV file with the structure and activity data, e.g.:

| molecule structure | name | activity | type |
| :--- | :--- | :--- | :--- |
| c1[nH]c2c(n1)c(ncn2)N | adenine | 1 | real |
| c1[nH]c2c(n1)c(nc(n2)N)N | 2,6-diaminopurine | 1 | real |
| c1c([nH+]c(nc1N)N)N | 2,4,6-triaminopyrimidine | 1 | real |
| c1c2c([nH]cn2)nc(n1)N | 2-aminopurine | 1 | real |
| c1[nH]c2c(=O)[nH]c(nc2n1)N | guanine | 1 | real |
| c1c2c([nH]cn2)ncn1 | purine | 1 | real |
| CCc1nc(c(o1)NN)C#N | P12618448 | 0 | decoy |
| C[C@@H]1[C@H]([C@H]([C@@H](O1)O)O)O | P20855218 | 0 | decoy |
| C[C@@H](C1CC1)N(C(=O)N)O | P21190230 | 0 | decoy |
| Cc1c(c(c(o1)C)C(=O)[O-])C[NH3+] | P23843064 | 0 | decoy |

Columns:
- molecule structure: SMILES-encoded structure
- name: ligand name or identifier
- activity:
  - 1: active
  - 0: inactive
- type:
  - real: taken from the literature; the activity was tested experimentally
  - decoy: putative inactive molecule generated with the DUD-E web server
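To check the balance between experimentally tested ligands and decoys in one of these files, something like the following sketch can be used. It assumes the header uses exactly the column names listed above and a comma delimiter; both are assumptions, and `summarize_activity` / `decoys_are_inactive` are hypothetical helper names.

```python
import csv
from collections import Counter


def summarize_activity(path, delimiter=","):
    """Count ligands per (type, activity) pair, e.g. ("real", 1) or ("decoy", 0).

    Assumes the header contains the "type" and "activity" columns shown above.
    """
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            counts[(row["type"].strip(), int(row["activity"]))] += 1
    return counts


def decoys_are_inactive(counts):
    """True if no decoy is labelled active, as the column description implies."""
    return counts[("decoy", 1)] == 0
```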

1-rna_targets

RNA PDB files used for modelling, as fetched from the Protein Data Bank. E.g.:

adenine/
  1Y26/
    1Y26.pdb
  4TZX/
    4TZX.pdb
  4XNR/
    4XNR.pdb

1-rna_targets-dockprep

RNA files (PDB and MOL2 formats), cleaned and prepared for docking with dockprep. E.g.:

adenine/
  1Y26/
    rna.mol2
    rna.pdb
  4TZX/
    rna.mol2
    rna.pdb
  4XNR/
    rna.mol2
    rna.pdb

2-docking_poses

The three best poses from molecular docking for each RNA structure, saved as SDF files. E.g.:

adenine/
  1Y26/
    best_3.sdf
  4TZX/
    best_3.sdf
  4XNR/
    best_3.sdf
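SDF records are separated by a line containing only `$$$$`, and the first line of each record is its title. The sketch below uses that to count how many poses each ligand has in a `best_3.sdf` file. It assumes (unverified) that all poses of one ligand share the same title line; the helper names are hypothetical. For full parsing of coordinates and properties, RDKit's `Chem.SDMolSupplier` is the usual tool.

```python
from collections import Counter


def read_sdf_titles(path):
    """Return the title (first line) of every record in an SDF file.

    Records end with a line containing only "$$$$"; an unterminated
    trailing record is ignored in this sketch.
    """
    titles, record = [], []
    with open(path) as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line == "$$$$":
                if record:
                    titles.append(record[0].strip())
                record = []
            else:
                record.append(line)
    return titles


def poses_per_ligand(path):
    """Count records per title; assumes poses of one ligand share a title."""
    return Counter(read_sdf_titles(path))
```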

3-rescoring

Docked poses rescored with various scoring functions, calculated for each RNA target. Combined scores are saved in _all_scores.csv. E.g.:

adenine/
  1Y26/
    _all_scores.csv
    annapurna.merged.csv
    ligandrna-scores-basic.csv
    ligandrna-scores-modern.csv
    rdock-docksolv.score
    rfscore-vs.score

4-SIFts+activity

Structural Interaction Fingerprints (SIFts) merged with the activity data. For each target, SIFts are calculated at three resolutions:
- full (high resolution; the one used in the manuscript)
- pbs (medium resolution, contacts only)
- simple (low resolution, contacts only)

For each resolution, SIFts are calculated for:
- the single best pose (1-pose subdirectory)
- the three top-scored poses (3-pose subdirectory; used in the manuscript)

Each of these subdirectories contains:
- SIFts calculated for the individual structures (e.g., 1Y26.csv.gz)
- horizontally joined SIFts (joined subdirectory; as used for building the ML models)

Each joined SIFts file is available in six variants, combining two options for zero-only columns with three interaction subsets:
- zero-only columns:
  - kept (withZeros)
  - removed (noZeros; a column containing only zeros is dropped)
- interactions (see also the table below):
  - allInteractions: all non-covalent interactions detected
  - basicInteractions: the basic set of interactions
  - basicInteractionsNoLipo: as above, but without lipophilic interactions
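The noZeros rule described above (drop every column that contains only zeros) can be sketched as follows. This illustrates the rule; it is not the script used to generate the files. `drop_all_zero_columns`, the protected `activity` column and the example column names are hypothetical.

```python
def drop_all_zero_columns(header, rows, keep=("activity",)):
    """Return (header, rows) without columns that contain only zeros.

    header: list of column names; rows: list of lists of numbers.
    Columns named in `keep` (e.g. a label column) are never dropped.
    """
    kept = [
        j for j, name in enumerate(header)
        if name in keep or any(row[j] != 0 for row in rows)
    ]
    return [header[j] for j in kept], [[row[j] for j in kept] for row in rows]


# Hypothetical column names, for illustration only.
header = ["HB_A12", "STACK_G13", "LIPO_U14", "activity"]
rows = [[1, 0, 0, 1], [0, 0, 0, 0]]
print(drop_all_zero_columns(header, rows))
# (['HB_A12', 'activity'], [[1, 1], [0, 0]])
```

Protecting the label column matters: in a subset with only inactive ligands it would otherwise be all zeros and be removed as well.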

| Subset of interactions | Interactions detected |
| :--- | :--- |
| Basic: basicInteractions | (i) hydrogen bonds, (ii) halogen bonds, (iii) cation-anion interactions, (iv) Pi-cation interactions, (v) Pi-anion interactions, (vi) Pi-stacking interactions, (vii) metal cation-mediated: magnesium, potassium, sodium, and other metal cation-mediated, (viii) water-mediated interactions, and (ix) lipophilic interactions. |
| Basic + Extended: allInteractions | All interactions in the Basic subset, and: (x) any interaction (any contact between nucleic acid and ligand), (xi) polar interactions, i.e., hydrogen bonds without angle restraints, (xii) weak polar interactions, i.e., weak hydrogen bonds without angle restraints, (xiii) weak hydrogen bonds without angle restraints, (xiv) n* interactions, and (xv) halogen multipolar interactions. |
| Basic - {lipo}: basicInteractionsNoLipo | Interactions in the Basic subset without lipophilic interactions. |
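The relationships between the three subsets in the table can be written as set operations: allInteractions is the union of Basic and Extended, and basicInteractionsNoLipo is Basic minus lipophilic contacts. The short labels below are illustrative only; they are not the column names used in the SIFts files.

```python
# Interaction subsets from the table, as Python sets of illustrative labels.
BASIC = {
    "hydrogen_bond", "halogen_bond", "cation_anion", "pi_cation",
    "pi_anion", "pi_stacking",
    "magnesium_mediated", "potassium_mediated", "sodium_mediated",
    "other_metal_mediated", "water_mediated", "lipophilic",
}
EXTENDED = {
    "any_contact", "polar", "weak_polar", "weak_hydrogen_bond",
    "n_star", "halogen_multipolar",
}

SUBSETS = {
    "basicInteractions": BASIC,
    "allInteractions": BASIC | EXTENDED,                 # Basic + Extended
    "basicInteractionsNoLipo": BASIC - {"lipophilic"},   # Basic - {lipo}
}

for name, labels in SUBSETS.items():
    print(name, len(labels))
# basicInteractions 12
# allInteractions 18
# basicInteractionsNoLipo 11
```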

adenine/
  full/
    1-pose/
      1Y26.csv.gz
      4TZX.csv.gz
      4XNR.csv.gz
      joined/
        withZeros_allInteractions.csv.gz
        withZeros_basicInteractions.csv.gz
        withZeros_basicInteractionsNoLipo.csv.gz
        noZeros_allInteractions.csv.gz
        noZeros_basicInteractions.csv.gz
        noZeros_basicInteractionsNoLipo.csv.gz
    3-pose/
      1Y26.csv.gz
      4TZX.csv.gz
      4XNR.csv.gz
      joined/
        noZeros_allInteractions.csv.gz
        noZeros_basicInteractions.csv.gz
        noZeros_basicInteractionsNoLipo.csv.gz
        withZeros_allInteractions.csv.gz
        withZeros_basicInteractions.csv.gz
        withZeros_basicInteractionsNoLipo.csv.gz
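Following the layout above, the six joined files for one target, resolution, and number of poses can be enumerated (2 zero-column options × 3 interaction subsets). This sketch assumes the tree sits under `data/4-SIFts+activity`; `joined_sifts_paths` and `read_gz_header` are hypothetical helpers. The files are gzip-compressed CSVs, so `gzip.open` in text mode is enough to read them.

```python
import csv
import gzip
from itertools import product
from pathlib import PurePosixPath

ZERO_OPTIONS = ("withZeros", "noZeros")
INTERACTION_SUBSETS = ("allInteractions", "basicInteractions", "basicInteractionsNoLipo")


def joined_sifts_paths(target, resolution="full", poses=3, root="data/4-SIFts+activity"):
    """List the six joined-SIFts files for one target/resolution/pose setting."""
    base = PurePosixPath(root) / target / resolution / f"{poses}-pose" / "joined"
    return [
        str(base / f"{zeros}_{subset}.csv.gz")
        for zeros, subset in product(ZERO_OPTIONS, INTERACTION_SUBSETS)
    ]


def read_gz_header(path):
    """Return the column names of a gzip-compressed CSV file."""
    with gzip.open(path, "rt", newline="") as handle:
        return next(csv.reader(handle))
```

For example, `joined_sifts_paths("adenine")[0]` gives `data/4-SIFts+activity/adenine/full/3-pose/joined/withZeros_allInteractions.csv.gz`, the variant used in the manuscript (full resolution, three poses).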

4-SIFts+activity-arff

Data formatted for Weka as ARFF files: averaged over the three best poses, three structures, withZeros, allInteractions.
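If Weka is not at hand, the ARFF files can also be read from Python. `scipy.io.arff.loadarff` handles the format properly; the stdlib-only sketch below covers just the dense subset (`@relation`, `@attribute`, `@data`, `%` comments) and does not support quoted attribute names or sparse rows. `read_arff` is a hypothetical helper name.

```python
def read_arff(path):
    """Minimal dense-ARFF reader returning (attribute names, data rows as strings).

    Supports @relation/@attribute/@data lines and % comments only;
    quoted names and sparse {index value} rows are not handled.
    """
    names, rows, in_data = [], [], False
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("%"):
                continue  # skip blank lines and comments
            lower = line.lower()
            if in_data:
                rows.append([value.strip() for value in line.split(",")])
            elif lower.startswith("@attribute"):
                names.append(line.split()[1])  # "@attribute NAME TYPE"
            elif lower.startswith("@data"):
                in_data = True
    return names, rows
```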

4-SIFts+activity-csv

Auxiliary CSV files with the same settings (averaged over the three best poses, three structures, withZeros, allInteractions).

5-results

Raw and compiled ML results. All results are combined in _collected_ml_results.csv.

6-HIV-structures

Data for SIFts composed from seven RNA structures, used to investigate the influence of the number of structures on ML accuracy: raw data, results, and figures.

fig

Feedback, issues, and questions

We welcome any feedback; please send an email to Filip Stefaniak or Natalia Szulc.

How to cite

If you use the datasets from this repository, please cite:

Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA Natalia A. Szulc, Zuzanna Mackiewicz, Janusz M. Bujnicki, Filip Stefaniak

bioRxiv

doi: 10.1101/2023.01.11.523582

Funding

This research was funded in part by the National Science Centre in Poland (grant number 2020/39/B/NZ2/03127 to Filip Stefaniak).

Owner

  • Name: filips
  • Login: filipsPL
  • Kind: user
  • Location: Warsaw, Poland
  • Company: @thervira @genesilico

- computer aided drug design + medicinal chemistry
- python programming, web devel
- ML
- QSAR, tox prediction
- :swimmer: :bicyclist: :runner:


Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 0
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • n-szulc (2)
  • filipsPL (1)
  • imgbot[bot] (1)

Dependencies

.github/workflows/action-links.yml actions
  • actions/checkout master composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/cffconvert.yml actions
  • actions/checkout v2 composite
  • citation-file-format/cffconvert-github-action 2.0.0 composite