fingernat-ml
Data accompanying the manuscript on SIFts- and ML-based methods in virtual screening for RNA-binding ligands.
Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA
Data accompanying the manuscript on SIFts- and ML-based methods in virtual screening for RNA-binding ligands, "Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA", available on bioRxiv (doi: 10.1101/2023.01.11.523582).
Repository content
All data reside in the data directory.
Quick start
CSV files with fingerprints and the binding/activity class (0 = non-binder, 1 = binder) are stored in data/4-SIFts+activity-csv.
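As a minimal sketch (Python standard library only), one of these CSV files could be split into a feature matrix and a label vector as follows. The column names are assumptions: the class is taken to be in a column called `activity`, `name` is treated as the only non-fingerprint column, and the helper `load_sifts_csv` is hypothetical, so check the actual header before use.

```python
import csv
import io
from typing import Iterable, List, Sequence, Tuple


def load_sifts_csv(
    lines: Iterable[str],
    label_column: str = "activity",          # assumed name of the 0/1 class column
    skip_columns: Sequence[str] = ("name",),  # hypothetical non-feature columns
) -> Tuple[List[str], List[List[float]], List[int]]:
    """Split a SIFts+activity CSV into feature names, feature rows and class labels."""
    reader = csv.DictReader(lines)
    feature_names = [
        c for c in (reader.fieldnames or [])
        if c != label_column and c not in skip_columns
    ]
    features: List[List[float]] = []
    labels: List[int] = []
    for row in reader:
        features.append([float(row[c]) for c in feature_names])
        labels.append(int(float(row[label_column])))  # 0 = non-binder, 1 = binder
    return feature_names, features, labels


# Toy input with made-up feature columns f1, f2 (not real SIFt column names).
example = io.StringIO("name,f1,f2,activity\nadenine,1,1,1\nP12618448,0,1,0\n")
names, X, y = load_sifts_csv(example)

# With a real file (the file name is left as a placeholder):
# with open("data/4-SIFts+activity-csv/<file>.csv", newline="") as fh:
#     names, X, y = load_sifts_csv(fh)
```

The resulting lists can be converted with `numpy.asarray` for use with common ML libraries.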
0-activity_datasets
Datasets with active and inactive ligands for five RNA targets. Each subdirectory contains a CSV file with the structures and activity data, e.g.:
| molecule structure | name | activity | type |
| :---------------------------------- | :----------------------- | :------- | :---- |
| c1[nH]c2c(n1)c(ncn2)N | adenine | 1 | real |
| c1[nH]c2c(n1)c(nc(n2)N)N | 2,6-diaminopurine | 1 | real |
| c1c([nH+]c(nc1N)N)N | 2,4,6-triaminopyrimidine | 1 | real |
| c1c2c([nH]cn2)nc(n1)N | 2-aminopurine | 1 | real |
| c1[nH]c2c(=O)[nH]c(nc2n1)N | guanine | 1 | real |
| c1c2c([nH]cn2)ncn1 | purine | 1 | real |
| CCc1nc(c(o1)NN)C#N | P12618448 | 0 | decoy |
| C[C@@H]1[C@H]([C@H]([C@@H](O1)O)O)O | P20855218 | 0 | decoy |
| C[C@@H](C1CC1)N(C(=O)N)O | P21190230 | 0 | decoy |
| Cc1c(c(c(o1)C)C(=O)[O-])C[NH3+] | P23843064 | 0 | decoy |
Columns contain:
- molecule structure: SMILES-encoded structure
- name: compound name (real ligands) or identifier (decoys)
- activity:
  - 1 - active
  - 0 - inactive
- type:
  - real - taken from the literature; the activity was tested experimentally
  - decoy - putative inactive molecule generated with the DUD-E web server
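A short sketch of reading such a file and checking the class balance. It assumes a comma-separated file whose header matches the column names in the table above; the separator and exact header are not documented here, and `summarize_activity_dataset` is a hypothetical helper. Because decoys are putative inactives, any decoy labelled 1 is reported.

```python
import csv
import io
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_activity_dataset(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Count ligands per (type, activity) pair and list decoys labelled as active."""
    counts: Counter = Counter()
    suspicious: List[str] = []
    for row in csv.DictReader(lines):
        kind = row["type"].strip()
        activity = int(row["activity"])
        counts[(kind, activity)] += 1
        if kind == "decoy" and activity != 0:
            # decoys are putative inactives, so their activity should be 0
            suspicious.append(row["name"])
    return counts, suspicious


# Rows copied from the example table; the comma delimiter and the column
# names are assumptions about the file layout. Names containing commas
# (e.g. 2,6-diaminopurine) must be quoted in a comma-separated file.
example = io.StringIO(
    "molecule structure,name,activity,type\n"
    "c1[nH]c2c(n1)c(ncn2)N,adenine,1,real\n"
    'c1[nH]c2c(n1)c(nc(n2)N)N,"2,6-diaminopurine",1,real\n'
    "CCc1nc(c(o1)NN)C#N,P12618448,0,decoy\n"
)
counts, suspicious = summarize_activity_dataset(example)
```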
1-rna_targets
RNA structure files (PDB format) used for modelling, as downloaded from the Protein Data Bank. E.g.:
- adenine/
  - 1Y26/
    - 1Y26.pdb
  - 4TZX/
    - 4TZX.pdb
  - 4XNR/
    - 4XNR.pdb
1-rna_targets-dockprep
RNA files (PDB and mol2 formats), cleaned and prepared for docking with dockprep. E.g.:
- adenine/
  - 1Y26/
    - rna.mol2
    - rna.pdb
  - 4TZX/
    - rna.mol2
    - rna.pdb
  - 4XNR/
    - rna.mol2
    - rna.pdb
2-docking_poses
The three best poses from molecular docking for each RNA structure, saved as SDF files. E.g.:
- adenine/
  - 1Y26/
    - best_3.sdf
  - 4TZX/
    - best_3.sdf
  - 4XNR/
    - best_3.sdf
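Since each best_3.sdf file holds several poses per ligand, the poses can be grouped by ligand before further processing. The sketch below splits an SD file on the `$$$$` record terminator and groups records by their title line; that the title line carries the ligand name is an assumption, and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def split_sdf_records(text: str) -> List[List[str]]:
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)  # last record without a terminator
    return records


def group_poses_by_title(text: str) -> Dict[str, List[List[str]]]:
    """Group pose records by the title (first line) of each record."""
    grouped: Dict[str, List[List[str]]] = defaultdict(list)
    for record in split_sdf_records(text):
        title = record[0].strip() if record else ""
        grouped[title].append(record)
    return dict(grouped)


# Toy records: only the title line and the terminator matter here,
# the molfile blocks are truncated to "M  END".
toy_sdf = "adenine\nM  END\n$$$$\nadenine\nM  END\n$$$$\nguanine\nM  END\n$$$$\n"
poses = group_poses_by_title(toy_sdf)
```

With RDKit installed, `Chem.SDMolSupplier` together with `mol.GetProp("_Name")` gives the same grouping with full molecule parsing.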
3-rescoring
Docked poses rescored with various scoring functions, for each of the RNA targets. Combined scores are saved in _all_scores.csv. E.g.:
- adenine/
  - 1Y26/
    - _all_scores.csv
    - annapurna.merged.csv
    - ligandrna-scores-basic.csv
    - ligandrna-scores-modern.csv
    - rdock-docksolv.score
    - rfscore-vs.score
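The layout of _all_scores.csv is not described here, so the following is only a sketch of how per-function score tables can be combined into one. It assumes each table is a CSV with a shared pose identifier column (called `pose_id` here, a hypothetical name) and prefixes every score column with the table name; the .score files produced by some programs may need their own parser before this step. The table names and values are placeholders.

```python
import csv
import io
from typing import Dict, Iterable, TextIO


def merge_score_tables(
    tables: Dict[str, Iterable[str]], key: str = "pose_id"
) -> Dict[str, Dict[str, str]]:
    """Join per-function score CSVs on a shared key column; each score column is
    prefixed with the name of the table it came from."""
    merged: Dict[str, Dict[str, str]] = {}
    for table_name, lines in tables.items():
        for row in csv.DictReader(lines):
            pose = row.pop(key)
            target = merged.setdefault(pose, {key: pose})
            for column, value in row.items():
                target[f"{table_name}:{column}"] = value
    return merged


def write_merged(merged: Dict[str, Dict[str, str]], out: TextIO, key: str = "pose_id") -> None:
    """Write the merged table; poses missing a score get an empty cell."""
    score_columns = sorted({c for row in merged.values() for c in row} - {key})
    writer = csv.DictWriter(out, fieldnames=[key, *score_columns], restval="")
    writer.writeheader()
    for pose in sorted(merged):
        writer.writerow(merged[pose])


# Hypothetical tables and values, for illustration only.
tables = {
    "rdock": io.StringIO("pose_id,SCORE\nlig1_pose1,-1.0\nlig1_pose2,-2.0\n"),
    "rfscore": io.StringIO("pose_id,score\nlig1_pose1,3.0\n"),
}
merged = merge_score_tables(tables)
out = io.StringIO()
write_merged(merged, out)
```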
4-SIFts+activity
Structural Interaction Fingerprints (SIFts) merged with the activity data. For each target, SIFts are calculated at three resolutions:
- full (high resolution; the one used in the manuscript)
- pbs (medium resolution, contacts only)
- simple (low resolution, contacts only)
For each resolution, SIFts are calculated for:
- the single best pose (1-pose subdirectory)
- the three top-scored poses (3-pose subdirectory; used in the manuscript)
Each subdirectory contains:
- SIFts calculated for the individual structures (e.g. 1Y26.csv.gz)
- horizontally joined SIFts (joined subdirectory; used for building the ML models)
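Horizontal joining can be pictured as concatenating, for every ligand, the fingerprints computed against each RNA structure. A minimal sketch, assuming fingerprints are held as dictionaries keyed by ligand name, that only ligands present for every structure are kept, and that a missing column means no interaction (0). Column names such as `A21_HB` are made up for illustration, and this is not necessarily how the published files were built.

```python
from typing import Dict, List, Tuple

# ligand name -> {fingerprint column -> value}
Fingerprints = Dict[str, Dict[str, float]]


def join_horizontally(
    per_structure: Dict[str, Fingerprints]
) -> Tuple[List[str], Dict[str, List[float]]]:
    """Concatenate the fingerprints a ligand obtained against several RNA
    structures into one row; columns are prefixed with the structure ID."""
    structures = sorted(per_structure)
    # keep only ligands that have a fingerprint for every structure
    ligands = set.intersection(*(set(per_structure[s]) for s in structures))
    column_lists = {
        s: sorted({c for fp in per_structure[s].values() for c in fp})
        for s in structures
    }
    columns = [f"{s}_{c}" for s in structures for c in column_lists[s]]
    rows = {
        ligand: [per_structure[s][ligand].get(c, 0.0)  # absent column -> 0
                 for s in structures for c in column_lists[s]]
        for ligand in sorted(ligands)
    }
    return columns, rows


per_structure = {
    "1Y26": {"adenine": {"A21_HB": 1.0}, "P12618448": {"A21_HB": 0.0}},
    "4TZX": {"adenine": {"U51_STACK": 1.0}, "P12618448": {"U51_STACK": 1.0}},
}
columns, rows = join_horizontally(per_structure)
```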
Each joined SIFts file is available in six variants (2 × 3 combinations of the options below):
- columns containing only zeros:
  - kept (withZeros)
  - removed (noZeros)
- interactions (see also the table below):
  - allInteractions - all non-covalent interactions detected
  - basicInteractions - only the basic set of interactions
  - basicInteractionsNoLipo - as above, but without lipophilic interactions
| Subset of interactions | Interactions detected |
| :--- | :--- |
| Basic: basicInteractions | (i) hydrogen bonds, (ii) halogen bonds, (iii) cation-anion interactions, (iv) Pi-cation interactions, (v) Pi-anion interactions, (vi) Pi-stacking interactions, (vii) metal cation-mediated interactions (magnesium, potassium, sodium, and other metal cations), (viii) water-mediated interactions, and (ix) lipophilic interactions. |
| Basic + Extended: allInteractions | All interactions in the Basic subset, and: (x) any interaction (any contact between nucleic acid and ligand), (xi) polar interactions, i.e., hydrogen bonds without angle restraints, (xii) weak polar interactions, i.e., weak hydrogen bonds without angle restraints, (xiii) weak hydrogen bonds without angle restraints, (xiv) n* interactions, and (xv) halogen multipolar interactions. |
| Basic - {lipo}: basicInteractionsNoLipo | Interactions in the Basic subset without lipophilic interactions. |
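The noZeros variants can be derived from the withZeros ones by dropping columns in which every value is 0. A minimal sketch over an in-memory table with hypothetical column names; note that with an empty table every column would be dropped.

```python
from typing import List, Sequence, Tuple


def drop_all_zero_columns(
    columns: Sequence[str], rows: Sequence[Sequence[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Turn a withZeros table into a noZeros one by removing every column in
    which all values are 0."""
    keep = [j for j in range(len(columns)) if any(row[j] != 0 for row in rows)]
    return [columns[j] for j in keep], [[row[j] for j in keep] for row in rows]


columns, rows = drop_all_zero_columns(
    ["1Y26_A21_HB", "1Y26_A21_LIPO", "4TZX_U51_STACK"],  # hypothetical names
    [[0, 0, 1], [1, 0, 0]],
)
```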
E.g.:
- adenine/
  - full/
    - 1-pose/
      - 1Y26.csv.gz
      - 4TZX.csv.gz
      - 4XNR.csv.gz
      - joined/
        - withZeros_allInteractions.csv.gz
        - withZeros_basicInteractions.csv.gz
        - withZeros_basicInteractionsNoLipo.csv.gz
        - noZeros_allInteractions.csv.gz
        - noZeros_basicInteractions.csv.gz
        - noZeros_basicInteractionsNoLipo.csv.gz
    - 3-pose/
      - 1Y26.csv.gz
      - 4TZX.csv.gz
      - 4XNR.csv.gz
      - joined/
        - noZeros_allInteractions.csv.gz
        - noZeros_basicInteractions.csv.gz
        - noZeros_basicInteractionsNoLipo.csv.gz
        - withZeros_allInteractions.csv.gz
        - withZeros_basicInteractions.csv.gz
        - withZeros_basicInteractionsNoLipo.csv.gz
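The six variants combine two zero-handling options with three interaction subsets. The sketch below enumerates the joined files expected for one target, assuming the root path data/4-SIFts+activity and that every resolution (full, pbs, simple) follows the same layout as the adenine/full example above; the listing itself only shows full.

```python
from itertools import product
from pathlib import PurePosixPath
from typing import List

RESOLUTIONS = ("full", "pbs", "simple")
POSE_SETS = ("1-pose", "3-pose")
ZERO_HANDLING = ("withZeros", "noZeros")
INTERACTIONS = ("allInteractions", "basicInteractions", "basicInteractionsNoLipo")


def joined_sifts_paths(target: str, root: str = "data/4-SIFts+activity") -> List[PurePosixPath]:
    """Expected joined-SIFts files for one target:
    3 resolutions x 2 pose sets x (2 zero handlings x 3 interaction subsets) = 36."""
    return [
        PurePosixPath(root, target, res, poses, "joined", f"{zeros}_{inter}.csv.gz")
        for res, poses, zeros, inter in product(RESOLUTIONS, POSE_SETS, ZERO_HANDLING, INTERACTIONS)
    ]


paths = joined_sifts_paths("adenine")
# Files for the setting used in the manuscript (full resolution, 3 poses):
manuscript_files = [p for p in paths if p.parts[3] == "full" and p.parts[4] == "3-pose"]

# Reading one of the files in a local checkout (not executed here):
# import gzip
# with gzip.open(str(paths[0]), "rt", newline="") as fh:
#     header = fh.readline()
```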
4-SIFts+activity-arff
Data formatted for Weka (average for 3 best poses, 3 structures, with zeros, all interactions).
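For readers unfamiliar with the format, a minimal sketch of writing numeric fingerprints and a binary class to Weka's ARFF format. The class attribute name `activity`, the feature names, and the quoting of names are assumptions; the attribute names in the repository's .arff files may differ.

```python
import io
from typing import Sequence, TextIO


def write_arff(out: TextIO, relation: str, columns: Sequence[str],
               rows: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
    """Write numeric features plus a nominal {0,1} class in Weka's ARFF format.
    Names are single-quoted and assumed not to contain quote characters."""
    out.write(f"@RELATION '{relation}'\n\n")
    for name in columns:
        out.write(f"@ATTRIBUTE '{name}' NUMERIC\n")
    out.write("@ATTRIBUTE activity {0,1}\n\n@DATA\n")
    for row, label in zip(rows, labels):
        out.write(",".join(str(v) for v in row) + f",{label}\n")


buffer = io.StringIO()
write_arff(buffer, "adenine", ["f1", "f2"], [[1.0, 0.0], [0.0, 1.0]], [1, 0])
```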
4-SIFts+activity-csv
Auxiliary csv files (average for 3 best poses, 3 structures, with zeros, all interactions).
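Assuming that "average for 3 best poses" means the element-wise arithmetic mean of the pose fingerprints of each ligand (an interpretation, not stated explicitly here), the averaging step could look like this sketch; the helper name and the toy values are hypothetical.

```python
from typing import Dict, List, Sequence


def average_pose_fingerprints(
    poses: Dict[str, Sequence[Sequence[float]]]
) -> Dict[str, List[float]]:
    """Element-wise mean of the pose fingerprints of each ligand."""
    averaged: Dict[str, List[float]] = {}
    for ligand, vectors in poses.items():
        if not vectors:
            raise ValueError(f"no poses for {ligand}")
        if len({len(v) for v in vectors}) != 1:
            raise ValueError(f"fingerprints of {ligand} differ in length")
        n = len(vectors)
        averaged[ligand] = [sum(column) / n for column in zip(*vectors)]
    return averaged


# Toy fingerprints of three poses of one ligand (made-up values).
averaged = average_pose_fingerprints({"adenine": [[1, 0, 1], [1, 1, 0], [1, 0, 0]]})
```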
5-results
Raw and compiled ML results. All results are combined in _collected_ml_results.csv.
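A sketch of collecting several result tables into a single CSV, in the spirit of _collected_ml_results.csv. The structure of the raw result files is not described here, so the column names, the added `source` column, and the toy values are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def collect_results(sources: Dict[str, Iterable[str]], out: TextIO) -> int:
    """Concatenate result CSVs that may have different columns into one table with
    an extra 'source' column; returns the number of data rows written."""
    fieldnames: List[str] = ["source"]
    rows: List[Dict[str, str]] = []
    for source, lines in sources.items():
        for row in csv.DictReader(lines):
            for column in row:
                if column not in fieldnames:
                    fieldnames.append(column)
            rows.append({**row, "source": source})
    # columns missing from a given input file are left empty
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


sources = {
    "run_a.csv": io.StringIO("model,metric\nmodel_A,x\n"),
    "run_b.csv": io.StringIO("model,metric,extra\nmodel_B,y,z\n"),
}
out = io.StringIO()
n_rows = collect_results(sources, out)
```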
6-HIV-structures
Data for SIFts composed from seven RNA structures of the HIV target, used to investigate how the number of structures influences ML accuracy. Includes raw data, results, and figures.

Feedback, issues, and questions
We welcome any feedback; please send an email to Filip Stefaniak or Natalia Szulc.
How to cite
If you use the datasets from this repository, please cite:
Natalia A. Szulc, Zuzanna Mackiewicz, Janusz M. Bujnicki, Filip Stefaniak. Structural Interaction Fingerprints and Machine Learning for predicting and explaining binding of small molecule ligands to RNA. bioRxiv, doi: 10.1101/2023.01.11.523582
Funding
This research was funded in part by the National Science Centre in Poland (grant number 2020/39/B/NZ2/03127 to Filip Stefaniak).