rhea-fingerprints
Automatically generate differential reaction fingerprints on reactions in Rhea
Statistics
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 2
Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction
This repository generates differential reaction fingerprints for reactions in Rhea.
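As a rough illustration of the idea behind a differential reaction fingerprint, the sketch below collects features from each side of the reaction arrow, keeps only those that appear on one side (the symmetric difference), and hashes them into a fixed-length bit vector. This is a toy under loud assumptions: it uses character n-grams of the SMILES strings instead of circular substructures, it only handles the `reactants>>products` form, and `ngrams` and `toy_differential_fingerprint` are hypothetical names. It is not the algorithm implemented by the `drfp` package.

```python
import hashlib


def ngrams(smiles, n=3):
    """Character n-grams of a SMILES string (toy stand-in for substructures)."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def toy_differential_fingerprint(rxn_smiles, n_bits=64, n=3):
    """Fold the features that differ between both reaction sides into bits."""
    reactants, products = rxn_smiles.split(">>")
    left = set().union(*(ngrams(s, n) for s in reactants.split(".")))
    right = set().union(*(ngrams(s, n) for s in products.split(".")))
    changed = left ^ right  # features present on exactly one side
    bits = [0] * n_bits
    for feature in changed:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


# Esterification of ethanol with acetic acid, written as reaction SMILES
fp = toy_differential_fingerprint("CCO.CC(=O)O>>CCOC(C)=O.O")
unchanged = toy_differential_fingerprint("CCO>>CCO")
```

A reaction that changes nothing yields an all-zero vector here, which is the property that makes the fingerprint "differential".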
🚀 Usage
The SMILES dataframe and DRFP-derived fingerprint dataframe can be loaded from GitHub with:
```python
import pandas as pd

base_url = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"

# Table of reaction SMILES
smiles_url = f"{base_url}/127/reaction_smiles.tsv"
smiles_df = pd.read_csv(smiles_url, sep="\t")

# Table of DRFP fingerprints (gzip-compressed), first column used as the index
fingerprint_url = f"{base_url}/127/reaction_fingerprints.tsv.gz"
fingerprint_df = pd.read_csv(fingerprint_url, sep="\t", index_col=0)
```
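The download URLs in this README share one layout: a base URL, a version directory (`127` here), and a file path. If you load several files, a small helper like the one sketched below keeps that layout in one place. `BASE_URL` and `docs_url` are hypothetical names, and reading `127` as the Rhea release number is an assumption based on how the update process is described.

```python
BASE_URL = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"


def docs_url(version, relative_path):
    """Build the URL of a file stored under a given version directory."""
    return f"{BASE_URL}/{version}/{relative_path.lstrip('/')}"


model_url = docs_url(127, "models/LogisticRegression.pkl")
```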
Here's a 2D PCA scatterplot of the embeddings:

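If you want to compute a similar projection yourself, the sketch below shows one way to get 2D PCA coordinates with NumPy alone. It runs on random toy bit vectors, not on the real fingerprints, and `pca_2d` is a hypothetical helper, not something this repository provides.

```python
import numpy as np


def pca_2d(matrix):
    """Project the rows of `matrix` onto their first two principal components."""
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)  # center each column
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T


# Toy stand-in for a fingerprint matrix: 100 reactions, 32 bits each
rng = np.random.default_rng(0)
toy_fps = rng.integers(0, 2, size=(100, 32))
coords = pca_2d(toy_fps)
print(coords.shape)  # (100, 2)
```

To apply this to the fingerprint dataframe loaded above, you would pass its values as a NumPy array, assuming every column holds numeric fingerprint bits.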
Analysis
This repository also trains reusable models on Rhea that predict enzyme codes from DRFPs. The models are simple classifiers, such as logistic regression, and perform well.

You can re-use existing models in combination with drfp like:
```python
import pystow
from drfp import DrfpEncoder

base_url = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"
url = f"{base_url}/127/models/LogisticRegression.pkl"
# Download the pickled classifier once and cache it locally with pystow
clf = pystow.ensure_pickle("bio", "rhea", "models", "127", url=url)

rxn_smiles = [
    "CO.O[C@@H]1CCNC1.[C-]#[N+]CC(=O)OC>>[C-]#[N+]CC(=O)N1CC[C@@H](O)C1",
    "CCOC(=O)C(CC)c1cccnc1.Cl.O>>CCC(C(=O)O)c1cccnc1",
]
fps = DrfpEncoder.encode(rxn_smiles)

predictions = clf.predict(fps)
```
Warning: There might be issues with reloading model weights. Please let me know if this comes up.
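Once you have predictions, you may want to group them by top-level enzyme class. This README does not say what form the predicted labels take, so the sketch below assumes dotted EC strings such as `3.1.1.1`. The example labels are made up for illustration and are not output of the model, and `top_level_class` is a hypothetical helper.

```python
from collections import Counter

# The seven top-level Enzyme Commission classes
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",
}


def top_level_class(ec_code):
    """Map an EC code like '3.1.1.1' to the name of its top-level class."""
    first = str(ec_code).split(".")[0]
    return EC_CLASSES.get(first, "Unknown")


# Hypothetical labels, assumed to look like dotted EC codes
example_predictions = ["3.1.1.1", "2.7.1.1", "3.4.21.4"]
counts = Counter(top_level_class(p) for p in example_predictions)
```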
♻️ Update
Installation of the requirements and running of the build script are handled with tox. The current
version of Rhea is looked up with bioversions so the
provenance of the data can be properly traced. Run with:
```shell
$ pip install tox
$ tox
```
Additionally, a GitHub Action runs this update script on a monthly basis.
⚖️ License
Code in this repository is licensed under the MIT License. Parts of the Rhea database are redistributed under the CC-BY-4.0 license.
📖 Citation
If you find this useful in your own work, please consider citing:
```bibtex
@software{charles_tapley_hoyt_2023_7591839,
  author    = {Charles Tapley Hoyt},
  title     = {Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction},
  month     = jan,
  year      = 2023,
  publisher = {Zenodo},
  version   = {v124},
  doi       = {10.5281/zenodo.7591839},
  url       = {https://doi.org/10.5281/zenodo.7591839}
}
```
I also gave a talk on this in case you want to read up more.
🙏 Acknowledgements
Rhea can be cited with:
```bibtex
@article{Lombardot2019,
  author  = {Lombardot, Thierry and Morgat, Anne and Axelsen, Kristian B and Aimo, Lucila and Hyka-Nouspikel, Nevila and Niknejad, Anne and Ignatchenko, Alex and Xenarios, Ioannis and Coudert, Elisabeth and Redaschi, Nicole and Bridge, Alan},
  doi     = {10.1093/nar/gky876},
  journal = {Nucleic acids research},
  number  = {D1},
  pages   = {D596--D600},
  pmid    = {30272209},
  title   = {{Updates in Rhea: SPARQLing biochemical reaction data.}},
  volume  = {47},
  year    = {2019}
}
```
Differential reaction fingerprints can be cited with:
```bibtex
@article{Probst2022,
  abstract = {Differential Reaction Fingerprint DRFP is a chemical reaction fingerprint enabling simple machine learning models running on standard hardware to reach DFT- and deep learning-based accuracies in reaction yield prediction and reaction classification.},
  author   = {Probst, Daniel and Schwaller, Philippe and Reymond, Jean-Louis},
  doi      = {10.1039/D1DD00006C},
  issn     = {2635-098X},
  journal  = {Digital Discovery},
  title    = {{Reaction classification and yield prediction using the differential reaction fingerprint DRFP}},
  url      = {http://xlink.rsc.org/?DOI=D1DD00006C},
  year     = {2022}
}
```
Owner
- Name: Charles Tapley Hoyt
- Login: cthoyt
- Kind: user
- Location: Bonn, Germany
- Company: RWTH Aachen University
- Website: https://cthoyt.com
- Repositories: 484
- Profile: https://github.com/cthoyt
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction
doi: 10.5281/zenodo.7591839
authors:
  - family-names: Hoyt
    given-names: Charles Tapley
    orcid: 0000-0003-4423-4370