rhea-fingerprints

Automatically generate differential reaction fingerprints on reactions in Rhea

https://github.com/cthoyt/rhea-fingerprints

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: rsc.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary

Keywords

differential-reaction-fingerprints rhea
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: cthoyt
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 33.9 MB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Topics
differential-reaction-fingerprints rhea
Created over 4 years ago · Last pushed almost 3 years ago
Metadata Files
Readme, License, Citation

README.md

Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction


This repository generates differential reaction fingerprints for reactions in Rhea.
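Roughly, DRFP (Probst et al., cited below) extracts circular substructures, written as SMILES strings ("shingles"), from the reactants and from the products. It then keeps only the substructures that appear on one side but not the other (the symmetric difference) and hashes and folds them into a fixed-length bit vector. The toy sketch below illustrates only that last step. It assumes the shingle sets have already been extracted (drfp does this with RDKit), uses `hashlib.blake2b` as a stand-in hash, and `toy_drfp` is a hypothetical helper, not drfp's implementation.

```python
import hashlib


def toy_drfp(reactant_shingles: set[str], product_shingles: set[str], n_bits: int = 2048) -> list[int]:
    """Fold the symmetric difference of two substructure sets into a bit vector (toy version)."""
    # Substructures present on only one side of the reaction, i.e. created or destroyed by it
    changed = reactant_shingles ^ product_shingles
    bits = [0] * n_bits
    for shingle in changed:
        # Stable 32-bit hash (Python's built-in hash() is salted per process);
        # drfp itself uses a different hash function
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=4).digest()
        # Fold the hash into the fixed-length vector with a modulo
        bits[int.from_bytes(digest, "big") % n_bits] = 1
    return bits
```

Because only the changed substructures are hashed, a "reaction" whose two sides share every substructure yields an all-zero vector, and swapping reactants and products yields the same fingerprint.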

🚀 Usage

The SMILES dataframe and DRFP-derived fingerprint dataframe can be loaded from GitHub with:

```python
import pandas as pd

base_url = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"
# Reaction SMILES for Rhea release 127
smiles_url = f"{base_url}/127/reaction_smiles.tsv"
smiles_df = pd.read_csv(smiles_url, sep="\t")

# DRFP fingerprints for the same release; the first column is used as the index
fingerprint_url = f"{base_url}/127/reaction_fingerprints.tsv.gz"
fingerprint_df = pd.read_csv(fingerprint_url, sep="\t", index_col=0)
```
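A common way to compare two reactions once their fingerprints are loaded is Tanimoto (Jaccard) similarity on the bit vectors. The sketch below assumes each row of the fingerprint dataframe is one reaction's binary DRFP vector (0/1 values), so a row can be passed directly, e.g. `tanimoto(fingerprint_df.iloc[0], fingerprint_df.iloc[1])`. `tanimoto` is a hypothetical helper, not part of this repository.

```python
from typing import Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints
    both = sum(1 for x, y in zip(a, b) if x and y)
    # Bits set in at least one fingerprint
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention chosen here: two all-zero fingerprints count as identical
    return both / either if either else 1.0
```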

Here's a 2D PCA scatterplot of the embeddings:

Scatterplot of DRFPs
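A 2D projection like this one can be computed from the fingerprint matrix with principal component analysis. The sketch below uses a plain SVD-based PCA in NumPy. This is an assumption about the general approach, not necessarily how the repository's plot was produced, and `pca_2d` is a hypothetical helper. Applied to the fingerprint dataframe loaded above (via its `.to_numpy()` method), it returns one (x, y) point per reaction.

```python
import numpy as np


def pca_2d(fingerprints) -> np.ndarray:
    """Project each row (one reaction's fingerprint) onto the top two principal components."""
    x = np.asarray(fingerprints, dtype=float)
    # PCA works on mean-centered data
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

For a matrix with one row per Rhea reaction and thousands of bit columns, `full_matrices=False` keeps the decomposition to the size actually needed.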

Analysis

This repository also generates reusable models, trained on Rhea, for predicting enzyme codes from DRFPs. It uses simple classifiers, such as logistic regression, which perform well on this task, as the plot of classifier results shows.

Scatterplot of classifier results
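To illustrate why DRFPs carry information about enzyme class without relying on any ML library, here is a 1-nearest-neighbour sketch: a query reaction gets the label of the most Tanimoto-similar training fingerprint. This is a swapped-in technique for illustration, not one of the repository's models. `predict_ec_1nn` is hypothetical, and the toy fingerprints and top-level EC labels in the usage note are made up.

```python
from typing import Sequence


def _tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto similarity of two binary vectors; two all-zero vectors count as identical."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0


def predict_ec_1nn(query: Sequence[int], train_fps: Sequence[Sequence[int]], train_labels: Sequence[str]) -> str:
    """Return the label of the training fingerprint most similar to the query."""
    if not train_fps:
        raise ValueError("need at least one training fingerprint")
    scores = [_tanimoto(query, fp) for fp in train_fps]
    # Index of the highest score; ties go to the earliest training example
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]
```

For example, with training fingerprints `[[1, 1, 0, 0], [0, 0, 1, 1]]` labelled `["1", "3"]`, the query `[1, 0, 0, 0]` is assigned class `"1"`.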

You can reuse the pre-trained models in combination with drfp like so:

```python
import pystow
from drfp import DrfpEncoder

base_url = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"
url = f"{base_url}/127/models/LogisticRegression.pkl"
# Download once (cached by pystow under bio/rhea/models/127), then unpickle the model
clf = pystow.ensure_pickle("bio", "rhea", "models", "127", url=url)

# Reactions to classify, written as "reactants>>products" SMILES
rxn_smiles = [
    "CO.O[C@@H]1CCNC1.[C-]#[N+]CC(=O)OC>>[C-]#[N+]CC(=O)N1CC[C@@H](O)C1",
    "CCOC(=O)C(CC)c1cccnc1.Cl.O>>CCC(C(=O)O)c1cccnc1",
]
# Encode each reaction as a DRFP bit vector
fps = DrfpEncoder.encode(rxn_smiles)

predictions = clf.predict(fps)
```

Warning: There might be some issues with reloading model weights. Please let me know if this comes up.

♻️ Update

Installing the requirements and running the build script are both handled with tox. The current version of Rhea is looked up with bioversions, so the provenance of the data can be properly traced. Run with:

```shell
$ pip install tox
$ tox
```
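Because each build is keyed by the Rhea version (127 in the examples above), download URLs can be derived from a version string. A minimal sketch follows. It assumes other releases keep the same `docs/<version>/` layout, and `release_urls` is a hypothetical helper. The version string could come from, for example, `bioversions.get_version("rhea")`, which needs network access.

```python
BASE_URL = "https://github.com/cthoyt/rhea-fingerprints/raw/main/docs"


def release_urls(version: str) -> dict[str, str]:
    """Build download URLs for one Rhea release, assuming the docs/<version>/ layout used above."""
    prefix = f"{BASE_URL}/{version}"
    return {
        "smiles": f"{prefix}/reaction_smiles.tsv",
        "fingerprints": f"{prefix}/reaction_fingerprints.tsv.gz",
        "model": f"{prefix}/models/LogisticRegression.pkl",
    }
```

Keeping the version in the path means results computed from one release stay reproducible after newer releases are added.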

Additionally, a GitHub Action runs this update script on a monthly basis.

⚖️ License

Code in this repository is licensed under the MIT License. Parts of the Rhea database redistributed here are licensed under CC BY 4.0 (more information here).

📖 Citation

If you find this useful in your own work, please consider citing:

```bibtex
@software{charles_tapley_hoyt_2023_7591839,
  author    = {Charles Tapley Hoyt},
  title     = {Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction},
  month     = jan,
  year      = 2023,
  publisher = {Zenodo},
  version   = {v124},
  doi       = {10.5281/zenodo.7591839},
  url       = {https://doi.org/10.5281/zenodo.7591839}
}
```

I also gave a talk on this, in case you want to learn more.

🙏 Acknowledgements

Rhea can be cited with:

```bibtex
@article{Lombardot2019,
  author  = {Lombardot, Thierry and Morgat, Anne and Axelsen, Kristian B and Aimo, Lucila and Hyka-Nouspikel, Nevila and Niknejad, Anne and Ignatchenko, Alex and Xenarios, Ioannis and Coudert, Elisabeth and Redaschi, Nicole and Bridge, Alan},
  doi     = {10.1093/nar/gky876},
  journal = {Nucleic acids research},
  number  = {D1},
  pages   = {D596--D600},
  pmid    = {30272209},
  title   = {{Updates in Rhea: SPARQLing biochemical reaction data.}},
  volume  = {47},
  year    = {2019}
}
```

Differential reaction fingerprints can be cited with:

```bibtex
@article{Probst2022,
  abstract = {Differential Reaction Fingerprint DRFP is a chemical reaction fingerprint enabling simple machine learning models running on standard hardware to reach DFT- and deep learning-based accuracies in reaction yield prediction and reaction classification.},
  author   = {Probst, Daniel and Schwaller, Philippe and Reymond, Jean-Louis},
  doi      = {10.1039/D1DD00006C},
  issn     = {2635-098X},
  journal  = {Digital Discovery},
  title    = {{Reaction classification and yield prediction using the differential reaction fingerprint DRFP}},
  url      = {http://xlink.rsc.org/?DOI=D1DD00006C},
  year     = {2022}
}
```

Owner

  • Name: Charles Tapley Hoyt
  • Login: cthoyt
  • Kind: user
  • Location: Bonn, Germany
  • Company: RWTH Aachen University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: Rhea Differential Reaction Fingerprints for Enzyme Classification Prediction
doi: 10.5281/zenodo.7591839 
authors:
  - family-names: Hoyt
    given-names: Charles Tapley
    orcid: 0000-0003-4423-4370


Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cthoyt (1)