https://github.com/bayer-group/mlr-xai-selfies

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, sciencedirect.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary

Keywords

beat-undefined
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Bayer-Group
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 56.6 KB
Statistics
  • Stars: 8
  • Watchers: 6
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Topics
beat-undefined
Created about 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Codeowners

README.md

mlr-xai-selfies

SELFIES mutation method to obtain atom attributions for any QSAR model.

What are SELFIES?

SELFIES (SELF-referencing Embedded Strings) are a string representation for molecules. More information can be found in the paper, and the code to compute SELFIES from molecules is available on GitHub. Unlike SMILES, SELFIES are backed by a formal grammar that ensures chemical validity: any SELFIES string corresponds to a valid molecule.
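A SELFIES string is a sequence of bracketed symbols such as [C] or [=O], and these symbols (not single characters) are the units that XAI-SELFIES mutates. The selfies package provides split_selfies for this. The dependency-free sketch below uses a hypothetical split_symbols helper and assumes that only bracketed symbols and the "." fragment separator occur in the input.

```python
from __future__ import annotations

import re

# Hypothetical helper, assumed to behave like selfies.split_selfies:
# a SELFIES string is a sequence of bracketed symbols such as "[C]" or "[=O]",
# with "." separating disconnected fragments.
_SYMBOL = re.compile(r"\[[^\]]*\]|\.")


def split_symbols(selfies_string: str) -> list[str]:
    """Split a SELFIES string into the symbols that get mutated one at a time."""
    symbols = _SYMBOL.findall(selfies_string)
    # Reject input containing characters that are not part of any symbol.
    if "".join(symbols) != selfies_string:
        raise ValueError(f"not a well-formed SELFIES string: {selfies_string!r}")
    return symbols


if __name__ == "__main__":
    # "[C][C][O]" is the SELFIES encoding of ethanol (SMILES "CCO").
    print(split_symbols("[C][C][O]"))  # ['[C]', '[C]', '[O]']
```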

What is the idea behind xai-selfies?

XAI-SELFIES can be viewed as a generalization of the XAI method published in this paper and can be considered an outcome of the Bayer LSC project "Explainable AI".

The general concept is to explain any trained QSAR model using string mutations to obtain character-level attribution scores. The overall algorithm is as follows:

For each input molecule of interest:
  • Obtain the corresponding SELFIES string.
  • Obtain the prediction from the model to explain.
  • For each position in the string:
      • Mutate the string at the position of interest by replacing the SELFIES character with all possible characters in the SELFIES vocabulary.
      • Check the mutated strings for SELFIES validity.
      • Optionally, check their distance to the input molecule.
      • Obtain predictions for all valid mutated strings.
      • Attribution_for_position_i = original prediction - average(mutated predictions)
  • Convert the SELFIES attributions into atom attributions using SELFIES-to-SMILES correspondences.
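The loop above can be sketched in plain Python as follows. This is not the repository's implementation: position_attributions, atom_attributions, the predict and is_valid callables, and the symbol_to_atom mapping are hypothetical stand-ins. The sketch also assumes that each position is mutated to every vocabulary symbol except the one already there, and that a position with no valid mutant receives an attribution of zero.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Sequence

# Hypothetical callables standing in for the real model and checks.
Predictor = Callable[[Sequence[str]], float]  # SELFIES symbols -> prediction
Validator = Callable[[Sequence[str]], bool]   # SELFIES symbols -> keep mutant?


def position_attributions(
    symbols: Sequence[str],
    vocabulary: Sequence[str],
    predict: Predictor,
    is_valid: Validator = lambda s: True,
) -> list[float]:
    """Attribution_i = original prediction - mean(predictions of valid mutants at i)."""
    original = predict(symbols)
    scores = []
    for i, current in enumerate(symbols):
        mutant_predictions = []
        for replacement in vocabulary:
            if replacement == current:
                continue  # assumption: the unchanged string is not a mutant
            mutant = list(symbols)
            mutant[i] = replacement
            # Validity check (and, optionally, a distance filter) goes here.
            if is_valid(mutant):
                mutant_predictions.append(predict(mutant))
        # Assumption: no valid mutant at this position means zero attribution.
        scores.append(original - mean(mutant_predictions) if mutant_predictions else 0.0)
    return scores


def atom_attributions(scores: Sequence[float], symbol_to_atom: dict[int, int]) -> dict[int, float]:
    """Map per-symbol scores to atom indices.

    symbol_to_atom is a hypothetical SELFIES-position -> atom-index mapping;
    positions absent from it (e.g. branch or ring symbols) are dropped.
    """
    return {symbol_to_atom[pos]: score for pos, score in enumerate(scores) if pos in symbol_to_atom}
```

With a toy "model" that simply counts [O] symbols, position_attributions(["[C]", "[C]", "[O]"], ["[C]", "[N]", "[O]"], lambda s: s.count("[O]")) gives each carbon -0.5 and the oxygen 1: mutating a carbon to [O] raises the prediction on average, while mutating the oxygen away lowers it.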

How do I get started?

  • Create a conda environment with all necessary dependencies using the environment.yml file: conda env create -f environment.yml

  • Have a look at example.py: running it downloads a public logD dataset, trains a demo QSAR model on this dataset, and computes attribution vectors for the first 200 molecules of the dataset. It also shows what the pretrained model and the featurizer should look like.
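The exact interface is defined in example.py and is not reproduced here. As a hedged illustration only, the sketch below assumes a featurizer that maps a batch of SMILES to feature vectors and a scikit-learn-style model with a batch predict method. Featurizer, Model, char_count_featurizer, LinearModel and predict_smiles are all hypothetical names.

```python
from __future__ import annotations

from typing import List, Protocol, Sequence


class Featurizer(Protocol):
    """Assumed shape: a batch of SMILES in, a batch of feature vectors out."""

    def __call__(self, smiles: Sequence[str]) -> List[List[float]]: ...


class Model(Protocol):
    """Assumed shape: scikit-learn-style regressor with a batch predict()."""

    def predict(self, features: List[List[float]]) -> List[float]: ...


def char_count_featurizer(smiles: Sequence[str]) -> List[List[float]]:
    # Toy featurizer for illustration: counts of 'C', 'N' and 'O' characters.
    return [[float(s.count(c)) for c in "CNO"] for s in smiles]


class LinearModel:
    """Toy linear model standing in for a pretrained QSAR model."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0):
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: List[List[float]]) -> List[float]:
        return [self.bias + sum(w * x for w, x in zip(self.weights, row)) for row in features]


def predict_smiles(model: Model, featurizer: Featurizer, smiles: Sequence[str]) -> List[float]:
    """What an explainer needs: SMILES in, one prediction per molecule out."""
    return model.predict(featurizer(smiles))
```

Keeping the featurizer separate from the model means the explainer only has to produce SMILES for each mutant and can treat everything downstream as a black box.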

How can I visualize attributions computed with XAI-SELFIES?

Several ways!

  • The first one would be to use the RDKit library, specifically by using the SimilarityMaps functionality as shown here.

  • Another option is to use the beautiful xSMILES library published by Henry Heberle, which can work as a Jupyter notebook plugin.

  • Finally, we have also built CIME, a visual analytics platform that integrates xSMILES in a web app and lets you upload datasets in SDF format (i.e., you can simply save the pandas DataFrame obtained from running XAI-SELFIES as an SDF file and move on to CIME to analyze your data). The public version of CIME is available here and can be launched as a Docker container.

Acknowledgements

Code developed by Floriane Montanari while employed in the Machine Learning Group at Bayer. Kudos to Linlin Zhao (whose xBCF implementation helped make XAI-SELFIES), Marco Bertolini and Thomas Wolf for contributing ideas!

Owner

  • Name: Bayer Open Source
  • Login: Bayer-Group
  • Kind: organization

Science for a better life

GitHub Events

Total
  • Push event: 6
  • Pull request event: 7
  • Fork event: 1
Last Year
  • Push event: 6
  • Pull request event: 7
  • Fork event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 13
  • Total Committers: 2
  • Avg Commits per committer: 6.5
  • Development Distribution Score (DDS): 0.077
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
flo f****i@b****m 12
Jan Wollschlaeger j****r@b****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 1
  • Total pull requests: 1
  • Average time to close issues: 20 days
  • Average time to close pull requests: 3 minutes
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • kerjans (3)
  • jmwoll (2)
  • ivanmilevtues (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

environment.yml pypi
  • selfies *
  • tqdm *
setup.py pypi