decimer_short_communication
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Kohulan
- License: mit
- Language: Python
- Default Branch: main
- Size: 544 KB
Statistics
- Stars: 5
- Watchers: 3
- Forks: 1
- Open Issues: 0
- Releases: 1
Created almost 5 years ago · Last pushed over 4 years ago
README.md
Performance of chemical structure string representations for chemical image recognition using transformers
- The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, led to the development of new string representations for chemical structures. In this study, the translation of chemical structure depictions in the form of bitmap images to corresponding molecular string representations was examined. An analysis of the recently developed DeepSMILES and SELFIES representations in comparison with the most commonly used SMILES representation is presented, in which the ability to translate image features into string representations with transformer models was specifically tested. The SMILES representation exhibits the best overall performance, whereas SELFIES guarantees valid chemical structures. DeepSMILES performs in between SMILES and SELFIES, and InChIs are not appropriate for the learning task. All investigations were performed using publicly available datasets, and the code used to train and evaluate the models has been made available to the public.
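The point about creating meaningful tokens from a string representation can be illustrated with a small sketch. This is not the tokenizer used in the study: the regular expression below is a commonly used atom-level SMILES pattern, and `tokenize_smiles` is a hypothetical helper name chosen for illustration. It shows why naive character splitting is a poor fit for SMILES (for example, `Cl` and `Br` are single atoms, and a bracket atom such as `[C@@H]` is one unit).

```python
import re

# Assumption: a widely used atom-level SMILES pattern, not taken from this repository.
# Order matters: bracket atoms and two-letter halogens are tried before single letters.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens (hypothetical helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips unmatched characters, so check that nothing was dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    example = "ClCCBr"
    # Character-level split breaks the halogens apart.
    print(list(example))
    # Atom-level split keeps 'Cl' and 'Br' as single tokens.
    print(tokenize_smiles(example))
```

DeepSMILES and SELFIES change the syntax being tokenized (for instance, SELFIES strings are already sequences of bracketed symbols), so a pattern like this one applies to plain SMILES only.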
Usage
- To use the scripts available here, clone the repository to your local disk and work with it from there.
- The datasets are available on Zenodo as SMILES; you can use the provided SMILES Depictor Java code to generate the image files.
We recommend using DECIMER inside a Conda environment to facilitate the installation of the dependencies.
- Conda can be downloaded as part of the Anaconda or the Miniconda platforms (Python 3.7). We recommend installing Miniconda3. On Linux you can get it with:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh
For more on using the model for training and evaluation, please refer to our DECIMER Image Transformer repository.
License:
- This project is licensed under the MIT License; see the LICENSE file for details.
Citation
- Rajan K, Steinbeck C, Zielesny A. Performance of chemical structure string representations for chemical image recognition using transformers. ChemRxiv. Cambridge: Cambridge Open Engage; 2021. This content is a preprint and has not been peer-reviewed.
Acknowledgement
- We are grateful to @Google for making free computing time on their TensorFlow Research Cloud infrastructure available to us.
Author: Kohulan
Project Website: DECIMER
Research Group
Owner
- Name: Kohulan Rajan
- Login: Kohulan
- Kind: user
- Location: Jena, Germany
- Company: Friedrich-Schiller-University
- Website: https://kohulanr.com
- Twitter: KohulanRajan
- Repositories: 12
- Profile: https://github.com/Kohulan
PostDoc @Steinbeck-Lab. Currently based at Friedrich-Schiller-University, Jena.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
title: "Performance of chemical structure string representations for chemical image recognition using transformers"
abstract: "The use of molecular string representations for deep learning in chemistry has been steadily increasing in recent years. The complexity of existing string representations, and the difficulty in creating meaningful tokens from them, led to the development of new string representations for chemical structures. In this study, the translation of chemical structure depictions in the form of bitmap images to corresponding molecular string representations was examined. An analysis of the recently developed DeepSMILES and SELFIES representations in comparison with the most commonly used SMILES representation is presented, in which the ability to translate image features into string representations with transformer models was specifically tested. The SMILES representation exhibits the best overall performance, whereas SELFIES guarantees valid chemical structures. DeepSMILES performs in between SMILES and SELFIES, and InChIs are not appropriate for the learning task. All investigations were performed using publicly available datasets, and the code used to train and evaluate the models has been made available to the public."
authors:
  - family-names: "Rajan"
    given-names: "Kohulan"
    orcid: "https://orcid.org/0000-0003-1066-7792"
  - family-names: "Steinbeck"
    given-names: "Christoph"
    orcid: "https://orcid.org/0000-0001-6966-0814"
  - family-names: "Zielesny"
    given-names: "Achim"
    orcid: "https://orcid.org/0000-0003-0722-4229"
version: 1.0
date-released: "2021-09-17"
identifiers:
  - description: "This is the scientific publication which describes the software"
    type: doi
    value: "n/a"
  - description: "This is the archived snapshot"
    type: doi
    value: "10.5281/zenodo.5513452"
  - description: "Data archive"
    type: doi
    value: "10.5281/zenodo.5155037"
license: MIT
repository-code: "https://github.com/Kohulan/DECIMER_Short_Communication"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0


