https://github.com/cthoyt/syba

Synthetic Bayesian Classification

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: found 3 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.4%) to scientific vocabulary


Repository

Basic Info

Host: GitHub
Owner: cthoyt
License: gpl-3.0
Default Branch: master
Size: 71.7 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Fork of lich-uct/syba

Created about 5 years ago · Last pushed about 5 years ago

https://github.com/cthoyt/syba/blob/master/

# SYBA
SYnthetic BAyesian classifier (SYBA) is a Python package for the classification of organic compounds as easy-to-synthesize (ES) or hard-to-synthesize (HS). SYBA is a fragment-based method: the molecule is decomposed into ECFP4-like fragments, a score is assigned to each fragment, and the fragment scores are summed to give the SYBA score. If the SYBA score is positive, the molecule is considered ES; otherwise it is considered HS. Fragment scores are part of the SYBA algorithm and were obtained by analyzing the frequency of fragments in databases of ES and HS compounds. ES compounds were randomly selected from the ZINC15 [http://zinc.docking.org/] database; HS compounds were generated by the Nonpher [https://github.com/lich-uct/nonpher] approach. More details can be found in the SYBA [as soon as accepted] and Nonpher [http://dx.doi.org/10.1186/s13321-017-0206-2] papers.
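
The scoring rule described above (sum the fragment scores, then classify by sign) can be sketched in a few lines of plain Python. This is only an illustration, not SYBA's implementation: the fragment names, the score values, and the helper names `toy_syba_score` and `classify` are all hypothetical, and treating an unknown fragment as contributing zero is an assumption made here for simplicity.

```python
from typing import Dict, Iterable

# Hypothetical fragment scores for illustration only; the real scores
# ship with SYBA and are keyed by ECFP4-like fragment identifiers.
TOY_FRAGMENT_SCORES: Dict[str, float] = {
    "aromatic_ring": 1.5,
    "ester": 0.5,
    "strained_bridge": -2.5,
}


def toy_syba_score(fragments: Iterable[str], scores: Dict[str, float]) -> float:
    """Sum the scores of the distinct fragments found in a molecule.

    Assumption: fragments missing from the table contribute 0.0.
    """
    return sum(scores.get(fragment, 0.0) for fragment in set(fragments))


def classify(score: float) -> str:
    """Positive score means easy-to-synthesize (ES), otherwise HS."""
    return "ES" if score > 0 else "HS"


easy = toy_syba_score(["aromatic_ring", "ester"], TOY_FRAGMENT_SCORES)
hard = toy_syba_score(["aromatic_ring", "strained_bridge"], TOY_FRAGMENT_SCORES)
print(easy, classify(easy))
print(hard, classify(hard))
```

Note that a score of exactly zero falls on the HS side, following the "positive, otherwise" wording above.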

## Installation
### Prerequisites
#### Supported platforms:
* All platforms

#### Dependencies
* RDKit [https://github.com/rdkit/rdkit] (recommended version 2018_03_1 or later)

### Installation with Anaconda
SYBA is distributed as a Conda package. Conda is an open-source package and environment management system that simplifies setting up a development environment for a project. To install a Conda package, you need either the full Anaconda [https://www.anaconda.com/] distribution or its lightweight variant, Miniconda [https://docs.conda.io/en/latest/miniconda.html]. With Anaconda or Miniconda available, install SYBA by running the following command from the Linux terminal:
```bash
conda install -c rdkit -c lich syba
```

### Installation with setup.py
Once you have RDKit [https://github.com/rdkit/rdkit] installed, you can install SYBA from its directory with the following command:
```bash
python setup.py install
```

## Quick start
SYBA input is a CSV (comma-separated values) file with the columns CMPND_ID,SMILES,OTHER_COLUMNS. OTHER_COLUMNS can contain any additional data; these columns are skipped. Output is a CSV file in the format ID,SMILES,SYBA_SCORE. The SYBA score reflects how confident the classifier is in its prediction (i.e., the SYBA score cannot be taken as a measure of the ease of synthesis). Negative values indicate a hard-to-synthesize compound and positive values an easy-to-synthesize one.

SYBA classification is performed by the following command:

```bash
python -m syba.syba [INPUT_FILE [OUTPUT_FILE]]
```
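
Once an output file exists, its rows can be split into ES and HS groups by the sign of the score. The sketch below assumes only the ID,SMILES,SYBA_SCORE column order given above; the function name `split_by_class` is hypothetical, and the scores in the example data are made up. Because it is not stated here whether the output contains a header row, rows whose third column is not numeric are skipped.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

Row = Tuple[str, str, float]


def split_by_class(handle: TextIO) -> Dict[str, List[Row]]:
    """Group ID,SMILES,SYBA_SCORE rows into ES (score > 0) and HS (otherwise)."""
    groups: Dict[str, List[Row]] = {"ES": [], "HS": []}
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # blank or malformed line
        try:
            score = float(row[2])
        except ValueError:
            continue  # header row or non-numeric score
        label = "ES" if score > 0 else "HS"
        groups[label].append((row[0], row[1], score))
    return groups


# Made-up output data; real scores come from running SYBA.
example = io.StringIO(
    "ID,SMILES,SYBA_SCORE\n"
    "cmpd1,CCO,12.5\n"
    "cmpd2,C1CC1,-3.0\n"
)
groups = split_by_class(example)
print(groups["ES"])
print(groups["HS"])
```

To read a real file, pass `open(path, newline="")` instead of the `io.StringIO` object.
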
## Use in Python script
### Basic usage
```python
from rdkit import Chem
from syba.syba import SybaClassifier

syba = SybaClassifier()
syba.fitDefaultScore()
smi = "O=C(C)Oc1ccccc1C(=O)O"
syba.predict(smi)
# syba also works with RDKit Mol objects
mol = Chem.MolFromSmiles(smi)
syba.predict(mol=mol)
# syba.predict takes two keyword parameters, "smi" and "mol"; if both are provided, "smi" takes priority and the score is calculated for the compound it defines
syba.predict(smi=smi, mol=mol)
```
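
To score several compounds from a script, the `predict` call shown above can be wrapped in a small helper that returns rows in the same ID,SMILES,SYBA_SCORE order as the command-line output. This is a sketch under assumptions: `score_compounds` is a hypothetical helper, not part of SYBA, and `fake_predict` is a stand-in so the example runs without SYBA or RDKit installed. How `predict` handles invalid SMILES is not covered here.

```python
from typing import Callable, Iterable, List, Tuple


def score_compounds(
    compounds: Iterable[Tuple[str, str]],
    predict: Callable[[str], float],
) -> List[Tuple[str, str, float]]:
    """Return (id, smiles, score) rows for (id, smiles) pairs.

    `predict` is any callable mapping a SMILES string to a score,
    for example the `predict` method of a fitted SybaClassifier.
    """
    return [(cmpd_id, smi, predict(smi)) for cmpd_id, smi in compounds]


def fake_predict(smi: str) -> float:
    """Stand-in scorer with no chemical meaning; replace with syba.predict."""
    return float(len(smi)) - 5.0


rows = score_compounds(
    [("aspirin", "O=C(C)Oc1ccccc1C(=O)O"), ("ethanol", "CCO")],
    fake_predict,
)
for cmpd_id, smi, score in rows:
    print(cmpd_id, smi, score)
```

With SYBA installed, the same helper would be called as `score_compounds(pairs, syba.predict)` after `syba.fitDefaultScore()`.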

## SYBA workflow
SYBA training (i.e., calculation of the SYBA fragment scores) is demonstrated in the Jupyter notebook `docs/notebooks/prepare_fragment_counts.ipynb`. An example of classifying a new compound with SYBA, as well as with SAScore, SCScore, and a random forest, is available in the Jupyter notebook `docs/notebooks/prepare_results.ipynb`. Jupyter can be installed from Conda with the command `conda install jupyter`.

Owner

Name: Charles Tapley Hoyt
Login: cthoyt
Kind: user
Location: Bonn, Germany
Company: RWTH Aachen University

Website: https://cthoyt.com
Repositories: 489
Profile: https://github.com/cthoyt
