onglai-classify-homologues
OngLai: A cheminformatics algorithm to classify homologous chemical series
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 11 committers (9.1%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.6%) to scientific vocabulary
Repository
OngLai: A cheminformatics algorithm to classify homologous chemical series
Basic Info
- Host: GitHub
- Owner: adelenelai
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://doi.org/10.1186/s13321-022-00663-y
- Size: 72.5 MB
Statistics
- Stars: 7
- Watchers: 4
- Forks: 5
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
OngLai: An Algorithm to Classify Homologous Series

Introduction
Homologous series are groups of chemical compounds that share the same core structure(s) but differ in the number of repeating units (RUs) connected end-to-end.
This is an open-source algorithm, implemented using the RDKit, that classifies homologous series within compound datasets provided as SMILES.
For example, homologous series with CH2 and CF2 repeating units were classified in COCONUT and the NORMAN Suspect List Exchange (NORMAN-SLE), datasets containing natural products and environmental chemicals, respectively.
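The definition above implies a simple necessary condition at the level of molecular formulas: two members of the same series differ by a whole number of repeating units. The sketch below checks only that condition. It is a deliberately simplified, formula-based stand-in, not OngLai's structure-based classification (which works on SMILES with RDKit), and it cannot distinguish isomers. The function names `parse_formula` and `ru_multiple` are hypothetical.

```python
import re
from collections import Counter
from typing import Optional


def parse_formula(formula: str) -> Counter:
    """Parse a simple molecular formula such as 'C4H10O' (no brackets, charges or isotopes)."""
    counts: Counter = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(digits) if digits else 1
    return counts


def ru_multiple(formula_a: str, formula_b: str, ru: str) -> Optional[int]:
    """Return n if formula_b == formula_a + n * ru (n may be negative), otherwise None."""
    a, b, unit = parse_formula(formula_a), parse_formula(formula_b), parse_formula(ru)
    elements = set(a) | set(b) | set(unit)
    diff = {el: b[el] - a[el] for el in elements}
    # Derive n from the first element of the repeating unit; it must divide evenly.
    first = next(iter(unit))
    n, remainder = divmod(diff[first], unit[first])
    if remainder != 0:
        return None
    # Every element must change by exactly n times its count in the repeating unit
    # (unit[el] is 0 for elements outside the unit, so those must not change).
    if all(diff[el] == n * unit[el] for el in elements):
        return n
    return None
```

For instance, `ru_multiple("C4H10O", "C6H14O", "CH2")` returns 2 (1-butanol to 1-hexanol), and `ru_multiple("C8HF15O2", "C10HF19O2", "CF2")` returns 2 (PFOA to PFDA). Passing this check says nothing about whether the two structures actually share a core.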
Figures (not reproduced here): example homologous series with a CH2 repeating unit and with a CF2 repeating unit.

Requirements
The algorithm requires RDKit to be installed via conda-forge.
```shell
$ conda create -c conda-forge -n my-rdkit-env rdkit
$ conda activate my-rdkit-env
```
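Before running the script, it can help to confirm that the active interpreter actually sees RDKit. A minimal sketch using only the standard library to probe for the package; the helper name `rdkit_available` is hypothetical.

```python
import importlib.util


def rdkit_available() -> bool:
    """Return True if the 'rdkit' package can be imported by the current interpreter."""
    return importlib.util.find_spec("rdkit") is not None


if __name__ == "__main__":
    if rdkit_available():
        from rdkit import rdBase

        print("RDKit", rdBase.rdkitVersion)
    else:
        print("RDKit not found; activate the conda environment created above.")
```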
Installation
```shell
$ git clone https://github.com/adelenelai/onglai-classify-homologues
$ cd onglai-classify-homologues
$ pip install -e .
```
Note that installing the package with pip is not enough: the repository must also be cloned from GitHub, because the algorithm runs as a script (see Usage below).
Usage
Run:
```shell
$ python nextgen_classify_homols.py [-in <arg>] [-sep <arg>] [-s <arg>] [-n <arg>] [-ru <arg>] [-min <arg>] [-max <arg>] 2>log
```
| Flag | Description |
| --- | ----------- |
| -in --inputcsv
Try:
```shell
$ cd onglai_classify_homologues
$ python nextgen_classify_homols.py -in ../tests/test1_23.csv -s SMILES -n Name -ru '[#6&H2]' -min 3 -max 30 -f 2 2>log
```
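When the script is driven from Python (for example, to sweep several settings), the command line can be assembled as an argument list. A minimal sketch, assuming the script is run from the directory that contains it; `build_command` and `run_classification` are hypothetical helpers, and the flag values are passed through unchanged because this README does not document what `-min`, `-max` or `-f` mean.

```python
import shlex
import subprocess
import sys
from typing import List, Optional, Union

Value = Optional[Union[str, int]]


def build_command(inputcsv: str, sep: Value = None, s: Value = None, n: Value = None,
                  ru: Value = None, min_: Value = None, max_: Value = None,
                  f: Value = None, python: str = sys.executable) -> List[str]:
    """Assemble an argument list for nextgen_classify_homols.py (hypothetical helper)."""
    cmd = [python, "nextgen_classify_homols.py", "-in", inputcsv]
    # Only flags shown in this README; omitted flags are simply not passed.
    for flag, value in (("-sep", sep), ("-s", s), ("-n", n), ("-ru", ru),
                        ("-min", min_), ("-max", max_), ("-f", f)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_classification(cmd: List[str], log_path: str = "log") -> None:
    """Run the script, sending stderr to a log file like the shell's `2>log`."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stderr=log, check=True)


if __name__ == "__main__":
    # Same settings as the 'Try' example above; printed, not executed.
    cmd = build_command("../tests/test1_23.csv", s="SMILES", n="Name", ru="[#6&H2]",
                        min_=3, max_=30, f=2)
    print(shlex.join(cmd))
```

Because `subprocess.run` receives a list rather than a shell string, the SMARTS pattern `[#6&H2]` needs no quoting; in an interactive shell the quotes are required, since `&` would otherwise be interpreted by the shell.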
Successful classification will generate an output directory containing the following files:
- A TXT file containing a summary of the classification results and an explanation of the outputs (`series_no` codes)
- A CSV file containing 8 columns: `series_no`, `cpd_name`, `CanoSmiles_FinalCores`, `SMILES`, `InChI`, `InChIKey`, `molecular_formula` and `monoisotopic_mass`. The first column, `series_no`, contains the results of the homologous series classification. `CanoSmiles_FinalCores` indicates the common core shared by all members of a given series. The remaining columns contain information calculated from the `SMILES`.
- A TXT file of unparseable SMILES that were removed (empty if all SMILES were parsed successfully)
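To inspect the CSV output, rows can be grouped by `series_no`. A minimal sketch using only the standard library and the column names listed above; the function name `series_summary` is hypothetical, and since the output file name is not stated here, it takes an already opened file handle.

```python
import csv
from typing import Dict, List, Set, TextIO, Tuple


def series_summary(handle: TextIO) -> Dict[str, Tuple[Set[str], List[str]]]:
    """Map each series_no to (set of CanoSmiles_FinalCores, list of cpd_name)."""
    summary: Dict[str, Tuple[Set[str], List[str]]] = {}
    for row in csv.DictReader(handle):
        cores, members = summary.setdefault(row["series_no"], (set(), []))
        cores.add(row["CanoSmiles_FinalCores"])
        members.append(row["cpd_name"])
    return summary
```

Open the CSV with `open(path, newline="")` and pass the handle. `series_no` values are kept as strings, so the special codes explained in the TXT summary are not interpreted by this sketch; a series with more than one entry in its core set would be worth a closer look.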
Reproducing the Classification Described in Lai et al.
Classification uses the default settings described above. The commands below run on the sample datasets provided in `input/`; the full datasets have been archived on Zenodo (amend `-in` accordingly to classify the full datasets).
```shell
# activate your rdkit environment

# NORMAN-SLE
$ python nextgen_classify_homols.py -in ../../input/pubchemnormansletreeparentcid981162022-03-21from115115trial.csv -s isosmiles -n cmpdname 2>log

# PubChemLite
$ python nextgen_classify_homols.py -in ../../input/PubChemLiteexposomics20220225_trial.csv -n CompoundName 2>log

# COCONUT
$ python nextgen_classify_homols.py -in ../../input/COCONUTDB2021-11_trial.txt 2>log
```
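The three reproduction runs can also be scripted. A minimal sketch, assuming it is run from the same directory as the commands above; `DATASETS` and `reproduce` are hypothetical names, and the file names are copied verbatim from those commands, so check them against the actual contents of `input/` before running.

```python
import subprocess
import sys
from typing import Dict, List

# Arguments copied from the reproduction commands above (file names unverified).
DATASETS: Dict[str, List[str]] = {
    "NORMAN-SLE": ["-in", "../../input/pubchemnormansletreeparentcid981162022-03-21from115115trial.csv",
                   "-s", "isosmiles", "-n", "cmpdname"],
    "PubChemLite": ["-in", "../../input/PubChemLiteexposomics20220225_trial.csv",
                    "-n", "CompoundName"],
    "COCONUT": ["-in", "../../input/COCONUTDB2021-11_trial.txt"],
}


def reproduce(dry_run: bool = True, python: str = sys.executable) -> List[List[str]]:
    """Assemble one command per dataset and, unless dry_run is True, run each in turn."""
    commands = []
    for name, args in DATASETS.items():
        cmd = [python, "nextgen_classify_homols.py", *args]
        commands.append(cmd)
        if not dry_run:
            # One stderr log per dataset instead of a single shared 'log' file.
            with open(f"log_{name}", "w") as log:
                subprocess.run(cmd, stderr=log, check=True)
    return commands
```

Writing a separate log per dataset is a deliberate change: run back to back, the shell commands above would each truncate the same `log` file, keeping only the last run's messages.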
References and Links
- Lai, A., Schaub, J., Steinbeck, C. et al. An algorithm to classify homologous series within compound datasets. J Cheminform 14, 85 (2022). https://doi.org/10.1186/s13321-022-00663-y
- Poster presented at the 17th German Cheminformatics Conference, Garmisch-Partenkirchen, Germany (May 8-10, 2022)
Acknowledgements
Steffen Neumann, Charles Tapley-Hoyt, Kohulan Rajan, Mahnoor Zulfiqar, Anjana Elapavalore, Zhanyun Wang, Christos Nicolaou, Maximilian Beckers, Greg Landrum, Paolo Tosco. (and Kohulan for the logo :))
License
This project is licensed under Apache 2.0 - see LICENSE for details.
Owner
- Name: Adelene Lai
- Login: adelenelai
- Kind: user
- Location: Luxembourg
- Website: https://adelenel.ai
- Twitter: AdeleneLai
- Repositories: 3
- Profile: https://github.com/adelenelai
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Lai
  given-names: Adelene
  orcid: https://orcid.org/0000-0002-2985-6473
title: "An Algorithm to Classify Homologous Series"
version: 1.0.0
doi: 10.5281/zenodo.6806919
date-released: 2022-07-07
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 113
- Total Committers: 11
- Avg Commits per committer: 10.273
- Development Distribution Score (DDS): 0.681
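Assuming the Development Distribution Score is defined as one minus the share of commits made by the most active committer identity (the 36 commits listed first under Top Committers below), both derived figures above can be reproduced from the totals. Note that the same person appears under several email identities, so these numbers count identities rather than people.

$$
\text{Avg commits per committer} = \frac{113}{11} \approx 10.273,
\qquad
\text{DDS} = 1 - \frac{c_{\text{top}}}{c_{\text{total}}} = 1 - \frac{36}{113} \approx 0.681
$$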
Top Committers
| Name | Email | Commits |
|---|---|---|
| Adelene | a****i@u****u | 36 |
| Adelene LAI | a****i@m****x | 32 |
| adelenelai | a****i@g****m | 13 |
| Adelene LAI | a****i@m****x | 11 |
| Charles Tapley Hoyt | c****t@g****m | 8 |
| alai | a****i@a****x | 5 |
| alai | a****i@a****x | 2 |
| alai | a****i@a****l | 2 |
| Kohulan | k****n@u****e | 2 |
| Steffen Neumann | s****n@i****e | 1 |
| Egon Willighagen | e****n@g****m | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 7
- Total pull requests: 6
- Average time to close issues: 7 days
- Average time to close pull requests: about 1 month
- Total issue authors: 1
- Total pull request authors: 5
- Average comments per issue: 1.0
- Average comments per pull request: 0.67
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- adelenelai (7)
Pull Request Authors
- adelenelai (2)
- Kohulan (1)
- sneumann (1)
- egonw (1)
- cthoyt (1)
Packages
- Total packages: 1
Total downloads:
- pypi 9 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
pypi.org: onglai-classify-homologues
A cheminformatics algorithm to classify homologous series.
- Homepage: https://github.com/adelenelai/onglai-classify-homologues
- Documentation: https://onglai-classify-homologues.readthedocs.io/
- License: Apache
Latest release: 1.0.0
published over 3 years ago
Dependencies
- datamol *
- matplotlib *
- numpy *
- pandas *
- pytest *
- rdkit *

