chemplot

A python package for chemical space visualization.

https://github.com/mcsorkun/chemplot

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, sciencedirect.com, wiley.com
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.5%) to scientific vocabulary

Keywords

chemical-space cheminformatics data-visualization dimensionality-reduction
Last synced: 6 months ago

Repository

A python package for chemical space visualization.

Basic Info
Statistics
  • Stars: 141
  • Watchers: 5
  • Forks: 31
  • Open Issues: 8
  • Releases: 8
Topics
chemical-space cheminformatics data-visualization dimensionality-reduction
Created about 5 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md


ChemPlot

ChemPlot is a Python library for chemical space visualization that allows users to plot the chemical space of their molecular datasets. ChemPlot contains both structural and tailored similarity algorithms to place similar molecules close together, depending on the needs of the user. Moreover, it is easy to use even for non-experts.
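Structural similarity between two molecules is commonly quantified with the Tanimoto coefficient on binary fingerprints: the number of on-bits the two fingerprints share, divided by the number of bits set in either of them. The sketch below assumes the fingerprints are already available as sets of on-bit indices (in practice they would come from a toolkit such as RDKit). The bit indices are made up, and this is not necessarily how ChemPlot computes structural similarity internally.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit index sets."""
    union = fp_a | fp_b
    if not union:
        # No bits set in either fingerprint: treat as no evidence of similarity.
        return 0.0
    return len(fp_a & fp_b) / len(union)


# Hypothetical on-bit indices for two molecules.
mol_1 = {1, 5, 9, 42}
mol_2 = {1, 5, 17, 42, 64}
print(tanimoto(mol_1, mol_2))  # 3 shared bits / 6 bits in the union = 0.5
```

Values range from 0 (no shared bits) to 1 (identical bit sets), so molecules with higher scores would be expected to land closer together in a structural-similarity plot.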

Current Release Info

| Version | Downloads | License | Documentation | Testing |
| --- | --- | --- | --- | --- |
| Conda Version, PyPI version | Conda Downloads, PyPI - Downloads | PyPI - License | Documentation Status | Tests, Coverage Status |

Resources

User Manual

You can find detailed features and examples at the following link: User Manual.

Web Application

ChemPlot is also available as a web application. You can use it at the following link: Web Application.

Paper

Background details on ChemPlot can be found in our paper, which you can download here: Paper.

Installation

There are two different options to install ChemPlot:

Option 1: Use conda

To install ChemPlot using conda, run the following from the command line:

conda install -c conda-forge chemplot

Option 2: Use pip

You can also install ChemPlot using pip:

pip install chemplot

How to use ChemPlot

ChemPlot is a cheminformatics tool whose purpose is to visualize subsets of the chemical space in two dimensions. It uses the RDKit chemistry framework, the scikit-learn API and the umap-learn API.

Getting started

To demonstrate the functions the library offers, we use the BBBP (blood-brain barrier penetration) molecular dataset [1]. BBBP is a set of molecules encoded as SMILES, each assigned a binary label according to its permeability properties. The dataset can be retrieved from the library as a pandas DataFrame.

import chemplot as cp
data_BBBP = cp.load_data("BBBP")

To visualize the molecules in 2D according to their similarity, a Plotter object must first be constructed. This class contains all the functions ChemPlot uses to produce the desired visualizations. A Plotter object is constructed through classmethods, which differ in the type of input fed to the object. In our example we use the method from_smiles and pass three parameters: the list of SMILES from the BBBP dataset, their target values (the binary labels) and the target type (here “C”, which stands for “Classification”).

plotter = cp.Plotter.from_smiles(data_BBBP["smiles"], target=data_BBBP["target"], target_type="C")

Plotting the results

When the Plotter object is constructed, descriptors for each SMILES string are calculated with the mordred library and then selected based on the target values. The number of dimensions for each molecule is then reduced from the number of selected descriptors to only two. ChemPlot offers three different algorithms for this. In this example we first use t-SNE [2].
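The paragraph above states that descriptors are selected based on the target values, but not which rule is used. The sketch below uses one simple stand-in, which is an assumption and not ChemPlot's actual procedure: keep the k descriptor columns with the largest absolute Pearson correlation to the target. The function name select_by_correlation and the random data are hypothetical.

```python
import numpy as np


def select_by_correlation(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k columns of X most correlated (in absolute value) with y.

    Hypothetical stand-in for target-driven descriptor selection.
    """
    Xc = X - X.mean(axis=0)   # centre each descriptor column
    yc = y - y.mean()         # centre the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Pearson r per column; constant columns (zero variance) get r = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = np.where(denom > 0, (Xc * yc[:, None]).sum(axis=0) / denom, 0.0)
    return np.argsort(-np.abs(corr))[:k]


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200).astype(float)  # binary labels, like BBBP
X = rng.normal(size=(200, 50))                  # 50 made-up "descriptors"
X[:, 7] += 3 * y                                # make column 7 informative
print(select_by_correlation(X, y, k=5))         # column 7 should rank first
```

Only the selected columns would then be passed on to the dimensionality reduction, which is what makes the resulting map depend on the target property.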

plotter.tsne()

The output is a DataFrame containing the reduced dimensions and the target values.

| t-SNE-1 | t-SNE-2 | target |
|------------------|------------------|------------------|
| -41.056122 | 0.355575 | 1 |
| -35.535915 | 21.648867 | 1 |
| 23.771597 | -14.438373 | 1 |
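For reference, t-SNE as defined in [2] places the 2D points y_i (the t-SNE-1 and t-SNE-2 columns above) so that Student-t similarities in the plot match Gaussian similarities between the high-dimensional descriptor vectors x_i. Each σ_i is chosen per point so that its conditional distribution has a user-specified perplexity, and the layout minimizes the Kullback-Leibler divergence C between the two distributions:

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i} \sum_{j \neq i} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the objective mainly rewards preserving neighbourhoods, the absolute coordinate values (such as -41.06 above) carry no meaning on their own; only the relative placement of molecules does.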

To visualize the chemical space of the dataset, we use visualize_plot().

plotter.visualize_plot()

[Figure: BBBP chemical space reduced with t-SNE]

The second figure shows the results obtained by reducing the dimensions of the features with Principal Component Analysis (PCA) [3].

plotter.pca()
plotter.visualize_plot()

[Figure: BBBP chemical space reduced with PCA]
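PCA [3] is a linear projection: the descriptor matrix is centred and projected onto its two directions of largest variance. ChemPlot itself builds on scikit-learn; the NumPy sketch below only illustrates the technique via a singular value decomposition, using random stand-in data rather than real descriptors.

```python
import numpy as np


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)  # centre each descriptor
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T     # shape (n_molecules, 2)


rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10)) * np.arange(1, 11)  # columns with increasing spread
coords = pca_2d(X)
print(coords.shape)  # (100, 2)
```

Unlike t-SNE and UMAP, this projection is linear and deterministic up to the sign of each axis.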

The third figure shows the results obtained by reducing the dimensions of the features with UMAP [4].

plotter.umap()
plotter.visualize_plot()

[Figure: BBBP chemical space reduced with UMAP]
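UMAP [4] follows a similar idea to t-SNE with different similarity functions. In the notation of [4], memberships v_{j|i} are computed only for the k nearest neighbours of each point (and are 0 otherwise), with ρ_i the distance from x_i to its nearest neighbour and σ_i a per-point scale. The 2D layout y_i minimizes a fuzzy cross-entropy, where a and b are constants fitted from the minimum-distance setting:

```latex
v_{j|i} = \exp\left(-\frac{d(x_i, x_j) - \rho_i}{\sigma_i}\right),
\qquad
v_{ij} = v_{j|i} + v_{i|j} - v_{j|i}\, v_{i|j}

w_{ij} = \left(1 + a \lVert y_i - y_j \rVert_2^{2b}\right)^{-1},
\qquad
C = \sum_{i \neq j} \left[ v_{ij} \log \frac{v_{ij}}{w_{ij}} + (1 - v_{ij}) \log \frac{1 - v_{ij}}{1 - w_{ij}} \right]
```

The second term of C also penalizes placing unrelated molecules close together, which is one reason UMAP plots often show more clearly separated clusters than t-SNE plots.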

In each figure the molecules are coloured by class value.

Citation

If you use ChemPlot in your scientific projects, we would appreciate it if you cited our paper in the journal Chemistry–Methods:

@article{2022ChemPlot,
    author = {Cihan Sorkun, Murat and Mullaj, Dajt and Koelman, J. M. Vianney A. and Er, Süleyman},
    title = {ChemPlot, a Python Library for Chemical Space Visualization},
    journal = {Chemistry–Methods},
    volume = {2},
    number = {7},
    pages = {e202200005},
    keywords = {chemical space visualization, cheminformatics, molecular similarity, Python, tailored similarity},
    doi = {https://doi.org/10.1002/cmtd.202200005},
    url = {https://chemistry-europe.onlinelibrary.wiley.com/doi/abs/10.1002/cmtd.202200005},
    eprint = {https://chemistry-europe.onlinelibrary.wiley.com/doi/pdf/10.1002/cmtd.202200005},
    abstract = {Visualizing chemical spaces streamlines the analysis of molecular datasets by reducing the information to human perception level, hence it forms an integral piece of molecular engineering, including chemical library design, high-throughput screening, diversity analysis, and outlier detection. We present here ChemPlot, which enables users to visualize the chemical space of molecular datasets in both static and interactive ways. ChemPlot features structural and tailored similarity methods, together with three different dimensionality reduction methods: PCA, t-SNE, and UMAP. ChemPlot is the first visualization software that tackles the activity/property cliff problem by incorporating tailored similarity. With tailored similarity, the chemical space is constructed in a supervised manner considering target properties. Additionally, we propose a metric, the Distance Property Relationship score, to quantify the property difference of similar (i. e. close) molecules in the visualized chemical space. ChemPlot can be installed via Conda or PyPI (pip) and a web application is freely accessible at https://www.amdlab.nl/chemplot/.},
    year = {2022}
}

Contact

For any questions, you can contact us by email:


References:

[1]: Martins, I. F., et al. (2012). A Bayesian approach to in silico blood-brain barrier penetration modeling. Journal of Chemical Information and Modeling, 52(6), 1686-1697.

[2]: van der Maaten, L., Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.

[3]: Wold, S., Esbensen, K., Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3), 37-52.

[4]: McInnes, L., Healy, J., Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.

Citation (CITATION.bib)

@article{2022ChemPlot,
    author = {Cihan Sorkun, Murat and Mullaj, Dajt and Koelman, J. M. Vianney A. and Er, Süleyman},
    title = {ChemPlot, a Python Library for Chemical Space Visualization},
    journal = {Chemistry–Methods},
    volume = {2},
    number = {7},
    pages = {e202200005},
    keywords = {chemical space visualization, cheminformatics, molecular similarity, Python, tailored similarity},
    doi = {https://doi.org/10.1002/cmtd.202200005},
    url = {https://chemistry-europe.onlinelibrary.wiley.com/doi/abs/10.1002/cmtd.202200005},
    eprint = {https://chemistry-europe.onlinelibrary.wiley.com/doi/pdf/10.1002/cmtd.202200005},
    abstract = {Visualizing chemical spaces streamlines the analysis of molecular datasets by reducing the information 
    to human perception level, hence it forms an integral piece of molecular engineering, including chemical library design, 
    high-throughput screening, diversity analysis, and outlier detection. We present here ChemPlot, which enables users to 
    visualize the chemical space of molecular datasets in both static and interactive ways. ChemPlot features structural and 
    tailored similarity methods, together with three different dimensionality reduction methods: PCA, t-SNE, and UMAP. 
    ChemPlot is the first visualization software that tackles the activity/property cliff problem by incorporating tailored similarity. 
    With tailored similarity, the chemical space is constructed in a supervised manner considering target properties. Additionally, 
    we propose a metric, the Distance Property Relationship score, to quantify the property difference of similar (i. e. close) 
    molecules in the visualized chemical space. ChemPlot can be installed via Conda or PyPI (pip) and a web application is freely 
    accessible at https://www.amdlab.nl/chemplot/.},
    year = {2022}
}

GitHub Events

Total
  • Create event: 2
  • Issues event: 4
  • Release event: 4
  • Watch event: 35
  • Issue comment event: 12
  • Push event: 26
  • Pull request event: 2
  • Fork event: 5
Last Year
  • Create event: 2
  • Issues event: 4
  • Release event: 4
  • Watch event: 35
  • Issue comment event: 12
  • Push event: 26
  • Pull request event: 2
  • Fork event: 5

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 344
  • Total Committers: 4
  • Avg Commits per committer: 86.0
  • Development Distribution Score (DDS): 0.387
Past Year
  • Commits: 21
  • Committers: 3
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.333
Top Committers
Name Email Commits
dajtmullaj 3****j@u****m 211
dajtmullaj d****j@s****l 65
dajtmullaj d****t@c****l 63
Murat Cihan Sorkun m****n@g****m 5
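Assuming the commonly used definition DDS = 1 - (commits by the most active committer / total commits), the all-time Development Distribution Score above can be reproduced from the Top Committers table, whose four entries account for 211 + 65 + 63 + 5 = 344 commits. The helper name below is hypothetical.

```python
def development_distribution_score(commits: list[int]) -> float:
    """DDS = 1 - (commits of the top committer / total commits), assumed definition."""
    total = sum(commits)
    if total == 0:
        # No commits at all: report 0.0 rather than dividing by zero.
        return 0.0
    return 1 - max(commits) / total


all_time = [211, 65, 63, 5]  # commit counts from the Top Committers table
print(round(development_distribution_score(all_time), 3))  # 0.387
print(sum(all_time) / len(all_time))  # 86.0 average commits per committer
```

A score near 0 means one committer did almost all the work; here the top entry alone accounts for about 61% of commits.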
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 20
  • Total pull requests: 9
  • Average time to close issues: 5 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 17
  • Total pull request authors: 4
  • Average comments per issue: 1.9
  • Average comments per pull request: 0.56
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 1
  • Average time to close issues: 4 months
  • Average time to close pull requests: 6 days
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 5.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jessielyons (3)
  • xoolCCG (2)
  • alfredoq (1)
  • dajtmullaj (1)
  • GattiMh (1)
  • Prospero1988 (1)
  • cjvsimoes (1)
  • Nickspizza001 (1)
  • jeep3 (1)
  • khoroshyy (1)
  • XuanZhoudiffer (1)
  • shkao (1)
  • cyan20200410 (1)
  • egonw (1)
  • Kohulan (1)
Pull Request Authors
  • dajtmullaj (6)
  • JacksonBurns (2)
  • Kohulan (2)
  • mrcblt (1)
Top Labels
Issue Labels
enhancement (3) documentation (1)
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 191 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 13
  • Total maintainers: 1
pypi.org: chemplot

A python library for chemical space visualization.

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 191 Last month
Rankings
Stargazers count: 7.5%
Forks count: 8.2%
Average: 9.6%
Dependent packages count: 10.1%
Downloads: 10.8%
Dependent repos count: 11.5%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: chemplot

In recent decades, Machine Learning (ML) applications have had a great impact on molecular and materials science. However, every ML model requires a definition of its applicability domain. ChemPlot is a Python library for chemical space visualization that allows users to plot the chemical space of their datasets. Its algorithms use both structural and tailored similarity. Moreover, it is easy to use even for non-experts.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 36.4%
Forks count: 37.3%
Average: 39.7%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • chemplot *
  • readthedocs-sphinx-search ==0.1.1
  • sphinx ==4.2.0
  • sphinx_rtd_theme ==1.0.0
requirements.txt pypi
  • bokeh >=2.2.3
  • matplotlib >=3.3.2
  • mordred >=1.2.0
  • networkx >=2.5
  • numpy >=1.19.2
  • pandas >=1.1.3
  • pytest >=6.2.5
  • pytest-cov >=3.0.0
  • scikit-learn >=0.23.2
  • scipy >=1.5.2
  • seaborn >=0.11.1
  • umap-learn >=0.5.1
setup.py pypi
  • bokeh >=2.2.3
  • matplotlib >=3.3.2
  • mordred >=1.2.0
  • networkx >=2.5
  • numpy >=1.19.2
  • pandas >=1.1.3
  • scikit-learn >=0.23.2
  • scipy >=1.5.2
  • seaborn >=0.11.1
  • umap-learn >=0.5.1