elementembeddings

Python package to interact with high-dimensional representations of the chemical elements

https://github.com/wmd-group/elementembeddings

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: rsc.org, zenodo.org
  • Committers with academic emails
    2 of 5 committers (40.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.4%) to scientific vocabulary

Keywords

computational-chemistry machine-learning materials-design python

Keywords from Contributors

materials-science materials-informatics materials-screening mesh interpretability sequences projection interactive optim hacking
Last synced: 6 months ago

Repository

Python package to interact with high-dimensional representations of the chemical elements

Basic Info
Statistics
  • Stars: 44
  • Watchers: 4
  • Forks: 4
  • Open Issues: 7
  • Releases: 10
Topics
computational-chemistry machine-learning materials-design python
Created almost 4 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

ElementEmbeddings

The Element Embeddings package provides high-level tools for analysing elemental and ionic species embeddings data. This primarily involves visualising the correlation between embedding schemes using different statistical measures.

Motivation

Machine learning approaches for materials informatics have become increasingly widespread. Some of these involve deep learning techniques in which the representation of the elements is learned rather than specified by the user of the model. While an important goal of machine learning training is to minimise the chosen error function to make more accurate predictions, it is also important for us as materials scientists to be able to interpret these models. As such, we aim to evaluate and compare different atomic embedding schemes in a consistent framework.

Getting started

ElementEmbeddings' main feature, the Embedding class, is accessible by importing the class.

Installation

The latest stable release can be installed via pip using:

```bash
pip install ElementEmbeddings
```

Alternatively, ElementEmbeddings is available via conda through the conda-forge channel on Anaconda Cloud:

```bash
conda install -c conda-forge elementembeddings
```

For installing the development or documentation dependencies via pip:

```bash
pip install "ElementEmbeddings[dev]"
pip install "ElementEmbeddings[docs]"
```

For development, you can clone the repository and install the package in editable mode. To clone the repository and make a local installation, run the following commands:

```bash
git clone https://github.com/WMD-group/ElementEmbeddings.git
cd ElementEmbeddings
pip install -e ".[docs,dev]"
```

With -e, pip will create links to the source folder so that changes to the code are immediately reflected in the installed package.

Usage

For simple usage, you can instantiate an Embedding object using one of the embeddings in the data directory. For this example, let's use the magpie elemental representation.

```pycon
>>> # Import the class
>>> from elementembeddings.core import Embedding

>>> # Load the magpie data
>>> magpie = Embedding.load_data("magpie")
```

We can access some of the properties of the Embedding class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

```pycon
>>> # Print out some of the properties of the ElementEmbeddings class
>>> print(f"The magpie representation has embeddings of dimension {magpie.dim}")
>>> print(
...     f"The magpie representation contains these elements: \n {magpie.element_list}"
... )  # Prints out all the elements considered for this representation
>>> print(
...     f"The magpie representation contains these features: \n {magpie.feature_labels}"
... )  # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']
The magpie representation contains these features:
['Number', 'MendeleevNumber', 'AtomicWeight', 'MeltingT', 'Column', 'Row', 'CovalentRadius', 'Electronegativity', 'NsValence', 'NpValence', 'NdValence', 'NfValence', 'NValence', 'NsUnfilled', 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled', 'GSvolume_pa', 'GSbandgap', 'GSmagmom', 'SpaceGroupNumber']
```

Plotting

We can quickly generate heatmaps of distance/similarity measures between the element vectors using heatmap_plotter, and plot the representations in two dimensions using dimension_plotter; both are provided by the plotter module. Before we do that, we standardise the embedding using the standardise method of the Embedding class.

```python
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True)  # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(
    embedding=magpie,
    metric="cosine_similarity",
    show_axislabels=False,
    cmap="Blues_r",
    ax=ax,
    **heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()
```

Cosine similarity heatmap of the magpie representation
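
To make the two operations behind this heatmap concrete, here is a minimal pure-Python sketch of feature standardisation followed by pairwise cosine similarity. It is an illustration under stated assumptions, not the package's implementation: the three-element `toy_embeddings` dictionary and the helper functions `standardise` and `cosine_similarity` are hypothetical, the feature values are made up, and the population standard deviation is assumed for scaling (the package's standardise method may differ in detail).

```python
import math

# Toy 3-dimensional "embeddings"; the values are made up for illustration
# and are NOT taken from the magpie data.
toy_embeddings = {
    "A": [1.0, 10.0, 0.5],
    "B": [2.0, 20.0, 0.7],
    "C": [3.0, 60.0, 0.3],
}


def standardise(embeddings):
    """Scale each feature to zero mean and unit variance across elements."""
    elements = list(embeddings)
    n_features = len(embeddings[elements[0]])
    columns = [[embeddings[el][j] for el in elements] for j in range(n_features)]
    means = [sum(col) / len(col) for col in columns]
    # Population standard deviation (an assumption of this sketch);
    # a constant feature would need special handling to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return {
        el: [(x - m) / s for x, m, s in zip(embeddings[el], means, stds)]
        for el in elements
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; always lies in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


scaled = standardise(toy_embeddings)

# Pairwise similarity "matrix", keyed by (element, element)
similarity = {
    (a, b): cosine_similarity(scaled[a], scaled[b]) for a in scaled for b in scaled
}
```

Because cosine similarity is bounded by -1 and 1, fixing the colour scale with vmin=-1 and vmax=1 makes heatmaps of different representations directly comparable.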

```python
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
    embedding=magpie,
    reducer="umap",
    n_components=2,
    ax=ax,
    adjusttext=True,
    reducer_params=reducer_params,
    scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
# Replace the axes legend with a figure-level legend placed outside the plot
ax.legend().remove()
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(1.25, 0.5), loc="center right", ncol=1)

fig.tight_layout()
fig.show()
```

Scatter plot of the Magpie representation reduced to 2 dimensions using UMAP

Compositions

The package can also be used to featurise compositions. Your data could be a list of formula strings or a pandas dataframe of the following format:

| formula |
| ------- |
| CsPbI3  |
| Fe2O3   |
| NaCl    |
| ZnS     |

The composition_featuriser function can be used to featurise the data. The compositions can be featurised using different representation schemes and different types of pooling through the embedding and stats arguments respectively.

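```python
import pandas as pd

from elementembeddings.composition import composition_featuriser

# Example input: a dataframe with a "formula" column, as in the table above
df = pd.DataFrame({"formula": ["CsPbI3", "Fe2O3", "NaCl", "ZnS"]})

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```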

| formula | mean_Number | mean_MendeleevNumber | mean_AtomicWeight | mean_MeltingT | mean_Column | mean_Row | mean_CovalentRadius | mean_Electronegativity | mean_NsValence | mean_NpValence | mean_NdValence | mean_NfValence | mean_NValence | mean_NsUnfilled | mean_NpUnfilled | mean_NdUnfilled | mean_NfUnfilled | mean_NUnfilled | mean_GSvolume_pa | mean_GSbandgap | mean_GSmagmom | mean_SpaceGroupNumber | sum_Number | sum_MendeleevNumber | sum_AtomicWeight | sum_MeltingT | sum_Column | sum_Row | sum_CovalentRadius | sum_Electronegativity | sum_NsValence | sum_NpValence | sum_NdValence | sum_NfValence | sum_NValence | sum_NsUnfilled | sum_NpUnfilled | sum_NdUnfilled | sum_NfUnfilled | sum_NUnfilled | sum_GSvolume_pa | sum_GSbandgap | sum_GSmagmom | sum_SpaceGroupNumber |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CsPbI3 | 59.2 | 74.8 | 144.16377238 | 412.55 | 13.2 | 5.4 | 161.39999999999998 | 2.22 | 1.8 | 3.4 | 8.0 | 2.8000000000000003 | 16.0 | 0.2 | 1.4 | 0.0 | 0.0 | 1.6 | 54.584 | 0.6372 | 0.0 | 129.20000000000002 | 296.0 | 374.0 | 720.8188619 | 2062.75 | 66.0 | 27.0 | 807.0 | 11.100000000000001 | 9.0 | 17.0 | 40.0 | 14.0 | 80.0 | 1.0 | 7.0 | 0.0 | 0.0 | 8.0 | 272.92 | 3.186 | 0.0 | 646.0 |
| Fe2O3 | 15.2 | 74.19999999999999 | 31.937640000000002 | 757.2800000000001 | 12.8 | 2.8 | 92.4 | 2.7960000000000003 | 2.0 | 2.4 | 2.4000000000000004 | 0.0 | 6.8 | 0.0 | 1.2 | 1.6 | 0.0 | 2.8 | 9.755 | 0.0 | 0.8442651200000001 | 98.80000000000001 | 76.0 | 371.0 | 159.6882 | 3786.4 | 64.0 | 14.0 | 462.0 | 13.98 | 10.0 | 12.0 | 12.0 | 0.0 | 34.0 | 0.0 | 6.0 | 8.0 | 0.0 | 14.0 | 48.775000000000006 | 0.0 | 4.2213256 | 494.0 |
| NaCl | 14.0 | 48.0 | 29.221384640000004 | 271.235 | 9.0 | 3.0 | 134.0 | 2.045 | 1.5 | 2.5 | 0.0 | 0.0 | 4.0 | 0.5 | 0.5 | 0.0 | 0.0 | 1.0 | 26.87041666665 | 1.2465 | 0.0 | 146.5 | 28.0 | 96.0 | 58.44276928000001 | 542.47 | 18.0 | 6.0 | 268.0 | 4.09 | 3.0 | 5.0 | 0.0 | 0.0 | 8.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 53.7408333333 | 2.493 | 0.0 | 293.0 |
| ZnS | 23.0 | 78.5 | 48.7225 | 540.52 | 14.0 | 3.5 | 113.5 | 2.115 | 2.0 | 2.0 | 5.0 | 0.0 | 9.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 19.8734375 | 1.101 | 0.0 | 132.0 | 46.0 | 157.0 | 97.445 | 1081.04 | 28.0 | 7.0 | 227.0 | 4.23 | 4.0 | 4.0 | 10.0 | 0.0 | 18.0 | 0.0 | 2.0 | 0.0 | 0.0 | 2.0 | 39.746875 | 2.202 | 0.0 | 264.0 |

The returned dataframe contains the mean-pooled and sum-pooled features of the magpie representation for the four formulas.
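
The pooling itself can be illustrated without the package. The sketch below is a simplified illustration rather than the library's code: `parse_formula` and `pool` are hypothetical helpers, the parser handles only plain formulas (no brackets or charges), and the "embedding" is a single feature, the atomic number. Under the assumption that pooling is weighted by the stoichiometric amounts, it reproduces the mean and sum of the Number feature listed in the table above.

```python
import re

# Toy one-feature "embedding": the atomic number of each element
atomic_number = {
    "Na": 11, "Cl": 17, "Fe": 26, "O": 8,
    "Zn": 30, "S": 16, "Cs": 55, "Pb": 82, "I": 53,
}


def parse_formula(formula):
    """Parse a plain formula such as 'Fe2O3' into {'Fe': 2.0, 'O': 3.0}.

    Hypothetical helper: supports only element symbols followed by an
    optional number (no brackets, charges or hydrates).
    """
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def pool(formula, feature):
    """Return the stoichiometry-weighted sum and mean of an elemental feature."""
    amounts = parse_formula(formula)
    total_atoms = sum(amounts.values())
    weighted_sum = sum(n * feature[el] for el, n in amounts.items())
    return {"sum": weighted_sum, "mean": weighted_sum / total_atoms}


pooled = {f: pool(f, atomic_number) for f in ["CsPbI3", "Fe2O3", "NaCl", "ZnS"]}
for formula, stats in pooled.items():
    print(formula, stats)
```

For Fe2O3, for example, the sum is 2 × 26 + 3 × 8 = 76 and the mean is 76 / 5 = 15.2. With a full representation such as magpie, the same pooling would be applied to each of the 22 features in turn.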

Development notes

Bugs, features and questions

Please use the issue tracker to report bugs and any feature requests. Hopefully, most questions should be solvable through the docs. For any other queries related to the project, please contact Anthony Onwuli by e-mail: anthony.onwuli16@imperial.ac.uk.

Code contributions

We welcome new contributions to this project. See the contributing guide for detailed instructions on how to contribute to our project.

Add an embedding scheme

The steps required to add a new representation scheme are:

  1. Add data file to data/element_representations.
  2. Edit docstring table in core.py.
  3. Edit utils/config.py to include the representation in DEFAULT_ELEMENT_EMBEDDINGS and CITATIONS.
  4. Update the documentation reference.md and README.md.

Developer

References

A. Onwuli et al, "Ionic species representations for materials informatics"

H. Park et al, "Mapping inorganic crystal chemical space" Faraday Discuss. (2024)

A. Onwuli et al, "Element similarity in high-dimensional materials representations" Digital Discovery 2, 1558 (2023)

Owner

  • Name: Materials Design Group
  • Login: WMD-group
  • Kind: organization
  • Location: London

Research group in computational chemistry & physics led by @aronwalsh at @ImperialCollegeLondon

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: ElementEmbeddings
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Anthony
    family-names: Onwuli
    email: anthony.onwuli16@imperial.ac.uk
    affiliation: Imperial College London
    orcid: "https://orcid.org/0000-0003-2107-153X"
  - given-names: Aron
    family-names: Walsh
    email: a.walsh@imperial.ac.uk
    affiliation: Imperial College London
    orcid: "https://orcid.org/0000-0001-5460-7033"
identifiers:
  - type: doi
    value: 10.5281/zenodo.8101633
repository-code: "https://github.com/WMD-group/ElementEmbeddings"
url: "https://wmd-group.github.io/ElementEmbeddings/"
abstract: >-
  The Element Embeddings package provides high-level tools
  for analysing elemental embedding data. This primarily
  involves visualising the correlation between embedding
  schemes using different statistical measures.
keywords:
  - materials science
  - computational chemistry
  - materials informatics
  - statistics
  - element representation
license: MIT
commit: d3f30602abf825ba3dcd5f247694a174a358ef49
version: "0.2.0"
date-released: "2023-07-07"

GitHub Events

Total
  • Issues event: 3
  • Watch event: 10
  • Delete event: 4
  • Issue comment event: 12
  • Push event: 35
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 13
  • Fork event: 3
  • Create event: 8
Last Year
  • Issues event: 3
  • Watch event: 10
  • Delete event: 4
  • Issue comment event: 12
  • Push event: 35
  • Pull request review event: 3
  • Pull request review comment event: 2
  • Pull request event: 13
  • Fork event: 3
  • Create event: 8

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 317
  • Total Committers: 5
  • Avg Commits per committer: 63.4
  • Development Distribution Score (DDS): 0.167
Past Year
  • Commits: 249
  • Committers: 5
  • Avg Commits per committer: 49.8
  • Development Distribution Score (DDS): 0.181
Top Committers
Name Email Commits
Anthony a****6@i****k 264
dependabot[bot] 4****] 32
Anthony Onwuli 3****i 13
Aron Walsh a****h@g****m 7
AntObi a****6@i****k 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 20
  • Total pull requests: 149
  • Average time to close issues: 4 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 2
  • Total pull request authors: 5
  • Average comments per issue: 0.4
  • Average comments per pull request: 1.0
  • Merged pull requests: 126
  • Bot issues: 0
  • Bot pull requests: 94
Past Year
  • Issues: 1
  • Pull requests: 4
  • Average time to close issues: 28 days
  • Average time to close pull requests: 4 days
  • Issue authors: 1
  • Pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.75
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • AntObi (18)
  • AyhamSaffar (1)
Pull Request Authors
  • dependabot[bot] (112)
  • AntObi (58)
  • github-actions[bot] (23)
  • aronwalsh (2)
  • pre-commit-ci[bot] (1)
Top Labels
Issue Labels
bug (3) enhancement (3) documentation (1) python (1) dependencies (1)
Pull Request Labels
dependencies (112) python (88) github_actions (24) enhancement (3) documentation (2)