elementembeddings
Python package to interact with high-dimensional representations of the chemical elements
Science Score: 64.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to rsc.org, zenodo.org
- ✓ Committers with academic emails: 2 of 5 committers (40.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: WMD-group
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://wmd-group.github.io/ElementEmbeddings/
- Size: 43.6 MB
Statistics
- Stars: 44
- Watchers: 4
- Forks: 4
- Open Issues: 7
- Releases: 10
Metadata Files
README.md
ElementEmbeddings
The Element Embeddings package provides high-level tools for analysing elemental and ionic species embeddings data. This primarily involves visualising the correlation between embedding schemes using different statistical measures.
- Documentation: https://wmd-group.github.io/ElementEmbeddings/
- Examples: https://github.com/WMD-group/ElementEmbeddings/tree/main/examples
Motivation
Machine learning approaches for materials informatics have become increasingly widespread. Some of these involve the use of deep learning techniques where the representation of the elements is learned rather than specified by the user of the model. While an important goal of machine learning training is to minimise the chosen error function to make more accurate predictions, it is also important for us, as materials scientists, to be able to interpret these models. As such, we aim to evaluate and compare different atomic embedding schemes in a consistent framework.
Getting started
The main feature of ElementEmbeddings, the Embedding class, is accessible by importing it from elementembeddings.core.
Installation
The latest stable release can be installed via pip using:
```bash
pip install ElementEmbeddings
```
Alternatively, ElementEmbeddings is available via conda through the conda-forge channel on Anaconda Cloud:
```bash
conda install -c conda-forge elementembeddings
```
For installing the development or documentation dependencies via pip:
```bash
pip install "ElementEmbeddings[dev]"
pip install "ElementEmbeddings[docs]"
```
For development, clone the repository and install the package in editable mode by running the following commands:
```bash
git clone https://github.com/WMD-group/ElementEmbeddings.git
cd ElementEmbeddings
pip install -e ".[docs,dev]"
```
With -e, pip creates links to the source folder so that changes to the code are reflected immediately in the installed package.
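After installing, one quick way to confirm which version is visible to the current interpreter is to query the package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "ElementEmbeddings"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "ElementEmbeddings is not installed")
```

Returning `None` instead of raising keeps the check usable in scripts that need to branch on whether the package is available.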
Usage
For simple usage, you can instantiate an Embedding object using one of the embeddings in the data directory. For this example, let's use the magpie elemental representation.
```pycon
# Import the class
>>> from elementembeddings.core import Embedding

# Load the magpie data
>>> magpie = Embedding.load_data("magpie")
```
We can access some of the properties of the Embedding class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.
```pycon
# Print out some of the properties of the ElementEmbeddings class
>>> print(f"The magpie representation has embeddings of dimension {magpie.dim}")
>>> print(
...     f"The magpie representation contains these elements: \n {magpie.element_list}"
... )  # prints out all the elements considered for this representation
>>> print(
...     f"The magpie representation contains these features: \n {magpie.feature_labels}"
... )  # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']
The magpie representation contains these features:
['Number', 'MendeleevNumber', 'AtomicWeight', 'MeltingT', 'Column', 'Row', 'CovalentRadius', 'Electronegativity', 'NsValence', 'NpValence', 'NdValence', 'NfValence', 'NValence', 'NsUnfilled', 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled', 'GSvolume_pa', 'GSbandgap', 'GSmagmom', 'SpaceGroupNumber']
```
Plotting
We can quickly generate heatmaps of distance/similarity measures between the element vectors using heatmap_plotter, and plot the representations in two dimensions using dimension_plotter; both come from the plotter module. Before we do that, we standardise the embedding using the standardise method of the Embedding class.
```python
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True)  # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(
    embedding=magpie,
    metric="cosine_similarity",
    show_axislabels=False,
    cmap="Blues_r",
    ax=ax,
    **heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()
```
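To make the quantity in the heatmap concrete, the sketch below standardises a small table of element vectors and computes pairwise cosine similarities with the standard library only. It assumes that standardising means rescaling each feature to zero mean and unit variance, which may differ in detail from what the standardise method does. The three illustrative features (atomic number, group, Pauling electronegativity) and the helper names `standardise_columns` and `cosine_similarity` are not part of the package.

```python
import math
from statistics import fmean, pstdev

# Illustrative 3-feature vectors: atomic number, group, Pauling electronegativity.
toy = {
    "Na": [11.0, 1.0, 0.93],
    "K": [19.0, 1.0, 0.82],
    "Cl": [17.0, 17.0, 3.16],
    "Br": [35.0, 17.0, 2.96],
}


def standardise_columns(vectors):
    """Rescale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*vectors.values()))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return {
        el: [(x - m) / s for x, m, s in zip(vec, means, stds)]
        for el, vec in vectors.items()
    }


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


scaled = standardise_columns(toy)
for a in scaled:
    row = "  ".join(f"{cosine_similarity(scaled[a], scaled[b]):+.2f}" for b in scaled)
    print(f"{a:>2}  {row}")
```

Each printed row corresponds to one row of a similarity heatmap: the diagonal is 1, and in this toy data the two alkali metals come out similar to each other and dissimilar to the halogens.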

```python
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
    embedding=magpie,
    reducer="umap",
    n_components=2,
    ax=ax,
    adjusttext=True,
    reducer_params=reducer_params,
    scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
# Collect the legend entries from the same axes and place the legend beside the plot
handles, labels = ax.get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(1.25, 0.5), loc="center right", ncol=1)

fig.tight_layout()
fig.show()
```

Compositions
The package can also be used to featurise compositions. Your data could be a list of formula strings or a pandas dataframe of the following format:
| formula |
| ------- |
| CsPbI3 |
| Fe2O3 |
| NaCl |
| ZnS |
The composition_featuriser function can be used to featurise the data. The compositions can be featurised using different representation schemes and different types of pooling through the embedding and stats arguments respectively.
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```
| formula | mean_Number | mean_MendeleevNumber | mean_AtomicWeight | mean_MeltingT | mean_Column | mean_Row | mean_CovalentRadius | mean_Electronegativity | mean_NsValence | mean_NpValence | mean_NdValence | mean_NfValence | mean_NValence | mean_NsUnfilled | mean_NpUnfilled | mean_NdUnfilled | mean_NfUnfilled | mean_NUnfilled | mean_GSvolume_pa | mean_GSbandgap | mean_GSmagmom | mean_SpaceGroupNumber | sum_Number | sum_MendeleevNumber | sum_AtomicWeight | sum_MeltingT | sum_Column | sum_Row | sum_CovalentRadius | sum_Electronegativity | sum_NsValence | sum_NpValence | sum_NdValence | sum_NfValence | sum_NValence | sum_NsUnfilled | sum_NpUnfilled | sum_NdUnfilled | sum_NfUnfilled | sum_NUnfilled | sum_GSvolume_pa | sum_GSbandgap | sum_GSmagmom | sum_SpaceGroupNumber |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CsPbI3 | 59.2 | 74.8 | 144.16377238 | 412.55 | 13.2 | 5.4 | 161.39999999999998 | 2.22 | 1.8 | 3.4 | 8.0 | 2.8000000000000003 | 16.0 | 0.2 | 1.4 | 0.0 | 0.0 | 1.6 | 54.584 | 0.6372 | 0.0 | 129.20000000000002 | 296.0 | 374.0 | 720.8188619 | 2062.75 | 66.0 | 27.0 | 807.0 | 11.100000000000001 | 9.0 | 17.0 | 40.0 | 14.0 | 80.0 | 1.0 | 7.0 | 0.0 | 0.0 | 8.0 | 272.92 | 3.186 | 0.0 | 646.0 |
| Fe2O3 | 15.2 | 74.19999999999999 | 31.937640000000002 | 757.2800000000001 | 12.8 | 2.8 | 92.4 | 2.7960000000000003 | 2.0 | 2.4 | 2.4000000000000004 | 0.0 | 6.8 | 0.0 | 1.2 | 1.6 | 0.0 | 2.8 | 9.755 | 0.0 | 0.8442651200000001 | 98.80000000000001 | 76.0 | 371.0 | 159.6882 | 3786.4 | 64.0 | 14.0 | 462.0 | 13.98 | 10.0 | 12.0 | 12.0 | 0.0 | 34.0 | 0.0 | 6.0 | 8.0 | 0.0 | 14.0 | 48.775000000000006 | 0.0 | 4.2213256 | 494.0 |
| NaCl | 14.0 | 48.0 | 29.221384640000004 | 271.235 | 9.0 | 3.0 | 134.0 | 2.045 | 1.5 | 2.5 | 0.0 | 0.0 | 4.0 | 0.5 | 0.5 | 0.0 | 0.0 | 1.0 | 26.87041666665 | 1.2465 | 0.0 | 146.5 | 28.0 | 96.0 | 58.44276928000001 | 542.47 | 18.0 | 6.0 | 268.0 | 4.09 | 3.0 | 5.0 | 0.0 | 0.0 | 8.0 | 1.0 | 1.0 | 0.0 | 0.0 | 2.0 | 53.7408333333 | 2.493 | 0.0 | 293.0 |
| ZnS | 23.0 | 78.5 | 48.7225 | 540.52 | 14.0 | 3.5 | 113.5 | 2.115 | 2.0 | 2.0 | 5.0 | 0.0 | 9.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 19.8734375 | 1.101 | 0.0 | 132.0 | 46.0 | 157.0 | 97.445 | 1081.04 | 28.0 | 7.0 | 227.0 | 4.23 | 4.0 | 4.0 | 10.0 | 0.0 | 18.0 | 0.0 | 2.0 | 0.0 | 0.0 | 2.0 | 39.746875 | 2.202 | 0.0 | 264.0 |
The returned dataframe contains the mean-pooled and sum-pooled features of the magpie representation for the four formulas.
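The pooling is straightforward to reproduce by hand. The sketch below recomputes the mean and sum of the Number feature for the four formulas using only atomic numbers. It assumes that mean pooling weights each element by its atomic fraction and sum pooling by its stoichiometric count, which is consistent with the values in the table. `parse_formula` and `pool` are hypothetical helpers that only handle bracket-free formulas with integer subscripts; they are not the package's implementation.

```python
import re

# Atomic numbers for the elements appearing in the four example formulas.
ATOMIC_NUMBER = {"O": 8, "Na": 11, "S": 16, "Cl": 17, "Fe": 26, "Zn": 30,
                 "I": 53, "Cs": 55, "Pb": 82}


def parse_formula(formula):
    """Split a bracket-free formula such as 'Fe2O3' into {'Fe': 2, 'O': 3}."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(amount) if amount else 1)
    return counts


def pool(formula, feature):
    """Return (mean, sum) of an elemental feature over a composition."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    summed = sum(feature[el] * n for el, n in counts.items())
    return summed / total_atoms, summed


for formula in ["CsPbI3", "Fe2O3", "NaCl", "ZnS"]:
    mean_number, sum_number = pool(formula, ATOMIC_NUMBER)
    print(f"{formula:7s} mean={mean_number:.1f} sum={sum_number:.1f}")
```

For CsPbI3 this gives a sum of 55 + 82 + 3 × 53 = 296 and a mean of 296 / 5 = 59.2, matching the first row of the table.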
Development notes
Bugs, features and questions
Please use the issue tracker to report bugs and any feature requests. Most questions should be answerable from the docs. For any other queries related to the project, please contact Anthony Onwuli by e-mail: anthony.onwuli16@imperial.ac.uk.
Code contributions
We welcome new contributions to this project. See the contributing guide for detailed instructions on how to contribute to our project.
Add an embedding scheme
The steps required to add a new representation scheme are:
- Add data file to data/element_representations.
- Edit docstring table in core.py.
- Edit utils/config.py to include the representation in DEFAULT_ELEMENT_EMBEDDINGS and CITATIONS.
- Update the documentation reference.md and README.md.
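Before adding a data file, it can help to check that every element has a vector of the same length. The sketch below assumes the new scheme is stored as a JSON object mapping element symbols to lists of numbers. That layout, the file name, and the helper `check_embedding_file` are assumptions for illustration and should be checked against the existing files in data/element_representations.

```python
import json
import tempfile
from pathlib import Path


def check_embedding_file(path):
    """Load a JSON element -> vector mapping and return its dimension.

    Raises ValueError if the file is empty or the vectors differ in length.
    """
    data = json.loads(Path(path).read_text())
    lengths = {len(vector) for vector in data.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError(f"inconsistent or empty vectors: lengths {sorted(lengths)}")
    return lengths.pop()


# Demonstration on a throwaway file with made-up two-dimensional vectors.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "my_scheme.json"  # hypothetical file name
    demo.write_text(json.dumps({"H": [0.1, 0.2], "He": [0.3, 0.4]}))
    print(check_embedding_file(demo))  # prints 2
```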
Developer
- Anthony Onwuli (Department of Materials, Imperial College London)
References
A. Onwuli et al, "Ionic species representations for materials informatics"
H. Park et al, "Mapping inorganic crystal chemical space" Faraday Discuss. (2024)
Owner
- Name: Materials Design Group
- Login: WMD-group
- Kind: organization
- Location: London
- Website: https://wmd-group.github.io
- Repositories: 57
- Profile: https://github.com/WMD-group
Research group in computational chemistry & physics led by @aronwalsh at @ImperialCollegeLondon
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: ElementEmbeddings
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Anthony
    family-names: Onwuli
    email: anthony.onwuli16@imperial.ac.uk
    affiliation: Imperial College London
    orcid: "https://orcid.org/0000-0003-2107-153X"
  - given-names: Aron
    family-names: Walsh
    email: a.walsh@imperial.ac.uk
    affiliation: Imperial College London
    orcid: "https://orcid.org/0000-0001-5460-7033"
identifiers:
  - type: doi
    value: 10.5281/zenodo.8101633
repository-code: "https://github.com/WMD-group/ElementEmbeddings"
url: "https://wmd-group.github.io/ElementEmbeddings/"
abstract: >-
  The Element Embeddings package provides high-level tools
  for analysing elemental embedding data. This primarily
  involves visualising the correlation between embedding
  schemes using different statistical measures.
keywords:
  - materials science
  - computational chemistry
  - materials informatics
  - statistics
  - element representation
license: MIT
commit: d3f30602abf825ba3dcd5f247694a174a358ef49
version: "0.2.0"
date-released: "2023-07-07"
GitHub Events
Total
- Issues event: 3
- Watch event: 10
- Delete event: 4
- Issue comment event: 12
- Push event: 35
- Pull request review event: 3
- Pull request review comment event: 2
- Pull request event: 13
- Fork event: 3
- Create event: 8
Last Year
- Issues event: 3
- Watch event: 10
- Delete event: 4
- Issue comment event: 12
- Push event: 35
- Pull request review event: 3
- Pull request review comment event: 2
- Pull request event: 13
- Fork event: 3
- Create event: 8
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Anthony | a****6@i****k | 264 |
| dependabot[bot] | 4****] | 32 |
| Anthony Onwuli | 3****i | 13 |
| Aron Walsh | a****h@g****m | 7 |
| AntObi | a****6@i****k | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 20
- Total pull requests: 149
- Average time to close issues: 4 months
- Average time to close pull requests: 15 days
- Total issue authors: 2
- Total pull request authors: 5
- Average comments per issue: 0.4
- Average comments per pull request: 1.0
- Merged pull requests: 126
- Bot issues: 0
- Bot pull requests: 94
Past Year
- Issues: 1
- Pull requests: 4
- Average time to close issues: 28 days
- Average time to close pull requests: 4 days
- Issue authors: 1
- Pull request authors: 3
- Average comments per issue: 1.0
- Average comments per pull request: 0.75
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 3
Top Authors
Issue Authors
- AntObi (18)
- AyhamSaffar (1)
Pull Request Authors
- dependabot[bot] (112)
- AntObi (58)
- github-actions[bot] (23)
- aronwalsh (2)
- pre-commit-ci[bot] (1)