rholearn

Learning and predicting electronic densities decomposed on a basis and global electronic densities of states at DFT accuracy

https://github.com/lab-cosmo/rholearn

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: aps.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
    Organization lab-cosmo has institutional domain (cosmo.epfl.ch)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: lab-cosmo
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 36 MB
Statistics
  • Stars: 3
  • Watchers: 2
  • Forks: 3
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

rholearn


metatensor-torch workflows for training descriptor-based equivariant neural networks to predict at DFT-level accuracy:

1) real-space electronic density scalar fields decomposed on a basis (molecular & periodic systems)
2) electronic density of states (DOS) (periodic systems)

Authors:
  • Joseph W. Abbott, PhD Student @ Lab COSMO, EPFL
  • Wei Bin How, PhD Student @ Lab COSMO, EPFL

Note: under active development, breaking changes are likely!

rholearn workflow summary

Background

Real-space electronic densities

Electronic densities, such as the electron density and local density of states, are central quantities in understanding the electronic properties of molecules and materials on the atomic scale. First principles quantum simulations such as density-functional theory (DFT) are able to accurately predict such fields as a linear combination of single-particle solutions to the Kohn-Sham equations. While reliable and accurate, such methods scale unfavourably with the number of electrons in the system.

Machine learning methods offer a complementary route to probing the electronic structure of matter on the atomic scale. With a sufficiently expressive model, one can learn the mapping between nuclear geometry and real-space electronic density and predict such quantities with more favourable scaling. Typically, predictions are used either to accelerate DFT by providing initial guesses or to probe electronic structure directly.

There are many approaches to learn the aforementioned mapping. In the density fitting approach, the real-space target electronic density $\rho^{\text{DFT}}(\mathbf{r})$ is decomposed onto a linear atom-centered basis set:

$$ \rho^{\text{DFT}}(\mathbf{r}) \approx \rho^{\text{RI}}(\mathbf{r}) = \sum_{b} d_b^{\text{RI}} \ \varphi_b^{\text{RI}}(\mathbf{r}) $$

where $\rho^{\text{RI}}(\mathbf{r})$ is the basis set approximation to $\rho^{\text{DFT}}(\mathbf{r})$, $\varphi_b^{\text{RI}}(\mathbf{r})$ are fitted basis functions (each a product of a radial function and a spherical harmonic) generated by the resolution-of-the-identity (RI) approach, and $d_b^{\text{RI}}$ are the coefficients that minimize the basis set expansion error for the given basis set definition.
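To make the idea of coefficients that minimize the expansion error concrete, here is a minimal sketch in NumPy. It is a toy, not the FHI-aims RI procedure: it assumes a 1D grid, hypothetical Gaussian basis functions in place of the radial-times-spherical-harmonic functions, and a plain least-squares ($L^2$) metric. All names (`grid`, `basis`, `true_coeffs`, ...) are illustrative.

```python
import numpy as np

# Hypothetical 1D toy system: two "atoms", each carrying two Gaussian
# basis functions of different widths (stand-ins for the RI basis).
grid = np.linspace(-5.0, 5.0, 2001)
dx = grid[1] - grid[0]
centers = [-1.0, 1.0]
widths = [0.5, 1.0]

# Basis matrix: one column per basis function, evaluated on the grid.
basis = np.stack(
    [np.exp(-((grid - c) ** 2) / (2.0 * w**2)) for c in centers for w in widths],
    axis=1,
)

# Stand-in for the reference density: a known combination of the basis
# functions, so that the fit can be checked exactly.
true_coeffs = np.array([0.8, 0.1, 0.5, 0.3])
rho_ref = basis @ true_coeffs

# Least-squares fit: solve S d = w, where S_ab = <phi_a|phi_b> is the
# overlap matrix and w_a = <phi_a|rho_ref> are the projections.
overlap = basis.T @ basis * dx
projections = basis.T @ rho_ref * dx
coeffs = np.linalg.solve(overlap, projections)

# Reconstruct the density from the fitted coefficients and measure the
# L2 expansion error on the grid.
rho_fit = basis @ coeffs
l2_error = np.sqrt(np.sum((rho_ref - rho_fit) ** 2) * dx)
```

In this toy the reference lies exactly in the span of the basis, so the fitted coefficients reproduce `true_coeffs`. For a real DFT density the basis is incomplete and a nonzero expansion error remains.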

An equivariant model is then trained to predict coefficients $d_b^{\text{ML}}$ that reconstruct a density in real-space, ideally minimising the generalisation error on the real-space DFT densities of a test set.

For one of the original workflows for predicting the electron density under the density-fitting framework, readers are referred to SALTED. This uses a symmetry-adapted Gaussian process regression (SA-GPR) method via sparse kernel ridge regression to learn and predict $d_b^{\text{ML}}$.

Electronic density of states (DOS)

The electronic density of states (DOS) describes the distribution of available electronic states in a material. From the DOS, one can infer many optical and electronic properties, such as electrical conductivity, band gap, and absorption spectra. This makes the DOS useful in materials design as a tool for screening candidate materials. The DOS is typically computed using DFT but, as mentioned above, DFT is prohibitively expensive for large and complex systems.

Machine learning has also been applied to the DOS, and a variety of representations have been developed for it. Thus far, there have been three main approaches: 1) projecting the DOS on a discretized energy grid, 2) projecting the integrated DOS on a discretized energy grid, and 3) decomposing the DOS using principal component analysis (PCA). Unlike the electronic density, the DOS is invariant to rotations, and thus an invariant model can be employed to predict the DOS under any of the three representations.
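The first two representations can be illustrated with a short NumPy sketch. It assumes Gaussian broadening of a handful of made-up eigenvalues; the function name `dos_on_grid`, the broadening width `sigma`, and the energy range are hypothetical choices and not taken from doslearn.

```python
import numpy as np


def dos_on_grid(eigenvalues, energy_grid, sigma=0.3):
    """Gaussian-broadened DOS evaluated on a discretized energy grid."""
    diff = energy_grid[:, None] - eigenvalues[None, :]
    gaussians = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    # Sum the contribution of every state at each grid point.
    return gaussians.sum(axis=1)


# Hypothetical eigenvalues (in eV) for a small system.
eigenvalues = np.array([-3.0, -2.5, -1.0, 0.5, 2.0])
energy_grid = np.linspace(-8.0, 8.0, 1601)
de = energy_grid[1] - energy_grid[0]

# Representation 1: the DOS projected on the energy grid.
dos = dos_on_grid(eigenvalues, energy_grid)

# Representation 2: the integrated DOS on the same grid
# (cumulative number of states up to each energy).
integrated_dos = np.cumsum(dos) * de
```

Because each Gaussian is normalized, the integrated DOS approaches the number of states (here 5) at the top of the grid.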

Goals

rholearn also operates under the density fitting approach. The nuclear coordinates $\to$ electronic density mapping is learned via a feature-based equivariant neural network whose outputs are the predicted coefficients. Currently, rholearn is integrated with the electronic structure code FHI-aims for both data generation and building of real-space fields from predicted coefficients. rholearn aims to improve the scalability of the density-fitting approach to learning electronic densities.

doslearn represents the DOS by projecting it on a discretized energy grid. Additionally, a locality ansatz is employed whereby the global DOS of a structure is expressed as a sum of local contributions from each atomic environment.
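As a rough illustration of the locality ansatz, the sketch below sums per-atom DOS contributions into a global DOS. The random features and the single linear map are assumptions standing in for real invariant descriptors and a trained network; none of the names come from doslearn.

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms, n_features, n_grid = 4, 6, 50

# Hypothetical invariant descriptors, one row per atomic environment.
features = rng.normal(size=(n_atoms, n_features))

# Stand-in for a trained model: a single linear map from features to
# a DOS contribution on the discretized energy grid.
weights = rng.normal(size=(n_features, n_grid))

# Local contributions, one per atomic environment: shape (n_atoms, n_grid).
local_dos = features @ weights

# Locality ansatz: the global DOS is the sum of the local contributions.
global_dos = local_dos.sum(axis=0)

# Summing over atoms makes the prediction independent of atom ordering.
perm = rng.permutation(n_atoms)
global_dos_permuted = (features[perm] @ weights).sum(axis=0)
```

A side effect of this construction is that the predicted global DOS grows with the number of atoms, which is one reason a sum (rather than a mean) is a natural choice for an extensive quantity.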

Both are built on top of a modular software ecosystem, with the following packages forming the main components of the workflow:

  • metatensor (GitHub) is used as the self-describing block-sparse data storage format, wrapping multidimensional tensors with metadata. Subpackages metatensor-operations and metatensor-learn are used to provide convenient sparse operations and ML building blocks respectively that operate on the metatensor.TensorMap object.
  • featomic (GitHub) is used to transform the nuclear coordinates into local equivariant descriptors that encode physical symmetries and geometric information for input into the neural network.
  • PyTorch is used as the learning framework, allowing definition of arbitrarily complex neural networks that can be trained by minibatch gradient descent.

Leveraging the speed- and memory-efficient operations of torch, and building on top of metatensor and featomic, descriptors, models, and learning methodologies can be flexibly prototyped and customized for a specific learning task.

Getting Started

Installing rholearn

With a working conda installation, first set up an environment:

```bash
conda create -n rho python==3.12
conda activate rho
```

Then clone and install rholearn:

```bash
git clone https://github.com/lab-cosmo/rholearn.git
cd rholearn

# Specify CPU-only torch
pip install --extra-index-url https://download.pytorch.org/whl/cpu .
```

Running tox from the top directory will run linting and formatting. To run some tests (currently limited to testing rholearn.loss), run pytest tests/rholearn/loss.py.

Installing FHI-aims

To generate reference data using the aims_interface of rholearn, a working installation of FHI-aims >= 240926 is required. FHI-aims is not open source but is free for academic use. Follow the instructions on their website fhi-aims.org/get-the-code to get and build the code. The end result should be an executable, compiled for your specific system.

There are also useful tutorials on the basics of running FHI-aims here.

Basic usage

In a run directory, user-options are defined in YAML files named "dft-options.yaml", "hpc-options.yaml", and "ml-options.yaml". Any options specified in these files overwrite the defaults.

Default options can be found in the rholearn/options/ directory, and some templates for user options can be found in the examples/options/ directory.
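The way user options take precedence over defaults can be sketched as a dictionary merge. This is an illustration only: the option keys below are hypothetical, and whether rholearn merges nested sections recursively (as done here) or only at the top level is an assumption.

```python
from copy import deepcopy


def merge_options(defaults, user):
    """Return a copy of `defaults` with values from `user` taking precedence."""
    merged = deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = merge_options(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of a defaults file and of a user "ml-options.yaml",
# written here as plain dicts (as they would look after YAML parsing).
defaults = {"SEED": 42, "TRAIN": {"batch_size": 8, "n_epochs": 100}}
user = {"TRAIN": {"n_epochs": 500}}

options = merge_options(defaults, user)
```

Here only `n_epochs` is overridden; every option the user file does not mention keeps its default value.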

rholearn

Data can be generated with the following:

```bash
rholearn_run_scf  # run SCF with FHI-aims

rholearn_process_scf  # process SCF outputs

rholearn_setup_ri_fit  # setup RI fitting calculation

rholearn_run_ri_fit  # run RI fitting with FHI-aims

rholearn_process_ri_fit  # process RI outputs
```

and model training and evaluation run with:

```bash
rholearn_train  # train model

rholearn_eval  # evaluate model
```

doslearn

Data can be generated with the following:

```bash
doslearn_run_scf  # run SCF with FHI-aims

doslearn_process_scf  # process SCF outputs
```

and model training and evaluation run with:

```bash
doslearn_train  # train model

doslearn_eval  # evaluate model
```

Tutorial

For a more in-depth walkthrough of the functionality, see the following tutorials:

  1. rholearn tutorial on data generation using FHI-aims and model training using rholearn to predict the electron density decomposed on a basis.
  2. doslearn tutorial on data generation using FHI-aims and model training using doslearn to predict the electronic density of states.

Citing this work

```bib
@software{abbott_2024_13891847,
  author    = {Abbott, Joseph W. and How, Wei Bin and Fraux, Guillaume and Ceriotti, Michele},
  title     = {lab-cosmo/rholearn: rholearn v0.1.0},
  month     = oct,
  year      = 2024,
  publisher = {Zenodo},
  version   = {v0.1.0},
  doi       = {10.5281/zenodo.13891847},
  url       = {https://doi.org/10.5281/zenodo.13891847}
}

@article{PhysRevMaterials.9.013802,
  author    = {How, Wei Bin and Chong, Sanggyu and Grasselli, Federico and Huguenin-Dumittan, Kevin K. and Ceriotti, Michele},
  title     = {Adaptive energy reference for machine-learning models of the electronic density of states},
  journal   = {Phys. Rev. Mater.},
  volume    = {9},
  issue     = {1},
  pages     = {013802},
  numpages  = {10},
  year      = {2025},
  month     = {Jan},
  publisher = {American Physical Society},
  doi       = {10.1103/PhysRevMaterials.9.013802},
  url       = {https://link.aps.org/doi/10.1103/PhysRevMaterials.9.013802},
}
```

Owner

  • Name: Laboratory of Computational Science and Modeling
  • Login: lab-cosmo
  • Kind: organization
  • Location: EPFL - STI - Institute of Materials

Public repositories for code developed at the L-COSMO

Citation (CITATION.cff)

abstract: <p>metatensor-torch workflows for training descriptor-based 
    equivariant neural networks to predict at DFT accuracy: 1) real-space 
    electronic density scalar fields decomposed on a basis and 2) global 
    electronic density of states.</p>
authors:
- affiliation: Lab COSMO, EPFL
  family-names: Abbott
  given-names: Joseph W.
  orcid: 0000-0002-0502-6790
- affiliation: Lab COSMO, EPFL
  family-names: How
  given-names: Wei Bin
  orcid: 0000-0002-1060-5885
- affiliation: Lab COSMO, EPFL
  family-names: Fraux
  given-names: Guillaume
- affiliation: Lab COSMO, EPFL
  family-names: Ceriotti
  given-names: Michele
  orcid: 0000-0003-2571-2832
cff-version: 1.2.0
date-released: '2024-10-04'
doi: 10.5281/zenodo.13891848
license:
- cc-by-4.0
repository-code: https://github.com/lab-cosmo/rholearn/tree/v0.1.0
title: 'lab-cosmo/rholearn: rholearn v0.1.0'
type: software
version: v0.1.0

GitHub Events

Total
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Member event: 1
  • Push event: 58
  • Pull request review event: 7
  • Pull request review comment event: 12
  • Pull request event: 12
  • Fork event: 1
  • Create event: 6
Last Year
  • Watch event: 1
  • Delete event: 6
  • Issue comment event: 3
  • Member event: 1
  • Push event: 58
  • Pull request review event: 7
  • Pull request review comment event: 12
  • Pull request event: 12
  • Fork event: 1
  • Create event: 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 days
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.29
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 days
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.29
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • jwa7 (22)
  • HowWeiBin (3)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

pyproject.toml pypi
  • ase *
  • chemfiles *
  • chemiscope *
  • cube_toolz @ git+https://github.com/funkymunkycool/Cube-Toolz
  • metatensor [torch]
  • numpy *
  • py3dmol *
  • pyyaml *
  • rascaline-torch @ git+https://github.com/luthaf/rascaline@0311925f9aba803a0744a48d448567a9b65316e1#subdirectory=python/rascaline-torch
  • tox *
  • vesin *