surface-sampling
MCMC-based algorithm for sampling surface reconstructions
Repository
Basic Info
- Host: GitHub
- Owner: learningmatter-mit
- License: mit
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://github.com/learningmatter-mit/surface-sampling
- Size: 124 MB
Statistics
- Stars: 27
- Watchers: 5
- Forks: 5
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Virtual Surface Site Relaxation-Monte Carlo (VSSR-MC)
Overview
This is the VSSR-MC algorithm for sampling surface reconstructions. VSSR-MC samples across both compositional and configurational spaces. It can interface with either a neural network potential (through ASE) or a classical potential (through ASE or LAMMPS). It is a key component of the Automatic Surface Reconstruction (AutoSurfRecon) pipeline described in the following work: Machine-learning-accelerated simulations to enable automatic surface reconstruction. VSSR-MC can be used to sample surfaces either under gas/vacuum conditions, as demonstrated in the original work, or under aqueous electrochemical conditions, as described in this work: Accelerating and enhancing thermodynamic simulations of electrochemical interfaces.
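To make "sampling across compositional and configurational spaces" concrete, here is a toy Metropolis loop over a fixed ring of adsorption sites. It is only a sketch and not the VSSR-MC implementation: the energy function, the chemical potential `mu`, and every name below are hypothetical, and the relaxation step that the method's name refers to is not modeled. Each move flips the occupancy of one site, so both the number of adatoms (composition) and their arrangement (configuration) change during the run.

```python
import math
import random


def toy_energy(occupancy, site_energy=-1.0, pair_repulsion=0.3):
    """Hypothetical energy: each adatom gains site_energy, ring neighbors repel."""
    n = len(occupancy)
    energy = site_energy * sum(occupancy)
    for i in range(n):
        energy += pair_repulsion * occupancy[i] * occupancy[(i + 1) % n]
    return energy


def grand_potential(occupancy, mu):
    """Energy minus mu times the number of adatoms (semi-grand canonical form)."""
    return toy_energy(occupancy) - mu * sum(occupancy)


def metropolis_sample(n_sites=8, mu=-0.8, kT=0.05, n_steps=5000, seed=0):
    """Return (lowest grand potential seen, occupancy that produced it)."""
    rng = random.Random(seed)
    occupancy = [0] * n_sites
    current = grand_potential(occupancy, mu)
    best = (current, list(occupancy))
    for _ in range(n_steps):
        site = rng.randrange(n_sites)
        occupancy[site] ^= 1  # propose: add or remove an adatom at this site
        proposed = grand_potential(occupancy, mu)
        delta = proposed - current
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current = proposed  # accept the move
            if current < best[0]:
                best = (current, list(occupancy))
        else:
            occupancy[site] ^= 1  # reject: undo the flip
    return best


energy, occupancy = metropolis_sample()
print(f"lowest grand potential: {energy:.2f}, occupancy: {occupancy}")
```

With these made-up parameters, an isolated adatom lowers the grand potential while adjacent adatoms are penalized, so the sampler settles on a half-covered ring with alternating occupied and empty sites.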

System requirements
We recommend a computer with the following specs:
- RAM: 16+ GB
- CPU: 4+ cores, 3 GHz/core
To run with a neural network force field, a GPU is recommended. We ran on a single NVIDIA GeForce RTX 2080 Ti 11 GB GPU. The code has been tested on Linux Ubuntu 20.04.6 LTS but we expect it to work on other Linux distributions.
Setup
To start, clone the repository to your local machine or a workstation by running git clone git@github.com:learningmatter-mit/surface-sampling.git.
Conda environment
We recommend creating a new Conda environment. Following that, the Python dependencies for the code can be installed. In the surface-sampling directory, run the following commands:
```bash
conda create -n vssr-mc python=3.11
conda activate vssr-mc
conda install -c conda-forge kimpy lammps openkim-models
pip install -e .
```
If you intend to contribute to the code, run pip install -e '.[dev]' instead to also install the development dependencies.
To run with LAMMPS, add the following to ~/.bashrc or equivalent with the appropriate paths, then run source ~/.bashrc. Conda will already have installed LAMMPS as a dependency.
```bash
export LAMMPS_COMMAND="/path/to/lammps/src/lmp"
export LAMMPS_POTENTIALS="/path/to/lammps/potentials/"
export ASE_LAMMPSRUN_COMMAND="$LAMMPS_COMMAND"
```
LAMMPS_COMMAND should point to the LAMMPS executable, which can be found here: /path/to/[vssr-mc-env]/bin/lmp.
LAMMPS_POTENTIALS should point to the directory containing the LAMMPS potential files, which can be found here: /path/to/[surface-sampling-repo]/mcmc/potentials/.
ASE_LAMMPSRUN_COMMAND should point to the same LAMMPS executable. More information can be found here: ASE LAMMPS.
If the Conda-installed LAMMPS does not work, you might have to install LAMMPS from source. More information can be found here: LAMMPS.
You might have to reopen your terminal or log in again for the new settings to take effect.
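A quick way to confirm the three variables took effect is to check them from Python before launching a run. The helper below is a hypothetical sketch that is not part of the repository; it uses only the standard library and reports what looks wrong rather than trying to fix it.

```python
import os
import shutil

REQUIRED_VARS = ("LAMMPS_COMMAND", "LAMMPS_POTENTIALS", "ASE_LAMMPSRUN_COMMAND")


def check_lammps_env(env=None):
    """Return a list of problems found; an empty list means the setup looks sane."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]

    command = env.get("LAMMPS_COMMAND")
    if command and shutil.which(command) is None:
        problems.append(f"LAMMPS_COMMAND is not an executable: {command}")

    potentials = env.get("LAMMPS_POTENTIALS")
    if potentials and not os.path.isdir(potentials):
        problems.append(f"LAMMPS_POTENTIALS is not a directory: {potentials}")

    ase_command = env.get("ASE_LAMMPSRUN_COMMAND")
    if command and ase_command and ase_command != command:
        problems.append("ASE_LAMMPSRUN_COMMAND differs from LAMMPS_COMMAND")
    return problems


# Report on the current shell environment.
for problem in check_lammps_env():
    print(problem)
```

If nothing is printed, all three variables are set, the executable is found, and the potentials directory exists.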
Demo
A toy demo and other examples can be found in the tutorials/ folder.
tutorials/
├── example.ipynb
├── GaN_0001.ipynb
├── Si_111_5x5.ipynb
├── SrTiO3_001.ipynb
├── latent_space_clustering.ipynb
└── prepare_surface.ipynb
More data/examples can be found in our Zenodo datasets: 1 and 2.
Toy example of Cu(100)
A toy example that illustrates the use of VSSR-MC. It should take only a few seconds to run. Refer to tutorials/example.ipynb.
GaN(0001) surface sampling with Tersoff potential
This example could take a few minutes to run. Refer to tutorials/GaN_0001.ipynb.
Si(111) 5x5 surface sampling with modified Stillinger–Weber potential
This example could take a few minutes to run. Refer to tutorials/Si_111_5x5.ipynb.
SrTiO3(001) surface sampling with machine learning potential
Demonstrates the integration of VSSR-MC with a neural network force field. This example could take a few minutes to run. Refer to tutorials/SrTiO3_001.ipynb.
Clustering MC-sampled surfaces in the latent space
Retrieves the neural network embeddings of VSSR-MC structures and performs clustering. This example should only take a minute to run. Refer to tutorials/latent_space_clustering.ipynb.
Preparing surface from a bulk structure
This example demonstrates how to cut a surface from a bulk structure. Refer to tutorials/prepare_surface.ipynb.
Scripts
Scripts can be found in the scripts/ folder, including:
scripts/
├── sample_surface.py
├── sample_pourbaix_surface.py
├── clustering.py
└── create_surface_formation_entries.py
The arguments for the scripts can be found by running python /path/to/script.py -h.
Example usage:
Original VSSR-MC with PaiNN model trained on SrTiO3(001) surfaces
```bash
python scripts/sample_surface.py --run_name "SrTiO3_001_painn" \
--starting_structure_path "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl" \
--model_type "PaiNN" --model_paths "tutorials/data/SrTiO3_001/nff/model01/best_model" \
"tutorials/data/SrTiO3_001/nff/model02/best_model" \
"tutorials/data/SrTiO3_001/nff/model03/best_model" \
--settings_path "scripts/configs/sample_config_painn.json"
```
Pre-trained CHGNet model on SrTiO3(001) surfaces
```bash
python scripts/sample_surface.py --run_name "SrTiO3_001_chgnet" \
--starting_structure_path "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl" \
--model_type "CHGNetNFF" --settings_path "scripts/configs/sample_config_chgnet.json"
```
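The two sample_surface.py invocations above differ only in the run name, model type, optional model paths, and settings file. When launching several such runs, it can help to assemble the argument list in Python. This is a sketch: the helper name `build_sample_command` is hypothetical, and it uses only the flags shown in the examples above.

```python
import shlex
import sys


def build_sample_command(run_name, structure_path, model_type, settings_path,
                         model_paths=()):
    """Build the argument list for scripts/sample_surface.py (flags from the README)."""
    cmd = [
        sys.executable, "scripts/sample_surface.py",
        "--run_name", run_name,
        "--starting_structure_path", structure_path,
        "--model_type", model_type,
    ]
    if model_paths:
        # --model_paths takes one or more paths, as in the PaiNN example.
        cmd += ["--model_paths", *model_paths]
    cmd += ["--settings_path", settings_path]
    return cmd


slab = "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_pristine_slab.pkl"
chgnet_cmd = build_sample_command(
    "SrTiO3_001_chgnet", slab, "CHGNetNFF",
    "scripts/configs/sample_config_chgnet.json",
)
print(shlex.join(chgnet_cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` from the repository root would launch the run without any shell quoting concerns.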
Pre-trained CHGNet model on LaMnO3(001) under pH-$U_\mathrm{SHE}$ conditions
```bash
python scripts/sample_pourbaix_surface.py --run_name LaMnO3_001_chgnet \
--starting_structure_path "tutorials/data/LaMnO3_001/LaMnO3_001_2x2x3_top_pristine.pkl" --model_type CHGNetNFF \
--phase_diagram_path "tutorials/data/LaMnO3_001/pourbaix/LaMnO_pd_dict.json" \
--pourbaix_diagram_path "tutorials/data/LaMnO3_001/pourbaix/LaMnO_no_ternary_pbx_dict.json" \
--settings_path "scripts/configs/sample_pourbaix_config.json"
```
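For intuition on what pH and $U_\mathrm{SHE}$ control, the widely used computational hydrogen electrode relation shifts the free energy of a proton-electron pair by $-eU_\mathrm{SHE} - k_\mathrm{B}T \ln(10)\,\mathrm{pH}$ relative to half an H2 molecule. The sketch below evaluates only that textbook shift; how sample_pourbaix_surface.py treats these conditions internally is not stated here, and the function name is hypothetical.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def proton_electron_shift(pH, u_she, temperature=298.15):
    """Free-energy shift (eV) of one (H+ + e-) pair relative to 1/2 H2.

    u_she is the potential versus the standard hydrogen electrode, in volts.
    """
    return -u_she - K_B_EV_PER_K * temperature * math.log(10) * pH


# Each pH unit is worth about 59 meV at room temperature.
print(f"{proton_electron_shift(pH=1, u_she=0.0):.4f} eV")
```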
Latent space clustering
```bash
python scripts/clustering.py --file_paths "tutorials/data/SrTiO3_001/SrTiO3_001_2x2_mcmc_structures_100.pkl" \
--save_folder "SrTiO3_001/clustering" --nff_model_type "PaiNN" \
--nff_paths "tutorials/data/SrTiO3_001/nff/model01/best_model" \
"tutorials/data/SrTiO3_001/nff/model02/best_model" \
"tutorials/data/SrTiO3_001/nff/model03/best_model" \
--clustering_metric "force_std" --cutoff_criterion "distance" \
--clustering_cutoff 0.2 --nff_device "cuda"
```
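To illustrate what a distance cutoff does in clustering, the sketch below groups points so that any two points within the cutoff end up in the same cluster (single-linkage behavior, implemented with a small union-find). This is an assumption-laden toy: the linkage scheme that clustering.py actually uses is not stated here, and the 2D points stand in for the much higher-dimensional neural network embeddings.

```python
import math


def cluster_by_cutoff(points, cutoff):
    """Label points so that points closer than `cutoff` share a cluster label."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    # Relabel roots as consecutive integers in order of first appearance.
    root_to_label = {}
    return [root_to_label.setdefault(find(i), len(root_to_label))
            for i in range(len(points))]


embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.1)]  # made-up data
labels = cluster_by_cutoff(embeddings, cutoff=0.2)
print(labels)
```

A smaller cutoff splits the data into more, tighter clusters; a larger one merges them.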
Create surface formation entries for Pourbaix analysis
```bash
python scripts/create_surface_formation_entries.py --surface_name "LaMnO3_001_2x2" \
--file_paths "tutorials/data/LaMnO3_001/20241120-003720_AtomsBatch_surface_48.pkl" --model_type "CHGNetNFF" \
--model_paths "tutorials/data/LaMnO3_001/nff/finetuned/best_model" \
--phase_diagram_path "tutorials/data/LaMnO3_001/pourbaix/LaMnO_pd_dict.json" \
--pourbaix_diagram_path "tutorials/data/LaMnO3_001/pourbaix/LaMnO_no_ternary_pbx_dict.json" --correct_hydroxide_energy \
--input_job_id --elements "La" "Mn" "O" --device "cuda" --save_folder "tutorials/data/LaMnO3_001/pourbaix/"
```
Citations
Original VSSR-MC work:
```bib
@article{duMachinelearningacceleratedSimulationsEnable2023,
  title = {Machine-Learning-Accelerated Simulations to Enable Automatic Surface Reconstruction},
  author = {Du, Xiaochen and Damewood, James K. and Lunger, Jaclyn R. and Millan, Reisel and Yildiz, Bilge and Li, Lin and {G{\'o}mez-Bombarelli}, Rafael},
  year = {2023},
  month = dec,
  journal = {Nature Computational Science},
  pages = {1--11},
  publisher = {Nature Publishing Group},
  issn = {2662-8457},
  doi = {10.1038/s43588-023-00571-7},
  urldate = {2023-12-07},
  keywords = {Computational methods,Computational science,Software,Surface chemistry}
}
```
VSSR-MC with aqueous electrochemical conditions:
```bib
@misc{duAcceleratingEnhancingThermodynamic2025,
  title = {Accelerating and Enhancing Thermodynamic Simulations of Electrochemical Interfaces},
  author = {Du, Xiaochen and Liu, Mengren and Peng, Jiayu and Chun, Hoje and Hoffman, Alexander and Yildiz, Bilge and Li, Lin and Bazant, Martin Z. and {G{\'o}mez-Bombarelli}, Rafael},
  year = {2025},
  month = mar,
  number = {arXiv:2503.17870},
  publisher = {arXiv},
  doi = {10.48550/arXiv.2503.17870},
  keywords = {Computer Science - Computational Engineering Finance and Science,Computer Science - Machine Learning,Condensed Matter - Materials Science,Condensed Matter - Statistical Mechanics},
}
```
Development & Bugs
VSSR-MC is under active development. If you encounter any bugs during installation or usage, please open an issue. We appreciate your contributions!
Owner
- Name: Learning Matter @ MIT
- Login: learningmatter-mit
- Kind: organization
- Email: rafagb@mit.edu
- Website: https://gomezbombarelli.mit.edu/
- Repositories: 33
- Profile: https://github.com/learningmatter-mit
Rafael Gomez-Bombarelli Group @ MIT
Citation (citation.cff)
cff-version: 1.2.0
message: If you use this software, please cite it as below.
title: Machine-Learning-Accelerated Simulations to Enable Automatic Surface Reconstruction
authors:
  - family-names: Du
    given-names: Xiaochen
  - family-names: Damewood
    given-names: James K.
  - family-names: Lunger
    given-names: Jaclyn R.
  - family-names: Millan
    given-names: Reisel
  - family-names: Yildiz
    given-names: Bilge
  - family-names: Li
    given-names: Lin
  - family-names: {G{\'o}mez-Bombarelli}
    given-names: Rafael
date-released: 2023-11-08
repository-code: https://github.com/learningmatter-mit/surface-sampling
arxiv: https://arxiv.org/abs/2305.07251
doi: 10.1038/s43588-023-00571-7
type: software
keywords:
  [monte carlo, neural network, force field, active learning]
version: 0.1.0 # replace with the version you use
journal: Nature Computational Science
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 3
- Watch event: 13
- Issue comment event: 2
- Push event: 4
- Pull request event: 2
- Fork event: 2
Last Year
- Create event: 2
- Release event: 1
- Issues event: 3
- Watch event: 13
- Issue comment event: 2
- Push event: 4
- Pull request event: 2
- Fork event: 2