https://github.com/coudertlab/simple-adsorption-workflow
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 3 DOI reference(s) in README -
✓Academic publication links
Links to: nature.com, zenodo.org -
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: coudertlab
- Language: Jupyter Notebook
- Default Branch: main
- Size: 3.85 MB
Statistics
- Stars: 3
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
simple-adsorption-workflow
Automating RASPA simulations and analysis for adsorption studies using a minimalist workflow with JSON input/output, CoRE MOF database integration, isotherm generation, prospecting tools and plot examples.
Description
This package provides a demonstrative workflow for automating RASPA simulations for adsorption studies. It allows users to specify simulation parameters and default settings through a JSON file, and generates directories with input files and submission scripts for RASPA. The program retrieves CIF files from the CoRE MOF database and stores isotherms as CSV or JSON files. This workflow aims to streamline the simulations and simplify the analysis process, making it easier to study adsorption phenomena.
Part of the code presented here was heavily inspired by RASPA python wrapper, and been adapted to meet the workflow architecture.
Install
With Apptainer
For a ready-to-use solution of this workflow, one can use an apptainer image (similar to Docker), further information can be found on the DIAMOND website.
With Conda
For development purposes, use the following instructions :
- Create a conda environment with Python 3.9:
bash conda create -n simple-adsorption-workflow python=3.9 conda activate simple-adsorption-workflow Build and install RASPA from source (using version 2.0.48)
git clone https://github.com/iRASPA/RASPA2.git cd RASPA2 git checkout v2.0.48See the install section in the original documentation to install from source.Install Openbabel using conda
bash conda install -c conda-forge openbabelUse pip to install the rest of the libraries listed in
requirements.txtinto the conda environment:bash pip install -r requirements.txtDownload and compile Zeo++ at http://www.zeoplusplus.org/download.html. To inform the workflow of the executable
networkpath, modify the corresponding line inset_environment:export ZEO_DIR=/opt/zeo++-0.3Dependencies for PACMOF2
bash git clone https://github.com/tdpham2/pacmof2 conda install -c conda-forge pymatgen=2023.10.4 conda install -c conda-forge scikit-learn=1.3.2 cd pacmof2 wget -P pacmof2/models/ https://zenodo.org/records/12747095/files/PACMOF2_neutral.gz pip install build pip install .Define the environment variables:
bash source set_environment
Optionally, you can set up this recovering of the environment variables at each activation of the conda environment:
bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d/
printf "%s\n" "#/bin/sh" "source $PWD/set_environment" > $CONDA_PREFIX/etc/conda/activate.d/simple-adsorption-workflow_set_env.sh
chmod +x $CONDA_PREFIX/etc/conda/activate.d/simple-adsorption-workflow_set_env.sh
In the case you are running in a fish shell, set_environment_fish should be used in the commands above instead of set_environment.
User input (JSON file)
This is the standard format of the input.json file to be provided by the user:
{
"parameters":
{
"structure":["MIBQAR","VOGTIV"],
"molecule_name": ["N2", "CO2"],
"pressure": [10,1E6],
"npoints":5,
"temperature": [298.15],
"charge_method":["EQeq"]
}
"defaults":
{
"forcefield":"ExampleMOFsForceField",
"init_cycles":10,
"cycles":20,
"print_every":5,
"grid_use":"yes"
}
}
In the field parameters, the keywords refer to the specific workflow inputs :
structure: six-letter CSD codes to identify the material (e.g.,MIBQAR)molecule_name: guest molecules (e.g., N2).temperature: temperatures in Kpressure: a pressure range, minimum and maximum pressuresnpoints: number of pressure points linearly distributed in the pressure rangecharge_method: methods for charge assignment (actual : 'None' or 'EQeq')
In the field defaults, the keywords refer to RASPA default parameters, see more details about RASPA input file in a further section.
Note :
parametersanddefaultshave been divided for the following purpose : if there are lists of parameters in theparametersfield (e.g., multiple materials), the program will generate simulation folders for each unique combination ofparameters.
In this workflow, there are certain restrictions in order to keep the simulations simple and make some assumptions: - The material must be available in the CoreMOF database, accessible through the MOFXDB database (https://github.com/n8ta/mofdb-client). - The guest molecule should be a rare gas such as argon (Ar) or xenon (Xe), or it should have a spherical model, like nitrogen (N2), methane (CH4), or sulfur hexafluoride (SF6). - No electrostatic interactions are considered in the simulations.
These restrictions and assumptions aim to streamline the simulations and simplify the modeling process.
Workflow example
Run simulations
Bash
python $PACKAGE_DIR/saw.py run
To specify input and output locations :
bash
python $PACKAGE_DIR/saw.py run -i <path/to/myinput>.json -o <path/to/data/directory>
By default, when the flag -o is not provided, a new directory will be created with the following formatting name : ./<Date>_<Time>_<Runtype>/ .
The run type data for normal runs and <name_of_test> for tests cases.
Merge outputs from two independent runs
python $PACKAGE_DIR/saw.py merge -i run<index1>.json run<index2>.json -o ./
Two files will be created :
- run_merged.json : the workflow file with all single-point outputs
- isotherms.json : the isotherms file
Use a GUI to create the input file of the workflow
python $PACKAGE_DIR/saw.py input
Workflow diagram
The schematic diagram (Fig. 1) outlines the primary functions executed within the workflow:
Parsing Input File: The input file undergoes parsing and interpretation to generate simulation input files. Error messages are generated if the structure names are not found in the database or if gas names do not match the default ones in RASPA.
Calculations: Two types of calculations are performed:
- Grand Canonical Monte Carlo simulations using RASPA.
- Pore characterization of the structures using Zeo++.
Storage of Output Data: The resulting gas uptakes, along with input data and metadata from RASPA, are stored in a common
run<runID>.jsonfile.Data Transformation: Output data can be converted into more user-friendly array-like tables, facilitating the plotting of isotherms.
Data Merging: Various workflow outputs can be merged, enabling the retrieval of isotherms based on group selections using both data and metadata.
Figure 1: Diagram of the workflow.
Tests
Recover isotherms from CSV
It runs 20 simulations on RASPA and compute geometric features using ZEO++, then stores the results in CSV files. It then reconstructs the isotherms curves from the simulation results and compares line by line all isotherms files from pre-computed data found in the package repository. The geometrical features are also stored in a CSV format, and
To run it, use -t or --test-isotherms-csv flags:
bash
python $PACKAGE_DIR/saw.py run -t
The input file used here is located in $PACKAGE_DIR/tests/test_isotherms_csv/.
Recover isotherms from JSON
It runs 20 simulations on RASPA then stores the results in a single JSON file. It then reconstructs the isotherms and store the results in JSON format.
To run it, use the -t2 or --test-isotherm-json flag :
bash
python $PACKAGE_DIR/saw.py run -t2
The input file used here is located in $PACKAGE_DIR/tests/test_isotherms_json/.
Merge and plot isotherms from two workflow runs
To run it, use the -t3 or --test-merge-json flag :
bash
python $PACKAGE_DIR/saw.py merge -t3
The json files containing the data to be merged (single pressure data points) are located in $PACKAGE_DIR/tests/test_merge_json/gcmc/.
Calculate the partial charges using the EQeq method
bash
python $PACKAGE_DIR/saw.py run --test-charges
The output of the EQeq code should appear with the atom types and the atomic experimental properties used for the calibration of the method.
Calculate the partial charges using the PACMOF method
bash
python $PACKAGE_DIR/saw.py run --test-charges-pacmof2
Run simulations on user CIF
bash
python $PACKAGE_DIR/saw.py run --test-cif-local-directory
This test will copy the corresponding input files to run simulations from the CIF files found in the current directory (or subdirectory)
Documentation
JSON input
structure: a list of MOF crystal identifiers. The input parser employs MOFXDB API, so for now the only structures that can be used are structures found in CoREMOF (2014 or 2019) and h-MOF. By default, the input parser reads only a 6-letter refcode (CSD format). It then filters out only the structures found incoremof-2019database. The CIF corresponding files will be downloaded in the./cifdirectory at the root of the output directory and will contains files suffixes that indicates the level of curation (e.g. in VOGTIVcleanhcoremof-2019.cif, `cleanrefers to the deletion of adsorbates species from the original X-crystallography characterization, and_h` refers to adjustment of the hydrogen atoms)
Note : one six-letter identifier may result in two structures in the original CORE-MOF database, then all corresponding structures will be added to the list of cifs for further calculations, e.g. : structure=["AHUTIH"] will generate calculations for both files AHUTIH01cleancoremof-2019.cif and AHUTIHcleancoremof-2019.cif.
Users can also supply CIF files in a local archive ./cif. This option is enabled by setting database=local in the input JSON file. The ./cif directory must be located in the current directory.
Note : The user is responsible for cleaning the structures he provides. To use the automatic EQeq charge assignment calculation, the CIF format must check the program specifications.
molecule_name: a list of gas molecule names. By default, the force field for adsorbate molecules is ExampleDefinitions. Therefore a file of the same name must exist in$RASPA_DIR/share/raspa/molecules/ExampleDefinitions.forcefield: the key name of the predefined force field in RASPA. The force field are stored in$RASPA_DIR/share/raspa/forcefield/and here the list of possible key names :- CoreShellSchroderSauer
- Dubbeldam2007FlexibleIRMOF-1
- Dubbeldam2007FlexibleIRMOF-10
- Dubbeldam2007FlexibleIRMOF-16
- ExampleMOFsForceField
- ExampleMoleculeForceField
- ExampleZeolitesForceField
- Nicholas
others parameters in section
defaults: all other parameters that can be modified through a template. The workflow uses by default the following template for RASPA inputs, all keys in brackets can be modified from JSON input file: ``` SimulationType {simulationtype} NumberOfCycles {cycles} NumberOfInitializationCycles {initcycles} PrintEvery {print_every} RestartFile noForcefield {forcefield} CutOff 12 ChargeMethod Ewald EwaldPrecision 1e-6 UseChargesFromCIFFile {chargesfromcif}
Framework 0 FrameworkName {structure} InputFileType {inputfiletype} UnitCells {a} {b} {c} HeliumVoidFraction {heliumvoidfraction} ExternalTemperature {temperature} ExternalPressure {pressure}
Movies no WriteMoviesEvery 100
NumberOfGrids {gridnatoms} GridTypes {gridatoms} SpacingVDWGrid {gridspacing} SpacingCoulombGrid {gridspacing} UseTabularGrid {griduse}
Component 0 MoleculeName {molecule_name} StartingBead 0 MoleculeDefinition ExampleDefinitions IdealGasRosenbluthWeight 1.0 TranslationProbability 1.0 RotationProbability 1.0 ReinsertionProbability 1.0 SwapProbability 1.0 CreateNumberOfMolecules 0 ``` Definitions :
{molecule_name}: The molecule to test for adsorption. A file of the same name must exist in$RASPA_DIR/share/raspa/molecules/ExampleDefinitions.{temperature}: The temperature of the simulation, in Kelvin.{pressure}: The pressure of the simulation, in Pascals.{helium_void_fraction}: The helium void fraction of the input structure. Required for excess adsorption back-calculation.{unit_cells}: The number of unit cells to use, by dimension.{simulation_type}: The type of simulation to run, defaults to "MonteCarlo".{cycles}: The number of simulation cycles to run.{init_cycles}: The number of initialization cycles to run. Defaults to the minimum of cycles / 2 and 10,000.{forcefield}: The forcefield to use. Name must match a folder in$RASPA_DIR/share/raspa/forcefield, which contains the properly named.deffiles.{input_file_type}: The type of input structure. Assumes cif.{grid_}: All parameters which refer to pre-calculation of energies on grids.
Note : In the future, we might let the user provide its own custom templates files to be able to take into account other parameters.
JSON outputs
In the first version of the workflow, the outputs were stored in CSV files. Since this format is not appropriate to store metadata, we added routines to export all simulation data in a more suitable format, a JSON format. We also add a routine that transform single-pressure-point adsorption to isotherms.
Workflow database JSON file
location : ./gcmc/run<runID>.json
{
"input": {
"parameters": {
"structure": ["MIBQAR", "VOGTIV"],
"molecule_name": ["N2", "CO2"],
"pressure": [10, 1000000.0],
"npoints": 5,
"temperature": [298.15]
},
"defaults": {
"unit_cells": [1, 1, 1],
"forcefield": "ExampleMOFsForceField",
"init_cycles": 10,
"cycles": 20,
"print_every": 5
}
},
"metadata": {
"timestamp": "2024-01-15 17:30:42",
"python_version": "3.9.16",
"python_compiler": "GCC 11.3.0",
"os_platform": "Linux-x86_64",
"os_version": "#40~22.04.1-Ubuntu",
"os_system": "Linux",
"os_release": "6.2.0-39-generic",
"machine": "x86_64",
"processor": "x86_64",
"workflow_package_git_hash": "b505d5e63fee21be57836e7041c429b48e7c26dc",
"cif_source": {
"database": "mofxdb",
"version": "dc8a0295db"
}
},
"results": [
{
"cycles": 20.0,
"forcefield": "ExampleMOFsForceField",
"init_cycles": 10.0,
"molecule_name": "N2",
"npoints": 5.0,
"pressure": 10.0,
"print_every": 5.0,
"simkey": "simca3ee227",
"structure": "MIBQAR16_clean_coremof-2019",
"temperature": 298.15,
"unit_cells": "[2, 2, 2]",
"Pressure(Pa)": 10.0,
"uptake(cm^3 (STP)/cm^3 framework)": 0.0
},
{
...
}
]
}
isotherm JSON file
location : ./gcmc/isotherms.json
This is just a transformation in the way data is printed in JSON keys, all data are grouped by a set of identical parameters defined as follows : Pressure(Pa), uptake(cm^3 (STP)/cm^3 framework),simkey,pressure,npoints .
``` { "isotherms": [ { "simkey": [ "sim36c5696b", "sim50f808f6", "simc4bc79e4", "sim8d5c22ad", "sim887bbdbe" ], "isokey": "iso191854b1", "moleculename": "CO2", "structure": "MIBQAR16cleancoremof-2019", "unitcells": "[2, 2, 2]", "temperature": 298.15, "cycles": 20.0, "forcefield": "ExampleMOFsForceField", "initcycles": 10.0, "npoints": 5.0, "printevery": 5.0, "pressure": [10.0, 250008.0, 500005.0, 750002.0, 1000000.0], "Pressure(Pa)": [10.0, 250008.0, 500005.0, 750002.0, 1000000.0], "uptake(cm^3 (STP)/cm^3 framework)": [0.0, 16.2411601942, 27.2677672825, 33.5143673572, 40.2498317856] }, { "simkey": [ ... ], ... }, { ... } ] }
```
Notes : For now, all metadata are not including in the isotherms JSON database. The JSON isotherms is nevertheless useful to quickly visualize the data results. To store long-term data, we recommend to use the database JSON obtained merging from a previous file.
Further development
In the future, we would like to update these routines to use a existent JSON format, for instance the one used by MOFXDB for interoperable operations between databases.
What other data/metadata should be added ? - units - cif - identifiers (e.g. : common names for MOF; InChiKey for adsorbates) - all metadata from CSD (dois, authors, solvent, ...) - composition to deal with co-adsorption - more simulation-related parameters from RASPA (e.g. SimulationType, UseChargesFromCIFFile, RASPA warnings, ...)
Charge Assignment
The partial charges can be calculated automatically given the atomic positions. To use this option ,add the following key with the corresponding keyword in the JSON input :
...
"parameters":
{
...
"charge_method":<keyword>
}
...
The possible keywords are :
None
All atoms have a null partial charge.
EQeq
It calculates the partial charges using the EQeq method from the python wrapper pyeqeq.
First each CIF file is passed through Openbabel to correct format not compatible with EQeq (e.g. CIFs with columns in wrong order). A file with _openbabel prefix is written in the same directory.
It will then duplicate the CIF files present in ./cif directory with a suffix name related to the method; e.g.: MIBQAR16_clean_coremof-2019_openbabel.cif_EQeq_ewald_1.20_-2.00.cif contains an extra column for the partial charges calculated with Ewald Coulombic interaction, a dielectric parameter of 1.2 and the hydrogen electron affinity is -2.
Other default parameters can be found in the original code page. For instance, one important implementation of EQeq (with respect to the Rappé and Goddard method) is the use of non-zero centered charge (parameter chargecenters), the charges are hence equilibrated around its number of oxidation and the oxidation number of the atom are fixed by default. If you use the same default parameters for the whole screening, a wrong oxidation number could be assign, and the calculation of the partial charges will be affected. To solve this, one can recalculate oxidation numbers from the structure file using ref1 or ref2 (MOSAEC code by Woo et al., not available yet).
PACMOF
This charge assigment method here (PACMOF v.2), use a descritor-based Machine Learning model to predict partial charges from the structure of the materials. The training set is based on DFT calculations in the GCA approximations with PBE fonctionals and the reference structures comes from QMOF database.
Grid Calculation
In RASPA, one can speed up GCMC calculations by computing energy grids. It stores energies (Van der Waals and electrostatic) of all host atoms for a given framework in $RASPA_DIR/share/raspa/grids/. The position of a randomly inserted host molecule during GCMC is marked in the grid, then the energy of the host molecule is interpolated from the energy values on the neighboring grid nodes.
It can speed up calculations if a consequent number of simulations use the same framework-host (different pressures, temperatures).
To use this option, one can to pass these parameters in the default fields :
...
"default":
{
...
"grid_use":<yes/no>
"grid_spacing":<value_in_angstrom>
}
...
If grid_use is set to 'yes', all GCMC simulations running in the workflow will use grids, these latter are calculated in a previous step during the workflow.
TODO : There is a bug to fix with this option ! for now,when this option is on, it recalculates for each GCMC simulations the energy grid, even if the grid has already been computed previously (same pair of adsorbate atom and adsorbent material).
What can not be done (yet) with simple-adsorption-workflow ?
- If the user wants to run calculation on its own structures, several verification must be performed to be used in a GCMC simulation which is out of the scope of the present tool (curate CIF, check presence of force field parameters for the new atoms name defined, ...)
- Use a user-defined force field
Owner
- Name: Coudert lab
- Login: coudertlab
- Kind: organization
- Website: http://coudert.name
- Repositories: 3
- Profile: https://github.com/coudertlab
GitHub Events
Total
- Release event: 2
- Watch event: 3
- Delete event: 2
- Push event: 21
- Create event: 7
Last Year
- Release event: 2
- Watch event: 3
- Delete event: 2
- Push event: 21
- Create event: 7
Committers
Last synced: 8 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| ahardiag | a****g@g****m | 135 |
| BissuelD | d****l@g****m | 4 |
| FX Coudert | f****t@g****m | 2 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- ahardiag (1)