https://github.com/chimie-paristech-ctm/ml_dft_benchmarking
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
✓DOI references
Found 8 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (12.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: chimie-paristech-CTM
- License: mit
- Language: Python
- Default Branch: main
- Size: 6.24 MB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Improving the reliability of, and confidence in, DFT functional benchmarking through active learning
This repository contains the code and auxiliary data associated with the "Improving the reliability of, and confidence in, DFT functional benchmarking through active learning" project. Code is provided "as-is". Minor edits may be required to tailor the scripts for different computational systems.

Conda environment
To set up the conda environment:
conda env create --name <env-name> --file environment.yml
Requirements
In order to execute the autodE high-throughput reaction profile computation workflow, Gaussian09/Gaussian16 and xTB needs to be accessible.
Curating the dataset
The scripts used for transforming the xyz-coordinates in reaction SMILES is scripts/analyze_data.py. Execution as
follows:
python scripts/analyze_data.py --raw_data data/raw_data --iter initial --generate_initial_data
The xyz files should be in the directory data/XYZ_files. A data_smiles.csv file will be generated in the data
directory. The script only works with neutral molecules, for the case of charged molecules, an error will be displayed,
and you should add manually the reaction SMILES. The final version of the initial training set can be found in
data/data_smiles_curated.csv
Generating the chemical space
The script used for generating the chemical space is script/generate_space.py. Execution as follows:
python scripts/generate_space.py --template_cores data/hypothetical_space_core.csv
A hypothetical_chemical_space.csv file will be generated in the data directory.
Baseline ML models
The script used for running the baseline models is script/baseline_models.py. The baseline_model.py script,
which runs each of the baseline models sequentially, can be executed as follows:
python baseline_models.py --csv-file data/data_smiles_curated.csv
The fingerprints are generated during runtime. The DRFP and
Morgan fingerprint is used. A nested cross-validation is implemented. For a final
evaluation of the model you should use the option --final_cv
An example of the input file is included in the data directory: data_smiles_curated.csv.
Bayesian optimization campaign
Each iteration of the bayesian optimization campaign is launched with the help of the final_model.py script. Execution as follows:
final_model.py [-h] [--train_file TRAIN_FILE] [--pool_file POOL_FILE] [--new_data_file NEW_DATA_FILE] [--iteration ITERATION] [--seed SEED] [--beta BETA]
[--final_dir FINAL_DIR] [--cutoff CUTOFF] [--conda_env CONDA_ENV] [--new_data NEW_DATA] [--selective_sampling SELECTIVE_SAMPLING]
[--selective_sampling_data SELECTIVE_SAMPLING_DATA]
Reproducibility
To reproduce the acquired reaction each round as well the mean absolute error of each set and the plots, a bash script is provided. Execution as follows:
bash reproducibility.sh
An example of the autodE input can be found in data/autode_input_8.
References
If (parts of) this workflow are used as part of a publication please cite the associated paper:
@article{ml_functional,
author = {Alfonso-Ramos, Javier E. and Adamo, Carlo and Br{\'e}mond, {\'E}ric and Stuyver, Thijs},
title = {Improving the Reliability of, and Confidence in, {DFT} Functional Benchmarking through Active Learning},
journal = {J. Chem. Theory Comput.},
volume = {21},
number = {4},
pages = {1752-1761},
year = {2025},
doi = {10.1021/acs.jctc.4c01729},
}
Owner
- Name: chimie-paristech-CTM
- Login: chimie-paristech-CTM
- Kind: organization
- Repositories: 1
- Profile: https://github.com/chimie-paristech-CTM
GitHub Events
Total
- Watch event: 6
- Push event: 4
- Public event: 1
Last Year
- Watch event: 6
- Push event: 4
- Public event: 1