et-al
Entropy-targeted active learning for bias mitigation in materials data.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Henrium
- License: MIT
- Language: Python
- Default Branch: main
- Homepage: https://doi.org/10.1063/5.0138913
- Size: 19.4 MB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Entropy-Targeted Active Learning
This repository contains an implementation of entropy-targeted active learning (ET-AL) for materials data bias mitigation, associated with our paper.

Copyright
This code is open-sourced under the MIT license. Feel free to use all or portions for your research or related projects so long as you provide the following citation information:
Hengrui Zhang, Wei (Wayne) Chen, James M. Rondinelli, and Wei Chen, ET-AL: Entropy-targeted active learning for bias mitigation in materials data, Applied Physics Reviews 10, 021403 (2023).
Descriptions
etal_main.py implements the ET-AL algorithm and demonstrates it on the Jarvis-CFID dataset.
ML_comparison.ipynb compares several ML models on different training sets.
plot_data.ipynb is used for creating relevant plots for visualization.
datasets/ provides data required for reproducing the results in our paper.
results/ contains data generated in the ET-AL demonstration on the Jarvis-CFID dataset.
utils/ contains tools for data pre-processing:
- Jarvis_data.ipynb retrieves and cleans the Jarvis CFID data and generates graph embeddings.
- Jarvis_featurize.ipynb generates physical descriptors for the Jarvis CFID data.
- compound_featurizer.py is an automatic tool for generating physical descriptors.
- cgcnn/ contains the CGCNN model used for graph embeddings.
Usage
Set up environment
Navigate to the code directory and create the environment:
bash
conda env create -f environment.yml
Then activate the new environment:
bash
conda activate gp-torch
Data preparation
Organize the dataset in a DataFrame and update the data paths in etal_main.py to point to it. For demonstration purposes, a dataset derived from the Jarvis CFID data is provided in datasets/: the crystal structures and properties are in data_cleaned.pkl, and the graph embeddings are in cgcnn_embeddings.pkl.
Note: Git LFS is required for data_cleaned.pkl to be downloaded properly. Please download the file here if Git LFS doesn't work.
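When Git LFS is not set up, a clone contains a small text pointer in place of the real file, and unpickling it fails with an unhelpful error. The sketch below checks for that case before loading. It relies on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `load_pickle` are hypothetical and not part of this repository, and loading the provided files is assumed to require the environment from environment.yml so that the pickled objects can be reconstructed.

```python
import pickle
from pathlib import Path

# First bytes of every Git LFS pointer file (per the Git LFS pointer format).
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer rather than real data."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX


def load_pickle(path):
    """Load a pickle file, failing early if only the LFS pointer is present."""
    path = Path(path)
    if is_lfs_pointer(path):
        raise RuntimeError(
            f"{path} is a Git LFS pointer; run `git lfs pull` "
            "or download the file manually."
        )
    with open(path, "rb") as f:
        return pickle.load(f)
```

With the environment active, a call such as `load_pickle("datasets/data_cleaned.pkl")` would then either return the stored object or raise a clear error explaining that the real file is missing.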
Run code
- Set up experimental parameters in etal_main.py: n_iter for the maximum number of ET-AL iterations, n_test for the number of data points held out as the test set, and n_unlabeled for the number of data points left unlabeled. Edit the following part to change the selection of unlabeled data.
- Run the ET-AL model:
bash
python etal_main.py
- Run ML_comparison to compare ML models on training sets generated by ET-AL sampling and by random sampling.
- Use plot_data to visualize the results and reproduce the plots in the paper.
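To make the roles of n_test and n_unlabeled concrete, here is a minimal sketch of one way a dataset could be partitioned into test, unlabeled, and initially labeled subsets. It is an illustration under assumptions, not the code in etal_main.py: the function `split_indices`, the uniform random selection, and the `seed` argument are all hypothetical, and the repository's own selection of unlabeled data may be deliberately non-random.

```python
import random


def split_indices(n_total, n_test, n_unlabeled, seed=0):
    """Randomly partition range(n_total) into test, unlabeled, and labeled lists."""
    if n_test + n_unlabeled >= n_total:
        raise ValueError("n_test + n_unlabeled must leave at least one labeled point")
    rng = random.Random(seed)  # seeded so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    test = indices[:n_test]
    unlabeled = indices[n_test:n_test + n_unlabeled]
    labeled = indices[n_test + n_unlabeled:]
    return test, unlabeled, labeled


test_idx, unlabeled_idx, labeled_idx = split_indices(1000, n_test=100, n_unlabeled=600)
print(len(test_idx), len(unlabeled_idx), len(labeled_idx))  # prints: 100 600 300
```

Under this reading, an active learning loop would move at most n_iter points from the unlabeled pool into the labeled set, one per iteration, while the test set stays untouched.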
Owner
- Name: Henry Zhang
- Login: Henrium
- Kind: user
- Repositories: 1
- Profile: https://github.com/Henrium
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this repository, please cite the following work."
authors:
  - family-names: "Zhang"
    given-names: "Hengrui"
    orcid: "https://orcid.org/0000-0002-3183-1654"
title: "ET-AL: Entropy-targeted active learning"
date-released: 2023-02-20
url: "https://github.com/Henrium/ET-AL"
version: 1.0.0
preferred-citation:
  authors:
    - family-names: "Zhang"
      given-names: "Hengrui"
    - family-names: "Chen"
      given-names: "Wei Wayne"
    - family-names: "Rondinelli"
      given-names: "James M."
    - family-names: "Chen"
      given-names: "Wei"
  title: "ET-AL: Entropy-targeted active learning for bias mitigation in materials data"
  doi: 10.1063/5.0138913
  year: 2023
  journal: "Applied Physics Reviews"
  volume: 10
  issue: 2
  start: "021403"
  type: article
GitHub Events
Total
- Push event: 5
Last Year
- Push event: 5