et-al

Entropy-targeted active learning for bias mitigation in materials data.

https://github.com/henrium/et-al

Keywords

active-learning data-curation materials-informatics

README.md

Entropy-Targeted Active Learning

This repository contains an implementation of entropy-targeted active learning (ET-AL) for materials data bias mitigation, associated with our paper.

Figure: ET-AL algorithm

Copyright

This code is open-sourced under the MIT license. Feel free to use all or portions for your research or related projects so long as you provide the following citation information:

Hengrui Zhang, Wei (Wayne) Chen, James M. Rondinelli, and Wei Chen, ET-AL: Entropy-targeted active learning for bias mitigation in materials data, Applied Physics Reviews 10, 021403 (2023).

Descriptions

etal_main.py implements the ET-AL algorithm and demonstrates it on the Jarvis-CFID dataset.

ML_comparison.ipynb compares several ML models on different training sets.

plot_data.ipynb creates the plots used to visualize the results.

datasets/ provides the data required to reproduce the results in our paper.

results/ contains the data generated by the ET-AL demonstration on the Jarvis-CFID dataset.

utils/ contains tools for data pre-processing:

  • Jarvis_data.ipynb retrieves and cleans the Jarvis CFID data and generates graph embeddings.
  • Jarvis_featurize.ipynb generates physical descriptors for the Jarvis CFID data.
  • compound_featurizer.py is an automated tool for generating physical descriptors.
  • cgcnn/ contains the CGCNN model used for graph embeddings.

Usage

Set up environment

Navigate to the code directory and create the environment:

```bash
conda env create -f environment.yml
```

Then activate the new environment:

```bash
conda activate gp-torch
```

Data preparation

Organize the dataset in a DataFrame and update the data paths in etal_main.py. For demonstration purposes, a dataset derived from the Jarvis CFID data is provided in datasets/: the crystal structures and properties are in data_cleaned.pkl, and the graph embeddings are in cgcnn_embeddings.pkl.

Note: Git LFS is required for data_cleaned.pkl to be downloaded properly. Please download the file here if Git LFS doesn't work.
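A quick way to tell whether Git LFS actually fetched the data is to check whether the file on disk is still an LFS pointer. Pointer files are small text files whose first line is `version https://git-lfs.github.com/spec/v1`. The sketch below is not part of this repository; the helper names `is_lfs_pointer` and `check_dataset` are hypothetical, and the paths simply follow the layout described above.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer rather than real data."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def check_dataset(path):
    """Classify a dataset file as 'missing', 'lfs-pointer', or 'ok'."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    if is_lfs_pointer(p):
        return "lfs-pointer"
    return "ok"


if __name__ == "__main__":
    # Hypothetical helper usage; paths assume you run this from the repository root.
    for name in ("datasets/data_cleaned.pkl", "datasets/cgcnn_embeddings.pkl"):
        print(name, "->", check_dataset(name))
```

If a file is reported as `lfs-pointer`, the real data were not downloaded, and loading it as a pickle would fail.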

Run code

  1. Set the experimental parameters in etal_main.py: n_iter (maximum number of ET-AL iterations), n_test (number of data points held out as the test set), and n_unlabeled (number of data points treated as unlabeled). To change how the unlabeled data are selected, edit the corresponding selection code in etal_main.py.

  2. Run the ET-AL model:

```bash
python etal_main.py
```

  3. Run ML_comparison.ipynb to compare ML models on the training sets generated by ET-AL sampling and by random sampling.

  4. Use plot_data.ipynb to visualize the results and reproduce the plots in the paper.
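To make the roles of n_test and n_unlabeled in step 1 concrete, here is a minimal sketch of one way a dataset could be partitioned into labeled, unlabeled, and test subsets. It is an illustration only: `split_indices` is a hypothetical helper, and the purely random selection is an assumption, not necessarily how etal_main.py chooses the unlabeled pool.

```python
import random


def split_indices(n_total, n_test, n_unlabeled, seed=0):
    """Randomly partition range(n_total) into labeled, unlabeled, and test index lists.

    Random selection is an illustrative assumption; etal_main.py may select
    the unlabeled pool differently.
    """
    if n_test < 0 or n_unlabeled < 0 or n_test + n_unlabeled > n_total:
        raise ValueError("n_test + n_unlabeled must lie between 0 and n_total")
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    test = indices[:n_test]                            # held out for evaluation
    unlabeled = indices[n_test:n_test + n_unlabeled]   # pool to sample from
    labeled = indices[n_test + n_unlabeled:]           # initial training set
    return labeled, unlabeled, test


labeled, unlabeled, test = split_indices(n_total=1000, n_test=100, n_unlabeled=500)
print(len(labeled), len(unlabeled), len(test))  # 400 500 100
```

The three subsets are disjoint and together cover every index, so whatever is not held out or left unlabeled forms the initial labeled set.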

Owner

  • Name: Henry Zhang
  • Login: Henrium
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please cite the following work."
authors:
  - family-names: "Zhang"
    given-names: "Hengrui"
    orcid: "https://orcid.org/0000-0002-3183-1654"
title: "ET-AL: Entropy-targeted active learning"
date-released: 2023-02-20
url: "https://github.com/Henrium/ET-AL"
version: 1.0.0
preferred-citation:
  authors:
    - family-names: "Zhang"
      given-names: "Hengrui"
    - family-names: "Chen"
      given-names: "Wei Wayne"
    - family-names: "Rondinelli"
      given-names: "James M."
    - family-names: "Chen"
      given-names: "Wei"
  title: "ET-AL: Entropy-targeted active learning for bias mitigation in materials data"
  doi: 10.1063/5.0138913
  year: 2023
  journal: "Applied Physics Reviews"
  volume: 10
  issue: 2
  start: "021403"
  type: article
