elasticnet

Predict the mechanical properties of multi-component transition metal carbides (MTMCs)

https://github.com/jzhang-github/elasticnet

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Predict the mechanical properties of multi-component transition metal carbides (MTMCs)

Basic Info
  • Host: GitHub
  • Owner: jzhang-github
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 9.72 MB
Statistics
  • Stars: 5
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md


Elastic net

Machine learning model for predicting the mechanical properties of multi-component transition metal carbides (MTMCs)

This is the manual to reproduce the results and support the conclusions of *Lattice Distortion Informed Exceptional Multi-Component Transition Metal Carbides Discovered by Machine Learning*.

We recommend using a Linux/Windows operating system and running the following examples in the current directory.

*(Figure: ML workflow)*


Installation

Install under conda environment

  • Create a new environment:

    ```console
    conda create -n ElasticNet python==3.10
    ```

  • Activate the environment:

    ```console
    conda activate ElasticNet
    ```

  • Install the package:

    ```console
    pip install elasticnet
    ```

Alternatively, you can install with pip directly (without conda).

  • Install the package. Use the `--user` option if you don't have root permissions:

    ```console
    pip install elasticnet --user
    ```

  • If your IP is located in mainland China, you may need to install from the Tsinghua mirror:

    ```console
    pip install elasticnet -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```

Requirements file: requirements.txt

Key modules
```
numpy==1.25.0
scikit-learn==1.2.2
tensorflow==2.10.0
ase==3.22.1
pandas==1.5.3
```
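Before running the examples, it can help to confirm that the key pinned dependencies are actually installed. The small check below is our own convenience snippet, not part of the package; package names follow requirements.txt.

```python
# Report the installed versions of the key dependencies pinned in
# requirements.txt. Missing packages are reported as None rather than
# raising, so the check can run in any environment.
from importlib.metadata import version, PackageNotFoundError


def check_versions(packages=("numpy", "scikit-learn", "tensorflow", "ase", "pandas")):
    """Return a dict mapping package name -> installed version string (or None)."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


print(check_versions())
```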

Example of using the well-trained model

  • Download the well-trained parameters: checkpoint
  • Run the following Python code:

    ```python
    from elasticnet import predict_formula
    pf = predict_formula(config='input_config.json', ckpt_file='checkpoint')
    pf.predict(*['VNbTa', 'TiNbTa'])
    ```

  • The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 will be printed to the screen. The columns are, in order: B, G, E, Hv, C11, C44.

    ```python
    array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  , 175.50716 ],
           [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 , 178.52579 ]], dtype=float32)
    ```
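The returned array is unlabeled. A small helper (our own convenience wrapper, not part of elasticnet) can attach the documented column names and the input formulas, assuming the column order B, G, E, Hv, C11, C44 stated above:

```python
import numpy as np
import pandas as pd

# Column order as documented above: bulk modulus, shear modulus, Young's
# modulus, hardness, and the elastic constants C11 and C44.
COLUMNS = ["B", "G", "E", "Hv", "C11", "C44"]


def label_predictions(pred, formulas):
    """Wrap a raw prediction array in a DataFrame indexed by formula."""
    return pd.DataFrame(np.asarray(pred), index=formulas, columns=COLUMNS)


# Example using the values shown above for (VNbTa)C3 and (TiNbTa)C3.
pred = np.array([[294.43195, 203.70157, 496.67032, 25.989697, 632.3356, 175.50716],
                 [283.17245, 201.96506, 489.7816, 26.824062, 607.07336, 178.52579]],
                dtype=np.float32)
df = label_predictions(pred, ["VNbTa", "TiNbTa"])
print(df.round(2))
```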

Train a new model from scratch

Prepare DFT calculations

  • Bulk optimization.
  • Elastic constants calculation.

Collect DFT results

Prepare configurations files

  • input_config.json: defines how to generate input features and labels. We recommend downloading this file and modifying it.
    | Variable | Type | Meaning |
    | --- | --- | --- |
    | `include_more` | bool | If `True`, the `bulk_energy_per_formula` and `volume_per_formula` are also included in the input features. |
    | `split_test` | bool | If `True`, a new test set will be split from the dataset. For cross validation, it is OK to set this as `False`. |
    | `clean_by_pearson_r` | bool | Clean input features: highly correlated features will be removed if this is `True`. |
    | `reduce_dimension_by_pca` | bool | Clean input features by [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn-decomposition-pca). Choose one among `clean_by_pearson_r` and `reduce_dimension_by_pca`. |
    | `prop_precursor_path` | str | A file storing the properties of precursory binary carbides. The file extension can be `.csv` or `.json`. See example: [file/HECC_precursors.csv](file/HECC_precursors.csv) |
    | `model_save_path` | str | Path for storing the PCA model and other information when generating input features and labels. |
    | `props` | list | A list of properties that are encoded into the input features. Choose among the column names of [files/HECC_precursors.csv](files/HECC_precursors.csv). |
    | `operators` | list | A list of operators to expand the input dimension. Choose among: `['cube', 'exp_n', 'exp', 'plus', 'minus', 'multiply', 'sqrt', 'log10', 'log', 'square']`. |
    | `HECC_properties_path` | str | A file containing the collected properties of MTMCs. |
    | `labels` | list | A list of label names to fit/learn. |
    | `soap_features` | bool | Whether to use the [SOAP](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html) descriptor. |
    | `soap_config` | dict | A Python dict that defines the configuration of the SOAP descriptor. `input_structure_type`: 'POSCAR' or 'CONTCAR', the structure file used to generate SOAP features. Explanations for the other settings are in [`SOAP.__init__`](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html#dscribe.descriptors.soap.SOAP.__init__). |

  • train.json: defines how to train the machine-learning model.

| Variable | Type | Meaning |
| --- | --- | --- |
| `Nodes_per_layer` | list | Number of nodes in every hidden layer. |
| `Number_of_fold` | int | Number of cross-validation folds. Normally 5 or 10. |
| `feature_file` | str | A file containing the input features. |
| `label_file` | str | A file containing the labels of samples. |
| `Activation_function` | str | Activation function of the hidden layers. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh'. |
| `Output_activation` | str | Activation function of the output layer. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh'. |
| `Number_of_out_node` | int/'auto' | Number of nodes of the output layer. If there is only one column in the `label_file`, this variable should be `1`; 'auto' is for multiple columns. |
| `Optimizer` | str | The name of the optimizer. Examples: [tf.keras.optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) |
| `Cost_function` | str | Name of a cost function in TensorFlow. Examples: [tf.keras.losses](https://www.tensorflow.org/api_docs/python/tf/keras/losses) |
| `Metrics` | list | A list of metrics to evaluate the model. Examples: [tf.keras.metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) |
| `Batch_size` | int | The batch size. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| `Epochs` | int | Number of training epochs. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| `Verbose` | int | Verbosity mode. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| `Regularization` | bool | Whether to use L2 regularization. See [tf.keras.regularizers.L2](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/L2). |
| `Model_save_path` | str | A folder to store the well-trained NN model. |
| `Log_save_path` | str | A folder to store the training log. |
| `Prediction_save_path` | str | A folder to store the predictions of input features after training. |
| `SEED` | int | Random seed for shuffling the input dataset. |
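Putting the fields above together, a minimal train.json might look like the following. The values are illustrative placeholders, not the settings used in the paper:

```json
{
  "Nodes_per_layer": [64, 64],
  "Number_of_fold": 5,
  "feature_file": "x_data_after_pca.txt",
  "label_file": "y_data.txt",
  "Activation_function": "relu",
  "Output_activation": "relu",
  "Number_of_out_node": "auto",
  "Optimizer": "Adam",
  "Cost_function": "mean_squared_error",
  "Metrics": ["mean_absolute_error"],
  "Batch_size": 32,
  "Epochs": 3000,
  "Verbose": 0,
  "Regularization": true,
  "Model_save_path": "checkpoint",
  "Log_save_path": "log",
  "Prediction_save_path": "pred",
  "SEED": 42
}
```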

Run main function

```console
python -m elasticnet
```

The following Python code will be executed:

```python
def main():
    # prepare dataset
    from elasticnet.prepare_input import x_main, y_main
    x_main('input_config.json', load_PCA=False, save_PCA=True)
    y_main('input_config.json')

    # train
    from elasticnet.ann import CV_ML_RUN, load_and_pred
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)

main()
```

You may want to prepare the dataset and train the model in separate steps; see below.

Collect input features and labels

```python
from elasticnet.prepare_input import x_main, y_main

x_main('input_config.json', load_PCA=False, save_PCA=True)
y_main('input_config.json')
```

Three files will be generated:
- x_data_init.txt: input features without PCA.
- x_data_after_pca.txt: input features after PCA.
- y_data.txt: labels.
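The generated files are plain text and can be inspected with NumPy. A minimal sketch, assuming whitespace-delimited numeric columns (adjust the delimiter if the files differ); small synthetic stand-ins are written first so the snippet is self-contained:

```python
import os
import tempfile

import numpy as np

# For a self-contained demo, write small synthetic stand-ins for the files
# that x_main/y_main would generate.
tmp = tempfile.mkdtemp()
x_path = os.path.join(tmp, "x_data_after_pca.txt")
y_path = os.path.join(tmp, "y_data.txt")
np.savetxt(x_path, np.random.rand(4, 3))  # 4 samples, 3 PCA features
np.savetxt(y_path, np.random.rand(4, 6))  # 4 samples, 6 labels

# Load them back as the training step would consume them.
x = np.loadtxt(x_path)
y = np.loadtxt(y_path)
print(x.shape, y.shape)  # (4, 3) (4, 6)
```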

Train

  • Run the following Python code:

    ```python
    from elasticnet import CV_ML_RUN, load_and_pred

    if __name__ == '__main__':
        CV_ML_RUN('train.json')
        load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)
    ```

  • You can also execute `python -m elasticnet` directly in the console. See Run main function.

Check training results

  • Generated files/folders
    • checkpoint: A folder for PCA model, NN model, and other information for generating input features.
    • cp.ckpt: Location of NN model.
    • log: Learning curves and weights of all CV models.
    • pred: Predictions of input features.
      • prediction_all.txt: all CV models.
      • prediction_mean.txt: average of CV models.
    • pca_model.joblib: PCA model.
    • scale_range.json: Range to rescale input features.
    • scale_range1.json: Range to rescale input features again.
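The two scale_range files suggest a min-max rescaling of the input features. The helper below is our own illustration of such a rescaling; the schema (feature index mapped to a min/max pair) is an assumption, not the package's actual file format:

```python
import numpy as np


def rescale(features, scale_range):
    """Min-max rescale selected feature columns into [0, 1].

    `scale_range` maps column index -> (min, max), mimicking what a file
    like scale_range.json might store (the exact schema is an assumption).
    """
    features = np.asarray(features, dtype=float)
    out = features.copy()
    for i, (lo, hi) in scale_range.items():
        out[:, i] = (features[:, i] - lo) / (hi - lo)
    return out


x = np.array([[10.0, 0.2],
              [20.0, 0.8]])
ranges = {0: (10.0, 30.0), 1: (0.0, 1.0)}
print(rescale(x, ranges))  # column 0 -> 0.0 and 0.5; column 1 unchanged
```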

Predict

  • After training, run the following Python code:

    ```python
    from elasticnet import predict_formula
    pf = predict_formula(config='input_config.json', ckpt_file='checkpoint')
    pf.predict(*['VNbTa', 'TiNbTa'])
    ```

  • The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 will be printed to the screen. The columns are, in order: B, G, E, Hv, C11, C44.

    ```python
    array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  , 175.50716 ],
           [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 , 178.52579 ]], dtype=float32)
    ```

High-throughput predict

  • Run the following Python code:

    ```python
    from elasticnet import high_throughput_predict
    high_throughput_predict()
    ```
  • Output: ANN_predictions.xlsx

Ternary plot

  • Run the following Python code:

    ```python
    from elasticnet import ternary_plot
    ternary_plot(elements=['Ti', 'Nb', 'Ta'])
    ```

  • Alternatively: `elements = ['VNbTa', 'Ti', 'Hf']`.

  • Output: phasediagrams/**diagram.csv

  • Plot.
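The ternary plot sweeps compositions over a triangle whose vertices are the three entries of `elements`. The barycentric grid this implies can be sketched as follows; this is our own illustration, not the package's internal code:

```python
import numpy as np


def ternary_grid(step=0.1):
    """Generate all compositions (a, b, c) with a + b + c = 1 on a regular grid.

    `step` is the composition increment along each axis.
    """
    n = round(1 / step)
    points = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            points.append((i / n, j / n, k / n))
    return np.array(points)


grid = ternary_grid(0.1)
print(len(grid))  # 66 compositions for a 0.1 step
```

Each row of `grid` is one candidate composition; a model prediction evaluated at every row yields the values shown on the diagram.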

Other scripts

Get ROM

  • Run the following Python code:

    ```python
    from elasticnet import get_rom
    ROM = get_rom(config='input_config.json', formulas='formulas.txt', props=['B', 'G', 'E', 'Hv', 'VEC'])
    print(ROM)
    ```

  • Output, if formulas.txt contains ['VNbTa', 'TiNbTa'] only:

    ```python
    array([[310.33922223, 210.80075867, 515.61666613,  26.20022487,   9.        ],
           [291.74733333, 199.9075404 , 488.11937417,  25.52194014,   8.66666667]])
    ```
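The rule of mixtures (ROM) is a composition-weighted average of the precursor binary carbides' properties; for equimolar formulas it reduces to a plain mean. A hand-worked sketch, using hypothetical precursor values (the real ones live in file/HECC_precursors.csv):

```python
import numpy as np

# Hypothetical properties of binary carbides, columns: B, G (GPa).
# Real values should be taken from file/HECC_precursors.csv.
precursors = {
    "VC":  np.array([300.0, 210.0]),
    "NbC": np.array([310.0, 205.0]),
    "TaC": np.array([320.0, 215.0]),
}


def rom(elements):
    """Equimolar rule of mixtures: the mean of the precursor property rows."""
    rows = [precursors[e + "C"] for e in elements]
    return np.mean(rows, axis=0)


print(rom(["V", "Nb", "Ta"]))  # -> [310. 210.]
```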

Get VEC

  • VEC is simply the last column of the Get ROM output.

Abbreviations

| Abbr. | Full name |
| --- | --- |
| MTMC | Multi-component transition metal carbides |
| HECC | High-entropy carbide ceramic |
| HEC | High-entropy ceramic |
| ML | Machine learning |
| SOAP | Smooth overlap of atomic positions |
| NN | Neural networks |
| CV | Cross validation |
| ROM | Rule of mixtures |
| VEC | Valence electron concentration |

Owner

  • Name: jzhang
  • Login: jzhang-github
  • Kind: user
  • Location: Hong Kong, China
  • Company: CityU

Ph.D. student of CityU

Citation (CITATION.cff)

cff-version: 1.0.2
message: "If you use this software, please cite it as below."
authors:
  - family-names: Jun
    given-names: Zhang
    orcid: https://orcid.org/0000-0001-8872-6153
title: "My Research Software"
version: 1.0.2
date-released: 2023-07-12

GitHub Events

Total
  • Watch event: 1
  • Fork event: 1
Last Year
  • Watch event: 1
  • Fork event: 1