elasticnet
Predict the mechanical properties of multi-component transition metal carbides (MTMCs)
Repository
Basic Info
- Host: GitHub
- Owner: jzhang-github
- License: mit
- Language: Python
- Default Branch: main
- Size: 9.72 MB
Statistics
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Elastic net
Machine learning model for predicting the mechanical properties of multi-component transition metal carbides (MTMCs)
This manual explains how to reproduce the results and support the conclusions of "Lattice Distortion Informed Exceptional Multi-Component Transition Metal Carbides Discovered by Machine Learning".
We recommend running the following examples on Linux or Windows, from the current directory.
Installation
Install under conda environment
- Create a new environment
console
conda create -n ElasticNet python==3.10
- Activate the environment
console
conda activate ElasticNet
- Install package
console
pip install elasticnet
Alternatively, you can install with pip directly, without conda.
- Install the package. Use the `--user` option if you don't have root permission.
console
pip install elasticnet --user
- If your IP address is located in mainland China, you may need to install from the Tsinghua mirror.
console
pip install elasticnet -i https://pypi.tuna.tsinghua.edu.cn/simple
Requirements file: requirements.txt
Key modules
numpy==1.25.0
scikit-learn==1.2.2
tensorflow==2.10.0
ase==3.22.1
pandas==1.5.3
Example of using the well-trained model
- Download the well-trained parameters: checkpoint
- Run the following python code:
python
from elasticnet import predict_formula
pf = predict_formula(config='input_config.json', ckpt_file='checkpoint')
pf.predict(*['VNbTa', 'TiNbTa'])
- The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 are printed on the screen. The columns are, in order: B, G, E, Hv, C11, C44.
python
array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  , 175.50716 ],
       [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 , 178.52579 ]], dtype=float32)
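Since the returned array carries no labels, it can help to attach the formula and property names. The sketch below does this with pandas, using the numbers printed above instead of calling the package, so it runs without a checkpoint. The variable names are illustrative and not part of `elasticnet`.

```python
import numpy as np
import pandas as pd

# Values copied from the example output above; no model is loaded here.
predictions = np.array(
    [[294.43195, 203.70157, 496.67032, 25.989697, 632.3356, 175.50716],
     [283.17245, 201.96506, 489.7816, 26.824062, 607.07336, 178.52579]],
    dtype=np.float32,
)
formulas = ["VNbTa", "TiNbTa"]                 # same order as passed to predict()
columns = ["B", "G", "E", "Hv", "C11", "C44"]  # column order stated above

# Rows are indexed by formula, columns by property name.
table = pd.DataFrame(predictions, index=formulas, columns=columns)
print(table.round(2))
```

With the labels attached, a single value can be read as `table.loc["VNbTa", "Hv"]`.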
Train a new model from scratch
Prepare DFT calculations
- Bulk optimization.
- Elastic constants calculation.
Collect DFT results
- Collect elastic constants into a file with the `csv` extension. See example: files/HECCpropertiesover_sample.CSV.
- You may refer to these papers to calculate moduli from C11, C12, and C44: Physical Review B 87, 094114 (2013) and Journal of the European Ceramic Society 41 (2021) 6267-6274.
- The `csv` file should contain at least these columns: `nominal_formula`, `C11`, `C12`, `C44`, `B`, `G`, `E`, `Hv`, and `real_formula`. See example: files/HECCpropertiesover_sample.CSV.
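As a rough illustration of how B, G, and E can be derived from the three cubic elastic constants, the sketch below applies the standard Voigt-Reuss-Hill average for a cubic crystal. This is an assumption about the averaging scheme: the cited papers may use a different convention, and the hardness model for Hv is not covered here. The function name and the input numbers are made up for the example.

```python
def cubic_vrh_moduli(c11, c12, c44):
    """Return (B, G, E) for a cubic crystal via Voigt-Reuss-Hill averaging."""
    # Bulk modulus: Voigt and Reuss bounds coincide for cubic symmetry.
    bulk = (c11 + 2.0 * c12) / 3.0
    # Voigt (upper) and Reuss (lower) bounds of the shear modulus.
    g_voigt = (c11 - c12 + 3.0 * c44) / 5.0
    g_reuss = 5.0 * (c11 - c12) * c44 / (4.0 * c44 + 3.0 * (c11 - c12))
    # Hill average of the two bounds.
    shear = 0.5 * (g_voigt + g_reuss)
    # Isotropic relation between Young's modulus, B, and G.
    young = 9.0 * bulk * shear / (3.0 * bulk + shear)
    return bulk, shear, young


# Illustrative inputs only, not DFT data.
B, G, E = cubic_vrh_moduli(500.0, 100.0, 150.0)
print(f"B={B:.1f}, G={G:.1f}, E={E:.1f}")
```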
Prepare configurations files
input_config.json: defines how to generate input features and labels. We recommend downloading this file and then modifying it.
| Variable | Type | Meaning |
| -------------------- | ---- | ------------------------------------------------------------------------------------------------------------ |
| include_more | bool | If `True`, the `bulk_energy_per_formula` and `volume_per_formula` are also included in the input features. |
| split_test | bool | If `True`, a new test set will be split from the dataset. For cross validation, it is OK to set this as `False`. |
| clean_by_pearson_r | bool | Clean input features. Highly correlated features will be removed if this is `True`. |
| reduce_dimension_by_pca | bool | Clean input features by [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn-decomposition-pca). Choose one of `clean_by_pearson_r` and `reduce_dimension_by_pca`. |
| prop_precursor_path | str | A file storing the properties of precursor binary carbides. The file extension can be `.csv` or `.json`. See example: [file/HECC_precursors.csv](file/HECC_precursors.csv) |
| model_save_path | str | Path for storing the [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn-decomposition-pca) model and other information when generating input features and labels. |
| props | list | A list of properties that are encoded into the input features. Choose among the column names of [files/HECC_precursors.csv](files/HECC_precursors.csv). |
| operators | list | A list of operators to expand the input dimension. Choose among: ['cube', 'exp_n', 'exp', 'plus', 'minus', 'multiply', 'sqrt', 'log10', 'log', 'square']. |
| HECC_properties_path | str | A file containing the collected properties of MTMCs. |
| labels | list | A list of label names to fit/learn. |
| soap_features | bool | Whether to use the [SOAP](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html) descriptor. |
| soap_config | dict | A python dict that defines the configuration of the [SOAP](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html) descriptor. `input_structure_type`: 'POSCAR' or 'CONTCAR', the structure file used to generate [SOAP](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html) features. Explanations of the other specifications can be found here: [`SOAP.__init__`](https://singroup.github.io/dscribe/latest/tutorials/descriptors/soap.html#dscribe.descriptors.soap.SOAP.init) |

train.json: defines how to train the machine-learning model.
| Variable | Type | Meaning |
| -------------------- | ---- | ------------------------------------------------------------------------------------------------------------ |
| Nodes_per_layer | list | Number of nodes in each hidden layer. |
| Number_of_fold | int | Number of cross-validation folds. Normally 5 or 10. |
| feature_file | str | A file containing the input features. |
| label_file | str | A file containing the labels of samples. |
| Activation_function | str | Activation function of hidden layers. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh' |
| Output_activation | str | Activation function of the output layer. Alternatives: 'relu', 'softmax', 'sigmoid', 'tanh' |
| Number_of_out_node | int/'auto' | Number of nodes of the output layer. If there is only one column in the `label_file`, this variable should be `1`. 'auto' is for multiple columns. |
| Optimizer | str | The name of the optimizer. Examples: [tf.keras.optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) |
| Cost_function | str | Name of the cost function in TensorFlow. Examples: [tf.keras.losses](https://www.tensorflow.org/api_docs/python/tf/keras/losses) |
| Metrics | list | A list of metrics to evaluate the model. Examples: [tf.keras.metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) |
| Batch_size | int | The batch size. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| Epochs | int | Number of epochs for training. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| Verbose | int | Verbosity mode. See [tf.keras.Model.fit](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) |
| Regularization | bool | Whether to use L2 regularization. See [`tf.keras.regularizers.L2`](https://www.tensorflow.org/api_docs/python/tf/keras/regularizers/L2). |
| Model_save_path | str | A folder to store the well-trained NN model. |
| Log_save_path | str | A folder to store the training log. |
| Prediction_save_path | str | A folder to store the predictions of input features after training. |
| SEED | int | Random seed for shuffling input dataset. |
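To make the train.json table concrete, here is a minimal sketch that writes a configuration with one entry per variable. Every value is a placeholder chosen for illustration, and the key spellings assume the snake_case names of the table (for example `Nodes_per_layer`); check both against the template shipped with the package before use. The file is written to a temporary directory so that no existing train.json is overwritten.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values only; adapt them to your dataset and folders.
train_config = {
    "Nodes_per_layer": [80, 80, 80, 80],
    "Number_of_fold": 5,
    "feature_file": "x_data_after_pca.txt",
    "label_file": "y_data.txt",
    "Activation_function": "relu",
    "Output_activation": "relu",
    "Number_of_out_node": "auto",
    "Optimizer": "adam",
    "Cost_function": "mae",
    "Metrics": ["mae"],
    "Batch_size": 16,
    "Epochs": 100,
    "Verbose": 0,
    "Regularization": True,
    "Model_save_path": "checkpoint/cp.ckpt",    # hypothetical path
    "Log_save_path": "checkpoint/log",          # hypothetical path
    "Prediction_save_path": "checkpoint/pred",  # hypothetical path
    "SEED": 42,
}

# Round-trip through a file to confirm the config is valid JSON.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "train.json"
    path.write_text(json.dumps(train_config, indent=4))
    reloaded = json.loads(path.read_text())

print(json.dumps(reloaded, indent=4))
```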
Run main function
console
python -m elasticnet
The following python code will be executed.
```python
def main():
    # prepare dataset
    from elasticnet.prepare_input import x_main, y_main
    x_main('input_config.json', load_PCA=False, save_PCA=True)
    y_main('input_config.json')

    # train
    from elasticnet.ann import CV_ML_RUN, load_and_pred
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)

main()
```
You may want to prepare the dataset and train the model in separate steps; see below.
Collect input features and labels
python
from elasticnet.prepare_input import x_main, y_main
x_main('input_config.json', load_PCA=False, save_PCA=True)
y_main('input_config.json')
Three files will be generated:
- x_data_init.txt: input features without PCA.
- x_data_after_pca.txt: input features after PCA.
- y_data.txt: labels
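The `clean_by_pearson_r` option is described as removing highly correlated input features. A minimal sketch of that idea with NumPy is given below; the threshold of 0.95 and the greedy keep-first strategy are assumptions for illustration, not the package's documented behavior.

```python
import numpy as np


def drop_correlated_features(x, threshold=0.95):
    """Keep a column only if its |Pearson r| with every kept column is below threshold."""
    corr = np.abs(np.corrcoef(x, rowvar=False))  # feature-by-feature correlation matrix
    keep = []
    for j in range(x.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return x[:, keep], keep


rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
# Column 1 is a linear copy of column 0, so it carries no new information.
x = np.column_stack([a, 2.0 * a + 1.0, b])

x_clean, kept = drop_correlated_features(x)
print("kept columns:", kept)
```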
Train
Run the following python code.
python
from elasticnet import CV_ML_RUN, load_and_pred
if __name__ == '__main__':
    CV_ML_RUN('train.json')
    load_and_pred('train.json', 'x_data_after_pca.txt', write_pred_log=True, drop_cols=None)
You can also execute `python -m elasticnet` directly in the console. See Run main function.
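For readers unfamiliar with cross validation, the sketch below shows how a shuffled k-fold split partitions the samples, which is what the number of folds and the random seed in train.json control conceptually. It is written with plain NumPy; whether `CV_ML_RUN` splits the data exactly this way is an assumption, and `kfold_indices` is a hypothetical helper.

```python
import numpy as np


def kfold_indices(n_samples, n_folds, seed):
    """Yield (train_idx, val_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, n_folds)
    for i, val_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx


splits = list(kfold_indices(n_samples=20, n_folds=5, seed=42))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: {len(train_idx)} train, {len(val_idx)} validation")
```

Each sample appears in exactly one validation fold, so every sample is predicted once by a model that did not see it during training.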
Check training results
- Generated files/folders
  - checkpoint: A folder for the PCA model, NN model, and other information for generating input features.
    - cp.ckpt: Location of the NN model.
    - log: Learning curves and weights of all CV models.
      - The file with extension `*.global.acc.loss` summarizes the model performance. Example: 4layer-80808080_nodes.global.acc.loss
    - pred: Predictions of input features.
      - prediction_all.txt: all CV models.
      - prediction_mean.txt: average of CV models.
    - pca_model.joblib: PCA model.
    - scale_range.json: Range to rescale input features.
    - scale_range_1.json: Range to rescale input features again.
Predict
- After training, run the following python code:
python
from elasticnet import predict_formula
pf = predict_formula(config='input_config.json', ckpt_file='checkpoint')
pf.predict(*['VNbTa', 'TiNbTa'])
- The mechanical properties of (VNbTa)C3 and (TiNbTa)C3 are printed on the screen. The columns are, in order: B, G, E, Hv, C11, C44.
python
array([[294.43195 , 203.70157 , 496.67032 ,  25.989697, 632.3356  , 175.50716 ],
       [283.17245 , 201.96506 , 489.7816  ,  26.824062, 607.07336 , 178.52579 ]], dtype=float32)
High-throughput predict
- Run the following python code:
python
from elasticnet import high_throughput_predict
high_throughput_predict()
- Output: ANN_predictions.xlsx
Ternary plot
- Run the following python code:
python
from elasticnet import ternary_plot
ternary_plot(elements=['Ti', 'Nb', 'Ta'])
Alternatively, elements = ['VNbTa', 'Ti', 'Hf'].
- Output: phasediagrams/**diagram.csv
Plot.
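How `ternary_plot` samples the composition space is not described here. A ternary diagram is usually built by evaluating a property on a grid of compositions whose three fractions sum to one, and the sketch below enumerates such a grid. The helper `ternary_grid` and the step count are hypothetical and are not part of `elasticnet`.

```python
def ternary_grid(elements, steps):
    """Return composition dicts on a regular grid; the three fractions sum to 1."""
    if len(elements) != 3:
        raise ValueError("a ternary grid needs exactly three components")
    points = []
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # the remainder goes to the third component
            fractions = (i / steps, j / steps, k / steps)
            points.append(dict(zip(elements, fractions)))
    return points


grid = ternary_grid(['Ti', 'Nb', 'Ta'], steps=4)
print(len(grid), "compositions, for example", grid[5])
```

A grid with `steps` subdivisions contains `(steps + 1) * (steps + 2) / 2` points, so the cost of filling a diagram grows quadratically with the resolution.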
Other scripts
Get ROM
Run the following python code:
python
from elasticnet import get_rom
ROM = get_rom(config='input_config.json', formulas='formulas.txt', props=['B', 'G', 'E', 'Hv', 'VEC'])
print(ROM)
Output when formulas.txt contains only ['VNbTa', 'TiNbTa']:
python
array([[310.33922223, 210.80075867, 515.61666613,  26.20022487,   9.        ],
       [291.74733333, 199.9075404 , 488.11937417,  25.52194014,   8.66666667]])
Get VEC
- VEC is simply the last column of Get ROM.
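The VEC column above can be reproduced by hand with a rule of mixtures: for an equimolar carbide, the value is the plain average over the constituent binary carbides. The sketch below assumes equimolar metals and counts the VEC of a binary carbide MC as the metal's valence electrons plus four for carbon. The helper `rule_of_mixtures` is hypothetical and does not reflect the internals of `get_rom`.

```python
import re

# VEC per formula unit of the binary carbide MC:
# group-4 metals contribute 4 electrons, group-5 metals 5, carbon 4.
VEC_BINARY = {"Ti": 8, "Zr": 8, "Hf": 8, "V": 9, "Nb": 9, "Ta": 9}


def rule_of_mixtures(formula, binary_values):
    """Average a binary-carbide property over the metals of an equimolar formula."""
    metals = re.findall(r"[A-Z][a-z]?", formula)  # 'TiNbTa' -> ['Ti', 'Nb', 'Ta']
    return sum(binary_values[m] for m in metals) / len(metals)


for formula in ["VNbTa", "TiNbTa"]:
    print(formula, round(rule_of_mixtures(formula, VEC_BINARY), 8))
```

The same averaging applies to B, G, E, and Hv once the corresponding binary-carbide values are supplied in place of `VEC_BINARY`.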
Abbreviations
| Abbr. | Full name |
| -------------------- | ---------- |
| MTMC | Multi-component transition metal carbides |
| HECC | High-entropy carbide ceramic |
| HEC | High-entropy ceramic |
| ML | Machine learning |
| SOAP | Smooth overlap of atomic positions |
| NN | Neural networks |
| CV | Cross validation |
| ROM | Rule of mixtures |
| VEC | Valence electron concentration |
Owner
- Name: jzhang
- Login: jzhang-github
- Kind: user
- Location: HongKong, China
- Company: CityU
- Repositories: 4
- Profile: https://github.com/jzhang-github
Ph.D. student of CityU
Citation (CITATION.cff)
cff-version: 1.0.2
message: "If you use this software, please cite it as below."
authors:
- family-names: Jun
given-names: Zhang
orcid: https://orcid.org/0000-0001-8872-6153
title: "My Research Software"
version: 1.0.2
date-released: 2023-07-12