https://github.com/bobleesj/saf-caf-performance
This repository contains model performance on crystal structure classification for binary compounds, derived from 1,400 .cif files using features generated with SAF and CAF.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bobleesj
- Language: Python
- Default Branch: main
- Size: 37 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SAF CAF classification performance
This repository contains model performance on crystal structure classification for binary compounds, derived from 1,400 .cif files using features generated with SAF and CAF.
How to reproduce
```bash
# Download the repository
git clone https://github.com/bobleesj/saf-caf-performance
# Enter the folder
cd saf-caf-performance
```
Install the packages listed in requirements.txt:
```bash
pip install -r requirements.txt
```
Or install all packages at once:
```bash
pip install matplotlib scikit-learn pandas CBFV numpy xgboost
```
To reproduce results
Run `python main.py`:
```
imac@imacs-iMac digitial-discovery % python main.py
Processing outputs/CAF/features_binary.csv with 133 features (1/7).
(1/4) Running SVM model...
(2/4) Running PLS_DA n=2...
(3/4) Running PLS_DA model with the best n...
(4/4) Running XGBoost model...
===========Elapsed time: 8.30 seconds===========
...
Processing outputs/CBFV/oliynyk.csv with 308 features (7/7).
(1/4) Running SVM model...
(2/4) Running PLS_DA n=2...
(3/4) Running PLS_DA model with the best n...
(4/4) Running XGBoost model...
===========Elapsed time: 12.88 seconds===========
imac@imacs-iMac digitial-discovery %
```
Check the outputs folder for ML reports, plots, etc.
For Figures 8 and 9, run `python figure_8_case_study.py` and `python figure_9_case_study.py`.
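To give a feel for what one of the four model steps in the log might involve, here is a minimal sketch of a cross-validated SVM classifier built with scikit-learn. It is not the code in main.py: the synthetic data from `make_classification`, the RBF kernel, `C=1.0`, and the five stratified folds are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature table (assumption): 200 samples,
# 20 features, 4 classes playing the role of structure types.
X, y = make_classification(
    n_samples=200,
    n_features=20,
    n_informative=8,
    n_classes=4,
    random_state=0,
)

# Scale features before the SVM, since SVMs are sensitive to feature ranges.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# Stratified folds keep class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking statistics from the held-out fold into the model.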
Result
Our SAF+CAF features do a great job of classifying crystal structures for intermetallic binary compounds.
This is the PLS-DA result with N=2 components for crystal structure classification, which you can find under outputs/SAF_CAF/PLS_DA_plot.

- Compositional features were created using CAF. Ex) outputs/CAF/features_binary.csv
- Structural features were created using SAF. Ex) outputs/SAF/binary_features.csv
To customize for your data
- Place a `features.csv` file in the `data` folder. It should have a "Structure" column, from which we'll extract all "y" values.
- Place a CSV file with features in a subdirectory within `outputs`. Example: `outputs/SAF_CAF/binary_features.csv`
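The file layout described above could be read along the following lines. This is a hedged sketch with pandas, not the repository's loader: the helper name `load_features_and_labels`, the "Formula" column, the toy values, and the assumption that both files list the same compounds in the same row order are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_features_and_labels(label_csv, feature_csv, label_column="Structure"):
    """Return (X, y) from a label file and a feature file.

    Assumption: both files list the same compounds in the same row order.
    """
    labels = pd.read_csv(label_csv)
    if label_column not in labels.columns:
        raise KeyError(f"'{label_column}' column not found in {label_csv}")
    y = labels[label_column]

    # Keep numeric columns only, so text columns such as a formula are dropped.
    X = pd.read_csv(feature_csv).select_dtypes(include="number")
    if len(X) != len(y):
        raise ValueError("Feature and label files have different row counts")
    return X, y


# Toy files that mimic the layout described above (all values are made up).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "outputs" / "SAF_CAF").mkdir(parents=True)

    label_path = root / "data" / "features.csv"
    label_path.write_text("Formula,Structure\nAB,CsCl\nCD,NaCl\nEF,CsCl\n")

    feature_path = root / "outputs" / "SAF_CAF" / "binary_features.csv"
    feature_path.write_text(
        "Formula,feat_1,feat_2\nAB,0.1,1.0\nCD,0.4,2.0\nEF,0.2,1.5\n"
    )

    X, y = load_features_and_labels(label_path, feature_path)

print(X.shape)
print(y.value_counts().to_dict())
```

Checking for the "Structure" column and matching row counts up front gives a clear error message before any model is trained on misaligned data.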
To format the code
To automatically format Python code and organize imports:
```bash
black -l 79 . && isort .
```
To generate features with CBFV
Run the following command:
```bash
python featurizer.py
```
Questions?
For help with generating structural data using SAF, contact Bob at sl5400@columbia.edu.
Owner
- Name: Sangjoon Bob Lee
- Login: bobleesj
- Kind: user
- Location: New York, NY
- Company: Columbia University
- Website: boblee.io
- Repositories: 2
- Profile: https://github.com/bobleesj
1st-Year MS Materials Science and Engineering at Columbia Engineering, Department of Applied Physics and Applied Mathematics
GitHub Events
Total
- Issues event: 1
- Issue comment event: 3
- Push event: 2
Last Year
- Issues event: 1
- Issue comment event: 3
- Push event: 2
Dependencies
- CBFV *
- matplotlib *
- numpy *
- pandas *
- scikit-learn *
- xgboost *