synthgpt
Code and Data for "Large Language Models for Inorganic Synthesis Prediction"
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to scholar.google, acs.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 26
- Watchers: 2
- Forks: 3
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
SynthGPT
This repository contains the data and code for "Large Language Models for Inorganic Synthesis Predictions" by Seongmin Kim, Yousung Jung, and Joshua Schrier.

Organization
Input data and pre-defined training, cross-validation, and train/test splits are in the data_MP and data folders, for the synthesizability and precursor selection tasks, respectively.
Results are in the results_MP and results folders, for the synthesizability and precursor selection tasks, respectively. We have used a JSON format to facilitate interpretation of the results; see the loading sketch below.
Prompts for the LLM are in the prompts folder as plain text files; they can also be found in the online Supporting Information file.
Source code is in the src folder; some haphazard tests are included in tests.
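Because the result files are plain JSON, they can be inspected with standard tooling. A minimal sketch follows; the file name is a hypothetical placeholder, since the exact names in results/ and results_MP/ vary by task:

```python
import json
from pathlib import Path

# Load one result file from the precursor-selection task.
# "example_result.json" is a hypothetical placeholder; substitute any
# file actually present in the results/ (or results_MP/) folder.
path = Path("results") / "example_result.json"
with path.open() as f:
    results = json.load(f)

# Inspect the top-level structure before relying on specific fields.
print(type(results).__name__)
if isinstance(results, dict):
    print(sorted(results)[:10])  # first few keys, alphabetically
```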
Instructions
Run the notebooks in the top-level directory in order. The Mathematica code (.wls) uses Mathematica 14.0 and no additional libraries. The Python code (.py) uses Python 3.8.13 and requires the following libraries: NumPy (1.22.3), PyTorch (1.11.0), and Pymatgen (2022.9.21).
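Before running the Python scripts, it can help to confirm that the pinned versions are installed. A minimal check using only the standard library (assuming the three packages were installed beforehand, e.g., with pip):

```python
# Confirm that installed dependency versions match the ones listed above.
import sys
from importlib.metadata import version  # standard library in Python 3.8+

expected = {"numpy": "1.22.3", "torch": "1.11.0", "pymatgen": "2022.9.21"}

print(f"python: {sys.version.split()[0]} (scripts use 3.8.13)")
for pkg, want in expected.items():
    got = version(pkg)
    note = "OK" if got == want else f"differs from tested version {want}"
    print(f"{pkg}: {got} ({note})")
```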
The directory is organized around the order in which we performed the work, dividing the work into discrete tasks:
- Precursor selection (scripts 00_Data_Curation.py - 07_Estimate_Perfect_Elemwise.py)
- Synthesizability prediction (08_Data_Preparation_Synthesizability.wls - 11_Score_GPT_Outputs_Synthesizability.wls)
- Evaluation of precursor rescoring results with GPT-4 (12a_SetupData_Combined.wls and 12b_Evaluate_Combined.wls) and by removing recommendations that do not consist of only allowed precursors (13_Precursor_Compliance.wls and 14_Evaluate_Combination_Retaining_Only_Allowed_Precursors.wls)
- Evaluation of the effects of prompt modification on the synthesizability prediction, each evaluated on only the first 5000 test items. These include adding further specialization to the prompt ("You are an expert oxide inorganic chemist...", 15a_Prompt_Modification_Oxide.wls), removing specialization ("You are a magician...", 15b_Prompt_Modification_Magician.wls), and alternate ways of expressing the positive-unlabelled training task ("...items labeled 'U' could be positive or negative (i.e., synthesizable or unsynthesizable)", 15c_Prompt_Modification_Labeling.wls); an illustrative sketch of this framing appears after this list.
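To make the last item concrete, here is an illustrative sketch of the positive-unlabeled prompt framing in Python. The strings are paraphrases for demonstration only; the authors' actual prompts are the plain-text files in the prompts folder and the Supporting Information:

```python
# Illustrative (not verbatim) positive-unlabeled prompt assembly.
# The real prompts used in the paper live in the prompts/ folder.
system_prompt = (
    "You are an expert inorganic chemist. "
    "Items labeled 'P' are known to be synthesizable; "
    "items labeled 'U' could be positive or negative "
    "(i.e., synthesizable or unsynthesizable)."
)

def make_query(formula: str) -> str:
    """Format a single composition as a user message (hypothetical wording)."""
    return f"Classify {formula} as P or U."

print(system_prompt)
print(make_query("LiFePO4"))
```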
Yes, this is different from the order in the paper. "Life can only be understood backwards; but it must be lived forwards." --Søren Kierkegaard
Cite
The associated publication appears in the Journal of the American Chemical Society as doi:10.1021/jacs.4c05840
Owner
- Name: Joshua Schrier
- Login: jschrier
- Kind: user
- Location: The Bronx, NY
- Company: Fordham University
- Website: https://scholar.google.com/citations?user=zJC_7roAAAAJ&hl=en
- Twitter: JoshuaSchrier
- Repositories: 2
- Profile: https://github.com/jschrier
GitHub Events
Total
- Watch event: 9
- Fork event: 4
Last Year
- Watch event: 9
- Fork event: 4