sga

[ICML 2024] LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

https://github.com/pingchuanma/sga



README.md

# Scientific Generative Agent

[![arXiv](https://img.shields.io/badge/ICML2024-2405.09783-b31b1b.svg)](https://arxiv.org/abs/2405.09783)

**TL;DR: We propose a bilevel optimization framework for physical scientific discovery where (1) LLMs optimize the discrete scientific hypothesis and (2) physical simulations optimize the continuous parameters.**

![Pipeline](./assets/pipeline.jpg)
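To make the bilevel idea concrete, here is a toy sketch in plain Python. It is not the code in this repository: the candidate forms are hard-coded stand-ins for LLM proposals, the "simulation" is a one-parameter least-squares fit, and every name (`CANDIDATES`, `fit_theta`, `search`) is hypothetical. The outer level chooses a discrete form and the inner level fits its continuous parameter by gradient descent.

```python
"""Toy bilevel search: discrete hypothesis outside, continuous fit inside."""

# Synthetic observations from a hidden law y = 3 * x**2 (an assumption for this demo).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x**2 for x in xs]

# Hypothetical candidate forms. In SGA an LLM proposes hypotheses; here they are fixed.
# Each entry maps a name to the basis function g in the model y = theta * g(x).
CANDIDATES = {
    "linear": lambda x: x,
    "quadratic": lambda x: x**2,
    "cubic": lambda x: x**3,
}


def mse(theta, basis):
    """Mean squared error of y = theta * basis(x) on the observations."""
    return sum((theta * basis(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_theta(basis, steps=500, lr=0.01):
    """Inner level: gradient descent on the continuous parameter theta."""
    theta = 0.0
    for _ in range(steps):
        grad = sum(2 * (theta * basis(x) - y) * basis(x) for x, y in zip(xs, ys)) / len(xs)
        theta -= lr * grad
    return theta


def search():
    """Outer level: keep the discrete form whose fitted loss is lowest."""
    results = {}
    for name, basis in CANDIDATES.items():
        theta = fit_theta(basis)
        results[name] = (mse(theta, basis), theta)
    best = min(results, key=lambda n: results[n][0])
    return best, results[best][1]


best_name, best_theta = search()
print(best_name, round(best_theta, 3))
```

In this toy the outer loop is an exhaustive scan over three forms. In the framework described above, that role is played by an LLM that proposes and revises hypotheses, and the inner fit is carried out by physical simulation.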

## Getting Started

### Prerequisites

This codebase is built upon the following environment:

  • Ubuntu 22.04
  • CUDA 12.1
  • GCC 11
  • Python 3.10.14
  • PyTorch 2.1.2

The codebase should also work with other versions of these dependencies, although this is not guaranteed. Make sure you have an NVIDIA GPU with CUDA support and an Internet connection.
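A quick way to compare your machine against the list above is a small check script. The sketch below is not part of the repository. The helper names are hypothetical, the package list is taken from the dependency file, and `warp` is the import name of the `warp-lang` package. It reports missing packages instead of failing, and only queries CUDA if PyTorch can be imported.

```python
import importlib.util
import sys

# Import names for the core dependencies ("warp" is provided by warp-lang).
REQUIRED = ["torch", "warp", "numpy", "pyvista", "trimesh"]


def missing_packages(names):
    """Return the names that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def matches_tested_python(version_info=sys.version_info, tested=(3, 10)):
    """True when the interpreter's major.minor equals the tested Python 3.10."""
    return tuple(version_info[:2]) == tested


def cuda_available():
    """Return torch.cuda.is_available(), or None when PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


print("Tested Python version:", matches_tested_python())
print("Missing packages:", missing_packages(REQUIRED))
print("CUDA available:", cuda_available())
```

A different Python version is only a warning here, since other versions may still work.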

### Installation

Clone the repository:

```bash
git clone https://github.com/PingchuanMa/SGA.git
cd SGA
```

Create a conda environment and activate it:

```bash
conda create -n sga python=3.10
conda activate sga
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

### Dataset Preparation

You can either (1) download the dataset from [link] and extract it to `experiment/log/dataset`, or (2) generate it yourself:

```bash
python experiment/script/dataset/dataset_elasticity.py
python experiment/script/dataset/dataset_plasticity.py
```
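If you are unsure whether the dataset is already in place, a short helper can decide whether the generation scripts still need to run. This is a sketch, not part of the repository: the function names are hypothetical, and it assumes that a non-empty `experiment/log/dataset` directory means the data is present, which does not verify completeness.

```python
from pathlib import Path

# Paths copied from the instructions above, relative to the repository root.
DATASET_ROOT = Path("experiment/log/dataset")
GENERATORS = [
    "experiment/script/dataset/dataset_elasticity.py",
    "experiment/script/dataset/dataset_plasticity.py",
]


def dataset_present(root=DATASET_ROOT):
    """True when the dataset directory exists and contains at least one entry."""
    root = Path(root)
    return root.is_dir() and any(root.iterdir())


def generation_commands(root=DATASET_ROOT):
    """Return the generator commands to run, or an empty list if data exists."""
    if dataset_present(root):
        return []
    return [["python", script] for script in GENERATORS]


for cmd in generation_commands():
    print(" ".join(cmd))
```

Run it from the repository root. It prints the commands that are still needed and prints nothing when the directory already has content.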

### Training

First, set the global API key for the LLM you want to use, if you have not already done so:

```bash
export OPENAI_API_KEY=your-api-key # for OpenAI
export ANTHROPIC_API_KEY=your-api-key # for Anthropic
export MISTRAL_API_KEY=your-api-key # for Mistral
```

Alternatively, you can set a project-specific API key in the configuration files under `sga/config/llm`.
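Before launching a long run, it can help to confirm that the key for your chosen provider is actually exported. The sketch below is not part of the repository and its helper names are hypothetical. It assumes the provider is the first dash-separated token of the service name, as in `openai-gpt-4-1106-preview`, and it does not look at keys stored in the configuration files.

```python
import os

# Environment variable names from the export commands above, keyed by provider.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}


def provider_of(llm_name):
    """Extract the provider prefix from a name such as 'openai-gpt-4-1106-preview'."""
    return llm_name.split("-", 1)[0]


def key_is_set(llm_name, environ=os.environ):
    """True when the provider's key variable is present and non-empty."""
    var = KEY_VARS.get(provider_of(llm_name))
    return bool(var and environ.get(var))


print("OpenAI key set:", key_is_set("openai-gpt-4-1106-preview"))
```

Passing `environ` explicitly makes the check easy to test without touching the real environment.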

We provide a few example LLM services in the codebase:

  • openai-gpt-4-1106-preview
  • openai-gpt-3.5-turbo-0125
  • mistral-open-mixtral-8x7b
  • anthropic-claude-3-sonnet-20240229

Choose one of them and train the model by running the following commands:

```bash
llm=openai-gpt-4-1106-preview # or other backbones
python experiment/script/physics/invent_neohookean_from_linear.py --llm ${llm}
python experiment/script/physics/invent_plasticine_from_identity.py --llm ${llm}
python experiment/script/physics/invent_sand_from_identity.py --llm ${llm}
python experiment/script/physics/invent_water_from_identity.py --llm ${llm}
```
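The four experiments can also be driven from a single Python script, which makes it easy to stop at the first failure. This is a minimal sketch, not part of the repository: `build_commands` and `run_all` are hypothetical helpers, and the only command-line flag used is `--llm`, as in the commands above.

```python
import subprocess
import sys

# Script paths copied from the training commands above.
SCRIPTS = [
    "experiment/script/physics/invent_neohookean_from_linear.py",
    "experiment/script/physics/invent_plasticine_from_identity.py",
    "experiment/script/physics/invent_sand_from_identity.py",
    "experiment/script/physics/invent_water_from_identity.py",
]


def build_commands(llm, python=sys.executable):
    """Build one argument list per experiment for the given LLM service name."""
    return [[python, script, "--llm", llm] for script in SCRIPTS]


def run_all(llm):
    """Run each experiment in turn; check=True raises on the first failure."""
    for cmd in build_commands(llm):
        subprocess.run(cmd, check=True)


# Example (run from the repository root with the environment activated):
# run_all("openai-gpt-4-1106-preview")
```

Using `sys.executable` keeps the child processes in the same interpreter, and therefore the same conda environment, as the launcher.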

## Customization

To use another LLM, add its configuration under `sga/config/llm` and change the `llm` argument in the training script. Make sure the API key is set up correctly. You can also add other LLM providers by implementing the corresponding `BaseResponseData` class and `send_message` branch in `sga/agent/physicist.py`.

To learn other constitutive laws, implement the ground-truth constitutive law in the `sga/config/physics/env/physics/templates` folder, add the corresponding configuration file to the `sga/config/physics/env/physics` folder, regenerate the dataset, and change the `dataset_path` argument in the training script.

## Citation

Please consider citing our work if you use our codebase.

```tex
@inproceedings{ma2024llm,
  title={LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery},
  author={Ma, Pingchuan and Wang, Tsun-Hsuan and Guo, Minghao and Sun, Zhiqing and Tenenbaum, Joshua B and Rus, Daniela and Gan, Chuang and Matusik, Wojciech},
  booktitle={International Conference on Machine Learning},
  year={2024},
  organization={PMLR}
}
```

Owner

  • Name: Pingchuan Ma
  • Login: PingchuanMa
  • Kind: user
  • Company: MIT CSAIL

Physical Simulation / Machine Learning



Dependencies

requirements.txt pypi
  • anthropic ==0.26.1
  • mistralai ==0.3.0
  • numpy ==1.26.4
  • openai ==1.30.4
  • pyvista ==0.43.8
  • tensorboard ==2.16.2
  • torch ==2.1.2
  • torchaudio ==2.1.2
  • torchvision ==0.16.2
  • tqdm ==4.66.4
  • trimesh ==4.4.0
  • warp-lang ==1.1.0
setup.py pypi
  • numpy *
  • pyvista *
  • tensorboard *
  • torch *
  • tqdm *
  • trimesh *
  • warp-lang *