chembench

How good are LLMs at chemistry?

https://github.com/lamalab-org/chembench

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary

Keywords

benchmark chemistry llm llms llms-benchmarking machine-learning materials-science safety
Last synced: 6 months ago

Repository

How good are LLMs at chemistry?

Basic Info
Statistics
  • Stars: 111
  • Watchers: 4
  • Forks: 12
  • Open Issues: 58
  • Releases: 2
Topics
benchmark chemistry llm llms llms-benchmarking machine-learning materials-science safety
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

ChemBench



ChemBench is a Python package for building and running benchmarks of large language models (LLMs) and multimodal models, such as vision-language models.

ChemBench was developed as a comprehensive benchmarking suite for the performance of LLMs in chemistry (see our paper) but has since been extended to support multimodal models as well (see our paper). ChemBench is designed to be modular and extensible, allowing users to easily add new datasets, models, and evaluation metrics.

For more detailed information, see the documentation.

Installing ChemBench

Prerequisites

  • Python 3.10 or newer
  • Virtual environment (highly recommended): isolates your project dependencies
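
One common way to set up the recommended virtual environment is the standard-library `venv` module (the `.venv` directory name is just a convention, not a ChemBench requirement):

```shell
python3 -m venv .venv           # create the environment in ./.venv
. .venv/bin/activate            # activate it (Windows: .venv\Scripts\activate)
```

Once activated, `pip install` commands affect only this environment.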

```bash
pip install chembench
```

For more detailed installation instructions, see the installation guide.

Running ChemBench Evaluation

Run comprehensive evaluations across all topics in the benchmark suite. Results are automatically saved as topic-specific reports.

```python
from chembench.prompter import PrompterBuilder
from chembench.evaluate import ChemBenchmark, save_topic_reports

benchmark = ChemBenchmark.from_huggingface(verbose=True)

prompter = PrompterBuilder.from_model_object("groq/gemma2-9b-it", model_kwargs={"temperature": 0})

results = benchmark.bench(prompter)

save_topic_reports(benchmark, results)
```
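
The saved reports can then be post-processed however you like. The report schema is defined by ChemBench and documented in its docs; purely as an illustrative sketch (the `*.json` layout and the `all_correct` field below are assumptions, not ChemBench's documented format), aggregating a per-topic accuracy from such files might look like:

```python
import json
from pathlib import Path


def topic_accuracy(report_dir: str) -> dict[str, float]:
    """Compute a fraction-correct score per topic from JSON report files.

    Assumes each <topic>.json file holds a list of per-task records with a
    boolean "all_correct" field -- a hypothetical layout for illustration,
    not ChemBench's documented schema.
    """
    scores: dict[str, float] = {}
    for path in Path(report_dir).glob("*.json"):
        records = json.loads(path.read_text())
        if records:
            correct = sum(1 for r in records if r.get("all_correct"))
            scores[path.stem] = correct / len(records)
    return scores
```

For example, a directory containing `organic.json` with two records, one correct, would yield `{"organic": 0.5}`.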

To get started, see our docs.

Citation

If you use ChemBench in your work or wish to refer to published evaluation results, please cite:

@article{mirza2024large,
  title = {Are large language models superhuman chemists?},
  author = {Adrian Mirza and Nawaf Alampara and Sreekanth Kunchapu and Martiño Ríos-García and Benedict Emoekabu and Aswanth Krishnan and Tanya Gupta and Mara Schilling-Wilhelmi and Macjonathan Okereke and Anagha Aneesh and Amir Mohammad Elahi and Mehrdad Asgari and Juliane Eberhardt and Hani M. Elbeheiry and María Victoria Gil and Maximilian Greiner and Caroline T. Holick and Christina Glaubitz and Tim Hoffmann and Abdelrahman Ibrahim and Lea C. Klepsch and Yannik Köster and Fabian Alexander Kreth and Jakob Meyer and Santiago Miret and Jan Matthias Peschel and Michael Ringleb and Nicole Roesner and Johanna Schreiber and Ulrich S. Schubert and Leanne M. Stafast and Dinga Wonanke and Michael Pieler and Philippe Schwaller and Kevin Maik Jablonka},
  year = {2024},
  journal = {arXiv preprint arXiv:2404.01475}
}

If your work utilizes ChemBench's multimodal capabilities, or refers to published multimodal results, please also cite:

@article{alampara2024probing,
  title = {Probing the limitations of multimodal language models for chemistry and materials research},
  author = {Nawaf Alampara and Mara Schilling-Wilhelmi and Martiño Ríos-García and Indrajeet Mandal and Pranav Khetarpal and Hargun Singh Grover and N. M. Anoop Krishnan and Kevin Maik Jablonka},
  year = {2024},
  journal = {arXiv preprint arXiv:2411.16955}
}

How to contribute

See the developer notes for details on how to contribute to this project.

Owner

  • Name: Laboratory for AI for Materials
  • Login: lamalab-org
  • Kind: organization

Research group led by Kevin Maik Jablonka

GitHub Events

Total
  • Create event: 38
  • Release event: 1
  • Issues event: 41
  • Watch event: 38
  • Delete event: 47
  • Issue comment event: 172
  • Push event: 123
  • Pull request event: 71
  • Pull request review comment event: 93
  • Pull request review event: 110
  • Fork event: 5
Last Year
  • Create event: 38
  • Release event: 1
  • Issues event: 41
  • Watch event: 38
  • Delete event: 47
  • Issue comment event: 172
  • Push event: 123
  • Pull request event: 71
  • Pull request review comment event: 93
  • Pull request review event: 110
  • Fork event: 5

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 93
  • Total pull requests: 192
  • Average time to close issues: 4 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 10
  • Total pull request authors: 12
  • Average comments per issue: 1.56
  • Average comments per pull request: 8.68
  • Merged pull requests: 95
  • Bot issues: 0
  • Bot pull requests: 76
Past Year
  • Issues: 37
  • Pull requests: 127
  • Average time to close issues: 27 days
  • Average time to close pull requests: 6 days
  • Issue authors: 8
  • Pull request authors: 7
  • Average comments per issue: 0.84
  • Average comments per pull request: 8.49
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 76
Top Authors
Issue Authors
  • kjappelbaum (63)
  • MLSun22 (5)
  • n0w0f (3)
  • lzpp2598 (3)
  • MrtinoRG (3)
  • chenruduan (1)
  • a1ix2 (1)
  • BY571 (1)
  • AdrianM0 (1)
Pull Request Authors
  • codeflash-ai[bot] (76)
  • MrtinoRG (29)
  • n0w0f (19)
  • sreekanth221998 (17)
  • kjappelbaum (14)
  • AdrianM0 (10)
  • tguptaa (9)
  • Macjoechim (6)
  • MLSun22 (2)
  • marawilhelmi (2)
  • aaneesh1 (2)
  • pschwllr (1)
Top Labels
Issue Labels
enhancement (12) bug (2) help wanted (2) good first issue (2) question-corpus (2) nice-to-have (1) new-question (1) prompter (1) requires-discussion (1)
Pull Request Labels
⚡️ codeflash (76) bug (1) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 6,152 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: chembench

Benchmark chemistry performance of LLMs

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 6,152 last month
Rankings
Dependent packages count: 9.6%
Average: 31.7%
Dependent repos count: 53.9%
Maintainers (1)
Last synced: 6 months ago