ether0
A scientific reasoning model, dataset, and reward functions for chemistry.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Repository
A scientific reasoning model, dataset, and reward functions for chemistry.
Basic Info
Statistics
- Stars: 125
- Watchers: 4
- Forks: 13
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
ether0 Reward Model
ether0: a scientific reasoning model, dataset, and reward functions for chemistry.
This repo contains the reward model for evaluating ether0 and similar models, along with utilities for working with the verifiable rewards in our benchmark.
Overview
ether0 is a reasoning language model post-trained through a loop of:
- Supervised fine-tuning (SFT) on long chain-of-thought reasoning traces, to elicit reasoning from a base model.
- Reinforcement learning with verifiable rewards (RLVR) to improve reasoning on focused task groups, each progressing at its own pace. These multitask-trained models are referred to as 'specialists'.
- Rejection sampling to filter specialists' reasoning for correctness and quality.
- SFT on the base model again to make a 'generalist' reasoning model.
- RLVR to recover any lost performance and push further in an all-task setting.
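To make the rejection-sampling step above concrete, here is a minimal, self-contained sketch in plain Python. The `rejection_sample` helper, its thresholds, and the toy palindrome reward are all hypothetical illustrations, not ether0's actual implementation:

```python
from typing import Callable


def rejection_sample(
    traces: list[tuple[str, str]],  # (reasoning, answer) pairs from a specialist
    reward_fn: Callable[[str], float],  # verifiable reward on the answer
    min_reward: float = 1.0,  # keep only fully correct answers
    max_len: int = 4000,  # crude quality filter on reasoning length
) -> list[tuple[str, str]]:
    """Keep traces whose answers are correct and whose reasoning passes a quality filter."""
    return [
        (reasoning, answer)
        for reasoning, answer in traces
        if reward_fn(answer) >= min_reward and len(reasoning) <= max_len
    ]


# Toy verifiable reward: 1.0 when the answer is a palindrome
toy_reward = lambda ans: float(ans == ans[::-1])
kept = rejection_sample([("trace A", "abba"), ("trace B", "abc")], toy_reward)
# Only the correct trace ("trace A", "abba") survives the filter
```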

Repo Structure
This repo contains several packages:
- `ether0`: reward functions, RDKit data utilities, dataset generation prompts, dataset data models, and language model training prompts.
- `ether0.remotes`: server code for ether0 reward functions involving exotic packages and/or third-party models.
> [!NOTE]
> This repo does not contain training code, although you can find open-source repositories like NeMo-RL or Hugging Face TRL that can do the SFT and RL phases of training.
Open Weights
Please see our open-source weights on Hugging Face: https://huggingface.co/futurehouse/ether0
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("futurehouse/ether0")
tokenizer = AutoTokenizer.from_pretrained("futurehouse/ether0")
```
Open Test Set
Please see our open-source benchmark (test set) on Hugging Face: https://huggingface.co/datasets/futurehouse/ether0-benchmark
```python
from datasets import load_dataset

test_ds = load_dataset("futurehouse/ether0-benchmark", split="test")
```
Usage
Installation
The easiest way to get started is a pip install from GitHub:
```bash
pip install git+https://github.com/Future-House/ether0.git
```
Or if you want the full setup, clone the repo and use uv:

```bash
git clone https://github.com/Future-House/ether0.git
cd ether0
uv sync
```
Reward Functions
Here is a basic example of how to use the reward functions:
```python
from ether0.rewards import valid_mol_eval

# Task: provide a valid completion of this molecule
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"

# Here's two model-proposed SMILES completions
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

# Evaluate the completions
assert not valid_mol_eval(invalid_completion_smiles, partial_smiles)
assert valid_mol_eval(valid_completion_smiles, partial_smiles)
```
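Verifiable rewards like the one above share a simple contract: a plain function that scores a model's answer against ground-truth information. As a hypothetical illustration (this exact-match function is not part of ether0), the contract looks like:

```python
def exact_match_reward(answer: str, expected: str) -> float:
    """Toy verifiable reward: 1.0 on an exact (whitespace-insensitive) match, else 0.0."""
    return float(answer.strip() == expected.strip())


# A correct answer earns full reward; anything else earns zero
assert exact_match_reward(" CCO ", "CCO") == 1.0
assert exact_match_reward("CCC", "CCO") == 0.0
```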
Visualization
If it helps, you can visualize the molecules:
```python
from ether0.data import draw_molecule

# See above reward functions demo for where these came from
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

valid_mol_text = draw_molecule(partial_smiles + valid_completion_smiles)
with open("valid_molecule.svg", "w") as f:
    f.write(valid_mol_text)
```

The output of `draw_molecule` can also be easily visualized using `IPython.display`, or in your terminal via `chafa valid_molecule.svg` (see the chafa docs).
Benchmark
Here is a sample baseline of ether0-benchmark on gpt-4o using lmi. To install lmi, please install ether0 with the `baselines` extra (for example, `uv sync --extra baselines`). We also need to run our remote rewards server via `ether0-serve` (for more information, see the ether0.remotes docs):
```bash
ETHER0_REMOTES_API_TOKEN=abc123 ether0-serve
```
Next, start ipython with the relevant environment variables set:
```bash
ETHER0_REMOTES_API_BASE_URL="http://127.0.0.1:8000" ETHER0_REMOTES_API_TOKEN=abc123 \
  ipython
```
And run the following Python code:
```python
import itertools
import statistics
from collections import defaultdict

from aviary.core import Message
from datasets import load_dataset
from lmi import LiteLLMModel
from tqdm.asyncio import tqdm_asyncio as asyncio

from ether0.data import get_problem_category
from ether0.model_prompts import LOOSE_XML_ANSWER_USER_PROMPT, extract_answer_loose
from ether0.models import RewardFunctionInfo
from ether0.rewards import EVAL_FUNCTIONS

# Add LLM prompt of your making to the dataset
test_ds = load_dataset("futurehouse/ether0-benchmark", split="test").map(
    lambda x: {"prompt": "\n\n".join((LOOSE_XML_ANSWER_USER_PROMPT, x["problem"]))}
)

# Prompt to LLM
model = LiteLLMModel(name="gpt-4o")
results = await asyncio.gather(
    *(model.acompletion([Message(content=row["prompt"])]) for row in test_ds),
    desc="Running evaluation",
)

# Compute rewards
per_category_rewards = defaultdict(list)
for row, result in zip(test_ds, results, strict=True):
    # NOTE: you can also use `ether0.rewards.accuracy_reward`,
    # but we decided to go a bit "lower level" for this demo
    reward_info = RewardFunctionInfo.model_validate(row["solution"])
    yhat = extract_answer_loose(result[0].text)
    # NOTE: the reward call's argument list was garbled in the source;
    # (yhat, reward_info.answer_info) is a reconstruction, see ether0.rewards
    reward = EVAL_FUNCTIONS[reward_info.fxn_name](yhat, reward_info.answer_info)
    per_category_rewards[get_problem_category(reward_info.problem_type)].append(reward)

for category, rewards in sorted(per_category_rewards.items()):
    print(
        f"In category {category!r} of {len(rewards)} questions,"
        f" average reward was {statistics.mean(rewards):.3f}."
    )
accuracy = statistics.mean(itertools.chain.from_iterable(per_category_rewards.values()))
print(f"Cumulative average reward across {len(test_ds)} questions was {accuracy:.3f}.")
```
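The per-category bookkeeping at the end of the script reduces to grouping rewards in a `defaultdict` and averaging. A self-contained toy version, with made-up categories and rewards, shows the shape of the computation:

```python
import itertools
import statistics
from collections import defaultdict

# Made-up (category, reward) pairs standing in for benchmark results
scored = [("naming", 1.0), ("naming", 0.0), ("synthesis", 1.0)]

per_category = defaultdict(list)
for category, reward in scored:
    per_category[category].append(reward)

averages = {cat: statistics.mean(rs) for cat, rs in per_category.items()}
overall = statistics.mean(itertools.chain.from_iterable(per_category.values()))
# averages == {"naming": 0.5, "synthesis": 1.0}; overall is the mean over all 3 rewards
```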
Owner
- Name: FutureHouse
- Login: Future-House
- Kind: organization
- Email: help@futurehouse.org
- Location: United States of America
- Website: futurehouse.org
- Repositories: 1
- Profile: https://github.com/Future-House
Citation (CITATION.cff)
---
cff-version: 1.2.0
title: "Training a Scientific Reasoning Model for Chemistry"
message: >-
If you use this software, please cite it using the
metadata from this file.
authors:
- given-names: Siddharth M.
family-names: Narayanan
- given-names: James D.
family-names: Braza
- given-names: Ryan-Rhys
family-names: Griffiths
- given-names: Albert
family-names: Bou
- given-names: Geemi P.
family-names: Wellawatte
- given-names: Mayk
family-names: Caldas Ramos
- given-names: Ludovico
family-names: Mitchener
- given-names: Samuel G.
family-names: Rodriques
- given-names: Andrew D.
family-names: White
identifiers:
- type: doi
value: 10.48550/arXiv.2506.17238
description: ArXiv DOI
- type: url
value: https://arxiv.org/abs/2506.17238
description: ArXiv abstract
repository-code: https://github.com/Future-House/ether0
keywords:
- Artificial Intelligence
- Chemistry
- Computation and Language
- Machine Learning
- Reasoning Model
license: Apache-2.0
preferred-citation:
authors:
- given-names: Siddharth M.
family-names: Narayanan
- given-names: James D.
family-names: Braza
- given-names: Ryan-Rhys
family-names: Griffiths
- given-names: Albert
family-names: Bou
- given-names: Geemi P.
family-names: Wellawatte
- given-names: Mayk
family-names: Caldas Ramos
- given-names: Ludovico
family-names: Mitchener
- given-names: Samuel G.
family-names: Rodriques
- given-names: Andrew D.
family-names: White
date-published: 2025-06-04
doi: 10.48550/arXiv.2506.17238
journal: preprint
title: "Training a Scientific Reasoning Model for Chemistry"
type: article
url: https://arxiv.org/abs/2506.17238
GitHub Events
Total
- Create event: 9
- Issues event: 2
- Watch event: 64
- Delete event: 6
- Issue comment event: 10
- Push event: 17
- Public event: 1
- Pull request review event: 25
- Pull request review comment event: 10
- Pull request event: 15
- Fork event: 6
Last Year
- Create event: 9
- Issues event: 2
- Watch event: 64
- Delete event: 6
- Issue comment event: 10
- Push event: 17
- Public event: 1
- Pull request review event: 25
- Pull request review comment event: 10
- Pull request event: 15
- Fork event: 6
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jamesbraza (1)
- cbartmann (1)
Pull Request Authors
- jamesbraza (10)
- whitead (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- astral-sh/setup-uv v6 composite
- pre-commit-ci/lite-action v1.1.0 composite
- pre-commit/action v3.0.1 composite
- suzuki-shunsuke/github-action-renovate-config-validator v1.1.1 composite
- OpenNMT-py ==2.3.0
- fastapi *
- molbloom >=2.3.4
- molsol >=0.0.3
- numpy >=1.20
- pydantic >=2
- rdkit *
- torch <2.6
- datasets *
- exmol >=3.3.0
- httpx *
- huggingface-hub *
- molbloom ==2.3.4
- pydantic >=2
- rdkit *
- regex *
- tenacity *
- 196 dependencies