ether0
A scientific reasoning model, dataset, and reward functions for chemistry.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Repository
A scientific reasoning model, dataset, and reward functions for chemistry.
Basic Info
Statistics
- Stars: 125
- Watchers: 4
- Forks: 13
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
ether0 Reward Model
ether0: a scientific reasoning model, dataset, and reward functions for chemistry.
This repo contains the reward model for evaluating ether0 and similar models, along with utilities for working with the verifiable rewards in our benchmark.
Overview
ether0 is a reasoning language model post-trained through a loop of:
- Supervised fine-tuning (SFT) on long chain-of-thought reasoning traces, to elicit reasoning from a base model.
- Reinforcement learning with verifiable rewards (RLVR) to improve reasoning on focused task groups, each progressing at its own pace. These multitask-trained models are referred to as 'specialists'.
- Rejection sampling to filter specialists' reasoning for correctness and quality.
- SFT on the base model again to make a 'generalist' reasoning model.
- RLVR to recover any lost performance and push further in an all-task setting.
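To make the rejection-sampling step above concrete, here is a minimal, self-contained sketch in plain Python. The `rejection_sample` helper, its thresholds, and the toy palindrome reward are all hypothetical illustrations, not ether0's actual implementation:

```python
from typing import Callable


def rejection_sample(
    traces: list[tuple[str, str]],  # (reasoning, answer) pairs from a specialist
    reward_fn: Callable[[str], float],  # verifiable reward on the answer
    min_reward: float = 1.0,  # keep only fully correct answers
    max_len: int = 4000,  # crude quality filter on reasoning length
) -> list[tuple[str, str]]:
    """Keep traces whose answers are correct and whose reasoning passes a quality filter."""
    return [
        (reasoning, answer)
        for reasoning, answer in traces
        if reward_fn(answer) >= min_reward and len(reasoning) <= max_len
    ]


# Toy verifiable reward: 1.0 when the answer is a palindrome
toy_reward = lambda ans: float(ans == ans[::-1])
kept = rejection_sample([("trace A", "abba"), ("trace B", "abc")], toy_reward)
# Only the correct trace ("trace A", "abba") survives the filter
```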

Repo Structure
This repo contains several packages:
- `ether0`: reward functions, RDKit data utilities, dataset generation prompts, dataset data models, and language model training prompts.
- `ether0.remotes`: server code for ether0 reward functions involving exotic packages and/or third-party models.
> [!NOTE]
> This repo does not contain training code, although you can find open-source repositories like NeMo-RL or Hugging Face TRL that can do the SFT and RL phases of training.
Open Weights
Please see our open-source weights on Hugging Face: https://huggingface.co/futurehouse/ether0
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("futurehouse/ether0")
tokenizer = AutoTokenizer.from_pretrained("futurehouse/ether0")
```
Open Test Set
Please see our open-source benchmark (test set) on Hugging Face: https://huggingface.co/datasets/futurehouse/ether0-benchmark
```python
from datasets import load_dataset

test_ds = load_dataset("futurehouse/ether0-benchmark", split="test")
```
Usage
Installation
The easiest way to get started is a pip install from GitHub:
```bash
pip install git+https://github.com/Future-House/ether0.git
```
Or if you want the full setup, clone the repo and use uv:

```bash
git clone https://github.com/Future-House/ether0.git
cd ether0
uv sync
```
Reward Functions
Here is a basic example of how to use the reward functions:
```python
from ether0.rewards import valid_mol_eval

# Task: provide a valid completion of this molecule
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"

# Here's two model-proposed SMILES completions
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

# Evaluate the completions
assert not valid_mol_eval(invalid_completion_smiles, partial_smiles)
assert valid_mol_eval(valid_completion_smiles, partial_smiles)
```
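Verifiable rewards like the one above share a simple contract: a plain function that scores a model's answer against ground-truth information. As a hypothetical illustration (this exact-match function is not part of ether0), the contract looks like:

```python
def exact_match_reward(answer: str, expected: str) -> float:
    """Toy verifiable reward: 1.0 on an exact (whitespace-insensitive) match, else 0.0."""
    return float(answer.strip() == expected.strip())


# A correct answer earns full reward; anything else earns zero
assert exact_match_reward(" CCO ", "CCO") == 1.0
assert exact_match_reward("CCC", "CCO") == 0.0
```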
Visualization
If it helps, you can visualize the molecules:
```python
from ether0.data import draw_molecule

# See above reward functions demo for where these came from
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

valid_mol_text = draw_molecule(partial_smiles + valid_completion_smiles)
with open("valid_molecule.svg", "w") as f:
    f.write(valid_mol_text)
```

The output of `draw_molecule` can also be easily visualized using `IPython.display`, or in your terminal via `chafa valid_molecule.svg` (see the chafa docs).
Benchmark
Here is a sample baseline of ether0-benchmark on gpt-4o using lmi. To install lmi, please install ether0 with the `baselines` extra (for example, `uv sync --extra baselines`). We also need to run our remote rewards server via `ether0-serve` (for more information, see the ether0.remotes docs):
```bash
ETHER0_REMOTES_API_TOKEN=abc123 ether0-serve
```
Next, start ipython with the relevant environment variables set:
```bash
ETHER0_REMOTES_API_BASE_URL="http://127.0.0.1:8000" ETHER0_REMOTES_API_TOKEN=abc123 \
  ipython
```
And run the following Python code:
```python
import itertools
import statistics
from collections import defaultdict

from aviary.core import Message
from datasets import load_dataset
from lmi import LiteLLMModel
from tqdm.asyncio import tqdm_asyncio as asyncio

from ether0.data import get_problem_category
from ether0.model_prompts import LOOSE_XML_ANSWER_USER_PROMPT, extract_answer_loose
from ether0.models import RewardFunctionInfo
from ether0.rewards import EVAL_FUNCTIONS

# Add LLM prompt of your making to the dataset
test_ds = load_dataset("futurehouse/ether0-benchmark", split="test").map(
    lambda x: {"prompt": "\n\n".join((LOOSE_XML_ANSWER_USER_PROMPT, x["problem"]))}
)

# Prompt to LLM
model = LiteLLMModel(name="gpt-4o")
results = await asyncio.gather(
    *(model.acompletion([Message(content=row["prompt"])]) for row in test_ds),
    desc="Running evaluation",
)

# Compute rewards
per_category_rewards = defaultdict(list)
for row, result in zip(test_ds, results, strict=True):
    # NOTE: you can also use `ether0.rewards.accuracy_reward`,
    # but we decided to go a bit "lower level" for this demo
    reward_info = RewardFunctionInfo.model_validate(row["solution"])
    yhat = extract_answer_loose(result[0].text)
    # NOTE: the reward call's argument list was garbled in the source;
    # (yhat, reward_info.answer_info) is a reconstruction, see ether0.rewards
    reward = EVAL_FUNCTIONS[reward_info.fxn_name](yhat, reward_info.answer_info)
    per_category_rewards[get_problem_category(reward_info.problem_type)].append(reward)

for category, rewards in sorted(per_category_rewards.items()):
    print(
        f"In category {category!r} of {len(rewards)} questions,"
        f" average reward was {statistics.mean(rewards):.3f}."
    )
accuracy = statistics.mean(itertools.chain.from_iterable(per_category_rewards.values()))
print(f"Cumulative average reward across {len(test_ds)} questions was {accuracy:.3f}.")
```
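The per-category bookkeeping at the end of the script reduces to grouping rewards in a `defaultdict` and averaging. A self-contained toy version, with made-up categories and rewards, shows the shape of the computation:

```python
import itertools
import statistics
from collections import defaultdict

# Made-up (category, reward) pairs standing in for benchmark results
scored = [("naming", 1.0), ("naming", 0.0), ("synthesis", 1.0)]

per_category = defaultdict(list)
for category, reward in scored:
    per_category[category].append(reward)

averages = {cat: statistics.mean(rs) for cat, rs in per_category.items()}
overall = statistics.mean(itertools.chain.from_iterable(per_category.values()))
# averages == {"naming": 0.5, "synthesis": 1.0}; overall is the mean over all 3 rewards
```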
Owner
- Name: FutureHouse
- Login: Future-House
- Kind: organization
- Email: help@futurehouse.org
- Location: United States of America
- Website: futurehouse.org
- Repositories: 1
- Profile: https://github.com/Future-House
Citation (CITATION.cff)
---
cff-version: 1.2.0
title: "Training a Scientific Reasoning Model for Chemistry"
message: >-
If you use this software, please cite it using the
metadata from this file.
authors:
- given-names: Siddharth M.
family-names: Narayanan
- given-names: James D.
family-names: Braza
- given-names: Ryan-Rhys
family-names: Griffiths
- given-names: Albert
family-names: Bou
- given-names: Geemi P.
family-names: Wellawatte
- given-names: Mayk
family-names: Caldas Ramos
- given-names: Ludovico
family-names: Mitchener
- given-names: Samuel G.
family-names: Rodriques
- given-names: Andrew D.
family-names: White
identifiers:
- type: doi
value: 10.48550/arXiv.2506.17238
description: ArXiv DOI
- type: url
value: https://arxiv.org/abs/2506.17238
description: ArXiv abstract
repository-code: https://github.com/Future-House/ether0
keywords:
- Artificial Intelligence
- Chemistry
- Computation and Language
- Machine Learning
- Reasoning Model
license: Apache-2.0
preferred-citation:
authors:
- given-names: Siddharth M.
family-names: Narayanan
- given-names: James D.
family-names: Braza
- given-names: Ryan-Rhys
family-names: Griffiths
- given-names: Albert
family-names: Bou
- given-names: Geemi P.
family-names: Wellawatte
- given-names: Mayk
family-names: Caldas Ramos
- given-names: Ludovico
family-names: Mitchener
- given-names: Samuel G.
family-names: Rodriques
- given-names: Andrew D.
family-names: White
date-published: 2025-06-04
doi: 10.48550/arXiv.2506.17238
journal: preprint
title: "Training a Scientific Reasoning Model for Chemistry"
type: article
url: https://arxiv.org/abs/2506.17238
GitHub Events
Total
- Create event: 9
- Issues event: 2
- Watch event: 64
- Delete event: 6
- Issue comment event: 10
- Push event: 17
- Public event: 1
- Pull request review event: 25
- Pull request review comment event: 10
- Pull request event: 15
- Fork event: 6
Last Year
- Create event: 9
- Issues event: 2
- Watch event: 64
- Delete event: 6
- Issue comment event: 10
- Push event: 17
- Public event: 1
- Pull request review event: 25
- Pull request review comment event: 10
- Pull request event: 15
- Fork event: 6
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 6
- Average time to close issues: N/A
- Average time to close pull requests: about 3 hours
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jamesbraza (1)
- cbartmann (1)
Pull Request Authors
- jamesbraza (10)
- whitead (2)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- astral-sh/setup-uv v6 composite
- pre-commit-ci/lite-action v1.1.0 composite
- pre-commit/action v3.0.1 composite
- suzuki-shunsuke/github-action-renovate-config-validator v1.1.1 composite
- OpenNMT-py ==2.3.0
- fastapi *
- molbloom >=2.3.4
- molsol >=0.0.3
- numpy >=1.20
- pydantic >=2
- rdkit *
- torch <2.6
- datasets *
- exmol >=3.3.0
- httpx *
- huggingface-hub *
- molbloom ==2.3.4
- pydantic >=2
- rdkit *
- regex *
- tenacity *
- 196 dependencies