ether0

A scientific reasoning model, dataset, and reward functions for chemistry.

https://github.com/future-house/ether0

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.9%) to scientific vocabulary
Last synced: 6 months ago

Repository

A scientific reasoning model, dataset, and reward functions for chemistry.

Basic Info
  • Host: GitHub
  • Owner: Future-House
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 16.5 MB
Statistics
  • Stars: 125
  • Watchers: 4
  • Forks: 13
  • Open Issues: 5
  • Releases: 0
Created 10 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

ether0 Reward Model


ether0 logo

ether0: a scientific reasoning model, dataset, and reward functions for chemistry.

This repo contains the reward model for evaluating ether0 and similar models, along with utilities for working with the verifiable rewards in our benchmark.

Overview

ether0 is a reasoning language model post-trained through a loop of:

  1. Supervised fine-tuning (SFT) on long chain-of-thought reasoning traces, to elicit reasoning from a base model.
  2. Reinforcement learning with verifiable rewards (RLVR) to improve reasoning on focused task groups, each progressing at its own pace. The resulting multitask models are referred to as 'specialists'.
  3. Rejection sampling to filter specialists' reasoning for correctness and quality.
  4. SFT on the base model again to make a 'generalist' reasoning model.
  5. RLVR to recover any lost performance and push further in an all-task setting.

ether0 training info
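The rejection-sampling step (3) can be sketched as a filter over sampled traces: score each candidate with a verifiable reward and keep only fully correct ones for the next SFT round. This is an illustrative sketch, not the project's training code; `sample_fn`, `reward_fn`, and the data shapes are assumptions.

```python
from typing import Callable


def rejection_sample(
    prompts: list[str],
    sample_fn: Callable[[str], list[str]],  # draws several reasoning traces per prompt
    reward_fn: Callable[[str, str], float],  # verifiable reward in [0, 1]
) -> list[tuple[str, str]]:
    """Keep only (prompt, trace) pairs whose trace earns the full reward."""
    kept = []
    for prompt in prompts:
        for trace in sample_fn(prompt):
            if reward_fn(prompt, trace) == 1.0:
                kept.append((prompt, trace))
    return kept


# Toy demo: the "reward" checks that the trace ends with the right answer.
data = rejection_sample(
    prompts=["2+2=?"],
    sample_fn=lambda p: ["... so 5", "... so 4"],
    reward_fn=lambda p, t: 1.0 if t.endswith("4") else 0.0,
)
print(data)  # [('2+2=?', '... so 4')]
```

The surviving pairs would then feed the generalist SFT stage (step 4).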

Repo Structure

This repo contains several packages:

  • ether0: reward functions, RDKit data utilities, dataset generation prompts, language model training prompts, and data models.
  • ether0.remotes: server code for ether0 reward functions involving exotic packages and/or third party models.

[!NOTE] This repo does not contain training code, although you can find open source repositories like NeMo-RL or Hugging Face TRL that can do the SFT and RL phases of training.

Open Weights

Please see our open-source weights on Hugging Face: https://huggingface.co/futurehouse/ether0

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("futurehouse/ether0")
tokenizer = AutoTokenizer.from_pretrained("futurehouse/ether0")
```

Open Test Set

Please see our open-source benchmark (test set) on Hugging Face: https://huggingface.co/datasets/futurehouse/ether0-benchmark

```python
from datasets import load_dataset

test_ds = load_dataset("futurehouse/ether0-benchmark", split="test")
```
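Each benchmark row pairs a natural-language "problem" with a "solution" field describing how to score answers (the Benchmark section below consumes both). Here is a minimal sketch of attaching a prompt to each row, using a plain list of dicts in place of the Hugging Face dataset; the prefix text and row contents are made up for illustration:

```python
# Toy stand-in for the Hugging Face dataset: a list of row dicts.
rows = [{"problem": "Propose a valid completion of this SMILES: CCO"}]

# Assumed instruction text; the real benchmark uses a prompt from ether0.
PROMPT_PREFIX = "Answer inside <answer> tags."


def add_prompt(row: dict) -> dict:
    # Mirrors the dataset .map(...) call shown in the Benchmark section below.
    return {**row, "prompt": "\n\n".join((PROMPT_PREFIX, row["problem"]))}


mapped = [add_prompt(r) for r in rows]
print(mapped[0]["prompt"].splitlines()[0])  # Answer inside <answer> tags.
```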

Usage

Installation

The easiest way to get started is a pip install from GitHub:

```bash
pip install git+https://github.com/Future-House/ether0.git
```

Or if you want the full setup, clone the repo and use uv:

```bash
git clone https://github.com/Future-House/ether0.git
cd ether0
uv sync
```

Reward Functions

Here is a basic example of how to use the reward functions:

```python
from ether0.rewards import valid_mol_eval

# Task: provide a valid completion of this molecule
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"

# Here are two model-proposed SMILES completions
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

# Evaluate the completions
assert not valid_mol_eval(invalid_completion_smiles, partial_smiles)
assert valid_mol_eval(valid_completion_smiles, partial_smiles)
```
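The reward functions follow a simple verifiable pattern: a deterministic check on the model's answer, mapped to a pass/fail outcome. The sketch below imitates that pattern with a toy stand-in for the molecule check; `toy_valid_mol_eval` is hypothetical and only balances parentheses, whereas the real check requires full RDKit parsing and sanitization.

```python
def toy_valid_mol_eval(completion: str, partial: str) -> bool:
    """Toy stand-in for a validity reward: accept the completion only if the
    combined string has balanced parentheses. Real SMILES validity needs
    RDKit; this is illustration only."""
    depth = 0
    for ch in partial + completion:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "(" is immediately invalid
                return False
    return depth == 0


partial = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
assert not toy_valid_mol_eval("CCC", partial)       # leaves an unclosed "("
assert toy_valid_mol_eval(")C=6C=CC=CC6", partial)  # closes it
```

Even this crude check shows why the "CCC" completion above fails: the partial SMILES has one more "(" than ")", and a valid completion must close it.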

Visualization

If it helps, you can visualize the molecules:

```python
from ether0.data import draw_molecule

# See the above reward functions demo for where these came from
partial_smiles = "O=C(OC1C(OC(=O)C=2C=CC=CC2)C3(O)C(C)(C)CCCC3(C)C4CC=5OC=CC5C(C)C14"
invalid_completion_smiles = "CCC"
valid_completion_smiles = ")C=6C=CC=CC6"

valid_mol_text = draw_molecule(partial_smiles + valid_completion_smiles)
with open("valid_molecule.svg", "w") as f:
    f.write(valid_mol_text)
```

The output of `draw_molecule` can also be easily visualized using `IPython.display`, or in your terminal via `chafa valid_molecule.svg` (chafa docs).

valid molecule

Benchmark

Here is a sample baseline of ether0-benchmark on gpt-4o using `lmi`. To install `lmi`, please install ether0 with the `baselines` extra (for example, `uv sync --extra baselines`).

We also need to run our remote rewards server via ether0-serve (for more information, see ether0.remotes docs):

```bash
ETHER0_REMOTES_API_TOKEN=abc123 ether0-serve
```

Next, start ipython with the relevant environment variables set:

```bash
ETHER0_REMOTES_API_BASE_URL="http://127.0.0.1:8000" ETHER0_REMOTES_API_TOKEN=abc123 \
  ipython
```

And run the following Python code:

```python
import itertools
import statistics
from collections import defaultdict

from aviary.core import Message
from datasets import load_dataset
from lmi import LiteLLMModel
from tqdm.asyncio import tqdm_asyncio as asyncio

from ether0.data import get_problem_category
from ether0.model_prompts import LOOSE_XML_ANSWER_USER_PROMPT, extract_answer_loose
from ether0.models import RewardFunctionInfo
from ether0.rewards import EVAL_FUNCTIONS

# Add an LLM prompt of your making to the dataset
test_ds = load_dataset("futurehouse/ether0-benchmark", split="test").map(
    lambda x: {"prompt": "\n\n".join((LOOSE_XML_ANSWER_USER_PROMPT, x["problem"]))}
)

# Prompt the LLM
model = LiteLLMModel(name="gpt-4o")
results = await asyncio.gather(
    *(model.acompletion([Message(content=row["prompt"])]) for row in test_ds),
    desc="Running evaluation",
)

# Compute rewards
per_category_rewards = defaultdict(list)
for row, result in zip(test_ds, results, strict=True):
    # NOTE: you can also use `ether0.rewards.accuracy_reward`,
    # but we decided to go a bit "lower level" for this demo
    reward_info = RewardFunctionInfo.model_validate(row["solution"])
    yhat = extract_answer_loose(result[0].text)
    reward = EVAL_FUNCTIONS[reward_info.fxn_name](yhat, reward_info.answer_info)
    per_category_rewards[get_problem_category(reward_info.problem_type)].append(reward)

for category, rewards in sorted(per_category_rewards.items()):
    print(
        f"In category {category!r} of {len(rewards)} questions,"
        f" average reward was {statistics.mean(rewards):.3f}."
    )
accuracy = statistics.mean(itertools.chain.from_iterable(per_category_rewards.values()))
print(f"Cumulative average reward across {len(test_ds)} questions was {accuracy:.3f}.")
```

Owner

  • Name: FutureHouse
  • Login: Future-House
  • Kind: organization
  • Email: help@futurehouse.org
  • Location: United States of America

Citation (CITATION.cff)

---
cff-version: 1.2.0
title: "Training a Scientific Reasoning Model for Chemistry"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
authors:
  - given-names: Siddharth M.
    family-names: Narayanan
  - given-names: James D.
    family-names: Braza
  - given-names: Ryan-Rhys
    family-names: Griffiths
  - given-names: Albert
    family-names: Bou
  - given-names: Geemi P.
    family-names: Wellawatte
  - given-names: Mayk
    family-names: Caldas Ramos
  - given-names: Ludovico
    family-names: Mitchener
  - given-names: Samuel G.
    family-names: Rodriques
  - given-names: Andrew D.
    family-names: White
identifiers:
  - type: doi
    value: 10.48550/arXiv.2506.17238
    description: ArXiv DOI
  - type: url
    value: https://arxiv.org/abs/2506.17238
    description: ArXiv abstract
repository-code: https://github.com/Future-House/ether0
keywords:
  - Artificial Intelligence
  - Chemistry
  - Computation and Language
  - Machine Learning
  - Reasoning Model
license: Apache-2.0
preferred-citation:
  authors:
    - given-names: Siddharth M.
      family-names: Narayanan
    - given-names: James D.
      family-names: Braza
    - given-names: Ryan-Rhys
      family-names: Griffiths
    - given-names: Albert
      family-names: Bou
    - given-names: Geemi P.
      family-names: Wellawatte
    - given-names: Mayk
      family-names: Caldas Ramos
    - given-names: Ludovico
      family-names: Mitchener
    - given-names: Samuel G.
      family-names: Rodriques
    - given-names: Andrew D.
      family-names: White
  date-published: 2025-06-04
  doi: 10.48550/arXiv.2506.17238
  journal: preprint
  title: "Training a Scientific Reasoning Model for Chemistry"
  type: article
  url: https://arxiv.org/abs/2506.17238

GitHub Events

Total
  • Create event: 9
  • Issues event: 2
  • Watch event: 64
  • Delete event: 6
  • Issue comment event: 10
  • Push event: 17
  • Public event: 1
  • Pull request review event: 25
  • Pull request review comment event: 10
  • Pull request event: 15
  • Fork event: 6
Last Year
  • Create event: 9
  • Issues event: 2
  • Watch event: 64
  • Delete event: 6
  • Issue comment event: 10
  • Push event: 17
  • Public event: 1
  • Pull request review event: 25
  • Pull request review comment event: 10
  • Pull request event: 15
  • Fork event: 6

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 0
  • Total pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: about 3 hours
  • Issue authors: 0
  • Pull request authors: 2
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jamesbraza (1)
  • cbartmann (1)
Pull Request Authors
  • jamesbraza (10)
  • whitead (2)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels
enhancement (9) bug (1)

Dependencies

.github/workflows/lint-test.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • astral-sh/setup-uv v6 composite
  • pre-commit-ci/lite-action v1.1.0 composite
  • pre-commit/action v3.0.1 composite
  • suzuki-shunsuke/github-action-renovate-config-validator v1.1.1 composite
packages/remotes/pyproject.toml pypi
  • OpenNMT-py ==2.3.0
  • fastapi *
  • molbloom >=2.3.4
  • molsol >=0.0.3
  • numpy >=1.20
  • pydantic >=2
  • rdkit *
  • torch <2.6
pyproject.toml pypi
  • datasets *
  • exmol >=3.3.0
  • httpx *
  • huggingface-hub *
  • molbloom ==2.3.4
  • pydantic >=2
  • rdkit *
  • regex *
  • tenacity *
uv.lock pypi
  • 196 dependencies