https://github.com/bowang-lab/bioreason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model


Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

bioinformatics computational-biology dna foundation-models grpo large-language-models reasoning
Last synced: 6 months ago

Repository

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

Basic Info
Statistics
  • Stars: 168
  • Watchers: 4
  • Forks: 27
  • Open Issues: 5
  • Releases: 0
Topics
bioinformatics computational-biology dna foundation-models grpo large-language-models reasoning
Created 9 months ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

🧬 BioReason
Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model

arXiv · GitHub · Website · HuggingFace Dataset


Updates [Jun 10, 2025]

  • We are integrating vLLM to improve the speed and efficiency of the GRPO pipeline. We expect to push this by the end of the week.
  • Checkpoints, along with the custom DNA-LLM model class, will be released on HuggingFace by the end of the week.
  • More training results with GRPO will be shared soon.


Abstract

Unlocking deep, interpretable biological reasoning from complex genomic data is a major AI challenge hindering scientific discovery. Current DNA foundation models, despite strong sequence representation, struggle with multi-step reasoning and lack inherent transparent, biologically intuitive explanations. We introduce BioReason, a pioneering architecture that, for the first time, deeply integrates a DNA foundation model with a large language model (LLM). This novel connection enables the LLM to directly process and reason with genomic information as a fundamental input, fostering a new form of multimodal biological understanding. BioReason's sophisticated multi-step reasoning is developed through supervised fine-tuning and targeted reinforcement learning, guiding the system to generate logical, biologically coherent deductions. On biological reasoning benchmarks including KEGG-based disease pathway prediction—where accuracy improves from 88% to 97%—and variant effect prediction, BioReason demonstrates an average 15% performance gain over strong single-modality baselines.
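
As a rough illustration of the DNA-LLM coupling described above, the sketch below projects DNA-encoder embeddings into an LLM's embedding space and prepends them to the embedded text prompt. This is a minimal PyTorch sketch under assumed dimensions; the class and tensor names (`DnaToLlmProjector`, `dna_embeddings`) are illustrative placeholders, not the released BioReason model class.

```python
# Illustrative sketch only: feeding DNA-encoder embeddings to an LLM as
# "soft tokens". Module names and dimensions are assumptions, not the
# released BioReason model class.
import torch
import torch.nn as nn

class DnaToLlmProjector(nn.Module):
    """Projects per-token DNA embeddings into the LLM's embedding space."""
    def __init__(self, dna_dim: int = 1024, llm_dim: int = 2048):
        super().__init__()
        self.proj = nn.Linear(dna_dim, llm_dim)

    def forward(self, dna_embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, dna_len, dna_dim) -> (batch, dna_len, llm_dim)
        return self.proj(dna_embeddings)

# Dummy stand-ins for a DNA foundation model output and an embedded prompt.
dna_embeddings = torch.randn(1, 128, 1024)   # e.g. output of a DNA encoder
text_embeddings = torch.randn(1, 32, 2048)   # embedded question tokens

projector = DnaToLlmProjector()
dna_as_llm_tokens = projector(dna_embeddings)

# Concatenate DNA "tokens" with the text prompt before the LLM forward pass.
inputs_embeds = torch.cat([dna_as_llm_tokens, text_embeddings], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 160, 2048])
```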


Key Contributions

Novel multimodal architecture: The first successful integration of a DNA foundation model with an LLM, establishing a new methodology for AI-driven biological studies.

Advanced reasoning methodology: A systematic training approach combining supervised fine-tuning and reinforcement learning (GRPO) that incentivizes multi-step biological reasoning; a minimal training sketch follows this list.

New biological reasoning benchmarks: Development and curation of novel benchmarks for evaluating biological reasoning capabilities, including an annotated reasoning dataset for gene pathway and disease prediction from KEGG.

Empirical performance improvements: Demonstration that BioReason outperforms both DNA foundation models and LLMs used independently or in simple combination, with average performance gains of 15%+ over baseline.

Interpretable reasoning traces: A mechanism for generating step-by-step biological reasoning traces that provide interpretable predictions, enhancing scientific insight and hypothesis generation.
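
The SFT + RL recipe can be prototyped with the `trl` library already listed in the project's dependencies. The snippet below is a minimal GRPO sketch under stated assumptions (toy prompts, a placeholder Qwen checkpoint, and a simple exact-match reward); it is not the authors' training pipeline.

```python
# Minimal sketch of GRPO-style fine-tuning with the trl library.
# The reward function, dataset, and model id below are placeholders,
# not the BioReason training setup.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Toy prompts; a real run would use the KEGG / variant-effect reasoning data.
train_dataset = Dataset.from_dict({
    "prompt": ["Given the variant described, is the effect pathogenic or benign?"] * 8,
    "answer": ["pathogenic"] * 8,
})

def exact_match_reward(completions, answer, **kwargs):
    # Reward 1.0 when the gold label appears in the generated reasoning trace.
    return [1.0 if a.lower() in c.lower() else 0.0 for c, a in zip(completions, answer)]

training_args = GRPOConfig(output_dir="grpo-sketch", per_device_train_batch_size=8)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder; BioReason couples Qwen3 with a DNA encoder
    reward_funcs=exact_match_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```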


Datasets

The datasets used to train and evaluate BioReason can be found on our HuggingFace collection with detailed download and usage instructions.
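
A hedged example of pulling one of the datasets with the `datasets` library (a declared dependency); the repository id below is a placeholder and should be replaced with an id from the HuggingFace collection.

```python
# Sketch: load a BioReason dataset from the HuggingFace Hub.
# "<org>/<dataset-name>" is a placeholder; use an id from the collection.
from datasets import load_dataset

dataset = load_dataset("<org>/<dataset-name>")
print(dataset)
print(dataset["train"][0])
```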


Checkpoints

We will release the checkpoints soon!


Installation

Prerequisites

  • Python 3.11+
  • CUDA-capable GPU (recommended for best performance)

Installation Steps

```bash
# Clone the repository
git clone https://github.com/bowang-lab/BioReason.git
cd BioReason

# Install the package in editable mode
pip install -e .
```
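
As a quick sanity check after installation, the snippet below verifies that the package resolves; the top-level module name `bioreason` is an assumption based on the repository name.

```python
# Check that the editable install is importable (module name assumed).
import importlib.util

print("bioreason importable:", importlib.util.find_spec("bioreason") is not None)
```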


Results

KEGG-Derived Biological Reasoning Task

Performance comparison on 290 test datapoints for multi-step mechanistic reasoning:

| Model | Accuracy | F1-Score | Precision | Recall |
|-------|----------|----------|-----------|--------|
| [DNA] NT - 500M | 86.55 | 69.76 | 73.23 | 66.61 |
| [DNA] Evo2 - 1B | 88.28 | 72.43 | 75.23 | 69.83 |
| [LLM] Qwen3 - 1B | 85.17 | 65.71 | 71.39 | 64.19 |
| [LLM] Qwen3 - 4B | 93.48 | 85.44 | 88.31 | 86.72 |
| [DNA-LLM] NT + Qwen3 - 1B | 88.42 | 72.13 | 75.42 | 71.91 |
| [DNA-LLM] NT + Qwen3 - 1B (+RL) | 89.66 | 74.11 | 78.82 | 72.96 |
| [DNA-LLM] NT + Qwen3 - 4B | 96.90 | 89.03 | 90.99 | 89.38 |
| [DNA-LLM] Evo2 + Qwen3 - 1B | 90.42 | 75.62 | 77.42 | 73.91 |
| [DNA-LLM] Evo2 + Qwen3 - 4B | 97.24 | 86.30 | 86.75 | 87.25 |

Variant Effect Prediction Benchmarks

Performance on pathogenic/benign classification:

| Model | Coding Accuracy | Coding F1-Score | Non-SNV Accuracy | Non-SNV F1-Score |
|-------|-----------------|-----------------|------------------|------------------|
| [DNA] NT - 500M | 60.91 | 45.20 | 67.93 | 65.97 |
| [DNA] Evo2 - 1B | 70.07 | 49.19 | 76.17 | 66.51 |
| [LLM] Qwen3 - 1B | 46.55 | 34.82 | 70.67 | 76.21 |
| [LLM] Qwen3 - 4B | 48.99 | 39.58 | 61.86 | 67.60 |
| [DNA-LLM] NT + Qwen3 - 1B | 55.58 | 54.50 | 72.82 | 76.93 |
| [DNA-LLM] NT + Qwen3 - 4B | 60.94 | 55.66 | 65.59 | 73.00 |
| [DNA-LLM] Evo2 + Qwen3 - 1B | 72.83 | 68.90 | 88.20 | 89.91 |
| [DNA-LLM] Evo2 + Qwen3 - 4B | 80.21 | 80.00 | 83.85 | 85.02 |


Citation

If you find this work useful, please cite our paper:

```bibtex
@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
      title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model},
      author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
      year={2025},
      eprint={2505.23579},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.23579},
}
```


Authors

  • Adibvafa Fallahpour¹²³⁵ * (adibvafa.fallahpour@mail.utoronto.ca)
  • Andrew Magnuson¹² *
  • Purav Gupta¹² *
  • Shihao Ma¹²³
  • Jack Naimer¹²³
  • Arnav Shah¹²³
  • Haonan Duan¹²
  • Omar Ibrahim³
  • Hani Goodarzi†⁴⁶
  • Chris J. Maddison†¹²⁷
  • Bo Wang†¹²³

¹ University of Toronto ² Vector Institute ³ University Health Network (UHN)
⁴ Arc Institute ⁵ Cohere ⁶ University of California, San Francisco ⁷ Google DeepMind


* Equal contribution
† Equal advising


Made with ❤️ at University of Toronto, Vector Institute, and University Health Network

Owner

  • Name: WangLab @ U of T
  • Login: bowang-lab
  • Kind: organization
  • Location: 190 Elizabeth St, Toronto, ON M5G 2C4 Canada

BoWang's Lab at University of Toronto

GitHub Events

Total
  • Create event: 4
  • Issues event: 9
  • Watch event: 172
  • Delete event: 1
  • Issue comment event: 7
  • Member event: 3
  • Public event: 1
  • Push event: 12
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 25
Last Year
  • Create event: 4
  • Issues event: 9
  • Watch event: 172
  • Delete event: 1
  • Issue comment event: 7
  • Member event: 3
  • Public event: 1
  • Push event: 12
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 25

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 18
  • Total Committers: 2
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.111
Past Year
  • Commits: 18
  • Committers: 2
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.111
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Adibvafa Fallahpour | 9****a | 16 |
| magnuso7 | m****7@v****l | 2 |

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 9
  • Total pull requests: 1
  • Average time to close issues: about 5 hours
  • Average time to close pull requests: 8 minutes
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 0.44
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 1
  • Average time to close issues: about 5 hours
  • Average time to close pull requests: 8 minutes
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 0.44
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Adibvafa (6)
  • wangnuo-log (2)
  • Hsu-Che-Wei (1)
  • JunSeok94 (1)
  • zouni6666 (1)
Pull Request Authors
  • ajwm8103 (2)
Top Labels
Issue Labels
enhancement (5) bug (4) documentation (1)
Pull Request Labels

Dependencies

pyproject.toml pypi
  • accelerate *
  • bitsandbytes *
  • datasets *
  • deepspeed *
  • jupyter *
  • peft *
  • pytorch_lightning *
  • qwen-vl-utils *
  • torch *
  • torchvision *
  • transformers *
  • trl [vllm]
  • wandb *
requirements.txt pypi
  • accelerate *
  • bitsandbytes *
  • datasets *
  • deepspeed *
  • jupyter *
  • peft *
  • pytorch_lightning *
  • qwen-vl-utils *
  • torch *
  • torchvision *
  • transformers *
  • trl *
  • wandb *