Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.6%) to scientific vocabulary
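The exact metric behind the "scientific vocabulary similarity" indicator is not documented on this page. As a minimal sketch of one plausible measure, the snippet below (in Ruby, the repository's listed language) computes Jaccard overlap between a text's word set and a small reference vocabulary; the `SCIENCE_VOCAB` word list and the `vocab_similarity` helper are illustrative assumptions, not the service's actual implementation.

```ruby
# Hypothetical example vocabulary -- the real scoring service's word list
# is not published here.
SCIENCE_VOCAB = %w[benchmark dataset evaluation inference model accuracy].freeze

# Jaccard similarity: |intersection| / |union| of the unique words in the
# text and the reference vocabulary. Returns a value in [0.0, 1.0].
def vocab_similarity(text, vocab = SCIENCE_VOCAB)
  words = text.downcase.scan(/[a-z]+/).uniq
  return 0.0 if words.empty?

  (words & vocab).size.to_f / (words | vocab).size
end

puts vocab_similarity("We benchmark each model on the AMEGA dataset") # ~0.27
```

A low score like the 6.6% reported above would simply mean few of the repository's words fall inside the reference vocabulary under whatever metric the service actually uses.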
Last synced: 6 months ago

Repository

HealthBench

Basic Info
  • Host: GitHub
  • Owner: StanfordBDHG
  • License: MIT
  • Language: Ruby
  • Default Branch: main
  • Size: 104 KB
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Medicine on the Edge: On-Device LLM Benchmark 🏥📱

Overview

This repository contains the codebase for benchmarking on-device Large Language Models (LLMs) for clinical reasoning. The study evaluates the feasibility, accuracy, and performance of mobile LLM inference using the AMEGA medical benchmark dataset.

Abstract 📄

The deployment of Large Language Models (LLMs) on mobile devices offers significant potential for medical applications, enhancing privacy, security, and cost-efficiency by eliminating reliance on cloud-based services and keeping sensitive health data local. However, the performance and accuracy of on-device LLMs in real-world medical contexts remain underexplored. In this study, we benchmark publicly available on-device LLMs using the AMEGA dataset, evaluating accuracy, computational efficiency, and thermal limitations across various mobile devices. Our results indicate that compact general-purpose models like Phi-3 Mini achieve a strong balance between speed and accuracy, while medically fine-tuned models such as Med42 and Aloe attain the highest accuracy. Notably, deploying LLMs on older devices remains feasible, with memory constraints posing a greater challenge than raw processing power. Our study underscores the potential of on-device LLMs for healthcare while emphasizing the need for more efficient inference and models tailored to real-world clinical reasoning.

Features

  • Benchmarking mobile LLMs on real-world medical cases
  • Supports on-device execution for Apple Silicon (iPhones & iPads)
  • Thermal and battery performance tracking
  • Evaluation using AMEGA benchmark dataset
  • SpeziLLM & MLX-based inference for efficient on-device execution

Selected Models 🏆

  • Med42 🏅 (Highest Accuracy)
  • Aloe 8B 🏅 (Top-tier medical LLM)
  • Phi-3 Mini ⚡ (Best balance of speed & accuracy)
  • DeepSeek R1 (Reasoning-focused LLM)
  • Llama & Qwen Variants (General-purpose models)
  • and more

Owner

  • Name: Stanford Biodesign Digital Health
  • Login: StanfordBDHG
  • Kind: organization
  • Location: United States of America

Citation (CITATION.cff)

#
# This source file is part of the StanfordBDHG Template Application project
#
# SPDX-FileCopyrightText: 2023 Stanford University
#
# SPDX-License-Identifier: MIT
#

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Schmiedmayer"
  given-names: "Paul"
  orcid: "https://orcid.org/0000-0002-8607-9148"
title: "TemplateApplication"
doi: 10.5281/zenodo.7633671
url: "https://github.com/StanfordBDHG/TemplateApplication"
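The CITATION.cff above is plain YAML, so its fields can be read with Ruby's standard library. The sketch below is not part of the project; it assumes only the CFF 1.2.0 field names visible in the file (`authors`, `family-names`, `given-names`, `title`, `doi`) and the hypothetical helper name `citation_from_cff`.

```ruby
require "yaml"

# Build a simple citation string from a CITATION.cff file by reading the
# authors, title, and DOI fields defined by the CFF 1.2.0 format.
def citation_from_cff(path)
  cff = YAML.safe_load(File.read(path))
  authors = cff["authors"]
            .map { |a| "#{a['family-names']}, #{a['given-names']}" }
            .join("; ")
  "#{authors}. #{cff['title']}. doi:#{cff['doi']}"
end
```

Applied to the file shown above, this would yield "Schmiedmayer, Paul. TemplateApplication. doi:10.5281/zenodo.7633671" — note the citation metadata still references the StanfordBDHG TemplateApplication rather than this repository.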

GitHub Events

Total
  • Watch event: 11
  • Member event: 1
  • Push event: 2
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 11
  • Member event: 1
  • Push event: 2
  • Fork event: 1
  • Create event: 2

Dependencies

.github/workflows/beta-deployment.yml actions
.github/workflows/build-and-test.yml actions