Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (6.6%) to scientific vocabulary
Repository
HealthBench
Basic Info
- Host: GitHub
- Owner: StanfordBDHG
- License: MIT
- Language: Ruby
- Default Branch: main
- Size: 104 KB
Statistics
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Medicine on the Edge: On-Device LLM Benchmark 🏥📱
Overview
This repository contains the codebase for benchmarking on-device Large Language Models (LLMs) for clinical reasoning. The study evaluates the feasibility, accuracy, and performance of mobile LLM inference using the AMEGA medical benchmark dataset.
Abstract 📄
The deployment of Large Language Models (LLMs) on mobile devices offers significant potential for medical applications, enhancing privacy, security, and cost-efficiency by eliminating reliance on cloud-based services and keeping sensitive health data local. However, the performance and accuracy of on-device LLMs in real-world medical contexts remain underexplored. In this study, we benchmark publicly available on-device LLMs using the AMEGA dataset, evaluating accuracy, computational efficiency, and thermal limitations across various mobile devices. Our results indicate that compact general-purpose models like Phi-3 Mini achieve a strong balance between speed and accuracy, while medically fine-tuned models such as Med42 and Aloe attain the highest accuracy. Notably, deploying LLMs on older devices remains feasible, with memory constraints posing a greater challenge than raw processing power. Our study underscores the potential of on-device LLMs for healthcare while emphasizing the need for more efficient inference and models tailored to real-world clinical reasoning.
Features
- Benchmarking mobile LLMs on real-world medical cases
- Supports on-device execution for Apple Silicon (iPhones & iPads)
- Thermal and battery performance tracking
- Evaluation using AMEGA benchmark dataset
- SpeziLLM & MLX-based inference for efficient on-device execution
Selected Models 🏆
- Med42 🏅 (Highest Accuracy)
- Aloe 8B 🏅 (Top-tier medical LLM)
- Phi-3 Mini ⚡ (Best balance of speed & accuracy)
- DeepSeek R1 (Reasoning-focused LLM)
- Llama & Qwen Variants (General-purpose models)
- and more
Owner
- Name: Stanford Biodesign Digital Health
- Login: StanfordBDHG
- Kind: organization
- Location: United States of America
- Twitter: StanfordBDHG
- Repositories: 18
- Profile: https://github.com/StanfordBDHG
Citation (CITATION.cff)
# This source file is part of the StanfordBDHG Template Application project
#
# SPDX-FileCopyrightText: 2023 Stanford University
#
# SPDX-License-Identifier: MIT
#
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Schmiedmayer"
    given-names: "Paul"
    orcid: "https://orcid.org/0000-0002-8607-9148"
title: "TemplateApplication"
doi: 10.5281/zenodo.7633671
url: "https://github.com/StanfordBDHG/TemplateApplication"
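Since CITATION.cff is plain YAML (CFF 1.2.0), its fields are easy to consume programmatically. Below is a minimal sketch in Ruby (the repository's primary language) that parses the citation metadata above and builds a one-line citation string; the helper name `citation_summary` is illustrative, not part of this repository.

```ruby
require "yaml"

# Inline copy of the CITATION.cff content shown above (CFF 1.2.0, YAML).
CFF = <<~YAML
  cff-version: 1.2.0
  message: "If you use this software, please cite it as below."
  authors:
    - family-names: "Schmiedmayer"
      given-names: "Paul"
      orcid: "https://orcid.org/0000-0002-8607-9148"
  title: "TemplateApplication"
  doi: 10.5281/zenodo.7633671
  url: "https://github.com/StanfordBDHG/TemplateApplication"
YAML

# Extract the fields most citation tools need: title, DOI, author names.
def citation_summary(yaml_text)
  cff = YAML.safe_load(yaml_text)
  authors = cff.fetch("authors", []).map do |a|
    "#{a['given-names']} #{a['family-names']}"
  end
  { title: cff["title"], doi: cff["doi"], authors: authors }
end

summary = citation_summary(CFF)
puts "#{summary[:authors].join(', ')}. #{summary[:title]}. doi:#{summary[:doi]}"
```

Note that the unquoted `doi:` value parses as a YAML string (it contains `/`), so no special handling is needed.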
GitHub Events
Total
- Watch event: 11
- Member event: 1
- Push event: 2
- Fork event: 1
- Create event: 2
Last Year
- Watch event: 11
- Member event: 1
- Push event: 2
- Fork event: 1
- Create event: 2