Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.0%) to scientific vocabulary
Keywords
Repository
Benchmarking Large Language Models for FHIR
Basic Info
Statistics
- Stars: 30
- Watchers: 0
- Forks: 3
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
@flexpa/llm-fhir-eval
> [!NOTE]
> Follow the development progress on FHIR Chat.
Overview
@flexpa/llm-fhir-eval is an evaluation framework designed to benchmark the performance of LLMs on FHIR-specific tasks, including generation, validation, and extraction. The framework systematically tests and validates the capabilities of LLMs across a range of healthcare interoperability tasks, ensuring they meet the standards required for effective FHIR implementations. It implements evaluations from prior art such as FHIR-GPT.
Benchmark
@flexpa/llm-fhir-eval benchmarks FHIR-specific tasks including:
FHIR Resource Generation:
- Generate accurate FHIR resources such as `Patient`, `Observation`, and `MedicationStatement`.
- Test the ability to create complex resource relationships and validate terminology bindings.
FHIR Resource Validation:
- Validate FHIR resources using operations like `$validate`.
- Check for schema compliance, required field presence, and value set binding verification.
Data Extraction:
- Extract structured FHIR-compliant data from clinical notes and other unstructured data.
- Evaluate the proficiency of LLMs in extracting specific healthcare data elements.
Tool Use:
- Test models' ability to use FHIR validation tools and other healthcare-specific functions.
- Validate proper tool calling for FHIR operations.
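As an illustration of the validation task above, a resource can be checked against a FHIR server's `$validate` operation, which returns an `OperationOutcome` resource. The sketch below is hypothetical (the server URL and function names are placeholders, not part of this framework); it treats validation as passing when no issue has severity `error` or `fatal`.

```javascript
// Hypothetical sketch: checking a generated resource with a FHIR server's
// $validate operation. Function names and URL are illustrative placeholders.

// Validation passes when the OperationOutcome contains no error/fatal issues.
function passesValidation(outcome) {
  return !(outcome.issue ?? []).some((i) =>
    ["error", "fatal"].includes(i.severity)
  );
}

async function validateResource(baseUrl, resource) {
  // POST the resource to e.g. https://example.org/fhir/Patient/$validate
  const res = await fetch(`${baseUrl}/${resource.resourceType}/$validate`, {
    method: "POST",
    headers: { "Content-Type": "application/fhir+json" },
    body: JSON.stringify(resource),
  });
  return passesValidation(await res.json());
}
```

Separating the `OperationOutcome` check from the network call keeps the pass/fail logic reusable in offline assertions.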
Available Evaluations
Data Extraction (`evals/extraction/`)
- Description: Comprehensive evaluation of extracting structured FHIR data from unstructured clinical text.
- Configurations: Both minimalist and specialist approaches available.
- Test categories: Basic demographics, conditions, explanations of benefit, medication requests, observations.
FHIR Resource Generation (`evals/generation/`)
- Description: Tests the ability to generate valid FHIR resources and bundles.
- Configurations: Zero-shot bundle generation and multi-turn tool use scenarios.
- Models supported: GPT-3.5-turbo, GPT-4.1, O3 (low/high reasoning), Claude 3.5 Haiku, Claude 3.5 Sonnet, Claude Sonnet 4, Claude Opus 4
Custom Assertions
The framework includes custom assertion functions:
- `fhirPathEquals.mjs`: Validates FHIRPath expressions
- `isBundle.mjs`: Checks if output is a valid FHIR Bundle
- `metaElementMissing.mjs`: Validates required metadata elements
- `validateOperation.mjs`: Validates FHIR operation results
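A promptfoo custom assertion is a JavaScript function that receives the model output and returns a grading result (`pass`, `score`, `reason`). The sketch below shows roughly what an `isBundle`-style check could look like; it is an illustrative reconstruction, not the framework's actual implementation (in the repo this function would be the `.mjs` file's default export).

```javascript
// Illustrative sketch of a promptfoo-style custom assertion that checks
// whether model output is a FHIR Bundle. Not the repo's actual code.
function isBundleAssertion(output) {
  let resource;
  try {
    resource = typeof output === "string" ? JSON.parse(output) : output;
  } catch {
    return { pass: false, score: 0, reason: "Output is not valid JSON" };
  }
  // A Bundle must declare resourceType "Bundle" and carry an entry array.
  const pass =
    resource?.resourceType === "Bundle" && Array.isArray(resource.entry);
  return {
    pass,
    score: pass ? 1 : 0,
    reason: pass ? "Output is a FHIR Bundle" : "Output is not a FHIR Bundle",
  };
}
```

Returning a `reason` string alongside the boolean makes failures self-explanatory in the promptfoo results view.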
Tools
- `validateFhirBundle.mjs`: Tool for validating FHIR Bundle resources
Custom Providers
- `AnthropicMessagesWithRecursiveToolCallsProvider.ts`: Enhanced Anthropic provider with recursive tool calling (up to 10 depth levels)
- `OpenAiResponsesWithRecursiveToolCallsProvider.ts`: Enhanced OpenAI provider with recursive tool calling
These providers enable multi-turn tool interactions where models can iteratively call validation tools to improve their FHIR resource generation.
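The recursive pattern these providers implement can be sketched generically: call the model, and if it requests a tool, execute the tool, append the result to the conversation, and recurse until the model answers directly or a depth cap is reached. The sketch below uses hypothetical helper names and message shapes (the real providers wrap the Anthropic and OpenAI SDKs, whose request/response formats differ).

```javascript
// Generic sketch of recursive tool calling with a depth cap, as in the
// framework's custom providers. callModel and the message shapes are
// illustrative placeholders, not a real SDK API.
const MAX_TOOL_DEPTH = 10;

async function runWithTools(messages, callModel, tools, depth = 0) {
  const reply = await callModel(messages, tools);
  // Stop when the model answers directly or the depth cap is hit.
  if (!reply.toolCall || depth >= MAX_TOOL_DEPTH) return reply;

  // Execute the requested tool (e.g. a FHIR Bundle validator).
  const tool = tools[reply.toolCall.name];
  const result = await tool(reply.toolCall.args);

  // Feed the tool result back so the model can revise its answer.
  return runWithTools(
    [
      ...messages,
      { role: "assistant", content: reply.content },
      { role: "tool", content: JSON.stringify(result) },
    ],
    callModel,
    tools,
    depth + 1
  );
}
```

The depth cap prevents a model that keeps requesting tools from looping indefinitely while still allowing several validate-and-repair rounds.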
Commands to Run Evaluations
Install dependencies and set up environment variables:
```bash
yarn install
```
Copy the `.env.template` file to `.env` and supply your API keys for the models you plan to test.
Run an evaluation:
```bash
# Example: Run the extraction evaluation with minimalist config
promptfoo eval -c evals/extraction/config-minimalist.yaml

# Example: Run the FHIR bundle generation evaluation
promptfoo eval -c evals/generation/config-zero-shot-bundle.yaml

# Example: Run multi-turn tool use evaluation
promptfoo eval -c evals/generation/config-multi-turn-tool-use.js
```
The evaluation will print its performance metrics to the console and optionally save results to files.
Owner
- Name: Flexpa
- Login: flexpa
- Kind: organization
- Website: https://www.flexpa.com
- Twitter: flexpa
- Repositories: 2
- Profile: https://github.com/flexpa
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: 'If you use this software, please cite it as below.'
authors:
  - family-names: 'Kelly'
    given-names: 'Joshua'
    orcid: 'https://orcid.org/0009-0000-7191-0595'
title: 'FHIR LLM Eval'
version: 0.0.1
date-released: 2024-11-22
url: 'https://github.com/flexpa/fhir-llm-evals'
```
GitHub Events
Total
- Watch event: 33
- Push event: 13
- Fork event: 5
- Create event: 1
Last Year
- Watch event: 33
- Push event: 13
- Fork event: 5
- Create event: 1
Dependencies
- @types/bun latest development
- openai ^4.53.1