verifact
Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
Repository
Basic Info
Statistics
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
Preprint Manuscript: VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records
VeriFact is a long-form text fact-checker that verifies any text written about a patient against their own electronic health record (EHR). VeriFact decomposes the text into a set of propositions, each of which is individually verified against the patient's EHR. VeriFact combines retrieval-augmented generation (RAG) with LLM-as-a-Judge to perform fact verification.
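The decompose, retrieve, and judge flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names, the label set, and the `Verdict` class are hypothetical, and the three stages are replaced by trivial stand-ins (sentence splitting, word overlap, exact matching) where the real system would call an LLM, an embedding and reranking retriever, and an LLM judge.

```python
from dataclasses import dataclass

# Hypothetical label set for illustration; not taken from the repository code.
LABELS = ("Supported", "Not Supported", "Not Addressed")


@dataclass
class Verdict:
    proposition: str
    label: str
    evidence: list[str]


def decompose(text: str) -> list[str]:
    """Stand-in for LLM-based decomposition: split the text on periods."""
    return [s.strip() for s in text.split(".") if s.strip()]


def retrieve(proposition: str, ehr_facts: list[str], top_k: int = 2) -> list[str]:
    """Stand-in for RAG retrieval: rank EHR facts by shared words."""
    words = set(proposition.lower().split())
    scored = [(len(words & set(fact.lower().split())), fact) for fact in ehr_facts]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [fact for score, fact in ranked[:top_k] if score > 0]


def judge(proposition: str, evidence: list[str]) -> str:
    """Stand-in for LLM-as-a-Judge: exact match against retrieved evidence."""
    if not evidence:
        return "Not Addressed"
    if any(proposition.lower() == fact.lower() for fact in evidence):
        return "Supported"
    return "Not Supported"


def verify(text: str, ehr_facts: list[str]) -> list[Verdict]:
    """Verify each proposition in the text individually against the EHR facts."""
    verdicts = []
    for proposition in decompose(text):
        evidence = retrieve(proposition, ehr_facts)
        verdicts.append(Verdict(proposition, judge(proposition, evidence), evidence))
    return verdicts


if __name__ == "__main__":
    ehr = ["Patient was admitted with pneumonia", "Patient received ceftriaxone"]
    note = "Patient was admitted with pneumonia. Patient received vancomycin."
    for verdict in verify(note, ehr):
        print(verdict.label, "|", verdict.proposition)
```

The point of the structure is that each proposition gets its own retrieval step and its own verdict, so a long narrative is never judged as a single unit.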
VeriFact-BHC is a dataset to benchmark VeriFact performance against human clinicians. This dataset is derived from MIMIC-III Clinical Database v1.4. It contains human-written Brief Hospital Course (BHC) narratives typically found in discharge summaries and also an LLM-written BHC for 100 patients. It also contains the reference EHR for each patient. All BHC narratives are decomposed into propositions, which are annotated by clinicians to develop a human clinician ground truth.
Scripts
Scripts to generate the unannotated VeriFact-BHC dataset, run the VeriFact system to generate AI rater labels, and compute interrater agreement and classification metrics are in the scripts directory. These scripts rely on the locally deployed services described below.
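As an illustration of what comparing AI rater labels with clinician labels can look like, the sketch below computes percent agreement and Cohen's kappa in plain Python. It is an assumption that these two statistics are representative; the repository's scripts may compute other measures (the dependency list includes krippendorff and irrcac), and the function names here are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Fraction of items on which two raters assign the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Raters must label the same non-empty set of items.")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    clinician = ["Supported", "Supported", "Not Supported", "Not Supported"]
    ai_rater = ["Supported", "Supported", "Not Supported", "Supported"]
    print(percent_agreement(clinician, ai_rater))
    print(cohens_kappa(clinician, ai_rater))
```

Kappa is lower than raw agreement here because part of the raw agreement would be expected by chance given how often each rater uses each label.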
Environment Variables
Add your environment variables in .env.
```sh
# Hugging Face Token: https://huggingface.co/docs/hub/en/security-tokens
HF_TOKEN=${HUGGINGFACE_READ_TOKEN}
# Hugging Face Cache
HF_HOME=${HOME}/.cache/huggingface
# Local Machine URL
SERVER_BASE_URL=localhost
# Traefik Configuration
ADMIN_EMAIL=email@domain.edu
```
If you plan to commit this code to a public repo, add the .env file to .gitignore so you do not commit your secrets. The .env file is included in this repo to show the default environment variables used by the Docker containers and scripts.
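To show how `KEY=VALUE` entries with `${VAR}` references resolve, here is a minimal standard-library sketch. The `parse_env` function is hypothetical and not part of the repository, which lists python-dotenv among its dependencies and most likely relies on that instead.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments, expanding ${VAR}."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # expandvars substitutes ${VAR} from the current process environment
        # and leaves unknown variables untouched.
        values[key.strip()] = os.path.expandvars(value.strip())
    return values


if __name__ == "__main__":
    sample = "# Hugging Face Cache\nHF_HOME=${HOME}/.cache/huggingface\n"
    print(parse_env(sample))
```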
Python Environment
```sh
# Create python virtual environment
uv venv
# Create/update lock file (only if needed, otherwise skip this step)
uv lock
# Sync virtual environment with lockfile specification
uv sync --all-packages
```
Services
All models used in VeriFact are local open-source models which can be launched using the provided docker-compose.yml configuration.
Local services include:
- Local Embedding Model (requires GPU): customized infinity inference engine to serve the BAAI/bge-m3 model with both dense and sparse embedding generation.
- Local Rerank Model (requires GPU): customized infinity inference engine to serve the BAAI/bge-reranker-v2-m3 reranking model.
- Local LLM Inference Service (requires GPU): vLLM serving for hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4.
- Vector Database: locally hosted qdrant vector database
- Traefik: router, reverse proxy, load balancer
- Redis: key-value store for redis-queue
- Redis-Queue (RQ) Dashboard: monitoring rq jobs
- Prometheus + Grafana: monitoring dashboard for vLLM.
These services are all containerized using docker. Docker Compose is used to coordinate launching and stopping these microservices.
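```sh
# Start All Services (in detached mode)
docker compose up -d
# Check All Services Running
docker ps
# Check Logs (replace <container> with a container name or ID)
docker logs <container>
# Inspect Each Container (replace <command> with the command to run inside it)
docker exec -it <container> <command>
# Stop All Services
docker compose down
```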
Example Service Deployment
LLM inference is significantly more compute-intensive than embedding or reranking. Thus, it is recommended to set up LLMs in a data-parallel configuration. Embedding and reranking models can share a GPU.
On a server with 4 GPUs (using the docker-compose.yml in this project):
```sh
# Launch Traefik for reverse proxy & load balancing
# Traefik Dashboard: ${SERVER_URL}:8090/dashboard
docker compose up traefik -d
# Launch Qdrant for vector database, Redis & RQ-Dashboard for tracking tasks in queue
# Qdrant Dashboard: ${SERVER_URL}:6333/dashboard
# Redis Stack Dashboard: ${SERVER_URL}:6380/redis-stack
# RQ-Dashboard: ${SERVER_URL}:9181
docker compose up qdrant redis rq-dashboard -d
# Launch Local LLM Inference API in Tensor Parallel Configuration (uses vLLM)
# The default LLM is a quantized Llama 3.1 70B model, which requires 37GB VRAM for the model itself.
# This container configures the LLM inference service in tensor parallelism, which splits model weights across 2 GPUs.
docker compose up llm-tp2 -d
# Alternatively, launch local LLM Inference on a single GPU. Multiple docker containers can be launched
# and traefik will distribute API requests across the LLM containers in round-robin fashion
docker compose up llm0 llm1 llm2 -d
# Launch Prometheus, Grafana dashboards for monitoring vLLM inference throughput
# Prometheus Dashboard: ${SERVER_URL}:9090
# Grafana Dashboard: ${SERVER_URL}:3000
docker compose up prometheus grafana -d
# Launch Embedding & Rerank Inference API on GPU3 (uses Infinity Embeddings)
# These containers are customized for compatibility with BGE-M3 model and to reduce VRAM use
docker compose up embed1 rerank1 -d
```
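The 37GB figure quoted for the quantized Llama 3.1 70B model can be sanity-checked with a back-of-envelope estimate of weight memory. The sketch below assumes a round 70 billion parameters at 4 bits each and ignores everything else. The gap between this estimate and 37GB presumably comes from quantization metadata and any layers kept at higher precision, and the KV cache and activations need additional VRAM on top. The function name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    int4_total = weight_memory_gb(70e9, 4)
    print(f"INT4 weights: ~{int4_total:.1f} GB")
    # With tensor parallelism across 2 GPUs, weights are split roughly evenly.
    print(f"Per GPU with 2-way tensor parallelism: ~{int4_total / 2:.1f} GB")
    print(f"16-bit weights for comparison: ~{weight_memory_gb(70e9, 16):.1f} GB")
```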
Specific configurations for ports and URLs are found in the .env file that docker-compose.yml references.
Docker services are reached via the Traefik reverse proxy and load balancer. Using Traefik, multiple docker containers providing LLM inference can serve the same API endpoint. The same is true for the embedding and rerank inference services. Traefik load balances API requests equally across the docker containers hosting the same service.
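The effect of equal load balancing across identical containers can be illustrated with a toy round-robin dispatcher. This is only a conceptual sketch: Traefik's actual behavior is governed by its own configuration, and the `round_robin` function is hypothetical. The backend names reuse the container names from the launch commands.

```python
from collections import Counter
from itertools import cycle


def round_robin(backends: list[str], n_requests: int) -> list[str]:
    """Assign each incoming request to the next backend in a fixed rotation."""
    pool = cycle(backends)
    return [next(pool) for _ in range(n_requests)]


if __name__ == "__main__":
    assignments = round_robin(["llm0", "llm1", "llm2"], 9)
    # Each backend receives the same number of requests.
    print(Counter(assignments))
```

Because every client calls the same endpoint, adding another container increases throughput without any change to client code.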
Parallel tasks are managed using rq which is a queue backed by redis.
The vLLM inference service metrics are monitored via Prometheus and a Grafana dashboard. Prometheus and Grafana setup is described in verifact/services/vllm/monitoring/README.md.
Performance
Performance of locally-hosted models is dependent on your GPU accelerator and local hardware. Lower latency and higher throughput may be achieved by replacing locally-hosted models with dedicated API inference services.
Citation
@article{Chung2025,
title={VeriFact: Verifying Facts in LLM-Generated Clinical Text with Electronic Health Records},
author={Philip Chung and Akshay Swaminathan and Alex J. Goodell and Yeasul Kim and S. Momsen Reincke and Lichy Han and Ben Deverett and Mohammad Amin Sadeghi and Abdel-Badih Ariss and Marc Ghanem and David Seong and Andrew A. Lee and Caitlin E. Coombes and Brad Bradshaw and Mahir A. Sufian and Hyo Jung Hong and Teresa P. Nguyen and Mohammad R. Rasouli and Komal Kamra and Mark A. Burbridge and James C. McAvoy and Roya Saffary and Stephen P. Ma and Dev Dash and James Xie and Ellen Y. Wang and Clifford A. Schmiesing and Nigam Shah and Nima Aghaeepour},
year={2025},
eprint={2501.16672},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2501.16672},
}
Owner
- Name: Philip Chung
- Login: philipchung
- Kind: user
- Repositories: 13
- Profile: https://github.com/philipchung
Citation (CITATION.cff)
cff-version: 1.2.0
title: >-
  VeriFact: Verifying Facts in LLM-Generated Clinical Text
  with Electronic Health Records
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Philip
    family-names: Chung
    orcid: "https://orcid.org/0000-0002-1194-7510"
  - given-names: Akshay
    family-names: Swaminathan
    orcid: "https://orcid.org/0000-0003-3426-9289"
  - given-names: Alex
    family-names: Goodell
    orcid: "https://orcid.org/0000-0003-0229-8843"
  - given-names: Yeasul
    family-names: Kim
    orcid: "https://orcid.org/0000-0001-8289-1297"
  - given-names: S. Momsen
    family-names: Reincke
    orcid: "https://orcid.org/0000-0002-8132-3527"
  - given-names: Lichy
    family-names: Han
    orcid: "https://orcid.org/0000-0002-5785-0968"
  - given-names: Ben
    family-names: Deverett
    orcid: "https://orcid.org/0000-0002-3119-7649"
  - given-names: Mohammad Amin
    family-names: Sadeghi
    orcid: "https://orcid.org/0000-0003-3335-1758"
  - given-names: Abdel Badih
    family-names: Ariss
    orcid: "https://orcid.org/0000-0003-0269-3130"
  - given-names: Marc
    family-names: Ghanem
    orcid: "https://orcid.org/0000-0002-7479-7994"
  - given-names: David
    family-names: Seong
    orcid: "https://orcid.org/0000-0002-8980-5731"
  - given-names: Andrew
    family-names: Lee
    orcid: "https://orcid.org/0009-0006-8964-6677"
  - given-names: Caitlin
    family-names: Coombes
    orcid: "https://orcid.org/0000-0001-8414-4279"
  - given-names: Brad
    family-names: Bradshaw
    orcid: "https://orcid.org/0000-0001-5371-9682"
  - given-names: Mahir
    family-names: Sufian
    orcid: "https://orcid.org/0000-0002-9702-4556"
  - given-names: Hyo Jung
    family-names: Hong
    orcid: "https://orcid.org/0000-0001-7674-8398"
  - given-names: Teresa
    family-names: Nguyen
    orcid: "https://orcid.org/0000-0001-9522-8937"
  - given-names: Mohammad
    family-names: Rasouli
    orcid: "https://orcid.org/0000-0001-7181-5803"
  - given-names: Komal
    family-names: Kamra
    orcid: "https://orcid.org/0000-0003-4700-583X"
  - given-names: Mark
    family-names: Burbridge
    orcid: "https://orcid.org/0000-0001-6765-5739"
  - given-names: James
    family-names: McAvoy
    orcid: "https://orcid.org/0009-0006-3838-5438"
  - given-names: Roya
    family-names: Saffary
    orcid: "https://orcid.org/0000-0001-9959-9399"
  - given-names: Stephen
    family-names: Ma
    orcid: "https://orcid.org/0000-0003-3738-9569"
  - given-names: Dev
    family-names: Dash
    orcid: "https://orcid.org/0000-0002-0223-1641"
  - given-names: James
    family-names: Xie
    orcid: "https://orcid.org/0000-0002-9511-0012"
  - given-names: Ellen
    family-names: Wang
    orcid: "https://orcid.org/0000-0002-9151-938X"
  - given-names: Clifford
    family-names: Schmiesing
    orcid: "https://orcid.org/0000-0002-8979-5959"
  - given-names: Nigam
    family-names: Shah
    orcid: "https://orcid.org/0000-0001-9385-7158"
  - given-names: Nima
    family-names: Aghaeepour
    orcid: "https://orcid.org/0000-0002-6117-8764"
identifiers:
  - type: doi
    value: 10.48550/arXiv.2501.16672
    description: arXiv Preprint
repository-code: "https://github.com/philipchung/verifact"
abstract: >-
  VeriFact: A long-form text fact-checker that verifies any
  text written about a patient against their own electronic
  health record (EHR). VeriFact decomposes the text into a
  set of propositions which are individually verified
  against the patient's EHR. VeriFact combines RAG with
  LLM-as-a-Judge to perform fact verification.
keywords:
  - Fact Checking
  - Evaluation
  - Large Language Models
  - Medicine
  - Electronic Health Records
license: MIT
GitHub Events
Total
- Watch event: 16
- Push event: 4
- Public event: 1
Last Year
- Watch event: 16
- Push event: 4
- Public event: 1
Dependencies
- cjlapao/rq-dashboard 0.7.1
- grafana/grafana 11.4.0-ubuntu
- infinity/embed latest
- infinity/rerank latest
- prom/prometheus v2.55.1
- qdrant/qdrant v1.10.0
- redis/redis-stack latest
- traefik v3.0
- vllm/vllm-openai v0.6.4
- michaelf34/infinity 0.0.53 build
- michaelf34/infinity 0.0.53 build
- irrcac >=0.4.4
- krippendorff >=0.8.0
- numpy >=1.26.4
- openpyxl >=3.1.5
- pandas >=2.2.3
- pingouin >=0.5.5
- pydantic >=2.10.3
- scipy >=1.12.0
- llama-index >=0.12.4
- pandas >=2.2.3
- pydantic >=2.10.3
- qdrant-client >=1.12.1
- tqdm >=4.67.1
- llama-index >=0.12.4
- pandas >=2.2.3
- tqdm >=4.67.1
- llama-index >=0.12.4
- pydantic >=2.10.3
- tqdm >=4.67.1
- pydantic >=2.10.3
- llama-index >=0.12.4
- llama-index-core >=0.12.4
- llama-index-embeddings-huggingface >=0.4.0
- llama-index-llms-azure-openai >=0.3.0
- llama-index-llms-huggingface >=0.4.0
- llama-index-llms-openai >=0.3.3
- llama-index-llms-openai-like >=0.3.0
- llama-index-vector-stores-qdrant >=0.4.0
- pydantic >=2.10.3
- qdrant-client >=1.12.1
- redis >=5.2.1
- rq >=1.16.2,<2.0.0
- rq-dashboard >=0.8.2.2
- setproctitle >=1.3.4
- tqdm >=4.67.1
- python-dotenv >=1.0.1
- tenacity >=8.0.0
- tiktoken >=0.8.0
- tqdm >=4.67.1
- httpx >=0.28.1
- ipykernel >=6.29.0
- ipywidgets >=8.1.5
- jupyter >=1.1.0
- matplotlib >=3.9.2
- mypy >=1.13.0
- nest-asyncio >=1.6.0
- numpy >=1.26.4
- openai >=1.57.0
- pandas >=2.2.0
- pandas-stubs >=2.2.3.241126
- pyarrow >=18.1.0
- python-dotenv >=1.0.1
- ruff >=0.8.2
- scipy >=1.12.0
- seaborn >=0.13.0
- tqdm >=4.67.1
- transformers >=4.46.3
- typer >=0.15.1
- types-tqdm >=4.67.0.20241119