annodash

[JAMIA Open] A clinical terminology annotation dashboard created using Plotly Dash & the MIMIC-IV database.

https://github.com/justin13601/annodash

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

annotation clinical-data dashboard data-visualization nlp-machine-learning python
Last synced: 4 months ago

Repository

[JAMIA Open] A clinical terminology annotation dashboard created using Plotly Dash & the MIMIC-IV database.

Basic Info
  • Host: GitHub
  • Owner: justin13601
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 191 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 3
  • Open Issues: 4
  • Releases: 0
Topics
annotation clinical-data dashboard data-visualization nlp-machine-learning python
Created over 3 years ago · Last pushed almost 2 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md



AnnoDash

A Clinical Terminology Annotation Dashboard
(Supports LOINC®, SNOMED CT, ICD-10-CM, OMOP v5)

Table of Contents
  1. About
  2. Getting Started
  3. Usage
  4. Demo Data
  5. License
  6. Acknowledgments

About

AnnoDash is a deployable clinical terminology annotation dashboard developed primarily in Python using Plotly Dash. It allows users to annotate medical concepts on a straightforward interface supported by visualizations of associated patient data and natural language processing.

The dashboard seeks to provide a flexible and customizable solution for clinical annotation. Recent large language models (LLMs) are supported to aid the annotation process. Additional extensions, such as machine learning-powered plugins and search algorithms, can be easily added by technical experts.

A demo with chartevents & d_items from the MIMIC-IV v2.2 icu module is available under releases.

Previously featured on Plotly & Dash 500!

Overview

(Screenshot: dashboard home view.)

The top left section of the dashboard features a dropdown to keep track of target concepts the user wishes to annotate. The target vocabulary is also selected in a dropdown in this section. The top right module contains the data visualization component. The bottom half of the dashboard includes modules dedicated to querying and displaying candidate ontology codes.

Data Visualization

(Screenshots: Distribution Overview and Sample Records graphs.)

The dashboard is supported by visualizations of relevant patient data. For any given target concept, patient observations are queried from the source data. The Distribution Overview tab shows a distribution summarizing all patient observations, while the Sample Records tab selects the top 5 patients (ranked by number of observations) and displays their records over a 96-hour window. Both numerical and text data are supported. The format of the source data is detailed below in Usage.
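
As a rough illustration of how such views can be derived from the source .csv, here is a minimal pandas sketch; it is not the dashboard's actual implementation, and the choice of anchoring the 96-hour window at each patient's first observation is an assumption. Column names follow the format described under Required Files.

```python
import pandas as pd

# Load the source observations (format described under "Required Files" below).
df = pd.read_csv("demo-data/CHARTEVENTS.csv", parse_dates=["charttime"])

# All observations for one target concept, e.g. itemid 52038 (Base Excess).
concept = df[df["itemid"] == 52038].copy()
concept["value"] = pd.to_numeric(concept["value"], errors="coerce")

# Distribution Overview: summary over every patient observation.
print(concept["value"].describe())

# Sample Records: top 5 patients by observation count, restricted to a
# 96-hour window starting at each patient's first observation (assumption).
top5 = concept["subject_id"].value_counts().head(5).index
for sid in top5:
    records = concept[concept["subject_id"] == sid].sort_values("charttime")
    window_end = records["charttime"].iloc[0] + pd.Timedelta(hours=96)
    records = records[records["charttime"] <= window_end]
    print(sid, records[["charttime", "value", "valueuom"]].head())
```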

Annotation

The user annotates target concepts by first selecting the item to be annotated in the first dropdown. The following dropdown allows users to select the target ontology. Several default vocabularies are available, but users are free to modify the dashboard for additional ontology support via the scripts detailed in Other Relevant Files. Code suggestions are then generated in the bottom table. Users select their target annotation by clicking it, and the annotation is saved to .json files upon submission.
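
For orientation, a saved annotation might resemble the following minimal sketch; the record layout shown here is hypothetical, and the dashboard defines the actual .json schema.

```python
import json
import os

# Hypothetical record layout; field names are illustrative only.
annotation = {
    "itemid": 52038,
    "label": "Base Excess",
    "ontology": "LOINC",
    "annotated_code": "11555-0",
    "annotated_label": "Base excess in Blood by calculation",
}

os.makedirs("results-json/demo", exist_ok=True)
with open("results-json/demo/52038.json", "w") as f:
    json.dump(annotation, f, indent=2)
```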

Ontology Search & Ranking

The dashboard automatically generates ontology code suggestions based on the target concept. A string search supported by PyLucene and the Porter stemming algorithm sorts results by relevance, as indicated by the colour of the circle icons. Several other string search methods are available, such as full-text search using SQLite3's FTS5 or ElasticSearch, vector search using TF-IDF, and similarity scoring using Jaro-Winkler/Fuzzy partial ratios. The NLM UMLS API is also available for the SNOMED CT ontology.
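
For intuition, here is a minimal sketch of TF-IDF-style vector search using scikit-learn (which appears in requirements.txt); it is illustrative only and not the dashboard's actual search classes in /src/search.py, and the ontology entries are toy examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy ontology: code -> description (illustrative entries only).
codes = {
    "11555-0": "Base excess in Blood by calculation",
    "2744-1": "pH of Arterial blood",
    "2160-0": "Creatinine [Mass/volume] in Serum or Plasma",
}

query = "base excess"
descriptions = list(codes.values())

# Fit TF-IDF over the ontology descriptions plus the query.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(descriptions + [query])

# Cosine similarity between the query (last row) and every description.
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
ranked = sorted(zip(codes.keys(), scores), key=lambda x: x[1], reverse=True)
for code, score in ranked:
    print(f"{code}\t{score:.3f}\t{codes[code]}")
```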

After searching, the dashboard is able to re-rank ontology codes using LLMs. Currently, OpenAI's GPT-3.5 API and CohereAI's re-ranking API endpoint are supported by default. LLM re-ranking is disabled by default; if enabled, API keys are required and usage incurs the associated costs.
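
As a rough sketch of what LLM-based re-ranking can look like, the snippet below uses the Cohere Python SDK's rerank endpoint; the environment variable name and model identifier are assumptions, parameter names may differ by SDK version, and this is not necessarily how /src/rank.py invokes the endpoint.

```python
import os
import cohere

# Illustrative only: variable name and model identifier are assumptions.
co = cohere.Client(os.environ["COHERE_API_KEY"])

candidates = [
    "Base excess in Blood by calculation",
    "pH of Arterial blood",
    "Creatinine [Mass/volume] in Serum or Plasma",
]

# Re-rank the string-search candidates against the target concept description.
response = co.rerank(
    model="rerank-english-v2.0",
    query="base excess",
    documents=candidates,
    top_n=3,
)
for result in response.results:
    print(result.index, round(result.relevance_score, 3))
```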

(back to top)

Getting Started

Below are steps to download, install, and run the dashboard locally. Leave all configuration fields unchanged to run the demo using MIMIC-IV data.

Requirements

The dashboard is built primarily on Plotly Dash, pandas, and scikit-learn; these and all other required packages are pinned in requirements.txt.

Additionally, the latest version of the dashboard requires PyLucene 8 for its primary ontology code searching algorithm. Please follow the setup instructions on the PyLucene website.
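
Once PyLucene is installed, a quick way to verify the installation (a minimal check, not part of the dashboard):

```python
import lucene

# Start the Java VM embedded by PyLucene and print the bundled Lucene version.
lucene.initVM()
print(lucene.VERSION)
```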

Required Files:

  • A .csv file containing all patient observations/data (missingness allowed, except for the itemid column):

    ```
    itemid,subject_id,charttime,value,valueuom
    52038,123,2150-01-01 10:00:00,5,mEq/L
    52038,123,2150-01-01 11:00:00,6,ug/mL
    ...
    ```
  • A .csv file containing all concepts to be annotated in id-label pairs, {id: label}:

    ```
    itemid,label
    52038,Base Excess
    52041,pH
    ...
    ```
  • The config.yaml (a sketch of generating it programmatically follows this list):
    • Define results directory (default: /results-json/demo)
    • Define location of the source data .csv (default: /demo-data/CHARTEVENTS.csv)
    • Define location of the concepts .csv (default: /demo-data/demo_chartevents_user_1.csv)
    • Define location of ontology SQLite3 databases (default: /ontology)
    • Define string search algorithm (default: pylucene)
    • Define ranking algorithm (default: None)
    • Define dashboard aesthetics for graphs (defaults are shown in the configuration file)
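
Since the configuration is generated with PyYAML and ml_collections (see Installation, step 2), a minimal sketch of a generate_config.py-style script is shown below; the field names are assumptions based on the defaults listed above, and the authoritative set lives in /src/generate_config.py.

```python
import yaml
from ml_collections import config_dict

# Illustrative field names only; see /src/generate_config.py for the real ones.
cfg = config_dict.ConfigDict()
cfg.results_dir = "/results-json/demo"
cfg.source_data_csv = "/demo-data/CHARTEVENTS.csv"
cfg.concepts_csv = "/demo-data/demo_chartevents_user_1.csv"
cfg.ontology_dir = "/ontology"
cfg.search_algorithm = "pylucene"  # 'elastic' switches to ElasticSearch
cfg.ranking_algorithm = None       # LLM re-ranking disabled by default

# Dump the configuration to the config.yaml consumed by the dashboard.
with open("config.yaml", "w") as f:
    yaml.safe_dump(cfg.to_dict(), f, sort_keys=False)
```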

Using ElasticSearch:

To utilize ElasticSearch as the string search algorithm, run a local ElasticSearch cluster via Docker and specify 'elastic' in the appropriate configuration field:

```sh
docker run --rm -p 9200:9200 -p 9300:9300 -e "xpack.security.enabled=false" -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:8.7.0
```
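
Before enabling 'elastic' in the configuration, you can confirm the local cluster is reachable; a small sanity check using the requests package from requirements.txt (not part of the dashboard itself):

```python
import requests

# The demo cluster above listens on port 9200 with security disabled.
response = requests.get("http://localhost:9200")
print(response.status_code, response.json().get("version", {}).get("number"))
```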

Using APIs:

If desired, please define your API keys (OpenAI, CohereAI, NLM UMLS) as environment variables prior to running the dashboard. This can be done explicitly by editing the Docker Compose file below.
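
For example, the keys can be checked at runtime as shown in this small sketch; the variable names here are illustrative assumptions, so use whichever names the dashboard's configuration expects.

```python
import os

# Illustrative variable names; set these before launching the dashboard.
keys = {
    "OpenAI": os.environ.get("OPENAI_API_KEY"),
    "Cohere": os.environ.get("COHERE_API_KEY"),
    "NLM UMLS": os.environ.get("UMLS_API_KEY"),
}
for name, value in keys.items():
    print(f"{name} key configured: {value is not None}")
```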

(back to top)

Installation

Docker Install (Recommended):

  1. Clone the repository:

     ```sh
     git clone https://github.com/justin13601/AnnoDash.git
     ```

  2. Install the packages needed to generate a configuration file:

     ```sh
     pip install PyYAML ml_collections
     ```

  3. Edit /src/generate_config.py with desired directories and configurations and run:

     ```sh
     python3 generate_config.py
     ```

     This creates the config.yaml required by the dashboard.

  4. Build the dashboard image:

     ```sh
     docker build -t annodash .
     ```

  5. Retrieve the Docker image ID and run the Docker container:

     Get <IMAGE ID>:

     ```sh
     docker images
     ```

     Copy the appropriate <IMAGE ID> and start the container:

     ```sh
     docker run --publish 8080:8080 <IMAGE ID>
     ```

Manual Install:

  1. Clone the repository:

     ```sh
     git clone https://github.com/justin13601/AnnoDash.git
     ```

  2. Install requirements:

     ```sh
     pip install -r requirements.txt
     ```

  3. Install PyLucene and the associated Java libraries:

     ```sh
     # Use shell scripts to install JCC and PyLucene
     ```

  4. Edit /src/generate_config.py with desired directories and configurations and run:

     ```sh
     python3 generate_config.py
     ```

     This creates the config.yaml required by the dashboard.

  5. Run the dashboard:

     ```sh
     python3 main.py
     ```

(back to top)

Usage

Install/run the dashboard and visit http://127.0.0.1:8080/ or http://localhost:8080/.

Other Relevant Files

/src/generate_config.py is used to generate the config.yaml file.

/src/generate_ontology_database.py uses SQLite3 to generate the .db database files used to store the ontology vocabulary. This is needed when defining custom vocabularies outside the default list of available ones. External packages are required to execute this script; in particular, PyMedTermino is needed to generate SNOMED CT's database file. Please see the PyMedTermino installation instructions.
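
For a sense of what such an ontology database contains, here is a hedged sketch using Python's built-in sqlite3 module; the table and column names are illustrative and not necessarily those produced by the script.

```python
import sqlite3

# Illustrative schema: one row per ontology code.
conn = sqlite3.connect("ontology/loinc.db")
conn.execute("CREATE TABLE IF NOT EXISTS ontology (code TEXT PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT OR REPLACE INTO ontology (code, label) VALUES (?, ?)",
    [
        ("11555-0", "Base excess in Blood by calculation"),
        ("2744-1", "pH of Arterial blood"),
    ],
)
conn.commit()

# Simple substring lookup (the dashboard's FTS5 option does real full-text search).
rows = conn.execute(
    "SELECT code, label FROM ontology WHERE label LIKE ?", ("%base%",)
).fetchall()
print(rows)
conn.close()
```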

/src/generate_pylucene_index.py is used to generate the index used by PyLucene for ontology querying. This is needed when defining custom vocabularies outside the default list of available ones.

/src/generate_elastic_index.py is used to generate the index used by ElasticSearch for ontology querying. This is needed when defining custom vocabularies outside the default list of available ones. This can be run only after a local ElasticSearch cluster is created via Docker.
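
A hedged sketch of what indexing and querying can look like with the elasticsearch Python client is shown below; the index and field names are illustrative, and the script's actual behaviour may differ.

```python
from elasticsearch import Elasticsearch

# Connect to the local single-node cluster started via Docker (see "Using ElasticSearch").
es = Elasticsearch("http://localhost:9200")

# Index a couple of illustrative ontology entries.
es.index(index="loinc", id="11555-0",
         document={"code": "11555-0", "label": "Base excess in Blood by calculation"})
es.index(index="loinc", id="2744-1",
         document={"code": "2744-1", "label": "pH of Arterial blood"})
es.indices.refresh(index="loinc")

# Full-text query against the label field.
hits = es.search(index="loinc", query={"match": {"label": "base excess"}})
for hit in hits["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["label"])
```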

/src/search.py includes classes for ontology searching.

/src/rank.py includes classes for ontology ranking.

(back to top)

Demo Data

Demo data and respective licenses are included in the demo-data folder.

  • MIMIC-IV Clinical Database demo is available on Physionet (Johnson, A., Bulgarelli, L., Pollard, T., Horng, S., Celi, L. A., & Mark, R. (2023). MIMIC-IV Clinical Database Demo (version 2.2). PhysioNet. https://doi.org/10.13026/dp1f-ex47).

  • LOINC® Ontology Codes are available at https://loinc.org.

  • SNOMED CT Ontology Codes are available at https://www.nlm.nih.gov/healthit/snomedct/index.html.

  • ICD-10-CM Codes are available at https://www.cms.gov/medicare/icd-10/2022-icd-10-cm.

  • OMOP v5 Codes are available at https://athena.ohdsi.org/search-terms/start.

(back to top)

License

Distributed under the MIT License.

(back to top)

Acknowledgments

  • Alistair Johnson, DPhil | The Hospital for Sick Children | Scientist
  • Mjaye Mazwi, MBChB, MD | The Hospital for Sick Children | Staff Physician
  • Danny Eytan, MD, PhD | The Hospital for Sick Children | Staff Physician
  • Oshri Zaulan, MD | The Hospital for Sick Children | Staff Intensivist
  • Azadeh Assadi, MN | The Hospital for Sick Children | Pediatric Nurse Practitioner

(back to top)

Owner

  • Name: Justin Xu
  • Login: justin13601
  • Kind: user
  • Location: Palo Alto, CA
  • Company: Stanford Center for Artificial Intelligence in Medicine & Imaging (AIMI)

PhD Candidate @ Oxford | Fulbrighter @ Stanford

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'AnnoDash, a clinical terminology annotation dashboard'
message: Please cite AnnoDash using the metadata from this file.
type: software
authors:
  - given-names: Justin
    family-names: Xu
    email: justin.xu@mail.utoronto.ca
    affiliation: The Hospital for Sick Children
    orcid: 'https://orcid.org/0000-0003-4700-6277'
  - given-names: Mjaye
    family-names: Mazwi
    email: mjaye.mazwi@sickkids.ca
    affiliation: The Hospital for Sick Children
    orcid: 'https://orcid.org/0000-0003-1345-5429'
  - given-names: Alistair E W
    family-names: Johnson
    email: alistair.johnson@sickkids.ca
    orcid: 'https://orcid.org/0000-0002-8735-3014'
    affiliation: The Hospital for Sick Children
identifiers:
  - type: doi
    value: 10.5281/zenodo.8043943
    description: 'AnnoDash, a clinical terminology annotation dashboard'
repository-code: 'https://github.com/justin13601/AnnoDash'
abstract: >-
  Background: Standard ontologies are critical for
  interoperability and multisite analyses of health data.
  Nevertheless, mapping concepts to ontologies is often done
  with generic tools and is labor-intensive. Contextualizing
  candidate concepts within source data is also done in an
  ad hoc manner.


  Methods and Results: We present AnnoDash, a flexible
  dashboard to support annotation of concepts with terms
  from a given ontology. Text-based similarity is used to
  identify likely matches, and large language models are
  used to improve ontology ranking. A convenient interface
  is provided to visualize observations associated with a
  concept, supporting the disambiguation of vague concept
  descriptions. Time-series plots contrast the concept with
  known clinical measurements. We evaluated the dashboard
  qualitatively against several ontologies (SNOMED CT,
  LOINC, etc.) by using MIMIC-IV measurements. The dashboard
  is web-based and step-by-step instructions for deployment
  are provided, simplifying usage for nontechnical
  audiences. The modular code structure enables users to
  extend upon components, including improving similarity
  scoring, constructing new plots, or configuring new
  ontologies.


  Conclusion: AnnoDash, an improved clinical terminology
  annotation tool, can facilitate data harmonizing by
  promoting mapping of clinical data. AnnoDash is freely
  available at https://github.com/justin13601/AnnoDash
  (https://doi.org/10.5281/zenodo.8043943).
keywords:
  - clinical concepts
  - ontology
  - annotation
  - natural language processing
  - software
license: MIT

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 7
  • Total pull requests: 7
  • Average time to close issues: about 1 month
  • Average time to close pull requests: less than a minute
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 1.86
  • Average comments per pull request: 0.14
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 8.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • justin13601 (5)
  • LCCarmody (1)
  • alistairewj (1)
Pull Request Authors
  • justin13601 (7)

Dependencies

Dockerfile docker
  • coady/pylucene 8.11 build
docker-compose.yml docker
  • justin13601/mimic-iv-dash latest
requirements.txt pypi
  • Bottleneck ==1.3.4
  • Brotli ==1.0.9
  • Flask ==2.1.2
  • Flask-Compress ==1.12
  • Jinja2 ==3.1.2
  • MarkupSafe ==2.1.1
  • PyYAML ==6.0
  • Werkzeug ==2.1.2
  • atomicwrites ==1.4.0
  • attrs ==21.4.0
  • cachetools ==5.2.0
  • charset-normalizer ==2.0.12
  • click ==8.1.3
  • colorama ==0.4.4
  • dash ==2.4.1
  • dash-bootstrap-components ==1.2.1
  • dash-core-components ==2.0.0
  • dash-html-components ==2.0.0
  • dash-iconify ==0.1.2
  • dash-mantine-components ==0.10.2
  • dash-renderer ==1.9.0
  • dash-table ==5.0.0
  • ftfy ==6.1.1
  • fuzzywuzzy ==0.18.0
  • google-api-core ==2.10.1
  • google-auth ==2.6.6
  • google-cloud-bigquery ==3.1.0
  • google-cloud-bigquery-storage ==2.13.1
  • google-cloud-core ==2.3.0
  • google-cloud-storage ==2.5.0
  • google-crc32c ==1.3.0
  • google-resumable-media ==2.3.3
  • googleapis-common-protos ==1.56.4
  • grpcio ==1.46.3
  • grpcio-status ==1.46.3
  • gunicorn ==20.1.0
  • importlib-metadata ==4.11.4
  • iniconfig ==1.1.1
  • itsdangerous ==2.1.2
  • jaro-winkler ==2.0.0
  • joblib ==1.1.0
  • lupyne *
  • ml_collections ==0.1.1
  • numexpr ==2.8.1
  • numpy ==1.22.3
  • packaging ==21.3
  • pandas ==1.4.3
  • pickle-mixin ==1.0.2
  • plotly ==5.8.0
  • pluggy ==1.0.0
  • proto-plus ==1.22.1
  • protobuf ==4.21.6
  • py ==1.11.0
  • pyarrow ==8.0.0
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • pyparsing ==3.0.4
  • pytest ==7.1.1
  • python-Levenshtein ==0.20.5
  • python-dateutil ==2.8.2
  • pytz ==2021.3
  • related_ontologies ==0.2.0
  • requests ==2.27.1
  • rsa ==4.8
  • scikit-learn ==1.1.1
  • scipy ==1.7.3
  • six ==1.16.0
  • tenacity ==8.0.1
  • threadpoolctl ==3.1.0
  • tomli ==1.2.2
  • tqdm ==4.64.0
  • urllib3 ==1.26.9
  • wcwidth ==0.2.5
  • wincertstore ==0.2
  • zipp ==3.8.0