MUGS

https://github.com/celehs/mugs

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

Basic Info

Host: GitHub
Owner: celehs
Language: R
Default Branch: main
Homepage: https://celehs.github.io/MUGS/
Size: 717 MB

Statistics

Stars: 0
Watchers: 3
Forks: 1
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Multisource Graph Synthesis with EHR Data

[![CRAN](https://www.r-pkg.org/badges/version/MUGS)](https://CRAN.R-project.org/package=MUGS)

## Overview

We develop **MUlti-source Graph Synthesis (MUGS)**, an algorithm designed to create embeddings for pediatric EHR codes by leveraging graphical information from three distinct sources:

1. Pediatric EHR data
2. EHR data from the general patient population
3. Existing hierarchical medical ontology knowledge shared across different patient populations (e.g., PheCode, LOINC, RxNorm)

Using existing hierarchical medical ontologies as prior general knowledge, MUGS enables efficient transfer learning by grouping similar codes, which improves the transfer of knowledge from the general population to the pediatric population. To address heterogeneity within code groups and between sites, we decompose each code embedding into three components:

- **Group effect**: defined based on the hierarchical medical ontology.
- **Site-nonspecific code effect**: shared characteristics of a code that differ from its group effect.
- **Code-site effect**: site-specific characteristics of a code.

This decomposition, coupled with penalty functions applied to the code and code-site effects, allows adaptability to varying degrees of heterogeneity within code groups and between sites, and protects against negative knowledge transfer through hyperparameter tuning.
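
To make the three components concrete, here is a minimal sketch in base R. It assumes the components combine additively and splits simulated embeddings using plain averages. MUGS itself estimates the effects with penalized updates, so this is only an illustration of the structure; the data, group labels, and variable names are all hypothetical.

```r
set.seed(1)
p <- 4                                          # embedding dimension (illustrative)
codes <- c("c1", "c2", "c3")
group_of <- c(c1 = "g1", c2 = "g1", c3 = "g2")  # hypothetical ontology groups

# Two aligned embedding matrices (codes x p), one per site; simulated here.
emb <- list(
  site1 = matrix(rnorm(length(codes) * p), ncol = p, dimnames = list(codes, NULL)),
  site2 = matrix(rnorm(length(codes) * p), ncol = p, dimnames = list(codes, NULL))
)

# Group effect: pool across sites, then average within each ontology group.
pooled <- (emb$site1 + emb$site2) / 2
group_eff <- rowsum(pooled, group_of[codes]) / as.vector(table(group_of[codes]))

# Expand the group effect so that each code carries its group's vector.
group_part <- group_eff[group_of[codes], , drop = FALSE]
rownames(group_part) <- codes

# Site-nonspecific code effect: how a code deviates from its group.
code_eff <- pooled - group_part

# Code-site effect: what remains at each site after the shared parts are removed.
site_eff <- lapply(emb, function(E) E - group_part - code_eff)

# The three parts add back up to the site-level embedding.
reconstructed <- group_part + code_eff + site_eff$site1
isTRUE(all.equal(reconstructed, emb$site1))
```

In this unpenalized version the split is exact and the code-site effects carry all between-site differences. Penalizing the code and code-site effects, as described above, shrinks them toward zero when a code resembles its group or behaves alike across sites.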

![Flowchart](man/figures/MUGSFlowchart.png)

1. **Initial Embeddings**: Obtain two sets of initial embeddings (one per site) using the `get_embed()` function, then align them via an orthogonal Procrustes solution.
2. **Ontology-based Grouping**: Use existing hierarchical ontologies (e.g., [PheCode, LOINC, RxNorm](https://shiny.parse-health.org/hierarchies/)) to group codes. Pool the aligned embeddings to initialize group, code, and code-site effects.
3. **Iterative Updates**: Update group effects, code effects, and code-site effects in an alternating fashion using `GroupEff_par`, `CodeSiteEff_l2_par`, and `CodeEff_Matrix`.
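
The alignment in step 1 can be sketched with the classical orthogonal Procrustes solution in base R. This is a generic illustration, not necessarily how the package aligns embeddings internally. It assumes both matrices contain the same codes in the same row order, and the function name `procrustes_rotation()` is hypothetical.

```r
# Orthogonal Procrustes: find the orthogonal matrix Q minimizing ||A Q - B||_F.
# With the SVD t(A) %*% B = U D V^T, the minimizer is Q = U V^T.
procrustes_rotation <- function(A, B) {
  s <- svd(crossprod(A, B))  # crossprod(A, B) computes t(A) %*% B
  s$u %*% t(s$v)
}

set.seed(42)
n <- 50  # number of shared codes (illustrative)
p <- 5   # embedding dimension (illustrative)

# Simulate site-2 embeddings, then build site-1 embeddings as a rotated copy.
B <- matrix(rnorm(n * p), n, p)
Q_true <- qr.Q(qr(matrix(rnorm(p * p), p, p)))  # a random orthogonal matrix
A <- B %*% t(Q_true)                            # so that A %*% Q_true equals B

Q_hat <- procrustes_rotation(A, B)
A_aligned <- A %*% Q_hat

# After alignment the two embedding sets coincide up to floating-point error.
max(abs(A_aligned - B))
```

Because `Q_hat` is orthogonal, the rotation preserves inner products and norms within a site, so cosine similarities between codes are unchanged by the alignment.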

For hyperparameter tuning, we use known code-code pairs curated from the literature to select the penalties on the code effects and code-site effects. Each candidate setting is evaluated with `evaluation.sim`, favoring embeddings that best distinguish established related code pairs from random pairs across a wide range of scenarios.
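
The idea of separating known related pairs from random pairs can be illustrated as follows. This sketch is not the implementation of `evaluation.sim`: it assumes cosine similarity as the pair score and AUC as the summary, uses simulated embeddings, and the helper names `cosine_pairs()` and `auc_rank()` are hypothetical.

```r
set.seed(7)
n_codes <- 40
p <- 8
codes <- paste0("code", seq_len(n_codes))
emb <- matrix(rnorm(n_codes * p), n_codes, p, dimnames = list(codes, NULL))

# Make codes 1-10 noisy copies of codes 11-20 so the "known" pairs are truly related.
emb[1:10, ] <- emb[11:20, ] + matrix(rnorm(10 * p, sd = 0.1), 10, p)

known_pairs <- data.frame(code1 = codes[1:10], code2 = codes[11:20],
                          stringsAsFactors = FALSE)
random_pairs <- data.frame(code1 = sample(codes, 200, replace = TRUE),
                           code2 = sample(codes, 200, replace = TRUE),
                           stringsAsFactors = FALSE)
random_pairs <- random_pairs[random_pairs$code1 != random_pairs$code2, ]

# Cosine similarity for each row of a pairs table.
cosine_pairs <- function(E, pairs) {
  a <- E[pairs$code1, , drop = FALSE]
  b <- E[pairs$code2, , drop = FALSE]
  rowSums(a * b) / sqrt(rowSums(a^2) * rowSums(b^2))
}

# AUC via the Mann-Whitney rank-sum identity: the probability that a known
# pair scores higher than a random pair.
auc_rank <- function(pos, neg) {
  r <- rank(c(pos, neg))
  n_pos <- length(pos)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * length(neg))
}

auc <- auc_rank(cosine_pairs(emb, known_pairs), cosine_pairs(emb, random_pairs))
auc
```

Under this scheme, one would repeat the computation for each candidate pair of penalty values and keep the setting with the highest AUC.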

While real data from MGB and BCH cannot be shared, we offer a [Shiny App](https://shiny.parse-health.org/multi-view-net/) for exploring MUGS embeddings in downstream tasks such as pediatric feature engineering and knowledge graph construction.

---

## Installation

Install the released version from CRAN:

```{r, eval=FALSE}
install.packages("MUGS")
```

## Getting Started

Load the simulated data (if provided) and review the example code in `vignettes/MUGS.Rmd`.

For real EHR data:

1. Convert your data into the same format as the simulated data.
2. Call the main function `MUGS()` with your formatted data as input.

Here is a simplified illustration of the workflow (not actual production code). Helper names that appear only in this block, such as `align()`, `create_groups_from_ontology()`, and the `init_*()` functions, as well as all argument lists, are illustrative; see `vignettes/MUGS.Rmd` for working example code.

```{r, eval=FALSE}
library(MUGS)

max_iter <- 10  # illustrative number of alternating update rounds

## Step 1: Get initial embeddings (one set per site)
embeddings_site1 <- get_embed(data_site1, method = "someMethod")
embeddings_site2 <- get_embed(data_site2, method = "someMethod")

## Step 2: Align them (orthogonal Procrustes)
aligned <- align(embeddings_site1, embeddings_site2)
site1_aligned <- aligned$aligned1
site2_aligned <- aligned$aligned2

## Step 3: Use an existing ontology to group codes
group_info <- create_groups_from_ontology("hierarchy_file")

## Initialize group, code, and code-site effects from the aligned embeddings
aligned_list <- list(site1_aligned, site2_aligned)
groupEff <- init_group_effect(group_info, aligned_list)
codeEff <- init_code_effect(aligned_list)
siteEff <- init_code_site_effect(aligned_list)

## Iterative updates: refresh each effect in turn, holding the others fixed
## (`...` stands for further arguments omitted here)
for (iter in seq_len(max_iter)) {
  groupEff <- GroupEff_par(groupEff, codeEff, siteEff, ...)
  siteEff <- CodeSiteEff_l2_par(groupEff, codeEff, siteEff, ...)
  codeEff <- CodeEff_Matrix(groupEff, codeEff, siteEff, ...)
}

## Step 4: Hyperparameter tuning using known code-code pairs
known_pairs <- data.frame(code1 = c("A"), code2 = c("B"))
performance <- evaluation.sim(codeEff, known_pairs)
```


## Citation

If you use **MUGS** in your research, please cite:

Li, M., Li, X., Pan, K., Geva, A., Yang, D., Sweet, S. M., Bonzel, C.-L., Panickan, V. A., Xiong, X., Mandl, K., & Cai, T. (2024).  
**Multisource representation learning for pediatric knowledge extraction from electronic health records.**  
*npj Digital Medicine*. [https://doi.org/10.1038/s41746-024-01320-4](https://doi.org/10.1038/s41746-024-01320-4)

---

**Thank you for using MUGS!** For issues or feature requests, please open an issue at [https://github.com/celehs/MUGS](https://github.com/celehs/MUGS).

Owner

Name: CELEHS
Login: celehs
Kind: user
Location: Boston, USA

Website: https://celehs.hms.harvard.edu
Repositories: 15
Profile: https://github.com/celehs

Translational Data Science Center for a Learning Health System

GitHub Events

Total

Member event: 2
Push event: 9
Public event: 1
Pull request event: 4
Fork event: 1
Create event: 1

Last Year

Member event: 2
Push event: 9
Public event: 1
Pull request event: 4
Fork event: 1
Create event: 1

Packages

Total packages: 1
Total downloads:
- cran 412 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

cran.r-project.org: MUGS

Multisource Graph Synthesis with EHR Data

Homepage: https://github.com/celehs/MUGS
Documentation: http://cran.r-project.org/web/packages/MUGS/MUGS.pdf
License: GPL-3
Latest release: 0.1.0
published about 1 year ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 412 Last month

Rankings

Dependent packages count: 26.5%

Dependent repos count: 32.6%

Average: 48.6%

Downloads: 86.7%

Maintainers (1)

mengyanli@bentley.edu

Last synced: 10 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
MASS * imports
Matrix * imports
doSNOW * imports
dplyr * imports
fastDummies * imports
foreach * imports
glmnet * imports
grpreg * imports
methods * imports
mvtnorm * imports
pROC * imports
parallel * imports
rsvd * imports
zen4R >= 0.5.0 imports
knitr * suggests
rmarkdown * suggests
testthat >= 3.0.0 suggests
