jamie

Joint variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)

https://github.com/oafish1/jamie

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.0%) to scientific vocabulary

Keywords

autoencoder imputation integration multimodal variational variational-autoencoder
Last synced: 6 months ago

Repository

Joint variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)

Basic Info
Statistics
  • Stars: 12
  • Watchers: 1
  • Forks: 9
  • Open Issues: 1
  • Releases: 1
Topics
autoencoder imputation integration multimodal variational variational-autoencoder
Created over 4 years ago · Last pushed 11 months ago
Metadata Files
Readme Changelog License Citation

README.md

Joint Variational Autoencoders for Multimodal Imputation and Embedding (JAMIE)

Single-cell multi-modal datasets have emerged to measure various characteristics of individual cells, enabling a deep understanding of cellular and molecular mechanisms. However, generating multi-modal data for many cells remains costly and challenging. Thus, missing modalities occur frequently, becoming a major obstacle to understanding mechanisms. Recently, machine learning approaches have been developed to impute cell data, but they typically use fully matched multi-modal data and learn common latent embeddings that potentially miss modality specificity. To address these issues, we developed a novel machine learning model and open-source tool, Joint variational Autoencoders for Multi-modal Imputation and Embedding (JAMIE). JAMIE takes single-cell multi-modal data that can have partially matched samples across modalities. Variational autoencoders learn the latent embeddings of each modality. Then, embeddings from matched samples across modalities are aggregated to identify joint cross-modal latent embeddings before reconstruction. To perform cross-modal imputation from one modality to another, the latent embeddings can be used with the opposite decoder.
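The encode/aggregate/cross-decode idea above can be sketched in a few lines. This is a minimal illustration only, using untrained random linear maps in place of JAMIE's variational autoencoders; none of these names come from the JAMIE API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoders"/"decoders": random linear maps standing in for the
# per-modality variational autoencoders (illustration only, no training)
d1, d2, latent = 20, 30, 4
enc1 = rng.normal(size=(d1, latent))
enc2 = rng.normal(size=(d2, latent))
dec1 = rng.normal(size=(latent, d1))
dec2 = rng.normal(size=(latent, d2))

x1 = rng.normal(size=(5, d1))  # 5 cells measured in modality 1
x2 = rng.normal(size=(5, d2))  # the same 5 cells in modality 2

# Per-modality latent embeddings
z1 = x1 @ enc1
z2 = x2 @ enc2

# Matched samples: aggregate per-modality embeddings into a joint embedding
z_joint = 0.5 * (z1 + z2)

# Cross-modal imputation: encode modality 2, decode with modality 1's decoder
x1_imputed = (x2 @ enc2) @ dec1
print(x1_imputed.shape)  # (5, 20)
```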

This library houses JAMIE along with several utility functions that aid in the evaluation of multi-modal integration and imputation techniques. Paper figures are generated in examples/notebooks and source data can be found in examples/data. JAMIE is built on the framework of UnionCom.

Installation Instructions (Ubuntu 20.04, WSL 2.0, Windows 10/11 Pro)

First, clone and navigate to the repository.

```bash
git clone https://github.com/Oafish1/JAMIE
cd JAMIE

# JAMIE may also be installed directly from GitHub without cloning, but
# does not have version-controlled dependencies, and is therefore not
# recommended
pip install jamie@git+https://git@github.com/Oafish1/JAMIE
```

This process can take several minutes, depending on network speed.

Create and activate a virtual environment using Python 3.9 with virtualenv or conda.

```bash
# virtualenv (python 3.9)
virtualenv env
source env/bin/activate

# conda
conda create -n JAMIE python=3.9
conda activate JAMIE
```

Install dependencies and the local library with pip.

```bash
# NOTE: UnionCom and UMAP will not import correctly if installed on Windows as administrator

# For notebooks or development
pip install -r requirements-dev.txt

# For installation without extras (no SHAP, WR2MD, notebooks)
pip install -r requirements.txt

pip install -e .
```

This process usually takes around 5 minutes.

Example: Simulated Single-Cell Multi-Modal Data

This example covers running JAMIE on branching manifold simulation data from MMD-MA (Liu J. et al.). The example takes around 2 minutes to run. A notebook with this code may be found at examples/notebooks/sample.ipynb.
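If the MMD-MA simulation files are unavailable, a rough stand-in with the same shape of problem (a shared low-dimensional branching structure observed through two feature spaces) can be generated synthetically. This is a hypothetical toy generator, not the published MMD-MA simulation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for a branching manifold: three noisy branches radiating from
# the origin in 2D, then lifted into two "modalities" by random linear maps
n_per_branch = 100
angles = [0.0, 2.1, 4.2]
t = rng.uniform(0, 1, size=(3, n_per_branch))
pts = np.concatenate([
    np.stack([ti * np.cos(a), ti * np.sin(a)], axis=1)
    for a, ti in zip(angles, t)
])
pts += rng.normal(scale=0.02, size=pts.shape)
labels = np.repeat([1, 2, 3], n_per_branch)

# Lift the shared 2D manifold into feature spaces of different dimension
data1 = pts @ rng.normal(size=(2, 20))
data2 = pts @ rng.normal(size=(2, 30))
print(data1.shape, data2.shape)  # (300, 20) (300, 30)
```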

Optionally, create a new Jupyter notebook in JAMIE/examples/notebooks.

Load two data matrices with an optional prior correspondence matrix.

```python
import numpy as np

# The working directory is assumed to be JAMIE/examples/notebooks
# Paths may need to be changed accordingly
data1 = np.loadtxt("../data/UnionCom/MMD/s1_mapped1.txt")
data2 = np.loadtxt("../data/UnionCom/MMD/s1_mapped2.txt")

# JAMIE will assume matrices with the same number of rows are
# completely matched if not provided a correspondence matrix
corr = np.eye(data1.shape[0], data2.shape[0])
```
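Since JAMIE accepts partially matched samples, the correspondence matrix need not be the full identity. One plausible way to encode a partially matched setting, assuming `corr[i, j] = 1` marks a known match between row `i` of modality 1 and row `j` of modality 2 (consistent with the identity-matrix default), is:

```python
import numpy as np

# Hypothetical partially matched setting: 300 cells in modality 1,
# 250 in modality 2, with only the first 100 known to correspond 1:1
n1, n2, n_matched = 300, 250, 100
corr = np.zeros((n1, n2))
corr[np.arange(n_matched), np.arange(n_matched)] = 1
print(int(corr.sum()))  # 100 known correspondences
```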

We can preview the data with plot_regular.

```python
import matplotlib.pyplot as plt
from jamie.evaluation import plot_regular

# Load cell-type labels
type1 = np.loadtxt("../data/UnionCom/MMD/s1_type1.txt").astype(int)
type2 = np.loadtxt("../data/UnionCom/MMD/s1_type2.txt").astype(int)
type1 = np.array([f'Cell Type {i}' for i in type1])
type2 = np.array([f'Cell Type {i}' for i in type2])

# Visualize the raw data
fig = plt.figure(figsize=(10, 5))
plot_regular([data1, data2], [type1, type2], ['Modality 1', 'Modality 2'], legend=True)
plt.tight_layout()
plt.savefig('../../img/simulation_raw.png', dpi=300, bbox_inches='tight')
```

Raw simulation single-cell multimodal data

Create the JAMIE instance and integrate the datasets.

```python
from jamie import JAMIE

jm = JAMIE(min_epochs=500)
integrated_data = jm.fit_transform(dataset=[data1, data2], P=corr)
```

Several arguments may be passed to JAMIE, including:

- `output_dim = 32`: The number of latent features
- `epoch_dnn = 10,000`: Maximum number of epochs
- `batch_size = 512`: Batch size
- `pca_dim = [512, 512]`: If None, does not perform PCA. Otherwise, controls the number of principal components to use while processing each input dataset
- `loss_weights = [1, 1, 1, 1]`: Weights for the KL, reconstruction, cosine, and F losses, respectively
- `use_early_stop = True`: If True, uses the early stopping algorithm
- `min_epochs = 2,500`: Number of epochs before early stopping can take effect. Also controls the length of KL annealing
- `dropout = 0.6`: Amount of dropout in the JAMIE model. Generally should be 0 for pure integration models and 0.6 for everything else
- `debug = False`: Print individual loss values if True
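Since min_epochs also controls the length of KL annealing, the KL term's weight presumably ramps up over those early epochs. A minimal sketch of such a schedule, assuming a simple linear ramp (JAMIE's actual schedule may differ):

```python
def kl_weight(epoch, min_epochs=2500):
    """Hypothetical linear KL-annealing schedule: ramp the KL weight
    from 0 to 1 over the first `min_epochs` epochs, then hold at 1."""
    return min(1.0, epoch / min_epochs)

print(kl_weight(0), kl_weight(1250), kl_weight(5000))  # 0.0 0.5 1.0
```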

The model can be saved and loaded in the h5 file format.

```python
# Save model
jm.save_model('simulation_model.h5')

# Load model
jm.load_model('simulation_model.h5')
```

The trained JAMIE model may be reused on other datasets of the same modalities.

```python
data3 = ...
data4 = ...

new_integrated_data = jm.transform(dataset=[data3, data4])
```

Additionally, the trained model can be used for imputation.

```python
data1_imputed = jm.modal_predict(data2, 1)
data2_imputed = jm.modal_predict(data1, 0)

data3_imputed = jm.modal_predict(data4, 1)
data4_imputed = jm.modal_predict(data3, 0)
```
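A common sanity check on imputed data (not part of the JAMIE API; the helper name here is made up) is the average per-feature Pearson correlation between measured and imputed matrices:

```python
import numpy as np

def mean_feature_correlation(measured, imputed):
    """Average Pearson correlation between matching columns of the
    measured and imputed matrices (a simple imputation-quality check)."""
    corrs = [
        np.corrcoef(measured[:, j], imputed[:, j])[0, 1]
        for j in range(measured.shape[1])
    ]
    return float(np.mean(corrs))

# Demo on synthetic data: light noise should give correlation near 1
rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 5))
noisy = truth + rng.normal(scale=0.1, size=truth.shape)
score = mean_feature_correlation(truth, noisy)
print(round(score, 2))  # close to 1 for a faithful imputation
```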

For visualization, JAMIE includes plot_integrated, which uses UMAP to preview integrated or imputed data.

```python
from jamie.evaluation import plot_integrated

# Visualize integrated latent spaces
fig = plt.figure(figsize=(10, 5))
plot_integrated(integrated_data, [type1, type2], ['Integrated Modality 1', 'Integrated Modality 2'])
plt.tight_layout()
plt.savefig('../../img/simulation_integrated.png', dpi=300, bbox_inches='tight')
```

Integrated simulation single-cell multi-modal data

```python
# Visualize imputed data
fig = plt.figure(figsize=(10, 5))
plot_integrated([data1, data1_imputed], [type1, type1], ['Measured Modality 1', 'Imputed Modality 1'])
plt.tight_layout()
plt.savefig('../../img/simulation_imputed.png', dpi=300, bbox_inches='tight')
```

Imputed simulation single-cell multi-modal data

Detailed comparisons and additional visualization functions can be found in examples/notebooks.

Citations

Cohen Kalafut, N., Huang, X. & Wang, D. Joint variational autoencoders for multimodal imputation and embedding. Nat Mach Intell 5, 631–642 (2023). https://doi.org/10.1038/s42256-023-00663-z

Cao K, Bai X, Hong Y, Wan L. Unsupervised topological alignment for single-cell multi-omics integration. Bioinformatics, July 2020. doi: 10.1093/bioinformatics/btaa443.

Liu J, Huang Y, Singh R, Vert JP, Noble WS. Jointly Embedding Multiple Single-Cell Omics Measurements. Algorithms Bioinform. 2019 Sep 3;143:10. doi: 10.4230/LIPIcs.WABI.2019.10. PMID: 34632462; PMCID: PMC8496402.

Owner

  • Name: Noah Cohen Kalafut
  • Login: Oafish1
  • Kind: user
  • Company: @the-mom-project

GitHub Events

Total
  • Watch event: 2
  • Push event: 1
Last Year
  • Watch event: 2
  • Push event: 1