https://github.com/kundajelab/immuneclip

CLIP-based model for aligning epitopes to immune receptors

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.3%) to scientific vocabulary
Last synced: 4 months ago

Repository

CLIP-based model for aligning epitopes to immune receptors

Basic Info
  • Host: GitHub
  • Owner: kundajelab
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 525 KB
Statistics
  • Stars: 11
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

ImmuneCLIP

Code for the paper "Sequence-based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-tuning of Protein Language Models."

🚧 This repository is under active construction. 🚧

Model Components:

  • Epitope Encoder

    • PEFT-adapted Protein Language Models (e.g., ESM-2, ESM-3)
    • Default: Using LoRA (rank = 8) on the last 8 transformer layers
    • Projection layer: FC linear layer (dim $d_e \rightarrow d_p$)
    • $d_e$ is the original PLM dimension, $d_p$ is the projection dimension
  • Receptor Encoder

    • PEFT-adapted Protein Language Models (e.g., ESM-2, ESM-3) or BCR/TCR Language Models (e.g., AbLang, TCR-BERT, etc.)
    • Default: Using LoRA (rank = 8) on the last 4 transformer layers
    • Projection layer: FC linear layer (dim $d_r \rightarrow d_p$)
    • $d_r$ is the original receptor LM dimension, $d_p$ is the projection dimension
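The two encoders feed a shared CLIP-style contrastive objective over projected, L2-normalized embeddings. A minimal NumPy sketch of that setup, assuming a plain linear projection and a symmetric InfoNCE loss (illustrative only; the names, dimensions, and temperature here are assumptions, not the repo's actual modules):

```python
import numpy as np

def project(x, W):
    """Linear projection (d_orig -> d_p), then L2-normalize each row."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def clip_loss(epi, rec, temperature=0.07):
    """Symmetric InfoNCE loss: matching epitope/receptor pairs sit on the diagonal."""
    logits = epi @ rec.T / temperature          # (B, B) cosine-similarity matrix
    labels = np.arange(len(epi))                # pair i matches pair i

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the epitope->receptor and receptor->epitope directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
B, d_e, d_r, d_p = 8, 1280, 768, 512            # batch size and example dims
epi = project(rng.standard_normal((B, d_e)), 0.01 * rng.standard_normal((d_e, d_p)))
rec = project(rng.standard_normal((B, d_r)), 0.01 * rng.standard_normal((d_r, d_p)))
loss = clip_loss(epi, rec)
```

For untrained random embeddings the loss sits near log B; fine-tuning the LoRA adapters and projection layers drives it down by pulling paired epitope/receptor embeddings together.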

Dataset:

  • MixTCRPred Dataset (paper)
    • Contains curated mixture of TCR-pMHC sequence data from IEDB, VDJdb, 10x Genomics, and McPAS-TCR

Pre-trained Weights:

  • The pre-trained weights for ImmuneCLIP are deposited at Zenodo

CLI:

Environment Variables

To run this application, set the following environment variable: WANDB_OUTPUT_DIR=<path to output dir>

Additionally, if training on top of a custom in-house TCR model, the following path needs to be set: INHOUSE_MODEL_CKPT_PATH=<path to custom model file>
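For example, in a POSIX shell (the paths below are hypothetical placeholders, not values shipped with the repo):

```shell
# Set before launching training; adjust both paths to your setup.
export WANDB_OUTPUT_DIR="$HOME/immuneclip_runs"
export INHOUSE_MODEL_CKPT_PATH="$HOME/models/inhouse_tcr.ckpt"
```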

Training

Go to the root directory of the repo, and then run:

```
python -m src.training --run-id [RUNID] --dataset-path [PATHTODATASET] --stage fit --max-epochs 100 \
  --receptor-model-name [esm2|tcrlang|tcrbert] --projection-dim 512 --gpus-used [GPUIDX] --lr 1e-3 \
  --batch-size 8 --output-dir [CHECKPOINTSOUTPUTDIR] [--mask-seqs]
```

Evaluation

Currently, running the model in the test stage embeds the test-set epitope/receptor pairs with the fine-tuned model and saves them.

```
python -m src.training --run-id [RUNID] --dataset-path [PATHTODATASET] --stage test --from-checkpoint [CHECKPOINTPATH] \
  --projection-dim 512 --receptor-model-name [esm2|tcrlang|tcrbert] --gpus-used [GPUIDX] --batch-size 8 \
  --save-embed-path [PATHFORSAVINGEMBEDS]
```

Owner

  • Name: Kundaje Lab
  • Login: kundajelab
  • Kind: organization
  • Location: Stanford University

Computational biology and machine learning code repositories from the Kundaje Lab at the Stanford Genetics and Computer Science Depts.

GitHub Events

Total
  • Create event: 1
  • Issues event: 3
  • Watch event: 15
  • Delete event: 1
  • Issue comment event: 7
  • Member event: 1
  • Push event: 8
  • Public event: 1
  • Fork event: 2
Last Year
  • Create event: 1
  • Issues event: 3
  • Watch event: 15
  • Delete event: 1
  • Issue comment event: 7
  • Member event: 1
  • Push event: 8
  • Public event: 1
  • Fork event: 2