https://github.com/kundajelab/immuneclip
CLIP-based model for aligning epitopes to immune receptors
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ✓ DOI references (found 1 DOI reference in README)
- ✓ Academic publication links (links to: springer.com, zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 8.3%, to scientific vocabulary)
Repository
CLIP-based model for aligning epitopes to immune receptors
Basic Info
Statistics
- Stars: 11
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ImmuneCLIP
Code for the paper "Sequence-based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-tuning of Protein Language Models."
🚧 This repository is under active construction. 🚧
Model Components:
Epitope Encoder
- PEFT-adapted Protein Language Models (e.g. ESM-2, ESM-3)
- Default: Using LoRA (rank = 8) on last 8 transformer layers
- Projection layer: FC linear layer (dim $d_e \rightarrow d_p$)
- $d_e$ is the original PLM dimension, $d_p$ is the projection dimension
Receptor Encoder
- PEFT-adapted Protein Language Models (e.g. ESM-2, ESM-3) or BCR/TCR Language Models (e.g. AbLang, TCR-BERT, etc.)
- Default: Using LoRA (rank = 8) on last 4 transformer layers
- Projection layer: FC linear layer (dim $d_r \rightarrow d_p$)
- $d_r$ is the original receptor LM dimension, $d_p$ is the projection dimension
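The two-tower layout above (a PEFT-adapted encoder per side, each followed by a linear projection into a shared space) can be sketched with a minimal NumPy CLIP-style objective. This is an illustrative sketch only, not the repository's code: the embedding dimensions, temperature, and random "encoder outputs" standing in for the PLM embeddings are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dims: d_e/d_r are stand-ins for PLM output sizes;
# d_p = 512 matches the README's --projection-dim default
d_e, d_r, d_p, batch = 1280, 768, 512, 8

# Stand-ins for encoder outputs for a batch of paired epitopes/receptors
epitope_emb = rng.normal(size=(batch, d_e))
receptor_emb = rng.normal(size=(batch, d_r))

# FC projection layers: d_e -> d_p and d_r -> d_p, as in the README
W_e = rng.normal(size=(d_e, d_p)) / np.sqrt(d_e)
W_r = rng.normal(size=(d_r, d_p)) / np.sqrt(d_r)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

z_e = l2_normalize(epitope_emb @ W_e)  # (batch, d_p)
z_r = l2_normalize(receptor_emb @ W_r)  # (batch, d_p)

# CLIP-style symmetric cross-entropy: matched pairs sit on the diagonal
temperature = 0.07  # assumed value
logits = z_e @ z_r.T / temperature
labels = np.arange(batch)

def cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

loss = 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
print(f"contrastive loss: {loss:.3f}")
```

In the actual model only the LoRA adapters and the projection weights (`W_e`, `W_r` here) would receive gradients; the base PLM layers stay frozen.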
Dataset:
- MixTCRPred Dataset (paper)
- Contains curated mixture of TCR-pMHC sequence data from IEDB, VDJdb, 10x Genomics, and McPAS-TCR
Pre-trained Weights:
- The pre-trained weights for ImmuneCLIP are deposited at Zenodo
CLI:
Environment Variables
To run this application, set the following environment variable:
WANDB_OUTPUT_DIR=<path to output dir>
Additionally, if training on top of a custom in-house TCR model, the following path needs to be set
INHOUSE_MODEL_CKPT_PATH=<path to custom model file>
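The environment-variable handling above can be checked up front before launching a run. The helper below is a hypothetical sketch (not part of the repository) that fails fast if a required variable is missing:

```python
import os

def load_env_config(require_inhouse: bool = False) -> dict:
    """Collect the environment variables the README lists, failing fast if absent."""
    cfg = {"wandb_output_dir": os.environ.get("WANDB_OUTPUT_DIR")}
    if cfg["wandb_output_dir"] is None:
        raise RuntimeError("WANDB_OUTPUT_DIR must be set before training")
    if require_inhouse:
        # Only needed when training on top of a custom in-house TCR model
        cfg["inhouse_ckpt"] = os.environ.get("INHOUSE_MODEL_CKPT_PATH")
        if cfg["inhouse_ckpt"] is None:
            raise RuntimeError("INHOUSE_MODEL_CKPT_PATH must be set for in-house models")
    return cfg
```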
Training
Go to the root directory of the repo, then run:
```
python -m src.training --run-id [RUN_ID] --dataset-path [PATH_TO_DATASET] --stage fit --max-epochs 100 \
    --receptor-model-name [esm2|tcrlang|tcrbert] --projection-dim 512 --gpus-used [GPU_IDX] --lr 1e-3 \
    --batch-size 8 --output-dir [CHECKPOINTS_OUTPUT_DIR] [--mask-seqs]
```
Evaluation
Currently, running the model in the test stage embeds the test-set epitope/receptor pairs with the fine-tuned model and saves them:
```
python -m src.training --run-id [RUN_ID] --dataset-path [PATH_TO_DATASET] --stage test --from-checkpoint [CHECKPOINT_PATH] \
    --projection-dim 512 --receptor-model-name [esm2|tcrlang|tcrbert] --gpus-used [GPU_IDX] --batch-size 8 \
    --save-embed-path [PATH_FOR_SAVING_EMBEDS]
```
Owner
- Name: Kundaje Lab
- Login: kundajelab
- Kind: organization
- Location: Stanford University
- Website: http://anshul.kundaje.net
- Repositories: 117
- Profile: https://github.com/kundajelab
Compbio and machine learning code repositories from the Kundaje Lab at Stanford Genetics and Computer Science Depts.
GitHub Events
Total
- Create event: 1
- Issues event: 3
- Watch event: 15
- Delete event: 1
- Issue comment event: 7
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 2
Last Year
- Create event: 1
- Issues event: 3
- Watch event: 15
- Delete event: 1
- Issue comment event: 7
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 2