https://github.com/kundajelab/immuneclip
CLIP-based model for aligning epitopes to immune receptors
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ✓ DOI references (found 1 DOI reference in README)
- ✓ Academic publication links (links to: springer.com, zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 8.3%, to scientific vocabulary)
Repository
CLIP-based model for aligning epitopes to immune receptors
Basic Info
Statistics
- Stars: 11
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
ImmuneCLIP
Code for the paper "Sequence-based TCR-Peptide Representations Using Cross-Epitope Contrastive Fine-tuning of Protein Language Models."
🚧 This repository is under active construction. 🚧
Model Components:
Epitope Encoder
- PEFT-adapted Protein Language Models (e.g. ESM-2, ESM-3)
- Default: Using LoRA (rank = 8) on last 8 transformer layers
- Projection layer: FC linear layer (dim $d_e \rightarrow d_p$)
- $d_e$ is the original PLM dimension, $d_p$ is the projection dimension
Receptor Encoder
- PEFT-adapted Protein Language Models (e.g. ESM-2, ESM-3) or BCR/TCR Language Models (e.g. AbLang, TCR-BERT, etc.)
- Default: Using LoRA (rank = 8) on last 4 transformer layers
- Projection layer: FC linear layer (dim $d_r \rightarrow d_p$)
- $d_r$ is the original receptor LM dimension, $d_p$ is the projection dimension
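The two-tower layout above (a PEFT-adapted encoder per side, each followed by a linear projection into a shared space) can be sketched with a minimal NumPy CLIP-style objective. This is an illustrative sketch only, not the repository's code: the embedding dimensions, temperature, and random "encoder outputs" standing in for the PLM embeddings are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dims: d_e/d_r are stand-ins for PLM output sizes;
# d_p = 512 matches the README's --projection-dim default
d_e, d_r, d_p, batch = 1280, 768, 512, 8

# Stand-ins for encoder outputs for a batch of paired epitopes/receptors
epitope_emb = rng.normal(size=(batch, d_e))
receptor_emb = rng.normal(size=(batch, d_r))

# FC projection layers: d_e -> d_p and d_r -> d_p, as in the README
W_e = rng.normal(size=(d_e, d_p)) / np.sqrt(d_e)
W_r = rng.normal(size=(d_r, d_p)) / np.sqrt(d_r)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

z_e = l2_normalize(epitope_emb @ W_e)  # (batch, d_p)
z_r = l2_normalize(receptor_emb @ W_r)  # (batch, d_p)

# CLIP-style symmetric cross-entropy: matched pairs sit on the diagonal
temperature = 0.07  # assumed value
logits = z_e @ z_r.T / temperature
labels = np.arange(batch)

def cross_entropy(logits, labels):
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

loss = 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
print(f"contrastive loss: {loss:.3f}")
```

In the actual model only the LoRA adapters and the projection weights (`W_e`, `W_r` here) would receive gradients; the base PLM layers stay frozen.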
Dataset:
- MixTCRPred Dataset (paper)
- Contains curated mixture of TCR-pMHC sequence data from IEDB, VDJdb, 10x Genomics, and McPAS-TCR
Pre-trained Weights:
- The pre-trained weights for ImmuneCLIP are deposited at Zenodo
CLI:
Environment Variables
To run this application, set the following environment variable:
WANDB_OUTPUT_DIR=<path to output dir>
Additionally, if training on top of a custom in-house TCR model, the following path needs to be set
INHOUSE_MODEL_CKPT_PATH=<path to custom model file>
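The environment-variable handling above can be checked up front before launching a run. The helper below is a hypothetical sketch (not part of the repository) that fails fast if a required variable is missing:

```python
import os

def load_env_config(require_inhouse: bool = False) -> dict:
    """Collect the environment variables the README lists, failing fast if absent."""
    cfg = {"wandb_output_dir": os.environ.get("WANDB_OUTPUT_DIR")}
    if cfg["wandb_output_dir"] is None:
        raise RuntimeError("WANDB_OUTPUT_DIR must be set before training")
    if require_inhouse:
        # Only needed when training on top of a custom in-house TCR model
        cfg["inhouse_ckpt"] = os.environ.get("INHOUSE_MODEL_CKPT_PATH")
        if cfg["inhouse_ckpt"] is None:
            raise RuntimeError("INHOUSE_MODEL_CKPT_PATH must be set for in-house models")
    return cfg
```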
Training
Go to the root directory of the repo, then run:
```
python -m src.training --run-id [RUN_ID] --dataset-path [PATH_TO_DATASET] --stage fit --max-epochs 100 \
    --receptor-model-name [esm2|tcrlang|tcrbert] --projection-dim 512 --gpus-used [GPU_IDX] --lr 1e-3 \
    --batch-size 8 --output-dir [CHECKPOINTS_OUTPUT_DIR] [--mask-seqs]
```
Evaluation
Currently, running the model in the test stage embeds the test-set epitope/receptor pairs with the fine-tuned model and saves them:
```
python -m src.training --run-id [RUN_ID] --dataset-path [PATH_TO_DATASET] --stage test --from-checkpoint [CHECKPOINT_PATH] \
    --projection-dim 512 --receptor-model-name [esm2|tcrlang|tcrbert] --gpus-used [GPU_IDX] --batch-size 8 \
    --save-embed-path [PATH_FOR_SAVING_EMBEDS]
```
Owner
- Name: Kundaje Lab
- Login: kundajelab
- Kind: organization
- Location: Stanford University
- Website: http://anshul.kundaje.net
- Repositories: 117
- Profile: https://github.com/kundajelab
Compbio and machine learning code repositories from the Kundaje Lab at Stanford Genetics and Computer Science Depts.
GitHub Events
Total
- Create event: 1
- Issues event: 3
- Watch event: 15
- Delete event: 1
- Issue comment event: 7
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 2
Last Year
- Create event: 1
- Issues event: 3
- Watch event: 15
- Delete event: 1
- Issue comment event: 7
- Member event: 1
- Push event: 8
- Public event: 1
- Fork event: 2