clip-distillation
Knowledge Distillation using Contrastive Language-Image Pretraining (CLIP) without a teacher model.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.3%) to scientific vocabulary
Repository
Knowledge Distillation using Contrastive Language-Image Pretraining (CLIP) without a teacher model.
Basic Info
- Host: GitHub
- Owner: lnairGT
- License: mit
- Language: Python
- Default Branch: main
- Size: 229 KB
Statistics
- Stars: 13
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers
Short paper accepted to 28th IEEE High Performance Extreme Computing Conference (HPEC) 2024 -- Outstanding short paper award
Expanded paper available on arXiv: arXiv:2404.06170
Can pre-computed embeddings obtained from the teacher model be used to train the student model in knowledge distillation?
This project extends CLIP for efficient knowledge distillation by using embeddings as teachers. Typical knowledge distillation frameworks require running forward passes through a teacher model, which is often prohibitive for billion- or trillion-parameter teachers. Using only the teacher's embeddings to guide distillation can yield significant computational savings.
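The idea of matching student outputs to cached teacher embeddings with a CLIP-style contrastive loss can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the repository's PyTorch implementation; the function name and the temperature value are assumptions.

```python
import numpy as np

def clip_embed_kd_loss(student_emb, teacher_emb, temperature=0.07):
    """CLIP-style contrastive loss between a batch of student embeddings
    and precomputed teacher embeddings (no teacher forward pass needed).
    Rows with the same index are treated as positive pairs."""
    # L2-normalize both sets of embeddings
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    # Pairwise cosine-similarity logits, scaled by temperature
    logits = (s @ t.T) / temperature
    # Cross-entropy with targets on the diagonal (row i matches column i)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = len(student_emb)
    return -log_probs[np.arange(n), np.arange(n)].mean()

# Toy usage: a student batch close to the cached teacher embeddings
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))           # precomputed teacher embeddings
student = teacher + 0.01 * rng.normal(size=(8, 16))
loss = clip_embed_kd_loss(student, teacher)
```

Because the positives sit on the diagonal, a well-aligned student batch yields a much lower loss than a random one.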
RUNNING THE SCRIPT
Run the following command with appropriate arguments:
python train.py \
--dataset-name <CIFAR10; CIFAR100; ImageNet> \
--teacher-model <Huggingface-ckpt-name> \
--log-folder <folder-to-save-logs> \
--ckpt-save-name <name-of-ckpt-to-save-trained-model> \
--train-type <embed-KD; teacher-KD; vanilla>
An example command for running the script is available in train.sh. The argument train-type vanilla refers to regular knowledge distillation (without using the CLIP distillation loss).
The argument teacher-model is the name of a HuggingFace checkpoint. Teacher models used in the paper include google/vit-large-patch16-224-in21k, google/vit-large-patch32-224-in21k, google/vit-base-patch16-224-in21k, and google/vit-base-patch32-224-in21k. The student model configuration and other training parameters are in config.py.
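Because the teacher is frozen, its embeddings can be computed once and cached, so student training never runs the teacher again. A minimal sketch of that precompute-and-reuse pattern, with a random projection standing in for a frozen ViT encoder (the repository loads real HuggingFace ViT checkpoints; all names and shapes here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "teacher": a fixed random projection playing the role of a
# frozen ViT encoder (the paper uses HuggingFace ViT checkpoints).
teacher_proj = rng.normal(size=(3 * 32 * 32, 64))

def teacher_embed(images):
    # One-off teacher forward pass: flatten each image into an embedding.
    return images.reshape(len(images), -1) @ teacher_proj

# Phase 1: run the teacher ONCE over the dataset and cache the result.
dataset = rng.normal(size=(128, 3, 32, 32))   # toy CIFAR-shaped data
cached = teacher_embed(dataset)

# Phase 2: each student training step reads the cache; the (potentially
# billion-parameter) teacher is never run again.
def get_teacher_targets(batch_indices):
    return cached[batch_indices]

targets = get_teacher_targets(np.arange(32))
```

The cache costs one pass over the dataset up front and turns every subsequent training step's teacher cost into a lookup.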
Figure: CLIP-Embed-KD computational efficiency
Figure: pseudocode reference
CITATION
If you find this work useful, please consider citing the paper:
@misc{nair2024clipembedkd,
  title={CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers},
  author={Lakshmi Nair},
  year={2024},
  eprint={2404.06170},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
Owner
- Name: Lakshmi Nair
- Login: lnairGT
- Kind: user
- Company: Georgia Institute of Technology
- Website: https://scholar.google.com/citations?user=eTGOo_cAAAAJ&hl=en
- Repositories: 2
- Profile: https://github.com/lnairGT
PhD student in Robotics
Citation (citation.cff)
cff-version: 1.2.0
message: "If you find this work helpful, please consider citing it as below."
authors:
  - family-names: Nair
    given-names: Lakshmi
title: "CLIP-Embed-KD: Computationally Efficient Knowledge Distillation Using Embeddings as Teachers"
version: 2.0.4
year: 2024
journal: ArXiv
doi: https://doi.org/XYZ/arXiv.XYZ
date-released: 2024-04-12
url: "https://github.com/lnairGT/CLIP-Distillation"
GitHub Events
Total
- Watch event: 9
- Fork event: 1
Last Year
- Watch event: 9
- Fork event: 1