https://github.com/google-deepmind/tips
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: google-deepmind
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 5.09 MB
Statistics
- Stars: 54
- Watchers: 7
- Forks: 3
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
TIPS: Text-Image Pretraining with Spatial awareness (ICLR 2025)
This repository contains the implementation and models introduced in TIPS: Text-Image Pretraining with Spatial Awareness, published at ICLR 2025.
Quick Links: Paper | Project Website | Pytorch Notebook | Scenic Notebook
We provide both Pytorch and Jax (Scenic) implementations:
- tips/pytorch/: PyTorch inference for the model. The image tower largely follows the official DINOv2 definition.
- tips/scenic/: Jax-based inference using the scenic library.
Abstract
Checkpoints
We provide links to all available checkpoints, for both Pytorch and Jax model definitions, together with representative evals.
| Model size | #Params vision / text | Pytorch ckp. | Jax ckp. | PASCAL seg.↑ | NYU-depth↓ | ImageNet-KNN↑ | UNED-KNN↑ | Flickr T→I↑ | Flickr I→T↑ |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| g/14-HR | 1.1B / 389.1M | vision \| text | vision \| text | 83.1 | 0.363 | 83.2 | 68.4 | 93.8 | 83.8 |
| g/14-LR | 1.1B / 389.1M | vision \| text | vision \| text | 82.0 | 0.390 | 83.6 | 71.5 | 93.4 | 82.1 |
| SO/14-HR | 412.4M / 448.3M | vision \| text | vision \| text | 83.7 | 0.362 | 83.0 | 68.6 | 94.2 | 83.8 |
| L/14-HR | 303.2M / 183.9M | vision \| text | vision \| text | 83.9 | 0.372 | 82.5 | 67.8 | 93.6 | 83.5 |
| B/14-HR | 85.7M / 109.6M | vision \| text | vision \| text | 82.9 | 0.379 | 80.0 | 62.7 | 91.3 | 79.4 |
| S/14-HR | 21.6M / 33.6M | vision \| text | vision \| text | 80.6 | 0.425 | 75.1 | 57.7 | 86.3 | 74.7 |
Using Pytorch
Installation
Manage dependencies with a custom environment (e.g., Conda).
```bash
conda create -n tips python=3.11

# Activate the environment.
conda activate tips
```
Install Pytorch dependencies.
```bash
# Install pytorch (change to GPU version if needed)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install other dependencies.
pip install tensorflow_text mediapy jax jaxlib scikit-learn

# Optionally, install Jupyter to use the notebook.
pip install jupyter
```
Clone the code from this repo.
```bash
git clone https://github.com/google-deepmind/tips.git

# Add the current directory to PYTHONPATH.
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
Download the checkpoints locally. The script downloads all released checkpoints; edit it if you only need some of them.
```bash
cd tips/pytorch/checkpoints
chmod +x download_checkpoints.sh
./download_checkpoints.sh
cd ../../..
```
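After the download finishes, it can help to confirm which files actually arrived. The following is a minimal sketch, not part of the repository: `list_checkpoints` is a hypothetical helper, and it assumes only the `tips/pytorch/checkpoints` directory named above. It makes no assumption about checkpoint file names or extensions.

```python
from pathlib import Path


def list_checkpoints(directory):
    """Return (file name, size in MB) pairs for the files in `directory`, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {root}")
    found = []
    for path in sorted(root.iterdir()):
        # Skip subdirectories and shell scripts such as the download script.
        if path.is_file() and path.suffix != ".sh":
            found.append((path.name, round(path.stat().st_size / 1e6, 2)))
    return found


if __name__ == "__main__":
    # Path taken from the download step; run from the directory that contains tips/.
    ckpt_dir = Path("tips/pytorch/checkpoints")
    if ckpt_dir.is_dir():
        for name, size_mb in list_checkpoints(ckpt_dir):
            print(f"{name}: {size_mb} MB")
    else:
        print(f"{ckpt_dir} not found")
```

An empty listing suggests the download did not complete.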
Usage (Pytorch)
To run inference on one image and get the L2-normalized image embeddings from the 1st and 2nd CLS tokens, use the following:
```bash
cd tips/pytorch && \
python run_image_encoder_inference.py \
  --model_path=${PATH_TO_CHECKPOINT} \
  --image_file=${PATH_TO_IMAGE} \
  --model_variant=${MODEL_VARIANT}
```
Use is_low_res to specify whether the checkpoint is a low-resolution or a high-resolution one.
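To embed several images, the same command can be assembled programmatically. The sketch below is an illustration only: `build_image_inference_command` is a hypothetical helper, the paths are placeholders, and the only flags it uses are the three shown in the command above. Valid values for the model variant are not listed here, so it is passed through unchanged.

```python
import shlex
import sys


def build_image_inference_command(model_path, image_file, model_variant):
    """Assemble the argument list for run_image_encoder_inference.py."""
    return [
        sys.executable,  # the Python interpreter of the active environment
        "run_image_encoder_inference.py",
        f"--model_path={model_path}",
        f"--image_file={image_file}",
        f"--model_variant={model_variant}",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with a real checkpoint, images, and variant.
    for image in ["path/to/first.jpg", "path/to/second.jpg"]:
        cmd = build_image_inference_command("path/to/checkpoint", image, "MODEL_VARIANT")
        print(shlex.join(cmd))
        # To actually run it from the repository checkout:
        # import subprocess
        # subprocess.run(cmd, cwd="tips/pytorch", check=True)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when paths contain spaces.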
To run text model inference and get the L2-normalized text embedding, use the following command:
```bash
cd tips/pytorch && \
python run_text_encoder_inference.py \
  --model_path=${PATH_TO_CHECKPOINT} \
  --tokenizer_path=${PATH_TO_TOKENIZER} \
  --model_variant=${MODEL_VARIANT} \
  --text_input="${TEXT_INPUT}"
```
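Because both encoders return L2-normalized embeddings, the dot product of an image embedding and a text embedding equals their cosine similarity. The sketch below illustrates that comparison in plain Python with made-up two-dimensional vectors. It assumes the image and text embeddings have the same dimensionality and live in a shared space; the helper names are hypothetical, and the format in which the scripts return embeddings is not specified here.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("Cannot normalize a zero vector.")
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """For unit-length inputs, the dot product is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def rank_texts(image_embedding, text_embeddings):
    """Return text indices ordered from most to least similar to the image."""
    scores = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for real embeddings.
    image = l2_normalize([3.0, 4.0])
    texts = [l2_normalize(v) for v in ([1.0, 0.0], [0.0, 1.0], [3.0, 4.0])]
    print(rank_texts(image, texts))
```

With real embeddings, the same ranking is usually computed as a single matrix product over all image and text pairs.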
We also provide a simple notebook demo:
```bash
jupyter-notebook
```
Then navigate to tips/pytorch/TIPS_Demo.ipynb.
Using Jax (Scenic)
Installation
Similar to using Pytorch, manage dependencies with a custom environment.
```bash
conda create -n tips python=3.11

# Activate the environment.
conda activate tips
```
```bash
# Install scenic.
git clone https://github.com/google-research/scenic.git scenic_src
cd scenic_src
pip install .
cd ..
rm -rf scenic_src

# Install other dependencies.
pip install pillow scikit-learn opencv-python tensorflow_text

# Optionally, install Jupyter to use the notebook.
pip install jupyter mediapy

# In case of using CUDA, install the CUDA-supported JAX libraries.
# For example, for CUDA 12 run:
pip install --upgrade "jax[cuda12_pip]" -f \
  https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```
Clone the code from this repo.
```bash
git clone https://github.com/google-deepmind/tips.git

# Add the current directory to PYTHONPATH.
export PYTHONPATH=$PYTHONPATH:$(pwd)
```
Download the checkpoints (different files from Pytorch).
```bash
cd tips/scenic/checkpoints
chmod +x download_checkpoints.sh
./download_checkpoints.sh
cd ../../..
```
Usage (Jax)
To run inference on an image, use the following script:
```bash
cd tips/scenic
python run_tips_inference.py
```
Alternatively, try the demo in the notebook:
```bash
jupyter-notebook
```
Then navigate to tips/scenic/notebooks/TIPS_Demo.ipynb.
Citing this work
The paper can be found on arXiv. Please consider citing this work using:
```
@InProceedings{tips_paper,
  Title={{TIPS: Text-Image Pretraining with Spatial Awareness}},
  Author={Maninis, Kevis-Kokitsi and Chen, Kaifeng and Ghosh, Soham and Karpur, Arjun and Chen, Koert and Xia, Ye and Cao, Bingyi and Salz, Daniel and Han, Guangxing and Dlabal, Jan and Gnanapragasam, Dan and Seyedhosseini, Mojtaba and Zhou, Howard and Araujo, Andr\'e},
  Booktitle={ICLR},
  year={2025},
}
```
License and disclaimer
Copyright 2025 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.
Owner
- Name: Google DeepMind
- Login: google-deepmind
- Kind: organization
- Website: https://www.deepmind.com/
- Repositories: 245
- Profile: https://github.com/google-deepmind
GitHub Events
Total
- Issues event: 5
- Watch event: 76
- Issue comment event: 5
- Member event: 2
- Push event: 1
- Public event: 1
- Fork event: 5
- Create event: 1
Last Year
- Issues event: 5
- Watch event: 76
- Issue comment event: 5
- Member event: 2
- Push event: 1
- Public event: 1
- Fork event: 5
- Create event: 1
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Kevis-Kokitsi Maninis | k****s@g****m | 3 |
| Kaifeng Chen | f****n@g****m | 2 |
| koertchen | k****n@g****m | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 1.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 1.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Avinash2468 (1)
- NielsRogge (1)
- AlirezaSalehy (1)
- jdmogollonp (1)