two-effects-one-trigger
Official code for the paper "Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models" (ICLR 2025 Oral)
Science Score: 13.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: lmb-freiburg
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://openreview.net/forum?id=uAFHCZRmXk
- Size: 1.55 MB
Statistics
- Stars: 13
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models (ICLR 2025 Oral)
This repository provides the code for our paper Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models.
If this work is useful to you, please consider citing our paper:
@inproceedings{schrodi2025two,
  title={Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models},
  author={Simon Schrodi and David T. Hoffmann and Max Argus and Volker Fischer and Thomas Brox},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=uAFHCZRmXk}
}
Prerequisites
To set up your Python environment, follow the steps below:
- Clone this repository.
- Create a Python environment.
- Create the folder datasets and add symbolic links to the respective datasets, or update the configs in settings.py.
- [Optional] You can add folders results and figures that are symbolically linked to some workspace.
- Python package installation:
  a. For analysis and experiments on MAD:
     pip install -r requirements.txt
  b. For experiments on real data: follow the instructions here
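The dataset-linking step can be scripted. The sketch below is one possible way to do it, assuming the datasets already exist somewhere on disk; the helper name link_datasets and the link name "imagenet" are hypothetical and not part of this repository, and the names that settings.py actually expects should be checked there.

```python
from pathlib import Path


def link_datasets(repo_root, sources):
    """Create <repo_root>/datasets and symlink each dataset into it.

    `sources` maps a link name (hypothetical, e.g. "imagenet") to the
    directory where that dataset actually lives on disk.
    """
    datasets_dir = Path(repo_root) / "datasets"
    datasets_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, target in sources.items():
        link = datasets_dir / name
        if link.is_symlink() or link.exists():
            continue  # leave existing entries untouched
        link.symlink_to(Path(target).resolve(), target_is_directory=True)
        created.append(link)
    return created
```

For example, link_datasets(".", {"imagenet": "/path/to/imagenet"}) run from the repository root would create datasets/imagenet as a symbolic link. Existing entries are skipped rather than overwritten, so the call is safe to repeat.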
Analysis
Modality gap
This part describes how you can reproduce the results from the analysis of the modality gap.
Run
    python analysis/gap_precompute.py --model $model --save
e.g., with model=RN50__openai. This precomputes the embedding features, the modality gap, and the performance metrics that the later scripts use.
Below, we provide the scripts to reproduce the results (note that you may need to set):
- To re-create Figure 3 and Table 1, run:
    python analysis/gap_vs_performance.py
- To re-create Figures 4a and 11a, run:
    python analysis/gap_mean_differences.py
- To re-create Figures 4b, 11b, and 12, run:
    python analysis/gap_embedding_dim_pairs.py
- To re-create Figures 4c and 11c, run:
    python analysis/gap_ablate_dims.py
- To re-create Table 2, follow the instructions here to prepare the ImageNet100 splits. Then, run:
    python analysis/gap_neighborhood_test.py
- To re-create Table 3, run:
    python analysis/gap_conditioning_on_data.py
- To re-create Table 4, run:
    python analysis/gap_data_filtering.py
- To re-create Table 5, run:
    python analysis/gap_ideal_words.py
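For readers new to the term, one common way to summarize the modality gap is the Euclidean distance between the centroids of the L2-normalized image and text embeddings. The sketch below illustrates that definition on made-up 2-D vectors. It is not taken from the analysis scripts, which may use other or additional measures, and all function names are hypothetical. It is kept dependency-free for readability; real embeddings would normally be handled with NumPy or PyTorch.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_gap(image_embs, text_embs):
    """Distance between the centroids of the normalized embeddings."""
    img_center = centroid([l2_normalize(v) for v in image_embs])
    txt_center = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(img_center, txt_center)


# Toy embeddings (made up for illustration): images lie along the x-axis,
# texts along the y-axis, so the two centroids are far apart.
images = [[1.0, 0.0], [2.0, 0.0]]
texts = [[0.0, 1.0], [0.0, 3.0]]
gap = modality_gap(images, texts)
```

A gap of zero means the two centroids coincide; it does not by itself imply that individual image-text pairs are aligned.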
Object bias
Precompute features and object bias/performance metrics via
    python analysis/object_bias_precompute.py --model RN50__openai --save
To replicate our analysis, run the scripts below:
- To re-create Figure 5a, run:
    python analysis/object_bias_vs_performance.py
- To re-create Figure 5b, run:
    python analysis/object_vs_attribute_performance.py
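Both precompute steps take a --model argument and a --save flag, so running them for several models can be wrapped in a small driver. The sketch below is a convenience wrapper under that assumption and is not part of the repository; the helper names are hypothetical, and the only model name taken from the README is RN50__openai. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

PRECOMPUTE_SCRIPTS = (
    "analysis/gap_precompute.py",
    "analysis/object_bias_precompute.py",
)


def build_commands(models, scripts=PRECOMPUTE_SCRIPTS):
    """Return one `python <script> --model <model> --save` command per pair."""
    return [
        [sys.executable, script, "--model", model, "--save"]
        for script in scripts
        for model in models
    ]


def run_all(models, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(models):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


# RN50__openai is the example from the README; further model names would
# need to follow whatever naming the scripts accept.
run_all(["RN50__openai"])
```

Calling run_all([...], dry_run=False) from the repository root executes the commands in order and raises if any of them fails.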
CLIP trainings on synthetic data (MAD)
We provide the dataset implementation in mad_dataset. The augmentations are partly based on Morpho-MNIST.
Unfortunately, we are not allowed to share the training and evaluation code. To reproduce our experiments, you can adopt standard CLIP training pipelines and adapt the provided evaluation protocols.
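As a reminder of what a standard CLIP training pipeline optimizes, the sketch below shows the usual symmetric contrastive (InfoNCE) objective on a batch of paired, L2-normalized embeddings. It is not the withheld training code and says nothing about the settings used in the paper. The function names are hypothetical, and the default logit_scale of 1/0.07 is the commonly used CLIP initialization, assumed here for illustration. A real pipeline would compute this with PyTorch on batched tensors.

```python
import math


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the target of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_loss(image_embs, text_embs, logit_scale=1 / 0.07):
    """Symmetric InfoNCE loss; image_embs[i] is paired with text_embs[i]."""
    logits = [
        [logit_scale * sum(a * b for a, b in zip(img, txt)) for txt in text_embs]
        for img in image_embs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))
```

The loss is low when each image is most similar to its own caption and high when matched pairs are less similar than mismatched ones. With uninformative embeddings, where all similarities are equal, it equals the logarithm of the batch size.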
CLIP trainings on real data (CC12M and CC3M training, and DCI finetuning)
The code for training CLIP models on CC12M and CC3M in clip_on_real_data is adapted from OpenCLIP. We provide setup and run instructions in the README in clip_on_real_data.
Owner
- Name: Computer Vision Group, Albert-Ludwigs-Universität Freiburg
- Login: lmb-freiburg
- Kind: organization
- Location: Freiburg, Germany
- Website: https://lmb.informatik.uni-freiburg.de/
- Twitter: CVisionFreiburg
- Repositories: 47
- Profile: https://github.com/lmb-freiburg
Pattern Recognition and Image Processing
GitHub Events
Total
- Issues event: 2
- Watch event: 19
- Issue comment event: 2
- Member event: 1
- Push event: 2
Last Year
- Issues event: 2
- Watch event: 19
- Issue comment event: 2
- Member event: 1
- Push event: 2
Dependencies
- aiohttp ==3.9.5
- aiosignal ==1.3.1
- annotated-types ==0.7.0
- antlr4-python3-runtime ==4.9.3
- anyio ==4.4.0
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- arrow ==1.3.0
- asttokens ==2.4.1
- async-lru ==2.0.4
- attrs ==23.2.0
- babel ==2.15.0
- beautifulsoup4 ==4.12.3
- bleach ==6.1.0
- blis ==0.7.11
- boto3 ==1.34.144
- botocore ==1.34.144
- braceexpand ==0.1.7
- bravado ==11.0.3
- bravado-core ==6.1.1
- catalogue ==2.0.10
- cffi ==1.16.0
- chardet ==5.2.0
- click ==8.1.7
- clip-project ==0.11.1
- cloudpathlib ==0.18.1
- comm ==0.2.2
- confection ==0.1.5
- contourpy ==1.2.1
- cycler ==0.12.1
- cymem ==2.0.8
- debugpy ==1.8.2
- decorator ==5.1.1
- defusedxml ==0.7.1
- deprecated ==1.2.14
- executing ==2.0.1
- fastjsonschema ==2.20.0
- fonttools ==4.53.1
- fqdn ==1.5.1
- frozenlist ==1.4.1
- fsspec ==2024.6.1
- ftfy ==6.2.0
- future ==1.0.0
- gitdb ==4.0.11
- gitpython ==3.1.43
- h11 ==0.14.0
- h5py ==3.11.0
- httpcore ==1.0.5
- httpx ==0.27.0
- huggingface-hub ==0.23.5
- importlib-resources ==6.4.0
- iniconfig ==2.0.0
- ipdb ==0.13.13
- ipykernel ==6.29.5
- ipython ==8.26.0
- ipywidgets ==8.1.3
- isoduration ==20.11.0
- itables ==2.1.4
- jedi ==0.19.1
- jmespath ==1.0.1
- joblib ==1.4.2
- json5 ==0.9.25
- jsonpointer ==3.0.0
- jsonref ==1.1.0
- jsonschema ==4.23.0
- jsonschema-specifications ==2023.12.1
- jupyter ==1.0.0
- jupyter-client ==8.6.2
- jupyter-console ==6.6.3
- jupyter-core ==5.7.2
- jupyter-events ==0.10.0
- jupyter-lsp ==2.2.5
- jupyter-server ==2.14.2
- jupyter-server-terminals ==0.5.3
- jupyterlab ==4.2.3
- jupyterlab-pygments ==0.3.0
- jupyterlab-server ==2.27.3
- jupyterlab-widgets ==3.0.11
- kiwisolver ==1.4.5
- langcodes ==3.4.0
- language-data ==1.2.0
- lmdb ==1.5.1
- loguru ==0.7.2
- marisa-trie ==1.2.0
- markdown-it-py ==3.0.0
- matplotlib ==3.9.1
- matplotlib-inline ==0.1.7
- mdurl ==0.1.2
- mistune ==3.0.2
- monotonic ==1.6
- msgpack ==1.0.8
- multidict ==6.0.5
- murmurhash ==1.0.10
- natsort ==8.4.0
- nbclient ==0.10.0
- nbconvert ==7.16.4
- nbformat ==5.10.4
- neptune ==1.10.4
- nest-asyncio ==1.6.0
- nltk ==3.8.1
- notebook ==7.2.1
- notebook-shim ==0.2.4
- nvidia-ml-py3 ==7.352.0
- oauthlib ==3.2.2
- omegaconf ==2.3.0
- overrides ==7.7.0
- packg ==0.13.1
- pandas ==2.2.2
- pandocfilters ==1.5.1
- parso ==0.8.4
- pathspec ==0.12.1
- pexpect ==4.9.0
- platformdirs ==4.2.2
- pluggy ==1.5.0
- preshed ==3.0.9
- prometheus-client ==0.20.0
- prompt-toolkit ==3.0.47
- protobuf ==5.27.2
- psutil ==6.0.0
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pyarrow ==17.0.0
- pycocoevalcap ==1.2
- pycocotools ==2.0.8
- pycparser ==2.22
- pydantic ==2.8.2
- pydantic-core ==2.20.1
- pydub ==0.25.1
- pygments ==2.18.0
- pyinstrument ==4.6.2
- pyjwt ==2.8.0
- pyparsing ==3.1.2
- pytest ==8.2.2
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-json-logger ==2.0.7
- pyturbojpeg ==1.7.3
- pytz ==2024.1
- pyzmq ==26.0.3
- qtconsole ==5.5.2
- qtpy ==2.4.1
- referencing ==0.35.1
- regex ==2024.5.15
- requests-oauthlib ==2.0.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.7.1
- rpds-py ==0.19.0
- s3transfer ==0.10.2
- safetensors ==0.4.3
- scikit-learn ==1.5.1
- scipy ==1.14.0
- seaborn ==0.13.2
- send2trash ==1.8.3
- sentencepiece ==0.2.0
- shellingham ==1.5.4
- simplejson ==3.19.2
- smart-open ==7.0.4
- smmap ==5.0.1
- sniffio ==1.3.1
- soupsieve ==2.5
- spacy ==3.7.5
- spacy-legacy ==3.0.12
- spacy-loggers ==1.0.5
- srsly ==2.4.8
- stack-data ==0.6.3
- swagger-spec-validator ==3.0.4
- termcolor ==2.4.0
- terminado ==0.18.1
- thinc ==8.2.5
- threadpoolctl ==3.5.0
- timm ==1.0.7
- tinycss2 ==1.3.0
- tokenizers ==0.19.1
- tornado ==6.4.1
- traitlets ==5.14.3
- transformers ==4.42.4
- tueplots ==0.0.15
- typedparser ==0.13.2
- typer ==0.12.3
- types-python-dateutil ==2.9.0.20240316
- tzdata ==2024.1
- uri-template ==1.3.0
- visiontext ==0.13.2
- wasabi ==1.1.3
- wcwidth ==0.2.13
- weasel ==0.4.1
- webcolors ==24.6.0
- webdataset ==0.2.86
- webencodings ==0.5.1
- websocket-client ==1.8.0
- widgetsnbextension ==4.0.11
- wrapt ==1.16.0
- yarl ==1.9.4
- zstandard ==0.23.0
- ftfy *
- huggingface-hub *
- regex *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *
- pytest ==7.2.0 test
- pytest-split ==0.8.0 test
- timm >=1.0.7 test
- transformers * test
- braceexpand *
- fsspec *
- ftfy *
- huggingface_hub *
- pandas *
- regex *
- timm >=1.0.7
- torch >=1.9.0
- torchvision *
- tqdm *
- transformers *
- webdataset >=0.2.5
- ftfy *
- huggingface_hub *
- regex *
- timm *
- torch >=1.9.0
- torchvision *
- tqdm *
- Jinja2 ==3.1.5
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- Pygments ==2.19.1
- certifi ==2025.1.31
- charset-normalizer ==3.4.1
- contourpy ==1.3.1
- cycler ==0.12.1
- debugpy ==1.8.12
- einops ==0.8.1
- filelock ==3.17.0
- fonttools ==4.56.0
- fsspec ==2025.2.0
- ftfy ==6.3.1
- huggingface-hub ==0.28.1
- idna ==3.10
- imageio ==2.37.0
- kiwisolver ==1.4.8
- lazy_loader ==0.4
- markdown-it-py ==3.0.0
- matplotlib ==3.10.0
- mdurl ==0.1.2
- mpmath ==1.3.0
- networkx ==3.4.2
- numpy ==2.2.3
- nvidia-cublas-cu12 ==12.4.5.8
- nvidia-cuda-cupti-cu12 ==12.4.127
- nvidia-cuda-nvrtc-cu12 ==12.4.127
- nvidia-cuda-runtime-cu12 ==12.4.127
- nvidia-cudnn-cu12 ==9.1.0.70
- nvidia-cufft-cu12 ==11.2.1.3
- nvidia-curand-cu12 ==10.3.5.147
- nvidia-cusolver-cu12 ==11.6.1.9
- nvidia-cusparse-cu12 ==12.3.1.170
- nvidia-cusparselt-cu12 ==0.6.2
- nvidia-nccl-cu12 ==2.21.5
- nvidia-nvjitlink-cu12 ==12.4.127
- nvidia-nvtx-cu12 ==12.4.127
- open-clip-torch ==2.23.0
- packaging ==24.2
- pandas ==2.2.3
- path ==17.1.0
- patsy ==1.0.1
- pillow ==11.1.0
- protobuf ==5.29.3
- pycocotools ==2.0.8
- pyparsing ==3.2.1
- python-dateutil ==2.9.0.post0
- pytz ==2025.1
- regex ==2024.11.6
- requests ==2.32.3
- rich ==13.9.4
- safetensors ==0.5.2
- scikit-image ==0.25.2
- scipy ==1.15.1
- seaborn ==0.13.2
- sentencepiece ==0.2.0
- six ==1.17.0
- statsmodels ==0.14.4
- sympy ==1.13.1
- tifffile ==2025.3.30
- timm ==1.0.14
- tokenizers ==0.21.0
- torch ==2.5.1
- torchvision ==0.20.1
- tqdm ==4.67.1
- transformers ==4.48.3
- triton ==3.1.0
- tueplots ==0.2.0
- typing_extensions ==4.12.2
- tzdata ==2025.1
- urllib3 ==2.3.0
- wcwidth ==0.2.13