adaptive-relevance-margin-loss
Code repository for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: webis-de
- License: MIT
- Language: Python
- Default Branch: main
- Size: 26.4 KB
Statistics
- Stars: 1
- Watchers: 14
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins
Overview
This is the code repository for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".
Representation-based retrieval models, so-called bi-encoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art bi-encoders are trained using an expensive training regime involving knowledge distillation from a teacher model and extensive batch-sampling techniques. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained text similarity capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We explore the capabilities of our proposed approach through extensive ablation studies, demonstrating that self-distillation can match the effectiveness of teacher-distillation approaches while requiring only a fraction of the data and compute.
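The exact formulation is given in the paper; as a rough illustration only, the adaptive-margin idea can be sketched as a hinge loss whose per-triplet margin is taken from the similarity gap the pre-trained (initial) encoder already assigns to the triplet, instead of from a teacher model or a tuned margin hyperparameter. The function names and the specific hinge form below are our assumptions for illustration, not necessarily the paper's formulation.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def adaptive_margin_loss(sim_pos: float, sim_neg: float,
                         init_sim_pos: float, init_sim_neg: float) -> float:
    """Illustrative hinge loss with a self-distilled, per-triplet margin.

    sim_pos / sim_neg:           query-document similarities under the
                                 encoder currently being trained.
    init_sim_pos / init_sim_neg: the same similarities under the frozen
                                 pre-trained encoder, used as the
                                 self-supervision signal.
    """
    # Adaptive margin: the separation the pre-trained model already achieves.
    margin = init_sim_pos - init_sim_neg
    # Standard hinge: penalize only if the trained model separates the
    # positive and negative by less than that margin.
    return max(0.0, margin - (sim_pos - sim_neg))

# Example: the trained encoder separates the pair more than the
# pre-trained one did, so the hinge is inactive and the loss is zero.
loss = adaptive_margin_loss(0.9, 0.1, 0.6, 0.4)
```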
Supplementary data (TREC-format run files for all final trained models) is hosted on Zenodo.
Project Organization
```
├── Dockerfile       <- Dockerfile with all dependencies for reproducible execution
├── LICENSE          <- License file
├── Makefile         <- Makefile with commands to reproduce artifacts (data + models)
├── README.md        <- The top-level README for the project
├── configs          <- Configuration files for model and sweep parameters
├── data             <- Data folder; will be populated by data scripts
├── main.py          <- Main Lightning CLI entrypoint
├── requirements.txt <- Dependencies
├── scripts          <- Scripts to automate single tasks (data parsing, sweep agents, ...)
├── setup.py         <- Makes project pip installable (pip install -e .) so src can be imported
└── src              <- Model source code
```
Replication
Data preparation, model training, and evaluation are replicable with make targets:
```
$ make
Available rules:

requirements   Install Python Dependencies
data-train     Download and preprocess train dataset
data-eval      Download and preprocess eval datasets
fit            Run the training process
eval           Run eval process
clean          Delete all compiled Python files
```
These can be run in the given order to fully replicate the experimental pipeline.
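For instance, a full replication amounts to invoking the targets in order (run from the repository root; target names are taken from the make listing above):

```shell
make requirements   # install Python dependencies
make data-train     # download and preprocess the training dataset
make data-eval      # download and preprocess the evaluation datasets
make fit            # run the training process
make eval           # run the evaluation process
```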
Each training run from the paper can be executed with its config file in `configs` using the following command:

```sh
python3 main.py fit -c <path-to-config-file>
```
Citation
If you use this code in your research, please cite:
```bib
@InProceedings{gienapp:2025b,
  address   = {New York},
  author    = {Lukas Gienapp and Niklas Deckers and Martin Potthast and Harrisen Scells},
  booktitle = {15th International Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)},
  doi       = {10.1145/3731120.3744594},
  editor    = {Hamed Zamani and Laura Dietz and Benjamin Piwowarski and Sebastian Bruch},
  isbn      = {979-8-4007-1861-8/2025/07},
  month     = jul,
  pages     = {275--285},
  publisher = {ACM},
  site      = {Padua, Italy},
  title     = {{Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins}},
  year      = 2025
}
```
Owner
- Name: Webis
- Login: webis-de
- Kind: organization
- Location: Halle / Leipzig / Paderborn / Weimar
- Website: https://webis.de
- Twitter: webis_de
- Repositories: 194
- Profile: https://github.com/webis-de
Web Technology & Information Systems Group (Webis Group)
Citation (CITATION.cff)
```yaml
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Learning Effective Representations for Retrieval using
  Self-Distillation with Adaptive Relevance Margins
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Lukas
    family-names: Gienapp
    affiliation: Leipzig University & ScaDS.AI
    orcid: 'https://orcid.org/0000-0001-5707-3751'
  - given-names: Niklas
    family-names: Deckers
    affiliation: University of Kassel & ScaDS.AI & hessian.AI
    orcid: 'https://orcid.org/0000-0001-6803-1223'
  - given-names: Harrisen
    family-names: Scells
    affiliation: Leipzig University
    orcid: 'https://orcid.org/0000-0001-9578-7157'
  - given-names: Martin
    family-names: Potthast
    affiliation: University of Kassel & ScaDS.AI & hessian.AI
    orcid: 'https://orcid.org/0000-0003-2451-0665'
abstract: >
  Representation-based retrieval models, so-called
  bi-encoders, estimate the relevance of a document to a
  query by calculating the similarity of their respective
  embeddings. Current state-of-the-art bi-encoders are
  trained using an expensive training regime involving
  knowledge distillation from a teacher model and extensive
  batch-sampling techniques. Instead of relying on a teacher
  model, we contribute a novel parameter-free loss function
  for self-supervision that exploits the pre-trained text
  similarity capabilities of the encoder model as a training
  signal, eliminating the need for batch sampling by
  performing implicit hard negative mining. We explore the
  capabilities of our proposed approach through extensive
  ablation studies, demonstrating that self-distillation can
  match the effectiveness of teacher-distillation approaches
  while requiring only a fraction of the data and compute.
license: MIT
```
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 14
- Average time to close issues: N/A
- Average time to close pull requests: 26 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.07
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 14
Past Year
- Issues: 0
- Pull requests: 14
- Average time to close issues: N/A
- Average time to close pull requests: 26 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.07
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 14
Top Authors
Issue Authors
Pull Request Authors
- dependabot[bot] (10)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- pytorch/pytorch 2.1.0-cuda11.8-cudnn8-runtime build
- GitPython ==3.1.41
- Jinja2 ==3.1.3
- MarkupSafe ==2.1.3
- PyJWT ==2.8.0
- PyYAML ==6.0.1
- Pygments ==2.17.2
- Send2Trash ==1.8.2
- aiobotocore ==2.7.0
- aiohttp ==3.9.1
- aioitertools ==0.11.0
- aiosignal ==1.3.1
- annotated-types ==0.6.0
- antlr4-python3-runtime ==4.9.3
- anyio ==4.2.0
- appdirs ==1.4.4
- arrow ==1.3.0
- async-timeout ==4.0.3
- attrs ==23.2.0
- backoff ==2.2.1
- beautifulsoup4 ==4.12.2
- bitsandbytes ==0.42.0
- blessed ==1.20.0
- boto3 ==1.28.64
- botocore ==1.31.64
- certifi ==2023.11.17
- charset-normalizer ==3.3.2
- click ==8.1.7
- contourpy ==1.2.0
- croniter ==1.4.1
- cycler ==0.12.1
- datasets ==2.16.1
- dateutils ==0.6.12
- deepdiff ==6.7.1
- dill ==0.3.7
- docker ==6.1.3
- docker-pycreds ==0.4.0
- docstring-parser ==0.15
- editor ==1.6.5
- exceptiongroup ==1.2.0
- faiss-gpu ==1.7.2
- fastapi ==0.109.0
- filelock ==3.13.1
- fonttools ==4.47.2
- frozenlist ==1.4.1
- fsspec ==2023.10.0
- gitdb ==4.0.11
- h11 ==0.14.0
- huggingface-hub ==0.20.2
- hydra-core ==1.3.2
- idna ==3.6
- importlib-resources ==6.1.1
- inquirer ==3.2.1
- jmespath ==1.0.1
- jsonargparse ==4.27.1
- kiwisolver ==1.4.5
- lightning ==2.1.3
- lightning-api-access ==0.0.5
- lightning-cloud ==0.5.57
- lightning-fabric ==2.1.3
- lightning-utilities ==0.10.0
- markdown-it-py ==3.0.0
- matplotlib ==3.8.2
- mdurl ==0.1.2
- mpmath ==1.3.0
- multidict ==6.0.4
- multiprocess ==0.70.15
- networkx ==3.2.1
- numpy ==1.26.3
- nvidia-cublas-cu12 ==12.1.3.1
- nvidia-cuda-cupti-cu12 ==12.1.105
- nvidia-cuda-nvrtc-cu12 ==12.1.105
- nvidia-cuda-runtime-cu12 ==12.1.105
- nvidia-cudnn-cu12 ==8.9.2.26
- nvidia-cufft-cu12 ==11.0.2.54
- nvidia-curand-cu12 ==10.3.2.106
- nvidia-cusolver-cu12 ==11.4.5.107
- nvidia-cusparse-cu12 ==12.1.0.106
- nvidia-nccl-cu12 ==2.18.1
- nvidia-nvjitlink-cu12 ==12.3.101
- nvidia-nvtx-cu12 ==12.1.105
- omegaconf ==2.3.0
- ordered-set ==4.1.0
- packaging ==23.2
- pandas ==2.1.4
- pillow ==10.2.0
- protobuf ==4.25.2
- psutil ==5.9.7
- pyarrow ==14.0.2
- pyarrow-hotfix ==0.6
- pydantic ==2.5.3
- pydantic_core ==2.14.6
- pyparsing ==3.1.1
- python-dateutil ==2.8.2
- python-git ==2018.2.1
- python-multipart ==0.0.6
- pytorch-lightning ==2.1.3
- pytz ==2023.3.post1
- readchar ==4.0.5
- redis ==5.0.1
- regex ==2023.12.25
- requests ==2.31.0
- rich ==13.7.0
- runs ==1.2.0
- s3fs ==2023.10.0
- s3transfer ==0.7.0
- safetensors ==0.4.1
- scipy ==1.11.4
- sentry-sdk ==1.39.2
- setproctitle ==1.3.3
- six ==1.16.0
- smmap ==5.0.1
- sniffio ==1.3.0
- soupsieve ==2.5
- starlette ==0.35.1
- sympy ==1.12
- tensorboardX ==2.6.2.2
- tokenizers ==0.15.0
- torch ==2.1.2
- torchaudio ==2.1.2
- torchmetrics ==1.3.0
- torchvision ==0.16.2
- tqdm ==4.66.1
- traitlets ==5.14.1
- transformers ==4.36.2
- triton ==2.1.0
- types-python-dateutil ==2.8.19.20240106
- typeshed-client ==2.4.0
- typing_extensions ==4.9.0
- tzdata ==2023.4
- urllib3 ==2.0.7
- uvicorn ==0.26.0
- wandb ==0.16.2
- wcwidth ==0.2.13
- websocket-client ==1.7.0
- websockets ==11.0.3
- wrapt ==1.16.0
- xmod ==1.8.1
- xxhash ==3.4.1
- yarl ==1.9.4