adaptive-relevance-margin-loss

Code repository for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".

https://github.com/webis-de/adaptive-relevance-margin-loss

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Code repository for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".

Basic Info
  • Host: GitHub
  • Owner: webis-de
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Size: 26.4 KB
Statistics
  • Stars: 1
  • Watchers: 14
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins


Overview

This is the code repository for the paper "Learning Effective Representations for Retrieval using Self-Distillation with Adaptive Relevance Margins".

Representation-based retrieval models, so-called bi-encoders, estimate the relevance of a document to a query by calculating the similarity of their respective embeddings. Current state-of-the-art bi-encoders are trained using an expensive training regime involving knowledge distillation from a teacher model and extensive batch-sampling techniques. Instead of relying on a teacher model, we contribute a novel parameter-free loss function for self-supervision that exploits the pre-trained text similarity capabilities of the encoder model as a training signal, eliminating the need for batch sampling by performing implicit hard negative mining. We explore the capabilities of our proposed approach through extensive ablation studies, demonstrating that self-distillation can match the effectiveness of teacher-distillation approaches while requiring only a fraction of the data and compute.
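The idea in the abstract — using the encoder's own pre-trained similarity scores as the training signal, with in-batch negatives and per-pair adaptive margins — can be sketched as follows. This is a minimal, illustrative implementation assuming cosine similarity and a frozen copy of the initial checkpoint as the self-distillation target; all function and variable names are hypothetical, and the repository's actual loss may differ in detail.

```python
import torch
import torch.nn.functional as F

def adaptive_margin_loss(q_emb, d_emb, q_emb_init, d_emb_init):
    """Sketch of a self-distilled adaptive relevance-margin loss.

    q_emb, d_emb: query/document embeddings from the model being trained.
    q_emb_init, d_emb_init: embeddings from the frozen pre-trained
    checkpoint, whose similarities serve as the supervision signal.
    All shapes are (batch, dim); row i of d_emb is the paired (positive)
    document for query i, and the other rows act as in-batch negatives.
    """
    # In-batch similarity matrix: entry (i, j) = sim(query i, document j).
    sim = F.cosine_similarity(q_emb.unsqueeze(1), d_emb.unsqueeze(0), dim=-1)
    with torch.no_grad():
        target = F.cosine_similarity(
            q_emb_init.unsqueeze(1), d_emb_init.unsqueeze(0), dim=-1)
        # Adaptive margin: the pre-trained model's score gap between the
        # paired document (diagonal) and each in-batch negative.
        margin = target.diagonal().unsqueeze(1) - target
    pos = sim.diagonal().unsqueeze(1)  # similarity to the paired document
    # Hinge loss: penalize negatives whose current score gap falls short
    # of the pre-trained margin. The diagonal contributes zero (margin 0).
    # Hard negatives (small pre-trained gap, high current similarity)
    # dominate the loss, giving implicit hard-negative mining.
    return F.relu(margin - (pos - sim)).mean()
```

Because the margins come from the frozen checkpoint rather than a separate teacher model or tuned hyperparameter, the loss is parameter-free in the sense described above.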

Supplementary data (TREC-format run files for all final trained models) is hosted on Zenodo.

Project Organization

```
├── Dockerfile         <- Dockerfile with all dependencies for reproducible execution
├── LICENSE            <- License file
├── Makefile           <- Makefile with commands to reproduce artifacts (data + models)
├── README.md          <- The top-level README for the project
├── configs            <- Configuration files for model and sweep parameters
├── data               <- Data folder; will be populated by data scripts
├── main.py            <- Main Lightning CLI entrypoint
├── requirements.txt   <- Dependencies
├── scripts            <- Scripts to automate single tasks (data parsing, sweep agents, ...)
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported
└── src                <- Model source code
```

Replication

Data preparation, model training, and evaluation are replicable with make targets:

```
$ make
Available rules:

requirements   Install Python Dependencies
data-train     Download and preprocess train dataset
data-eval      Download and preprocess eval datasets
fit            Run the training process
eval           Run eval process
clean          Delete all compiled Python files
```

These can be run in the given order to fully replicate the experimental pipeline.
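Assuming the targets behave as named above, a full replication run might look like this (an illustrative sketch; exact target behavior is defined in the repository's Makefile):

```shell
# Install dependencies, build datasets, then train and evaluate.
make requirements
make data-train
make data-eval
make fit
make eval
# Optionally clean up compiled Python files afterwards.
make clean
```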

Each training run from the paper can be executed with its given config file in `configs` with the following command:

```sh
python3 main.py fit -c <path-to-config-file>
```

Citation

If you use this code in your research, please cite:

```bib
@InProceedings{gienapp:2025b,
  address   = {New York},
  author    = {Lukas Gienapp and Niklas Deckers and Martin Potthast and Harrisen Scells},
  booktitle = {15th International Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR 2025)},
  doi       = {10.1145/3731120.3744594},
  editor    = {Hamed Zamani and Laura Dietz and Benjamin Piwowarski and Sebastian Bruch},
  isbn      = {979-8-4007-1861-8/2025/07},
  month     = jul,
  pages     = {275--285},
  publisher = {ACM},
  site      = {Padua, Italy},
  title     = {{Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins}},
  year      = 2025
}
```


Owner

  • Name: Webis
  • Login: webis-de
  • Kind: organization
  • Location: Halle / Leipzig / Paderborn / Weimar

Web Technology & Information Systems Group (Webis Group)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Learning Effective Representations for Retrieval using
  Self-Distillation with Adaptive Relevance Margins
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Lukas
    family-names: Gienapp
    affiliation: Leipzig University & ScaDS.AI
    orcid: 'https://orcid.org/0000-0001-5707-3751'
  - given-names: Niklas
    family-names: Deckers
    affiliation: University of Kassel & ScaDS.AI & hessian.AI
    orcid: 'https://orcid.org/0000-0001-6803-1223'
  - given-names: Harrisen
    family-names: Scells
    affiliation: Leipzig University
    orcid: 'https://orcid.org/0000-0001-9578-7157'
  - given-names: Martin
    family-names: Potthast
    affiliation: University of Kassel & ScaDS.AI & hessian.AI
    orcid: 'https://orcid.org/0000-0003-2451-0665'
abstract: >
  Representation-based retrieval models, so-called
  bi-encoders, estimate the relevance of a document to a
  query by calculating the similarity of their respective
  embeddings. Current state-of-the-art bi-encoders are
  trained using an expensive training regime involving
  knowledge distillation from a teacher model and extensive
  batch-sampling techniques. Instead of relying on a teacher
  model, we contribute a novel parameter-free loss function
  for self-supervision that exploits the pre-trained text
  similarity capabilities of the encoder model as a training
  signal, eliminating the need for batch sampling by
  performing implicit hard negative mining. We explore the
  capabilities of our proposed approach through extensive
  ablation studies, demonstrating that self-distillation can
  match the effectiveness of teacher-distillation approaches
  while requiring only a fraction of the data and compute.
license: MIT

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: 26 days
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.07
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 14
Past Year
  • Issues: 0
  • Pull requests: 14
  • Average time to close issues: N/A
  • Average time to close pull requests: 26 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.07
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 14
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (10)
Top Labels
Issue Labels
Pull Request Labels
dependencies (10)

Dependencies

Dockerfile docker
  • pytorch/pytorch 2.1.0-cuda11.8-cudnn8-runtime build
requirements.txt pypi
  • GitPython ==3.1.41
  • Jinja2 ==3.1.3
  • MarkupSafe ==2.1.3
  • PyJWT ==2.8.0
  • PyYAML ==6.0.1
  • Pygments ==2.17.2
  • Send2Trash ==1.8.2
  • aiobotocore ==2.7.0
  • aiohttp ==3.9.1
  • aioitertools ==0.11.0
  • aiosignal ==1.3.1
  • annotated-types ==0.6.0
  • antlr4-python3-runtime ==4.9.3
  • anyio ==4.2.0
  • appdirs ==1.4.4
  • arrow ==1.3.0
  • async-timeout ==4.0.3
  • attrs ==23.2.0
  • backoff ==2.2.1
  • beautifulsoup4 ==4.12.2
  • bitsandbytes ==0.42.0
  • blessed ==1.20.0
  • boto3 ==1.28.64
  • botocore ==1.31.64
  • certifi ==2023.11.17
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • contourpy ==1.2.0
  • croniter ==1.4.1
  • cycler ==0.12.1
  • datasets ==2.16.1
  • dateutils ==0.6.12
  • deepdiff ==6.7.1
  • dill ==0.3.7
  • docker ==6.1.3
  • docker-pycreds ==0.4.0
  • docstring-parser ==0.15
  • editor ==1.6.5
  • exceptiongroup ==1.2.0
  • faiss-gpu ==1.7.2
  • fastapi ==0.109.0
  • filelock ==3.13.1
  • fonttools ==4.47.2
  • frozenlist ==1.4.1
  • fsspec ==2023.10.0
  • gitdb ==4.0.11
  • h11 ==0.14.0
  • huggingface-hub ==0.20.2
  • hydra-core ==1.3.2
  • idna ==3.6
  • importlib-resources ==6.1.1
  • inquirer ==3.2.1
  • jmespath ==1.0.1
  • jsonargparse ==4.27.1
  • kiwisolver ==1.4.5
  • lightning ==2.1.3
  • lightning-api-access ==0.0.5
  • lightning-cloud ==0.5.57
  • lightning-fabric ==2.1.3
  • lightning-utilities ==0.10.0
  • markdown-it-py ==3.0.0
  • matplotlib ==3.8.2
  • mdurl ==0.1.2
  • mpmath ==1.3.0
  • multidict ==6.0.4
  • multiprocess ==0.70.15
  • networkx ==3.2.1
  • numpy ==1.26.3
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.18.1
  • nvidia-nvjitlink-cu12 ==12.3.101
  • nvidia-nvtx-cu12 ==12.1.105
  • omegaconf ==2.3.0
  • ordered-set ==4.1.0
  • packaging ==23.2
  • pandas ==2.1.4
  • pillow ==10.2.0
  • protobuf ==4.25.2
  • psutil ==5.9.7
  • pyarrow ==14.0.2
  • pyarrow-hotfix ==0.6
  • pydantic ==2.5.3
  • pydantic_core ==2.14.6
  • pyparsing ==3.1.1
  • python-dateutil ==2.8.2
  • python-git ==2018.2.1
  • python-multipart ==0.0.6
  • pytorch-lightning ==2.1.3
  • pytz ==2023.3.post1
  • readchar ==4.0.5
  • redis ==5.0.1
  • regex ==2023.12.25
  • requests ==2.31.0
  • rich ==13.7.0
  • runs ==1.2.0
  • s3fs ==2023.10.0
  • s3transfer ==0.7.0
  • safetensors ==0.4.1
  • scipy ==1.11.4
  • sentry-sdk ==1.39.2
  • setproctitle ==1.3.3
  • six ==1.16.0
  • smmap ==5.0.1
  • sniffio ==1.3.0
  • soupsieve ==2.5
  • starlette ==0.35.1
  • sympy ==1.12
  • tensorboardX ==2.6.2.2
  • tokenizers ==0.15.0
  • torch ==2.1.2
  • torchaudio ==2.1.2
  • torchmetrics ==1.3.0
  • torchvision ==0.16.2
  • tqdm ==4.66.1
  • traitlets ==5.14.1
  • transformers ==4.36.2
  • triton ==2.1.0
  • types-python-dateutil ==2.8.19.20240106
  • typeshed-client ==2.4.0
  • typing_extensions ==4.9.0
  • tzdata ==2023.4
  • urllib3 ==2.0.7
  • uvicorn ==0.26.0
  • wandb ==0.16.2
  • wcwidth ==0.2.13
  • websocket-client ==1.7.0
  • websockets ==11.0.3
  • wrapt ==1.16.0
  • xmod ==1.8.1
  • xxhash ==3.4.1
  • yarl ==1.9.4
setup.py pypi