https://github.com/amenra/a-multi-domain-benchmark-for-personalized-search-evaluation

A Multi-domain Benchmark for Personalized Search Evaluation


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

dataset evaluation information-retrieval personalization personalized-search
Last synced: 5 months ago

Repository

A Multi-domain Benchmark for Personalized Search Evaluation

Basic Info
  • Host: GitHub
  • Owner: AmenRa
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 81.1 KB
Statistics
  • Stars: 8
  • Watchers: 2
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Topics
dataset evaluation information-retrieval personalization personalized-search
Created almost 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme

README.md

A Multi-domain Benchmark for Personalized Search Evaluation


We provide large-scale multi-domain benchmark datasets for Personalized Search.

The datasets can be found here.
Models' source code can be found here.
Pre-computed baseline runs are available on ranxhub.

Citation

Please cite the following paper if you use the data or code in this repo.

    @inproceedings{bassani2022multi,
      title={A Multi-Domain Benchmark for Personalized Search Evaluation},
      author={Bassani, Elias and Kasela, Pranav and Raganato, Alessandro and Pasi, Gabriella},
      booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
      pages={3822--3827},
      year={2022}
    }

Folder structure of each dataset

- train:
  - queries.jsonl
  - query_ids.txt
- val:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- test:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- collection.jsonl
- fos_hierarachies.jsonl
- in_refs.jsonl
- out_refs.jsonl
- has_authors.jsonl
- authors.jsonl
- affiliations.jsonl
- conference_instances.jsonl
- conference_series.jsonl
- journals.jsonl
- bm25_config.json

File descriptions

queries.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "text": ...
        "rel_doc_ids": ...      # IDs of the relevant documents
        "user_id": ...          # Same as `author_id` in other files
        "user_doc_ids": ...     # IDs of the associated user documents
        "bm25_doc_ids": ...     # IDs of the documents retrieved by BM25
        "bm25_doc_scores": ...  # Scores assigned by BM25 to the retrieved documents
        "timestamp": ...
    }
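As a minimal sketch of how these fields fit together, the snippet below parses JSON lines and computes the reciprocal rank of the first relevant document in the BM25 list. It assumes that `bm25_doc_ids` and `bm25_doc_scores` are parallel lists and that `rel_doc_ids` is a list of IDs; neither is stated explicitly above. The helper names `parse_jsonl` and `reciprocal_rank` and the sample record are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def reciprocal_rank(query: dict) -> float:
    """Reciprocal rank of the first relevant document retrieved by BM25."""
    relevant = set(query["rel_doc_ids"])
    # Assumption: IDs and scores are parallel lists; sort by score to be safe.
    ranked = sorted(
        zip(query["bm25_doc_ids"], query["bm25_doc_scores"]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    for rank, (doc_id, _score) in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Made-up record that follows the field names listed above.
sample_lines = [
    json.dumps({
        "id": "q1",
        "text": "personalized search",
        "rel_doc_ids": ["d3"],
        "user_id": "u1",
        "user_doc_ids": ["d7", "d9"],
        "bm25_doc_ids": ["d5", "d3", "d8"],
        "bm25_doc_scores": [12.1, 10.4, 9.9],
        "timestamp": 1500000000,
    })
]

queries = list(parse_jsonl(sample_lines))
rr = reciprocal_rank(queries[0])
print(rr)
```

With a real dataset, an open file handle (for example on `val/queries.jsonl`) can be passed to `parse_jsonl` in place of `sample_lines`, since iterating a text file yields its lines.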

collection.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "title": ...
        "text": ...
        "keywords": ...
        "fields_of_study": ...
        "publication_date": ...
        "timestamp": ...
        "conference_instance_id": ...
        "conference_series_id": ...
        "journal_id": ...
        "issue_id": ...
        "volume": ...
        "publisher": ...
        "doi": ...
    }
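One possible use of the collection is to resolve the `user_doc_ids` of a query into document text, for example to build a simple user profile. The sketch below assumes that `user_doc_ids` refer to `id` values in `collection.jsonl` and that the needed documents fit in memory; the function name `user_profile` and the sample records are hypothetical.

```python
import json

# Made-up records that follow the field names listed above (most fields omitted).
collection_lines = [
    json.dumps({"id": "d7", "title": "Neural ranking", "text": "We study rankers."}),
    json.dumps({"id": "d9", "title": "User modeling", "text": "We model users."}),
]
query = {
    "id": "q1",
    "text": "personalized search",
    "user_doc_ids": ["d7", "d9", "d404"],  # "d404" is deliberately missing
}

# Index the collection by document ID for constant-time lookups.
docs_by_id = {}
for line in collection_lines:
    doc = json.loads(line)
    docs_by_id[doc["id"]] = doc


def user_profile(query: dict, docs_by_id: dict) -> list:
    """Return "title. text" for each user document found in the collection."""
    profile = []
    for doc_id in query["user_doc_ids"]:
        doc = docs_by_id.get(doc_id)
        if doc is not None:  # skip IDs that are not in the index
            profile.append(f'{doc["title"]}. {doc["text"]}')
    return profile


profile = user_profile(query, docs_by_id)
print(profile)
```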

authors.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "name": ...
        "affiliation_id": ...
        "docs": [{"doc_id": "...", "timestamp": ...}, ...]
    }

has_authors.jsonl

Each JSON line is as follows:

    {
        "doc_id": ...
        "timestamp": ...
        "author_ids": ["123678452", ...]
    }
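Judging by the field names, `has_authors.jsonl` and the `docs` field of `authors.jsonl` seem to describe the same document-author relation from opposite directions. Under that assumption, the sketch below groups `has_authors.jsonl` records by author and orders each author's documents by timestamp. The function name `docs_by_author` and the sample records are made up.

```python
from collections import defaultdict

# Made-up records that follow the has_authors.jsonl field names.
has_authors = [
    {"doc_id": "d1", "timestamp": 100, "author_ids": ["a1", "a2"]},
    {"doc_id": "d2", "timestamp": 50, "author_ids": ["a1"]},
]


def docs_by_author(records):
    """Map each author ID to their documents, oldest first."""
    index = defaultdict(list)
    for rec in records:
        for author_id in rec["author_ids"]:
            index[author_id].append(
                {"doc_id": rec["doc_id"], "timestamp": rec["timestamp"]}
            )
    for docs in index.values():
        docs.sort(key=lambda d: d["timestamp"])
    return dict(index)


author_docs = docs_by_author(has_authors)
print(author_docs["a1"])
```

Keeping the timestamp next to each `doc_id` makes it straightforward to restrict an author's history to documents published before a given query.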

in_refs.jsonl (incoming references)

Each JSON line is as follows:

    {
        "doc_id": ...
        "in_refs": [{"doc_id": "...", "timestamp": ...}, ...]
    }

out_refs.jsonl (outgoing references)

Each JSON line is as follows:

    {
        "doc_id": ...
        "timestamp": ...
        "out_refs": ["2048600620", ...]
    }
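To illustrate how the two reference files may relate, the sketch below inverts `out_refs.jsonl`-style records into `in_refs.jsonl`-style records. It assumes that the `timestamp` inside each `in_refs` entry is the timestamp of the citing document, which the descriptions above do not confirm. The function name `invert_out_refs` and the sample records are hypothetical.

```python
from collections import defaultdict

# Made-up records that follow the out_refs.jsonl field names.
out_refs = [
    {"doc_id": "d1", "timestamp": 300, "out_refs": ["d2", "d3"]},
    {"doc_id": "d2", "timestamp": 200, "out_refs": ["d3"]},
]


def invert_out_refs(records):
    """Turn outgoing references into per-document incoming references."""
    incoming = defaultdict(list)
    for rec in records:
        for cited_id in rec["out_refs"]:
            # Assumption: the entry carries the citing document's timestamp.
            incoming[cited_id].append(
                {"doc_id": rec["doc_id"], "timestamp": rec["timestamp"]}
            )
    return [
        {"doc_id": doc_id, "in_refs": refs}
        for doc_id, refs in sorted(incoming.items())
    ]


in_refs = invert_out_refs(out_refs)
for rec in in_refs:
    print(rec)
```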

affiliations.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "name": ...  # Name of the institution
    }

conference_instances.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "name": ...
        "conference_series_id": ...
    }

conference_series.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "name": ...
    }

journals.jsonl

Each JSON line is as follows:

    {
        "id": ...
        "name": ...
    }

fields_of_study_hierarchies.jsonl

The fields of study associated with the documents form a hierarchical tree structure.
Each JSON line is as follows:

    {
        "id": ...
        "hierarchy": ...
    }

Owner

  • Name: Elias Bassani
  • Login: AmenRa
  • Kind: user
  • Location: Milan, Italy
  • Company: Joint Research Centre

Ph.D. in CS. I like Neural Networks, usability, efficiency, einsum, memes, and improperly used emojis. 🫠

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Dependencies

environment.yml pypi
  • krovetzstemmer *
  • pycld3 *
  • ranx *