https://github.com/amenra/a-multi-domain-benchmark-for-personalized-search-evaluation

A Multi-domain Benchmark for Personalized Search Evaluation

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.6%) to scientific vocabulary

Keywords

dataset evaluation information-retrieval personalization personalized-search

Last synced: 5 months ago

Repository

A Multi-domain Benchmark for Personalized Search Evaluation

Basic Info

Host: GitHub
Owner: AmenRa
Language: Python
Default Branch: master
Homepage:
Size: 81.1 KB

Statistics

Stars: 8
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 0

Topics

dataset evaluation information-retrieval personalization personalized-search

Created almost 4 years ago · Last pushed over 2 years ago

Metadata Files

Readme

A Multi-domain Benchmark for Personalized Search Evaluation

We provide large-scale multi-domain benchmark datasets for Personalized Search.

The datasets can be found here.
Models' source code can be found here.
Pre-computed baseline runs are available on ranxhub.

Citation

Please cite the following paper if you use the data or code in this repo.

    @inproceedings{bassani2022multi,
      title={A Multi-Domain Benchmark for Personalized Search Evaluation},
      author={Bassani, Elias and Kasela, Pranav and Raganato, Alessandro and Pasi, Gabriella},
      booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
      pages={3822--3827},
      year={2022}
    }

Folder structure of each dataset

- train:
  - queries.jsonl
  - query_ids.txt
- val:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- test:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- collection.jsonl
- fos_hierarachies.jsonl
- in_refs.jsonl
- out_refs.jsonl
- has_authors.jsonl
- authors.jsonl
- affiliations.jsonl
- conference_instances.jsonl
- conference_series.jsonl
- journals.jsonl
- bm25_config.json
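The `val` and `test` splits each ship a `qrels.json` and a `bm25_run.json`. As a quick sanity check of the BM25 baseline, the sketch below computes MRR@k with only the standard library. It assumes, without confirmation in this README, that both files use the nested JSON layout that ranx reads (`{query_id: {doc_id: value}}`); `load_nested_json` and `mrr_at_k` are hypothetical helpers, and for reported numbers ranx itself is the natural tool.

```python
import json


def load_nested_json(path):
    """Load a qrels or run file.

    ASSUMPTION: the file is a JSON object shaped {query_id: {doc_id: value}}.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def mrr_at_k(qrels, run, k=10):
    """Mean Reciprocal Rank@k averaged over every query in `qrels`.

    qrels: {query_id: {doc_id: relevance}}, relevance > 0 means relevant.
    run:   {query_id: {doc_id: score}}, a higher score means ranked higher.
    """
    if not qrels:
        return 0.0
    total = 0.0
    for qid, judgments in qrels.items():
        relevant = {doc_id for doc_id, rel in judgments.items() if rel > 0}
        # Sort this query's candidates by descending score and keep the top k.
        ranked = sorted(run.get(qid, {}).items(), key=lambda item: item[1], reverse=True)[:k]
        for rank, (doc_id, _) in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(qrels)


# Usage (hypothetical paths to an unpacked dataset split):
# qrels = load_nested_json("test/qrels.json")
# run = load_nested_json("test/bm25_run.json")
# print(mrr_at_k(qrels, run, k=10))
```

Judged queries that are missing from the run contribute 0, so the average is taken over all queries in the qrels.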

File descriptions

`queries.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "text": ...,
    "rel_doc_ids": ...,  # IDs of the relevant documents
    "user_id": ...,  # Same as `author_id` in other files
    "user_doc_ids": ...,  # IDs of the associated user documents
    "bm25_doc_ids": ...,  # IDs of the documents retrieved by BM25
    "bm25_doc_scores": ...,  # Scores assigned by BM25 to the retrieved documents
    "timestamp": ...,
}
```
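Because every query already carries its BM25 candidates, a run in the usual `{query_id: {doc_id: score}}` shape can be rebuilt directly from `queries.jsonl`. A minimal sketch, assuming `bm25_doc_ids` and `bm25_doc_scores` are parallel lists (the README does not spell out their types); `read_jsonl` and `bm25_run_from_queries` are hypothetical names.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def bm25_run_from_queries(queries):
    """Build {query_id: {doc_id: bm25_score}} from parsed queries.jsonl records.

    ASSUMPTION: bm25_doc_ids and bm25_doc_scores are parallel lists.
    """
    return {
        q["id"]: dict(zip(q["bm25_doc_ids"], q["bm25_doc_scores"]))
        for q in queries
    }


# Usage (hypothetical path):
# run = bm25_run_from_queries(read_jsonl("val/queries.jsonl"))
```

Streaming with a generator keeps memory flat while parsing; only the resulting run dictionary is held in memory.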

`collection.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "title": ...,
    "text": ...,
    "keywords": ...,
    "fields_of_study": ...,
    "publication_date": ...,
    "timestamp": ...,
    "conference_instance_id": ...,
    "conference_series_id": ...,
    "journal_id": ...,
    "issue_id": ...,
    "volume": ...,
    "publisher": ...,
    "doi": ...,
}
```
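Rankers typically need to look documents up by ID and turn each record into a single string. The sketch below indexes the collection by `id` and concatenates `title` and `text`; joining just those two fields is an illustrative choice, not the representation used by the benchmark's models, and `index_collection` and `doc_text` are hypothetical helpers.

```python
import json


def index_collection(lines):
    """Map document id -> record, given the lines of collection.jsonl."""
    return {
        doc["id"]: doc
        for doc in (json.loads(line) for line in lines if line.strip())
    }


def doc_text(doc):
    """Join title and text, skipping fields that are missing or empty."""
    return " ".join(part for part in (doc.get("title"), doc.get("text")) if part)


# Usage (hypothetical path; the whole collection is held in memory):
# with open("collection.jsonl", encoding="utf-8") as f:
#     docs = index_collection(f)
```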

`authors.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "name": ...,
    "affiliation_id": ...,
    "docs": [{"doc_id": "...", "timestamp": ...}, ...],
}
```
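Queries already list their `user_doc_ids`, but the per-document timestamps in `authors.jsonl` make it possible to rebuild a user's history for any cutoff, keeping only documents that precede it so the profile does not leak future information. A minimal sketch, assuming timestamps are mutually comparable values such as integers; `user_history` is a hypothetical helper.

```python
def user_history(author, before):
    """Return the author's doc ids with timestamp < `before`, oldest first.

    ASSUMPTION: all timestamps in `author["docs"]` are comparable (e.g. ints).
    """
    past = [d for d in author["docs"] if d["timestamp"] < before]
    past.sort(key=lambda d: d["timestamp"])  # chronological order
    return [d["doc_id"] for d in past]
```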

`has_authors.jsonl`

Each line is a JSON object of the form:

```python
{
    "doc_id": ...,
    "timestamp": ...,
    "author_ids": ["123678452", ...],
}
```
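`has_authors.jsonl` is keyed by document; for user-centric lookups it is often handy to invert it. A minimal sketch using a hypothetical `docs_by_author` helper:

```python
from collections import defaultdict


def docs_by_author(has_authors_records):
    """Invert has_authors.jsonl records into {author_id: [doc_id, ...]}."""
    index = defaultdict(list)
    for rec in has_authors_records:
        for author_id in rec["author_ids"]:
            index[author_id].append(rec["doc_id"])
    return dict(index)
```

Documents appear in each author's list in file order; sort by the record `timestamp` first if chronological order matters.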

`in_refs.jsonl` (incoming references)

Each line is a JSON object of the form:

```python
{
    "doc_id": ...,
    "in_refs": [{"doc_id": "...", "timestamp": ...}, ...],
}
```
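Since each incoming reference carries a timestamp, a document's citation count can be computed as of a given moment, for instance a query's timestamp. A minimal sketch, assuming (not confirmed here) that each entry's `timestamp` is that of the citing document; `citations_before` is a hypothetical helper.

```python
def citations_before(in_refs_record, cutoff):
    """Count incoming references whose citing document predates `cutoff`.

    ASSUMPTION: ref["timestamp"] is the citing document's timestamp.
    """
    return sum(1 for ref in in_refs_record["in_refs"] if ref["timestamp"] < cutoff)
```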

`out_refs.jsonl` (outgoing references)

Each line is a JSON object of the form:

```python
{
    "doc_id": ...,
    "timestamp": ...,
    "out_refs": ["2048600620", ...],
}
```
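Outgoing references link a user's own documents to the candidates being ranked. As one illustrative personalization signal (our example, not necessarily a feature used in the paper), the sketch counts how many of a user's documents cite a candidate; `build_out_refs` and `user_cites` are hypothetical helpers.

```python
def build_out_refs(records):
    """Map doc_id -> set of referenced doc ids from out_refs.jsonl records."""
    return {rec["doc_id"]: set(rec["out_refs"]) for rec in records}


def user_cites(out_refs, user_doc_ids, candidate_id):
    """Count the user's documents whose reference list contains `candidate_id`."""
    return sum(1 for d in user_doc_ids if candidate_id in out_refs.get(d, ()))
```

Storing each reference list as a set makes every membership check constant time on average.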

`affiliations.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "name": ...,  # Name of the institution
}
```
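An author's institution is reached by following `affiliation_id` from `authors.jsonl` into `affiliations.jsonl`. A minimal sketch with hypothetical helpers `name_lookup` and `affiliation_name`, returning `None` when the ID is missing or unknown:

```python
def name_lookup(records):
    """Build {id: name} from records that have "id" and "name" fields."""
    return {rec["id"]: rec["name"] for rec in records}


def affiliation_name(author, affiliations):
    """Resolve an author's affiliation name, or None if missing/unknown."""
    return affiliations.get(author.get("affiliation_id"))
```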

`conference_instances.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "name": ...,
    "conference_series_id": ...,
}
```

`conference_series.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "name": ...,
}
```
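A specific conference edition points to its series through `conference_series_id`, so resolving an instance to a series name is a two-step join. A minimal sketch, assuming the instances are indexed as `{id: record}` and the series as `{id: name}`; `series_name_for_instance` is a hypothetical helper.

```python
def series_name_for_instance(instance_id, instances, series_names):
    """Follow conference_instances -> conference_series to get the series name.

    instances:    {instance_id: conference_instances.jsonl record}
    series_names: {series_id: series name}
    Returns None if either link is missing.
    """
    instance = instances.get(instance_id)
    if instance is None:
        return None
    return series_names.get(instance.get("conference_series_id"))
```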

`journals.jsonl`

Each line is a JSON object of the form:

```python
{
    "id": ...,
    "name": ...,
}
```
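A document in `collection.jsonl` may carry a `journal_id`, a `conference_series_id`, or neither. The sketch below resolves a single venue name, preferring the journal; that preference and the assumption that absent IDs are `null` or missing are illustrative choices, and `venue_name` is a hypothetical helper.

```python
def venue_name(doc, journal_names, series_names):
    """Return the journal name if known, else the conference series name, else None.

    journal_names / series_names: {id: name} built from journals.jsonl
    and conference_series.jsonl. ASSUMPTION: unknown IDs are null or absent.
    """
    journal_id = doc.get("journal_id")
    if journal_id in journal_names:
        return journal_names[journal_id]
    return series_names.get(doc.get("conference_series_id"))
```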

`fields_of_study_hierarchies.jsonl`

The fields of study associated with the documents are organized in a hierarchical tree structure. Each line is a JSON object of the form:

```python
{
    "id": ...,
    "hierarchy": ...,
}
```

Owner

Name: Elias Bassani
Login: AmenRa
Kind: user
Location: Milan, Italy
Company: Joint Research Centre

Website: amenra.github.io/eliasbassani
Twitter: elias_bssn
Repositories: 28
Profile: https://github.com/AmenRa

Ph.D. in CS. I like Neural Networks, usability, efficiency, einsum, memes, and improperly used emojis. 🫠

GitHub Events

Total

Watch event: 2

Last Year

Watch event: 2

Dependencies

environment.yml pypi

krovetzstemmer *
pycld3 *
ranx *
