https://github.com/amenra/a-multi-domain-benchmark-for-personalized-search-evaluation
A Multi-domain Benchmark for Personalized Search Evaluation
Statistics
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 0
README.md
A Multi-domain Benchmark for Personalized Search Evaluation
We provide large-scale multi-domain benchmark datasets for Personalized Search.
The datasets can be found here.
Models' source code can be found here.
Pre-computed baseline runs are available on ranxhub.
Citation
Please cite the following paper if you use the data or code in this repo.
@inproceedings{bassani2022multi,
title={A Multi-Domain Benchmark for Personalized Search Evaluation},
author={Bassani, Elias and Kasela, Pranav and Raganato, Alessandro and Pasi, Gabriella},
booktitle={Proceedings of the 31st ACM International Conference on Information \& Knowledge Management},
pages={3822--3827},
year={2022}
}
Folder structure of each dataset
- train:
  - queries.jsonl
  - query_ids.txt
- val:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- test:
  - bm25_run.json
  - qrels.json
  - queries.jsonl
  - query_ids.txt
- collection.jsonl
- fos_hierarachies.jsonl
- in_refs.jsonl
- out_refs.jsonl
- has_authors.jsonl
- authors.jsonl
- affiliations.jsonl
- conference_instances.jsonl
- conference_series.jsonl
- journals.jsonl
- bm25_config.json
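After downloading a dataset, it can help to check that every file in the listing is present. The sketch below assumes the split files sit inside `train/`, `val/`, and `test/` sub-folders and that the remaining files sit at the dataset root. `missing_files` is a hypothetical helper, not part of the released code.

```python
from pathlib import Path

# Relative paths copied from the folder structure listed above.
# Assumption: split files are nested under train/, val/ and test/.
SPLIT_FILES = {
    "train": ["queries.jsonl", "query_ids.txt"],
    "val": ["bm25_run.json", "qrels.json", "queries.jsonl", "query_ids.txt"],
    "test": ["bm25_run.json", "qrels.json", "queries.jsonl", "query_ids.txt"],
}
TOP_LEVEL_FILES = [
    "collection.jsonl",
    # Spelled as in the listing; the file descriptions call it
    # fields_of_study_hierarchies.jsonl, so adjust to match your download.
    "fos_hierarachies.jsonl",
    "in_refs.jsonl",
    "out_refs.jsonl",
    "has_authors.jsonl",
    "authors.jsonl",
    "affiliations.jsonl",
    "conference_instances.jsonl",
    "conference_series.jsonl",
    "journals.jsonl",
    "bm25_config.json",
]


def missing_files(dataset_dir):
    """Return the expected relative paths that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    expected = [
        f"{split}/{name}"
        for split, names in SPLIT_FILES.items()
        for name in names
    ]
    expected += TOP_LEVEL_FILES
    return [rel for rel in expected if not (root / rel).is_file()]
```

An empty list from `missing_files("path/to/dataset")` means the layout matches the listing.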
File descriptions
queries.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "text": ...,
    "rel_doc_ids": ...,      # IDs of the relevant documents
    "user_id": ...,          # Same as `author_id` in other files
    "user_doc_ids": ...,     # IDs of the associated user documents
    "bm25_doc_ids": ...,     # IDs of the documents retrieved by BM25
    "bm25_doc_scores": ...,  # Scores assigned by BM25 to the retrieved documents
    "timestamp": ...
}
```
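As an illustration of how these fields fit together, the following sketch reads a queries file line by line and measures how many of a query's relevant documents appear among its BM25 candidates. It assumes `rel_doc_ids` and `bm25_doc_ids` are JSON arrays of IDs. `iter_jsonl` and `bm25_recall` are hypothetical helpers, not part of the released code.

```python
import json


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def bm25_recall(query):
    """Fraction of the query's relevant documents found in its BM25 candidates."""
    relevant = set(query["rel_doc_ids"])
    if not relevant:
        return 0.0
    retrieved = set(query["bm25_doc_ids"])
    return len(relevant & retrieved) / len(relevant)
```

For example, `[bm25_recall(q) for q in iter_jsonl("val/queries.jsonl")]` gives one recall value per validation query, which bounds what a re-ranker over the BM25 candidates can achieve.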
collection.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "title": ...,
    "text": ...,
    "keywords": ...,
    "fields_of_study": ...,
    "publication_date": ...,
    "timestamp": ...,
    "conference_instance_id": ...,
    "conference_series_id": ...,
    "journal_id": ...,
    "issue_id": ...,
    "volume": ...,
    "publisher": ...,
    "doi": ...
}
```
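A common first step is to turn the collection into an ID-to-text lookup. The sketch below concatenates `title` and `text` for each document. That concatenation is an illustrative choice, not the preprocessing confirmed for the baselines, and `load_collection_texts` is a hypothetical helper.

```python
import json


def load_collection_texts(path):
    """Map each document ID to its title and text joined by a space."""
    texts = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            doc = json.loads(line)
            # Treat missing or null fields as empty strings.
            title = doc.get("title") or ""
            body = doc.get("text") or ""
            texts[doc["id"]] = f"{title} {body}".strip()
    return texts
```

The whole mapping is held in memory, so for the larger collections a streaming or on-disk index may be a better fit.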
authors.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "name": ...,
    "affiliation_id": ...,
    "docs": [{"doc_id": "...", "timestamp": ...}, ...]
}
```
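Because each entry in `docs` carries a timestamp, an author's history at a given moment can be approximated by keeping only the documents dated before it. This is a minimal sketch, assuming timestamps are numeric and comparable across files (not stated above). `docs_before` is a hypothetical helper, not part of the released code.

```python
def docs_before(author, cutoff):
    """Return IDs of the author's documents dated strictly before `cutoff`.

    Assumption: `timestamp` values are numbers that can be compared directly.
    """
    return [d["doc_id"] for d in author["docs"] if d["timestamp"] < cutoff]
```

Passing a query's `timestamp` as `cutoff` yields the documents the user had authored before issuing that query, under the stated assumption.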
has_authors.jsonl
Each JSON line is as follows:
```python
{
    "doc_id": ...,
    "timestamp": ...,
    "author_ids": ["123678452", ...]
}
```
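Since `user_id` in the queries matches the author IDs used here, these records can be used to find a user's co-authors. The sketch below counts how often each co-author shares a document with a given user. `coauthor_counts` is a hypothetical helper, and the records are expected as already-parsed dictionaries.

```python
from collections import Counter


def coauthor_counts(has_authors_records, user_id):
    """Count shared documents between `user_id` and each of their co-authors."""
    counts = Counter()
    for rec in has_authors_records:
        author_ids = rec["author_ids"]
        if user_id in author_ids:
            # Every other author of this document is a co-author.
            counts.update(a for a in author_ids if a != user_id)
    return counts
```

`coauthor_counts(records, "123678452").most_common(5)` would then list the five most frequent co-authors of that user.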
in_refs.jsonl (incoming references)
Each JSON line is as follows:
```python
{
    "doc_id": ...,
    "in_refs": [{"doc_id": "...", "timestamp": ...}, ...]
}
```
out_refs.jsonl (outgoing references)
Each JSON line is as follows:
```python
{
    "doc_id": ...,
    "timestamp": ...,
    "out_refs": ["2048600620", ...]
}
```
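The two reference files describe the same citation graph from opposite directions. As a rough sketch, outgoing references can be inverted to obtain, for each cited document, the documents that cite it. The code carries the citing document's timestamp along. Whether that matches the `timestamp` stored in `in_refs.jsonl` entries is an assumption, and `invert_out_refs` is a hypothetical helper.

```python
from collections import defaultdict


def invert_out_refs(out_ref_records):
    """Map each cited document ID to a list of its citing documents.

    Assumption: the timestamp attached to each citing entry is the
    timestamp of the citing document's own out_refs record.
    """
    incoming = defaultdict(list)
    for rec in out_ref_records:
        for cited_id in rec["out_refs"]:
            incoming[cited_id].append(
                {"doc_id": rec["doc_id"], "timestamp": rec["timestamp"]}
            )
    return dict(incoming)
```

`len(invert_out_refs(records).get(doc_id, []))` then gives a citation count for `doc_id` restricted to the records supplied.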
affiliations.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "name": ...  # Name of the institution
}
```
conference_instances.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "name": ...,
    "conference_series_id": ...
}
```
conference_series.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "name": ...
}
```
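Conference instances point to their series through `conference_series_id`, so the two files can be joined with a simple ID index. A minimal sketch over already-parsed records follows. `index_by_id` and `series_name_for_instance` are hypothetical helpers, not part of the released code.

```python
def index_by_id(records):
    """Build an ID-to-record lookup from records that each have an "id" key."""
    return {rec["id"]: rec for rec in records}


def series_name_for_instance(instance, series_by_id):
    """Return the name of the series an instance belongs to, or None if unknown."""
    series = series_by_id.get(instance["conference_series_id"])
    return series["name"] if series is not None else None
```

The same `index_by_id` helper applies to `journals.jsonl` and `affiliations.jsonl`, which share the `id`/`name` shape.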
journals.jsonl
Each JSON line is as follows:
```python
{
    "id": ...,
    "name": ...
}
```
fields_of_study_hierarchies.jsonl
The fields of study associated with the documents are organized as a hierarchical tree.
Each JSON line is as follows:
```python
{
    "id": ...,
    "hierarchy": ...
}
```
Owner
- Name: Elias Bassani
- Login: AmenRa
- Kind: user
- Location: Milan, Italy
- Company: Joint Research Centre
- Website: amenra.github.io/eliasbassani
- Twitter: elias_bssn
- Repositories: 28
- Profile: https://github.com/AmenRa
Ph.D. in CS. I like Neural Networks, usability, efficiency, einsum, memes, and improperly used emojis. 🫠
Dependencies
- krovetzstemmer *
- pycld3 *
- ranx *