horizon-dataset
The open-sourced event forecasting dataset used to evaluate Horizon and similar systems.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Serendipity-AI
- License: apache-2.0
- Default Branch: master
- Size: 34.2 KB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Horizon Dataset
The Horizon Dataset is an open-source, high-quality dataset for evaluating event forecasting systems such as the one described in Soru and Marshall (2025). It contains 150 structured real-world events across 15 topical domains, each annotated with a factual outcome that was checked after the forecasting date.
Overview
- Name: Horizon Dataset
- Format: JSON
- Forecasting Date: 16 Feb 2024
- Fact-checking Date: 28 Oct 2024
- Total Events: 150
- Topics:
- Arctic
- Artificial Intelligence
- Automotive
- Blockchain
- Climate Change
- Cybersecurity
- Disinformation
- Egypt
- Energy
- European Union
- Geopolitics
- Italy
- Space
- United Kingdom
- United States
Schema
Each JSON object contains the following fields:
- "id": Unique identifier for the event.
- "topic": Topic category (see the list above).
- "title": A short name for the forecasted event.
- "description": A short textual description of the forecasted event.
- "valid": False if the event had already happened before the forecasting date.
- "timeframe": The timeframe against which the event probability should be computed.
- "outcome": Outcome label relative to the forecasting and fact-checking window:
  - 1 = Event occurred
  - -1 = Event did not occur
  - 0 = Not enough evidence to confirm either
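The fields above map naturally onto a small typed record. The following is a minimal sketch rather than code shipped with the dataset: the `Event` class, the `parse_event` helper, and the `OUTCOME_LABELS` mapping are hypothetical names introduced for illustration, and the integer type for "timeframe" is an assumption based on the example record.

```python
from dataclasses import dataclass

# Outcome codes as documented in the schema.
OUTCOME_LABELS = {1: "occurred", -1: "did not occur", 0: "not enough evidence"}


@dataclass(frozen=True)
class Event:
    """One dataset record (hypothetical helper type, not part of the dataset)."""

    id: int
    topic: str
    title: str
    description: str
    valid: bool
    timeframe: int  # assumption: an integer, as in the README example
    outcome: int


def parse_event(record: dict) -> Event:
    """Build an Event from one JSON object and reject unknown outcome codes."""
    event = Event(**record)  # raises TypeError on missing or unexpected keys
    if event.outcome not in OUTCOME_LABELS:
        raise ValueError(f"unexpected outcome {event.outcome!r} for event {event.id}")
    return event
```

Calling `OUTCOME_LABELS[event.outcome]` on a parsed record then gives a readable label for reports.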
Example
```json
{
  "id": 0,
  "topic": "Italy",
  "title": "Italy hosts a major international summit on AI regulation in Rome",
  "description": "Italy, leveraging its strategic position in the EU and its current focus on digital market regulation, hosts a significant international summit aimed at advancing discussions on a unified regulatory framework for artificial intelligence.",
  "valid": true,
  "timeframe": 1723593600,
  "outcome": 1
}
```
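The "timeframe" value in this example looks like a Unix timestamp in seconds, although the README does not say so explicitly. Under that assumption, a short sketch for turning it into a readable date could look like this (`timeframe_to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def timeframe_to_utc(timeframe: int) -> datetime:
    """Interpret a timeframe value as Unix seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(timeframe, tz=timezone.utc)


# The example record's timeframe falls between the forecasting date
# (16 Feb 2024) and the fact-checking date (28 Oct 2024).
print(timeframe_to_utc(1723593600).date().isoformat())  # 2024-08-14
```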
Usage
This dataset is intended for use in:
- Forecast evaluation and benchmarking
- Long-term reasoning with language models
- Temporal knowledge extraction
- Strategic foresight and geopolitical analysis
There are no usage restrictions.
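For forecast evaluation, one common scoring rule is the Brier score. The sketch below is an illustration only and is not claimed to be the metric used in the accompanying papers. It assumes that forecasts are supplied as a hypothetical `forecasts` mapping from event "id" to a probability in [0, 1], that events with "valid" set to false are skipped, and that events with outcome 0 (not enough evidence) are excluded from scoring.

```python
def brier_score(events, forecasts):
    """Mean squared error between forecast probabilities and resolved outcomes.

    events:    iterable of dicts following the dataset schema
    forecasts: dict mapping event id -> probability in [0, 1] (hypothetical input)
    """
    squared_errors = []
    for event in events:
        # Skip events that predate the forecasting date or lack a clear outcome.
        if not event["valid"] or event["outcome"] == 0:
            continue
        # Skip events for which no forecast was provided.
        if event["id"] not in forecasts:
            continue
        target = 1.0 if event["outcome"] == 1 else 0.0  # -1 maps to 0.0
        squared_errors.append((forecasts[event["id"]] - target) ** 2)
    if not squared_errors:
        raise ValueError("no resolved events with forecasts to score")
    return sum(squared_errors) / len(squared_errors)
```

Lower values are better, and a forecaster that always answers 0.5 scores 0.25 on resolved events. Whether to drop or otherwise handle outcome 0 is a design choice that should be reported alongside any results.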
Download
You can download the dataset directly from this GitHub repository.
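Once the file is downloaded, it can be read with the standard library. This is a minimal sketch under two assumptions that the README does not confirm: that the dataset is a single file containing a top-level JSON array of event objects, and that it is saved locally under the hypothetical name `horizon_dataset.json`.

```python
import json
from collections import Counter
from pathlib import Path


def load_events(path):
    """Load events from a JSON file assumed to hold a top-level array."""
    with Path(path).open(encoding="utf-8") as handle:
        events = json.load(handle)
    if not isinstance(events, list):
        raise ValueError("expected a top-level JSON array of events")
    return events


def events_per_topic(events):
    """Count how many events belong to each topic."""
    return Counter(event["topic"] for event in events)


# Example usage (the file name is a placeholder, not the repository's actual path):
# events = load_events("horizon_dataset.json")
# print(events_per_topic(events).most_common())
```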
Citation
If you use the Horizon Dataset in your work, please cite the following:
```bibtex
@inproceedings{11036314,
  author    = {Soru, Tommaso and Marshall, Jim},
  booktitle = {2025 19th International Conference on Semantic Computing (ICSC)},
  title     = {{Anticipating the Future with Large Language Models}},
  year      = {2025},
  url       = {https://doi.ieeecomputersociety.org/10.1109/ICSC64641.2025.00038}
}
```
Additional Information
This dataset was compiled as part of the Horizon project to support research on temporal forecasting with large language models. For methodology, data construction, and evaluation details, refer to the accompanying papers:
- 👉 Anticipating the Future with Large Language Models
- 👉 Leveraging Log Probabilities in Language Models to Forecast Future Events
License
Licensed under the Apache License 2.0.
Limitations
- The dataset includes a small number of events (150), curated for quality rather than coverage.
- Topics are diverse but do not exhaustively represent all domains or regions.
- Outcome labels reflect available evidence as of 28 Oct 2024 and may not capture long-term developments.
Owner
- Name: Serendipity AI
- Login: Serendipity-AI
- Kind: organization
- Location: London, UK
- Website: https://www.serendipityai.co.uk
- Repositories: 2
- Profile: https://github.com/Serendipity-AI
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3