horizon-dataset

The open-sourced event forecasting dataset used to evaluate Horizon and similar systems.

https://github.com/serendipity-ai/horizon-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: found 3 DOI reference(s) in README
✓ Academic publication links: links to arxiv.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.0%) to scientific vocabulary


Basic Info

Host: GitHub
Owner: Serendipity-AI
License: apache-2.0
Default Branch: master
Size: 34.2 KB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 11 months ago

Horizon Dataset

The Horizon Dataset is an open-source, high-quality dataset for evaluating event forecasting systems such as the one described in Soru and Marshall (2025). It contains 150 structured real-world events across 15 topical domains, each annotated with a factual outcome that was fact-checked after the forecasting date.

Overview

Name: Horizon Dataset
Format: JSON
Forecasting Date: 16 Feb 2024
Fact-checking Date: 28 Oct 2024
Total Events: 150
Topics:
- Arctic
- Artificial Intelligence
- Automotive
- Blockchain
- Climate Change
- Cybersecurity
- Disinformation
- Egypt
- Energy
- European Union
- Geopolitics
- Italy
- Space
- United Kingdom
- United States

Schema

Each JSON object contains the following fields:

"id": Unique identifier for the event.
"topic": Topic category (see above).
"title": A short name of the forecasted event.
"description": A short textual description of the forecasted event.
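"valid": false if the event had already occurred before the forecasting date (16 Feb 2024), true otherwise.
"timeframe": The deadline against which the event probability should be computed. In the example below it is an integer that appears to be a Unix timestamp in seconds.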
"outcome": Outcome label relative to the forecasting and fact-checking window:
- 1 = Event occurred
- -1 = Event did not occur
- 0 = Not enough evidence to confirm either
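
The schema above can be checked mechanically when loading the data. The following is a minimal sketch, not the authors' loader: it assumes the dataset is a JSON array of objects with exactly the seven fields listed, and the names `Event`, `parse_event`, and `parse_events` are hypothetical.

```python
import json
from dataclasses import dataclass

# Outcome labels as documented in the schema.
OUTCOME_LABELS = {1: "occurred", -1: "did not occur", 0: "not enough evidence"}


@dataclass(frozen=True)
class Event:
    """One dataset record (hypothetical container, field names from the schema)."""
    id: int
    topic: str
    title: str
    description: str
    valid: bool
    timeframe: int
    outcome: int


def parse_event(record: dict) -> Event:
    # Event(**record) raises TypeError on missing or unexpected fields.
    event = Event(**record)
    if event.outcome not in OUTCOME_LABELS:
        raise ValueError(f"unexpected outcome {event.outcome!r} for event {event.id}")
    return event


def parse_events(text: str) -> list[Event]:
    # Assumption: the file content is a JSON array of event objects.
    return [parse_event(record) for record in json.loads(text)]


# A shortened stand-in record, not real dataset content.
sample = (
    '[{"id": 0, "topic": "Italy", "title": "t", "description": "d", '
    '"valid": true, "timeframe": 1723593600, "outcome": 1}]'
)
events = parse_events(sample)
print(OUTCOME_LABELS[events[0].outcome])
```

To load the real data, pass the text of the downloaded JSON file to `parse_events`.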

Example

```json
{
  "id": 0,
  "topic": "Italy",
  "title": "Italy hosts a major international summit on AI regulation in Rome",
  "description": "Italy, leveraging its strategic position in the EU and its current focus on digital market regulation, hosts a significant international summit aimed at advancing discussions on a unified regulatory framework for artificial intelligence.",
  "valid": true,
  "timeframe": 1723593600,
  "outcome": 1
}
```
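
The "timeframe" value in the example, 1723593600, looks like a Unix timestamp. Assuming it counts seconds since the epoch in UTC (the README does not state the unit), it can be converted to a calendar date as sketched below. The helper name `timeframe_to_utc` is hypothetical.

```python
from datetime import datetime, timezone


def timeframe_to_utc(timeframe: int) -> datetime:
    """Interpret a timeframe value as Unix seconds in UTC (an assumption)."""
    return datetime.fromtimestamp(timeframe, tz=timezone.utc)


deadline = timeframe_to_utc(1723593600)
print(deadline.date().isoformat())  # 2024-08-14
```

Under that assumption, the example event's deadline falls between the forecasting date (16 Feb 2024) and the fact-checking date (28 Oct 2024).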

Usage

This dataset is intended for use in:

Forecast evaluation and benchmarking
Long-term reasoning with language models
Temporal knowledge extraction
Strategic foresight and geopolitical analysis

There are no usage restrictions beyond the terms of the license (see License).
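
For forecast evaluation, one common choice is the Brier score, the mean squared error between predicted probabilities and binary outcomes. The sketch below is an illustration only and is not claimed to be the metric used in the accompanying paper. It assumes that events with "valid" set to false or "outcome" equal to 0 are excluded from scoring, and that forecasts are supplied as a mapping from event id to probability. The function name `brier_score` and the sample records are hypothetical.

```python
def brier_score(forecasts: dict[int, float], events: list[dict]) -> float:
    """Mean squared error between forecast probabilities and outcomes.

    Skips events that are invalid, unresolved (outcome 0), or not forecast.
    """
    squared_errors = []
    for event in events:
        if not event["valid"] or event["outcome"] == 0:
            continue  # assumption: these events are not scorable
        if event["id"] not in forecasts:
            continue
        # Map the dataset labels to binary targets: 1 -> 1.0, -1 -> 0.0.
        target = 1.0 if event["outcome"] == 1 else 0.0
        squared_errors.append((forecasts[event["id"]] - target) ** 2)
    if not squared_errors:
        raise ValueError("no scorable events")
    return sum(squared_errors) / len(squared_errors)


# Made-up records and probabilities, for illustration only.
events = [
    {"id": 0, "valid": True, "outcome": 1},
    {"id": 1, "valid": True, "outcome": -1},
    {"id": 2, "valid": True, "outcome": 0},    # unresolved, skipped
    {"id": 3, "valid": False, "outcome": 1},   # invalid, skipped
]
forecasts = {0: 0.8, 1: 0.3, 2: 0.5, 3: 0.9}
score = brier_score(forecasts, events)
print(round(score, 3))  # 0.065
```

Lower scores are better; a forecaster that always answers 0.5 scores 0.25 on any set of resolved events.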

Download

You can download the dataset directly from the GitHub repository: https://github.com/serendipity-ai/horizon-dataset

Citation

If you use the Horizon Dataset in your work, please cite the following:

```bibtex
@INPROCEEDINGS {11036314,
  author = {Soru, Tommaso and Marshall, Jim},
  booktitle = {2025 19th International Conference on Semantic Computing (ICSC)},
  title = {{Anticipating the Future with Large Language Models}},
  year = {2025},
  url = {https://doi.ieeecomputersociety.org/10.1109/ICSC64641.2025.00038}
}
```

Additional Information

This dataset was compiled as part of the Horizon project to support research on temporal forecasting with large language models. For methodology, data construction, and evaluation details, refer to the accompanying paper listed under Citation.

License

Licensed under the Apache License 2.0.

Limitations

The dataset includes a small number of events (150), curated for quality rather than coverage.
Topics are diverse but do not exhaustively represent all domains or regions.
Outcome labels reflect available evidence as of 28 Oct 2024 and may not capture long-term developments.

Owner

Name: Serendipity AI
Login: Serendipity-AI
Kind: organization
Location: London, UK

Website: https://www.serendipityai.co.uk
Repositories: 2
Profile: https://github.com/Serendipity-AI
