horizon-dataset

The open-sourced event forecasting dataset used to evaluate Horizon and similar systems.

https://github.com/serendipity-ai/horizon-dataset

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Serendipity-AI
  • License: apache-2.0
  • Default Branch: master
  • Size: 34.2 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Horizon Dataset

The Horizon Dataset is an open-source, high-quality dataset for evaluating event forecasting systems such as the one described in Soru and Marshall (2025). It contains 150 structured real-world events across 15 topical domains, each annotated with a factual outcome that was fact-checked after the forecasting date.

Overview

  • Name: Horizon Dataset
  • Format: JSON
  • Forecasting Date: 16 Feb 2024
  • Fact-checking Date: 28 Oct 2024
  • Total Events: 150
  • Topics:
    • Arctic
    • Artificial Intelligence
    • Automotive
    • Blockchain
    • Climate Change
    • Cybersecurity
    • Disinformation
    • Egypt
    • Energy
    • European Union
    • Geopolitics
    • Italy
    • Space
    • United Kingdom
    • United States

Schema

Each JSON object contains the following fields:

  • "id": Unique identifier for the event.
  • "topic": Topic category (see above).
  • "title": A short name of the forecasted event.
  • "description": A short textual description of the forecasted event.
  • "valid": false if the event had already occurred before the forecasting date; true otherwise.
  • "timeframe": The timeframe against which the event probability should be computed.
  • "outcome": Outcome label relative to the forecasting and fact-checking window:
    • 1 = Event occurred
    • -1 = Event did not occur
    • 0 = Not enough evidence to confirm either
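
The field list above can be turned into a small validation helper. The following is a minimal sketch and not part of the repository: the `Outcome` enum, the `validate_event` function, and the Python types assumed for each field (integer `id` and `timeframe`, boolean `valid`) are illustrative and inferred from the example record in this README.

```python
from enum import IntEnum
from typing import Any


class Outcome(IntEnum):
    """Outcome labels as documented in the schema."""
    OCCURRED = 1
    DID_NOT_OCCUR = -1
    UNDETERMINED = 0


VALID_OUTCOMES = {member.value for member in Outcome}

# Field names come from the schema; the Python types are assumptions.
EXPECTED_TYPES = {
    "id": int,
    "topic": str,
    "title": str,
    "description": str,
    "valid": bool,
    "timeframe": int,
    "outcome": int,
}


def validate_event(event: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in event:
            problems.append(f"missing field: {field}")
            continue
        value = event[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            problems.append(f"wrong type for {field}: bool")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    outcome = event.get("outcome")
    if isinstance(outcome, int) and not isinstance(outcome, bool):
        if outcome not in VALID_OUTCOMES:
            problems.append(f"unknown outcome: {outcome}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.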

Example

```json
{
  "id": 0,
  "topic": "Italy",
  "title": "Italy hosts a major international summit on AI regulation in Rome",
  "description": "Italy, leveraging its strategic position in the EU and its current focus on digital market regulation, hosts a significant international summit aimed at advancing discussions on a unified regulatory framework for artificial intelligence.",
  "valid": true,
  "timeframe": 1723593600,
  "outcome": 1
}
```
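
The `timeframe` value in the example looks like a Unix timestamp in seconds. The schema does not state this explicitly, so treat it as an assumption. Under that assumption, the sketch below converts the value to a UTC date; `timeframe_to_date` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def timeframe_to_date(timeframe: int) -> datetime:
    # Assumption: `timeframe` is seconds since the Unix epoch, in UTC.
    return datetime.fromtimestamp(timeframe, tz=timezone.utc)


deadline = timeframe_to_date(1723593600)
print(deadline.date().isoformat())  # prints "2024-08-14"
```

Read this way, the example's timeframe falls between the forecasting date (16 Feb 2024) and the fact-checking date (28 Oct 2024).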

Usage

This dataset is intended for use in:

  • Forecast evaluation and benchmarking
  • Long-term reasoning with language models
  • Temporal knowledge extraction
  • Strategic foresight and geopolitical analysis

There are no usage restrictions.
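
As an illustration of the forecast evaluation use case, one common metric is the Brier score, the mean squared error between predicted probabilities and binary outcomes. The sketch below is not necessarily the metric used in the accompanying paper. The `brier_score` function, the `forecasts` mapping from event `id` to probability, and the decision to skip records with `valid` set to false or `outcome` equal to 0 are all assumptions.

```python
def brier_score(events, forecasts):
    """Mean squared error between forecast probabilities and outcomes.

    events:    iterable of records following the schema.
    forecasts: dict mapping event id -> predicted probability in [0, 1]
               (a hypothetical input format).
    """
    squared_errors = []
    for event in events:
        # Assumption: skip events flagged invalid or lacking a clear outcome.
        if not event["valid"] or event["outcome"] == 0:
            continue
        # Events without a forecast are ignored rather than penalized.
        if event["id"] not in forecasts:
            continue
        # Map the dataset's labels (1 / -1) onto binary targets (1.0 / 0.0).
        target = 1.0 if event["outcome"] == 1 else 0.0
        squared_errors.append((forecasts[event["id"]] - target) ** 2)
    if not squared_errors:
        raise ValueError("no scorable events")
    return sum(squared_errors) / len(squared_errors)
```

Lower is better: a forecaster that always predicts 0.5 scores 0.25, and a perfect forecaster scores 0.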

Download

You can download the dataset directly from this GitHub repository.
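
Once downloaded, the data can be loaded with the Python standard library. This sketch assumes the file is a single JSON array of event objects, which this README does not confirm, and takes the path as a parameter because the file name is not stated here. `load_events` and `count_by_topic` are hypothetical helpers.

```python
import json
from collections import Counter
from pathlib import Path


def load_events(path):
    """Read a JSON array of event records from `path`."""
    with Path(path).open(encoding="utf-8") as handle:
        events = json.load(handle)
    # Assumption: the top-level JSON value is an array of records.
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    return events


def count_by_topic(events):
    """Count events per topic, e.g. to inspect coverage of the 15 topics."""
    return Counter(event["topic"] for event in events)
```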

Citation

If you use the Horizon Dataset in your work, please cite the following:

```bibtex
@INPROCEEDINGS {11036314,
  author = {Soru, Tommaso and Marshall, Jim},
  booktitle = {2025 19th International Conference on Semantic Computing (ICSC)},
  title = {{Anticipating the Future with Large Language Models}},
  year = {2025},
  url = {https://doi.ieeecomputersociety.org/10.1109/ICSC64641.2025.00038}
}
```

Additional Information

This dataset was compiled as part of the Horizon project to support research on temporal forecasting with large language models. For methodology, data construction, and evaluation details, refer to the accompanying papers, including Soru and Marshall (2025), cited in the Citation section.

License

Licensed under the Apache License 2.0.

Limitations

  • The dataset includes a small number of events (150), curated for quality rather than coverage.
  • Topics are diverse but do not exhaustively represent all domains or regions.
  • Outcome labels reflect available evidence as of 28 Oct 2024 and may not capture long-term developments.

Owner

  • Name: Serendipity AI
  • Login: Serendipity-AI
  • Kind: organization
  • Location: London, UK

Citation (CITATION.cff)

@INPROCEEDINGS {11036314,
  author = {Soru, Tommaso and Marshall, Jim},
  booktitle = {2025 19th International Conference on Semantic Computing (ICSC)},
  title = {{Anticipating the Future with Large Language Models}},
  year = {2025},
  url = {https://doi.ieeecomputersociety.org/10.1109/ICSC64641.2025.00038}
}

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3