sn-echoes

Official repo for the paper: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

https://github.com/soccernet/sn-echoes

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.2%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
Statistics
  • Stars: 12
  • Watchers: 5
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 1 year ago
Metadata Files
  • Readme
  • Citation

README.md

SoccerNet-Echoes

Official repo for the paper: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.

Dataset

The Dataset directory is organized by ASR model version, then by league, season, and game. Each game folder contains one JSON file per half of the match (1_asr.json and 2_asr.json) holding the transcribed commentary, or its English translation in the translated folders.

```
📂 Dataset
├── 📁 whisper_v1
│   ├── 🏆 england_epl
│   │   ├── 📅 2014-2015
│   │   ├── 📅 2015-2016
│   │   │   └── ⚽ 2016-03-02 - 23-00 Liverpool 3 - 0 Manchester City
│   │   │       ├── ☁️ 1_asr.json
│   │   │       └── ☁️ 2_asr.json
│   │   └── ...
│   ├── 🏆 europe_uefa-champions-league
│   └── ...
├── 📁 whisper_v1_en
│   └── ...
├── 📁 whisper_v2
│   └── ...
├── 📁 whisper_v2_en
│   └── ...
└── 📁 whisper_v3
    └── ...
```

  • whisper_v1: ASR transcriptions from Whisper v1.
  • whisper_v1_en: English translations of the Whisper v1 transcriptions.
  • whisper_v2: ASR transcriptions from Whisper v2.
  • whisper_v2_en: English translations of the Whisper v2 transcriptions.
  • whisper_v3: ASR transcriptions from Whisper v3.
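As a rough sketch, assuming a local copy of the repository whose Dataset folder follows the layout above (Dataset/<model version>/<league>/<season>/<game>/<half>_asr.json), the ASR files can be enumerated with Python's standard library. The names AsrFile, iter_asr_files, and parse_game_name are hypothetical helpers, not part of the repository, and the game-folder regex is an assumption inferred from the single example game shown in the tree.

```python
import re
from pathlib import Path
from typing import Iterator, NamedTuple

# Assumed game folder format, inferred from the one example in the tree:
# "<YYYY-MM-DD> - <HH-MM> <home team> <home goals> - <away goals> <away team>"
GAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) - (?P<time>\d{2}-\d{2}) "
    r"(?P<home>.+?) (?P<home_goals>\d+) - (?P<away_goals>\d+) (?P<away>.+)$"
)


class AsrFile(NamedTuple):
    model: str    # e.g. "whisper_v1" or "whisper_v2_en"
    league: str   # e.g. "england_epl"
    season: str   # e.g. "2015-2016"
    game: str     # raw game folder name
    half: int     # 1 or 2, parsed from the "<half>_asr.json" file name
    path: Path


def iter_asr_files(root: Path) -> Iterator[AsrFile]:
    """Yield every *_asr.json file located four folder levels below root."""
    for path in sorted(root.glob("*/*/*/*/*_asr.json")):
        model, league, season, game = path.relative_to(root).parts[:4]
        half = int(path.name.split("_", 1)[0])
        yield AsrFile(model, league, season, game, half, path)


def parse_game_name(game: str) -> dict:
    """Split a game folder name into date, kick-off time, teams, and score."""
    match = GAME_RE.match(game)
    if match is None:
        raise ValueError(f"unrecognized game folder name: {game!r}")
    info = match.groupdict()
    info["home_goals"] = int(info["home_goals"])
    info["away_goals"] = int(info["away_goals"])
    return info
```

Filtering the yielded records on model (for example, keeping only whisper_v3) selects a single ASR version; the lazy match on the home team lets names that end in digits still split correctly before the score.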

Each JSON file has the following format:

```
{
  "segments": {
    "<segment index>": [<start time in seconds>, <end time in seconds>, "<transcribed text>"],
    ...
  }
}
```

The top-level object contains segments, an object keyed by segment index. Because JSON keys are strings, the indices appear as "0", "1", "2", and so on. Each index maps to a three-element array:

  • start time: a number giving the start of the segment in seconds.
  • end time: a number giving the end of the segment in seconds.
  • text: a string containing the ASR transcription of the commentary segment.
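A minimal loader for one of these files might look like the following. It assumes each segment value is a three-element [start, end, text] array as described above; Segment, load_segments, and segments_between are hypothetical helper names, not part of the repository. Since JSON object keys are strings, the sketch sorts segments by their integer index rather than by key text, where "10" would otherwise sort before "2".

```python
import json
from pathlib import Path
from typing import List, NamedTuple


class Segment(NamedTuple):
    index: int
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str


def load_segments(path: Path) -> List[Segment]:
    """Read one *_asr.json file into a list of segments ordered by index."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    segments = [
        # Strip surrounding whitespace from the transcribed text.
        Segment(int(key), float(start), float(end), text.strip())
        for key, (start, end, text) in data["segments"].items()
    ]
    # Sort numerically on the parsed index, not lexically on the string key.
    segments.sort(key=lambda s: s.index)
    return segments


def segments_between(segments: List[Segment], t0: float, t1: float) -> List[Segment]:
    """Return segments that overlap the open time window (t0, t1), in seconds."""
    return [s for s in segments if s.start < t1 and s.end > t0]
```

A window query like segments_between is one way to pull the commentary around a given timestamp, for instance to pair text with an annotated event when experimenting with action spotting or captioning.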

Citation

Please cite our work if you use the SoccerNet-Echoes dataset:


@misc{gautam2024soccernetechoes,
      title={SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset}, 
      author={Sushant Gautam and Mehdi Houshmand Sarkhoosh and Jan Held and Cise Midoglu and Anthony Cioppa and Silvio Giancola and Vajira Thambawita and Michael A. Riegler and Pål Halvorsen and Mubarak Shah},
      year={2024},
      eprint={2405.07354},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      doi={10.48550/arXiv.2405.07354}
}

Owner

  • Name: SoccerNet
  • Login: SoccerNet
  • Kind: organization
  • Email: soccernet@uliege.be

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this dataset in your work, please cite it."
title: "SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset"
authors:
  - family-names: "Gautam"
    given-names: "Sushant"
    affiliation: "SimulaMet, OsloMet"
    country: "NO"
  - family-names: "Houshmand Sarkhoosh"
    given-names: "Mehdi"
    affiliation: "OsloMet, Forzasys"
    country: "NO"
  - family-names: "Held"
    given-names: "Jan"
    affiliation: "University of Liège"
    country: "BE"
  - family-names: "Midoglu"
    given-names: "Cise"
    affiliation: "SimulaMet, Forzasys"
    country: "NO"
  - family-names: "Cioppa"
    given-names: "Anthony"
    affiliation: "University of Liège, KAUST"
    country: "BE"
  - family-names: "Giancola"
    given-names: "Silvio"
    affiliation: "KAUST"
    country: "SA"
  - family-names: "Thambawita"
    given-names: "Vajira"
    affiliation: "SimulaMet"
    country: "NO"
  - family-names: "Riegler"
    given-names: "Michael A."
    affiliation: "SimulaMet"
    country: "NO"
  - family-names: "Halvorsen"
    given-names: "Pål"
    affiliation: "SimulaMet, OsloMet, Forzasys"
    country: "NO"
  - family-names: "Shah"
    given-names: "Mubarak"
    affiliation: "UCF Center for Research in Computer Vision"
    country: "US"
version: 1.0
abstract: "This repository contains the SoccerNet-Echoes dataset, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games."
doi: "10.48550/arXiv.2405.07354"
keywords:
  - football
  - soccer
  - dataset
  - match
  - game
  - sports
repository-code: "https://github.com/SoccerNet/sn-echoes"

GitHub Events

Total
  • Watch event: 7
  • Push event: 1
Last Year
  • Watch event: 7
  • Push event: 1