sn-echoes
Official repo for the paper: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SoccerNet
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2405.07354
- Size: 145 MB
Statistics
- Stars: 12
- Watchers: 5
- Forks: 3
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
SoccerNet-Echoes
Official repo for the paper: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset.
Dataset
Inside the Dataset directory, folders are nested by Whisper version, league, season, and game. Each game folder holds JSON files containing the transcribed (and, in the en variants, English-translated) game commentary.
```
📂 Dataset
├── 📁 whisperv1
│   ├── 🏆 englandepl
│   │   ├── 📅 2014-2015
│   │   │   └── ⚽ 2016-03-02 - 23-00 Liverpool 3 - 0 Manchester City
│   │   │       ├── ☁️ 1asr.json
│   │   │       └── ☁️ 2asr.json
│   │   ├── 📅 2015-2016
│   │   └── ...
│   ├── 🏆 europeuefa-champions-league
│   └── ...
├── 📁 whisperv1en
│   └── ...
├── 📁 whisperv2
│   └── ...
├── 📁 whisperv2en
│   └── ...
├── 📁 whisper_v3
│   └── ...
```
- whisperv1: Contains ASR from Whisper v1.
- whisperv1en: English-translated datasets from Whisper v1.
- whisperv2: Contains ASR from Whisper v2.
- whisperv2en: English-translated datasets from Whisper v2.
- whisper_v3: Contains ASR from Whisper v3.
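As a rough illustration of how the folder layout above could be walked, the sketch below splits a path into its Whisper version, league, season, game, and file name. The names `CommentaryFile`, `describe`, and `iter_commentary_files` are hypothetical helpers, not part of this repository, and the folder names are taken exactly as shown in the tree, so they may need adjusting to match what is on disk.

```python
from pathlib import Path, PurePosixPath
from typing import Iterator, NamedTuple


class CommentaryFile(NamedTuple):
    """One JSON file, described by its position in the folder hierarchy."""
    variant: str   # e.g. "whisperv1" (name as shown in the tree above)
    league: str
    season: str
    game: str
    filename: str


def describe(relative_path: str) -> CommentaryFile:
    """Split 'variant/league/season/game/file.json' into its five components."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 5:
        raise ValueError(f"expected 5 path components, got {len(parts)}: {relative_path}")
    return CommentaryFile(*parts)


def iter_commentary_files(root: str) -> Iterator[CommentaryFile]:
    """Yield one record per JSON file found four directory levels below root."""
    root_path = Path(root)
    for path in sorted(root_path.glob("*/*/*/*/*.json")):
        yield describe(path.relative_to(root_path).as_posix())


# Example using the path shown in the tree above.
example = describe(
    "whisperv1/englandepl/2014-2015/"
    "2016-03-02 - 23-00 Liverpool 3 - 0 Manchester City/1asr.json"
)
print(example.league, example.season, example.filename)
```

Calling `iter_commentary_files("Dataset")` from the repository root would, under the same assumptions, enumerate every commentary file across all Whisper versions.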
Each JSON file has the following format:
```python
{
    "segments": {
        segment index (int): [
            start time in seconds (float),
            end time in seconds (float),
            transcribed text from ASR
        ],
        ...
    }
}
```
The top-level object has a key named segments. Its value is an object in which each key is a unique segment index (e.g., "0", "1", "2", etc.). Each segment index maps to an array holding the following values, in order:
```python
starttime: A number representing the starting time of the segment in seconds.
endtime: A number representing the ending time of the segment in seconds.
text: A string containing the textual content of the commentary segment.
```
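A minimal sketch of reading one of these files, assuming the layout described above (a `segments` object whose values are `[start, end, text]` arrays). The `Segment` type and the `parse_segments` and `load_segments` functions are hypothetical helpers introduced for illustration, and the sample data is made up rather than taken from the dataset.

```python
import json
from typing import List, NamedTuple


class Segment(NamedTuple):
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_segments(data: dict) -> List[Segment]:
    """Turn {"segments": {index: [start, end, text]}} into a list sorted by index."""
    segments = [
        Segment(int(idx), float(start), float(end), str(text))
        for idx, (start, end, text) in data["segments"].items()
    ]
    return sorted(segments, key=lambda s: s.index)


def load_segments(path: str) -> List[Segment]:
    """Read a commentary JSON file from disk and parse its segments."""
    with open(path, encoding="utf-8") as f:
        return parse_segments(json.load(f))


# Made-up sample in the documented layout (not real dataset content).
sample = json.loads(
    '{"segments": {"1": [2.5, 4.0, "second line"], "0": [0.0, 2.5, "first line"]}}'
)
parsed = parse_segments(sample)
for seg in parsed:
    print(f"[{seg.start:.1f}s - {seg.end:.1f}s] {seg.text}")
```

Sorting by the integer index matters because JSON object keys are strings, so relying on key order or string sorting would place "10" before "2".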
Citation
Please cite our work if you use the SoccerNet-Echoes dataset:
@misc{gautam2024soccernetechoes,
  title={SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset},
  author={Sushant Gautam and Mehdi Houshmand Sarkhoosh and Jan Held and Cise Midoglu and Anthony Cioppa and Silvio Giancola and Vajira Thambawita and Michael A. Riegler and Pål Halvorsen and Mubarak Shah},
  year={2024},
  eprint={2405.07354},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  doi={10.48550/arXiv.2405.07354}
}
Owner
- Name: SoccerNet
- Login: SoccerNet
- Kind: organization
- Email: soccernet@uliege.be
- Website: www.soccer-net.org
- Repositories: 14
- Profile: https://github.com/SoccerNet
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this dataset in your work, please cite it."
title: "SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset"
authors:
  - family-names: "Gautam"
    given-names: "Sushant"
    affiliation: "SimulaMet, OsloMet"
    country: "Norway"
  - family-names: "Houshmand Sarkhoosh"
    given-names: "Mehdi"
    affiliation: "OsloMet, Forzasys"
    country: "Norway"
  - family-names: "Held"
    given-names: "Jan"
    affiliation: "University of Liège"
    country: "Belgium"
  - family-names: "Midoglu"
    given-names: "Cise"
    affiliation: "SimulaMet, Forzasys"
    country: "Norway"
  - family-names: "Cioppa"
    given-names: "Anthony"
    affiliation: "University of Liège, KAUST"
    country: "Belgium, Saudi Arabia"
  - family-names: "Giancola"
    given-names: "Silvio"
    affiliation: "KAUST"
    country: "Saudi Arabia"
  - family-names: "Thambawita"
    given-names: "Vajira"
    affiliation: "SimulaMet"
    country: "Norway"
  - family-names: "Riegler"
    given-names: "Michael A."
    affiliation: "SimulaMet"
    country: "Norway"
  - family-names: "Halvorsen"
    given-names: "Pål"
    affiliation: "SimulaMet, OsloMet, Forzasys"
    country: "Norway"
  - family-names: "Shah"
    given-names: "Mubarak"
    affiliation: "UCF Center for Research in Computer Vision"
    country: "USA"
version: 1.0
abstract: "This repository contains the SoccerNet-Echoes dataset, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games."
doi: "10.48550/arXiv.2405.07354"
keywords:
  - football
  - soccer
  - dataset
  - match
  - game
  - sports
repository-code: "https://github.com/SoccerNet/sn-echoes"
GitHub Events
Total
- Watch event: 7
- Push event: 1
Last Year
- Watch event: 7
- Push event: 1