https://github.com/ferencberes/twittertennis

Utility python package for RG17 and UO17 Twitter tennis data sets.

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

research temporal-networks twitter-data twitter-data-analysis twitter-dataset
Last synced: 6 months ago · JSON representation

Repository

Basic Info
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 4 years ago
Metadata Files
Readme License

README.md

twittertennis

Utility python package for RG17 and UO17 Twitter tennis tournament data sets.

Introduction

This repository is a Python package that eases interaction with two Twitter data sets related to tennis tournaments: RG17 (Roland-Garros 2017) and UO17 (USOpen 2017). In our research, we used the underlying Twitter mention graphs to analyse the performance of multiple dynamic centrality measures and temporal node embedding methods. A major advantage of our data is that the nodes (Twitter accounts) of the network are temporally labeled, so we could compare online graph algorithms in supervised evaluation tasks. The labels encode whether a given node in the Twitter mention network is related to a tennis player who played in a tournament on the given day. For more details on these data sets, see our paper: Temporal walk based centrality metric for graph streams.
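To make the idea of supervised evaluation with daily labels concrete, here is a minimal sketch that scores one day's node ranking against that day's set of relevant accounts. Precision at k is used purely as an illustration and is not necessarily the metric used in the paper; the function name `precision_at_k`, the node identifiers, and the scores are all hypothetical.

```python
def precision_at_k(scores, relevant, k):
    """Fraction of the k highest-scored nodes that are labeled relevant.

    scores   -- dict mapping node id to a centrality score for one day
    relevant -- set of node ids labeled relevant on that day
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank nodes by descending score; ties are broken by node id
    # so that the result is deterministic.
    ranked = sorted(scores, key=lambda node: (-scores[node], node))
    hits = sum(1 for node in ranked[:k] if node in relevant)
    return hits / k


# Hypothetical scores and labels for a single day (not real data).
scores = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.1}
relevant = {"a", "c"}
result = precision_at_k(scores, relevant, k=2)
print(result)
```

Repeating this for every day of a tournament gives a time series of quality values, which is one way two online algorithms could be compared on the same labeled stream.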

How to deploy?

Install

    pip install twittertennis

Tests

    git clone https://github.com/ferencberes/twittertennis.git
    cd twittertennis
    python setup.py test

Examples

Quick start

In this short example the RG17 (Roland-Garros 2017) data set is processed by the TennisDataHandler object. The data is automatically downloaded to the '../data/' folder during the first execution! After data preparation steps, mention links and daily node relevance labels are exported for further analysis.

  • Initialize data preprocessor

```python
import twittertennis.handler as tt

handler = tt.TennisDataHandler("../data/", "rg17", include_qualifiers=True)
print(handler.summary())
```

  • Export mention links:

    handler.export_edges(YOUR_OUTPUT_DIR)

  • Export daily node relevance labels:

    handler.export_relevance_labels(YOUR_OUTPUT_DIR, binary=True)

OR change the last line of the code if you only want to export relevant nodes for each day:

    handler.export_relevance_labels(YOUR_OUTPUT_DIR, binary=True, only_pos_label=True)

Preprocessed file content:

After data preprocessing you will find the following files in your specified folder:

  • edges.csv : edge stream of Twitter mentions. The timestamp in the first column is followed by the source and target node identifiers.
  • label_*.csv : list of relevant node identifiers for each day
  • summary.json : parameters set for TennisDataHandler during data preparation
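As a rough illustration of how the exported edge stream might be consumed, the sketch below counts how many mentions each account receives per day. It relies only on the column order described above (timestamp, source, target). The delimiter, the absence of a header row, and the use of Unix epoch seconds are assumptions that should be checked against the actual file, and the sample rows are made up rather than taken from the data set.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Made-up sample in the assumed layout: timestamp, source, target.
SAMPLE_EDGES = """\
1495929600,10,20
1495933200,11,20
1495936800,10,21
1496016000,12,20
"""


def daily_in_degree(lines, delimiter=","):
    """Count mentions received per target node for each UTC day.

    Assumes no header row and timestamps given in epoch seconds.
    """
    counts = defaultdict(Counter)
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        timestamp, source, target = row[:3]
        day = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).date()
        counts[day.isoformat()][target] += 1
    return counts


counts = daily_in_degree(io.StringIO(SAMPLE_EDGES))
for day, per_node in sorted(counts.items()):
    print(day, per_node.most_common())
```

For a real export, the same function could be called on an open file handle, for example `open("edges.csv", newline="")`, passing a different `delimiter` if the file turns out not to be comma-separated.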

See more examples in this notebook.

Related research

1. Temporal walk based centrality metric for graph streams

    @article{Beres2018,
      author="B{\'e}res, Ferenc and P{\'a}lovics, R{\'o}bert and Ol{\'a}h, Anna and Bencz{\'u}r, Andr{\'a}s A.",
      title="Temporal walk based centrality metric for graph streams",
      journal="Applied Network Science",
      year="2018",
      volume="3",
      number="32",
      pages="26",
      issn="2364-8228",
    }

2. Node embeddings in dynamic graphs

    @Article{Béres2019,
      author="B{\'e}res, Ferenc and Kelen, Domokos M. and P{\'a}lovics, R{\'o}bert and Bencz{\'u}r, Andr{\'a}s A.",
      title="Node embeddings in dynamic graphs",
      journal="Applied Network Science",
      year="2019",
      volume="4",
      number="64",
      pages="25",
    }

3. PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models

    @article{RozemberczkiPGT2021,
      author = {Benedek Rozemberczki and Paul Scherer and Yixuan He and George Panagopoulos and Maria Sinziana Astefanoaei and Oliver Kiss and Ferenc B{\'{e}}res and Nicolas Collignon and Rik Sarkar},
      title = {PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models},
      volume = {abs/2104.07788},
      year = {2021},
      url = {https://arxiv.org/abs/2104.07788},
      archivePrefix = {arXiv},
      eprint = {2104.07788},
    }

Owner

  • Name: Ferenc Béres
  • Login: ferencberes
  • Kind: user
  • Location: Hungary, Budapest
  • Company: SZTAKI (Institute for Computer Science and Control)

PhD student in Network Science

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 31
  • Total Committers: 1
  • Avg Commits per committer: 31.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Beres Ferenc f****5@g****m 31

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 61 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 1
  • Total maintainers: 1
pypi.org: twittertennis

Utility packages for Twitter tennis tournaments data sets.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 61 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 21.7%
Forks count: 22.6%
Average: 27.3%
Stargazers count: 27.8%
Downloads: 54.2%
Maintainers (1)
Last synced: about 1 year ago

Dependencies

setup.py pypi
  • datetime *
  • matplotlib *
  • networkx *
  • pandas *
  • pytz *
  • seaborn *
  • tqdm *