https://github.com/arrangabriel/com-480

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: arrangabriel
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 2.27 MB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme

README.md

Project of Data Visualization (COM-480)

| Student's name                 | SCIPER |
|--------------------------------|--------|
| Halvor Linder Henriksen        | 379433 |
| Nils Holger Anders Johansson   | 376469 |
| Arran Øystein Kostveit Gabriel | 375923 |

Milestone 1 • Milestone 2 • Milestone 3

Milestone 1 (29th March, 5pm)

Dataset

We will mainly use two datasets: one from kaggle.com and one published on github.com. The Kaggle dataset is Football Events (published by Alin Secareanu @secareanualin, downloaded 2024-03-12, kaggle.com), and the other is the Football transfers dataset, which was scraped from the web and published on github.com (published by Dmitrii Antipov @d2ski, downloaded 2024-03-12, github.com). In this project we only intend to visualize the events and transfers of a handful of players, chosen based on interest. This allows us to pick players with sufficient, high-quality data, which reduces the amount of data processing required.

The Football Events dataset is generally of high quality. It consists of over 900,000 events that occurred during football games from the 2011/2012 season to 2017-01-27. Each event is tagged with the player who triggered it, together with a description of what happened, for example that a foul was committed or a goal was scored. The main processing required is to project the location of each event onto a map of a football pitch. Event locations are categorized into 19 different descriptions such as ‘Right wing’ or ‘Attacking Half’. These descriptions are not very precise, so we need to decide where to place each one on the map, but since the project is about visualization, exact placement is not important.
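To illustrate the projection step, here is a minimal Python sketch that maps location descriptions to approximate pitch coordinates. It assumes a 105 × 68 m pitch with the acting team attacking towards x = 105 and with y = 0 on that team's right touchline. The label strings (a subset of the 19 categories) and all coordinates are hypothetical choices for illustration, not values taken from the dataset.

```python
from typing import Dict, Optional, Tuple

PITCH_LENGTH = 105.0  # metres, assumed
PITCH_WIDTH = 68.0    # metres, assumed

# Hypothetical placements; the acting team attacks towards x = 105 and
# y = 0 is its right touchline, so its left wing lies near y = 68.
LOCATION_COORDS: Dict[str, Tuple[float, float]] = {
    "attacking half": (78.75, 34.0),
    "defensive half": (26.25, 34.0),
    "left wing": (80.0, 60.0),
    "right wing": (80.0, 8.0),
    "centre of the box": (91.0, 34.0),
    "left side of the box": (93.0, 46.0),
    "right side of the box": (93.0, 22.0),
    "left side of the six yard box": (101.0, 40.0),
    "right side of the six yard box": (101.0, 28.0),
    "very close range": (102.0, 34.0),
    "penalty spot": (94.0, 34.0),  # 11 m from the goal line
    "outside the box": (84.0, 34.0),
    "long range": (75.0, 34.0),
}


def place_event(location: Optional[str]) -> Optional[Tuple[float, float]]:
    """Return approximate (x, y) pitch coordinates for a location label,
    or None when the label is missing or not mapped."""
    if location is None:
        return None
    return LOCATION_COORDS.get(location.strip().lower())
```

Unmapped or missing labels return None, so such events can simply be left off the pitch view. Small random jitter could be added when plotting so that events sharing a label do not overlap exactly.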

The Football transfers dataset is also well structured and of high quality. It contains all transfers for the seven largest European leagues between 2009 and 2021. To connect the two datasets we need to match on player names, but this is again straightforward since we only use a few players and possible duplicates can be removed manually. The two datasets do not cover exactly the same years, but since we will not compare data between the sets, this is not a problem either.
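A minimal sketch of the name matching, assuming plain player-name strings from each dataset; `normalize_name` and `match_players` are hypothetical helpers. Names are compared after lowercasing, stripping accents and collapsing whitespace, and any name with more than one candidate is left for manual resolution.

```python
import unicodedata
from typing import Dict, Iterable, List


def normalize_name(name: str) -> str:
    """Lowercase, strip combining accents and collapse whitespace,
    so that e.g. 'Gaël  Kakuta' and 'gael kakuta' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def match_players(event_names: Iterable[str], transfer_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each events-dataset name to every transfers-dataset name with the
    same normalized form. More than one match flags a possible duplicate;
    an empty list means no match was found."""
    index: Dict[str, List[str]] = {}
    for name in transfer_names:
        index.setdefault(normalize_name(name), []).append(name)
    return {name: index.get(normalize_name(name), []) for name in event_names}
```

Note that NFKD normalization only removes combining accents: letters such as ‘ø’ or ‘ß’ have no decomposition and are kept as they are.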

In conclusion, the two main datasets we will use throughout the project are of high quality and do not require much processing.

Problematic

We want to visualize the careers of football players at Europe’s top clubs. We want to select a handful of top players and display their transfer history as a directed graph on a map, where nodes represent clubs/cities and directed edges represent transfers. In addition, we want to show further information, such as how players have performed throughout their careers and which players have played together at some point in time.
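The directed graph can be kept as a simple adjacency list. This sketch assumes each transfer has already been reduced to a player, a source club, a destination club and a season string such as "2013/2014". The `Transfer` class and both functions are hypothetical, and clubs (rather than cities) are used as nodes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transfer:
    player: str
    from_club: str
    to_club: str
    season: str  # e.g. "2013/2014"; sorts chronologically as a plain string


def build_transfer_graph(transfers: List[Transfer]) -> Dict[str, List[Tuple[str, str, str]]]:
    """Adjacency list: source club -> [(destination club, player, season), ...],
    with each club's outgoing edges ordered by season."""
    graph: Dict[str, List[Tuple[str, str, str]]] = defaultdict(list)
    for t in sorted(transfers, key=lambda t: t.season):
        graph[t.from_club].append((t.to_club, t.player, t.season))
    return dict(graph)


def career_path(transfers: List[Transfer], player: str) -> List[str]:
    """Clubs visited by one player, in chronological order of the transfers."""
    moves = sorted((t for t in transfers if t.player == player), key=lambda t: t.season)
    if not moves:
        return []
    return [moves[0].from_club] + [t.to_club for t in moves]
```

`career_path` assumes each move starts where the previous one ended; loans and loan returns may need extra handling before the path is drawn on the map.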

The beautiful game of football unfolds not just on the pitch, but also in the movement of players between clubs. The transfer market is becoming an increasingly important part of football, and this project aims to capture this part of the game and make it accessible. By visualizing the transfer history of players as an interactive map, we'll bring their careers to life, showcasing the clubs they've called home, their impact at these clubs, and the evolution of transfer fees. But the story goes beyond destinations. We'll delve into player performance data to paint a more complete picture: how their play developed across different teams, and where players crossed paths.
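Finding where players crossed paths amounts to an interval-overlap check. As a sketch, assume each player's time at a club has been summarized as a hypothetical `Stint` holding the first and last season start years (inclusive); two players were teammates if they have stints at the same club whose ranges overlap.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Stint:
    player: str
    club: str
    start: int  # first season start year, e.g. 2013 for 2013/2014
    end: int    # last season start year, inclusive


def crossed_paths(stints: List[Stint]) -> List[Tuple[str, str, str, int, int]]:
    """Return (player_a, player_b, club, first_year, last_year) for every pair
    of stints by different players at the same club whose season ranges overlap."""
    results = []
    for a, b in combinations(stints, 2):
        if a.player == b.player or a.club != b.club:
            continue
        first, last = max(a.start, b.start), min(a.end, b.end)
        if first <= last:
            results.append((a.player, b.player, a.club, first, last))
    return results
```

The pairwise loop is quadratic, which is fine for the handful of players the project focuses on.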

The project seeks to answer these questions:

  1. Player Career Visualization: How can we effectively visualize the career trajectories of top football players, highlighting their transfers between clubs and showcasing their performance over time?
  2. Club Analysis: What insights can we derive from analyzing the performance and playstyles of clubs these players have been associated with? How do transfers impact club dynamics and results?
Importantly, we want to make the data interesting and easily digestible for football fans and others alike. Building intuitive and rich visualizations will be paramount in achieving this goal.

Exploratory Data Analysis

We have examined the two datasets we intend to use as the main data for the project: Football Events and the Football transfers dataset.

Football Events

The concerns from our first inspection of the data were confirmed. The data for the last year, 2017, is incomplete, but that was already known and, as discussed above, it is not a problem for the project. Another potential problem is that the positional data of the events is not precise. However, we believe we can handle that by manually mapping the different position descriptions to locations on the pitch, as discussed above. We also encountered a flaw in the positional data that we did not know about before: only about one third of the events have positional data. Luckily, most key moments of play, such as goals, have a location, and we believe that is enough for us to complete the project.
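The coverage figure can be checked with a few lines of standard-library Python. This sketch assumes the events file has `location` and `is_goal` columns, that goals are marked with "1", and that missing locations appear as empty strings or "NA"; these column names and encodings are assumptions to verify against the actual file.

```python
from typing import Iterable, Mapping

MISSING = {"", "NA"}  # assumed encodings of a missing value


def location_coverage(rows: Iterable[Mapping[str, str]], only_goals: bool = False) -> float:
    """Fraction of events (optionally only goals) that have a location."""
    total = with_location = 0
    for row in rows:
        if only_goals and row.get("is_goal") != "1":
            continue
        total += 1
        if (row.get("location") or "").strip() not in MISSING:
            with_location += 1
    return with_location / total if total else 0.0


# Usage (file name assumed):
#   import csv
#   with open("events.csv", newline="") as f:
#       rows = list(csv.DictReader(f))
#   print(location_coverage(rows), location_coverage(rows, only_goals=True))
```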

One interesting finding is that most goals are scored in the bottom corners of the goal, while most attempts are aimed at the centre.
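A simple tally is enough to compare where goals and attempts end up. The sketch assumes a `shot_place` column and an `is_goal` column with "1" for goals; the column names and the placement labels used in the usage note are assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping

MISSING = {"", "NA"}  # assumed encodings of a missing value


def shot_place_counts(rows: Iterable[Mapping[str, str]], goals_only: bool = False) -> Counter:
    """Count shots per placement label, optionally restricted to goals."""
    counts: Counter = Counter()
    for row in rows:
        place = (row.get("shot_place") or "").strip()
        if place in MISSING:
            continue
        if goals_only and row.get("is_goal") != "1":
            continue
        counts[place] += 1
    return counts
```

Calling `shot_place_counts(rows, goals_only=True).most_common(3)` and `shot_place_counts(rows).most_common(3)` then gives the most frequent placements for goals and for all attempts respectively.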

Football transfers dataset

The dataset contains rows for every player transfer, loan, and retirement in both directions, meaning that each transfer appears once for the "receiving" club and once for the "giving" club. It is hard to test the completeness of the dataset, but since we are only interested in a few players, it can easily be checked manually. We also discovered some false data: at least 13 transfers in Serie A have reported transfer fees that are implausibly high. The largest known transfer is Neymar's move from FC Barcelona to Paris SG for about 220 million euros, yet some of the Serie A transfers have reported fees of up to 550 million euros.
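Both issues can be handled with simple filters. In this sketch, the column names `dir` (with "in" marking the receiving club's row) and `transfer_fee_amnt` are hypothetical placeholders for whatever the dataset actually uses, and fees above Neymar's record transfer (usually reported as 222 million euros) are flagged as suspect.

```python
from typing import List, Mapping

# Neymar's 2017 move, usually reported as 222 million euros; any fee above
# this is treated as suspect.
RECORD_FEE_EUR = 222_000_000


def incoming_only(rows: List[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep one row per transfer by keeping only the receiving club's row
    (assumes a hypothetical 'dir' column with the value 'in')."""
    return [r for r in rows if r.get("dir") == "in"]


def suspicious_fees(rows: List[Mapping[str, str]], limit: float = RECORD_FEE_EUR) -> List[Mapping[str, str]]:
    """Rows whose reported fee exceeds `limit`; empty fees are skipped."""
    flagged = []
    for r in rows:
        fee = (r.get("transfer_fee_amnt") or "").strip()
        if fee and float(fee) > limit:
            flagged.append(r)
    return flagged
```

Flagged rows can then be inspected by hand, which matches the manual checking the project already relies on for its handful of players.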

Some interesting facts and statistics from the data, excluding retirements and loan ends:

  • The average number of transfers (including loans) per season is 1785.62 for the seasons 2009-2021.
  • The club acquiring the most players (excluding loans and retirements) is Genoa CFC, with a total of 271 players acquired between 2009 and 2021. The average number of acquisitions (excluding loans) per club and season is 5.72.
  • The player with the most transfers (including loans) is Gaël Kakuta from France, with 13 transfers between 2009 and 2021. His last transfer was not between clubs but a move from a loan to a contract within RC Lens. The average number of transfers (including loans) per player and season is 0.14.
  • The player with the most transfers (excluding loans) is Kevin-Prince Boateng, with 10 transfers between 2009 and 2021. The average number of transfers (excluding loans) per player and season is 0.12.
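Averages like these depend on the choice of denominator. One possible way to compute them, sketched below under that assumption, is to divide the number of events by every group that could have had events (all seasons, all club-seasons or all player-seasons), so that groups with zero transfers still count. The source does not state which definition was used, and `average_rate` and `most_frequent` are hypothetical helpers.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def average_rate(event_keys: Iterable[Hashable], all_groups: Iterable[Hashable]) -> float:
    """Mean number of events per group. `event_keys` holds one group key per
    event (e.g. a season, or a (player, season) pair); `all_groups` lists every
    group, including those with no events. Keys outside `all_groups` are ignored."""
    groups = set(all_groups)
    if not groups:
        return 0.0
    n_events = sum(1 for key in event_keys if key in groups)
    return n_events / len(groups)


def most_frequent(keys: Iterable[Hashable]) -> Tuple[Hashable, int]:
    """The most common key and its count, e.g. the club with most acquisitions.
    Raises IndexError if `keys` is empty."""
    return Counter(keys).most_common(1)[0]
```

For the per-season figure, each transfer's season would be the key and the seasons 2009-2021 the groups; for the per-player figures, the keys would be (player, season) pairs.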

Related work

Several projects have been done on this topic, for example:

In our project we want to present the same data as above, but from another perspective. We want to showcase the careers of some key players in Europe over the last decade by following their transfers on a geographical map and then displaying the game events they trigger on the football pitch. We hope this will bring further insights about some already well-known players.

We draw inspiration both from interactive geographical data visualizations, which are often interesting and entertaining to view, and from very specific, deep-dive visualizations such as the Beatles song example showcased in class.

None of the team members has used this data before in any class or has any prior experience with it.

Milestone 2 (26th April, 5pm)

10% of the final grade

Milestone 3 (31st May, 5pm)

80% of the final grade

Owner

  • Name: Arran Øystein Kostveit Gabriel
  • Login: arrangabriel
  • Kind: user
  • Location: Trondheim, Norway

MSc in computer science @ NTNU | Developer @ Secunor

Issues and Pull Requests

Last synced: over 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • J-Holger (2)