youtube-comments

This project lets you download and analyze textual data from YouTube video comments.

https://github.com/fbietti/youtube-comments

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.9%) to scientific vocabulary

Repository


Basic Info
  • Host: GitHub
  • Owner: fbietti
  • Language: R
  • Default Branch: main
  • Size: 2.56 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

YouTube Comments


This code lets you download comments from YouTube videos for textual analysis, lexicometry, and NLP (natural language processing). The code is deliberately simple.

The download function is written in Python. I then provide an example analysis in R.

File: youtube_comments.py

The file 'youtube_comments.py' contains the function that downloads the relevant content: the comment text, date, and author.
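
The three fields named above can be illustrated with a minimal sketch. This is not the code in 'youtube_comments.py'; it assumes the comments arrive as a YouTube Data API v3 `commentThreads` response page, and `extract_comments` and `sample_page` are hypothetical names used only for this illustration.

```python
from typing import Dict, List, Tuple


def extract_comments(response: Dict) -> Tuple[List[str], List[str], List[str]]:
    """Collect comment text, date, and author from one response page.

    Assumed layout (YouTube Data API v3 commentThreads):
    items[].snippet.topLevelComment.snippet.{textDisplay, publishedAt, authorDisplayName}
    """
    texts, dates, authors = [], [], []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(snippet["textDisplay"])
        dates.append(snippet["publishedAt"])
        authors.append(snippet["authorDisplayName"])
    return texts, dates, authors


# Hand-written page for illustration only; no request is sent.
sample_page = {
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {
            "textDisplay": "Temazo",
            "publishedAt": "2023-01-12T10:00:00Z",
            "authorDisplayName": "user_a",
        }}}},
        {"snippet": {"topLevelComment": {"snippet": {
            "textDisplay": "Me encanta",
            "publishedAt": "2023-01-13T08:30:00Z",
            "authorDisplayName": "user_b",
        }}}},
    ]
}

texts, dates, authors = extract_comments(sample_page)
```

Returning three parallel lists matches the "lists" that the example file later turns into a data frame.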

File: exemple.py

The 'example.py' file shows how to use the code. I downloaded the comments from the video of the song 'pa tipos como tu' by Shakira and Bizarrap. I chose this song because its video has a very large number of comments, which shows that the function copes with high comment volumes.

This file shows how to transform lists into a data frame and then save it as a .csv file.
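
As a rough sketch of that step, the snippet below writes three parallel lists to a .csv file. It swaps the data frame for Python's standard `csv` module so that it runs without extra packages; the function name, column names, file name, and sample values are all assumptions made for illustration.

```python
import csv
import os
import tempfile


def save_comments_csv(path, texts, dates, authors):
    """Write three parallel lists as CSV rows under a header line."""
    if not (len(texts) == len(dates) == len(authors)):
        raise ValueError("all lists must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["comment", "date", "author"])  # assumed column names
        writer.writerows(zip(texts, dates, authors))


# Hypothetical data for illustration.
texts = ["Great song!", "Me encanta, de verdad"]
dates = ["2023-01-12T10:00:00Z", "2023-01-13T08:30:00Z"]
authors = ["user_a", "user_b"]

# Written to the temporary directory so the sketch leaves the project folder untouched.
csv_path = os.path.join(tempfile.gettempdir(), "comments_example.csv")
save_comments_csv(csv_path, texts, dates, authors)
```

The `csv` writer quotes fields that contain commas, so a comment such as the second one survives a round trip intact.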

Once you have the database in the form of a .csv file, you can use it to conduct your analyses.

File: cleaning.R

The file 'cleaning.R' contains a few simple R manipulations that clean the database and standardize the date format. It also has two lines that remove empty lines and emojis. Emojis were not important for my analysis, so it made sense to delete them.
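
The same three steps (standardizing dates, dropping empty lines, removing emojis) can be sketched in Python. This is not a port of 'cleaning.R': the input timestamp format, the emoji ranges, and the name `clean_rows` are assumptions, and the emoji pattern is only an approximation.

```python
import re
from datetime import datetime

# Approximate emoji pattern: main pictograph blocks, regional indicators,
# the variation selector, and the zero-width joiner. Not exhaustive.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]"
)


def clean_rows(rows):
    """Clean (text, timestamp) pairs.

    Assumes timestamps look like '2023-01-12T10:00:00Z' and returns
    (text, 'YYYY-MM-DD') pairs, skipping rows left empty after cleaning.
    """
    cleaned = []
    for text, raw_date in rows:
        text = EMOJI_PATTERN.sub("", text).strip()
        if not text:
            continue  # drop empty lines, including emoji-only comments
        day = datetime.strptime(raw_date, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m-%d")
        cleaned.append((text, day))
    return cleaned


# Hypothetical rows: text with two fire emojis, a blank line, a heart emoji alone.
rows = [
    ("Temazo \U0001F525\U0001F525", "2023-01-12T10:00:00Z"),
    ("   ", "2023-01-12T11:00:00Z"),
    ("\u2764\uFE0F", "2023-01-13T08:30:00Z"),
]
cleaned = clean_rows(rows)
```

Removing emojis before the emptiness check means comments made only of emojis are dropped as well, which may or may not be what a given analysis needs.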

File: sentiment_analysis.R

This file performs sentiment analysis. First, the comments are split into sentences. Then the sentiment of each sentence is computed with the 'sentiment' function from the 'sentimentr' package. I chose this function because it has a lexicon in Spanish (most of the comments are in Spanish). After some manipulations, I have included a graph to visualize the result.
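
The two stages described above (split into sentences, then score each one against a lexicon) can be sketched as follows. This does not reproduce the scoring of 'sentimentr': it uses a plain average of word polarities, and the tiny `LEXICON` is a hypothetical stand-in for a real Spanish lexicon.

```python
import re

# Hypothetical toy polarity lexicon; real lexicons are far larger.
LEXICON = {
    "encanta": 1.0,
    "genial": 1.0,
    "bueno": 1.0,
    "horrible": -1.0,
    "malo": -1.0,
}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentence_score(sentence):
    """Average polarity over all words; words missing from the lexicon count as 0."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    return sum(LEXICON.get(word, 0.0) for word in words) / len(words)


comment = "Me encanta esta canción. El video es horrible!"
sentences = split_sentences(comment)
scores = [sentence_score(sentence) for sentence in sentences]
```

Scoring per sentence rather than per comment keeps a comment with mixed opinions from averaging out to neutral.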

[Figure: first sentiment analysis plot]

[Figure: second sentiment analysis plot]

File: celanfonction.R

The file 'cleanfunction.R' contains a function to clean character string vectors. It will be useful for cleaning comments.
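
The README does not list which steps the cleaning function applies, so the sketch below only shows what such a function might look like. Lowercasing, URL removal, punctuation removal, and whitespace collapsing are all assumptions, as is the name `clean_strings`.

```python
import re


def clean_strings(strings):
    """Return a cleaned copy of a list of strings (assumed steps, see above)."""
    cleaned = []
    for text in strings:
        text = text.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop URLs
        text = re.sub(r"[^\w\s]", " ", text)       # replace punctuation with spaces
        text = re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace
        cleaned.append(text)
    return cleaned


# Hypothetical comments for illustration.
raw = ["¡Qué TEMAZO!!!  https://example.com/x", "  Shakira, la reina.  "]
cleaned = clean_strings(raw)
```

URLs are removed before punctuation so that their slashes and dots do not leave stray fragments behind.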

File: topicanalysis.R

This file contains commands for an analysis that highlights the main themes in the corpus. To do this, I use several functions from the quanteda package.
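
One simple way to surface main themes is to count the most frequent terms after removing stopwords. The sketch below illustrates that idea only; it does not use quanteda or mirror 'topicanalysis.R', and the stopword list and the name `top_terms` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short Spanish stopword list.
STOPWORDS = {"la", "el", "de", "que", "es", "y", "su", "un", "una"}


def top_terms(comments, n=3):
    """Count non-stopword tokens across all comments and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        tokens = re.findall(r"\w+", comment.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical comments for illustration.
comments = [
    "La canción es un temazo",
    "Que temazo de canción",
    "Shakira y su canción",
]
themes = top_terms(comments)
```

Raw counts favor words that appear in every comment, which is why stopword removal comes first.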

File: topicanalysis_wordassociation.R

This file presents a few commands to find word associations using functions from the tm package.
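
Word association in this sense can be measured as the correlation between how often two terms occur across documents, which is the idea behind `findAssocs` in tm. The sketch below is a small standalone illustration of that idea, not the code in 'topicanalysis_wordassociation.R'; the function names and the `min_corr` threshold are assumptions.

```python
import math
import re


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def associations(term, comments, min_corr=0.5):
    """Return terms whose per-comment counts correlate with `term` at or above min_corr."""
    docs = [re.findall(r"\w+", comment.lower()) for comment in comments]
    vocabulary = sorted({token for doc in docs for token in doc} - {term})
    target = [doc.count(term) for doc in docs]
    result = {}
    for other in vocabulary:
        r = pearson(target, [doc.count(other) for doc in docs])
        if r >= min_corr:
            result[other] = round(r, 2)
    return result


# Hypothetical comments for illustration.
comments = ["shakira reina", "shakira reina temazo", "temazo", "bizarrap temazo"]
linked = associations("shakira", comments)
```

Here only "reina" always appears together with "shakira", so it is the only term that passes the threshold.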

[Figure: second sentiment analysis plot]

Owner

  • Login: fbietti
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bietti
    given-names: Federico
    orcid: https://orcid.org/0000-0002-3912-3951
title: "YouTube comments scraping code"
version: 
identifiers:
  - type: 
    value: 
date-released: 2023-12-14 

GitHub Events

Total
  • Push event: 3
Last Year
  • Push event: 3