youtube-comments
This project will allow you to download and analyze textual data from video comments
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: fbietti
- Language: R
- Default Branch: main
- Size: 2.56 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
YouTube Comments
This project lets you download comments from YouTube videos and use them as textual data for text analysis, lexicometry, and NLP (Natural Language Processing). The code is deliberately simple.
The download function is written in Python. The analysis examples that follow are written in R.
File: youtube_comments.py
The file 'youtube_comments.py' contains the function that downloads the relevant content for each comment: its text, its date, and its author.
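As an illustration of what such a function might look like, here is a minimal sketch. It assumes the comments are fetched through the YouTube Data API v3 `commentThreads` endpoint, which may not be what 'youtube_comments.py' actually does. The names `extract_comment_fields` and `fetch_comments` are hypothetical, and you would need your own API key.

```python
import json
import urllib.parse
import urllib.request

# Assumption: comments come from the YouTube Data API v3 (commentThreads.list).
API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def extract_comment_fields(response):
    """Pull text, date, and author out of one commentThreads response page."""
    texts, dates, authors = [], [], []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(snippet["textDisplay"])
        dates.append(snippet["publishedAt"])
        authors.append(snippet["authorDisplayName"])
    return texts, dates, authors


def fetch_comments(video_id, api_key, max_pages=5):
    """Page through top-level comments (needs network access and a valid key)."""
    texts, dates, authors = [], [], []
    page_token = None
    for _ in range(max_pages):
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": 100,
            "textFormat": "plainText",
            "key": api_key,
        }
        if page_token:
            params["pageToken"] = page_token
        url = API_URL + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            page = json.load(resp)
        page_texts, page_dates, page_authors = extract_comment_fields(page)
        texts.extend(page_texts)
        dates.extend(page_dates)
        authors.extend(page_authors)
        # The API returns nextPageToken only while more pages remain.
        page_token = page.get("nextPageToken")
        if not page_token:
            break
    return texts, dates, authors
```

Keeping the parsing step separate from the network call makes it possible to check the field extraction on a saved response without spending API quota.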
File: exemple.py
The 'example.py' file shows how to use the code. I downloaded the comments from the video of the song 'pa tipos como tu' by Shakira and Bizarrap. I chose this song because it has a very large number of comments, which makes it a good test of the function.
This file also shows how to transform the resulting lists into a data frame and then save it as a .csv file.
Once you have the dataset as a .csv file, you can use it to conduct your analyses.
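The step from lists to a .csv file can be sketched as follows. This is only an illustration using the standard library `csv` module. The function name `save_comments_csv` and the column names are assumptions, not taken from the example file. With pandas, `pd.DataFrame({...}).to_csv(path, index=False)` would give an equivalent file through a data frame.

```python
import csv


def save_comments_csv(path, texts, dates, authors):
    """Write three parallel lists as rows of a CSV file with a header row."""
    if not (len(texts) == len(dates) == len(authors)):
        raise ValueError("all lists must have the same length")
    # newline="" lets the csv module control line endings; UTF-8 keeps accents.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["comment", "date", "author"])  # hypothetical column names
        writer.writerows(zip(texts, dates, authors))
    return len(texts)
```

The `csv` writer quotes fields that contain commas or line breaks, which matters for free-text comments.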
File: cleaning.R
The file 'cleaning.R' contains some rather simple manipulations in R to clean the database and standardize the date format. You will also find two lines to remove empty lines and emojis. In my case, emojis were not very important, so it made sense to delete them.
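For readers who prefer to clean the data before moving to R, the same three steps (date standardization, emoji removal, dropping empty rows) could look like this in Python. This is a sketch, not a translation of 'cleaning.R'. The emoji ranges are an approximation that covers the most common blocks, and the names `standardize_date` and `clean_rows` are hypothetical.

```python
import re
from datetime import datetime

# Common emoji blocks, plus the joiner and variation selector used in sequences.
# Assumption: this approximate set of ranges is enough for the corpus at hand.
EMOJI_PATTERN = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\u200D\uFE0F]+"
)


def standardize_date(raw):
    """Convert an ISO 8601 timestamp such as '2023-01-12T03:04:05Z' to 'YYYY-MM-DD'."""
    return datetime.fromisoformat(raw.replace("Z", "+00:00")).date().isoformat()


def clean_rows(rows):
    """Strip emojis, normalize dates, and drop rows whose comment ends up empty."""
    cleaned = []
    for text, date, author in rows:
        text = re.sub(r"\s+", " ", EMOJI_PATTERN.sub("", text)).strip()
        if text:
            cleaned.append((text, standardize_date(date), author))
    return cleaned
```

Comments that consist only of emojis become empty after the substitution and are dropped with the other empty rows.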
File: sentiment_analysis.R
This file contains lines to perform sentiment analysis. First, we obtain the sentences. Then, we calculate the sentiment associated with the sentences using the 'sentiment' function from the 'sentimentr' package. I chose this function because it has a lexicon in Spanish (as most of the comments are in Spanish). After some manipulations, I have included a graph to visualize the result.
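To make the two steps concrete (splitting into sentences, then scoring each one against a lexicon), here is a deliberately simplified Python sketch. It is not the algorithm of the 'sentimentr' package, which also handles valence shifters such as negations. The tiny lexicon, its values, and the square-root length normalization are assumptions made for this illustration.

```python
import math
import re

# Tiny illustrative Spanish polarity lexicon (hypothetical words and values).
LEXICON = {
    "bueno": 1.0, "genial": 1.0, "amo": 1.0,
    "malo": -1.0, "horrible": -1.0, "odio": -1.0,
}


def split_sentences(text):
    """Split on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def sentence_sentiment(sentence, lexicon=LEXICON):
    """Sum the word polarities and divide by the square root of the word count."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    total = sum(lexicon.get(word, 0.0) for word in words)
    return total / math.sqrt(len(words))


def comment_sentiment(text):
    """Score every sentence of a comment; returns (sentence, score) pairs."""
    return [(s, sentence_sentiment(s)) for s in split_sentences(text)]
```

A positive score suggests a favorable sentence and a negative score an unfavorable one. Words missing from the lexicon count as neutral.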
First sentiment analysis plot:

Second sentiment analysis plot:

File: celanfonction.R
The file 'cleanfunction.R' contains a function to clean character string vectors. It will be useful for cleaning comments.
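A cleaning function of this kind typically lowercases the text and strips URLs, digits, and punctuation. The Python sketch below shows one possible set of steps. The exact steps in the R function may differ, and the name `clean_strings` is hypothetical.

```python
import re
import string

# Map ASCII punctuation and the Spanish opening marks to spaces.
PUNCT_TABLE = str.maketrans({char: " " for char in string.punctuation + "¡¿"})


def clean_strings(values):
    """Lowercase each string and remove URLs, digits, and punctuation."""
    cleaned = []
    for value in values:
        text = value.lower()
        text = re.sub(r"https?://\S+", " ", text)  # drop URLs before punctuation
        text = re.sub(r"\d+", " ", text)
        text = text.translate(PUNCT_TABLE)
        text = re.sub(r"\s+", " ", text).strip()   # collapse leftover whitespace
        cleaned.append(text)
    return cleaned
```

URLs are removed first because stripping punctuation earlier would break them into fragments that the URL pattern no longer matches.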
File: topicanalysis.R
This file contains commands to conduct an analysis to highlight the main themes that appear in the corpus. To do this, I am using several commands from the quanteda package.
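One simple way to surface the main themes is to count the most frequent terms once stop words are removed. The Python sketch below illustrates that idea only. It does not reproduce the quanteda commands in the file, and the short stop word list and the name `top_terms` are assumptions.

```python
import re
from collections import Counter

# Very short illustrative Spanish stop word list (an assumption, not exhaustive).
STOPWORDS = {"de", "la", "el", "que", "y", "en", "a", "los", "es", "me", "esta", "un", "una", "lo"}


def top_terms(documents, n=10, stopwords=STOPWORDS):
    """Return the n most frequent terms across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"\w+", doc.lower())
        # Skip stop words and very short tokens, which rarely carry a theme.
        counts.update(t for t in tokens if t not in stopwords and len(t) > 2)
    return counts.most_common(n)
```

For example, `top_terms(comments, n=20)` returns a list of `(term, count)` pairs that can be plotted or inspected directly.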
File: topicanalysis_wordassociation.R
This file presents a few commands to find word associations using functions from the tm package.
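Word associations of this kind are usually measured as the correlation between the per-document counts of two terms, which is the idea behind `findAssocs` in the tm package. The Python sketch below illustrates the principle with a plain Pearson correlation. The function names and the default threshold are hypothetical.

```python
import math
import re


def term_vector(documents, term):
    """Count how often `term` occurs in each document."""
    return [re.findall(r"\w+", doc.lower()).count(term) for doc in documents]


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def find_associations(documents, term, min_corr=0.5):
    """Return words whose counts correlate with `term` at or above min_corr."""
    vocab = sorted({w for doc in documents for w in re.findall(r"\w+", doc.lower())})
    target = term_vector(documents, term)
    result = {}
    for word in vocab:
        if word == term:
            continue
        r = pearson(target, term_vector(documents, word))
        if r >= min_corr:
            result[word] = round(r, 2)
    return result
```

A word that appears in exactly the same comments as the target term gets a correlation of 1.0, while words that appear mostly in other comments fall below the threshold.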
Owner
- Login: fbietti
- Kind: user
- Repositories: 1
- Profile: https://github.com/fbietti
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bietti
    given-names: Federico
    orcid: https://orcid.org/0000-0002-3912-3951
title: "YouTube comments scraping code"
version:
identifiers:
  - type:
    value:
date-released: 2023-12-14
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3