youtube-comments

This project will allow you to download and analyze textual data from video comments

https://github.com/fbietti/youtube-comments

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.9%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: fbietti
Language: R
Default Branch: main
Size: 2.56 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme, Citation

YouTube Comments

This code lets you download comments from YouTube videos for textual analysis, lexicometry, and NLP (natural language processing). The code is deliberately simple.

The download function is written in Python. The repository also includes an example analysis in R.

File: youtube_comments.py

The file 'youtube_comments.py' contains the function that downloads the relevant content for each comment: its text, date, and author.
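To make the three fields concrete, here is a minimal sketch of pulling the text, date, and author out of one page of results. It assumes the response follows the shape of the YouTube Data API v3 `commentThreads` resource; the README does not say which retrieval method 'youtube_comments.py' uses, so the function name `extract_comments` and the sample page are hypothetical and hand-written, not taken from the repository.

```python
def extract_comments(response):
    """Collect text, date, and author from one commentThreads-style page.

    Assumption: `response` is a dict shaped like a YouTube Data API v3
    commentThreads.list result. This is not the repository's own function.
    """
    texts, dates, authors = [], [], []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(snippet["textDisplay"])
        dates.append(snippet["publishedAt"])
        authors.append(snippet["authorDisplayName"])
    return texts, dates, authors


# Hand-written placeholder page in the assumed shape (not real data).
sample_page = {
    "items": [
        {
            "snippet": {
                "topLevelComment": {
                    "snippet": {
                        "textDisplay": "Great song!",
                        "publishedAt": "2023-01-12T08:30:00Z",
                        "authorDisplayName": "user_a",
                    }
                }
            }
        }
    ]
}

texts, dates, authors = extract_comments(sample_page)
```

Returning three parallel lists matches the list-based workflow described for the example file, where the lists are later combined into a table.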

File: exemple.py

The 'example.py' file shows how to use the code. As an example, I downloaded the comments on the video for the song 'pa tipos como tu' by Shakira and Bizarrap. I chose this song because its video has a large number of comments, which shows what the function can handle.

This file shows how to transform lists into a data frame and then save it as a .csv file.

Once the data is saved as a .csv file, you can use it for your analyses.
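The list-to-table-to-CSV step can be sketched as follows. This version uses only Python's standard `csv` module as a dependency-free stand-in for the data frame step; the example file itself may well use a data frame library instead. The function name `save_comments_csv`, the column names, and the sample values are assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


def save_comments_csv(path, texts, dates, authors):
    """Write three parallel lists to a CSV file, one comment per row."""
    if not (len(texts) == len(dates) == len(authors)):
        raise ValueError("all lists must have the same length")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["comment", "date", "author"])  # assumed column names
        writer.writerows(zip(texts, dates, authors))


# Placeholder values, not real comments.
texts = ["Great song!", "Me encanta, de verdad"]
dates = ["2023-01-12T08:30:00Z", "2023-01-13T10:15:00Z"]
authors = ["user_a", "user_b"]

# A temporary directory keeps the sketch free of side effects.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "comments.csv"
    save_comments_csv(out_path, texts, dates, authors)
    saved = out_path.read_text(encoding="utf-8")
```

The `csv` writer quotes fields that contain commas, so comment text with punctuation survives the round trip into R or any other tool that reads CSV.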

File: cleaning.R

The file 'cleaning.R' contains a few simple R manipulations that clean the dataset and standardize the date format. It also includes two lines that remove empty rows and emojis. In my case, emojis were not very important, so it made sense to delete them.
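For readers who prefer to stay in Python, the same three cleaning steps (strip emojis, drop empty rows, standardize dates) can be sketched as below. This is a Python analogue, not a translation of 'cleaning.R'. It assumes timestamps arrive as ISO 8601 strings such as `2023-01-12T08:30:00Z`, and the emoji pattern covers only the most common Unicode emoji ranges; all names here are hypothetical.

```python
import re
from datetime import datetime

# Common emoji blocks plus the variation selector and zero-width joiner.
# Assumption: this range list is a pragmatic approximation, not exhaustive.
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\U0001F1E6-\U0001F1FF\u2600-\u27BF\uFE0F\u200D]+"
)


def clean_comment(text):
    """Remove emojis and collapse runs of whitespace."""
    without_emojis = EMOJI_PATTERN.sub("", text)
    return " ".join(without_emojis.split())


def standardize_date(timestamp):
    """Convert an assumed ISO 8601 UTC timestamp to YYYY-MM-DD."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%d")


def clean_rows(rows):
    """Clean (text, timestamp) pairs and drop rows left empty."""
    cleaned = []
    for text, timestamp in rows:
        text = clean_comment(text)
        if text:
            cleaned.append((text, standardize_date(timestamp)))
    return cleaned


# Placeholder rows; \U0001F525 is the fire emoji, \U0001F60D the heart-eyes face.
rows = [
    ("Temazo \U0001F525\U0001F525", "2023-01-12T08:30:00Z"),
    ("   ", "2023-01-12T09:00:00Z"),
    ("\U0001F60D", "2023-01-13T10:15:00Z"),
]
cleaned = clean_rows(rows)
```

Removing emojis before checking for emptiness means a comment made up only of emojis is dropped as well, which fits a corpus where emojis carry little weight.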

File: sentiment_analysis.R

This file contains the code for the sentiment analysis. First, we extract the sentences. Then we compute the sentiment of each sentence using the 'sentiment' function from the 'sentimentr' package. I chose this function because it has a lexicon in Spanish (most of the comments are in Spanish). After some data manipulation, a plot visualizes the result.

[Figure: first sentiment analysis plot]

[Figure: second sentiment analysis plot]

File: celanfonction.R

The file 'cleanfunction.R' contains a function that cleans character vectors. It is useful for cleaning the comments.

File: topicanalysis.R

This file contains commands for a topic analysis that highlights the main themes in the corpus. It relies on several functions from the quanteda package.

File: topicanalysis_wordassociation.R

This file presents a few commands to find word associations using functions from the tm package.
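One common way to measure word association, and the idea behind tm's `findAssocs`, is to correlate how often two terms occur across documents. The sketch below illustrates that idea in plain Python on a toy corpus. It is not the code in 'topicanalysis_wordassociation.R', which may use other tm functions; the whitespace tokenization, function names, and sample sentences are all assumptions.

```python
import math
from collections import Counter


def term_counts(documents):
    """Per-document term frequencies from simple whitespace tokenization."""
    return [Counter(doc.lower().split()) for doc in documents]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def associations(documents, term, min_correlation):
    """Terms whose per-document counts correlate with `term` at or above the limit."""
    counts = term_counts(documents)
    vocabulary = set().union(*counts)
    target = [c[term] for c in counts]
    result = {}
    for other in vocabulary - {term}:
        score = pearson(target, [c[other] for c in counts])
        if score >= min_correlation:
            result[other] = round(score, 2)
    return result


# Toy corpus, invented for illustration.
docs = [
    "shakira canta bien",
    "shakira canta siempre",
    "bizarrap produce bien",
    "bizarrap produce",
]
assoc = associations(docs, "shakira", 0.9)
```

On real comments, the correlation threshold trades recall for noise: a high limit keeps only terms that almost always appear together with the target word.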

Owner

Login: fbietti
Kind: user

Repositories: 1
Profile: https://github.com/fbietti

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bietti
    given-names: Federico
    orcid: https://orcid.org/0000-0002-3912-3951
title: "YouTube comments scraping code"
version: 
identifiers:
  - type: 
    value: 
date-released: 2023-12-14
