tubecleanr
(Mini) R package for preprocessing YouTube comment data collected with tuber or vosonSML
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: gesiscss
- License: other
- Language: R
- Default Branch: main
- Homepage: https://gesiscss.github.io/tubecleanR/
- Size: 7.39 MB
Statistics
- Stars: 2
- Watchers: 7
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
tubecleanR

This is a mini R package for cleaning and preprocessing YouTube comment data collected with the R packages tuber or vosonSML.
The package is a collection of functions that were developed during several workshops on collecting and analyzing YouTube data at GESIS - Leibniz Institute for the Social Sciences. The main function of the package is parse_yt_comments(), which takes a dataframe containing YouTube comments collected with tuber or vosonSML as input and outputs a processed dataframe in which URLs/links, video timestamps, user mentions, emoticons, and emoji have been extracted from the comments into separate columns. In addition, the function creates a column containing textual descriptions of the emoji, and another containing a cleaned version of the comment from which the elements listed above, as well as numbers and punctuation, have been removed.
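To make the idea of extracting elements into separate columns more concrete, here is a minimal base R sketch. It is not the package's implementation (tubecleanR imports qdapRegex and stringi, so its internals will differ), and the helper `extract_all()`, the three regular expressions, and the column names are assumptions made purely for illustration. Emoticon and emoji handling is left out.

```r
# Hypothetical helper: return all matches of `pattern` for each string
# as a list with one character vector per comment.
extract_all <- function(text, pattern) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

# Made-up example comments
comments <- c(
  "Great part at 1:23, see https://example.com @SomeUser",
  "No extras here!"
)

# Deliberately simple patterns (assumptions, not the package's own)
url_pattern       <- "https?://[^\\s]+"
timestamp_pattern <- "\\b\\d{1,2}:\\d{2}(?::\\d{2})?\\b"
mention_pattern   <- "@\\w+"

# Store the extracted elements as list columns next to the raw comment
parsed <- data.frame(comment = comments, stringsAsFactors = FALSE)
parsed$urls       <- extract_all(comments, url_pattern)
parsed$timestamps <- extract_all(comments, timestamp_pattern)
parsed$mentions   <- extract_all(comments, mention_pattern)

# Cleaned text: drop the extracted elements, then numbers and punctuation
cleaned <- comments
for (p in c(url_pattern, timestamp_pattern, mention_pattern)) {
  cleaned <- gsub(p, " ", cleaned, perl = TRUE)
}
cleaned <- gsub("[[:punct:][:digit:]]+", " ", cleaned)
parsed$cleaned <- trimws(gsub("\\s+", " ", cleaned))
```

For real data, prefer parse_yt_comments(), which also covers emoticons and emoji and expects the column layout produced by tuber or vosonSML.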
Please note: The functions in this package are heavily dependent on the structure of the data exports from tuber and vosonSML, and, by extension, the structure of the YouTube API. You can find an introduction to the YouTube Data API in the GESIS Guide on Digital Behavioral Data on "How to Collect Data with the YouTube Data API".
If you are interested in becoming a maintainer of this package, feel free to contact us.
1) Installation
```R
# GitHub version
library(remotes)
remotes::install_github("gesiscss/tubecleanR")
```
2) Demo data
We have created some simulated YouTube comment data in the tuber and vosonSML formats that is included in this package.
```R
# Attaching package
library(tubecleanR)

# Checking example comments bundled with the package
View(tuberComments)
View(vosonComments)

# Parsing comments
tuber_parsed <- parse_yt_comments(tuberComments)
voson_parsed <- parse_yt_comments(vosonComments)

# Checking parsed versions of example comments
View(tuber_parsed)
View(voson_parsed)
```
3) Using your own data
The parse_yt_comments() function is meant to be used for YouTube comment data collected with the get_all_comments() function from tuber or the Collect() function from vosonSML. Both of those require access credentials for the YouTube API. Check the documentation of those two packages for further details.
If you want to learn more about getting access to the YouTube API, collecting comment (and other) data from the API using R, and processing and exploring the resulting data, you can also check out the materials from our workshop.
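As a rough sketch of how collection and parsing could fit together, the two wrapper functions below collect comments with either package and pass the result to parse_yt_comments(). The wrapper names and their arguments are hypothetical, and the calls to tuber and vosonSML reflect our reading of those packages' documentation, so please check the current docs before relying on them. The code is wrapped in functions so that nothing contacts the API until you call them with your own credentials.

```r
# Hypothetical wrapper: collect with tuber, then parse.
# app_id and app_secret are your own OAuth credentials (placeholders here).
collect_with_tuber <- function(video_id, app_id, app_secret) {
  tuber::yt_oauth(app_id = app_id, app_secret = app_secret)
  comments <- tuber::get_all_comments(video_id = video_id)
  tubecleanR::parse_yt_comments(comments)
}

# Hypothetical wrapper: collect with vosonSML, then parse.
# api_key is your own YouTube Data API key (placeholder here).
collect_with_voson <- function(video_ids, api_key) {
  credential <- vosonSML::Authenticate("youtube", apiKey = api_key)
  comments <- vosonSML::Collect(credential, videoIDs = video_ids)
  tubecleanR::parse_yt_comments(comments)
}
```

Keeping credentials as function arguments, rather than hard-coding them, makes it easier to read them from environment variables and keep them out of scripts you share.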
4) Citation
If you are using this package in your research, please cite it as follows:
```R
citation("tubecleanR")
```

```R
To cite package ‘tubecleanR’ in publications use:

  Kohne, J., & Breuer, J. (2024). tubecleanR: Parsing and Preprocessing YouTube Comment Data. R package version 0.1.0. https://gesiscss.github.io/tubecleanR/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {tubecleanR: Parsing and Preprocessing YouTube Comment Data},
    author = {Julian Kohne and Johannes Breuer},
    year = {2024},
    note = {R package version 0.1.0},
    url = {https://gesiscss.github.io/tubecleanR/},
  }
```
Owner
- Name: GESIS – Leibniz Institute for the Social Sciences
- Login: gesiscss
- Kind: organization
- Location: Cologne, Germany
- Website: https://www.gesis.org/en/institute/departments/computational-social-science
- Repositories: 46
- Profile: https://github.com/gesiscss
Department Computational Social Science
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: tubecleanR
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Julian
    family-names: Kohne
    affiliation: GESIS - Leibniz Institute for the Social Sciences
    orcid: 'https://orcid.org/0000-0002-6710-7545'
  - given-names: Johannes
    family-names: Breuer
    email: johannes.breuer@gesis.org
    affiliation: GESIS - Leibniz Institute for the Social Sciences
    orcid: 'https://orcid.org/0000-0001-5906-7873'
repository-code: 'https://github.com/gesiscss/tubecleanR'
url: 'https://gesiscss.github.io/tubecleanR/'
keywords:
  - R
  - YouTube
license: CC-BY-4.0
GitHub Events
Total
- Watch event: 1
- Push event: 1
Last Year
- Watch event: 1
- Push event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 1
- Average time to close issues: about 21 hours
- Average time to close pull requests: about 1 hour
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- schochastics (1)
- chainsawriot (1)
Pull Request Authors
- chainsawriot (2)
Dependencies
- JamesIves/github-pages-deploy-action v4.5.0 composite
- actions/checkout v4 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- anytime * imports
- qdapRegex * imports
- stringi * imports
- utils * imports
- testthat >= 3.0.0 suggests