tubecleanr

(Mini) R package for preprocessing YouTube comment data collected with tuber or vosonSML

https://github.com/gesiscss/tubecleanr

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords

preprocessing r tuber vosonsml youtube
Last synced: 4 months ago

Repository


Basic Info
Statistics
  • Stars: 2
  • Watchers: 7
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Topics
preprocessing r tuber vosonsml youtube
Created almost 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

tubecleanR


This is a mini R package for cleaning and preprocessing YouTube comment data collected with the R packages tuber or vosonSML. The package is a collection of functions that were developed during several workshops on collecting and analyzing YouTube data at GESIS - Leibniz Institute for the Social Sciences. The main function of the package is parse_yt_comments(). It takes a dataframe of YouTube comments collected with tuber or vosonSML as input and returns a processed dataframe in which URLs/links, video timestamps, user mentions, emoticons, and emoji have been extracted from the comments into separate columns. In addition, the function creates a column containing textual descriptions of the emoji, and another containing a cleaned version of the comment from which the extracted elements, as well as numbers and punctuation, have been removed.
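To make the extraction step more concrete, here is a minimal base-R sketch of the general idea: pattern matches are pulled out of the comment text into list columns, and a cleaned text column is built by stripping those matches plus numbers and punctuation. This is not the package's implementation (tubecleanR itself imports qdapRegex and stringi). The helper `extract_pattern()`, the toy data frame, and the simplified regular expressions are hypothetical, will miss many real-world cases, and do not handle emoji or emoticons.

```R
# Hypothetical helper, NOT part of tubecleanR: returns all matches of
# `pattern` in each element of `text` as a list (one entry per comment)
extract_pattern <- function(text, pattern) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE))
}

# Toy comments (illustrative only, not real YouTube data)
comments <- data.frame(
  text = c("Great video! 2:15 is my favourite part",
           "Source: https://example.com/info @someuser",
           "no extras here"),
  stringsAsFactors = FALSE
)

# Simplified patterns (assumptions); real comments need more robust regexes
url_pattern       <- "https?://[^\\s]+"
timestamp_pattern <- "\\b\\d{1,2}:\\d{2}(?::\\d{2})?\\b"
mention_pattern   <- "@[A-Za-z0-9_.-]+"

# Extracted elements go into separate list columns
comments$urls       <- extract_pattern(comments$text, url_pattern)
comments$timestamps <- extract_pattern(comments$text, timestamp_pattern)
comments$mentions   <- extract_pattern(comments$text, mention_pattern)

# Cleaned text: remove the extracted elements, then digits and punctuation
clean <- comments$text
for (p in c(url_pattern, timestamp_pattern, mention_pattern)) {
  clean <- gsub(p, " ", clean, perl = TRUE)
}
clean <- gsub("[[:punct:][:digit:]]+", " ", clean)
comments$clean <- trimws(gsub("\\s+", " ", clean))
```

Keeping the extracted elements in list columns preserves comments that contain several URLs or mentions, while rows without a match simply get an empty character vector.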

Please note: The functions in this package are heavily dependent on the structure of the data exports from tuber and vosonSML, and, by extension, the structure of the YouTube API. You can find an introduction to the YouTube Data API in the GESIS Guide on Digital Behavioral Data on "How to Collect Data with the YouTube Data API".

If you are interested in becoming a maintainer of this package, feel free to contact us.

1) Installation

```R
# GitHub version
library(remotes)
remotes::install_github("gesiscss/tubecleanR")
```

2) Demo data

We have created simulated YouTube comment data in both the tuber and the vosonSML format, which is included in this package.

```R
# Attaching package
library(tubecleanR)

# Checking example comments bundled with the package
View(tuberComments)
View(vosonComments)

# Parsing comments
tuber_parsed <- parse_yt_comments(tuberComments)
voson_parsed <- parse_yt_comments(vosonComments)

# Checking parsed versions of example comments
View(tuber_parsed)
View(voson_parsed)
```

3) Using your own data

The parse_yt_comments() function is meant to be used with YouTube comment data collected with the get_all_comments() function from tuber or the Collect() function from vosonSML. Both of these require access credentials for the YouTube API. See the documentation of the two packages for further details.

If you want to learn more about getting access to the YouTube API, collecting comment (and other) data from the API using R, and processing and exploring the resulting data, you can also check out the materials from our workshop.

4) Citation

If you are using this package in your research, please cite it as follows:

```R
citation("tubecleanR")
```

```
To cite package ‘tubecleanR’ in publications use:

  Kohne, J., & Breuer, J. (2024). tubecleanR: Parsing and Preprocessing
  YouTube Comment Data. R package version 0.1.0.
  https://gesiscss.github.io/tubecleanR/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {tubecleanR: Parsing and Preprocessing YouTube Comment Data},
    author = {Julian Kohne and Johannes Breuer},
    year = {2024},
    note = {R package version 0.1.0},
    url = {https://gesiscss.github.io/tubecleanR/},
  }
```

Owner

  • Name: GESIS – Leibniz Institute for the Social Sciences
  • Login: gesiscss
  • Kind: organization
  • Location: Cologne, Germany

Department Computational Social Science

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: tubecleanR
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Julian
    family-names: Kohne
    affiliation: GESIS - Leibniz Institute for the Social Sciences
    orcid: 'https://orcid.org/0000-0002-6710-7545'
  - given-names: Johannes
    family-names: Breuer
    email: johannes.breuer@gesis.org
    affiliation: GESIS - Leibniz Institute for the Social Sciences
    orcid: 'https://orcid.org/0000-0001-5906-7873'
repository-code: 'https://github.com/gesiscss/tubecleanR'
url: 'https://gesiscss.github.io/tubecleanR/'
keywords:
  - R
  - YouTube
license: CC-BY-4.0

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
Last Year
  • Watch event: 1
  • Push event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 2
  • Total pull requests: 1
  • Average time to close issues: about 21 hours
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • schochastics (1)
  • chainsawriot (1)
Pull Request Authors
  • chainsawriot (2)

Dependencies

.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.5.0 composite
  • actions/checkout v4 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION cran
  • anytime * imports
  • qdapRegex * imports
  • stringi * imports
  • utils * imports
  • testthat >= 3.0.0 suggests