tiktok-scraping

Full pipeline and implementation for the collection and analysis of TikTok videos and metadata with Python.

https://github.com/daliao15/tiktok-scraping

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: DaliaO15
  • License: bsd-2-clause
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 1 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

TikTok scraping and video transcription

Full pipeline for scraping TikTok video posts and metadata and for analyzing their transcriptions.

The project consists of three main parts:

  • Metadata collection
  • Video downloading
  • Transcribing and analysis

The most important part of the project is collecting metadata, specifically the links to each video of a TikTok channel. The challenge is that TikTok's page markup changes frequently, which makes it hard to locate the class that contains the video links. The file "metadataallvideos.py" was a working solution as of May 31st, 2023. You may need to modify it before you use it, but it can still serve as a starting point.
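
As an illustration of the link-collection step, here is a minimal sketch that pulls video links out of a channel page's HTML using only the standard library. It is not the code from the repository's script: `VideoLinkParser`, `extract_video_links`, and the `VIDEO_URL` pattern are hypothetical names, and the URL shape `https://www.tiktok.com/@<user>/video/<id>` is an assumption that may stop holding when TikTok changes its pages.

```python
import re
from html.parser import HTMLParser

# Assumed shape of a TikTok video URL; adjust if the site changes.
VIDEO_URL = re.compile(r"^https://www\.tiktok\.com/@[\w.\-]+/video/\d+")


class VideoLinkParser(HTMLParser):
    """Collect hrefs of <a> tags that look like TikTok video links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = VIDEO_URL.match(href)
        # Keep only the matched prefix so query strings do not create duplicates.
        if match and match.group(0) not in self.links:
            self.links.append(match.group(0))


def extract_video_links(page_html):
    """Return the unique video URLs found in page_html, in document order."""
    parser = VideoLinkParser()
    parser.feed(page_html)
    return parser.links


# Made-up sample markup standing in for a saved channel page.
sample = """
<div>
  <a href="https://www.tiktok.com/@author_XXX/video/7234567890123456789">v1</a>
  <a href="https://www.tiktok.com/@author_XXX/video/7234567890123456789?lang=en">dup</a>
  <a href="https://www.tiktok.com/@author_XXX">profile</a>
</div>
"""
print(extract_video_links(sample))
```

Matching on the URL shape rather than on a CSS class name is a deliberate choice in this sketch, since class names are the part of the page described above as changing most often.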

Tip: an alternative to scraping the links with Python is to use other scraping tools, such as Web Scraper io (extensions available for Chrome and Firefox).

Requirements

Create a new virtual environment and install all the necessary Python packages:

conda env create -f environment.yml
conda activate tiktok_scraping_and_transcription

To run the scraper, you will need to have a web driver. You can download the Chrome driver from this link and the Firefox driver from this link. Personally, I used the Chrome driver for this project.
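
Once a driver is available, a channel page typically needs to be scrolled before all video links are present in the HTML. The following is a minimal sketch of one common way to do that, assuming the page loads more videos as you scroll. `scroll_to_end` and its parameters are hypothetical names, and it only relies on the driver's `execute_script` method.

```python
import time


def scroll_to_end(driver, pause=2.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_no in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded videos time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_no  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds  # safety cap reached


# Usage with Selenium and the Chrome driver (not executed here):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get("https://www.tiktok.com/@author_XXX")
# scroll_to_end(driver)
# html = driver.page_source
# driver.quit()
```

The `max_rounds` cap keeps the loop from running indefinitely on very long channels; the pause length is a guess and may need tuning for your connection.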

For the transcriptions and analysis, you will need to install the Whisper model and the spaCy model for English (or for the language of the videos you're working with). You can find the installation instructions for Whisper here and for spaCy here.

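```
# Now install Whisper (command taken from the official Whisper install guide)
pip install -U openai-whisper

# Now install spaCy and an English model (pick the model for your language)
pip install -U spacy
python -m spacy download en_core_web_sm
```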
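
To make the transcription step concrete, here is a minimal sketch that transcribes every downloaded video in a folder. It assumes the `openai-whisper` package (which also requires `ffmpeg`), `.mp4` files, and the `"base"` model size; `load_whisper_transcriber`, `transcribe_folder`, and the folder name in the usage comment are hypothetical and not taken from the repository.

```python
from pathlib import Path


def load_whisper_transcriber(model_name="base"):
    """Return a function mapping a video path to its transcript text."""
    import whisper  # imported lazily so the helper below works without it

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(str(path))["text"].strip()


def transcribe_folder(folder, transcribe, pattern="*.mp4"):
    """Apply `transcribe` to every matching video; return {file stem: text}."""
    return {p.stem: transcribe(p) for p in sorted(Path(folder).glob(pattern))}


# Usage (not executed here; the folder name is an assumption):
# transcripts = transcribe_folder("videos/author_XXX", load_whisper_transcriber())
```

Passing the transcriber in as a function keeps the folder loop independent of Whisper, so a different speech-to-text model can be swapped in without touching it.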

Demo of input, intermediate result, and output

What the input may look like:

The final data frame for author_XXX would look like:

A figure showing the 20 most common nouns used in author_XXX's TikToks:
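
A noun count like the one behind this figure could be computed roughly as follows. This is a sketch under assumptions, not the repository's analysis code: it assumes a loaded spaCy pipeline is passed in as `nlp`, counts lowercase lemmas of tokens tagged `NOUN`, and uses the hypothetical helper names `noun_lemmas` and `most_common_nouns`.

```python
from collections import Counter


def noun_lemmas(text, nlp):
    """Return lowercase lemmas of the tokens that the pipeline tags as nouns."""
    return [tok.lemma_.lower() for tok in nlp(text) if tok.pos_ == "NOUN"]


def most_common_nouns(transcripts, nlp, n=20):
    """Count nouns over all transcripts and return the n most frequent."""
    counts = Counter()
    for text in transcripts:
        counts.update(noun_lemmas(text, nlp))
    return counts.most_common(n)


# Usage (not executed here; requires spaCy and the English model):
# import spacy
# nlp = spacy.load("en_core_web_sm")
# top20 = most_common_nouns(transcripts.values(), nlp)
```

The result is a list of `(noun, count)` pairs that can be passed to any plotting library to draw a bar chart.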

License

  • Refer to the LICENSE file for details on the license.
  • The authors of this code do not accept any responsibility for its misuse.
  • This project was conducted under certified ethical approval.

Cite this repo

@software{Ortiz_Pablo_Tiktok-scraping_2023,
  author = {Ortiz Pablo, Dalia},
  license = {BSD-2},
  month = jul,
  title = {{Tiktok-scraping}},
  url = {https://github.com/DaliaO15/Tiktok-scraping},
  version = {1.0.0},
  year = {2023}
}

Owner

  • Name: DaliaOP
  • Login: DaliaO15
  • Kind: user
  • Location: Uppsala, Sweden

Citation (CITATION.cff)

cff-version: 1.2.0
title: Tiktok-scraping
message: "If you use this software, please cite it using the metadata from this file."
type: software
authors:
  - given-names: Dalia
    family-names: Ortiz Pablo
    affiliation: CDHU, Uppsala University
repository-code: 'https://github.com/DaliaO15/Tiktok-scraping'
url: 'https://github.com/DaliaO15/Tiktok-scraping'
license: BSD-2
version: 1.0.0
date-released: '2023-07-13'

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
  • Fork event: 3
Last Year
  • Watch event: 1
  • Push event: 1
  • Fork event: 3