tiktok-scraping
Full pipeline and implementation for the collection and analysis of TikTok videos and metadata with Python.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DaliaO15
- License: bsd-2-clause
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 4
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TikTok scraping and video transcription
Full pipeline for scraping TikTok video posts and metadata, transcribing the videos, and analyzing the transcriptions.

The project consists of three main parts:
- Metadata collection
- Video downloading
- Transcribing and analysis
The most important part of the project is collecting metadata, specifically the links to each video per TikTok channel. The challenge is that TikTok's page markup changes frequently, which makes it difficult to locate the HTML class that contains the video links. The file "metadataallvideos.py" was a functional solution as of May 31st, 2023, but you may need to modify it before you use it (or simply take inspiration from it).
Tip: an alternative to scraping the links with Python is to use other scraping tools, such as Web Scraper io (extensions available for Chrome and Firefox).
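Because the class that holds the video links keeps changing, one possible workaround is to ignore class names entirely and match the links by their URL shape. The sketch below is not the approach used in "metadataallvideos.py"; it assumes video links follow the pattern `https://www.tiktok.com/@<username>/video/<numeric id>`, and the names `VIDEO_LINK_RE` and `extract_video_links` are hypothetical.

```python
import re

# Assumption: TikTok video pages live at /@<username>/video/<numeric id>.
VIDEO_LINK_RE = re.compile(r"https://www\.tiktok\.com/@[\w.]+/video/\d+")


def extract_video_links(page_html):
    """Return the unique video URLs found in page_html, in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(VIDEO_LINK_RE.findall(page_html)))
```

Matching on the URL rather than on a CSS class trades precision for robustness: it keeps working when class names are renamed, but it would need updating if the URL scheme itself changed.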
Requirements
Create a new virtual environment and install all the necessary Python packages:
conda env create -f environment.yml
conda activate tiktok_scraping_and_transcription
To run the scraper, you will need to have a web driver. You can download the Chrome driver from this link and the Firefox driver from this link. Personally, I used the Chrome driver for this project.
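As a rough illustration of what the web driver is used for, the sketch below opens a channel page and scrolls until no new content loads, then returns the page source. It is an assumption about how such a step could look, not a copy of the project's script; `load_full_page`, `pause`, and `max_scrolls` are hypothetical names, and it relies only on the standard Selenium methods `get`, `execute_script`, and `page_source`.

```python
import time


def load_full_page(driver, url, pause=2.0, max_scrolls=50):
    """Open url, scroll to the bottom until the page stops growing, return the HTML."""
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Scrolling to the bottom triggers loading of further videos.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to load new items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the end was reached
        last_height = new_height
    return driver.page_source


# Possible usage (requires Selenium and a matching Chrome driver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   html = load_full_page(driver, "https://www.tiktok.com/@author_XXX")
#   driver.quit()
```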
For the transcriptions and analysis, you will need to install the Whisper model and the spaCy model for the English language (or the language of the videos you're working with). You can find the installation instructions for Whisper here and for spaCy here.
```
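# Now install Whisper
pip install -U openai-whisper
# Now install spaCy and its English model
pip install -U spacy && python -m spacy download en_core_web_sm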
```
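Once the model is installed, the transcription step could look roughly like the sketch below. It assumes the downloaded videos are `.mp4` files in a single folder and that the `openai-whisper` package is the one in use; `transcribe_folder` and its parameters are hypothetical names, and the model size `"base"` is only an example.

```python
from pathlib import Path


def transcribe_folder(video_dir, model=None, model_name="base"):
    """Return {video file name: transcript} for every .mp4 file in video_dir."""
    if model is None:
        # Imported lazily so the function can also be driven by a stand-in model.
        import whisper  # provided by the openai-whisper package

        model = whisper.load_model(model_name)
    transcripts = {}
    for path in sorted(Path(video_dir).glob("*.mp4")):
        # transcribe() returns a dict whose "text" entry holds the full transcript.
        result = model.transcribe(str(path))
        transcripts[path.name] = result["text"].strip()
    return transcripts
```

Note that Whisper needs ffmpeg available on the system to read video files.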
Demo of input, middle point, and output
What the input may look like:

The final data frame for author_XXX would look like:

A figure showing the first 20 most common nouns used in author_XXX's tiktoks:
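A count like the one behind this figure can be obtained by keeping only tokens tagged as nouns and tallying their lemmas. The sketch below is one way to do that under stated assumptions, not necessarily how the notebook does it: `most_common_nouns` is a hypothetical helper that works on any sequence of spaCy `Doc` objects (or anything whose tokens expose `pos_` and `lemma_`).

```python
from collections import Counter


def most_common_nouns(docs, top_n=20):
    """Return the top_n (noun lemma, count) pairs across all parsed documents."""
    counts = Counter()
    for doc in docs:
        for token in doc:
            # "NOUN" is the universal POS tag for common nouns; proper nouns are "PROPN".
            if token.pos_ == "NOUN":
                counts[token.lemma_.lower()] += 1
    return counts.most_common(top_n)


# Possible usage with spaCy, where `texts` is a list of transcript strings:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   top_nouns = most_common_nouns(nlp.pipe(texts))
```

Counting lemmas rather than surface forms merges singular and plural variants into one entry, which usually makes the ranking easier to read.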

License
- Refer to the LICENSE file for details on the license.
- The authors of this code do not accept any responsibility for its misuse.
- This project was conducted under certified ethical approval.
Cite this repo
@software{Ortiz_Pablo_Tiktok-scraping_2023,
author = {Ortiz Pablo, Dalia},
license = {BSD-2},
month = jul,
title = {{Tiktok-scraping}},
url = {https://github.com/DaliaO15/Tiktok-scraping},
version = {1.0.0},
year = {2023}
}
Owner
- Name: DaliaOP
- Login: DaliaO15
- Kind: user
- Location: Uppsala, Sweden
- Repositories: 6
- Profile: https://github.com/DaliaO15
Citation (CITATION.cff)
cff-version: 1.2.0
title: Tiktok-scraping
message: "If you use this software, please cite it using the metadata from this file."
type: software
authors:
  - given-names: Dalia
    family-names: Ortiz Pablo
    affiliation: CDHU, Uppsala University
repository-code: 'https://github.com/DaliaO15/Tiktok-scraping'
url: 'https://github.com/DaliaO15/Tiktok-scraping'
license: BSD-2
version: 1.0.0
date-released: '2023-07-13'
GitHub Events
Total
- Watch event: 1
- Push event: 1
- Fork event: 3
Last Year
- Watch event: 1
- Push event: 1
- Fork event: 3