tiktok-content-scraper
TikTok Content Scraper -- No API-Key needed, minimal dependencies, citable | Download videos (MP4), slides (JPEG) and metadata of author, music, file, hashtags, content, interactions etc.
Science Score: 57.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.8%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 37
- Watchers: 3
- Forks: 9
- Open Issues: 1
- Releases: 3
Metadata Files
README.md
What is it?
This scraper allows you to download both TikTok videos and slides without an official API key. Additionally, it can scrape approximately 100 metadata fields related to the video, author, music, video file, and hashtags. The scraper is built as a Python class and can be inherited by a custom parent class, allowing for easy integration with databases or other systems.
Features
- Download TikTok videos (MP4) and slides (JPEGs + MP3).
- Scrape extensive metadata.
- Customizable and extendable via inheritance.
- Supports batch processing and progress tracking.

> New feature: author metadata scraping!
Usage
Setup
Clone the repository:

```bash
git clone https://github.com/Q-Bukold/TikTok-Content-Scraper.git
```

Install all dependencies in the requirements file:

```bash
pip install -r requirements.txt
```

Run the example script:

```bash
python3 example_script.py
```
Scrape a single video or slide
To scrape the metadata and content of a video, the TikTok ID is required. It can be found in the URL of a video. Let's use the ID 7460303767968156958 to scrape the associated video.
```python
from TT_Scraper import TT_Scraper

# Configure the scraper; this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Download all metadata as a .json and all content as .mp4/.jpeg
tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=True, download_content=True)
```
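As noted above, the video ID sits in the URL. A small standalone helper can pull it out; `extract_video_id` is not part of the scraper, just an illustrative sketch assuming the usual `https://www.tiktok.com/@<username>/video/<id>` URL shape.

```python
import re

def extract_video_id(url: str) -> str:
    """Return the numeric TikTok video ID from a video URL.

    Assumes the common URL shape
    https://www.tiktok.com/@<username>/video/<id>.
    """
    match = re.search(r"/video/(\d+)", url)
    if match is None:
        raise ValueError(f"no video ID found in {url!r}")
    return match.group(1)

print(extract_video_id("https://www.tiktok.com/@tagesschau/video/7460303767968156958"))
```

The extracted string (or its integer form) can then be passed to `tt.scrape()`.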
Scrape a single user profile
To scrape the metadata of a user, the TikTok username is required (with or without an @). It can be found in the URL of a user profile. Let's use the username insidecdu to scrape the associated user profile.
```python
from TT_Scraper import TT_Scraper

# Configure the scraper; this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Scrape the user profile
tt.scrape_user(username="insidecdu", download_metadata=True)
```
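Since the username appears in the profile URL, it can be recovered the same way as a video ID. This is a hedged sketch using only the standard library; `username_from_profile_url` is a hypothetical helper, not part of the scraper, assuming profile URLs of the form `https://www.tiktok.com/@<username>`.

```python
from urllib.parse import urlparse

def username_from_profile_url(url: str) -> str:
    """Return the bare username from a TikTok profile URL.

    Accepts the handle with or without a leading '@', matching
    what scrape_user() accepts according to the README.
    """
    path = urlparse(url).path              # e.g. "/@insidecdu"
    handle = path.strip("/").split("/")[0]  # "@insidecdu"
    return handle.lstrip("@")               # "insidecdu"

print(username_from_profile_url("https://www.tiktok.com/@insidecdu"))
```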
Scrape multiple videos and slides
You can also scrape a list of IDs with the following code. The scraper detects on its own whether the content is a slide or a video.
```python
import pandas as pd
from TT_Scraper import TT_Scraper

# Configure the scraper; this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Define a list of TikTok IDs (IDs can be strings or integers)
data = pd.read_csv("data/seed_list.csv")
my_list = data["ids"].tolist()

# Insert the list into the scraper
tt.scrape_list(ids=my_list, scrape_content=True, batch_size=None, clear_console=True)
```
The scrape_list function provides a useful overview of your progress. Enable clear_console to clear the terminal output after every scrape. Note that clear_console does not work on Windows machines.
```
Queue Information:
Current Queue: 691 / 163,336
Errors in a row: 0
1.10 iteration time
2.89 sec. per video (averaged)
ETA (current queue): 5 days, 10:23:19

-> id 7359982080861703457 -> is slide with 17 pictures
```
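The ETA shown above can be reproduced from the queue counters and the averaged per-video time. This is a generic sketch, not scraper code (the `eta` helper is hypothetical); the result differs slightly from the sample output because the displayed average of 2.89 s is itself rounded.

```python
import datetime

def eta(remaining_items: int, avg_seconds_per_item: float) -> datetime.timedelta:
    """Estimate the time needed to finish the remaining queue."""
    return datetime.timedelta(seconds=round(remaining_items * avg_seconds_per_item))

# Numbers from the sample output: 691 of 163,336 items done, ~2.89 s per video
remaining = 163_336 - 691
print(eta(remaining, 2.89))
```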
Scrape multiple user profiles
Development in progress...
Citation
Bukold, Q. (2025). TikTok Content Scraper (Version 1.0) [Computer software]. Weizenbaum Institute. https://doi.org/10.34669/WI.RD/4
Advanced Usage
Alternatives to saving the data on drive
The scraper can download metadata and content (video file, images) as well as return them as variables. Metadata is returned as a dictionary or saved as a .json file, and content is saved as .mp4 / .jpeg + .mp3 or returned as an array of binaries. Remember the rule: what is not downloaded is returned.
```python
from TT_Scraper import TT_Scraper

# Configure the scraper; this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Downloading everything
tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=True, download_content=True)

# Returning everything
metadata, content = tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=False, download_content=False)

# Returning one of the two and downloading the other
metadata = tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=False, download_content=True)
```
Alternatives to saving the data on the drive II: Overwriting the `_download_data` function
Changing the output of scrape_list() is a bit more difficult, but can be achieved by overwriting a function called \_download_data() that is part of the TT_Scraper class. To overwrite the function, one must inherit the class. The variable metadata_batch is a list of dictionaries, each containing all the metadata of a video/slide as well as the binary content of a video/slide.
Let's save the content, but insert the metadata into a database:

```python
from TT_Scraper import TT_Scraper

# Create a new class that inherits from TT_Scraper
class TT_Scraper_DB(TT_Scraper):
    def __init__(self, wait_time=0.35, output_files_fp="data/"):
        super().__init__(wait_time, output_files_fp)

    # Overwriting the _download_data function to upsert metadata into a database
    def _download_data(self, metadata_batch, download_metadata=True, download_content=True):
        for metadata_package in metadata_batch:
            # Insert metadata into the database
            self.insert_metadata_to_db(metadata_package)

        # Downloading content
        super()._download_data(metadata_batch, download_metadata=False, download_content=True)

    def insert_metadata_to_db(self, metadata_package):
        ...
        return None

tt = TT_Scraper_DB(wait_time=0.35, output_files_fp="data/")
tt.scrape_list(my_list)
```
Owner
- Name: Quentin Bukold
- Login: Q-Bukold
- Kind: user
- Repositories: 2
- Profile: https://github.com/Q-Bukold
Student at Uni Hildesheim, B.A. Digitale Sozialwissenschaften (Digital Social Sciences)
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Bukold"
given-names: "Quentin"
title: "TikTok-Content-Scraper"
version: 1.0
date-released: 2025-02-12
identifiers:
- type: doi
value: https://doi.org/10.34669/WI.RD/4
url: "https://www.weizenbaum-library.de/handle/id/814"
GitHub Events
Total
- Create event: 10
- Release event: 4
- Issues event: 4
- Watch event: 35
- Delete event: 4
- Member event: 1
- Issue comment event: 5
- Public event: 1
- Push event: 76
- Pull request event: 15
- Fork event: 11
Last Year
- Create event: 10
- Release event: 4
- Issues event: 4
- Watch event: 35
- Delete event: 4
- Member event: 1
- Issue comment event: 5
- Public event: 1
- Push event: 76
- Pull request event: 15
- Fork event: 11
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 7
- Average time to close issues: about 13 hours
- Average time to close pull requests: about 10 hours
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 1.5
- Average comments per pull request: 0.29
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 7
- Average time to close issues: about 13 hours
- Average time to close pull requests: about 10 hours
- Issue authors: 2
- Pull request authors: 3
- Average comments per issue: 1.5
- Average comments per pull request: 0.29
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tomasruizt (1)
- Q-Bukold (1)
Pull Request Authors
- Q-Bukold (5)
- FeLoe (1)
- mrtn3000 (1)