tiktok-content-scraper

TikTok Content Scraper -- No API-Key needed, minimal dependencies, citable | Download videos (MP4), slides (JPEG) and metadata of author, music, file, hashtags, content, interactions etc.

https://github.com/q-bukold/tiktok-content-scraper

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Q-Bukold
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.02 MB
Statistics
  • Stars: 37
  • Watchers: 3
  • Forks: 9
  • Open Issues: 1
  • Releases: 3
Created about 1 year ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

What is it?

This scraper allows you to download both TikTok videos and slides without an official API key. Additionally, it can scrape approximately 100 metadata fields related to the video, author, music, video file, and hashtags. The scraper is built as a Python class and can be extended via a custom subclass, allowing for easy integration with databases or other systems.

Features

  • Download TikTok videos (MP4) and slides (JPEGs + MP3).
  • Scrape extensive metadata.
  • Customizable and extendable via inheritance.
  • Supports batch processing and progress tracking.

> New feature: author metadata scraping!

Usage

Setup

  1. Clone the repository:

     ```bash
     git clone https://github.com/Q-Bukold/TikTok-Content-Scraper.git
     ```

  2. Install all dependencies in the requirements file:

     ```bash
     pip install -r requirements.txt
     ```

  3. Run the example script:

     ```bash
     python3 example_script.py
     ```

Scrape a single video or slide

To scrape the metadata and content of a video, the TikTok ID is required. It can be found in the URL of a video. Let's use the ID 7460303767968156958 to scrape the associated video.
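Since the ID appears in the video URL, it can also be pulled out programmatically. A minimal sketch (the `extract_tiktok_id` helper and its regex are illustrative, not part of the scraper's API):

```python
import re

def extract_tiktok_id(url: str) -> int:
    """Pull the numeric video ID out of a TikTok video URL (illustrative helper)."""
    match = re.search(r"/video/(\d+)", url)
    if match is None:
        raise ValueError(f"no video ID found in {url!r}")
    return int(match.group(1))

print(extract_tiktok_id("https://www.tiktok.com/@some_user/video/7460303767968156958"))
# → 7460303767968156958
```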

```python
from TT_Scraper import TT_Scraper

# Configure the scraper, this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Download all metadata as a .json and all content as .mp4/.jpeg
tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=True, download_content=True)
```

Scrape a single user profile

To scrape the metadata of a user, the TikTok username is required (with or without an @). It can be found in the URL of a user profile. Let's use the username insidecdu to scrape the associated user profile.
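Because the handle is accepted with or without the @, a small normalization step before calling the scraper can keep input lists uniform (this helper is illustrative, not part of the scraper):

```python
def normalize_username(handle: str) -> str:
    """Strip surrounding whitespace and a leading @ from a TikTok handle (illustrative helper)."""
    return handle.strip().lstrip("@")

print(normalize_username("@insidecdu"))   # → insidecdu
print(normalize_username(" insidecdu "))  # → insidecdu
```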

```python
from TT_Scraper import TT_Scraper

# Configure the scraper, this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Scrape user profile
tt.scrape_user(username="insidecdu", download_metadata=True)
```

Scrape multiple videos and slides

You can also scrape a list of IDs with the following code. The scraper detects on its own whether the content is a slide or a video.

```python
import pandas as pd
from TT_Scraper import TT_Scraper

# Configure the scraper, this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Define a list of TikTok IDs (IDs can be strings or integers)
data = pd.read_csv("data/seedlist.csv")
my_list = data["ids"].tolist()

# Insert the list into the scraper
tt.scrape_list(ids=my_list, scrape_content=True, batch_size=None, clear_console=True)
```

The scrape_list function provides a useful overview of your progress. Enable clear_console to clear the terminal output after every scrape. Note that clear_console does not work on Windows machines.

```
Queue Information:
Current Queue: 691 / 163,336
Errors in a row: 0
1.10 iteration time
2.89 sec. per video (averaged)
ETA (current queue): 5 days, 10:23:19

-> id 7359982080861703457
-> is slide with 17 pictures
```
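If you do need screen clearing on Windows, a generic cross-platform clear can serve as a workaround (this helper is not part of the scraper; it is a common stdlib-only pattern):

```python
import os

def clear_console() -> None:
    """Clear the terminal: 'cls' on Windows, 'clear' on POSIX systems."""
    os.system("cls" if os.name == "nt" else "clear")
```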

Scrape multiple user profiles

Development in progress...

Citation

Bukold, Q. (2025). TikTok Content Scraper (Version 1.0) [Computer software]. Weizenbaum Institute. https://doi.org/10.34669/WI.RD/4

Advanced Usage

Alternatives to saving the data on drive

The scraper can either download metadata and content (video file, images) to disk or return them as variables. Metadata is returned as a dictionary or saved as a .json file, and content is saved as .mp4 / .jpeg + .mp3 or returned as an array of binaries. Remember the rule: what is not downloaded is returned.

```python
from TT_Scraper import TT_Scraper

# Configure the scraper, this step is always needed
tt = TT_Scraper(wait_time=0.3, output_files_fp="data/")

# Downloading everything
tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=True, download_content=True)

# Returning everything
metadata, content = tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=False, download_content=False)

# Returning one of the two and downloading the other
metadata = tt.scrape(id=7460303767968156958, scrape_content=True, download_metadata=False, download_content=True)
```

Alternatives to saving the data on the drive II: Overwriting the _download_data function

Changing the output of scrape_list() is a bit more difficult, but can be achieved by overwriting a function called _download_data() that is part of the TT_Scraper class. To overwrite the function, one must inherit the class. The variable metadata_batch is a list of dictionaries, each containing all the metadata of a video/slide as well as the binary content of a video/slide.

Let's save the content, but insert the metadata into a database:

```python
from TT_Scraper import TT_Scraper

# Create a new class that inherits the TT_Scraper
class TT_Scraper_DB(TT_Scraper):
    def __init__(self, wait_time = 0.35, output_files_fp = "data/"):
        super().__init__(wait_time, output_files_fp)

    # Overwriting the _download_data function to upsert metadata into a database
    def _download_data(self, metadata_batch, download_metadata = True, download_content = True):
        for metadata_package in metadata_batch:
            # Insert metadata into the database
            self.insert_metadata_to_db(metadata_package)

        # Downloading content
        super()._download_data(metadata_batch, download_metadata=False, download_content=True)

    def insert_metadata_to_db(self, metadata_package):
        ...
        return None

tt = TT_Scraper_DB(wait_time = 0.35, output_files_fp = "data/")
tt.scrape_list(my_list)
```

Owner

  • Name: Quentin Bukold
  • Login: Q-Bukold
  • Kind: user

Student at the University of Hildesheim, B.A. Digitale Sozialwissenschaften (Digital Social Sciences)

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Bukold"
  given-names: "Quentin"
title: "TikTok-Content-Scraper"
version: 1.0
date-released: 2025-02-12
identifiers:
  - type: doi
    value: https://doi.org/10.34669/WI.RD/4
url: "https://www.weizenbaum-library.de/handle/id/814"

GitHub Events

Total
  • Create event: 10
  • Release event: 4
  • Issues event: 4
  • Watch event: 35
  • Delete event: 4
  • Member event: 1
  • Issue comment event: 5
  • Public event: 1
  • Push event: 76
  • Pull request event: 15
  • Fork event: 11
Last Year
  • Create event: 10
  • Release event: 4
  • Issues event: 4
  • Watch event: 35
  • Delete event: 4
  • Member event: 1
  • Issue comment event: 5
  • Public event: 1
  • Push event: 76
  • Pull request event: 15
  • Fork event: 11

Issues and Pull Requests

All Time
  • Total issues: 2
  • Total pull requests: 7
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: about 10 hours
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.29
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 7
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: about 10 hours
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.29
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tomasruizt (1)
  • Q-Bukold (1)
Pull Request Authors
  • Q-Bukold (5)
  • FeLoe (1)
  • mrtn3000 (1)