pytok

A web scraper for TikTok using Playwright

https://github.com/networkdynamics/pytok

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary

Keywords

data-collection tiktok tiktok-api tiktok-scraper web-scraper
Last synced: 6 months ago

Repository

A web scraper for TikTok using Playwright

Basic Info
  • Host: GitHub
  • Owner: networkdynamics
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 270 KB
Statistics
  • Stars: 82
  • Watchers: 9
  • Forks: 12
  • Open Issues: 8
  • Releases: 1
Topics
data-collection tiktok tiktok-api tiktok-scraper web-scraper
Created over 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Citation

README.md

pytok

This is a Playwright-based version of David Teather's unofficial TikTok.com API wrapper for Python. It currently re-implements a limited subset of the original library's features, shifting the focus to browser automation so that CAPTCHAs can be solved automatically, at what we hope is only a minor cost in performance.

Installation

```bash
pip install git+https://github.com/networkdynamics/pytok.git@master
```

Quick Start Guide

Here's a quick bit of code to get the videos from a particular user on TikTok. There are more examples in the examples directory.

```py
import asyncio

from pytok.tiktok import PyTok


async def main():
    async with PyTok() as api:
        # Fetch profile data for a single user
        user = api.user(username="therock")
        user_data = await user.info()
        print(user_data)

        # Iterate over the user's videos and collect their data
        videos = []
        async for video in user.videos():
            video_data = video.info()
            videos.append(video_data)
            print(video_data)


if __name__ == "__main__":
    asyncio.run(main())
```

Please note that pulling data from TikTok takes a while. We recommend leaving the scripts running on a server so they have time to finish downloading everything. You can adjust the delay constants to either speed up collection or avoid TikTok's rate limiting, for example: PyTok(request_delay=10)
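To make the speed versus rate-limiting trade-off concrete, here is a minimal, generic sketch of what a fixed pause between requests does. It does not use pytok and is not its implementation: fetch_all and fake_request are hypothetical names, and we assume for illustration that a request delay means a pause in seconds between consecutive requests.

```python
import asyncio
import time
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def fetch_all(
    items: List[str],
    request: Callable[[str], Awaitable[T]],
    request_delay: float,
) -> List[T]:
    """Run `request` on each item in order, pausing `request_delay` seconds
    between consecutive calls (hypothetical helper, not pytok's API)."""
    results: List[T] = []
    for i, item in enumerate(items):
        if i > 0:
            # Spacing requests out lowers the request rate the server sees.
            await asyncio.sleep(request_delay)
        results.append(await request(item))
    return results


async def fake_request(username: str) -> str:
    # Stand-in for a real network call.
    return f"data for {username}"


if __name__ == "__main__":
    start = time.monotonic()
    out = asyncio.run(fetch_all(["a", "b", "c"], fake_request, request_delay=0.1))
    print(out, f"{time.monotonic() - start:.2f}s")
```

Under this assumption, n requests take at least request_delay × (n − 1) seconds, so a delay of 10 over 100 requests adds at least 990 s (16.5 minutes) on top of the time spent on the requests themselves.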

If you need help with this, please open an issue in this repository.

Citation

If you use this library in your research, please cite it using the following BibTeX entry:

```bibtex
@software{ben_steel_2024_12802714,
  author    = {Ben Steel and Alexei Abrahams},
  title     = {{networkdynamics/pytok: Initial working version of library}},
  month     = jul,
  year      = 2024,
  publisher = {Zenodo},
  version   = {v0.1.0},
  doi       = {10.5281/zenodo.12802714},
  url       = {https://doi.org/10.5281/zenodo.12802714}
}
```

Format and Schema

The JSON-serializable dictionary returned by the info() methods contains all of the data that the TikTok API returns. We provide helper functions to parse that data into pandas DataFrames: utils.get_comment_df(), utils.get_video_df() and utils.get_user_df(), for comment, video, and user data respectively.

The video DataFrame contains the following columns:

| Field name | Description |
|----------|----------|
| video_id | Unique video ID |
| createtime | Video creation time as a UTC datetime in YYYY-MM-DD HH:MM:SS format |
| author_name | Unique author name |
| author_id | Unique author ID |
| desc | The full video description from the author |
| hashtags | A list of hashtags used in the video description |
| share_video_id | If the video is sharing another video, this is the video ID of that original video, else empty |
| share_video_user_id | If the video is sharing another video, this is the user ID of the author of that video, else empty |
| share_video_user_name | If the video is sharing another video, this is the user name of the author of that video, else empty |
| share_type | If the video is sharing another video, this is the type of share (e.g. stitch or duet) |
| mentions | A list of users mentioned in the video description, if any |
| digg_count | The number of likes on the video |
| share_count | The number of times the video was shared |
| comment_count | The number of comments on the video |
| play_count | The number of times the video was played |
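As an illustration of how one raw video dictionary could map onto some of these columns, here is a minimal sketch in plain Python. It is not utils.get_video_df(): video_row is a hypothetical helper, the sample data is made up, and the raw key names (id, createTime, author.uniqueId, stats.diggCount, textExtra[].hashtagName and so on) are assumptions about TikTok's payload that may not match what the API actually returns.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def video_row(item: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one raw video dict into a subset of the video-table columns.

    Hypothetical helper; the raw key names are assumptions, not a documented schema.
    """
    author = item.get("author", {})
    stats = item.get("stats", {})
    # Assumes createTime is a Unix timestamp in seconds.
    created = datetime.fromtimestamp(int(item["createTime"]), tz=timezone.utc)
    return {
        "video_id": str(item["id"]),
        # UTC time rendered as YYYY-MM-DD HH:MM:SS, as in the table above.
        "createtime": created.strftime("%Y-%m-%d %H:%M:%S"),
        "author_name": author.get("uniqueId"),
        "author_id": author.get("id"),
        "desc": item.get("desc", ""),
        # Assumes hashtags appear as textExtra entries with a hashtagName key.
        "hashtags": [
            t["hashtagName"] for t in item.get("textExtra", []) if t.get("hashtagName")
        ],
        "digg_count": stats.get("diggCount"),
        "share_count": stats.get("shareCount"),
        "comment_count": stats.get("commentCount"),
        "play_count": stats.get("playCount"),
    }


# Made-up sample shaped after the assumed payload.
sample = {
    "id": "7000000000000000000",
    "createTime": 1600000000,
    "desc": "hello #cats",
    "author": {"id": "42", "uniqueId": "someone"},
    "stats": {"diggCount": 10, "shareCount": 1, "commentCount": 2, "playCount": 100},
    "textExtra": [{"hashtagName": "cats"}, {"userUniqueId": "friend"}],
}
print(video_row(sample))
```

The sample's createtime comes out as 2020-09-13 12:26:40, and only the textExtra entry carrying a hashtagName ends up in hashtags.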

The comment DataFrame contains the following columns:

| Field name | Description |
|----------|-----------|
| comment_id | Unique comment ID |
| createtime | Comment creation time as a UTC datetime in YYYY-MM-DD HH:MM:SS format |
| author_name | Unique author name |
| author_id | Unique author ID |
| text | Text of the comment |
| mentions | A list of users that are tagged in the comment |
| video_id | The ID of the video the comment is on |
| comment_language | The language of the comment, as predicted by the TikTok API |
| digg_count | The number of likes the comment received |
| reply_comment_id | If the comment is replying to another comment, this is the ID of that comment |
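Because reply_comment_id points at a parent comment, rows from the comment table can be regrouped into reply threads. The sketch below does this with plain dictionaries rather than the DataFrame; build_threads and print_thread are hypothetical helpers, the rows are made-up examples using the column names from the table, and treating an empty or missing reply_comment_id as a top-level comment is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_threads(rows: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each parent comment_id to the IDs of the comments replying to it.

    Top-level comments (empty or missing reply_comment_id) are grouped under
    the key "" so every comment appears exactly once as a child.
    """
    children: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        parent = row.get("reply_comment_id") or ""
        children[parent].append(row["comment_id"])
    return dict(children)


def print_thread(children: Dict[str, List[str]], parent: str = "", depth: int = 0) -> None:
    # Depth-first walk so replies are indented under the comment they answer.
    for cid in children.get(parent, []):
        print("  " * depth + cid)
        print_thread(children, cid, depth + 1)


rows = [
    {"comment_id": "c1", "reply_comment_id": None},
    {"comment_id": "c2", "reply_comment_id": "c1"},
    {"comment_id": "c3", "reply_comment_id": ""},
    {"comment_id": "c4", "reply_comment_id": "c2"},
]
print_thread(build_threads(rows))
```

One caveat of this design: a reply whose parent comment was not collected is still stored under that parent's ID, but the walk from the top level will never reach it.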

The user DataFrame contains the following columns:

| Field name | Description |
|----------|-----------|
| id | Unique author ID |
| unique_id | Unique user name |
| nickname | Display user name, changeable |
| signature | Short user description |
| verified | Whether or not the user is verified |
| num_following | How many other accounts the user is following |
| num_followers | How many followers the user has |
| num_videos | How many videos the user has posted |
| num_likes | Total number of likes the user has received |
| createtime | When the user account was created. This is derived from the id field and can occasionally be wrong, giving an implausibly early date such as 1971 (a very low Unix timestamp) |
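One plausible way a creation time can be derived from an ID, assuming TikTok IDs follow the commonly reported snowflake-style layout in which the upper 32 bits of the 64-bit ID hold a Unix timestamp in seconds. This README does not document that layout, so treat it as an assumption; it would, however, explain the early-date artefact, since older accounts with small numeric IDs decode to times just after the 1970 epoch. id_to_datetime is a hypothetical helper.

```python
from datetime import datetime, timezone


def id_to_datetime(tiktok_id: int) -> datetime:
    """Decode a creation time from a TikTok-style ID (hypothetical helper).

    Assumes the top 32 bits of the 64-bit ID are a Unix timestamp in seconds.
    """
    seconds = int(tiktok_id) >> 32
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Build an ID whose upper 32 bits encode 1600000000 s, plus arbitrary low bits.
example_id = (1600000000 << 32) | 12345
print(id_to_datetime(example_id))  # 2020-09-13 12:26:40+00:00

# A small, legacy-style ID decodes to a time just after the epoch.
print(id_to_datetime(123456789012345).year)  # 1970
```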

Owner

  • Name: McGill Network Dynamics Lab
  • Login: networkdynamics
  • Kind: organization
  • Location: Montreal, Canada

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Steel"
  given-names: "Ben"
  orcid: "https://orcid.org/0009-0006-3845-1394"
- family-names: "Abrahams"
  given-names: "Alexei"
  orcid: "https://orcid.org/0000-0002-6547-072X"
title: "PyTok"
version: 0.1.0
doi: 10.5281/zenodo.12802714
date-released: 2024-07-23
url: "https://github.com/networkdynamics/pytok"

GitHub Events

Total
  • Issues event: 7
  • Watch event: 40
  • Issue comment event: 4
  • Push event: 16
  • Fork event: 15
Last Year
  • Issues event: 7
  • Watch event: 40
  • Issue comment event: 4
  • Push event: 16
  • Fork event: 15

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Total issue authors: 3
  • Total pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: about 1 month
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • yanivRnDhuji (2)
  • shakibyzn (2)
  • ketanmalempati (1)
  • KuntilBogel (1)
  • xiari0703 (1)
  • ghost (1)
  • moha-ep (1)
  • heyeanne34 (1)
  • NguyenDoCong (1)
  • toptierprojections (1)
  • michaelcyshield (1)
Pull Request Authors
  • Person-Account (1)
  • Dr-Yes (1)
Top Labels
Issue Labels
Pull Request Labels