4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

https://github.com/digitalmethodsinitiative/4cat

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary

Keywords

digitalmethods python scraping social-media textanalysis
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: digitalmethodsinitiative
  • License: other
  • Language: Python
  • Default Branch: master
  • Homepage: https://4cat.nl
  • Size: 68.5 MB
Statistics
  • Stars: 326
  • Watchers: 14
  • Forks: 64
  • Open Issues: 68
  • Releases: 29
Topics
digitalmethods python scraping social-media textanalysis
Created over 7 years ago · Last pushed 6 months ago
Metadata Files
Readme License Security Zenodo

README.md

4CAT: Capture and Analysis Toolkit

DOI: 10.5117/CCR2022.2.007.HAGE · DOI: 10.5281/zenodo.4742622 · License: MPL 2.0 · Requires Python 3.11

[Screenshots: 4CAT's 'Create Dataset' interface; a network visualisation of a dataset]

4CAT has a website at 4cat.nl.

Follow 4CAT on Bluesky for updates.

4CAT is a research tool for analysing and processing data from online social platforms. Its goal is to make the capture and analysis of data from these platforms accessible through a web interface, without requiring any programming or web scraping skills. Our target audience is researchers, students and journalists interested in using Digital Methods in their work.

In 4CAT, you create a dataset from a given platform according to a given set of parameters; the result of this (usually a CSV or JSON file containing matching items) can then be downloaded or analysed further with a suite of analytical 'processors', which range from simple frequency charts to more advanced analyses such as the generation and visualisation of word embedding models.
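As a rough illustration of what a "simple frequency chart" processor computes, the sketch below counts the most frequent words in one column of a CSV file, using only the Python standard library. It is not 4CAT code: the column name `body` and the sample rows are assumptions made for the example, and a real export may use different column names.

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_text, text_column="body", top_n=10):
    """Count the most frequent lowercase words in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Missing or empty cells are treated as empty text.
        text = row.get(text_column) or ""
        counts.update(text.lower().split())
    return counts.most_common(top_n)


# A tiny stand-in for a downloaded dataset file (hypothetical columns).
sample = "id,body\n1,cats are great\n2,cats and dogs\n"
print(word_frequencies(sample, top_n=1))  # prints [('cats', 2)]
```

To use this on a downloaded file, read the file's contents into a string first and pass the name of the column that holds the item text.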

4CAT ships with a growing number of data sources corresponding to popular platforms, and you can add your own using 4CAT's Python API. The following data sources are actively supported and can be used to collect data with 4CAT directly:

  • 4chan and 8kun
  • Bluesky
  • Telegram
  • TikTok (from a list of TikTok post URLs)
  • Tumblr

The following platforms are supported through Zeeschuimer, with which you can collect data to import into 4CAT for analysis:

  • 9gag
  • Douyin
  • Gab
  • Imgur
  • Instagram (posts)
  • LinkedIn
  • Pinterest
  • Threads
  • Truth Social
  • TikTok (posts and comments)
  • X/Twitter
  • Xiaohongshu

It is also possible to upload data collected with other tools as CSV files, or zip archives of media files (i.e. video, images, and audio). Some tools are explicitly supported, but other data can also be uploaded as long as it is formatted as CSV or uses a common media file format.

A number of other platforms have built-in support that is untested, or requires e.g. special API access. You can view the data sources in our wiki or review the data sources' code in the GitHub repository.

Installation

You can install 4CAT locally or on a server, either via Docker or manually. For the easiest installation, we recommend copying our docker-compose.yml and .env files into a folder and running this terminal command in that folder:

docker-compose up -d

In-depth instructions for both Docker and manual installation can be found in our wiki. A video walkthrough of installing 4CAT via Docker is available on YouTube.
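If you prefer to script the Docker route, the sketch below wraps the command from this section in Python and first checks that the two files are present in the target folder. The function names are hypothetical helpers written for this example, not part of 4CAT, and `start` assumes that Docker and the `docker-compose` executable are already installed.

```python
import subprocess
from pathlib import Path

# The two files this section says to copy before starting 4CAT.
REQUIRED_FILES = ("docker-compose.yml", ".env")


def compose_up_command(folder):
    """Return the compose command after checking the required files exist."""
    folder = Path(folder)
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"Missing in {folder}: {', '.join(missing)}")
    return ["docker-compose", "up", "-d"]


def start(folder):
    """Run the command inside the folder; raises if the command fails."""
    subprocess.run(compose_up_command(folder), cwd=folder, check=True)
```

Calling `start(".")` from the folder that holds both files is then equivalent to typing the command by hand, with an earlier and clearer error if a file is missing.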

Currently, scraping 4chan, 8chan, and 8kun requires additional steps; please see the wiki.

Please check our issues and create one if you experience any problems (pull requests are also very welcome).

Upgrading 4CAT

Instructions on upgrading 4CAT from previous versions can be found in our wiki.

Modules

4CAT is a modular tool and easy to extend. The following two folders in the repository are of interest for this:

  • datasources: Data source definitions. Each is a set of configuration options, database definitions and Python scripts for processing the data. If you want to set up your own data sources, refer to the wiki.
  • processors: A collection of data processing scripts that can plug into 4CAT to manipulate or process datasets created with 4CAT. There is an API you can use to make your own processors.
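The plug-in idea behind processors can be sketched in a few lines of Python. This is a toy model and deliberately does not reproduce 4CAT's actual processor API, which is documented in the wiki: the class names, the `process` method, the `REGISTRY` dictionary and the `author` field are all hypothetical.

```python
from collections import Counter


class ToyProcessor:
    """Hypothetical base class; not 4CAT's real processor interface."""

    name = "toy"

    def process(self, items):
        raise NotImplementedError


class AuthorCount(ToyProcessor):
    """Turn a dataset (a list of dicts) into a posts-per-author table."""

    name = "author-count"

    def process(self, items):
        counts = Counter(item.get("author", "") for item in items)
        return [{"author": a, "posts": n} for a, n in counts.most_common()]


# A registry lets the host application look processors up by name.
REGISTRY = {cls.name: cls for cls in (AuthorCount,)}


def run(name, items):
    """Instantiate the named processor and apply it to a dataset."""
    return REGISTRY[name]().process(items)


dataset = [{"author": "a"}, {"author": "b"}, {"author": "a"}]
print(run("author-count", dataset))
```

The point of the pattern is that a new analysis only needs a new class and a registry entry; the code that loads datasets and stores results stays unchanged.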

Credits & License

4CAT was created at OILab and the Digital Methods Initiative at the University of Amsterdam. The tool was inspired by DMI-TCAT, a tool with comparable functionality that can be used to scrape and analyse Twitter data.

4CAT development is supported by the Dutch PDI-SSH foundation through the CAT4SMR project.

4CAT is licensed under the Mozilla Public License, 2.0. Refer to the LICENSE file for more information.

Owner

  • Name: Digital Methods Initiative
  • Login: digitalmethodsinitiative
  • Kind: organization
  • Email: webmaster@digitalmethods.net
  • Location: Amsterdam

The Digital Methods Initiative (DMI) is one of Europe's leading Internet Studies research groups. Research tools it develops are collected here.

GitHub Events

Total
  • Create event: 21
  • Release event: 4
  • Issues event: 44
  • Watch event: 69
  • Delete event: 3
  • Issue comment event: 96
  • Push event: 453
  • Gollum event: 26
  • Pull request review comment event: 8
  • Pull request review event: 18
  • Pull request event: 34
  • Fork event: 7
Last Year
  • Create event: 21
  • Release event: 4
  • Issues event: 44
  • Watch event: 69
  • Delete event: 3
  • Issue comment event: 96
  • Push event: 453
  • Gollum event: 26
  • Pull request review comment event: 8
  • Pull request review event: 18
  • Pull request event: 34
  • Fork event: 7

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 21
  • Total pull requests: 23
  • Average time to close issues: 7 months
  • Average time to close pull requests: 4 months
  • Total issue authors: 13
  • Total pull request authors: 5
  • Average comments per issue: 1.05
  • Average comments per pull request: 1.52
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 19
  • Pull requests: 21
  • Average time to close issues: 23 days
  • Average time to close pull requests: 2 months
  • Issue authors: 13
  • Pull request authors: 5
  • Average comments per issue: 0.89
  • Average comments per pull request: 1.0
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sal-uva (13)
  • stijn-uva (12)
  • dale-wahl (9)
  • LAMaglan (4)
  • eeftychiou (2)
  • carschno (2)
  • williamhollingshead (2)
  • heynoooo (2)
  • PoliSP01 (2)
  • alexsal149 (2)
  • rvanvliet (1)
  • rodgers-scott (1)
  • muneerahp (1)
  • citilu (1)
  • mifiwi (1)
Pull Request Authors
  • dale-wahl (17)
  • stijn-uva (13)
  • sal-uva (7)
  • transen (1)
  • carschno (1)
  • dependabot[bot] (1)
  • namansnghl (1)
  • Parker-Kasiewicz (1)
Top Labels
Issue Labels
enhancement (13) bug (10) data source (5) docker issue (5) (mostly) back-end (4) explorer (3) low priority (3) questionable (2) (mostly) front-end (1) processors (1) big (1) PR-welcome (1) deployment (1) wontfix (1)
Pull Request Labels
enhancement (6) data source (3) processors (2) explorer (2) (mostly) back-end (1) questionable (1) dependencies (1) python (1)

Dependencies

.github/workflows/docker_latest.yml actions
  • actions/checkout v3 composite
  • docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
  • docker/login-action f054a8b539a109f9f41c372932f1ae047eff08c9 composite
  • docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite
.github/workflows/docker_new_release.yml actions
  • actions/checkout v3 composite
  • docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
  • docker/login-action f054a8b539a109f9f41c372932f1ae047eff08c9 composite
  • docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite
.github/workflows/docker_pr_test.yml actions
  • actions/checkout v2 composite
docker/Dockerfile docker
  • python 3.8-slim build
docker-compose.yml docker
  • digitalmethodsinitiative/4cat ${DOCKER_TAG}
  • postgres ${POSTGRES_TAG}
docker-compose_build.yml docker
  • 4cat latest
  • postgres ${POSTGRES_TAG}
docker-compose_public_ip.yml docker
  • digitalmethodsinitiative/4cat ${DOCKER_TAG}
  • postgres ${POSTGRES_TAG}
requirements.txt pypi
setup.py pypi