https://github.com/centre-for-humanities-computing/twitter-posting-stats

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: centre-for-humanities-computing
Language: Jupyter Notebook
Default Branch: main
Size: 250 KB

Statistics

Stars: 2
Watchers: 0
Forks: 1
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme

Twitter posting stats

This project gathers posting statistics from Twitter scrapes. Because those scrapes can get large, PySpark is used. The data volumes probably do not merit a large Spark cluster, but in principle the approach should scale to more and bigger machines.

Input data

The input data should be line-delimited JSON files of Tweets scraped via the Twitter API. Each Tweet object should contain at least the following fields, in this format:

{
  "id": str,
  "text": str,
  "created_at": datetime,
  "author_id": str,
  "public_metrics": {
    "retweet_count": int,
    "reply_count": int,
    "like_count": int,
    "quote_count": int
  },
  "includes": {
    "users": [
      {
        "id": str,
        "username": str,
        "verified": bool,
        "description": str,
        "protected": bool,
        "name": str,
        "created_at": datetime,
        "public_metrics": {
          "followers_count": int,
          "following_count": int,
          "tweet_count": int,
          "listed_count": int
        }
      }
    ]
  }
}
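Before a long Spark run, it can help to check that the scraped files actually carry the fields listed above. The following is a minimal sketch that is not part of this repository: the names `missing_fields` and `check_ndjson_lines` are hypothetical, and it checks only that the keys are present, not their types.

```python
import json

# Required keys, copied from the input format described above.
TWEET_FIELDS = ("id", "text", "created_at", "author_id", "public_metrics", "includes")
TWEET_METRICS = ("retweet_count", "reply_count", "like_count", "quote_count")
USER_FIELDS = ("id", "username", "verified", "description", "protected",
               "name", "created_at", "public_metrics")
USER_METRICS = ("followers_count", "following_count", "tweet_count", "listed_count")


def _absent(obj, fields, prefix=""):
    """Return the prefixed names of all fields that are not keys of obj."""
    return [prefix + field for field in fields if field not in obj]


def missing_fields(tweet):
    """Return dotted paths of required fields absent from one Tweet dict."""
    missing = _absent(tweet, TWEET_FIELDS)
    if "public_metrics" in tweet:
        missing += _absent(tweet["public_metrics"], TWEET_METRICS, "public_metrics.")
    for i, user in enumerate(tweet.get("includes", {}).get("users", [])):
        prefix = f"includes.users[{i}]."
        missing += _absent(user, USER_FIELDS, prefix)
        if "public_metrics" in user:
            missing += _absent(user["public_metrics"], USER_METRICS,
                               prefix + "public_metrics.")
    return missing


def check_ndjson_lines(lines):
    """Yield (line_number, missing) for each non-empty line that lacks fields."""
    for number, line in enumerate(lines, start=1):
        if line.strip():
            problems = missing_fields(json.loads(line))
            if problems:
                yield number, problems
```

Because `check_ndjson_lines` accepts any iterable of strings, an open file handle can be passed to it directly.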

How to run

First, install the package:

pip install -e .

Extract data from scrapes

You can run the Spark app with plain Python, e.g.:

python extract-data.py "input/examples_*.ndjson"

If you use a glob pattern, quote it so that the shell does not expand it before the script receives it.

You can also run with spark-submit, which gives more control over the Spark configuration. In that case, pass the -n (--no-local) flag to the Python script, e.g.:

spark-submit --master "local[32]" --driver-memory "64G" extract-data.py -n "input/examples_*.ndjson"
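To illustrate why the quoting matters in the commands above, here is a sketch of a command-line parser with a single positional pattern and an `-n`/`--no-local` flag. It is not the actual code of `extract-data.py`: the function name `parse_args`, the argument name `pattern`, and the help texts are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse one glob pattern and the -n/--no-local flag (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Sketch of a CLI taking a glob pattern.")
    # A single positional argument: an unquoted glob that the shell expands
    # into several file names would be rejected by this parser.
    parser.add_argument("pattern", help="glob pattern for the input files; quote it in the shell")
    # Assumed meaning: skip any local Spark setup when launched via spark-submit.
    parser.add_argument("-n", "--no-local", action="store_true",
                        help="assumed: do not configure a local Spark master")
    return parser.parse_args(argv)


# A quoted pattern arrives as one unexpanded argument.
args = parse_args(["-n", "input/examples_*.ndjson"])
print(args.pattern, args.no_local)
```

With this layout the pattern reaches the program intact, so the expansion can be left to whatever reads the files.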

Write extracted data to SQLite database

You can write the extracted data (tweets and users) to a SQLite database with the scripts in database-setup, e.g.

python database-setup/tweet-tables.py "out/examples/tweets/*" twitter-example

which will then create a twitter-example.db file.
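For readers unfamiliar with SQLite, the following sketch shows the general idea of writing tweet rows to a database file with Python's standard `sqlite3` module. It does not reproduce the scripts in database-setup: the function `write_tweets`, the table name `tweets`, and the column layout are hypothetical.

```python
import sqlite3
from contextlib import closing


def write_tweets(db_path, tweets):
    """Insert tweet dicts into a hypothetical `tweets` table; return the row count."""
    with closing(sqlite3.connect(db_path)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                """CREATE TABLE IF NOT EXISTS tweets (
                       id TEXT PRIMARY KEY,
                       author_id TEXT,
                       created_at TEXT,
                       text TEXT,
                       like_count INTEGER
                   )"""
            )
            # Named placeholders are filled from the keys of each dict.
            conn.executemany(
                "INSERT OR REPLACE INTO tweets "
                "VALUES (:id, :author_id, :created_at, :text, :like_count)",
                tweets,
            )
        return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
```

Calling `write_tweets("twitter-example.db", rows)` would create the file if it does not exist. `INSERT OR REPLACE` keeps a single row per tweet id, so re-running on the same data does not create duplicates.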

Perform analysis on tweets and users

[...]

Owner

Name: Center for Humanities Computing Aarhus
Login: centre-for-humanities-computing
Kind: organization
Email: chcaa@cas.au.dk
Location: Aarhus, Denmark

Website: https://chc.au.dk/
Repositories: 130
Profile: https://github.com/centre-for-humanities-computing

GitHub Events

Total

Watch event: 1
Delete event: 1

Last Year

Watch event: 1
Delete event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0
