https://github.com/centre-for-humanities-computing/twitter-posting-stats
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: centre-for-humanities-computing
- Language: Jupyter Notebook
- Default Branch: main
- Size: 250 KB
Statistics
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Twitter posting stats
This project gathers posting statistics from Twitter scrapes. Since
those scrapes can get pretty big, PySpark is used. The data volumes are probably not
big enough to merit a large Spark cluster, but in principle this should scale to more
and bigger machines.
Input data
The input data should be line-delimited JSON files of Tweets scraped via the Twitter API. The Tweet objects should contain these fields as a minimum in this format:
{
"id": str,
"text": str,
"created_at": datetime,
"author_id": str,
"public_metrics": {
"retweet_count": int,
"reply_count": int,
"like_count": int,
"quote_count": int
},
"includes": {
"users": [
{
"id": str,
"username": str,
"verified": bool,
"description": str,
"protected": bool,
"name": str,
"created_at": datetime,
"public_metrics": {
"followers_count": int,
"following_count": int,
"tweet_count": int,
"listed_count": int
}
}
]
}
}
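Before starting a long Spark job, it can help to check that the input files actually contain these fields. The following is a minimal sketch using only the standard library; it is not part of this package, and the names `missing_fields` and `check_ndjson_lines` are hypothetical. It checks only that the top-level fields, the tweet metrics and at least one included user are present, not the types of the values.

```python
import json

# Required fields, copied from the format described above.
REQUIRED_TWEET_FIELDS = (
    "id", "text", "created_at", "author_id", "public_metrics", "includes",
)
REQUIRED_TWEET_METRICS = (
    "retweet_count", "reply_count", "like_count", "quote_count",
)


def missing_fields(tweet):
    """Return dotted names of required fields absent from one Tweet object."""
    missing = [f for f in REQUIRED_TWEET_FIELDS if f not in tweet]
    # Only look inside nested objects that are actually present.
    if "public_metrics" in tweet:
        metrics = tweet["public_metrics"]
        missing += [
            f"public_metrics.{m}" for m in REQUIRED_TWEET_METRICS if m not in metrics
        ]
    if "includes" in tweet and not tweet["includes"].get("users"):
        missing.append("includes.users")
    return missing


def check_ndjson_lines(lines):
    """Yield (line_number, missing_fields) for each non-empty line with problems."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        problems = missing_fields(json.loads(line))
        if problems:
            yield number, problems
```

An open file can be passed directly, e.g. `list(check_ndjson_lines(open("input/examples_1.ndjson")))`, where the file name is only an illustration.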
How to run
First, install the package:
pip install -e .
Extract data from scrapes
You can run the Spark app just with Python, e.g.:
python extract-data.py "input/examples_*.ndjson"
Be mindful of quotes if you use a glob pattern.
You can also run with spark-submit, which gives more control over the Spark
configuration. In that case, pass the -n (--no-local) flag to the Python
script, e.g.:
spark-submit --master "local[32]" --driver-memory "64G" extract-data.py -n "input/examples_*.ndjson"
Write extracted data to SQLite database
You can write the extracted data (tweets and users) to a SQLite database with the
scripts in database-setup, e.g.
python database-setup/tweet-tables.py "out/examples/tweets/*" twitter-example
which will then create a twitter-example.db file.
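To confirm that the database was written, you can inspect it with Python's built-in `sqlite3` module. The sketch below is not part of this package and the function name `table_row_counts` is hypothetical. Since the table names created by the setup scripts are not listed here, it reads them from `sqlite_master` instead of assuming them.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite database."""
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master lists all schema objects; keep only the tables.
        names = [
            name
            for (name,) in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quoting them is enough here.
        return {
            name: connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        connection.close()
```

For the example above, this would be called as `table_row_counts("twitter-example.db")`. Note that `sqlite3.connect` creates an empty database file if the path does not exist, so an empty result may simply mean the path is wrong.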
Perform analysis on tweets and users
[...]
Owner
- Name: Center for Humanities Computing Aarhus
- Login: centre-for-humanities-computing
- Kind: organization
- Email: chcaa@cas.au.dk
- Location: Aarhus, Denmark
- Website: https://chc.au.dk/
- Repositories: 130
- Profile: https://github.com/centre-for-humanities-computing
GitHub Events
Total
- Watch event: 1
- Delete event: 1
Last Year
- Watch event: 1
- Delete event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- KasperFyhn (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- dacy *
- dataclasses_json *
- pyspark *
- pytest *