waybacktweets

Archived tweets from the Wayback Machine

https://github.com/claromes/waybacktweets

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.6%) to scientific vocabulary

Keywords

internet-archive osint osint-tools socmint twitter wayback-machine wayback-tweets x
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 130
  • Watchers: 4
  • Forks: 37
  • Open Issues: 2
  • Releases: 21
Created almost 3 years ago · Last pushed 9 months ago
Metadata Files
Readme Funding License Citation

README.md

Wayback Tweets

Retrieves CDX data for archived tweets from the Wayback Machine, performs the necessary parsing (see Field Options), and saves the data in HTML (for easy viewing of the tweets using iframe tags), CSV, and JSON formats.
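
To illustrate what "CDX data" refers to, the sketch below builds a query URL for the Wayback Machine's public CDX server. The endpoint and the `url`, `output`, `from`, `to`, and `limit` parameters belong to the Internet Archive's CDX API; the `build_cdx_url` helper and the `twitter.com/<username>/status/*` pattern are assumptions made for illustration and may differ from how the package issues its requests.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, date_from=None, date_to=None, limit=None):
    """Hypothetical helper: build a CDX query URL for a user's archived tweets."""
    params = {
        # Assumed URL pattern covering every status page of the account.
        "url": f"twitter.com/{username}/status/*",
        "output": "json",
    }
    # Optional filters are only added when a value is provided.
    if date_from:
        params["from"] = date_from  # YYYYmmdd
    if date_to:
        params["to"] = date_to  # YYYYmmdd
    if limit:
        params["limit"] = limit
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


cdx_url = build_cdx_url("jack", date_from="20200305", date_to="20231231", limit=300)
print(cdx_url)
```

The function only assembles the URL and performs no network request, so it can be inspected safely before any query is sent.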

Installation

Wayback Tweets is compatible with Python 3.10 and above. See the documentation for installation options.

```shell
pipx install waybacktweets
```

CLI

```shell
Usage:
  waybacktweets [OPTIONS] USERNAME

  USERNAME: The Twitter username without @

Options:
  -c, --collapse [urlkey|digest|timestamp:xx]
                                  Collapse results based on a field, or a
                                  substring of a field. XX in the timestamp
                                  value ranges from 1 to 14, comparing the
                                  first XX digits of the timestamp field. It
                                  is recommended to use from 4 onwards, to
                                  compare at least by years.
  -f, --from DATE                 Filtering by date range from this date.
                                  Format: YYYYmmdd
  -t, --to DATE                   Filtering by date range up to this date.
                                  Format: YYYYmmdd
  -l, --limit INTEGER             Query result limits.
  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
                                  the results. Key to continue the query from
                                  the end of the previous query.
  -mt, --matchtype [exact|prefix|host|domain]
                                  Results matching a certain prefix, a certain
                                  host or all subdomains.
  -v, --verbose                   Shows the log.
  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

Examples:
  waybacktweets jack
  waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack

Repository:
  https://github.com/claromes/waybacktweets

Documentation:
  https://waybacktweets.claromes.com
```
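
To make the `--collapse timestamp:XX` option more concrete, here is a minimal sketch of the comparison it describes: adjacent records whose timestamps share the same first XX digits are reduced to the first one. The `collapse_by_timestamp` function and the sample records are hypothetical. In practice this filtering is expected to happen on the Wayback Machine's CDX server rather than in local code like this.

```python
from itertools import groupby


def collapse_by_timestamp(records, digits):
    """Keep the first of each run of adjacent records sharing a timestamp prefix."""
    if not 1 <= digits <= 14:
        raise ValueError("digits must be between 1 and 14")
    return [
        next(group)  # first record of each run
        for _, group in groupby(records, key=lambda r: r["timestamp"][:digits])
    ]


# Hypothetical CDX-like records with 14-digit timestamps (YYYYmmddHHMMSS).
sample = [
    {"timestamp": "20200305101500", "digest": "AAA"},
    {"timestamp": "20200305235959", "digest": "BBB"},
    {"timestamp": "20200306000000", "digest": "CCC"},
    {"timestamp": "20210101120000", "digest": "DDD"},
]

by_year = collapse_by_timestamp(sample, 4)  # compare YYYY
by_day = collapse_by_timestamp(sample, 8)   # compare YYYYmmdd
print([r["digest"] for r in by_year])
print([r["digest"] for r in by_day])
```

With 4 digits the three captures from 2020 collapse into one, which is why the help text suggests 4 as the smallest useful value.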

Module

```python
from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"

# Query the Wayback Machine for archived tweets of the given user.
api = WaybackTweets(USERNAME)
archived_tweets = api.get()

if archived_tweets:
    # Fields to include in the parsed and exported data.
    field_options = [
        "archived_urlkey",
        "archived_timestamp",
        "parsed_archived_timestamp",
        "archived_tweet_url",
        "parsed_archived_tweet_url",
        "original_tweet_url",
        "parsed_tweet_url",
        "available_tweet_text",
        "available_tweet_is_RT",
        "available_tweet_info",
        "archived_mimetype",
        "archived_statuscode",
        "archived_digest",
        "archived_length",
        "resumption_key",
    ]

    # Parse the raw response into the selected fields.
    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    # Save the parsed tweets in each supported format.
    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()
    exporter.save_to_json()
    exporter.save_to_html()
```
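
Several of the field names above refer to the archived timestamp and the archived tweet URL. As a rough sketch of how those two values relate, the snippet below parses a 14-digit Wayback Machine timestamp and assembles a snapshot URL in the `https://web.archive.org/web/<timestamp>/<original URL>` form. The helper names and sample values are hypothetical and are not part of the package's API.

```python
from datetime import datetime


def parse_wayback_timestamp(timestamp):
    """Convert a 14-digit Wayback timestamp (YYYYmmddHHMMSS) to a datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def build_snapshot_url(timestamp, original_url):
    """Assemble the Wayback Machine URL for one capture of original_url."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Illustrative values only; the timestamp is not taken from a real capture.
timestamp = "20200305101500"
original_url = "https://twitter.com/jack/status/20"

captured_at = parse_wayback_timestamp(timestamp)
snapshot_url = build_snapshot_url(timestamp, original_url)
print(captured_at.isoformat())
print(snapshot_url)
```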

Web App

A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.

Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.

Documentation

Acknowledgements

  • Tristan Lee (Bellingcat's Data Scientist) for the idea.
  • Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
  • OSINT Community for recommending the package and the application.

License

GPL-3.0

Owner

  • Login: claromes
  • Kind: user
  • Company: @CampusVirtualFiocruz

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Wayback Tweets
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Clarissa
    family-names: Mendes
    email: support@claromes.com
identifiers:
  - type: doi
    value: 10.5281/zenodo.12528447
    description: Retrieves archived tweets from Wayback Machine in HTML, CSV, and JSON.
  - type: url
    value: "https://pypi.org/project/waybacktweets/"
    description: Python Package Index.
  - type: url
    value: "https://waybacktweets.claromes.com/"
    description: Documentation.
repository-code: "https://github.com/claromes/waybacktweets"
url: "https://waybacktweets.claromes.com/"
abstract: >-
  Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
keywords:
  - Twitter
  - X
  - Tweets
  - Wayback Machine
  - OSINT
  - SOCMINT
  - Python
license: GPL-3.0
version: 1.0
date-released: "2025-05-26"

GitHub Events

Total
  • Issues event: 6
  • Watch event: 35
  • Delete event: 1
  • Issue comment event: 11
  • Push event: 23
  • Pull request event: 3
  • Fork event: 12
Last Year
  • Issues event: 6
  • Watch event: 35
  • Delete event: 1
  • Issue comment event: 11
  • Push event: 23
  • Pull request event: 3
  • Fork event: 12

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 186
  • Total Committers: 2
  • Avg Commits per committer: 93.0
  • Development Distribution Score (DDS): 0.011
Past Year
  • Commits: 96
  • Committers: 1
  • Avg Commits per committer: 96.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Claromes c****s@h****m 184
claromes c****a@h****m 2
Committer Domains (Top 20 + Academic)
hey.com: 2
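
The averages and Development Distribution Score (DDS) values listed above are consistent with DDS being one minus the top committer's share of all commits. That definition is an assumption, since it is not stated on this page. The sketch below only checks it against the figures shown, using a hypothetical `development_distribution_score` helper.

```python
def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total


all_time = [184, 2]  # commit counts from the Top Committers table
past_year = [96]     # a single committer in the past year

avg_all_time = sum(all_time) / len(all_time)
dds_all_time = round(development_distribution_score(all_time), 3)
dds_past_year = round(development_distribution_score(past_year), 3)
print(avg_all_time, dds_all_time, dds_past_year)
```

Under this assumed definition, a score near zero means that almost all commits come from a single committer.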

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 13
  • Total pull requests: 23
  • Average time to close issues: 5 months
  • Average time to close pull requests: 8 days
  • Total issue authors: 8
  • Total pull request authors: 4
  • Average comments per issue: 1.77
  • Average comments per pull request: 0.13
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 4
  • Average time to close issues: 6 months
  • Average time to close pull requests: about 5 hours
  • Issue authors: 5
  • Pull request authors: 2
  • Average comments per issue: 1.8
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • claromes (7)
  • ezezezrov (1)
  • kadins1 (1)
  • j2muro (1)
  • smile131313 (1)
  • Thebroken10 (1)
  • aardvarkshere (1)
  • Qmorah82one (1)
  • cupofjoe0 (1)
  • digitalarchivo (1)
Pull Request Authors
  • claromes (22)
  • cobblefence2k (2)
  • taylornic62 (2)
  • longtree (1)
  • Qmorah82one (1)
Top Labels
Issue Labels
  • enhancement (3)
  • community feedback (2)
  • prototype (2)
  • web app (1)
  • help wanted (1)
  • bug (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 195 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 9
  • Total maintainers: 1
pypi.org: waybacktweets

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data.

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 195 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.7%
Dependent repos count: 60.6%
Maintainers (1)
Last synced: 6 months ago