waybackpy

Wayback Machine API interface & a command-line tool

https://github.com/akamhy/waybackpy

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

archive-webpage archive-webpages cdx-api internet-archive internet-archiving osint savepagenow wayback-machine wayback-machine-api wayback-machine-python web-archiving webarchiving

Keywords from Contributors

dice-roller data-integration data-manipulation genomics multi-omics algorithm
Last synced: 6 months ago

Repository

Wayback Machine API interface & a command-line tool

Basic Info
Statistics
  • Stars: 545
  • Watchers: 10
  • Forks: 35
  • Open Issues: 17
  • Releases: 35
Topics
archive-webpage archive-webpages cdx-api internet-archive internet-archiving osint savepagenow wayback-machine wayback-machine-api wayback-machine-python web-archiving webarchiving
Created almost 6 years ago · Last pushed about 2 years ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md


Python package & CLI tool that interfaces with the Wayback Machine APIs


Introduction

Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.

The Internet Archive's Wayback Machine has three useful public APIs:

  • SavePageNow or Save API
  • CDX Server API
  • Availability API

All three APIs can be accessed through waybackpy, either by importing it in a Python module or from the command-line interface.

Installation

Using pip, from PyPI (recommended):

```bash
pip install waybackpy -U
```

Using conda, from conda-forge (recommended):

See also the waybackpy feedstock; its maintainers are @rafaelrdealmeida, @labriunesp and @akamhy.

```bash
conda install -c conda-forge waybackpy
```

Install directly from this git repository (NOT recommended):

```bash
pip install git+https://github.com/akamhy/waybackpy.git
```

Docker Image

Docker Hub: hub.docker.com/r/secsi/waybackpy

The Docker image is automatically updated on every release by Regularly and Automatically Updated Docker Images (RAUDI).

RAUDI is a tool by SecSI, an Italian cybersecurity startup.

Usage

As a Python package

Save API aka SavePageNow

```python
>>> from waybackpy import WaybackMachineSaveAPI
>>> url = "https://github.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> save_api = WaybackMachineSaveAPI(url, user_agent)
>>> save_api.save()
https://web.archive.org/web/20220118125249/https://github.com/
>>> save_api.cached_save
False
>>> save_api.timestamp()
datetime.datetime(2022, 1, 18, 12, 52, 49)
```
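The URL returned by `save()` embeds the capture time as a 14-digit `YYYYMMDDhhmmss` stamp, followed by the original URL. As a minimal sketch that is independent of waybackpy's own implementation, the helper below (the name `split_archive_url` and the regular expression are assumptions made for illustration) recovers both parts using only the standard library:

```python
import re
from datetime import datetime

# Matches Wayback Machine playback URLs of the form
# https://web.archive.org/web/<14-digit timestamp>/<original URL>
ARCHIVE_URL_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_archive_url(archive_url):
    """Return (capture_datetime, original_url) for a Wayback archive URL.

    Hypothetical helper for illustration; not part of waybackpy.
    """
    match = ARCHIVE_URL_RE.match(archive_url)
    if match is None:
        raise ValueError(f"not a Wayback Machine archive URL: {archive_url!r}")
    stamp, original = match.groups()
    return datetime.strptime(stamp, "%Y%m%d%H%M%S"), original


captured_at, original = split_archive_url(
    "https://web.archive.org/web/20220118125249/https://github.com/"
)
print(captured_at)  # 2022-01-18 12:52:49
print(original)     # https://github.com/
```

The parsed value agrees with the `datetime.datetime(2022, 1, 18, 12, 52, 49)` shown by `save_api.timestamp()` in the example above.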

CDX API aka CDXServerAPI

```python
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://google.com"
>>> user_agent = "my new app's user agent"
>>> cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
```

oldest

```python
>>> cdx_api.oldest()
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest = cdx_api.oldest()
>>> oldest
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest.archive_url
'https://web.archive.org/web/19981111184551/http://google.com:80/'
>>> oldest.original
'http://google.com:80/'
>>> oldest.urlkey
'com,google)/'
>>> oldest.timestamp
'19981111184551'
>>> oldest.datetime_timestamp
datetime.datetime(1998, 11, 11, 18, 45, 51)
>>> oldest.statuscode
'200'
>>> oldest.mimetype
'text/html'
```
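The printed snapshot is a single space-separated CDX line, and the attributes shown above map onto its columns. The following sketch illustrates that mapping with the standard library only. It is not waybackpy's snapshot class: `CDXRecord` and `parse_cdx_line` are hypothetical names, and the last two field names (`digest`, `length`) are assumed from the CDX server's default column order rather than taken from this README.

```python
from datetime import datetime
from typing import NamedTuple


class CDXRecord(NamedTuple):
    """Fields of one space-separated CDX line (hypothetical container)."""

    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str   # assumed name for the sixth column
    length: str   # assumed name for the seventh column

    @property
    def archive_url(self) -> str:
        # Playback URL = fixed prefix + capture timestamp + original URL.
        return f"https://web.archive.org/web/{self.timestamp}/{self.original}"

    @property
    def datetime_timestamp(self) -> datetime:
        # The timestamp is a 14-digit YYYYMMDDhhmmss string.
        return datetime.strptime(self.timestamp, "%Y%m%d%H%M%S")


def parse_cdx_line(line: str) -> CDXRecord:
    """Split a CDX line into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CDXRecord(*fields)


record = parse_cdx_line(
    "com,google)/ 19981111184551 http://google.com:80/ text/html 200 "
    "HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381"
)
print(record.archive_url)
print(record.datetime_timestamp)
```

All values stay strings, as in the README output, where `statuscode` is `'200'` rather than an integer.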

newest

```python
>>> newest = cdx_api.newest()
>>> newest
com,google)/ 20220217234427 http://@google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 563
>>> newest.archive_url
'https://web.archive.org/web/20220217234427/http://@google.com/'
>>> newest.timestamp
'20220217234427'
```

near

```python
>>> near = cdx_api.near(year=2010, month=10, day=10, hour=10, minute=10)
>>> near.archive_url
'https://web.archive.org/web/20101010101435/http://google.com/'
>>> near
com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
>>> near.timestamp
'20101010101435'
>>> near = cdx_api.near(wayback_machine_timestamp=2008080808)
>>> near.archive_url
'https://web.archive.org/web/20080808051143/http://google.com/'
>>> near = cdx_api.near(unix_timestamp=1286705410)
>>> near
com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
>>> near.archive_url
'https://web.archive.org/web/20101010101435/http://google.com/'
```
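The `near` examples accept the target time either as calendar components, as a Wayback Machine timestamp, or as a Unix timestamp. The sketch below shows how the two numeric forms relate, assuming the usual `YYYYMMDDhhmmss` layout in UTC. The function names are hypothetical and this is not waybackpy's internal conversion code.

```python
from datetime import datetime, timezone


def unix_to_wayback_timestamp(unix_timestamp: int) -> str:
    """Format a Unix timestamp as a 14-digit UTC Wayback timestamp."""
    moment = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def components_to_wayback_timestamp(year, month, day, hour, minute) -> str:
    """Format calendar components as a 12-digit YYYYMMDDhhmm prefix."""
    return datetime(year, month, day, hour, minute).strftime("%Y%m%d%H%M")


from_unix = unix_to_wayback_timestamp(1286705410)
from_parts = components_to_wayback_timestamp(2010, 10, 10, 10, 10)
print(from_unix)   # 20101010101010
print(from_parts)  # 201010101010
```

Both values point at 10:10 on 10 October 2010, which is consistent with the `year=2010, month=10, day=10, hour=10, minute=10` call and the `unix_timestamp=1286705410` call in the example returning the same snapshot.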

snapshots

```python
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://pypi.org"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>> cdx = WaybackMachineCDXServerAPI(url, user_agent, start_timestamp=2016, end_timestamp=2017)
>>> for item in cdx.snapshots():
...     print(item.archive_url)
...
https://web.archive.org/web/20160110011047/http://pypi.org/
https://web.archive.org/web/20160305104847/http://pypi.org/
.
.   # URLS REDACTED FOR READABILITY
.
https://web.archive.org/web/20171127171549/https://pypi.org/
https://web.archive.org/web/20171206002737/http://pypi.org:80/
```
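For orientation, a time-bounded listing like the one above corresponds to a query against the public CDX server, which accepts `url`, `from`, `to` and `limit` parameters. The sketch below only builds such a query URL and performs no request. `build_cdx_query` is a hypothetical helper, and the assumption that waybackpy's `start_timestamp` and `end_timestamp` arguments correspond to `from` and `to` is not confirmed by this README.

```python
from urllib.parse import urlencode

# Public CDX server endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(url, start_timestamp=None, end_timestamp=None, limit=None):
    """Build a CDX server query URL (hypothetical helper, no network access)."""
    params = {"url": url}
    if start_timestamp is not None:
        params["from"] = str(start_timestamp)
    if end_timestamp is not None:
        params["to"] = str(end_timestamp)
    if limit is not None:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


query = build_cdx_query("pypi.org", start_timestamp=2016, end_timestamp=2017)
print(query)
# https://web.archive.org/cdx/search/cdx?url=pypi.org&from=2016&to=2017
```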

Availability API

Using the Availability API is not recommended because of performance issues. All the methods of the Availability API interface class, WaybackMachineAvailabilityAPI, are also implemented in the CDX Server API interface class, WaybackMachineCDXServerAPI. Note, however, that the snapshot returned by the newest() method of WaybackMachineAvailabilityAPI can be more recent than the one returned by the same method of WaybackMachineCDXServerAPI.

```python
>>> from waybackpy import WaybackMachineAvailabilityAPI
>>>
>>> url = "https://google.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> availability_api = WaybackMachineAvailabilityAPI(url, user_agent)
```

oldest

```python
>>> availability_api.oldest()
https://web.archive.org/web/19981111184551/http://google.com:80/
```

newest

```python
>>> availability_api.newest()
https://web.archive.org/web/20220118150444/https://www.google.com/
```

near

```python
>>> availability_api.near(year=2010, month=10, day=10, hour=10)
https://web.archive.org/web/20101010101708/http://www.google.com/
```
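For context, the underlying Availability API is a JSON endpoint that reports the closest snapshot under `archived_snapshots.closest`. The sketch below builds a query URL and extracts the snapshot URL from a response. It makes no request and is not waybackpy's implementation. The helper names are hypothetical, and the sample payload is hand-written for illustration, reusing the URL and timestamp from the `near` example rather than a live response.

```python
import json
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url, timestamp=None):
    """Build an Availability API query URL (hypothetical helper)."""
    params = {"url": url}
    if timestamp is not None:
        params["timestamp"] = str(timestamp)
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot_url(payload):
    """Return the closest snapshot URL, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"]


# Hand-written sample in the shape of an Availability API response;
# the values are borrowed from the README example, not fetched live.
sample = json.loads("""
{
  "url": "google.com",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "https://web.archive.org/web/20101010101708/http://www.google.com/",
      "timestamp": "20101010101708"
    }
  }
}
""")

print(build_availability_query("google.com", 2010101010))
print(closest_snapshot_url(sample))
print(closest_snapshot_url({"archived_snapshots": {}}))  # None
```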

Documentation is at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.

As a CLI tool

A demo video (asciicast) is available on asciinema.org; you can copy text directly from the video.

CLI documentation is at https://github.com/akamhy/waybackpy/wiki/CLI-docs.

CONTRIBUTORS

AUTHORS

ACKNOWLEDGEMENTS

Owner

  • Name: Akash Mahanty
  • Login: akamhy
  • Kind: user
  • Location: Delhi, India


Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: waybackpy
abstract: "Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily."
version: '3.0.6'
doi: 10.5281/ZENODO.3977276
date-released: 2022-03-15
type: software
authors:
  - given-names: Akash
    family-names: Mahanty
    email: akamhy@yahoo.com
    orcid: https://orcid.org/0000-0003-2482-8227
keywords:
    - Archive Website
    - Wayback Machine
    - Internet Archive
    - Wayback Machine CLI
    - Wayback Machine Python
    - Internet Archiving
    - Availability API
    - CDX API
    - savepagenow
license: MIT
repository-code: "https://github.com/akamhy/waybackpy"

GitHub Events

Total
  • Issues event: 2
  • Watch event: 64
  • Issue comment event: 5
  • Pull request event: 1
  • Fork event: 2
Last Year
  • Issues event: 2
  • Watch event: 64
  • Issue comment event: 5
  • Pull request event: 1
  • Fork event: 2

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 484
  • Total Committers: 14
  • Avg Commits per committer: 34.571
  • Development Distribution Score (DDS): 0.182
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
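| Name | Email | Commits |
| --- | --- | --- |
| Akash | 6\*\*\*\*y | 396 |
| Akash Mahanty | a\*\*\*\*o@g\*\*\*\*m | 58 |
| eggplants | w\*\*\*\*w@y\*\*\*\*p | 14 |
| whitesource-bolt-for-github[bot] | 4\*\*\*\*] | 2 |
| deepsource-autofix[bot] | 6\*\*\*\*] | 2 |
| danvalen1 | d\*\*\*\*1@g\*\*\*\*m | 2 |
| Rafael de Almeida | r\*\*\*\*a@g\*\*\*\*m | 2 |
| AntiCompositeNumber | A\*\*\*\*r@g\*\*\*\*m | 2 |
| pyup.io bot | g\*\*\*\*t@p\*\*\*\*o | 1 |
| Rishav Kundu | rk@r\*\*\*\*o | 1 |
| Jonáš Jančařík | j\*\*\*\*k@g\*\*\*\*m | 1 |
| Jens Finkhaeuser | j\*\*\*\*s@f\*\*\*\*e | 1 |
| DeepSource Bot | b\*\*\*\*t@d\*\*\*\*o | 1 |
| ArztKlein | 5\*\*\*\*n | 1 |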

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 51
  • Total pull requests: 57
  • Average time to close issues: 4 months
  • Average time to close pull requests: 14 days
  • Total issue authors: 24
  • Total pull request authors: 12
  • Average comments per issue: 2.33
  • Average comments per pull request: 1.19
  • Merged pull requests: 49
  • Bot issues: 1
  • Bot pull requests: 3
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • akamhy (15)
  • eggplants (9)
  • dequeued0 (3)
  • h6197627 (3)
  • maaaaz (2)
  • superbonaci (2)
  • emilyksanders (1)
  • mend-bolt-for-github[bot] (1)
  • DGaffney (1)
  • AlbertoFDR (1)
  • Forage (1)
  • sissbruecker (1)
  • Huo-Yuan (1)
  • riko1010 (1)
  • alicescfernandes (1)
Pull Request Authors
  • akamhy (29)
  • eggplants (12)
  • pyup-bot (5)
  • deepsource-autofix[bot] (2)
  • jjmaestro (2)
  • jfinkhaeuser (1)
  • mend-bolt-for-github[bot] (1)
  • ArztKlein (1)
  • rafaelrdealmeida (1)
  • codacy-badger (1)
  • xrisk (1)
  • jonasjancarik (1)
Top Labels
Issue Labels
enhancement (23) bug (12) security vulnerability (2) good first issue (2) question (1) wontfix (1)
Pull Request Labels
enhancement (4) documentation (2) bug (2) test (1) security fix (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 50,226 last-month
  • Total docker downloads: 4,976
  • Total dependent packages: 7
    (may contain duplicates)
  • Total dependent repositories: 174
    (may contain duplicates)
  • Total versions: 41
  • Total maintainers: 1
pypi.org: waybackpy

Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.

  • Versions: 35
  • Dependent Packages: 7
  • Dependent Repositories: 174
  • Downloads: 50,226 Last month
  • Docker Downloads: 4,976
Rankings
Dependent repos count: 1.2%
Dependent packages count: 1.6%
Docker downloads count: 1.8%
Average: 3.0%
Stargazers count: 3.4%
Downloads: 3.5%
Forks count: 6.8%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/akamhy/waybackpy
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
conda-forge.org: waybackpy

Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine API. Wayback Machine has 3 client side APIs: Save API, Availability API and CDX API. These three APIs can be accessed via the waybackpy either by importing it in a script or from the CLI.

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 22.9%
Forks count: 30.3%
Dependent repos count: 34.0%
Average: 34.6%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

requirements-dev.txt pypi
  • black * development
  • click * development
  • codecov * development
  • flake8 * development
  • mypy * development
  • pytest * development
  • pytest-cov * development
  • requests * development
  • setuptools >=46.4.0 development
  • types-requests * development
requirements.txt pypi
  • click *
  • requests *
  • urllib3 *
.github/workflows/build-test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/unit-test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
pyproject.toml pypi
setup.py pypi