Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.3%) to scientific vocabulary
Repository
Wayback Machine API interface & a command-line tool
Basic Info
- Host: GitHub
- Owner: akamhy
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://pypi.org/project/waybackpy/
- Size: 575 KB
Statistics
- Stars: 545
- Watchers: 10
- Forks: 35
- Open Issues: 17
- Releases: 35
Metadata Files
README.md
Python package & CLI tool that interfaces with the Wayback Machine APIs
Introduction
Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine APIs.
Internet Archive's Wayback Machine has 3 useful public APIs.
- SavePageNow or Save API
- CDX Server API
- Availability API
All three APIs can be accessed through waybackpy, either by importing it in a Python file or module, or from the command-line interface.
Installation
Using pip, from PyPI (recommended):
```bash
pip install waybackpy -U
```
Using conda, from conda-forge (recommended):
See also the waybackpy feedstock; its maintainers are @rafaelrdealmeida, @labriunesp and @akamhy.
```bash
conda install -c conda-forge waybackpy
```
Install directly from this git repository (NOT recommended):
```bash
pip install git+https://github.com/akamhy/waybackpy.git
```
Docker Image
Docker Hub: hub.docker.com/r/secsi/waybackpy
The Docker image is automatically updated on every release by Regularly and Automatically Updated Docker Images (RAUDI).
RAUDI is a tool by SecSI, an Italian cybersecurity startup.
Usage
As a Python package
Save API aka SavePageNow
```python
>>> from waybackpy import WaybackMachineSaveAPI
>>> url = "https://github.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> save_api = WaybackMachineSaveAPI(url, user_agent)
>>> save_api.save()
https://web.archive.org/web/20220118125249/https://github.com/
>>> save_api.cached_save
False
>>> save_api.timestamp()
datetime.datetime(2022, 1, 18, 12, 52, 49)
```
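The archive URL returned by `save()` above embeds the capture time as a 14-digit `YYYYMMDDhhmmss` string. The following is a minimal standard-library sketch of how that URL can be split back into a capture time and the original URL. The helper `split_archive_url` is hypothetical and not part of waybackpy; it only assumes the URL layout shown in the example output.

```python
import re
from datetime import datetime
from typing import Tuple

# Assumed layout, taken from the example output above:
# https://web.archive.org/web/<14-digit timestamp>/<original URL>
ARCHIVE_URL_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def split_archive_url(archive_url: str) -> Tuple[datetime, str]:
    """Hypothetical helper: return (capture time, original URL)."""
    match = ARCHIVE_URL_RE.match(archive_url)
    if match is None:
        raise ValueError(f"not a Wayback Machine archive URL: {archive_url!r}")
    timestamp, original = match.groups()
    # The 14 digits are year, month, day, hour, minute, second.
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


captured_at, original = split_archive_url(
    "https://web.archive.org/web/20220118125249/https://github.com/"
)
print(captured_at)  # 2022-01-18 12:52:49
print(original)     # https://github.com/
```

The parsed value matches the `datetime.datetime(2022, 1, 18, 12, 52, 49)` shown for `save_api.timestamp()`.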
CDX API aka CDXServerAPI
```python
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://google.com"
>>> user_agent = "my new app's user agent"
>>> cdx_api = WaybackMachineCDXServerAPI(url, user_agent)
```
oldest
```python
>>> cdx_api.oldest()
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest = cdx_api.oldest()
>>> oldest
com,google)/ 19981111184551 http://google.com:80/ text/html 200 HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381
>>> oldest.archive_url
'https://web.archive.org/web/19981111184551/http://google.com:80/'
>>> oldest.original
'http://google.com:80/'
>>> oldest.urlkey
'com,google)/'
>>> oldest.timestamp
'19981111184551'
>>> oldest.datetime_timestamp
datetime.datetime(1998, 11, 11, 18, 45, 51)
>>> oldest.statuscode
'200'
>>> oldest.mimetype
'text/html'
```
newest
```python
>>> newest = cdx_api.newest()
>>> newest
com,google)/ 20220217234427 http://@google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 563
>>> newest.archive_url
'https://web.archive.org/web/20220217234427/http://@google.com/'
>>> newest.timestamp
'20220217234427'
```
near
```python
>>> near = cdx_api.near(year=2010, month=10, day=10, hour=10, minute=10)
>>> near.archive_url
'https://web.archive.org/web/20101010101435/http://google.com/'
>>> near
com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
>>> near.timestamp
'20101010101435'
>>> near = cdx_api.near(wayback_machine_timestamp=2008080808)
>>> near.archive_url
'https://web.archive.org/web/20080808051143/http://google.com/'
>>> near = cdx_api.near(unix_timestamp=1286705410)
>>> near
com,google)/ 20101010101435 http://google.com/ text/html 301 Y6PVK4XWOI3BXQEXM5WLLWU5JKUVNSFZ 391
>>> near.archive_url
'https://web.archive.org/web/20101010101435/http://google.com/'
```
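The `near` examples above return the same snapshot for the date components (2010-10-10 10:10) and for the Unix timestamp 1286705410. A short standard-library sketch shows why: read as UTC, that Unix timestamp falls in the same minute. The helper `unix_to_wayback_timestamp` is hypothetical, and the UTC interpretation is an assumption; how waybackpy performs the conversion internally is not shown here.

```python
from datetime import datetime, timezone


def unix_to_wayback_timestamp(unix_timestamp: int) -> str:
    """Hypothetical helper: format a Unix timestamp as YYYYMMDDhhmmss (UTC assumed)."""
    moment = datetime.fromtimestamp(unix_timestamp, tz=timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


print(unix_to_wayback_timestamp(1286705410))  # 20101010101010
```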
snapshots
```python
>>> from waybackpy import WaybackMachineCDXServerAPI
>>> url = "https://pypi.org"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>> cdx = WaybackMachineCDXServerAPI(url, user_agent, start_timestamp=2016, end_timestamp=2017)
>>> for item in cdx.snapshots():
...     print(item.archive_url)
...
https://web.archive.org/web/20160110011047/http://pypi.org/
https://web.archive.org/web/20160305104847/http://pypi.org/
.
. # URLS REDACTED FOR READABILITY
.
https://web.archive.org/web/20171127171549/https://pypi.org/
https://web.archive.org/web/20171206002737/http://pypi.org:80/
```
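Each snapshot in the CDX examples above prints as one space-separated line of seven fields. As an illustration of that layout, here is a minimal sketch that splits such a line and rebuilds the archive URL from it. `CdxRecord`, `parse_cdx_line` and `build_archive_url` are hypothetical names, not waybackpy's own classes; the field order follows the CDX server's default field list (urlkey, timestamp, original, mimetype, statuscode, digest, length).

```python
from typing import NamedTuple


class CdxRecord(NamedTuple):
    """Hypothetical container for one CDX line (all fields kept as strings)."""
    urlkey: str
    timestamp: str
    original: str
    mimetype: str
    statuscode: str
    digest: str
    length: str


def parse_cdx_line(line: str) -> CdxRecord:
    """Split one space-separated CDX line into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return CdxRecord(*fields)


def build_archive_url(record: CdxRecord) -> str:
    """Rebuild the archive URL from the timestamp and the original URL."""
    return f"https://web.archive.org/web/{record.timestamp}/{record.original}"


record = parse_cdx_line(
    "com,google)/ 19981111184551 http://google.com:80/ text/html 200 "
    "HOQ2TGPYAEQJPNUA6M4SMZ3NGQRBXDZ3 381"
)
print(record.statuscode)          # 200
print(build_archive_url(record))  # https://web.archive.org/web/19981111184551/http://google.com:80/
```

The rebuilt URL matches the `oldest.archive_url` value shown in the `oldest` example.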
Availability API
Avoid the Availability API where possible because of its performance issues. Every method of its interface class, WaybackMachineAvailabilityAPI, is also implemented by the CDX Server API interface class, WaybackMachineCDXServerAPI. Note, however, that the snapshot returned by newest() of WaybackMachineAvailabilityAPI can be more recent than the one returned by the same method of WaybackMachineCDXServerAPI.
```python
>>> from waybackpy import WaybackMachineAvailabilityAPI
>>>
>>> url = "https://google.com"
>>> user_agent = "Mozilla/5.0 (Windows NT 5.1; rv:40.0) Gecko/20100101 Firefox/40.0"
>>>
>>> availability_api = WaybackMachineAvailabilityAPI(url, user_agent)
```
oldest
```python
>>> availability_api.oldest()
https://web.archive.org/web/19981111184551/http://google.com:80/
```
newest
```python
>>> availability_api.newest()
https://web.archive.org/web/20220118150444/https://www.google.com/
```
near
```python
>>> availability_api.near(year=2010, month=10, day=10, hour=10)
https://web.archive.org/web/20101010101708/http://www.google.com/
```
Documentation is at https://github.com/akamhy/waybackpy/wiki/Python-package-docs.
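For context on the Availability API itself, the Internet Archive exposes it at `https://archive.org/wayback/available`, which accepts a `url` and an optional `timestamp` query parameter. The sketch below only builds such a query string and sends no request. `build_availability_query` is a hypothetical helper, and whether waybackpy constructs its requests this way is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

# Public Availability API endpoint of the Internet Archive.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Hypothetical helper: build the query URL without sending a request."""
    params = {"url": url}
    if timestamp is not None:
        # 1 to 14 digits in YYYYMMDDhhmmss order, e.g. "2010101010".
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


print(build_availability_query("https://google.com", "2010101010"))
# https://archive.org/wayback/available?url=https%3A%2F%2Fgoogle.com&timestamp=2010101010
```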
As a CLI tool
A demo video is available on asciinema.org; text can be copied directly from the video.
CLI documentation is at https://github.com/akamhy/waybackpy/wiki/CLI-docs.
CONTRIBUTORS
AUTHORS
- akamhy (https://github.com/akamhy)
- eggplants (https://github.com/eggplants)
- danvalen1 (https://github.com/danvalen1)
- AntiCompositeNumber (https://github.com/AntiCompositeNumber)
- rafaelrdealmeida (https://github.com/rafaelrdealmeida)
- jonasjancarik (https://github.com/jonasjancarik)
- jfinkhaeuser (https://github.com/jfinkhaeuser)
ACKNOWLEDGEMENTS
- mhmdiaa (https://github.com/mhmdiaa): `--known-urls` is based on this gist.
- dequeued0 (https://github.com/dequeued0) for reporting bugs and useful feature requests.
Owner
- Name: Akash Mahanty
- Login: akamhy
- Kind: user
- Location: Delhi, India
- Website: https://akamhy.me
- Twitter: _AkashMahanty
- Repositories: 5
- Profile: https://github.com/akamhy
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: waybackpy
abstract: "Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily."
version: '3.0.6'
doi: 10.5281/ZENODO.3977276
date-released: 2022-03-15
type: software
authors:
  - given-names: Akash
    family-names: Mahanty
    email: akamhy@yahoo.com
    orcid: https://orcid.org/0000-0003-2482-8227
keywords:
- Archive Website
- Wayback Machine
- Internet Archive
- Wayback Machine CLI
- Wayback Machine Python
- Internet Archiving
- Availability API
- CDX API
- savepagenow
license: MIT
repository-code: "https://github.com/akamhy/waybackpy"
GitHub Events
Total
- Issues event: 2
- Watch event: 64
- Issue comment event: 5
- Pull request event: 1
- Fork event: 2
Last Year
- Issues event: 2
- Watch event: 64
- Issue comment event: 5
- Pull request event: 1
- Fork event: 2
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Akash | 6****y | 396 |
| Akash Mahanty | a****o@g****m | 58 |
| eggplants | w****w@y****p | 14 |
| whitesource-bolt-for-github[bot] | 4****] | 2 |
| deepsource-autofix[bot] | 6****] | 2 |
| danvalen1 | d****1@g****m | 2 |
| Rafael de Almeida | r****a@g****m | 2 |
| AntiCompositeNumber | A****r@g****m | 2 |
| pyup.io bot | g****t@p****o | 1 |
| Rishav Kundu | rk@r****o | 1 |
| Jonáš Jančařík | j****k@g****m | 1 |
| Jens Finkhaeuser | j****s@f****e | 1 |
| DeepSource Bot | b****t@d****o | 1 |
| ArztKlein | 5****n | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 51
- Total pull requests: 57
- Average time to close issues: 4 months
- Average time to close pull requests: 14 days
- Total issue authors: 24
- Total pull request authors: 12
- Average comments per issue: 2.33
- Average comments per pull request: 1.19
- Merged pull requests: 49
- Bot issues: 1
- Bot pull requests: 3
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- akamhy (15)
- eggplants (9)
- dequeued0 (3)
- h6197627 (3)
- maaaaz (2)
- superbonaci (2)
- emilyksanders (1)
- mend-bolt-for-github[bot] (1)
- DGaffney (1)
- AlbertoFDR (1)
- Forage (1)
- sissbruecker (1)
- Huo-Yuan (1)
- riko1010 (1)
- alicescfernandes (1)
Pull Request Authors
- akamhy (29)
- eggplants (12)
- pyup-bot (5)
- deepsource-autofix[bot] (2)
- jjmaestro (2)
- jfinkhaeuser (1)
- mend-bolt-for-github[bot] (1)
- ArztKlein (1)
- rafaelrdealmeida (1)
- codacy-badger (1)
- xrisk (1)
- jonasjancarik (1)
Packages
- Total packages: 3
- Total downloads: pypi 50,226 last-month
- Total docker downloads: 4,976
- Total dependent packages: 7 (may contain duplicates)
- Total dependent repositories: 174 (may contain duplicates)
- Total versions: 41
- Total maintainers: 1
pypi.org: waybackpy
Python package that interfaces with the Internet Archive's Wayback Machine APIs. Archive pages and retrieve archived pages easily.
- Homepage: https://akamhy.github.io/waybackpy/
- Documentation: https://github.com/akamhy/waybackpy/wiki
- License: MIT
- Latest release: 3.0.6 (published almost 4 years ago)
proxy.golang.org: github.com/akamhy/waybackpy
- Documentation: https://pkg.go.dev/github.com/akamhy/waybackpy#section-documentation
- License: mit
- Latest release: v1.5.1 (published almost 6 years ago)
conda-forge.org: waybackpy
Waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine API. Wayback Machine has 3 client side APIs: Save API, Availability API and CDX API. These three APIs can be accessed via the waybackpy either by importing it in a script or from the CLI.
- Homepage: https://akamhy.github.io/waybackpy/
- License: MIT
- Latest release: 3.0.6 (published almost 4 years ago)
Dependencies
- black * development
- click * development
- codecov * development
- flake8 * development
- mypy * development
- pytest * development
- pytest-cov * development
- requests * development
- setuptools >=46.4.0 development
- types-requests * development
- click *
- requests *
- urllib3 *
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- github/codeql-action/analyze v1 composite
- github/codeql-action/autobuild v1 composite
- github/codeql-action/init v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite