caltechlibrary/waystation

Automatically archive your repository's GitHub Pages in the Wayback Machine.

https://github.com/caltechlibrary/waystation

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
    Organization caltechlibrary has institutional domain (www.library.caltech.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

archiving automation documentation github-action github-actions github-automation github-pages internet-archive preservation wayback-machine
Last synced: 4 months ago

Repository


Basic Info
Statistics
  • Stars: 26
  • Watchers: 6
  • Forks: 1
  • Open Issues: 0
  • Releases: 11
Created about 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Support Codemeta

README.md

Waystation

Waystation is a GitHub Action that makes it easy to archive your repository's GitHub Pages site automatically in the Internet Archive's Wayback Machine.



Introduction

Many projects use GitHub Pages for documentation and other purposes. GitHub Pages are wonderful, but GitHub does not archive them. To help ensure long-term access to your GitHub Pages, you may want to preserve them in the Internet Archive's Wayback Machine. That's the purpose of this GitHub Action.

How does Waystation work?

Waystation (a loose acronym of Wayback site archiving automation) uses the Wayback Machine GitHub Action to send your repository's configured GitHub Pages URL to the Wayback Machine, so that the latest copy of your site gets archived. It is intended to be triggered by software releases in your repository, but you can change the trigger condition if needed.
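For comparison, here is roughly the step that Waystation automates, written as a workflow that calls the Wayback Machine GitHub Action (JamieMagee/wayback) directly. This is a hypothetical sketch, not Waystation's own action.yml: the URL is a placeholder, the input names url, saveOutlinks and saveScreenshot are assumed from that action's documentation, and v1.3.28 is simply the version listed among Waystation's dependencies.

```yaml
# Hypothetical workflow: archive a hard-coded URL with the Wayback Machine
# GitHub Action directly, without Waystation.
name: Archive a fixed URL
on:
  release:
    types: [published]
jobs:
  archive:
    runs-on: ubuntu-latest
    steps:
      - uses: JamieMagee/wayback@v1.3.28
        with:
          # Placeholder address; you would have to keep this in sync by hand.
          url: https://example-owner.github.io/example-repo/
          saveOutlinks: true
          saveScreenshot: true
```

What Waystation adds on top of this is looking up the GitHub Pages URL configured for the repository, so the URL does not have to be hard-coded in the workflow.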

Why would you want to bother with this?

GitHub is incredibly popular today, but content hosted there is not guaranteed to be permanent; moreover, GitHub has in the past changed the URLs and policies surrounding GitHub Pages, and may do so again in the future. The Wayback Machine, created by the Internet Archive, is a free digital archive of the World Wide Web. Web pages saved in the Wayback Machine continue to exist even after the original project repository changes or is removed from the web, and the archived pages can be searched for, shared, and linked to normally. You can also view previous versions of a site if they were archived.

Installation

To use Waystation, you need to create a GitHub Actions workflow file in your repository. Follow these simple steps.

Add the workflow file to your repository

  1. In the main branch of your repository, create a .github/workflows directory if this directory does not already exist.
  2. In the .github/workflows directory, create a file named archive-github-pages.yml.
  3. Copy and paste the following content into the file:

    ```yaml
    # GitHub Actions workflow for Waystation version 1.8.0.
    # Available as the file "sample-workflow.yml" from the software
    # repository at https://github.com/caltechlibrary/waystation

    name: Archive GitHub Pages
    run-name: Archive GitHub Pages in the Wayback Machine

    on:
      release:
        types: [published]
      workflow_dispatch:
        inputs:
          dry_run:
            description: "Run without actually sending URLs"
            type: boolean

    jobs:
      run-waystation:
        name: Run Waystation
        runs-on: ubuntu-latest
        steps:
          - uses: caltechlibrary/waystation@v1.8
            with:
              dry_run: ${{github.event.inputs.dry_run || false}}
    ```

  4. Save the file, add it to your git repository, and commit the changes.

  5. If you did the steps above outside of GitHub, push your repository changes to GitHub.

Test the workflow

Once you have created the workflow file and pushed it to GitHub, it's wise to do a dry run, in order to test that things work as expected.

  1. Go to the Actions tab in your repository and click on the workflow named "Archive GitHub Pages" in the sidebar on the left.


  2. On the next page, click the Run workflow button on the right-hand side of the blue strip.


  3. In the pull-down, click the checkbox for "Run without actually sending URLs".


  4. Click the green Run workflow button near the bottom.
  5. Refresh the web page; a new line named after your workflow file will be shown.


  6. Click the title of that workflow run to see the progress and results of running Waystation.

Usage

Once installed, the sample workflow will run automatically the next time you publish a release on GitHub. The trigger condition that causes Waystation to run automatically is determined by the on statement in your archive-github-pages.yml workflow file. The examples shown here use on: release to trigger when a release is published, but you can use other trigger events defined by GitHub if you wish.
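As an illustration of using another trigger event, the sketch below keeps the release and manual triggers from the sample workflow and adds GitHub's standard schedule event so the site is also archived weekly. The cron expression (Mondays at 06:00 UTC) is an arbitrary choice for this example. In scheduled runs github.event.inputs is empty, so the sample's dry_run expression falls back to false; note also that GitHub runs scheduled workflows from the repository's default branch.

```yaml
on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Run without actually sending URLs"
        type: boolean
  # Illustrative addition: also run every Monday at 06:00 UTC.
  schedule:
    - cron: "0 6 * * 1"
```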

Several optional parameters control the behavior of Waystation; they are described below.

dry_run (default: false)

Setting the parameter dry_run to true will cause the action to execute without sending the URL to the Wayback Machine. This is mainly useful for testing, especially if you want to try different trigger conditions.

The sample workflow file (shown above) offers a "Run without actually sending URLs" checkbox, which sets dry_run, when the workflow is run manually. You can use it to set the value on a per-run basis. To change the default value (for example, when experimenting with different trigger conditions), change false to true in the last line of the sample workflow. That is, change the last line from

```yaml
dry_run: ${{github.event.inputs.dry_run || false}}
```

to

```yaml
dry_run: ${{github.event.inputs.dry_run || true}}
```

debug (default: false)

Passing the parameter debug with a value of true will cause Waystation to print the values of the input variables and the GitHub context at run time. This is useful for debugging the workflow. To set the debug parameter, add it as part of the with: block in the workflow file. For example:

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    debug: true
...
```

save_outlinks (default: true)

This corresponds to the parameter saveOutlinks in the Wayback Machine GitHub Action. A value of true will make the action tell the Wayback Machine to archive external pages that are linked to from your GitHub Pages. The default in Waystation is true because Waystation's author finds this useful in producing a more complete archive of a GitHub Pages site. To set the save_outlinks parameter, add it as part of the with: block in the workflow file. For example:

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    save_outlinks: true
...
```

save_screenshot (default: true)

This corresponds to the parameter saveScreenshot in the Wayback Machine GitHub Action. A value of true will make the action tell the Wayback Machine to save a screenshot of the page located at the GitHub Pages URL. The default in Waystation is true because Waystation's author finds this useful in producing a more complete archive of a GitHub Pages site. To set the save_screenshot parameter, add it as part of the with: block in the workflow file. For example:

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    save_screenshot: true
...
```
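The optional parameters all go in the same with: block, so they can be combined. The following is a sketch of one such combination, chosen only for illustration: it turns on debug output, keeps saving outlinks, and turns off the screenshot that Waystation requests by default.

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    debug: true            # print inputs and GitHub context
    save_outlinks: true    # same as the default
    save_screenshot: false # overrides the default of true
...
```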

Getting help

If you find an issue, please submit it in the GitHub issue tracker for this repository.

Contributing

Your help and participation in enhancing Waystation is welcome! Please visit the guidelines for contributing for some tips on getting started.

License

Software produced by the Caltech Library is Copyright © 2022–2024 California Institute of Technology. This software is freely distributed under a modified BSD 3-clause license. Please see the LICENSE file for more information.

Acknowledgments

This work was funded by the California Institute of Technology Library.

Waystation makes use of the excellent Wayback Machine GitHub Action by Jamie Magee.



Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hucka
    given-names: Michael
    email: mhucka@caltech.edu
    orcid: https://orcid.org/0000-0001-9105-5960
title: Waystation
abstract: GitHub Action to archive the GitHub Pages of a repository in the Wayback Machine.
version: 1.8.0
date-released: 2024-01-29
url: https://caltechlibrary.github.io/waystation
repository-code: https://github.com/caltechlibrary/waystation
license-url: https://github.com/caltechlibrary/waystation/blob/main/LICENSE
doi: 10.22002/hy6ag-xw238
type: software
keywords:
  - software
  - automation
  - archiving

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "Waystation",
  "identifier": "waystation",
  "description": "GitHub Action to archive the GitHub Pages of a repository in the Wayback Machine.",
  "version": "1.8.0",
  "datePublished": "2024-01-29",
  "author": [
    {
      "@type": "Person",
      "givenName": "Michael",
      "familyName": "Hucka",
      "affiliation": {
        "@type": "Organization",
        "name": "California Institute of Technology Library"
      },
      "email": "mhucka@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9105-5960"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Michael",
      "familyName": "Hucka",
      "affiliation": {
        "@type": "Organization",
        "name": "California Institute of Technology Library"
      },
      "email": "mhucka@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9105-5960"
    }
  ],
  "funder": {
    "@id": "https://ror.org/05dxps055",
    "@type": "Organization",
    "name": "California Institute of Technology Library"
  },
  "copyrightHolder": [
    {
      "@id": "https://ror.org/05dxps055",
      "@type": "Organization",
      "name": "California Institute of Technology"
    }
  ],
  "copyrightYear": 2024,
  "license": "https://github.com/caltechlibrary/waystation/blob/main/LICENSE",
  "isAccessibleForFree": true,
  "url": "https://caltechlibrary.github.io/waystation",
  "codeRepository": "https://github.com/caltechlibrary/waystation",
  "readme": "https://github.com/caltechlibrary/waystation/blob/main/README.md",
  "softwareHelp": "https://caltechlibrary.github.io/waystation",
  "releaseNotes": "https://github.com/caltechlibrary/waystation/blob/main/CHANGES.md",
  "issueTracker": "https://github.com/caltechlibrary/waystation/issues",
  "downloadUrl": "https://github.com/caltechlibrary/waystation/archive/main.zip",
  "relatedLink": "https://data.caltech.edu/records/hy6ag-xw238",
  "keywords": [
    "software",
    "automation",
    "archiving"
  ],
  "developmentStatus": "active"
}

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 105
  • Total Committers: 2
  • Avg Commits per committer: 52.5
  • Development Distribution Score (DDS): 0.019
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Michael Hucka m****a@c****u 103
Jamie Magee j****e@g****m 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 1
  • Total pull requests: 2
  • Average time to close issues: 17 days
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 2.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • theodore-s-beers (1)
Pull Request Authors
  • JamieMagee (2)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 13
github actions: caltechlibrary/waystation

Archive a repository's GitHub Pages in the Wayback Machine

  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 2
Rankings
Dependent packages count: 0.0%
Stargazers count: 11.4%
Average: 13.4%
Dependent repos count: 20.6%
Forks count: 21.7%
Last synced: 12 months ago

Dependencies

.github/workflows/build-sphinx.yml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/test-action.yml actions
  • caltechlibrary/waystation main composite
.github/workflows/test-wayback.yml actions
  • JamieMagee/wayback v1.3.28 composite
action.yml actions
  • JamieMagee/wayback v1.3.28 composite
  • actions/github-script v6.3.3 composite
requirements-dev.txt pypi
  • linkify-it-py * development
  • myst-parser * development
  • sphinx-material * development
.github/workflows/waystation.yml actions
  • caltechlibrary/waystation main composite