Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.6%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: actions-marketplace-validations
  • License: other
  • Language: Makefile
  • Default Branch: main
  • Size: 203 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Support Codemeta

README.md

Baler

Baler (bad link reporter) is a GitHub Action that tests the URLs inside Markdown files of your GitHub repository. If any of them are invalid, Baler automatically opens a GitHub issue to report the problem(s).

Introduction

The URLs of hyperlinks inside Markdown files may be invalid for any number of reasons: the author might make typographical errors, the link destinations might disappear over time, and so on. Manually testing the validity of links on a regular basis is laborious and error-prone, so this is clearly a case where automation is best. That's where Baler comes in.

Baler (bad link reporter) is a GitHub Action for automatically testing the links inside Markdown files in your repository, then filing issue reports when problems are found. It's by no means the first or only GitHub Action for this purpose. Baler aims to be different from the others through its simplicity, the use of a different link checker approach, and its informative issue reports.

Baler only tests URLs that use the scheme https or http.
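
To make the scheme restriction concrete, here is a minimal Python sketch of how a link could be classified as testable or not. It is not Baler's implementation; the names TESTABLE_SCHEMES and is_testable are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit

# Assumption for illustration: only these two schemes are checked.
TESTABLE_SCHEMES = {"http", "https"}


def is_testable(url):
    """Return True if the URL uses the http or https scheme."""
    # urlsplit() normalizes the scheme to lowercase; relative links and
    # in-page anchors have an empty scheme and are therefore skipped.
    return urlsplit(url).scheme.lower() in TESTABLE_SCHEMES


links = [
    "https://github.com/caltechlibrary/baler",
    "http://example.org/page",
    "mailto:someone@example.org",
    "docs/guide.md",
    "#installation",
]
testable = [link for link in links if is_testable(link)]
```

Under this sketch, mail addresses, relative paths, and in-page anchors are left out of the list of links to test.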

Installation

This action is available from the GitHub Marketplace. Once you find the page in the GitHub Marketplace, do the following:

  1. In the main branch of your repository, create a .github/workflows directory if this directory does not already exist.
  2. In the .github/workflows directory, create a file named bad-link-reporter.yml.
  3. Copy and paste the following content into the file:

    ```yaml
    # GitHub Actions workflow for Baler (BAd Link reportER) version 0.0.1.
    # This is available as the file "sample-workflow.yml" from the source
    # code repository for Baler: https://github.com/caltechlibrary/baler/

    name: "Bad Link Reporter"

    # Configure this section ─────────────────────────────────────────────

    env:
      # Files examined by the workflow:
      files: '*.md'

      # Label assigned to issues created by this workflow:
      labels: 'bug'

      # Optional file containing a list of URLs to ignore, one per line:
      ignore: '.github/workflows/ignored-urls.txt'

    on:
      schedule:
        # Syntax is: "minute hour day-of-month month day-of-week"
        - cron: "00 04 * * *"
      pull_request:
        paths: ['*.md']
      push:
        paths:
          - .github/workflows/bad-link-reporter.yml
          - .github/workflows/ignored-urls.txt
      workflow_dispatch:

    # The rest of this file should be left as-is ─────────────────────────

    run-name: Test links in files
    jobs:
      run-baler:
        name: Run Bad Link Reporter
        runs-on: ubuntu-latest
        steps:
          - uses: caltechlibrary/baler@main
            with:
              files: ${{github.event.inputs.files || env.files}}
              labels: ${{github.event.inputs.labels || env.labels}}
              ignore: ${{github.event.inputs.ignore || env.ignore}}
    ```

  4. Save the file, add it to your git repository, and commit the changes.

  5. (If you did the steps above outside of GitHub) Push your repository changes to GitHub.

Refer to the next section for more information.

Usage

The trigger condition that causes Baler to run is determined by the on statement in your bad-link-reporter.yml workflow file. The default triggers are:

  • a scheduled run every night
  • pull requests involving .md files
  • pushes involving the workflow file itself or the optional list of ignored URLs
  • manual workflow dispatch execution

The workflow triggers on pull requests involving Markdown files, because that's a situation when it makes sense to test the URLs immediately. However, the default configuration does not trigger execution on every push. That's because running tests at every push is rarely a good idea: if you're actively editing a file like the README file and it has an undiscovered URL error, you can easily trigger the creation of many issue reports before you realize what happened. Instead, a once-a-night run is good enough.

For more information about schedule-based execution, please see the GitHub document "Workflow syntax for GitHub Actions". For more information about other triggers you can use, please see the GitHub document "Triggering a workflow".
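
As a quick way to sanity-check a schedule before committing it, the following Python sketch splits a cron expression into the five fields that GitHub's schedule syntax expects (minute, hour, day-of-month, month, day-of-week). The helper parse_cron is hypothetical and only checks the number of fields; it does not validate the values themselves.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron(expression):
    """Map each whitespace-separated cron field to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


# A nightly schedule: minute 00 of hour 04, every day.
nightly = parse_cron("00 04 * * *")
```

An expression with only four fields, such as "00 04 * *", raises ValueError in this sketch. Note that GitHub evaluates scheduled workflows in UTC, so "00 04 * * *" means 04:00 UTC rather than local time.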

Workflow configuration parameters

A few parameters control the behavior of Baler. They are described below.

files

The input parameter files sets the file name pattern that identifies the Markdown files Baler examines. The default is *.md, which makes Baler examine the Markdown files at the top level of a repository. You can set this to multiple patterns by separating patterns with commas (without spaces).
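
The following Python sketch illustrates how a comma-separated value for files could select paths. It is an illustration under simplifying assumptions, not Baler's actual matching code: the function names are hypothetical, patterns are compared one path component at a time, and recursive patterns such as ** are not handled.

```python
from fnmatch import fnmatchcase


def split_patterns(value):
    """Split a comma-separated parameter value into individual patterns."""
    return [pattern for pattern in value.split(",") if pattern]


def matches(path, pattern):
    """Match path against pattern, one '/'-separated component at a time."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    # A pattern without '/' only matches files at the top level.
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, q) for p, q in zip(path_parts, pattern_parts))


def select_files(paths, value):
    """Return the paths matched by at least one pattern in value."""
    patterns = split_patterns(value)
    return [p for p in paths if any(matches(p, pat) for pat in patterns)]


repo_files = ["README.md", "CHANGES.md", "docs/guide.md", "action.yml"]
top_level_only = select_files(repo_files, "*.md")
with_docs = select_files(repo_files, "*.md,docs/*.md")
```

With the default value *.md, only README.md and CHANGES.md are selected; adding docs/*.md as a second pattern also picks up docs/guide.md.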

ignore_list

The value of the input parameter ignore_list should be the path to a plain text file containing URLs that Baler should ignore. The default value is .github/workflows/ignored-urls.txt. The file does not have to exist; if it doesn't exist, this parameter simply has no effect. The parameter can only reference a file in the repository and not an external file. Each URL should be written alone on a separate line of the file. URLs can be written as regular expressions; e.g., https://example\.(com|org).

Telling Baler to ignore certain URLs is useful if some of your files contain fake URLs used as examples in documentation, or when certain real URLs are repeatedly flagged as unreachable when the workflow runs in GitHub's computing environment (see the section on known issues and limitations below).
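
To show how a list of ignore patterns behaves, here is a small Python sketch that treats each non-blank line as a regular expression and drops any URL matching one of them. This is not Baler's code. The function names are hypothetical, and the choice of re.search (a match anywhere in the URL) is an assumption; the link checker Baler uses may apply different matching rules.

```python
import re


def parse_ignore_list(text):
    """Compile one regular expression per non-blank line of the file."""
    return [re.compile(line.strip()) for line in text.splitlines() if line.strip()]


def filter_ignored(urls, patterns):
    """Return the URLs that match none of the ignore patterns."""
    return [url for url in urls if not any(p.search(url) for p in patterns)]


# Contents of a hypothetical ignored-urls.txt file.
ignore_text = r"https://example\.(com|org)" + "\n\n" + "https://localhost:8080\n"

found_urls = [
    "https://example.com/docs",
    "https://example.org",
    "https://example.net",
    "https://localhost:8080/api",
    "https://github.com/caltechlibrary/baler",
]
to_test = filter_ignored(found_urls, parse_ignore_list(ignore_text))
```

In this sketch, the escaped dot in https://example\.(com|org) matters: without the backslash, the dot would match any character, so the pattern would ignore more URLs than intended.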

labels

When Baler opens a new issue after it finds problems, it can optionally assign one or more labels to the issue. The value of this input parameter should be the names of one or more labels that are already defined in the GitHub repository's issue system. Multiple issue labels can be written, with commas between them.

GitHub event handling

The following is an explanation of how different types of GitHub events are handled, and the reasoning behind the choices:

  • workflow_dispatch events: test all .md files matched by the pattern defined by inputs.files, regardless of whether the files were modified in the latest commit. Rationale: if you're invoking the action manually, you probably intend to test the files as they exist in the repository now, and not relative to a past commit or other past event.
  • schedule events: test all .md files matched by inputs.files, regardless of whether they have been modified in the latest commit. Rationale: (1) it wouldn't make sense to have periodic runs test only the files modified in the latest commit, because a previous commit (or the nth previous) might also have modified some Markdown files, which means the latest commit is not a good reference point; and (2) regularly testing all Markdown files, regardless of whether they were edited recently, is an important way to find links that worked in the past but stopped working due to link rot or other problems.
  • All other event types: test the .md files that will be changed (compared to the versions of those files in the destination branch) as a result of the event. The exact trigger condition is under the control of the invoking workflow. For example, in the sample workflow, pull_request events result in testing .md files that were modified by the pull request, but push events do not test any .md files unless the push also changes the workflow file itself or the file containing the list of ignored URLs.
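
The event-handling rules above can be summarized in a short Python sketch. It only models the decision described in the list and is not taken from Baler's source; the function files_to_test and its parameters are hypothetical.

```python
# Events for which every matched file is tested, per the rules above.
TEST_ALL_EVENTS = {"workflow_dispatch", "schedule"}


def files_to_test(event_name, matched_files, changed_files):
    """Choose which Markdown files to test for a given GitHub event.

    matched_files: files in the repository matched by inputs.files.
    changed_files: files changed by the event (e.g., by a pull request).
    """
    if event_name in TEST_ALL_EVENTS:
        # Manual and scheduled runs ignore what changed recently.
        return sorted(matched_files)
    # Any other event: only test matched files that the event changes.
    return sorted(set(matched_files) & set(changed_files))


matched = ["CHANGES.md", "README.md"]
changed = ["README.md", "action.yml"]

nightly_run = files_to_test("schedule", matched, changed)
pull_request_run = files_to_test("pull_request", matched, changed)
```

Here the scheduled run tests both Markdown files, while the pull request run tests only README.md, because that is the only matched file the pull request changes.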

Known issues and limitations

When Baler runs on GitHub, it will sometimes mysteriously report a link as unreachable even though you can access it without trouble from your local computer. It's not yet clear what causes this. My current best guess is that it's due to network routing or DNS issues in the environment where the link checker actually runs (i.e., GitHub's computing environment).

Getting help

If you find an issue, please submit it in the GitHub issue tracker for this repository.

Contributing

Your help and participation in enhancing Baler is welcome! Please visit the guidelines for contributing for some tips on getting started.

License

Software produced by the Caltech Library is Copyright © 2023 California Institute of Technology. This software is freely distributed under a BSD-style license. Please see the LICENSE file for more information.

Acknowledgments

The image of a baler used at the top of this README file was obtained from Wikimedia Commons on 2023-12-11. The photo was taken and contributed by Glendon Kuhns and made available under the Creative Commons CC0 1.0 license.

Numerous other broken link checkers similar to Baler can be found in GitHub. Some of them served as sources of ideas for what to do in Baler, and I want to acknowledge this debt. The following are notable programs that I looked at (and if you are the author of another one not listed here, please don't feel slighted – I probably missed it simply due to limited time, inadequate or incomplete search, or lack of serendipity):

This work was funded by the California Institute of Technology Library.


Owner

  • Name: actions-marketplace-validations
  • Login: actions-marketplace-validations
  • Kind: organization

Temporarily holds mirrors of GitHub Actions from the marketplace

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Baler: BAd Link reportER"
authors:
  - family-names: Hucka
    given-names: Michael
    orcid: https://orcid.org/0000-0001-9105-5960
abstract: Baler is a GitHub Action that tests the URLs inside Markdown files in your GitHub repository and opens an issue if it finds any problems.
repository-code: "https://github.com/caltechlibrary/baler"
type: software
version: 0.0.2
license-url: "https://github.com/caltechlibrary/baler/blob/main/LICENSE"
keywords:
  - automation
  - software
  - GitHub Actions
date-released: 2023-12-14

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "name": "Baler: BAd Link reportER",
  "identifier": "baler",
  "description": "Baler is a GitHub Action that tests the URLs inside Markdown files in your GitHub repository and opens an issue if it finds any problems.",
  "version": "0.0.2",
  "datePublished": "2023-12-14",
  "dateCreated": "2023-12-11",
  "author": [
    {
      "@type": "Person",
      "givenName": "Michael",
      "familyName": "Hucka",
      "affiliation": {
        "@type": "Organization",
        "name": "California Institute of Technology Library"
      },
      "email": "mhucka@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9105-5960"
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Michael",
      "familyName": "Hucka",
      "affiliation": {
        "@type": "Organization",
        "name": "California Institute of Technology Library"
      },
      "email": "mhucka@caltech.edu",
      "@id": "https://orcid.org/0000-0001-9105-5960"
    }
  ],
  "funder": {
    "@id": "https://ror.org/05dxps055",
    "@type": "Organization",
    "name": "California Institute of Technology Library"
  },
  "copyrightHolder": [
    {
      "@id": "https://ror.org/05dxps055",
      "@type": "Organization",
      "name": "California Institute of Technology"
    }
  ],
  "copyrightYear": 2023,
  "license": "https://github.com/caltechlibrary/baler/blob/main/LICENSE",
  "isAccessibleForFree": true,
  "url": "https://caltechlibrary.github.io/baler",
  "codeRepository": "https://github.com/caltechlibrary/baler",
  "readme": "https://github.com/caltechlibrary/baler/blob/main/README.md",
  "releaseNotes": "https://github.com/caltechlibrary/baler/blob/main/CHANGES.md",
  "issueTracker": "https://github.com/caltechlibrary/baler/issues",
  "downloadUrl": "https://github.com/caltechlibrary/baler/releases",
  "softwareHelp": "https://caltechlibrary.github.io/baler",
  "relatedLink": "https://data.caltech.edu/records/h75w5-y7y57",
  "keywords": [
    "software",
    "automation",
    "GitHub Actions",
    "GitHub Automation"
  ],
  "developmentStatus": "active"
}

Dependencies

.github/workflows/bad-link-reporter.yml actions
  • caltechlibrary/baler main composite
.github/workflows/codeql-analysis-injected.yml actions
  • actions/checkout 93ea575cb5d8a053eaa0ac8fa3b40d7e05a33cc8 composite
  • github/codeql-action/analyze a669cc5936cc5e1b6a362ec1ff9e410dc570d190 composite
  • github/codeql-action/init a669cc5936cc5e1b6a362ec1ff9e410dc570d190 composite
.github/workflows/iga.yml actions
  • caltechlibrary/iga main composite
.github/workflows/markdown-linter.yml actions
  • DavidAnson/markdownlint-cli2-action v13 composite
  • actions/checkout v4 composite
action.yml actions
  • actions/checkout v3 composite
  • peter-evans/create-issue-from-file v4 composite
  • tj-actions/changed-files v40 composite
  • tj-actions/glob v17 composite