https://github.com/capjamesg/web-feed-recovery

Try to identify new versions of feeds that now return a 404.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary

Keywords

atom feed-reader-testing feed-reading rss

Last synced: 5 months ago

Repository

Try to identify new versions of feeds that now return a 404.

Basic Info

Host: GitHub
Owner: capjamesg
License: MIT
Language: Python
Default Branch: main
Homepage:
Size: 27.3 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Topics

atom feed-reader-testing feed-reading rss

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme, License

Web Feed Recovery

This repository contains a script that takes a list of feed URLs known to return a 404 and attempts to find a new version of each feed.

In a test on 160 broken real-world feeds, this project recovered 67% of them.

Installation

First, clone this project:

git clone https://github.com/capjamesg/web-feed-recovery

Next, create a file called feeds.txt and add the feeds that are known to be broken, one feed URL per line.

Then, run:

python3 app.py

Results will be saved to a file called results.json with the structure:

[
  {
    "original_feed": "https://blog.autumnrain.cc",
    "found_feeds": {
      "https://blog.autumnrain.cc/rss/": "application/rss+xml"
    }
  }
]

Each key in found_feeds is a discovered feed URL, and its value is the MIME type found for that feed.

A MIME type is only recorded if the feed was found through HTTP header discovery. Otherwise, the MIME type is null.
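As an illustration of consuming this output, the following minimal sketch flattens the structure into one row per candidate feed. The helper name `summarize_results` is hypothetical and not part of the project; the sample data mirrors the example above.

```python
import json


def summarize_results(results):
    """Return (original_feed, candidate_url, mime_type) tuples from parsed results."""
    rows = []
    for entry in results:
        # found_feeds maps each candidate URL to its MIME type (None when JSON null).
        for url, mime_type in entry.get("found_feeds", {}).items():
            rows.append((entry["original_feed"], url, mime_type))
    return rows


# Inline sample with the same structure as results.json. In practice, load the file:
#     with open("results.json") as f:
#         results = json.load(f)
sample = json.loads("""
[
  {
    "original_feed": "https://blog.autumnrain.cc",
    "found_feeds": {"https://blog.autumnrain.cc/rss/": "application/rss+xml"}
  }
]
""")

for original, candidate, mime in summarize_results(sample):
    print(f"{original} -> {candidate} ({mime or 'MIME type unknown'})")
```

Treating a missing MIME type as "unknown" rather than as an error matches the null case described above.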

Algorithm

1. Go to the homepage of the site associated with the feed.
2. Check the HTTP headers and HTML <link> tags for signals of a feed (using the indieweb-utils feed discovery implementation).
3. Check for link anchors indicative of a feed (e.g. "RSS", "RSS Feed"). Save those links as potential new feeds.
4. Check for link anchors containing blog-related terms, like "Blog" and "Writing". Go to those pages, perform the same HTTP header and HTML <link> tag analysis, and save any feeds found.
5. Present all discovered feeds.
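The HTML side of the steps above can be sketched with the Python standard library alone. This is not the project's code: the project uses indieweb-utils for discovery, whereas this sketch swaps in `html.parser` as a stand-in, skips HTTP header checks and page fetching, and uses hypothetical names (`FeedSignalParser`, `find_feed_signals`). The MIME type set and exact-match anchor comparison are assumptions; anchor-derived candidates are stored with `None` as their type.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed lists for illustration; the source only names the quoted examples.
FEED_MIME_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}
FEED_ANCHORS = {"rss", "rss feed"}
BLOG_ANCHORS = {"blog", "writing"}


class FeedSignalParser(HTMLParser):
    """Collect feed candidates and blog-like pages from one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = {}       # candidate feed URL -> MIME type, or None if unknown
        self.blog_pages = []  # pages worth visiting for a second discovery pass
        self._href = None     # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            rels = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rels and mime in FEED_MIME_TYPES and attrs.get("href"):
                self.feeds[urljoin(self.base_url, attrs["href"])] = mime
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case before comparing anchor text.
            label = " ".join("".join(self._text).split()).lower()
            target = urljoin(self.base_url, self._href)
            if label in FEED_ANCHORS:
                self.feeds.setdefault(target, None)
            elif label in BLOG_ANCHORS:
                self.blog_pages.append(target)
            self._href = None


def find_feed_signals(html, base_url):
    """Return (feeds, blog_pages) found in the given HTML."""
    parser = FeedSignalParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds, parser.blog_pages


page = """
<html><head>
<link rel="alternate" type="application/atom+xml" href="/atom.xml">
</head><body>
<a href="/rss/">RSS Feed</a> <a href="/blog/">Blog</a>
</body></html>
"""
feeds, blog_pages = find_feed_signals(page, "https://example.com/")
print(feeds)
print(blog_pages)
```

A full run would fetch each URL in `blog_pages` and call `find_feed_signals` again on the response body, merging the results before presenting them.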

Limitations

The algorithm does not work for multi-user sites that host many writers on the same domain. Discovery starts from the site's homepage, so a feed found there cannot be confidently attributed to the single writer whose feed broke. Supporting such sites would require additions to the algorithm.
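A small sketch makes the limitation concrete. It assumes the homepage is derived by keeping only the scheme and host of the broken feed URL; the source does not say how the project derives it, and `homepage_for` and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def homepage_for(feed_url):
    """Reduce a feed URL to the root of its site (assumed derivation)."""
    parts = urlsplit(feed_url)
    return f"{parts.scheme}://{parts.netloc}/"


alice = homepage_for("https://example.com/users/alice/feed.xml")
bob = homepage_for("https://example.com/users/bob/feed.xml")

# Both writers resolve to the same starting page, so discovery from that
# page cannot tell their feeds apart.
print(alice, bob, alice == bob)
```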

UX

The feeds returned are "potential" feeds. A feed that the user did not add to a feed reader themselves (or that a feed reader did not infer from a URL the user provided) cannot be known to be the right replacement without confirmation from the user. Any project that uses this script should therefore ask the user to confirm that the new feed matches their expectations before replacing the broken feed with the newly found one.
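One possible shape for such a confirmation stage is sketched below. The function names and the callback design are assumptions for illustration, not part of the project; the input is assumed to be the parsed contents of results.json.

```python
def confirm_replacements(results, confirm):
    """Return {original_feed: accepted_candidate} for candidates the user approves.

    `confirm` is any callable taking (original, candidate) and returning a bool,
    so the same logic works with a terminal prompt, a web form, or a test stub.
    """
    replacements = {}
    for entry in results:
        for candidate in entry["found_feeds"]:
            if confirm(entry["original_feed"], candidate):
                replacements[entry["original_feed"]] = candidate
                break  # stop at the first candidate the user accepts
    return replacements


def ask_on_terminal(original, candidate):
    """Example confirm callback that prompts on standard input."""
    answer = input(f"Replace {original} with {candidate}? [y/N] ")
    return answer.strip().lower() == "y"
```

Calling `confirm_replacements(results, ask_on_terminal)` would prompt once per candidate. Feeds with no accepted candidate are left out of the returned mapping, so the broken feed is never replaced without approval.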

License

This project is licensed under an MIT license.

Owner

Name: James
Login: capjamesg
Kind: user
Location: Scotland
Company: @Roboflow

Website: jamesg.blog
Repositories: 320
Profile: https://github.com/capjamesg

from words, wonder.

GitHub Events

Total

Push event: 12
Create event: 2

Last Year

Push event: 12
Create event: 2

Issues and Pull Requests

Last synced: 12 months ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
