Recent Releases of hyphe

hyphe - Hot 2025, the post Skybox era

ChangeLog:

  • Add a button to start a crawl directly from the Network page
  • Add frontend ways to cancel all pending crawls and to cancel/recrawl individual crawls from the Monitor all crawls page (+ fix API's crawl.cancel_all route to also cancel crawls not yet scheduled within scrapy and set their crawl status appropriately)
  • Improve reviewed crawls button in the Monitor all crawls page
  • Add default webentity creation rules for Bluesky and X user accounts, as well as skyblogs for webarchives
  • Fix Monitor latest crawls page not displaying most recent ones in some server cases due to misaligned timestamps
  • Small fixes for BnF & INA Web Archives (proper permalinks, adapt to recent upstream changes)
  • Minor fixes to installation doc and frontend display (make tags validation easier with an "Add" button, visually display crawl status of each page of a webentity, handle total redirected pages missing from old hyphe corpus versions, autostop network spatialization, make some buttons more visible, fix duration displayed for canceled unscheduled crawls, autofocus input in Import page, etc.)

Full Changelog: https://github.com/medialab/hyphe/compare/v1.12.0...v1.12.1

- JavaScript
Published by boogheta 8 months ago

hyphe - 2025 up in the Skybox

ChangeLog:

  • Fix Default WebEntityCreationRule not always applied when different from domain (upgrades to hyphe-traph v2.2) (#499)
  • Add an option in the web interface to load tags from a CSV file along with importing new or existing WebEntities (#503)
  • Add the possibility to set a crawl job as reviewed (#478)
  • Allow renaming a corpus (#457)
  • Better handle WebEntities with prefixes including special characters in the path (#447)
  • Distinguish crawl page errors from simple redirections (#492)
  • Auto resolve more urls directly within crawler (#463)
  • Fix automatic feeding of recent UserAgents, whether behind a proxy or not
  • Small fixes for INA & BnF Web Archives (#502 + permalinks with misformatted dates)
  • Minor fixes to lookups logic, config loading, manual installation doc, corpus landing page (#487) and backend logs display

Full Changelog: https://github.com/medialab/hyphe/compare/v1.11.0...v1.12.0

- JavaScript
Published by boogheta 10 months ago

hyphe - Early 2024

ChangeLog:

  • Give access to detailed crawl logs within frontend (#452)
  • Diverse small UI fixes/improvements in frontend (#482, #483, #485, #486, #488, #494)
  • Complete adaptation of web archives handling to INA's (#484)

Full Changelog: https://github.com/medialab/hyphe/compare/v1.10.9...v1.11.0

- JavaScript
Published by boogheta about 2 years ago

hyphe - alpha v1.11

Changelog: TBD

- JavaScript
Published by boogheta over 2 years ago

hyphe - Back-to-school papercuts

ChangeLog:

  • Add a button to export metadata from all pages of a webentity (#318)
  • Explicitly separate startpages warnings regarding redirected pages and faulty ones (#379)
  • Allow setting a specific User-Agent per crawl within the web interface (#461)
  • Display hints on the meaning of the different possible statuses of a crawl (#474)
  • Highlight corresponding webentities when hovering a status or a tag in the network legend (#459)
  • Switch User-Agents list used within crawls to relying on https://www.useragents.me/ (#453)
  • Various improvements (cleaner backend logs, remove empty traphs directories (#475), updated heuristics for webentity links calculation rhythm, visual fixes (#476, #477))

- JavaScript
Published by boogheta over 2 years ago

hyphe - Hot Summer '23 one again

ChangeLog:

  • migrated caching WELinks to (working) files instead of mongo to handle huge corpuses
  • allow to set archives pass as ENV variable for docker instances
  • display time required by links indexation on overview

- JavaScript
Published by boogheta over 2 years ago

hyphe - Hot Summer '23

ChangeLog:

  • migrated caching WELinks to files instead of mongo to handle huge corpuses
  • allow to set archives pass as ENV variable for docker instances

- JavaScript
Published by boogheta over 2 years ago

hyphe - Summer '23

ChangeLog:

  • Added handling of more webarchives as sources (Arquivo.pt + INA DLWeb) + fixed various webarchives frontend info (#469, #471)
  • Added a corpus setting "ignore internal links" to crawl but not record links within the currently crawled webentity, in order to drastically speed up indexing of entities with crazy amounts of links (at a cost in terms of functionality, since the network of internal pages is then not available, and entities that are split after a crawl will need to be recrawled) (cf #371, #378, #433)
  • Better handle frontend warning on pending actions when trying to close a tab (#465, #466)
  • Minor fixes (#448, #460, #467, #468, #470, 50d97e84814fcb40987c6beca1edbb2639d3d202, 85decf2787cb2924928a82dca89be1438778dd14)

- JavaScript
Published by boogheta over 2 years ago

hyphe - Better, faster, stronger traph, there it is!

ChangeLog:

  • Switched to the breaking new version 2.1 of hyphe-traph, which should help speed up indexing on big networks, but requires rebuilding corpora from scratch
  • Make iterator traph calls less recurrent to leave priority to quick user actions
  • Fixed stack on calling empty callback in List Webentities
  • Upgraded urllib3 to handle SSL deprecation
  • Froze dependencies to maintain python2.7 compat

- JavaScript
Published by boogheta over 3 years ago

hyphe - Summer '22

ChangeLog:

  • Upgraded User Agents list
  • Added extra default WebEntity CreationRules for Github, Instagram, TikTok, Reddit and a bunch of blog platforms
  • Added perma.cc to list of default autofollowlinks
  • Diverse fixes and extra features for webarchives (links to archive permalinks, etc.)
  • Minor bugfixes

- JavaScript
Published by boogheta over 3 years ago

hyphe - Spring '22

ChangeLog:

  • Added a distinction between successful and errored crawled pages to identify Suspicious crawls (#425)
  • Fixed frontend compatibility within Hyphe-Browser (https://github.com/medialab/hyphe-browser/issues/212)
  • Fixed WebArchives crawling interface (#431) and behavior from BNF's archives (#426)
  • Improved network page's interaction using latest sigma.js v2.2 (node highlight etc & #367)
  • Allowed frontend to automatically restart a closed corpus when reopening the frontend directly on a specific corpus link (#440)
  • Allowed checking contiguous boxes in frontend's lists of webentities using the shift key (#438)
  • Allowed tuning the frontend's header color from the config (#430)
  • Published Hyphe on Zenodo & Software Heritage
  • Minor fixes (#397, #388, #432, #429, #437, #343, #341, #444, #325)

- JavaScript
Published by boogheta almost 4 years ago

hyphe - Robots sensitive crawls (stabilized)

ChangeLog:

  • Fixed environment variable OBEY_ROBOTS for Docker instance
  • Added explanation helpers in frontend
  • Fixed undeletable corpora

- JavaScript
Published by boogheta over 4 years ago

hyphe - Robots sensitive crawls

ChangeLog:

  • Optional support for crawls to respect robots.txt (added by @stijn-uva #376 #421)
  • Minor fixes (sigma.js upgrade to v2.0beta, #370, #395, #423, #284, https://github.com/medialab/hyphe/commit/dba57218e399b819d21f18920cb19b0733ef87de, ...)
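
For readers unfamiliar with what respecting robots.txt means for a crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not Hyphe's implementation: the robots.txt content, the `example-crawler` user agent and the `is_allowed` helper are all hypothetical, chosen only to illustrate the check a robots-aware crawler performs before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt file as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/public/page",
        "http://example.com/private/page",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "example-crawler", url) else "disallowed"
        print(url, verdict)
```

A crawler with such an option enabled would skip the URLs reported as disallowed instead of requesting them.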

- JavaScript
Published by boogheta over 4 years ago

hyphe - WebArchives powered crawls

ChangeLog:

  • Allow to start crawls on Web Archives to browse disappeared or modified webentities in the past (#372)
  • Allow to setup advanced individual crawl settings (using a specific cookie, adjusting the depth, using a web archive...)
  • Allow to display only crawled pages in a webentity's webpages list
  • Upgraded fake user agents dependency for more recent UAs
  • Add to the API a route to collect crawled webentity's webpages content as clear text instead of zipped base64
  • Minor fixes (#397, #416, #418, https://github.com/medialab/hyphe/commit/8b8f73f31756f9f8088cb910d7170c8b035ba5a4, https://github.com/medialab/hyphe/commit/3b48755bfbf8ea9eb0c29ba3513e03175b762a47, https://github.com/medialab/hyphe/commit/6aea48ab61b4eb5c768d6c73592707ba6d87e449, https://github.com/medialab/hyphe/commit/f3c1e85716698250156084baba6d9dd2151b828a, https://github.com/medialab/hyphe/commit/e97b9d057a83d32a11840bfee507a1c59d992a1b, https://github.com/medialab/hyphe/commit/b05d4704e3d8dde4ad6700cfed84ed5d681d0355, https://github.com/medialab/hyphe/commit/01aac8a98aaf43331f0c7244f80ef87f6cc9918d, ...)

- JavaScript
Published by boogheta over 4 years ago

hyphe - Sum' 21

ChangeLog:

  • Fix links from OUT entities not accessible (#401)
  • Upgraded MongoDB version to work on more recent debian and such (#377)
  • Fix breakage with some old python2 configurations
  • Upgraded traph version to fix issue happening sometimes when getting paginated pages
  • Minor fixes (#263, #397, #416...)

- JavaScript
Published by boogheta over 4 years ago

hyphe - Early 2021

ChangeLog:

  • Fix WebEntityCreationRules not taken into account anymore after corpus reset (#392)
  • Better handle startpages manually from the front as batch (#352 #365 #336)
  • Fix use of htpasswd to lock instances in docker builds (#390)
  • Improve & speed up docker builds & CI autodeploys (#339 #374)
  • Updated list of automatically followed redirection domains in docker builds
  • Allow setting tags from API when creating a WebEntity
  • Ensure settings input from frontend are valid (#360)

- JavaScript
Published by boogheta about 5 years ago

hyphe - Winter 2019

ChangeLog:

  • Settings can now be adjusted when creating a new corpus (#229)
  • Network pages improvements (search nodes #255, view links direction #286, colors #311)
  • Admin page improvements (filter, order, backup and reindex actions, destroy all button... #264)
  • Links from OUT and DISCOVERED entities not taken into account anymore when computing WEs indegree (#232)
  • First version for displaying each WE's ego network (#316 #204)
  • Export buttons for crawls metadata in All crawls page (#319)
  • Fix starting crawls with a very large number of prefixes and startpages, previously impossible (#353)
  • Use imported urls as startpages when importing preexisting webentity (#365)
  • Handle case of nested WebEntities for crawls (#326)
  • Updated list of redirection domains to follow when crawling (#346)
  • Fix missing www for creationrules with a path prefix (#363)
  • Minor frontend improvements (login #239 #279, prospect #323, webentity edit #314 #304, pages network #335, startpages #324, homepage #340, crawls #297, ...)
  • Add API routes to collect crawled pages metadata & html content when option activated

Many thanks to @2LaMa who's behind a lot of these improvements!

- JavaScript
Published by boogheta over 6 years ago

hyphe - Fall 2019 (fixes 2 breaking bugs + minor fixes)

ChangeLog:

  • Fix "homepage" mode for automatic startpages (breaking crawls from prospect on some settings)
  • Fix some breaking calls to get_tags with no namespace
  • Fix action menu in List WebEntities sometimes not triggered
  • Better handle errors coming from empty calls (closes #337)

- JavaScript
Published by boogheta over 6 years ago

hyphe - Summer 2019 (fix issues with big webentities)

Changelog:

  • Use traph 1.2.0 with paginated queries to fix issues collecting all pages and pagelinks of a single webentity at once (#293); this also speeds up collecting childentities and caches the number of pages by entity during network computation
  • Fix broken WebEntity pages network view
  • Add number of pages per webentity to WebEntities Lists, as well as exports and network view
  • Fix creationrules missing after resetting a corpus (#320)
  • Fix password protected access to corpora
  • Always include homepage as a startpage when crawling a discovered webentity (#322)
  • Fix various crawler errors
  • Allow editing a tag in a single API call instead of removing then adding
  • Add script to trigger backup for all existing corpora

- JavaScript
Published by boogheta over 6 years ago

hyphe - Spring 2019 (upgraded crawler)

Changelog:

  • Upgraded Scrapy (0.24.6 -> 1.6) and ScrapyD (1.0.1 -> 1.2.0) versions to latest ones, fixing broken crawls on many https websites (#268 #273 #312 #270) and broken Docker installs on some Windows and Mac machines
  • Upgraded Hyphe-Traph (1.0.0 -> 1.1.0) for faster homepages automatic identification
  • Upgraded Graphology (0.11.4 -> 0.14.1) & Sigma (2.0.0-alpha18 -> 2.0.0-alpha20) for small networks fixes
  • Improved Tags Inputs in Frontend's "WebEntity edition" and "Manage Tags" pages
  • Transformed FREETAGS into actual research "Field notes" preparing HyBro's coming new direction (#296)
  • Plenty of minor backend & frontend fixes (#305 #291 #310 #302 #281 #276 #248 #244 #275 #236 #294 #290 #258 ...)

- JavaScript
Published by boogheta almost 7 years ago

hyphe - Working early 2019

Changelog:

  • Fix docker issue with NFS volumes and alpine dependencies
  • Give to crawler more recent user-agents
  • Use more recent version of sigma.js in frontend's graph visualisation (#285)
  • Add sorting buttons in frontend's crawls list
  • Minor frontend fixes (#280 #277)

- JavaScript
Published by boogheta about 7 years ago

hyphe - Early 2019

Warning: please prefer version 1.0.5

Changelog:

  • Fix docker issue with NFS volumes
  • Give to crawler more recent user-agents
  • Use more recent version of sigma.js in frontend's graph visualisation (#285)
  • Add sorting buttons in frontend's crawls list
  • Minor frontend fixes (#280 #277)

- JavaScript
Published by boogheta about 7 years ago

hyphe - Sum 2018

Changelog:

  • Small frontend bugfixes (#259, #262 + tags autocomplete/sorting issues)
  • Add option to setup cookies for some crawls via API advanced use
  • Prioritize indexing over webentity links calculation when queue gets too long

- JavaScript
Published by boogheta over 7 years ago

hyphe - Faster indexation for big corpora

Changelog:

  • Updated mongodb calls and added more indexes to speed up page indexing
  • Changed default configuration from storing html contents to not storing them, to reduce disk usage
  • Small frontend bugfixes (#252, #254, #261)
  • Fixed bin/clone_corpus script

- JavaScript
Published by boogheta almost 8 years ago

hyphe - Extra Docker settings options (htpasswd, advanced config)

Changelog:

  • Added to Docker frontend config options to restrict Hyphe access behind htpasswd (#249)
  • Added to Docker backend config options to configure DefaultStartpagesMode, CreationRules and RedirectionDomains (#242)
  • Fixed compatibility of exported GEXF graph with Gephi (removed extra id attributes)

- JavaScript
Published by boogheta about 8 years ago

hyphe - Version 1

After many development versions over the past few years, Hyphe is finally reaching a stable, faster version with this version 1.0.0, which includes:

  • an easy installation process relying on Docker for any kind of machine including Linux, Mac OS X & Windows
  • a new homemade memory structure relying on our mix of Tree and Graph structures named hyphe-traph
  • a Material-based redesigned web interface with embedded tagging and a couple of other new functionalities

- JavaScript
Published by boogheta about 8 years ago

hyphe - Multicorpus & new interface

Edit: corrected version of 0.2 to handle scrapyd issues on some distros by including https://github.com/medialab/hyphe/commit/7a6efd23d106b4758849913971370b53f401723e (cf #159)

Download hyphe-release-v0.2.1.tar.gz on this link or at the bottom of this page.

Just untar, install and run as shown below (do not use sudo, the install script will automatically ask for your password when needed):

    tar xzvf hyphe-release-v0.2.1.tar.gz
    cd hyphe
    bin/install.sh
    bin/hyphe start

Use the web interface at http://localhost/hyphe. Read the doc for more details or to serve Hyphe on a webserver.

To restart or stop the service, just run:

    bin/hyphe restart
    bin/hyphe stop

This release was tested on the following distributions: (use preferably a 64bit system to ensure MongoDB is able to work with bigger databases than only 2GB)

| Distribution | Version | Variant | OK? |
| :-: | :-: | :-: | :-: |
| Ubuntu | 12.04.5 LTS | server | ✓ |
| Ubuntu | 12.04.5 LTS | desktop | ✓ |
| Ubuntu | 14.04.1 LTS | server | ✓ |
| Ubuntu | 14.04.1 LTS | desktop | ✓ |
| Ubuntu | 15.04 | desktop | ✗ (ScrapyD + Upstart issue with Ubuntu 15 so far) |
| Ubuntu | 14.10 | desktop | ✓ |
| CentOS | 5.7 | server | ✗ (issues due to missing upstart & python2.4) |
| CentOS | 6.4 Final | server | ✓ |
| Debian | 6.0.10 squeeze | server | ✓ |
| Debian | 7.5 wheezy | server | ✓ |
| Debian | 7.8 wheezy | livecd gnome | ✓ |
| Debian | 8.0 jessie | livecd gnome | ✗ (MongoDB not supporting Debian 8 yet) |

- JavaScript
Published by boogheta almost 11 years ago

hyphe - More distributions support

Release meant to be installable on various distributions including ubuntu/debian/centos machines at least. Download hyphe-v0.1.tar.gz at the bottom of this page. Just untar, install and run! (DO NOT use sudo, the install script will already require your password when needed)

    tar xzvf hyphe-v0.1.tar.gz
    cd Hyphe
    bin/install.sh
    bin/hyphe start

Use the web interface on http://localhost/hyphe

To stop the service, just run:

    bin/hyphe stop

(use preferably a 64bit system to ensure MongoDB is able to work with bigger databases than only 2Gb)
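
As a quick way to check this before installing, the following sketch reports the pointer width of the running Python interpreter. It is only a hint, not part of Hyphe: the `pointer_bits` helper is hypothetical, and a 32-bit Python can run on a 64-bit system, so a result of 32 does not prove the operating system itself is 32-bit.

```python
import struct


def pointer_bits() -> int:
    """Return the pointer width, in bits, of the running Python interpreter."""
    # "P" is the struct format code for a native pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    bits = pointer_bits()
    print(f"This Python interpreter is {bits}-bit")
    if bits < 64:
        print("A 64-bit system is recommended for databases larger than 2GB")
```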

- JavaScript
Published by boogheta about 12 years ago

hyphe - First release

Release meant to be easily installable on an ubuntu machine. Download hyphe-v0.0.0.tar.gz at the bottom of this page. Just untar, install and run!

    mkdir -p hyphe
    tar -xvf hyphe-v0.0.0.tar.gz -C hyphe
    cd hyphe
    bash bin/install.sh
    bash bin/start.sh

Use the web interface on http://localhost/hyphe

To stop the service, just run:

    bash bin/stop.sh

(use preferably a 64bit system to ensure MongoDB is able to work with bigger databases than only 2Gb)

- JavaScript
Published by boogheta over 12 years ago