trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
Iceberg-locations
Current Antarctic large iceberg positions derived from ASCAT and OSCAT-2
worldometer
Get live, population, geography, projected, and historical data from around the world 🌍
https://github.com/claromes/volleystats
🏐 Command-line tool to scrape volleyball statistics from Data Project Web Competition websites
semantic-outlier-removal
Code and data for SORE (ACL 2025), a semantic boilerplate remover.
https://github.com/cumbof/galaxy-scraper-extension
Chrome extension that automatically scrapes data from a webpage and sends it to the current history of a Galaxy instance.
rat-software
Streamline your search engine research. With the Result Assessment Tool (RAT) you can easily collect results from different search engines, let participants evaluate the results and analyse your findings.
4cat
The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.
https://github.com/apoorv74/olympics-2020-golf-stats
Scripts to fetch statistics from the Olympics website. Sport: Golf. Event: Women's Individual Stroke Play.