trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
https://github.com/clarin-eric/pressmint
PressMint: Interoperable Corpora of Historical Newspapers
https://github.com/chartes/dubourg
XML-TEI sources for the edition of the correspondence of Chancellor Antoine Du Bourg (1535-1538)
ediarum.prohd.edit
Last public release of ediarum.PROHD.edit for Proyecto Humboldt Digital including localization into Spanish
https://github.com/conal-tuohy/vmcp-upconversion
Ferdinand von Mueller's correspondence upconversion from MS Word to TEI XML
dts-typescript
Distributed Text Services (DTS) API for the TEI/XML files available in the Kouigenji Monogatari Text DB
https://github.com/chartes/cartulaires
XML-TEI sources for the edition of the Cartulaires d’Île-de-France
https://github.com/chartes/miroir
XML-TEI sources for the Miroir des classiques application (F. Duval)
https://github.com/chartes/encpos
XML/TEI sources for the thesis summaries (positions des thèses) of the École des chartes
rus-novel-desktop-app
Desktop application for creating annotated files for the Russian novel corpus 📖
fabulasmitologicas
A collection of Golden Age poems in Spanish in TEI and plain text