TextDescriptives
A Python package for calculating a large variety of metrics from text. Published in JOSS (2023).
trafilatura
Python and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML.
@stdlib/datasets-spache-revised
A list of simple American-English words (revised Spache).
textstat
Python package to calculate readability statistics of a text object: paragraphs, sentences, articles.
https://github.com/amazon-science/controllable-readability-summarization
Generating Summaries with Controllable Readability Levels (EMNLP 2023)
https://github.com/alan-turing-institute/readabilipy
A simple HTML content extractor in Python. Can be run as a wrapper for Mozilla's Readability.js package or in pure-python mode.
https://github.com/brucewlee/prompt-learning-readability
[EACL 2023] Use text-to-text models (BART, T5) for readability assessment.
https://github.com/brucewlee/textreader
Readability Formulas and Reading Time Statistics
semantic-outlier-removal
Code and data for SORE (ACL 2025), a semantic boilerplate remover.
dyslexic-readability
A readability scoring library tailored to the specific needs of Turkish dyslexic readers.