textnets
textnets: A Python package for text analysis with networks - Published in JOSS (2020)
jstor
jstor: Import and Analyse Data from Scientific Texts - Published in JOSS (2018)
corporaexplorer
corporaexplorer: An R package for dynamic exploration of text collections - Published in JOSS (2019)
TRUNAJOD
TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)
cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and large language models (LLMs).
constituent-treelib
A lightweight Python library for constructing, processing, and visualizing constituent trees.
obsei
Obsei is a low-code, AI-powered automation tool. It can be used in various business workflows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.
occupationcoder
Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.
align
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
https://github.com/brucewlee/lftk
[BEA @ ACL 2023] General-purpose tool for linguistic feature extraction; tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.
stylest
R package for estimating speaker style distinctiveness in texts. Install it from CRAN!
https://github.com/cahya-wirawan/text-classification
Text classification engine using several machine learning algorithms
architxt
ArchiTXT is an open source Python library that transforms unstructured text into structured, searchable, and AI-ready data. It enables automated database generation and seamless data integration.
https://github.com/chainsawriot/textplex
Calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019).
https://github.com/sergeyklay/clusterium
Text Clustering Toolkit for Bayesian Nonparametric Analysis
https://github.com/hidadeng/chinese-pretrained-word-embeddings
A collection of resources for Chinese text analysis: tools, corpora, and pretrained models.
anvay
anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - Published in JOSS (2026)
wtt
The Word-Text-Topic (WTT) extraction approach, implemented in Python and R.
taguette
Free and open-source qualitative research tool (mirror of the GitLab repository)
universitatespodcastdata
An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization
https://github.com/chainsawriot/rectr
💒 Reproducible Extraction of Cross-lingual Topics using R