textnets
textnets: A Python package for text analysis with networks - Published in JOSS (2020)
jstor
jstor: Import and Analyse Data from Scientific Texts - Published in JOSS (2018)
corporaexplorer
corporaexplorer: An R package for dynamic exploration of text collections - Published in JOSS (2019)
TRUNAJOD
TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)
cntext
Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).
constituent-treelib
A lightweight Python library for constructing, processing, and visualizing constituent trees.
obsei
Obsei is a low-code, AI-powered automation tool. It can be used in various business workflows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.
occupationcoder
Given a job title and job description, the algorithm assigns a standard occupational classification (SOC) code to the job.
align
Python library for extracting quantitative, reproducible metrics of multi-level alignment between speakers in naturalistic language corpora.
qdap
Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis
https://github.com/brucewlee/lftk
[BEA @ ACL 2023] General-purpose tool for linguistic features extraction; Tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.
stylest
R package for estimating speaker style distinctiveness in texts. Install it from CRAN!
https://github.com/cahya-wirawan/text-classification
Text classification engine using several machine learning algorithms
anvay
anvay: A Web-based Tool for Interpretive Topic Modelling in Bengali - Published in JOSS (2026)
taguette
Free and open-source qualitative research tool (mirror of the GitLab repository)
https://github.com/chainsawriot/textplex
Calculate textual complexity using the algorithm by Tolochko & Boomgaarden (2019).
wtt
The Word-Text-Topic (WTT) extraction approach, implemented in Python and R.
https://github.com/chainsawriot/rectr
💒 Reproducible Extraction of Cross-lingual Topics using R
https://github.com/sergeyklay/clusterium
Text Clustering Toolkit for Bayesian Nonparametric Analysis
https://github.com/hidadeng/chinese-pretrained-word-embeddings
A curated collection of resources for Chinese text analysis: tools, corpora, and pre-trained models.
universitatespodcastdata
An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization.
architxt
ArchiTXT is an open source Python library that transforms unstructured text into structured, searchable, and AI-ready data. It enables automated database generation and seamless data integration.