PyArabic
PyArabic: A Python package for Arabic text - Published in JOSS (2023)
Nostril
Nostril: A nonsense string evaluator written in Python - Published in JOSS (2018)
TRUNAJOD
TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)
Shekar: A Python Toolkit for Persian Natural Language Processing
Shekar: A Python Toolkit for Persian Natural Language Processing - Published in JOSS (2025)
nlpo3
Thai natural language processing library in Rust, with Python and Node bindings.
Phonetic Algorithms in R
Phonetic Algorithms in R - Published in JOSS (2018)
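The entry above does not say which algorithms the package covers. The classic example of a phonetic algorithm is American Soundex, which assigns names that sound alike the same four-character code. The sketch below is written independently in Python and is not the R package's code. The function name `soundex` is hypothetical.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for `name` (illustrative sketch)."""
    # Consonant groups and their digit codes; vowels, y, h and w have no code.
    codes = {}
    for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""

    result = letters[0].upper()        # the first letter is kept as-is
    prev = codes.get(letters[0], "")   # its code still suppresses an identical next code
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")        # pad short codes with zeros
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`.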
corpus_text_processor
A desktop application for preparing files for use in a corpus
colibri-core
Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e., patterns with one or more gaps, of either fixed or dynamic size) in a quick and memory-efficient way. At its core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.
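The following plain Python sketch illustrates the difference between n-grams and fixed-size skipgrams by replacing interior tokens with a gap marker. It is not colibri-core's API and does not reproduce its compact pattern encoding. The function names and the `{*}` gap marker are assumptions made for this example.

```python
from itertools import combinations

GAP = "{*}"  # hypothetical gap marker used only in this sketch

def ngrams(tokens, n):
    """All contiguous n-grams of `tokens`, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def skipgrams(tokens, n):
    """All n-token patterns in which one or more interior positions are gaps.

    The first and last tokens are always kept, so a pattern never starts
    or ends with a gap.
    """
    patterns = []
    for gram in ngrams(tokens, n):
        interior = range(1, n - 1)
        for k in range(1, n - 1):              # number of gapped positions
            for gaps in combinations(interior, k):
                patterns.append(tuple(GAP if i in gaps else tok
                                      for i, tok in enumerate(gram)))
    return patterns
```

For instance, `skipgrams("to be or not".split(), 3)` yields `("to", "{*}", "or")` and `("be", "{*}", "not")`. Counting these with `collections.Counter` gives a simple, memory-hungry version of a pattern model.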
https://github.com/kupolak/textstat
Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.
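As background on the readability and grade-level scores such a gem reports, the widely used Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) formulas are shown below. The entry does not specify which formulas or syllable-counting rules this gem uses, so treat these as reference definitions only:

```latex
\mathrm{FRE} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}
\qquad
\mathrm{FKGL} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
```

For a text with 100 words, 5 sentences and 150 syllables, FRE = 206.835 - 1.015(20) - 84.6(1.5) = 59.635, and FKGL = 0.39(20) + 11.8(1.5) - 15.59 = 9.91.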
ekphrasis
Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and 330 million English tweets).
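Word segmentation of hashtags with word statistics is commonly framed as choosing the split with the highest total unigram log-probability, computed by dynamic programming (Viterbi-style). The sketch below illustrates that general idea only and is not Ekphrasis's implementation. The toy counts in `COUNTS` and the unseen-word penalty are hypothetical stand-ins for real corpus statistics.

```python
import math

# Toy unigram counts (hypothetical); a real segmenter loads these from a large corpus.
COUNTS = {"i": 50, "love": 30, "new": 40, "york": 20, "newyork": 1, "lo": 2, "ve": 1}
TOTAL = sum(COUNTS.values())

def word_logprob(word):
    """Log probability of `word`; unseen words get a penalty that grows with length."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(1.0 / (TOTAL * 10 ** len(word)))

def segment(text, max_len=20):
    """Split `text` into its most probable word sequence under the unigram model."""
    # best[i] holds (score, words) for the best segmentation of text[:i]
    best = [(0.0, [])] + [(-math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start][0] + word_logprob(text[start:end])
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [text[start:end]])
    return best[-1][1]
```

With these toy counts, `segment("ilovenewyork")` returns `["i", "love", "new", "york"]`, because splitting "new" and "york" scores higher than the rare "newyork".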
jaconv
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku
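The core of Hiragana/Katakana conversion is a fixed Unicode offset: the Katakana letters U+30A1 to U+30F6 sit exactly 0x60 code points above the Hiragana letters U+3041 to U+3096. Half-width (Hankaku) Katakana can be widened with NFKC normalization. The sketch below uses only the standard library and is not jaconv's API. The function names are hypothetical.

```python
import unicodedata

HIRA_START, HIRA_END = 0x3041, 0x3096  # small a .. small ke in the Hiragana block
OFFSET = 0x60                          # distance from Hiragana to the matching Katakana

def hira_to_kata(text):
    """Convert Hiragana letters to Katakana, leaving other characters unchanged."""
    return "".join(chr(ord(c) + OFFSET) if HIRA_START <= ord(c) <= HIRA_END else c
                   for c in text)

def kata_to_hira(text):
    """Convert Katakana letters back to Hiragana, leaving other characters unchanged."""
    lo, hi = HIRA_START + OFFSET, HIRA_END + OFFSET
    return "".join(chr(ord(c) - OFFSET) if lo <= ord(c) <= hi else c for c in text)

def hankaku_to_zenkaku_kana(text):
    """Widen half-width Katakana via NFKC.

    Note that NFKC also rewrites other compatibility characters, for example
    full-width ASCII letters become ordinary ASCII.
    """
    return unicodedata.normalize("NFKC", text)
```

For example, `hira_to_kata("ひらがな")` gives `"ヒラガナ"`, and the long-vowel mark `ー`, which has no Hiragana counterpart, passes through `kata_to_hira` unchanged.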
frog
Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.
https://github.com/bagustris/isst_2019
Repository for text emotion recognition submitted to ISST 2019
https://github.com/alexpreynolds/subset
Draw a subset of lines from a text file
convert-csv-schwab2pp
Converts a Charles Schwab transaction CSV file to a ready-to-import CSV file for Portfolio Performance.
https://github.com/andrei-vataselu/data-science-snippets
🧰 Essential EDA and Data Cleaning Helpers for Any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.
csv2ical
A CLI tool that converts a CSV file with event details into an iCalendar (ICS) file. The ICS file can then be imported into apps such as Google Calendar, Microsoft Outlook and Apple macOS Calendar.
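As an illustration of what such a conversion involves, the sketch below turns CSV rows into RFC 5545 `VEVENT` components, including the required `UID` and `DTSTAMP` properties and CRLF line endings. It is not csv2ical's code. The column names `summary`, `start` and `end`, the `YYYY-MM-DD HH:MM` date format, and the `PRODID` value are assumptions. Long-line folding at 75 octets is omitted for brevity.

```python
import csv
import io
import uuid
from datetime import datetime, timezone

def escape_text(value):
    """Escape characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))

def to_ical_time(value):
    """Convert 'YYYY-MM-DD HH:MM' (assumed format) to a floating local DATE-TIME."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M").strftime("%Y%m%dT%H%M%S")

def csv_to_ics(csv_text):
    """Build an ICS document from CSV text with summary,start,end columns."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//csv-to-ics sketch//EN"]
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uuid.uuid4()}@example.invalid",   # globally unique event id
            f"DTSTAMP:{stamp}",
            f"DTSTART:{to_ical_time(row['start'])}",
            f"DTEND:{to_ical_time(row['end'])}",
            f"SUMMARY:{escape_text(row['summary'])}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # RFC 5545 requires CRLF line breaks
```

Writing the returned string to a `.ics` file produces something calendar apps can import, though a production tool would also handle time zones and line folding.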
stringr-rladies-jakarta
Supporting material for the R-Ladies Jakarta 15th Meetup (2 April 2022) on the topic "Basic text manipulation with stringr".
stringi
Fast and portable character string processing in R (with the Unicode ICU)
portagetextprocessing
Text processing tools that came out of the Portage SMT project.