Scientific Software
Updated 6 months ago

PyArabic — Peer-reviewed • Rank 21.4 • Science 100%

PyArabic: A Python package for Arabic text - Published in JOSS (2023)

Scientific Software
Updated 6 months ago

Nostril — Peer-reviewed • Rank 15.9 • Science 95%

Nostril: A nonsense string evaluator written in Python - Published in JOSS (2018)

Scientific Software
Updated 6 months ago

TRUNAJOD — Peer-reviewed • Rank 12.2 • Science 95%

TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)

Updated 6 months ago

nlpo3 • Rank 15.7 • Science 67%

Thai natural language processing library in Rust, with Python and Node bindings.

Scientific Software
Updated 6 months ago

Phonetic Algorithms in R — Peer-reviewed • Rank 5.1 • Science 59%

Phonetic Algorithms in R - Published in JOSS (2018)

Updated 6 months ago

corpus_text_processor • Rank 3.6 • Science 54%

A desktop application for preparing files for use in a corpus

Updated 6 months ago

colibri-core • Rank 6.9 • Science 49%

Colibri core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Updated 5 months ago

https://github.com/kupolak/textstat • Rank 16.4 • Science 26%

Ruby gem to calculate statistics from text to determine readability, complexity and grade level of a particular corpus.

Updated 6 months ago

ekphrasis • Rank 15.7 • Science 23%

Ekphrasis is a text processing tool geared towards text from social networks such as Twitter or Facebook. Ekphrasis performs tokenization, word normalization, word segmentation (for splitting hashtags) and spell correction, using word statistics from two large corpora (English Wikipedia and a Twitter corpus of 330 million English tweets).

Updated 6 months ago

de-workflow • Rank 11.6 • Science 26%

A toolbox for fuzzy extraction of drug mentions from text.

Updated 6 months ago

jaconv • Rank 22.7 • Science 13%

Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, and Zenkaku

Updated 6 months ago

stringx • Rank 9.2 • Science 26%

Drop-in replacements for base R string functions powered by stringi

Updated 6 months ago

frog • Rank 7.1 • Science 26%

Frog is an integration of memory-based natural language processing (NLP) modules developed for Dutch. All NLP modules are based on Timbl, the Tilburg memory-based learning software package.

Updated 6 months ago

convert-csv-schwab2pp • Science 67%

Converts a Charles Schwab transaction CSV file to a ready-to-import CSV file for Portfolio Performance.

Updated 5 months ago

https://github.com/andrei-vataselu/data-science-snippets • Science 26%

🧰 Essential EDA and data cleaning helpers for any DataFrame. This collection of functions is designed to accelerate exploratory data analysis (EDA), quickly surface data quality issues, and offer high-level insights into the structure and content of your dataset.

Updated 6 months ago

csv2ical • Science 67%

A CLI tool that converts a CSV file with event details into an iCalendar ICS file. The ICS file can then be imported into apps such as Google Calendar, Microsoft Outlook and Apple macOS Calendar.

Updated 6 months ago

stringr-rladies-jakarta • Science 57%

Supporting material for the R-Ladies Jakarta 15th Meetup (2 April 2022) on the topic "Basic text manipulation with stringr".

Updated 6 months ago

stringi • Science 49%

Fast and portable character string processing in R (with the Unicode ICU)

Updated 6 months ago

text-dedup • Science 67%

All-in-one text de-duplication

Updated 6 months ago

cassandre • Science 44%

Diary for qualitative analysis

Updated 6 months ago

pttk • Science 44%

This is a pandoc preprocessor toolkit based on my experiment pdtmpl

Updated 6 months ago

portagetextprocessing • Science 44%

Text processing tools that came out of the Portage SMT project.