Scientific Software
Updated 6 months ago

tidytext — Peer-reviewed • Rank 22.7 • Science 95%

tidytext: Text Mining and Analysis Using Tidy Data Principles in R - Published in JOSS (2016)

Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

Fast, Consistent Tokenization of Natural Language Text — Peer-reviewed • Rank 19.9 • Science 95%

Fast, Consistent Tokenization of Natural Language Text - Published in JOSS (2018)

Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

LISC — Peer-reviewed • Rank 11.2 • Science 100%

LISC: A Python Package for Scientific Literature Collection and Analysis - Published in JOSS (2019)

Scientific Software
Updated 6 months ago

jstor — Peer-reviewed • Rank 16.6 • Science 93%

jstor: Import and Analyse Data from Scientific Texts - Published in JOSS (2018)

Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

TRUNAJOD — Peer-reviewed • Rank 12.2 • Science 95%

TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)

Scientific Software
Updated 6 months ago

seesus — Peer-reviewed • Rank 6.9 • Science 100%

seesus: a social, environmental, and economic sustainability classifier for Python - Published in JOSS (2024)

Scientific Software
Updated 6 months ago

Arabica — Peer-reviewed • Rank 12.3 • Science 93%

Arabica: A Python package for exploratory analysis of text data - Published in JOSS (2024)

Scientific Software · Peer-reviewed
Scientific Software
Updated 6 months ago

ldaPrototype — Peer-reviewed • Rank 8.4 • Science 95%

ldaPrototype: A method in R to get a Prototype of multiple Latent Dirichlet Allocations - Published in JOSS (2020)

Updated 6 months ago

trafilatura • Rank 26.3 • Science 77%

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Scientific Software
Updated 6 months ago

SDGdetector — Peer-reviewed • Rank 9.3 • Science 93%

SDGdetector: an R-based text mining tool for quantifying efforts toward Sustainable Development Goals - Published in JOSS (2023)

Scientific Software
Updated 6 months ago

EndoMineR for the extraction of endoscopic and associated pathology data from medical reports — Peer-reviewed • Rank 4.2 • Science 93%

EndoMineR for the extraction of endoscopic and associated pathology data from medical reports - Published in JOSS (2018)

Updated 6 months ago

edsnlp • Rank 17.2 • Science 77%

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.

Updated 6 months ago

wpextract • Rank 6.1 • Science 85%

Create datasets from WordPress sites for research or archiving

Updated 6 months ago

cntext • Rank 12.3 • Science 67%

Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).

Updated 6 months ago

uk.ac.cam.ch.wwmm.oscar • Rank 11.0 • Science 49%

OSCAR (Open Source Chemistry Analysis Routines) is an open source extensible system for the automated annotation of chemistry in scientific articles.

Updated 6 months ago

nlppln • Rank 4.5 • Science 54%

NLP pipeline software using the Common Workflow Language

Scientific Software
Updated 6 months ago

CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database — Peer-reviewed • Rank 8.8 • Science 49%

CAZy-parser a way to extract information from the Carbohydrate-Active enZYmes Database - Published in JOSS (2016)

Biology (34%)
Scientific Software · Peer-reviewed
Updated 6 months ago

packFinder • Rank 12.6 • Science 33%

A package for the de novo discovery of pack-TYPE transposons

Updated 6 months ago

textexplorer • Rank 1.4 • Science 44%

A tool designed for the exploration, analysis, and comparison of textual data variants.

Updated 5 months ago

https://github.com/bluebrain/search • Rank 11.4 • Science 33%

Blue Brain text mining toolbox for semantic search and structured information extraction

Updated 6 months ago

R.temis • Rank 17.5 • Science 26%

R.TeMiS: R Text Mining Solution

Updated 6 months ago

qdap • Rank 18.4 • Science 23%

Quantitative Discourse Analysis Package: Bridging the gap between qualitative data and quantitative analysis

Updated 5 months ago

https://github.com/cthoyt/onto2nx • Rank 7.4 • Science 33%

Converts OWL ontologies and OBO to NetworkX Graphs

Updated 6 months ago

LDAvis • Rank 15.9 • Science 23%

R package for web-based interactive topic model visualization.

Updated 5 months ago

https://github.com/caimeng2/uniscraper • Rank 4.7 • Science 23%

A universal scraper that grabs text from multiple types of webpages.

Updated 6 months ago

ngram • Rank 16.6 • Science 10%

Fast n-Gram Tokenization

Updated 6 months ago

chemdataextractor • Rank 13.4 • Science 10%

Automatically extract chemical information from scientific documents

Updated 5 months ago

textstem • Rank 12.7 • Science 10%

Tools for fast text stemming & lemmatization

Updated 6 months ago

sentimentpy • Rank 2.3 • Science 18%

A Python port of the #rstats sentimentr package

Updated 5 months ago

quran • Rank 8.7 • Science 10%

📖 An R package for the complete text of the Qur'an

Updated 5 months ago

scripturs • Rank 8.6 • Science 10%

📖 An R package for the complete LDS Scriptures

Updated 5 months ago

hcandersenr • Rank 7.7 • Science 10%

An R package for H.C. Andersen's fairy tales

Updated 5 months ago

pubchunks • Rank 3.2 • Science 13%

ARCHIVED: Get chunks of XML format scholarly articles

Updated 5 months ago

https://github.com/adbar/german-nlp • Science 36%

Curated list of open-access/open-source/off-the-shelf resources and tools developed with a particular focus on German

Updated 6 months ago

snap-umls-clusters • Science 36%

Master's thesis project at Arab American University Palestine with the Palestinian Neuro Initiative Educational Research Center: clustering medical sentences based on the Unified Medical Language System (UMLS) terms and expanded UMLS terms they contain

Updated 6 months ago

qtm • Science 49%

QTLTableMiner++ tool for mining tables in scientific articles

Updated 5 months ago

acep • Science 23%

Análisis Computacional de Eventos de Protesta (ACEP). Computer-Aided Protest Event Analysis (CAPEA)

Updated 6 months ago

orange-story-navigator • Science 67%

Add-on to the Orange3 data mining toolkit with text processing widgets from the project Navigating Stories

Updated 5 months ago

https://github.com/cedergrouphub/materialparser • Science 23%

Utility to compile a string of chemical terms into a data structure with chemical formula and composition

Updated 6 months ago

iramuteqlike • Science 26%

💬⛏️ IRaMuTeQ Software Analyses in R

Scientific Software
Updated 6 months ago

Jabberwocky — Peer-reviewed • Science 93%

Jabberwocky: an ontology-aware toolkit for manipulating text - Published in JOSS (2020)

Artificial Intelligence and Machine Learning
Scientific Software · Peer-reviewed
Updated 6 months ago

supermat • Science 57%

Superconductors material dataset

Updated 5 months ago

https://github.com/brucewlee/wiki-text-summarizer-keyword-extractor • Science 13%

Uses Beautiful Soup to read Wiki pages, Gensim to summarize, and NLTK to process text, then extracts keywords based on entropy, all in one piece of code. A simple but effective solution to extractive text summarization.

Updated 6 months ago

architxt • Science 44%

ArchiTXT is an open source Python library that transforms unstructured text into structured, searchable, and AI-ready data. It enables automated database generation and seamless data integration.

Updated 6 months ago

corpusexplorer.terminal.console • Science 44%

Gives other programs and programming languages access to the analyses and data of CorpusExplorer v2.0