Projects | Open Source Science

Scientific Software

Updated 11 months ago

Pubmed Parser — Peer-reviewed • Rank 19.4 • Science 100%

Pubmed Parser: A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset - Published in JOSS (2020)

article doi medline-xml nlp parse parser pmid pubmed-central pubmed-parser python xml

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

htmldate — Peer-reviewed • Rank 23.6 • Science 95%

htmldate: A Python package to extract publication dates from web pages - Published in JOSS (2020)

date date-parser datetime digital-forensics entity-extraction forensics-tools information-extraction metadata metadata-extraction natural-language-processing nlp opengraph web-scraping webscraping

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

TextDescriptives — Peer-reviewed • Rank 17.6 • Science 98%

TextDescriptives: A Python package for calculating a large variety of metrics from text - Published in JOSS (2023)

dependency-distance descriptive-statistics nlp python readability readability-scores spacy spacy-extension statistics syntactic-analysis

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

Fast, Consistent Tokenization of Natural Language Text — Peer-reviewed • Rank 19.9 • Science 95%

Fast, Consistent Tokenization of Natural Language Text - Published in JOSS (2018)

nlp peer-reviewed r r-package rstats text-mining tokenizer

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

Augmenty — Peer-reviewed • Rank 15.9 • Science 98%

Augmenty: A Python Library for Structured Text Augmentation - Published in JOSS (2024)

augmentation natural-language-processing nlp nlproc python spacy spacy-extension spacy-nlp text-augmentation text-classification training-data

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

textnets — Peer-reviewed • Rank 13.9 • Science 100%

textnets: A Python package for text analysis with networks - Published in JOSS (2020)

computational-social-science network-analysis nlp sociology text-analysis text-as-data visualization

Mathematics

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

WordTokenizers.jl — Peer-reviewed • Rank 13.2 • Science 95%

WordTokenizers.jl: Basic tools for tokenizing natural language in Julia - Published in JOSS (2020)

data-mining information-retrieval lexer nlp tokenization

Engineering (40%) Earth and Environmental Sciences (40%)

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

Jury — Peer-reviewed • Rank 14.8 • Science 93%

Jury: A Comprehensive Evaluation Toolkit - Published in JOSS (2024)

datasets evaluate evaluation huggingface machine-learning metrics natural-language-processing nlp nlp-evaluation python pytorch transformers

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

giotto-deep — Peer-reviewed • Rank 10.6 • Science 95%

giotto-deep: A Python Package for Topological Deep Learning - Published in JOSS (2022)

computational-geometry deep-learning image-processing nlp pytorch tda

Sociology

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

Mordecai — Peer-reviewed • Rank 12.4 • Science 93%

Mordecai: Full Text Geoparsing and Event Geocoding - Published in JOSS (2017)

geocoding geonames geoparsing nlp spacy toponym-resolution

Sociology (40%)

Scientific Software · Peer-reviewed

Scientific Software

Updated 11 months ago

Arabica — Peer-reviewed • Rank 12.3 • Science 93%

Arabica: A Python package for exploratory analysis of text data - Published in JOSS (2024)

exploratory-data-analysis nlp text-mining

Scientific Software · Peer-reviewed

Updated 9 months ago

Shekar: A Python Toolkit for Persian Natural Language Processing • Rank 11.4 • Science 93%

Shekar: A Python Toolkit for Persian Natural Language Processing - Published in JOSS (2025)

embeddings keyword-extraction lemmatization morphology named-entity-recognition natural-language-processing ner nlp nlp-library normalization part-of-speech-tagging persian persian-nlp pos spell-checker text-processing wordcloud

Scientific Software

Updated 11 months ago

gobbli — Peer-reviewed • Rank 10.5 • Science 93%

gobbli: A uniform interface to deep learning for text in Python - Published in JOSS (2021)

deep-learning docker nlp python

Scientific Software · Peer-reviewed

Updated 11 months ago

trafilatura • Rank 26.3 • Science 77%

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

article-extractor corpus-builder corpus-tools crawler html-to-markdown html2text llm news-aggregator news-crawler nlp rag readability rss-feed scraping tei text-cleaning text-extraction text-mining text-preprocessing web-scraping

Updated 11 months ago

transformers • Rank 38.7 • Science 64%

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio deep-learning deepseek gemma glm hacktoberfest llm machine-learning model-hub natural-language-processing nlp pretrained-models python pytorch pytorch-transformers qwen speech-recognition transformer vlm

Updated 11 months ago

lazyllm-llamafactory • Rank 25.6 • Science 77%

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

agent ai deepseek fine-tuning gemma gpt instruction-tuning large-language-models llama llama3 llm lora moe nlp peft qlora quantization qwen rlhf transformers

Updated 11 months ago

datasets • Rank 34.4 • Science 64%

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

ai artificial-intelligence computer-vision dataset-hub datasets deep-learning llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Updated 11 months ago

edsnlp • Rank 17.2 • Science 77%

Modular, fast NLP framework, compatible with PyTorch and spaCy, offering tailored support for French clinical notes.

clinical-data-warehouse deep-learning fast french medical multi-task nlp pytorch rule-based spacy text-mining

Updated 11 months ago

ontogpt • Rank 16.7 • Science 77%

LLM-based ontological extraction tools, including SPIRES

ai chat-gpt data-modeling gpt-3 information-extraction language-models large-language-models linkml llm monarchinitiative named-entity-recognition ner nlp oaklib obofoundry relation-extraction

Updated 11 months ago

gensim • Rank 15.8 • Science 77%

Topic Modelling for Humans

data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec

Mathematics (38%)

Updated 11 months ago

wpextract • Rank 6.1 • Science 85%

Create datasets from WordPress sites for research or archiving

corpus crawler nlp text-extraction text-mining web-scraping wordpress

Updated 11 months ago

nltk • Rank 36.5 • Science 54%

NLTK Source

machine-learning natural-language-processing nlp nltk python

Updated 10 months ago

openprompt • Rank 18.3 • Science 72%

An Open-Source Framework for Prompt-Learning.

ai deep-learning natural-language-processing natural-language-understanding nlp nlp-library nlp-machine-learning pre-trained-language-models pre-trained-model prompt prompt-based-tuning prompt-learning prompt-toolkit prompts pytorch transformer

Updated 11 months ago

pytextrank • Rank 22.2 • Science 67%

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

graph-algorithms machine-learning natural-language natural-language-processing nlp python spacy spacy-extension summarization textgraphs textrank

Updated 5 months ago

KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP • Rank 1.4 • Science 87%

KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP - Published in JOSS (2026)

julia natural-language-processing nlp text-encoding textprocessing tokenization

Updated 11 months ago

txtai • Rank 22.2 • Science 64%

💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

ai artificial-intelligence embeddings information-retrieval language-model large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search

Updated 10 months ago

flair • Rank 27.1 • Science 59%

A very simple framework for state-of-the-art Natural Language Processing (NLP)

machine-learning named-entity-recognition natural-language-processing nlp pytorch semantic-role-labeling sequence-labeling word-embeddings

Updated 11 months ago

fugashi • Rank 22.1 • Science 64%

A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.

cython-wrapper japanese mecab nlp tokenizer

Updated 11 months ago

metanno • Rank 8.4 • Science 77%

Annotator building tool for Jupyter

annotator customizable jupyter modular nlp

Updated 11 months ago

pydata-wrangler • Rank 8.2 • Science 77%

Wrangle messy numerical, image, and text data into consistent well-organized formats

data data-analysis data-science data-wrangling hugging-face image-data machine-learning nlp numpy pandas polars python scikit-learn

Updated 11 months ago

ammico • Rank 10.1 • Science 75%

AI-based Media and Misinformation Content Analysis Tool: Analyze text and images

classification computer-vision nlp text-extraction translation

Updated 11 months ago

contextualspellcheck • Rank 16.8 • Science 67%

✔️Contextual word checker for better suggestions (not actively maintained)

bert chatbot help-wanted natural-language-processing nlp oov preprocessing python python-spelling-corrector spacy spacy-extension spellcheck spellchecker spelling-correction spelling-corrections

Updated 11 months ago

dolma • Rank 19.6 • Science 64%

Data and tools for generating and inspecting OLMo pre-training data.

data-processing large-language-models llm machile-learning nlp

Updated 11 months ago

farm-haystack • Rank 28.7 • Science 54%

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

agent agents ai gemini generative-ai gpt-4 information-retrieval large-language-models llm machine-learning nlp orchestration python pytorch question-answering rag retrieval-augmented-generation semantic-search summarization transformers

Updated 11 months ago

lexicalrichness • Rank 15.5 • Science 67%

A module to compute textual lexical richness (aka lexical diversity).

data-mining data-science information-retrieval lexical-analysis lexical-analyzer linguistic-analysis natural-language natural-language-processing nlp python

Updated 11 months ago

drug-named-entity-recognition • Rank 15.1 • Science 67%

drug-discovery drugs named-entity-recognition natural-language-processing natural-language-understanding ner nlp pharma pharmaceutical pharmaceuticals

Updated 11 months ago

deepke • Rank 18.1 • Science 64%

[EMNLP 2022] An Open Toolkit for Knowledge Graph Extraction and Construction

attribute-extraction chinese deep-learning deepke document-level few-shot information-extraction instructie kg knowledge-graph knowprompt lightner low-resource multi-modal named-entity-recognition ner nlp prompt pytorch relation-extraction

Updated 11 months ago

promptsource • Rank 17.8 • Science 64%

Toolkit for creating, sharing and using natural language prompts.

machine-learning natural-language-processing nlp

Updated 11 months ago

flexrag • Rank 12.8 • Science 67%

FlexRAG: A RAG Framework for Information Retrieval and Generation.

llms nlp rag

Updated 11 months ago

cntext • Rank 12.3 • Science 67%

Text analysis package supporting multiple methods, including word count, readability, document similarity, sentiment analysis, Word2Vec/GloVe, and Large Language Models (LLMs).

chinese content-analysis discourse-analysis glove llm nlp semantic-analysis sentiment-analysis social-science text-analysis text-mining word2vec

Updated 11 months ago

laonlp • Rank 12.3 • Science 67%

Lao language NLP

hacktoberfest lao lao-language natural-language-processing nlp nlp-library python

Updated 11 months ago

tasknet • Rank 12.0 • Science 67%

Easy modernBERT fine-tuning and multi-task learning

autotask autotrain bert dataset easy extreme-multi-task fine-tuning huggingface-transformers jiant-alternative modernbert mtl multi-task multi-task-trainer multitask nlp task-embeddings tasks templates trainer

Updated 11 months ago

adapters • Rank 24.9 • Science 54%

A Unified Library for Parameter-Efficient and Modular Transfer Learning

adapters bert lora natural-language-processing nlp parameter-efficient-learning parameter-efficient-tuning pytorch transformers

Updated 11 months ago

hanlp • Rank 24.3 • Science 54%

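Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing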

dependency-parser hanlp named-entity-recognition natural-language-processing nlp pos-tagging semantic-parsing text-classification

Updated 11 months ago

seb • Rank 11.3 • Science 67%

A Scandinavian Benchmark for sentence embeddings

benchmark low-resource-nlp natural-language-processing nlp scandinavian

Updated 11 months ago

mlconjug3 • Rank 14.1 • Science 64%

A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and Romanian (more soon) using Machine Learning techniques.

conjugation conjugator devops linguistics machine-learning nlp nlp-library nlp-machine-learning python3 test-driven-development

Updated 11 months ago

calamancy • Rank 11.1 • Science 67%

NLP pipelines for Tagalog using spaCy

computational-linguistics low-resource-languages low-resource-nlp machine-learning natural-language-processing ner nlp spacy

Updated 11 months ago

learn_prompting • Rank 13.3 • Science 64%

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our discord for the largest Prompt Engineering learning community

chatgpt chatgpt-api deep-learning gpt-3 gpt-4 gpt-4-api gpt3 large-language-models llm machine-learning nlp openai-api prompt-engineering prompt-toolkit prompt-tuning prompting transformers

Updated 10 months ago

py-torchtext • Rank 30.7 • Science 46%

Models, data loaders and abstractions for language processing, powered by PyTorch

data-loader dataset deep-learning models nlp pytorch

Updated 11 months ago

tasksource • Rank 11.4 • Science 64%

Dataset collection and preprocessing framework for NLP extreme multitask learning

benchmark bigbench crossfit curated-datasets dataset-collection discriminative extreme-mtl extreme-multi-task-learning glue huggingface instruction-tuning meta-learning multi-task-learning multi-task-learning-scaling natural-language-inference nlp preprocessings reward-modeling scaling text-classification

Updated 11 months ago

finmeter • Rank 7.8 • Science 67%

Tools for assessing Finnish poetry: rhymes, meter, hyphenation of Finnish and so on.

aesthetic-evaluation computational-creativity finnish hyphenation kalevala-meter metaphors nlp poetry semantics sentiment syllables

Updated 11 months ago

vulntrain • Rank 7.6 • Science 67%

A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.

dataset llm nlp text-generation vulnerability vulnerability-lookup

Updated 11 months ago

harmony • Rank 7.2 • Science 67%

The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.

ai data-harmonization data-science depression embedding embeddings foss harmonisation harmonization harmony help-wanted mental-health natural-language-processing nlp open-source psychology python research research-project social-sciences

Updated 11 months ago

autono • Rank 7.2 • Science 67%

A ReAct-Based Highly Robust Autonomous Agent Framework.

agent agi ai aiagent autogen autonomous-agents framework langchain llm-framework mcp mcp-agent mcp-agent-framework mcp-client multiagent nlp openai python tool-learning tool-oriented-learning transformer

Updated 11 months ago

pymusas • Rank 12.1 • Science 62%

Python Multilingual Ucrel Semantic Analysis System

natural-language-processing nlp python spacy spacy-pipeline

Updated 11 months ago

torchdistill • Rank 15.0 • Science 59%

A coding-free framework built on PyTorch for reproducible deep learning studies. PyTorch Ecosystem. 🏆26 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs, and configurations are available to ensure reproducibility and support benchmarking.

amazon-sagemaker-lab cifar10 cifar100 coco colab-notebook glue google-colab image-classification imagenet knowledge-distillation natural-language-processing nlp object-detection pascal-voc pytorch pytorch-ecosystem semantic-segmentation text-classification transformer

Updated 11 months ago

simplemma • Rank 16.8 • Science 57%

Simple multilingual lemmatizer for Python, especially useful for speed and efficiency

corpus-tools language-detection language-identification lemmatiser lemmatization lemmatizer low-resource-nlp morphological-analysis nlp tokenization tokenizer wordlist

Updated 11 months ago

spacy • Rank 37.7 • Science 36%

💫 Industrial-strength Natural Language Processing (NLP) in Python

ai artificial-intelligence cython data-science deep-learning entity-linking machine-learning named-entity-recognition natural-language-processing neural-network neural-networks nlp nlp-library python spacy text-classification tokenization

Updated 11 months ago

transformers-interpret • Rank 19.3 • Science 54%

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

captum computer-vision deep-learning explainable-ai interpretability machine-learning model-explainability natural-language-processing neural-network nlp transformers transformers-model

Updated 11 months ago

bio-epidemiology-ner • Rank 5.9 • Science 67%

Recognize bio-medical entities from a text corpus

biomedical epidemiology ner nlp transformers

Updated 11 months ago

text2vec • Rank 17.9 • Science 54%

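text2vec, text to vector. A text vector representation tool that converts text into vector matrices; it implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.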

embeddings nlp sentence-embeddings similarity text-similarity text2vec word2vec

Updated 11 months ago

tango • Rank 17.4 • Science 54%

Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.

ai machine-learning nlp python python3 pytorch

Updated 11 months ago

huspacy • Rank 7.3 • Science 64%

HuSpaCy: industrial-strength Hungarian natural language processing

dependency-parsing hungarian hunlp huspacy information-extraction lemmatization machine-learning morphological-analysis named-entity-recognition natural-language-processing ner nlp pos-tagger python spacy spacy-models spacy-pipeline text-mining universal-dependencies

Updated 11 months ago

span-marker • Rank 17.2 • Science 54%

SpanMarker for Named Entity Recognition

huggingface ner nlp spacy spacy-extension transformers

Updated 11 months ago

deepsearch-toolkit • Rank 16.5 • Science 54%

Interact with the Deep Search platform for new knowledge explorations and discoveries

accelerated-discovery deepsearch knowledge-extraction knowledge-graph nlp pdf-converter python rag semantic-retrieval

Updated 11 months ago

emnlp23-paraphrase-types • Rank 2.6 • Science 67%

The official implementation of the EMNLP 2023 paper "Paraphrase Types for Generation and Detection"

chatgpt llama nlp paraphrase types

Updated 10 months ago

english-text-normalization • Rank 1.8 • Science 67%

Command-line interface (CLI) and library to normalize English texts.

nlp preprocessing text-normalization tts

Updated 11 months ago

fastrag • Rank 14.7 • Science 54%

Efficient Retrieval Augmentation and Generation Framework

benchmark colbert diffusion generative-ai information-retrieval knowledge-graph llm multi-modal nlp question-answering semantic-search sentence-transformers summarization transformers

Updated 10 months ago

scattertext • Rank 19.4 • Science 49%

Beautiful visualizations of how language differs among document types.

computational-social-science d3 eda exploratory-data-analysis japanese-language machine-learning natural-language-processing nlp scatter-plot semiotic-squares sentiment stylometric stylometry text-as-data text-mining text-visualization topic-modeling visualization word-embeddings word2vec

Updated 11 months ago

tokenizers • Rank 14.0 • Science 54%

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Updated 11 months ago

noisy-sentences-dataset • Rank 0.7 • Science 67%

550K sentences in 5 European languages augmented with noise for training and evaluating spell correction tools or machine learning models.

dataset natural-language-processing nlp

Updated 11 months ago

subtitle-word-frequencies • Rank 0.7 • Science 67%

Analyse word frequencies from webVTT subtitles

nlp word-frequency

Updated 10 months ago

Egret • Rank 18.2 • Science 49%

Tools for building power systems optimization problems

energy-system milp minlp nlp optimization power powerflow python snl-applications snl-science-libs

Updated 11 months ago

negativas • Rank 0.0 • Science 67%

negativas, a tool to assist in searching for and classifying sentential negations in Brazilian Portuguese.

linguistics nlp spacy

Updated 9 months ago

https://github.com/google-research/retvec • Rank 12.3 • Science 54%

RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.

deep-learning natural-language-processing nlp python tensorflow text-classification

Updated 11 months ago

classy-classification • Rank 12.3 • Science 54%

This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Huggingface.

few-shot-classifcation hacktoberfest machine-learning natural-language-processing nlp nlu sentence-transformers spacy text-classification

Updated 11 months ago

pytextclassifier • Rank 12.2 • Science 54%

pytextclassifier is a toolkit for text classification. It implements LR, XGBoost, TextCNN, FastText, TextRNN, BERT, and other classification models, ready to use out of the box.

bert classification focalloss-pytorch hierarchical machine-learning nlp pytextclassifier python pytorch softmax text-classification text-classifier

Updated 11 months ago

thinc • Rank 12.2 • Science 54%

🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

ai artificial-intelligence deep-learning functional-programming jax machine-learning machine-learning-library mxnet natural-language-processing nlp python pytorch spacy tensorflow type-checking

Updated 11 months ago

summertime • Rank 11.8 • Science 54%

An open-source text summarization toolkit for non-experts. EMNLP'2021 Demo

deep-learning neural-networks nlp python pytorch text-summarization

Updated 11 months ago

@stdlib/nlp • Rank 21.4 • Science 44%

Standard library natural language processing.

javascript language lib library linguistics modeling natural nlp node node-js nodejs standard stdlib

Updated 11 months ago

banks • Rank 21.3 • Science 44%

LLM prompt language based on Jinja. Banks provides tools and functions to build prompt text and chat messages from generic blueprints. It allows attaching metadata to prompts to ease their management, and versioning is a first-class citizen. Banks provides ways to store prompts on disk along with their metadata.

chatgpt llm nlp openai prompt-engineering prompt-toolkit

Updated 11 months ago

detoxify • Rank 21.0 • Science 44%

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

bert bert-model hate-speech hate-speech-detection hatespeech huggingface huggingface-transformers kaggle-competition nlp pytorch-lightning sentence-classification toxic-comment-classification toxic-comments toxicity toxicity-classification

Updated 11 months ago

chinese-llama-alpaca-2 • Rank 11.0 • Science 54%

Chinese LLaMA-2 & Alpaca-2 LLMs (phase two of the project), with 64K long-context models

64k alpaca alpaca-2 alpaca2 flash-attention large-language-models llama llama-2 llama2 llm nlp rlhf yarn

Updated 11 months ago

frame-semantic-transformer • Rank 10.9 • Science 54%

Frame Semantic Parser based on T5 and FrameNet

framenet huggingface nlp semantic-parsing t5 transformers

Updated 11 months ago

@stdlib/nlp-porter-stemmer • Rank 7.5 • Science 57%

Extract the stem of a given word.

javascript nlp node node-js nodejs stdlib stem stemming util utilities utility utils word

Updated 11 months ago

transformer-srl • Rank 10.4 • Science 54%

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state-of-the-art for English SRL. This model also implements predicate disambiguation.

allennlp bert conll2012 dataset labeling natural-language-processing nlp propbank pytorch role semantic semantic-role-labeling shi span srl srl-annotations srltagger transformer transformers verbatlas

Updated 11 months ago

dkpro-cassis • Rank 7.0 • Science 57%

UIMA CAS processing library written in Python

annotation cas nlp python uima

Updated 10 months ago

https://github.com/astrazeneca/kazu • Rank 14.7 • Science 49%

Fast, world class biomedical NER

biomedical-text-mining natural-language-processing nlp

Updated 11 months ago

polydedupe • Rank 6.4 • Science 57%

PolyDeDupe: Multi-Lingual Data Deduplication

data-deduplication multilingual nlp

Updated 11 months ago

gismo • Rank 9.2 • Science 54%

GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

nlp package python research

Updated 10 months ago

textblob • Rank 27.2 • Science 36%

Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

natural-language-processing nlp nltk pattern python python-3

Updated 11 months ago

question_generation • Rank 8.7 • Science 54%

Neural question generation using transformers

deep-learning natural-language-generation natural-language-processing nlg nlp question-generation t5 transformer

Updated 11 months ago

@stdlib/datasets-liu-positive-opinion-words-en • Rank 5.4 • Science 57%

A list of positive opinion words.

data dataset datasets emotion emotive javascript language lexicon list nlp node node-js nodejs opinion positive sample sentiment stdlib subjectivity words

Updated 11 months ago

wn • Rank 17.8 • Science 44%

A modern, interlingual wordnet interface for Python

dictionary lexicon library nlp python-library wordnet wordnets

Updated 11 months ago

langfun • Rank 17.8 • Science 44%

OO for LLMs

framework llms nlp

Updated 11 months ago

datasets-liu-negative-opinion-words-en • Rank 4.7 • Science 57%

A list of negative opinion words.

data dataset datasets emotion emotive javascript language lexicon list negative nlp node node-js nodejs opinion sample sentiment stdlib subjectivity words

Updated 10 months ago

text • Rank 15.7 • Science 46%

Using Transformers from HuggingFace in R

deep-learning machine-learning nlp transformers

Updated 11 months ago

chinese-mixtral • Rank 7.1 • Science 54%

Chinese Mixtral mixture-of-experts (MoE) LLMs

32k 64k large-language-models llm mixtral mixture-of-experts moe nlp

Updated 10 months ago

https://github.com/datamade/usaddress • Rank 24.9 • Science 36%

A Python library for parsing unstructured United States address strings into address components

address address-parser conditional-random-fields crf machine-learning natural-language-processing nlp parserator python python-library

Updated 11 months ago

turkish-question-generation • Rank 3.9 • Science 57%

Automated question generation and question answering from Turkish texts using text-to-text transformers

arxiv mt5 multilingual neptune-ai nlp question-answering question-generation t5 transformers turkish wandb xquad