Projects | Open Source Science

Scientific Software

Updated 10 months ago

htmldate — Peer-reviewed • Rank 23.6 • Science 95%

htmldate: A Python package to extract publication dates from web pages - Published in JOSS (2020)

date date-parser datetime digital-forensics entity-extraction forensics-tools information-extraction metadata metadata-extraction natural-language-processing nlp opengraph web-scraping webscraping

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

tidytext — Peer-reviewed • Rank 22.7 • Science 95%

tidytext: Text Mining and Analysis Using Tidy Data Principles in R - Published in JOSS (2016)

natural-language-processing r text-mining tidy-data tidyverse

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

Augmenty — Peer-reviewed • Rank 15.9 • Science 98%

Augmenty: A Python Library for Structured Text Augmentation - Published in JOSS (2024)

augmentation natural-language-processing nlp nlproc python spacy spacy-extension spacy-nlp text-augmentation text-classification training-data

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

Talisman — Peer-reviewed • Rank 20.9 • Science 93%

Talisman: a JavaScript archive of fuzzy matching, information retrieval and record linkage building blocks - Published in JOSS (2020)

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

Jury — Peer-reviewed • Rank 14.8 • Science 93%

Jury: A Comprehensive Evaluation Toolkit - Published in JOSS (2024)

datasets evaluate evaluation huggingface machine-learning metrics natural-language-processing nlp nlp-evaluation python pytorch transformers

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

TRUNAJOD — Peer-reviewed • Rank 12.2 • Science 95%

TRUNAJOD: A text complexity library to enhance natural language processing - Published in JOSS (2021)

coherence cohesion entity-graph lexical-diversity natural-language-processing readability-metrics semantic-measurements spacy spacy-extensions text-analysis text-mining text-processing ttr type-token-ratio

Engineering

Scientific Software · Peer-reviewed

Scientific Software

Updated 10 months ago

pygamma-agreement — Peer-reviewed • Rank 11.7 • Science 93%

pygamma-agreement: Gamma γ measure for inter/intra-annotator agreement in Python - Published in JOSS (2021)

agreement annotation-tool annotations gamma-agreement natural-language-processing speech

Mathematics

Scientific Software · Peer-reviewed

Updated 8 months ago

Shekar: A Python Toolkit for Persian Natural Language Processing • Rank 11.4 • Science 93%

Shekar: A Python Toolkit for Persian Natural Language Processing - Published in JOSS (2025)

embeddings keyword-extraction lemmatization morphology named-entity-recognition natural-language-processing ner nlp nlp-library normalization part-of-speech-tagging persian persian-nlp pos spell-checker text-processing wordcloud

Updated 10 months ago

transformers • Rank 38.7 • Science 64%

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio deep-learning deepseek gemma glm hacktoberfest llm machine-learning model-hub natural-language-processing nlp pretrained-models python pytorch pytorch-transformers qwen speech-recognition transformer vlm

Updated 10 months ago

datasets • Rank 34.4 • Science 64%

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

ai artificial-intelligence computer-vision dataset-hub datasets deep-learning llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

Updated 10 months ago

flaml • Rank 29.9 • Science 64%

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

automated-machine-learning automl classification data-science deep-learning finetuning hyperparam hyperparameter-optimization jupyter-notebook machine-learning natural-language-generation natural-language-processing python random-forest regression scikit-learn tabular-data timeseries-forecasting tuning

Updated 10 months ago

gensim • Rank 15.8 • Science 77%

Topic Modelling for Humans

data-mining data-science document-similarity fasttext gensim information-retrieval machine-learning natural-language-processing neural-network nlp python topic-modeling word-embeddings word-similarity word2vec

Mathematics (38%)

Updated 10 months ago

pythainlp • Rank 24.6 • Science 67%

Thai natural language processing in Python

computational-linguistics hacktoberfest natural-language-processing nlp-library python soundex text-processing thai thai-language thai-nlp thai-nlp-library thai-soundex word-segmentation

Updated 10 months ago

nltk • Rank 36.5 • Science 54%

NLTK Source

machine-learning natural-language-processing nlp nltk python

Updated 10 months ago

openprompt • Rank 18.3 • Science 72%

An Open-Source Framework for Prompt-Learning.

ai deep-learning natural-language-processing natural-language-understanding nlp nlp-library nlp-machine-learning pre-trained-language-models pre-trained-model prompt prompt-based-tuning prompt-learning prompt-toolkit prompts pytorch transformer

Updated 10 months ago

pytextrank • Rank 22.2 • Science 67%

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

graph-algorithms machine-learning natural-language natural-language-processing nlp python spacy spacy-extension summarization textgraphs textrank

Updated 4 months ago

KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP • Rank 1.4 • Science 87%

KeemenaPreprocessing.jl: Unicode-Robust Cleaning, Multi-Level Tokenisation & Streaming Offset Bundling for Julia NLP - Published in JOSS (2026)

julia natural-language-processing nlp text-encoding textprocessing tokenization

Updated 10 months ago

flair • Rank 27.1 • Science 59%

A very simple framework for state-of-the-art Natural Language Processing (NLP)

machine-learning named-entity-recognition natural-language-processing nlp pytorch semantic-role-labeling sequence-labeling word-embeddings

Updated 10 months ago

contextualspellcheck • Rank 16.8 • Science 67%

✔️ Contextual word checker for better suggestions (not actively maintained)

bert chatbot help-wanted natural-language-processing nlp oov preprocessing python python-spelling-corrector spacy spacy-extension spellcheck spellchecker spelling-correction spelling-corrections

Updated 10 months ago

nlpo3 • Rank 15.7 • Science 67%

Thai natural language processing library in Rust, with Python and Node bindings.

hacktoberfest natural-language-processing nodejs python rust text-processing thai-language tokenizer

Updated 10 months ago

lexicalrichness • Rank 15.5 • Science 67%

😸 💬 A module to compute textual lexical richness (aka lexical diversity).

data-mining data-science information-retrieval lexical-analysis lexical-analyzer linguistic-analysis natural-language natural-language-processing nlp python

Updated 10 months ago

drug-named-entity-recognition • Rank 15.1 • Science 67%

drug-discovery drugs named-entity-recognition natural-language-processing natural-language-understanding ner nlp pharma pharmaceutical pharmaceuticals

Updated 10 months ago

promptsource • Rank 17.8 • Science 64%

Toolkit for creating, sharing and using natural language prompts.

machine-learning natural-language-processing nlp

Updated 10 months ago

inseq • Rank 13.6 • Science 67%

Interpretability for sequence generation models 🐛 🔍

attribution-methods captum deep-learning explainable-ai generative-ai huggingface interpretability language-generation language-model large-language-models natural-language-processing sequence-to-sequence transformers

Updated 10 months ago

laonlp • Rank 12.3 • Science 67%

Lao language NLP

hacktoberfest lao lao-language natural-language-processing nlp nlp-library python

Updated 10 months ago

adapters • Rank 24.9 • Science 54%

A Unified Library for Parameter-Efficient and Modular Transfer Learning

adapters bert lora natural-language-processing nlp parameter-efficient-learning parameter-efficient-tuning pytorch transformers

Updated 10 months ago

imodelsx • Rank 14.5 • Science 64%

Interpret text data using LLMs (scikit-learn compatible).

ai deep-learning explainability huggingface interpretability language-model machine-learning ml natural-language-processing natural-language-understanding neural-network pytorch scikit-learn text text-classification transformer-models xai

Updated 10 months ago

hanlp • Rank 24.3 • Science 54%

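Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion: natural language processing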

dependency-parser hanlp named-entity-recognition natural-language-processing nlp pos-tagging semantic-parsing text-classification

Updated 10 months ago

seb • Rank 11.3 • Science 67%

A Scandinavian Benchmark for sentence embeddings

benchmark low-resource-nlp natural-language-processing nlp scandinavian

Updated 10 months ago

calamancy • Rank 11.1 • Science 67%

NLP pipelines for Tagalog using spaCy

computational-linguistics low-resource-languages low-resource-nlp machine-learning natural-language-processing ner nlp spacy

Updated 10 months ago

catalyst • Rank 23.8 • Science 54%

Accelerated deep learning R&D

computer-vision deep-learning distributed-computing image-classification image-processing image-segmentation information-retrieval infrastructure machine-learning metric-learning natural-language-processing object-detection python pytorch recommender-system reinforcement-learning reproducibility research text-classification text-segmentation

Updated 10 months ago

asreview • Rank 19.2 • Science 57%

Active learning for systematic reviews

active-learning asreview deep-learning language-model learning-algorithms literature llm natural-language-processing neural-network research systematic-literature-reviews systematic-reviews utrecht-university

Updated 10 months ago

https://github.com/autogluon/autogluon • Rank 29.8 • Science 46%

Fast and Accurate ML in 3 Lines of Code

autogluon automated-machine-learning automl computer-vision data-science deep-learning ensemble-learning forecasting gluon hyperparameter-optimization machine-learning natural-language-processing object-detection python pytorch scikit-learn structured-data tabular-data time-series transfer-learning

Updated 10 months ago

syntaxmaker • Rank 8.5 • Science 67%

The NLG tool for Finnish

finnish finnish-language inflection morphological-generation natural-language-generation natural-language-processing nlg python-library surface-realization syntax syntax-maker

Updated 10 months ago

harmony • Rank 7.2 • Science 67%

The Harmony Python library: a research tool for psychologists to harmonise data and questionnaire items. Open source.

ai data-harmonization data-science depression embedding embeddings foss harmonisation harmonization harmony help-wanted mental-health natural-language-processing nlp open-source psychology python research research-project social-sciences

Updated 10 months ago

pymusas • Rank 12.1 • Science 62%

Python Multilingual Ucrel Semantic Analysis System

natural-language-processing nlp python spacy spacy-pipeline

Updated 10 months ago

torchdistill • Rank 15.0 • Science 59%

A coding-free framework built on PyTorch for reproducible deep learning studies. Part of the PyTorch Ecosystem. 🏆 26 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc., are implemented so far. 🎁 Trained models, training logs, and configurations are available to ensure reproducibility and enable benchmarking.

amazon-sagemaker-lab cifar10 cifar100 coco colab-notebook glue google-colab image-classification imagenet knowledge-distillation natural-language-processing nlp object-detection pascal-voc pytorch pytorch-ecosystem semantic-segmentation text-classification transformer

Updated 10 months ago

spacy • Rank 37.7 • Science 36%

💫 Industrial-strength Natural Language Processing (NLP) in Python

ai artificial-intelligence cython data-science deep-learning entity-linking machine-learning named-entity-recognition natural-language-processing neural-network neural-networks nlp nlp-library python spacy text-classification tokenization

Updated 10 months ago

transformers-interpret • Rank 19.3 • Science 54%

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

captum computer-vision deep-learning explainable-ai interpretability machine-learning model-explainability natural-language-processing neural-network nlp transformers transformers-model

Updated 10 months ago

huspacy • Rank 7.3 • Science 64%

HuSpaCy: industrial-strength Hungarian natural language processing

dependency-parsing hungarian hunlp huspacy information-extraction lemmatization machine-learning morphological-analysis named-entity-recognition natural-language-processing ner nlp pos-tagger python spacy spacy-models spacy-pipeline text-mining universal-dependencies

Updated 10 months ago

nlp-progress • Rank 15.8 • Science 54%

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

dialogue machine-learning machine-translation named-entity-recognition natural-language-processing nlp-tasks

Updated 10 months ago

mishkal • Rank 15.3 • Science 54%

Mishkal is an Arabic text vocalization tool.

arabic natural-language-processing python webapp

Updated 10 months ago

scattertext • Rank 19.4 • Science 49%

Beautiful visualizations of how language differs among document types.

computational-social-science d3 eda exploratory-data-analysis japanese-language machine-learning natural-language-processing nlp scatter-plot semiotic-squares sentiment stylometric stylometry text-as-data text-mining text-visualization topic-modeling visualization word-embeddings word2vec

Updated 10 months ago

tokenizers • Rank 14.0 • Science 54%

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Updated 10 months ago

noisy-sentences-dataset • Rank 0.7 • Science 67%

550K sentences in 5 European languages augmented with noise for training and evaluating spell correction tools or machine learning models.

dataset natural-language-processing nlp

Updated 9 months ago

https://github.com/google-research/retvec • Rank 12.3 • Science 54%

RETVec is an efficient, multilingual, and adversarially-robust text vectorizer.

deep-learning natural-language-processing nlp python tensorflow text-classification

Updated 10 months ago

classy-classification • Rank 12.3 • Science 54%

This repository contains an easy and intuitive approach to few-shot classification using sentence-transformers or spaCy models, or zero-shot classification with Hugging Face.

few-shot-classifcation hacktoberfest machine-learning natural-language-processing nlp nlu sentence-transformers spacy text-classification

Updated 10 months ago

thinc • Rank 12.2 • Science 54%

🔮 A refreshing functional take on deep learning, compatible with your favorite libraries

ai artificial-intelligence deep-learning functional-programming jax machine-learning machine-learning-library mxnet natural-language-processing nlp python pytorch spacy tensorflow type-checking

Updated 10 months ago

transformer-srl • Rank 10.4 • Science 54%

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. The model also implements predicate disambiguation.

allennlp bert conll2012 dataset labeling natural-language-processing nlp propbank pytorch role semantic semantic-role-labeling shi span srl srl-annotations srltagger transformer transformers verbatlas

Updated 10 months ago

ml-visuals • Rank 10.4 • Science 54%

🎨 ML Visuals contains figures and templates which you can reuse and customize to improve your scientific writing.

artificial-intelligence deep-learning design machine-learning natural-language-processing

Updated 10 months ago

https://github.com/astrazeneca/kazu • Rank 14.7 • Science 49%

Fast, world class biomedical NER

biomedical-text-mining natural-language-processing nlp

Updated 10 months ago

textblob • Rank 27.2 • Science 36%

Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

natural-language-processing nlp nltk pattern python python-3

Updated 10 months ago

question_generation • Rank 8.7 • Science 54%

Neural question generation using transformers

deep-learning natural-language-generation natural-language-processing nlg nlp question-generation t5 transformer

Updated 10 months ago

shiba-model • Rank 8.7 • Science 54%

PyTorch implementation and pre-trained Japanese model for CANINE, the efficient character-level transformer.

deep-learning natural-language-processing neural-network

Updated 10 months ago

matchzoo • Rank 15.2 • Science 46%

Facilitating the design, comparison and sharing of deep text matching models.

deep-learning matching natural-language-processing neural-network text text-matching

Updated 10 months ago

https://github.com/datamade/usaddress • Rank 24.9 • Science 36%

:us: a python library for parsing unstructured United States address strings into address components

address address-parser conditional-random-fields crf machine-learning natural-language-processing nlp parserator python python-library

Updated 10 months ago

zensols-mimicsid • Rank 5.4 • Science 54%

MIMIC-III corpus parsing and section prediction with MedSecId (COLING paper)

biomedical clinical docker medical mimic-iii natural-language-processing parsers

Updated 10 months ago

obsei • Rank 15.0 • Science 44%

Obsei is a low-code, AI-powered automation tool. It can be used in various business flows such as social listening, AI-based alerting, brand image analysis, comparative studies, and more.

anonymization artificial-intelligence business-process-automation customer-engagement customer-support issue-tracking-system low-code lowcode natural-language-processing nlp process-automation python sentiment-analysis social-listening social-network-analysis text-analysis text-analytics text-classification workflow workflow-automation

Updated 10 months ago

asent • Rank 14.9 • Science 44%

Asent is a python library for performing efficient and transparent sentiment analysis using spaCy.

interpretability natural-language-processing nlp python3 sentiment-analysis spacy spacy-extensions

Updated 10 months ago

text2sdg • Rank 12.0 • Science 46%

Detect UN Sustainable Development Goals in Text

natural-language-processing sustainability sustainable-development sustainable-development-goals

Updated 10 months ago

stripnet • Rank 8.7 • Science 49%

STriP Net: Semantic Similarity of Scientific Papers (S3P) Network

data-science natural-language-processing network-analysis nlp research scientific-publications semantic-similarity topic-modeling

Updated 10 months ago

dacy • Rank 13.3 • Science 44%

DaCy: The state-of-the-art Danish NLP pipeline using spaCy

danish-language natural-language-processing reproducible-workflows spacy

Updated 10 months ago

forte • Rank 16.2 • Science 41%

Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/

data-processing deep-learning information-retrieval machine-learning natural-language natural-language-processing pipeline python text-data

Updated 10 months ago

https://github.com/makcedward/nlpaug • Rank 24.0 • Science 33%

Data augmentation for NLP

adversarial-attacks adversarial-example ai artificial-intelligence augmentation data-science machine-learning ml natural-language-processing nlp

Updated 10 months ago

conversationalign • Rank 9.6 • Science 46%

An R package for analyzing linguistic alignment between partners in conversation transcripts

communication conversation dyadic-data language natural-language-processing psycholinguistics

Updated 10 months ago

sense2vec • Rank 18.5 • Science 36%

🦆 Contextually-keyed word vectors

gensim gensim-word2vec machine-learning natural-language-processing nlp python sense2vec spacy word2vec

Updated 10 months ago

ua-datasets • Rank 9.3 • Science 44%

A collection of datasets for Ukrainian language

dataset natural-language-processing nlp nlp-datasets question-answering text-classification token-classification ukrainian-language

Updated 10 months ago

odin-slides • Rank 7.8 • Science 44%

This is an advanced Python tool that empowers you to effortlessly draft customizable PowerPoint slides using the Generative Pre-trained Transformer (GPT) of your choice. Leveraging the capabilities of Large Language Models (LLMs), odin-slides enables you to turn the lengthiest Word documents into well-organized presentations.

ai-assistant chatgpt-api generative-ai hacktoberfest large-language-models machine-learning natural-language-processing openai-api portfolio powerpoint powerpoint-automation pptx presentation-tools productivity-tool productivity-tools prompt-engineering rag slide-generator slides writing-tool

Updated 10 months ago

zensols-deepnlp • Rank 7.5 • Science 44%

Deep learning utility library for natural language processing (NLP-OSS paper)

deep-learning deep-neural-networks framework natural-language-processing nlp

Updated 10 months ago

concise-concepts • Rank 7.3 • Science 44%

This repository contains an easy and intuitive approach to few-shot NER using most similar expansion over spaCy embeddings. Now with entity scoring.

few-shot-classifcation gensim hacktoberfest machine-learning natural-language-processing ner nlp spacy

Updated 10 months ago

crosslingual-coreference • Rank 6.4 • Science 44%

A multilingual approach to AllenNLP coreference resolution, along with a wrapper for spaCy.

coreference coreference-resolution hacktoberfest natural-language-processing nlp python spacy

Updated 10 months ago

argilla • Rank 13.1 • Science 36%

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning

Updated 10 months ago

cherche • Rank 7.6 • Science 41%

Neural Search

bm25 flashtext information-retrieval machine-learning natural-language-processing neural-networks neural-search nlp question-answering reader retrieval search searching semantic-search vector-search

Updated 10 months ago

odinrunes • Rank 4.5 • Science 44%

Odin Runes, a Java-based GPT client, facilitates interaction with your preferred GPT model right through your favorite text editor. There is more: it also facilitates prompt engineering by extracting context from diverse sources using technologies such as OCR, enhancing overall productivity and saving costs.

ai-assistant chatbot-ui chatgpt-app code-assistant custom-gpt gpt-client hacktoberfest llama3 natural-language-processing ollama ollama-client ollama-gui ollama-interface ollama-ui productivity productivity-tool prompt-engineering prompt-toolkit rag writing-tool

Updated 10 months ago

natural-language-processing • Rank 4.0 • Science 44%

Fundamentals of natural language processing with Python

carpentries-incubator english lesson natural-language-processing nlp pre-alpha python

Updated 10 months ago

odin-tabs • Rank 3.0 • Science 44%

The Odin Tabs extension is a browser extension that allows you to navigate through your browser tabs using speech recognition and the Large Language Model (LLM) of your choice.

accessibility artificial-intelligence assistive-technology chatgpt-api chrome-extension interaction-design large-language-models machine-learning natural-language-processing openai-api portfolio productivity-tools rag speech-to-text tab-management tab-navigation ui user-interface web-accessibility web-automation

Updated 10 months ago

knowurenvironment • Rank 2.4 • Science 44%

Official release of KnowUREnvironment, a knowledge graph on climate change and related environmental issues. Paper link: https://www.climatechange.ai/papers/aaaifss2022/3

climate-change knowledge-graph natural-language-processing

Updated 10 months ago

zensols-nlparse • Rank 1.6 • Science 44%

Natural language processing parsing and tool library

natural-language-processing nlp-machine-learning pypi-badge pypi-link spacy spacy-nlp

Updated 10 months ago

https://github.com/cedrickchee/awesome-transformer-nlp • Rank 9.3 • Science 36%

A curated list of NLP resources focused on Transformer networks, attention mechanism, GPT, BERT, ChatGPT, LLMs, and transfer learning.

attention-mechanism awesome awesome-list bert chatgpt gpt-2 gpt-3 gpt-4 language-model llama natural-language-processing neural-networks nlp pre-trained-language-models transfer-learning transformer xlnet

Updated 10 months ago

https://github.com/percevalw/nlstruct • Rank 9.3 • Science 36%

Natural language structuring library

deep-learning machine-learning natural-language-processing notebook python structured-data

Updated 10 months ago

pyonmttok • Rank 18.5 • Science 26%

Fast and customizable text tokenization library with BPE and SentencePiece support

bpe cpp icu machine-translation natural-language-processing python sentencepiece tokenization tokenizer unicode

Updated 10 months ago

https://github.com/bluebrain/search • Rank 11.4 • Science 33%

Blue Brain text mining toolbox for semantic search and structured information extraction

deep-learning machine-learning natural-language-processing nlp python text-mining

Updated 10 months ago

https://github.com/brucewlee/lingfeat • Rank 4.9 • Science 39%

[EMNLP 2021] LingFeat - A Comprehensive Linguistic Features Extraction ToolKit for Readability Assessment

discourse feature-extraction flesch-kincaid lexical-analysis linguistic-analysis natural-language-processing nlp readability-metrics readability-scores semantic-analysis spacy syntactic-analysis text-classification text-simplification

Updated 10 months ago

https://github.com/alexeyev/awesome-kyrgyz-nlp • Rank 4.1 • Science 36%

Kyrgyz language processing software, models and datasets.

awesome-list corpus kyrgyz morphology natural-language-processing turkic turkic-languages

Updated 10 months ago

https://github.com/alexeyev/awesome-azerbaijani-nlp • Rank 4.1 • Science 36%

Azerbaijani language processing software, models and datasets.

awesome-list azeri morphology natural-language-processing stemming turkic

Updated 10 months ago

https://github.com/cran-task-views/naturallanguageprocessing • Rank 4.0 • Science 36%

CRAN Task View: Natural Language Processing

cran natural-language-processing r rstats task-views text-mining

Updated 9 months ago

https://github.com/dcavar/dcavar.github.io • Rank 3.2 • Science 36%

Cavar's homepage

artificial-intelligence computational-linguistics machine-learning natural-language-processing

Updated 10 months ago

spacy-wrap • Rank 12.4 • Science 26%

spaCy-wrap is a wrapper library for spaCy that lets you include existing fine-tuned transformers from Hugging Face in your spaCy pipeline and workflow.

deep-learning huggingface huggingface-transformers language-model machine-learning natural-language-processing nlp pytorch spacy spacy-extension spacy-extensions spacy-models spacy-nlp spacy-pipeline spacy-transformers text-classification transformers

Updated 10 months ago

word2vec • Rank 15.1 • Science 23%

Distributed Representations of Words using word2vec

embeddings natural-language-processing r-package word2vec

Updated 10 months ago

ai • Rank 10.9 • Science 26%

AI: an artificial intelligence toolkit covering machine learning, deep learning, and natural language processing

ai deep-learning dl machine-learning ml natural-language-processing nlp python

Updated 10 months ago

https://github.com/dair-ai/ml-course-notes • Rank 9.8 • Science 26%

🎓 Sharing machine learning course / lecture notes.

ai data-science deep-learning machine-learning natural-language-processing

Updated 10 months ago

https://github.com/bramvanroy/spacy_conll • Rank 12.6 • Science 23%

Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Doc and its sentences and tokens. Can also be used as a command-line tool.

conll conll-u data-science machine-learning natural-language-processing nlp pandas parser python spacy spacy-extension spacy-pipeline stanford-machine-learning stanford-nlp stanza udpipe

Updated 10 months ago

citation-report • Rank 4.5 • Science 31%

Parse legal citations in the publisher formats (i.e., SCRA, PHIL, OFFG) that refer to Philippine Supreme Court decisions.

legal legal-analytics legal-entity-identifier natural-language-processing regex

Updated 10 months ago

https://github.com/agamiko/100-days-of-code • Rank 2.5 • Science 33%

My 100-day coding journey to improve my Machine Learning, Deep Learning, and Data Science skills

acoustics computer-vision data-science deep-learning image-processing machine-learning natural-language-processing neural-networks

Updated 10 months ago

ucto • Rank 9.3 • Science 26%

Unicode tokeniser. Ucto tokenizes text files: it separates words from punctuation and splits sentences. It offers several other basic preprocessing steps, such as changing case, that you can use to make your text suitable for further processing such as indexing, part-of-speech tagging, or machine translation. Ucto comes with tokenisation rules for several languages and can be easily extended to suit other languages. It has been incorporated for tokenizing Dutch text in Frog, our Dutch morpho-syntactic processor. http://ilk.uvt.nl/ucto

computational-linguistics folia language natural-language-processing nlp punctuation tokeniser

Updated 10 months ago

stringx • Rank 9.2 • Science 26%

Drop-in replacements for base R string functions powered by stringi

icu icu4c natural-language-processing nlp r regex regexp string-manipulation stringi text text-processing unicode

Updated 10 months ago

bnbphoneticparser • Rank 8.7 • Science 26%

Bengali Phonetic Parser

banglish bengali bengali-phonetic natural-language-processing nlp python

Updated 10 months ago

https://github.com/asyml/texar-pytorch • Rank 14.5 • Science 20%

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Updated 10 months ago

mosaico • Rank 3.3 • Science 31%

A multilingual open-text semantically annotated interlinked corpus

artificial-intelligence natural-language-processing natural-language-understanding relation-extraction semantic-parsing semantic-role-labeling word-sense-disambiguation

Updated 10 months ago

https://github.com/brucewlee/lftk • Rank 11.0 • Science 23%

[BEA @ ACL 2023] General-purpose tool for linguistic features extraction; Tested on readability assessment, essay scoring, fake news detection, hate speech detection, etc.

bea-workshop feature-extraction handcrafted-features linguistic-features natural-language-processing python readability-scores reading-time spacy text-analysis word-difficulty