Projects | Open Source Science

Updated 6 months ago

contextualspellcheck • Rank 16.8 • Science 67%

✔️Contextual word checker for better suggestions (not actively maintained)

bert chatbot help-wanted natural-language-processing nlp oov preprocessing python python-spelling-corrector spacy spacy-extension spellcheck spellchecker spelling-correction spelling-corrections

Updated 6 months ago

uform • Rank 15.6 • Science 64%

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

bert clip clustering contrastive-learning cross-attention huggingface-transformers image-search language-vision llava multi-lingual multimodal neural-network openai openclip pretrained-models pytorch representation-learning semantic-search transformer vector-search

Updated 6 months ago

tasknet • Rank 12.0 • Science 67%

Easy ModernBERT fine-tuning and multi-task learning

autotask autotrain bert dataset easy extreme-multi-task fine-tuning huggingface-transformers jiant-alternative modernbert mtl multi-task multi-task-trainer multitask nlp task-embeddings tasks templates trainer

Updated 6 months ago

adapters • Rank 24.9 • Science 54%

A Unified Library for Parameter-Efficient and Modular Transfer Learning

adapters bert lora natural-language-processing nlp parameter-efficient-learning parameter-efficient-tuning pytorch transformers

Updated 6 months ago

textgen • Rank 12.8 • Science 64%

TextGen: Implementation of text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, SongNet, and UDA. Supports training and prediction for these models, ready to use out of the box.

bart bert chatglm chatgpt gpt2 llama seq2seq t5 text-generation textgen xlnet

Updated 6 months ago

code-bert-score • Rank 15.6 • Science 54%

CodeBERTScore: an automatic metric for code generation, based on BERTScore

bert bertscore code code-bert-score code-bertscore codebert codebertscore score

Updated 6 months ago

tokenizers • Rank 14.0 • Science 54%

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Updated 6 months ago

pytextclassifier • Rank 12.2 • Science 54%

pytextclassifier is a toolkit for text classification. It implements classification models including LR, XGBoost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.

bert classification focalloss-pytorch hierarchical machine-learning nlp pytextclassifier python pytorch softmax text-classification text-classifier

Updated 6 months ago

detoxify • Rank 21.0 • Science 44%

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ PyTorch Lightning and 🤗 Transformers. For access to our API, please email us at contact@unitary.ai.

bert bert-model hate-speech hate-speech-detection hatespeech huggingface huggingface-transformers kaggle-competition nlp pytorch-lightning sentence-classification toxic-comment-classification toxic-comments toxicity toxicity-classification

Updated 6 months ago

transformer-srl • Rank 10.4 • Science 54%

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.

allennlp bert conll2012 dataset labeling natural-language-processing nlp propbank pytorch role semantic semantic-role-labeling shi span srl srl-annotations srltagger transformer transformers verbatlas

Updated 6 months ago

bangla-bert • Rank 4.4 • Science 54%

Bangla-Bert is a pretrained BERT model for the Bengali language.

bangla bangla-nlp bert lm nlp transformers

Updated 6 months ago

transformers-tutorials • Rank 11.4 • Science 46%

This repository contains demos I made with the Transformers library by HuggingFace.

bert gpt-2 layoutlm pytorch transformers vision-transformer

Updated 6 months ago

nerpy • Rank 9.0 • Science 44%

🌈 NERpy: Implementation of Named Entity Recognition using Python. A named entity recognition toolkit that supports models such as BertSoftmax and BertSpan, ready to use out of the box.

bert bert-softmax bert-span named-entity-recognition ner nlp pytorch transformers

Updated 6 months ago

hugsvision • Rank 6.1 • Science 46%

HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision.

bert computer-vision deep-learning deit detr huggingface image-classification image-generation machine-learning object-detection pretrained-models pythorch pytorch pytorch-transformers semantic-segmentation state-of-the-art torchvision transformers vit yolo

Updated 5 months ago

https://github.com/cedrickchee/awesome-transformer-nlp • Rank 9.3 • Science 36%

A curated list of NLP resources focused on Transformer networks, attention mechanism, GPT, BERT, ChatGPT, LLMs, and transfer learning.

attention-mechanism awesome awesome-list bert chatgpt gpt-2 gpt-3 gpt-4 language-model llama natural-language-processing neural-networks nlp pre-trained-language-models transfer-learning transformer xlnet

Updated 6 months ago

efficient-task-transfer • Rank 3.6 • Science 41%

Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

adapters bert nlp roberta transfer-learning transformers

Updated 6 months ago

quickai • Rank 10.5 • Science 26%

QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.

ai artificial-intelligence bert deep-learning dl easy-to-use fast gpt gpt-neo huggingface-transformers ml neural-network nlp object-detection python pytorch quickai research tensorflow2 yolo

Updated 5 months ago

https://github.com/asyml/texar-pytorch • Rank 14.5 • Science 20%

Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

bert casl-project data-processing deep-learning dialog-systems gpt-2 machine-learning machine-translation natural-language-processing python pytorch roberta texar texar-pytorch text-data text-generation xlnet

Updated 5 months ago

https://github.com/cvi-szu/linly • Rank 9.4 • Science 23%

Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets.

bert chatbot chatgpt chinese chinese-nlp gpt-3 language-model llama nlp zero-shot-learning

Updated 5 months ago

https://github.com/beomi/transformers-language-modeling • Rank 3.1 • Science 26%

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

bert deepspeed language-model transformers

Updated 4 months ago

https://github.com/deepset-ai/farm • Rank 19.1 • Science 10%

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

bert deep-learning germanbert language-models ner nlp nlp-framework nlp-library pretrained-models pytorch question-answering roberta transfer-learning xlnet-pytorch

Updated 5 months ago

https://github.com/bytedance/lightseq • Rank 18.7 • Science 10%

LightSeq: A High Performance Library for Sequence Processing and Generation

accelerate bart beam-search bert cuda diverse-decoding gpt inference multilingual-nmt sampling training transformer

Updated 5 months ago

https://github.com/compnet/tibert • Rank 0.0 • Science 26%

End-to-End BERT-Based Coreference System

bert coreference-resolution nlp

Updated 5 months ago

https://github.com/explosion/spacy-transformers • Rank 10.3 • Science 13%

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

bert google gpt-2 huggingface language-model machine-learning natural-language-processing natural-language-understanding nlp openai pytorch pytorch-model spacy spacy-extension spacy-pipeline transfer-learning xlnet

Updated 6 months ago

backprop • Rank 12.9 • Science 10%

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

bert fine-tuning image-classification language-model multilingual-models natural-language-processing nlp question-answering text-classification transfer-learning transformers

Updated 5 months ago

gpl • Rank 11.9 • Science 10%

Powerful unsupervised domain adaptation method for dense retrieval. Requires only unlabeled corpus and yields massive improvement: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

bert domain-adaptation information-retrieval nlp transformers vector-search

Updated 6 months ago

band • Rank 8.7 • Science 13%

BAND: BERT Application aNd Deployment, a simple and efficient BERT model training and deployment framework.

bert named-entity-recognition question-answering reading-comprehension sequence-labeling text-classification transformer

Updated 5 months ago

https://github.com/bytedance/bytetransformer • Rank 6.9 • Science 13%

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

bert gpu inference research transformer

Updated 5 months ago

text-sim • Rank 7.9 • Science 10%

Text similarity (matching) computation, providing baselines, training, inference, metric analysis... The code includes both TensorFlow and PyTorch versions.

bert deep-learning mechine-learing model nlp pytorch similarity text-classification transformer

Updated 5 months ago

https://github.com/ai-forever/ner-bert • Rank 7.1 • Science 10%

BERT-NER (ner-bert) with Google BERT https://github.com/google-research.

atis attention bert bert-model bilstm-crf classification conll-2003 elmo factrueval joint-models ner ner-task nlp nmt python python3 pytorch pytorch-model transfer-learning

Updated 5 months ago

https://github.com/aryashah2k/nlp-data-augmentation • Rank 2.3 • Science 10%

Implementing 5 Different Approaches To Augmenting Data For Natural Language Processing Tasks.

back-translation bert data-augmentation ensemble natural-language-processing t5-model text-to-text-transfer-transformer word-embeddings

Updated 5 months ago

https://github.com/amazon-science/bold • Science 13%

Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper

bert bert-model bias fairness-ml gpt-2 language-model nlg nlg-dataset nlp text-generation

Updated 5 months ago

https://github.com/amazon-science/transformers-data-augmentation • Science 26%

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

bart bert bert-model data-augmentation gpt

Updated 6 months ago

medkit-lib • Science 44%

Toolkit for a learning health system

bert digital-health electronic-health-records nlp umls

Updated 6 months ago

tsdae • Science 54%

Transformer-based Denoising AutoEncoder for Sentence Transformers unsupervised pre-training.

bert bert-embeddings lemone lemone-io machine-learning nltk pre-training python sentence-transformers transformers tsdae unsupervised-learning

Updated 5 months ago

https://github.com/awslabs/mlm-scoring • Science 10%

Python library & examples for Masked Language Model Scoring (ACL 2020)

bert language-model mxnet nlp pytorch speech-recognition xlm

Updated 6 months ago

openai-clip • Science 67%

Simple implementation of OpenAI CLIP model in PyTorch.

bert clip-model dataset deep-learning nlp openai-clip paper pytorch

Updated 6 months ago

word-embeddings-repository-for-turkish • Science 49%

Code for "A Comprehensive Analysis of Static Word Embeddings for Turkish". Expert Systems with Applications 2024.

bert elmo fasttext glove nlp turkish turkish-nlp word2vec

Updated 5 months ago

https://github.com/johnsnowlabs/johnsnowlabs • Science 26%

Gateway into the John Snow Labs Ecosystem

bert databricks gpt machine-learning natural-language-processing nlp python seq2seq spark t5

Updated 5 months ago

https://github.com/cyberagentailab/japanese-nli-model • Science 10%

This repository provides the code for Japanese NLI model, a fine-tuned masked language model.

bert japanese natural-language-processing natural-language-understanding nli nlp roberta sentence-transformers transformers

Updated 6 months ago

zeldarose • Science 44%

Train transformer-based models.

bert fine-tuning machine-learning neural-networks nlp pretraining transformers

Updated 5 months ago

https://github.com/cabralpinto/wildfire-heat-map-generation • Science 26%

Wildfire Heat Map Generation with Twitter and BERT

bert geoparsing twitter wildfire-data-visualization

Updated 5 months ago

https://github.com/buaadreamer/nlpkiller • Science 13%

NLP intro project

attention bert nlp rnn rnn-pytorch transformer word2vec

Updated 5 months ago

https://github.com/ai-forever/model-zoo • Science 10%

NLP model zoo for Russian

bert nlp pytorch roberta roberta-model russian russian-language t5 t5-model transformers

Updated 6 months ago

contextual-spell-checker-for-bangla • Science 26%

Automatic Context-Sensitive Spelling Correction for Bangla Text Using BERT and Levenshtein Distance

bangla-bert bangla-nlp bert fastapi levenshtein-distance ner nlp spellcheck spelling-correction

Updated 6 months ago

llms-from-scratch • Science 26%

Build your own Large Language Model from scratch with this code repository. Learn the ins and outs of LLMs like GPT. 🚀💻

bert book chatgpt deberta flan-t5 from-scratch language-model large-language-models llm llms-book machine-learning mcp neural-networks nlp prompt-engineering python pytorch roberta

Updated 6 months ago

language-pretraining • Science 67%

Pre-training Language Models for Japanese

bert electra implementation japanese language-model language-models natural-language-processing nlp pytorch transformer transformers

Updated 5 months ago

https://github.com/atharvapathak/twitter_sentiment_analysis_project • Science 23%

Twitter sentiment analysis is the process of analyzing tweets posted on the Twitter platform to determine the overall sentiment expressed within them. It involves using natural language processing (NLP) and machine learning techniques to classify tweets.

api bag-of-words bert cnn data gbm nltk rnn spacy twitter

Updated 6 months ago

partial-embedding-matrix-adaptation • Science 41%

Vocabulary-level memory efficiency for language model fine-tuning.

bert huggingface nlp transformers

Updated 5 months ago

https://github.com/alcantarar/biomchbert • Science 10%

Repository for BiomchBERT, the neural network classifying papers for the weekly Biomch-L Literature Update

ai bert biomechanics

Updated 6 months ago

banglasenti-dataset-prep • Science 44%

BanglaSenti Dataset Preparation: Bangla Sentiment Analysis CSV Dataset for NLP & Machine Learning

bangla bangla-dataset bangla-sentiment bangla-sentiment-classification bert machine-learning nlp open-source text-classification

Updated 4 months ago

https://github.com/dimits-ts/large-text-nlp-survey • Science 10%

A survey paper exploring the use of state-of-the-art deep neural network architectures in NLP problems featuring very large documents.

bert document-classification document-summarization literature nlp sentiment-analysis survey-paper

Updated 5 months ago

https://github.com/cedrickchee/bert-pytorch • Science 10%

PyTorch implementation of Google AI's 2018 BERT

bert language-modeling nlp pytorch transformer

Updated 6 months ago

automated-identification-of-security-relevant-configuration-settings-using-nlp • Science 52%

This repository is part of the paper "Automated Identification of Security-Relevant Configuration Settings Using NLP" accepted at the Industry Showcase track at the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE). https://conf.researchr.org/track/ase-2022/ase-2022-industry-showcase.

bert configuration-management hardening nlp scap scapolite security

Updated 6 months ago

nusabert • Science 54%

NusaBERT: Teaching IndoBERT to be multilingual and multicultural!

bert indobert indonesian language-model natural-language-processing natural-language-understanding nusabert

Updated 5 months ago

FMAT • Science 49%

😷 The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language.

ai artificial-intelligence bert bert-model bert-models contextualized-representation fill-in-the-blank fill-mask huggingface language-model language-models large-language-models masked-language-models natural-language-processing natural-language-understanding nlp pretrained-models transformer transformers

Updated 5 months ago

https://github.com/cluebbers/nlp_deeplearning_spring2023 • Science 20%

Implementing and fine-tuning BERT for sentiment analysis, paraphrase detection, and semantic textual similarity tasks. Includes code, data, and detailed results.

adamw-optimizer bert deep-learning natural-language-processing paraphrase-detection python pytorch semantic-similarity sentiment-analysis sophia tensorflow transformers