Scientific Software
Updated 6 months ago

Turftopic — Peer-reviewed • Rank 13.8 • Science 98%

Turftopic: Topic Modelling with Contextual Representations from Sentence Transformers - Published in JOSS (2025)

Mathematics · Scientific Software · Peer-reviewed

Updated 6 months ago

Speakerbox — Peer-reviewed • Rank 10.9 • Science 100%

Speakerbox: Few-Shot Learning for Speaker Identification with Transformers - Published in JOSS (2023)

Engineering (40%) · Scientific Software · Peer-reviewed

Updated 6 months ago

Jury — Peer-reviewed • Rank 14.8 • Science 93%

Jury: A Comprehensive Evaluation Toolkit - Published in JOSS (2024)

Updated 6 months ago

pactus — Peer-reviewed • Rank 8.1 • Science 98%

pactus: A Python framework for trajectory classification - Published in JOSS (2023)

Updated 6 months ago

decimer • Rank 15.9 • Science 77%

DECIMER Image Transformer is a deep-learning-based tool designed for automated recognition of chemical structure images. Leveraging transformer architectures, the model converts chemical images into SMILES strings, enabling the digitization of chemical data from scanned documents, literature, and patents.
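
A recognizer like this emits a SMILES string, where a quick structural sanity check can flag obviously truncated outputs before downstream use. A toy sketch of such a check (illustrative only; not a real SMILES parser and not part of DECIMER):

```python
def plausible_smiles(s):
    """Toy sanity check on a recognized SMILES string: parentheses must
    balance and single-digit ring closures must come in pairs.
    (Real SMILES validation needs a cheminformatics toolkit.)"""
    depth = 0
    rings = {}
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing before opening
                return False
        elif ch.isdigit():         # ring-closure label
            rings[ch] = rings.get(ch, 0) + 1
    return depth == 0 and all(c % 2 == 0 for c in rings.values())

ok = plausible_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")   # caffeine
bad = plausible_smiles("CN1C=NC2=C1C(=O")               # truncated recognition
```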

Updated 6 months ago

farm-haystack • Rank 28.7 • Science 54%

AI orchestration framework for building customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) into pipelines or agents that can interact with your data. With advanced retrieval methods, it is best suited for building RAG, question answering, semantic search, or conversational chatbot applications.
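
The components-connected-into-pipelines idea can be sketched in plain Python; the retriever below is a hypothetical word-overlap stand-in, not Haystack's actual API:

```python
def retrieve(query, store, top_k=1):
    """Stand-in retriever: rank documents by word overlap with the query."""
    qs = set(query.lower().split())
    ranked = sorted(store, key=lambda d: len(qs & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, docs):
    """Prompt-builder component: stuff retrieved context into a template."""
    return "Context:\n" + "\n".join(docs) + f"\nQuestion: {query}"

def rag_pipeline(query, store):
    # Components wired in sequence: retrieve -> build prompt -> (LLM call here)
    docs = retrieve(query, store)
    return build_prompt(query, docs)

store = ["Paris is the capital of France.", "The Nile is a river in Africa."]
prompt = rag_pipeline("what is the capital of France", store)
```

In the real framework the same shape holds, but each stage is a configurable component and the final stage calls a generator model.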

Updated 6 months ago

rwkv-lm • Rank 11.2 • Science 67%

RWKV (pronounced RwaKuv) is an RNN with strong LLM performance that can also be trained directly like a GPT transformer (parallelizable). The current version is RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no KV cache), fast training, infinite ctx_len, and free sentence embeddings.
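
The constant-space claim comes from replacing attention's ever-growing KV cache with a fixed-size recurrent state. A toy numeric sketch of that idea (a generic decayed linear-attention step, not RWKV's actual wkv kernel):

```python
import math

def linear_attn_step(state, k, v, decay=0.9):
    """One recurrent step: fold (k, v) into a fixed-size running state.

    state = (num, den) holds exponentially decayed sums, so memory stays
    constant no matter how long the sequence grows (no KV cache)."""
    num, den = state
    num = decay * num + math.exp(k) * v
    den = decay * den + math.exp(k)
    return (num, den), num / den   # output: decayed weighted average of values

# Process a sequence token by token in O(1) space.
state, outputs = (0.0, 0.0), []
for k, v in [(0.1, 1.0), (0.5, 2.0), (0.2, 3.0)]:
    state, out = linear_attn_step(state, k, v)
    outputs.append(out)
```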

Updated 6 months ago

gpt-neox • Rank 13.8 • Science 64%

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Updated 6 months ago

learn_prompting • Rank 13.3 • Science 64%

Prompt Engineering, Generative AI, and LLM Guide by Learn Prompting | Join our Discord for the largest prompt-engineering learning community

Updated 6 months ago

transformers-interpret • Rank 19.3 • Science 54%

Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code.

Updated 6 months ago

bio-epidemiology-ner • Rank 5.9 • Science 67%

Recognize bio-medical entities from a text corpus

Updated 6 months ago

mlm-bias • Rank 4.9 • Science 67%

Measuring Biases in Masked Language Models for PyTorch Transformers. Support for multiple social biases and evaluation measures.
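One common way such toolkits score bias is CrowS-Pairs-style: compare pseudo-log-likelihoods (PLLs) of stereotypical vs. anti-stereotypical sentence pairs and report how often the model prefers the stereotypical one. A minimal sketch with made-up scores (not this package's API):

```python
def bias_score(pll_pairs):
    """Fraction of (stereotypical, anti-stereotypical) pairs where the
    masked LM assigns the stereotypical sentence a higher PLL.
    0.5 means no measured preference either way."""
    prefer = sum(1 for stereo, anti in pll_pairs if stereo > anti)
    return prefer / len(pll_pairs)

# Hypothetical PLL scores from some masked language model.
pairs = [(-10.2, -11.0), (-9.1, -8.7), (-12.4, -13.0), (-7.7, -7.9)]
score = bias_score(pairs)
```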

Updated 6 months ago

span-marker • Rank 17.2 • Science 54%

SpanMarker for Named Entity Recognition

Updated 6 months ago

alignment-handbook • Rank 16.9 • Science 54%

Robust recipes to align language models with human and AI preferences

Updated 6 months ago

tokenizers • Rank 14.0 • Science 54%

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
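
At the core of BPE tokenizer training is a simple loop: repeatedly merge the most frequent adjacent symbol pair across the corpus. A pure-Python sketch of one merge step (illustrative only; the library itself implements this in Rust):

```python
from collections import Counter

def bpe_merge_step(words):
    """One BPE training step: find the most frequent adjacent symbol pair
    and merge it everywhere. words is a list of symbol lists."""
    pairs = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return words, None
    (a, b), _ = pairs.most_common(1)[0]
    merged = []
    for word in words:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and word[i] == a and word[i + 1] == b:
                out.append(a + b)   # merge the pair into one symbol
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged, (a, b)

corpus = [list("lower"), list("lowest"), list("low")]
corpus, merge = bpe_merge_step(corpus)
```

Repeating this step and recording each merge yields the tokenizer's merge table.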

Updated 6 months ago

wiki-entity-summarization-preprocessor • Rank 0.7 • Science 67%

Convert Wikidata and Wikipedia raw files into filterable formats, with a focus on marking Wikidata entities as summaries based on their Wikipedia abstracts.

Updated 6 months ago

knn-transformers • Rank 5.6 • Science 62%

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
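
The kNN-LM idea is to interpolate the base model's next-token distribution with a distribution built from retrieved datastore neighbors. A minimal sketch of that interpolation with toy numbers (not the paper's code):

```python
import math
from collections import defaultdict

def knn_lm_interpolate(p_lm, neighbors, lam=0.25, temperature=1.0):
    """Blend a base LM distribution with a nearest-neighbor distribution.

    neighbors: (token, distance) pairs from a datastore lookup.
    p_knn puts softmax(-distance / T) mass on each neighbor's token;
    the final distribution is lam * p_knn + (1 - lam) * p_lm."""
    weights = [math.exp(-d / temperature) for _, d in neighbors]
    z = sum(weights)
    p_knn = defaultdict(float)
    for (tok, _), w in zip(neighbors, weights):
        p_knn[tok] += w / z
    return {tok: lam * p_knn[tok] + (1 - lam) * p
            for tok, p in p_lm.items()}

p_lm = {"cat": 0.6, "dog": 0.4}
neighbors = [("dog", 0.5), ("dog", 1.0), ("cat", 2.0)]  # close "dog" hits
p = knn_lm_interpolate(p_lm, neighbors)
```

Because the nearby neighbors mostly carry "dog", the blended distribution shifts mass toward it even though the base LM preferred "cat".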

Updated 6 months ago

frame-semantic-transformer • Rank 10.9 • Science 54%

Frame Semantic Parser based on T5 and FrameNet

Updated 6 months ago

maestro • Rank 10.5 • Science 54%

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Updated 6 months ago

transformer-srl • Rank 10.4 • Science 54%

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. The model also implements predicate disambiguation.

Updated 6 months ago

optimum • Rank 27.2 • Science 36%

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization tools

Updated 6 months ago

linear-relational • Rank 7.8 • Science 54%

Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch

Updated 6 months ago

text • Rank 15.7 • Science 46%

Using Transformers from HuggingFace in R

Updated 6 months ago

turkish-question-generation • Rank 3.9 • Science 57%

Automated question generation and question answering from Turkish texts using text-to-text transformers

Updated 6 months ago

bangla-bert • Rank 4.4 • Science 54%

Bangla-BERT is a pretrained BERT model for the Bengali language

Updated 6 months ago

transformers-tutorials • Rank 11.4 • Science 46%

This repository contains demos I made with the Transformers library by HuggingFace.

Updated 6 months ago

huggingsound • Rank 13.3 • Science 44%

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Updated 6 months ago

nerpy • Rank 9.0 • Science 44%

🌈 NERpy: Implementation of Named Entity Recognition in Python. A named-entity-recognition toolkit supporting models such as BertSoftmax and BertSpan, ready to use out of the box.

Updated 5 months ago

https://github.com/betswish/cross-lingual-consistency • Rank 3.9 • Science 49%

Easy-to-use framework for evaluating the cross-lingual consistency of factual knowledge (supports LLaMA, BLOOM, mT5, RoBERTa, etc.). Paper here: https://aclanthology.org/2023.emnlp-main.658/

Updated 6 months ago

stormtrooper • Rank 7.6 • Science 44%

Zero/few shot learning components for scikit-learn pipelines with LLMs and transformers.
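
The sketch below shows what a scikit-learn-style zero-shot component looks like: `fit` only records the candidate labels, and `predict` scores each text against each label. Word overlap stands in for a real LLM scorer here; this is a toy, not stormtrooper's actual API:

```python
class ToyZeroShotClassifier:
    """Sklearn-style (fit/predict) zero-shot classifier sketch."""

    def fit(self, X=None, y=None, labels=()):
        # Zero-shot: no training data needed, just the label names.
        self.labels_ = list(labels)
        return self

    def predict(self, X):
        def score(text, label):
            # Stand-in similarity: shared lowercase words.
            return len(set(text.lower().split()) & set(label.lower().split()))
        return [max(self.labels_, key=lambda lab: score(text, lab))
                for text in X]

clf = ToyZeroShotClassifier().fit(labels=["positive sentiment",
                                          "negative sentiment"])
preds = clf.predict(["a truly positive experience"])
```

The fit/predict surface is what lets such a component slot into an ordinary scikit-learn pipeline.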

Updated 6 months ago

marqo-fashionclip • Rank 6.4 • Science 44%

State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.

Updated 6 months ago

argostranslate • Rank 24.2 • Science 26%

Open-source offline translation library written in Python

Updated 6 months ago

porag • Rank 4.9 • Science 44%

Fully configurable RAG pipeline for Bengali-language RAG applications. Supports both local and Hugging Face models; built with LangChain.

Updated 6 months ago

logfire-callback • Rank 2.6 • Science 44%

A callback for logging training events from Hugging Face's Transformers to Logfire 🤗

Updated 6 months ago

efficient-task-transfer • Rank 3.6 • Science 41%

Research code for "What to Pre-Train on? Efficient Intermediate Task Selection", EMNLP 2021

Updated 6 months ago

napolab • Rank 7.0 • Science 36%

The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.

Updated 6 months ago

astronet • Rank 3.7 • Science 36%

Efficient Deep Learning for Real-time Classification of Astronomical Transients and Multivariate Time-series

Updated 6 months ago

spacy-wrap • Rank 12.4 • Science 26%

spaCy-wrap is a wrapper library for including fine-tuned transformers from Hugging Face in your spaCy pipeline, allowing you to use existing fine-tuned models within your spaCy workflow.

Updated 5 months ago

https://github.com/beomi/easy-lm-trainer • Rank 4.8 • Science 26%

🤗 Sample code for training a language model with minimal setup

Updated 5 months ago

https://github.com/dadananjesha/ai-text-humanizer-app • Rank 3.0 • Science 26%

Transform AI-generated text into formal, human-like, and academic writing with ease, while avoiding AI detectors!

Updated 5 months ago

https://github.com/compvis/geometry-free-view-synthesis • Rank 6.0 • Science 23%

Is a geometric model required to synthesize novel views from a single image?

Updated 5 months ago

https://github.com/beomi/bitnet-transformers • Rank 5.7 • Science 23%

0️⃣1️⃣🤗 BitNet-Transformers: Huggingface Transformers Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch with Llama(2) Architecture

Updated 5 months ago

https://github.com/bramvanroy/lt3-2019-transformer-trainer • Rank 0.7 • Science 26%

Transformer trainer for a variety of classification problems, used in-house at LT3 for different research topics.

Updated 6 months ago

gpl • Rank 11.9 • Science 10%

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
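
GPL's final stage trains a dense retriever to imitate a cross-encoder's score margin between a generated query's positive passage and a mined negative. A sketch of that pseudo-labeling step, with token overlap as a stand-in for the real cross-encoder:

```python
def gpl_pseudo_labels(queries, pos_passages, neg_passages, cross_encoder_score):
    """For each (query, positive, negative) triple, the training label is
    the cross-encoder score margin; a dense retriever is then trained to
    reproduce this margin (MarginMSE loss)."""
    return [cross_encoder_score(q, p) - cross_encoder_score(q, n)
            for q, p, n in zip(queries, pos_passages, neg_passages)]

def overlap_score(query, passage):
    """Stand-in scorer: fraction of query words found in the passage."""
    q, p = set(query.split()), set(passage.split())
    return len(q & p) / max(len(q), 1)

labels = gpl_pseudo_labels(
    ["what is dense retrieval"],
    ["dense retrieval maps queries and documents to vectors"],
    ["the weather is nice today"],
    overlap_score,
)
```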

Updated 6 months ago

fluence • Rank 8.2 • Science 10%

A deep learning library based on PyTorch, focused on low-resource language research and robustness

Updated 5 months ago

https://github.com/daniel-furman/sft-demos • Rank 4.4 • Science 13%

Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.

Updated 5 months ago

https://github.com/beomi/gemma-easylm • Rank 6.3 • Science 10%

Train GEMMA on TPU/GPU! (Codebase for training Gemma-Ko Series)

Updated 6 months ago

uniformers • Science 36%

Token-free Language Modeling with ByGPT5 & Friends!

Updated 5 months ago

https://github.com/chirayu-tripathi/paper-implementations • Science 23%

My implementation of Machine Learning and Deep Learning papers from scratch.

Updated 5 months ago

https://github.com/cosbidev/naim • Science 23%

Official implementation for the paper "Not Another Imputation Method: A Transformer-based Model for Missing Values in Tabular Datasets"

Updated 6 months ago

balena • Science 44%

BALanced Execution through Natural Activation: a human-computer interaction methodology for running code.

Updated 5 months ago

https://github.com/cyberagentailab/japanese-nli-model • Science 10%

This repository provides the code for a Japanese NLI model, a fine-tuned masked language model.

Updated 6 months ago

transformers-tf-finetune • Science 44%

Scripts to fine-tune Hugging Face Transformers models with TensorFlow 2

Updated 6 months ago

ai-essayist • Science 44%

The repository of the 쿠봇 (Kubot) team from the 4th SKKU AI X Bookathon.

Updated 5 months ago

https://github.com/dc-research/tempo • Science 36%

The official code for "TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting" (ICLR 2024). TEMPO is one of the first open-source time-series foundation models for forecasting (v1.0).

Updated 5 months ago

https://github.com/ai4co/routefinder • Science 36%

[ICML'24 FM-Wild Oral] RouteFinder: Towards Foundation Models for Vehicle Routing Problems

Updated 6 months ago

bitcoin-trader-ml • Science 44%

Automated 24/7 bitcoin trader for Coinbase using Transformer Neural Networks

Updated 6 months ago

master-thesis • Science 44%

One Bit at a Time: Impact of Quantisation on Neural Machine Translation

Updated 6 months ago

tsdae • Science 54%

Transformer-based Denoising AutoEncoder for unsupervised pre-training of Sentence Transformers.
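
TSDAE corrupts the input sentence (the paper recommends deleting about 60% of tokens) and trains the encoder so that a decoder can reconstruct the original from the sentence embedding alone. A sketch of the deletion noise:

```python
import random

def delete_noise(tokens, deletion_ratio=0.6, seed=0):
    """TSDAE-style input corruption: delete a fixed ratio of tokens,
    preserving the order of the survivors."""
    rng = random.Random(seed)
    n_keep = max(1, round(len(tokens) * (1 - deletion_ratio)))
    keep = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in keep]

tokens = "the quick brown fox jumps over the lazy dog".split()
noisy = delete_noise(tokens)
```

The training signal then comes from reconstructing `tokens` given only the embedding of `noisy`.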

Updated 6 months ago

llms4om • Science 54%

LLMs4OM: Matching Ontologies with Large Language Models

Updated 6 months ago

indic-syntax-evaluation • Science 39%

Vyākarana: A Colorless Green Benchmark for Syntactic Evaluation in Indic Languages

Updated 5 months ago

https://github.com/chris-santiago/tsfeast • Science 10%

A collection of Scikit-Learn compatible time series transformers and tools.

Updated 5 months ago

https://github.com/cluebbers/nlp_deeplearning_spring2023 • Science 20%

Implementing and fine-tuning BERT for sentiment analysis, paraphrase detection, and semantic textual similarity tasks. Includes code, data, and detailed results.

Updated 6 months ago

dpo-rlhf-paraphrase-types • Science 67%

Enhancing paraphrase-type generation using Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF), with large-scale HPC support. This project aligns model outputs to human-ranked data for robust, safety-focused NLP.

Updated 6 months ago

partial-embedding-matrix-adaptation • Science 41%

Vocabulary-level memory efficiency for language model fine-tuning.

Updated 6 months ago

pangoling • Science 36%

An R package for estimating the log-probabilities of words in a given context using transformer models.

Updated 6 months ago

linktransformer • Science 54%

A convenient way to link, deduplicate, aggregate and cluster data(frames) in Python using deep learning
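
Embedding-based deduplication typically reduces to thresholded cosine similarity between record embeddings. A greedy pure-Python sketch with toy 2-d vectors (not this package's API):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def dedupe(records, embeddings, threshold=0.9):
    """Greedy dedup: keep a record only if it is not too similar to any
    already-kept record (first occurrence wins)."""
    kept, kept_vecs = [], []
    for rec, vec in zip(records, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(rec)
            kept_vecs.append(vec)
    return kept

# Toy embeddings: the two Acme variants point in nearly the same direction.
records = ["Acme Corp", "ACME Corporation", "Globex Inc"]
embeddings = [[1.0, 0.1], [0.98, 0.12], [0.0, 1.0]]
kept = dedupe(records, embeddings)
```

Real pipelines use a learned sentence encoder and an approximate-nearest-neighbor index instead of the pairwise loop, but the thresholding logic is the same.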

Updated 6 months ago

staged-training • Science 54%

Staged Training for Transformer Language Models