Projects | Open Source Science

Updated 10 months ago

open-clip-torch • Rank 29.2 • Science 85%

An open source implementation of CLIP.

computer-vision contrastive-loss deep-learning language-model multi-modal-learning pretrained-models pytorch zero-shot-classification

Updated 10 months ago

lm-eval • Rank 28.4 • Science 59%

A framework for few-shot evaluation of language models.

evaluation-framework language-model transformer

Updated 10 months ago

txtai • Rank 22.2 • Science 64%

💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

ai artificial-intelligence embeddings information-retrieval language-model large-language-models llm machine-learning nlp python rag retrieval-augmented-generation search search-engine semantic-search sentence-embeddings transformers txtai vector-database vector-search

Updated 10 months ago

inseq • Rank 13.6 • Science 67%

Interpretability for sequence generation models 🐛 🔍

attribution-methods captum deep-learning explainable-ai generative-ai huggingface interpretability language-generation language-model large-language-models natural-language-processing sequence-to-sequence transformers

Updated 10 months ago

imodelsx • Rank 14.5 • Science 64%

Interpret text data using LLMs (scikit-learn compatible).

ai deep-learning explainability huggingface interpretability language-model machine-learning ml natural-language-processing natural-language-understanding neural-network pytorch scikit-learn text text-classification transformer-models xai

Updated 10 months ago

rwkv-lm • Rank 11.2 • Science 67%

RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be trained directly like a GPT transformer (parallelizable). The current version is RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

attention-mechanism chatgpt deep-learning gpt gpt-2 gpt-3 language-model linear-attention lstm pytorch rnn rwkv transformer transformers

Updated 10 months ago

gpt-neox • Rank 13.8 • Science 64%

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

deepspeed-library gpt-3 language-model transformers

Updated 10 months ago

asreview • Rank 19.2 • Science 57%

Active learning for systematic reviews

active-learning asreview deep-learning language-model learning-algorithms literature llm natural-language-processing neural-network research systematic-literature-reviews systematic-reviews utrecht-university

Updated 10 months ago

prompt-engineering-guide • Rank 16.4 • Science 54%

🐙 Guides, papers, lectures, notebooks and resources for prompt engineering

chatgpt deep-learning generative-ai language-model openai prompt-engineering

Updated 10 months ago

llms-from-scratch • Rank 14.9 • Science 54%

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

ai artificial-intelligence chatgpt deep-learning from-scratch gpt language-model large-language-models llm machine-learning python pytorch transformer

Updated 10 months ago

tokenizers • Rank 14.0 • Science 54%

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

bert gpt language-model natural-language-processing natural-language-understanding nlp transformers

Updated 10 months ago

https://github.com/blinkdl/chatrwkv • Rank 21.3 • Science 46%

ChatRWKV is like ChatGPT but powered by the RWKV (100% RNN) language model, and open source.

chatbot chatgpt language-model pytorch rnn rwkv

Updated 10 months ago

opennmt-py • Rank 24.6 • Science 36%

Open Source Neural Machine Translation and (Large) Language Models in PyTorch

deep-learning language-model llms machine-translation neural-machine-translation pytorch

Updated 9 months ago

https://github.com/csinva/tree-prompt • Rank 6.0 • Science 46%

Tree prompting: easy-to-use scikit-learn interface for improved prompting.

ai artificial-intelligence classification controllability decision-tree huggingface interpretability language-model language-models llm llm-inference machine-learning prompt-engineering prompting scikit-learn

Updated 10 months ago

feste • Rank 6.7 • Science 44%

Feste is a free and open-source framework allowing scalable composition of NLP tasks using a graph execution model that is optimized and executed by specialized schedulers.

deep-learning language-model machine-learning nlp

Updated 9 months ago

https://github.com/csinva/interpretable-embeddings • Rank 4.7 • Science 46%

Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024)

ai artificial-intelligence embeddings encoding-models explainability fmri huggingface language-model llm neural-network neuroscience rag retrieval-augmented-generation transformer xai

Updated 10 months ago

aicademy • Rank 6.4 • Science 44%

A friendly community offering free AI education.

ai community data-science deep-learning education generative-ai language-model llm machine-learning math open-source prompt-engineering python pytorch

Updated 10 months ago

sgpt • Rank 7.9 • Science 41%

SGPT: GPT Sentence Embeddings for Semantic Search

gpt information-retrieval language-model large-language-models neural-search retrieval semantic-search sentence-embeddings sgpt text-embedding

Updated 10 months ago

llama-classification • Rank 4.7 • Science 44%

Text classification with Foundation Language Model LLaMA

classification foundation-model gpt language-model llama pytorch

Updated 8 months ago

https://github.com/google-research/jestimator • Rank 10.8 • Science 36%

Amos optimizer with JEstimator lib.

deep-learning flax jax language-model machine-learning mnist optimization optimization-algorithms optimizer transformer

Updated 10 months ago

https://github.com/cedrickchee/awesome-transformer-nlp • Rank 9.3 • Science 36%

A curated list of NLP resources focused on Transformer networks, attention mechanism, GPT, BERT, ChatGPT, LLMs, and transfer learning.

attention-mechanism awesome awesome-list bert chatgpt gpt-2 gpt-3 gpt-4 language-model llama natural-language-processing neural-networks nlp pre-trained-language-models transfer-learning transformer xlnet

Updated 10 months ago

https://github.com/chiang-yuan/llamp • Rank 6.3 • Science 33%

A web app and Python API for a multi-modal RAG framework that grounds LLMs in high-fidelity materials informatics. An agentic materials scientist powered by @materialsproject, @langchain-ai, and @openai

ai4science cheminformatics language-model materials-informatics retrieval-augmented-generation

Updated 10 months ago

spacy-wrap • Rank 12.4 • Science 26%

spaCy-wrap is a wrapper library for spaCy that lets you include existing fine-tuned transformers from Hugging Face in your spaCy pipeline.

deep-learning huggingface huggingface-transformers language-model machine-learning natural-language-processing nlp pytorch spacy spacy-extension spacy-extensions spacy-models spacy-nlp spacy-pipeline spacy-transformers text-classification transformers

Updated 10 months ago

https://github.com/brucewlee/h-test • Rank 0.7 • Science 33%

[ACL 2024] Language Models Don't Learn the Physical Manifestation of Language

benchmark evaluation language-model

Updated 9 months ago

https://github.com/cvi-szu/linly • Rank 9.4 • Science 23%

Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets

bert chatbot chatgpt chinese chinese-nlp gpt-3 language-model llama nlp zero-shot-learning

Updated 10 months ago

https://github.com/beomi/easy-lm-trainer • Rank 4.8 • Science 26%

🤗 Sample code for training an LM with minimal setup

boilerplate huggingface language-model transformers

Updated 10 months ago

https://github.com/amazon-science/mezo_svrg • Rank 3.7 • Science 26%

Code for the ICML 2024 paper: "Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models"

deep-learning fine-tuning language-model large-language-models llm-training llms machine-learning machine-learning-algorithms optimization optimization-algorithms svrg variance-reduction zero-order-methods

Updated 10 months ago

https://github.com/beomi/transformers-language-modeling • Rank 3.1 • Science 26%

Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3

bert deepspeed language-model transformers

Updated 10 months ago

https://github.com/bminixhofer/zett • Rank 5.6 • Science 23%

Code for Zero-Shot Tokenizer Transfer

language-model llm llms multilingual tokenization transfer-learning

Updated 10 months ago

https://github.com/buaadreamer/dlpy • Rank 0.7 • Science 23%

Programming Language for Deep Learning in Python

baichuan2 chatbot deep-learning gpt-2 huggingface language-model mlp programming-language python3 transformers

Updated 10 months ago

https://github.com/explosion/spacy-transformers • Rank 10.3 • Science 13%

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

bert google gpt-2 huggingface language-model machine-learning natural-language-processing natural-language-understanding nlp openai pytorch pytorch-model spacy spacy-extension spacy-pipeline transfer-learning xlnet

Updated 10 months ago

backprop • Rank 12.9 • Science 10%

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

bert fine-tuning image-classification language-model multilingual-models natural-language-processing nlp question-answering text-classification transfer-learning transformers

Updated 10 months ago

https://github.com/ai-forever/ru-gpts • Rank 9.0 • Science 10%

Russian GPT3 models.

deep-learning gpt3 language-model russian russian-language transformers

Updated 10 months ago

https://github.com/beomi/gemma-easylm • Rank 6.3 • Science 10%

Train GEMMA on TPU/GPU! (Codebase for training Gemma-Ko Series)

easylm flax gemma huggingface jax language-model tpu transformers

Updated 10 months ago

https://github.com/cedrickchee/advent-of-code-2022 • Rank 0.0 • Science 13%

Advent of Code (AoC) 2022 in Rust

advent-of-code advent-of-code-2022 gpt-3 language-model rust-hack

Updated 10 months ago

mdlm • Science 62%

[NeurIPS 2024] Simple and Effective Masked Diffusion Language Model

diffusion-language-models diffusion-models language-model text

Updated 10 months ago

https://github.com/ac-rad/xdl-generation • Science 13%

CLAIRify: Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting

language-model planning robot

Updated 10 months ago

pal • Science 54%

PaL: Program-Aided Language Models (ICML 2023)

commonsense-reasoning few-shot-learning language-generation language-model large-language-models reasoning

Updated 10 months ago

pix2seq • Science 67%

Simple Implementation of Pix2Seq model for object detection in PyTorch

deep-learning huggingface language-model object-detection pix2seq pytorch pytorch-implementation timm transformer

Updated 10 months ago

https://github.com/alexeyev/letnyayashkola-dl4nlp-2018 • Science 13%

Deep Learning workshop @ letnyayashkola.org, supporting materials for lectures by @alexeyev [in Russian]

deep-learning language-model nlp pytorch semantics

Updated 9 months ago

https://github.com/csinva/iprompt • Science 36%

Finding semantically meaningful and accurate prompts.

ai autoprompt deep-learning explainability explainable-ai interpretability iprompt language-model large-language-models ml natural-language-processing neural-network prompting text text-classification xai

Updated 10 months ago

https://github.com/awslabs/mlm-scoring • Science 10%

Python library & examples for Masked Language Model Scoring (ACL 2020)

bert language-model mxnet nlp pytorch speech-recognition xlm

Updated 10 months ago

snap-umls-clusters • Science 36%

Master's thesis project at Arab American University Palestine with the Palestinian Neuro Initiative Educational Research Center: clustering medical sentences based on the Unified Medical Language System (UMLS) terms and expanded UMLS terms present in them

deep-neural-networks knowledge-graph language-model machine-learning natural-language-processing text-mining

Updated 10 months ago

llms-from-scratch • Science 26%

Build your own Large Language Model from scratch with this code repository. Learn the ins and outs of LLMs like GPT. 🚀💻

bert book chatgpt deberta flan-t5 from-scratch language-model large-language-models llm llms-book machine-learning mcp neural-networks nlp prompt-engineering python pytorch roberta

Updated 10 months ago

speechbrain • Science 64%

A PyTorch-based Speech Toolkit

asr audio audio-processing deep-learning huggingface language-model pytorch speaker-diarization speaker-recognition speaker-verification speech-enhancement speech-processing speech-recognition speech-separation speech-to-text speech-toolkit speechrecognition spoken-language-understanding transformers voice-recognition

Updated 10 months ago

https://github.com/rentruewang/bocoel • Science 26%

Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few lines of modular code.

bayesian-optimization benchmarking evaluation language-model llm machine-learning

Updated 10 months ago

cecilia • Science 57%

The Cuban Language Model

language-model llm slm

Updated 10 months ago

deeprank-gnn-esm • Science 26%

Graph Network for protein-protein interface including language model features

deep-learning graph-networks interface-classification language-model protein-protein-interaction scoring utrecht-university

Updated 10 months ago

language-pretraining • Science 67%

Pre-training Language Models for Japanese

bert electra implementation japanese language-model language-models natural-language-processing nlp pytorch transformer transformers

Updated 10 months ago

gerpt2 • Science 57%

German small and large versions of GPT2.

common-crawl german gpt2 language-model machine-learning nlp

Updated 10 months ago

indonesian-language-models • Science 54%

Indonesian Language Models and their Usage

deep-learning fastai huggingface-transformers language-model machine-learning nlp pytorch transformer

Updated 10 months ago

FMAT • Science 49%

😷 The Fill-Mask Association Test (FMAT): Measuring Propositions in Natural Language.

ai artificial-intelligence bert bert-model bert-models contextualized-representation fill-in-the-blank fill-mask huggingface language-model language-models large-language-models masked-language-models natural-language-processing natural-language-understanding nlp pretrained-models transformer transformers

Updated 10 months ago

awesome-japanese-llm • Science 75%

Overview of Japanese LLMs

foundation-models generative-ai generative-model generative-models japanese japanese-language japanese-language-model japanese-llm language-model language-models large-language-model large-language-models llm llm-japanese llms multimodal vision-and-language vision-language vision-language-model

Updated 10 months ago

ai_story_scale • Science 67%

The AI story scale (AISS): A human rating scale for texts written with generative language models.

artificial-intelligence gpt language-model novelai psychometrics

Updated 8 months ago

https://github.com/disi-unibo-nlp/easumm • Science 13%

[DATA22 and Springer LNCS] Graph-Enhanced Biomedical Abstractive Summarization via Factual Evidence Extraction

abstractive-summarization event-extraction knowledge-infusion language-model nlp nlu

Updated 10 months ago

https://github.com/amazon-science/bold • Science 13%

Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper

bert bert-model bias fairness-ml gpt-2 language-model nlg nlg-dataset nlp text-generation

Updated 10 months ago

cramming • Science 54%

Cramming the training of a (BERT-type) language model into limited compute.

english-language language-model machine-learning

Updated 10 months ago

staged-training • Science 54%

Staged Training for Transformer Language Models

deep-learning language-model nlp pytorch transformers

Updated 10 months ago

https://github.com/awslabs/gap-text2sql • Science 10%

GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training

deep-learning language-model machine-learning nlp nlu pretrained-models pytorch semantic-parsing text-generation text2sql

Updated 9 months ago

https://github.com/claromes/toolazytowritealt • Science 23%

alt text for lazy people

blip-2 clarifai language-model llm-hackathon streamlit vision-and-language

Updated 10 months ago

chronos • Science 36%

Debugging-first language model achieving 65.3% autonomous bug fixing (6-7x better than GPT-4). Research, benchmarks & evaluation framework. Model available Q1 2026 via Kodezi OS.

artificial-intelligence autonomous-debugging benchmark benchmark-report bug-fixing chronos code code-analysis code-analysis-tool code-debugger code-understanding debugging developer-tools kodezi language-model machine-learning program-repair software-engineering

Updated 10 months ago

automated-brain-explanations • Science 36%

Generating and validating natural-language explanations for the brain.

ai-for-science artificial-intelligence automated-interpretability data-science explanation fmri fmri-data-analysis gpt gpt4 huggingface interpretability interpretable-embeddings language-model large-language-models machine-learning mechanistic-interpretability natural-language-processing neuroscience xai