llms-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

https://github.com/rasbt/llms-from-scratch

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 42 committers (4.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

ai artificial-intelligence chatgpt deep-learning from-scratch gpt language-model large-language-models llm machine-learning python pytorch transformer

Keywords from Contributors

data-mining jax
Last synced: 6 months ago

Repository

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Basic Info
  • Host: GitHub
  • Owner: rasbt
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage: https://amzn.to/4fqvn0D
  • Size: 13.9 MB
Statistics
  • Stars: 68,384
  • Watchers: 614
  • Forks: 9,648
  • Open Issues: 4
  • Releases: 0
Topics
ai artificial-intelligence chatgpt deep-learning from-scratch gpt language-model large-language-models llm machine-learning python pytorch transformer
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).




In Build a Large Language Model (From Scratch), you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.
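To give a flavor of what "from scratch" means in practice, below is a minimal sketch (illustrative only, not the book's code) of a causal self-attention layer, the core building block of GPT-style models. All hyperparameters shown are placeholder choices.

```python
# A minimal, illustrative causal self-attention layer (not the book's code).
# All dimensions below are placeholder values.
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4, context_len=128):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        # Upper-triangular mask: each token attends only to itself and
        # to earlier tokens (the "causal" constraint)
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim)
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(b, t, d))

x = torch.randn(1, 10, 64)                 # (batch, tokens, embed_dim)
print(CausalSelfAttention()(x).shape)      # torch.Size([1, 10, 64])
```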



To download a copy of this repository, click on the Download ZIP button or execute the following command in your terminal:

```bash
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
```
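(The `--depth 1` flag creates a shallow clone that fetches only the latest snapshot rather than the full commit history, which keeps the download small.)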


(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch for the latest updates.)



Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, Ghostwriter is a good free option.

You can alternatively view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch in your browser, which renders Markdown automatically.



Tip: If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.



(CI status badges: code tests on Linux, Windows, and macOS)


| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|---------------|------------------------------|--------------------------|
| Setup recommendations | - | - |
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | ch02.ipynb<br>dataloader.ipynb (summary)<br>exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | ch03.ipynb<br>multihead-attention.ipynb (summary)<br>exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | ch04.ipynb<br>gpt.py (summary)<br>exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | ch05.ipynb<br>gpt_train.py (summary)<br>gpt_generate.py (summary)<br>exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | ch06.ipynb<br>gpt_class_finetune.py<br>exercise-solutions.ipynb | ./ch06 |
| Ch 7: Finetuning to Follow Instructions | ch07.ipynb<br>gpt_instruction_finetuning.py (summary)<br>ollama_evaluate.py (summary)<br>exercise-solutions.ipynb | ./ch07 |
| Appendix A: Introduction to PyTorch | code-part1.ipynb<br>code-part2.ipynb<br>DDP-script.py<br>exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | - |
| Appendix C: Exercise Solutions | No code | - |
| Appendix D: Adding Bells and Whistles to the Training Loop | appendix-D.ipynb | ./appendix-D |
| Appendix E: Parameter-efficient Finetuning with LoRA | appendix-E.ipynb | ./appendix-E |


 

The mental model below summarizes the contents covered in this book.


 

Prerequisites

The most important prerequisite is a strong foundation in Python programming. With this knowledge, you will be well prepared to explore the fascinating world of LLMs and understand the concepts and code examples presented in this book.

If you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these architectures.

This book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction to PyTorch. Alternatively, you may find my book, PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs, helpful for learning about the essentials.
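As a rough indicator of the level of PyTorch familiarity that is useful, the snippet below (a minimal illustration, not taken from the book or Appendix A) exercises the basics: creating tensors, computing gradients with autograd, and taking one manual gradient-descent step.

```python
# Illustrative PyTorch basics (not code from the book):
# tensors, autograd, and one manual gradient-descent step.
import torch

w = torch.tensor(1.0, requires_grad=True)        # trainable parameter
x, y_true = torch.tensor(2.0), torch.tensor(6.0)

y_pred = w * x                    # forward pass
loss = (y_pred - y_true) ** 2     # squared error
loss.backward()                   # autograd computes d(loss)/dw

with torch.no_grad():
    w -= 0.1 * w.grad             # one gradient-descent step
print(w.item())                   # 2.6 (moved toward the solution w = 3)
```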


 

Hardware Requirements

The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the setup doc for additional recommendations.)
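The sketch below shows the standard device-selection pattern this relies on (the repository's actual code may differ in details):

```python
# Select a GPU when one is available, otherwise fall back to the CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)   # placeholder model
batch = torch.randn(4, 8).to(device)       # data must live on the same device
print(model(batch).device)                 # cuda:0 if a GPU is present, else cpu
```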

 

Video Course

A companion video course (17 hours and 15 minutes) is available in which I code through each chapter of the book. The course is organized into chapters and sections that mirror the book's structure, so it can be used either as a standalone alternative to the book or as a complementary code-along resource.

 

Companion Book / Sequel

Build A Reasoning Model (From Scratch), while a standalone book, can be considered a sequel to Build A Large Language Model (From Scratch).

It starts with a pretrained model and implements different reasoning approaches, including inference-time scaling, reinforcement learning, and distillation, to improve the model's reasoning capabilities.

Similar to Build A Large Language Model (From Scratch), Build A Reasoning Model (From Scratch) takes a hands-on approach, implementing these methods from scratch.


 

Exercises

Each chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, ./ch02/01_main-chapter-code/exercise-solutions.ipynb).

In addition to the code exercises, you can download a free 170-page PDF titled Test Yourself On Build a Large Language Model (From Scratch) from the Manning website. It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.

 

Bonus Material

Several folders in this repository contain optional bonus materials for interested readers.


 

Questions, Feedback, and Contributing to This Repository

I welcome all sorts of feedback, best shared via the Manning Forum or GitHub Discussions. Likewise, if you have any questions or just want to bounce ideas off others, please don't hesitate to post these in the forum as well.

Please note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.

 

Citation

If you find this book or code useful for your research, please consider citing it.

Chicago-style citation:

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX entry:

```bibtex
@book{build-llms-from-scratch-book,
  author    = {Sebastian Raschka},
  title     = {Build A Large Language Model (From Scratch)},
  publisher = {Manning},
  year      = {2024},
  isbn      = {978-1633437166},
  url       = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github    = {https://github.com/rasbt/LLMs-from-scratch}
}
```

Owner

  • Name: Sebastian Raschka
  • Login: rasbt
  • Kind: user
  • Location: Madison, WI
  • Company: @Lightning-AI, University of Wisconsin-Madison

Machine Learning and AI researcher & currently research engineer at a startup

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this book or its accompanying code, please cite it as follows."
title: "Build A Large Language Model (From Scratch), Published by Manning, ISBN 978-1633437166"
abstract: "This book provides a comprehensive, step-by-step guide to implementing a ChatGPT-like large language model from scratch in PyTorch."
date-released: 2024-09-12
authors:
  - family-names: "Raschka"
    given-names: "Sebastian"
license: "Apache-2.0"
url: "https://www.manning.com/books/build-a-large-language-model-from-scratch"
repository-code: "https://github.com/rasbt/LLMs-from-scratch"
keywords:
  - large language models
  - natural language processing
  - artificial intelligence
  - PyTorch
  - machine learning
  - deep learning

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 778
  • Total Committers: 42
  • Avg Commits per committer: 18.524
  • Development Distribution Score (DDS): 0.17
Past Year
  • Commits: 410
  • Committers: 27
  • Avg Commits per committer: 15.185
  • Development Distribution Score (DDS): 0.195
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| rasbt | m****l@s****m | 646 |
| Daniel Kleine | 5****e | 59 |
| Intelligence-Manifesto | 1****o | 9 |
| TITC | 3****C | 7 |
| Rayed Bin Wahed | r****d@t****t | 6 |
| casinca | 4****a | 4 |
| Ikko Eltociear Ashimine | e****r@g****m | 3 |
| Jinge Wang | w****4@1****m | 3 |
| Xiaotian Ma | 4****8 | 2 |
| Suman Debnath | 5****a | 2 |
| Pietro Monticone | 3****e | 2 |
| Kasen | 1****n | 2 |
| Jeroen Van Goey | j****y@g****m | 2 |
| Henry Shi | h****h@g****m | 2 |
| Greg Gandenberger | g****r@s****m | 2 |
| Austin Welch | a****w | 1 |
| Eric Berg | e****g@g****m | 1 |
| Eric Thomson | t****c@g****m | 1 |
| DrCesar | d****r@d****h | 1 |
| Sebastian R | A****r@S****n | 1 |
| Kostyantyn Borysenko | k****o@g****m | 1 |
| rvaneijk | r****b@b****m | 1 |
| ridhachahed | 3****d | 1 |
| joel-foo | j****g@g****m | 1 |
| Xiangzhuang Shen | d****r@g****m | 1 |
| Victor Skvortsov | v****3@g****m | 1 |
| Tim Hopper | t****r | 1 |
| Thanh Tran | t****6@g****m | 1 |
| Tao Qian | t****n@g****m | 1 |
| SSebo | s****z@g****m | 1 |

and 12 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 165
  • Total pull requests: 624
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 82
  • Total pull request authors: 77
  • Average comments per issue: 1.5
  • Average comments per pull request: 1.13
  • Merged pull requests: 493
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 90
  • Pull requests: 364
  • Average time to close issues: 2 days
  • Average time to close pull requests: about 12 hours
  • Issue authors: 53
  • Pull request authors: 46
  • Average comments per issue: 1.14
  • Average comments per pull request: 0.98
  • Merged pull requests: 276
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • labdmitriy (31)
  • d-kleine (31)
  • PrinceSajjadHussain (5)
  • jingedawang (3)
  • casinca (3)
  • rasbt (3)
  • weezymatt (3)
  • xyang2013 (3)
  • RahulYadav-tech-gif (3)
  • WilliamLee30 (2)
  • qibin0506 (2)
  • athul-22 (2)
  • vico (2)
  • EricTay1997 (2)
  • frankchieng (2)
Pull Request Authors
  • rasbt (354)
  • d-kleine (83)
  • Intelligence-Manifesto (21)
  • casinca (11)
  • ziqiyang107 (7)
  • TITC (7)
  • gsganden (6)
  • rayed-therap (6)
  • rayedbw (6)
  • weezymatt (5)
  • eltociear (5)
  • conglt-0917 (4)
  • imkasen (4)
  • jinesh90 (4)
  • henrythe9th (4)
Top Labels
Issue Labels
bug (57) question (28) enhancement (3) documentation (2) update (1) text typo (1)
Pull Request Labels
bug (1) question (1)