llms-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

https://github.com/rasbt/llms-from-scratch

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 42 committers (4.8%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

ai artificial-intelligence chatgpt deep-learning from-scratch gpt language-model large-language-models llm machine-learning python pytorch transformer

Keywords from Contributors

data-mining jax
Last synced: 6 months ago

Repository

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Basic Info
  • Host: GitHub
  • Owner: rasbt
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage: https://amzn.to/4fqvn0D
  • Size: 13.9 MB
Statistics
  • Stars: 68,384
  • Watchers: 614
  • Forks: 9,648
  • Open Issues: 4
  • Releases: 0
Topics
ai artificial-intelligence chatgpt deep-learning from-scratch gpt language-model large-language-models llm machine-learning python pytorch transformer
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

Build a Large Language Model (From Scratch)

This repository contains the code for developing, pretraining, and finetuning a GPT-like LLM and is the official code repository for the book Build a Large Language Model (From Scratch).




In Build a Large Language Model (From Scratch), you'll learn and understand how large language models (LLMs) work from the inside out by coding them from the ground up, step by step. In this book, I'll guide you through creating your own LLM, explaining each stage with clear text, diagrams, and examples.

The method described in this book for training and developing your own small-but-functional model for educational purposes mirrors the approach used in creating large-scale foundational models such as those behind ChatGPT. In addition, this book includes code for loading the weights of larger pretrained models for finetuning.
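To give a flavor of what "from scratch" means in practice, below is a minimal sketch (illustrative only, not the book's code) of a causal self-attention layer, the core building block of GPT-style models. All hyperparameters shown are placeholder choices.

```python
# A minimal, illustrative causal self-attention layer (not the book's code).
# All dimensions below are placeholder values.
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, embed_dim=64, num_heads=4, context_len=128):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        self.proj = nn.Linear(embed_dim, embed_dim)
        # Upper-triangular mask: each token attends only to itself and
        # to earlier tokens (the "causal" constraint)
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim)
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        out = torch.softmax(scores, dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(b, t, d))

x = torch.randn(1, 10, 64)                 # (batch, tokens, embed_dim)
print(CausalSelfAttention()(x).shape)      # torch.Size([1, 10, 64])
```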



To download a copy of this repository, click on the Download ZIP button or execute the following command in your terminal:

```bash
git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
```
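(The `--depth 1` flag creates a shallow clone that fetches only the latest snapshot rather than the full commit history, which keeps the download small.)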


(If you downloaded the code bundle from the Manning website, please consider visiting the official code repository on GitHub at https://github.com/rasbt/LLMs-from-scratch for the latest updates.)



Table of Contents

Please note that this README.md file is a Markdown (.md) file. If you have downloaded this code bundle from the Manning website and are viewing it on your local computer, I recommend using a Markdown editor or previewer for proper viewing. If you haven't installed a Markdown editor yet, Ghostwriter is a good free option.

You can alternatively view this and other files on GitHub at https://github.com/rasbt/LLMs-from-scratch in your browser, which renders Markdown automatically.



Tip: If you're seeking guidance on installing Python and Python packages and setting up your code environment, I suggest reading the README.md file located in the setup directory.



(CI status badges: code tests on Linux, Windows, and macOS)


| Chapter Title | Main Code (for Quick Access) | All Code + Supplementary |
|---------------|------------------------------|--------------------------|
| Setup recommendations | - | - |
| Ch 1: Understanding Large Language Models | No code | - |
| Ch 2: Working with Text Data | ch02.ipynb<br>dataloader.ipynb (summary)<br>exercise-solutions.ipynb | ./ch02 |
| Ch 3: Coding Attention Mechanisms | ch03.ipynb<br>multihead-attention.ipynb (summary)<br>exercise-solutions.ipynb | ./ch03 |
| Ch 4: Implementing a GPT Model from Scratch | ch04.ipynb<br>gpt.py (summary)<br>exercise-solutions.ipynb | ./ch04 |
| Ch 5: Pretraining on Unlabeled Data | ch05.ipynb<br>gpt_train.py (summary)<br>gpt_generate.py (summary)<br>exercise-solutions.ipynb | ./ch05 |
| Ch 6: Finetuning for Text Classification | ch06.ipynb<br>gpt_class_finetune.py<br>exercise-solutions.ipynb | ./ch06 |
| Ch 7: Finetuning to Follow Instructions | ch07.ipynb<br>gpt_instruction_finetuning.py (summary)<br>ollama_evaluate.py (summary)<br>exercise-solutions.ipynb | ./ch07 |
| Appendix A: Introduction to PyTorch | code-part1.ipynb<br>code-part2.ipynb<br>DDP-script.py<br>exercise-solutions.ipynb | ./appendix-A |
| Appendix B: References and Further Reading | No code | - |
| Appendix C: Exercise Solutions | No code | - |
| Appendix D: Adding Bells and Whistles to the Training Loop | appendix-D.ipynb | ./appendix-D |
| Appendix E: Parameter-efficient Finetuning with LoRA | appendix-E.ipynb | ./appendix-E |


 

The mental model below summarizes the contents covered in this book.


 

Prerequisites

The most important prerequisite is a strong foundation in Python programming. With this knowledge, you will be well prepared to explore the fascinating world of LLMs and understand the concepts and code examples presented in this book.

If you have some experience with deep neural networks, you may find certain concepts more familiar, as LLMs are built upon these architectures.

This book uses PyTorch to implement the code from scratch without using any external LLM libraries. While proficiency in PyTorch is not a prerequisite, familiarity with PyTorch basics is certainly useful. If you are new to PyTorch, Appendix A provides a concise introduction to PyTorch. Alternatively, you may find my book, PyTorch in One Hour: From Tensors to Training Neural Networks on Multiple GPUs, helpful for learning about the essentials.
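As a rough indicator of the level of PyTorch familiarity that is useful, the snippet below (a minimal illustration, not taken from the book or Appendix A) exercises the basics: creating tensors, computing gradients with autograd, and taking one manual gradient-descent step.

```python
# Illustrative PyTorch basics (not code from the book):
# tensors, autograd, and one manual gradient-descent step.
import torch

w = torch.tensor(1.0, requires_grad=True)        # trainable parameter
x, y_true = torch.tensor(2.0), torch.tensor(6.0)

y_pred = w * x                    # forward pass
loss = (y_pred - y_true) ** 2     # squared error
loss.backward()                   # autograd computes d(loss)/dw

with torch.no_grad():
    w -= 0.1 * w.grad             # one gradient-descent step
print(w.item())                   # 2.6 (moved toward the solution w = 3)
```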


 

Hardware Requirements

The code in the main chapters of this book is designed to run on conventional laptops within a reasonable timeframe and does not require specialized hardware. This approach ensures that a wide audience can engage with the material. Additionally, the code automatically utilizes GPUs if they are available. (Please see the setup doc for additional recommendations.)
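The sketch below shows the standard device-selection pattern this relies on (the repository's actual code may differ in details):

```python
# Select a GPU when one is available, otherwise fall back to the CPU.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)   # placeholder model
batch = torch.randn(4, 8).to(device)       # data must live on the same device
print(model(batch).device)                 # cuda:0 if a GPU is present, else cpu
```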

 

Video Course

A companion video course (17 hours and 15 minutes) is available in which I code through each chapter of the book. The course is organized into chapters and sections that mirror the book's structure, so it can be used either as a standalone alternative to the book or as a complementary code-along resource.

 

Companion Book / Sequel

Build A Reasoning Model (From Scratch), while a standalone book, can be considered a sequel to Build A Large Language Model (From Scratch).

It starts with a pretrained model and implements different reasoning approaches, including inference-time scaling, reinforcement learning, and distillation, to improve the model's reasoning capabilities.

Similar to Build A Large Language Model (From Scratch), Build A Reasoning Model (From Scratch) takes a hands-on approach, implementing these methods from scratch.


 

Exercises

Each chapter of the book includes several exercises. The solutions are summarized in Appendix C, and the corresponding code notebooks are available in the main chapter folders of this repository (for example, ./ch02/01_main-chapter-code/exercise-solutions.ipynb).

In addition to the code exercises, you can download a free 170-page PDF titled Test Yourself On Build a Large Language Model (From Scratch) from the Manning website. It contains approximately 30 quiz questions and solutions per chapter to help you test your understanding.

 

Bonus Material

Several folders in this repository contain optional bonus materials for interested readers.


 

Questions, Feedback, and Contributing to This Repository

I welcome all sorts of feedback, best shared via the Manning Forum or GitHub Discussions. Likewise, if you have any questions or just want to bounce ideas off others, please don't hesitate to post these in the forum as well.

Please note that since this repository contains the code corresponding to a print book, I currently cannot accept contributions that would extend the contents of the main chapter code, as it would introduce deviations from the physical book. Keeping it consistent helps ensure a smooth experience for everyone.

 

Citation

If you find this book or code useful for your research, please consider citing it.

Chicago-style citation:

Raschka, Sebastian. Build A Large Language Model (From Scratch). Manning, 2024. ISBN: 978-1633437166.

BibTeX entry:

```bibtex
@book{build-llms-from-scratch-book,
  author    = {Sebastian Raschka},
  title     = {Build A Large Language Model (From Scratch)},
  publisher = {Manning},
  year      = {2024},
  isbn      = {978-1633437166},
  url       = {https://www.manning.com/books/build-a-large-language-model-from-scratch},
  github    = {https://github.com/rasbt/LLMs-from-scratch}
}
```

Owner

  • Name: Sebastian Raschka
  • Login: rasbt
  • Kind: user
  • Location: Madison, WI
  • Company: @Lightning-AI, University of Wisconsin-Madison

Machine Learning and AI researcher & currently research engineer at a startup

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this book or its accompanying code, please cite it as follows."
title: "Build A Large Language Model (From Scratch), Published by Manning, ISBN 978-1633437166"
abstract: "This book provides a comprehensive, step-by-step guide to implementing a ChatGPT-like large language model from scratch in PyTorch."
date-released: 2024-09-12
authors:
  - family-names: "Raschka"
    given-names: "Sebastian"
license: "Apache-2.0"
url: "https://www.manning.com/books/build-a-large-language-model-from-scratch"
repository-code: "https://github.com/rasbt/LLMs-from-scratch"
keywords:
  - large language models
  - natural language processing
  - artificial intelligence
  - PyTorch
  - machine learning
  - deep learning

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 778
  • Total Committers: 42
  • Avg Commits per committer: 18.524
  • Development Distribution Score (DDS): 0.17
Past Year
  • Commits: 410
  • Committers: 27
  • Avg Commits per committer: 15.185
  • Development Distribution Score (DDS): 0.195
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| rasbt | m****l@s****m | 646 |
| Daniel Kleine | 5****e | 59 |
| Intelligence-Manifesto | 1****o | 9 |
| TITC | 3****C | 7 |
| Rayed Bin Wahed | r****d@t****t | 6 |
| casinca | 4****a | 4 |
| Ikko Eltociear Ashimine | e****r@g****m | 3 |
| Jinge Wang | w****4@1****m | 3 |
| Xiaotian Ma | 4****8 | 2 |
| Suman Debnath | 5****a | 2 |
| Pietro Monticone | 3****e | 2 |
| Kasen | 1****n | 2 |
| Jeroen Van Goey | j****y@g****m | 2 |
| Henry Shi | h****h@g****m | 2 |
| Greg Gandenberger | g****r@s****m | 2 |
| Austin Welch | a****w | 1 |
| Eric Berg | e****g@g****m | 1 |
| Eric Thomson | t****c@g****m | 1 |
| DrCesar | d****r@d****h | 1 |
| Sebastian R | A****r@S****n | 1 |
| Kostyantyn Borysenko | k****o@g****m | 1 |
| rvaneijk | r****b@b****m | 1 |
| ridhachahed | 3****d | 1 |
| joel-foo | j****g@g****m | 1 |
| Xiangzhuang Shen | d****r@g****m | 1 |
| Victor Skvortsov | v****3@g****m | 1 |
| Tim Hopper | t****r | 1 |
| Thanh Tran | t****6@g****m | 1 |
| Tao Qian | t****n@g****m | 1 |
| SSebo | s****z@g****m | 1 |

and 12 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 165
  • Total pull requests: 624
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 82
  • Total pull request authors: 77
  • Average comments per issue: 1.5
  • Average comments per pull request: 1.13
  • Merged pull requests: 493
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 90
  • Pull requests: 364
  • Average time to close issues: 2 days
  • Average time to close pull requests: about 12 hours
  • Issue authors: 53
  • Pull request authors: 46
  • Average comments per issue: 1.14
  • Average comments per pull request: 0.98
  • Merged pull requests: 276
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • labdmitriy (31)
  • d-kleine (31)
  • PrinceSajjadHussain (5)
  • jingedawang (3)
  • casinca (3)
  • rasbt (3)
  • weezymatt (3)
  • xyang2013 (3)
  • RahulYadav-tech-gif (3)
  • WilliamLee30 (2)
  • qibin0506 (2)
  • athul-22 (2)
  • vico (2)
  • EricTay1997 (2)
  • frankchieng (2)
Pull Request Authors
  • rasbt (354)
  • d-kleine (83)
  • Intelligence-Manifesto (21)
  • casinca (11)
  • ziqiyang107 (7)
  • TITC (7)
  • gsganden (6)
  • rayed-therap (6)
  • rayedbw (6)
  • weezymatt (5)
  • eltociear (5)
  • conglt-0917 (4)
  • imkasen (4)
  • jinesh90 (4)
  • henrythe9th (4)
Top Labels
Issue Labels
bug (57) question (28) enhancement (3) documentation (2) update (1) text typo (1)
Pull Request Labels
bug (1) question (1)