video-compression-and-future-prediction-using-gpt

This repository presents a project focused on advanced video compression and future prediction using Generative Pre-trained Transformer (GPT) and other state-of-the-art techniques.

https://github.com/rishikesh-jadhav/video-compression-and-future-prediction-using-gpt

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.5%) to scientific vocabulary

Keywords

computer-vision deep-learning gpt nlp python pytorch vq-vae

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: Rishikesh-Jadhav
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 70.2 MB

Statistics

Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 0

Topics

computer-vision deep-learning gpt nlp python pytorch vq-vae

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

README.md

| Source Video | Compressed Video | Future Prediction |
| --------------- | ---------------- | ------------------ |

A world model is a model that can predict the next state of the world given the observed previous states and actions.

World models are essential to training all kinds of intelligent agents, especially self-driving models.

commaVQ contains:
- encoder/decoder models used to heavily compress driving scenes
- a world model trained on 3,000,000 minutes of driving videos
- a dataset of 100,000 minutes of compressed driving videos

Task

Lossless compression challenge: make me smaller! $500 challenge

Losslessly compress 5,000 minutes of driving video "tokens". Go to ./compression/ to start.

Prize: awarded for the highest compression rate on 5,000 minutes of driving video (~915MB). The challenge ended July 1st, 2024, at 11:59pm AoE.

Submit a single zip file containing the compressed data and a Python script to decompress it into its original form. Top solutions are listed on comma's official leaderboard.

| Implementation | Compression rate |
| :----------------------------------------- | ---------------: |
| pkourouklidis (arithmetic coding with GPT) | 2.6 |
| anonymous (zpaq) | 2.3 |
| rostislav (zpaq) | 2.3 |
| anonymous (zpaq) | 2.2 |
| anonymous (zpaq) | 2.2 |
| 0x41head (zpaq) | 2.2 |
| tillinf (zpaq) | 2.2 |
| baseline (lzma) | 1.6 |
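The top entry pairs arithmetic coding with a GPT. As a rough illustration of why a better predictor can yield a higher rate, the sketch below computes the ideal code length, the sum of -log2 p(token), that an arithmetic coder approaches under a given probability model. The adaptive frequency model here is a hypothetical stand-in and not the leaderboard solution, and treating compression rate as raw 10-bit size divided by coded size is an assumption rather than the challenge's stated definition.

```python
import math
import random

VOCAB_SIZE = 1024      # 10-bit tokens
BITS_PER_TOKEN = 10


def ideal_code_length_bits(tokens, vocab_size=VOCAB_SIZE):
    """Bits an ideal arithmetic coder would spend under an adaptive
    order-0 frequency model with add-one smoothing (hypothetical model)."""
    counts = [1] * vocab_size          # every symbol starts with a count of 1
    total = vocab_size
    bits = 0.0
    for t in tokens:
        bits += -math.log2(counts[t] / total)   # cost of coding token t
        counts[t] += 1                           # a decoder can mirror this update
        total += 1
    return bits


def compression_rate(tokens):
    """Assumed definition: raw 10-bit size divided by coded size."""
    return len(tokens) * BITS_PER_TOKEN / ideal_code_length_bits(tokens)


rng = random.Random(0)
# Synthetic token streams: one skewed (predictable), one uniform (unpredictable).
skewed = [min(int(rng.expovariate(0.05)), VOCAB_SIZE - 1) for _ in range(20000)]
uniform = [rng.randrange(VOCAB_SIZE) for _ in range(20000)]

print(f"skewed stream rate:  {compression_rate(skewed):.2f}")
print(f"uniform stream rate: {compression_rate(uniform):.2f}")
```

A model that assigns higher probability to the tokens that actually occur spends fewer bits per token, which is the reason a learned predictor can outperform a general-purpose compressor on this kind of data.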

Overview

A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. 1 min of frames at 20 FPS. Each file has shape 1200x8x16 and is saved as int16.
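The numbers above can be checked with a little arithmetic. The sketch below builds a synthetic array with the stated shape and dtype (random values, not real data) and compares the size as stored in int16 with the size at 10 bits per token. The last line suggests that 5,000 minutes at 10 bits per token comes to roughly 915 MiB, which appears consistent with the "~915MB" figure in the challenge; that reading is an inference, not something the README states.

```python
import numpy as np

FRAMES_PER_SEGMENT = 20 * 60   # 20 FPS for one minute
TOKEN_GRID = (8, 16)           # 8 x 16 = 128 tokens per frame
BITS_PER_TOKEN = 10            # token values lie in [0, 1024)

# Synthetic stand-in for one dataset file; real segments come from the dataset.
rng = np.random.default_rng(0)
segment = rng.integers(
    0, 2**BITS_PER_TOKEN, size=(FRAMES_PER_SEGMENT, *TOKEN_GRID), dtype=np.int16
)

tokens_per_frame = segment.shape[1] * segment.shape[2]
int16_bytes = segment.nbytes                        # 16 bits per token, as stored
packed_bytes = segment.size * BITS_PER_TOKEN // 8   # 10 bits per token

print(segment.shape, segment.dtype)
print("tokens per frame:", tokens_per_frame)
print("bytes as int16:", int16_bytes)
print("bytes at 10 bits/token:", packed_bytes)
print("5,000 minutes at 10 bits/token (MiB):", 5000 * packed_bytes / 2**20)
```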

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
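To make "predict the next token given a context of past tokens" concrete, here is a minimal sketch of an autoregressive sampling loop. The function `toy_next_token_probs` is a hypothetical stand-in for the trained GPT (it simply favours repeating the token at the same position in the previous frame); the actual model and its interface live in ./notebooks/gpt.ipynb and are not reproduced here.

```python
import numpy as np

VOCAB_SIZE = 1024        # 10-bit tokens
TOKENS_PER_FRAME = 128   # 8 x 16 grid


def toy_next_token_probs(context: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the world model: returns a distribution
    over the next token that favours the token one frame earlier."""
    if len(context) < TOKENS_PER_FRAME:
        return np.full(VOCAB_SIZE, 1.0 / VOCAB_SIZE)
    probs = np.full(VOCAB_SIZE, 0.1 / VOCAB_SIZE)
    probs[context[-TOKENS_PER_FRAME]] += 0.9
    return probs / probs.sum()


def imagine(context, n_new, probs_fn, rng):
    """Sample n_new tokens one at a time, feeding each sample back as context."""
    tokens = list(context)
    for _ in range(n_new):
        p = probs_fn(np.asarray(tokens))
        tokens.append(int(rng.choice(VOCAB_SIZE, p=p)))
    return np.asarray(tokens[len(context):])


rng = np.random.default_rng(0)
# Two synthetic context frames, flattened into one token sequence.
past = rng.integers(0, VOCAB_SIZE, size=2 * TOKENS_PER_FRAME)
next_frame = imagine(past, TOKENS_PER_FRAME, toy_next_token_probs, rng).reshape(8, 16)
print(next_frame.shape)
```

Swapping the toy function for a real model's next-token distribution leaves the loop unchanged, which is the sense in which the world model "imagines" future frames token by token.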

Examples

- ./notebooks/encode.ipynb and ./notebooks/decode.ipynb for an example of how to visualize the dataset using a segment of driving video from comma's drive to Taco Bell
- ./notebooks/gpt.ipynb for an example of how to use the world model to imagine future frames
- ./compression/compress.py for an example of how to compress the tokens using lzma
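For readers who want a feel for the lzma baseline without opening the repository, the following is a minimal round-trip sketch using Python's standard `lzma` module on a synthetic segment. It is not a copy of ./compression/compress.py, the data is made up (a mostly static scene with a few changed tokens), and the printed ratio is measured against int16 storage, so it is not comparable to the leaderboard numbers.

```python
import lzma

import numpy as np

rng = np.random.default_rng(0)

# Synthetic segment: one frame repeated 1200 times, with ~5% of tokens changed.
first_frame = rng.integers(0, 1024, size=(8, 16), dtype=np.int16)
segment = np.repeat(first_frame[None], 1200, axis=0)
mask = rng.random(segment.shape) < 0.05
segment[mask] = rng.integers(0, 1024, size=int(mask.sum()), dtype=np.int16)

raw = segment.tobytes()
compressed = lzma.compress(raw)

# Lossless means the decompressed bytes reproduce the original array exactly.
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16)
restored = restored.reshape(segment.shape)
assert np.array_equal(restored, segment)

print("raw bytes:", len(raw))
print("compressed bytes:", len(compressed))
print("ratio vs int16 storage:", round(len(raw) / len(compressed), 2))
```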

Download the dataset

Using Hugging Face datasets:

```python
import numpy as np
from datasets import load_dataset

num_proc = 40  # CPUs go brrrr
ds = load_dataset('commaai/commavq', num_proc=num_proc)
tokens = np.load(ds['0'][0]['path'])  # first segment from the first data shard
```

Or download manually from the Hugging Face datasets repository: https://huggingface.co/datasets/commaai/commavq

References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2022.

Owner

Name: Rishikesh Jadhav
Login: Rishikesh-Jadhav
Kind: user

Repositories: 2
Profile: https://github.com/Rishikesh-Jadhav

Robotics Masters student at the University of Maryland - College Park

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use commavq, please cite it as below."
authors:
- family-names: "comma.ai"
title: "commavq: a dataset of tokenized driving video and a GPT model"
date-released: 2023-06-25
url: "https://github.com/commaai/commavq/"

GitHub Events

Total

Watch event: 2
Fork event: 1

Last Year

Watch event: 2
Fork event: 1

Committers

Last synced: over 1 year ago

All Time

Total Commits: 2
Total Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 2
Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Rishikesh Jadhav	9****v	2

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
