video-compression-and-future-prediction-using-gpt

This repository presents a project focused on advanced video compression and future prediction using Generative Pre-trained Transformer (GPT) and other state-of-the-art techniques.

https://github.com/rishikesh-jadhav/video-compression-and-future-prediction-using-gpt

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.5%) to scientific vocabulary

Keywords

computer-vision deep-learning gpt nlp python pytorch vq-vae
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Rishikesh-Jadhav
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 70.2 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
computer-vision deep-learning gpt nlp python pytorch vq-vae
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Demo (animated examples in the original README): Source Video · Compressed Video · Future Prediction

A world model is a model that can predict the next state of the world given the observed previous states and actions.

World models are essential to training all kinds of intelligent agents, especially self-driving models.

commaVQ contains:
  • encoder/decoder models used to heavily compress driving scenes
  • a world model trained on 3,000,000 minutes of driving videos
  • a dataset of 100,000 minutes of compressed driving videos

Task

Lossless compression challenge: make me smaller! ($500 prize)

Losslessly compress 5,000 minutes of driving-video "tokens". Go to ./compression/ to get started.

Prize: highest compression rate on 5,000 minutes of driving video (~915 MB). Challenge ended July 1st, 2024, 11:59 pm AoE.

Submit a single zip file containing the compressed data and a python script to decompress it into its original form. Top solutions are listed on comma's official leaderboard.

| Implementation                             | Compression rate |
| :----------------------------------------- | ---------------: |
| pkourouklidis (arithmetic coding with GPT) |              2.6 |
| anonymous (zpaq)                           |              2.3 |
| rostislav (zpaq)                           |              2.3 |
| anonymous (zpaq)                           |              2.2 |
| anonymous (zpaq)                           |              2.2 |
| 0x41head (zpaq)                            |              2.2 |
| tillinf (zpaq)                             |              2.2 |
| baseline (lzma)                            |              1.6 |
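The leaderboard's compression rate is the ratio of original size to compressed size, so the resulting file sizes for the ~915 MB challenge data can be estimated directly. A quick sketch of that arithmetic:

```python
# Compression rate = original size / compressed size, so the compressed
# size for the ~915 MB challenge data follows directly from each entry's rate.
original_mb = 915.0
for name, rate in [
    ("baseline (lzma)", 1.6),
    ("zpaq", 2.2),
    ("arithmetic coding with GPT", 2.6),
]:
    print(f"{name}: ~{original_mb / rate:.0f} MB")
```

This is why the GPT-based entry wins: a model that predicts the next token well assigns it a short code, pushing the compressed size toward the data's true entropy.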

Overview

A VQ-VAE [1,2] was used to heavily compress each video frame into 128 "tokens" of 10 bits each. Each entry of the dataset is a "segment" of compressed driving video, i.e. 1min of frames at 20 FPS. Each file is of shape 1200x8x16 and saved as int16.
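The layout described above can be checked with a few lines of NumPy. The array below is a zero-filled stand-in for a real segment (in practice you would `np.load` a `.npy` file from the dataset); the shapes and sizes follow from the numbers given: 1 minute at 20 FPS is 1200 frames, each an 8x16 grid of 10-bit tokens stored as int16.

```python
import numpy as np

# Stand-in for one compressed segment (in practice, np.load a .npy file).
segment = np.zeros((1200, 8, 16), dtype=np.int16)

tokens_per_frame = segment.shape[1] * segment.shape[2]  # 8 * 16 = 128
raw_bytes = segment.size * segment.itemsize             # stored as int16 (16 bits/token)
ideal_bytes = segment.size * 10 // 8                    # only 10 bits/token are meaningful
print(tokens_per_frame, raw_bytes, ideal_bytes)
```

Note the gap between the int16 storage (307,200 bytes per segment) and the 10-bit information content (192,000 bytes): even a trivial bit-packing scheme recovers a 1.6x reduction, which matches the lzma baseline in the leaderboard above.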

A world model [3] was trained to predict the next token given a context of past tokens. This world model is a Generative Pre-trained Transformer (GPT) [4] trained on 3,000,000 minutes of driving videos following a similar recipe to [5].
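The autoregressive loop that "imagines" future frames can be sketched independently of the actual GPT weights. The snippet below is a minimal illustration with a toy stand-in model (uniform logits); the function names `sample_next` and `imagine` are hypothetical, not part of the commaVQ API. The key point is that generating one future frame means sampling 128 tokens, one at a time, each conditioned on everything sampled so far.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 1024           # 10-bit token codebook
TOKENS_PER_FRAME = 128 # 8 x 16 grid

def sample_next(logits, temperature=1.0):
    """Sample one token from a categorical distribution over the codebook."""
    z = logits / temperature
    p = np.exp(z - np.max(z))
    p /= p.sum()
    return int(rng.choice(VOCAB, p=p))

def imagine(context, n_frames, model):
    """Autoregressively extend `context` by n_frames worth of tokens."""
    tokens = list(context)
    for _ in range(n_frames * TOKENS_PER_FRAME):
        logits = model(tokens)          # next-token logits, shape (VOCAB,)
        tokens.append(sample_next(logits))
    return tokens

# Toy stand-in "model": uniform logits over the codebook.
out = imagine([1, 2, 3], n_frames=1, model=lambda t: np.zeros(VOCAB))
print(len(out))  # 3 context tokens + 128 imagined tokens = 131
```

In the real model the `model(tokens)` call is a transformer forward pass; see ./notebooks/gpt.ipynb for the actual usage.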

Examples

See ./notebooks/encode.ipynb and ./notebooks/decode.ipynb for an example of how to visualize the dataset, using a segment of driving video from comma's drive to Taco Bell.

See ./notebooks/gpt.ipynb for an example of how to use the world model to imagine future frames.

See ./compression/compress.py for an example of how to compress the tokens using lzma.
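An lzma round trip on one segment's worth of tokens is easy to sketch with the standard library. This is a simplified stand-in for what compress.py does (random tokens instead of real ones, and the exact presets may differ); lossless compression means decompression must reproduce the input bit-for-bit.

```python
import lzma
import numpy as np

# Stand-in for one segment of 10-bit tokens stored as int16.
tokens = np.random.default_rng(0).integers(0, 1024, size=(1200, 8, 16), dtype=np.int16)

raw = tokens.tobytes()
compressed = lzma.compress(raw, preset=9 | lzma.PRESET_EXTREME)
print(f"compression rate: {len(raw) / len(compressed):.2f}")

# Round trip: lossless means the decompressed bytes rebuild the exact array.
restored = np.frombuffer(lzma.decompress(compressed), dtype=np.int16).reshape(tokens.shape)
assert np.array_equal(restored, tokens)
```

Even on incompressible random tokens lzma gains something here, because the top 6 bits of every int16 are always near zero; real driving tokens are far more predictable, which is what the challenge entries exploit.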

Download the dataset

  • Using huggingface datasets:

```python
import numpy as np
from datasets import load_dataset

num_proc = 40  # CPUs go brrrr
ds = load_dataset('commaai/commavq', num_proc=num_proc)
tokens = np.load(ds['0'][0]['path'])  # first segment from the first data shard
```

  • Manually download from the huggingface datasets repository: https://huggingface.co/datasets/commaai/commavq

References

[1] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

[2] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.

[3] https://worldmodels.github.io/

[4] Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).

[5] Micheli, Vincent, Eloi Alonso, and François Fleuret. "Transformers are Sample-Efficient World Models." The Eleventh International Conference on Learning Representations. 2022.

Owner

  • Name: Rishikesh Jadhav
  • Login: Rishikesh-Jadhav
  • Kind: user

Robotics Masters student at the University of Maryland - College Park

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use commavq, please cite it as below."
authors:
- family-names: "comma.ai"
title: "commavq: a dataset of tokenized driving video and a GPT model"
date-released: 2023-06-25
url: "https://github.com/commaai/commavq/"

GitHub Events

Total
  • Watch event: 2
  • Fork event: 1
Last Year
  • Watch event: 2
  • Fork event: 1

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 2
  • Total Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Rishikesh Jadhav 9****v 2

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

requirements.txt pypi
  • datasets ==2.15.0
  • torch ==2.2.2
  • tqdm *