https://github.com/bytedance/1d-tokenizer
This repo contains the code for 1D tokenizer and generator
Science Score: 46.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to arxiv.org)
- ✓ Committers with academic emails (1 of 5 committers, 20.0%, from academic institutions)
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 8.4%, to scientific vocabulary)
Repository
Basic Info
Statistics
- Stars: 884
- Watchers: 20
- Forks: 46
- Open Issues: 41
- Releases: 0
Metadata Files
README.md
1D Visual Tokenization and Generation
This repo hosts the code and models for the following projects:
FlowTok: FlowTok: Flowing Seamlessly Across Text and Image Tokens
TA-TiTok & MaskGen: Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
TiTok: An Image is Worth 32 Tokens for Reconstruction and Generation
Updates
- 03/16/2025: The tech report of FlowTok is available. FlowTok is a minimal yet powerful framework that seamlessly flows across text and images by encoding images into a compact 1D token representation. Code will be released soon.
- 02/24/2025: We release the training code, inference code and model weights of MaskGen.
- 01/17/2025: We release the training code, inference code and model weights of TA-TiTok.
- 01/14/2025: The tech report of TA-TiTok and MaskGen is available. TA-TiTok is a text-aware, transformer-based 1D tokenizer designed to handle both discrete and continuous tokens. MaskGen is a powerful and efficient text-to-image masked generative model trained exclusively on open data. For more details, refer to the README_MaskGen.
- 11/04/2024: We release the tech report and code for RAR models.
- 10/16/2024: We update the TiTok tokenizer weights with an updated single-stage training recipe, leading to easier training and better performance. We release weights for different model sizes of both the VQ and VAE variants of TiTok, which we hope will facilitate research in this area. More details are available in the tech report of TA-TiTok.
- 09/25/2024: TiTok is accepted by NeurIPS 2024.
- 09/11/2024: Release the training code for the generator based on TiTok.
- 08/28/2024: Release the training code for TiTok.
- 08/09/2024: Better support for loading pretrained weights from Hugging Face models; thanks to @NielsRogge for the help!
- 07/03/2024: Evaluation scripts for reproducing the results reported in the paper, along with checkpoints of TiTok-B64 and TiTok-S128, are available.
- 06/21/2024: Demo code and TiTok-L-32 checkpoints released.
- 06/11/2024: The tech report of TiTok is available.
Short Intro on Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens (README)
We introduce TA-TiTok, a novel text-aware transformer-based 1D tokenizer designed to handle both discrete and continuous tokens while effectively aligning reconstructions with textual descriptions. Building on TA-TiTok, we present MaskGen, a versatile text-to-image masked generative model framework. Trained exclusively on open data, MaskGen demonstrates outstanding performance: with 32 continuous tokens, it achieves a FID score of 6.53 on MJHQ-30K, and with 128 discrete tokens, it attains an overall score of 0.57 on GenEval.
See more details at README_MaskGen.
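Masked generative models like MaskGen decode tokens in parallel over a few refinement steps rather than one token at a time. The loop below is a toy, dependency-free sketch of that iterative unmasking idea (a random "predictor" stands in for the real text-conditioned transformer; all names and the keep schedule are illustrative, not the repo's API):

```python
import random

def masked_generate(seq_len, steps, predict):
    """Iteratively fill a fully masked token sequence.

    `predict` maps the current (partially masked) sequence and a position
    to a (token, confidence) guess; here it stands in for the model.
    """
    MASK = None
    tokens = [MASK] * seq_len
    for step in range(steps):
        # Ask the model for a guess at every still-masked position.
        guesses = {i: predict(tokens, i)
                   for i, t in enumerate(tokens) if t is MASK}
        if not guesses:
            break
        # Commit only the most confident guesses; re-mask the rest.
        # The fraction kept grows each step (real models often use a
        # cosine schedule); the final step commits everything.
        keep = max(1, round(len(guesses) * (step + 1) / steps))
        ranked = sorted(guesses.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (tok, _conf) in ranked[:keep]:
            tokens[i] = tok
    return tokens

def toy_predict(tokens, i):
    # Random stand-in: a real model returns logits conditioned on the
    # text prompt and the already-committed tokens.
    return random.randrange(16), random.random()

out = masked_generate(seq_len=32, steps=8, predict=toy_predict)
```

With only 32 tokens to fill, a handful of such refinement steps replaces hundreds of sequential autoregressive steps, which is where much of MaskGen's efficiency comes from.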
Short Intro on Randomized Autoregressive Visual Generation (README)
RAR is an autoregressive (AR) image generator fully compatible with language modeling. It introduces a randomness annealing strategy with a permuted objective at no additional cost, which enhances the model's ability to learn bidirectional contexts while leaving the autoregressive framework intact. RAR achieves a FID score of 1.48 on the ImageNet-256 benchmark, demonstrating state-of-the-art performance and significantly outperforming prior AR image generators.
See more details at README_RAR.
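The randomness-annealing idea can be sketched in a few lines: early in training, the autoregressive target order is a random permutation of the token positions, and the probability of permuting is annealed to zero so the model ends training on the standard raster order. A toy illustration (the linear schedule and all names are illustrative, not the repo's implementation):

```python
import random

def target_order(num_tokens, step, total_steps, anneal_end=0.75, rng=random):
    """Return the token order used for the AR objective at this training step.

    With probability p, annealed linearly from 1 to 0 over the first
    `anneal_end` fraction of training, the order is a random permutation;
    otherwise it is the standard raster order.
    """
    progress = min(step / (total_steps * anneal_end), 1.0)
    p_permute = 1.0 - progress
    order = list(range(num_tokens))
    if rng.random() < p_permute:
        rng.shuffle(order)
    return order

# Early in training the order is (almost always) permuted, exposing the
# model to bidirectional context; at the end it is always raster order,
# so standard AR sampling applies unchanged.
early = target_order(16, step=0, total_steps=1000)
late = target_order(16, step=999, total_steps=1000)
```

Because the permutation only reorders the prediction targets, no extra parameters or inference-time cost are introduced.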
Short Intro on An Image is Worth 32 Tokens for Reconstruction and Generation (README)
We present a compact 1D tokenizer that can represent an image with as few as 32 discrete tokens. This leads to a substantial speed-up in sampling (e.g., 410× faster than DiT-XL/2) while maintaining competitive generation quality.
See more details at README_TiTok.
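The token-count arithmetic behind that speed-up can be made concrete. Conceptually, a TiTok-style tokenizer appends a small set of learned latent tokens to the patch sequence, runs a transformer encoder, and keeps only the latent positions as the image representation. A minimal sketch of the resulting shapes (default sizes assumed from a 256×256 image with 16×16 patches; names are illustrative, not the repo's API):

```python
def titok_shapes(image_size=256, patch_size=16, num_latent_tokens=32):
    """Token counts for a 2D patch grid vs. a TiTok-style 1D latent.

    A standard 2D tokenizer keeps one token per patch; a TiTok-style
    tokenizer feeds patches plus learned latent tokens through an
    encoder and keeps ONLY the latent positions.
    """
    patch_tokens = (image_size // patch_size) ** 2    # 16 x 16 = 256
    encoder_input = patch_tokens + num_latent_tokens  # patches + latents
    latent_tokens = num_latent_tokens                 # what the decoder sees
    return patch_tokens, encoder_input, latent_tokens

patches, enc_in, latents = titok_shapes()
# A generator sampling 32 tokens instead of 256 takes 8x fewer steps,
# and each step attends over a far shorter sequence, so the savings
# compound under quadratic attention cost.
```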
Installation
```shell
pip3 install -r requirements.txt
```
Citing
If you use our work in your research, please cite it using the following BibTeX entries.
```bibtex
@article{he2025flowtok,
  author  = {Ju He and Qihang Yu and Qihao Liu and Liang-Chieh Chen},
  title   = {FlowTok: Flowing Seamlessly Across Text and Image Tokens},
  journal = {arXiv preprint arXiv:2503.10772},
  year    = {2025}
}

@article{kim2025democratizing,
  author  = {Dongwon Kim and Ju He and Qihang Yu and Chenglin Yang and Xiaohui Shen and Suha Kwak and Liang-Chieh Chen},
  title   = {Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens},
  journal = {arXiv preprint arXiv:2501.07730},
  year    = {2025}
}

@article{yu2024randomized,
  author  = {Qihang Yu and Ju He and Xueqing Deng and Xiaohui Shen and Liang-Chieh Chen},
  title   = {Randomized Autoregressive Visual Generation},
  journal = {arXiv preprint arXiv:2411.00776},
  year    = {2024}
}

@article{yu2024an,
  author  = {Qihang Yu and Mark Weber and Xueqing Deng and Xiaohui Shen and Daniel Cremers and Liang-Chieh Chen},
  title   = {An Image is Worth 32 Tokens for Reconstruction and Generation},
  journal = {NeurIPS},
  year    = {2024}
}
```
Acknowledgement
Owner
- Name: Bytedance Inc.
- Login: bytedance
- Kind: organization
- Location: Singapore
- Website: https://opensource.bytedance.com
- Twitter: ByteDanceOSS
- Repositories: 255
- Profile: https://github.com/bytedance
GitHub Events
Total
- Issues event: 65
- Watch event: 493
- Issue comment event: 84
- Push event: 16
- Pull request event: 34
- Fork event: 44
Last Year
- Issues event: 65
- Watch event: 493
- Issue comment event: 84
- Push event: 16
- Pull request event: 34
- Fork event: 44
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Qihang Yu | y****o@g****m | 18 |
| cornettoyu | 1****u | 17 |
| TACJu | j****7@j****u | 13 |
| Niels | n****1@g****m | 6 |
| Ju He | j****1@b****m | 6 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 81
- Total pull requests: 46
- Average time to close issues: 17 days
- Average time to close pull requests: about 3 hours
- Total issue authors: 70
- Total pull request authors: 9
- Average comments per issue: 1.22
- Average comments per pull request: 0.11
- Merged pull requests: 36
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 58
- Pull requests: 39
- Average time to close issues: 8 days
- Average time to close pull requests: 38 minutes
- Issue authors: 48
- Pull request authors: 8
- Average comments per issue: 1.05
- Average comments per pull request: 0.0
- Merged pull requests: 29
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- RohollahHS (5)
- Sunnet314159 (2)
- jeasinema (2)
- localbetascreening (2)
- Peterande (2)
- Doctor-James (2)
- sorobedio (2)
- torahoang (1)
- Jason3900 (1)
- hp-l33 (1)
- Changlin-Lee (1)
- NilanEkanayake (1)
- juncongmoo (1)
- dongzhuoyao (1)
- Faded1022 (1)
Pull Request Authors
- TACJu (21)
- yucornetto (15)
- Yubel426 (2)
- NielsRogge (2)
- laiviet (2)
- QY-H00 (2)
- rand0musername (1)
- eltociear (1)
- AlienKevin (1)
Dependencies
- accelerate *
- einops *
- gdown *
- huggingface-hub *
- omegaconf *
- open_clip_torch *
- pillow *
- scipy *
- timm *
- torch >=2.0.0
- torchvision *
- transformers *