https://github.com/cyberagentailab/ralf

[CVPR24 Oral] Official repository for RALF: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, scholar.google
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

cvpr2024 generative-ai layout retrieval-augmented-generation
Last synced: 5 months ago

Repository

[CVPR24 Oral] Official repository for RALF: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Basic Info
Statistics
  • Stars: 69
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
cvpr2024 generative-ai layout retrieval-augmented-generation
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation

Daichi Horita¹, Naoto Inoue², Kotaro Kikuchi², Kota Yamaguchi², Kiyoharu Aizawa¹
¹The University of Tokyo, ²CyberAgent

CVPR 2024 (Oral)

[![arxiv paper](https://img.shields.io/badge/arxiv-paper-orange)](https://arxiv.org/abs/2311.13602) [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

Content-aware graphic layout generation aims to automatically arrange visual elements on given content, such as an e-commerce product image. This repository aims to provide an all-in-one package for content-aware layout generation. If you like this repository, please give it a star!

In this paper, we propose retrieval-augmented content-aware layout generation: we retrieve nearest-neighbor examples based on the input image and use them as a reference to augment the generation process.

Content

  • Setup
  • Dataset splits
  • Pre-processing Dataset
  • Training
  • Inference & Evaluation
  • Inference using a canvas

Overview of Benchmark

We provide not only our method (RALF / Autoreg Baseline) but also other state-of-the-art methods for content-aware layout generation. The following methods are included in this repository:

  • Autoreg Baseline [Horita+ CVPR24]
  • RALF [Horita+ CVPR24]
  • CGL-GAN [Zhou+ IJCAI22]
  • DS-GAN [Hsu+ CVPR23]
  • ICVT [Cao+ ACMMM22]
  • LayoutDM [Inoue+ CVPR23]
  • MaskGIT [Chang+ CVPR22]
  • VQDiffusion [Gu+ CVPR22]

Setup

We recommend using Docker to easily try our code.

1. Requirements

  • Python 3.9+
  • PyTorch 1.13.1

We recommend using Poetry (all settings and dependencies in pyproject.toml).

2. How to install

Local environment

  1. Install poetry (see the official docs).

```bash
curl -sSL https://install.python-poetry.org | python3 -
```

  2. Install dependencies (this may be slow).

```bash
poetry install
```

Docker environment

  1. Build a Docker image:

```bash
bash scripts/docker/build.sh
```

  2. Attach the container to your shell:

```bash
bash scripts/docker/exec.sh
```

  3. Install dependencies in the container:

```bash
poetry install
```

3. Setup global environment variables

Some variables should be set. Please create scripts/bin/setup.sh on your own. At least these three variables should be set. If you download the provided zip, please skip this setup.

```bash
DATA_ROOT="./cache/dataset"
```

Other variables may optionally be set (e.g., OMP_NUM_THREADS).
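In code, such variables are read as plain environment variables. A minimal sketch of how DATA_ROOT could be consumed; the fallback default and the `dataset_path` helper are our own illustration, not part of the repository:

```python
import os

# Read DATA_ROOT as exported by a setup.sh-style script; the fallback
# default mirrors the example above and is only an assumption.
DATA_ROOT = os.environ.get("DATA_ROOT", "./cache/dataset")

def dataset_path(*parts: str) -> str:
    """Join sub-paths under DATA_ROOT (hypothetical helper)."""
    return os.path.join(DATA_ROOT, *parts)
```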

4. Check Checkpoints and experimental results

The checkpoints and generated layouts of the Autoreg Baseline and our RALF for the unconstrained and constrained tasks are available on Google Drive or Microsoft OneDrive. After downloading, please run `unzip cache.zip` in this directory. Note that the file size is 13GB.

The cache directory contains:

  1. the preprocessed CGL dataset in cache/dataset
  2. the weights of the layout encoder and ResNet50 in cache/PRECOMPUTED_WEIGHT_DIR
  3. the pre-computed layout features of CGL in cache/eval_gt_features
  4. the relationships of elements for the relationship task in cache/pku_cgl_relationships_dic_using_canvas_sort_label_lexico.pt
  5. the checkpoints and evaluation results of both the Autoreg Baseline and our RALF in cache/training_logs

Dataset splits

Train / Test / Val / Real data splits

We preprocess the PKU and CGL datasets by partitioning the training set into validation and test subsets, as described in Section 4.1. The CGL dataset, as distributed, is already segmented into these divisions. For reproducibility, we provide the filenames of each split in the data_splits/splits/<DATASET_NAME> directory. We encourage the use of these predefined splits when conducting experiments based on our setting and comparing against our reported scores, such as those for CGL-GAN and DS-GAN.

IDs of retrieved samples

We use the training split as a retrieval source. For example, when RALF is trained on PKU, the training split of PKU is used for both training and evaluation. We provide the pre-computed correspondences obtained with DreamSim [Fu+ NeurIPS23] in data_splits/retrieval/<DATASET_NAME>. The data follow the structure below:

```yaml
FILENAME:
  - FILENAME top1
  - FILENAME top2
  ...
  - FILENAME top16
```

You can load an image from <IMAGE_ROOT>/<FILENAME>.png.
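Once parsed, this mapping is just a dict from a query filename to its ranked neighbor filenames. A minimal sketch of resolving retrieved neighbors to image paths; the filenames, `IMAGE_ROOT` value, and helper name below are hypothetical:

```python
import os

# Hypothetical parsed retrieval table: query filename -> ranked neighbor filenames.
retrieval = {
    "poster_0001": ["poster_0913", "poster_0207", "poster_1450"],
}
IMAGE_ROOT = "./cache/dataset/images"  # stands in for <IMAGE_ROOT>

def neighbor_image_paths(query: str, k: int = 2) -> list:
    """Resolve the top-k retrieved neighbors of `query` to image paths."""
    return [os.path.join(IMAGE_ROOT, f"{name}.png") for name in retrieval[query][:k]]
```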

Pre-processing Dataset

We highly recommend pre-processing the datasets so that you can run your experiments as quickly as possible!
Each script can be used for processing both PKU and CGL by specifying `--dataset_type (pku|cgl)`.

Dataset setup

Folder names in parentheses will be generated by this pipeline.

```
<DATASET_ROOT>
| - annotation
|    (for PKU)
|    | - train_csv_9973.csv
|    | - test_csv_905.csv
|    (for CGL)
|    | - layout_train_6w_fixed_v2.json
|    | - layout_test_6w_fixed_v2.json
|    | - yinhe.json
| - image
|    | - train
|    |    | - original: image with layout elements
|    |    | - (input): image without layout elements (by inpainting)
|    |    | - (saliency)
|    |    | - (saliency_sub)
|    | - test
|    |    | - input: image without layout elements
|    |    | - (saliency)
|    |    | - (saliency_sub)
```

test_csv_905.csv can be downloaded from [Google Drive](https://drive.google.com/file/d/19BIHOdOzVPBqf26SZY0hu1bImIYlRqVd/view?usp=sharing).
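Before running the pipeline, it can help to verify that the required annotation files are in place. A minimal sketch for the CGL case, derived from the tree above; the `missing_annotations` helper is our own illustration:

```python
from pathlib import Path

# Required CGL annotation files, per the directory tree above.
CGL_ANNOTATIONS = [
    "annotation/layout_train_6w_fixed_v2.json",
    "annotation/layout_test_6w_fixed_v2.json",
    "annotation/yinhe.json",
]

def missing_annotations(dataset_root: str) -> list:
    """Return relative paths of required annotation files missing under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in CGL_ANNOTATIONS if not (root / rel).is_file()]
```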

Image inpainting

```bash
poetry run python image2layout/hfds_builder/inpainting.py --dataset_root <DATASET_ROOT>
```

Saliency detection

```bash
poetry run python image2layout/hfds_builder/saliency_detection.py --input_dir <INPUT_DIR> --output_dir <OUTPUT_DIR> (--algorithm (isnet|basnet))
```

Aggregate data and dump to HFDS

```bash
poetry run python image2layout/hfds_builder/dump_dataset.py --dataset_root <DATASET_ROOT> --output_dir <OUTPUT_DIR>
```

Training

Tips

configs/<METHOD>_<DATASET>.sh contains the hyperparameters and settings for each method and dataset. Please refer to the file for details. In particular, please check whether debugging mode is enabled, i.e., DEBUG=True or DEBUG=False.

Autoreg Baseline with CGL

Please run

```bash
bash scripts/train/autoreg_cgl.sh <GPU_ID>
```

To run both training and evaluation, please run

```bash
bash scripts/run_job/end_to_end.sh <GPU_ID e.g. 0> autoreg cgl <TASK_NAME>
```

where <TASK_NAME> indicates the unconstrained and constrained tasks. Please refer to the task list below:

  1. uncond: Unconstrained generation
  2. c: Category → Size + Position
  3. cwh: Category + Size → Position
  4. partial: Completion
  5. refinement: Refinement
  6. relation: Relationship

RALF with CGL

This setting uses the dataset with inpainting.

Please run

```bash
bash scripts/train/ralf_cgl.sh <GPU_ID>
```

To run both training and evaluation, please run

```bash
bash scripts/run_job/end_to_end.sh <GPU_ID e.g. 0> ralf cgl
```

Other methods

For example, these scripts are helpful. end_to_end.sh is a wrapper script for training, inference, and evaluation.

```bash
# DS-GAN with CGL dataset
bash scripts/run_job/end_to_end.sh 0 dsgan cgl uncond

# LayoutDM with CGL dataset
bash scripts/run_job/end_to_end.sh 2 layoutdm cgl uncond

# CGL-GAN + Retrieval Augmentation with CGL dataset
bash scripts/run_job/end_to_end.sh 2 cglgan_ra cgl uncond
```

Inference & Evaluation

Experimental results are provided in cache/training_logs. For example, the directory autoreg_c_cgl, which contains the results of the Autoreg Baseline on the Category → Size + Position task, includes:

  1. test_<SEED>.pkl: the generated layouts
  2. layout_test_<SEED>.png: the rendered layouts, in which the top sample is the ground truth and the bottom sample is a predicted sample
  3. gen_final_model.pt: the final checkpoint
  4. scores_test.tex: summarized quantitative results
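The test_<SEED>.pkl files can be opened with the standard `pickle` module. A minimal sketch; the record layout in `example` is a hypothetical illustration, not the repository's actual schema:

```python
import pickle

def load_layouts(path: str):
    """Load generated layouts from a test_<SEED>.pkl file."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical layout record illustrating a round-trip; the real files
# store the repository's own records, whose exact fields may differ.
example = [{"label": ["text", "logo"], "center_x": [0.5, 0.2]}]
```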

Annotated split

Please see and run

```bash
bash scripts/eval_inference/eval_inference.sh <GPU_ID> <JOB_DIR> <COND_TYPE> cgl
```

For example,

```bash
# Autoreg Baseline with unconstrained generation
bash scripts/eval_inference/eval_inference.sh 0 "cache/training_logs/autoreg_uncond_cgl" uncond cgl
```

Unannotated split

This setting uses the dataset with the real canvas, i.e., no inpainting.

Please see and run

```bash
bash scripts/eval_inference/eval_inference_all.sh <GPU_ID>
```

Inference using a canvas

Please run

```bash
bash scripts/run_job/inference_single_data.sh <GPU_ID> <JOB_DIR> cgl <SAMPLE_ID>
```

where SAMPLE_ID can optionally be set as a dataset index.

For example,

```bash
bash scripts/run_job/inference_single_data.sh 0 "./cache/training_logs/ralf_uncond_cgl" cgl
```

Inference using your personal data

Please customize image2layout/train/inference_single_data.py to load your data.

Citation

If you find our work useful in your research, please consider citing:

```bibtex
@inproceedings{horita2024retrievalaugmented,
  title={{Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation}},
  author={Daichi Horita and Naoto Inoue and Kotaro Kikuchi and Kota Yamaguchi and Kiyoharu Aizawa},
  booktitle={CVPR},
  year={2024}
}
```

Owner

  • Name: CyberAgent AI Lab
  • Login: CyberAgentAILab
  • Kind: organization
  • Location: Japan

GitHub Events

Total
  • Issues event: 8
  • Watch event: 36
  • Issue comment event: 4
  • Fork event: 4
Last Year
  • Issues event: 8
  • Watch event: 36
  • Issue comment event: 4
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 5
  • Total pull request authors: 0
  • Average comments per issue: 0.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 0.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • shunk031 (2)
  • AtsukiOsanai (2)
  • KamiEbieSan (1)
  • deadsmither5 (1)
  • szlou-meta (1)
  • trouble-maker007 (1)
  • theKinsley (1)
  • lijiaqi (1)
  • SpadgerBoy (1)
  • Jackieam (1)
  • hyer (1)
  • yangtao2019yt (1)
  • HenryQUQ (1)
  • szh0808 (1)
  • wd1511 (1)
Pull Request Authors
  • UdonDa (3)
  • naoto0804 (1)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

Dockerfile docker
  • nvidia/cuda 11.8.0-cudnn8-devel-ubuntu20.04 build
pyproject.toml pypi
  • ipykernel ^6.23.3 develop
  • ipython <8.12.1 develop
  • pysen ^0.10.4 develop
  • pytest ^7.4.0 develop
  • datasets ^2.13.0
  • dreamsim ^0.1.3
  • einops ^0.6.1
  • faiss-cpu ^1.7.4
  • gcsfs ^2023.1.0
  • gdown ^4.7.1
  • hydra-core ^1.3.2
  • multiprocess >=0.70.12
  • omegaconf ^2.3.0
  • opencv-python ^4.8.0.74
  • pillow ^9.5.0
  • prdc ^0.2
  • protobuf <=3.20.3
  • python >=3.9,<3.11
  • python-json-logger ^2.0.7
  • pytorch-fid ^0.3.0
  • pyyaml ^6.0.1
  • rich ^13.5.2
  • scipy <=1.10.1
  • seaborn ^0.12.2
  • setuptools ^68.0.0
  • tensorboard ^2.13.0
  • tensorflow ^2.12.0
  • tensorflow-datasets ^4.9.2
  • timm ^0.9.5
  • torch 1.13.1
  • torch-tb-profiler ^0.4.1
  • torchvision 0.14.1