https://github.com/cyberagentailab/ralf
[CVPR24 Oral] Official repository for RALF: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Science Score: 23.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org, scholar.google)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 11.7%, to scientific vocabulary)
Keywords
Repository
[CVPR24 Oral] Official repository for RALF: Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2311.13602
- Size: 19.2 MB
Statistics
- Stars: 69
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita¹, Naoto Inoue², Kotaro Kikuchi², Kota Yamaguchi², Kiyoharu Aizawa¹
¹The University of Tokyo, ²CyberAgent
CVPR 2024 (Oral)
[arXiv](https://arxiv.org/abs/2311.13602)
[License: Apache-2.0](https://opensource.org/licenses/Apache-2.0)
Content-aware graphic layout generation aims to automatically arrange visual elements alongside given content, such as an e-commerce product image. This repository aims to provide an all-in-one package for content-aware layout generation. If you like this repository, please give it a star!
In this paper, we propose retrieval-augmented content-aware layout generation. We retrieve nearest-neighbor examples based on the input image and use them as a reference to augment the generation process.
Content
- Setup
- Dataset splits
- Pre-processing Dataset
- Training
- Inference & Evaluation
- Inference using a canvas
Overview of Benchmark
We provide not only our method (RALF / Autoreg Baseline) but also other state-of-the-art methods for content-aware layout generation. The following methods are included in this repository:
- Autoreg Baseline [Horita+ CVPR24]
- RALF [Horita+ CVPR24]
- CGL-GAN [Zhou+ IJCAI22]
- DS-GAN [Hsu+ CVPR23]
- ICVT [Cao+ ACMMM22]
- LayoutDM [Inoue+ CVPR23]
- MaskGIT [Chang+ CVPR22]
- VQDiffusion [Gu+ CVPR22]
Setup
We recommend using Docker to easily try our code.
1. Requirements
- Python 3.9+
- PyTorch 1.13.1
We recommend using Poetry (all settings and dependencies in pyproject.toml).
2. How to install
Local environment
- Install Poetry (see the official docs).
```bash
curl -sSL https://install.python-poetry.org | python3 -
```
- Install dependencies (this may take a while).
```bash
poetry install
```
Docker environment
Build a Docker image.
```bash
bash scripts/docker/build.sh
```
Attach the container to your shell.
```bash
bash scripts/docker/exec.sh
```
Install dependencies in the container.
```bash
poetry install
```
3. Setup global environment variables
Some variables should be set. Please create scripts/bin/setup.sh on your own. At least these three variables should be set. If you downloaded the provided zip, please skip this setup.
```bash
DATA_ROOT="./cache/dataset"
```
Some variables may optionally be set (e.g., OMP_NUM_THREADS).
4. Check Checkpoints and experimental results
The checkpoints and generated layouts of the Autoreg Baseline and our RALF for the unconstrained and constrained tasks are available on Google Drive or Microsoft OneDrive.
After downloading, please run `unzip cache.zip` in this directory.
Note that the file size is 13 GB.
The `cache` directory contains:
1. the preprocessed CGL dataset in cache/dataset.
2. the weights of the layout encoder and ResNet50 in cache/PRECOMPUTED_WEIGHT_DIR.
3. the pre-computed layout feature of CGL in cache/eval_gt_features.
4. the relationship of elements for a relationship task in cache/pku_cgl_relationships_dic_using_canvas_sort_label_lexico.pt.
5. the checkpoints and evaluation results of both the Autoreg Baseline and our RALF in cache/training_logs.
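As a sanity check after unzipping, a small script can verify that the cache entries listed above are present. This helper is hypothetical (not part of the repository); the entry names come from the list above:

```python
from pathlib import Path

# Expected entries inside the unzipped `cache` directory (from the list above).
EXPECTED_CACHE_ENTRIES = [
    "dataset",
    "PRECOMPUTED_WEIGHT_DIR",
    "eval_gt_features",
    "pku_cgl_relationships_dic_using_canvas_sort_label_lexico.pt",
    "training_logs",
]

def missing_cache_entries(cache_root):
    """Return the expected cache entries that are absent under cache_root."""
    root = Path(cache_root)
    return [name for name in EXPECTED_CACHE_ENTRIES if not (root / name).exists()]
```

For example, `missing_cache_entries("./cache")` should return an empty list after a successful unzip.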
Dataset splits
Train / Test / Val / Real data splits
We preprocess the PKU and CGL datasets by partitioning the training set into validation and test subsets, as detailed in Section 4.1.
The CGL dataset, as distributed, is already segmented into these splits.
To replicate our results, we provide the filenames in the data_splits/splits/<DATASET_NAME> directory.
We encourage using these predefined splits when running experiments under our setting and comparing against our reported scores, such as those for CGL-GAN and DS-GAN.
IDs of retrieved samples
We use the training split as the retrieval source. For example, when RALF is trained on PKU, the training split of PKU is used for training and evaluation.
We provide the pre-computed correspondences obtained with DreamSim [Fu+ NeurIPS23] in data_splits/retrieval/<DATASET_NAME>. The data structure is as follows:
```yaml
FILENAME:
  - FILENAME top1
  - FILENAME top2
  ...
  - FILENAME top16
```
You can load an image from <IMAGE_ROOT>/<FILENAME>.png.
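As an illustration, each correspondence file can be treated as a mapping from a query filename to its ranked neighbors. The sketch below uses a toy in-memory mapping (the helper and sample names are hypothetical) to resolve the top-k retrieved filenames to image paths under <IMAGE_ROOT>:

```python
# Toy correspondence mapping mimicking the YAML structure above
# (query filename -> neighbors ranked top1, top2, ...; truncated here).
CORRESPONDENCES = {
    "query_001": ["neighbor_a", "neighbor_b", "neighbor_c"],
}

def retrieved_image_paths(image_root, filename, k=16):
    """Resolve the top-k retrieved filenames to <IMAGE_ROOT>/<FILENAME>.png paths."""
    neighbors = CORRESPONDENCES[filename][:k]
    return [f"{image_root}/{name}.png" for name in neighbors]
```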
Pre-processing Dataset
We highly recommend pre-processing the datasets, since it lets you run your experiments much faster!
Each script can process both PKU and CGL by specifying --dataset_type (pku|cgl).
Dataset setup
Folder names in parentheses will be generated by this pipeline.
```
<DATASET_ROOT>
| - annotation
| | (for PKU)
| | - train_csv_9973.csv
| | - test_csv_905.csv (https://drive.google.com/file/d/19BIHOdOzVPBqf26SZY0hu1bImIYlRqVd/view?usp=sharing)
| | (for CGL)
| | - layout_train_6w_fixed_v2.json
| | - layout_test_6w_fixed_v2.json
| | - yinhe.json
| - image
| | - train
| | | - original: image with layout elements
| | | - (input): image without layout elements (by inpainting)
| | | - (saliency)
| | | - (saliency_sub)
| | - test
| | | - input: image without layout elements
| | | - (saliency)
| | | - (saliency_sub)
```
Image inpainting
```bash
poetry run python image2layout/hfds_builder/inpainting.py --dataset_root <DATASET_ROOT>
```
Saliency detection
```bash
poetry run python image2layout/hfds_builder/saliency_detection.py --input_dir <INPUT_DIR> --output_dir <OUTPUT_DIR> (--algorithm (isnet|basnet))
```
Aggregate data and dump to HFDS
```bash
poetry run python image2layout/hfds_builder/dump_dataset.py --dataset_root <DATASET_ROOT> --output_dir <OUTPUT_DIR>
```
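The three steps above run in order: inpainting → saliency detection → HFDS dump. As a sketch, the commands can be assembled programmatically, e.g. to drive both datasets from one launcher script. The helper function is hypothetical; the script paths and flags come from the commands above:

```python
def preprocess_commands(dataset_root, input_dir, output_dir, algorithm="isnet"):
    """Assemble the three preprocessing commands in pipeline order:
    inpainting -> saliency detection -> HFDS dump."""
    prefix = ["poetry", "run", "python"]
    return [
        prefix + ["image2layout/hfds_builder/inpainting.py",
                  "--dataset_root", dataset_root],
        prefix + ["image2layout/hfds_builder/saliency_detection.py",
                  "--input_dir", input_dir, "--output_dir", output_dir,
                  "--algorithm", algorithm],
        prefix + ["image2layout/hfds_builder/dump_dataset.py",
                  "--dataset_root", dataset_root, "--output_dir", output_dir],
    ]
```

Each returned list can be handed to `subprocess.run` in sequence.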
Training
Tips
configs/<METHOD>_<DATASET>.sh contains the hyperparameters and settings for each method and dataset. Please refer to that file for details.
In particular, please check whether the debugging mode is set via DEBUG=True or DEBUG=False.
Autoreg Baseline with CGL
Please run
```bash
bash scripts/train/autoreg_cgl.sh <GPU_ID>
```
If you want to run both training and evaluation, please run
```bash
bash scripts/run_job/end_to_end.sh <GPU_ID e.g. 0> autoreg cgl <TASK_NAME>
```
where TASK_NAME indicates one of the unconstrained and constrained tasks.
Please refer to the task list below:
1. `uncond`: Unconstrained generation
2. `c`: Category → Size + Position
3. `cwh`: Category + Size → Position
4. `partial`: Completion
5. `refinement`: Refinement
6. `relation`: Relationship
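For scripting convenience, the task identifiers above can be captured in a small mapping and validated before launching a job. This helper is hypothetical (not part of the repository); the identifiers and descriptions come from the task list above:

```python
# Task identifiers accepted by the end-to-end script (from the list above).
TASKS = {
    "uncond": "Unconstrained generation",
    "c": "Category -> Size + Position",
    "cwh": "Category + Size -> Position",
    "partial": "Completion",
    "refinement": "Refinement",
    "relation": "Relationship",
}

def validate_task(name):
    """Raise a helpful error for an unknown task identifier."""
    if name not in TASKS:
        raise ValueError(f"unknown task {name!r}; choose from {sorted(TASKS)}")
    return name
```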
RALF with CGL
This uses the dataset with inpainting.
Please run
```bash
bash scripts/train/ralf_cgl.sh <GPU_ID>
```
If you want to run both training and evaluation, please run
```bash
bash scripts/run_job/end_to_end.sh <GPU_ID e.g. 0> ralf cgl
```
Other methods
For example, these scripts are helpful. end_to_end.sh is a wrapper script for training, inference, and evaluation.
```bash
DS-GAN with CGL dataset
bash scripts/runjob/endto_end.sh 0 dsgan cgl uncond
LayoutDM with CGL dataset
bash scripts/runjob/endto_end.sh 2 layoutdm cgl uncond
CGL-GAN + Retrieval Augmentation with CGL dataset
bash scripts/runjob/endtoend.sh 2 cglganra cgl uncond ```
Inference & Evaluation
Experimental results are provided in cache/training_logs. For example, the directory autoreg_c_cgl, which contains the results of the Autoreg Baseline on the Category → Size + Position task, includes:
1. test_<SEED>.pkl: the generated layouts
2. layout_test_<SEED>.png: the rendered layouts, in which the top sample is the ground truth and the bottom one is a predicted sample
3. gen_final_model.pt: the final checkpoint
4. scores_test.tex: summarized quantitative results
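The generated layouts in test_<SEED>.pkl can be inspected with a short script. The exact structure of the pickled object is not documented here, so the sketch below (a hypothetical helper) only loads and returns it:

```python
import pickle

def load_generated_layouts(path):
    """Load a pickled results file (e.g., test_<SEED>.pkl) and return its contents."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

For example, `print(type(load_generated_layouts("cache/training_logs/autoreg_c_cgl/test_0.pkl")))` shows the top-level container type.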
Annotated split
Please see and run
```bash
bash scripts/eval_inference/eval_inference.sh <GPU_ID> <JOB_DIR> <COND_TYPE> cgl
```
For example,
```bash
# Autoreg Baseline with unconstrained generation
bash scripts/eval_inference/eval_inference.sh 0 "cache/training_logs/autoreg_uncond_cgl" uncond cgl
```
Unannotated split
This uses the dataset with real canvases, i.e., without inpainting.
Please see and run
```bash
bash scripts/eval_inference/eval_inference_all.sh <GPU_ID>
```
Inference using a canvas
Please run
```bash
bash scripts/run_job/inference_single_data.sh <GPU_ID> <JOB_DIR> cgl <SAMPLE_ID>
```
where SAMPLE_ID can optionally be set to a dataset index.
For example,
```bash
bash scripts/run_job/inference_single_data.sh 0 "./cache/training_logs/ralf_uncond_cgl" cgl
```
Inference using your personal data
Please customize image2layout/train/inference_single_data.py to load your data.
Citation
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{horita2024retrievalaugmented,
  title={{Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation}},
  author={Daichi Horita and Naoto Inoue and Kotaro Kikuchi and Kota Yamaguchi and Kiyoharu Aizawa},
  booktitle={CVPR},
  year={2024}
}
```
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Issues event: 8
- Watch event: 36
- Issue comment event: 4
- Fork event: 4
Last Year
- Issues event: 8
- Watch event: 36
- Issue comment event: 4
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 5
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 5
- Total pull request authors: 0
- Average comments per issue: 0.4
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 5
- Pull request authors: 0
- Average comments per issue: 0.4
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shunk031 (2)
- AtsukiOsanai (2)
- KamiEbieSan (1)
- deadsmither5 (1)
- szlou-meta (1)
- trouble-maker007 (1)
- theKinsley (1)
- lijiaqi (1)
- SpadgerBoy (1)
- Jackieam (1)
- hyer (1)
- yangtao2019yt (1)
- HenryQUQ (1)
- szh0808 (1)
- wd1511 (1)
Pull Request Authors
- UdonDa (3)
- naoto0804 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- nvidia/cuda 11.8.0-cudnn8-devel-ubuntu20.04 build
- ipykernel ^6.23.3 develop
- ipython <8.12.1 develop
- pysen ^0.10.4 develop
- pytest ^7.4.0 develop
- datasets ^2.13.0
- dreamsim ^0.1.3
- einops ^0.6.1
- faiss-cpu ^1.7.4
- gcsfs ^2023.1.0
- gdown ^4.7.1
- hydra-core ^1.3.2
- multiprocess >=0.70.12
- omegaconf ^2.3.0
- opencv-python ^4.8.0.74
- pillow ^9.5.0
- prdc ^0.2
- protobuf <=3.20.3
- python >=3.9,<3.11
- python-json-logger ^2.0.7
- pytorch-fid ^0.3.0
- pyyaml ^6.0.1
- rich ^13.5.2
- scipy <=1.10.1
- seaborn ^0.12.2
- setuptools ^68.0.0
- tensorboard ^2.13.0
- tensorflow ^2.12.0
- tensorflow-datasets ^4.9.2
- timm ^0.9.5
- torch 1.13.1
- torch-tb-profiler ^0.4.1
- torchvision 0.14.1