https://github.com/cyberagentailab/type-r
Repository
Basic Info
- Host: GitHub
- Owner: CyberAgentAILab
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 5.05 MB
Statistics
- Stars: 5
- Watchers: 0
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Type-R: Automatically Retouching Typos for Text-to-Image Generation
Wataru Shimoda1
Naoto Inoue1
Daichi Haraguchi1
Hayato Mitani2
Seiichi Uchida2
Kota Yamaguchi1
1CyberAgent, 2Kyushu University
Accepted to CVPR 2025 as a highlight paper

This repository is the official implementation of the paper "Type-R: Automatically Retouching Typos for Text-to-Image Generation".
Pipeline
The implementation of Type-R in this repository consists of a three-step pipeline:
- Text-to-image generation: generates images from prompts.
- Layout correction: refines the layout by detecting errors, erasing text, and regenerating the layout.
- Typo correction: renders corrected raster text using a text editing model with OCR-based verification.
The pipeline is designed to be plug-and-play, with each module configured using Hydra.
All configuration files are located in src/type_r_app/config.
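As a rough illustration of the plug-and-play design, the three steps can be thought of as interchangeable callables applied in sequence. The sketch below is not the repository's API: `Sample`, `Step`, `run_pipeline`, and the three placeholder functions are hypothetical names, and each placeholder only records that it ran instead of calling a real model.

```python
from typing import Callable

# Hypothetical types: a "sample" is a plain dict carrying the prompt and intermediate results.
Sample = dict
Step = Callable[[Sample], Sample]


def text_to_image(sample: Sample) -> Sample:
    # Placeholder for step 1: a real module would generate an image from the prompt.
    return {**sample, "history": sample.get("history", []) + ["t2i"]}


def layout_correction(sample: Sample) -> Sample:
    # Placeholder for step 2: detect errors, erase text, regenerate the layout.
    return {**sample, "history": sample.get("history", []) + ["layout_correction"]}


def typo_correction(sample: Sample) -> Sample:
    # Placeholder for step 3: render corrected text and verify it with OCR.
    return {**sample, "history": sample.get("history", []) + ["typo_correction"]}


def run_pipeline(sample: Sample, steps: list[Step]) -> Sample:
    # Apply each step in order; swapping a module means swapping one callable.
    for step in steps:
        sample = step(sample)
    return sample


result = run_pipeline(
    {"prompt": 'A poster that says "Hello"'},
    [text_to_image, layout_correction, typo_correction],
)
print(result["history"])
```

In the actual project the choice of module for each step is made through the Hydra configuration files rather than by editing a Python list.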

Requirements
📘 Environment
We verified reproducibility in the following environment:
- Ubuntu 24.04
- Python 3.12
- CUDA 12.6
- PyTorch 2.7.0
- uv 0.7.6
📘 Install
This project manages its Python runtime with uv.
It depends on several packages that require heavy compilation, such as Apex, MaskTextSpotterV3, DeepSolo, and Detectron2.
It also assumes that the environment includes a GPU and CUDA support.
If your system does not have CUDA installed, you can install the required CUDA components using the following commands:
bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
Then, install the required build tools using the command below:
bash
apt-mark unhold $(apt-mark showhold)
apt update
apt -y install \
libfontconfig1 \
libglib2.0-0 \
cuda-nvcc-12-6 \
cuda-profiler-api-12-6 \
libcusparse-dev-12-6 \
libcublas-dev-12-6 \
libcusolver-dev-12-6 \
python3-dev \
libgl1
For more details, see the Dockerfile.
⚠️ The above command assumes that CUDA 12.6 is already installed.
If you're using a different CUDA version, replacing 12-6 with the appropriate version number should work.
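To make the version substitution concrete, here is a small helper that rewrites the CUDA-specific package names from the apt command above for another version. It is only a sketch: `cuda_packages` is a hypothetical helper, and whether each package exists for a given CUDA version in your apt repository is an assumption you should check.

```python
# CUDA-specific package names from the apt command above, with the version suffix templated.
CUDA_PACKAGE_TEMPLATES = [
    "cuda-nvcc-{v}",
    "cuda-profiler-api-{v}",
    "libcusparse-dev-{v}",
    "libcublas-dev-{v}",
    "libcusolver-dev-{v}",
]


def cuda_packages(version: str) -> list[str]:
    # Convert "12.6" to the "12-6" suffix format used in the package names.
    suffix = version.replace(".", "-")
    return [template.format(v=suffix) for template in CUDA_PACKAGE_TEMPLATES]


# Print a space-separated list that could be appended to "apt -y install".
print(" ".join(cuda_packages("12.4")))
```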
Once the build dependencies are installed, run the following command:
bash
git clone --recursive https://github.com/CyberAgentAILab/Type-R
cd Type-R
./script/apply_patch.sh
uv sync --extra full
⚠️ uv sync may take up to 30 minutes because it builds some dependencies. If it completes instantly, your environment might be misconfigured. In that case, refer to the Dockerfile, or try building within a Docker container.
You may omit the --extra full option to reduce dependencies if you do not run the evaluation pipeline.
⚠️ This project uses a namespace package, which is currently incompatible with editable installs. Be sure to pass the --no-editable option to uv when syncing dependencies.
To reset the applied patch:
bash
./script/clean_patch.sh
📘 Data resources
We provide the data resources via Hugging Face Datasets. You can download them using the following command:
bash
uv run python tools/dl_resources.py
Or, add the --full option to download all resources:
bash
uv run python tools/dl_resources.py --full
These resources include pretrained model weights and font files, plus the MarioEval benchmark dataset when --full is specified.
⚠️ Some resources with stricter licenses must be downloaded manually. Please refer to the link for details.
📘 GPU resources
Type-R requires different machine specs for each step:
- Text-to-image generation: requires a large amount of VRAM (more than an A100 40GB GPU), especially when using Flux.
  ⚠️ The run_on_low_vram_gpus option in src/type_r_app/config/t2i/flux.yaml allows the model to run on an L4 machine, but inference may take a few minutes.
- Layout correction: relatively lightweight in computational cost compared to the other steps.
- Typo correction: requires a GPU with L4-level specifications when using AnyText.
📘 Permissions of text-to-image models
Flux requires authentication of your Hugging Face profile in order to download model files.
Please see their model card for more information.
You must authenticate your Hugging Face account before running the text-to-image models in the text-to-image generation step by executing:
bash
uv run huggingface-cli login
Usage
📘 Type-R
🔹 Demo
Type-R is designed to be plug-and-play, and module selection is managed via Hydra configuration.
We provide a convenient script to try Type-R using a sample prompt.
To run the demo (configured via src/type_r_app/config/demo.yaml):
bash
bash script/demo.sh
- Default output directory: results/demo
- Input prompts are read from resources/prompt/example.txt
- Prompts should be separated by line breaks, with renderable text enclosed in double quotes (")
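The prompt file format described above (one prompt per line, renderable text in double quotes) can be parsed with a short regular expression. This is a minimal sketch for illustration, not the repository's parser; `extract_render_texts` and `parse_prompt_file` are hypothetical names, and it assumes straight double quotes with no escaping.

```python
import re


def extract_render_texts(prompt: str) -> list[str]:
    # Return every substring enclosed in straight double quotes, in order of appearance.
    return re.findall(r'"([^"]*)"', prompt)


def parse_prompt_file(content: str) -> list[tuple[str, list[str]]]:
    # One prompt per non-empty line, paired with the texts that should be rendered.
    prompts = [line.strip() for line in content.splitlines() if line.strip()]
    return [(prompt, extract_render_texts(prompt)) for prompt in prompts]


example = 'A cafe sign reading "Open" and "Fresh Coffee"\nA birthday card with "Happy Birthday"\n'
parsed = parse_prompt_file(example)
for prompt, texts in parsed:
    print(texts)
```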
🔹 Mario-Eval Benchmark (Trial version)
A script is also provided for running Type-R on the Mario-Eval benchmark using only components with permissive licenses and no paid APIs.
bash
bash script/marioevalbench_trial.sh
- Config file: src/type_r_app/config/marioevalbench_trial.yaml
- Output directory: results/marioevalbench_trial
- Prompt data (including GPT-4o augmented versions) is provided in: resources/data/marioevalbench/hfds
This script is configured to process a subset of 10 images for the ablation study in the MarioEval benchmark.
See src/type_r_app/config/dataset/marioeval_trial.yaml
🔹 Mario-Eval Benchmark (Best configuration)
This configuration achieves the best results reported in the paper. It uses an external model with a non-commercial license and accesses a paid API.
bash
bash script/marioevalbench_best.sh
- Config file: src/type_r_app/config/marioevalbench_best.yaml
- Output directory: results/marioevalbench_best
⚠️ Layout correction assumes that the OpenAI API is used. See the OpenAI API configuration section for the required settings.
To use Azure OpenAI instead, set use_azure: true in src/type_r_app/config/marioevalbench_best.yaml.
This script is configured to process the subset of 500 images used for the ablation study in the MarioEval benchmark.
See src/type_r_app/config/dataset/marioeval.yaml
To run the test set of the MarioEval benchmark, set sub_set: test in src/type_r_app/config/dataset/marioeval.yaml.
Please note that this will process 5,000 images.
📘 Evaluation
We provide evaluation scripts in this repository. To run the evaluation scripts on images generated with the best setting:
bash
uv run python -m type_r_app --config-name marioevalbench_best command=evaluation
- You can change the evaluation target by editing the YAML config.
- By default, evaluation includes: VLM evaluation, OCR accuracy, FID score, and CLIPScore.
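For readers unfamiliar with OCR-based scoring, the sketch below shows one simple way a word-level OCR accuracy could be computed: the fraction of target words that an OCR model reads back from the image. This is an illustrative assumption, not the repository's metric, which may be defined differently; `word_accuracy` is a hypothetical helper.

```python
from collections import Counter


def word_accuracy(target_words: list[str], ocr_words: list[str]) -> float:
    # Fraction of target words found in the OCR output.
    # Matching is case-insensitive and each OCR word can match at most one target word.
    if not target_words:
        return 0.0  # Convention chosen for this sketch: no targets means no credit.
    remaining = Counter(word.lower() for word in ocr_words)
    hits = 0
    for word in target_words:
        key = word.lower()
        if remaining[key] > 0:
            remaining[key] -= 1
            hits += 1
    return hits / len(target_words)


# "Birthday" is misread as "Birthdya", so only one of the two target words matches.
score = word_accuracy(["Happy", "Birthday"], ["happy", "Birthdya"])
print(score)
```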
VLM evaluation options:
- VLM evaluation requires a paid API.
- By default, the system evaluates graphic design quality using rating_design_quality.
- To evaluate other criteria, modify the evaluation field in src/type_r_app/config/evaluation.yaml.
⚠️ The VLM evaluation assumes that the OpenAI API is used. See the OpenAI API configuration section for the required settings.
To use Azure OpenAI instead, set use_azure: true in src/type_r_app/config/evaluation.yaml.
📘 Prompt augmentation
We provide both the data and the code for prompt augmentation. This process requires a paid API.
bash
uv run python -m type_r_app --config-name demo command=prompt-augmentation
- Input: resources/prompt/example.txt
- Output: prompt/augmented.txt under the configured results directory
- Optionally, HFDS format output is also supported (see src/type_r_app/launcher/prompt_augmentation.py)
⚠️ Prompt augmentation assumes that the OpenAI API is used. See the OpenAI API configuration section for the required settings.
To use Azure OpenAI instead, set use_azure: true in src/type_r_app/config/prompt_augmentation.yaml.
📘 OpenAI API configuration
This repository manages the configuration of the OpenAI API via environment variables.
Please set the following variable:
- OPENAI_API_KEY
To use the Azure OpenAI API instead, please configure the following environment variables accordingly:
- OPENAI_API_VERSION
- AZURE_OPENAI_DEPLOYMENT_NAME
- AZURE_OPENAI_GPT4_DEPLOYMENT_NAME
- AZURE_OPENAI_ENDPOINT
- AZURE_OPENAI_API_KEY
Note that we have verified only the basic functionality of the Azure OpenAI API.
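Before launching a step that calls the API, it can help to check that the variables listed above are set. The following is a minimal sketch of such a check; `missing_vars` is a hypothetical helper, not part of the repository, and it only tests for presence, not validity.

```python
import os

# Variable names taken from the lists above.
OPENAI_VARS = ["OPENAI_API_KEY"]
AZURE_VARS = [
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_GPT4_DEPLOYMENT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
]


def missing_vars(use_azure: bool, env=None) -> list[str]:
    # Return the names of required variables that are unset or empty.
    env = os.environ if env is None else env
    required = AZURE_VARS if use_azure else OPENAI_VARS
    return [name for name in required if not env.get(name)]


# Report what is missing in the current shell for each backend.
print("OpenAI:", missing_vars(use_azure=False))
print("Azure OpenAI:", missing_vars(use_azure=True))
```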
📘 Result
We assume the output directory is as follows:
results/
├── refimg # T2I-generated images
├── layoutcorrectedimg # Images with surplus text removed
├── typocorrectedimg # Final output
├── wordmapping # JSON files with OT-based mapping
└── evaluation # Evaluation results
To convert the results into an Excel file for easier viewing:
bash
uv run python tools/result2xlsx.py
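To confirm that a run produced the output layout listed in this section, the subdirectories can be checked with a few lines of Python. This is a sketch only: `summarize_results` is a hypothetical helper, and the subdirectory names are copied from the listing in this section, so adjust `EXPECTED_SUBDIRS` if the names on disk differ.

```python
from pathlib import Path

# Subdirectory names as they appear in the results listing in this section.
EXPECTED_SUBDIRS = ["refimg", "layoutcorrectedimg", "typocorrectedimg", "wordmapping", "evaluation"]


def summarize_results(root: Path) -> dict[str, int]:
    # Map each expected subdirectory to its file count; -1 marks a missing directory.
    summary = {}
    for name in EXPECTED_SUBDIRS:
        subdir = root / name
        if subdir.is_dir():
            summary[name] = sum(1 for path in subdir.iterdir() if path.is_file())
        else:
            summary[name] = -1
    return summary


# Example: inspect the default results directory relative to the current working directory.
print(summarize_results(Path("results")))
```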
📘 Test
To run the tests:
bash
uv run pytest tests --gpufunc
License
This project is licensed under the Apache License 2.0.
See LICENSE for details.
Third-party licenses
This project depends on the following third-party libraries/components, each of which has its own license:
OCR-related projects
- Deepsolo — Licensed under Adelaidet
- MaskTextSpotterV3 — Licensed under CC BY-NC 4.0
- Apex — Licensed under BSD 3-Clause
- CRAFT — Licensed under MIT License
- MaskRCNN Benchmark — Licensed under MIT License
- Clova Recognition — Licensed under Apache 2.0
- Detectron2 — Licensed under Apache 2.0
- Hi-SAM — Licensed under Apache 2.0
- Paddle — Licensed under Apache 2.0
Text editor
- AnyText — Licensed under Apache 2.0
- UDiffText — Licensed under MIT License
Text remover
- Lama — Licensed under Apache 2.0
- Garnet — Licensed under Apache 2.0
Evaluation metrics
- CLIP score — Licensed under MIT License
- Pytorch FID — Licensed under Apache 2.0
- VLMEval — Licensed under Apache 2.0
Data
- Mario-Eval Benchmark — Licensed under MIT License
No license projects
Our repository does not contain code from the following repositories due to the absence of a license.
Please obtain the code and weights from the following links.
- CLIP4str — Licensed under N/A
- Mostel — Licensed under N/A
- TextCtrl — Licensed under N/A
Citation
If you find this code useful for your research, please cite our paper:
@inproceedings{shimoda2025typer,
title={{Type-R: Automatically Retouching Typos for Text-to-Image Generation}},
author={Wataru Shimoda and Naoto Inoue and Daichi Haraguchi and Hayato Mitani and Seiichi Uchida and Kota Yamaguchi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025},
}
Owner
- Name: CyberAgent AI Lab
- Login: CyberAgentAILab
- Kind: organization
- Location: Japan
- Website: https://cyberagent.ai/ailab/
- Twitter: cyberagent_ai
- Repositories: 7
- Profile: https://github.com/CyberAgentAILab
GitHub Events
Total
- Issues event: 1
- Watch event: 8
- Delete event: 4
- Push event: 5
- Public event: 1
- Pull request review event: 6
- Pull request review comment event: 1
- Pull request event: 9
- Fork event: 1
- Create event: 1
Last Year
- Issues event: 1
- Watch event: 8
- Delete event: 4
- Push event: 5
- Public event: 1
- Pull request review event: 6
- Pull request review comment event: 1
- Pull request event: 9
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- AlexKuoTW (1)
Pull Request Authors
- kyamagu (5)
Dependencies
- ubuntu 24.04 build
- levenshtein >=0.27.1
- loguru >=0.7.3
- numpy >=2.2.3
- openai >=1.65.5
- pillow >=10.4.0
- pydantic >=2.10.6
- skia-python >=87.6
- type-r-core *
- type-r-eraser *
- numpy >=2.2.3
- opencv-python >=4.6.0.66
- pydantic >=2.10.6
- scikit-image >=0.25.2
- simplejson >=3.20.1
- sortedcontainers >=2.4.0
- torch *
- diffusers >=0.32.2
- easydict >=1.13
- einops >=0.8.1
- fsspec >=2024.2.0
- modelscope >=1.23.2
- numpy >=2.2.3
- open-clip-torch >=2.31.0
- pillow >=10.4.0
- pytorch-lightning >=2.5.0.post0
- safetensors >=0.5.3
- skia-python >=87.6
- torch *
- transformers >=4.49.0
- type-r-core *
- ujson >=5.10.0
- numpy >=2.2.3
- opencv-python >=4.6.0.66
- pillow >=10.4.0
- torch *
- type-r-core *
- clip *
- datasets >=3.3.2,<3.4.0
- jsonlines >=4.0.0
- pandas >=2.2.3
- pydantic >=2.10.6
- scikit-learn >=1.6.1
- type-r-ocr *
- vlmeval *
- DeepSolo *
- MaskTextSpotterV3 *
- addict >=2.4.0
- craft-text-detector *
- einops >=0.8.1
- modelscope <1.24.0
- natsort >=8.4.0
- opencv-python >=4.6.0.66
- paddleocr >=2.9.0,<3.0.0
- paddlepaddle >=2.6.2
- pillow >=10.4.0
- pyclipper >=1.3.0.post6
- pytorch-lightning >=2.5.0.post0
- shapely >=2.0.7
- torch *
- torchvision *
- transformers >=4.49.0
- type-r-core *
- accelerate >=0.34.2
- diffusers >=0.32.2
- google-genai >=1.0.0
- loguru >=0.7.3
- openai >=1.65.5
- pillow >=10.4.0
- requests >=2.32.3
- torch >=2.1.2
- 307 dependencies