Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

dreambooth image-editing image-manipulation stable-diffusion text-guided-image-editing text-guided-image-manipulation
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: witcherofresearch
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.62 MB
Statistics
  • Stars: 289
  • Watchers: 3
  • Forks: 21
  • Open Issues: 2
  • Releases: 0
Topics
dreambooth image-editing image-manipulation stable-diffusion text-guided-image-editing text-guided-image-manipulation
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

Forgedit: Text Guided Image Editing via Learning and Forgetting

This is the official implementation of Forgedit: Text Guided Image Editing via Learning and Forgetting.

Abstract

Text-guided image editing on real or synthetic images, given only the original image itself and the target text prompt as inputs, is a very general and challenging task. It requires an editing model to estimate by itself which part of the image should be edited, and then perform either rigid or non-rigid editing while preserving the characteristics of the original image. In this paper, we design a novel text-guided image editing method, named Forgedit. First, we propose a vision-language joint optimization framework capable of reconstructing the original image in 30 seconds, much faster than previous SOTA and with much less overfitting. Then we propose a novel vector projection mechanism in the text embedding space of Diffusion Models, which is capable of controlling the identity similarity and editing strength separately. Finally, we discover a general property of the UNet in Diffusion Models: the UNet encoder learns space and structure, while the UNet decoder learns appearance and identity. With such a property, we design forgetting mechanisms to successfully tackle the fatal and inevitable overfitting issues when fine-tuning Diffusion Models on one image, thus significantly boosting the editing capability of Diffusion Models. Our method, Forgedit, built on Stable Diffusion, achieves new state-of-the-art results on the challenging text-guided image editing benchmark TEdBench, surpassing previous SOTA methods such as Imagic with Imagen in terms of both CLIP score and LPIPS score.


Acknowledgement

This code is based on the Diffusers implementation of Imagic.

Installation

The code is tested with the environment in requirements.txt, in which training a model at 512x512 image resolution on one A100 GPU takes between 30 and 40 seconds. However, this is not the fastest setting: I previously tested this code on Alibaba Cloud with one A800 GPU in a default Docker environment, where training took exactly 30 seconds.
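Setting up the environment with pip should look like this (the exact install command is an assumption; the README does not spell out the setup step):

pip install -r requirements.txt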

TEdBench

The TEdBench benchmark, proposed in Imagic from Google Research, can be found at TEdBench. The complete editing results of our vanilla Forgedit on TEdBench can be found in the vanilla Forgedit tedbench repository. We did not use DreamBooth+Forgedit for the quantitative comparison on TEdBench, so results for DreamBooth+Forgedit are not provided.

Forgedit with Stable Diffusion

To reproduce our results on TEdBench, please use Stable Diffusion 1.4. To reproduce our results on visual storytelling, please use SG161222/Realistic_Vision_V6.0_B1_noVAE. The BLIP model is Salesforce/blip-image-captioning-base; please note that other BLIP variants may lead to inferior performance. This code release implements both vanilla Forgedit and DreamBoothForgedit.

vanilla Forgedit

For example, to reproduce the visual storytelling results, we train, save, and edit with the SG161222/Realistic_Vision_V6.0_B1_noVAE model via vanilla Forgedit: the text-embedding interpolation is vector subtraction, the forgetting strategy is encoderkv, the resolution is 768x768, and gamma ranges from 0.5 to 0.7. Please note that the gamma range could be different for different prompts. This training process takes more than one minute since the resolution is 768x768; if the resolution is set to 512x512, the training time should be 30 to 40 seconds on an A100.

accelerate launch src/sample_forgedit_batch_textencoder.py --train=True --edit=True --save=True --forget='encoderkv' --interpolation=vs --targeth=768 --targetw=768 --gammastart=5 --gammaend=8

The model will be saved to ./vanillaforgedit
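The --interpolation flag chooses how the optimized source text embedding is combined with the target prompt embedding: vs is vector subtraction and vp is vector projection. As a minimal sketch of what these two modes could look like (the function name, signature, and exact formulas below are assumptions, not the repo's code; see the pipelines and the paper for the real ones):

import torch

def interpolate_text_embeddings(e_opt, e_tgt, gamma, mode="vs"):
    # Hypothetical helper, not the repo's API. e_opt is the optimized
    # source text embedding (identity), e_tgt the target prompt embedding.
    if mode == "vs":
        # vector subtraction: move from the optimized embedding toward
        # the target direction as gamma grows
        return e_opt + gamma * (e_tgt - e_opt)
    if mode == "vp":
        # vector projection: split e_opt into a component parallel to
        # e_tgt and an orthogonal remainder, so identity similarity and
        # editing strength can be reweighted separately (as the paper
        # describes); the reweighting below is one plausible choice
        t = e_tgt.flatten()
        coeff = torch.dot(e_opt.flatten(), t) / torch.dot(t, t)
        orthogonal = e_opt - coeff * e_tgt  # identity component
        return e_tgt + gamma * orthogonal
    raise ValueError(f"unknown interpolation mode: {mode}")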

To edit the image with a saved model, we use the argument --loadfrom

accelerate launch src/sample_forgedit_batch_textencoder.py --train=False --edit=True --save=False --forget='encoderkv' --loadfrom='/mnt/bn/editdiffusion/Forgedit/vanillaforgedit/img=test.jpg_textsteps=400_bsz=10_unetlr=6e-05_textlr=0.001' --interpolation=vs --targeth=768 --targetw=768 --gammastart=5 --gammaend=8

One needs to change the imgurl and prompt in src/sample_forgedit_batch_textencoder.py.
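For reference, those variables might look something like this inside the script (the values below are hypothetical placeholders):

imgurl = "test.jpg"                  # path to the source image to edit
prompt = "a photo of a sitting dog"  # target editing prompt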

There are 7 typical forgetting strategies implemented in src/forgedit_stable_diffusion/pipeline_attention_parallel_bsz=1.py, which can be selected with the --forget argument. Please note that the default value is --forget='donotforget', which applies no forgetting strategy.
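Based on the paper's observation that the UNet encoder learns space and structure while the decoder learns appearance and identity, a forgetting strategy amounts to reverting a chosen subset of fine-tuned UNet weights to their pretrained values. A minimal sketch of the idea, assuming diffusers-style parameter names (the helper below is hypothetical, not the repo's API):

def apply_forgetting(unet, pretrained_state_dict, strategy="encoderkv"):
    # Hypothetical helper: restore selected fine-tuned UNet weights to
    # their pretrained values so the overfitted information is forgotten.
    def forget(name):
        if strategy == "donotforget":
            return False
        if strategy == "encoderkv":
            # revert the cross-attention key/value projections in the
            # encoder (down blocks): the encoder mainly stores space and
            # structure, so forgetting it frees the edit to change them
            # while the decoder keeps appearance and identity
            return "down_blocks" in name and (".to_k." in name or ".to_v." in name)
        return False  # only one of the 7 strategies is sketched here

    state = unet.state_dict()
    for name in state:
        if forget(name):
            state[name] = pretrained_state_dict[name].clone()
    unet.load_state_dict(state)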

More reproduction examples with complete hyperparameters on EditEval v1 can be found in this instruction.

DreamBoothForgedit

To fine-tune, save, and edit with DreamBoothForgedit using vector projection,

accelerate launch src/sample_dreambooth_batch_textencoder.py --save=True --interpolation=vp

To edit with saved editing models,

accelerate launch src/sample_dreambooth_batch_textencoder.py --train=False --interpolation=vp

Forgetting strategies are implemented in src/forgedit_stable_diffusion/pipeline_dreambooth_parallel_bsz=1_textencoder.py, and can be applied via the freezelist in sample_dreambooth_batch_textencoder.py.
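As a rough illustration, such a freeze list could be a collection of parameter-name substrings selecting which UNet weights to revert or freeze (the variable name and entries below are guesses, not the script's actual contents):

freezelist = ["down_blocks", "attn2.to_k", "attn2.to_v"]  # hypothetical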


Citation

Our paper can be found at Forgedit: Text Guided Image Editing via Learning and Forgetting (arXiv:2309.10556). If you use our work, please cite:

@article{zhang2023forgedit,
  title={Forgedit: Text Guided Image Editing via Learning and Forgetting},
  author={Zhang, Shiwen and Xiao, Shuai and Huang, Weilin},
  journal={arXiv preprint arXiv:2309.10556},
  year={2023}
}

Owner

  • Login: witcherofresearch
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zhang"
  given-names: "Shiwen"
title: "Pytorch implementation of Forgedit"
version: 1.0.0
date-released: 2024-07-01
url: "https://github.com/witcherofresearch/Forgedit"

GitHub Events

Total
  • Watch event: 13
  • Fork event: 2
Last Year
  • Watch event: 13
  • Fork event: 2

Dependencies

requirements.txt pypi
  • accelerate ==0.27.2
  • diffusers ==0.20.0
  • torch ==2.1.0
  • torchvision ==0.16.0
  • transformers ==4.38.2
  • xformers ==0.0.22