https://github.com/baoblei/story-adapter

A Training-free Iterative Framework for Long Story Visualization

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
○
codemeta.json file
○
.zenodo.json file
○
DOI references
✓
Academic publication links
Links to: arxiv.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (9.1%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

A Training-free Iterative Framework for Long Story Visualization

Basic Info

Host: GitHub
Owner: baoblei
License: mit
Language: Python
Default Branch: main
Homepage: https://jwmao1.github.io/storyadapter/
Size: 261 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Fork of UCSC-VLAA/story-adapter

Created over 1 year ago · Last pushed over 1 year ago

https://github.com/baoblei/story-adapter/blob/main/


  




# Story-Adapter: A Training-free Iterative Framework for Long Story Visualization


  




Code for the paper [Story-Adapter: A Training-free Iterative Framework for Long Story Visualization](https://arxiv.org/abs/2410.06244)

Note: This code base is still not complete. 

### About this repo:

The repository contains the official implementation of "Story-Adapter".

## Introduction 

> Story visualization, the task of generating coherent images based on a narrative, has seen significant advancements with the emergence of text-to-image models, particularly diffusion models. However, maintaining semantic consistency, generating high-quality fine-grained interactions, and ensuring computational feasibility remain challenging, especially in long story visualization (_i.e._, up to 100 frames). In this work, we propose a training-free and computationally efficient framework, termed **Story-Adapter**, to enhance the generative capability of long stories. Specifically, we propose an _iterative_ paradigm to refine each generated image, leveraging both the text prompt and all generated images from the previous iteration. Central to our framework is a training-free global reference cross-attention module, which aggregates all generated images from the previous iteration to preserve semantic consistency across the entire story, while minimizing computational costs with global embeddings. This iterative process progressively optimizes image generation by repeatedly incorporating text constraints, resulting in more precise and fine-grained interactions. Extensive experiments validate the superiority of Story-Adapter in improving both semantic consistency and generative capability for fine-grained interactions, particularly in long story scenarios.







## News 
* **2024.10.10**: [Paper](https://arxiv.org/abs/2410.06244) is released on ArXiv.
* **2024.10.04**: Code released.

## Framework  

> Story-Adapter framework. Illustration of the proposed iterative paradigm, which consists of initialization, iterations in Story-Adapter, and implementation of Global Reference Cross-Attention (GRCA).
Story-Adapter first visualizes each image only based on the text prompt of the story and uses all results as reference images for the future round. 
In the iterative paradigm, Story-Adapter inserts GRCA into SD. For the ith iteration of each image visualization, GRCA will aggregate the information flow of all reference images during the denoising process through cross-attention.
All results from this iteration will be used as a reference image to guide the dynamic update of the story visualization in the next iteration.







## Quick Start 

### Installation
The project is built with Python 3.10.14, PyTorch 2.2.2. CUDA 12.1, cuDNN 8.9.02
For installing, follow these instructions:
~~~
# git clone this repository
git clone https://github.com/UCSC-VLAA/story-adapter.git
cd story-adapter

# create new anaconda env
conda create -n StoryAdapter python=3.10
conda activate StoryAdapter 

# install packages
pip install -r requirements.txt
~~~

### Download the checkpoint
- downloading [RealVisXL_V4.0](https://huggingface.co/SG161222/RealVisXL_V4.0/tree/main) put it into "./RealVisXL_V4.0"
- downloading [clip_image_encoder](https://huggingface.co/h94/IP-Adapter/tree/main/sdxl_models/image_encoder) put it into "./IP-Adapter/sdxl_models/image_encoder"
- downloading [ip-adapter_sdxl](https://huggingface.co/h94/IP-Adapter/resolve/main/sdxl_models/ip-adapter_sdxl.bin?download=true) put it into "./IP-Adapter/sdxl_models/ip-adapter_sdxl.bin"

### Running Demo

~~~
python run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path//IP-Adapter/sdxl_models/ip-adapter_sdxl.bin 
~~~

### Customized Running

~~~
python run.py --base_model_path your_path/RealVisXL_V4.0 --image_encoder_path your_path/IP-Adapter/sdxl_models/image_encoder --ip_ckpt your_path//IP-Adapter/sdxl_models/ip-adapter_sdxl.bin 
--story [your story] 
~~~

## Performance 

### Regular-length Story Visualization 
- downloading the [StorySalon](https://huggingface.co/datasets/haoningwu/StorySalon/resolve/main/testset.zip?download=true) test set."

| GIF1 | GIF2  | GIF3  |
| --- | --- | --- |
|   |  |   |

| GIF4 | GIF5  | GIF6  |
| --- | --- | --- |
|   |  |   |

| GIF7 | GIF8  | GIF9  |
| --- | --- | --- |
|   |  |   |


### Long Story Visualization 


















## Acknowledgement 

Deeply appreciate these wonderful open source projects: [stablediffusion](https://github.com/Stability-AI/StableDiffusion), [clip](https://github.com/openai/CLIP), [ip-adapter](https://github.com/tencent-ailab/IP-Adapter), [storygen](https://github.com/haoningwu3639/StoryGen), [storydiffusion](https://github.com/HVision-NKU/StoryDiffusion), [theatergen](https://github.com/donahowe/TheaterGen), [timm](https://github.com/huggingface/pytorch-image-models).

## Citation 

If you find this repository useful, please consider giving a star  and citation :

```
@misc{mao2024story_adapter,
  title={{Story-Adapter: A Training-free Iterative Framework for Long Story Visualization}},
  author={Mao, Jiawei and Huang, Xiaoke and Xie, Yunfei and Chang, Yuanqi and Hui, Mude and Xu, Bingjie and Zhou, Yuyin},
  journal={arXiv},
  volume={abs/2410.06244},
  year={2024},
}
```