awesome-llms-meet-multimodal-generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
https://github.com/yingqinghe/awesome-llms-meet-multimodal-generation
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✅ CITATION.cff file: Found CITATION.cff file
- ✅ codemeta.json file: Found codemeta.json file
- ✅ .zenodo.json file: Found .zenodo.json file
- ✅ DOI references: Found 2 DOI reference(s) in README
- ✅ Academic publication links: Links to: arxiv.org, ieee.org
- ❌ Academic email domains
- ❌ Institutional organization owner
- ❌ JOSS paper metadata
- ❌ Scientific vocabulary similarity: Low similarity (7.0%) to scientific vocabulary
Statistics
- Stars: 506
- Watchers: 18
- Forks: 29
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Introduction
This repository contains a curated list of papers where LLMs meet multimodal generation. Modalities include visual (image, video and 3D) and audio (sound, speech and music).
We welcome contributions and suggestions to this repository, including additions of your own work. Feel free to open a pull request or leave a comment!
Contents
- Introduction
- Contents
- Tips
- Multimodal Generation
  - Image Generation
    - LLM-based
    - Non-LLM-based (Clip/T5)
    - Datasets
  - Video Generation
    - LLM-based
    - Non-LLM-based
    - Video VAE/Tokenizers
    - Audio-Video
    - Benchmarks
    - Datasets
  - 3D Generation
    - LLM-based
    - Non-LLM-based (Clip/T5)
    - Datasets
  - Audio Generation
    - LLM-based
    - Non-LLM-based
    - Datasets
  - Generation with Multiple Modalities
    - LLM-based
    - Non-LLM-based
- Multimodal Editing
- Multimodal Agents
- Multimodal Understanding with LLMs
- Multimodal LLM Safety
- Related Surveys
- Team
- Citation
- Star History
Tips
- Paper searching via catalogue: click an entry in the catalogue to jump to your research area and browse related papers.
- Paper searching via author name: press ctrl + F and type the author name to find papers by a specific author. The dropdown list of authors expands automatically when searching.
- Paper searching via tag: you can also find related papers by searching for the following tags: customization, iteractive, human motion generation, tokenizer. (More tags are coming.)
Multimodal Generation
Image Generation
LLM-based
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models (12 Feb 2025)
Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian, Hanrong Ye, Runtao Liu, Sergey Tulyakov, Kfir Aberman, Dan Xu
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning (18 Dec 2024)
Shengbang Tong, David Fan, Jiachen Zhu, Yunyang Xiong, Xinlei Chen, Koustuv Sinha, Michael Rabbat, Yann LeCun, Saining Xie, Zhuang Liu
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models (2 Dec 2024)
Zeyi Sun, Ziyang Chu, Pan Zhang, Tong Wu, Xiaoyi Dong, Yuhang Zang, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Cosmos Tokenizer: A suite of image and video neural tokenizers (6 Nov 2024)
Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu
Tags: tokenizer
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance (29 Oct 2024)
[ICLR 2025 Spotlight] Dongmin Park, Sebin Kim, Taehong Moon, Minkyu Kim, Kangwook Lee, Jaewoong Cho
ElasticTok: Adaptive Tokenization for Image and Video (10 Oct 2024)
Wilson Yan, Matei Zaharia, Volodymyr Mnih, Pieter Abbeel, Aleksandra Faust, Hao Liu
Tags: tokenizer
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation (10 Oct 2024)
Jiatao Gu, Yuyang Wang, Yizhe Zhang, Qihang Zhang, Dinghuai Zhang, Navdeep Jaitly, Josh Susskind, Shuangfei Zhai
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation (6 Sep 2024)
Yecheng Wu, Zhuoyang Zhang, Junyu Chen, Haotian Tang, Dacheng Li, Yunhao Fang, Ligeng Zhu, Enze Xie, Hongxu Yin, Li Yi, Song Han, Yao Lu
OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation (13 Jun 2024)
Junke Wang, Yi Jiang, Zehuan Yuan, Binyue Peng, Zuxuan Wu, Yu-Gang Jiang
Tags: tokenizer
InstantUnify: Integrates Multimodal LLM into Diffusion Models (Aug 2024)
Qixun Wang, Xu Bai, Rui Wang, Haofan Wang
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation (22 Aug 2024)
Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 Jun 2024)
Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text (11 Jun 2024)
[ACL 2024] Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
Open-World Human-Object Interaction Detection via Multi-modal Prompts (11 Jun 2024)
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? (11 Jun 2024)
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
An Image is Worth 32 Tokens for Reconstruction and Generation (11 Jun 2024)
Qihang Yu, Mark Weber, Xueqing Deng, Xiaohui Shen, Daniel Cremers, Liang-Chieh Chen
TRINS: Towards Multimodal Language Models that Can Read (10 Jun 2024)
[CVPR 2024] Ruiyi Zhang, Yanzhe Zhang, Jian Chen, Yufan Zhou, Jiuxiang Gu, Changyou Chen, Tong Sun
[LlamaGen] Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation (10 Jun 2024)
Peize Sun, Yi Jiang, Shoufa Chen, Shilong Zhang, Bingyue Peng, Ping Luo, Zehuan Yuan
Chameleon: Mixed-Modal Early-Fusion Foundation Models (16 May 2024)
Chameleon Team
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation (22 Apr 2024)
Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan
Graphic Design with Large Multimodal Model (22 Apr 2024)
Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao
PMG: Personalized Multimodal Generation with Large Language Models (7 Apr 2024)
Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao
MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control (19 Mar 2024)
Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment (8 Mar 2024)
Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis (30 Jan 2024)
Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan
Tags: tokenizer
DiffusionGPT: LLM-Driven Text-to-Image Generation System (18 Jan 2024)
Jie Qin, Jie Wu, Weifeng Chen, Yuxi Ren, Huixia Li, Hefeng Wu, Xuefeng Xiao, Rui Wang, Shilei Wen
StarVector: Generating Scalable Vector Graphics Code from Images (17 Dec 2023)
Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation (14 Dec 2023)
Jinguo Zhu, Xiaohan Ding, Yixiao Ge, Yuying Ge, Sijie Zhao, Hengshuang Zhao, Xiaohua Wang, Ying Shan
StoryGPT-V: Large Language Models as Consistent Story Visualizers (13 Dec 2023)
Xiaoqian Shen, Mohamed Elhoseiny
GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator (11 Dec 2023)
Henry Hengyuan Zhao, Pan Zhou, Mike Zheng Shou
Customization Assistant for Text-to-image Generation (5 Dec 2023)
Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Tong Sun
Tags: customization
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model (29 Nov 2023)
Xiaowei Chi, Yijiang Liu, Zhengkai Jiang, Rongyu Zhang, Ziyi Lin, Renrui Zhang, Peng Gao, Chaoyou Fu, Shanghang Zhang, Qifeng Liu, Yike Guo
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback (29 Nov 2023)
Jiao Sun, Deqing Fu, Yushi Hu, Su Wang, Royi Rassin, Da-Cheng Juan, Dana Alon, Charles Herrmann, Sjoerd van Steenkiste, Ranjay Krishna, Cyrus Rashtchian
COLE: A Hierarchical Generation Framework for Graphic Design (28 Nov 2023)
Peidong Jia, Chenxuan Li, Zeyu Liu, Yichao Shen, Xingru Chen, Yuhui Yuan, Yinglin Zheng, Dong Chen, Ji Li, Xiaodong Xie, Shanghang Zhang, Baining Guo
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering (28 Nov 2023)
Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
LLMGA: Multimodal Large Language Model based Generation Assistant (27 Nov 2023)
Bin Xia, Shiyin Wang, Yingfan Tao, Yitong Wang, Jiaya Jia
Self-correcting LLM-controlled Diffusion Models (27 Nov 2023)
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
[ParaDiffusion] Paragraph-to-Image Generation with Information-Enriched Diffusion Model (29 Nov 2023)
Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
Tokenize and Embed ALL for Multi-modal Large Language Models (8 Nov 2023)
Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou
Tags: tokenizer
WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models (20 Oct 2023)
Jun-Yan He, Zhi-Qi Cheng, Chenyang Li, Jingdong Sun, Wangmeng Xiang, Xianhui Lin, Xiaoyang Kang, Zengke Jin, Yusen Hu, Bin Luo, Yifeng Geng, Xuansong Xie, Jingren Zhou
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts (16 Oct 2023)
[ICLR 2024] Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka
Making Multimodal Generation Easier: When Diffusion Models Meet LLMs (13 Oct 2023)
Xiangyu Zhao, Bo Liu, Qijiong Liu, Guangyuan Shi, Xiao-Ming Wu
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation (12 Oct 2023)
Zhengyuan Yang, Jianfeng Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation (11 Oct 2023)
Jie An, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Lijuan Wang, Jiebo Luo
Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models (11 Oct 2023)
Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang
[DALL-E 3] Improving Image Generation with Better Captions
James Betker, Gabriel Goh, Li Jing, Tim Brooks, Jianfeng Wang, Linjie Li, Long Ouyang, Juntang Zhuang, Joyce Lee, Yufei Guo, Wesam Manassra, Prafulla Dhariwal, Casey Chu, Yunxin Jiao, Aditya Ramesh
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens (3 Oct 2023)
Kaizhi Zheng, Xuehai He, Xin Eric Wang
Making LLaMA SEE and Draw with SEED Tokenizer (2 Oct 2023)
Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan
Tags: tokenizer
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists (30 Sep 2023)
Yulu Gan, Sungwoo Park, Alexander Schubert, Anthony Philippakis, Ahmed M. Alaa
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (26 Sep 2023)
Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
Text-to-Image Generation for Abstract Concepts (26 Sep 2023)
Jiayi Liao, Xu Chen, Qiang Fu, Lun Du, Xiangnan He, Xiang Wang, Shi Han, Dongmei Zhang
DreamLLM: Synergistic Multimodal Comprehension and Creation (20 Sep 2023)
[ICLR 2024] Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, Hongyu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
SwitchGPT: Adapting Large Language Models for Non-Text Outputs (14 Sep 2023)
Xinyu Wang, Bohan Zhuang, Qi Wu
NExT-GPT: Any-to-Any Multimodal LLM (11 Sep 2023)
Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua
LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation (9 Aug 2023)
[ACM MM 2023] Leigang Qu, Shengqiong Wu, Hao Fei, Liqiang Nie, Tat-Seng Chua
Planting a SEED of Vision in Large Language Model (16 Jul 2023)
Yuying Ge, Yixiao Ge, Ziyun Zeng, Xintao Wang, Ying Shan
Generative Pretraining in Multimodality (11 Jul 2023)
Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs (30 Jun 2023)
[NeurIPS 2023 Spotlight] Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
Controllable Text-to-Image Generation with GPT-4 (29 May 2023)
Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang
Generating Images with Multimodal Language Models (26 May 2023)
[NeurIPS 2023] Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models (24 May 2023)
[NeurIPS 2023] Weixi Feng, Wanrong Zhu, Tsu-jui Fu, Varun Jampani, Arjun Akula, Xuehai He, Sugato Basu, Xin Eric Wang, William Yang Wang
Visual Programming for Text-to-Image Generation and Evaluation (24 May 2023)
[NeurIPS 2023] Jaemin Cho, Abhay Zala, Mohit Bansal
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (23 May 2023)
Long Lian, Boyi Li, Adam Yala, Trevor Darrell
Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration (22 May 2023)
Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation (18 May 2023)
[NeurIPS 2023] Yujie Lu, Xianjun Yang, Xiujun Li, Xin Eric Wang, William Yang Wang
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models (9 May 2023)
[ACM MM 2023] Shanshan Zhong, Zhongzhan Huang, Wushao Wen, Jinghui Qin, Liang Lin
Grounding Language Models to Images for Multimodal Inputs and Outputs (31 Jan 2023)
[ICML 2023] Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried
[RPG-DiffusionMaster] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (22 Jan 2024)
[ICML 2024] Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models (20 Feb 2024)
Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui
Non-LLM-based (Clip/T5)
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models (11 Nov 2024)
NVIDIA: Yuval Atzmon, Maciej Bala, Yogesh Balaji, Tiffany Cai, Yin Cui, Jiaojiao Fan, Yunhao Ge, Siddharth Gururani, Jacob Huffman, Ronald Isaac, Pooya Jannaty, Tero Karras, Grace Lam, J. P. Lewis, Aaron Licata, Yen-Chen Lin, Ming-Yu Liu, Qianli Ma, Arun Mallya, Ashlee Martino-Tarr, Doug Mendez, Seungjun Nah, Chris Pruett, Fitsum Reda, Jiaming Song, Ting-Chun Wang, Fangyin Wei, Xiaohui Zeng, Yu Zeng, Qinsheng Zhang
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation (3 Apr 2024)
Haofan Wang, Matteo Spinelli, Qixun Wang, Xu Bai, Zekui Qin, Anthony Chen
InstantID: Zero-shot Identity-Preserving Generation in Seconds (15 Jan 2024)
Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, Yao Hu
PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis (30 Sep 2023)
[ICLR 2024] Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, Zhenguo Li
TextDiffuser: Diffusion Models as Text Painters (18 May 2023)
[NeurIPS 2023] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
TiGAN: Text-Based Interactive Image Generation and Manipulation (Dec 2022)
[AAAI 2022] Yufan Zhou, Ruiyi Zhang, Jiuxiang Gu, Chris Tensmeyer, Tong Yu, Changyou Chen, Jinhui Xu, Tong Sun
Tags: iteractive
Multi-Concept Customization of Text-to-Image Diffusion (8 Dec 2022)
[CVPR 2023] Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu
Tags: customization
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation (25 Aug 2022)
[CVPR 2023] Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
Tags: customization
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion (2 Aug 2022)
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
Tags: customization
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (23 May 2022)
[NeurIPS 2022] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al.
High-Resolution Image Synthesis with Latent Diffusion Models (20 Dec 2021)
[CVPR 2022 (Oral)] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Datasets
MIMIC-IT: Multi-Modal In-Context Instruction Tuning (8 Jun 2023)
[NeurIPS 2023] Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Fanyi Pu, Jingkang Yang, Chunyuan Li, Ziwei Liu
[LAION-Glyph] GlyphControl: Glyph Conditional Control for Visual Text Generation (29 May 2023)
[NeurIPS 2023] Yukang Yang, Dongnan Gui, Yuhui Yuan, Weicong Liang, Haisong Ding, Han Hu, Kai Chen
[MARIO-10M] TextDiffuser: Diffusion Models as Text Painters (18 May 2023)
[NeurIPS 2023] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei
DataComp: In search of the next generation of multimodal datasets (27 Apr 2023)
[NeurIPS 2023] Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, Ludwig Schmidt
[LLaVA-Instruct] Visual Instruction Tuning (17 Apr 2023)
[NeurIPS 2023] Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text (14 Apr 2023)
[NeurIPS 2023] Wanrong Zhu, Jack Hessel, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi
Language Is Not All You Need: Aligning Perception with Language Models (27 Feb 2023)
[NeurIPS 2023] Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
LAION-5B: An open large-scale dataset for training next generation image-text models (16 Oct 2022)
[NeurIPS 2022] Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, Jenia Jitsev
LAION COCO: 600M SYNTHETIC CAPTIONS FROM LAION2B-EN (15 Sep 2022)
Christoph Schuhmann, Andreas Köpf, Theo Coombes, Richard Vencu, Benjamin Trom, Romain Beaumont
[M3W] Flamingo: a Visual Language Model for Few-Shot Learning (29 Apr 2022)
[NeurIPS 2022] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
[LAION-FACE] General Facial Representation Learning in a Visual-Linguistic Manner (6 Dec 2021)
[NeurIPS 2021] Yinglin Zheng, Hao Yang, Ting Zhang, Jianmin Bao, Dongdong Chen, Yangyu Huang, Lu Yuan, Dong Chen, Ming Zeng, Fang Wen
[LAION-400M] Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs (3 Nov 2021)
[NeurIPS 2021] Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki
WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning (2 Mar 2021)
[SIGIR 2021] Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts (17 Feb 2021)
[CVPR 2021] Soravit Changpinyo, Piyush Sharma, Nan Ding, Radu Soricut
[ALIGN] Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision (11 Feb 2021)
[ICML 2021] Chao Jia, Yinfei Yang, Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig
[MS COCO] Microsoft COCO: Common Objects in Context (1 May 2014)
[ECCV 2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, Piotr Dollár
[Im2Text] Describing Images Using 1 Million Captioned Photographs (12 Dec 2011)
[NeurIPS 2011] Vicente Ordonez, Girish Kulkarni, Tamara Berg
Video Generation
LLM-based
Loong: Generating Minute-level Long Videos with Autoregressive Language Models (3 Oct 2024)
Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu
Compositional 3D-aware Video Generation with LLM Director (31 Aug 2024)
Hanxin Zhu, Tianyu He, Anni Tang, Junliang Guo, Zhibo Chen, Jiang Bian
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation (19 Aug 2024)
[SIGGRAPH Asia 2024] Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang
[BSQ-ViT] Image and Video Tokenization with Binary Spherical Quantization (11 Jun 2024)
[Tech Report] Yue Zhao, Yuanjun Xiong, Philipp Krähenbühl
Tags: tokenizer
DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (11 Mar 2024)
Guosheng Zhao, Xiaofeng Wang, Zheng Zhu, Xinze Chen, Guan Huang, Xiaoyi Bao, Xingang Wang
[Sora] Video generation models as world simulators (15 Feb 2024)
Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, Clarence Ng, Ricky Wang, Aditya Ramesh
[LWM] World Model on Million-Length Video And Language With Blockwise RingAttention (13 Feb 2024)
Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel
[LGVI] Towards Language-Driven Video Inpainting via Multimodal Large Language Models (18 Jan 2024)
Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin, Zhicheng Sun, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu
Tags: tokenizer
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM (2 Jan 2024)
Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei
[PRO-Motion] Plan, Posture and Go: Towards Open-World Text-to-Motion Generation (22 Dec 2023)
Jinpeng Liu, Wenxun Dai, Chunyu Wang, Yiji Cheng, Yansong Tang, Xin Tong
VideoPoet: A Large Language Model for Zero-Shot Video Generation (21 Dec 2023)
Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam, Hassan Akbari, Yair Alon, Vighnesh Birodkar, Yong Cheng, Ming-Chang Chiu, Josh Dillon, Irfan Essa, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, David Ross, Grant Schindler, Mikhail Sirotenko, Kihyuk Sohn, Krishna Somandepalli, Huisheng Wang, Jimmy Yan, Ming-Hsuan Yang, Xuan Yang, Bryan Seybold, Lu Jiang
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax (27 Nov 2023)
[arXiv 2023] Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
InterControl: Generate Human Motion Interactions by Controlling Every Joint (27 Nov 2023)
Zhenzhi Wang, Jingbo Wang, Dahua Lin, Bo Dai
Tags: human motion generation
MotionLLM: Multimodal Motion-Language Learning with Large Language Models (27 May 2024)
Qi Wu, Yubo Zhao, Yifan Wang, Yu-Wing Tai, Chi-Keung Tang
Tags: general human motion generation
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (21 Nov 2023)
Jiaxi Lv, Yi Huang, Mingfu Yan, Jiancheng Huang, Jianzhuang Liu, Yifan Liu, Yafei Wen, Xiaoxin Chen, Shifeng Chen
[LVD] LLM-grounded Video Diffusion Models (29 Sep 2023)
Long Lian, Baifeng Shi, Adam Yala, Trevor Darrell, Boyi Li
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning (26 Sep 2023)
[arXiv 2023] Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator (25 Sep 2023)
[NeurIPS 2023] Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, Sibei Yang
[Dysen-VDM] Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models (26 Aug 2023)
[CVPR 2024] Hao Fei, Shengqiong Wu, Wei Ji, Hanwang Zhang, Tat-Seng Chua
[DirecT2V] Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation (23 May 2023)
[arXiv 2023] Susung Hong, Junyoung Seo, Sunghwan Hong, Heeseong Shin, Seungryong Kim
Text2Motion: From Natural Language Instructions to Feasible Plans (21 Mar 2023)
[Autonomous Robots 2023] Kevin Lin, Christopher Agia, Toki Migimatsu, Marco Pavone, Jeannette Bohg
Non-LLM-based
OSV: One Step is Enough for High-Quality Image to Video Generation (17 Sep 2024)
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang, et al.
Xiaofeng Mao, Zhengkai Jiang, Fu-Yun Wang, Wenbing Zhu, Jiangning Zhang, Hao Chen, Mingmin Chi, Yabiao Wang[PAB] Real-Time Video Generation with Pyramid Attention Broadcast (26 Jun 2024)
Xuanlei Zhao, Xiaolong Jin, Kai Wang, et al.
Xuanlei Zhao, Xiaolong Jin, Kai Wang, Yang YouVideo-Infinity: Distributed Long Video Generation (24 Jun 2024)
Zhenxiong Tan, Xingyi Yang, Songhua Liu, et al.
Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao WangPandora: Towards General World Model with Natural Language Actions and Video (12 Jun 2024)
Jiannan Xiang, Guangyi Liu, Yi Gu, et al.
Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
Text-Animator: Controllable Visual Text Video Generation (25 Jun 2024)
Lin Liu, Quande Liu, Shengju Qian, et al.
Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian
MotionBooth: Motion-Aware Customized Text-to-Video Generation (25 Jun 2024)
Jianzong Wu, Xiangtai Li, Yanhong Zeng, et al.
Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models (24 Jun 2024)
Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, et al.
Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model (22 Jun 2024)
Min Zhao, Hongzhou Zhu, Chendong Xiang, et al.
Min Zhao, Hongzhou Zhu, Chendong Xiang, Kaiwen Zheng, Chongxuan Li, Jun Zhu
Image Conductor: Precision Control for Interactive Video Synthesis (21 Jun 2024)
Yaowei Li, Xintao Wang, Zhaoyang Zhang, et al.
Yaowei Li, Xintao Wang, Zhaoyang Zhang, Zhouxia Wang, Ziyang Yuan, Liangbin Xie, Yuexian Zou, Ying Shan
VIDEOSCORE: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation (21 Jun 2024)
Xuan He, Dongfu Jiang, Ge Zhang, et al.
Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen
Dreamitate: Real-World Visuomotor Policy Learning via Video Generation (24 Jun 2024)
Junbang Liang, Ruoshi Liu, Ege Ozguroglu, et al.
Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick
[MCM] Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation (11 Jun 2024)
Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, et al.
Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang
Searching Priors Makes Text-to-Video Synthesis Better (5 Jun 2024)
Haoran Cheng, Liang Peng, Linxuan Xia, et al.
Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu
ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation (3 Jun 2024)
Shaoshu Yang, Yong Zhang, Xiaodong Cun, et al.
Shaoshu Yang, Yong Zhang, Xiaodong Cun, Ying Shan, Ran He
EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture (30 May 2024)
Sijie Zhao, Yong Zhang, Xiaodong Cun, et al.
Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan
[MOFT] Video Diffusion Models are Training-free Motion Interpreter and Controller (23 Mar 2024)
Zeqi Xiao, Yifan Zhou, Shuai Yang, et al.
Zeqi Xiao, Yifan Zhou, Shuai Yang, Xingang Pan
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text (21 Mar 2024)
Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, et al.
Roberto Henschel, Levon Khachatryan, Daniil Hayrapetyan, Hayk Poghosyan, Vahram Tadevosyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis (22 Feb 2024)
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, et al.
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models (17 Jan 2024)
Haoxin Chen, Yong Zhang, Xiaodong Cun, et al.
Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets (25 Nov 2023)
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, et al.
Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, Robin Rombach
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation (30 Oct 2023)
Haoxin Chen, Menghan Xia, Yingqing He, et al.
Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors (18 Oct 2023)
Jinbo Xing, Menghan Xia, Yong Zhang, et al.
Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling (23 Oct 2023)
Haonan Qiu, Menghan Xia, Yong Zhang, et al.
Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation (13 Jul 2023)
Yingqing He, Menghan Xia, Haoxin Chen, et al.
Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (1 Jun 2023)
Jinbo Xing, Menghan Xia, Yuxin Liu, et al.
Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos (3 Apr 2023)
Yue Ma, Yingqing He, Xiaodong Cun, et al.
Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Ying Shan, Xiu Li, Qifeng Chen
Real-time Controllable Denoising for Image and Video (29 Mar 2023)
[CVPR 2023] Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, et al.
Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, Xiaogang Wang, Ping Luo, Kaimo Lin, Jinwei Gu
VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation (15 Mar 2023)
Zhengxiong Luo, Dayou Chen, Yingya Zhang, et al.
Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan
Video VAE/Tokenizers
DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation (17 Feb 2025)
Zhihang Yuan, Siyuan Wang, Rui Xie, et al.
Zhihang Yuan, Siyuan Wang, Rui Xie, Hanling Zhang, Tongcheng Fang, Yuzhang Shang, Shengen Yan, Guohao Dai, Yu Wang
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE (23 Dec 2024)
Yazhou Xing, Yang Fei, Yingqing He, et al.
Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen
VidTwin: Video VAE with Decoupled Structure and Dynamics (23 Dec 2024)
Yuchi Wang, Junliang Guo, Xinyi Xie, et al.
Yuchi Wang, Junliang Guo, Xinyi Xie, Tianyu He, Xu Sun, Jiang Bian
VidTok: A Versatile and Open-Source Video Tokenizer (17 Dec 2024)
Anni Tang, Tianyu He, Junliang Guo, et al.
Anni Tang, Tianyu He, Junliang Guo, Xinle Cheng, Li Song, Jiang Bian
[CVPR 2025] WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model (26 Nov 2024)
Zongjian Li, Bin Lin, Yang Ye, et al.
Zongjian Li, Bin Lin, Yang Ye, Liuhan Chen, Xinhua Cheng, Shenghai Yuan, Li Yuan
[CVPR 2025] [IV-VAE] Improved Video VAE for Latent Video Diffusion Model (10 Nov 2024)
Pingyu Wu, Kai Zhu, Yu Liu, et al.
Pingyu Wu, Kai Zhu, Yu Liu, Liming Zhao, Wei Zhai, Yang Cao, Zheng-Jun Zha
[Tech Report] Cosmos Tokenizer: A suite of image and video neural tokenizers (6 Nov 2024)
Fitsum Reda, Jinwei Gu, Xian Liu, et al.
Fitsum Reda, Jinwei Gu, Xian Liu, Songwei Ge, Ting-Chun Wang, Haoxiang Wang, Ming-Yu Liu
[NeurIPS 2024] CV-VAE: A Compatible Video VAE for Latent Generative Video Models (30 May 2024)
Sijie Zhao, Yong Zhang, Xiaodong Cun, et al.
Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan
[ICLR 2024] [MAGVIT-v2] Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation (9 Oct 2023)
Lijun Yu, José Lezama, Nitesh B. Gundavarapu, et al.
Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang
Audio-Video
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization (30 Mar 2025)
Kai Liu, Wei Li, Lai Chen, et al.
Kai Liu, Wei Li, Lai Chen, Shengqiong Wu, Yanhao Zheng, Jiayi Ji, Fan Zhou, Rongxin Jiang, Jiebo Luo, Hao Fei, Tat-Seng Chua
[LVAS-Agent] Long-Video Audio Synthesis with Multi-Agent Collaboration (13 Mar 2025)
Yehang Zhang, Xinli Xu, Xiaojie Xu, et al.
Yehang Zhang, Xinli Xu, Xiaojie Xu, Li Liu, Yingcong Chen
UniForm: A Unified Diffusion Transformer for Audio-Video Generation (6 Feb 2025)
Lei Zhao, Linfeng Feng, Dongxu Ge, et al.
Lei Zhao, Linfeng Feng, Dongxu Ge, Fangqiu Yi, Chi Zhang, Xiao-Lei Zhang, Xuelong Li
TIA2V: Video generation conditioned on triple modalities of text-image-audio (4 Jan 2025)
Minglu Zhao, Wenmin Wang, Rui Zhang, et al.
Minglu Zhao, Wenmin Wang, Rui Zhang, Haomei Jia, Qi Chen
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation (18 Dec 2024)
Kazuki Shimada, Christian Simon, Takashi Shibuya, et al.
Kazuki Shimada, Christian Simon, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation (19 Dec 2024)
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, et al.
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Alper Canberk, Kwot Sin Lee, Vicente Ordonez, Sergey Tulyakov
SyncFlow: Temporally Aligned Joint Audio-Video Generation from Text (3 Dec 2024)
Haohe Liu, Gael Le Lan, Xinhao Mei, et al.
Haohe Liu, Gael Le Lan, Xinhao Mei, Zhaoheng Ni, Anurag Kumar, Varun Nagaraja, Wenwu Wang, Mark D. Plumbley, Yangyang Shi, Vikas Chandra
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation (26 Sep 2024)
Masato Ishii, Akio Hayakawa, Takashi Shibuya, et al.
Masato Ishii, Akio Hayakawa, Takashi Shibuya, Yuki Mitsufuji
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation (11 Jun 2024)
Kai Wang, Shijian Deng, Jing Shi, et al.
Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian
Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation (28 May 2024)
Akio Hayakawa, Masato Ishii, Takashi Shibuya, et al.
Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji
AudioScenic: Audio-Driven Video Scene Editing (25 Apr 2024)
Kaixin Shen, Ruijie Quan, Linchao Zhu, et al.
Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation (22 May 2024)
Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, et al.
Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model (25 Apr 2024)
Gehui Chen, Guan'an Wang, Xiaowen Huang, et al.
Gehui Chen, Guan'an Wang, Xiaowen Huang, Jitao Sang
TAVGBench: Benchmarking Text to Audible-Video Generation (22 Apr 2024)
Yuxin Mao, Xuyang Shen, Jing Zhang, et al.
Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai
[ECCV 2024 Oral] ASVA: Audio-Synchronized Visual Animation (8 Mar 2024)
Lin Zhang, Shentong Mo, Yijing Zhang, et al.
Lin Zhang, Shentong Mo, Yijing Zhang, Pedro Morgado
[CVPR 2024] Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners (27 Feb 2024)
Yazhou Xing, Yingqing He, Zeyue Tian, et al.
Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen
TräumerAI: Dreaming Music with StyleGAN (9 Feb 2021)
[NeurIPS Workshop 2020] Dasaem Jeong, Seungheon Doh, Taegyun Kwon
Dasaem Jeong, Seungheon Doh, Taegyun Kwon
Sound2Sight: Generating Visual Dynamics from Sound and Context (23 Jul 2020)
[ECCV 2020] Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja
Anoop Cherian, Moitreya Chatterjee, Narendra Ahuja
Benchmarks
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models (20 Nov 2024)
Ziqi Huang, Fan Zhang, Xiaojie Xu, et al.
Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
[VideoGen-Eval] The Dawn of Video Generation: Preliminary Explorations with SORA-like Models (7 Oct 2024)
Ailing Zeng, Yuhang Yang, Weidong Chen, et al.
Ailing Zeng, Yuhang Yang, Weidong Chen, Wei Liu
ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation (26 Jun 2024)
Shenghai Yuan, Jinfa Huang, Yongqi Xu, et al.
Shenghai Yuan, Jinfa Huang, Yongqi Xu, Yaoyang Liu, Shaofeng Zhang, Yujun Shi, Ruijie Zhu, Xinhua Cheng, Jiebo Luo, Li Yuan
TAVGBench: Benchmarking Text to Audible-Video Generation (22 Apr 2024)
Yuxin Mao, Xuyang Shen, Jing Zhang, et al.
Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai
Sora Generates Videos with Stunning Geometrical Consistency (27 Feb 2024)
Xuanyi Li, Daquan Zhou, Chenxu Zhang, et al.
Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng
[CVPR 2024 Highlight] VBench: Comprehensive Benchmark Suite for Video Generative Models (29 Nov 2023)
Ziqi Huang, Yinan He, Jiashuo Yu, et al.
Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu
[CVPR 2024] EvalCrafter: Benchmarking and Evaluating Large Video Generation Models (23 Mar 2024)
Yaofang Liu, Xiaodong Cun, Xuebo Liu, et al.
Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan
Datasets
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation (5 Aug 2024)
Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, et al.
Zhiyu Tan, Xiaomeng Yang, Luozheng Qin, Hao Li
Vript: A Video Is Worth Thousands of Words (10 Jun 2024)
[NeurIPS 2024 Datasets and Benchmarks Track] Dongjie Yang, Suyuan Huang, Chengqiang Lu, et al.
Dongjie Yang, Suyuan Huang, Chengqiang Lu, Xiaodong Han, Haoxin Zhang, Yan Gao, Yao Hu, Hai Zhao
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions (30 Jul 2024)
Xiaowei Chi, Yatian Wang, Aosong Cheng, et al.
Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation (13 Jul 2023)
[ICLR 2024 Spotlight] Yi Wang, Yinan He, Yizhuo Li, et al.
Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao
[HD-VG-130M] VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation (18 May 2023)
Wenjing Wang, Huan Yang, Zixi Tuo, et al.
Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu
[VideoCC3M] Learning Audio-Video Modalities from Image Captions (18 May 2023)
[ECCV 2022] Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, et al.
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, Cordelia Schmid
CelebV-Text: A Large-Scale Facial Text-Video Dataset (26 Mar 2023)
[CVPR 2023] Jianhui Yu, Hao Zhu, Liming Jiang, et al.
Jianhui Yu, Hao Zhu, Liming Jiang, Chen Change Loy, Weidong Cai, Wayne Wu
[HD-VILA-100M] Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions (19 Nov 2021)
[CVPR 2022] Hongwei Xue, Tiankai Hang, Yanhong Zeng, et al.
Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
[YT-Temporal-180M] MERLOT: Multimodal Neural Script Knowledge Models (4 Jun 2021)
[NeurIPS 2021] Rowan Zellers, Ximing Lu, Jack Hessel, et al.
Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
[WebVid-10M] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval (1 Apr 2021)
[ICCV 2021] Max Bain, Arsha Nagrani, Gül Varol, et al.
Max Bain, Arsha Nagrani, Gül Varol, Andrew Zisserman
[WTS70M] Learning Video Representations from Textual Web Supervision (29 Jul 2020)
Jonathan C. Stroud, Zhichao Lu, Chen Sun, et al.
Jonathan C. Stroud, Zhichao Lu, Chen Sun, Jia Deng, Rahul Sukthankar, Cordelia Schmid, David A. Ross
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips (7 Jun 2019)
[ICCV 2019] Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, et al.
Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research (6 Apr 2019)
[ICCV 2019 Oral] Xin Wang, Jiawei Wu, Junkun Chen, et al.
Xin Wang, Jiawei Wu, Junkun Chen, Lei Li, Yuan-Fang Wang, William Yang Wang
How2: A Large-scale Dataset for Multimodal Language Understanding (7 Jun 2019)
[NeurIPS 2018] Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, et al.
Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
[ActivityNet Captions] Dense-Captioning Events in Videos (2 May 2017)
[ICCV 2017] Ranjay Krishna, Kenji Hata, Frederic Ren, et al.
Ranjay Krishna, Kenji Hata, Frederic Ren, Li Fei-Fei, Juan Carlos Niebles
[LSMDC] Movie Description (12 May 2016)
[IJCV 2017] Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, et al.
Anna Rohrbach, Atousa Torabi, Marcus Rohrbach, Niket Tandon, Christopher Pal, Hugo Larochelle, Aaron Courville, Bernt Schiele
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language (1 Apr 2021)
[CVPR 2016] Jun Xu, Tao Mei, Ting Yao, et al.
Jun Xu, Tao Mei, Ting Yao, Yong Rui
3D Generation
LLM-based
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (2 Mar 2024)
Ziniu Hu, Ahmet Iscen, Aashi Jain, et al.
Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi
MotionScript: Natural Language Descriptions for Expressive 3D Human Motions (19 Dec 2023)
Payam Jome Yazdian, Eric Liu, Li Cheng, et al.
Payam Jome Yazdian, Eric Liu, Li Cheng, Angelica Lim
HOLODECK: Language Guided Generation of 3D Embodied AI Environments (19 Dec 2023)
[CVPR 2024] Yue Yang, Fan-Yun Sun, Luca Weihs, et al.
Yue Yang, Fan-Yun Sun, Luca Weihs, Eli VanderBilt, Alvaro Herrasti, Winson Han, Jiajun Wu, Nick Haber, Ranjay Krishna, Lingjie Liu, Chris Callison-Burch, Mark Yatskar, Aniruddha Kembhavi, Christopher Clark
PoseGPT: Chatting about 3D Human Pose (30 Nov 2023)
[CVPR 2024] Yao Feng, Jing Lin, Sai Kumar Dwivedi, et al.
Yao Feng, Jing Lin, Sai Kumar Dwivedi, Yu Sun, Priyanka Patel, Michael J. Black
3D-GPT: Procedural 3D Modeling with Large Language Models (19 Oct 2023)
Chunyi Sun, Junlin Han, Weijian Deng, et al.
Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould
Non-LLM-based (Clip/T5)
DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion (12 Mar 2024)
Yuanze Lin, Ronald Clark, Philip Torr
Yuanze Lin, Ronald Clark, Philip Torr
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior (12 Mar 2024)
[CVPR 2024] Zike Wu, Pan Zhou, Xuanyu Yi, et al.
Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang
AToM: Amortized Text-to-Mesh using 2D Diffusion (1 Feb 2024)
Guocheng Qian, Junli Cao, Aliaksandr Siarohin, et al.
Guocheng Qian, Junli Cao, Aliaksandr Siarohin, Yash Kant, Chaoyang Wang, Michael Vasilkovsky, Hsin-Ying Lee, Yuwei Fang, Ivan Skorokhodov, Peiye Zhuang, Igor Gilitschenski, Jian Ren, Bernard Ghanem, Kfir Aberman, Sergey Tulyakov
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior (12 Mar 2024)
[CVPR 2024] Tianyu Huang, Yihan Zeng, Zhilu Zhang, et al.
Tianyu Huang, Yihan Zeng, Zhilu Zhang, Wan Xu, Hang Xu, Songcen Xu, Rynson W. H. Lau, Wangmeng Zuo
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation (14 Dec 2023)
Zexiang Liu, Yangguang Li, Youtian Lin, et al.
Zexiang Liu, Yangguang Li, Youtian Lin, Xin Yu, Sida Peng, Yan-Pei Cao, Xiaojuan Qi, Xiaoshui Huang, Ding Liang, Wanli Ouyang
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior (11 Dec 2023)
[CVPR 2024] Fangfu Liu, Diankun Wu, Yi Wei, et al.
Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan
Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting (8 Dec 2023)
Xiaofeng Yang, Yiwen Chen, Cheng Chen, et al.
Xiaofeng Yang, Yiwen Chen, Cheng Chen, Chi Zhang, Yi Xu, Xulei Yang, Fayao Liu, Guosheng Lin
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling (28 Nov 2023)
Linqi Zhou, Andy Shih, Chenlin Meng, et al.
Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D (28 Nov 2023)
[CVPR 2024] Lingteng Qiu, Guanying Chen, Xiaodong Gu, et al.
Lingteng Qiu, Guanying Chen, Xiaodong Gu, Qi Zuo, Mutian Xu, Yushuang Wu, Weihao Yuan, Zilong Dong, Liefeng Bo, Xiaoguang Han
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models (30 Nov 2023)
[CVPR 2024] Yukang Cao, Yan-Pei Cao, Kai Han, et al.
Yukang Cao, Yan-Pei Cao, Kai Han, Ying Shan, Kwan-Yee K. Wong
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching (2 Dec 2023)
[CVPR 2024] Yixun Liang, Xin Yang, Jiantao Lin, et al.
Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, Yingcong Chen
GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models (12 Oct 2023)
[CVPR 2024] Taoran Yi, Jiemin Fang, Junjie Wang, et al.
Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, Xinggang Wang
Text-to-3D using Gaussian Splatting (28 Sep 2023)
[CVPR 2024] Zilong Chen, Feng Wang, Huaping Liu
Zilong Chen, Feng Wang, Huaping Liu
EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior (10 Sep 2023)
[CVPR 2024] Zhipeng Hu, Minda Zhao, Chaoyi Zhao, Xinyue Liang, Lincheng Li, Zeng Zhao, Changjie Fan, Xiaowei Zhou, Xin Yu
TADA! Text to Animatable Digital Avatars (21 Aug 2023)
[3DV 2024] Tingting Liao, Hongwei Yi, Yuliang Xiu, et al.
Tingting Liao, Hongwei Yi, Yuliang Xiu, Jiaxiang Tang, Yangyi Huang, Justus Thies, Michael J. Black
SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D (20 Oct 2023)
[ICLR 2024] Weiyu Li, Rui Chen, Xuelin Chen, et al.
Weiyu Li, Rui Chen, Xuelin Chen, Ping Tan
Noise-Free Score Distillation (26 Oct 2023)
[ICLR 2024] Oren Katzir, Or Patashnik, Daniel Cohen-Or, et al.
Oren Katzir, Or Patashnik, Daniel Cohen-Or, Dani Lischinski
Text-to-3D with Classifier Score Distillation (26 Oct 2023)
[ICLR 2024] Xin Yu, Yuan-Chen Guo, Yangguang Li, et al.
Xin Yu, Yuan-Chen Guo, Yangguang Li, Ding Liang, Song-Hai Zhang, Xiaojuan Qi
HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance (28 Nov 2023)
[ICLR 2024] Junzhe Zhu, Peiye Zhuang
Junzhe Zhu, Peiye Zhuang
MVDream: Multi-view Diffusion for 3D Generation (31 Aug 2023)
[ICLR 2024] Yichun Shi, Peng Wang, Jianglong Ye, et al.
Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation (28 Sep 2023)
[ICLR 2024] Jiaxiang Tang, Jiawei Ren, Hang Zhou, et al.
Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng
Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation (11 Apr 2023)
[ICLR 2024] Junyoung Seo, Wooseok Jang, Min-Seop Kwak, et al.
Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim
IT3D: Improved Text-to-3D Generation with Explicit View Synthesis (22 Aug 2023)
[AAAI 2024] Yiwen Chen, Chi Zhang, Xiaofeng Yang, et al.
Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation (30 Jul 2023)
[WACV 2024] Jinbo Wu, Xiaobo Gao, Xing Liu, et al.
Jinbo Wu, Xiaobo Gao, Xing Liu, Zhengyang Shen, Chen Zhao, Haocheng Feng, Jingtuo Liu, Errui Ding
Re-imagine the Negative Prompt Algorithm: Transform 2D Diffusion into 3D, alleviate Janus problem and Beyond (11 Apr 2023)
Mohammadreza Armandpour, Ali Sadeghian, Huangjie Zheng, et al.
Mohammadreza Armandpour, Ali Sadeghian, Huangjie Zheng, Amir Sadeghian, Mingyuan Zhou
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures (14 Nov 2022)
[CVPR 2023] Gal Metzer, Elad Richardson, Or Patashnik, et al.
Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, Daniel Cohen-Or
Magic3D: High-Resolution Text-to-3D Content Creation (18 Nov 2022)
[CVPR 2023 Highlight] Chen-Hsuan Lin, Jun Gao, Luming Tang, et al.
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation (1 Dec 2022)
[CVPR 2023] Haochen Wang, Xiaodan Du, Jiahao Li, et al.
Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, Greg Shakhnarovich
High-fidelity 3D Face Generation from Natural Language Descriptions (5 May 2023)
[CVPR 2023] Menghua Wu, Hao Zhu, Linjia Huang, et al.
Menghua Wu, Hao Zhu, Linjia Huang, Yiyu Zhuang, Yuanxun Lu, Xun Cao
RODIN: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (12 Dec 2022)
[CVPR 2023 Highlight] Tengfei Wang, Bo Zhang, Ting Zhang, et al.
Tengfei Wang, Bo Zhang, Ting Zhang, Shuyang Gu, Jianmin Bao, Tadas Baltrusaitis, Jingjing Shen, Dong Chen, Fang Wen, Qifeng Chen, Baining Guo
ClipFace: Text-guided Editing of Textured 3D Morphable Models (24 Apr 2023)
[SIGGRAPH 2023] Shivangi Aneja, Justus Thies, Angela Dai, et al.
Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner
DreamFusion: Text-to-3D using 2D Diffusion (29 Sep 2022)
[ICLR 2023 Oral] Ben Poole, Ajay Jain, Jonathan T. Barron, et al.
Ben Poole, Ajay Jain, Jonathan T. Barron, Ben Mildenhall
ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (25 May 2023)
[NeurIPS 2023 Spotlight] Zhengyi Wang, Cheng Lu, Yikai Wang, et al.
Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu
HeadSculpt: Crafting 3D Head Avatars with Text (25 May 2023)
[NeurIPS 2023] Xiao Han, Yukang Cao, Kai Han, et al.
Xiao Han, Yukang Cao, Kai Han, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang, Kwan-Yee K. Wong
ATT3D: Amortized Text-to-3D Object Synthesis (6 Jun 2023)
[ICCV 2023] Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, et al.
Jonathan Lorraine, Kevin Xie, Xiaohui Zeng, Chen-Hsuan Lin, Towaki Takikawa, Nicholas Sharp, Tsung-Yi Lin, Ming-Yu Liu, Sanja Fidler, James Lucas
Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation (24 Mar 2023)
[ICCV 2023] Rui Chen, Yongwei Chen, Ningxin Jiao, et al.
Rui Chen, Yongwei Chen, Ningxin Jiao, Kui Jia
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (10 Sep 2023)
[ICCV 2023] Lukas Höllein, Ang Cao, Andrew Owens, et al.
Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance (28 Mar 2023)
[ICCV 2023] Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, et al.
Yiwei Ma, Xiaoqing Zhang, Xiaoshuai Sun, Jiayi Ji, Haowei Wang, Guannan Jiang, Weilin Zhuang, Rongrong Ji
StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation (31 May 2023)
Chi Zhang, Yiwen Chen, Yijun Fu, et al.
Chi Zhang, Yiwen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen
TextMesh: Generation of Realistic 3D Meshes From Text Prompts (24 Apr 2023)
[3DV 2023] Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, et al.
Christina Tsalicoglou, Fabian Manhardt, Alessio Tonioni, Michael Niemeyer, Federico Tombari
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation (28 Apr 2022)
[CVPR 2022] Aditya Sanghi, Hang Chu, Joseph G. Lambourne, et al.
Aditya Sanghi, Hang Chu, Joseph G. Lambourne, Ye Wang, Chin-Yi Cheng, Marco Fumero, Kamal Rahimi Malekshan
Zero-Shot Text-Guided Object Generation with Dream Fields (2 Dec 2021)
[CVPR 2022] Ajay Jain, Ben Mildenhall, Jonathan T. Barron, et al.
Ajay Jain, Ben Mildenhall, Jonathan T. Barron, Pieter Abbeel, Ben Poole
Text2Mesh: Text-Driven Neural Stylization for Meshes (6 Dec 2021)
[CVPR 2022] Oscar Michel, Roi Bar-On, Richard Liu, et al.
Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, Rana Hanocka
TANGO: Text-driven Photorealistic and Robust 3D Stylization via Lighting Decomposition (20 Oct 2022)
[NeurIPS 2022 Spotlight] Yongwei Chen, Rui Chen, Jiabao Lei, et al.
Yongwei Chen, Rui Chen, Jiabao Lei, Yabin Zhang, Kui Jia
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models (24 Mar 2022)
[SIGGRAPH Asia 2022] Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, et al.
Nasir Mohammad Khalid, Tianhao Xie, Eugene Belilovsky, Tiberiu Popa
MotionCLIP: Exposing Human Motion Generation to CLIP Space (15 Mar 2022)
[ECCV 2022] Guy Tevet, Brian Gordon, Amir Hertz, et al.
Guy Tevet, Brian Gordon, Amir Hertz, Amit H. Bermano, Daniel Cohen-Or
Datasets
Objaverse-XL: A Universe of 10M+ 3D Objects (11 Jul 2023)
Matt Deitke, Ruoshi Liu, Matthew Wallingford, et al.
Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi
Objaverse: A Universe of Annotated 3D Objects (15 Dec 2022)
[CVPR 2023] Matt Deitke, Dustin Schwenk, Jordi Salvador, et al.
Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, Ali Farhadi
Audio Generation
LLM-based
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation (27 Feb 2024)
Shuangrui Ding, Zihan Liu, Xiaoyi Dong, et al.
Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang
ChatMusician: Understanding and Generating Music Intrinsically with LLM (25 Feb 2024)
Ruibin Yuan, Hanfeng Lin, Yi Wang, et al.
Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, Tao Jiang, Wenhao Huang, Wenhu Chen, Emmanouil Benetos, Jie Fu, Gus Xia, Roger Dannenberg, Wei Xue, Shiyin Kang, Yike Guo
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling (19 Feb 2024)
Jun Zhan, Junqi Dai, Jiasheng Ye, et al.
Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, Hang Yan, Jie Fu, Tao Gui, Tianxiang Sun, Yugang Jiang, Xipeng Qiu
Boosting Large Language Model for Speech Synthesis: An Empirical Study (30 Dec 2023)
Hongkun Hao, Long Zhou, Shujie Liu, et al.
Hongkun Hao, Long Zhou, Shujie Liu, Jinyu Li, Shujie Hu, Rui Wang, Furu Wei
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action (28 Dec 2023)
Jiasen Lu, Christopher Clark, Sangho Lee, et al.
Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models (19 Nov 2023)
Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun, et al.
Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun, Ying Shan
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (7 Oct 2023)
Jiaming Wang, Zhihao Du, Qian Chen, et al.
Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
LLaSM: Large Language and Speech Model (30 Aug 2023)
Yu Shu, Siwei Dong, Guangyao Chen, et al.
Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi
AudioPaLM: A Large Language Model That Can Speak and Listen (22 Jun 2023)
Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, et al.
Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
Pengi: An Audio Language Model for Audio Tasks (19 May 2023)
Soham Deshmukh, Benjamin Elizalde, Rita Singh, et al.
Soham Deshmukh, Benjamin Elizalde, Rita Singh, Huaming Wang
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities (18 May 2023)
Dong Zhang, Shimin Li, Xin Zhang, et al.
Dong Zhang, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (22 Mar 2023)
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, et al.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
Non-LLM-based
Audiobox: Unified Audio Generation with Natural Language Prompts (25 Dec 2023)
Apoorv Vyas, Bowen Shi, Matthew Le
Music ControlNet: Multiple Time-varying Controls for Music Generation (13 Nov 2023)
Shih-Lun Wu, Chris Donahue, Shinji Watanabe, et al.
Shih-Lun Wu, Chris Donahue, Shinji Watanabe, Nicholas J. Bryan
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing (19 Oct 2023)
Yixiao Zhang, Akira Maezawa, Gus Xia, et al.
Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models (18 Oct 2023)
Dingyao Yu, Kaitao Song, Peiling Lu, et al.
Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (1 Oct 2023)
Dongchao Yang, Jinchuan Tian, Xu Tan
AudioLM: a Language Modeling Approach to Audio Generation (7 Sep 2022)
Zalán Borsos, Raphaël Marinier, Damien Vincent, et al. (IEEE/ACM Transactions on Audio, Speech, and Language Processing)
Zalán Borsos, Raphaël Marinier, Damien Vincent, Eugene Kharitonov, Olivier Pietquin, Matt Sharifi, Dominik Roblek, Olivier Teboul, David Grangier, Marco Tagliasacchi, Neil Zeghidour
WavJourney: Compositional Audio Creation with Large Language Models (26 Jul 2023)
Xubo Liu, Zhongkai Zhu, Haohe Liu, et al.
Xubo Liu, Zhongkai Zhu, Haohe Liu, Yi Yuan, Meng Cui, Qiushi Huang, Jinhua Liang, Yin Cao, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody (16 Jun 2023)
Sofoklis Kakouros, Juraj Šimko, Martti Vainio, et al. (2023 SSW)
Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni
Simple and Controllable Music Generation (8 Jun 2023)
Jade Copet, Felix Kreuk, Itai Gat, et al.
Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation (29 May 2023)
Jiawei Huang, Yi Ren, Rongjie Huang, et al.
Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao
Jukebox: A Generative Model for Music (30 Apr 2020)
Prafulla Dhariwal, Heewoo Jun, Christine Payne, et al.
Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head (25 Apr 2023)
Rongjie Huang, Mingze Li, Dongchao Yang, et al.
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe
TANGO: Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model (24 Apr 2023)
Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, et al.
Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
Yongliang Shen, Kaitao Song, Xu Tan, et al.
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers (5 Jan 2023)
Chengyi Wang, Sanyuan Chen, Yu Wu, et al.
Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
MusicLM: Generating Music From Text (26 Jan 2023)
Andrea Agostinelli, Timo I. Denk, Zalán Borsos, et al.
Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank
Datasets
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context (15 Sep 2023)
Wei Kang, Xiaoyu Yang, Zengwei Yao, et al.
Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition (7 Oct 2021)
BinBin Zhang, Hang Lv, Pengcheng Guo, et al.
BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng
VGGSound: A Large-Scale Audio-Visual Dataset (29 Apr 2020)
Honglie Chen, Weidi Xie, Andrea Vedaldi, et al. (ICASSP)
Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman
Libri-Light: A Benchmark for ASR with Limited or No Supervision (17 Dec 2019)
Jacob Kahn, Morgane Rivière, Weiyi Zheng, et al. (ICASSP)
Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux
The MTG-Jamendo Dataset for Automatic Music Tagging (15 Jun 2019)
Dmitry Bogdanov, Minz Won, Philip Tovstogan, et al. (ICML)
Dmitry Bogdanov, Minz Won, Philip Tovstogan, Alastair Porter, Xavier Serra
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech (5 Apr 2019)
Heiga Zen, Viet Dang, Rob Clark, et al.
Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset (29 Oct 2018)
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, et al.
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck
Audio Set: An ontology and human-labeled dataset for audio events (5 Mar 2017)
Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, et al. (TASLP)
Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, Marvin Ritter
Librispeech: An ASR corpus based on public domain audio books (19 Apr 2015)
Vassil Panayotov, Guoguo Chen, Daniel Povey, et al. (ICASSP)
Vassil Panayotov, Guoguo Chen, Daniel Povey, Sanjeev Khudanpur
Evaluation of Algorithms Using Games: The Case of Music Tagging (26 Oct 2009)
Edith Law, Kris West, Michael Mandel, et al. (ISMIR)
Edith Law, Kris West, Michael Mandel, Mert Bay, J. Stephen Downie
Generation with Multiple Modalities
LLM-based
C3LLM: Conditional Multimodal Content Generation Using Large Language Models (25 May 2024)
Zixuan Wang, Qinkai Duan, Yu-Wing Tai, et al.
Zixuan Wang, Qinkai Duan, Yu-Wing Tai, Chi-Keung Tang
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation (30 Nov 2023)
Zineng Tang, Ziyi Yang, Mahmoud Khademi, et al.
Zineng Tang, Ziyi Yang, Mahmoud Khademi, Yang Liu, Chenguang Zhu, Mohit Bansal
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models (8 Nov 2023)
Zhen Yang, Yingxue Zhang, Fandong Meng, et al.
Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou
Tags: tokenizer
NExT-GPT: Any-to-Any Multimodal LLM (11 Sep 2023)
Shengqiong Wu, Hao Fei, Leigang Qu, et al.
Shengqiong Wu, Hao Fei, Leigang Qu, Wei Ji, Tat-Seng Chua
CoDi: Any-to-Any Generation via Composable Diffusion (19 May 2023)
[NeurIPS 2023] Zineng Tang, Ziyi Yang, Chenguang Zhu, et al.
Zineng Tang, Ziyi Yang, Chenguang Zhu, Michael Zeng, Mohit Bansal
Non-LLM-based
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (9 Jan 2024)
[CVPR 2024] Junming Chen, et al.
Junming Chen, Yunfei Liu, Jianan Wang, Ailing Zeng, Yu Li, Qifeng Chen
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners (27 Feb 2024)
[CVPR 2024] Yazhou Xing, Yingqing He, Zeyue Tian, et al.
Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen
π Multimodal Editing
Image Editing
LLM-based
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale (7 Jul 2024)
Haozhe Zhao, Xiaojian Ma, Liang Chen, et al.
Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing (27 May 2024)
Xinyu Zhang, Mengxue Kang, Fei Wei, et al.
Xinyu Zhang, Mengxue Kang, Fei Wei, Shuang Xu, Yuhe Liu, Lin Ma
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models (11 Dec 2023)
[CVPR 2024] Yuzhou Huang, Liangbin Xie, Xintao Wang, et al.
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan
Self-correcting LLM-controlled Diffusion Models (27 Nov 2023)
[CVPR 2024] Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, et al.
Tsung-Han Wu, Long Lian, Joseph E. Gonzalez, Boyi Li, Trevor Darrell
Emu Edit: Precise Image Editing via Recognition and Generation Tasks (16 Nov 2023)
[ArXiv 2023] Shelly Sheynin, Adam Polyak, Uriel Singer, et al.
Shelly Sheynin, Adam Polyak, Uriel Singer, Yuval Kirstain, Amit Zohar, Oron Ashual, Devi Parikh, Yaniv Taigman
Guiding Instruction-based Image Editing via Multimodal Large Language Models
[ICLR 2024 (Spotlight)] Tsu-Jui Fu, Wenze Hu, Xianzhi Du, et al.
Tsu-Jui Fu, Wenze Hu, Xianzhi Du, William Yang Wang, Yinfei Yang, Zhe Gan
CHATEDIT: Towards Multi-turn Interactive Facial Image Editing via Dialogue (20 Mar 2023)
[EMNLP 2023] Xing Cui, Zekun Li, Peipei Li, et al.
Xing Cui, Zekun Li, Peipei Li, Yibo Hu, Hailin Shi, Zhaofeng He
HIVE: Harnessing Human Feedback for Instructional Visual Editing (16 Mar 2023)
Shu Zhang, Xinyi Yang, Yihao Feng, et al.
Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu.
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (8 Mar 2023)
Chenfei Wu, Shengming Yin, Weizhen Qi, et al.
Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan
InstructPix2Pix: Learning to Follow Image Editing Instructions (17 Nov 2022)
[CVPR 2023 (Highlight)] Tim Brooks, Aleksander Holynski, Alexei A. Efros.
Non-LLM-based (Clip/T5)
SeedEdit: Align Image Re-Generation to Image Editing (11 Nov 2024)
Yichun Shi, Peng Wang, Weilin Huang
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing (4 Feb 2024)
[CVPR 2024] Chong Mou, Xintao Wang, Jiechong Song, et al.
Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang.
ZONE: Zero-Shot Instruction-Guided Local Editing (28 Dec 2023)
Shanglin Li, Bohan Zeng, Yutang Feng, et al.
Shanglin Li, Bohan Zeng, Yutang Feng, Sicheng Gao, Xuhui Liu, Jiaming Liu, Li Lin, Xu Tang, Yao Hu, Jianzhuang Liu, Baochang Zhang.
Watch Your Steps: Local Image and Scene Editing by Text Instructions (17 Aug 2023)
Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, et al.
Ashkan Mirzaei, Tristan Aumentado-Armstrong, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski.
DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models (5 Jul 2023)
[ICLR 2024] Chong Mou, Xintao Wang, Jiechong Song, et al.
Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang.
Differential Diffusion: Giving Each Pixel Its Strength (1 Jun 2023)
[ArXiv 2023] Thao Nguyen, Yuheng Li, Utkarsh Ojha, et al.
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee
Visual Instruction Inversion: Image Editing via Visual Prompting (26 Jul 2023)
[ArXiv 2023] Thao Nguyen, Yuheng Li, Utkarsh Ojha, et al.
Thao Nguyen, Yuheng Li, Utkarsh Ojha, Yong Jae Lee.
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing (17 Apr 2023)
[ICCV 2023] Mingdeng Cao, Xintao Wang, Zhongang Qi, et al.
Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng.
PAIR-Diffusion: A Comprehensive Multimodal Object-Level Image Editor (30 Mar 2023)
[ArXiv 2023] Vidit Goel, Elia Peruzzo, Yifan Jiang, et al.
Vidit Goel, Elia Peruzzo, Yifan Jiang, Dejia Xu, Xingqian Xu, Nicu Sebe, Trevor Darrell, Zhangyang Wang, Humphrey Shi.
Zero-shot Image-to-Image Translation (6 Feb 2023)
[SIGGRAPH 2023] Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, et al.
Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu.
SINE: SINgle Image Editing with Text-to-Image Diffusion Models (8 Dec 2022)
[CVPR 2023] Zhixing Zhang, Ligong Han, Arnab Ghosh, et al.
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris Metaxas, Jian Ren.
Interactive Image Manipulation with Complex Text Instructions (25 Nov 2022)
[WACV 2023] Ryugo Morita, Zhiqiang Zhang, Man M. Ho, et al.
Ryugo Morita, Zhiqiang Zhang, Man M. Ho, Jinjia Zhou.
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation (22 Nov 2022)
[CVPR 2023] Narek Tumanyan, Michal Geyer, Shai Bagon, et al.
Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel.
Imagic: Text-Based Real Image Editing with Diffusion Models (17 Oct 2022)
[CVPR 2023] Bahjat Kawar, Shiran Zada, Oran Lang, et al.
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, Michal Irani.
Null-text Inversion for Editing Real Images using Guided Diffusion Models
[ICLR 2023] Ron Mokady, Amir Hertz, Kfir Aberman, et al.
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or.
Prompt-to-Prompt Image Editing with Cross Attention Control
[ICLR 2023] Amir Hertz, Ron Mokady, Jay Tenenbaum, et al.
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, Daniel Cohen-Or.
DiffEdit: Diffusion-based semantic image editing with mask guidance (20 Oct 2022)
[ICLR 2023] Guillaume Couairon, Jakob Verbeek, Holger Schwenk, et al.
Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord.
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation (6 Oct 2021)
[CVPR 2022] Gwanghyun Kim, Taesung Kwon, Jong Chul Ye.
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations (2 Aug 2021)
[ICLR 2022] Chenlin Meng, Yutong He, Yang Song, et al.
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, Stefano Ermon.
Video Editing
LLM-based
Consistent Video-to-Video Transfer Using Synthetic Dataset (1 Nov 2023)
Jiaxin Cheng, Tianjun Xiao, Tong He.
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions (21 May 2023)
Bosheng Qin, Juncheng Li, Siliang Tang, et al.
Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang.
Non-LLM-based (Clip/T5)
AudioScenic: Audio-Driven Video Scene Editing (25 Apr 2024)
Kaixin Shen, Ruijie Quan, Linchao Zhu, et al.
Kaixin Shen, Ruijie Quan, Linchao Zhu, Jun Xiao, Yi Yang
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation (1 Nov 2023)
Yuxiang Bao, Di Qiu, Guoliang Kang, et al.
Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan.
MagicStick: Controllable Video Editing via Control Handle Transformations (1 Nov 2023)
Yue Ma, Xiaodong Cun, Yingqing He, et al.
Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen
MagicEdit: High-Fidelity Temporally Coherent Video Editing (28 Aug 2023)
Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, et al.
Jun Hao Liew, Hanshu Yan, Jianfeng Zhang, Zhongcong Xu, Jiashi Feng.
StableVideo: Text-driven Consistency-aware Diffusion Video Editing (18 Aug 2023)
[ICCV 2023] Wenhao Chai, Xun Guo, Gaoang Wang, et al.
Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu.
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing (15 Aug 2023)
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, et al.
Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen.
TokenFlow: Consistent Diffusion Features for Consistent Video Editing (19 Jul 2023)
Michal Geyer, Omer Bar-Tal, Shai Bagon, et al.
Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel.
Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation (13 Jun 2023)
Shuai Yang, Yifan Zhou, Ziwei Liu, et al.
Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy.
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (26 May 2023)
Min Zhao, Rongzhen Wang, Fan Bao, et al.
Min Zhao, Rongzhen Wang, Fan Bao, Chongxuan Li, Jun Zhu.
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (15 May 2023)
Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel.
Pix2Video: Video Editing using Image Diffusion (22 Mar 2023)
[ICCV 2023] Duygu Ceylan, Chun-Hao P. Huang, Niloy J. Mitra.
FateZero: Fusing Attentions for Zero-shot Text-based Video Editing (16 Mar 2023)
[ICCV 2023] Chenyang Qi, Xiaodong Cun, Yong Zhang, et al.
Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen.
Video-P2P: Video Editing with Cross-attention Control (8 Mar 2023)
Shaoteng Liu, Yuechen Zhang, Wenbo Li, et al.
Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia.
Dreamix: Video Diffusion Models are General Video Editors (2 Feb 2023)
Eyal Molad, Eliahu Horwitz, Dani Valevski, et al.
Eyal Molad, Eliahu Horwitz, Dani Valevski, Alex Rav Acha, Yossi Matias, Yael Pritch, Yaniv Leviathan, Yedid Hoshen.
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation (22 Dec 2022)
[ICCV 2023] Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, et al.
Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, Mike Zheng Shou.
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers (2 Apr 2021)
[CVPR 2022] Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, et al.
Tsu-Jui Fu, Xin Eric Wang, Scott T. Grafton, Miguel P. Eckstein, William Yang Wang.
3D Editing
LLM-based
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code (2 Mar 2024)
Ziniu Hu, Ahmet Iscen, Aashi Jain, et al.
Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi
3D-GPT: Procedural 3D Modeling with Large Language Models (19 Oct 2023)
Chunyi Sun, Junlin Han, Weijian Deng, et al.
Chunyi Sun, Junlin Han, Weijian Deng, Xinlong Wang, Zishan Qin, Stephen Gould
Non-LLM-based (Clip/T5)
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models (16 Nov 2023)
Xianfang Zeng, Xin Chen, Zhongqi Qi, et al.
Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation (16 Nov 2023)
Dale Decatur, Itai Lang, Kfir Aberman, et al.
Dale Decatur, Itai Lang, Kfir Aberman, Rana Hanocka
Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields (23 Aug 2023)
Hyeonseop Song, Seokhun Choi, Hoseok Do, et al.
Hyeonseop Song, Seokhun Choi, Hoseok Do, Chul Lee, Taehyeong Kim
SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field (23 Mar 2023)
[CVPR 2023] Chong Bao, Yinda Zhang, Bangbang Yang, et al.
Chong Bao, Yinda Zhang, Bangbang Yang, Tianxing Fan, Zesong Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui
TextDeformer: Geometry Manipulation using Text Guidance (26 Apr 2023)
[TVCG 2022] William Gao, Noam Aigerman, Thibault Groueix, et al.
William Gao, Noam Aigerman, Thibault Groueix, Vladimir G. Kim, Rana Hanocka
Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions (22 Mar 2023)
[SIGGRAPH Asia 2023] Ayaan Haque, Matthew Tancik, Alexei A. Efros, et al.
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
DreamEditor: Text-Driven 3D Scene Editing with Neural Fields (23 Jun 2023)
[SIGGRAPH Asia 2023] Jingyu Zhuang, Chen Wang, Lingjie Liu, et al.
Jingyu Zhuang, Chen Wang, Lingjie Liu, Liang Lin, Guanbin Li
SKED: Sketch-guided Text-based 3D Editing (19 Mar 2023)
[ICCV 2023] Aryan Mikaeili, Or Perel, Mehdi Safaee, et al.
Aryan Mikaeili, Or Perel, Mehdi Safaee, Daniel Cohen-Or, Ali Mahdavi-Amiri
Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields (22 Jun 2023)
[ICCVW 2023] Ori Gordon, Omri Avrahami, Dani Lischinski.
Ori Gordon, Omri Avrahami, Dani Lischinski
ClipFace: Text-guided Editing of Textured 3D Morphable Models (2 Dec 2022)
[SIGGRAPH 2023] Shivangi Aneja, Justus Thies, Angela Dai, et al.
Shivangi Aneja, Justus Thies, Angela Dai, Matthias Nießner
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields (9 Dec 2021)
[CVPR 2022] Can Wang, Menglei Chai, Mingming He, et al.
Can Wang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
Audio Editing
LLM-based
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing (19 Oct 2023)
Yixiao Zhang, Akira Maezawa, Gus Xia, et al.
Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon
UniAudio: An Audio Foundation Model Toward Universal Audio Generation (1 Oct 2023)
Dongchao Yang, Jinchuan Tian, Xu Tan
Non-LLM-based (Clip/T5)
π Multimodal Agents
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing (1 Nov 2023)
Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, et al.
Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li
Tags: Image Chat, Image Segmentation, Image Generation, Image Editing
ControlLLM: Augment Language Models with Tools by Searching on Graphs (26 Oct 2023)
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, et al.
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao, Erfei Cui, Ziheng Li, Xizhou Zhu, Lewei Lu, Qifeng Chen, Yu Qiao, Jifeng Dai, Wenhai Wang
Tags: Image Understanding, Image Generation, Image Editing, Video Understanding, Video Generation, Video Editing, Audio Understanding, Audio Generation
ImageBind-LLM: Multi-modality Instruction Tuning (7 Sep 2023)
Jiaming Han, Renrui Zhang, Wenqi Shao, et al.
Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao
Modalities: text, image, video, audio, point cloud
ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models (2 Sep 2023)
Chenliang Li, Hehong Chen, Ming Yan, et al.
Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language (9 May 2023)
Zhaoyang Liu, Yinan He, Wenhai Wang, et al.
Zhaoyang Liu, Yinan He, Wenhai Wang, Weiyun Wang, Yi Wang, Shoufa Chen, Qinglong Zhang, Zeqiang Lai, Yang Yang, Qingyun Li, Jiashuo Yu, Kunchang Li, Zhe Chen, Xue Yang, Xizhou Zhu, Yali Wang, Limin Wang, Ping Luo, Jifeng Dai, Yu Qiao
Condition Modality: text, image, video, audio
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
Yongliang Shen, Kaitao Song, Xu Tan, et al.
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models (8 Mar 2023)
Chenfei Wu, Shengming Yin, Weizhen Qi, et al.
Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan
π Multimodal Understanding with LLMs
Multiple modalities
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities (9 Nov 2023)
[CVPR 2024] AJ Piergiovanni, Isaac Noble, Dahun Kim, et al.
AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
Modalities: text, video, audio
Image Understanding
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions (11 Jun 2024)
Renjie Pi, Jianshu Zhang, Jipeng Zhang, et al.
Renjie Pi, Jianshu Zhang, Jipeng Zhang, Rui Pan, Zhekai Chen, Tong Zhang
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text (11 Jun 2024)
[ACL 2024] Aoxiong Yin, Haoyuan Li, Kai Shen, et al.
Aoxiong Yin, Haoyuan Li, Kai Shen, Siliang Tang, Yueting Zhuang
Open-World Human-Object Interaction Detection via Multi-modal Prompts (11 Jun 2024)
Jie Yang, Bingliang Li, Ailing Zeng, et al.
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? (11 Jun 2024)
Xingyu Fu, Muyu He, Yujie Lu, et al.
Xingyu Fu, Muyu He, Yujie Lu, William Yang Wang, Dan Roth
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks (21 Dec 2023)
Zhe Chen, Jiannan Wu, Wenhai Wang, et al.
Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (28 Nov 2023)
Yanwei Li, Chengyao Wang, Jiaya Jia
CogVLM: Visual Expert for Pretrained Language Models (6 Nov 2023)
Weihan Wang, Qingsong Lv, Wenmeng Yu, et al.
Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning (14 Oct 2023)
Jun Chen, Deyao Zhu, Xiaoqian Shen, et al.
Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue (21 Jun 2023)
Weihao Gao, Zhuo Deng, Zhiyuan Niu, et al.
Weihao Gao, Zhuo Deng, Zhiyuan Niu, Fuju Rong, Chucheng Chen, Zheng Gong, Wenze Zhang, Daimin Xiao, Fang Li, Zhenjie Cao, Zhaoyi Ma, Wenbin Wei, Lan Ma
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition (26 Sep 2023)
Pan Zhang, Xiaoyi Dong, Bin Wang, et al.
Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
[LaVIT] Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization (9 Sep 2023)
Yang Jin, Kun Xu, Kun Xu, et al.
Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Chao Liao, Jianchao Tan, Quzhe Huang, Bin Chen, Chenyi Lei, An Liu, Chengru Song, Xiaoqiang Lei, Di Zhang, Wenwu Ou, Kun Gai, Yadong Mu
Tags: tokenizer
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond (24 Aug 2023)
Jinze Bai, Shuai Bai, Shusheng Yang, et al.
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks (18 May 2023)
[NeurIPS 2023] Wenhai Wang, Zhe Chen, Xiaokang Chen, et al.
Wenhai Wang, Zhe Chen, Xiaokang Chen, Jiannan Wu, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning (11 May 2023)
Wenliang Dai, Junnan Li, Dongxu Li, et al.
Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven Hoi
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models (20 Apr 2023)
Deyao Zhu, Jun Chen, Xiaoqian Shen, et al.
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Visual Instruction Tuning (17 Apr 2023)
[NeurIPS 2023 (Oral)] Haotian Liu, et al.
Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
Video Understanding
StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification (11 Nov 2024)
Yichen He, Yuan Lin, Jianchao Wu, et al.
Yichen He, Yuan Lin, Jianchao Wu, Hanchong Zhang, Yuchen Zhang, Ruicheng Le
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding (22 Sep 2024)
Yan Shu, Peitian Zhang, Zheng Liu, et al.
Yan Shu, Peitian Zhang, Zheng Liu, Minghao Qin, Junjie Zhou, Tiejun Huang, Bo Zhao
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution (19 Sep 2024)
Zuyan Liu, Yuhao Dong, Ziwei Liu, et al.
Zuyan Liu, Yuhao Dong, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs (25 Apr 2024)
Zesen Cheng, Sicong Leng, Hang Zhang, et al.
Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning (25 Apr 2024)
Lin Xu, Yilin Zhao, Daquan Zhou, et al.
Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding (3 Dec 2023)
Enxin Song, et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (28 Nov 2023)
Yanwei Li, et al.
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models (27 Nov 2023)
Munan Ning, et al.
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models (22 Nov 2023)
Shehan Munasinghe, et al.
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection (16 Nov 2023)
Bin Lin, et al.
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding (14 Nov 2023)
Peng Jin, et al.
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding (5 Jun 2023)
[EMNLP 2023 Demo] Hang Zhang, Xin Li, Lidong Bing
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos? (31 Jul 2023)
Qi Zhao, et al.
Valley: Video Assistant with Large Language model Enhanced ability (12 Jun 2023)
Ruipu Luo, et al.
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models (8 Jun 2023)
Muhammad Maaz, Hanoona Rasheed, Salman Khan, et al.
VideoChat: Chat-Centric Video Understanding (10 May 2023)
KunChang Li, et al.
VideoLLM: Modeling Video Sequence with Large Language Models (22 May 2023)
Guo Chen, et al.
Learning video embedding space with Natural Language Supervision (25 Mar 2023)
Phani Krishna Uppala, Shriti Priya, Vaidehi Joshi
3D Understanding
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding (12 Oct 2024)
[NeurIPS 2024] Yunze Man, Shuhong Zheng, Zhipeng Bao, Martial Hebert, Liang-Yan Gui, Yu-Xiong Wang
Situation3D: Situational Awareness Matters in 3D Vision Language Reasoning (12 Oct 2024)
[CVPR 2024] Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning (30 Nov 2023)
[CVPR 2024] Sijin Chen, Xin Chen, Chi Zhang, Mingsheng Li, Gang Yu, Hao Fei, Hongyuan Zhu, Jiayuan Fan, Tao Chen
LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (21 Dec 2023)
Senqiao Yang, Jiaming Liu, Ray Zhang, et al.
3D-LLM: Injecting the 3D World into Large Language Models (24 Jul 2023)
[NeurIPS 2023 Spotlight] Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan
PointLLM: Empowering Large Language Models to Understand Point Clouds (31 Aug 2023)
[NeurIPS 2023 Spotlight] Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
PointCLIP: Point Cloud Understanding by CLIP (31 Aug 2023)
[CVPR 2022] Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li
Audio Understanding
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action (28 Dec 2023)
Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
M2UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models (19 Nov 2023)
Atin Sakkeer Hussain, Shansong Liu, Chenshuo Sun, Ying Shan
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models (14 Nov 2023)
Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, Jingren Zhou
SALMONN: Towards Generic Hearing Abilities for Large Language Models (20 Oct 2023)
Changli Tang, Wenyi Yu, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models (18 Oct 2023)
Dingyao Yu, Kaitao Song, Peiling Lu, Tianyu He, Xu Tan, Wei Ye, Shikun Zhang, Jiang Bian
LLark: A multimodal foundation model for music (11 Oct 2023)
Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (7 Oct 2023)
Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation (29 Sep 2023)
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François Germain, Jonathan Le Roux, Shinji Watanabe
Connecting Speech Encoder and Large Language Model for ASR (25 Sep 2023)
Wenyi Yu, Changli Tang, Guangzhi Sun, Xianzhao Chen, Tian Tan, Wei Li, Lu Lu, Zejun Ma, Chao Zhang
Can Whisper perform speech-based in-context learning (13 Sep 2023)
Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
Music understanding LLaMA: Advancing text-to-music generation with question answering and captioning (22 Aug 2023)
Shansong Liu, Atin Sakkeer Hussain, Chenshuo Sun, Ying Shan
On decoder-only architecture for speech-to-text and large language model integration (8 Jul 2023)
Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu
AudioPaLM: A Large Language Model That Can Speak and Listen (22 Jun 2023)
Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, Neil Zeghidour, Yu Zhang, Zhishuai Zhang, Lukas Zilka, Christian Frank
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face (30 Mar 2023)
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang
Sparks of Artificial General Intelligence: Early experiments with GPT-4 (22 Mar 2023)
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
Listen, Think, and Understand (18 May 2023)
Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
SpeechGPT: Empowering large language models with intrinsic cross-modal conversational abilities (18 May 2023)
Dong Zhang, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu
AudioGPT: Understanding and generating speech, music, sound, and talking head (25 Apr 2023)
Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe
Multimodal LLM Safety
Attack
Jailbreaking GPT-4V via self-adversarial attacks with system prompts (20 Jan 2024)
Yuanwei Wu, Xiang Li, Yixin Liu, Pan Zhou, Lichao Sun
Defending ChatGPT against jailbreak attack via self-reminders (1 Dec 2023)
Yueqi Xie, Jingwei Yi, Jiawei Shao, Justin Curl, Lingjuan Lyu, Qifeng Chen, Xing Xie, Fangzhao Wu
Misusing Tools in Large Language Models With Visual Adversarial Examples (4 Oct 2023)
Xiaohan Fu, Zihan Wang, Shuheng Li, Rajesh K. Gupta, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Earlence Fernandes
Image Hijacks: Adversarial Images can Control Generative Models at Runtime (18 Sep 2023)
Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
Universal and Transferable Adversarial Attacks on Aligned Language Models (27 Jul 2023)
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
Prompt injection attack against LLM-integrated applications (8 Jun 2023)
Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, Yang Liu
Automatically Auditing Large Language Models via Discrete Optimization (8 Mar 2023)
Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt
Poisoning Web-Scale Training Datasets is Practical (20 Feb 2023)
Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr
Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks (11 Feb 2023)
Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia, Tatsunori Hashimoto
Ignore previous prompt: Attack techniques for language models (17 Nov 2022)
Fábio Perez, Ian Ribeiro (NeurIPS 2022 Workshop)
Universal Adversarial Triggers for Attacking and Analyzing NLP (20 Aug 2019)
Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh (EMNLP 2019)
Adversarial Examples for Evaluating Reading Comprehension Systems (23 Jul 2017)
Robin Jia, Percy Liang (EMNLP 2017)
Defense and Detect
Detecting and correcting hate speech in multimodal memes with large visual language model (12 Nov 2023)
Minh-Hao Van, Xintao Wu
Detecting Pretraining Data from Large Language Models (3 Nov 2023)
Weijia Shi, Anirudh Ajith, Mengzhou Xia, Yangsibo Huang, Daogao Liu, Terra Blevins, Danqi Chen, Luke Zettlemoyer
Jailbreak and guard aligned language models with only few in-context demonstrations (10 Oct 2023)
Zeming Wei, Yifei Wang, Yisen Wang
SmoothLLM: Defending large language models against jailbreaking attacks (5 Oct 2023)
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
A Watermark for Large Language Models (6 Jun 2023)
John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, Tom Goldstein (ICML 2023)
Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models (23 May 2023)
Yiting Qu, Xinyue Shen, Xinlei He, Michael Backes, Savvas Zannettou, Yang Zhang (ACM CCS 2023)
TRAK: Attributing Model Behavior at Scale (3 Apr 2023)
Sung Min Park, Kristian Georgiev, Andrew Ilyas, Guillaume Leclerc, Aleksander Madry
Poisoning Web-Scale Training Datasets is Practical (20 Feb 2023)
Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tramèr
Mitigating Inappropriate Degeneration in Diffusion Models (9 Nov 2022)
Patrick Schramowski, Manuel Brack, Björn Deiseroth, Kristian Kersting (CVPR 2023)
Extracting Training Data from Large Language Models (15 Jun 2021)
Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel
Alignment
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (13 Dec 2023)
Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
RAFT: Reward ranked fine tuning for generative foundation model alignment (1 Dec 2023)
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang (Transactions on Machine Learning Research, TMLR)
Better aligning text-to-image models with human preference (22 Aug 2023)
Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, Hongsheng Li (ICCV 2023)
Scalable agent alignment via reward modeling: a research direction (19 Nov 2018)
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
Proximal policy optimization algorithms (20 Jul 2017)
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov
Datasets
GOAT-Bench: Safety insights to large multimodal models through meme-based social abuse (7 Jan 2024)
Hongzhan Lin, Ziyang Luo, Bo Wang, Ruichao Yang, Jing Ma
ToViLaG: Your visual-language generative model is also an evildoer (13 Dec 2023)
Xinpeng Wang, Xiaoyuan Yi, Han Jiang, Shanlin Zhou, Zhihua Wei, Xing Xie (EMNLP 2023 Oral)
FigStep: Jailbreaking large vision-language models via typographic visual prompts (13 Dec 2023)
Yichen Gong, Delong Ran, Jinyuan Liu, Conglei Wang, Tianshuo Cong, Anyu Wang, Sisi Duan, Xiaoyun Wang
Query-relevant images jailbreak large multi-modal models (29 Nov 2023)
Xin Liu, Yichen Zhu, Yunshi Lan, Chao Yang, Yu Qiao
DRESS: Instructing large vision-language models to align and interact with humans via natural language feedback (16 Nov 2023)
Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset (7 Nov 2023)
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang (NeurIPS 2023)
Can pre-trained vision and language models answer visual information-seeking questions? (17 Oct 2023)
Yang Chen, Hexiang Hu, Yi Luan, Haitian Sun, Soravit Changpinyo, Alan Ritter, Ming-Wei Chang (EMNLP 2023)
Can language models be instructed to protect personal information? (3 Oct 2023)
Yang Chen, Ethan Mendes, Sauvik Das, Wei Xu, Alan Ritter
SafetyBench: Evaluating the safety of large language models with multiple choice questions (13 Sep 2023)
Zhexin Zhang, Leqi Lei, Lindong Wu, Rui Sun, Yongkang Huang, Chong Long, Xiao Liu, Xuanyu Lei, Jie Tang, Minlie Huang
Safety assessment of Chinese large language models (20 Apr 2023)
Hao Sun, Zhexin Zhang, Jiawen Deng, Jiale Cheng, Minlie Huang
3D, Video and Audio Safety
Not My Voice! A Taxonomy of Ethical and Safety Harms of Speech Generators (25 Jan 2024)
Wiebke Hutiri, Orestis Papakyriakopoulos, Alice Xiang
Adv3D: Generating 3D Adversarial Examples in Driving Scenarios with NeRF (4 Sep 2023)
Leheng Li, Qing Lian, Ying-Cong Chen
Deepfake Video Detection Using Generative Convolutional Vision Transformer (13 Jul 2023)
Deressa Wodajo, Solomon Atnafu, Zahid Akhtar
M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection (19 Apr 2022)
Junke Wang, Zuxuan Wu, Wenhao Ouyang, Xintong Han, Jingjing Chen, Ser-Nam Lim, Yu-Gang Jiang
Deepfake Video Detection Using Convolutional Vision Transformer (11 Mar 2021)
Deressa Wodajo, Solomon Atnafu
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward (25 Feb 2021)
Momina Masood, Marriam Nawaz, Khalid Mahmood Malik, Ali Javed, Aun Irtaza
Related Surveys
LLM
MM-LLMs: Recent Advances in MultiModal Large Language Models (24 Jan 2024)
Duzhen Zhang, Yahan Yu, Chenxing Li, Jiahua Dong, Dan Su, Chenhui Chu, Dong Yu
A Survey on Multimodal Large Language Models (23 Jun 2023)
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Multimodal Large Language Models: A Survey (22 Nov 2023)
[IEEE BigData 2023] Jiayang Wu, Wensheng Gan, Zefeng Chen, Shicheng Wan, Philip S. Yu
A Survey of Large Language Models (31 Mar 2023)
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
Vision
Autoregressive Models in Vision: A Survey (8 Nov 2024)
Jing Xiong, Gongye Liu, Lun Huang, Chengyue Wu, Taiqiang Wu, Yao Mu, Yuan Yao, Hui Shen, Zhongwei Wan, Jinfa Huang, Chaofan Tao, Shen Yan, Huaxiu Yao, Lingpeng Kong, Hongxia Yang, Mi Zhang, Guillermo Sapiro, Jiebo Luo, Ping Luo, Ngai Wong
State of the Art on Diffusion Models for Visual Computing (11 Oct 2023)
Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein
Diffusion Models in Vision: A Survey (10 Sep 2022)
[TPAMI 2023] Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
Team
Here is the list of our contributors in each modality of this repository.
| Modality/Task | Contributors |
| ----------------------------- | -------------------------------------------------------------------- |
| Image Generation | Jingye Chen, Xiaowei Chi, Yingqing He |
| Video Generation | Yingqing He, Xiaowei Chi, Jingye Chen |
| Image and Video Editing | Yazhou Xing |
| 3D Generation and Editing | Hongyu Liu |
| Audio Generation and Editing | Zeyue Tian, Ruibin Yuan |
| LLM Agent | Zhaoyang Liu |
| Safety | Runtao Liu |
| Leaders | Yingqing He, Zhaoyang Liu |
Citation
If you find this work useful in your research, please cite the paper as follows:
bib
@article{he2024llms,
title={LLMs Meet Multimodal Generation and Editing: A Survey},
author={He, Yingqing and Liu, Zhaoyang and Chen, Jingye and Tian, Zeyue and Liu, Hongyu and Chi, Xiaowei and Liu, Runtao and Yuan, Ruibin and Xing, Yazhou and Wang, Wenhai and Dai, Jifeng and Zhang, Yong and Xue, Wei and Liu, Qifeng and Guo, Yike and Chen, Qifeng},
journal={arXiv preprint arXiv:2405.19334},
year={2024},
}
Star History
Owner
- Name: Cassie
- Login: YingqingHe
- Kind: user
- Company: HKUST
- Repositories: 8
- Profile: https://github.com/YingqingHe
Ph.D. student @ HKUST; Contact: yhebm@connect.ust.hk
Citation (citation.json)
{
"da0d382c7fa981ba185ca633868442b75cb76de6": 0,
"d4b1a1c62a03ccffcf24983eb4fe22335cbb89b6": 1,
"60d3ade5c0085f5de1f5ab944cc058c78706ac66": 0,
"ea6982a936a2b263bbf46ff6eb27fc0b63fddaf7": 6,
"e49cb2ab3a7990e3d05042197ae8b3fd934453de": 1,
"cb2295766b2f8f35524f6a9f93ae39d948d50bd4": 0,
"f30bb09dbd95845d792bdac217a9a652635ee8a5": 1,
"78582ad19779a69d97b797a3c6eb2397f99398b6": 5,
"22d55c52f43f59634586ab95fefbb7dba8c8b190": 0,
"d16f72b7be526dee5eb49e5afffeea2bddba5e66": 5,
"8441c30ad4abdca9ee380aa6f22ffd731b10231b": 1,
"1c6e2a4da1ead685a95c079751bf4d7a727d8180": 1,
"769a924d0af014acec326f50c15c5d70d258a969": 2,
"42c4315b5d2e33d7d9a0afdf84e6a47ccd7a700e": 4,
"59d716b442ab760a78f58de6748c0fa1d507bfc1": 1,
"58b77dc0603eb52559d98a383bf9649fd31d0bc5": 4,
"4cb2c262ce34f41974f1b1623fc5a6e32956ded3": 4,
"03bf1da1caa5f63203d43ed78c12c35a78fc6ed9": 3,
"1d14a708622917da4b9820ada6d32af24fc1651a": 6,
"7f1ba5630c3baa09b11cc665b3f71cdb117e5ffb": 3,
"f669d7a6fab0147253178a6fc854e05e3d92fb3f": 6,
"cfee1826dd4743eab44c6e27a0cc5970effa4d80": 146,
"e7d09b6f2bc878cf2c993acf675f409d0b55f35a": 26,
"5ba1525dc6d382ee0a4a1ca3c64fc5907ca64c67": 19,
"819f477065088220a6f706cd9ef76dbcb4b4c134": 5,
"c1e450284e7d6cac1855330a1197df8537df653f": 48,
"0d38f1edac66b4645cf5fa05abaf9d92cba5d5d3": 1,
"7b689adb8c156d6158660f90d1c86888ee281f63": 38,
"366564d210768814bc880e391b909cfbd95f8964": 2,
"fa75a55760e6ea49b39b83cb85c99a22e1088254": 94,
"7d78238a9bad60433d616abdd93c735087d99670": 10,
"40298b8d50109c52fc10763eddc64a07cf8acb31": 30,
"94053805cd59f2e9a47fe3f080c7e7afefb337cc": 43,
"376f494126d1ea4f571ea0263c43ac2b6331800a": 12,
"3a79545719fb193a6b4042ef7d1d87cfd267be06": 20,
"6fb5c0eff3696ef252aca9638e10176ecce7cecb": 73,
"66d755730f5d08a6f4fcc5e81f24982ba389dca9": 31,
"9837349417e36ef5be06da0fd6c74042148bdaa2": 24,
"e9ae0c76a71b8f302eb17b1c4462b9cc97d87cd0": 39,
"43a55dbd95c9d5cd82de8db276f41adeec4a937d": 7,
"9f411fda2ad5b141a3115f707bcf5ee865b3fb94": 41,
"972501b057e2b84d6ce6506f70bcac697bab7872": 10,
"6173520a1eb2814d067e8c5fd16212b7cbf6ee78": 35,
"8fafd95a6ffbecf9c1b5f4542ac4b78a00602551": 29,
"e779781f1bea273573fc9d3f1a5e874bcff2cd2b": 12,
"839dc73c1adae268144d9cfb9d70985b2001304f": 9,
"144eca44e250cc462f6fc3a172abb865978f66f5": 307,
"5b19bf6c3f4b25cac96362c98b930cf4b37f6744": 1090,
"9695824d7a01fad57ba9c01d7d76a519d78d65e7": 2844,
"c10075b3746a9f3dd5811970e93c8ca3ad39b39d": 5641,
"d47524cd5c3c4b57af2e5a29f6f91c420310f236": 78,
"5fbe4c92791fbecb179c1ab79bba9a59b2e155ba": 12,
"f9570989919338079088270a9cf1a7afc8db8093": 103,
"a5036f31f0e629dc661f120b8c3b1f374d479ab8": 820,
"df958800014d310b6df34ad83d771314d68fbb2d": 64,
"fbfef4723d8c8467d7bd523e1d0b703cce0e0f9c": 261,
"e5c8960eb2ec034ffbd353ef39fd1cb541d3c7c9": 1288,
"26218bdcc3945c7edae7aa2adbfba4cd820a2df3": 1483,
"037bab9d26ef7da11ee32d7682836604d2cc8a72": 70,
"b668ce936cff0b0ca8b635cd5f25a62eaf4eb3df": 756,
"98e565fa06f6c7bf7c46833b5106b26dc45130c4": 186,
"394be105b87e9bfe72c20efe6338de10604e1a11": 617,
"141a5033d9994242b18bb3b217e79582f1ee9306": 2120,
"71b7178df5d2b112d07e45038cb5637208659ff7": 33630,
"8e080b98efbe65c02a116439205ca2344b9f7cd4": 1192,
"c1b5195bc09a2232ec2b69e5a2a6bd39b3162c62": 1,
"fc84fcf269a37ed7ddcb1b0f2d7d1a00f677eaea": 1,
"4599d5af850da482f591a02a3b17d56e0d358771": 1,
"0c4f46e4dcae5527018e6432fb60cfe8c3354e97": 18,
"9cdb7e415a96795dc6705e66f3b798238b4dec2c": 0,
"985f0c89c5a607742ec43c1fdc2cbfe54541cbad": 15,
"87bf66eb6d22df17f70170a0e575b4f12c4813ef": 10,
"16753e0317730e8c1b297338300a8c6163dd06f2": 12,
"120aca3e415b6641a0b0cd20695ab85ed7789612": 24,
"d0a7f7fe31e0e0c42b471b4c47a313bd8c8e5206": 6,
"b1750d2a6e3480e690999916a86c8b3876577b39": 15,
"8f2d4758e6d525509ae36bb30224dc9259027e6b": 110,
"21a77ed349c8621d0a0ef8407eb744e3de3b13c5": 0,
"4e9a8141da2a8c603722b07d096109207f8e0b66": 7,
"1206b05eae5a06ba662ae79fb291b50e359c4f42": 51,
"1891c3756f870d902a0b793a1dcd5cc34c778612": 38,
"d831988859f0c077b38094446d8585a8340af223": 7,
"77040969110fab39a55699cb06f9edf68789445a": 18,
"52b10ae66d025e99fbb602935e155f97f4f0696f": 31,
"ee73edebd42626d9c2d91e35fd2ed3cdb0fb26d0": 53,
"26c6090b7e7ba4513f82aa28d41360c60770c618": 108,
"369b449415d50387fba048bbd4d26ee890df84b5": 36,
"50bbf2c11984d18aa14f964a4909ac25f07e50ea": 43,
"aa1b722485106c84e52c5e35b2d4b2f8c7fb3135": 48,
"484d2194ce8459bfa9da906e556f63812c6ca999": 11,
"e1a3e6856b6ac6af3600b5954392e5368603fd1b": 84,
"90357a6dc817e2f7cec477a51156675fbf545cf1": 277,
"bac87bdb1cabc35fafb8176a234d332ebcc02864": 585,
"da55208bc9b56b5f394c242239d8cd0734bd5a87": 42,
"9311779489e597315488749ee6c386bfa3f3512e": 847,
"28b74bb7c8b08cceb2430ec2d54dfa0f3225d796": 373,
"f56cb5dc32b5b280546998418fda7769d0858629": 223,
"96dd1fc39a368d23291816d57763bc6eb4f7b8d6": 934,
"154c22ca5eef149aedc8a986fa684ca1fd14e7dc": 278,
"b8e2e9f3ba008e28257195ec69a00e07f260131d": 1395,
"816792e66f463be2aa1888e4ecb51f8fb2b4dd79": 0,
"1dbc2cdcae3e17c3d721d12a5a2d98ced727681a": 2,
"4673c2ac4abb4b055da87171231acb60801ffe74": 0,
"09157a8c0e7d7263ac035690118ddcbe295cee5c": 0,
"d212fa27f5868f0fd106e1a7bba908fd47da0816": 47,
"588930cdd801f335b5e524d13f99aa94136a20a0": 7,
"e88d5399956c9d9519a5cfd49308b7d439167543": 3,
"daf3b117f789b2b95223e58592979fb57627515e": 17,
"1e8403af2e1e7a8f803d8df9e8daac584f99c2a0": 25,
"438e9fb79c9e37d43223e61bb575ebd2dae0b0a7": 29,
"4e21879b564cc2e803b16edf0dda9f1edb91b497": 13,
"6f709278506813d04a074e6fa20188cce9bb927b": 12,
"85a70c0a048cba4f53dcf332ee73f6032a2e53bc": 14,
"cc1a674bb164d09a060cf5b26fe518c02fae0ddc": 99,
"c5e9fd131cde68c218d0ea69cd617a67c7f35d42": 230,
"95aa6fa4e42387561cff22378348d528adea37f2": 55,
"0cbb518c364067200476a51e5ce7476a4f582770": 202,
"b19ca192a5bebbc3473be61989baf085ff21daa5": 50,
"5356c3dac654854a0842753bcc2e3433dc4a2afd": 81,
"793939b83e10903f58d8edbb7534963df627a1fe": 212,
"bdf4af8311637c681904e71cf50f96fd0026f578": 466,
"fc011ed5ee986332523a62d2783adee1179dc1ed": 238,
"4c94d04afa4309ec2f06bdd0fe3781f91461b362": 908,
"738e3e0623054da29dc57fc6aee5e6711867c4e8": 197,
"03e1c3b5fdad9b21bbed3d13af7e8d6c73cbcfa6": 345,
"f8bf2225a2993e3ead73d886b5797378d6e53186": 18,
"2c6392491b6a942e08db46c8fff0ef5ba1fd9de8": 67,
"d15b27edf3630728cdb40f49946365d9011641cf": 222,
"44e49f72fb6b97f52c25a30f0adc68c2384430ba": 47,
"8941e477b2f39eb92712f04400412da60d349ec1": 167,
"e82df4b6a3628501fce67835ad8316d6525ad133": 140,
"1b90e9e9734bed6b379ae87d688cb3b887baf597": 70,
"1b31dbf44e68b698120552366df03e6e35a1e428": 230,
"e7c8a74423a5811a3aac5f33001fce32d2e2386c": 0,
"48494aa30f35a64858644aba839c8cba38c0cf2a": 0,
"14191e9f12913ad8c7ac6e1188682afac04aad09": 0,
"c1dd77e48dd615ee6881b2cc876a00a92cae6eac": 2,
"6c64ddd2190909de2c680dd18abc9b92e80c39f9": 8,
"2841d684461d087bbb0176af7cb4c6c84fdf2225": 2,
"ffa05cb5504ba08254f498223f613b3ebcf87692": 13,
"7b22ecd9f1ced58c1704ac6191e029b98054e330": 10,
"3efb81de24eb88017d6dbcf22cb4215084223fd8": 69,
"ad22af138fa1d1490cda0301abf8159a7c30c5a2": 35,
"5cac6430bd379c9d2fe13137dfd6ae7721a2679f": 76,
"574beee702be3856d60aa482ec725168fe64fc99": 1407,
"f124ae1e4663359193be32adb37b07b3252d5329": 6,
"42239e71a712d70cd24e06ffc0cf0d22fc628a36": 3,
"cca4218dd7c10c1614bbd84aa7cd7e00027bdc7c": 3,
"beaf64df85f8204b8cd89a7f46827608e6d16922": 2,
"74bfbbb7307a7af2686043ea97ab8b34cb062ba8": 15,
"8c870bef01a4fbb20f60722ffc2f6bee3870b18b": 232,
"aa7bcd1f9453c9096ec78900a7b94e816ed0e1c5": 9,
"63aad36dc981348493be6743292a04234b29ba4e": 2,
"4cc8e18f5eece0b0d8e1abcb8ee10fb33680fbb2": 71,
"83d4b22d803ae856cf6b308482bd504fa151d39e": 10,
"67dea28495cab71703993d0d52ca4733b9a66077": 477,
"8bc617c9139648d7a92991d70c671230bac7b2e2": 83,
"f51bc74814a3452009ea5ca262d9768d08149ee6": 51,
"d1120d67b700e4dfe8b39eb1e48fbdea4e1a0c43": 413,
"c2f91f35df893714418cc29096083dce0b441229": 203,
"428854d9e75f94f0e61f37c6887c77800437d516": 171,
"e99b45179686982401d2d6ec919e42b327f04c0b": 2,
"9de3ac21af795dac56f6031e73db8198716bb352": 103,
"66831f683141c11ed7e20b0f2e8b40700740c164": 316,
"f59c038dee828e0a8c2fc28130d12e39ee4952d6": 449,
"23037085b0815455e6d47333089b925c8c0e21d5": 102,
"2789b6c84ba1422746246685001accba5563e7c1": 555,
"2603a68b4503ba949c91c7e00cd342624b4aae2f": 331,
"5ba2218b708ca64ab556e39d5997202e012717d5": 2338,
"34038d9424ce602d7ac917a4e582d977725d4393": 4815,
"8a1384e041cc6ea2735b01c734aeef666dc92884": 214,
"388b0f44faf0a14cc402c2554ec36a868cf59129": 1,
"5bcb0153dd0840113eb27d4d6f753414ef656a03": 8,
"092245d86b77181c36f972b1b7a17a59cd989c4a": 11,
"5a185965ad1e87367d044b47043706d00b85b007": 1,
"372bc41602bbd21f192305775f0a58de9880e454": 28,
"af997821231898a5f8d0fd78dad4eec526acabe5": 337,
"a2d2bbe4c542173662a444b33b76c66992697830": 582,
"198b3d809594a76bc473927af37b858132ac7fdd": 1,
"05eb2ad3af471c05a24abbf70258688e579cdf22": 0,
"737ad8905228cd410e3342b5cceefd4feb57d166": 9,
"2cfaa5b3571d3b75f040f6d639359a3c673f5561": 36,
"6e5760e5d4b468bbf01a95a6f64bd65c3aa3d798": 3,
"f4c62aa336de45273e0fdfcfbd65b3c2e552ad56": 11,
"85963807c11abe38e9a2797d9860e012238607ef": 102,
"c614a4da924466f62ca39002af425c9d14d240a3": 0,
"daf61010eee0fbf6f9bab7db71c395ffca6f3ff3": 159,
"a6ad30123bef4b19ee40c3d63cfabf00d211f0ef": 69,
"387144d293567408c363313aac971294e7ec8547": 1,
"b000d6865db824af1563708fb7a545ddd65c6b3a": 224,
"23e261a20a315059b4de5492ed071c97a20c12e7": 496,
"90428f3a8caa5082f825ebf3138514ddf273dae3": 327,
"04e541391e8dce14d099d00fb2c21dbbd8afe87f": 717,
"064ccebc03d3afabaae30fe29a457c1cfcdff7e3": 208,
"8f8dedb511c0324d1cb7f9750560109ca9290b5f": 318,
"f671a09e3e5922e6d38cb77dda8d76d5ceac2a27": 604,
"e8bbffb8413cb1f88e99a7ecbabd21a6eac82271": 2,
"205d2ed0906440f07a0275d7d6a63bced60951fc": 6,
"8819777e104f8c4197c262e11a01b070b50007aa": 20,
"05cbac9a5101f47a6fabad72398616506572c9fa": 41,
"c2d65fc3a7fde3f7662c6ef9448e5737d7e5551f": 28,
"4761f173965195798cd3046ef4af608a83504e4d": 80,
"1e09b83fe064826a9a1ac61a7bdc00f26be41aee": 74,
"14acc36d8c87f31f8dcbbf8433b91af70a2a516a": 0,
"5f51eda9f7abddca027941d50fb0b6bf6f508eff": 13,
"32a3c2fbd3e733bd0eea938517fec2ff8dc7c701": 92,
"14ccb8bcceb6de10eda6ad08bec242a4f2946497": 133,
"6283502d6900a0b403e2454b1cb1cf16ddefd5a7": 81,
"9758ddd6ffbaac75aa0447a9664e6989811a05e2": 107,
"1367dcff4ccb927a5e95c452041288b3f0dd0eff": 275,
"81349524489f8ba0812ac2529eac92ec45959782": 9,
"496bdd2804a231a3336463fca8e0a4c6a46f0304": 1,
"bf7f31e07d9b128a0f555c275bc3fdb851f725b8": 9,
"222c47b81fe04598fd84fe8b9a43f694415ec7e9": 44,
"4974186c3b5b50112cfd909de115d5fbe25411fd": 21,
"26c22380282a00166273038bc5ba785d845d61ad": 131,
"029f3e2c215edac138be26ade67b3d70b8f74dd7": 39,
"6ebec1ece44daa090158ff2531d6fabb94a4e683": 27,
"3a5d4352d3dd53148a9544233bb59f88d2504910": 12,
"f21e8eddf42580d1f38a11ec5acd8891c0454a1f": 31,
"0483be6c3ec6cd41ffe248f86effc7468d3ac7be": 244,
"c020f15be1dee20f9e2e0c5a6f05f272b5508325": 7,
"288e7224d53d68669eb67f2496e068dc965c639e": 11,
"54c68b8623505dc6bf7a0b08aaa77ca9165f2d7f": 33,
"e2f1f04f648a8863d11439aa4c80ee65d6caccda": 6,
"54a8b153ed04a872da878d695239bdc413dc782c": 40,
"6a33e58ef961a3a0a5657518b2be86395eb7c8d0": 10,
"486c2df78cbb770a90a55f7fa3fe19102fba2c24": 5,
"2313afae52d98e569da2dedbf14daf9efc74e7cf": 51,
"1ddbd08ad8cf22a5c66c4242194c4286328533bf": 97,
"0f8d12775a4685575f1489796b5dee9e11fbdfb5": 7,
"bcac614f9774488447221ebb4f16f05e3975ec1e": 5,
"fc6a2f7478f68adefd69e2071f27e38aa1647f2f": 69,
"42a30dc5470f54ec249f25d3c31e05d7c376c8e3": 134,
"8bd6a2a89503be083176f2cc26fabedb79238cbd": 474,
"ca6a2bc279be5a3349a22bfd6866ed633d18734b": 649,
"6f9b7c8cde1be2e62a503c31cac883c6d44c9d0d": 16,
"b037bb09aa162d8a543e64ec777ca0edc732d2af": 1,
"4edbb942c2d20a6f5a4e3caa763a9761be953231": 1,
"107fb6eec2febbae12db29bf3e311aaf5680027c": 19,
"aad3d2e690f6c73f04a14622ceff51464bbc560e": 11,
"5d321194696f1f75cf9da045e6022b2f20ba5b9c": 176,
"6024f320e0a5b9b8fc29b86903aa9a96956b26dd": 6,
"4c4d176c6e28f48041f215d563f6ee8633534cff": 42,
"bf7025a2e5dbb3c09deae02a1aa98a256ca559e2": 107,
"d48cb91b9e555194f7494c4d4bb9815021d3ee45": 133,
"f9bfc6d9ba1665b73af3323d46c7642b852759ef": 33,
"4e54a45d2118b61ae1baec07308af3fdd2c48759": 1,
"5edf706467dc76cd09319592d18db0ad4e1fb64d": 6,
"7637ed79d30d0139901175ae4abedd822c217ab4": 48,
"6bcc6ab9c28805d4067e99b2cdc7524550fe80e1": 31,
"f3ce9ba3fcec362b70263a7ed63d9404975496a0": 215,
"f90595f99a0c66d2bb6d0f230f17c7cd8c58f44d": 14,
"f72be31de9f9a09d4410fd38bc717efe43444827": 9,
"91eaf9752002f6999a3bd8d33303cdcdbd135317": 4,
"8f0a24d1678e4d0e584b0932196cd257d5c53c7d": 2,
"5596bd3e26ec2207666ec1ff3db4415d212f14b9": 11,
"3a944ddba8b6fbaaac36126fc955f181f8b8b06a": 8,
"a33b437618be733fea7176bd98e18b6362af0838": 11,
"8e1868f84091272544cb4209c4ccaad7cc88af27": 24,
"4bb0b12803791764d641a4cef1e0ce39cf049542": 33,
"18a8b97d75a87e8fef07542d8875d4a62b553744": 8,
"e762f92273cd96f63b7788c0173b9b6450adedd7": 32,
"ac5b4df0e398ca48388330ac5c795b6fe708793c": 6,
"5bdaadb84db0cbf72aaebda9f55f4288b63c6e9b": 22,
"47030369e97cc44d4b2e3cf1be85da0fd134904a": 309,
"db4cf9f6a653d5c15973e836c800ea47743251ae": 77,
"2f94f03fdac62d05f0f416b7b3855d1f597afee9": 64,
"2cf43a61d0937ad25f23eaef7c90253ab799b3c7": 61,
"0cf694b8f85ab2e11d45595de211a15cfbadcd22": 94,
"9716a2876d08fce9d8e5c5ba4d7b1a9af44806d6": 151,
"18a1c21f35153c45d0ef30c564bffb7d70a13ccc": 613,
"ffb949d3493c3b2f3c9acf9c75cb03938933ddf0": 1445,
"60f4dc690ea42fb77b04fc685e9d9c3a1e209319": 5,
"3422d5e0cdfdc935d6a84a1e3d3f96659265fe3a": 20,
"6b135e922a0c673aeb0b05c5aeecdb6c794791c6": 43,
"8cf9b49698fdb1b754df2556576412a7b44929f6": 46,
"cb5b71a622aff47014d4f28a958679629a8b6363": 199,
"c9e548d72f5ad72215025602be36f72042219baf": 29,
"4f2ae5fa2dc74af9c36ee57b359a4b3241006a92": 32,
"0231f2aed9a96cb516242fb57f2cb63f5651c4d8": 76,
"df7d26339adf4eb0c07160947b9d2973c24911ba": 1005,
"0d1c76d45afa012ded7ab741194baf142117c495": 450,
"3ab661db57d924f4ff1706e05ac807873ca00e0a": 116,
"14c3cf58192774b9b6fc6188df99efd6ab5fc739": 44,
"c6f913e4baa7f2c85363c0625c87003ad3b3a14c": 244,
"dce6f9d4017b1785979e7520fd0834ef8cf02f4b": 12282,
"d98aa44f79fe798ad5ff0cac6e7bf32ee30bd156": 4,
"10280c290825fc0b0c884e988f4f1dedb80e4e80": 1,
"b78b5ce5f21f46d8149824463f8eebd6103d49aa": 7,
"1a5a79b393b3f00eb5a47243ee031ad799d2f641": 5,
"391eaeb1092c2b145ff0e5a2fa61637a42921fce": 9,
"92930ed3560ea6c86d53cf52158bc793b089054d": 56,
"f890b4dfe915174b23db909b07c515d465eaeff2": 19,
"2403c8e72a90d9c778970fc0812ecdcc58800c5d": 14,
"9b9a4fa3ed510fc6eb1bf831979235f3d9f8b556": 13,
"59fc49dfd81b92661437eaf7e339c0792ccd8755": 33,
"daa6a6b2c495d002d72075c6203c98061d1e35f9": 2,
"86301139cc02eb53247e63fca91b916348591505": 0,
"21e0858665cddf51689fc680f72ec4e00b68ae04": 132,
"e8f1c51c4e881345c0588bec8aa8bc6d9164a535": 123,
"a050c9b0c321839e4427ab9defa3463be7825ac4": 4,
"ebedc4d7a2356090904baba4104ef0832bc236df": 114,
"52941cadbd340344f3e0a6f50719fe55b3de5088": 14,
"c61d54644e9aedcfc756e5d6fe4cc8b78c87755d": 875,
"6487ec82f6d8082a5b402a5416ea03009acb1679": 27,
"efa1647594b236361610a20d507127f0586a379b": 371,
"210e63599d49abdb848a4440d4244cdcdedeadff": 5,
"fe1b3f0d074974ce946f10f3bbf52e8351bc0156": 5,
"6d10f9b0e0a579a1359df7dfbdef00bc798d5714": 3,
"d8aaed01dffc621488aecbb0ef01b50f86e44bc1": 12,
"86b5318b0a69ccdeec17abb0120e4bd7688a4b59": 57,
"fec17239569efd6914f0df9e25b66b310969d3c5": 23,
"2b94785cbfd865a01cc68d7d4c7500b710e5e2fb": 30,
"c15380dcda5a010827e3b014dcebe95b1218c680": 4
}
GitHub Events
Total
- Issues event: 3
- Watch event: 166
- Issue comment event: 4
- Push event: 19
- Pull request event: 3
- Fork event: 11
Last Year
- Issues event: 3
- Watch event: 166
- Issue comment event: 4
- Push event: 19
- Pull request event: 3
- Fork event: 11