Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
(Source: Make-A-Video, SimDA, PYoCo, Video LDM, and Tune-A-Video)
- [News] The Chinese translation is available on Zhihu. Special thanks to Dai-Wenxun for the translation.
Methods | Task | Github |
---|---|---|
GEN-2 | T2V Generation & Editing | - |
ModelScope | T2V Generation | |
ZeroScope | T2V Generation | - |
T2V Synthesis Colab | T2V Generation | |
VideoCraft | T2V Generation & Editing | |
Diffusers (T2V synthesis) | T2V Generation | - |
AnimateDiff | Personalized T2V Generation | |
Text2Video-Zero | T2V Generation | |
HotShot-XL | T2V Generation | |
Genmo | T2V Generation | - |
Fliki | T2V Generation | - |
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
CelebV-Text: A Large-Scale Facial Text-Video Dataset | - | | | CVPR, 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation | - | | | May, 2023
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | - | - | | May, 2023
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | - | - | | Nov., 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | - | - | | ICCV, 2021
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language | - | - | | CVPR, 2016
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild | - | - | | Dec., 2012
First Order Motion Model for Image Animation | - | - | | May, 2023
Learning to Generate Time-Lapse Videos Using Multi-Stage Dynamic Generative Adversarial Networks | - | - | | CVPR, 2018
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
CVPR 2023 Text Guided Video Editing Competition | - | - | | Oct., 2023
EvalCrafter: Benchmarking and Evaluating Large Video Generation Models | - | - | | Oct., 2023
Measuring the Quality of Text-to-Video Model Outputs: Metrics and Dataset | - | - | | Sep., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
FreeNoise: Tuning-Free Longer Video Diffusion Via Noise Rescheduling | | | | Oct., 2023
ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation | | | | Oct., 2023
LLM-grounded Video Diffusion Models | - | - | | Oct., 2023
Free-Bloom: Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | - | | | NeurIPS, 2023
DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis | - | - | | Aug., 2023
Large Language Models are Frame-level Directors for Zero-shot Text-to-Video Generation | - | | | May, 2023
Text2video-Zero: Text-to-Image Diffusion Models Are Zero-Shot Video Generators | | | | Mar., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
DisCo: Disentangled Control for Referring Human Dance Generation in Real World | | | | Jul., 2023
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model | - | - | | Aug., 2023
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion | | | | Apr., 2023
Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | | | | Apr., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Motion-Conditioned Diffusion Model for Controllable Video Synthesis | - | | | Apr., 2023
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory | - | - | | Aug., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion | - | - | | ICCV, 2023
Generative Disco: Text-to-Video Generation for Music Visualization | - | - | | Apr., 2023
AADiff: Audio-Aligned Video Synthesis with Text-to-Image Diffusion | - | - | | CVPRW, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Make-It-4D: Synthesizing a Consistent Long-Term Dynamic Scene Video from a Single Image | - | - | | MM, 2023
Generative Image Dynamics | - | | | Sep., 2023
LaMD: Latent Motion Diffusion for Video Generation | - | - | | Apr., 2023
Conditional Image-to-Video Generation with Latent Flow Diffusion Models | - | | | CVPR, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity | | | | May, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation | | | | Jul., 2023
Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance | | | | Jun., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
VideoComposer: Compositional Video Synthesis with Motion Controllability | | | | Jun., 2023
NExT-GPT: Any-to-Any Multimodal LLM | - | - | | Sep., 2023
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images | - | | | Jun., 2023
Any-to-Any Generation via Composable Diffusion | | | | May, 2023
Mm-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | - | | | CVPR, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Video Probabilistic Diffusion Models in Projected Latent Space | | | | CVPR, 2023
VIDM: Video Implicit Diffusion Models | | | | AAAI, 2023
GD-VDM: Generated Depth for better Diffusion-based Video Generation | - | | | Jun., 2023
LEO: Generative Latent Image Animator for Human Video Synthesis | | | | May, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
VDT: An Empirical Study on Video Diffusion with Transformers | - | | | May, 2023
Title | arXiv | Github | Pub. & Date
---|---|---|---
LDMVFI: Video Frame Interpolation with Latent Diffusion Models | - | | Mar., 2023
CaDM: Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming | - | | Nov., 2022
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos | - | - |
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Video Diffusion Models with Local-Global Context Guidance | - | | | IJCAI, 2023
Seer: Language Instructed Video Prediction with Latent Diffusion Models | - | | | Mar., 2023
Diffusion Models for Video Prediction and Infilling | | | | TMLR, 2022
McVd: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | | | | NeurIPS, 2022
Diffusion Probabilistic Modeling for Video Generation | - | | | Mar., 2022
Flexible Diffusion Modeling of Long Videos | | | | May, 2022
Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models | | | | May, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation | - | - | | Sep., 2023
MagicEdit: High-Fidelity and Temporally Coherent Video Editing | - | - | | Aug., 2023
Edit Temporal-Consistent Videos with Image Diffusion Model | - | - | | Aug., 2023
Structure and Content-Guided Video Synthesis With Diffusion Models | - | | | ICCV, 2023
Dreamix: Video Diffusion Models Are General Video Editors | - | | | Feb., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
StableVideo: Text-driven Consistency-aware Diffusion Video Editing | | | | ICCV, 2023
Shape-aware Text-driven Layered Video Editing | - | - | | CVPR, 2023
SAVE: Spectral-Shift-Aware Adaptation of Image Diffusion Models for Text-guided Video Editing | - | | | May, 2023
Towards Consistent Video Editing with Text-to-Image Diffusion Models | - | - | | Mar., 2023
Edit-A-Video: Single Video Editing with Object-Aware Consistency | - | | | Mar., 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | | | | ICCV, 2023
ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing | | | | May, 2023
Video-P2P: Video Editing with Cross-attention Control | | | | Mar., 2023
SinFusion: Training Diffusion Models on a Single Image or Video | | | | Nov., 2022
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
InstructVid2Vid: Controllable Video Editing with Natural Language Instructions | - | - | | May, 2023
Collaborative Score Distillation for Consistent Visual Synthesis | - | - | | Jul., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet | - | | | Jul., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
Speech Driven Video Editing via an Audio-Conditioned Diffusion Model | - | - | | May, 2023
Soundini: Sound-Guided Diffusion for Natural Video Editing | | | | Apr., 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
CCEdit: Creative and Controllable Video Editing via Diffusion Models | - | - | | Sep., 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts | | | | May, 2023
Title | arXiv | Github | Website | Pub. & Date
---|---|---|---|---
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing | - | | | Oct., 2023
INVE: Interactive Neural Video Editing | - | | | Jul., 2023
Shape-Aware Text-Driven Layered Video Editing | - | | | Jan., 2023
If you have any suggestions or find our work helpful, feel free to contact us:
Homepage: Zhen Xing
Email: zhenxingfd@gmail.com
If you find our work useful, please consider citing it:
@article{vdmsurvey,
title={A Survey on Video Diffusion Models},
author={Zhen Xing and Qijun Feng and Haoran Chen and Qi Dai and Han Hu and Hang Xu and Zuxuan Wu and Yu-Gang Jiang},
journal={arXiv preprint arXiv:2310.10647},
year={2023}
}