Project page | Paper | Data
This repo contains the official implementation for the paper "GenFusion: Closing the loop between Reconstruction and Generation via Videos".
conda env create --file environment.yml
conda activate genfusion
cd Reconstruction
CC=gcc-9 CXX=g++-9 pip install submodules/simple-knn
CC=gcc-9 CXX=g++-9 pip install submodules/diff-surfel-rasterization
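A quick import check can confirm that both CUDA extensions built correctly; the module names below follow the 2DGS submodules and may differ if your local copies do.
# Sanity check for the two CUDA extensions (module names assumed from the 2DGS submodules).
import torch
from simple_knn._C import distCUDA2
from diff_surfel_rasterization import GaussianRasterizationSettings, GaussianRasterizer
print("CUDA available:", torch.cuda.is_available())
print("simple-knn and diff-surfel-rasterization imported successfully.")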
The generation model is finetuned from DynamiCrafter.
Step 1. Download the DL3DV Renderings dataset
Step 2. Download the pretrained models via Hugging Face, and put the model.ckpt for the required resolution at checkpoints/dynamicrafter_512_v1/model.ckpt.
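For example, the checkpoint can be fetched programmatically with huggingface_hub; the repo id below (Doubiiu/DynamiCrafter_512) is an assumption, so substitute whichever DynamiCrafter checkpoint repository you actually downloaded from.
# Hypothetical download sketch using huggingface_hub; the repo id is an assumption.
import os, shutil
from huggingface_hub import hf_hub_download

os.makedirs("checkpoints/dynamicrafter_512_v1", exist_ok=True)
ckpt_path = hf_hub_download(repo_id="Doubiiu/DynamiCrafter_512", filename="model.ckpt")
shutil.copy(ckpt_path, "checkpoints/dynamicrafter_512_v1/model.ckpt")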
Step 3. Run the following command to start training at a resolution of 512x320.
cd ./GenerationModel
sh configs/training_video_v1.0/run_interp.sh
Step 4. Fine-tune the model at a higher resolution of 960x512.
sh configs/training_960_v1.0/run_interp.sh
If you want to use the model for generation inference, you can download our pre-trained model from here.
If you skip the generation model training, you can download our pre-trained model from here and put it in the ./diffusion_ckpt folder.
Step 1. The testing scenes in our paper are selected from the DL3DV Benchmark and the DL3DV dataset. Download the scenes from Hugging Face and put them in the ./data folder.
Step 2. To reproduce the quantitative results in our paper, run the following command.
cd ./Reconstruction
python genfusion_scripts/batch_ours.py [gpu_ids]
To run masked 3D reconstruction on your own scene, use the following command (for --patch_size we recommend half of the image resolution).
cd ./Reconstruction
python train.py --data_dir [data_dir] \
-m [output_dir] \
--iterations 7_000 \
--test_iterations 7_000 \
--diffusion_ckpt ./diffusion_ckpt/epoch=59-step=34000.ckpt \
--diffusion_config ./generation_infer.yaml \
--num_frames 16 \
--outpaint_type crop \
--start_diffusion_iter 3000 \
--sparse_view 0 \
--downsample_factor 2 \
--diffusion_resize_width 960 \
--diffusion_resize_height 512 \
--diffusion_crop_width 960 \
--diffusion_crop_height 512 \
--patch_size [crop_width] [crop_height] \
--opacity_reset_interval 9000 \
--lambda_dist 0.0 \
--lambda_reg 0.5 \
--lambda_dssim 0.8 \
--densify_from_iter 1000 \
--unconditional_guidance_scale 3.2 \
--repair
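As a convenience, a helper like the one below can print a --patch_size of half the image resolution; whether the recommendation applies before or after --downsample_factor is our assumption, and the image path is a placeholder.
# Sketch: print a --patch_size of half the (downsampled) image resolution.
# Assumes the recommendation refers to the resolution after --downsample_factor.
from PIL import Image

downsample_factor = 2
width, height = Image.open("data/your_scene/images/frame_0000.png").size  # hypothetical path
print("--patch_size", (width // downsample_factor) // 2, (height // downsample_factor) // 2)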
Step 1. Download the Mip-NeRF 360 data and the train/test splits from ReconFusion.
Step 2. Run the following command to start training.
cd ./Reconstruction
python genfusion_scripts/batch_sparse.py [gpu_ids]
To run a sparse view reconstruction on your own scene, a train_test_split_{sparse_view}.json
file is required in the following format:
{
"test_ids": [
0, 8, 12, ...
],
"train_ids": [
1, 2, 3, ...
]
}
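If you need to create this file yourself, the minimal sketch below is one way to do it; it assumes the images are sorted by filename, spreads the training views evenly over the sequence, and uses all remaining frames for testing. Adjust the selection to match your capture.
# Minimal sketch for writing train_test_split_{sparse_view}.json.
# Assumes images sorted by filename; training views are spread evenly,
# and all remaining frames become test views.
import json, os, sys

image_dir, sparse_view = sys.argv[1], int(sys.argv[2])
num_images = len(os.listdir(image_dir))
step = (num_images - 1) / max(sparse_view - 1, 1)
train_ids = sorted({round(i * step) for i in range(sparse_view)})
test_ids = [i for i in range(num_images) if i not in train_ids]
with open(f"train_test_split_{sparse_view}.json", "w") as f:
    json.dump({"test_ids": test_ids, "train_ids": train_ids}, f, indent=2)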
Then run the following command:
python train.py \
--data_dir [data_dir] \
-m [output_dir] \
--iterations 7000 \
--test_iterations 7_000 \
--diffusion_ckpt ./diffusion_ckpt/epoch=59-step=34000.ckpt \
--diffusion_config ./generation_infer.yaml \
--num_frames 16 \
--outpaint_type sparse \
--start_diffusion_iter 1000 \
--sparse_view [num_of_sparse_views] \
--diffusion_resize_width 960 \
--diffusion_resize_height 512 \
--diffusion_crop_width 960 \
--diffusion_crop_height 512 \
--mono_depth \
--repair \
--densify_from_iter 100 \
--diffusion_until 7000 \
--diffusion_every 1000 \
--densify_until_iter 5000 \
--densification_interval 500 \
--opacity_reset_interval 3100 \
--lambda_dist 10.0 \
--lambda_dssim 0.5 \
--lambda_reg 1.0 \
--unconditional_guidance_scale 3.2
We provide two options for reconstructing the unseen areas of a scene.
- Option 1: Pre-define a camera path using RemoteViewer and save the trajectory as a JSON file. Using the counter scene in Mip-NeRF 360 as an example:
python train.py \
--data_dir [data_dir] \
-m output_ours/counter_completion \
--test_iterations 7_000 \
--diffusion_ckpt ./diffusion_ckpt/epoch=59-step=34000.ckpt \
--diffusion_config ./generation_infer.yaml \
--num_frames 16 \
--outpaint_type rotation \
--add_indices 7 15 \
--depth_loss \
--iterations 26000 \
--diffusion_resize_width 960 \
--diffusion_resize_height 512 \
--diffusion_crop_width 960 \
--diffusion_crop_height 512 \
--repair \
--port 6678 \
--densify_from_iter 500 \
--densify_until_iter 12000 \
--diffusion_until 30000 \
--start_diffusion_iter 5000 \
--diffusion_every 4000 \
--opacity_reset_interval 15000 \
--unconditional_guidance_scale 2.2 \
--start_dist_iter 3000 \
--camera_path_file [trajectory file]
- Option 2: Use the path parameters in the script (--path_scale, --rotation_angle, --position_z_offset, --distance) to determine the shape of the sampling path; set them for your own scene. An illustrative sketch of these parameters follows the command below.
python train.py \
--data_dir [data_dir] \
-m [output_dir] \
--test_iterations 7_000 \
--diffusion_ckpt ./diffusion_ckpt/epoch=59-step=34000.ckpt \
--diffusion_config ./generation_infer.yaml \
--num_frames 16 \
--outpaint_type rotation \
--add_indices 7 15 \
--depth_loss \
--iterations 26000 \
--diffusion_resize_width 960 \
--diffusion_resize_height 512 \
--diffusion_crop_width 960 \
--diffusion_crop_height 512 \
--port 6691 \
--densify_from_iter 500 \
--densify_until_iter 12000 \
--diffusion_until 30000 \
--start_diffusion_iter 5000 \
--diffusion_every 4000 \
--opacity_reset_interval 15000 \
--repair \
--path_scale 1.0 \
--rotation_angle 90 \
--position_z_offset 0.5 \
--distance -0.2 \
--unconditional_guidance_scale 2.2 \
--start_dist_iter 3000
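For intuition only, the sketch below shows one plausible reading of these parameters as a simple orbital camera path. This is our own illustration, not the sampling code in train.py, and the radius and axis conventions are assumptions; consult the script for the exact behavior.
# Illustration only: one plausible interpretation of the path parameters as an
# orbital camera path. NOT the implementation in train.py.
import numpy as np

def sample_orbit(num_frames=16, path_scale=1.0, rotation_angle=90.0,
                 position_z_offset=0.5, distance=-0.2, radius=1.0):
    angles = np.deg2rad(np.linspace(0.0, rotation_angle, num_frames))
    r = (radius + distance) * path_scale          # push cameras toward/away from the center
    return np.stack([r * np.cos(angles),          # one XYZ camera center per frame
                     r * np.sin(angles),
                     np.full(num_frames, position_z_offset)], axis=1)

print(sample_orbit()[:3])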
If you find this work useful in your research, please consider citing:
@inproceedings{Wu2025GenFusion,
author = {Sibo Wu and Congrong Xu and Binbin Huang and Andreas Geiger and Anpei Chen},
title = {GenFusion: Closing the Loop between Reconstruction and Generation via Videos},
booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2025}
}
This project is built upon 2DGS and DynamiCrafter. The dataset processing is based on gsplat. We thank all the authors for their great repos.
Special thanks to LiangrunDa for his help on the project website and part of the code.