This guide provides simple snippets for training diffnext models.

To streamline the training workflow, we preprocess images and videos into VAE latents ahead of time.
First, install the preprocessing dependencies:

```bash
pip install protobuf==3.20.3 codewithgpu decord
```
The following snippet can be used to cache image latents:
```python
import os, codewithgpu, torch, PIL.Image, numpy as np
from diffnext.models.autoencoders.autoencoder_kl import AutoencoderKL

device, dtype = torch.device("cuda"), torch.float16
vae = AutoencoderKL.from_pretrained("/path/to/nova-d48w1024-sdxl1024/vae")
vae = vae.to(device=device, dtype=dtype).eval()

# Each record stores the VAE posterior moments plus a long caption and a short text.
features = {"moments": "bytes", "caption": "string", "text": "string", "shape": ["int64"]}
os.makedirs("./img_dataset", exist_ok=True)
writer = codewithgpu.RecordWriter("./img_dataset", features)

# (H, W, C) uint8 image -> (1, C, H, W) tensor.
img = PIL.Image.open("./assets/sample_image.jpg")
x = torch.as_tensor(np.array(img)[None, ...].transpose(0, 3, 1, 2)).to(device).to(dtype)
with torch.no_grad():
    # Normalize pixels to [-1, 1] and keep the posterior parameters (mean, logvar).
    x = vae.encode(x.sub(127.5).div(127.5)).latent_dist.parameters.cpu().numpy()[0]

example = {"caption": "long caption", "text": "short text"}
writer.write({"shape": x.shape, "moments": x.tobytes(), **example})
writer.close()
```
The following snippet can be used to cache video latents:
```python
import os, codewithgpu, torch, decord, numpy as np
from diffnext.models.autoencoders.autoencoder_kl_opensora import AutoencoderKLOpenSora

device, dtype = torch.device("cuda"), torch.float16
vae = AutoencoderKLOpenSora.from_pretrained("/path/to/nova-d48w1024-osp480/vae")
vae = vae.to(device=device, dtype=dtype).eval()

# Video records additionally store a motion-flow score.
features = {"moments": "bytes", "caption": "string", "text": "string", "shape": ["int64"], "flow": "float64"}
os.makedirs("./vid_dataset", exist_ok=True)
writer = codewithgpu.RecordWriter("./vid_dataset", features)

# Resize the short side to 480, center-crop to 480x768, sample 33 frames at stride 2.
resize, crop_size, frame_ids = 480, (480, 768), list(range(0, 65, 2))
vid = decord.VideoReader("./assets/sample_video.mp4")
h, w = vid[0].shape[:2]
scale = float(resize) / float(min(h, w))
size = int(h * scale + 0.5), int(w * scale + 0.5)
y, x = (size[0] - crop_size[0]) // 2, (size[1] - crop_size[1]) // 2
vid = decord.VideoReader("./assets/sample_video.mp4", height=size[0], width=size[1])
vid = vid.get_batch(frame_ids).asnumpy()
vid = vid[:, y : y + crop_size[0], x : x + crop_size[1]]

# (T, H, W, C) frames -> (1, C, T, H, W) tensor in [-1, 1].
x = torch.as_tensor(vid[None, ...].transpose((0, 4, 1, 2, 3))).to(device).to(dtype)
with torch.no_grad():
    x = vae.encode(x.sub(127.5).div(127.5)).latent_dist.parameters.cpu().numpy()[0]

example = {"caption": "long caption", "text": "short text", "flow": 5.0}
writer.write({"shape": x.shape, "moments": x.tobytes(), **example})
writer.close()
```
The following snippet provides simple T2I training arguments:
```python
from diffnext.config import cfg

cfg.PIPELINE.TYPE = "nova_train_t2i"
cfg.MODEL.WEIGHTS = "/path/to/nova-d48w1024-sdxl1024"
cfg.TRAIN.DATASET = "./img_dataset"
cfg.SOLVER.BASE_LR, cfg.SOLVER.MAX_STEPS = 1e-4, 100
with open("./nova_d48w1024_1024px.yml", "w") as f:
    f.write(str(cfg))
```
Launch T2I training with the generated config:

```bash
python scripts/train.py --cfg ./nova_d48w1024_1024px.yml
```
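The config also carries `cfg.NUM_GPUS` (referenced again in the multi-node requirements below). As a sketch, you could set it alongside the other fields before writing the YAML; the value here is illustrative:

```python
cfg.NUM_GPUS = 8  # illustrative; hostfile slots must sum to this value (see below)
```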
The following snippet provides simple T2V training arguments:
```python
from diffnext.config import cfg

cfg.PIPELINE.TYPE = "nova_train_t2v"
cfg.MODEL.WEIGHTS = "/path/to/nova-d48w1024-osp480"
cfg.TRAIN.DATASET = "./vid_dataset"
cfg.SOLVER.BASE_LR, cfg.SOLVER.MAX_STEPS = 1e-4, 100
with open("./nova_d48w1024_480px.yml", "w") as f:
    f.write(str(cfg))
```
Launch T2V training with the generated config:

```bash
python scripts/train.py --cfg ./nova_d48w1024_480px.yml
```
Training can also be launched with DeepSpeed by passing a DeepSpeed JSON config:

```bash
python scripts/train.py --cfg ./nova_d48w1024_1024px.yml --deepspeed ./configs/deepspeed/zero2_bf16.json
```
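The referenced `./configs/deepspeed/zero2_bf16.json` ships with the repo; as a rough, non-authoritative sketch, a minimal ZeRO-2 bf16 DeepSpeed config has this shape (the batch size is illustrative):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "bf16": {"enabled": true},
  "zero_optimization": {"stage": 2}
}
```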
The training script can also launch a multi-node job using a hostfile. Argument usage:

```bash
python scripts/train.py --host /path/to/my_hostfile
```
Requirements:

- The total number of slots accumulated in the hostfile should equal `cfg.NUM_GPUS`.
- The launcher machine must be able to SSH to all host machines with passwordless login.
See the DeepSpeed documentation for hostfile details.
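A minimal hostfile sketch in the DeepSpeed format, assuming two 8-GPU machines with hypothetical hostnames:

```
# <hostname> slots=<GPUs on that machine>; slots must sum to cfg.NUM_GPUS
worker-1 slots=8
worker-2 slots=8
```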