
SD 3.5L Support and LyCORIS #3

Open
xavier-airilab opened this issue Nov 10, 2024 · 8 comments
@xavier-airilab

Hi, I am very interested in your research and would love to test it with the SD3.5L model.

Currently, I’m fine-tuning a large LyCORIS LoKr model on that base model and am exploring control solutions. Your approach looks incredibly promising as a unified model handling multiple traits, and I’d also be interested in the possibility of adapting my trained LoKr model into a LoRAdapter if feasible. I have an architecture-focused dataset with specialized conditioning that I believe could yield strong results in architectural scenarios.

My expertise so far is mostly with community tools, but I am eager to learn and actively contribute. Bghira recommended your repository as an ideal solution for my needs, so here I am.

Thank you for your attention, and for all the innovations you and your team have made open-source over the years. I look forward to hearing from you.

@kliyer-ai
Collaborator

Hi Xavier,
Great to hear your interest! I think it's feasible and you should definitely try. I'm currently busy with the upcoming CVPR deadline but if you have more concrete questions, I can answer them after the deadline.

@xavier-airilab
Author

Hi Kliyer,

Thank you for your reply. I've been reviewing the code in the repository further and have a few questions:

  1. Are the structure and style models two independent models? I initially thought they were part of a single model capable of performing both tasks. However, independent models work fine for my purposes as well.

  2. Regarding the difference between structure and style, is the core distinction based on the layer where the LoRA is trained? For example:
    Structure: adaption_mode: only_self/only_res_conv
    Style: adaption_mode: only_cross
    In the case of SD3.5L, is the previous layer system still relevant? I'm unclear on how to adapt this logic to DiT.

  3. Lastly, I’m not very familiar with Hydra. Is it possible to use the same system I currently use for training LyCORIS (ST) and directly add image conditions into the training code? (This may be naive or wishful thinking.) I'm wondering if there are specific processes in training LoRAdapter that make it significantly more complex than a standard LoRA with conditioning input.

I look forward to your insights. Thank you for your time, and best of luck with CVPR!

@bghira

bghira commented Nov 15, 2024

(author of ST / simpletuner)

I'm also unfamiliar with Hydra, which is the main thing keeping me from implementing LoRAdapter there. The problem isn't that it's impossible to figure out; it's that, without familiarity with Hydra, it will take a fair amount of doc digging and reverse engineering to understand how the config-driven training really works here.

@stefan-baumann
Member

@bghira That problem is actually relatively easy to solve. You can just start a training run (e.g., with python train.py experiment=train_struct_sd15 or python train.py experiment=train_style_sd15), and the automatically created output directory will then contain a rendered config (e.g., at LoRAdapter/outputs/train/runs/2024-11-16/08-05-48/.hydra/config.yaml) with all the information. For train_struct_sd15, this config looks like this:

data:
  _target_: src.data.local.ImageDataModule
  directories:
  - data
  transform:
  - _target_: torchvision.transforms.Resize
    size: 512
  - _target_: torchvision.transforms.CenterCrop
    size: 512
  - _target_: torchvision.transforms.ToTensor
  - _target_: torchvision.transforms.Normalize
    mean:
    - 0.5
    - 0.5
    - 0.5
    std:
    - 0.5
    - 0.5
    - 0.5
  batch_size: 8
  caption_from_name: true
  caption_prefix: 'a picture of '
model:
  _target_: src.model.SD15
  pipeline_type: diffusers.StableDiffusionPipeline
  model_name: runwayml/stable-diffusion-v1-5
  local_files_only: ${local_files_only}
size: 512
max_train_steps: null
epochs: 10
learning_rate: 0.0001
lr_warmup_steps: 0
lr_scheduler: constant
prompt: null
gradient_accumulation_steps: 1
ckpt_steps: 3000
val_steps: 3000
val_images: 4
seed: 42
n_samples: 4
tag: ''
local_files_only: false
lora:
  struct:
    mapper_network:
      _target_: src.mapper_network.FixedStructureMapper15
      c_dim: ${..config.c_dim}
    encoder:
      _target_: src.annotators.midas.DepthEstimator
      model: Intel/dpt-hybrid-midas
      size: ${size}
      local_files_only: ${local_files_only}
    cfg: false
    config:
      c_dim: 128
      rank: 128
      adaption_mode: only_res_conv
      lora_cls: NewStructLoRAConv
    optimize: true
log_c: true
val_batches: 4

This one should be easily interpretable: it's a set of global config variables that are interpreted by the trainer file, plus direct Python class instantiations (every block with a _target_ field just instantiates that class with the args present in that block). Variable values with dollar signs and braces are interpolations: they access the value specified in the braces from somewhere else in the config. So ${local_files_only} just uses the value specified at the top level, while ${..config.c_dim} is relative: it goes up one level (from mapper_network to struct) and accesses config.c_dim there.
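
If it helps, here's a minimal standalone sketch (not from the repo) of those two mechanisms, using Hydra's hydra.utils.instantiate and OmegaConf interpolation:

from hydra.utils import instantiate
from omegaconf import OmegaConf

# A `_target_` block: instantiate() resolves the dotted path and calls the
# class with the remaining keys as keyword arguments.
cfg = OmegaConf.create({"transform": {"_target_": "torchvision.transforms.Resize", "size": 512}})
resize = instantiate(cfg.transform)  # same as torchvision.transforms.Resize(size=512)

# Interpolations: ${size} is absolute (from the config root), while
# ${..c_dim} is relative to the node that contains it (here: one level up).
cfg2 = OmegaConf.create({"size": 512, "lora": {"c_dim": 128, "mapper": {"c_dim": "${..c_dim}", "size": "${size}"}}})
assert cfg2.lora.mapper.c_dim == 128
assert cfg2.lora.mapper.size == 512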

I hope that at least helps with the question of how config files relate to what happens during training.
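
Since it's Hydra, you can also override any of these values straight from the command line using dot notation (standard Hydra behavior; the values here are just illustrative):

python train.py experiment=train_struct_sd15 learning_rate=5e-05 lora.struct.config.rank=64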

@xavier-airilab
Author

xavier-airilab commented Nov 18, 2024

So the main thing is:
Structure: adaption_mode: only_self/only_res_conv
Style: adaption_mode: only_cross

Or is there some other special configuration? Is there something special about NewStructLoRAConv?

@kliyer-ai
Collaborator


Hi @xavier-airilab,
regarding your questions:

  1. They are two separate LoRAs (LoRAdapters) that you add to the same base model, e.g. SD 1.5.
  2. Yes, it is based on that. Stable Diffusion 1.* and 2.* use a hybrid architecture of interleaved ResNet blocks (convolutional layers) and attention layers (linear layers). For any kind of spatial conditioning, e.g. depth, I found that adapting the convolutional layers works well (only_res_conv), and adapting the self-attention layers (only_self) can also work well. So this depends on the model architecture at hand, and for SD3.5L you will have to adapt it (see the config sketch after this list).
  3. Already answered by @stefan-baumann
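
For reference, a minimal side-by-side sketch of how the two variants differ in config (the struct values match the rendered config above; the style values are my assumption based on point 2, and in practice each adapter is trained with its own config):

lora:
  struct:                          # spatial conditioning, e.g. depth
    config:
      adaption_mode: only_res_conv # adapt the ResNet conv layers
  style:                           # global conditioning, e.g. a style image
    config:
      adaption_mode: only_cross    # adapt the cross-attention layers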

@kliyer-ai
Collaborator

Concretely, the logic for adapting the LoRA to the model architecture is handled here: https://github.com/CompVis/LoRAdapter/blob/main/src/model.py#L139

This loops over all keys of the model's state dict and checks what the current layer is based on the key. Here you would have to add your logic for new models, e.g.

if adaption_mode == "my_adaption_mode" and "layer.you.want.to.adapt" in path:
    _continue = False
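
To make the surrounding control flow concrete, here is a rough, simplified sketch of the loop that check sits in (the layer-name patterns are illustrative, not the exact strings used in model.py):

# Simplified sketch: walk the state dict keys, keep only the layers that the
# chosen adaption_mode targets, and attach a LoRA to each matched layer.
for path in model.state_dict().keys():
    _continue = True  # skip this layer unless a rule below matches
    if adaption_mode == "only_res_conv" and "resnets" in path and "conv" in path:
        _continue = False
    if adaption_mode == "my_adaption_mode" and "layer.you.want.to.adapt" in path:
        _continue = False
    if _continue:
        continue
    # ... attach the structure/style LoRA to the layer identified by `path` ...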

@xavier-airilab
Author

Thank you for your answers. I think you've given me enough to get started; there's already a lot to check and try after your replies 😉
