
SD 3.5L Support and LyCORIS #3

Open
xavier-airilab opened this issue Nov 10, 2024 · 8 comments
@xavier-airilab

Hi, I am very interested in your research and would love to test it with the SD3.5L model.

Currently, I’m fine-tuning a large LyCORIS LoKr model on that base model and am exploring control solutions. Your approach looks incredibly promising as a unified model handling multiple traits, and I’d also be interested in the possibility of adapting my trained LoKr model into a LoRAdapter if feasible. I have an architecture-focused dataset with specialized conditioning that I believe could yield strong results in architectural scenarios.

My expertise so far is mostly with community tools, but I am eager to learn and actively contribute. Bghira recommended your repository as an ideal solution for my needs, so here I am.

Thank you for your attention, and for all the innovations you and your team have made open-source over the years. I look forward to hearing from you.

@kliyer-ai
Collaborator

Hi Xavier,
Great to hear your interest! I think it's feasible and you should definitely try. I'm currently busy with the upcoming CVPR deadline but if you have more concrete questions, I can answer them after the deadline.

@xavier-airilab
Author

Hi Kliyer,

Thank you for your reply. I've been reviewing the code in the repository further and have a few questions:

  1. Are the structure and style models two independent models? I initially thought they were part of a single model capable of performing both tasks. However, independent models work fine for my purposes as well.

  2. Regarding the difference between structure and style, is the core distinction based on the layer where the LoRA is trained? For example:
    Structure: adaption_mode: only_self/only_res_conv
    Style: adaption_mode: only_cross
    In the case of SD3.5L, is the previous layer system still relevant? I'm unclear on how to adapt this logic to DiT.

  3. Lastly, I’m not very familiar with Hydra. Is it possible to use the same system I currently use for training LyCORIS (ST) and directly add image conditions into the training code? (This may be naive or wishful thinking.) I'm wondering if there are specific processes in training LoRAdapter that make it significantly more complex than a standard LoRA with conditioning input.

I look forward to your insights. Thank you for your time, and best of luck with CVPR!

@bghira

bghira commented Nov 15, 2024

(author of ST / simpletuner)

I'm also unfamiliar with Hydra, which is the main thing keeping me from implementing LoRAdapter there. The problem isn't that it's impossible to figure out; it's that, without familiarity with Hydra, it will take a fair amount of doc digging and reverse engineering to understand how the config-driven training really works here.

@stefan-baumann
Member

@bghira That problem is actually relatively easy to solve. You can just start a training run (e.g., with python train.py experiment=train_struct_sd15 or python train.py experiment=train_style_sd15), and the automatically created output directory will then contain a rendered config (e.g., at LoRAdapter/outputs/train/runs/2024-11-16/08-05-48/.hydra/config.yaml) with all the information. For train_struct_sd15, this config looks like this:

data:
  _target_: src.data.local.ImageDataModule
  directories:
  - data
  transform:
  - _target_: torchvision.transforms.Resize
    size: 512
  - _target_: torchvision.transforms.CenterCrop
    size: 512
  - _target_: torchvision.transforms.ToTensor
  - _target_: torchvision.transforms.Normalize
    mean:
    - 0.5
    - 0.5
    - 0.5
    std:
    - 0.5
    - 0.5
    - 0.5
  batch_size: 8
  caption_from_name: true
  caption_prefix: 'a picture of '
model:
  _target_: src.model.SD15
  pipeline_type: diffusers.StableDiffusionPipeline
  model_name: runwayml/stable-diffusion-v1-5
  local_files_only: ${local_files_only}
size: 512
max_train_steps: null
epochs: 10
learning_rate: 0.0001
lr_warmup_steps: 0
lr_scheduler: constant
prompt: null
gradient_accumulation_steps: 1
ckpt_steps: 3000
val_steps: 3000
val_images: 4
seed: 42
n_samples: 4
tag: ''
local_files_only: false
lora:
  struct:
    mapper_network:
      _target_: src.mapper_network.FixedStructureMapper15
      c_dim: ${..config.c_dim}
    encoder:
      _target_: src.annotators.midas.DepthEstimator
      model: Intel/dpt-hybrid-midas
      size: ${size}
      local_files_only: ${local_files_only}
    cfg: false
    config:
      c_dim: 128
      rank: 128
      adaption_mode: only_res_conv
      lora_cls: NewStructLoRAConv
    optimize: true
log_c: true
val_batches: 4

This one should be easily interpretable: it's a set of global config variables that are interpreted by the trainer file, plus direct Python class instantiations (every block with a _target_ field just instantiates that class with the args present in that block). Variable values with dollar signs and braces are interpolations: they access the value specified in the braces from somewhere else in the config. So ${local_files_only} just uses the value specified at the top level, while ${..config.c_dim} is relative: it goes up one level (from mapper_network to struct) and accesses config.c_dim there.
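
If it helps, here's a minimal standalone sketch (not from the repo) of those two mechanisms, using Hydra's hydra.utils.instantiate and OmegaConf interpolation:

from hydra.utils import instantiate
from omegaconf import OmegaConf

# A `_target_` block: instantiate() resolves the dotted path and calls the
# class with the remaining keys as keyword arguments.
cfg = OmegaConf.create({"transform": {"_target_": "torchvision.transforms.Resize", "size": 512}})
resize = instantiate(cfg.transform)  # same as torchvision.transforms.Resize(size=512)

# Interpolations: ${size} is absolute (from the config root), while
# ${..c_dim} is relative to the node that contains it (here: one level up).
cfg2 = OmegaConf.create({"size": 512, "lora": {"c_dim": 128, "mapper": {"c_dim": "${..c_dim}", "size": "${size}"}}})
assert cfg2.lora.mapper.c_dim == 128
assert cfg2.lora.mapper.size == 512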

I hope that at least helps with the question of how config files relate to what happens during training.
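
Since it's Hydra, you can also override any of these values straight from the command line using dot notation (standard Hydra behavior; the values here are just illustrative):

python train.py experiment=train_struct_sd15 learning_rate=5e-05 lora.struct.config.rank=64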

@xavier-airilab
Author

xavier-airilab commented Nov 18, 2024

So the main thing is:
Structure: adaption_mode: only_self/only_res_conv
Style: adaption_mode: only_cross

Or is there some other special configuration? Is there something special about NewStructLoRAConv?

@kliyer-ai
Collaborator


Hi @xavier-airilab,
regarding your questions:

  1. They are two separate LoRAs (LoRAdapters) that you add to the same base model, e.g. SD 1.5.
  2. Yes, it is based on that. Stable Diffusion 1.* and 2.* use a hybrid architecture of interleaved ResNet blocks (convolutional layers) and attention layers (linear layers). For any kind of spatial conditioning, e.g. depth, I found that adapting the convolutional layers works well (only_res_conv), and adapting the self-attention layers (only_self) can also work well. So this depends on the model architecture at hand, and for SD3.5L you will have to adapt it (see the config sketch after this list).
  3. Already answered by @stefan-baumann
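
For reference, a minimal side-by-side sketch of how the two variants differ in config (the struct values match the rendered config above; the style values are my assumption based on point 2, and in practice each adapter is trained with its own config):

lora:
  struct:                          # spatial conditioning, e.g. depth
    config:
      adaption_mode: only_res_conv # adapt the ResNet conv layers
  style:                           # global conditioning, e.g. a style image
    config:
      adaption_mode: only_cross    # adapt the cross-attention layers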

@kliyer-ai
Collaborator

Concretely, the logic for adapting the LoRA to the model architecture is handled here: https://github.com/CompVis/LoRAdapter/blob/main/src/model.py#L139

This loops over all keys of the model's state dict and checks what the current layer is based on the key. Here you would have to add your logic for new models, e.g.

if adaption_mode == "my_adaption_mode" and "layer.you.want.to.adapt" in path:
    _continue = False
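
To make the surrounding control flow concrete, here is a rough, simplified sketch of the loop that check sits in (the layer-name patterns are illustrative, not the exact strings used in model.py):

# Simplified sketch: walk the state dict keys, keep only the layers that the
# chosen adaption_mode targets, and attach a LoRA to each matched layer.
for path in model.state_dict().keys():
    _continue = True  # skip this layer unless a rule below matches
    if adaption_mode == "only_res_conv" and "resnets" in path and "conv" in path:
        _continue = False
    if adaption_mode == "my_adaption_mode" and "layer.you.want.to.adapt" in path:
        _continue = False
    if _continue:
        continue
    # ... attach the structure/style LoRA to the layer identified by `path` ...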

@xavier-airilab
Author

Thank you for your answers. I think you've given me enough to get started; there's already a lot to check and try after your replies 😉
