A TypeError in modeling_utils.caching_allocator_warmup function #37074

Closed
ZeroMakesAll opened this issue Mar 28, 2025 · 4 comments · Fixed by #37144

@ZeroMakesAll

System Info

  • transformers version: 4.50.2
  • Platform: Linux-5.15.0-1040-nvidia-x86_64-with-glibc2.35
  • Python version: 3.12.9
  • Huggingface_hub version: 0.29.3
  • Safetensors version: 0.5.3
  • Accelerate version: 1.5.2
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H800

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. Save bool values in the model parameters
  2. Load the model with device_map="auto" (a minimal sketch follows below)
  3. An error occurs in modeling_utils.caching_allocator_warmup (line 5854), because each bool value is counted as 1/8 byte, which makes byte_count a float
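
A minimal reproduction sketch (the checkpoint name is hypothetical; any checkpoint whose state dict contains torch.bool tensors triggers the same path):

from transformers import AutoModel

# hypothetical checkpoint that stores torch.bool parameters/buffers
model = AutoModel.from_pretrained(
    "some-org/model-with-bool-buffers",  # illustrative name only
    device_map="auto",                   # routes through caching_allocator_warmup
)
# -> TypeError inside modeling_utils.caching_allocator_warmup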

Expected behavior

Before allocating GPU memory, byte_count should be type-checked (or rounded to an integer).
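
One possible guard is sketched below; byte_count and device stand for the variables inside caching_allocator_warmup, and this is only an illustration of the expected behavior, not necessarily what the eventual fix does:

import math

# byte_count can be fractional when bool params are counted at 1/8 byte each;
# rounding up to whole bytes before the warmup allocation avoids the TypeError
byte_count = int(math.ceil(byte_count))
_ = torch.empty(byte_count // 2, dtype=torch.float16, device=device, requires_grad=False)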

@Rocketknight1
Member

Rocketknight1 commented Mar 28, 2025

Hi @ZeroMakesAll, I'm not sure I understand. torch.bool actually uses a full byte (8 bits) per entry, so entries are byte-aligned.

>>> x = torch.ones((32768, 32768), dtype=torch.bool, device="cuda")
>>> x.untyped_storage().nbytes()
1073741824  # 32768 * 32768, 1 byte per entry

@ZeroMakesAll
Author

@Rocketknight1 Thanks, the information you provided was very helpful. However, I found that transformers defines the size of bool here (modeling_utils.dtype_byte_size):

if dtype == torch.bool:
    return 1 / 8

It seems that Hugging Face uses this function to estimate the memory allocation. It returns a float, which causes the TypeError at modeling_utils line 5854:

_ = torch.empty(byte_count // 2, dtype=torch.float16, device=device, requires_grad=False)

Here, byte_count can't be a float, since torch.empty() only accepts integer sizes.
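
For illustration (the element count is arbitrary):

>>> import torch
>>> from transformers.modeling_utils import dtype_byte_size
>>> dtype_byte_size(torch.bool)
0.125
>>> byte_count = 1000 * dtype_byte_size(torch.bool)
>>> byte_count  # a float, not an int
125.0
>>> torch.empty(byte_count // 2, dtype=torch.float16)  # raises TypeError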

@Rocketknight1
Member

Hi @ZeroMakesAll, thanks for that! This is definitely a bug in dtype_byte_size. I'll make a PR to fix it.

@Rocketknight1
Member

Fix open at #37144
