This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

[BUG] Unreasonable memory consumption #53

Open
ZhiYuanZeng opened this issue Feb 19, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@ZhiYuanZeng

🐛 Describe the bug

Creating a TransformerEncoder from titans causes a GPU memory overflow, but the same model configuration works fine with the Hugging Face transformers module.

# config.py
from colossalai.amp import AMP_TYPE

# torch-native automatic mixed precision
fp16=dict(
    mode=AMP_TYPE.TORCH
)
NUM_MICRO_BATCHES=8
# 2D tensor parallelism across 4 GPUs
parallel = dict(
    tensor=dict(size=4, mode='2d')
)

# launch command: python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 xxx.py
# Memory overflow on Nvidia 2080 Ti
import colossalai
from titans.layer.block import TransformerEncoderLayer, TransformerEncoder

colossalai.launch_from_torch(config='/home/zyzeng/fastnlp/examples/config.py')
# BERT-base-sized encoder: 12 layers, hidden size 768, 12 attention heads
backbone=TransformerEncoder(
    TransformerEncoderLayer(hidden_size=768, nhead=12, dim_feedforward=768*4),
    num_layers=12
)
# No memory overflow on Nvidia 2080 Ti
from transformers import BertModel, AutoConfig
config=AutoConfig.from_pretrained('bert-base-uncased')
model=BertModel(config)
model.cuda()

Environment

No response

ZhiYuanZeng added the bug label on Feb 19, 2023
@ZhiYuanZeng
Author

I also have a question: must the tensor parallel module be created after colossalai.launch?
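
For context, the ordering being asked about would look like the sketch below. This is an illustration only, not a confirmed requirement; it reuses the layer sizes from the reproduction above, and the config path is a placeholder.

# Sketch of the order in question: initialize the distributed/parallel
# context first, then construct the tensor-parallel layers.
# (Whether this ordering is strictly required is exactly the question.)
import colossalai
from titans.layer.block import TransformerEncoderLayer, TransformerEncoder

colossalai.launch_from_torch(config='config.py')   # placeholder config path
encoder = TransformerEncoder(                       # tensor-parallel layers built afterwards
    TransformerEncoderLayer(hidden_size=768, nhead=12, dim_feedforward=768*4),
    num_layers=12
)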
