This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

[BUG] Unreasonable memory consumption #53

Open
ZhiYuanZeng opened this issue Feb 19, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@ZhiYuanZeng

🐛 Describe the bug

Creating a TransformerEncoder from titans causes a GPU memory overflow, but the same model configuration works fine with the Hugging Face transformers module.

# config.py
from colossalai.amp import AMP_TYPE

# torch-native automatic mixed precision
fp16=dict(
    mode=AMP_TYPE.TORCH
)
NUM_MICRO_BATCHES=8
# 2D tensor parallelism across 4 GPUs
parallel = dict(
    tensor=dict(size=4, mode='2d')
)

# launch command: python -m torch.distributed.launch --nproc_per_node 4 --master_port 12345 xxx.py
# Memory overflow on Nvidia 2080 Ti
import colossalai
from titans.layer.block import TransformerEncoderLayer, TransformerEncoder

colossalai.launch_from_torch(config='/home/zyzeng/fastnlp/examples/config.py')
# BERT-base-sized encoder: 12 layers, hidden size 768, 12 attention heads
backbone=TransformerEncoder(
    TransformerEncoderLayer(hidden_size=768, nhead=12, dim_feedforward=768*4),
    num_layers=12
)
# No memory overflow on Nvidia 2080 Ti
from transformers import BertModel, AutoConfig
config=AutoConfig.from_pretrained('bert-base-uncased')
model=BertModel(config)
model.cuda()

Environment

No response

ZhiYuanZeng added the bug label on Feb 19, 2023
@ZhiYuanZeng
Author

I also have a question: must the tensor parallel module be created after colossalai.launch?
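
For context, the ordering being asked about would look like the sketch below. This is an illustration only, not a confirmed requirement; it reuses the layer sizes from the reproduction above, and the config path is a placeholder.

# Sketch of the order in question: initialize the distributed/parallel
# context first, then construct the tensor-parallel layers.
# (Whether this ordering is strictly required is exactly the question.)
import colossalai
from titans.layer.block import TransformerEncoderLayer, TransformerEncoder

colossalai.launch_from_torch(config='config.py')   # placeholder config path
encoder = TransformerEncoder(                       # tensor-parallel layers built afterwards
    TransformerEncoderLayer(hidden_size=768, nhead=12, dim_feedforward=768*4),
    num_layers=12
)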
