Replies: 1 comment
-
If you are referring to the following code:

```python
import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None, **kwargs):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(normalized_shape, device=device, dtype=dtype))
        self.eps = eps
```
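For context, the quoted snippet shows only the constructor. A complete RMSNorm module conventionally also defines a forward pass implementing the formula from the original paper (scale the input by the reciprocal of its root mean square, then by the learned gain). The sketch below illustrates that convention; the class name is made up, and the float32 upcast is a common but not universal choice, so this is not chatglm3's actual forward code:

```python
import torch

class ReferenceRMSNorm(torch.nn.Module):
    # Hypothetical reference implementation for illustration,
    # not chatglm3's actual module.
    def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None):
        super().__init__()
        # The original paper and llama start the gain at ones.
        self.weight = torch.nn.Parameter(torch.ones(normalized_shape, device=device, dtype=dtype))
        self.eps = eps

    def forward(self, x):
        # Normalize by the root mean square over the last dimension,
        # computing in float32 for numerical stability, then cast back.
        input_dtype = x.dtype
        x = x.to(torch.float32)
        variance = x.pow(2).mean(-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(input_dtype)
```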
-
I noticed that chatglm3 uses torch.empty to initialize the weight of the trainable parameter in RMSNorm instead of torch.ones. That is different from the implementation in the original paper and in llama.
Is there any special reason to use a tensor with uninitialized data to initialize the weight?
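To make the difference concrete, here is a small sketch (the variable names and the size are illustrative, and the commented-out checkpoint path is hypothetical). With torch.ones the layer starts out as a pure RMS rescaling, as in the paper and llama; with torch.empty the parameter holds arbitrary values until something, such as loading pretrained weights, overwrites it:

```python
import torch

hidden_size = 8  # hypothetical size, for illustration only

# Paper / llama style: the gain starts at 1, so at initialization the
# layer performs pure RMS rescaling with no learned distortion.
w_ones = torch.nn.Parameter(torch.ones(hidden_size))
print(w_ones.data)   # tensor([1., 1., 1., 1., 1., 1., 1., 1.])

# chatglm3 style: memory is allocated but never filled, so the values
# are arbitrary and differ from run to run.
w_empty = torch.nn.Parameter(torch.empty(hidden_size))
print(w_empty.data)  # arbitrary uninitialized values

# If the parameter is always overwritten by a pretrained checkpoint,
# the initial values never take effect:
# module.load_state_dict(torch.load("chatglm3_checkpoint.pt"))  # hypothetical path
```

Note that if the weights are always populated from a checkpoint via load_state_dict, the choice of initializer has no effect at inference time; it only matters when the module is trained from scratch.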