Replies: 1 comment
-
If you are referring to the following code:

```python
import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None, **kwargs):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(normalized_shape, device=device, dtype=dtype))
        self.eps = eps
```
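For context, the quoted snippet shows only the constructor. A complete RMSNorm module conventionally also defines a forward pass implementing the formula from the original paper (scale the input by the reciprocal of its root mean square, then by the learned gain). The sketch below illustrates that convention; the class name is made up, and the float32 upcast is a common but not universal choice, so this is not chatglm3's actual forward code:

```python
import torch

class ReferenceRMSNorm(torch.nn.Module):
    # Hypothetical reference implementation for illustration,
    # not chatglm3's actual module.
    def __init__(self, normalized_shape, eps=1e-5, device=None, dtype=None):
        super().__init__()
        # The original paper and llama start the gain at ones.
        self.weight = torch.nn.Parameter(torch.ones(normalized_shape, device=device, dtype=dtype))
        self.eps = eps

    def forward(self, x):
        # Normalize by the root mean square over the last dimension,
        # computing in float32 for numerical stability, then cast back.
        input_dtype = x.dtype
        x = x.to(torch.float32)
        variance = x.pow(2).mean(-1, keepdim=True)
        x = x * torch.rsqrt(variance + self.eps)
        return self.weight * x.to(input_dtype)
```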
-
I noticed that chatglm3 uses torch.empty to initialize the weight of the trainable parameter in RMSNorm instead of torch.ones. That is different from the implementation in the original paper and in llama.
Is there any special reason to use a tensor with uninitialized data to initialize the weight?
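To make the difference concrete, here is a small sketch (the variable names and the size are illustrative, and the commented-out checkpoint path is hypothetical). With torch.ones the layer starts out as a pure RMS rescaling, as in the paper and llama; with torch.empty the parameter holds arbitrary values until something, such as loading pretrained weights, overwrites it:

```python
import torch

hidden_size = 8  # hypothetical size, for illustration only

# Paper / llama style: the gain starts at 1, so at initialization the
# layer performs pure RMS rescaling with no learned distortion.
w_ones = torch.nn.Parameter(torch.ones(hidden_size))
print(w_ones.data)   # tensor([1., 1., 1., 1., 1., 1., 1., 1.])

# chatglm3 style: memory is allocated but never filled, so the values
# are arbitrary and differ from run to run.
w_empty = torch.nn.Parameter(torch.empty(hidden_size))
print(w_empty.data)  # arbitrary uninitialized values

# If the parameter is always overwritten by a pretrained checkpoint,
# the initial values never take effect:
# module.load_state_dict(torch.load("chatglm3_checkpoint.pt"))  # hypothetical path
```

Note that if the weights are always populated from a checkpoint via load_state_dict, the choice of initializer has no effect at inference time; it only matters when the module is trained from scratch.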