Part II: Building My First Large Language Model from Scratch

3 min read · Dec 13, 2023

**Layer normalization (LN)** is a technique in deep learning used to stabilize training and improve the performance of neural networks. It addresses the internal covariate shift (ICS) problem, where the distribution of a layer's activations changes during training, making it harder for subsequent layers to learn effectively.

How does it work?

LN normalizes the activations across the feature dimension, independently for each input sample (in a transformer, for each token). The mean and variance are computed over a layer's features for each individual input, and the activations are normalized to zero mean and unit variance. Two learnable parameters, a scale (gamma) and a shift (beta), then let the network restore whatever distribution works best for the next layer, as the sketch below shows.
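Concretely, for an input vector $x \in \mathbb{R}^d$ with $d$ features, LN computes

$$\mu = \frac{1}{d}\sum_{i=1}^{d} x_i, \qquad \sigma^2 = \frac{1}{d}\sum_{i=1}^{d}(x_i - \mu)^2, \qquad y_i = \gamma_i \, \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta_i,$$

where $\epsilon$ is a small constant for numerical stability. Here is a minimal PyTorch sketch of this computation; the module name `LayerNorm` and the parameter names `scale`/`shift` are illustrative choices, not taken from any particular library implementation.

```python
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    """Minimal layer normalization over the last (feature) dimension."""
    def __init__(self, emb_dim, eps=1e-5):
        super().__init__()
        self.eps = eps                                     # small constant for numerical stability
        self.scale = nn.Parameter(torch.ones(emb_dim))     # learnable gamma, starts as identity
        self.shift = nn.Parameter(torch.zeros(emb_dim))    # learnable beta, starts as no shift

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)                     # per-sample mean over features
        var = x.var(dim=-1, keepdim=True, unbiased=False)       # per-sample variance over features
        x_norm = (x - mean) / torch.sqrt(var + self.eps)        # zero mean, unit variance
        return self.scale * x_norm + self.shift                 # learnable scale and shift

# Usage: normalize a batch of 2 samples with 5 features each
ln = LayerNorm(emb_dim=5)
x = torch.randn(2, 5)
out = ln(x)
print(out.mean(dim=-1))                      # ~0 for each sample
print(out.var(dim=-1, unbiased=False))       # ~1 for each sample
```

Because the statistics are computed per sample rather than per batch, LN behaves identically at training and inference time and works with any batch size, which is one reason it is preferred over batch normalization in transformer-based language models.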
