Why Xavier and He Initialization Decide Whether Your Neural Network Learns or Fails
Your neural network is ready. Layers are stacked. Activation functions chosen. Optimization algorithm selected. You run the first epoch expecting progress. Instead, the loss plateaus or explodes. Gradients vanish. Convergence stalls. The network fails to learn meaningfully.
The culprit? Weight initialization.
It seems paradoxical: before training even begins, the initial values of the weights, just small random numbers, determine whether your network learns efficiently or gets stuck. Yet this fundamental technique is often overlooked. Practitioners default to whatever initialization their framework provides without understanding why.
This is a mistake. Understanding weight initialization, including Xavier, He, and their variants, separates practitioners who build robust models from those who debug mysterious convergence failures. This guide explores the mathematical foundations and practical implementations of these techniques, and when to apply each.
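To make the contrast concrete before the derivations, here is a minimal NumPy sketch of the two schemes (the normal-distribution variants). The function names, the seed, and the 512-to-256 layer shape are illustrative choices for this article, not part of any framework's API.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for reproducibility

def xavier_init(fan_in, fan_out):
    # Xavier/Glorot, normal variant: Var(W) = 2 / (fan_in + fan_out).
    # Designed for symmetric activations such as tanh or sigmoid.
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    # He/Kaiming, normal variant: Var(W) = 2 / fan_in.
    # Compensates for ReLU zeroing out roughly half of its inputs.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# A 512 -> 256 layer under each scheme: He draws from a wider distribution.
print(f"Xavier std: {xavier_init(512, 256).std():.4f}")  # ~ sqrt(2/768) = 0.0510
print(f"He std:     {he_init(512, 256).std():.4f}")      # ~ sqrt(2/512) = 0.0625
```

The only difference is the variance of the draw, and that difference is exactly what the rest of this guide unpacks.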