Building makemore Part 3: Activations & Gradients, BatchNorm
<iframe id="ytplayer" type="text/html" width="640" height="360" src="https://www.youtube-nocookie.com/embed/P6sfmUTpUmc" frameborder="0" allowfullscreen="" referrerpolicy="strict-origin-when-cross-origin"></iframe><br><span style="white-space: pre-wrap;">We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, backward pass gradients, and some of the pitfalls when they are...</span>