Overfitting of Deep Neural Networks, Bias-Variance trade-off, Dropouts & Regularization
In classical neural networks, people mostly tried 2-3 layer networks.
Photo by Google DeepMind on Unsplash
There were a few hiccups that made them avoid deeper architectures, such as:
- Vanishing gradients, which made deep networks difficult to train.
- Too little data: with too few samples, deep networks tended to overfit.
- Limited computational power.
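To make the vanishing-gradient point concrete, here is a minimal sketch. It assumes a toy chain with one sigmoid unit per layer, where every layer has the same weight and the same pre-activation value; the function names are illustrative and not taken from any library. The sigmoid derivative is at most 0.25, so the gradient shrinks quickly as it passes back through more layers.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_scale(num_layers: int, pre_activation: float = 0.0, weight: float = 1.0) -> float:
    """Factor by which a gradient is multiplied after flowing back through
    `num_layers` sigmoid layers in the toy chain (assumption: identical layers)."""
    return (weight * sigmoid_grad(pre_activation)) ** num_layers


if __name__ == "__main__":
    for depth in (2, 3, 10, 20):
        print(f"{depth:2d} layers -> gradient scaled by {gradient_scale(depth):.3e}")
```

Even in the best case for the sigmoid (pre-activation at 0, weight of 1), each extra layer multiplies the gradient by 0.25, so the earliest layers of a deep stack receive almost no learning signal.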
For example:
In deep neural networks, if we have thousands of weights but not millions of data points, the model may overfit: there are too many parameters and too little data to constrain them. Also, with thousands of weights, each iteration (mini-batch → gradients → weight updates) takes a lot of time and computational power.
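As a rough illustration of "too many parameters, too little data", the sketch below counts the weights and biases of a small fully connected network and compares that count with the number of training samples. The layer sizes and the sample count are hypothetical, and the parameters-per-sample ratio is only a loose heuristic, not a precise test for overfitting.

```python
from typing import Sequence


def count_parameters(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a fully connected network.

    Each layer contributes n_in * n_out weights and n_out biases.
    """
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    layer_sizes = [784, 128, 10]  # hypothetical: input, one hidden layer, output
    num_samples = 1_000           # hypothetical training set size

    params = count_parameters(layer_sizes)
    print(f"parameters: {params}")
    print(f"samples: {num_samples}")
    print(f"parameters per sample: {params / num_samples:.1f}")
```

With far more parameters than samples, the network has enough capacity to memorize the training set, which is the situation the paragraph above describes.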
And that is a single iteration. What about a full epoch, or multiple epochs?
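The cost multiplies quickly once epochs enter the picture. The sketch below is plain arithmetic under assumed numbers (the dataset size, batch size, and epoch count are hypothetical): one epoch is one pass over the data, so it takes as many weight updates as there are mini-batches.

```python
import math


def updates_per_epoch(num_samples: int, batch_size: int) -> int:
    # One update per mini-batch; the last batch may be smaller, hence the ceiling.
    return math.ceil(num_samples / batch_size)


def total_updates(num_samples: int, batch_size: int, num_epochs: int) -> int:
    return updates_per_epoch(num_samples, batch_size) * num_epochs


if __name__ == "__main__":
    num_samples, batch_size, num_epochs = 50_000, 128, 10  # hypothetical values
    print("updates per epoch:", updates_per_epoch(num_samples, batch_size))
    print("total updates:", total_updates(num_samples, batch_size, num_epochs))
```

Every one of those updates repeats the forward pass, gradient computation, and weight update, which is why limited compute was such a barrier.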
Things have changed a lot in the last decade and a half.
- With the help of the internet, we now have a large amount of labeled data (e.g., ImageNet, with over 14 million labeled images).