📘 Understanding Overfitting in Neural Networks and Techniques to Prevent It
Using Fashion-MNIST Experiments
Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real-world performance. This post presents a structured investigation of overfitting using the Fashion-MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 Regularisation, and Early Stopping.
All experiments, code, and plots in this post are taken directly from the accompanying notebook.
📂 Dataset Overview: Fashion-MNIST
The Fashion-MNIST dataset contains:
- 60,000 training images
- 10,000 test images
- 28×28 grayscale format
- 10 output classes
A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
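For reference, the subsetting step looks roughly like the sketch below. The exact subset size is an assumption (the notebook may use a different value); the variable names x_train_small and y_train_small match those used in the experiments that follow.

import tensorflow as tf

# Load Fashion-MNIST, scale pixels to [0, 1], and add a channel dimension
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = (x_train / 255.0)[..., None]
x_test = (x_test / 255.0)[..., None]

# Keep only a small slice of the training set so overfitting shows up quickly
subset_size = 5000  # assumed value; not specified in the post
x_train_small = x_train[:subset_size]
y_train_small = y_train[:subset_size]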
🧠 Model Architecture Used Throughout
All experiments share the same CNN architecture, with optional L2 regularisation and Dropout:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
📊 Plotting Function
All performance diagrams were generated using the following utility:
import matplotlib.pyplot as plt

def plot_history(history, title_prefix=""):
    hist = history.history
    plt.figure(figsize=(12, 5))

    # Loss curves
    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    # Accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()
🔍 1. Baseline Model (No Regularisation)
baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")
Baseline Performance Plot
Observations
- Training accuracy continues to increase steadily.
- Validation accuracy peaks early and then declines.
- Training loss decreases, while validation loss increases.
➡ This is clear evidence of overfitting.
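The gap can also be read off numerically from the recorded history; a minimal check using the history_baseline object from above:

hist = history_baseline.history
gap = hist["accuracy"][-1] - hist["val_accuracy"][-1]
print("Final train accuracy:     ", round(hist["accuracy"][-1], 3))
print("Final validation accuracy:", round(hist["val_accuracy"][-1], 3))
print("Generalisation gap:       ", round(gap, 3))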
🛠 2. Dropout (0.5 Rate)
dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")
Dropout Plot
Observations
- Training accuracy increases more slowly (expected due to Dropout).
- Validation accuracy tracks the training curve more closely.
- Divergence between training and validation loss is significantly reduced.
➡ Dropout is highly effective in this experiment, producing noticeably improved generalisation.
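It is worth remembering that Keras applies Dropout only during training; at inference time the layer is bypassed, so predictions remain deterministic. A quick sanity check, illustrative only:

import numpy as np

sample = x_train_small[:1]
out_train = dropout_model(sample, training=True)   # Dropout active
out_infer = dropout_model(sample, training=False)  # Dropout bypassed
print("Outputs differ between modes:",
      not np.allclose(out_train.numpy(), out_infer.numpy()))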
🧱 3. L2 Regularisation (λ = 0.001)
l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")
L2 Plot
Observations
- Training loss is noticeably higher due to weight penalisation.
- Validation loss trends are more stable compared to the baseline.
- Validation accuracy improves moderately.
➡ L2 regularisation produces smoother learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
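For context, each regularised layer contributes a penalty of λ · Σw² to the training loss, which is why the training loss sits higher. Keras keeps these penalty terms in model.losses once the model has been built, so they can be inspected directly (a small sketch, assuming l2_model has already been fitted):

l2_terms = l2_model.losses  # one penalty tensor per regularised layer
print("Number of L2 penalty terms:", len(l2_terms))
print("Total L2 penalty added to the loss:", sum(float(t) for t in l2_terms))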
⏳ 4. Early Stopping
earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")
Early Stopping Plot
Observations
- Training terminates after validation loss stops improving.
- Avoids the late-epoch overfitting seen in the baseline.
- Produces one of the cleanest validation curves among all models.
➡ Early stopping is a simple and effective generalisation technique.
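Because restore_best_weights=True, the model keeps the weights from the epoch with the lowest validation loss. The callback and the history object also record where training stopped:

print("Epochs actually run:", len(history_early.history["val_loss"]))
print("Epoch at which training stopped:", early_stop.stopped_epoch)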
📦 (Optional) TensorFlow Lite Conversion
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantisation, so the printed size reflects a quantised model
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))
This step demonstrates model size reduction for deployment purposes, although it is not a regularisation strategy.
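To actually deploy the result, the converted bytes would typically be written to a .tflite file (the file name below is illustrative):

# Save the converted model for on-device use
with open("fashion_mnist_baseline.tflite", "wb") as f:
    f.write(tflite_model)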
🧾 Conclusion
The experimental results highlight the following:
- The baseline model exhibits clear overfitting.
- Dropout provides the largest improvement in validation behaviour.
- L2 regularisation helps stabilise training dynamics.
- Early Stopping prevents late-epoch divergence and improves generalisation.
Combining Dropout + Early Stopping produces the most robust performance on the reduced Fashion-MNIST dataset.
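A minimal sketch of that combined setup, reusing the hyperparameters from the individual experiments above:

combined_model = create_cnn_model(dropout_rate=0.5)
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")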