📘 Understanding Overfitting in Neural Networks and Techniques to Prevent It
Using Fashion-MNIST Experiments
Overfitting is a fundamental challenge when developing neural networks. A model that performs extremely well on the training dataset may fail to generalize to unseen data, leading to poor real-world performance. This post presents a structured investigation of overfitting using the Fashion-MNIST dataset and evaluates several mitigation strategies, including Dropout, L2 Regularisation, and Early Stopping.
All experiments, code, and plots in this post are taken directly from the accompanying notebook.
📂 Dataset Overview: Fashion-MNIST
The Fashion-MNIST dataset contains:
- 60,000 training images
- 10,000 test images
- 28×28 grayscale format
- 10 output classes
A significantly smaller subset of the training data is intentionally used to make overfitting behaviour more visible.
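For reference, the subsetting step looks roughly like the sketch below. The exact subset size is an assumption (the notebook may use a different value); the variable names x_train_small and y_train_small match those used in the experiments that follow.

import tensorflow as tf

# Load Fashion-MNIST, scale pixels to [0, 1], and add a channel dimension
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = (x_train / 255.0)[..., None]
x_test = (x_test / 255.0)[..., None]

# Keep only a small slice of the training set so overfitting shows up quickly
subset_size = 5000  # assumed value; not specified in the post
x_train_small = x_train[:subset_size]
y_train_small = y_train[:subset_size]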
🧠 Model Architecture Used Throughout
All experiments share the same CNN architecture, with optional L2 regularisation and Dropout:
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def create_cnn_model(l2_lambda=0.0, dropout_rate=0.0):
    model = keras.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu',
                      kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu',
                     kernel_regularizer=regularizers.l2(l2_lambda)),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation='softmax')
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
📊 Plotting Function
All performance diagrams were generated using the following utility:
import matplotlib.pyplot as plt

def plot_history(history, title_prefix=""):
    hist = history.history
    plt.figure(figsize=(12, 5))

    # Loss curves
    plt.subplot(1, 2, 1)
    plt.plot(hist["loss"], label="Train Loss")
    plt.plot(hist["val_loss"], label="Val Loss")
    plt.title(f"{title_prefix} Loss")
    plt.legend()

    # Accuracy curves
    plt.subplot(1, 2, 2)
    plt.plot(hist["accuracy"], label="Train Accuracy")
    plt.plot(hist["val_accuracy"], label="Val Accuracy")
    plt.title(f"{title_prefix} Accuracy")
    plt.legend()

    plt.tight_layout()
    plt.show()
🔍 1. Baseline Model (No Regularisation)
baseline_model = create_cnn_model(l2_lambda=0.0, dropout_rate=0.0)
history_baseline = baseline_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_baseline, title_prefix="Baseline (no regularisation)")
Baseline Performance Plot
Observations
- Training accuracy continues to increase steadily.
- Validation accuracy peaks early and then declines.
- Training loss decreases, while validation loss increases.
➡ This is clear evidence of overfitting.
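The gap can also be read off numerically from the recorded history; a minimal check using the history_baseline object from above:

hist = history_baseline.history
gap = hist["accuracy"][-1] - hist["val_accuracy"][-1]
print("Final train accuracy:     ", round(hist["accuracy"][-1], 3))
print("Final validation accuracy:", round(hist["val_accuracy"][-1], 3))
print("Generalisation gap:       ", round(gap, 3))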
🛠 2. Dropout (0.5 Rate)
dropout_model = create_cnn_model(dropout_rate=0.5)
history_dropout = dropout_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_dropout, title_prefix="Dropout (0.5)")
Dropout Plot
Observations
- Training accuracy increases more slowly (expected due to Dropout).
- Validation accuracy tracks the training curve more closely.
- Divergence between training and validation loss is significantly reduced.
➡ Dropout is highly effective in this experiment, producing noticeably improved generalisation.
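It is worth remembering that Keras applies Dropout only during training; at inference time the layer is bypassed, so predictions remain deterministic. A quick sanity check, illustrative only:

import numpy as np

sample = x_train_small[:1]
out_train = dropout_model(sample, training=True)   # Dropout active
out_infer = dropout_model(sample, training=False)  # Dropout bypassed
print("Outputs differ between modes:",
      not np.allclose(out_train.numpy(), out_infer.numpy()))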
🧱 3. L2 Regularisation (λ = 0.001)
l2_model = create_cnn_model(l2_lambda=0.001)
history_l2 = l2_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20
)
plot_history(history_l2, title_prefix="L2 Regularisation")
L2 Plot
Observations
- Training loss is noticeably higher due to weight penalisation.
- Validation loss trends are more stable compared to the baseline.
- Validation accuracy improves moderately.
➡ L2 regularisation produces smoother learning dynamics and alleviates overfitting, though its impact is milder than Dropout in this setup.
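For context, each regularised layer contributes a penalty of λ · Σw² to the training loss, which is why the training loss sits higher. Keras keeps these penalty terms in model.losses once the model has been built, so they can be inspected directly (a small sketch, assuming l2_model has already been fitted):

l2_terms = l2_model.losses  # one penalty tensor per regularised layer
print("Number of L2 penalty terms:", len(l2_terms))
print("Total L2 penalty added to the loss:", sum(float(t) for t in l2_terms))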
⏳ 4. Early Stopping
earlystop_model = create_cnn_model()
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
history_early = earlystop_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_early, title_prefix="Early Stopping")
Early Stopping Plot
Observations
- Training terminates after validation loss stops improving.
- Avoids the late-epoch overfitting seen in the baseline.
- Produces one of the cleanest validation curves among all models.
➡ Early stopping is a simple and effective generalisation technique.
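Because restore_best_weights=True, the model keeps the weights from the epoch with the lowest validation loss. The callback and the history object also record where training stopped:

print("Epochs actually run:", len(history_early.history["val_loss"]))
print("Epoch at which training stopped:", early_stop.stopped_epoch)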
📦 (Optional) TensorFlow Lite Conversion
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantisation, so the printed size reflects a quantised model
tflite_model = converter.convert()
print("Quantised model size (bytes):", len(tflite_model))
This step demonstrates model size reduction for deployment purposes, although it is not a regularisation strategy.
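To actually deploy the result, the converted bytes would typically be written to a .tflite file (the file name below is illustrative):

# Save the converted model for on-device use
with open("fashion_mnist_baseline.tflite", "wb") as f:
    f.write(tflite_model)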
🧾 Conclusion
The experimental results highlight the following:
- The baseline model exhibits clear overfitting.
- Dropout provides the largest improvement in validation behaviour.
- L2 regularisation helps stabilise training dynamics.
- Early Stopping prevents late-epoch divergence and improves generalisation.
Combining Dropout + Early Stopping produces the most robust performance on the reduced Fashion-MNIST dataset.
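A minimal sketch of that combined setup, reusing the hyperparameters from the individual experiments above:

combined_model = create_cnn_model(dropout_rate=0.5)
early_stop = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)
history_combined = combined_model.fit(
    x_train_small, y_train_small,
    validation_split=0.2,
    epochs=20,
    callbacks=[early_stop]
)
plot_history(history_combined, title_prefix="Dropout + Early Stopping")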