You prepare for an exam.
You take a mock test. You score 72 marks.
Now the real question is not:
“Did I pass or fail?”
The real question is:
“How good is 72?”
Is it:
- Better than before?
- Good enough?
- Just lucky?
That’s exactly what model evaluation is about.
Why We Need Evaluation
A model can always give predictions.
But predictions alone mean nothing.
We must ask:
- Can I trust this model?
- Will it work on new data?
- Is it learning patterns or memorizing data?
Evaluation answers these questions.
R-squared (R²): The Most Popular Metric
Imagine this.
You’re trying to predict house prices.
Before using ML, your best guess is:
“All houses cost around ₹50 lakh.”
That’s your baseline.
Now your model predicts different prices for different houses.
R² asks:
“How much better is your model compared to this dumb guess?”
R² in simple words
R² tells you how much of the variation in the data your model explains.
- R² = 0.80 → model explains 80% of the variation
- R² = 0.20 → model explains very little
- R² = 1 → perfect fit (rare, and usually suspicious)
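Here’s a minimal sketch of this idea in Python, using scikit-learn’s r2_score. The house prices (in ₹ lakh) are made-up numbers, purely for illustration:

```python
# A minimal sketch of R² with scikit-learn (made-up prices, in ₹ lakh).
# R² = 1 - SS_res / SS_tot: how much better than the "predict the mean" baseline.
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([45, 52, 60, 48, 55])                      # actual house prices

baseline = np.full_like(actual, actual.mean(), dtype=float)  # "all houses cost the mean"
model_preds = np.array([46, 50, 58, 49, 54])                 # a model's predictions

print(r2_score(actual, baseline))     # 0.0 -> no better than the dumb guess
print(r2_score(actual, model_preds))  # ~0.92 -> explains most of the variation
```

Notice the baseline scores exactly 0: R² measures improvement over always guessing the mean.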
Important Truth About R²
A high R² does not always mean a good model.
Why?
- The model can overfit
- It can memorize the training data
- It can fail on new data
That’s why we never trust R² alone.
Residuals: Listening to the Model’s Mistakes
Residual = actual value − predicted value.
Think of residuals as the model’s complaints.
If residuals look:
- Random → model is healthy
- Patterned → model is missing something
Residual plots help us see:
“Is the model behaving logically?”
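To make this concrete, here’s a small sketch that fits a straight line to made-up data with numpy and draws a residual plot with matplotlib:

```python
# A small residual-plot sketch: made-up data, straight-line fit via numpy.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3 * x + 5 + rng.normal(0, 2, size=x.size)  # a roughly linear relationship

slope, intercept = np.polyfit(x, y, deg=1)     # fit a straight line
predicted = slope * x + intercept
residuals = y - predicted                      # residual = actual - predicted

plt.scatter(predicted, residuals)
plt.axhline(0, color="red")
plt.xlabel("Predicted value")
plt.ylabel("Residual (actual - predicted)")
plt.title("Random scatter around 0 = healthy; curves or funnels = missing pattern")
plt.show()
```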
Standard Error (SE): How Confident Is the Model?
Imagine two friends predicting house prices.
Friend A:
- Usually wrong by ₹5,000
Friend B:
- Usually wrong by ₹50,000
Who do you trust more?
Standard Error tells you:
“On average, how far are the predictions from the truth?”
Lower SE = more reliable model.
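One common way to sketch this “typical miss” is the root-mean-square of the prediction errors (exact definitions of standard error vary by context, so treat this as an illustration). The two friends’ predictions below are made-up numbers:

```python
# A minimal sketch of the idea: "on average, how far are predictions from the truth?"
# Computed here as the root-mean-square of the errors (made-up data, in ₹).
import numpy as np

actual = np.array([500_000, 520_000, 480_000, 510_000])

friend_a = np.array([495_000, 515_000, 485_000, 505_000])  # usually off by ~5,000
friend_b = np.array([450_000, 570_000, 430_000, 560_000])  # usually off by ~50,000

def typical_error(actual, predicted):
    residuals = actual - predicted
    return np.sqrt(np.mean(residuals ** 2))  # a "typical" miss

print(typical_error(actual, friend_a))  # 5,000  -> more reliable
print(typical_error(actual, friend_b))  # 50,000 -> less reliable
```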
Train vs Test Performance (Very Important)
If:
- Training accuracy is very high
- Testing accuracy is low
That means:
The model memorized instead of learning.
This is how we detect overfitting: the model knows the training data too well but can’t handle anything new.
Like a student who learns answers by heart but fails when the questions change.
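Here’s a small sketch of how this check looks in practice, using scikit-learn’s train_test_split and a deliberately flexible model (an unrestricted decision tree) on made-up data:

```python
# A sketch of detecting overfitting: compare train vs test scores.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + 5 + rng.normal(0, 3, size=200)   # linear pattern + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeRegressor()   # no depth limit -> free to memorize the noise
model.fit(X_train, y_train)

print("Train R²:", model.score(X_train, y_train))  # ~1.0 (memorized)
print("Test  R²:", model.score(X_test, y_test))    # noticeably lower -> overfitting
```

The gap between the two scores is the warning sign, not either number on its own.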
Tiny Real-Life Thought 🧠
If someone always scores high in practice tests but fails in the real exam —
you know something is wrong.
Same with ML models.
3-Line Takeaway
- Evaluation tells you whether a model is trustworthy
- R² shows explained variation
- SE shows prediction reliability
What’s Coming Next 👀
Now the big question:
Why do some models fail even when metrics look good?
That leads us to:
👉 Day 6 — Why Linear Regression Breaks (Assumptions & Multicollinearity)
I love breaking down complex topics into simple, easy-to-understand explanations so everyone can follow along. If you’re into learning AI in a beginner-friendly way, make sure to follow for more!
Connect on LinkedIn: https://www.linkedin.com/in/chanchalsingh22/
Connect on YouTube: https://www.youtube.com/@Brains_Behind_Bots