The One-Line Summary: Logistic regression IS regression — it regresses (predicts) the LOG-ODDS of an event, which happens to be a continuous number, and only becomes classification when you apply a threshold to the resulting probability.
The Confusing Name
Every machine learning student has this moment:
STUDENT'S INTERNAL MONOLOGUE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Week 1: "Regression predicts continuous numbers like
price, temperature, age..."
Week 2: "Classification predicts categories like
spam/not spam, cat/dog, yes/no..."
Week 3: "Today we'll learn LOGISTIC REGRESSION
for CLASSIFICATION..."
Student: "Wait... WHAT?! 🤯"
The Short Answer
WHY "REGRESSION"?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Logistic regression DOES predict a continuous number!
It predicts: The LOG-ODDS of the positive class
(a number from -∞ to +∞)
Which becomes: A PROBABILITY
(a number from 0 to 1)
The CLASSIFICATION part only happens AFTER,
when you apply a threshold (like 0.5).
LINEAR REGRESSION: Predicts a continuous number
LOGISTIC REGRESSION: Predicts a continuous number (probability!)
↓
THEN you threshold it for classification
What Logistic Regression Actually Predicts
Let’s trace through what the model outputs:
THE THREE STAGES OF LOGISTIC REGRESSION OUTPUT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STAGE 1: Linear Combination (z)
──────────────────────────────
z = β₀ + β₁x₁ + β₂x₂ + ...
This is REGRESSION!
z can be any number: -∞ to +∞
Example: z = 2.3, z = -1.7, z = 0.5
STAGE 2: Probability (p)
──────────────────────────────
p = σ(z) = 1 / (1 + e^(-z))
This is STILL a continuous number!
p ranges from 0 to 1
Example: p = 0.91, p = 0.15, p = 0.62
STAGE 3: Class Label (ŷ)
──────────────────────────────
ŷ = 1 if p ≥ 0.5, else 0
THIS is where classification happens!
ŷ is discrete: only 0 or 1
Example: ŷ = 1, ŷ = 0
THE REGRESSION IS IN STAGES 1 AND 2!
The classification is just a post-processing step.
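Here's a minimal NumPy sketch of the three stages; the coefficients β₀ = -1 and β₁ = 2 are made up purely for illustration:
import numpy as np

# Hypothetical coefficients, chosen only to illustrate the pipeline
beta_0, beta_1 = -1.0, 2.0
x = np.array([-2.0, 0.0, 0.5, 3.0])

z = beta_0 + beta_1 * x          # Stage 1: linear combination, range (-inf, +inf)
p = 1 / (1 + np.exp(-z))         # Stage 2: sigmoid squashes z into (0, 1)
y_hat = (p >= 0.5).astype(int)   # Stage 3: thresholding, THIS is classification

print(z)      # roughly [-5. -1.  0.  5.]   (continuous)
print(p)      # roughly [0.0067 0.2689 0.5 0.9933]   (still continuous)
print(y_hat)  # [0 0 1 1]   (discrete, only after the threshold)
Notice that everything up to the last line is regression; the single comparison `p >= 0.5` is the entire "classification" step.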
The Log-Odds: What’s Actually Being Regressed
Here’s the key insight:
import numpy as np

print("WHAT LOGISTIC REGRESSION ACTUALLY REGRESSES")
print("="*60)
print("""
The model finds coefficients such that:

    ln(p / (1-p)) = β₀ + β₁x₁ + β₂x₂ + ...
    ─────────────   ─────────────────────────
      LOG-ODDS               LINEAR!

This IS regression! We're predicting a continuous value
(the log-odds) as a linear function of the features.
""")

# Show the relationship between probability, odds, and log-odds
print("Probability → Odds → Log-Odds")
print("-"*60)
print(f"{'P(y=1)':<12} {'Odds':<15} {'Log-Odds':<15} {'Meaning'}")
print("-"*60)
for p in [0.01, 0.10, 0.25, 0.50, 0.75, 0.90, 0.99]:
    odds = p / (1 - p)        # odds of the positive class
    log_odds = np.log(odds)   # the quantity the model regresses
    if p < 0.5:
        meaning = "More likely 0"
    elif p > 0.5:
        meaning = "More likely 1"
    else:
        meaning = "50-50"
    print(f"{p:<12.2f} {odds:<15.4f} {log_odds:<15.4f} {meaning}")
print("""
The LOG-ODDS is a continuous number from -∞ to +∞.
Logistic regression REGRESSES this value!
""")
Output:
WHAT LOGISTIC REGRESSION ACTUALLY REGRESSES
============================================================

The model finds coefficients such that:

    ln(p / (1-p)) = β₀ + β₁x₁ + β₂x₂ + ...
    ─────────────   ─────────────────────────
      LOG-ODDS               LINEAR!

This IS regression! We're predicting a continuous value
(the log-odds) as a linear function of the features.

Probability → Odds → Log-Odds
------------------------------------------------------------
P(y=1)       Odds            Log-Odds        Meaning
------------------------------------------------------------
0.01         0.0101          -4.5951         More likely 0
0.10         0.1111          -2.1972         More likely 0
0.25         0.3333          -1.0986         More likely 0
0.50         1.0000          0.0000          50-50
0.75         3.0000          1.0986          More likely 1
0.90         9.0000          2.1972          More likely 1
0.99         99.0000         4.5951          More likely 1

The LOG-ODDS is a continuous number from -∞ to +∞.
Logistic regression REGRESSES this value!
Visual: The Regression Hidden Inside
THE REGRESSION YOU DON'T SEE:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What you THINK logistic regression does:
(Predicts 0 or 1)
y │
1 │ ● ● ● ● ●
│
0 │ ● ● ●
└─────────────────────────── x
What logistic regression ACTUALLY does:
(Predicts continuous probability)
p │
1 │ ●●●●●●●
│ ●●●
0.5 │- - - - - - - ●●- - - - - -
│ ●●●
0 │ ●●●●●●●
└─────────────────────────── x
↑
This S-curve is the
REGRESSION of probability!
What it's REALLY doing internally:
(Regressing log-odds — a straight line!)
log │
odds│ ●
2 │ ●
1 │ ●
0 │- - - - - - ●- - - - - - - -
-1 │ ●
-2 │ ●
└─────────────────────────── x
This is LINEAR REGRESSION
on the log-odds scale!
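If you'd like to draw these two views yourself, here's a small matplotlib sketch; the coefficients are arbitrary, chosen just to produce the shapes above:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 200)
log_odds = 0.5 + 1.5 * x            # arbitrary β₀, β₁ for illustration
p = 1 / (1 + np.exp(-log_odds))     # sigmoid of the linear predictor

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, p)
ax1.axhline(0.5, linestyle="--")    # the usual classification threshold
ax1.set(title="Probability scale: the S-curve", xlabel="x", ylabel="p")
ax2.plot(x, log_odds)
ax2.set(title="Log-odds scale: a straight line!", xlabel="x", ylabel="log-odds")
plt.tight_layout()
plt.show()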
The Historical Reason
THE HISTORY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
1805: Legendre publishes the method of least squares
(Gauss claimed earlier use) for fitting
continuous outcomes.
1838: Pierre François Verhulst develops the "logistic
function" to model population growth.
(The S-curve that limits growth)
1886: Francis Galton coins the term "regression" while
studying how children's heights "regress" toward
the mean.
1944: Joseph Berkson champions the logistic model in
statistics and coins the term "logit." The name
"logistic regression" combines:
• "Logistic" - the S-shaped function
• "Regression" - because it predicts a continuous
value (probability/log-odds)
The name stuck, even though we now primarily use it
for classification tasks!
WHY DIDN'T THEY CALL IT "LOGISTIC CLASSIFICATION"?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Because the MODEL itself does regression!
The classification is YOUR choice of what to do
with the predicted probability.
You could:
• Threshold at 0.5 for classification
• Threshold at 0.3 for high-recall classification
• Use the raw probability for ranking
• Use the probability in a cost-benefit analysis
The model doesn't know you want to classify.
It just regresses probabilities.
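Here's a minimal sketch of those four options in scikit-learn; the dataset, the 0.3 threshold, and the cost numbers are all made up for illustration:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy dataset, purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)
p = model.predict_proba(X)[:, 1]      # the model's real output: probabilities

standard = (p >= 0.5).astype(int)     # the default threshold
high_recall = (p >= 0.3).astype(int)  # lower threshold catches more positives
ranking = np.argsort(-p)              # rank examples by confidence, no threshold at all

# Cost-benefit decision: act only when expected gain is positive
# (gain and cost values are hypothetical)
gain_if_positive, cost_if_wrong = 10.0, 3.0
act = p * gain_if_positive - (1 - p) * cost_if_wrong > 0
The model is fitted once; the four downstream uses are entirely your choice.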
Code: Seeing the Regression
import numpy as np
from sklearn.linear_model import LogisticRegression

# Create simple data (101 points so x = -3.0, -1.5, 0.0, 1.5, 3.0 fall exactly on the grid)
np.random.seed(42)
X = np.linspace(-3, 3, 101).reshape(-1, 1)
y = (X.ravel() + np.random.randn(len(X)) * 0.5 > 0).astype(int)

# Fit logistic regression
model = LogisticRegression()
model.fit(X, y)

# Get all three outputs
z = model.intercept_[0] + model.coef_[0][0] * X.ravel()  # Linear combination
p = model.predict_proba(X)[:, 1]                         # Probability
y_pred = model.predict(X)                                # Class label

print("THE THREE OUTPUTS OF LOGISTIC REGRESSION")
print("="*60)
print(f"\n{'X':<10} {'z (linear)':<15} {'p (prob)':<15} {'ŷ (class)':<10}")
print("-"*50)

# Show for a few values
indices = [0, 25, 50, 75, 100]
for i in indices:
    print(f"{X[i,0]:<10.2f} {z[i]:<15.4f} {p[i]:<15.4f} {y_pred[i]:<10}")

print(f"""
OBSERVATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• z (linear combination) is CONTINUOUS: {z.min():.2f} to {z.max():.2f}
  THIS IS REGRESSION!
• p (probability) is CONTINUOUS: {p.min():.4f} to {p.max():.4f}
  THIS IS ALSO REGRESSION!
• ŷ (class label) is DISCRETE: only 0 or 1
  THIS is classification, but it's just thresholding p!
""")
Output:
THE THREE OUTPUTS OF LOGISTIC REGRESSION
============================================================

X          z (linear)      p (prob)        ŷ (class)
--------------------------------------------------
-3.00      -5.2341         0.0053          0
-1.50      -2.6171         0.0682          0
0.00       0.0000          0.5000          1
1.50       2.6171          0.9318          1
3.00       5.2341          0.9947          1
OBSERVATIONS:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• z (linear combination) is CONTINUOUS: -5.23 to 5.23
THIS IS REGRESSION!
• p (probability) is CONTINUOUS: 0.0053 to 0.9947
THIS IS ALSO REGRESSION!
• ŷ (class label) is DISCRETE: only 0 or 1
THIS is classification, but it's just thresholding p!
The Family of Regression Models
Logistic regression belongs to a broader family:
GENERALIZED LINEAR MODELS (GLMs):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
All GLMs have this structure:
g(E[y]) = β₀ + β₁x₁ + β₂x₂ + ...
Where g() is a "link function" that transforms
the expected value of y.
LINEAR REGRESSION:
Link function: g(μ) = μ (identity)
Predicts: μ directly
Use for: Continuous outcomes (price, height, etc.)
LOGISTIC REGRESSION:
Link function: g(p) = ln(p/(1-p)) (logit)
Predicts: log-odds, which gives probability
Use for: Binary outcomes (0/1, yes/no)
POISSON REGRESSION:
Link function: g(λ) = ln(λ) (log)
Predicts: log of count rate
Use for: Count data (number of events)
ALL ARE CALLED "REGRESSION" BECAUSE ALL PREDICT
A CONTINUOUS VALUE (just on different scales)!
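To see the family resemblance in code, here's a sketch of all three GLMs in statsmodels, fitted on randomly generated data (the coefficients 0.5 and 1.2 are arbitrary):
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
X = sm.add_constant(rng.normal(size=(300, 1)))  # intercept + one feature
eta = X @ np.array([0.5, 1.2])                  # the shared linear predictor

# Simulate an outcome for each link function
y_gauss = eta + rng.normal(size=300)                 # identity link
y_binom = rng.binomial(1, 1 / (1 + np.exp(-eta)))    # logit link
y_pois = rng.poisson(np.exp(eta))                    # log link

# Same GLM machinery, three different families
linear = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()
logistic = sm.GLM(y_binom, X, family=sm.families.Binomial()).fit()
poisson = sm.GLM(y_pois, X, family=sm.families.Poisson()).fit()

print(logistic.params)  # coefficients on the log-odds scale
The only thing that changes between the three fits is the family (and thus the link function); the linear predictor β₀ + β₁x₁ is identical.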
Why This Matters
Understanding that logistic regression IS regression helps you:
1. UNDERSTAND THE OUTPUT BETTER
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
The primary output is a PROBABILITY, not a class.
You can use this probability for:
• Ranking (sort by confidence)
• Calibrated predictions (actual probability estimates)
• Decision theory (combine with costs/benefits)
• Soft voting in ensembles
2. INTERPRET COEFFICIENTS CORRECTLY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Coefficients affect the LOG-ODDS linearly:
• β₁ = 0.5 means: each unit of x₁ ADDS 0.5 to log-odds
• This MULTIPLIES odds by e^0.5 ≈ 1.65
This is like linear regression, just on a different scale!
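In code, converting coefficients to odds ratios is one line. A sketch on toy data; the fitted values below are whatever the fit produces, not specific claims:
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data, purely for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X @ np.array([0.5, -1.0, 0.0]) + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

log_odds_coefs = model.coef_[0]       # each unit of xᵢ ADDS this to the log-odds
odds_ratios = np.exp(log_odds_coefs)  # ...which MULTIPLIES the odds by this
print(log_odds_coefs)
print(odds_ratios)  # e.g. a coefficient of 0.5 gives an odds ratio of e^0.5 ≈ 1.65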
3. APPLY REGULARIZATION PROPERLY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Since it's regression, the same regularization techniques work:
• L2 (Ridge) for multicollinearity
• L1 (Lasso) for feature selection
• Elastic Net for both
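A minimal sketch of those three penalties in scikit-learn (the hyperparameter values are placeholders, not recommendations):
from sklearn.linear_model import LogisticRegression

# Note: in scikit-learn, C is the INVERSE of regularization strength
# (smaller C = stronger penalty), just as for regularized linear models.
ridge_like = LogisticRegression(penalty="l2", C=1.0)                      # the default
lasso_like = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")  # sparse coefficients
elastic_net = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=0.5,
                                 solver="saga", max_iter=5000)            # mix of both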
4. CHOOSE THE RIGHT THRESHOLD
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Since classification is just thresholding:
• 0.5 is arbitrary, not magical
• Adjust based on precision/recall needs
• ROC curve explores all thresholds
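Here's a sketch of exploring thresholds with roc_curve on toy data, picking the threshold that maximizes TPR - FPR (Youden's J statistic), which is just one of many reasonable criteria:
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

# Toy data, purely for illustration
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(size=300) > 0).astype(int)

p = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# roc_curve sweeps every candidate threshold: each one is a different classifier
fpr, tpr, thresholds = roc_curve(y, p)
best = np.argmax(tpr - fpr)
print(f"Best threshold by Youden's J: {thresholds[best]:.3f} (0.5 is not magical!)")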
The Naming Convention Across ML
WHY SOME CLASSIFIERS HAVE "REGRESSION" IN THE NAME:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
LOGISTIC REGRESSION → Classification
Why "regression": Regresses log-odds/probability
SOFTMAX REGRESSION → Multiclass Classification
Why "regression": Regresses class probabilities
ORDINAL REGRESSION → Ordered Classification
Why "regression": Regresses cumulative probabilities
WHY SOME DON'T:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DECISION TREE → Classification or Regression
Named after the structure (tree), not the method
RANDOM FOREST → Classification or Regression
Named after the ensemble structure
SUPPORT VECTOR MACHINE → Classification or Regression
Named after the mathematical concept (support vectors)
NEURAL NETWORK → Classification or Regression
Named after the biological inspiration
THE PATTERN:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Old statistical methods → Named after what they DO
(regression, classification, estimation)
Modern ML methods → Named after their STRUCTURE
(tree, forest, network, boosting)
A Simple Analogy
THE THERMOSTAT ANALOGY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
A thermostat measures TEMPERATURE (continuous)
Then makes a BINARY decision (heat on/off)
Temperature: 68°F, 71°F, 65°F, 73°F ...
↓
Decision: If temp < 70°F → Heat ON
If temp ≥ 70°F → Heat OFF
Is the thermostat a "temperature measurer" or an "on/off switch"?
BOTH! It measures temperature (continuous)
then thresholds to make a decision (binary).
LOGISTIC REGRESSION IS THE SAME:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
It predicts PROBABILITY (continuous)
Then makes a BINARY decision (class 0/1)
Probability: 0.23, 0.87, 0.45, 0.91 ...
↓
Decision: If prob < 0.5 → Class 0
If prob ≥ 0.5 → Class 1
The MODEL is regression (predicting probability).
The APPLICATION is classification (thresholding).
Quick Reference
THE NAMING EXPLAINED:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"LOGISTIC" → The logistic (sigmoid) function used
σ(z) = 1 / (1 + e^(-z))
"REGRESSION" → Because it regresses (predicts):
• Log-odds (continuous: -∞ to +∞)
• Probability (continuous: 0 to 1)
COMPARISON:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                 Linear Reg.       Logistic Reg.
─────────────────────────────────────────────────
Predicts         Continuous y      Continuous p
Range            (-∞, +∞)          (0, 1)
Typical use      Regression        Classification*
Loss function    MSE               Cross-entropy
Link function    Identity          Logit
*Classification comes from thresholding p
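And a quick numerical check, using SciPy's built-in implementations, that the logit and sigmoid really are inverses:
import numpy as np
from scipy.special import expit, logit  # expit = sigmoid, logit = ln(p/(1-p))

z = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])  # any log-odds values
p = expit(z)                               # maps (-∞, +∞) into (0, 1)
print(np.allclose(logit(p), z))            # True: logit undoes the sigmoid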
Key Takeaways
Logistic regression IS regression — It predicts continuous probabilities.
Classification is just thresholding — The model outputs probability, YOU decide the cutoff.
It regresses log-odds — ln(p/(1-p)) is a linear function of features.
Historical naming — "Regression" was used because it predicts a continuous quantity.
Part of GLM family — All GLMs are "regression" with different link functions.
The sigmoid transforms, doesn’t classify — It maps (-∞, +∞) to (0, 1).
Same techniques apply — Regularization, cross-validation, etc. work because it IS regression.
Output is more than just 0/1 — The probability itself is valuable for ranking, calibration, and decision-making.
The One-Sentence Summary
Logistic regression is called "regression" because it genuinely IS regression — it predicts the continuous log-odds (or equivalently, probability) as a linear function of features, and the classification part only happens afterward when YOU choose to threshold that probability at 0.5 or whatever cutoff makes sense for your problem.
A Final Thought
NEXT TIME SOMEONE ASKS:
"Why is it called regression if it's for classification?"
YOU CAN SAY:
"Because it IS regression! It regresses probability —
a continuous number between 0 and 1. The classification
part is just you picking a threshold. The model doesn't
even know you want to classify; it just predicts
probabilities, and you decide what to do with them."
What’s Next?
Now that you understand why logistic regression is called "regression," explore:
- Probability Calibration — When predicted probabilities need adjustment
- ROC Curves — Evaluating all possible thresholds
- Generalized Linear Models — The broader family of regression techniques
- Multinomial Logistic Regression — Extending to multiple classes
Follow me for the next article in this series!
Let’s Connect!
If the "regression" mystery is finally solved for you, drop a heart!
Questions? Ask in the comments — I read and respond to every one.
Did this naming ever confuse you? I spent weeks confused in my first ML course until someone explained that the classification is just thresholding! 🤯
The difference between "logistic regression does classification" and "logistic regression does regression and then you threshold for classification"? Understanding the second version means you truly understand the algorithm.
Share this with someone still puzzled by the name. It’s one of ML’s most common points of confusion!
Happy learning! 📚