2026-02-08

Motivation

Image classification needs decisions. In medical imaging (e.g., chest X-rays), we often want a binary decision:

  • Disease vs No disease
  • Abnormal vs Normal
  • Positive vs Negative

Deep models (CNNs) are powerful, but classical statistics still matters:

  • Interpretability
  • Uncertainty quantification
  • Simple baselines using extracted features

Concept

Rather than feeding raw pixels, we can extract numerical information from an image, such as:

  • Mean and variance of intensity
  • Edge density (texture proxy)
  • Shape and area measurements (derived from segmentation)
  • Embedding values from a pretrained CNN (feature vector)

Then we model the probability of class membership using statistics.
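
A minimal sketch of how such features might be computed, assuming the image is already loaded as a numeric grayscale matrix img with values in [0, 1] (the matrix name and the edge proxy are illustrative assumptions, not part of the assignment):

# Assumes `img` is a numeric grayscale matrix scaled to [0, 1].
mean_intensity <- mean(img)                      # overall brightness
var_intensity  <- var(as.vector(img))            # spread of intensities
# Crude edge-density proxy: mean absolute difference between neighboring pixels.
dx <- abs(img[, -1] - img[, -ncol(img)])         # horizontal gradients
dy <- abs(img[-1, ] - img[-nrow(img), ])         # vertical gradients
edge_density <- mean(c(dx, dy))
features <- c(mean = mean_intensity, variance = var_intensity, edges = edge_density)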

Features

To keep this assignment self-contained, we simulate a small dataset that behaves similarly to extracted features:

  • x1: mean intensity (normalized)
  • x2: texture score
  • x3: edge density

The class label \(y \in \{0,1\}\) indicates abnormality, with \(y = 1\) meaning “abnormal”.

Features (Code only)

library(tibble)

n <- 400
x1 <- rnorm(n, mean = 0, sd = 1)                 # mean intensity (normalized)
x2 <- rnorm(n, mean = 0, sd = 1)                 # texture score
x3 <- 0.6*x1 + rnorm(n, mean = 0, sd = 0.8)      # edge density, correlated with x1
linpred <- -0.2 + 1.2*x1 + 0.9*x2 - 0.7*x3       # true linear predictor
p <- 1/(1 + exp(-linpred))                       # true probability of abnormality
y <- rbinom(n, size = 1, prob = p)               # simulated binary labels
df <- tibble(
  x1 = x1, x2 = x2, x3 = x3,
  y = factor(y, levels = c(0, 1), labels = c("Normal", "Abnormal")))
head(df)

Features (Preview)

      x1      x2      x3  y
   0.202   1.311   0.653  Abnormal
   0.810  -0.969   0.568  Normal
  -0.066  -0.106   0.088  Abnormal
   0.754  -0.788   0.065  Normal
  -1.627  -0.307  -1.181  Normal
   1.002  -1.177   0.319  Abnormal

Logistic regression: the statistical model

We model the probability an image is abnormal: \[ p(x)=P(Y=1 \mid X=x)=\frac{1}{1+\exp\left(-(\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3)\right)} \]

Equivalent log-odds form: \[ \log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \]

Each \(\beta_j\) is the change in the log-odds of abnormality for a one-unit increase in feature \(x_j\), holding the other features constant.
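
As a quick worked example using the true coefficients from the simulation above (\(-0.2, 1.2, 0.9, -0.7\)): at \(x_1 = 1, x_2 = 0, x_3 = 0\) the log-odds is \(-0.2 + 1.2 = 1.0\), which the inverse logit maps to a probability of about 0.73, and a one-unit increase in \(x_1\) multiplies the odds by \(\exp(1.2) \approx 3.32\):

log_odds <- -0.2 + 1.2*1 + 0.9*0 - 0.7*0   # linear predictor at (1, 0, 0)
plogis(log_odds)                           # inverse logit: ~0.731
exp(1.2)                                   # odds ratio per unit of x1: ~3.32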

Training objective: log-loss

Logistic regression chooses the parameters that minimize the negative log-likelihood (the log-loss): \[ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ - \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] \right\} \] where \[ p_i = P(Y_i = 1 \mid x_i). \]
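
A minimal sketch of evaluating this log-loss by hand, assuming the numeric vectors y and p from the simulation chunk above are still in scope (glm() minimizes the same quantity via iteratively reweighted least squares):

# Negative log-likelihood at the true simulation probabilities.
log_loss <- -sum(y * log(p) + (1 - y) * log(1 - p))
log_loss
# For a fitted model, the residual deviance equals twice the log-loss evaluated
# at the fitted probabilities, i.e. -2 * logLik(mod) once the model below is fit.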

Feature Separation by Class

(Interactive plot of per-class feature distributions omitted.)

Model fitting

We then fit a generalized linear model with a binomial family (i.e., logistic regression).

mod <- glm(y ~ x1 + x2 + x3, data = df, family = binomial())
summary(mod)
## 
## Call:
## glm(formula = y ~ x1 + x2 + x3, family = binomial(), data = df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.09727    0.11841  -0.822    0.411    
## x1           1.35341    0.18702   7.237 4.59e-13 ***
## x2           1.01119    0.14407   7.019 2.24e-12 ***
## x3          -0.66041    0.16103  -4.101 4.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 554.36  on 399  degrees of freedom
## Residual deviance: 426.16  on 396  degrees of freedom
## AIC: 434.16
## 
## Number of Fisher Scoring iterations: 4

Key outputs:

  • coefficient signs (direction of each effect)
  • standard errors (uncertainty)
  • p-values (evidence that a feature matters)
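
A common follow-up (a sketch, not part of the original output) is to report odds ratios with 95% confidence intervals by exponentiating the coefficients and their profile-likelihood intervals:

# Odds ratios and 95% confidence intervals for the fitted model.
exp(cbind(OddsRatio = coef(mod), confint(mod)))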

Predicted Probability vs One Feature

(Plot of predicted probability against a single feature omitted.)
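
A sketch of how such a plot could be produced, assuming ggplot2 is available: add the model's fitted probabilities to the data and plot them against one feature (x1 here):

library(ggplot2)

df$p_hat <- predict(mod, type = "response")      # fitted P(Abnormal) per image

ggplot(df, aes(x = x1, y = p_hat, color = y)) +
  geom_point(alpha = 0.6) +
  labs(x = "x1 (mean intensity)", y = "Predicted P(Abnormal)", color = "Class")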

Feature Space & Predicted Probability

(Interactive plot of predicted probability over the feature space omitted.)
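
One way to approximate that view (a sketch, not the original interactive figure, assuming ggplot2 is loaded as in the previous sketch) is to predict on a grid over x1 and x2 with x3 held at its mean and map the probability surface:

# Decision surface over x1 and x2, with x3 fixed at its mean.
grid <- expand.grid(
  x1 = seq(min(df$x1), max(df$x1), length.out = 100),
  x2 = seq(min(df$x2), max(df$x2), length.out = 100),
  x3 = mean(df$x3))
grid$p_hat <- predict(mod, newdata = grid, type = "response")

ggplot(grid, aes(x = x1, y = x2, fill = p_hat)) +
  geom_raster() +
  labs(fill = "P(Abnormal)")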

Takeaways

  • Once features are extracted from images, classification often becomes a tabular problem.
  • Logistic regression is a solid baseline:
    • interpretable coefficients
    • uncertainty via standard errors and p-values
    • quick and simple to deploy
  • Interactive plots help assess class separability in feature space.