2026-02-08

Motivation

Image classification needs decisions. In medical imaging (e.g., chest X-rays), we often want a binary decision:

  • Disease vs No disease
  • Abnormal vs Normal
  • Positive vs Negative

Deep models (CNNs) are powerful, but classical statistics still matters:

  • Interpretability
  • Uncertainty quantification
  • Simple baselines using extracted features

Concept

Rather than feeding raw pixels, we can extract numerical information from an image, such as:

  • Mean and variance of intensity
  • Edge density (texture proxy)
  • Shape and area measurements (derived from segmentation)
  • Embedding values from a pretrained CNN (feature vector)

Then we model the probability of class membership using statistics.
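
A minimal sketch of how such features might be computed, assuming the image is already loaded as a numeric grayscale matrix img with values in [0, 1] (the matrix name and the edge proxy are illustrative assumptions, not part of the assignment):

# Assumes `img` is a numeric grayscale matrix scaled to [0, 1].
mean_intensity <- mean(img)                      # overall brightness
var_intensity  <- var(as.vector(img))            # spread of intensities
# Crude edge-density proxy: mean absolute difference between neighboring pixels.
dx <- abs(img[, -1] - img[, -ncol(img)])         # horizontal gradients
dy <- abs(img[-1, ] - img[-nrow(img), ])         # vertical gradients
edge_density <- mean(c(dx, dy))
features <- c(mean = mean_intensity, variance = var_intensity, edges = edge_density)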

Features

To keep this assignment self-contained, we simulate a small dataset that behaves similarly to extracted features:

  • x1: mean intensity (normalized)
  • x2: texture score
  • x3: edge density

The class label \(y \in \{0,1\}\) indicates abnormality, with \(y = 1\) meaning “abnormal”.

Features (Code only)

library(tibble)

n <- 400
x1 <- rnorm(n, mean = 0, sd = 1)                 # mean intensity (normalized)
x2 <- rnorm(n, mean = 0, sd = 1)                 # texture score
x3 <- 0.6*x1 + rnorm(n, mean = 0, sd = 0.8)      # edge density, correlated with x1
linpred <- -0.2 + 1.2*x1 + 0.9*x2 - 0.7*x3       # true linear predictor
p <- 1/(1 + exp(-linpred))                       # true probability of abnormality
y <- rbinom(n, size = 1, prob = p)               # simulated binary labels
df <- tibble(
  x1 = x1, x2 = x2, x3 = x3,
  y = factor(y, levels = c(0, 1), labels = c("Normal", "Abnormal")))
head(df)

Features (Preview)

      x1      x2      x3  y
   0.202   1.311   0.653  Abnormal
   0.810  -0.969   0.568  Normal
  -0.066  -0.106   0.088  Abnormal
   0.754  -0.788   0.065  Normal
  -1.627  -0.307  -1.181  Normal
   1.002  -1.177   0.319  Abnormal

Logistic regression: the statistical model

We model the probability an image is abnormal: \[ p(x)=P(Y=1 \mid X=x)=\frac{1}{1+\exp\left(-(\beta_0+\beta_1 x_1+\beta_2 x_2+\beta_3 x_3)\right)} \]

Equivalent log-odds form: \[ \log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 \]

Each \(\beta_j\) is the change in the log-odds of abnormality for a one-unit increase in feature \(x_j\), holding the other features constant.
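
As a quick worked example using the true coefficients from the simulation above (\(-0.2, 1.2, 0.9, -0.7\)): at \(x_1 = 1, x_2 = 0, x_3 = 0\) the log-odds is \(-0.2 + 1.2 = 1.0\), which the inverse logit maps to a probability of about 0.73, and a one-unit increase in \(x_1\) multiplies the odds by \(\exp(1.2) \approx 3.32\):

log_odds <- -0.2 + 1.2*1 + 0.9*0 - 0.7*0   # linear predictor at (1, 0, 0)
plogis(log_odds)                           # inverse logit: ~0.731
exp(1.2)                                   # odds ratio per unit of x1: ~3.32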

Training objective: log-loss

Logistic regression chooses the parameters that minimize the negative log-likelihood (the log-loss): \[ \hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left\{ - \sum_{i=1}^{n} \left[ y_i \log(p_i) + (1 - y_i)\log(1 - p_i) \right] \right\} \] where \[ p_i = P(Y_i = 1 \mid x_i). \]
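
A minimal sketch of evaluating this log-loss by hand, assuming the numeric vectors y and p from the simulation chunk above are still in scope (glm() minimizes the same quantity via iteratively reweighted least squares):

# Negative log-likelihood at the true simulation probabilities.
log_loss <- -sum(y * log(p) + (1 - y) * log(1 - p))
log_loss
# For a fitted model, the residual deviance equals twice the log-loss evaluated
# at the fitted probabilities, i.e. -2 * logLik(mod) once the model below is fit.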

Feature Separation by Class

(Interactive plot of per-class feature distributions omitted.)

Model fitting

We then fit a generalized linear model with a binomial family (i.e., logistic regression).

mod <- glm(y ~ x1 + x2 + x3, data = df, family = binomial())
summary(mod)
## 
## Call:
## glm(formula = y ~ x1 + x2 + x3, family = binomial(), data = df)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.09727    0.11841  -0.822    0.411    
## x1           1.35341    0.18702   7.237 4.59e-13 ***
## x2           1.01119    0.14407   7.019 2.24e-12 ***
## x3          -0.66041    0.16103  -4.101 4.11e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 554.36  on 399  degrees of freedom
## Residual deviance: 426.16  on 396  degrees of freedom
## AIC: 434.16
## 
## Number of Fisher Scoring iterations: 4

Key outputs:

  • coefficient signs (direction of each effect)
  • standard errors (uncertainty)
  • p-values (evidence that a feature matters)
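
A common follow-up (a sketch, not part of the original output) is to report odds ratios with 95% confidence intervals by exponentiating the coefficients and their profile-likelihood intervals:

# Odds ratios and 95% confidence intervals for the fitted model.
exp(cbind(OddsRatio = coef(mod), confint(mod)))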

Predicted Probability vs One Feature

(Plot of predicted probability against a single feature omitted.)
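
A sketch of how such a plot could be produced, assuming ggplot2 is available: add the model's fitted probabilities to the data and plot them against one feature (x1 here):

library(ggplot2)

df$p_hat <- predict(mod, type = "response")      # fitted P(Abnormal) per image

ggplot(df, aes(x = x1, y = p_hat, color = y)) +
  geom_point(alpha = 0.6) +
  labs(x = "x1 (mean intensity)", y = "Predicted P(Abnormal)", color = "Class")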

Feature Space & Predicted Probability

(Interactive plot of predicted probability over the feature space omitted.)
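
One way to approximate that view (a sketch, not the original interactive figure, assuming ggplot2 is loaded as in the previous sketch) is to predict on a grid over x1 and x2 with x3 held at its mean and map the probability surface:

# Decision surface over x1 and x2, with x3 fixed at its mean.
grid <- expand.grid(
  x1 = seq(min(df$x1), max(df$x1), length.out = 100),
  x2 = seq(min(df$x2), max(df$x2), length.out = 100),
  x3 = mean(df$x3))
grid$p_hat <- predict(mod, newdata = grid, type = "response")

ggplot(grid, aes(x = x1, y = x2, fill = p_hat)) +
  geom_raster() +
  labs(fill = "P(Abnormal)")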

Takeaways

  • Once features are extracted from images, classification often becomes a tabular problem.
  • Logistic regression is a solid baseline:
    • interpretable coefficients
    • uncertainty via standard errors and p-values
    • quick and simple to deploy
  • Interactive plots help assess class separability in feature space.