2025-09-17

Outline

  1. Background
  2. Data Overview
  3. Exploratory Plot 1 - Distribution of Tumor Thickness
  4. Exploratory Plot 2 - Relationship Between Tumor Thickness & Survival Time
  5. Model Specification
  6. Estimation & Inference
  7. Code & Model Output
  8. Diagnostics - Residuals
  9. Diagnostics - Interactive (3D View)
  10. Results & Interpretation
  11. Limitations & Next Steps

Background

  • Malignant melanoma is a severe form of skin cancer with high variability in patient outcomes.
  • Prognosis is influenced by several clinical measures, with tumor thickness recognized as one of the most important indicators.
  • Understanding how thickness relates to survival time can help illustrate how statistical models connect medical measures with patient outcomes.
  • Linear regression provides a straightforward way to quantify this relationship, test statistical significance, and build intuition for more advanced survival models.
  • The goal is to demonstrate regression in a medical context by modeling survival time as a function of tumor thickness.

Data Overview

  • Dataset: boot::melanoma
  • 205 patients diagnosed with malignant melanoma
  • Key Measures:
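    • time: Survival time in days since the operation (possibly censored).
    • thickness: Tumor thickness in millimeters.
    • age: Age at operation, in years.
    • ulcer: Indicator of ulceration (1 = present, 0 = absent).
    • status: Outcome at end of study (1 = died from melanoma, 2 = alive, 3 = died from unrelated causes).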
  • Each row represents one patient.
Preview of Melanoma Dataset

  time  status  sex  age  year  thickness  ulcer
    10       3    1   76  1972       6.76      1
    30       3    1   56  1968       0.65      0
    35       2    1   41  1977       1.34      0
    99       3    0   71  1968       2.90      0
   185       1    1   52  1965      12.08      1
   204       1    1   28  1971       4.84      1
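
The preview can be reproduced with a short sketch, assuming the boot package (normally bundled with R as a recommended package) is installed. The status tabulation at the end is an addition for illustration, not part of the original slide.

```r
# Load the melanoma data from the boot package (assumed to be installed).
data(melanoma, package = "boot")

# Dimensions: one row per patient, one column per variable.
dim(melanoma)

# First six rows, as shown in the preview table.
head(melanoma)

# Number of patients in each status category (1, 2, 3).
table(melanoma$status)
```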

Exploratory Plot 1 - Distribution of Tumor Thickness

  • Tumor thickness is a key clinical measure used in prognosis.
  • Distribution helps us understand how values are spread across patients.
  • Provides context before modeling survival time against thickness.
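
A distribution plot of this kind could be drawn with base R graphics, as in the minimal sketch below. The number of bins, the colour, and the axis labels are assumptions chosen for illustration; the original slide may have used a different plotting library.

```r
# Load the data (boot package assumed to be installed).
data(melanoma, package = "boot")

# Histogram of tumor thickness; breaks = 20 is a suggestion to hist(),
# which may pick a nearby "pretty" number of bins.
thickness_hist <- hist(
  melanoma$thickness,
  breaks = 20,
  main = "Distribution of Tumor Thickness",
  xlab = "Tumor thickness (mm)",
  ylab = "Number of patients",
  col = "grey80"
)

# Numeric summary to read alongside the plot.
summary(melanoma$thickness)
```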

Exploratory Plot 2 - Relationship Between Tumor Thickness & Survival Time

  • Survival time generally decreases as tumor thickness increases.
  • A scatter plot with a fitted regression line shows the overall trend.
  • Provides evidence of a negative association.
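
One way to draw such a scatter plot is sketched below with base R graphics. The names trend_fit and thickness_time_cor are introduced here for illustration, and the styling is an assumption rather than the original figure's.

```r
# Load the data (boot package assumed to be installed).
data(melanoma, package = "boot")

# Scatter plot of survival time against tumor thickness.
plot(
  time ~ thickness,
  data = melanoma,
  main = "Survival Time vs. Tumor Thickness",
  xlab = "Tumor thickness (mm)",
  ylab = "Survival time (days)",
  pch = 16,
  col = "grey40"
)

# Overlay the least-squares line from a simple linear fit.
trend_fit <- lm(time ~ thickness, data = melanoma)
abline(trend_fit, col = "red", lwd = 2)

# Pearson correlation as a one-number summary of the linear association.
thickness_time_cor <- cor(melanoma$thickness, melanoma$time)
thickness_time_cor
```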

Model Specification

  • Survival time is modeled as a linear function of tumor thickness.

\[ Y_i = f(X_i,\beta) + \varepsilon_i \;=\; \beta_0 + \beta_1 X_i + \varepsilon_i \]

  • Where: \(Y_i\) = Survival time in days for patient \(i\); \(X_i\) = Tumor thickness in mm; \(\beta_0\) = Intercept (expected survival time at zero thickness); \(\beta_1\) = Slope; \(\varepsilon_i\) = Error term
Assumptions
  • Linearity - Relationship is approximately linear.
  • Independence - Patient outcomes are independent.
  • Constant variance - Error spread is roughly constant.
  • Normal Errors - Error terms are approximately normally distributed.
  • No Influential Outliers - No single point dominates the fit.

Estimation & Inference

  • Point Estimate - Slope (\(\hat{\beta}_1\)) = Estimated change in expected survival time (days) per 1 mm increase in thickness.

  • Hypothesis Test
    \[ H_0: \beta_1 = 0 \quad \text{(No linear relationship.)}, \quad H_A: \beta_1 \neq 0 \quad \text{(Thickness is linearly associated with survival time.)} \]

  • t-test - Evaluates if slope differs significantly from 0.

  • p-value - Probability of observing a test statistic at least this extreme if \(H_0\) were true.

  • Confidence Interval - Range of plausible values for \(\beta_1\).
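
To make the link between these quantities concrete, the sketch below computes the t statistic, two-sided p-value, and 95% confidence interval for the slope by hand from a fitted lm object. It assumes the melanoma data from the boot package; the variable names (est, se, t_stat, p_val, ci_95) are illustrative.

```r
# Fit the simple linear regression (boot package assumed to be installed).
data(melanoma, package = "boot")
model <- lm(time ~ thickness, data = melanoma)

# Slope estimate, its standard error, and the residual degrees of freedom.
est <- coef(summary(model))["thickness", "Estimate"]
se  <- coef(summary(model))["thickness", "Std. Error"]
df  <- df.residual(model)

# t statistic: the estimate divided by its standard error.
t_stat <- est / se

# Two-sided p-value from the t distribution with df degrees of freedom.
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# 95% confidence interval: estimate +/- critical value * standard error.
t_crit <- qt(0.975, df = df)
ci_95 <- c(lower = est - t_crit * se, upper = est + t_crit * se)

t_stat
p_val
ci_95
```

The hand-computed interval should agree with confint(model), which applies the same formula.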

Code & Model Output

  • Fit linear regression of survival time vs. tumor thickness.
data(melanoma, package = "boot")  # load the dataset from the boot package
model <- lm(time ~ thickness, data = melanoma)
summary(model)
## 
## Call:
## lm(formula = time ~ thickness, data = melanoma)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2325.4  -707.6  -210.6   744.9  3410.4 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2413.41     107.39  22.473  < 2e-16 ***
## thickness     -89.25      25.86  -3.451 0.000679 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1093 on 203 degrees of freedom
## Multiple R-squared:  0.05542,    Adjusted R-squared:  0.05076 
## F-statistic: 11.91 on 1 and 203 DF,  p-value: 0.0006793

Code & Model Output (Continued.)

coefs <- summary(model)$coefficients
cis <- confint(model)
# Use a data frame so the numeric columns stay numeric. cbind() with a
# character column coerces everything to text, and digits = 3 is then ignored.
out <- data.frame(Term = rownames(coefs), coefs, cis, check.names = FALSE)

knitr::kable(
  out,
  digits = 3,
  row.names = FALSE,
  col.names = c("Term","Estimate","Std. Error","t value","p-value","2.5 %","97.5 %")
)
Term          Estimate  Std. Error  t value  p-value     2.5 %    97.5 %
(Intercept)   2413.410     107.390   22.473    0.000  2201.668  2625.152
thickness      -89.255      25.863   -3.451    0.001  -140.249   -38.260

Diagnostics - Residuals

  • Residuals help check whether regression assumptions hold.
  • A good model should show:
    • Residuals scattered randomly around zero.
    • No clear trend or curve.
    • Roughly equal spread across fitted values.
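
These checks could be carried out with a residuals-vs-fitted plot and a normal Q-Q plot, as in the sketch below. It uses base R graphics and refits the model so that it runs on its own; the two-panel layout is an assumption for illustration.

```r
# Refit the model so the sketch is self-contained.
data(melanoma, package = "boot")
model <- lm(time ~ thickness, data = melanoma)

fitted_vals <- fitted(model)
resid_vals  <- resid(model)

# Two panels side by side; restore the previous settings afterwards.
op <- par(mfrow = c(1, 2))

# Residuals vs fitted: look for random scatter around zero,
# no curvature, and roughly constant vertical spread.
plot(
  fitted_vals, resid_vals,
  xlab = "Fitted survival time (days)",
  ylab = "Residual (days)",
  main = "Residuals vs Fitted"
)
abline(h = 0, lty = 2)

# Normal Q-Q plot: points near the line suggest approximately normal errors.
qqnorm(resid_vals, main = "Normal Q-Q Plot of Residuals")
qqline(resid_vals)

par(op)
```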

Diagnostics - Interactive (3D View)

  • Interactive view of survival time vs tumor thickness, with age as a third dimension.
  • Points colored by ulceration status to spot clusters/outliers.
  • Use your mouse to rotate, zoom, and inspect values.
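
A 3D view like this could be built with the plotly package, as sketched below. That the original slide used plotly is an assumption, as are the axis assignment, marker size, and the helper column ulcer_label. The plotting code runs only if plotly is installed.

```r
# Load the data (boot package assumed to be installed).
data(melanoma, package = "boot")

# Add a labelled factor for ulceration so the colour legend is readable.
# ulcer_label is a hypothetical helper column, not part of the dataset.
plot_data <- transform(
  melanoma,
  ulcer_label = factor(ulcer, levels = c(0, 1), labels = c("Absent", "Present"))
)

# Build the interactive plot only when plotly is available.
if (requireNamespace("plotly", quietly = TRUE)) {
  fig <- plotly::plot_ly(
    plot_data,
    x = ~thickness, y = ~age, z = ~time,
    color = ~ulcer_label,
    type = "scatter3d", mode = "markers",
    marker = list(size = 3)
  )
  fig <- plotly::layout(
    fig,
    scene = list(
      xaxis = list(title = "Thickness (mm)"),
      yaxis = list(title = "Age (years)"),
      zaxis = list(title = "Survival time (days)")
    )
  )
  fig
}
```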

Results & Interpretation

  • Direction & Size: Estimated slope = -89 days per mm thickness.
    • 95% CI: [-140, -38] days; p-value < 0.001
  • Interpretation: Each 1 mm increase in tumor thickness is associated with an estimated 89-day decrease in expected survival time.
  • Discussion:
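    • Consistent with tumor thickness being a prognostic factor in melanoma, although thickness alone explains only about 5.5% of the variation in survival time (R-squared = 0.055).
    • Under this model, a few millimeters of additional thickness correspond to several hundred fewer days of expected survival.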
    • Simple regression highlights the core relationship, while further models could include age and ulceration.
  • Note: Data details and diagnostics are presented on other slides.
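
As a worked illustration of the slope, the fitted line can be evaluated at two thickness values. Using the coefficients reported in the model output, the expected survival time is about 2413.41 - 89.25 × 1 ≈ 2324 days at 1 mm and 2413.41 - 89.25 × 5 ≈ 1967 days at 5 mm, a difference of roughly 357 days. The sketch below does the same with predict(); the thickness values 1 and 5 are arbitrary examples, not patients from the study.

```r
# Refit the model so the sketch is self-contained.
data(melanoma, package = "boot")
model <- lm(time ~ thickness, data = melanoma)

# Hypothetical thickness values (mm) chosen for illustration.
new_patients <- data.frame(thickness = c(1, 5))

# Expected survival time (days) under the fitted line.
predicted_days <- predict(model, newdata = new_patients)
predicted_days

# Difference between the two predictions: 4 mm times the slope.
diff(predicted_days)
```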

Limitations & Next Steps

  • Simplified Model - Only tumor thickness was used as a predictor.
  • Assumption Limits - Linearity, normal errors, and constant variance may not fully hold.
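  • Censoring Ignored - Patients still alive at the end of follow-up have censored survival times, which linear regression treats as fully observed; survival data usually call for Kaplan–Meier or Cox regression analysis.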
  • Outliers - Extreme values in survival time or thickness may affect results.

Next Steps
  • Expand the model to include additional predictors such as age, ulceration status, or sex.
  • Explore a log transformation of survival time to stabilize variance.
  • Apply survival-specific methods such as Cox proportional hazards for a more realistic analysis.
  • Validate results with larger datasets or external studies.
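
Two of these steps are sketched below, assuming the survival package (a recommended package that normally ships with R) is available. The choice of predictors and the event definition are assumptions made for illustration: death from melanoma (status == 1) is treated as the event, and all other patients, including those who died from unrelated causes, are treated as censored.

```r
library(survival)  # provides Surv() and coxph()
data(melanoma, package = "boot")

# Multiple regression on log survival time with additional predictors.
# Like the simple model, this still ignores censoring.
log_fit <- lm(log(time) ~ thickness + age + ulcer + sex, data = melanoma)
summary(log_fit)

# Cox proportional hazards model, which accounts for censoring.
# Event = death from melanoma (status == 1); everyone else is censored.
cox_fit <- coxph(
  Surv(time, status == 1) ~ thickness + age + ulcer + sex,
  data = melanoma
)
summary(cox_fit)
```

In the Cox output, exp(coef) is the estimated hazard ratio per unit increase in each predictor, so a value above 1 for thickness would indicate a higher hazard of death from melanoma for thicker tumors.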