Simple Linear Regression

11-16-2025

What is Simple Linear Regression?

The relationship between a response variable \(y\) and a predictor \(x\) is modeled as:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \]

Where:

- \(y_i\): response variable
- \(x_i\): predictor variable
- \(\beta_0\): intercept
- \(\beta_1\): slope
- \(\varepsilon_i\): random error term
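As a minimal sketch, the model can be fit by ordinary least squares with base R's `lm()`. It assumes the built-in `mtcars` data with `mpg` as the response and `wt` as the predictor, which matches the plots in these slides but is an assumption for this slide.

```r
# Fit mpg = beta0 + beta1 * wt + error by ordinary least squares.
# Assumption: mtcars with mpg (response) and wt (predictor).
fit <- lm(mpg ~ wt, data = mtcars)

beta0_hat <- coef(fit)[["(Intercept)"]]  # estimated intercept
beta1_hat <- coef(fit)[["wt"]]           # estimated slope
sigma_hat <- sigma(fit)                  # estimated error standard deviation

c(intercept = beta0_hat, slope = beta1_hat, sigma = sigma_hat)
```

The estimated slope is negative here, so heavier cars are predicted to have lower fuel efficiency.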

Scatterplot with Regression Line

This plot shows the linear relationship between miles per gallon and car weight. The maroon line shows the predicted values from the simple linear regression, and the shaded area is the 95% confidence interval for the mean response.
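A plot of this kind could be drawn roughly as follows. This is a sketch in base R graphics, assuming the `mtcars` data with `mpg` against `wt`; the original slide may have used a different plotting package, and the maroon hex code is borrowed from the color palette in the 3D plot code.

```r
# Assumption: mtcars, mpg vs. wt; base graphics stand in for whatever
# plotting package produced the original slide.
fit <- lm(mpg ~ wt, data = mtcars)

# Evaluate the fitted line and its 95% confidence band on a grid of weights.
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 100))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)

plot(mpg ~ wt, data = mtcars, pch = 19,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Shaded 95% confidence band for the mean response.
polygon(c(grid$wt, rev(grid$wt)), c(band[, "lwr"], rev(band[, "upr"])),
        col = adjustcolor("#8C1D40", alpha.f = 0.2), border = NA)

# Fitted regression line in maroon.
lines(grid$wt, band[, "fit"], col = "#8C1D40", lwd = 2)
```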

What are Residuals?

A residual measures how far an observed value lies from the value predicted by the regression line (such as the maroon line on the previous slide). It is defined as:

\[ e_i = y_i - \hat{y}_i \]

Where:

- \(y_i\): observed value
- \(\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\): predicted value
- \(e_i\): residual

Sum of Squared Residuals

\[ \text{SSE} = \sum_{i=1}^n e_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]

Residuals indicate how well the regression line fits each point. When plotted against \(x_i\), they help detect patterns or non-linearity.
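The residuals and SSE can be computed directly from their definitions and checked against R's built-in accessors. This sketch again assumes the `mtcars` fit of `mpg` on `wt`.

```r
# Assumption: mtcars with mpg (response) and wt (predictor).
fit <- lm(mpg ~ wt, data = mtcars)

y_hat <- fitted(fit)          # predicted values, y-hat_i
e     <- mtcars$mpg - y_hat   # residuals, e_i = y_i - y-hat_i
sse   <- sum(e^2)             # sum of squared residuals

# Cross-check against the values stored in the fitted model.
all.equal(unname(e), unname(resid(fit)))
all.equal(sse, deviance(fit))

# Residuals against the predictor; a visible curve or funnel shape
# would suggest the straight-line model is missing something.
plot(mtcars$wt, e, pch = 19, xlab = "Weight (1000 lbs)", ylab = "Residual")
abline(h = 0, lty = 2)
```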

Residuals Visualization

How Slope & Intercept Affect the Visualization

Effect of Intercept \(\beta_0\):

Changing the intercept shifts the line up or down without changing the slope:

\[ \hat{y}_i = \beta_0^{(new)} + \beta_1 x_i \]

Effect of Slope \(\beta_1\):

Changing the slope tilts the line, which changes the rate of change of \(y\) with respect to \(x\):

\[ \hat{y}_i = \beta_0 + \beta_1^{(new)} x_i \]

Steeper slope -> larger change in \(y\) per unit change in \(x\)
Flatter slope -> smaller change in \(y\) per unit change in \(x\)
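A small numeric sketch makes both effects concrete. The helper `predict_line()` and all coefficient values below are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: evaluates y-hat = b0 + b1 * x.
predict_line <- function(b0, b1, x) b0 + b1 * x

x <- c(2, 3, 4)  # illustrative predictor values

base    <- predict_line(b0 = 37, b1 = -5, x)  # reference line
shifted <- predict_line(b0 = 40, b1 = -5, x)  # new intercept, same slope
steeper <- predict_line(b0 = 37, b1 = -8, x)  # new slope, same intercept

shifted - base  # same offset at every x: the line moves up, slope unchanged
diff(base)      # change in y-hat per unit x under the reference slope
diff(steeper)   # larger change in y-hat per unit x under the steeper slope
```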

Slope & Intercept Visualization

What is a Confidence Interval?

A confidence interval for a regression line is a way to express the uncertainty in the predicted mean value of the response variable for a given predictor value.

Confidence Interval for the Mean Response

\[ \hat{y}_0 \pm t_{\alpha/2, n-2} \cdot \text{SE}(\hat{y}_0) \]

Where:

\[ \text{SE}(\hat{y}_0) = \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}} \]

- \(\hat{\sigma}^2 = \frac{\sum e_i^2}{n-2}\): residual variance
- \(n\): number of observations
- \(x_0\): predictor value at which the mean response is estimated
- \(\bar{x}\): sample mean of the predictor
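The formula can be evaluated by hand and compared with `predict()`. This sketch assumes the `mtcars` fit of `mpg` on `wt`; the value `x0 = 3` (a 3,000 lb car) is a hypothetical choice.

```r
# Assumption: mtcars with mpg (response) and wt (predictor).
fit <- lm(mpg ~ wt, data = mtcars)

x     <- mtcars$wt
n     <- length(x)
x0    <- 3     # hypothetical predictor value
alpha <- 0.05  # 95% interval

sigma_hat <- sqrt(sum(resid(fit)^2) / (n - 2))   # residual standard deviation
y0_hat    <- sum(coef(fit) * c(1, x0))           # beta0-hat + beta1-hat * x0
se_mean   <- sigma_hat *
  sqrt(1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))
t_crit    <- qt(1 - alpha / 2, df = n - 2)       # t critical value, n - 2 df

ci_manual <- c(fit = y0_hat,
               lwr = y0_hat - t_crit * se_mean,
               upr = y0_hat + t_crit * se_mean)

# Same interval from the built-in method.
ci_predict <- predict(fit, newdata = data.frame(wt = x0),
                      interval = "confidence", level = 1 - alpha)

all.equal(unname(ci_manual), unname(ci_predict[1, ]))
```

Because of the \((x_0 - \bar{x})^2\) term, the interval is narrowest at \(x_0 = \bar{x}\) and widens as \(x_0\) moves away from the mean.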

Prediction Interval for a New Observation

\[ \hat{y}_0 \pm t_{\alpha/2, n-2} \cdot \text{SE}_{\text{pred}}(\hat{y}_0) \]

With:

\[ \text{SE}_{\text{pred}}(\hat{y}_0) = \hat{\sigma} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2}} \]

The prediction interval is wider because the extra 1 under the square root accounts for the variability of an individual observation around the mean response.
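To see the difference in width, the prediction interval can be computed from the formula and compared with the confidence interval at the same point. As before, this is a sketch assuming the `mtcars` fit of `mpg` on `wt`, with a hypothetical `x0 = 3`.

```r
# Assumption: mtcars with mpg (response) and wt (predictor).
fit <- lm(mpg ~ wt, data = mtcars)

x  <- mtcars$wt
n  <- length(x)
x0 <- 3  # hypothetical predictor value

sigma_hat <- sqrt(sum(resid(fit)^2) / (n - 2))
y0_hat    <- sum(coef(fit) * c(1, x0))
t_crit    <- qt(0.975, df = n - 2)  # 95% interval

# Standard error for a new observation: note the leading 1.
se_pred <- sigma_hat *
  sqrt(1 + 1 / n + (x0 - mean(x))^2 / sum((x - mean(x))^2))

pi_manual <- c(fit = y0_hat,
               lwr = y0_hat - t_crit * se_pred,
               upr = y0_hat + t_crit * se_pred)

new_car <- data.frame(wt = x0)
pi_predict <- predict(fit, newdata = new_car, interval = "prediction")
ci_predict <- predict(fit, newdata = new_car, interval = "confidence")

all.equal(unname(pi_manual), unname(pi_predict[1, ]))

# Width of each interval at x0; the prediction interval is the wider one.
c(confidence = ci_predict[1, "upr"] - ci_predict[1, "lwr"],
  prediction = pi_predict[1, "upr"] - pi_predict[1, "lwr"])
```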

Prediction Interval Visualization

- Shaded area: 95% confidence interval for the mean response
- Dashed lines: 95% prediction interval for a new observation

3D Interactive Visualization

3D Visualization Code

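library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# 3D scatter of weight, horsepower, and MPG, colored by cylinder count
fig <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
               color = ~factor(cyl),
               colors = c("#8C1D40", "#FFC627", "#000000"),
               type = "scatter3d",
               mode = "markers",
               marker = list(size = 4)) %>%
  layout(
    scene = list(
      xaxis = list(title = "Weight"),
      yaxis = list(title = "Horsepower"),
      zaxis = list(title = "MPG")),
    title = "3D View: Weight, HP, MPG"
  )

fig  # print the widget to render it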