2025-03-16

Slide 1: Introduction

  • Simple Linear Regression analyzes the relationship between a single predictor (independent variable) and a response (dependent variable).
  • We typically model it as \(y = \beta_0 + \beta_1 x + \varepsilon\).
  • We will use the built-in cars dataset in R, which records the speed (mph) and stopping distance (ft) of 50 cars.

Slide 2: Motivation

  • Why study linear regression?
    • It is one of the most fundamental and widely used statistical techniques.
    • Helps us understand how one variable changes in relation to another.
    • Forms a basis for more complex models like multiple regression, logistic regression, etc.

Slide 3: Model Assumptions (Math Slide #1)

Here is the simple linear regression model with common assumptions:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

\[ \varepsilon_i \sim N(0, \sigma^2) \]

The errors \(\varepsilon_i\) are also assumed to be mutually independent.

  • \(\beta_0\) (intercept) and \(\beta_1\) (slope) are unknown parameters.
  • \(\varepsilon_i\) are random errors assumed to have mean 0 and constant variance \(\sigma^2\).
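
One way to get a feel for these assumptions is to simulate data that satisfies them and check that a fit recovers the parameters. The sketch below is illustrative only: the values of `beta0`, `beta1`, `sigma`, and `n` are hypothetical and do not come from the cars data.

```r
# Simulate data from y_i = beta0 + beta1 * x_i + eps_i.
# All parameter values are hypothetical, chosen only for illustration.
set.seed(1)
n     <- 200
beta0 <- 2      # hypothetical intercept
beta1 <- 0.5    # hypothetical slope
sigma <- 1      # hypothetical error standard deviation

x   <- runif(n, min = 0, max = 10)
eps <- rnorm(n, mean = 0, sd = sigma)   # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps

sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near beta0 and beta1
```

Because the errors are drawn independently with mean 0 and constant variance, the estimates vary around the true values from one seed to the next.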

Slide 4: Estimating Parameters (Math Slide #2)

We usually estimate \(\beta_0\) and \(\beta_1\) by least squares, choosing the values that minimize the sum of squared residuals \(\sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2\):

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})} {\sum_{i=1}^n (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \,\bar{x}. \]

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i. \]
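
The formulas above can be applied directly to the cars data. This is a minimal sketch; the variable names `beta0_hat` and `beta1_hat` are chosen here for illustration, and `lm()` is used only as a cross-check.

```r
x <- cars$speed
y <- cars$dist

x_bar <- mean(x)
y_bar <- mean(y)

# Closed-form least squares estimates
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
beta0_hat <- y_bar - beta1_hat * x_bar

c(intercept = beta0_hat, slope = beta1_hat)

# Cross-check against R's built-in fit
coef(lm(dist ~ speed, data = cars))
```

Both approaches should agree, giving an intercept of about -17.579 and a slope of about 3.932, the values reported on the Model Results slide.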

Slide 4: The Data

First 8 rows of the ‘cars’ dataset

speed (mph)  dist (ft)
          4          2
          4         10
          7          4
          7         22
          8         16
          9         10
         10         18
         10         26
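
A table like this can be reproduced in base R. The slide may have used a table formatter such as `knitr::kable()`, which is an assumption; the sketch below sticks to base functions.

```r
data(cars)       # built-in dataset from the 'datasets' package
str(cars)        # 50 observations of 2 numeric variables
head(cars, 8)    # first 8 rows, as listed on the slide
```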

Slide 5: The Summary of Data

Summary of the ‘cars’ dataset

Statistic  speed (mph)  dist (ft)
Min.               4.0       2.00
1st Qu.           12.0      26.00
Median            15.0      36.00
Mean              15.4      42.98
3rd Qu.           19.0      56.00
Max.              25.0     120.00
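
These figures correspond to what base R's `summary()` reports for each column. A short sketch, where the object name `cars_summary` is chosen here for illustration:

```r
# Five-number summary plus the mean for each column
cars_summary <- summary(cars)
cars_summary

# The same quantities for one column, computed individually
quantile(cars$dist, probs = c(0, 0.25, 0.5, 0.75, 1))
mean(cars$dist)
```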

Slide 6: First Plot with ggplot (Code)

library(ggplot2)  # provides ggplot(), geom_point(), labs(), theme_minimal()

ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  labs(title = "Scatter Plot of Speed vs. Distance",
       x = "Speed (mph)",
       y = "Stopping Distance (ft)") +
  theme_minimal()

Slide 6: First Plot with ggplot (Output)

Slide 7: Adding a Regression Line (Second ggplot)

## `geom_smooth()` using formula = 'y ~ x'
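
The message above is what `geom_smooth()` prints when no formula is supplied. The slide's code is not shown, so the following is only a plausible sketch of how such a plot could be built: the object name `p_line` and the plot title are hypothetical, and passing `formula = y ~ x` explicitly suppresses the message.

```r
library(ggplot2)

p_line <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  # Least squares line with a confidence band for the fitted mean
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(title = "Speed vs. Stopping Distance with Fitted Line",
       x = "Speed (mph)",
       y = "Stopping Distance (ft)") +
  theme_minimal()

p_line
```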

Slide 8: Interactive Plot (Plotly)
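
The interactive figure itself does not survive in this text version. One common way to produce such a plot, assuming the plotly package is installed, is to convert a ggplot object with `ggplotly()`. The names `p` and `p_interactive` are hypothetical, and this is not necessarily how the original slide was built.

```r
library(ggplot2)
library(plotly)

p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Speed (mph)", y = "Stopping Distance (ft)") +
  theme_minimal()

# Convert the static plot into an interactive htmlwidget with hover tooltips
p_interactive <- ggplotly(p)
p_interactive
```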

Slide 9: Model Results

Regression Coefficients

term         estimate  std.error  statistic  p.value
(Intercept)   -17.579      6.758     -2.601    0.012
speed           3.932      0.416      9.464    0.000

Overall Model Statistics

r.squared  adj.r.squared  sigma  statistic  p.value  df    logLik      AIC      BIC  deviance  df.residual  nobs
    0.651          0.644  15.38     89.567        0   1  -206.578  419.157  424.893  11353.52           48    50
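
The column names in these tables resemble the output of `broom::tidy()` and `broom::glance()`, though that is an assumption. The sketch below recovers the same quantities with base R only; the object names `fit` and `s` are chosen here for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
s   <- summary(fit)

# Coefficient table: estimate, standard error, t statistic, p-value
round(s$coefficients, 3)

# Selected overall statistics
c(r.squared     = s$r.squared,
  adj.r.squared = s$adj.r.squared,
  sigma         = s$sigma,
  logLik        = as.numeric(logLik(fit)),
  AIC           = AIC(fit),
  BIC           = BIC(fit),
  df.residual   = df.residual(fit),
  nobs          = nobs(fit))
```

The p-values shown as 0.000 and 0 in the tables are rounded; printing `s$coefficients` without rounding shows the actual small values.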

Slide 10: Conclusion

  • Simple Linear Regression is a straightforward way to quantify and test the relationship between two variables.
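  • In our cars example, speed is a significant predictor of stopping distance: each additional 1 mph is associated with about 3.9 ft more stopping distance (p < 0.001), and speed explains about 65% of the variance in distance (\(R^2 = 0.651\)).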
  • Next steps might include:
    • Checking diagnostic plots (residuals, normality).
    • Trying transformations or polynomial terms if the relationship is non-linear.
    • Extending to multiple regression if more predictors are available.
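
The first two next steps could be sketched as follows. The quadratic model `fit_quad` is one hypothetical extension chosen for illustration, not a recommendation drawn from the slides.

```r
fit <- lm(dist ~ speed, data = cars)

# Diagnostic plots: residuals vs. fitted values, and a normal Q-Q plot
op <- par(mfrow = c(1, 2))
plot(fit, which = 1:2)
par(op)

# Add a quadratic term in speed as a possible non-linear extension
fit_quad <- lm(dist ~ speed + I(speed^2), data = cars)

# F test comparing the straight-line model with the quadratic model
anova(fit, fit_quad)
```

A visible curve or funnel shape in the residual plot would suggest that the linearity or constant-variance assumption deserves a closer look.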