2025-06-11

Slide 1: Introduction

In this presentation, we explore Simple Linear Regression, a fundamental statistical technique for modeling the linear relationship between two continuous variables.

Slide 2: The Model

We model the relationship between a response variable \(y\) and a predictor variable \(x\) as:

\[ y = \beta_0 + \beta_1 x + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \]
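As a quick illustration of what this model implies, here is a minimal simulation sketch; the values for \(\beta_0\), \(\beta_1\), and \(\sigma\) are arbitrary illustrative choices, not taken from any slide below.

# Simulate data from y = beta_0 + beta_1*x + eps (illustrative values only)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, mean = 0, sd = 1)   # beta_0 = 2, beta_1 = 0.5, sigma = 1
fit_sim <- lm(y ~ x)
coef(fit_sim)   # estimates should be close to 2 and 0.5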

Slide 3: Data Overview and Model Fit Using the mtcars Data

data(mtcars)                        # built-in Motor Trend car road test data
mod <- lm(mpg ~ wt, data=mtcars)    # regress miles per gallon on weight
summary(mod)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
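
To read the output: the fitted line is \(\widehat{\text{mpg}} = 37.29 - 5.34\,\text{wt}\), so each additional 1,000 lbs of weight is associated with a drop of about 5.3 mpg (wt is measured in units of 1,000 lbs in mtcars). A quick illustrative check:

coef(mod)
predict(mod, newdata = data.frame(wt = 3))   # about 21.25 mpg for a 3,000 lb car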

Slide 4: ggplot2 Scatter Plot with Regression Line

library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE, color="green") +   # fitted least-squares line
  labs(title="MPG vs Weight", x="Weight (1000 lbs)", y="Miles Per Gallon")

Slide 5: Parameters

Estimating Parameters: From the model \(y = \beta_0 + \beta_1 x + \varepsilon\), the least-squares estimates are:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
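
These formulas can be checked directly against the mtcars fit from Slide 3; the sketch below simply reproduces coef(mod) by hand.

x <- mtcars$wt
y <- mtcars$mpg
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # slope
b0 <- mean(y) - b1 * mean(x)                                      # intercept
c(b0, b1)   # matches coef(mod): 37.2851, -5.3445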

Slide 6: Residual Plot

ggplot(mtcars, aes(x=wt, y=resid(mod))) +
  geom_point() +
  geom_hline(yintercept = 0, linetype="dashed") +   # residuals should scatter evenly around zero
  labs(title="Residuals vs Weight", y="Residuals", x="Weight (1000 lbs)")

Slide 7: 3D Plot

library(plotly)
plot_ly(data=mtcars, x=~wt, y=~hp, z=~mpg,
        type="scatter3d", mode="markers",
        color=~factor(cyl)) %>%                 # color points by number of cylinders
  layout(title="MPG vs Weight and Horsepower")

Slide 8: Conclusion

  • Linear regression is a powerful tool for understanding relationships between variables.
  • Visualization and diagnostics help assess model fit.
  • We can extend this model to include multiple predictors (see the sketch below).
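
As a brief sketch of that extension (not fitted on the slides above), horsepower from the Slide 7 plot can be added as a second predictor:

mod2 <- lm(mpg ~ wt + hp, data = mtcars)
summary(mod2)   # compare the adjusted R-squared with the simple wt-only model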