2025-11-09

Simple Linear Regression

Understanding relationships between variables

What is Linear Regression?

We model the relationship between a dependent variable \(Y\) and an independent variable \(X\):

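\[ Y = \beta_0 + \beta_1 X + \varepsilon \]

where \(\beta_0\) is the intercept, \(\beta_1\) is the slope (the expected change in \(Y\) for a one-unit increase in \(X\)), and \(\varepsilon\) is a random error term.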
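To see what the model says, it helps to generate data from it and check that a fit recovers the parameters. This is a minimal sketch with made-up values (\(\beta_0 = 2\), \(\beta_1 = 0.5\), normal errors with standard deviation 1); these numbers and the name sim_fit are illustrative only.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon (illustrative values only)
set.seed(42)
n     <- 100
beta0 <- 2      # hypothetical true intercept
beta1 <- 0.5    # hypothetical true slope
x     <- runif(n, min = 0, max = 10)
eps   <- rnorm(n, mean = 0, sd = 1)   # random error term
y     <- beta0 + beta1 * x + eps

# Fit the model by ordinary least squares
sim_fit <- lm(y ~ x)
coef(sim_fit)   # estimates should land near 2 and 0.5
```

Because of the random error term, the estimates will not match the true values exactly; they get closer as n grows.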

Dataset Example

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
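The rows above match the first six rows of R's built-in mtcars data frame (package datasets, no installation needed). A short sketch of how to print them and look at the two variables the regression uses, mpg (miles per US gallon) and wt (weight in 1000 lbs):

```r
# mtcars ships with base R, so it is available without loading anything
head(mtcars)                     # first six rows, as printed above
str(mtcars[, c("mpg", "wt")])    # the response and predictor used later
```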

Plot with ggplot2

Fitted Regression Line

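A fitted line like this one can be drawn with ggplot2's geom_smooth(). This is a minimal sketch, assuming the slide plots mpg against wt with a linear fit; the axis labels and the name p_fit are illustrative choices. Passing formula = y ~ x explicitly also avoids the informational message geom_smooth() prints when the formula is left implicit.

```r
library(ggplot2)

# Scatter plot of mpg against weight with the OLS line and a 95% confidence band
p_fit <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p_fit
```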

Regression Equation

Fitting the model by ordinary least squares gives the estimated coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\), which define the fitted regression line:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]
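For simple linear regression the least-squares estimates have a closed form, \(\hat{\beta}_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}\). A short sketch computing them by hand for mpg on wt and comparing with lm():

```r
x <- mtcars$wt
y <- mtcars$mpg

# Closed-form least-squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

fit <- lm(mpg ~ wt, data = mtcars)
c(b0 = b0, b1 = b1)   # about 37.2851 and -5.3445
coef(fit)             # same values from lm()
```

So the fitted line for this data is \(\hat{Y} \approx 37.29 - 5.34 X\), with \(X\) the weight in 1000 lbs.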

Interactive Plotly Plot

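One way to make the plot interactive is plotly's ggplotly(), which converts a ggplot2 object into an interactive widget with hover tooltips. A minimal sketch, assuming both ggplot2 and plotly are installed; the names p_static and p_interactive are illustrative. Loading the packages inside suppressPackageStartupMessages() hides the notes about masked functions (last_plot, filter, layout) that library(plotly) prints.

```r
# Load quietly so the package masking notes are not printed on the slide
suppressPackageStartupMessages({
  library(ggplot2)
  library(plotly)
})

p_static <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Convert to an interactive htmlwidget; hovering a point shows its wt and mpg
p_interactive <- ggplotly(p_static)
p_interactive
```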

Model Summary

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
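The Call line shows this output comes from lm(formula = mpg ~ wt, data = mtcars). The slope estimate of -5.3445 means each additional 1,000 lbs of weight is associated with about 5.34 fewer miles per gallon on average, and its p-value of 1.29e-10 indicates the relationship is very unlikely to be due to chance. A sketch of reproducing the summary and pulling out individual pieces; the 3,000 lb car in the prediction is a hypothetical example:

```r
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                 # reproduces the output above

s <- summary(fit)
s$coefficients["wt", ]       # Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared                  # about 0.7528
confint(fit)                 # 95% confidence intervals for both coefficients

# Predicted mpg for a hypothetical car weighing 3,000 lbs (wt = 3)
predict(fit, newdata = data.frame(wt = 3))   # about 21.25
```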

Residual Plot

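A residual plot shows the residuals \(e_i = y_i - \hat{y}_i\) against the fitted values \(\hat{y}_i\). If the linear model is adequate, the points scatter randomly around zero; a curved pattern suggests a nonlinear relationship, and a funnel shape suggests non-constant variance.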
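A minimal base R sketch of a residuals-versus-fitted plot for the mpg ~ wt model; the variable names are illustrative:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals e_i = y_i - yhat_i and the fitted values yhat_i
res         <- resid(fit)
fitted_vals <- fitted(fit)

# Residuals vs fitted values: look for random scatter around the zero line
plot(fitted_vals, res,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# With an intercept in the model, OLS residuals sum to (numerically) zero
sum(res)
```

Base R also draws a version of this plot, with a smoothed trend line added, via plot(fit, which = 1).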

Coefficient of Determination \(R^2\)

The coefficient of determination \(R^2\) measures the proportion of the variability in \(Y\) that is explained by the regression line:

\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} \]

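\(R^2\) lies between 0 and 1, and higher values indicate a better fit: the model explains a larger proportion of the variance in \(Y\). For the model of mpg on wt, \(R^2 = 0.7528\), so weight accounts for about 75% of the variability in fuel economy.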
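The formula can be checked directly against lm(). A short sketch computing the residual and total sums of squares for mpg on wt; it also shows that in simple linear regression \(R^2\) equals the squared correlation between \(X\) and \(Y\):

```r
fit   <- lm(mpg ~ wt, data = mtcars)
y     <- mtcars$mpg
y_hat <- fitted(fit)

ss_res <- sum((y - y_hat)^2)      # residual sum of squares
ss_tot <- sum((y - mean(y))^2)    # total sum of squares
r2 <- 1 - ss_res / ss_tot

r2                                # about 0.7528
summary(fit)$r.squared            # same value reported by summary()
cor(mtcars$wt, mtcars$mpg)^2      # squared Pearson correlation, also the same
```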

Conclusion

Linear regression estimates how \(Y\) changes as \(X\) changes

Residual plots and \(R^2\) help assess how well the model fits

Tools like ggplot2 and plotly help visualize data beautifully