2025-11-17

Why use linear regressions?

  • One of the simplest and most interpretable models available

  • Finds the linear relationship between a dependent variable and one or more independent variables

  • Can be used to predict values that are not already in the data

Structure of a regression

  • regression line takes slope-intercept form

\(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n\)

  • each \(X_i\) is an independent variable; its coefficient \(\beta_i\) measures its effect on the dependent variable \(Y\), and \(\beta_0\) is the intercept

  • values for \(\beta\) are calculated by minimizing the sum of squared residuals

  • for our examples, we will only be using one independent variable
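To make "minimizing the sum of squared residuals" concrete, here is a minimal sketch for the single-predictor case, using R's built-in women dataset. The names x, y, beta0, beta1, and ssr are illustrative choices, not part of the original slides. It uses the standard closed-form least-squares solution and checks that nudging the slope makes the sum of squared residuals larger.

```r
# Closed-form least-squares estimates for one predictor (built-in women dataset)
x <- women$height
y <- women$weight

# Slope: covariance of x and y divided by the variance of x
beta1 <- cov(x, y) / var(x)

# Intercept: forces the line through the point (mean(x), mean(y))
beta0 <- mean(y) - beta1 * mean(x)

# Sum of squared residuals for any candidate line (hypothetical helper)
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

c(intercept = beta0, slope = beta1)

ssr(beta0, beta1)        # value at the least-squares estimates
ssr(beta0, beta1 + 0.1)  # a different slope gives a larger sum
```

These estimates should agree with what coef(lm(weight ~ height, data = women)) returns.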

The following code plots the linear relationship between women’s height and weight; the plot appears on the next slide

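library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatter plot of the built-in women dataset with a fitted least-squares line
plot1 <- ggplot(women, aes(x = height, y = weight)) +
        geom_point() +
        ggtitle("Women's Average Height vs Weight") +
        geom_smooth(method = "lm", se = FALSE)  # se = FALSE hides the confidence band
plot1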

We will use R’s lm() function to fit the linear model.

## `geom_smooth()` using formula = 'y ~ x'

## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
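The call that produced this output is not shown on the slide. A sketch that should reproduce it, based on the formula and data printed in the Call line, is below; the object name model is an assumption.

```r
# Fit weight as a linear function of height on the built-in women dataset
model <- lm(weight ~ height, data = women)

# Print residual quantiles, the coefficient table, and overall fit statistics
summary(model)
```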

Substituting the estimates from the linear model into the formula for a regression line:

\(Y = \beta_0 + \beta_1 X_1\)

The y-intercept \(\beta_0\) = -87.5166667

The slope \(\beta_1\) = 3.45, so each additional unit of height is associated with 3.45 more units of weight

The model has an R-squared of 0.9910098
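These values can be pulled out of the fitted object directly rather than read off the printed summary. The sketch below also uses them to predict weight at a height that is not in the data. The names model, new_data, predicted, and manual, and the example height of 66, are illustrative assumptions.

```r
# Refit so the example is self-contained
model <- lm(weight ~ height, data = women)

coef(model)               # named vector with (Intercept) and height
summary(model)$r.squared  # proportion of variance in weight explained by height

# Predict weight at a new height, inside the observed range of 58 to 72
new_data <- data.frame(height = 66)
predicted <- predict(model, newdata = new_data)
predicted

# The same prediction written out with the regression formula
manual <- coef(model)[["(Intercept)"]] + coef(model)[["height"]] * 66
manual
```

With the estimates reported above, the formula gives -87.5166667 + 3.45 × 66 ≈ 140.18.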

Example Plot 2

## `geom_smooth()` using formula = 'y ~ x'