Linear regression is the most straightforward and basic regression model
Finds the linear relationship between a dependent variable and one or more independent variables
Can be used to predict values that are not already in the data
2025-11-17
\(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_n X_n\)
each \(\beta_i\) is a coefficient that measures the effect of the independent variable \(X_i\) on the dependent variable \(Y\); \(\beta_0\) is the intercept
values for \(\beta\) are calculated by minimizing the sum of squared residuals
for our examples, we will only be using one independent variable
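As a rough illustration of what "minimizing the sum of squared residuals" means with a single independent variable, the closed-form least-squares estimates can be computed by hand. This is only a sketch on R's built-in `women` data; the names `x`, `y`, `b0`, `b1`, and `ssr` are chosen here for illustration and are not part of the original slides.

```r
# Closed-form least-squares estimates for one independent variable,
# using R's built-in `women` data (height in inches, weight in pounds)
x <- women$height
y <- women$weight

# Slope: co-variation of x and y divided by the variation of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: forces the line through the point (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Sum of squared residuals for this line; no other straight line gives a smaller value
ssr <- sum((y - (b0 + b1 * x))^2)

c(intercept = b0, slope = b1, ssr = ssr)
```

The intercept and slope computed this way should agree with the estimates that `lm()` reports for the same data.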
The following code plots the linear relationship between women's height and weight, using R's built-in `women` dataset; the plot appears on the next slide
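library(ggplot2)

# Scatter plot of the built-in `women` data with a fitted least-squares line
plot1 <- ggplot(women, aes(x = height, y = weight)) +
  geom_point() +
  ggtitle("Women's Average Height vs Weight") +
  geom_smooth(method = "lm", se = FALSE)  # se = FALSE hides the confidence band
plot1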
We will be using R's lm() function to fit the linear model.
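The slides show the model output but not the call that produced it. Based on the `Call:` line in that output, it was presumably something like the sketch below; the variable name `model` is an assumption made here for illustration.

```r
# Fit weight as a linear function of height on the built-in `women` data
model <- lm(weight ~ height, data = women)

# Print residual quantiles, coefficient estimates, and fit statistics
summary(model)
```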
## `geom_smooth()` using formula = 'y ~ x'
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
Using the formula for a regression line and the estimates given by the linear model:
\(Y = \beta_0 + \beta_1 X_1\)
The y-intercept \(\beta_0\) = -87.5166667
The coefficient \(\beta_1\) = 3.45
So the fitted line is \(\text{weight} = -87.5166667 + 3.45 \times \text{height}\), and the model has an R-squared of 0.9910098
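The values above can be read directly from the fitted model object, and the same object can predict a weight for a height that is not in the data. The following is a minimal sketch; the names `model`, `b0`, `b1`, `r2`, and `predicted`, and the example height of 66.5 inches, are assumptions made here for illustration.

```r
# Refit the model so this example is self-contained
model <- lm(weight ~ height, data = women)

# Extract the intercept, the slope, and R-squared from the fit
b0 <- coef(model)[["(Intercept)"]]
b1 <- coef(model)[["height"]]
r2 <- summary(model)$r.squared

# Predict the weight for a height that does not appear in `women`
new_height <- data.frame(height = 66.5)
predicted <- predict(model, newdata = new_height)

# The prediction equals b0 + b1 * 66.5, roughly 141.9 pounds
predicted
```

The `newdata` column must use the same name as the independent variable in the model formula (`height`), otherwise `predict()` cannot match it to the fitted coefficient.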