2024-06-09


The formula for simple linear regression is:

\(y = \beta_0 + \beta_1 x\)

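where \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept (the value of \(y\) when \(x = 0\)), and \(\beta_1\) is the slope (the change in \(y\) for a one-unit increase in \(x\)). Linear regression estimates \(\beta_0\) and \(\beta_1\) from known pairs of \(x\) and \(y\) values. The fitted line can then be used to predict \(y\) for new values of \(x\).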
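To make the roles of \(\beta_0\) and \(\beta_1\) concrete, here is a minimal sketch of the least-squares estimates computed by hand in R. The five data points are made up for illustration. The slope is the covariance of \(x\) and \(y\) divided by the variance of \(x\), and the intercept makes the line pass through the point of means. The last line checks the result against lm().

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Slope: covariance of x and y divided by the variance of x
beta1 <- cov(x, y) / var(x)
# Intercept: the least-squares line passes through (mean(x), mean(y))
beta0 <- mean(y) - beta1 * mean(x)

c(beta0 = beta0, beta1 = beta1)  # beta0 = 2.2, beta1 = 0.6

# lm() should report the same intercept and slope
coef(lm(y ~ x))
```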

Showing Linear Regression Through ggplot

Here we show the relationship between the girth and volume of trees from the trees data set.

[Figure: scatter plot of Volume against Girth with the fitted linear regression line]

R Code for Previous Graph

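library(ggplot2)
# Scatter plot of tree volume against girth
g <- ggplot(data = trees, aes(x = Girth, y = Volume)) + geom_point()
# Add the least-squares regression line (method = "lm") with its confidence band
g + geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'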

Finding the Linear Regression Equation for Girth vs Volume

model <- lm(trees$Volume ~ trees$Girth)
coef(model)
## (Intercept) trees$Girth 
##  -36.943459    5.065856

This tells us that the fitted equation is volume \(= -36.94 + 5.07 \times\) girth, which can be written as \(y = -36.94 + 5.07x\). Each additional inch of girth is associated with about 5.07 more cubic feet of volume.
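As a sketch of using this equation to predict an unknown value, the code below refits the same model with the data= argument and plugs a new girth into the fitted equation by hand. The name fit_trees and the girth of 15 inches are hypothetical, chosen for illustration.

```r
# Same model as above, written with the data= argument so the
# slope coefficient is named "Girth" rather than "trees$Girth"
fit_trees <- lm(Volume ~ Girth, data = trees)
b <- coef(fit_trees)

# Hypothetical new tree with a girth of 15 inches
girth_new <- 15
vol_15 <- b[["(Intercept)"]] + b[["Girth"]] * girth_new
vol_15  # about 39.04 cubic feet
```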

Another Example Using the mtcars Data Set

[Figure: scatter plot of MPG against weight from the mtcars data set with the fitted linear regression line]

Finding the Linear Regression Equation for Weight vs MPG

model2 <- lm(mtcars$mpg ~ mtcars$wt)
coef(model2)
## (Intercept)   mtcars$wt 
##   37.285126   -5.344472

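This tells us that the fitted equation is MPG \(= 37.29 - 5.34 \times\) weight, which can be written as \(y = 37.29 - 5.34x\). Because wt is measured in thousands of pounds, each additional 1,000 lb of weight is associated with a drop of about 5.34 MPG.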
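R's predict() function does the same plug-in calculation for several new values at once. A minimal sketch follows. The weights 2, 3 and 4 (2,000 to 4,000 lb) and the name fit_cars are hypothetical. The model is refit as lm(mpg ~ wt, data = mtcars) because predict() matches the columns of newdata to the variable names in the formula. With a formula written as mtcars$mpg ~ mtcars$wt, R would warn about the row mismatch, effectively ignore newdata, and return the original 32 fitted values.

```r
fit_cars <- lm(mpg ~ wt, data = mtcars)

# Hypothetical cars weighing 2,000, 3,000 and 4,000 lb (wt is in 1000 lb)
new_cars <- data.frame(wt = c(2, 3, 4))
preds <- predict(fit_cars, newdata = new_cars)
preds  # roughly 26.60, 21.25 and 15.91 MPG
```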

Correlation Coefficient

We can find the (Pearson) correlation coefficient between two variables using cor().

cor(trees$Volume, trees$Girth)
## [1] 0.9671194

This value is very close to +1, indicating a near-perfect positive linear relationship: trees with larger girth tend to have larger volume.
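By default, cor() returns Pearson's correlation coefficient. As a minimal check, it can be reproduced from its definition: the covariance divided by the product of the two standard deviations.

```r
x <- trees$Girth
y <- trees$Volume

# Pearson's r: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual  # about 0.967, matching cor(trees$Volume, trees$Girth)
```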

cor(mtcars$mpg, mtcars$wt)
## [1] -0.8676594

This value is fairly close to -1 (perfect negative correlation), indicating a strong negative linear relationship: heavier cars tend to have lower MPG. Neither value is close to 0, which suggests a strong linear relationship in both cases.
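In simple linear regression, correlation and the fitted line are closely linked. The slope equals \(r \times s_y / s_x\), where \(s_x\) and \(s_y\) are the standard deviations of \(x\) and \(y\), and the model's \(R^2\) equals \(r^2\). Here is a short sketch checking both identities on the mtcars model (fit_cars is a hypothetical name):

```r
fit_cars <- lm(mpg ~ wt, data = mtcars)
r <- cor(mtcars$mpg, mtcars$wt)

# Slope recovered from the correlation coefficient
slope_from_r <- r * sd(mtcars$mpg) / sd(mtcars$wt)
slope_from_r                 # about -5.34, same as coef(fit_cars)[["wt"]]

# R-squared of the fit equals the squared correlation
summary(fit_cars)$r.squared  # about 0.753
r^2
```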