04/16/2023


What is Simple Linear Regression?

Simple Linear Regression is a statistical method that models the relationship between two variables as a straight line:

  • Independent Variable (x variable)
  • Dependent Variable (y variable)

Linear regression shows how a change in the independent variable is associated with a change in the dependent variable. Once the regression line has been found, it can be used to predict the value of the dependent (y) variable at any given value of the independent (x) variable.

Line of Best Fit

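The Line of Best Fit, or Regression Line, is the straight line that minimizes the sum of the squared vertical distances between the data points and the line (ordinary least squares, the method used by R's lm()). It shows the trend between the two variables.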

The formula is:

\[ y = \alpha + \beta x \]

where:

  • \( y \) = Dependent Variable
  • \( x \) = Independent Variable
  • \( \alpha \) = Intercept
  • \( \beta \) = Slope
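For simple linear regression, the least-squares estimates have a closed form: \( \hat\beta = \operatorname{cov}(x, y) / \operatorname{var}(x) \) and \( \hat\alpha = \bar y - \hat\beta \bar x \). The sketch below computes them by hand for the Orange data and compares them with the coefficients from lm(); the names `beta_hat`, `alpha_hat` and `fit` are only illustrative.

```r
# Built-in data set from the datasets package (attached by default)
data("Orange")
x <- Orange$age
y <- Orange$circumference

# Closed-form ordinary least-squares estimates
beta_hat  <- cov(x, y) / var(x)            # slope
alpha_hat <- mean(y) - beta_hat * mean(x)  # intercept

# Fit the same model with lm() and compare the two sets of coefficients
fit <- lm(circumference ~ age, data = Orange)
print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
print(coef(fit))
```

Both printouts should agree up to floating-point rounding, since lm() minimizes the same sum of squared residuals.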

Dataset

data("Orange")
head(Orange)
##   Tree  age circumference
## 1    1  118            30
## 2    1  484            58
## 3    1  664            87
## 4    1 1004           115
## 5    1 1231           120
## 6    1 1372           142
x <- Orange$age
y <- Orange$circumference

The data set used to showcase linear regression in the following examples is R's built-in “Orange” data set, which records the growth of orange trees. Each row gives a tree's age (days), its trunk circumference (mm), and which tree was measured (the Tree column).
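A quick way to check the structure described above is base R's str() and table(). This sketch shows the column types and how many measurements each tree contributes.

```r
data("Orange")

# Column types: Tree is an ordered factor, age and circumference are numeric
str(Orange)

# Number of measurements recorded for each tree
print(table(Orange$Tree))

# Total number of rows in the data set
print(nrow(Orange))
```

The data set contains 35 rows: five trees, each measured at the same seven ages.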

Plotting

In the following slides, I’ll demonstrate linear regression to show the relationship between the independent variable, age, and the dependent variable, circumference, using:

  • Base R
  • Plotly
  • ggplot

The regression line will show the trend between the two, and can be used to predict what the circumference of a tree will be based on its age.
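Once the model is fitted, predict() evaluates \( \alpha + \beta x \) at new ages. This is a minimal sketch; the ages 500, 1000 and 1500 days are arbitrary example values, and `new_ages` and `pred` are illustrative names.

```r
data("Orange")

# Fit circumference as a linear function of age
fit <- lm(circumference ~ age, data = Orange)

# Example ages (days) at which to predict circumference
new_ages <- data.frame(age = c(500, 1000, 1500))

# predict() returns alpha + beta * age for each new age
pred <- predict(fit, newdata = new_ages)
print(data.frame(new_ages, predicted_circumference_mm = pred))
```

The observed ages run from 118 to 1582 days, so predictions outside that range would be extrapolation and should be treated with caution.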

Example of Simple Linear Regression in Base R

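# Scatter plot of the raw data, coloured by tree
plot(x, y, col = Orange$Tree, xlab = "Age (days)",
     ylab = "Circumference (mm)")
# Fit the least-squares line and draw it over the points
mod <- lm(circumference ~ age, data = Orange)
abline(mod)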

Example of Simple Linear Regression using Plotly

library(plotly)
# Axis titles for the plotly layout
xax <- list(title = "Age (days)")
yax <- list(title = "Circumference (mm)")
# Scatter of the data plus the fitted values from mod drawn as a line
fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers", height = 350) %>%
       add_lines(x = x, y = fitted(mod), name = "regression line") %>%
       layout(xaxis = xax, yaxis = yax)
fig

Example of Simple Linear Regression using ggplot

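library(ggplot2)
# Scatter of circumference against age, using the data frame's columns
g <- ggplot(data = Orange, aes(x = age, y = circumference)) + geom_point()
# Overlay the least-squares line; the shaded band is its confidence interval
g + geom_smooth(method = "lm") + xlab("Age (days)") + ylab("Circumference (mm)")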
## `geom_smooth()` using formula = 'y ~ x'
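By default geom_smooth() shades a 95% confidence interval around the fitted line (its `se = TRUE` and `level = 0.95` defaults). A sketch of computing the same kind of interval directly with predict(); `age_grid` and `band` are illustrative names, and the five evenly spaced ages are an arbitrary choice.

```r
data("Orange")
fit <- lm(circumference ~ age, data = Orange)

# Evenly spaced ages across the observed range
age_grid <- data.frame(age = seq(min(Orange$age), max(Orange$age), length.out = 5))

# Fitted values with lower/upper bounds of a 95% confidence interval for the mean
band <- predict(fit, newdata = age_grid, interval = "confidence", level = 0.95)
print(cbind(age_grid, band))
```

The interval is narrowest near the mean age and widens toward the ends of the range, which matches the shape of the shaded band in the plot.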

Using ggplot to Include a Third Variable and a Different Regression Line Type

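# Map Tree to colour so each tree's points are distinguished
g <- ggplot(data = Orange, aes(x = age, y = circumference, color = Tree)) + geom_point()
# Replace the straight line with a loess (local regression) smoother drawn in blue
g + xlab("Age (days)") + ylab("Circumference (mm)") + stat_smooth(method = "loess", color = "blue")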
## `geom_smooth()` using formula = 'y ~ x'