2026-06-07

What is Simple Linear Regression?

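  • Simple linear regression is a method of finding the straight line that best fits the relationship between a dependent variable y and a single independent variable x, usually in the least-squares sense (the line that minimizes the sum of squared vertical distances from the points to the line)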

  • The goal is to find the line \(y = bx + a\), where b is the slope and a is the y-intercept
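To make "best fitting" concrete, here is a minimal base-R sketch using the built-in iris data. The helper `sse()` is hypothetical. It scores a candidate line y = bx + a by its sum of squared errors. The line returned by lm() should score lower than a slightly different line.

```r
# Hypothetical helper: sum of squared vertical errors for the line y = b*x + a
sse <- function(b, a, x, y) {
  sum((y - (b * x + a))^2)
}

x <- iris$Sepal.Width
y <- iris$Sepal.Length

# lm() reports coefficients as (Intercept) then slope, i.e. a then b
fit <- coef(lm(y ~ x))
a <- unname(fit[1])
b <- unname(fit[2])

best <- sse(b, a, x, y)
nudged <- sse(b + 0.1, a, x, y)  # same intercept, slightly steeper slope
best < nudged                    # TRUE: the least-squares line has the smaller error
```

Because the squared-error criterion is strictly convex in (a, b) whenever x is not constant, any other choice of slope or intercept gives a larger error.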

Finding b

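  • To find the slope b, we use the equation \(b = r\frac{s_y}{s_x}\), where r is the correlation coefficient between x and y, and \(s_y\) and \(s_x\) are the standard deviations of y and x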

  • The correlation coefficient r can be computed with R's cor() function

  • To find a standard deviation such as \(s_x\), we use the sample formula \(s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar x)^2}\) (replace x with y to get \(s_y\)); R's sd() function computes this directly
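The steps for finding b can be checked in base R on the iris data. In this sketch the helper `sample_sd()` is hypothetical. It writes out the n - 1 formula by hand so it can be compared with sd(). The slope from r * s_y / s_x is then compared with the one lm() reports.

```r
x <- iris$Sepal.Width
y <- iris$Sepal.Length

# Hypothetical helper: sample standard deviation written out from the formula
sample_sd <- function(v) {
  n <- length(v)
  sqrt(sum((v - mean(v))^2) / (n - 1))
}

s_x <- sample_sd(x)
s_y <- sample_sd(y)
r <- cor(x, y)  # Pearson correlation coefficient

# Slope from b = r * s_y / s_x
b <- r * s_y / s_x

# Cross-checks against R's built-ins; both should print TRUE
all.equal(s_x, sd(x))
all.equal(b, unname(coef(lm(y ~ x))[2]))
```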

Finding a

  • To find the y-intercept a, we use the equation \(a = \bar y - b \cdot \bar x\)

  • The means \(\bar y\) and \(\bar x\) are conveniently computed in R with the mean() function
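Putting both formulas together gives the full line. The base-R sketch below uses iris. It computes b from cor() and sd(), then a from mean(), and checks the pair against lm(). The prediction at a sepal width of 3 is only an illustrative input.

```r
x <- iris$Sepal.Width
y <- iris$Sepal.Length

b <- cor(x, y) * sd(y) / sd(x)   # slope: b = r * s_y / s_x
a <- mean(y) - b * mean(x)       # intercept: a = ybar - b * xbar

# lm() reports (Intercept) then slope, so compare c(a, b); should print TRUE
from_lm <- unname(coef(lm(y ~ x)))
all.equal(c(a, b), from_lm)

# Use the fitted line y = b*x + a to predict sepal length at a width of 3
b * 3 + a
```

Since \(a = \bar y - b \bar x\), the fitted line always passes through the point \((\bar x, \bar y)\).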

Base Scatter Plot

GGPlot Code

ggplot2's geom_smooth() can fit and draw the linear regression line automatically:

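library(ggplot2)

# Scatter plot of sepal length against sepal width with a least-squares line
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length)) +
  geom_point() +
  labs(x = "Sepal Width", y = "Sepal Length",
       title = "Sepal Width vs. Sepal Length with Linear Regression Line") +
  # method = "lm" fits y ~ x by least squares; se = FALSE hides the confidence band
  geom_smooth(method = "lm", se = FALSE, formula = y ~ x)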

GGPlot Plot

Plotly Code

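Plotly's plot_ly() does not fit a regression line for you, so the model has to be fit separately with lm() and its fitted values added as a line. Missing values should be filtered out first. Because lm() drops incomplete rows, fitted() could otherwise return fewer values than there are rows to plot.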

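library(dplyr)
library(plotly)

# Drop rows missing either variable so fitted() lines up with the plotted rows
filtered <- iris %>% filter(!is.na(Sepal.Width), !is.na(Sepal.Length))
# Fit Sepal.Length = b * Sepal.Width + a by least squares
lr <- lm(Sepal.Length ~ Sepal.Width, data = filtered)
filtered %>%
  plot_ly(x = ~Sepal.Width) %>%
  add_markers(y = ~Sepal.Length, name = "Observed") %>%
  add_lines(y = fitted(lr), name = "Fitted line")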
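Why filtering matters: with R's default na.action (na.omit), lm() silently drops incomplete rows, so fitted() returns fewer values than the data has rows. iris itself has no missing values. This base-R sketch therefore injects one into a hypothetical copy, `iris_na`, purely for illustration.

```r
# Hypothetical copy of iris with one missing width, for illustration only
iris_na <- iris
iris_na$Sepal.Width[1] <- NA

fit_na <- lm(Sepal.Length ~ Sepal.Width, data = iris_na)
nrow(iris_na)            # 150 rows in the data
length(fitted(fit_na))   # 149 fitted values: the incomplete row was dropped

# Filtering first keeps the data and the fitted values the same length
complete <- iris_na[!is.na(iris_na$Sepal.Width), ]
fit_ok <- lm(Sepal.Length ~ Sepal.Width, data = complete)
nrow(complete) == length(fitted(fit_ok))  # TRUE
```

An alternative is lm(..., na.action = na.exclude), which pads fitted() with NA so it keeps the original length.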

Plotly Graph