2025-03-16

NOTE

Simple Linear Regression

  • Only one dependent and one independent variable
  • Assumes the relationship between the independent and dependent variable is linear
  • The most common way to fit a Simple Linear Regression is Ordinary Least Squares (OLS), which picks the line that minimizes the sum of squared residuals
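
As a rough illustration of what Ordinary Least Squares does, the sketch below computes the least-squares line for the iris petal measurements by hand and checks it against lm(). The helper rss() and all variable names are made up for this example, and the closed-form expressions are the standard textbook ones for a single predictor rather than anything taken from these slides.

```r
# Residual sum of squares for a candidate line y = intercept + slope * x
# (rss is a hypothetical helper, defined only for this sketch)
rss <- function(intercept, slope, x, y) {
  sum((y - (intercept + slope * x))^2)
}

x <- iris$Petal.Width
y <- iris$Petal.Length

# Standard closed-form OLS estimates for one predictor
slope_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept_hat <- mean(y) - slope_hat * mean(x)

# lm() should arrive at the same two coefficients
fit <- lm(Petal.Length ~ Petal.Width, data = iris)
ols_matches_lm <- isTRUE(all.equal(unname(coef(fit)),
                                   c(intercept_hat, slope_hat)))

# A nudged line should have a larger residual sum of squares
rss_ols <- rss(intercept_hat, slope_hat, x, y)
rss_nudged <- rss(intercept_hat + 0.1, slope_hat - 0.05, x, y)
```

If the sketch behaves as expected, ols_matches_lm is TRUE and rss_ols is smaller than rss_nudged, which is the sense in which the fitted line is "least squares".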

Equations

\(y = b_0 + b_1 \cdot x_1\)

  • \(b_0\) is the y-intercept
  • \(b_1\) is the slope
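
To make the roles of the intercept and slope concrete, here is a small sketch that pulls both coefficients out of a fitted model and evaluates the equation by hand for one new observation. The petal width of 1.5 is an arbitrary value chosen for illustration, and the variable names are assumptions of this example.

```r
# Fit petal length as a linear function of petal width
fit <- lm(Petal.Length ~ Petal.Width, data = iris)

# Extract the y-intercept (b_0) and the slope (b_1)
b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["Petal.Width"]]

# Evaluate y = b_0 + b_1 * x_1 by hand for an arbitrary new width
new_width <- 1.5
pred_by_hand <- b0 + b1 * new_width

# predict() should give the same value
pred_by_predict <- predict(fit, newdata = data.frame(Petal.Width = new_width))
```

Note that predict() with newdata only works this way when the model was fitted with a formula over column names plus a data argument, as above.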

Equations (continued)

\(a = r \cdot \frac{s_y}{s_x}\)

  • \(a\) is the slope (\(b_1\) in the model equation)
  • \(r\) is the correlation between \(x\) and \(y\)
  • \(s_y\) and \(s_x\) are the standard deviations of \(y\) and \(x\)

\(i = \bar y - r \cdot \frac{s_y}{s_x} \cdot \bar x\)

  • \(i\) is the intercept (\(b_0\) in the model equation)
  • \(\bar y\) and \(\bar x\) are the means of \(y\) and \(x\)
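
The two formulas above can be checked numerically. The following sketch computes the slope and intercept from the correlation, standard deviations, and means of the iris petal measurements, then compares them with the coefficients reported by lm(). Variable names are chosen for this example only.

```r
x <- iris$Petal.Width
y <- iris$Petal.Length

# Slope from the correlation and the two standard deviations
r <- cor(x, y)
a <- r * sd(y) / sd(x)

# Intercept from the two means and the slope
i <- mean(y) - a * mean(x)

# Coefficients from lm(), in the order (intercept, slope)
fit <- lm(y ~ x)
formulas_match_lm <- isTRUE(all.equal(unname(coef(fit)), c(i, a)))
```

Under these assumptions formulas_match_lm should come out TRUE, since both routes describe the same least-squares line.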

Using R and plotly

Uses the Iris dataset:

library(plotly)

# Fit petal length as a linear function of petal width
mod = lm(Petal.Length ~ Petal.Width, data = iris)

# Scatter plot of the data with the fitted regression line on top
fig = plot_ly(x = iris$Petal.Width, y = iris$Petal.Length, 
              type = "scatter",
              mode = "markers", name = "data") %>% 
  add_lines(x = iris$Petal.Width, y = fitted(mod), 
            name = "fitted") %>%
  layout(xaxis = list(title = "Petal Width"), 
         yaxis = list(title = "Petal Length"))

Plot

config(fig, displaylogo = FALSE)

Another Example

Uses the Trees dataset:

# Fit timber volume as a linear function of tree girth
mod = lm(Volume ~ Girth, data = trees)

# Scatter plot of the data with the fitted regression line on top
fig = plot_ly(x = trees$Girth, y = trees$Volume, 
              type = "scatter",
              mode = "markers", name = "data") %>% 
  add_lines(x = trees$Girth, y = fitted(mod), 
            name = "fitted") %>%
  layout(xaxis = list(title = "Girth"), 
         yaxis = list(title = "Volume"))

Plot

config(fig, displaylogo = FALSE)

Example using ggplot2

Uses the Iris dataset:

library(ggplot2)

# Scatter plot of petal length against petal width
g = ggplot(iris, aes(x = Petal.Width, y = Petal.Length)) + 
  geom_point()

Plot

g + geom_smooth(method = 'lm', se = FALSE)
  `geom_smooth()` using formula = 'y ~ x'