2024-10-20

Slide 1: What is simple linear regression?

  • Simple linear regression models the relationship between two variables.
  • We usually use x and y as the variables in our equation.
  • We estimate the relationship between the dependent variable (Y) and the independent variable (X) using a linear equation.

Slide 2: Simple Linear Regression Equation (using LaTeX)

The simple linear regression equation is modeled by: \[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • \(Y\) is the dependent variable
  • \(X\) is the independent variable
  • \(\beta_0\) is the intercept
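  • \(\beta_1\) is the slope
  • \(\epsilon\) is the error term, the part of \(Y\) that the line does not explain
  • Think back to algebra: y = mx + b, where m plays the role of \(\beta_1\) and b the role of \(\beta_0\)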

Slide 3:

  • Say we have the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 for our x values and 2, 5, 4, 7, 6, 9, 8, 11, 10, 12 for the y value associated with each point.
  • Let’s plot these points using R in the next slide.

Slide 4: R Output slides

  • R code for how to create a scatter plot for these data points.
library(ggplot2)
# Ten observations: x is the predictor, y is the response.
data <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                   y = c(2, 5, 4, 7, 6, 9, 8, 11, 10, 12))
# Scatter plot of y against x.
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  labs(title = "Scatterplot of X vs Y", x = "X", y = "Y")

Slide 5: Image of the scatter plot R code created previously

Slide 6: Fitting the line of best fit to the scatter plot

Slide 7: Linear Regression Plot using Plotly

## (Intercept)           x 
##    1.866667    1.006061
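
The coefficients printed above are consistent with an ordinary least-squares fit of the Slide 4 data. The slides do not show the fitting code, so what follows is only a minimal sketch that assumes base R's lm() was used; the names dat, fit, and coefs are illustrative and do not come from the original deck.

```r
# Same data as the scatter plot on Slide 4.
dat <- data.frame(x = 1:10,
                  y = c(2, 5, 4, 7, 6, 9, 8, 11, 10, 12))

# Fit y = beta_0 + beta_1 * x by ordinary least squares.
fit <- lm(y ~ x, data = dat)

# coef() returns a named numeric vector with elements "(Intercept)" and "x".
coefs <- coef(fit)
print(coefs)
```

To draw the fitted line on the ggplot2 scatter plot, adding geom_smooth(method = "lm", se = FALSE) to the plot is one common option, though the slides do not say which approach was used.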

Slide 8: Using our data to plug into the equation.

  • Extracting the coefficients from the model gives the intercept, 1.866667, and the slope (the coefficient on x), 1.006061. The fitted equation for this data frame, based on the R output, is: \[ Y = 1.866667 + 1.006061 \cdot X \]
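
As a check on these coefficients, the estimates can be worked by hand for the Slide 4 data (x = 1, ..., 10 and the y values from the R code). This uses the standard closed-form least-squares formulas for simple linear regression, which the slides do not state explicitly: \[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

With \(\bar{x} = 5.5\), \(\bar{y} = 7.4\), \(\sum (x_i - \bar{x})^2 = 82.5\), and \(\sum (x_i - \bar{x})(y_i - \bar{y}) = 83\): \[ \hat{\beta}_1 = \frac{83}{82.5} \approx 1.006061, \qquad \hat{\beta}_0 = 7.4 - \frac{83}{82.5} \cdot 5.5 \approx 1.866667 \]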

Slide 9: Applying numbers to the equation.

Using the equation from earlier: \[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • For every one-unit increase in x, the dependent variable y increases by approximately 1 (the slope, 1.006061).
  • When x is 0, the model predicts y equal to the intercept, 1.866667.
  • \(Y\) is the dependent variable.
  • \(X\) is the independent variable.
  • \(\beta_0\) is 1.866667 (the intercept).
  • \(\beta_1\) is 1.006061 (the slope).
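
The interpretation of the slope and intercept can be checked numerically. The sketch below plugs the coefficients reported on Slide 7 into the fitted line; predict_y is a hypothetical helper introduced here for illustration, not something from the original slides.

```r
# Coefficients as reported on Slide 7.
b0 <- 1.866667  # intercept
b1 <- 1.006061  # slope

# Hypothetical helper: predicted y for a given x under the fitted line.
predict_y <- function(x) b0 + b1 * x

predict_y(0)                 # prediction at x = 0 equals the intercept
predict_y(6) - predict_y(5)  # a one-unit step in x changes y by the slope
predict_y(c(0, 5, 10))       # vectorized: predictions for several x values
```

The error term \(\epsilon\) is left out because a prediction uses only the fitted line; individual observations will generally sit a little above or below it.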

Slide 10: Conclusion

  • Simple linear regression is a widely used statistical technique for modeling the relationship between an independent variable and a dependent variable.
  • It is used to predict trends and to understand how the two variables relate to each other.