Formula for simple linear regression: y = \(\beta_0\) + \(\beta_1\) x
- y = predicted value
- x = input
- \(\beta_1\) = slope
- \(\beta_0\) = intercept
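As a minimal sketch, the formula can be written as a small R function. The name `predict_y` and the coefficient values below are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: evaluate y = beta0 + beta1 * x
predict_y <- function(x, beta0, beta1) {
  beta0 + beta1 * x
}

# Illustrative values (assumed, not taken from a fitted model)
predict_y(x = 3, beta0 = 1, beta1 = 2)  # 1 + 2 * 3 = 7
```

Because R arithmetic is vectorized, the same function also accepts a vector of inputs, such as `predict_y(c(0, 1, 2), beta0 = 1, beta1 = 2)`.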
2025-10-20
Let’s do an example using the linear regression formula.
Assume we have a data set with 2 points: (0,5), (6,17). Find the best-fit line equation.
Formula for best-fit line: y = \(\beta_0\) + \(\beta_1\) x
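Step 1: Calculate \(\beta_1\) (slope). With only two points, the best-fit line passes through both of them, so the slope is the change in y divided by the change in x:
Formula for slope: \(\beta_1 = \frac{y_2 - y_1}{x_2 - x_1}\)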
Apply: \(\beta_1 = \frac{17 - 5}{6 - 0}\) = 2
Step 2: Calculate \(\beta_0\) (intercept): Rearrange the best-fit line formula to solve for \(\beta_0\), giving \(\beta_0 = y - \beta_1 x\), then substitute the calculated slope and either of the two data points.
Apply using (6,17): \(\beta_0\) = 17 - 2 * 6 = 5
Plugging in our findings, we have the final equation!
y = 5 + 2 x
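The two-point shortcut works here because a line can pass exactly through two points. With more points, "best fit" usually means the least-squares line. The sketch below assumes the standard closed-form estimates \(\beta_1 = \mathrm{cov}(x, y) / \mathrm{var}(x)\) and \(\beta_0 = \bar{y} - \beta_1 \bar{x}\); the variable names are illustrative. Applied to the two points from the example, it should reproduce the same line.

```r
# Points from the worked example: (0, 5) and (6, 17)
x <- c(0, 6)
y <- c(5, 17)

# Least-squares slope and intercept (standard closed-form estimates)
beta1 <- cov(x, y) / var(x)
beta0 <- mean(y) - beta1 * mean(x)

# Named vector holding the intercept and slope
c(intercept = beta0, slope = beta1)
```

The same two lines of arithmetic apply unchanged when `x` and `y` hold more than two observations.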
R code to make a plot of the previous example:
# Load plotting library
library(ggplot2)

# Define the x and y coordinates of the points (0, 5) and (6, 17)
x_vals = c(0, 6)
y_vals = c(5, 17)

# Make a data frame of the points
data = data.frame(x = x_vals, y = y_vals)

# Plot the points and the best-fit line
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula = 'y ~ x'
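`geom_smooth(method="lm")` fits the line with R's `lm()` function. As a quick check, and assuming the same two points, the fitted coefficients can be read off directly and compared with the hand calculation (intercept 5, slope 2). The object names `data` and `fit` are illustrative.

```r
# Same two points as in the plot: (0, 5) and (6, 17)
data <- data.frame(x = c(0, 6), y = c(5, 17))

# Fit y ~ x by ordinary least squares
fit <- lm(y ~ x, data = data)

# Named vector with the intercept and the slope for x
coef(fit)

# Predict y for a new x value using the fitted line
predict(fit, newdata = data.frame(x = 3))
```

With only two points the fit is exact, so `summary(fit)` would warn about a perfect fit; `coef()` and `predict()` are enough for this check.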
Let’s see what the best-fit line looks like in a real-world dataset in R, which has many more data points.
This plot shows the murder rate across several urban populations: