2025-10-25

What is Simple Linear Regression?

  • A linear regression model with one independent variable and one dependent variable, fitted as a straight line in two dimensions
  • Predicts the value of the dependent variable as a linear function of the independent variable
  • Equation: \(\hat{y} = b_0 + b_1*x\)
     - Where \(\hat{y}\) is the predicted value, \(b_0\) is the y-intercept, \(b_1\) is the slope, and \(x\) is the independent variable
  • The coefficient of determination, \(R^2\), measures how well the line fits the data: it is the proportion of the variance in the dependent variable that the model explains, so values close to 1 indicate a close fit
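
The bullets above do not say how \(b_0\) and \(b_1\) are chosen. Under ordinary least squares, the usual choice and the one R's lm function makes, they are the values that minimize the sum of squared residuals. The sample means \(\bar{x}\) and \(\bar{y}\) and the index \(i\) are notation introduced here, not taken from the slides:

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

The second formula shows that the fitted line always passes through the point \((\bar{x}, \bar{y})\).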

Example 1

To estimate the linear fit by hand, two data points can be used to compute the slope and intercept.
For example, taking (1,3) and (5,17) gives the slope \(b_1 = (17-3)/(5-1) = 3.5\), and substituting either point, for example (5,17), gives \(b_0 = 17-3.5*5 = -0.5\), so the linear model is
  \(\hat{y} = -0.5 + 3.5*x\)
The graph below plots this line against the data. The line passes very close to every point, so the model should predict unknown values of the dependent variable accurately.
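
A two-point estimate uses only part of the data, so it can differ slightly from the least-squares line. The sketch below compares the two, assuming the Example 1 data are the five points listed in the later R-squared calculation; the variable names are illustrative.

```r
# Example 1 data (the five points used in the R-squared calculation)
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 10, 14, 17)

# Two-point estimate from the first and last observations: (1, 3) and (5, 17)
b1_two_point <- (y[5] - y[1]) / (x[5] - x[1])  # slope
b0_two_point <- y[5] - b1_two_point * x[5]     # intercept

# Least-squares fit over all five points
fit <- lm(y ~ x)
b0_ls <- unname(coef(fit)[1])
b1_ls <- unname(coef(fit)[2])

c(intercept = b0_two_point, slope = b1_two_point)  # -0.5 and 3.5
c(intercept = b0_ls, slope = b1_ls)                # approximately -0.3 and 3.5
```

Both approaches give the same slope here; only the intercept shifts, because the least-squares line uses all five points rather than two.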

Example 2

A linear model can also be inaccurate, as shown below. The accuracy depends on the data set: when the independent and dependent variables have a linear relationship the model fits well, and when they do not the model fits poorly.
Equation: \(\hat{y} = 3 + 9.2*x\)
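
As a minimal illustration of a poor linear fit, the sketch below uses hypothetical data (not the data behind Example 2) in which y is fully determined by x, but through a symmetric curve rather than a line.

```r
# Hypothetical data: y = x^2 on a range symmetric around zero
x_curve <- -3:3
y_curve <- x_curve^2

fit_curve <- lm(y_curve ~ x_curve)

coef(fit_curve)               # slope is essentially 0, intercept is mean(y_curve) = 4
summary(fit_curve)$r.squared  # essentially 0
```

The fitted line is flat, so it explains none of the variation in y even though the relationship between the variables is exact.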

Example 3

These models can also be created without manual calculation, which is useful for larger or more complex data sets.
In R, a simple linear regression can be fitted with the lm function, and the fitted function returns the values on the regression line, as shown below using the mtcars data set.

library(plotly)  # provides plot_ly(), add_lines(), layout(), config() and %>%

data(mtcars)

# Fit horsepower as a linear function of engine displacement
mod <- lm(hp ~ disp, data = mtcars)
x <- mtcars$disp
y <- mtcars$hp

# Axis titles and fonts
xax <- list(
  title = "Displacement",
  titlefont = list(family = "Modern Computer Roman")
)

yax <- list(
  title = "Horsepower",
  titlefont = list(family = "Modern Computer Roman"),
  range = c(0, 300)
)

# Scatter plot of the data with the fitted regression line on top
fig <- plot_ly(x = x, y = y, type = "scatter", mode = "markers", name = "data",
               width = 800, height = 430) %>%
  add_lines(x = x, y = fitted(mod), name = "fitted") %>%
  layout(xaxis = xax, yaxis = yax,
         margin = list(l = 150, r = 50, b = 20, t = 40))

config(fig, displaylogo = TRUE)
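
Once the model is fitted, it can be used to predict the dependent variable for new values. The sketch below predicts horsepower for a hypothetical car with a displacement of 200; the value 200 and the names new_car, pred and manual are illustrative.

```r
data(mtcars)
mod <- lm(hp ~ disp, data = mtcars)

coef(mod)  # intercept b0 and slope b1 of the fitted line

# Prediction for a hypothetical displacement of 200
new_car <- data.frame(disp = 200)
pred <- predict(mod, newdata = new_car)
pred

# The same prediction computed directly from y-hat = b0 + b1 * x
manual <- coef(mod)[["(Intercept)"]] + coef(mod)[["disp"]] * 200
manual
```

The column name in newdata must match the predictor name used in the model formula, here disp.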

Determining the Accuracy of these Models

As mentioned earlier, the accuracy of these models can be assessed with \(R^2\): the closer it is to 1, the better the model fits the data.
It can be computed with statistical software such as R.
Using Example 3, the summary function reports \(R^2 = 0.6256\) (the "Multiple R-squared" line; 0.6131 is the adjusted value). Displacement therefore explains about 63% of the variance in horsepower: there is a moderate linear relationship between the two variables, but the linear regression model would not be especially accurate.

summary(mod)
## 
## Call:
## lm(formula = hp ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.623 -28.378  -6.558  13.588 157.562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  45.7345    16.1289   2.836  0.00811 ** 
## disp          0.4375     0.0618   7.080 7.14e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.65 on 30 degrees of freedom
## Multiple R-squared:  0.6256, Adjusted R-squared:  0.6131 
## F-statistic: 50.13 on 1 and 30 DF,  p-value: 7.143e-08
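
The summary output prints two R-squared values, which are easy to confuse. The sketch below extracts both from the summary object and reproduces the adjusted value from the standard adjustment formula; the variable names are illustrative.

```r
data(mtcars)
mod <- lm(hp ~ disp, data = mtcars)
s <- summary(mod)

r2     <- s$r.squared      # "Multiple R-squared" in the printout
adj_r2 <- s$adj.r.squared  # "Adjusted R-squared" in the printout

# Adjusted R-squared penalizes R-squared for the number of predictors p
n <- nrow(mtcars)  # number of observations
p <- 1             # one predictor (disp)
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(c(r2 = r2, adj_r2 = adj_r2, adj_manual = adj_manual), 4)
```

With a single predictor the two values are close, but the unadjusted one is the \(R^2\) defined for simple linear regression.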

R-squared by Hand

\(R^2\) for simple regression can be calculated using the formula:
\(R^2 = \frac{(n\sum xy - (\sum x)(\sum y))^2}{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}\)
Where \(n\) is the number of observations in the data set.
This computation is tedious by hand and becomes more time-consuming as observations are added.
Therefore, it is recommended to use software or online tools to compute \(R^2\) for large data sets.

Calculation Example Using Example 1 with Code

x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 10, 14, 17)

n <- length(x)
numerator <- (n * sum(x * y) - sum(x) * sum(y))^2
denominator <- (n * sum(x^2) - (sum(x))^2) * (n * sum(y^2) - (sum(y))^2)

R2 <- numerator / denominator
R2
## [1] 0.997557
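
For simple linear regression, the formula above is the square of the Pearson correlation between x and y, and it also equals one minus the ratio of residual to total sum of squares. The sketch below checks both on the same data; the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 10, 14, 17)

# R-squared as the squared Pearson correlation
r2_cor <- cor(x, y)^2

# R-squared as 1 - SS_res / SS_tot from the least-squares fit
fit <- lm(y ~ x)
ss_res <- sum(residuals(fit)^2)   # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)    # total variation in y
r2_ss <- 1 - ss_res / ss_tot

c(r2_cor = r2_cor, r2_ss = r2_ss)  # both approximately 0.997557
```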

Comparison of the Two Methods

The two methods can be compared.
Going back to Example 1, the previous slide calculated \(R^2 = 0.9976\) by hand.
The same value can be obtained by applying the summary function to an lm fit.
For this example both values are the same, but in general they may differ slightly because of rounding, or because of how missing data are handled.

summary(lm(y ~ x))$r.squared
## [1] 0.997557
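
Missing data is one case where the two methods diverge. The sketch below uses a hypothetical copy of the Example 1 data with one response set to NA: the by-hand formula returns NA because sum() propagates missing values, while lm drops incomplete rows under R's default na.action setting. The function name r2_by_hand is illustrative.

```r
x_na <- c(1, 2, 3, 4, 5)
y_na <- c(3, 7, NA, 14, 17)  # hypothetical: third response is missing

# The by-hand formula wrapped in a function
r2_by_hand <- function(x, y) {
  n <- length(x)
  (n * sum(x * y) - sum(x) * sum(y))^2 /
    ((n * sum(x^2) - sum(x)^2) * (n * sum(y^2) - sum(y)^2))
}

r2_by_hand(x_na, y_na)                       # NA
r2_lm <- summary(lm(y_na ~ x_na))$r.squared  # uses the 4 complete pairs
r2_lm

# Dropping incomplete pairs first makes the two methods agree again
keep <- complete.cases(x_na, y_na)
r2_clean <- r2_by_hand(x_na[keep], y_na[keep])
all.equal(r2_clean, r2_lm)                   # TRUE
```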

Thank you!