2024-10-18

What is Simple Linear Regression

  • An approach for predicting a response with only one feature.
  • We assume that the two variables (dependent and independent) are LINEARLY related.

The Math

  • We can visually observe linear correlation, but how can we obtain the line that best fits the data?

Least Squares

  • \(h(x_i) = \beta_0 + \beta_1 x_i\), where \(\beta_0\) is the intercept and \(\beta_1\) is the slope

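  • We want to find the \(\beta\) values for which the line fits the data best, meaning the sum of squared differences between \(h(x_i)\) and the observed \(y_i\) is as small as possible

  • In Least Squares, we stack the data into a design matrix \(X\) and a response vector \(y\), which gives a closed-form solution for \(\beta\)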

  • In other words, our goal is to calculate \(\beta = (X^TX)^{-1}X^Ty\)

    • \(X\) is the design matrix (with a column of 1s for the intercept and a column for the independent variable).

    • \(y\) is the vector of observed values (dependent variable).

    • \(\beta\) is the vector of coefficients (intercept and slope).
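
As a sketch of where this formula comes from (the standard derivation, assuming \(X^TX\) is invertible), write the sum of squared residuals in matrix form and set its gradient to zero:

\[
S(\beta) = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i\right)^2 = (y - X\beta)^T (y - X\beta)
\]

\[
\nabla_\beta S(\beta) = -2X^T(y - X\beta) = 0
\;\Longrightarrow\; X^TX\beta = X^Ty
\;\Longrightarrow\; \beta = (X^TX)^{-1}X^Ty
\]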

The Code For Least Squares

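# Assumes x and y are numeric vectors of equal length, defined earlier
# Step 1: Construct the design matrix X (1st column: 1s for the intercept, 2nd column: x values)
X <- cbind(1, x)
# Step 2: Compute X^T * X
XtX <- t(X) %*% X
# Step 3: Compute X^T * y
Xty <- t(X) %*% y
# Step 4: Compute (X^T * X)^(-1); solve() with one argument returns the matrix inverse
XtX_inv <- solve(XtX)
# Step 5: Compute the coefficients beta = (X^T * X)^(-1) * X^T * y
beta <- XtX_inv %*% Xty
# Print the result (row 1: intercept, row 2: slope)
cat("Coefficients (Intercept and Slope):\n")
print(beta)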
## Coefficients (Intercept and Slope):
##        [,1]
##   0.8487934
## x 1.7401009
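
One way to sanity-check the normal-equation result is to compare it with R's built-in `lm()`. The sketch below uses synthetic data, which is an assumption for illustration and not the data behind the output above; the names `beta_normal`, `beta_lm`, and `coef_match` are likewise hypothetical.

```r
# Synthetic data (an assumption for illustration; not the data used above)
set.seed(42)
x <- runif(50, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(50, sd = 2)

# Normal-equation solution; solve(A, b) solves A %*% beta = b
# without forming the inverse explicitly
X <- cbind(1, x)
beta_normal <- solve(t(X) %*% X, t(X) %*% y)

# Reference fit from R's built-in linear model function
beta_lm <- coef(lm(y ~ x))

# The two estimates should agree up to floating-point error
coef_match <- isTRUE(all.equal(as.numeric(beta_normal), as.numeric(beta_lm)))
print(coef_match)
```

Passing both the matrix and the right-hand side to `solve()` is generally preferred over computing the inverse first, since it is usually faster and numerically more stable. The explicit inverse is kept in the steps above because it mirrors the formula.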

Plotting the fitted line
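
A minimal sketch of how the fitted line could be drawn with base R graphics. The data here is synthetic (an assumption for illustration), and the plot styling is arbitrary.

```r
# Synthetic data (an assumption for illustration)
set.seed(1)
x <- runif(40, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(40, sd = 2)

# Least squares coefficients: beta[1] is the intercept, beta[2] is the slope
X <- cbind(1, x)
beta <- solve(t(X) %*% X, t(X) %*% y)

# Scatter plot of the data with the fitted line drawn on top
plot(x, y, pch = 19, col = "grey40",
     main = "Least squares fit", xlab = "x", ylab = "y")
abline(a = beta[1], b = beta[2], col = "red", lwd = 2)
```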

Errors In Simple Linear Regression

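  • How do we assess how well our line fits the data? By measuring the error between the predicted values \(\hat y_i\) and the observed values \(y_i\).
    • Mean Squared Error (L2): \(\frac{1}{n} \sum_{i=1}^n (\hat y_{i} - y_i)^2\)
      • A natural choice for regression: minimizing it is equivalent to maximum likelihood estimation when the errors are assumed to follow a Gaussian distribution.
    • Mean Absolute Error (L1): \(\frac{1}{n} \sum_{i=1}^n |\hat y_{i} - y_i|\)
      • More robust when the data has outliers, since large errors are not squared.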
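
To make the outlier point concrete, here is a small worked example with made-up numbers (an assumption for illustration; `mse_fn` and `mae_fn` are hypothetical helper names). Changing a single prediction so that its error grows from 1 to 10 raises the MAE from 1 to 3.25, but raises the MSE from 1 to 25.75.

```r
# Hypothetical helpers for the two error measures
mse_fn <- function(actual, predicted) mean((actual - predicted)^2)
mae_fn <- function(actual, predicted) mean(abs(actual - predicted))

actual       <- c(2, 4, 6, 8)
clean_pred   <- c(3, 3, 7, 7)    # residuals: -1, 1, -1, 1
outlier_pred <- c(3, 3, 7, 18)   # last residual becomes -10

mse_clean   <- mse_fn(actual, clean_pred)     # (1 + 1 + 1 + 1) / 4   = 1
mae_clean   <- mae_fn(actual, clean_pred)     # (1 + 1 + 1 + 1) / 4   = 1
mse_outlier <- mse_fn(actual, outlier_pred)   # (1 + 1 + 1 + 100) / 4 = 25.75
mae_outlier <- mae_fn(actual, outlier_pred)   # (1 + 1 + 1 + 10) / 4  = 3.25

print(c(mse_clean = mse_clean, mse_outlier = mse_outlier))
print(c(mae_clean = mae_clean, mae_outlier = mae_outlier))
```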

Plotting the Error (Residuals)
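
A sketch of one way to visualize the residuals with base R graphics: each vertical segment connects an observed point to its predicted value on the fitted line. The data is synthetic (an assumption for illustration), and `predicted` and `residuals_ls` are hypothetical names.

```r
# Synthetic data (an assumption for illustration)
set.seed(1)
x <- runif(40, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(40, sd = 2)

# Fit by least squares and compute predictions and residuals
X <- cbind(1, x)
beta <- solve(t(X) %*% X, t(X) %*% y)
predicted <- as.numeric(X %*% beta)
residuals_ls <- y - predicted

# Data, fitted line, and one vertical segment per residual
plot(x, y, pch = 19, col = "grey40",
     main = "Residuals of the least squares fit", xlab = "x", ylab = "y")
abline(a = beta[1], b = beta[2], col = "red", lwd = 2)
segments(x0 = x, y0 = y, x1 = x, y1 = predicted, col = "blue")
```

Because the model includes an intercept, the least squares residuals sum to zero up to floating-point error, so the segments above and below the line balance out.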

Getting the MSE and MAE

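# Mean squared error: the average of the squared residuals
calculate_mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}
# Mean absolute error: the average of the absolute residuals
calculate_mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Assumes df holds the observed values in df$y and the fitted values in df$predicted
mse <- calculate_mse(df$y, df$predicted)
mae <- calculate_mae(df$y, df$predicted)
print(paste("Mean Squared Error (MSE):", round(mse, 4)))
print(paste("Mean Absolute Error (MAE):", round(mae, 4)))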
## [1] "Mean Squared Error (MSE): 4.2103"
## [1] "Mean Absolute Error (MAE): 1.7082"