2025-03-17

What is Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. Simple linear regression, the focus here, uses a single independent variable.

To do this, the two variables are treated as \(x\) and \(y\) in the linear equation \(y = wx + b\), where \(y\) is the dependent variable, \(x\) is the independent variable, \(w\) is the weight, and \(b\) is the bias. The model then adjusts the weight and bias to find the most accurate predictions of \(y\) given \(x\).

Loss Function

To improve the ‘guesses’ of the weight and bias, you have to first find how accurate the current guess is.

The most commonly used loss function in linear regression is the mean squared error (MSE): \(MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\). Here \(y_i\) is the actual value at point \(i\), \(\hat{y}_i\) is the value the model predicted for that point, and \(n\) is the number of data points.

Put simply, for every \(x\) that is given, the model predicts the output, takes the delta between that prediction and the actual \(y\) value at that point, and squares it so it cannot be negative. It then sums all of the squared deltas and divides by the number of inputs to get their average. (The delta is squared rather than passed through the absolute value because the absolute value function is not differentiable at zero, which matters for the optimization step.)
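The steps above can be sketched in a few lines of R. This is a minimal illustration, not part of the original slides: the helper name `mse`, the toy data, and the guessed values of `w` and `b` are all made up for the example.

```r
# Mean squared error between observed values y and predictions y_hat
mse <- function(y, y_hat) {
  mean((y - y_hat)^2)
}

# Toy data (made up for illustration) and an arbitrary guess for w and b
x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)
w <- 1.5
b <- 1

y_hat <- w * x + b      # predictions from y = wx + b
loss <- mse(y, y_hat)   # average of the squared deltas
print(loss)
```

A perfect guess (here `w = 2`, `b = 1`) would give a loss of exactly zero, which is what the optimization step is trying to approach.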

Optimization

To find more accurate values for the weight and bias, we need the gradient of the loss with respect to them, and we use a special technique to compute it.

The gradient is a vector that points in the direction of steepest ascent on a surface, and computing it often requires the chain rule. When computing the gradient by hand this isn’t an issue, but because of memory constraints a computer cannot follow the same steps. Instead, we have to use automatic differentiation (auto-differentiation).
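Before looking at how auto-differentiation works, it may help to see what the gradient is used for. For the one-variable model the gradient of the MSE can be written out by hand, so the sketch below does that instead of using auto-differentiation. It assumes plain gradient descent as the update rule; the slides do not name one, and the learning rate `lr`, the step count, and the toy data are all assumptions made for illustration.

```r
# Toy data generated from y = 2x + 1 (made up for illustration)
x <- c(0, 1, 2, 3, 4)
y <- 2 * x + 1

w <- 0          # initial guess for the weight
b <- 0          # initial guess for the bias
lr <- 0.05      # learning rate (assumed value)

for (step in 1:2000) {
  y_hat <- w * x + b
  err <- y - y_hat
  # Partial derivatives of MSE = mean((y - (w*x + b))^2)
  grad_w <- -2 * mean(x * err)
  grad_b <- -2 * mean(err)
  # Step against the gradient, since it points uphill
  w <- w - lr * grad_w
  b <- b - lr * grad_b
}

print(c(w = w, b = b))
```

Because the gradient points in the direction of steepest ascent, each update subtracts it, which moves `w` and `b` toward lower loss.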

Auto-differentiation breaks an equation down into its most basic parts and substitutes intermediate variables for them. For example, \(\ln(x_1) + x_1x_2 - \sin(x_2)\) at the point \((2,5)\) is represented as \(v_{-1} = x_1 = 2\), \(v_0 = x_2 = 5\), \(v_1 = \ln(v_{-1})\), \(v_2 = v_{-1} \cdot v_0\), \(v_3 = \sin(v_0)\), \(v_4 = v_1 + v_2\), \(v_5 = v_4 - v_3\), and \(y = v_5\).
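The decomposition above can be traced directly in R, one intermediate variable per line. This is only a sketch of the forward evaluation; the variable name `v_m1` stands in for \(v_{-1}\), since R names cannot contain a minus sign.

```r
# Inputs: the point (2, 5)
x1 <- 2
x2 <- 5

# Forward trace of ln(x1) + x1*x2 - sin(x2)
v_m1 <- x1            # v_{-1}
v_0  <- x2            # v_0
v_1  <- log(v_m1)     # ln(v_{-1}); log() is the natural log in R
v_2  <- v_m1 * v_0
v_3  <- sin(v_0)
v_4  <- v_1 + v_2
v_5  <- v_4 - v_3
y    <- v_5

print(y)
```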

Optimization (cont.)

You then go backwards through the ‘tree’, finding the partial derivative at every node along the way, so you can then find the gradient of the original equation (\(\bar v_{-1}\) and \(\bar v_0\)).

For the previous example, working backwards you get:
\(\bar v_5 = \frac{\partial y}{\partial v_5} = 1\), \(\bar v_4 = \frac{\partial y}{\partial v_4} = \frac{\partial y}{\partial v_5} \frac{\partial v_5}{\partial v_4}\), \(\bar v_3 = \frac{\partial y}{\partial v_3} = \frac{\partial y}{\partial v_5} \frac{\partial v_5}{\partial v_3}\) …
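The backward pass can be written out by hand in R for the same example. This is a sketch under the same naming assumption as before (`v_m1` for \(v_{-1}\), and `*_bar` for the barred quantities); it repeats the forward values it needs so that it runs on its own, and it checks the result against the derivatives obtained by ordinary calculus.

```r
# Forward values needed by the backward pass
v_m1 <- 2             # v_{-1} = x1
v_0  <- 5             # v_0 = x2

# Backward pass: each *_bar is the partial derivative of y w.r.t. that node
v5_bar <- 1                       # y = v5
v4_bar <- v5_bar * 1              # v5 = v4 - v3
v3_bar <- v5_bar * -1             # v5 = v4 - v3
v1_bar <- v4_bar * 1              # v4 = v1 + v2
v2_bar <- v4_bar * 1              # v4 = v1 + v2
# v0 feeds both v3 = sin(v0) and v2 = v_{-1} * v0, so its contributions add
v0_bar  <- v3_bar * cos(v_0) + v2_bar * v_m1
# v_{-1} feeds both v1 = ln(v_{-1}) and v2 = v_{-1} * v0
vm1_bar <- v1_bar / v_m1 + v2_bar * v_0

grad <- c(vm1_bar, v0_bar)        # gradient with respect to (x1, x2)

# Analytic gradient for comparison: (1/x1 + x2, x1 - cos(x2))
analytic <- c(1 / v_m1 + v_0, v_m1 - cos(v_0))
print(rbind(grad, analytic))
```

A node that feeds more than one later node, like \(v_0\) here, collects one contribution from each of them.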

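However, in linear regression we can bypass this process and use the Normal Equation, which gives the least-squares coefficients directly: \(\beta = (X^TX)^{-1}X^Ty\)
where \(\beta\) is the vector of coefficients (here the bias \(b\) and the weight \(w\)), \(X\) is the matrix of independent variables with a column of ones added for the bias, and \(y\) is the vector of dependent variable values.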
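As a check on the formula, the Normal Equation can be evaluated by hand in R and compared with what `lm` returns. This is a minimal sketch using the built-in `mtcars` data; the names `X`, `beta`, and `fit` are chosen for the example. It calls `solve(A, b)` to solve the linear system rather than forming the inverse explicitly, which gives the same result for this problem.

```r
# Design matrix: a column of ones for the bias, then the predictor
X <- cbind(1, mtcars$disp)
y <- mtcars$mpg

# Solve (X^T X) beta = X^T y for beta
beta <- solve(t(X) %*% X, t(X) %*% y)

# Reference fit from lm for comparison
fit <- lm(mpg ~ disp, data = mtcars)

print(as.numeric(beta))   # bias, then weight
print(unname(coef(fit)))  # intercept, then slope
```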

When to use Simple Linear Regression

You should use simple linear regression when:

  • You have a single predictor variable
  • The relationship between the two variables is linear
  • You are predicting a continuous outcome
  • You are trying to find the impact of a single variable

Setting Up Linear Regression in R

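# Fit mpg as a linear function of displacement
model <- lm(mpg ~ disp, data = mtcars)
x <- mtcars$disp
y <- mtcars$mpg

# Axis settings for the plot. Assumption: these are plotly-style axis lists,
# where titlefont takes a list of font settings rather than a bare string.
xax <- list(
  title = 'Displacement (cu. in.)',
  titlefont = list(family = 'Modern Computer Roman')
)

yax <- list(
  title = 'Miles Per Gallon',
  titlefont = list(family = 'Modern Computer Roman')
)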

Plotting
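A minimal sketch of how the data and the fitted line could be drawn. It assumes base R graphics, so that no extra package is needed; this is not necessarily the plotting library the axis lists in the setup were written for. The axis labels are taken from that setup, and the colours and point style are arbitrary choices.

```r
# Fit the model on the built-in mtcars data
model <- lm(mpg ~ disp, data = mtcars)

# Scatter plot of the observed data
plot(mtcars$disp, mtcars$mpg,
     xlab = "Displacement (cu. in.)",
     ylab = "Miles Per Gallon",
     pch = 19)

# Overlay the fitted regression line
abline(model, col = "red", lwd = 2)
```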

Verifying the final model

After the model has been fitted, you should look at the residuals vs. fitted values plot to see how the errors (observed minus predicted values) behave across the predictions, and at the Q-Q plot to compare the distribution of the residuals with a normal distribution.
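Both diagnostic plots are available through R's built-in `plot` method for `lm` objects. The sketch below assumes the same `mpg ~ disp` model on `mtcars`; the side-by-side layout is a choice made for the example.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ disp, data = mtcars)

# Draw the two diagnostics side by side
op <- par(mfrow = c(1, 2))
plot(model, which = 1)   # residuals vs. fitted values
plot(model, which = 2)   # normal Q-Q plot of the residuals
par(op)                  # restore the previous layout
```

In the first plot, a clear curve or a funnel shape would suggest the linear model is a poor fit; in the second, points far from the reference line suggest the residuals are not normally distributed.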

Verifying the final model (Cont. 1)

Verifying the final model (Cont. 2)