2025-11-05


What is Linear Regression?

Linear regression is a statistical method for understanding the relationship between a dependent variable and an independent variable.

It is typically represented by a straight line that shows how changes in the independent variable affect the dependent variable.

A simple algebraic expression that illustrates this is y = x: every one-unit increase in x produces a one-unit increase in y, a one-to-one relationship between the two variables.

Linear Relationship: y = x

Weight to Miles Per Gallon

Intuitively, we can expect that the more a car weighs, the fewer miles per gallon it will get. Let's model this.
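A minimal sketch of that model, assuming the built-in `mtcars` dataset (the same data used for the 3D plot later in this document), where `wt` is weight in units of 1000 lbs and `mpg` is miles per gallon. The name `fit_wt` is chosen here for illustration only:

```r
# Fit a simple linear regression of miles per gallon on weight.
# mtcars ships with base R; wt is measured in units of 1000 lbs.
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Intercept and slope of the fitted line
coef(fit_wt)

# Scatter plot of the data with the fitted line overlaid
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit_wt, col = "orange", lwd = 2)
```

The slope on `wt` comes out negative, which matches the intuition that heavier cars get fewer miles per gallon.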

What is Multiple Linear Regression?

Multiple linear regression is linear regression with one dependent variable but more than one independent variable.

In its simplest form this could be written as \(y = x + z\), where \(y\) is the dependent variable and \(x\) and \(z\) are the two independent variables.

Mathematical Model of Multiple Linear Regression

We can express the relationship between a dependent variable \(y\) and multiple independent variables (for example, \(x\) and \(z\)) as:

\[ y = \beta_0 + \beta_1 x + \beta_2 z + \epsilon \]

Where:

- \(\beta_0\): Intercept — the expected value of \(y\) when all predictors are 0

- \(\beta_1, \beta_2\): Coefficients that measure how much \(y\) changes for a one-unit increase in \(x\) or \(z\), holding the other variable constant

- \(\epsilon\): Random error term representing variation not explained by the model
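The "holding the other variable constant" reading of a coefficient can be checked numerically. The sketch below assumes the built-in `mtcars` data, with `mpg` in the role of \(y\), `hp` in the role of \(x\), and `wt` in the role of \(z\). The two cars in `cars` are hypothetical values made up for illustration:

```r
# Two-predictor model on the built-in mtcars data:
# mpg plays the role of y, hp the role of x, wt the role of z.
fit <- lm(mpg ~ hp + wt, data = mtcars)
b <- coef(fit)  # named vector: (Intercept), hp, wt

# Two hypothetical cars with the same horsepower,
# differing by exactly one unit of weight
cars <- data.frame(hp = c(110, 110), wt = c(3, 4))
pred <- predict(fit, newdata = cars)

# The difference in predicted mpg equals the wt coefficient
diff(pred)
b[["wt"]]
```

Because `hp` is identical for both rows, the only thing that changes between the two predictions is the one-unit step in `wt`, so the difference is exactly the fitted \(\beta_2\).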

The Multiple Linear Regression Formula

With \(p\) predictors and observations indexed by \(i\), the general formula for multiple linear regression is:

\[ y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i \]

Where:

- \(y_i\): response (dependent) variable for observation \(i\)

- \(\beta_0\): intercept — predicted \(y\) when all predictors = 0

- \(\beta_1, \beta_2, \ldots, \beta_p\): coefficients (the effect of a one-unit increase in each predictor, holding the others constant)

- \(x_{i1}, x_{i2}, \ldots, x_{ip}\): predictor values for observation \(i\)

- \(\epsilon_i\): random error term
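One way to see the summation at work is to build the predictor matrix by hand and estimate the coefficients directly. The sketch below assumes ordinary least squares (the method `lm()` uses) and the built-in `mtcars` data with `hp` and `wt` as the \(p = 2\) predictors. The names `X`, `beta_hat`, and `y_hat` are illustrative:

```r
# Design matrix: a column of 1s for the intercept
# plus one column per predictor (x_i1 = hp, x_i2 = wt)
X <- model.matrix(~ hp + wt, data = mtcars)
y <- mtcars$mpg

# Least-squares estimate: solve (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Fitted values: beta_0 + sum over j of beta_j * x_ij, for every i at once
y_hat <- drop(X %*% beta_hat)

# Side-by-side comparison with the coefficients from lm()
cbind(manual = drop(beta_hat),
      lm     = coef(lm(mpg ~ hp + wt, data = mtcars)))
```

The matrix product `X %*% beta_hat` is the formula above evaluated for all observations together; the residuals `y - y_hat` are the estimates of \(\epsilon_i\).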

Why is this helpful?

Regression is useful for identifying how a particular variable relates to an outcome while accounting for the effects of the other variables.

An example:

Which variables are most strongly associated with heart disease? Is it stress, lack of physical activity, diet, or one of a hundred other factors?

At its core, regression analysis seeks the best-fitting linear relationship between the variables.

Weight and Horsepower to Miles Per Gallon

Let's think about how horsepower and weight affect a car's miles per gallon.

Below is the R code used to generate the interactive 3D Plotly graph:

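library(plotly)  # provides plot_ly(), add_surface(), and the %>% pipe

# Fit mpg as a linear function of horsepower and weight
model <- lm(mpg ~ hp + wt, data = mtcars)

# Grid of hp and wt values spanning the observed ranges
x <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 30)
y <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 30)

# Predicted mpg on the grid. add_surface() expects rows of z to follow y
# and columns to follow x, so the outer() result is transposed.
b <- coef(model)
z <- t(outer(x, y, function(hp, wt) b[1] + b[2]*hp + b[3]*wt))

# Observed cars as points, fitted regression plane as a surface
plot_ly(mtcars, x=~hp, y=~wt, z=~mpg, type="scatter3d", mode="markers",
        marker=list(size=4, color="orange")) %>%
  add_surface(x=x, y=y, z=z, opacity=0.6, colorscale="Viridis")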
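To read the plane in the plot numerically, the fitted coefficients can be inspected directly. This is a small sketch that refits the same `mpg ~ hp + wt` model so it runs on its own; `coef_table` is an illustrative name:

```r
# Same model as in the plot: mpg explained by horsepower and weight
model <- lm(mpg ~ hp + wt, data = mtcars)

# Estimates, standard errors, t values, and p-values for each term
coef_table <- summary(model)$coefficients
coef_table

# Share of the variation in mpg explained by the two predictors
summary(model)$r.squared
```

In this dataset both slopes are negative, so the plane tilts downward as either horsepower or weight increases.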