2026-02-03

Simple Linear Regression

  • Models the relationship between two quantitative variables with a straight line
  • The x variable is the independent variable, treated as fixed (nonrandom), and is known as the predictor
  • The y variable depends on the value of x and is known as the response
  • The line of best fit (the fitted line) gives the expected value of y for a given value of x
  • Despite what the word "regression" may suggest, the line of best fit does not have to have a negative slope; the slope can also be positive or zero

Example of Linear Regression

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

From this data set, mtcars, we can set up a linear regression model by using the hp column as the predictor and mpg as the response. This choice makes sense because a car's horsepower is expected to affect its fuel efficiency, measured in miles per gallon.
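A minimal sketch of fitting this model with base R's `lm()`. The horsepower value of 110 used for the prediction is only an illustration (it matches the Mazda RX4 row above); the object name `fit_hp` is chosen here for the example.

```r
# Simple linear regression: mpg (response) on hp (predictor)
fit_hp <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept (b0) and slope (b1)
coef(fit_hp)

# Fitted value: the estimated expected mpg for a car with 110 hp
predict(fit_hp, newdata = data.frame(hp = 110))
```

The negative slope reflects that cars with more horsepower tend to get fewer miles per gallon.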

Plotting mtcars Linear Regression
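One way such a plot can be drawn, sketched with base R graphics. This is an assumption: the slide's own plot may have been made with a different package, such as ggplot2. The points are the observed cars and the line is the least squares fit of `mpg ~ hp`.

```r
# Scatter plot of the observed cars
plot(mpg ~ hp, data = mtcars,
     xlab = "Horsepower (hp)", ylab = "Miles per gallon (mpg)",
     main = "mtcars: mpg vs hp")

# Overlay the fitted line from the simple linear model
fit_hp <- lm(mpg ~ hp, data = mtcars)
abline(fit_hp, col = "blue", lwd = 2)
```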

Further Explanation of Linear Regression

A model with a positive slope is said to show a positive linear trend. When the trend is increasing, the variables have a positive correlation; when it is decreasing, they have a negative correlation.
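The link between the direction of the trend and the sign of the correlation can be checked directly. In simple linear regression the fitted slope equals \(r \cdot s_y / s_x\), so the slope and the correlation coefficient always share the same sign. A small sketch with the mtcars example (the variable names `r` and `b1` are chosen for illustration):

```r
# Correlation between horsepower and miles per gallon
r <- cor(mtcars$hp, mtcars$mpg)

# Slope of the fitted line for mpg ~ hp
b1 <- coef(lm(mpg ~ hp, data = mtcars))[["hp"]]

# Both are negative: mpg tends to fall as hp rises (a decreasing trend)
c(correlation = r, slope = b1)

# Slope = r * sd(y) / sd(x), which is why the two signs always agree
all.equal(b1, r * sd(mtcars$mpg) / sd(mtcars$hp))
```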

Simple linear regression is described by two formulas:

model: \(\text{y} = \beta_0 + \beta_1\cdot\text{x} + \varepsilon; \hspace{0.5cm} \varepsilon \sim \mathcal{N}(0, \sigma^2)\)
fitted line: \(\hat{\text{y}} = \hat{\beta}_0 + \hat{\beta}_1\cdot\text{x}\), where \(\hat{\beta}_0 = b_0\) is the estimate of \(\beta_0\) and \(\hat{\beta}_1 = b_1\) is the estimate of \(\beta_1\)
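`lm()` computes the estimates by ordinary least squares, which gives \(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below computes both by hand for the mtcars example and compares them with `lm()`; it is meant only to illustrate the formulas, not to replace `lm()`.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Least squares slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least squares intercept: the fitted line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Compare the hand-computed estimates with those reported by lm()
rbind(by_hand = c(b0, b1), lm = unname(coef(lm(y ~ x))))
```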

Linear Positive Trend with diamonds

model: \(\text{price} = \beta_0 + \beta_1\cdot\text{carat} + \varepsilon; \hspace{0.5cm} \varepsilon \sim \mathcal{N}(0, \sigma^2)\)
fitted line: \(\widehat{\text{price}} = \hat{\beta}_0 + \hat{\beta}_1\cdot\text{carat}\)
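A minimal sketch of fitting this model, assuming the ggplot2 package (which provides the diamonds data set) is installed. The positive estimated slope is what makes this a positive linear trend: heavier diamonds tend to cost more.

```r
library(ggplot2)  # provides the diamonds data set

# Simple linear regression: price (response) on carat (predictor)
fit_carat <- lm(price ~ carat, data = diamonds)

# Estimated intercept and slope
coef(fit_carat)
```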

Linear Regression Models with Two Independent Variables

Both examples above are simple linear regressions, since each uses a single predictor, but the same idea extends to models with more than one predictor. This is shown with the previously viewed data set mtcars, which is smaller than diamonds and therefore easier to visualize. Here, miles per gallon (mpg) is modeled as depending on both the horsepower (hp) and the weight (wt) of the car. Unlike the previous examples, where a line represented the fitted regression function, the fitted function is now a surface (a plane) in 3D space. This is because each car contributes three values (hp, wt and mpg), so every observation is a point in three dimensions.
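Extending the notation used earlier, the two-predictor model and its fitted plane can be written as follows (a sketch; \(\beta_2\) is the added coefficient for weight):

```latex
\begin{aligned}
\text{model:}\quad & \text{mpg} = \beta_0 + \beta_1\cdot\text{hp} + \beta_2\cdot\text{wt} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2)\\
\text{fitted plane:}\quad & \widehat{\text{mpg}} = \hat{\beta}_0 + \hat{\beta}_1\cdot\text{hp} + \hat{\beta}_2\cdot\text{wt}
\end{aligned}
```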

Graph of mtcars 3D Linear Regression

Code for the plotly 3D Plot

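# Packages: plotly for the interactive plot, reshape2 for acast()
library(plotly)
library(reshape2)

# Fit mpg as a linear function of horsepower and weight
model <- lm(mpg ~ hp + wt, data = mtcars)

# Grid of predictor values spanning the observed data
# (hp and wt live inside mtcars, so they must be referenced through it)
# A 50 x 50 grid is plenty for a flat regression plane
xx <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 50)
yy <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 50)
zsurface <- expand.grid(hp = xx, wt = yy, KEEP.OUT.ATTRS = FALSE)

# Predicted mpg at every grid point, reshaped into a wt-by-hp matrix
zsurface$mpg <- predict(model, newdata = zsurface)
zsurface <- acast(zsurface, wt ~ hp, value.var = "mpg")

# Observed cars as points, fitted regression plane as a surface
threeplot <- plot_ly(data = mtcars, x = ~hp, y = ~wt, z = ~mpg) %>%
  add_markers(name = "data") %>%
  layout(title = "Miles Per Gallon vs Horsepower, Weight") %>%
  add_trace(z = zsurface, x = xx, y = yy, type = "surface",
            name = "Linear Regression") %>%
  hide_colorbar()
partial_bundle(threeplot)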
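The fitted plane can also be inspected numerically. A minimal sketch using the same `lm(mpg ~ hp + wt, data = mtcars)` model; the car with 150 hp and wt = 3 (mtcars records weight in units of 1,000 lb, so about 3,000 lb) is hypothetical and chosen only for illustration.

```r
# Same two-predictor model as in the plot code
model <- lm(mpg ~ hp + wt, data = mtcars)

# Estimated intercept, hp coefficient and wt coefficient
coef(model)

# Estimated expected mpg for a hypothetical 150 hp car weighing about 3,000 lb
predict(model, newdata = data.frame(hp = 150, wt = 3))
```

Both slope estimates are negative, so holding one predictor fixed, more horsepower or more weight corresponds to lower expected mpg.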