2025-03-14

Linear Regression

Simple linear regression (SLR) estimates the linear relationship between two quantitative variables: one independent (input) variable and one dependent (output) variable. Once fitted, an SLR model can be used to predict missing or unobserved values of the dependent variable. Multiple linear regression (MLR) extends SLR by taking more than one input while still producing a single output. Both are common forms of supervised learning in machine learning. MLR can be used to predict housing prices, sales, and sports performance.

Drawbacks of Linear Regression

  • A model can become overfitted to the data it was trained on and therefore predict poorly on new data.
  • In MLR, multicollinearity (when independent variables are highly correlated with one another) can make it hard to isolate the individual effects of the independent variables on the dependent variable.
  • Linear regression assumes a linear relationship between the independent variables and the dependent variable.
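
The multicollinearity point can be checked numerically. The following is a minimal sketch, not the analysis behind these slides: it assumes R's built-in `trees` data set (columns `Girth`, `Height`, `Volume`) as a stand-in, and it computes the variance inflation factor (VIF) by hand as 1 / (1 - R²), where R² comes from regressing one predictor on the others. The variable names are illustrative.

```r
# Built-in example data: girth, height, and volume of 31 felled trees.
data(trees)

# Pairwise correlation between the two candidate predictors.
predictor_cor <- cor(trees$Girth, trees$Height)

# VIF for Girth: regress it on the remaining predictor(s),
# then apply 1 / (1 - R^2). A VIF near 1 means little collinearity.
aux_fit <- lm(Girth ~ Height, data = trees)
vif_girth <- 1 / (1 - summary(aux_fit)$r.squared)

print(predictor_cor)
print(vif_girth)
```

The larger the VIF, the more the variance of that coefficient is inflated by overlap with the other predictors, and the harder its individual effect is to isolate.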

3D plot of lot area and overall quality of a home vs sale price

Code for previous slide:

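library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# housePrice is assumed to be a data frame that is already loaded, with the
# columns LotArea, OverallQual, and SalePrice.

# Axis titles and fonts for the 3D scene
xax <- list(
  title = "Lot Area",
  titlefont = list(family = "Modern Computer Roman")
)
yax <- list(
  title = "Overall Quality",
  titlefont = list(family = "Modern Computer Roman")
)
zax <- list(
  title = "Sale Price",
  titlefont = list(family = "Modern Computer Roman")
)

# 3D scatter plot; the ~ formulas refer to columns of housePrice
p <- plot_ly(data = housePrice, x = ~LotArea, y = ~OverallQual, z = ~SalePrice,
             type = "scatter3d", mode = "markers", color = ~OverallQual) %>%
  layout(title = "House Sale Price vs. (Lot Area, Overall Quality), color = Overall Quality",
         scene = list(xaxis = xax, yaxis = yax, zaxis = zax))
p  # display the plot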

Linear regression of lot area vs sale price of houses

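This plot compares the lot area and sale price of houses, with points colored by overall quality. It is an example of outliers negatively affecting a linear regression: a small number of extreme points can pull the fitted line away from the trend followed by most of the data.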
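
The effect of an outlier on the fitted line can be reproduced with a few lines of R. This is a minimal sketch on synthetic data, not the housing data from the plot: the values, the seed, and the single extreme point are all made up for illustration.

```r
set.seed(1)

# Synthetic data that follows y = 2x with a little noise.
x <- 1:20
y <- 2 * x + rnorm(20, sd = 1)
fit_clean <- lm(y ~ x)

# Add one extreme point: a very large x paired with an unusually low y.
x_out <- c(x, 60)
y_out <- c(y, 20)
fit_outlier <- lm(y_out ~ x_out)

# Compare the fitted slopes with and without the extreme point.
slope_clean <- unname(coef(fit_clean)[2])
slope_outlier <- unname(coef(fit_outlier)[2])
print(c(clean = slope_clean, with_outlier = slope_outlier))
```

The single added point lies far from the other x values, so it has high leverage and drags the slope well below the one estimated from the remaining twenty points.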

Plot of tree girth vs volume

This is an SLR that compares the girth and volume of trees, with points colored by height.
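
A fit like this one can be sketched as follows. The code assumes the plot is based on R's built-in `trees` data set (columns `Girth`, `Height`, `Volume`), which the slides do not confirm; the five height bins, the color palette, and the example girth of 15 inches are illustrative choices.

```r
data(trees)

# Fit volume as a linear function of girth.
fit <- lm(Volume ~ Girth, data = trees)
print(coef(fit))  # intercept and slope

# Predict the volume of a hypothetical tree with a girth of 15 inches.
new_tree <- data.frame(Girth = 15)
predicted_volume <- predict(fit, newdata = new_tree)
print(predicted_volume)

# Scatter plot colored by binned height, with the fitted line overlaid.
height_bin <- cut(trees$Height, breaks = 5)
bin_colors <- hcl.colors(5)
plot(trees$Girth, trees$Volume,
     col = bin_colors[as.integer(height_bin)], pch = 19,
     xlab = "Girth (in)", ylab = "Volume (cubic ft)")
abline(fit)
```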

SLR equation

Simple linear regression is similar to slope-intercept form: \[ \begin{equation} y = mx + b + \epsilon \end{equation} \] where \(y\) is the dependent variable, \(x\) is the independent variable, \(m\) is the slope, and \(b\) is the y-intercept. In SLR, \(m\) is also called the weight, and \(\epsilon\) is the error term.
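
To connect the symbols to numbers, the slope and intercept can be computed directly. This sketch assumes an ordinary least squares fit (the method `lm()` uses), for which m = cov(x, y) / var(x) and b = mean(y) - m · mean(x). The built-in `trees` data set is used only as an example.

```r
data(trees)
x <- trees$Girth
y <- trees$Volume

# Least squares estimates of the slope m and the intercept b.
m <- cov(x, y) / var(x)
b <- mean(y) - m * mean(x)

# The residuals are the estimated error terms: observed y minus fitted y.
residuals_hat <- y - (m * x + b)

# lm() should give the same two numbers.
fit <- lm(y ~ x)
print(c(b = b, m = m))
print(coef(fit))
```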

MLR equation

Multiple linear regression can be represented by the equation: \[ \begin{equation} y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon \end{equation} \] where \(\beta_0\) is the intercept, \(\beta_1, \dots, \beta_k\) are the regression coefficients (slopes), \(x_1, \dots, x_k\) are the independent variables, and \(\epsilon\) is the error term.
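
As a rough illustration of the equation with k = 2, the sketch below fits an MLR and then recovers the same coefficients from the normal equations. It assumes ordinary least squares and uses R's built-in `trees` data set with `Girth` and `Height` as \(x_1\) and \(x_2\); neither choice comes from the slides.

```r
data(trees)

# Fit Volume = beta_0 + beta_1 * Girth + beta_2 * Height + error.
fit_mlr <- lm(Volume ~ Girth + Height, data = trees)
print(coef(fit_mlr))

# Same estimates from the normal equations (X'X) beta = X'y.
# The column of ones in the design matrix corresponds to beta_0.
X <- cbind(1, trees$Girth, trees$Height)
y <- trees$Volume
beta_hat <- solve(t(X) %*% X, t(X) %*% y)
print(as.vector(beta_hat))
```

Solving the normal equations directly is fine for a small example like this; `lm()` uses a QR decomposition instead, which is numerically more stable.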