Multiple Linear Regression

Author

J Sigma

In simple linear regression, we try to explain or predict a response variable using one explanatory variable.

For example, we might try to predict a student's final mark using only their lecture attendance.

In reality, final marks are rarely determined by lecture attendance alone.

Other factors may matter too:

So instead of using one explanatory variable, we use multiple predictors simultaneously.

Simple Linear Regression vs Multiple Linear Regression

Population and Sample Models

We have already established, for simple linear regression, that the population model is given by

\[y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}\]

where

  • \(i\) refers to a specific observation

  • \(\beta_{0}\) is the intercept parameter

  • \(\beta_{1}\) is the slope parameter; and

  • \(\epsilon_{i}\) is the error for the \(i\)-th observation
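To make the population model concrete, the R sketch below simulates data from it and then fits it. The parameter values (\(\beta_{0} = 2\), \(\beta_{1} = 0.5\), error standard deviation \(1\)) and the uniform distribution of \(x\) are illustrative assumptions and do not come from this text. With many observations, the fitted coefficients land close to the true parameters.

```r
# Simulate data from the simple linear regression population model
#   y_i = beta_0 + beta_1 * x_i + epsilon_i
# All parameter values below are illustrative assumptions.
set.seed(1)

n      <- 1000
beta_0 <- 2      # assumed intercept parameter
beta_1 <- 0.5    # assumed slope parameter
sigma  <- 1      # assumed standard deviation of the errors

x       <- runif(n, min = 0, max = 10)      # explanatory variable
epsilon <- rnorm(n, mean = 0, sd = sigma)   # errors: normal with mean 0
y       <- beta_0 + beta_1 * x + epsilon    # response variable

# Estimating the parameters from the simulated sample
fit <- lm(y ~ x)
coef(fit)
```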

The sample model is then given by

\[\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}\]

where the error term does not appear because we assume that the errors are normally distributed with a mean of \(0\). As a result, \(\hat{y}_{i}\) estimates the mean response. Here, there is one dependent variable and only one explanatory variable. Multiple linear regression has more than one explanatory variable, so we extend the population model for simple linear regression slightly:

\[y_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}+\dots+\beta_{p}x_{pi}+\epsilon_{i}\]

Here, we have \(p\) independent variables.

  • \(i\) refers to an observation

  • \(\beta_{0}\) is the intercept parameter

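  • \(\beta_{j}\) is the slope parameter for the \(j\)-th independent variable, \(j = 1, \dots, p\);

  • \(x_{ji}\) is the value of the \(j\)-th independent variable for the \(i\)-th observation; and

  • \(\epsilon_{i}\) is the error for the \(i\)-th observation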
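The sum \(\beta_{0}+\beta_{1}x_{1i}+\dots+\beta_{p}x_{pi}\) is often computed as a single matrix product. This works by placing a column of 1s in front of the \(x\) values so that \(\beta_{0}\) acts as the intercept. The R sketch below checks that both forms agree for \(p = 3\). The \(\beta\) values and the randomly generated \(x\) values are assumptions chosen for illustration only.

```r
# Linear part of the multiple regression population model, computed two ways.
# beta and the x values are illustrative assumptions.
set.seed(42)

n <- 5   # number of observations
p <- 3   # number of independent variables

# n x p matrix: column j holds x_j1, ..., x_jn
x <- matrix(round(rnorm(n * p), 2), nrow = n, ncol = p)

beta <- c(1.0, 0.5, -2.0, 3.0)   # beta_0, beta_1, beta_2, beta_3 (assumed)

# Term by term, as written in the population model (without the error)
mu_sum <- beta[1] + beta[2] * x[, 1] + beta[3] * x[, 2] + beta[4] * x[, 3]

# Matrix form: a leading column of 1s multiplies the intercept beta_0
X      <- cbind(1, x)
mu_mat <- as.vector(X %*% beta)

# Adding normal errors with mean 0 gives the responses y_i
epsilon <- rnorm(n, mean = 0, sd = 1)
y       <- mu_mat + epsilon

all.equal(mu_sum, mu_mat)   # TRUE: both forms give the same values
```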

For multiple linear regression, the sample model is given by

\[\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\dots+\hat{\beta}_{p}x_{pi}\]
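To see how the sample model produces a fitted value, suppose a model with \(p = 2\) gave the hypothetical estimates \(\hat{\beta}_{0} = 10\), \(\hat{\beta}_{1} = -2\) and \(\hat{\beta}_{2} = 0.5\). These numbers are invented purely to show the arithmetic. For an observation with \(x_{1} = 3\) and \(x_{2} = 4\), the fitted value is \(10 - 2(3) + 0.5(4) = 6\). The same calculation in R:

```r
# Fitted value from the sample model, using hypothetical estimates
beta_hat <- c(10, -2, 0.5)   # beta_hat_0, beta_hat_1, beta_hat_2 (hypothetical)
x_new    <- c(3, 4)          # x_1 and x_2 for one observation

# y_hat = beta_hat_0 + beta_hat_1 * x_1 + beta_hat_2 * x_2
y_hat <- beta_hat[1] + sum(beta_hat[-1] * x_new)
y_hat   # 6
```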

Visually, we have the following difference

Here, we can see that simple linear regression can be represented by a line on a two-dimensional plane. In contrast, multiple linear regression with two explanatory variables forms a plane in three-dimensional space. With more than two explanatory variables, the model forms a hyperplane in higher-dimensional space.

Estimating the \(\beta\) Parameters

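As in simple linear regression, the ordinary least squares (OLS) method is used to estimate the \(\beta\) parameters. OLS chooses the estimates that minimise the sum of squared residuals over the \(n\) observations, \(\sum_{i=1}^{n}e_{i}^{2}=\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}\). Here, the residual \(e_{i}\) is the observable sample counterpart of the error \(\epsilon_{i}\).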
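In matrix form, the OLS estimates solve the normal equations \((X^{\top}X)\hat{\beta} = X^{\top}y\), where \(X\) is the design matrix with a leading column of 1s. The R sketch below solves these equations directly and compares the result with `lm()`, which uses a QR decomposition internally but arrives at the same estimates. It also checks that nudging the estimates increases the sum of squared residuals. The data are simulated, and the parameter values used to generate them are illustrative assumptions.

```r
# OLS via the normal equations, compared with lm().
# The data are simulated; the generating parameter values are assumptions.
set.seed(123)

n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 3 + 1.5 * x1 - 2 * x2 + rnorm(n, sd = 1)

X <- cbind(1, x1, x2)   # design matrix: column of 1s for the intercept

# Solve (X'X) beta = X'y without explicitly inverting X'X
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Sum of squared residuals for a given coefficient vector b
ssr <- function(b) sum((y - X %*% b)^2)

# lm() produces the same estimates
fit <- lm(y ~ x1 + x2)
cbind(normal_equations = as.vector(beta_hat), lm = coef(fit))

# Moving away from beta_hat can only increase the sum of squared residuals
ssr(beta_hat) <= ssr(beta_hat + c(0.01, 0, 0))
```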

Aims of Multiple Linear Regression

  1. To model the dependent variable using many independent variables
  2. To predict the value of the dependent variable from the values of the multiple independent variables
  3. To understand how a dependent variable changes with many independent variables
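As a rough preview, each aim corresponds to a base R function. The sketch below uses R's built-in `mtcars` data set purely as an illustration; it is not part of this example, and the new observation is hypothetical. Each slope from `coef()` describes how the dependent variable changes with one independent variable while the others are held fixed.

```r
# Mapping the three aims onto base R, using the built-in mtcars data
# (an illustrative stand-in, not the data used in this section).

# Aim 1: model the dependent variable (mpg) using several independent variables
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Aim 2: predict the dependent variable for a hypothetical new observation
new_car <- data.frame(wt = 3, hp = 150)
pred <- predict(fit, newdata = new_car)
pred

# Aim 3: each slope shows how mpg changes with one variable,
# holding the other independent variable fixed
coef(fit)
```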

We will try to achieve these aims by using the following example problem:

Working Example

A company produces Fresh, a brand of detergent. In order to manage its inventory more effectively and make revenue predictions, this company would like to better predict the demand for Fresh. To develop a prediction model, the company has gathered data concerning demand for Fresh over the last 30 sales periods. The first few lines of the dataset are shown below:

#########################
# IMPORTING DATA INTO R
#########################

fresh_data <- data.frame(
  demand = c(
    7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87,
    7.10, 8.00, 7.89, 8.15, 9.10, 8.86, 8.90, 8.87,
    9.26, 9.30, 8.75, 7.95, 7.65, 7.27, 8.30, 8.50,
    8.75, 9.21, 8.27, 7.67, 7.93, 9.26
  ),

  fresh_price = c(
    3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80,
    3.80, 3.85, 3.90, 3.90, 3.70, 3.75, 3.75, 3.80,
    3.70, 3.80, 3.70, 3.80, 3.80, 3.75, 3.70, 3.55,
    3.60, 3.65, 3.70, 3.75, 3.80, 3.70
  ),

  ads_expenditure = c(
    5.5, 6.75, 7.25, 5.5, 7.0, 6.5, 6.75, 5.25,
    5.25, 6.0, 6.5, 6.25, 7.0, 6.9, 6.8, 6.8,
    7.1, 7.0, 6.8, 6.5, 6.25, 6.0, 6.5, 7.0,
    6.8, 6.8, 6.5, 5.75, 5.8, 6.8
  ),

  size = c(
    "Small", "Big", "Big", "Small", "Big", "Small", "Big", "Small",
    "Small", "Small", "Small", "Small", "Big", "Big", "Big", "Big",
    "Big", "Big", "Big", "Small", "Small", "Small", "Small", "Big",
    "Big", "Big", "Small", "Small", "Small", "Big"
  ),

  ads_campaign = c(
    "B", "B", "B", "A", "C", "A", "C", "C",
    "B", "C", "A", "C", "C", "A", "B", "B",
    "B", "A", "B", "B", "C", "A", "A", "A",
    "A", "B", "C", "B", "C", "C"
  ),

  competitor_price = c(
    3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85,
    3.65, 4.00, 4.10, 4.00, 4.10, 4.20, 4.10, 4.10,
    4.20, 4.30, 4.10, 3.75, 3.75, 3.65, 3.90, 3.65,
    4.10, 4.25, 3.65, 3.75, 3.85, 4.25
  )
)

# first six lines of the data
head(fresh_data)
  demand fresh_price ads_expenditure  size ads_campaign competitor_price
1   7.38        3.85            5.50 Small            B             3.80
2   8.51        3.75            6.75   Big            B             4.00
3   9.52        3.70            7.25   Big            B             4.30
4   7.50        3.70            5.50 Small            A             3.70
5   9.33        3.60            7.00   Big            C             3.85
6   8.28        3.60            6.50 Small            A             3.80

Here, we have the following variables (the matching column of fresh_data is given in brackets):

  • \(y\) (demand): demand for Fresh (in 100 000s)

  • \(x_{1}\) (fresh_price): price of Fresh (in 10 rands)

  • \(x_{2}\) (ads_expenditure): advertising expenditure to promote Fresh (in 1000 rands)

  • \(x_{3}\) (size): size of the company (big or small)

  • \(x_{4}\) (ads_campaign): advertising campaign used by the company (A: TV campaigns; B: a mixture of TV and radio ads; C: a mixture of TV, radio, magazine and newspaper ads)

  • \(x_{5}\) (competitor_price): average competitor price for liquid detergents
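Two of these variables, \(x_{3}\) and \(x_{4}\), are categorical. In an R model formula, each one is represented by 0/1 dummy variables. The R sketch below re-creates the Fresh data with size and ads_campaign stored as factors. It then shows the resulting design matrix and fits an OLS model. Under R's default treatment coding, size becomes one dummy (sizeSmall, with Big as the reference level) and ads_campaign becomes two (ads_campaignB and ads_campaignC, with A as the reference). Including all five predictors is an assumption made for illustration; it is not necessarily the model this example goes on to build.

```r
# Fresh data with the categorical variables stored as factors
fresh_data <- data.frame(
  demand = c(7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87,
             7.10, 8.00, 7.89, 8.15, 9.10, 8.86, 8.90, 8.87,
             9.26, 9.30, 8.75, 7.95, 7.65, 7.27, 8.30, 8.50,
             8.75, 9.21, 8.27, 7.67, 7.93, 9.26),
  fresh_price = c(3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80,
                  3.80, 3.85, 3.90, 3.90, 3.70, 3.75, 3.75, 3.80,
                  3.70, 3.80, 3.70, 3.80, 3.80, 3.75, 3.70, 3.55,
                  3.60, 3.65, 3.70, 3.75, 3.80, 3.70),
  ads_expenditure = c(5.5, 6.75, 7.25, 5.5, 7.0, 6.5, 6.75, 5.25,
                      5.25, 6.0, 6.5, 6.25, 7.0, 6.9, 6.8, 6.8,
                      7.1, 7.0, 6.8, 6.5, 6.25, 6.0, 6.5, 7.0,
                      6.8, 6.8, 6.5, 5.75, 5.8, 6.8),
  size = factor(c("Small", "Big", "Big", "Small", "Big", "Small", "Big", "Small",
                  "Small", "Small", "Small", "Small", "Big", "Big", "Big", "Big",
                  "Big", "Big", "Big", "Small", "Small", "Small", "Small", "Big",
                  "Big", "Big", "Small", "Small", "Small", "Big")),
  ads_campaign = factor(c("B", "B", "B", "A", "C", "A", "C", "C",
                          "B", "C", "A", "C", "C", "A", "B", "B",
                          "B", "A", "B", "B", "C", "A", "A", "A",
                          "A", "B", "C", "B", "C", "C")),
  competitor_price = c(3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85,
                       3.65, 4.00, 4.10, 4.00, 4.10, 4.20, 4.10, 4.10,
                       4.20, 4.30, 4.10, 3.75, 3.75, 3.65, 3.90, 3.65,
                       4.10, 4.25, 3.65, 3.75, 3.85, 4.25)
)

# Assumed model: demand explained by all five predictors
fresh_formula <- demand ~ fresh_price + ads_expenditure + size +
  ads_campaign + competitor_price

# Design matrix: intercept, numeric predictors and dummy variables
X <- model.matrix(fresh_formula, data = fresh_data)
colnames(X)

# OLS fit of the assumed model
fresh_fit <- lm(fresh_formula, data = fresh_data)
coef(fresh_fit)
```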