Multiple Linear Regression

Author

J Sigma

In simple linear regression, we try to explain or predict a response variable using one explanatory variable.

For example:

Response variable: Final mark
Explanatory variable: Lecture attendance

In reality, final marks are rarely determined by lecture attendance alone.

Other factors may matter too:

study hours
tutorial attendance
assignment performance
sleep
prior mathematical background

So instead of using one explanatory variable, we use multiple predictors simultaneously.

Simple Linear Regression vs Multiple Linear Regression

Population and Sample Models

We have already established, for simple linear regression, that the population model is given by

\[y_{i}=\beta_{0}+\beta_{1}x_{i}+\epsilon_{i}\]

where

$i$ refers to a specific observation
$\beta_{0}$ is the intercept parameter
$\beta_{1}$ is the slope parameter; and
$\epsilon_{i}$ is the error for the $i$-th observation

The sample model is then given by

\[\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{i}\]

since we assume that the errors are normally distributed with a mean of $0$, so the error term drops out of the fitted value. Here, there is one dependent variable and only one explanatory variable. For multiple linear regression, we have more than one explanatory variable, so we extend the population model for simple linear regression slightly. We have:

\[y_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}+\dots+\beta_{p}x_{pi}+\epsilon_{i}\]

Here, we have $p$ independent variables.

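$i$ refers to a specific observation
$x_{ji}$ is the value of the $j$-th independent variable for the $i$-th observation
$\beta_{0}$ is the intercept parameter
$\beta_{j}$ is the slope parameter for the $j$-th independent variable; and
$\epsilon_{i}$ is the error for the $i$-th observation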

For multiple linear regression, the sample model is given by

\[\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\dots+\hat{\beta}_{p}x_{pi}\]
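To make the sample model concrete, the following minimal sketch evaluates $\hat{y}_{i}$ for three observations with $p=2$ explanatory variables. The coefficient values and the predictor values are hypothetical, chosen only for illustration; they are not estimates from any dataset in these notes.

```r
# Hypothetical coefficient estimates (illustrative values only)
beta_hat <- c(intercept = 2.0, x1 = 0.5, x2 = -1.5)

# Three hypothetical observations on p = 2 explanatory variables
x <- data.frame(
  x1 = c(1, 2, 4),
  x2 = c(0, 1, 2)
)

# Design matrix: a leading column of 1s for the intercept, then x1 and x2
X <- cbind(intercept = 1, as.matrix(x))

# Fitted values: y_hat_i = b0 + b1 * x_1i + b2 * x_2i
y_hat <- as.vector(X %*% beta_hat)
y_hat
# 2.5 1.5 1.0
```

Writing the model as a matrix product means the same line of code works unchanged for any number of explanatory variables.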

Visually, we have the following difference

Here, we can see that simple linear regression can be represented by a line on a two-dimensional plane. In contrast, multiple linear regression with two explanatory variables forms a plane in three-dimensional space. With more than two explanatory variables, the fitted surface is a hyperplane.

Estimating the $\beta$ Parameters

As in simple linear regression, the ordinary least squares (OLS) method is used to estimate the $\beta$ parameters by minimising the sum of squared residuals, $\sum_{i}(y_{i}-\hat{y}_{i})^{2}$.
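The least squares idea can be sketched in a few lines of R. The example below uses simulated data (an assumption made purely for illustration, with hypothetical names `x1`, `x2` and `y`), solves the normal equations $(X^{\top}X)\hat{\beta}=X^{\top}y$ directly, and compares the result with `lm()`. It is a sketch of the principle rather than a description of how `lm()` computes its estimates; `lm()` uses a QR decomposition internally.

```r
set.seed(1)  # reproducible simulated data (not the course data)

n  <- 50
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)

# Simulate a response from a known linear model plus normal errors
y <- 3 + 2 * x1 - 1 * x2 + rnorm(n, mean = 0, sd = 1)

# Design matrix with a column of 1s for the intercept
X <- cbind(1, x1, x2)

# Solve the normal equations (X'X) b = X'y; the solution minimises
# the sum of squared residuals
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Sum of squared residuals at the OLS solution
rss <- sum((y - X %*% beta_hat)^2)

# lm() should return the same estimates
fit <- lm(y ~ x1 + x2)
cbind(normal_equations = as.vector(beta_hat), lm = unname(coef(fit)))
```

The two columns should agree up to numerical rounding, and `rss` should match `sum(resid(fit)^2)`.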

Aim of Multiple Linear Regression

To model the dependent variable using many independent variables
To predict the value of the dependent variable from the values of the multiple independent variables
To understand how a dependent variable changes with many independent variables

We will try to achieve these aims by using the following example problem:

Working Example

A company produces Fresh, a brand of detergent. In order to manage its inventory more effectively and make revenue predictions, this company would like to better predict the demand for Fresh. To develop a prediction model, the company has gathered data concerning demand for Fresh over the last 30 sales periods. The first few lines of the dataset are shown below:


#########################
# IMPORTING DATA INTO R
#########################

fresh_data <- data.frame(
  demand = c(
    7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87,
    7.10, 8.00, 7.89, 8.15, 9.10, 8.86, 8.90, 8.87,
    9.26, 9.30, 8.75, 7.95, 7.65, 7.27, 8.30, 8.50,
    8.75, 9.21, 8.27, 7.67, 7.93, 9.26
  ),

  fresh_price = c(
    3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80,
    3.80, 3.85, 3.90, 3.90, 3.70, 3.75, 3.75, 3.80,
    3.70, 3.80, 3.70, 3.80, 3.80, 3.75, 3.70, 3.55,
    3.60, 3.65, 3.70, 3.75, 3.80, 3.70
  ),

  ads_expenditure = c(
    5.5, 6.75, 7.25, 5.5, 7.0, 6.5, 6.75, 5.25,
    5.25, 6.0, 6.5, 6.25, 7.0, 6.9, 6.8, 6.8,
    7.1, 7.0, 6.8, 6.5, 6.25, 6.0, 6.5, 7.0,
    6.8, 6.8, 6.5, 5.75, 5.8, 6.8
  ),

  size = c(
    "Small", "Big", "Big", "Small", "Big", "Small", "Big", "Small",
    "Small", "Small", "Small", "Small", "Big", "Big", "Big", "Big",
    "Big", "Big", "Big", "Small", "Small", "Small", "Small", "Big",
    "Big", "Big", "Small", "Small", "Small", "Big"
  ),

  ads_campaign = c(
    "B", "B", "B", "A", "C", "A", "C", "C",
    "B", "C", "A", "C", "C", "A", "B", "B",
    "B", "A", "B", "B", "C", "A", "A", "A",
    "A", "B", "C", "B", "C", "C"
  ),

  competitor_price = c(
    3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85,
    3.65, 4.00, 4.10, 4.00, 4.10, 4.20, 4.10, 4.10,
    4.20, 4.30, 4.10, 3.75, 3.75, 3.65, 3.90, 3.65,
    4.10, 4.25, 3.65, 3.75, 3.85, 4.25
  )
)

# First six rows of the data
head(fresh_data)

  demand fresh_price ads_expenditure  size ads_campaign competitor_price
1   7.38        3.85            5.50 Small            B             3.80
2   8.51        3.75            6.75   Big            B             4.00
3   9.52        3.70            7.25   Big            B             4.30
4   7.50        3.70            5.50 Small            A             3.70
5   9.33        3.60            7.00   Big            C             3.85
6   8.28        3.60            6.50 Small            A             3.80

Here, we have

$y \implies \text{demand for Fresh (in $100 000$s)}$
$x_{1} \implies \text{price for Fresh (in $10$ rands)}$
$x_{2} \implies \text{ad expenditure to promote Fresh (in $1000$ rands)}$
$x_{3} \implies \text{size of the company (big or small)}$
$x_{4} \implies \text{ads campaign used by the company}$ (A: TV campaigns; B: mixture of TV and radio ads; C: mixture of TV, radio, magazine and newspaper ads)
$x_{5} \implies \text{average competitor price for liquid detergents}$
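
As a first sketch of how the sample model maps onto this example, the model can be fitted with `lm()`. For simplicity, the sketch below uses only the three numeric predictors ($x_{1}$, $x_{2}$ and $x_{5}$) and leaves out the categorical variables; that choice is an assumption made for illustration, not a recommended final model. The numeric columns are repeated under the hypothetical name `fresh_num` so that the block runs on its own.

```r
# Numeric columns of the Fresh data, repeated so this block is self-contained
fresh_num <- data.frame(
  demand = c(
    7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87, 7.10, 8.00,
    7.89, 8.15, 9.10, 8.86, 8.90, 8.87, 9.26, 9.30, 8.75, 7.95,
    7.65, 7.27, 8.30, 8.50, 8.75, 9.21, 8.27, 7.67, 7.93, 9.26
  ),
  fresh_price = c(
    3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80, 3.80, 3.85,
    3.90, 3.90, 3.70, 3.75, 3.75, 3.80, 3.70, 3.80, 3.70, 3.80,
    3.80, 3.75, 3.70, 3.55, 3.60, 3.65, 3.70, 3.75, 3.80, 3.70
  ),
  ads_expenditure = c(
    5.5, 6.75, 7.25, 5.5, 7.0, 6.5, 6.75, 5.25, 5.25, 6.0,
    6.5, 6.25, 7.0, 6.9, 6.8, 6.8, 7.1, 7.0, 6.8, 6.5,
    6.25, 6.0, 6.5, 7.0, 6.8, 6.8, 6.5, 5.75, 5.8, 6.8
  ),
  competitor_price = c(
    3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85, 3.65, 4.00,
    4.10, 4.00, 4.10, 4.20, 4.10, 4.10, 4.20, 4.30, 4.10, 3.75,
    3.75, 3.65, 3.90, 3.65, 4.10, 4.25, 3.65, 3.75, 3.85, 4.25
  )
)

# Fit demand on the three numeric predictors by ordinary least squares
fit <- lm(demand ~ fresh_price + ads_expenditure + competitor_price,
          data = fresh_num)

# Estimated intercept and slopes (beta_0 and the three slope estimates)
coef(fit)
```

Each slope estimate is read as the expected change in demand for a one-unit increase in that predictor, holding the other predictors in the model fixed.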

