---
title: "Multiple Linear Regression"
author: "J Sigma"
editor: source
format:
html:
css: styles.css
toc: true
toc-depth: 3
number-sections: false
theme: cosmo
code-fold: true
code-tools: true
smooth-scroll: true
embed-resources: true
page-navigation: true
pdf:
documentclass: article
toc: true
number-sections: false
execute:
engine: knitr
echo: true
warning: false
message: false
---
In **simple linear regression**, we try to explain or predict a response variable using **one explanatory variable**.
For example:
- Response variable: Final mark
- Explanatory variable: Lecture attendance
In reality, final marks are rarely determined by lecture attendance alone.
Other factors may matter too:
- study hours
- tutorial attendance
- assignment performance
- sleep
- prior mathematical background
So instead of using one explanatory variable, we use **multiple predictors simultaneously**.
# **Simple Linear Regression vs Multiple Linear Regression**
## **Population and Sample Models**
We have already established, for simple linear regression, that the population model is given by
$$y_{i}=\beta_{0}+\beta_{1}x
+\epsilon_{i}$$
where
- $i$ refers to a specific observation
- $\beta_{0}$ is the intercept parameter
- $\beta_{1}$ is the slope parameter; and
- $\epsilon_{i}$ is the error for the $i$-th observation
The sample model is then given by
$$\hat{y}_{i}=\hat{\beta_{0}}+\hat{\beta}_{1}x$$
since we assume that the errors are normally distributed with a mean of $0$. Here, there is one dependent variable and only one explanatory variable. For multiple linear regression, we have more than one explanatory variable. So, we adjust the population model for simple linear regression slightly. We have:
$$y_{i}=\beta_{0}+\beta_{1}x_{1i}+\beta_{2}x_{2i}+\dots+\beta_{p}x_{pi}+\epsilon_{i}$$
Here, we have $p$ independent variables.
- $i$ refers to an observation
- $\beta_{0}$ is the intercept parameter
- $\beta_{j}$ is the slope parameter for the $j$-th independent variable; and
- $\epsilon_{i}$ is the error
For multiple linear regression, the sample model is given by
$$\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\dots+\hat{\beta}_{p}x_{pi}$$
Visually, we have the following difference
{fig-align="center"}
## **Estimating the** $\beta$ **Parameters**
Like in simple linear regression, the **ordinary least squares (OLS)** method is used to estimate the $\beta$ parameters by minimising $\sum\epsilon_{i}^{2}$
## **Aim of Multiple Linear Regression**
1. To **model** the dependent variable using many independent variables
2. To **predict** the value of the dependent variable from the values of the multiple independent variables
3. To **understand** how a dependent variable changes with many independent variables
We will try to achieve these aims by using the following example problem:
::: {.callout-warning title="Working Example" icon="false"}
A company produces ***Fresh***, a brand of detergent. In order to manage its inventory more effectively and make revenue predictions, this company would like to better predict the demand for Fresh. To develop a prediction model, the company has gathered data concerning demand for Fresh over the last 30 sales periods. The first few lines of the dataset are shown below:
```{r}
#########################
# IMPORTING DATA INTO R
#########################
fresh_data <- data.frame(
demand = c(
7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87,
7.10, 8.00, 7.89, 8.15, 9.10, 8.86, 8.90, 8.87,
9.26, 9.30, 8.75, 7.95, 7.65, 7.27, 8.30, 8.50,
8.75, 9.21, 8.27, 7.67, 7.93, 9.26
),
fresh_price = c(
3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80,
3.80, 3.85, 3.90, 3.90, 3.70, 3.75, 3.75, 3.80,
3.70, 3.80, 3.70, 3.80, 3.80, 3.75, 3.70, 3.55,
3.60, 3.65, 3.70, 3.75, 3.80, 3.70
),
ads_expenditure = c(
5.5, 6.75, 7.25, 5.5, 7.0, 6.5, 6.75, 5.25,
5.25, 6.0, 6.5, 6.25, 7.0, 6.9, 6.8, 6.8,
7.1, 7.0, 6.8, 6.5, 6.25, 6.0, 6.5, 7.0,
6.8, 6.8, 6.5, 5.75, 5.8, 6.8
),
size = c(
"Small", "Big", "Big", "Small", "Big", "Small", "Big", "Small",
"Small", "Small", "Small", "Small", "Big", "Big", "Big", "Big",
"Big", "Big", "Big", "Small", "Small", "Small", "Small", "Big",
"Big", "Big", "Small", "Small", "Small", "Big"
),
ads_campaign = c(
"B", "B", "B", "A", "C", "A", "C", "C",
"B", "C", "A", "C", "C", "A", "B", "B",
"B", "A", "B", "B", "C", "A", "A", "A",
"A", "B", "C", "B", "C", "C"
),
competitor_price = c(
3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85,
3.65, 4.00, 4.10, 4.00, 4.10, 4.20, 4.10, 4.10,
4.20, 4.30, 4.10, 3.75, 3.75, 3.65, 3.90, 3.65,
4.10, 4.25, 3.65, 3.75, 3.85, 4.25
)
)
# first few (6) line of data
head(fresh_data)
```
Here, we have
- $y \implies \text{demand for Fresh (in $100 000$s)}$
- $x_{1} \implies \text{price for Fresh (in $10$ rands)}$
- $x_{2} \implies \text{ad expenditure to promote Fresh (in $1000$ rands)}$
- $x_{3} \implies \text{size of the company (big or small)}$
- $x_{4} \implies \text{ads campaign used by the company}$. (A: TV campaigns, B: Mixture of TV and Radio ads, C: Mixture of TV, radio, magazine and newspaper ads)
- $x_{5} \implies \text{average competitor price for liquid detergents}$
:::
# **Correlation Analysis**
For multiple linear regression, we calculate the correlation between $y$ and each independent variable, $x_{j}$. So, for $p$ independent variables, we end up having $p$ correlation coefficients.
::: callout-note
Later, we will be interested in looking at the correlation between each of the independent variables. As it turns out, this is important for discussing how much value the explanatory variables add to explaining how $y$ changes.
:::
Since we want the correlation coefficients between all of our variables, a convenient way of doing this is by creating a **correlation matrix**, as opposed to manually calculating the correlation between all the variables.
```{r}
#########################
# CORRELATION MATRIX
#########################
fresh_numeric <- fresh_data[, c(1,2,3,6)]
# here, we only used variables from the dataset which are numeric (or continuous)
# because we can create a correlation matrix for those without introducing
# any new intricacies
# later on, we will see how we many include "catgeorical varibales", c(4,5)
cor(fresh_numeric)
```
- For **demand** and **Fresh price** : $r=-0.469$ which indicates a moderate negative correlation between demand and the price of *Fresh*
##
- For **demand** and **ads expenditure** : $r=0.882$ which indicates a strong positive relationship between demand and ads expenditure
- For **demand** and **competitor price** : $r=0.754$ which indicates a fairly strong positive relationship between demand and competitor price
## **Correlation Matrix and Pairwise Plot**
```{r, message=FALSE, warning=FALSE, echo=FALSE}
###############################
# CORRELATION AND MATRIX PLOT
###############################
# correlation matrix
library(ggcorrplot)
cor_matrix <- cor(fresh_numeric, use = "complete.obs")
ggcorrplot(cor_matrix,
type = "lower", # shows only lower triangle
lab = TRUE, # shows correlation values
lab_size = 3,
colors = c("blue", "white", "red"),
title = "Correlation Matrix Plot",
ggtheme = theme_minimal())
# pairwise plot
library(GGally)
ggpairs(fresh_numeric,
title="Pairwise Plot")
```
A **correlation matrix plot** is a visual representation of a correlation matrix. Here, each cell represents the correlation between two variables, and contains the value for $r$ for each pair of variables.
- Red blocks indicate a positive correlation, and the intensity increases with the strength of correlation
- Blue blocks indicate a negative correlation and the intensity increases with the strength of correlation
- White blocks are where there is little to no correlation between variables
A **pairwise plot** plots every single variable against every other variable, and shows
- **scatter plots** in the lower triangle. This gives a visual idea of the relationship between variables, and a qualitative idea of how strong it is
- **density curves** along the diagonal which show the spread of the values of a particular variable. This gives an idea about whether the values are skewed or centred; and
- **correlation coefficients** between two variables in the upper triangle
## **Multicollinearity**
**Multicollinearity** occurs when there are high correlations between the independent variables in a multiple linear regression analysis. This is not a good thing, because it introduces redundancies in explaining the variability in the dependent variable.
For our example,
- there is no concern in the correlation between price and competitor price since we have that $r=0.078$. This is a very weak positive correlation
- there is some concern in the correlation between price and ads expenditure. Here, $r=-0.469$. This is a relatively strong positive correlation and is significant enough to cause some redundancies in explaining changes the dependent variable
- there is concern in the correlation between competitor price and ads expenditure since $r=0.605$. This is a moderately strong positive correlation between the two variables
::: callout-note
For now, we continue with this example to explore other things we are able to do in MLR using R. However, in the real world, we might have wanted to do away with some variables provided we have thoroughly analysed the impacts of doing so.
:::
### **Consequences of Multicollinearity for a Regression Analysis**
Multicollinearity is problematic because we do not know which variable explains the variability in the dependent variable, and this causes uncertainty in the parameter estimates.
It affects analyses in the following ways:
- Inflated standard error, and therefore inflated $p$-values
- removing/adding correlated independent variables from the model results in larger changes to parameter estimates
- An increase in RSE when adding a correlated independent variable to the model which causes the errors in the model to be bigger. This lowers the accuracy of the model
# **Multiple Linear Regression with Continuous Explanatory Variables**
Here, we only consider a multiple linear regression analysis using only numeric variables. If our data contains categorical variables – which, in our case, it does – we do not include them in the regression analysis.
Later on, we will look at what to do so as to include categorical variables.
## **Performing Multiple Linear Regression in R**
```{r}
############################################
# FITTING A MULTIPLE LINEAR REGRESSION MODEL
############################################
fresh_fit <- lm(demand ~ fresh_price + ads_expenditure + competitor_price, data=fresh_data)
summary(fresh_fit)
```
We get our regression equation as:
$$\hat{y}=7.0705-2.2826(\text{fresh price})+0.5162(\text{ads expenditure})+1.6530(\text{competitor price})$$
## **Interpreting the** $\beta$ **Coefficients**
::: {.callout-important title="General Interpretation of the Coefficients"}
**Interpretation of** $\beta_{0}$ : it is the average estimated value of $y$ when all $x_{j}=0$ for $j \in \{1,2,3, \dots, p\}$
**Interpretation of** $\hat{\beta}_{j}$ : it is the average estimated change in $y$ for a unit increase in $x_{j}$, holding all the other independent variables constant
:::
For our example,
$\hat{\beta}_{0}$ : on average, the demand for Fresh is $7.0705\times100000=707050$ bottles when all the indepedendent variables take on a value of $0$.
$\hat{\beta}_{1}$ : on average, the demand for Fresh decreases by $2.2826\times100000=228260$ bottles for each additional $\text{R}10$ increase in the price of fresh, holding all the other independent variables constant.
$\hat{\beta}_{2}$ : on average, the demand for Fresh increases by $0.5162\times100000=51620$ bottles for each additional $\text{R}1000$ increase in the ads expenditure to promote Fresh, ceteris paribus.
$\hat{\beta}_{3}$ : on average, the demand for Fresh increases by $1.6530\times100000=165300$ bottles for each additional $\text{R}10$ increase in the price for Fresh, ceteris paribus.
## **Testing the Significance of the** $\beta$ **Estimates**
In multiple linear regression, we do a significance test on each of the $\beta_{j}$ estimates separately. The test proceeds as we normally would do it. Suppose we do a significance test for the ads expenditure estimate. Then, we would have that
$$H_{0} : \beta_{2}=0$$
$$\text{and}$$
$$H_{1}:\beta_{2}\neq0$$
are our null and alternative hypotheses. We define $\alpha=0.05$. The test statistic still follows a $t$ distribution, and we have that
$$\begin{align*}
t=\frac{\hat{\beta}_{j}-\beta_{j}}{se(\hat{\beta}_{j})} \sim t_{n-p-1}
\end{align*}$$
with $p$ being the number of independent variables used to model the dependent variable. For our case, we get that
$$\begin{align*}
t &=\frac{0.5162-(0)}{0.1201} \sim t_{30-3-1}\\
&=4.298 \sim t_{27}
\end{align*}$$
::: callout-note
The test statistic is also given in the **model summary**
:::
We can, then, calculate the $p$-value associated with the test statistic:
```{r}
####################################
# CALCULATING THE P-VALUE MANUALLY
####################################
pval <- 2*pt(q=4.298, df=26, lower.tail=FALSE)
pval
```
This value could have also be obtained from the model summary.
We then reject the null hypothesis since the $p$-value is less than the significance level, and conclude that there is significant evidence of a linear relationship between the ads expenditure and the demand of Fresh.
## **Confidence Intervals for** $\beta_{j}$ **Estimates**
We can get a confidence interval around our estimates. These are ranges of values for which we a relatively sure that they contain the true $\beta$ values. We calculate them using
$$\text{CI}=\beta_{j}\pm t_{\alpha/2, n-p-1}\times se(\hat{\beta_{j}})$$
As an example, we can calculate a $95\%$ confidence interval for the ads expenditure as follows:
```{r}
##########################################
# CRITICAL VALUE FOR CONF. INT.
##########################################
# critical value
tcrit <- qt(p=0.975, df=26)
tcrit
```
and so we have that
$$\begin{align*}
\text{CI} &= 0.5162 \pm 2.06\times0.1201\\
&=[0.27,0.76]
\end{align*}$$
## **Checking Overall Model Significance**
The way we check overall model significance for multiple linear regression is the same as the way we check it for simple linear regression. Again, we begin by gathering some information
| **Source of Variation** | $\text{df}$ | **Sum of Squares** | **Mean Squares** | **F** |
|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| **Regression** | $p$ | $SS_{reg}=(\hat{y}_{i}-\bar{y})^{2}$ | $MS_{reg}=\frac{SS_{reg}}{p}$ | $F=\frac{MS_{reg}}{MSE}$ |
| **Error (Residuals)** | $n-p-1$ | $SSE=(y_{i}-\hat{y}_{i})^{2}$ | $MSE=\frac{SSE}{n-p-1}$ | |
| **Total** | $n-1$ | $SS_{tot}=(y_{i}-\bar{y})^{2}$ | | |
We the use the $\text{f}$-statistic to perform an $\text{F}$ test. As before, we have that $\sqrt{MSE}=RSE$.
Our null and alternative hypotheses are given by
$$H_{0}: \beta_{1}=\beta_{2}=\dots=\beta{p}=0 \text{ (null model)}$$
$$\text{and}$$
$$H_{1}:\text{at least one $\beta_{j}\neq0$}$$
We perform the significance test at the $5\%$ level. We move on to calculating the test statistic.
```{r}
########################################################
# CALCULATING THE TEST STATISTIC FROM FIRST PRINCIPLES
########################################################
p <- length(coef(fresh_fit)[-1]) # number of beta estimates (less the
#interecpt)
n <- nobs(fresh_fit) #number of observations
# Regression
y <- fresh_data$demand
ybar <- mean(y)
yhat <- fitted(fresh_fit)
SS_reg <- sum((yhat-ybar)^2)
# Error (Residuals)
SSE <- sum((y-yhat)^2)
# MS Regression
MS_reg <- SS_reg/p
# MS Errors
MSE <- SSE/(n-p-1)
# F statistic
f <- MS_reg/MSE
f
```
We have, then, that our test statistic is given by $\text{F}=82.82$ with $27$ degrees of freedom. Then,
```{r}
##############
# P-VALUE
##############
pval <- pf(q=82.82327, df1=p, df2=n-p-1, lower.tail=F)
pval
```
Since our $p$-value is much less than the significance level we have defined, we reject the null hypothesis and conclude that there is significant evidence that our model is different from the null model.
## **Coefficient of Determination**
For multiple linear regression, the coefficient of determination, $R^{2}$, is given by:
$$R^{2}=\frac{SS_{reg}}{SS_{tot}}=\frac{\sum(y_{i}-\hat{y}_{i})^{2}}{\sum(y_{i}-\bar{y})^{2}}$$
There is an issue here:
- $SS_{tot}$ only depends on $y_{i}$, and so it does not change when we add new model predictors
- In MLS, adding a new variable gives the model more **flexibility** (or degrees of freedom) since we can always set the new coefficient to zero and reproduce the old fit.
- $SS_{reg}$ can only stay the same or increase when adding new variables to the model. This means that the value of $R^{2}$ never decreases when adding new variables. This **rewards model overfitting**.
The issue here is that if we use $R^{2}$ to compare models, models with more variables will always be favoured even when they are worse-off than smaller models. This motivates the use of the **adjusted coefficient of determination** to compare models.
### **Adjusted Coefficient of Determination**
The adjusted coefficient of determination adjusts the $R^{2}$ value to consider the size of the model. In this way, it penalises the addition of new explanatory variables. It is defined by:
$$R^{2}_{\text{adj}}=1-\left(\frac{n-1}{n-p-1}\right)(1-R^{2})$$
We still have the bound
$$0\leq R^{2}_{\text{adj}} \leq 1$$
The adjusted $R^{2}$ describes the proportion of variation in the dependent variable that is explained by the proportion of variation in the independent variables, taking the sample size and number of independent variables into account.
We can obtain it from the model summary.
```{r}
summary(fresh_fit)
```
From this, we can see that the coefficient of determination is $0.8943$.
## **Model Checking**
For multiple linear regression, we have the same model assumptions as in simple linear regression. Those are
- Linearity
- Normal distribution of errors with a mean of $0$
- Constant error variance
- Independence of errors
As such, we check the model in the same way.
```{r}
##############################
# CHECKING MODEL ASSUMPTIONS
##############################
#assessing linearity
plot(fresh_data$fresh_price, fresh_data$demand)
plot(fresh_data$ads_expenditure, fresh_data$demand)
plot(fresh_data$competitor_price, fresh_data$demand)
# assessing normality of errors
# histogram
hist(residuals(fresh_fit))
# qqplot
qqnorm(residuals(fresh_fit))
qqline(residuals(fresh_fit))
# constant error variance
plot(fresh_fit, which=1, add.smooth=F)
# independence of errors
plot(residuals(fresh_fit))
```
From these plots, it looks like our data meets all of the assumptions.
## **Prediction**
Prediction for multiple linear regression works in the same manner as with simple linear regression, only with the adjustment that we need to provide values for all the independent variables to make a prediction. We already have that:
$$\hat{y}=7.0705-2.2826(\text{fresh price})+0.5162(\text{ads expenditure})+1.6530(\text{competitor price})$$
Suppose we want to know the demand predicted from Fresh when the price of Fresh is $\text{R}50$, the average competitor price is $\text{R}55$, and the advertising expenditure is $\text{R}8000$. Then, the predicted demand would be given by
$$\begin{align*}
\hat{y} &= 7.0705-2.2826(5)+0.5126(8)+1.6530(5.5)\\
&= 8.8786
\end{align*}$$
So, the predicted demand would be $8.8786\times100000=887860$ bottles.
### **Confidence and Prediction Intervals**
Because of how complicated manual methods of finding prediction and confidence intervals are for multiple linear regression, we stick to using R to obtain these.
```{r}
#####################################
# PREDICTING FOR A SINGLE INDIVIDUAL
#####################################
newdat <- list(fresh_price=5, ads_expenditure=8, competitor_price=5.5)
# makes it easier on the eye
predict(fresh_fit, newdata=newdat, interval="prediction")
###############################
# PREDICTING FOR AN AVERAGE
###############################
predict(fresh_fit, newdata=newdat, interval="confidence")
```
- **Prediction interval** : $[7.297,10.46]$
- **Confidence interval** : $[7.365, 10.39]$
# **Multiple Linear Regression with Categorical Explanatory Variables**
Sometimes, our datasets include both numerical and categorical variables. Before, we bypassed this by removing the categorical variables, but these variables may be very important. So, we introduce a way to include them into the model.
Let us start by considering a variable with two **levels**.
::: callout-note
**Levels** refer to the number of subcategories that exist for a categorical variable.
:::
Take the *company* *size* variable from our data which has the levels "big" and "small". We can make the category of company sizes binary, letting small take a value of $0$ and "big" take a value of $1$. This allows us to treat the categorical variable as numeric while still preserving the fact that it has two levels. Our regression equation will, then, be given by
$$\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\hat{\beta}_{3}x_{3i}+\hat{\beta}_{4}x_{4i}$$
with $\hat{\beta}_{4}$ being the change in demand when the size of the company is big versus when it is small.
***recall** : we chose "big" to take on the value of* $1$ *(and "small" to take on a value of* $0$*). We interpret this as a change in the independent variable when the independent variable takes on a value of* $1$ *compared to* $0$. *The reason why* *we do this is because when the company is small, the last term falls away, and so does not affect the demand. However, when the company is big, the last term is present and affects the demand.*
This treats small as a **reference category**. What happens, geometrically, is that when a company is big the intercept is adjusted by $\hat{\beta}_{4}$. This captures the change in demand when the company is big. Since "small" is the reference category, the intercept parameter captures what happens when a company is small.
## **Including Binary Variables in R**
```{r}
##############################
# DEFAULT REFERENCE CATEGORY
##############################
# this assigns the category that comes first in the alphabet as the
# reference category
fresh_data$size <- as.factor(fresh_data$size) # makes "size" numeric
# viewing data with new changes
str(fresh_data)
contrasts(fresh_data$size) # shows us explicitly which variable is 0 and 1
```
Notice that, in the table, *size* is given as $2$'s and $1$'s. That is the nature of R's output. $1 \rightarrow$ reference category (or $0$). What is important to note here is that "big" is assigned as the reference category since it comes first in the alphabet. But this is not how we established our example; we want "small" to be the reference category. We can do this by manually assigning "small" to be a reference category.
```{r}
################################
# MANUAL REFERENCE CATEGORY
################################
fresh_data$size <- relevel(fresh_data$size, ref="Small")
contrasts(fresh_data$size)
```
As we can see, "small" is now the reference category.
### **What if There are Two or More Levels?**
When there are two or more variables, we do the same thing, but now we will have a **dummy variable** for each variable that is not a reference category.
Suppose we consider adding the *ads campaign* *variable* which has three levels. This would result in two more $\beta$ estimates added to the model. So, our model now becomes
$$\hat{y}_{i}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\hat{\beta}_{3}x_{3i}+\hat{\beta}_{4}x_{6i}+\hat{\beta}_{5}x_{5i}+\hat{\beta}_{6}x_{6i}$$
Assuming that we use "C" as the reference category, $\hat{\beta}_{5}$ would represent the change in demand when the company uses ads campaign "A" as opposed to "C", and $\hat{\beta}_{6}$ would represent the change in demand when the company uses ads campaign "B" as opposed to "C". Since we are using "C" as the reference category, we need to manually assign it as the reference category in R.
```{r}
###############################################
# MANUAL REFERENCE CATEGORY FOR ADS CAMPAIGN
###############################################
fresh_data$ads_campaign <- as.factor(fresh_data$ads_campaign)
fresh_data$ads_campaign <- relevel(fresh_data$ads_campaign, ref="C")
contrasts(fresh_data$ads_campaign)
```
## **Regression Analysis with Categorical Variables**
We can, then, perform a full regression analysis including all the variables of the dataset
```{r}
###########################
# FULL REGRESSION ANALYSIS
###########################
fresh_full <- lm(demand ~ fresh_price + ads_expenditure + competitor_price + size + ads_campaign, data = fresh_data)
summary(fresh_full)
```
We then obtain the regression equation as
$$\begin{align*}
\hat{y}= \hspace{0.2em} &7.75-2.23(\text{fresh price})+0.46(\text{ads expenditure})+1.55(\text{competitor price})+\\
& 0.16(\text{size: Big}) - 0.35(\text{ads campaign A}) - 0.2(\text{ads campaign B})
\end{align*}$$
## **Interpretation of the** $\beta$ **Estimates**
::: {.callout-important title="Interpretation of the Parameter Estimates"}
$\beta_{0}$ : it is the average value of $y$ when all the continuous variables are zero and the categorical variables are at the reference category.
$\beta_{continuous_{j}}$ : it is the average estimated change in $y$ for every additional unit increase in $x_{continuous_{j}}$ and all the categorical variables are at the reference category.
$\beta_{categorical_{j}}$ : it is the average estimated change in $y$ when $x_{categorical_{j}}$ takes on a dummy category compared to when it takes on a reference category.
:::
In our example:
$\beta_{0}$ : the average demand for Fresh is $7.75\times100000=775000$ bottles when the price and competitor price are $\text{R}0$, no ads expenditure, the size of the company is small, and the company uses ads campaign A and/or B.
$\beta_{1}$ : the average demand for Fresh decreases by $2.23\times100000=223000$ bottles for every additional $\text{R}10$ increase in the price of Fresh when the company is small, and when the company uses ads campaign C.
$\beta_{3}$ : the average demand of Fresh increases by $0.46\times100000=46000$ bottles for every additional $\text{R}1000$ spent on advertising when the company is small, and the company uses ads campaign C.
$\beta_{3}$ : the average demand for Fresh increases by $1.55\times100000=155000$ bottles for every additional $\text{R}10$ increase in the competitor price for similar detergents when the company is small, and when the company uses ads campaign C.
$\beta_{4}$ : the average demand for Fresh increases by $0.16\times100000=16000$ bottles when the company is big compared to when it is small
$\beta_{5}$ : the average demand for Fresh decreases by $0.35\times100000=35000$ bottles when the company uses ads campaign A compared to when the company uses ads campaign C
$\beta_{6}$ : the average demand for Fresh decreases by $0.2\times100000=20000$ bottles when the company uses ads campaign B compared to C
The effect of a altering levels of a category on a multiple linear regression model is that it shifts the model up and down.
{fig-align="center" width="416"}
## **Testing the Significance of the** $\beta_{j}$ **Estimates**
The test is the same as it is for continuous variables, and only the hypotheses and conclusion differ.
**If** $H_{0}$ **is rejected**: we conclude that there is significant evidence in a difference in $y$ between the dummy category and the reference category
**If** $H_{0}$ **is not rejected**: we conclude that there is no evidence of a significant difference in $y$ between the dummy category and the reference category.
::: callout-note
Of course, this comes from a null hypotheses of no difference in $y$ between between the dummy category and the reference category. The alternative hypothesis stands to reason that there is a significant difference.
:::
# **Interactions**
The regression models we have looked at so far are called **additive regression models**. Their regression equation is characterised by a linear combination of the independent variables.
Sometimes, however, we may have reason to belive that the relationship between $y$ and some independent variable $x_{i}$ is affected by another independent variable $x_{k}$. In this case, we have to modify the effect of $x_{i}$ on $y$ for different values of $x_{k}$. To do this, we introduce an interaction term.
Suppose we have a regression model where the effect of $x_{1}$ is influence by $x_{2}$. Then, our model my look like:
$$\hat{y}=\hat{\beta}_{0}+\hat{\beta}_{1}x_{1i}+\hat{\beta}_{2}x_{2i}+\hat{\beta}_{3}x_{1i}x_{2i}$$
with $\hat{\beta}_{3}x_{1i}x_{2i}$ being the **interaction term**. The effect of $x_{1}$ on $y$ is now given by $\hat{\beta}_{1}$ and $\hat{\beta}_{3}x_{3i}$. We can do this in R.
```{r}
###########################
# MULTIPLICATIVE MODELS
###########################
fresh_inter <- lm(demand ~ fresh_price * ads_campaign,
data=fresh_data)
# we use * to show interaction between two variables
# for now, we exclude the other variables if we added the others
# we would have
# lm(demand ~ fresh_price*ads_campaign + ... + ... , data=fresh_data)
summary(fresh_inter)
```
From this, we obtain our **multiplicative regression model** as
$$\begin{align*}
\hat{y}= \hspace{0.2em} &25.76-4.63(\text{fresh price})-14.49(\text{ads campaign A})+21.65(\text{ads campaign B})\\
& +3.82(\text{fresh price * ads campaign A})-5.76(\text{fresh price * ads campaign B})
\end{align*}$$
## **Interpreting Interaction Terms**
$\hat{\beta}_{4}$ : this is the adjustment of the slope of fresh price when the advertising campaign is A compared to C
$\hat{\beta}_{5}$ : this is the adjustments of the slope of fresh price when the advertising campaign is B compared to C.
This is the visual effect of interaction terms when we have different category levels:
{fig-align="center" width="657"}