Quiz 2A: Multiple Linear Regression

#####################
# READING IN DATA
#####################

sales <- read.csv("retail_sales.csv")

Multiple Linear Regression

#############################
# MULTIPLE LINEAR REGRESSION
#############################

model <- lm(sales ~ markup + advertising, data=sales)
summary(model)

Call:
lm(formula = sales ~ markup + advertising, data = sales)

Residuals:
     Min       1Q   Median       3Q      Max 
-1721.57  -610.36    25.09   674.06  2211.27 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1784.04     424.83   4.199 0.000161 ***
markup       -511.73     265.67  -1.926 0.061788 .  
advertising    37.36      13.41   2.787 0.008347 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 851 on 37 degrees of freedom
Multiple R-squared:  0.2017,    Adjusted R-squared:  0.1586 
F-statistic: 4.675 on 2 and 37 DF,  p-value: 0.01548

Question 1

The residual standard error for this model is:

\(851\)
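The residual standard error is \(\hat\sigma = \sqrt{SSE/(n-p-1)}\): the square root of the residual sum of squares divided by the residual degrees of freedom (37 here). Below is a minimal sketch of that formula. Because retail_sales.csv is not bundled with this sheet, it uses R's built-in mtcars data as a stand-in, and the formula mpg ~ wt + hp is chosen purely for illustration.

```r
# Stand-in model: built-in mtcars data, two predictors (illustrative only)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residual standard error by hand: sqrt(SSE / (n - p - 1))
sse <- sum(residuals(fit)^2)
rse_manual <- sqrt(sse / df.residual(fit))

# Compare with the values reported by sigma() and summary()
rse_manual
sigma(fit)
summary(fit)$sigma
```

Run on the retail model instead, sigma(model) should return the (unrounded) value that summary() prints as 851.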

Question 2

The upper bound of the confidence interval for the advertising coefficient is:

######################
# CONFIDENCE INTERVAL
######################

confint(model)
                  2.5 %    97.5 %
(Intercept)   923.25430 2644.8306
markup      -1050.02007   26.5635
advertising    10.19832   64.5222

\(64.52\)
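The interval is \(\hat\beta \pm t_{0.025,\,37}\times SE(\hat\beta)\), where \(t_{0.025,\,37}\) is the upper 2.5% point of a t distribution with 37 degrees of freedom. The sketch below rebuilds it from the rounded estimate and standard error printed by summary(), so it should agree with confint() only to about two decimal places.

```r
# Rounded values copied from the summary(model) output above
est    <- 37.36   # advertising coefficient estimate
se     <- 13.41   # its standard error
df_res <- 37      # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_res)
ci <- est + c(-1, 1) * t_crit * se
round(ci, 2)
```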

Question 3

The degrees of freedom for the t-test to test the significance of the markup coefficient is: (enter your answer as a whole number)

\(\text{df}=n-p-1=40-2-1=37\), where \(n=40\) is the number of observations and \(p=2\) is the number of predictors.
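These 37 degrees of freedom are the ones R uses for every coefficient's t-test. As a rough check, the markup t value and p-value can be recomputed from the printed estimate and standard error. The inputs are rounded, so expect agreement to about three decimals rather than exactly.

```r
# Rounded values copied from the summary(model) output above
est_markup <- -511.73
se_markup  <- 265.67

# t statistic for H0: beta_markup = 0
t_markup <- est_markup / se_markup

# Two-sided p-value from a t distribution with n - p - 1 = 37 df
p_markup <- 2 * pt(-abs(t_markup), df = 40 - 2 - 1)
c(t = t_markup, p = p_markup)
```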

Question 4

The numerator degrees of freedom for the F-test of overall model significance is: (enter your answer as a whole number)

\(2\)

The numerator of the F statistic is \(MS_{reg}\), which has \(p\) degrees of freedom, one per predictor. This model has two predictors (markup and advertising), so \(p=2\).
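Written out, the F statistic is a ratio of mean squares. Its numerator has \(p\) degrees of freedom and its denominator \(n-p-1\), which matches the "2 and 37 DF" in the output:

\[
F = \frac{MS_{reg}}{MS_{res}} = \frac{SS_{reg}/p}{SS_{res}/(n-p-1)} \sim F_{p,\,n-p-1} = F_{2,\,37} \quad \text{under } H_0.
\]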

Question 5

The standard error of the coefficient for advertising is:

\(13.41\)
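Each standard error is the square root of a diagonal element of the estimated coefficient covariance matrix \(\hat\sigma^2 (X^\top X)^{-1}\). Below is a minimal sketch, again using the built-in mtcars data as an illustrative stand-in for the retail data.

```r
# Stand-in model on built-in data (illustrative only)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Covariance matrix of the estimates: sigma^2 * (X'X)^{-1}
X <- model.matrix(fit)
V <- sigma(fit)^2 * solve(crossprod(X))

# Standard errors are the square roots of its diagonal
se_manual <- sqrt(diag(V))
se_manual
coef(summary(fit))[, "Std. Error"]
```

On the retail model, coef(summary(model))["advertising", "Std. Error"] should return the value printed above as 13.41.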

Question 6

The value of the test statistic for the test of overall model significance is:

\(4.675\)
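For a model with an intercept, the overall F statistic can also be written in terms of \(R^2\) as \(F = \dfrac{R^2/p}{(1-R^2)/(n-p-1)}\). The sketch below plugs in the printed (rounded) \(R^2\) and recovers both the F value and its p-value.

```r
# Rounded value copied from the summary(model) output above
r2 <- 0.2017
p  <- 2     # number of predictors
n  <- 40    # observations (37 residual df + 3 coefficients)

# Overall F statistic from R-squared
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Upper-tail p-value from the F(p, n - p - 1) distribution
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

A p-value of about 0.015 lies below 0.05 but above 0.01, which is what Question 11 turns on.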

Question 7

The value of the test statistic for the test of overall model significance is:

  • True

  • False

False

Question 8

The model coefficient estimates that appear in the R output are the values of the population parameters (β0, β1 and β2).

  • True

  • False

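False. The output reports the sample estimates \(\hat\beta_0, \hat\beta_1, \hat\beta_2\). These are statistics computed from the 40 observations and only estimate the unknown population parameters.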
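The distinction shows up clearly in a simulation: draw data from a model whose population parameters are known, then fit it with lm(). Every number below (the "true" coefficients, predictor ranges, noise level and seed) is a hypothetical choice for illustration only, loosely scaled to resemble the retail model.

```r
set.seed(2024)  # arbitrary seed so the run is reproducible
n <- 40

# Hypothetical population parameters (beta0, beta1, beta2)
beta <- c(1800, -500, 40)

# Hypothetical predictor values and noise
markup      <- runif(n, min = 0.5, max = 2.5)
advertising <- runif(n, min = 5, max = 60)
y <- beta[1] + beta[2] * markup + beta[3] * advertising + rnorm(n, sd = 850)

# The fitted coefficients are estimates computed from this one sample
fit <- lm(y ~ markup + advertising)
rbind(true = beta, estimate = coef(fit))
```

Changing the seed gives a new sample and new estimates, while the population parameters stay fixed.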

Question 9

The correlation between sales and markup is high.

  • True

  • False

#####################
# CORRELATION MATRIX
#####################
cor(sales$markup, sales$sales)
[1] -0.1848131

False. The correlation is \(r \approx -0.18\), a weak negative relationship.

Question 10

The lower bound of the confidence interval for the markup coefficient is:

\(-1050.02\)
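The bound can be checked by hand as the estimate minus the critical t value times the standard error. Using the rounded values from the output, with \(t_{0.025,\,37}\approx 2.026\):

\[
\hat\beta_{markup} - t_{0.025,\,37}\,SE(\hat\beta_{markup}) \approx -511.73 - 2.026 \times 265.67 \approx -1050.0,
\]

which agrees with confint() up to rounding.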

Question 11

The test of overall model significance shows that the model is significant at the 1% level.

  • True

  • False

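False. The p-value of the F-test is 0.01548. This is below 0.05 but above 0.01, so the model is significant at the 5% level but not at the 1% level.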

Multiple-Choice

Question 12

In a multiple linear regression model, multicollinearity occurs when:

  a. The independent variables provide complementary information about the dependent variable.
  b. There exists a high degree of correlation between the independent variables and the dependent variable.
  c. There exists a high degree of correlation between the independent variables included in the model.
  d. The dependent variable provides redundant information about the independent variables.
  e. The fitted model yields estimates that are non-linear in form.

(c)
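Multicollinearity concerns the predictors, not the response. The minimal sketch below uses two hypothetical, deliberately near-duplicate predictors to show a high correlation between them. It also computes the variance inflation factor (VIF), a common multicollinearity diagnostic, which with exactly two predictors reduces to \(VIF = 1/(1-r_{12}^2)\).

```r
set.seed(1)  # arbitrary seed for reproducibility
n <- 40

# Hypothetical predictors: x2 is x1 plus a little noise
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)

# Correlation between the two independent variables
r12 <- cor(x1, x2)

# Variance inflation factor for either predictor (two-predictor case)
vif <- 1 / (1 - r12^2)
c(r12 = r12, vif = vif)
```

For the retail data, the analogous check is cor(sales$markup, sales$advertising).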

Question 13

Which of the following interpretations of the advertising coefficient (from the multiple linear regression analysis in the previous part) is correct?

  a. On average, average monthly sales increases by 37.36 sales for every R1000 increase in advertising spend, holding all else constant.
  b. On average, average monthly sales increases by 37.36 sales for every R1 increase in advertising spend, holding all else constant.
  c. On average, advertising spend increases by 37.36 units for a unit increase in average monthly sales, holding all else constant.
  d. We cannot interpret the advertising coefficient because it is not statistically significant.
  e. On average, average monthly sales increases by 1784.04 sales for every unit increase in advertising spend, holding all else constant.

(a)
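"Holding all else constant" means comparing two predictions that differ only in advertising. The sketch below uses the rounded coefficients from the output. The helper predict_sales() and the markup and advertising values passed to it are hypothetical, chosen only to show the difference.

```r
# Rounded coefficients copied from the summary(model) output above
b <- c(intercept = 1784.04, markup = -511.73, advertising = 37.36)

# Hypothetical helper: fitted sales for given predictor values
predict_sales <- function(markup, advertising) {
  b[["intercept"]] + b[["markup"]] * markup + b[["advertising"]] * advertising
}

# Same markup, advertising one unit higher
delta <- predict_sales(markup = 1.5, advertising = 21) -
  predict_sales(markup = 1.5, advertising = 20)
delta
```

The difference equals the advertising coefficient for any markup value held fixed, because the markup terms cancel.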

Question 14

What is the difference between a simple and multiple linear regression model?

A simple linear regression model only uses one explanatory variable to describe the change in the dependent variable whereas a multiple linear regression model uses more than one explanatory variable.

Question 15

What are we testing when performing an F-test of overall model significance?

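Whether at least one slope coefficient is non-zero, i.e. \(H_0: \beta_1 = \dots = \beta_p = 0\) against \(H_a\): at least one \(\beta_j \neq 0\) (here \(p=2\)). Equivalently, it tests whether the model explains significantly more variation in the dependent variable than an intercept-only (null) model.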
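The same F statistic comes out of an explicit comparison between the fitted model and the intercept-only (null) model. Below is a sketch on the built-in mtcars data, used here only as an illustrative stand-in.

```r
# Stand-in data (illustrative only)
null_fit <- lm(mpg ~ 1, data = mtcars)          # intercept only
full_fit <- lm(mpg ~ wt + hp, data = mtcars)    # two predictors

# Nested-model F-test: do wt and hp reduce the residual sum of squares significantly?
cmp <- anova(null_fit, full_fit)
cmp

# The same F value is reported as the overall F statistic by summary()
summary(full_fit)$fstatistic
```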