Quiz 1B: Simple Linear Regression

##############################################
# READING IN DATA, CORRELATION ANALYSIS & SLR
##############################################

ecom <- read.csv("ecom_adv_simulated.csv")

# corr
cor.test(ecom$Advertising_expenditure, ecom$Monthly_sales)

    Pearson's product-moment correlation

data:  ecom$Advertising_expenditure and ecom$Monthly_sales
t = 7.5839, df = 48, p-value = 9.421e-10
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5789249 0.8433389
sample estimates:
     cor 
0.738301 

# SLR
model <- lm(Monthly_sales ~ Advertising_expenditure, data=ecom)
summary(model)

Call:
lm(formula = Monthly_sales ~ Advertising_expenditure, data = ecom)

Residuals:
    Min      1Q  Median      3Q     Max 
-362.07  -80.54   17.19   71.88  258.47 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)               34.747     63.020   0.551    0.584    
Advertising_expenditure   20.734      2.734   7.584 9.42e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 135.6 on 48 degrees of freedom
Multiple R-squared:  0.5451,    Adjusted R-squared:  0.5356 
F-statistic: 57.51 on 1 and 48 DF,  p-value: 9.421e-10

Question 1

The lower bound of the 95% confidence interval for the slope parameter, rounded to two decimal places, is equal to:

confint(model)
                            2.5 %    97.5 %
(Intercept)             -91.96370 161.45687
Advertising_expenditure  15.23666  26.23041

\(15.24\)

Question 2

The upper bound of the 95% confidence interval of the intercept parameter, rounded to two decimal places, is:

\(161.46\)
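
Both bounds (Questions 1 and 2) can also be reproduced by hand as estimate ± critical t-value × standard error. A minimal sketch, assuming only the rounded values printed by summary(model) above; the variable names are illustrative, and the results agree with confint(model) only up to rounding:

```r
# Values copied from the summary(model) coefficient table (rounded)
est <- c(intercept = 34.747, slope = 20.734)
se  <- c(intercept = 63.020, slope = 2.734)
df_resid <- 48                       # n - 2, with n = 50 observations

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df = df_resid)

lower <- est - t_crit * se
upper <- est + t_crit * se
round(cbind(lower, upper), 2)
```

The slope row gives the lower bound for Question 1 and the intercept row gives the upper bound for Question 2.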

Question 3

The value of SSx for this dataset, rounded to two decimal places, is:

###################
# SSx CALCULATION
###################

x <- ecom$Advertising_expenditure
xbar <- mean(ecom$Advertising_expenditure)

# then,

SSx <- sum((x-xbar)^2)
SSx
[1] 2461.371

\(2461.37\)
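
As a sanity check, SSx ties back to the regression output: the standard error of the slope equals the residual standard error divided by the square root of SSx. A rough sketch using the rounded printed values (the names below are illustrative), so the match is approximate:

```r
# Values copied from the output above (rounded)
SSx_printed <- 2461.371   # sum of squared deviations of x
resid_se    <- 135.6      # residual standard error from summary(model)

# Standard error of the slope: s / sqrt(SSx)
se_slope <- resid_se / sqrt(SSx_printed)
round(se_slope, 3)

# Note: with the data loaded, SSx could equivalently be computed as
# (length(x) - 1) * var(x)
```

The result should be close to the slope standard error of 2.734 reported in the coefficient table.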

Question 4

The test statistic for testing whether the slope parameter is significant or not, rounded to two decimal places, is equal to:

\(7.58\)
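
This is the t value in the slope row of the coefficient table, which is simply the estimate divided by its standard error (the null value of the slope is 0). A small sketch with the rounded printed values:

```r
# Slope estimate and standard error from summary(model) (rounded)
slope_est <- 20.734
slope_se  <- 2.734

# t statistic for H0: slope = 0
t_slope <- (slope_est - 0) / slope_se
round(t_slope, 2)
```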

Question 5

The value of the correlation coefficient, rounded to two decimal places, is:

\(0.74\)

Question 6

When fitting a simple linear regression model to this dataset, the standard error for the intercept parameter, rounded to two decimal places, is equal to:

\(63.02\)

Question 7

The degrees of freedom for testing whether the slope parameter is significant or not, rounded to two decimal places, is:

\(48.00\), since we have \(n = 50\) observations and the degrees of freedom are \(n - 2 = 48\)

Question 8

The value of MSresid for this dataset, rounded to two decimal places, is equal to:

#######################
# MS_RESID CALCULATION
#######################

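y <- ecom$Monthly_sales
yfitted <- fitted(model)
# residual degrees of freedom, n - 2 = 48
# (named df_resid to avoid masking R's built-in df() function)
df_resid <- df.residual(model)

# then, 

SS_resid <- sum((y-yfitted)^2)

# and, 

MS_resid <- SS_resid/df_resid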
MS_resid
[1] 18396.82
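
MSresid is also the square of the residual standard error reported by summary(model). A quick check, assuming the printed (rounded) value of 135.6; with the fitted model available, sigma(model)^2 should give the unrounded figure:

```r
# Residual standard error as printed by summary(model) (rounded to 4 s.f.)
resid_se <- 135.6

# Squaring it approximately recovers MS_resid
ms_resid_approx <- resid_se^2
ms_resid_approx

# Relative gap to the exact value computed above; it is due to rounding only
rel_gap <- abs(ms_resid_approx - 18396.82) / 18396.82
rel_gap
```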

Question 9

The test statistic for testing the overall model significance, rounded to two decimal places, is:

\(57.51\)
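
In simple linear regression the overall F statistic equals the square of the slope's t statistic, and it can also be written in terms of R-squared. A sketch using the rounded printed values, so both versions agree with 57.51 only approximately:

```r
# Rounded values from summary(model)
t_slope  <- 7.584
r2       <- 0.5451
df_resid <- 48

# F as the squared t statistic of the slope
f_from_t <- t_slope^2

# F from R-squared: (R^2 / 1) / ((1 - R^2) / (n - 2))
f_from_r2 <- (r2 / 1) / ((1 - r2) / df_resid)

round(c(f_from_t, f_from_r2), 1)
```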

Question 10

The p-value of testing whether the correlation coefficient of this dataset is significant or not, is equal to (round your answers to two decimal places):

\(0.00\)
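
The p-value reported by cor.test is the two-sided tail probability of the t distribution with 48 degrees of freedom. A minimal sketch, assuming the printed t value of 7.5839:

```r
# Test statistic and degrees of freedom from the cor.test output
t_stat   <- 7.5839
df_resid <- 48

# Two-sided p-value: twice the lower-tail probability at -|t|
p_value <- 2 * pt(-abs(t_stat), df = df_resid)
p_value            # on the order of 1e-9
round(p_value, 2)  # rounds to 0 at two decimal places
```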

Question 11

The test statistic for testing whether the correlation coefficient for this dataset is significant or not, is equal to:

\(7.58\)
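
This matches the slope test in Question 4 because, in simple linear regression, the two tests are equivalent. The statistic can be recovered from the correlation alone via \(t = r\sqrt{n-2}/\sqrt{1-r^2}\). A short sketch using the printed correlation:

```r
# Sample correlation and sample size from the output above
r <- 0.738301
n <- 50

# t statistic for H0: true correlation = 0
t_cor <- r * sqrt(n - 2) / sqrt(1 - r^2)
round(t_cor, 2)
```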

Question 12

Perform a hypothesis test to see whether the slope parameter is significant or not. Choose the correct answer from the options below:

  1. We will reject the null hypothesis, since the p-value is 0.58 and thus much greater than the significance level. We can conclude that there is some significant linear relationship between advertising expenditure and monthly sales.
  2. We will reject the null hypothesis, since the p-value is extremely small. We can conclude that there is some significant linear relationship between advertising expenditure and monthly sales.
  3. We will not reject the null hypothesis, since the p-value is extremely small. We can conclude that there is no significant linear relationship between advertising expenditure and monthly sales.
  4. We will not reject the null hypothesis, since the p-value is extremely large. We can conclude that there is no significant linear relationship between advertising expenditure and monthly sales.

We reject the null hypothesis since the \(p\)-value (\(9.42 \times 10^{-10}\)) is extremely small, which rules out options 3 and 4. We then conclude that there is evidence of a significant linear relationship between advertising expenditure and monthly sales. Option 1 makes the false statement that \(p=0.58\); that is the \(p\)-value of the intercept, not the slope. So, we choose option 2.

Question 13

Check the dataset to make sure that it meets the necessary requirements for performing a simple linear regression. Choose the incorrect answer from the list below:

  1. The histogram of the data shows that the residuals do not follow a normal distribution perfectly.
  2. The QQ-plot of the residuals shows deviations from the theoretical quantiles, not just around the endpoints as one would expect to see.
  3. The assumption of constant variance is very clearly not met.
  4. The assumption that the errors are independent is met.

################################
# ASSESSING MODEL ASSUMPTIONS 
################################

# assessing normality of errors
hist(residuals(model))

qqnorm(residuals(model))
qqline(residuals(model))

# constant variance 
plot(model, which=1, add.smooth=FALSE)

# independence of errors
plot(residuals(model))
abline(h=0)

Option 3 is the incorrect statement. The residuals-versus-fitted plot does not show a clear violation of the constant variance assumption.

Question 14

What is the interpretation of the slope coefficient from the regression model?

On average, monthly sales increase by \(20.73\) for every additional unit of advertising expenditure.
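
To illustrate, the fitted line can be evaluated at two advertising levels one unit apart; the difference in predicted sales is exactly the slope. A sketch using the rounded coefficients, where predict_sales and the input values 10 and 11 are purely illustrative:

```r
# Fitted line from summary(model): sales = 34.747 + 20.734 * advertising
predict_sales <- function(advertising) {
  34.747 + 20.734 * advertising
}

# Predicted sales at two hypothetical expenditure levels, one unit apart
sales_at_10 <- predict_sales(10)
sales_at_11 <- predict_sales(11)

# The change in predicted sales per additional unit equals the slope
sales_diff <- sales_at_11 - sales_at_10
round(sales_diff, 2)
```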