Quiz 2B: Multiple Linear Regression

##################
# READING IN DATA
##################

salary_data <- read.csv("salary_simulated.csv")
options(digits=12)  # print numeric output to 12 significant digits

Question 1

In the additive multiple linear regression model, the predicted yearly salary of someone with 10 years of experience and some tertiary education is R_______: 

#################
# ADDITIVE MLR
#################

salary_data$tertiary_edu <- as.factor(salary_data$tertiary_edu)

model <- lm(salary ~ experience + tertiary_edu, data=salary_data)
summary(model)

Call:
lm(formula = salary ~ experience + tertiary_edu, data = salary_data)

Residuals:
         Min           1Q       Median           3Q          Max 
-44.77222568  -9.68118502   1.50047851   9.06660315  40.81434771 

Coefficients:
                    Estimate   Std. Error  t value   Pr(>|t|)    
(Intercept)     27.673514285  2.921162434  9.47346 1.8392e-15 ***
experience       2.888323022  0.199924016 14.44710 < 2.22e-16 ***
tertiary_eduYes  7.488917712  3.146338110  2.38020   0.019258 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.4540898 on 97 degrees of freedom
Multiple R-squared:  0.685469626,   Adjusted R-squared:  0.678984464 
F-statistic: 105.698144 on 2 and 97 DF,  p-value: < 2.220446e-16

################
# PREDICTIONS
################

predict(model, newdata=data.frame(experience=10, tertiary_edu="Yes"))
            1 
64.0456622121 

Converting from units of R10000 to rand: \(64.04566... \times 10000\approx\text{R}640456.62\)
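The `predict()` call evaluates the fitted additive equation \(\hat y = b_0 + b_1 x + b_2 D\), where \(D = 1\) for tertiary education and \(D = 0\) otherwise. A minimal sketch that reproduces the prediction by hand, assuming the coefficients copied (rounded) from the `summary(model)` output above; the helper `additive_pred` is hypothetical:

```r
# Coefficients copied from summary(model) (additive model); they are rounded to
# the printed digits, so results agree with predict() only to about 1e-8.
b <- c(intercept = 27.673514285,
       experience = 2.888323022,
       tertiary_eduYes = 7.488917712)

# Hypothetical helper: predicted salary in units of R10000
additive_pred <- function(experience, tertiary) {
  d <- as.numeric(tertiary == "Yes")  # dummy variable: 1 for "Yes", 0 for "No"
  b[["intercept"]] + b[["experience"]] * experience + b[["tertiary_eduYes"]] * d
}

y_hat <- additive_pred(10, "Yes")
print(y_hat)                     # units of R10000
print(round(y_hat * 10000, 2))   # rand
```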

Question 2

In the multiplicative multiple linear regression model, the final estimated intercept coefficient when a person has had a tertiary education is equal to:

######################
# MULTIPLICATIVE MLR
######################

modelX <- lm(salary ~ experience * tertiary_edu, data=salary_data)
summary(modelX)

Call:
lm(formula = salary ~ experience * tertiary_edu, data = salary_data)

Residuals:
         Min           1Q       Median           3Q          Max 
-43.66407769  -9.17612138   2.11603601   8.80404203  39.60443103 

Coefficients:
                               Estimate   Std. Error  t value   Pr(>|t|)    
(Intercept)                29.279883918  3.388467112  8.64104 1.2286e-13 ***
experience                  2.736681728  0.257292895 10.63645 < 2.22e-16 ***
tertiary_eduYes             3.615367758  5.195606481  0.69585    0.48820    
experience:tertiary_eduYes  0.383447760  0.409140203  0.93720    0.35101    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.4637895 on 96 degrees of freedom
Multiple R-squared:  0.688321327,   Adjusted R-squared:  0.678581368 
F-statistic: 70.6698415 on 3 and 96 DF,  p-value: < 2.220446e-16

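The intercept for the tertiary-education group is the baseline intercept plus the `tertiary_eduYes` coefficient: \(29.2798... + 3.615367... \approx 32.90\)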
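Writing the fitted multiplicative model out separately for each level of `tertiary_edu` makes the addition explicit (coefficients rounded from the `summary(modelX)` output, with \(x\) the years of experience):

\[
\begin{aligned}
\text{No:}\quad \hat{y} &= 29.2799 + 2.7367\,x,\\
\text{Yes:}\quad \hat{y} &= (29.2799 + 3.6154) + (2.7367 + 0.3834)\,x = 32.8953 + 3.1201\,x.
\end{aligned}
\]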

Question 3

In the additive multiple linear regression model, the estimate of the slope coefficient for experience is equal to:

\(2.89\)

Question 4

The correlation analysis reveals that the sample correlation between salary and experience is equal to:

cor(salary_data$salary, salary_data$experience)
[1] 0.816761436203
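`cor()` computes Pearson's sample correlation \(r = \frac{\sum (x_i-\bar x)(y_i-\bar y)}{\sqrt{\sum (x_i-\bar x)^2 \sum (y_i-\bar y)^2}}\). The salary CSV is not reproduced here, so this sketch checks the formula on R's built-in `mtcars` data as a stand-in (the choice of `wt` and `mpg` is arbitrary):

```r
# Stand-in data: built-in mtcars (not the salary data)
x <- mtcars$wt
y <- mtcars$mpg

# Pearson correlation from its definition
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)  # default method is "pearson"
print(c(manual = r_manual, builtin = r_builtin))
```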

Question 5

In the additive multiple linear regression model, the upper bound of the 95% confidence interval for the intercept parameter is equal to:

confint(model)
                         2.5 %        97.5 %
(Intercept)     21.87581547243 33.4712130975
experience       2.49152919230  3.2851168508
tertiary_eduYes  1.24430751343 13.7335279098

\(33.47\)
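Each row of `confint(model)` is \(b_j \pm t_{0.975,\,97}\,\mathrm{SE}(b_j)\), using the model's 97 residual degrees of freedom. A sketch reproducing the intercept row from the estimate and standard error printed by `summary(model)`:

```r
b0    <- 27.673514285  # intercept estimate, additive model
se_b0 <- 2.921162434   # its standard error
df    <- 97            # residual degrees of freedom

t_crit <- qt(0.975, df)                 # two-sided 95% critical value
ci_b0  <- b0 + c(-1, 1) * t_crit * se_b0
print(ci_b0)                            # lower and upper bounds
```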

Question 6

In the multiplicative multiple linear regression model, the residual standard error is equal to:

\(15.46\)
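The residual standard error is \(s = \sqrt{\mathrm{SSE}/(n-p-1)}\): the square root of the residual sum of squares divided by the residual degrees of freedom (96 for the multiplicative model). The salary data are not reproduced here, so this sketch checks the formula against `sigma()` on the built-in `mtcars` data, with a hypothetical interaction model `mpg ~ wt * factor(am)` standing in for `modelX`:

```r
# Stand-in interaction model on built-in data (not the salary data)
fit <- lm(mpg ~ wt * factor(am), data = mtcars)

sse      <- sum(residuals(fit)^2)  # residual sum of squares
df_resid <- df.residual(fit)       # n - p - 1 = 32 - 3 - 1
rse      <- sqrt(sse / df_resid)

print(c(manual = rse, sigma = sigma(fit)))
```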

Question 7

In the additive multiple linear regression model, the test statistic for the overall model significance is equal to:

\(105.70\)
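The overall F-statistic can be recovered from the multiple R-squared: \(F = \frac{R^2/p}{(1-R^2)/(n-p-1)}\), here with \(p = 2\) slope terms and \(n - p - 1 = 97\), so \(n = 100\). A sketch using the value printed for the additive model:

```r
r2 <- 0.685469626  # multiple R-squared, additive model
p  <- 2            # slope terms: experience, tertiary_eduYes
n  <- 100          # 97 residual df + 2 slopes + 1 intercept

f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
print(f_stat)
```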

Question 8

In the multiplicative multiple linear regression model, the lower bound of the 95% confidence interval on the yearly salary of a person with 4 years of experience and some tertiary education is equal to R______:

#################
# PREDICTION
#################

predict(modelX, newdata=data.frame(experience=4, tertiary_edu="Yes"), interval="confidence")
            fit           lwr           upr
1 45.3757696294 39.3505071305 51.4010321283

\(39.350507... \times 10000\approx\text{R}393505.07\)
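The `fit` column is the simplified "Yes" line evaluated at 4 years of experience, \(\hat y = (b_0 + b_2) + (b_1 + b_3)\cdot 4\); the interval around it additionally needs the coefficient covariance matrix, which `predict()` takes from the fitted object. A sketch of the point estimate and the rand conversion, assuming the rounded coefficients printed by `summary(modelX)`:

```r
# Coefficients copied from summary(modelX)
b0 <- 29.279883918  # (Intercept)
b1 <- 2.736681728   # experience
b2 <- 3.615367758   # tertiary_eduYes
b3 <- 0.383447760   # experience:tertiary_eduYes

x <- 4
fit_yes <- (b0 + b2) + (b1 + b3) * x  # "Yes" line at x = 4 (units of R10000)
print(fit_yes)

lwr <- 39.3505071305                  # lower bound reported by predict()
print(round(lwr * 10000, 2))          # lower bound in rand
```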

Question 9

In the additive multiple linear regression model, the lower bound of the 95% prediction interval for the yearly salary of someone with 7 years of experience but no tertiary education is R_______:

#################
# PREDICTION
#################

predict(model, newdata=data.frame(experience=7, tertiary_edu="No"), interval="prediction")
            fit           lwr           upr
1 47.8917754358 16.9280012397 78.8555496319

\(16.92800... \times 10000\approx\text{R}169280.01\)
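A prediction interval is wider than the confidence interval at the same point because it adds the residual variance: \(\hat y \pm t_{0.975,\,n-p-1}\sqrt{\mathrm{SE}(\hat y)^2 + s^2}\), versus \(\hat y \pm t_{0.975,\,n-p-1}\,\mathrm{SE}(\hat y)\) for the mean response. The salary data are not reproduced here, so this sketch checks both formulas against `predict()` on the built-in `mtcars` data, with a hypothetical additive model `mpg ~ wt + am` standing in for `model`:

```r
# Stand-in additive model on built-in data (not the salary data)
dat <- transform(mtcars, am = factor(am))
fit <- lm(mpg ~ wt + am, data = dat)
new <- data.frame(wt = 3, am = factor("0", levels = levels(dat$am)))

p      <- predict(fit, newdata = new, se.fit = TRUE)
t_crit <- qt(0.975, df.residual(fit))
s      <- sigma(fit)  # residual standard error

ci_manual <- p$fit + c(-1, 1) * t_crit * p$se.fit              # mean response
pi_manual <- p$fit + c(-1, 1) * t_crit * sqrt(p$se.fit^2 + s^2) # new observation

ci_r <- predict(fit, newdata = new, interval = "confidence")
pi_r <- predict(fit, newdata = new, interval = "prediction")
print(ci_r)
print(pi_r)
```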

Question 10

In the multiplicative multiple linear regression model, the degrees of freedom for testing for overall model significance is equal to:

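The F-statistic is reported on 3 and 96 degrees of freedom; the residual (denominator) degrees of freedom are \(96\).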
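The overall F-test compares the fitted model with an intercept-only model, so its numerator degrees of freedom equal the number of slope terms \(p\) and its denominator (residual) degrees of freedom are \(n - p - 1\). A sketch for the multiplicative model, assuming \(n = 100\) (implied by the 97 residual degrees of freedom of the two-slope additive model), with the p-value recomputed from the printed F-statistic:

```r
n <- 100          # implied sample size: 97 residual df + 2 slopes + 1 intercept
p <- 3            # slope terms: experience, tertiary_eduYes, interaction

df1 <- p          # numerator df
df2 <- n - p - 1  # denominator (residual) df
print(c(df1 = df1, df2 = df2))

# Upper-tail p-value of the printed F-statistic on (df1, df2) degrees of freedom
p_val <- pf(70.6698415, df1, df2, lower.tail = FALSE)
print(p_val)
```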

Question 11

In the additive multiple linear regression model, the adjusted R-squared value is equal to:

\(0.68\)
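Adjusted R-squared penalises \(R^2\) for the number of slope terms: \(R^2_{\text{adj}} = 1 - (1 - R^2)\frac{n-1}{n-p-1}\). A sketch reproducing both printed values from the multiple R-squared figures (with \(n = 100\) implied by the residual degrees of freedom), which also shows why the multiplicative model's adjusted value comes out slightly lower; the helper `adj_r2` is hypothetical:

```r
n <- 100

# Hypothetical helper: adjusted R-squared from R-squared, p slopes, n observations
adj_r2 <- function(r2, p, n) 1 - (1 - r2) * (n - 1) / (n - p - 1)

additive       <- adj_r2(0.685469626, p = 2, n = n)  # model
multiplicative <- adj_r2(0.688321327, p = 3, n = n)  # modelX
print(c(additive = additive, multiplicative = multiplicative))
```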

Question 12

Choose the incorrect answer with regard to the additive multiple linear regression model:

  (a) There is some significant linear relationship between a person’s years of experience and their yearly salary.
  (b) There is some significant linear relationship between whether a person has received tertiary education and their yearly salary.
  (c) The overall model fit is statistically significant.
  (d) A person who received some tertiary education but has no experience is expected to earn a lower yearly salary than someone with no tertiary education but with two years of experience.

(c) is the easiest check: it is true, since the \(p\)-value for the F-statistic is below \(2.2\times10^{-16}\). (a) is also true, since the \(t\)-test for the experience slope has a very small \(p\)-value. (b) is true at the \(0.05\) significance level (\(p = 0.019258\)), but would be false at the \(0.01\) level. (d) requires a calculation:

##############
# PREDICTION 
##############

# case 1: tertiary education with no xp 
predict(model, newdata=data.frame(experience=0, tertiary_edu="Yes"))
            1 
35.1624319966 
# case 2: no tertiary education with 2 years xp
predict(model, newdata=data.frame(experience=2, tertiary_edu="No"))
            1 
33.4501603281 

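Case 1 (about \(\text{R}351624.32\)) is higher than case 2 (about \(\text{R}334501.60\)), so statement (d) is false. Since the question asks for the incorrect statement, we choose (d).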
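Because the additive model has a common slope, the comparison in (d) reduces to the tertiary shift versus two years of slope: case 1 minus case 2 is \((b_0 + b_2) - (b_0 + 2b_1) = b_2 - 2b_1\). A sketch with the coefficients printed by `summary(model)`:

```r
b1 <- 2.888323022  # experience slope, additive model
b2 <- 7.488917712  # tertiary_eduYes shift, additive model

gap <- b2 - 2 * b1  # case 1 (Yes, 0 years) minus case 2 (No, 2 years)
print(gap)                    # units of R10000
print(round(gap * 10000, 2))  # rand
```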

Question 13

Choose the correct answer: In the multiplicative multiple linear regression model…

  (a) …the interaction is statistically significant.
  (b) …the intercept term for the simplified model when someone has had no tertiary education is 29.28.
  (c) …the interaction plot shows two parallel lines.
  (d) …the adjusted R-squared value is larger than the adjusted R-squared value for the additive multiple linear regression model.

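(a) is false, since the interaction term's \(p\)-value is \(0.351\). (d) is false, since the multiplicative model's adjusted R-squared (\(0.6786\)) is slightly smaller than the additive model's (\(0.6790\)). (b) is true: for someone with no tertiary education the simplified model keeps only the baseline intercept, \(29.28\). We still need to check (c).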

####################
# INTERACTION PLOT
####################

interaction.plot(
  x.factor=salary_data$experience,        # variable on x-axis
  trace.factor=salary_data$tertiary_edu,  # variable that defines separate lines
  response=salary_data$salary             # response variable
)

The two lines in the plot are not parallel, so (c) is false. The correct answer is therefore (b).
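The interaction plot traces group means of the raw data, but the fitted model leads to the same conclusion: the interaction coefficient is the difference between the two groups' experience slopes, so the fitted lines are parallel only if it is zero. A sketch with the coefficients printed by `summary(modelX)`:

```r
b1 <- 2.736681728  # experience slope for the "No" group
b3 <- 0.383447760  # experience:tertiary_eduYes (difference in slopes)

slope_no  <- b1
slope_yes <- b1 + b3
print(c(No = slope_no, Yes = slope_yes))
```

The slopes differ in the sample, although, as noted for (a), the difference is not statistically significant (\(p = 0.351\)).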

Question 14

What is the interpretation of the experience coefficient from the additive model?

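Holding tertiary education status fixed, the predicted yearly salary increases on average by \(\text{R}28883.23\) (\(2.888323\times 10000\)) for each additional year of experience. Because the additive model has no interaction term, the same increase applies to people with and without tertiary education.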
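In the additive model the experience coefficient is the change in predicted salary for one extra year of experience, and it does not depend on the education group. A sketch checking this with the coefficients printed by `summary(model)`; the helper `additive_pred` and the 5-to-6-year comparison are hypothetical choices for illustration:

```r
b <- c(27.673514285, 2.888323022, 7.488917712)  # intercept, experience, tertiary_eduYes

# Hypothetical helper: prediction in units of R10000; d = 1 for "Yes", 0 for "No"
additive_pred <- function(x, d) b[1] + b[2] * x + b[3] * d

# One extra year of experience (5 -> 6 years) within each education group
gain_no  <- additive_pred(6, 0) - additive_pred(5, 0)
gain_yes <- additive_pred(6, 1) - additive_pred(5, 1)
print(round(c(No = gain_no, Yes = gain_yes) * 10000, 2))  # rand
```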