Question 3.7.1 (Difficulty Level - 1)

1. Describe the null hypotheses to which the p-values given in Table 3.4 correspond. Explain what conclusions you can draw based on these p-values. Your explanation should be phrased in terms of sales, TV, radio, and newspaper, rather than in terms of the coefficients of the linear model.

Ans: For each of the predictors “TV”, “radio”, and “newspaper”, the null hypothesis states that the advertising budget for that medium is not associated with the response “sales”, holding the other two budgets fixed. From the results in Table 3.4, and assuming a significance level of 0.05, the p-values for “TV” and “radio” are extremely small, so we reject the null hypothesis for those two media: TV and radio advertising budgets are associated with sales. The p-value for “newspaper”, however, is large, so we fail to reject the null hypothesis and conclude that newspaper advertising is not associated with sales once TV and radio spending are accounted for. This may seem surprising, but radio and newspaper budgets have a fairly high positive correlation of about 0.35: markets that spend more on radio tend to have higher sales and also tend to spend more on newspapers, so in a simple regression newspaper acts as a surrogate for radio and appears to be associated with sales.
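Table 3.4 can be reproduced directly in R (a minimal sketch, assuming the Advertising.csv file from the book’s website has been downloaded to the working directory):

adv <- read.csv("Advertising.csv")                       # assumed local copy of the Advertising data
summary(lm(sales ~ TV + radio + newspaper, data = adv))  # coefficient table with the p-values discussed above
cor(adv$radio, adv$newspaper)                            # roughly 0.35, the correlation noted above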

Question 3.7.3 (Difficulty Level - 2 or 3)

Suppose we have a data set with five predictors, X1 = GPA, X2 = IQ, X3 = Level (1 for College and 0 for High School), X4 = Interaction between GPA and IQ, and X5 = Interaction between GPA and Level. The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model, and get β̂0 = 50, β̂1 = 20, β̂2 = 0.07, β̂3 = 35, β̂4 = 0.01, β̂5 = −10.

(a) Which answer is correct, and why?

i. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates.

ii. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates.

iii. For a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.

iv. For a fixed value of IQ and GPA, college graduates earn more, on average, than high school graduates provided that the GPA is high enough.

Ans: Statement iii is correct: for a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is high enough.

Let the fixed values of GPA and IQ be x1’ and x2’. Then:

Salary (high school, Level = 0) = 50 + 20x1’ + 0.07x2’ + 35(0) + 0.01(x1’·x2’) − 10(x1’·0) = 50 + 20x1’ + 0.07x2’ + 0.01(x1’·x2’)

Salary (college, Level = 1) = 50 + 20x1’ + 0.07x2’ + 35 + 0.01(x1’·x2’) − 10x1’

Subtracting the two:

Salary (college) − Salary (high school) = 35 − 10x1’

This difference is non-negative when 35 − 10x1’ ≥ 0, i.e. x1’ ≤ 3.5, and non-positive when 35 − 10x1’ ≤ 0, i.e. x1’ ≥ 3.5.

Hence, for a fixed value of IQ and GPA, high school graduates earn more, on average, than college graduates provided that the GPA is greater than 3.5 (at a GPA of exactly 3.5 the two groups earn the same, on average).
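A quick numerical check of this crossover (a small sketch coded directly from the fitted coefficients given in the question):

# Fitted salary model: level = 1 for college, 0 for high school
salary <- function(gpa, iq, level) {
  50 + 20 * gpa + 0.07 * iq + 35 * level + 0.01 * gpa * iq - 10 * gpa * level
}
salary(3.0, 110, 1) - salary(3.0, 110, 0)   # +5: below GPA 3.5, college graduates earn more
salary(4.0, 110, 1) - salary(4.0, 110, 0)   # -5: above GPA 3.5, high school graduates earn more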

(b) Predict the salary of a college graduate with IQ of 110 and a GPA of 4.0.

Ans: Salary = 50 + 20(4.0) + 0.07(110) + 35 + 0.01(110 × 4.0) − 10(4.0) = 50 + 80 + 7.7 + 35 + 4.4 − 40 = 137.1 (in thousands of dollars), i.e. a predicted starting salary of $137,100.
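The same prediction can be checked with the salary() sketch defined above:

salary(gpa = 4.0, iq = 110, level = 1)   # 137.1, i.e. $137,100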

(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there is very little evidence of an interaction effect. Justify your answer.

Ans: False. The magnitude of a coefficient by itself says nothing about statistical significance; significance depends on the size of the estimate relative to its standard error (the t-statistic and its p-value). Moreover, the GPA×IQ interaction term is multiplied by a product that can be large (e.g. GPA = 4 and IQ = 110 gives 440), so even a small coefficient can correspond to a meaningful effect. We would need the standard error or p-value of β̂4 to judge the evidence for an interaction.

Question 3.7.10 (Difficulty Level - 1)

This question should be answered using the Carseats data set.

(a) Fit a multiple regression model to predict Sales using Price, Urban, and US.

library(ISLR2)
car = Carseats

summary(lm(car$Sales ~ car$Price + factor(car$Urban) + factor(car$US)))
## 
## Call:
## lm(formula = car$Sales ~ car$Price + factor(car$Urban) + factor(car$US))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9206 -1.6220 -0.0564  1.5786  7.0581 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          13.043469   0.651012  20.036  < 2e-16 ***
## car$Price            -0.054459   0.005242 -10.389  < 2e-16 ***
## factor(car$Urban)Yes -0.021916   0.271650  -0.081    0.936    
## factor(car$US)Yes     1.200573   0.259042   4.635 4.86e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.472 on 396 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2335 
## F-statistic: 41.52 on 3 and 396 DF,  p-value: < 2.2e-16

(b) Provide an interpretation of each coefficient in the model. Be careful—some of the variables in the model are qualitative!

Ans:

“car$Price” - Holding the other predictors fixed, each additional $1 in price is associated with a decrease in sales of about 0.054 thousand units, i.e. roughly 54 fewer car seats sold.

“car$Urban” - Whether or not the store is located in an urban area has no statistically significant association with sales; the p-value of 0.936 means we fail to reject the null hypothesis that this coefficient is zero.

“car$US” - Holding the other predictors fixed, a store located in the US is expected to sell about 1.2 thousand (roughly 1,200) more car seats than a store located outside the US.

(c) Write out the model in equation form, being careful to handle the qualitative variables properly.

Ans: Sales = 13.043469 − 0.054459 × Price − 0.021916 × Urban + 1.200573 × US, where Urban = 1 if the store is in an urban area and 0 otherwise, and US = 1 if the store is in the US and 0 otherwise.
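The 0/1 coding that R uses for the qualitative variables can be confirmed with contrasts(); both Urban and US are coded 1 for “Yes” and 0 for “No”, which is the coding used in the equation above.

contrasts(Carseats$Urban)   # Yes = 1, No = 0
contrasts(Carseats$US)      # Yes = 1, No = 0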

(d) For which of the predictors can you reject the null hypothesis H0 : βj = 0?

Ans: For the predictors “Price” and “US” we can reject the null hypothesis.

(e) On the basis of your response to the previous question, fit a smaller model that only uses the predictors for which there is evidence of association with the outcome.

summary(lm(car$Sales ~ car$Price + car$US))
## 
## Call:
## lm(formula = car$Sales ~ car$Price + car$US)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9269 -1.6286 -0.0574  1.5766  7.0515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.03079    0.63098  20.652  < 2e-16 ***
## car$Price   -0.05448    0.00523 -10.416  < 2e-16 ***
## car$USYes    1.19964    0.25846   4.641 4.71e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.469 on 397 degrees of freedom
## Multiple R-squared:  0.2393, Adjusted R-squared:  0.2354 
## F-statistic: 62.43 on 2 and 397 DF,  p-value: < 2.2e-16

(f) How well do the models in (a) and (e) fit the data?

Ans: Both models explain about 24% of the variation in Sales (R² = 0.2393 in each case), so dropping Urban loses essentially nothing; the smaller model in (e) even has a slightly higher adjusted R² (0.2354 vs 0.2335) and a slightly smaller residual standard error (2.469 vs 2.472). The fit is modest rather than strong: roughly three quarters of the variation in Sales remains unexplained.
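The two models can also be compared formally with an F-test (a small sketch; mod_full and mod_small are new helper objects that refit the models from (a) and (e)):

mod_full  <- lm(Sales ~ Price + Urban + US, data = car)   # model from (a)
mod_small <- lm(Sales ~ Price + US, data = car)           # model from (e)
anova(mod_small, mod_full)   # dropping Urban changes the residual sum of squares only negligibly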

(g) Using the model from (e), obtain 95% confidence intervals for the coefficient(s).

mod1 = lm(car$Sales ~ car$Price + car$US)
confint(mod1)
##                   2.5 %      97.5 %
## (Intercept) 11.79032020 14.27126531
## car$Price   -0.06475984 -0.04419543
## car$USYes    0.69151957  1.70776632

(h) Is there evidence of outliers or high leverage observations in the model from (e)?

Ans: The diagnostic plots below suggest no clear outliers: the standardized residuals all lie roughly within ±3. The residuals-versus-leverage plot, however, shows a few observations whose leverage is well above the average value (p + 1)/n = 3/400 ≈ 0.0075, so there is some evidence of high-leverage observations, although none of them appears to be strongly influential.

par(mfrow = c(2,2))
plot(mod1)
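The visual impression can be backed up numerically (a quick sketch using the mod1 object fitted above):

sum(abs(rstudent(mod1)) > 3)     # number of observations with |studentized residual| > 3
max(hatvalues(mod1))             # largest leverage value
length(coef(mod1)) / nrow(car)   # average leverage (p + 1)/n = 3/400, for comparison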

Question 3.7.14 (Difficulty Level = 1 or 2)

(a) This problem focuses on the collinearity problem. Perform the following commands in R. The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?

set.seed (1)
x1 <- runif (100)
x2 <- 0.5 * x1 + rnorm (100) / 10
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm (100)

Ans: The linear model is:

Y = 2 + 2(x1) + 0.3(x2) + ε, where ε follows a normal distribution with mean 0 and standard deviation 1.

The regression coefficients are: intercept β0 = 2, β1 = 2, and β2 = 0.3.

(b) What is the correlation between “x1” and “x2” ? Create a scatter plot displaying the relationship between the variables.

cor(x1, x2)
## [1] 0.8351212
plot(x1, x2)

Ans: The correlation between x1 and x2 is 0.835.

(c) Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are ˆ β0, ˆ β1, and ˆ β2? How do these relate to the true β0, β1, and β2? Can you reject the null hypothesis H0 : β1 = 0? How about the null hypothesis H0 : β2 = 0?

summary(lm(y ~ x1 + x2))
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8311 -0.7273 -0.0537  0.6338  2.3359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
## x1            1.4396     0.7212   1.996   0.0487 *  
## x2            1.0097     1.1337   0.891   0.3754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.056 on 97 degrees of freedom
## Multiple R-squared:  0.2088, Adjusted R-squared:  0.1925 
## F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05

Ans: The estimated coefficients are intercept = 2.1305, beta1 = 1.4396, and beta2 = 1.0097. Only the intercept estimate is close to its true value; beta1 is well below the true value of 2 and beta2 is well above the true value of 0.3, although the standard errors are large enough that the true values still lie within the 95% confidence intervals. At a significance level of 0.05 we can (just barely) reject the null hypothesis for beta1 (p = 0.0487), but we cannot reject it for beta2 (p = 0.3754).
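A quick sketch confirming that the intervals cover the true values:

confint(lm(y ~ x1 + x2))   # the (wide) intervals contain the true beta1 = 2 and beta2 = 0.3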

(d) Now fit a least squares regression to predict “y” using only “x1”. Comment on your results. Can you reject the null hypothesis H0:β1=0 ?

summary(lm(y ~ x1))
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.89495 -0.66874 -0.07785  0.59221  2.45560 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
## x1            1.9759     0.3963   4.986 2.66e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.055 on 98 degrees of freedom
## Multiple R-squared:  0.2024, Adjusted R-squared:  0.1942 
## F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06

Ans: In this simple model the coefficient on x1 is highly significant (p = 2.66e-06), so we can reject H0: β1 = 0 far more confidently than in the full model of (c). The estimate of 1.9759 effectively absorbs both the direct effect of x1 and part of the effect of x2, since x2 was generated from x1.

(e) Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0 : β1 = 0?

summary(lm(y ~ x2))
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.62687 -0.75156 -0.03598  0.72383  2.44890 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
## x2            2.8996     0.6330    4.58 1.37e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.072 on 98 degrees of freedom
## Multiple R-squared:  0.1763, Adjusted R-squared:  0.1679 
## F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05

Ans: In contrast to the full model, the coefficient on x2 is highly significant here (p = 1.37e-05), so we can reject the null hypothesis H0: β1 = 0 for this model.

(f) Do the results obtained in (c)-(e) contradict each other ? Explain your answer.

Ans: At first glance the results appear to contradict each other: each of x1 and x2 is significant when used alone, yet in the joint model of (c) only x1 is (marginally) significant and x2 is not. There is no real contradiction, however. The strong positive correlation of 0.835 between the predictors creates a collinearity problem: collinearity inflates the standard errors of the coefficient estimates, and indeed the standard errors in (c) are much larger than those in (d) and (e). As a result, the contribution of x2 is masked by x1 in the full model, and it is hard to separate their individual effects.
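The degree of collinearity can be quantified with variance inflation factors (a quick sketch using car::vif, which is also used later for the Boston data):

library(car)          # provides vif(); also loads carData
vif(lm(y ~ x1 + x2))  # both VIFs are about 3.3, reflecting the 0.835 correlation between x1 and x2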

(g) Now suppose we obtain one additional observation, which was unfortunately mismeasured. Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on the each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.

x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y <- c(y, 6)

summary(lm(y ~ x1 + x2)) ## scenario (c)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.73348 -0.69318 -0.05263  0.66385  2.30619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
## x1            0.5394     0.5922   0.911  0.36458    
## x2            2.5146     0.8977   2.801  0.00614 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.075 on 98 degrees of freedom
## Multiple R-squared:  0.2188, Adjusted R-squared:  0.2029 
## F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06
par(mfrow = c(2,2))
plot((lm(y ~ x1 + x2)))

summary(lm(y ~ x1)) ## scenario (d) with new data
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8897 -0.6556 -0.0909  0.5682  3.5665 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
## x1            1.7657     0.4124   4.282 4.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.111 on 99 degrees of freedom
## Multiple R-squared:  0.1562, Adjusted R-squared:  0.1477 
## F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05
par(mfrow = c(2,2))
plot((lm(y ~ x1)))

summary(lm(y ~ x2)) ## scenario (e) with new data
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.64729 -0.71021 -0.06899  0.72699  2.38074 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
## x2            3.1190     0.6040   5.164 1.25e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 99 degrees of freedom
## Multiple R-squared:  0.2122, Adjusted R-squared:  0.2042 
## F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06
par(mfrow = c(2,2))
plot((lm(y ~  x2)))

Ans: With the new observation included, x1 is no longer significant in the full model, although on its own it is still significant; x2 is significant in every model. In the full model (y ~ x1 + x2) and in the simple model with x2 only, the new observation is a high-leverage point. However, in the model with x1 only, the observation is an outlier rather than a high-leverage point.
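These claims can be verified numerically for the new observation, which is row 101 (a quick sketch; fit_c, fit_d, and fit_e are new helper objects):

fit_c <- lm(y ~ x1 + x2)
fit_d <- lm(y ~ x1)
fit_e <- lm(y ~ x2)
c(hatvalues(fit_c)[101], hatvalues(fit_d)[101], hatvalues(fit_e)[101])  # leverage of obs 101; average leverage (p + 1)/n is about 0.03 / 0.02
c(rstudent(fit_c)[101],  rstudent(fit_d)[101],  rstudent(fit_e)[101])   # studentized residuals of obs 101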

Question 3.7.15 (Difficulty Level = 2 or 3)

This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

(a) For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

dat = Boston

x = dat[, 2:13]                 # the 12 predictors (crim is column 1)
idv = colnames(x)
intercept = vector()
beta1 = vector()
std_err = vector()
t_val = vector()
p_val = vector()
r_squared = vector()
adj_r_squared = vector()

for (i in 1:12) {
  s <- summary(lm(dat$crim ~ x[, i]))   # simple regression of crim on the i-th predictor
  intercept     <- c(intercept,     round(s$coefficients[1, 1], 3))
  beta1         <- c(beta1,         round(s$coefficients[2, 1], 3))
  std_err       <- c(std_err,       round(s$coefficients[2, 2], 3))
  t_val         <- c(t_val,         round(s$coefficients[2, 3], 3))
  p_val         <- c(p_val,         round(s$coefficients[2, 4], 3))
  r_squared     <- c(r_squared,     round(s$r.squared, 3))
  adj_r_squared <- c(adj_r_squared, round(s$adj.r.squared, 3))
}

out <- data.frame("idv" = idv, "intercept" = intercept, "beta1" = beta1,
                  "std_err" = std_err, "t_val" = t_val, "p_val" = p_val,
                  "r_squared" = r_squared, "adj_r_squared" = adj_r_squared)
knitr::kable(out)
| idv     | intercept | beta1  | std_err | t_val  | p_val | r_squared | adj_r_squared |
|---------|-----------|--------|---------|--------|-------|-----------|---------------|
| zn      | 4.454     | -0.074 | 0.016   | -4.594 | 0.000 | 0.040     | 0.038         |
| indus   | -2.064    | 0.510  | 0.051   | 9.991  | 0.000 | 0.165     | 0.164         |
| chas    | 3.744     | -1.893 | 1.506   | -1.257 | 0.209 | 0.003     | 0.001         |
| nox     | -13.720   | 31.249 | 2.999   | 10.419 | 0.000 | 0.177     | 0.176         |
| rm      | 20.482    | -2.684 | 0.532   | -5.045 | 0.000 | 0.048     | 0.046         |
| age     | -3.778    | 0.108  | 0.013   | 8.463  | 0.000 | 0.124     | 0.123         |
| dis     | 9.499     | -1.551 | 0.168   | -9.213 | 0.000 | 0.144     | 0.142         |
| rad     | -2.287    | 0.618  | 0.034   | 17.998 | 0.000 | 0.391     | 0.390         |
| tax     | -8.528    | 0.030  | 0.002   | 16.099 | 0.000 | 0.340     | 0.338         |
| ptratio | -17.647   | 1.152  | 0.169   | 6.801  | 0.000 | 0.084     | 0.082         |
| lstat   | -3.331    | 0.549  | 0.048   | 11.491 | 0.000 | 0.208     | 0.206         |
| medv    | 11.797    | -0.363 | 0.038   | -9.460 | 0.000 | 0.151     | 0.149         |

Ans: Assuming a 5% (0.05) significance level, the slope (beta1) is significantly different from zero for every predictor except “chas”: there appears to be no association between bounding the Charles River and the per capita crime rate. So, taken individually, every predictor other than “chas” shows a statistically significant association with “crim”. Because each model contains a single predictor, R-squared and adjusted R-squared are nearly identical, so either can be read directly. The R-squared values show that each predictor on its own explains only a limited share of the variation in crime rate: “rad” (R² = 0.391) and “tax” (R² = 0.340) explain the most, while “indus”, “nox”, “age”, “dis”, “lstat”, and “medv” each explain more than 10%.

Below, the normal Q-Q plots of the residuals for the models using “rad”, “tax”, “nox”, and “chas” show that the residuals are far from normally distributed for any of these predictors, although the residuals for “rad” and “tax” appear somewhat better behaved than those for the other two.
par(mfrow = c(2,2))
qqnorm((lm(dat$crim ~ dat$rad))$residuals, main = "Residual Distribution plot for 'rad'")
qqline((lm(dat$crim ~ dat$rad))$residuals)
qqnorm((lm(dat$crim ~ dat$tax))$residuals, main = "Residual Distribution plot for 'tax'")
qqline((lm(dat$crim ~ dat$tax))$residuals)
qqnorm((lm(dat$crim ~ dat$nox))$residuals, main = "Residual Distribution plot for 'nox'")
qqline((lm(dat$crim ~ dat$nox))$residuals)
qqnorm((lm(dat$crim ~ dat$chas))$residuals, main = "Residual Distribution plot for 'chas'")
qqline((lm(dat$crim ~ dat$chas))$residuals)

(b) Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0?

summary(lm(dat$crim ~. , data = dat))
## 
## Call:
## lm(formula = dat$crim ~ ., data = dat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.534 -2.248 -0.348  1.087 73.923 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.7783938  7.0818258   1.946 0.052271 .  
## zn           0.0457100  0.0187903   2.433 0.015344 *  
## indus       -0.0583501  0.0836351  -0.698 0.485709    
## chas        -0.8253776  1.1833963  -0.697 0.485841    
## nox         -9.9575865  5.2898242  -1.882 0.060370 .  
## rm           0.6289107  0.6070924   1.036 0.300738    
## age         -0.0008483  0.0179482  -0.047 0.962323    
## dis         -1.0122467  0.2824676  -3.584 0.000373 ***
## rad          0.6124653  0.0875358   6.997 8.59e-12 ***
## tax         -0.0037756  0.0051723  -0.730 0.465757    
## ptratio     -0.3040728  0.1863598  -1.632 0.103393    
## lstat        0.1388006  0.0757213   1.833 0.067398 .  
## medv        -0.2200564  0.0598240  -3.678 0.000261 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.46 on 493 degrees of freedom
## Multiple R-squared:  0.4493, Adjusted R-squared:  0.4359 
## F-statistic: 33.52 on 12 and 493 DF,  p-value: < 2.2e-16

Ans: At a significance level of 0.05, the null hypothesis H0: βj = 0 can be rejected for the predictors “zn”, “dis”, “rad”, and “medv”; for the remaining predictors we fail to reject it. The intercept of 13.78 is only the predicted crime rate when every predictor equals zero, which lies far outside the range of the data and should not be read as an average crime rate. The predictors “indus”, “chas”, “nox”, “age”, “dis”, “tax”, “ptratio”, and “medv” have negative estimated coefficients, i.e. they are negatively associated with the expected crime rate when the other predictors are held fixed. Together the predictors explain 44.9% of the variation in crime rate (R² = 0.4493), compared with a maximum of 39.1% for any single predictor (“rad”), so the full model explains roughly 6 additional percentage points of variation. The residuals, however, are clearly not normally distributed (the maximum residual is 73.9), so the usual inference assumptions of linear regression are questionable here.
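The significant predictors can also be extracted programmatically from the same fit (a short sketch; full_fit and coefs are new helper objects):

full_fit <- lm(dat$crim ~ ., data = dat)
coefs <- summary(full_fit)$coefficients
rownames(coefs)[-1][coefs[-1, 4] < 0.05]   # "zn", "dis", "rad", "medv"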

(c) How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis.That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

z = summary(lm(dat$crim ~. , data = dat))$coefficients
h = data.frame(z[,1])
comp = data.frame("idv" = idv,
                  "simple_model" = out[,3],
                  "multiple_model" = h[-1,1])
comp
##        idv simple_model multiple_model
## 1       zn       -0.074   0.0457100386
## 2    indus        0.510  -0.0583501107
## 3     chas       -1.893  -0.8253775522
## 4      nox       31.249  -9.9575865471
## 5       rm       -2.684   0.6289106622
## 6      age        0.108  -0.0008482791
## 7      dis       -1.551  -1.0122467382
## 8      rad        0.618   0.6124653115
## 9      tax        0.030  -0.0037756465
## 10 ptratio        1.152  -0.3040727572
## 11   lstat        0.549   0.1388005968
## 12    medv       -0.363  -0.2200563590
plot(comp$simple_model, comp$multiple_model)

library(car)
## Loading required package: carData
data.frame(vif(lm(dat$crim ~. , data = dat)))
##         vif.lm.dat.crim......data...dat..
## zn                               2.323944
## indus                            3.983627
## chas                             1.093242
## nox                              4.546642
## rm                               2.201688
## age                              3.088678
## dis                              4.280979
## rad                              7.029796
## tax                              9.195493
## ptratio                          1.969732
## lstat                            3.538098
## medv                             3.663205
knitr::kable(round(cor(dat), 2))
|         | crim  | zn    | indus | chas  | nox   | rm    | age   | dis   | rad   | tax   | ptratio | lstat | medv  |
|---------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|---------|-------|-------|
| crim    | 1.00  | -0.20 | 0.41  | -0.06 | 0.42  | -0.22 | 0.35  | -0.38 | 0.63  | 0.58  | 0.29    | 0.46  | -0.39 |
| zn      | -0.20 | 1.00  | -0.53 | -0.04 | -0.52 | 0.31  | -0.57 | 0.66  | -0.31 | -0.31 | -0.39   | -0.41 | 0.36  |
| indus   | 0.41  | -0.53 | 1.00  | 0.06  | 0.76  | -0.39 | 0.64  | -0.71 | 0.60  | 0.72  | 0.38    | 0.60  | -0.48 |
| chas    | -0.06 | -0.04 | 0.06  | 1.00  | 0.09  | 0.09  | 0.09  | -0.10 | -0.01 | -0.04 | -0.12   | -0.05 | 0.18  |
| nox     | 0.42  | -0.52 | 0.76  | 0.09  | 1.00  | -0.30 | 0.73  | -0.77 | 0.61  | 0.67  | 0.19    | 0.59  | -0.43 |
| rm      | -0.22 | 0.31  | -0.39 | 0.09  | -0.30 | 1.00  | -0.24 | 0.21  | -0.21 | -0.29 | -0.36   | -0.61 | 0.70  |
| age     | 0.35  | -0.57 | 0.64  | 0.09  | 0.73  | -0.24 | 1.00  | -0.75 | 0.46  | 0.51  | 0.26    | 0.60  | -0.38 |
| dis     | -0.38 | 0.66  | -0.71 | -0.10 | -0.77 | 0.21  | -0.75 | 1.00  | -0.49 | -0.53 | -0.23   | -0.50 | 0.25  |
| rad     | 0.63  | -0.31 | 0.60  | -0.01 | 0.61  | -0.21 | 0.46  | -0.49 | 1.00  | 0.91  | 0.46    | 0.49  | -0.38 |
| tax     | 0.58  | -0.31 | 0.72  | -0.04 | 0.67  | -0.29 | 0.51  | -0.53 | 0.91  | 1.00  | 0.46    | 0.54  | -0.47 |
| ptratio | 0.29  | -0.39 | 0.38  | -0.12 | 0.19  | -0.36 | 0.26  | -0.23 | 0.46  | 0.46  | 1.00    | 0.37  | -0.51 |
| lstat   | 0.46  | -0.41 | 0.60  | -0.05 | 0.59  | -0.61 | 0.60  | -0.50 | 0.49  | 0.54  | 0.37    | 1.00  | -0.74 |
| medv    | -0.39 | 0.36  | -0.48 | 0.18  | -0.43 | 0.70  | -0.38 | 0.25  | -0.38 | -0.47 | -0.51   | -0.74 | 1.00  |

Ans: The largest difference between the simple and multiple regression coefficients is for “nox”: its coefficient is 31.25 in the simple model but -9.96 in the multiple model, and the coefficients of the other predictors change as well (several change sign). The difference arises because a simple regression coefficient measures the total association between a predictor and crime rate, ignoring the other variables, whereas a multiple regression coefficient measures the association while holding all the other predictors fixed. When the predictors are strongly correlated with one another, these two quantities can be very different. The variance inflation factors confirm substantial collinearity: using a conservative rule of thumb (VIF above roughly 2.5, with values above 5 to 10 usually regarded as serious), the variables indus, nox, age, dis, rad, tax, lstat, and medv all contribute to the multi-collinearity problem. In particular, “nox” is highly negatively correlated with “dis” (-0.77) and highly positively correlated with “age” (0.73) and “indus” (0.76), which helps explain its sign flip. The coefficient scatter plot above is dominated by this single point for “nox”, which sits far from the coefficients of the remaining predictors.

(d) Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form Y = β0 + β1X + β2X² + β3X³ + ε.

ZN = summary(lm(dat$crim ~ poly(dat$zn, 3)))
ZN
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$zn, 3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.821 -4.614 -1.294  0.473 84.130 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        3.6135     0.3722   9.709  < 2e-16 ***
## poly(dat$zn, 3)1 -38.7498     8.3722  -4.628  4.7e-06 ***
## poly(dat$zn, 3)2  23.9398     8.3722   2.859  0.00442 ** 
## poly(dat$zn, 3)3 -10.0719     8.3722  -1.203  0.22954    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.372 on 502 degrees of freedom
## Multiple R-squared:  0.05824,    Adjusted R-squared:  0.05261 
## F-statistic: 10.35 on 3 and 502 DF,  p-value: 1.281e-06

Ans: For the predictor “zn”, the cubic term is not statistically significant but the quadratic term is. Hence, there is evidence of a non-linear (quadratic) relationship between “zn” and the crime rate.
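The contribution of the non-linear terms can also be tested with an F-test comparing the linear fit to the cubic fit (a quick sketch for “zn”; the same comparison can be made for any of the predictors below):

anova(lm(dat$crim ~ dat$zn), lm(dat$crim ~ poly(dat$zn, 3)))   # F-test for the added quadratic and cubic terms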

Indus = summary(lm(dat$crim ~ poly(dat$indus, 3)))
Indus
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$indus, 3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.278 -2.514  0.054  0.764 79.713 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            3.614      0.330  10.950  < 2e-16 ***
## poly(dat$indus, 3)1   78.591      7.423  10.587  < 2e-16 ***
## poly(dat$indus, 3)2  -24.395      7.423  -3.286  0.00109 ** 
## poly(dat$indus, 3)3  -54.130      7.423  -7.292  1.2e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.423 on 502 degrees of freedom
## Multiple R-squared:  0.2597, Adjusted R-squared:  0.2552 
## F-statistic: 58.69 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “indus”, both the quadratic and cubic terms are statistically significant, so there is clear evidence of a non-linear (cubic) relationship between “indus” and the crime rate.

Nox = summary(lm(dat$crim ~ poly(dat$nox, 3)))
Nox
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$nox, 3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.110 -2.068 -0.255  0.739 78.302 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.6135     0.3216  11.237  < 2e-16 ***
## poly(dat$nox, 3)1  81.3720     7.2336  11.249  < 2e-16 ***
## poly(dat$nox, 3)2 -28.8286     7.2336  -3.985 7.74e-05 ***
## poly(dat$nox, 3)3 -60.3619     7.2336  -8.345 6.96e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.234 on 502 degrees of freedom
## Multiple R-squared:  0.297,  Adjusted R-squared:  0.2928 
## F-statistic: 70.69 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “nox”, both the quadratic and cubic terms are statistically significant, so there is clear evidence of a non-linear (cubic) relationship between “nox” and the crime rate.

Age = summary(lm(dat$crim ~ poly(dat$age, 3)))
Age
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$age, 3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.762 -2.673 -0.516  0.019 82.842 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.6135     0.3485  10.368  < 2e-16 ***
## poly(dat$age, 3)1  68.1820     7.8397   8.697  < 2e-16 ***
## poly(dat$age, 3)2  37.4845     7.8397   4.781 2.29e-06 ***
## poly(dat$age, 3)3  21.3532     7.8397   2.724  0.00668 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared:  0.1742, Adjusted R-squared:  0.1693 
## F-statistic: 35.31 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “age”, both the quadratic and cubic terms are statistically significant, so there is evidence of a non-linear (cubic) relationship between “age” and the crime rate.

Dis = summary(lm(dat$crim ~ poly(dat$dis, 3)))
Dis
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$dis, 3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.757  -2.588   0.031   1.267  76.378 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.6135     0.3259  11.087  < 2e-16 ***
## poly(dat$dis, 3)1 -73.3886     7.3315 -10.010  < 2e-16 ***
## poly(dat$dis, 3)2  56.3730     7.3315   7.689 7.87e-14 ***
## poly(dat$dis, 3)3 -42.6219     7.3315  -5.814 1.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.331 on 502 degrees of freedom
## Multiple R-squared:  0.2778, Adjusted R-squared:  0.2735 
## F-statistic: 64.37 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “dis”, both the quadratic and cubic terms are statistically significant, so there is evidence of a non-linear (cubic) relationship between “dis” and the crime rate.

Rad = summary(lm(dat$crim ~ poly(dat$rad, 3)))
Rad
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$rad, 3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.381  -0.412  -0.269   0.179  76.217 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.6135     0.2971  12.164  < 2e-16 ***
## poly(dat$rad, 3)1 120.9074     6.6824  18.093  < 2e-16 ***
## poly(dat$rad, 3)2  17.4923     6.6824   2.618  0.00912 ** 
## poly(dat$rad, 3)3   4.6985     6.6824   0.703  0.48231    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.682 on 502 degrees of freedom
## Multiple R-squared:    0.4,  Adjusted R-squared:  0.3965 
## F-statistic: 111.6 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For the predictor “rad”, the cubic term is not statistically significant but the quadratic term is. Hence, there is evidence of a non-linear (quadratic) relationship between “rad” and the crime rate.

Tax = summary(lm(dat$crim ~ poly(dat$tax, 3)))
Tax
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$tax, 3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.273  -1.389   0.046   0.536  76.950 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         3.6135     0.3047  11.860  < 2e-16 ***
## poly(dat$tax, 3)1 112.6458     6.8537  16.436  < 2e-16 ***
## poly(dat$tax, 3)2  32.0873     6.8537   4.682 3.67e-06 ***
## poly(dat$tax, 3)3  -7.9968     6.8537  -1.167    0.244    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.854 on 502 degrees of freedom
## Multiple R-squared:  0.3689, Adjusted R-squared:  0.3651 
## F-statistic:  97.8 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For the predictor “tax”, the cubic term is not statistically significant but the quadratic term is. Hence, there is evidence of a non-linear (quadratic) relationship between “tax” and the crime rate.

Ptratio = summary(lm(dat$crim ~ poly(dat$ptratio, 3)))
Ptratio
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$ptratio, 3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.833 -4.146 -1.655  1.408 82.697 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              3.614      0.361  10.008  < 2e-16 ***
## poly(dat$ptratio, 3)1   56.045      8.122   6.901 1.57e-11 ***
## poly(dat$ptratio, 3)2   24.775      8.122   3.050  0.00241 ** 
## poly(dat$ptratio, 3)3  -22.280      8.122  -2.743  0.00630 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.122 on 502 degrees of freedom
## Multiple R-squared:  0.1138, Adjusted R-squared:  0.1085 
## F-statistic: 21.48 on 3 and 502 DF,  p-value: 4.171e-13

Ans: For “ptratio”, both the quadratic and cubic terms are statistically significant, so there is evidence of a non-linear (cubic) relationship between “ptratio” and the crime rate.

Lstat = summary(lm(dat$crim ~ poly(dat$lstat, 3)))
Lstat
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$lstat, 3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.234  -2.151  -0.486   0.066  83.353 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.6135     0.3392  10.654   <2e-16 ***
## poly(dat$lstat, 3)1  88.0697     7.6294  11.543   <2e-16 ***
## poly(dat$lstat, 3)2  15.8882     7.6294   2.082   0.0378 *  
## poly(dat$lstat, 3)3 -11.5740     7.6294  -1.517   0.1299    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.629 on 502 degrees of freedom
## Multiple R-squared:  0.2179, Adjusted R-squared:  0.2133 
## F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “lstat”, the quadratic term is statistically significant at the 5% level (p = 0.0378) but the cubic term is not (p = 0.1299), so there is evidence of a quadratic, but not cubic, relationship between “lstat” and the crime rate.

Medv = summary(lm(dat$crim ~ poly(dat$medv, 3)))
Medv
## 
## Call:
## lm(formula = dat$crim ~ poly(dat$medv, 3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.427  -1.976  -0.437   0.439  73.655 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           3.614      0.292  12.374  < 2e-16 ***
## poly(dat$medv, 3)1  -75.058      6.569 -11.426  < 2e-16 ***
## poly(dat$medv, 3)2   88.086      6.569  13.409  < 2e-16 ***
## poly(dat$medv, 3)3  -48.033      6.569  -7.312 1.05e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.569 on 502 degrees of freedom
## Multiple R-squared:  0.4202, Adjusted R-squared:  0.4167 
## F-statistic: 121.3 on 3 and 502 DF,  p-value: < 2.2e-16

Ans: For “medv”, both the quadratic and cubic terms are highly statistically significant, so there is strong evidence of a non-linear (cubic) relationship between “medv” and the crime rate.
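A compact way to summarise part (d) is to collect the p-values of the quadratic and cubic terms for every quantitative predictor in one table (a sketch; “chas” is excluded because it is a binary dummy variable, so a degree-3 polynomial cannot be formed for it):

preds <- setdiff(colnames(dat), c("crim", "chas"))
poly_p <- t(sapply(preds, function(v) {
  fit <- lm(dat$crim ~ poly(dat[[v]], 3))
  coef(summary(fit))[3:4, 4]   # p-values of the quadratic and cubic terms
}))
colnames(poly_p) <- c("p_quadratic", "p_cubic")
round(poly_p, 4)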