ISLR Chapter 3 Question 15 Solution

This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

#Loading the MASS package
library(MASS)
#reading the Boston Dataset
data("Boston")
attach(Boston)

The level of statistical significance is often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.

If the p-value is less than 0.05, the result is conventionally called statistically significant. A p-value is the probability of observing an association at least as strong as the one in the data if the null hypothesis were true; it is not the probability that the null hypothesis is correct. When it falls below 0.05, we reject the null hypothesis in favor of the alternative hypothesis.

The simple linear regression can be defined by a model

$Y = β_0 + β_1 X + ε$, where:

Y = Response variable

X = Predictor variable

$β_0$ = Intercept

$β_1$ = Slope

$ε$ = Irreducible error
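To make the roles of the intercept, slope and error term concrete, here is a minimal sketch on simulated data. The true values (3 and 2), the sample size and the object names are assumptions chosen for illustration; none of this uses the Boston data.

```r
# Simulated data (an assumption for illustration, not the Boston data)
set.seed(1)
x   <- runif(200, min = 0, max = 10)   # predictor X
eps <- rnorm(200, mean = 0, sd = 1)    # error term
y   <- 3 + 2 * x + eps                 # true beta_0 = 3, true beta_1 = 2

sim_fit <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value and p-value
sim_coefs <- coef(summary(sim_fit))
sim_coefs

# p-value for the test of H0: beta_1 = 0
slope_p_value <- sim_coefs["x", "Pr(>|t|)"]
slope_p_value < 0.05
```

The row named `x` of the coefficient table is the one read off in each of the simple regressions that follow: its estimate is the slope and its last column is the p-value for the slope.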

Per capita crime rate (crim) and zn (proportion of residential land zoned for lots over 25,000 sq.ft)

The p-value of the zn coefficient in the crim vs zn model is 5.51e-06, far below 0.05, so we reject the null hypothesis ($H_0$: $β_1$ = 0). We therefore conclude that there is a statistically significant association between crim and zn.

The model’s R squared value of 0.04019 and Adjusted R squared value of 0.03828 are low, which means zn explains only about 4% of the variation in crim. The association is significant but weak.

From the summary statistics of the model,

$crim = β_0 + β_1 \cdot zn$

$crim = 4.45369 - 0.07393 \cdot zn$

#linear model crim ~ zn
linear_fit.zn <- lm(crim ~zn)
summary(linear_fit.zn)

## 
## Call:
## lm(formula = crim ~ zn)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.429 -4.222 -2.620  1.250 84.523 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.45369    0.41722  10.675  < 2e-16 ***
## zn          -0.07393    0.01609  -4.594 5.51e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.435 on 504 degrees of freedom
## Multiple R-squared:  0.04019,    Adjusted R-squared:  0.03828 
## F-statistic:  21.1 on 1 and 504 DF,  p-value: 5.506e-06

#Plotting the model
plot(zn,crim,pch = 20, main = "Relationship between crim and zn")
abline(linear_fit.zn,col = "blue",lwd = 3)
legend("topleft", c( "Regression"), col = c("blue"), lty = c(1, 1))

The graph above shows a weak negative relationship between the per capita crime rate and zn.
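As a quick check on how the fitted equation for zn is used, the sketch below predicts crim for two hypothetical towns and reports a confidence interval for the slope. The model is refit with `data = Boston` so the block runs on its own; the names `fit_zn` and `new_towns` and the zn values 0 and 25 are assumptions chosen for illustration.

```r
library(MASS)  # provides the Boston data set

fit_zn <- lm(crim ~ zn, data = Boston)

# Two hypothetical towns: no large-lot zoning, and 25% large-lot zoning
new_towns <- data.frame(zn = c(0, 25))

# Predicted per capita crime rate: intercept + slope * zn
zn_pred <- predict(fit_zn, newdata = new_towns)
zn_pred

# 95% confidence interval for the zn slope
zn_ci <- confint(fit_zn)["zn", ]
zn_ci
```

With the coefficients reported above, the two predictions are 4.45369 and 4.45369 - 0.07393 × 25 ≈ 2.61. An interval for the slope that excludes zero matches the conclusion drawn from the p-value.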

Per capita crime rate(crim) and Indus (proportion of non-retail business acres per town).

There is a statistically significant relationship between the per capita crime rate and indus. This is because the p-value of the indus coefficient is below 2e-16, which is far less than 0.05.

The model can be described as;

$crim = β_0 + β_1 \cdot indus$

$crim = -2.06374 + 0.50978 \cdot indus$

#linear model crim ~ indus
linear_fit.indus <- lm(crim ~ indus)
summary(linear_fit.indus)

## 
## Call:
## lm(formula = crim ~ indus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.972  -2.698  -0.736   0.712  81.813 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.06374    0.66723  -3.093  0.00209 ** 
## indus        0.50978    0.05102   9.991  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.866 on 504 degrees of freedom
## Multiple R-squared:  0.1653, Adjusted R-squared:  0.1637 
## F-statistic: 99.82 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(indus,crim,pch = 20, main = "Relationship between crim and indus")
abline(linear_fit.indus,col = "red",lwd = 3)
legend("topleft", c( "Regression"), col = c("red"), lty = c(1, 1))

The graph above shows a positive relationship between the per capita crime rate and indus.

Per capita crime rate(crim) and chas (Charles River dummy variable)

The p-value of the model is 0.209, which is much greater than 0.05, so we cannot reject the null hypothesis and chas is not statistically significant. The R-squared value of 0.003124 and Adjusted R squared value of 0.001146 are extremely low, meaning that chas explains almost none of the variation in the per capita crime rate.

#linear model crim ~ chas
linear_fit.chas <- lm(crim ~ chas)
summary(linear_fit.chas)

## 
## Call:
## lm(formula = crim ~ chas)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.738 -3.661 -3.435  0.018 85.232 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.7444     0.3961   9.453   <2e-16 ***
## chas         -1.8928     1.5061  -1.257    0.209    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.597 on 504 degrees of freedom
## Multiple R-squared:  0.003124,   Adjusted R-squared:  0.001146 
## F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094

#Plotting the model
plot(chas,crim,pch = 20, main = "Relationship between crim and chas")
abline(linear_fit.chas,col = "green",lwd = 3)
legend("topleft", c( "Regression"), col = c("green"), lty = c(1, 1))

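Because chas is a 0/1 dummy variable, the points in the graph above fall into two vertical columns. The regression line shows only a small difference in average per capita crime rate between the two groups relative to the spread of the data. We therefore find no evidence of a relationship between chas and the per capita crime rate.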
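Since chas takes only the values 0 and 1, the simple regression on chas is equivalent to comparing the mean crime rate of the two groups with an equal-variance two-sample t-test. The sketch below illustrates that equivalence; the object names are assumptions, and the model is refit with `data = Boston` so the block is self-contained.

```r
library(MASS)  # provides the Boston data set

fit_chas <- lm(crim ~ chas, data = Boston)
lm_p <- coef(summary(fit_chas))["chas", "Pr(>|t|)"]

# Pooled-variance two-sample t-test on the same two groups
tt <- t.test(crim ~ chas, data = Boston, var.equal = TRUE)

# The two p-values agree
c(lm = lm_p, t_test = tt$p.value)

# Group means: the intercept is the mean for chas = 0,
# and intercept + slope is the mean for chas = 1
group_means <- tapply(Boston$crim, Boston$chas, mean)
group_means
```

Read this way, the chas slope of -1.8928 is simply the difference between the two group means, and the p-value of 0.209 says that this difference is not statistically significant.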

Per capita crime rate(crim) and nox (nitrogen oxides concentration)

The p-value of the nox coefficient is below 2e-16, far less than 0.05, which makes the relationship between per capita crime rate and nitrogen oxides concentration statistically significant.

Mathematically we can conclude that;

$crim = β_0 + β_1 \cdot nox$

$crim = -13.720 + 31.249 \cdot nox$

#linear model crim ~ nox
linear_fit.nox <- lm(crim ~ nox)
summary(linear_fit.nox)

## 
## Call:
## lm(formula = crim ~ nox)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.371  -2.738  -0.974   0.559  81.728 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -13.720      1.699  -8.073 5.08e-15 ***
## nox           31.249      2.999  10.419  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.81 on 504 degrees of freedom
## Multiple R-squared:  0.1772, Adjusted R-squared:  0.1756 
## F-statistic: 108.6 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(nox,crim,pch = 20, main = "Relationship between crim and nox")
abline(linear_fit.nox,col = "red",lwd = 3)
legend("topleft", c( "Regression"), col = c("red"), lty = c(1, 1))

The graph above shows a positive relationship between the per capita crime rate and nitrogen oxides concentration (nox). The R squared value of 0.1772 and Adjusted R squared value of 0.1756 indicate that nox explains about 18% of the variation in crim, so the association is significant but not strong.

Per capita crime rate(crim) and rm (average number of rooms per dwelling)

There is a statistically significant association between the per capita crime rate and the average number of rooms per dwelling (rm) because the p-value of 6.35e-07 is smaller than 0.05. But the association is weak, as shown by the low R squared value of 0.04807 and Adjusted R squared value of 0.04618.

#linear model crim ~ rm
linear_fit.rm <- lm(crim ~ rm)
summary(linear_fit.rm)

## 
## Call:
## lm(formula = crim ~ rm)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.604 -3.952 -2.654  0.989 87.197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.482      3.365   6.088 2.27e-09 ***
## rm            -2.684      0.532  -5.045 6.35e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.401 on 504 degrees of freedom
## Multiple R-squared:  0.04807,    Adjusted R-squared:  0.04618 
## F-statistic: 25.45 on 1 and 504 DF,  p-value: 6.347e-07

#Plotting the model
plot(rm,crim,pch = 20, main = "Relationship between crim and rm",
     xlab = "average number of rooms per dwelling 'rm'", ylab = "per capita crime rate 'crim'")
abline(linear_fit.rm,col = "blue",lwd = 3)
legend("topleft", c( "Regression"), col = c("blue"), lty = c(1, 1))

The graph above shows a weak negative relationship between the per capita crime rate and rm; the fitted slope is -2.684.

Per capita crime rate(crim) and age (proportion of owner-occupied units built prior to 1940)

There exists a statistically significant relationship between per capita crime rate and age because of the low p-value of 2.85e-16. The association is fairly weak: the R squared value of 0.1244 and Adjusted R squared value of 0.1227 show that age explains about 12% of the variation in crim.

#linear model crim ~ age
linear_fit.age <- lm(crim ~ age)
summary(linear_fit.age)

## 
## Call:
## lm(formula = crim ~ age)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.789 -4.257 -1.230  1.527 82.849 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.77791    0.94398  -4.002 7.22e-05 ***
## age          0.10779    0.01274   8.463 2.85e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.057 on 504 degrees of freedom
## Multiple R-squared:  0.1244, Adjusted R-squared:  0.1227 
## F-statistic: 71.62 on 1 and 504 DF,  p-value: 2.855e-16

#Plotting the model
plot(age,crim,pch = 20, main = "Relationship between crim and age",
      ylab = "per capita crime rate 'crim'")
abline(linear_fit.age,col = "green",lwd = 3)
legend("topleft", c( "Regression"), col = c("green"), lty = c(1, 1))

The figure above shows a weak positive relationship between per capita crime rate and age.

Per capita crime rate(crim) and dis (weighted mean of distances to five Boston employment centers).

Because the p-value is below 2e-16, there exists a statistically significant association between per capita crime rate and the dis variable. But the association is fairly weak, given the low R squared value of 0.1441 and Adjusted R squared value of 0.1425.

Mathematically we can conclude that;

$crim = β_0 + β_1 \cdot dis$

$crim = 9.4993 - 1.5509 \cdot dis$

#linear model crim ~ dis
linear_fit.dis <- lm(crim ~ dis)
summary(linear_fit.dis)

## 
## Call:
## lm(formula = crim ~ dis)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.708 -4.134 -1.527  1.516 81.674 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.4993     0.7304  13.006   <2e-16 ***
## dis          -1.5509     0.1683  -9.213   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.965 on 504 degrees of freedom
## Multiple R-squared:  0.1441, Adjusted R-squared:  0.1425 
## F-statistic: 84.89 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(dis,crim,pch = 20, main = "Relationship between crim and dis",
     ylab = "per capita crime rate 'crim'")
abline(linear_fit.dis,col = "red",lwd = 3)
legend("topleft", c( "Regression"), col = c("red"), lty = c(1, 1))

The figure above shows a negative relationship between the per capita crime rate and dis; the fitted slope is -1.5509.

Per capita crime rate(crim) and rad (index of accessibility to radial highways).

There is a statistically significant association between the per capita crime rate and rad because the p-value is below 2e-16. The R squared value of 0.3913 and Adjusted R squared value of 0.39 are the highest among the simple regression models, although rad still explains less than 40% of the variation in crim.

#linear model crim ~ rad
linear_fit.rad <- lm(crim ~ rad)
summary(linear_fit.rad)

## 
## Call:
## lm(formula = crim ~ rad)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.164  -1.381  -0.141   0.660  76.433 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.28716    0.44348  -5.157 3.61e-07 ***
## rad          0.61791    0.03433  17.998  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.718 on 504 degrees of freedom
## Multiple R-squared:  0.3913, Adjusted R-squared:   0.39 
## F-statistic: 323.9 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(rad,crim,pch = 20, main = "Relationship between crim and rad",
     ylab = "per capita crime rate 'crim'")
abline(linear_fit.rad,col = "blue",lwd = 3)
legend("topleft", c( "Regression"), col = c("blue"), lty = c(1, 1))

The graph above is consistent with a positive relationship between the per capita crime rate and rad.

Per capita crime rate(crim) and tax (full-value property-tax rate per $10,000).

There is a statistically significant association between the per capita crime rate and tax, because the p-value is below 2e-16. The R squared value of 0.3396 and Adjusted R squared value of 0.3383 show that tax explains about a third of the variation in crim.

#linear model crim ~ tax
linear_fit.tax <- lm(crim ~ tax)
summary(linear_fit.tax)

## 
## Call:
## lm(formula = crim ~ tax)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.513  -2.738  -0.194   1.065  77.696 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.528369   0.815809  -10.45   <2e-16 ***
## tax          0.029742   0.001847   16.10   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.997 on 504 degrees of freedom
## Multiple R-squared:  0.3396, Adjusted R-squared:  0.3383 
## F-statistic: 259.2 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(tax,crim,pch = 20, main = "Relationship between crim and tax",
     ylab = "per capita crime rate 'crim'")
abline(linear_fit.tax,col = "green",lwd = 3)
legend("topleft", c( "Regression"), col = c("green"), lty = c(1, 1))

The graph above shows a positive relationship between the per capita crime rate and tax.

Per capita crime rate(crim) and ptratio (pupil-teacher ratio by town)

There is a statistically significant association between per capita crime rate and ptratio because of the model’s small p-value of 2.94e-11. But the association is weak, given the low R squared value of 0.08407 and Adjusted R squared value of 0.08225.

#linear model crim ~ ptratio
linear_fit.ptratio <- lm(crim ~ ptratio)
summary(linear_fit.ptratio)

## 
## Call:
## lm(formula = crim ~ ptratio)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.654 -3.985 -1.912  1.825 83.353 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17.6469     3.1473  -5.607 3.40e-08 ***
## ptratio       1.1520     0.1694   6.801 2.94e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.24 on 504 degrees of freedom
## Multiple R-squared:  0.08407,    Adjusted R-squared:  0.08225 
## F-statistic: 46.26 on 1 and 504 DF,  p-value: 2.943e-11

#Plotting the model
plot(ptratio,crim,pch = 20, main = "Relationship between crim and ptratio",
     ylab = "per capita crime rate 'crim'", xlab = "pupil-teacher ratio by town (ptratio)")
abline(linear_fit.ptratio,col = "red",lwd = 3)
legend("topleft", c( "Regression"), col = c("red"), lty = c(1, 1))

Based on the above graph, there is a weak positive relationship between the per capita crime rate and ptratio.

Per capita crime rate(crim) and black (1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town).

Based on the p-value, which is below 2e-16 and therefore much less than 0.05, we can state that there is a statistically significant association between the per capita crime rate and the black variable. The association is fairly weak, given the low R squared value of 0.1483 and Adjusted R squared value of 0.1466.

Mathematically we can conclude that;

$crim = β_0 + β_1 \cdot black$

$crim = 16.553529 - 0.036280 \cdot black$

#linear model crim ~ black
linear_fit.black <- lm(crim ~ black)
summary(linear_fit.black)

## 
## Call:
## lm(formula = crim ~ black)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.756  -2.299  -2.095  -1.296  86.822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.553529   1.425903  11.609   <2e-16 ***
## black       -0.036280   0.003873  -9.367   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.946 on 504 degrees of freedom
## Multiple R-squared:  0.1483, Adjusted R-squared:  0.1466 
## F-statistic: 87.74 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(black,crim,pch = 20, main = "Relationship between crim and black",
     ylab = "per capita crime rate 'crim'")
abline(linear_fit.black,col = "blue",lwd = 3)
legend("topleft", c( "Regression"), col = c("blue"), lty = c(1, 1))

The graph above is consistent with a negative relationship between the per capita crime rate and black.

Per capita crime rate(crim) and lstat (lower status of the population (percent)).

The p-value is below 2e-16, far less than 0.05, and therefore we can conclude that there is a statistically significant association between the per capita crime rate and lstat. The association is modest, with an R squared value of 0.2076 and Adjusted R squared value of 0.206.

#linear model crim ~ lstat
linear_fit.lstat <- lm(crim ~ lstat)
summary(linear_fit.lstat)

## 
## Call:
## lm(formula = crim ~ lstat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.925  -2.822  -0.664   1.079  82.862 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.33054    0.69376  -4.801 2.09e-06 ***
## lstat        0.54880    0.04776  11.491  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.664 on 504 degrees of freedom
## Multiple R-squared:  0.2076, Adjusted R-squared:  0.206 
## F-statistic:   132 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(lstat,crim,pch = 20, main = "Relationship between crim and lstat",
     ylab = "per capita crime rate 'crim'", xlab = "lower status of the population (percent)")
abline(linear_fit.lstat,col = "green",lwd = 3)
legend("topleft", c( "Regression"), col = c("green"), lty = c(1, 1))

The graph above shows a positive relationship between the per capita crime rate and lstat.

Per capita crime rate(crim) and medv (median value of owner-occupied homes in $1000s).

There is a statistically significant association between the per capita crime rate and medv because the p-value is below 2e-16. The R squared value of 0.1508 and Adjusted R squared value of 0.1491 are low, which means medv explains only about 15% of the variation in crim.

#linear model crim ~ medv
linear_fit.medv <- lm(crim ~ medv)
summary(linear_fit.medv)

## 
## Call:
## lm(formula = crim ~ medv)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.071 -4.022 -2.343  1.298 80.957 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 11.79654    0.93419   12.63   <2e-16 ***
## medv        -0.36316    0.03839   -9.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.934 on 504 degrees of freedom
## Multiple R-squared:  0.1508, Adjusted R-squared:  0.1491 
## F-statistic: 89.49 on 1 and 504 DF,  p-value: < 2.2e-16

#Plotting the model
plot(medv,crim,pch = 20, main = "Relationship between crim and medv",
     ylab = "per capita crime rate 'crim'", xlab = "median value of owner-occupied homes")
abline(linear_fit.medv,col = "red",lwd = 3)
legend("topleft", c( "Regression"), col = c("red"), lty = c(1, 1))

The graph above shows a negative relationship between the per capita crime rate and medv; the fitted slope is -0.36316.

In conclusion, there is a statistically significant relationship between the predictor and the response for every variable except chas (Charles River dummy). The scatter plots show a trend between each of these predictors and the per capita crime rate, although the points are widely scattered around the fitted lines.

The fitted slopes are positive for indus, nox, age, rad, tax, ptratio and lstat, and negative for zn, rm, dis, black and medv. The R squared and Adjusted R squared values are low for all the above-fitted models, ranging from about 0.003 for chas to 0.39 for rad, and therefore each predictor on its own describes only a small amount of the variation in the crime response.
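The thirteen simple regressions can also be collected into a single table, which makes the comparison of slopes, p-values and R squared values easier to scan. This is a minimal sketch rather than the approach used above: it refits every model with `data = Boston`, and the names `predictors`, `simple_fits` and `simple_table` are assumptions introduced for illustration.

```r
library(MASS)  # provides the Boston data set

predictors <- setdiff(names(Boston), "crim")

# Fit crim ~ predictor for each predictor and keep the slope row of the summary
simple_fits <- lapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "crim"), data = Boston)
  s <- summary(fit)
  data.frame(
    predictor = p,
    slope     = coef(s)[2, "Estimate"],
    p_value   = coef(s)[2, "Pr(>|t|)"],
    r_squared = s$r.squared
  )
})
simple_table <- do.call(rbind, simple_fits)

# Most significant predictors first
simple_table[order(simple_table$p_value), ]

# Predictors that are not significant at the 0.05 level
not_significant <- simple_table$predictor[simple_table$p_value >= 0.05]
not_significant
```

Given the summaries shown above, the only predictor expected in `not_significant` is chas.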

Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis $H_0$: $β_j$ = 0?

#fitting a multiple linear regression between crime response and all other variable (predictors) of the Boston dataset
multiplemodel_fit <- lm(crim ~.,data = Boston)
summary(multiplemodel_fit)

## 
## Call:
## lm(formula = crim ~ ., data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.924 -2.120 -0.353  1.019 75.051 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  17.033228   7.234903   2.354 0.018949 *  
## zn            0.044855   0.018734   2.394 0.017025 *  
## indus        -0.063855   0.083407  -0.766 0.444294    
## chas         -0.749134   1.180147  -0.635 0.525867    
## nox         -10.313535   5.275536  -1.955 0.051152 .  
## rm            0.430131   0.612830   0.702 0.483089    
## age           0.001452   0.017925   0.081 0.935488    
## dis          -0.987176   0.281817  -3.503 0.000502 ***
## rad           0.588209   0.088049   6.680 6.46e-11 ***
## tax          -0.003780   0.005156  -0.733 0.463793    
## ptratio      -0.271081   0.186450  -1.454 0.146611    
## black        -0.007538   0.003673  -2.052 0.040702 *  
## lstat         0.126211   0.075725   1.667 0.096208 .  
## medv         -0.198887   0.060516  -3.287 0.001087 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.439 on 492 degrees of freedom
## Multiple R-squared:  0.454,  Adjusted R-squared:  0.4396 
## F-statistic: 31.47 on 13 and 492 DF,  p-value: < 2.2e-16

A few predictors of the fitted multiple regression model are found to be statistically significant: “zn”, “dis”, “rad”, “black”, and “medv”. dis and rad are significant at the 0.001 level, medv at the 0.01 level, and zn and black at the 0.05 level.

For the remaining variables, the p-values are above 0.05, so we cannot reject the null hypothesis ($H_0$: $β_j$ = 0).

In conclusion, we can only reject the null hypothesis for “zn”, “dis”, “rad”, “black” and “medv”.

The multiple regression model explains less than half of the variation in the per capita crime rate, with an R squared value of 0.454 and an Adjusted R squared value of 0.4396.
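The list of significant predictors can also be read off programmatically instead of by eye. The following sketch refits the full model and filters the coefficient table at the 0.05 level; the object names are assumptions introduced for illustration.

```r
library(MASS)  # provides the Boston data set

full_fit <- lm(crim ~ ., data = Boston)

# Coefficient table without the intercept row
coef_table <- coef(summary(full_fit))
p_values <- coef_table[-1, "Pr(>|t|)"]

# Predictors for which H0: beta_j = 0 is rejected at the 0.05 level
significant <- names(p_values)[p_values < 0.05]
significant

# Share of the variation in crim explained by the full model
summary(full_fit)$r.squared
```

With the output shown above, `significant` should contain zn, dis, rad, black and medv.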

How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point on the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

#Plotting a scatter plot of Multiple regression Vs Univariate regression coefficients 
# Slope (second coefficient) of each simple regression, in the same order
# as the predictors appear in the multiple regression
univeriate_reg <- c(
  coef(linear_fit.zn)[2], coef(linear_fit.indus)[2], coef(linear_fit.chas)[2],
  coef(linear_fit.nox)[2], coef(linear_fit.rm)[2], coef(linear_fit.age)[2],
  coef(linear_fit.dis)[2], coef(linear_fit.rad)[2], coef(linear_fit.tax)[2],
  coef(linear_fit.ptratio)[2], coef(linear_fit.black)[2],
  coef(linear_fit.lstat)[2], coef(linear_fit.medv)[2]
)
# Multiple regression coefficients without the intercept
multiple_reg <- coef(multiplemodel_fit)[-1]

plot(univeriate_reg, multiple_reg, col = "blue",pch =19, ylab = "multiple regression coefficients",
     xlab = "Univariate Regression coefficients",
     main = "Relationship between Multiple regression \n and univariate regression coefficients")

Multiple regression Vs Univariate regression coefficients

The univariate and multiple regression coefficients differ noticeably. This is because the slope of a simple regression model represents the average effect of an increase in the predictor, ignoring the other predictors in the dataset. In multiple regression, the slope represents the average effect of an increase in the predictor while holding the other predictors fixed. The clearest example is nox, whose coefficient is 31.249 in the simple regression but -10.313535 in the multiple regression.

The fitted models above show that multiple regression finds no significant relationship between the per capita crime rate and most of the predictors, while simple regression finds one for nearly every predictor. This is because some predictors are strongly correlated with each other, as shown in the correlation table below.

#correlation between the different variables of the Boston dataset
library(corrplot)

## corrplot 0.84 loaded

corr <-round(cor(Boston[-c(1,4)]),3)
corrplot(corr, method = "number")

Correlation between different variables of the Boston Dataset
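The correlation matrix can also be filtered to list only the most strongly correlated predictor pairs, which are the ones most likely to explain the change in coefficients between the simple and multiple regressions. This is a sketch using base R only; the cutoff of 0.7 and the names `corr_pairs` and `strong_pairs` are assumptions chosen for illustration.

```r
library(MASS)  # provides the Boston data set

# Same matrix as in the corrplot call: drop crim (column 1) and chas (column 4)
corr <- cor(Boston[-c(1, 4)])

# Keep each pair once by blanking the upper triangle and the diagonal
corr[upper.tri(corr, diag = TRUE)] <- NA

# Long format: one row per predictor pair
corr_pairs <- as.data.frame(as.table(corr), stringsAsFactors = FALSE)
names(corr_pairs) <- c("var1", "var2", "correlation")
corr_pairs <- corr_pairs[!is.na(corr_pairs$correlation), ]

# 0.7 is an arbitrary cutoff chosen for this illustration
strong_pairs <- corr_pairs[abs(corr_pairs$correlation) > 0.7, ]
strong_pairs <- strong_pairs[order(-abs(strong_pairs$correlation)), ]
strong_pairs
```

The pair at the top of this list is rad and tax, which helps explain why tax is highly significant on its own but not once rad is in the model.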

Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form

$Y = β_0 + β_1 X + β_2 X^2 + β_3 X^3 + ε$.
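One way to turn the cubic fits into a single yes/no answer per predictor is to compare each cubic model with the corresponding linear model using an F-test. The sketch below is an assumption about how this could be done, not part of the original solution: it uses `poly()` instead of the `I()` terms (the fitted model is the same), the helper `nonlinear_test` is hypothetical, and chas is left out because a 0/1 variable does not have enough distinct values for a cubic fit.

```r
library(MASS)  # provides the Boston data set

# p-value of the F-test comparing crim ~ x with crim ~ poly(x, 3)
nonlinear_test <- function(p) {
  linear <- lm(as.formula(paste0("crim ~ ", p)), data = Boston)
  cubic  <- lm(as.formula(paste0("crim ~ poly(", p, ", 3)")), data = Boston)
  anova(linear, cubic)[2, "Pr(>F)"]
}

# chas is excluded: it takes only two distinct values
poly_predictors <- setdiff(names(Boston), c("crim", "chas"))
nonlinear_p <- sapply(poly_predictors, nonlinear_test)

# Small p-values suggest the quadratic and cubic terms improve on the linear fit
sort(nonlinear_p)
```

A small p-value here means the squared and cubed terms together improve the fit, which is the evidence of non-linearity the question asks about. The individual summaries that follow show which of the terms is responsible.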

#Non-linear model crim ~ zn
poly_zn <- lm(crim~ zn + I(zn^2) +I(zn^3))
summary(poly_zn)

## 
## Call:
## lm(formula = crim ~ zn + I(zn^2) + I(zn^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.821 -4.614 -1.294  0.473 84.130 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.846e+00  4.330e-01  11.192  < 2e-16 ***
## zn          -3.322e-01  1.098e-01  -3.025  0.00261 ** 
## I(zn^2)      6.483e-03  3.861e-03   1.679  0.09375 .  
## I(zn^3)     -3.776e-05  3.139e-05  -1.203  0.22954    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.372 on 502 degrees of freedom
## Multiple R-squared:  0.05824,    Adjusted R-squared:  0.05261 
## F-statistic: 10.35 on 3 and 502 DF,  p-value: 1.281e-06

#Non-linear model crim ~ indus
poly_indus <- lm(crim~ indus + I(indus^2) +I(indus^3))
summary(poly_indus)

## 
## Call:
## lm(formula = crim ~ indus + I(indus^2) + I(indus^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.278 -2.514  0.054  0.764 79.713 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.6625683  1.5739833   2.327   0.0204 *  
## indus       -1.9652129  0.4819901  -4.077 5.30e-05 ***
## I(indus^2)   0.2519373  0.0393221   6.407 3.42e-10 ***
## I(indus^3)  -0.0069760  0.0009567  -7.292 1.20e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.423 on 502 degrees of freedom
## Multiple R-squared:  0.2597, Adjusted R-squared:  0.2552 
## F-statistic: 58.69 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ nox
poly_nox <- lm(crim~ nox + I(nox^2) +I(nox^3))
summary(poly_nox)

## 
## Call:
## lm(formula = crim ~ nox + I(nox^2) + I(nox^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.110 -2.068 -0.255  0.739 78.302 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   233.09      33.64   6.928 1.31e-11 ***
## nox         -1279.37     170.40  -7.508 2.76e-13 ***
## I(nox^2)     2248.54     279.90   8.033 6.81e-15 ***
## I(nox^3)    -1245.70     149.28  -8.345 6.96e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.234 on 502 degrees of freedom
## Multiple R-squared:  0.297,  Adjusted R-squared:  0.2928 
## F-statistic: 70.69 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ age
poly_age <- lm(crim~ age + I(age^2) +I(age^3))
summary(poly_age)

## 
## Call:
## lm(formula = crim ~ age + I(age^2) + I(age^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.762 -2.673 -0.516  0.019 82.842 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -2.549e+00  2.769e+00  -0.920  0.35780   
## age          2.737e-01  1.864e-01   1.468  0.14266   
## I(age^2)    -7.230e-03  3.637e-03  -1.988  0.04738 * 
## I(age^3)     5.745e-05  2.109e-05   2.724  0.00668 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared:  0.1742, Adjusted R-squared:  0.1693 
## F-statistic: 35.31 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ dis
poly_dis <- lm(crim~ dis + I(dis^2) +I(dis^3))
summary(poly_dis)

## 
## Call:
## lm(formula = crim ~ dis + I(dis^2) + I(dis^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.757  -2.588   0.031   1.267  76.378 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  30.0476     2.4459  12.285  < 2e-16 ***
## dis         -15.5543     1.7360  -8.960  < 2e-16 ***
## I(dis^2)      2.4521     0.3464   7.078 4.94e-12 ***
## I(dis^3)     -0.1186     0.0204  -5.814 1.09e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.331 on 502 degrees of freedom
## Multiple R-squared:  0.2778, Adjusted R-squared:  0.2735 
## F-statistic: 64.37 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ rad
poly_rad <- lm(crim~ rad + I(rad^2) +I(rad^3))
summary(poly_rad)

## 
## Call:
## lm(formula = crim ~ rad + I(rad^2) + I(rad^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.381  -0.412  -0.269   0.179  76.217 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.605545   2.050108  -0.295    0.768
## rad          0.512736   1.043597   0.491    0.623
## I(rad^2)    -0.075177   0.148543  -0.506    0.613
## I(rad^3)     0.003209   0.004564   0.703    0.482
## 
## Residual standard error: 6.682 on 502 degrees of freedom
## Multiple R-squared:    0.4,  Adjusted R-squared:  0.3965 
## F-statistic: 111.6 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ tax
poly_tax <- lm(crim~ tax + I(tax^2) +I(tax^3))
summary(poly_tax)

## 
## Call:
## lm(formula = crim ~ tax + I(tax^2) + I(tax^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.273  -1.389   0.046   0.536  76.950 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.918e+01  1.180e+01   1.626    0.105
## tax         -1.533e-01  9.568e-02  -1.602    0.110
## I(tax^2)     3.608e-04  2.425e-04   1.488    0.137
## I(tax^3)    -2.204e-07  1.889e-07  -1.167    0.244
## 
## Residual standard error: 6.854 on 502 degrees of freedom
## Multiple R-squared:  0.3689, Adjusted R-squared:  0.3651 
## F-statistic:  97.8 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ ptratio
poly_ptratio <- lm(crim~ ptratio + I(ptratio^2) +I(ptratio^3))
summary(poly_ptratio)

## 
## Call:
## lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.833 -4.146 -1.655  1.408 82.697 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  477.18405  156.79498   3.043  0.00246 **
## ptratio      -82.36054   27.64394  -2.979  0.00303 **
## I(ptratio^2)   4.63535    1.60832   2.882  0.00412 **
## I(ptratio^3)  -0.08476    0.03090  -2.743  0.00630 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.122 on 502 degrees of freedom
## Multiple R-squared:  0.1138, Adjusted R-squared:  0.1085 
## F-statistic: 21.48 on 3 and 502 DF,  p-value: 4.171e-13

#Non-linear model crim ~ black
poly_black <- lm(crim ~ black + I(black^2) + I(black^3))
summary(poly_black)

## 
## Call:
## lm(formula = crim ~ black + I(black^2) + I(black^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.096  -2.343  -2.128  -1.439  86.790 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.826e+01  2.305e+00   7.924  1.5e-14 ***
## black       -8.356e-02  5.633e-02  -1.483    0.139    
## I(black^2)   2.137e-04  2.984e-04   0.716    0.474    
## I(black^3)  -2.652e-07  4.364e-07  -0.608    0.544    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.955 on 502 degrees of freedom
## Multiple R-squared:  0.1498, Adjusted R-squared:  0.1448 
## F-statistic: 29.49 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ lstat
poly_lstat <- lm(crim ~ lstat + I(lstat^2) + I(lstat^3))
summary(poly_lstat)

## 
## Call:
## lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.234  -2.151  -0.486   0.066  83.353 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.2009656  2.0286452   0.592   0.5541  
## lstat       -0.4490656  0.4648911  -0.966   0.3345  
## I(lstat^2)   0.0557794  0.0301156   1.852   0.0646 .
## I(lstat^3)  -0.0008574  0.0005652  -1.517   0.1299  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.629 on 502 degrees of freedom
## Multiple R-squared:  0.2179, Adjusted R-squared:  0.2133 
## F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16

#Non-linear model crim ~ medv
poly_medv <- lm(crim ~ medv + I(medv^2) + I(medv^3))
summary(poly_medv)

## 
## Call:
## lm(formula = crim ~ medv + I(medv^2) + I(medv^3))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.427  -1.976  -0.437   0.439  73.655 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 53.1655381  3.3563105  15.840  < 2e-16 ***
## medv        -5.0948305  0.4338321 -11.744  < 2e-16 ***
## I(medv^2)    0.1554965  0.0171904   9.046  < 2e-16 ***
## I(medv^3)   -0.0014901  0.0002038  -7.312 1.05e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.569 on 502 degrees of freedom
## Multiple R-squared:  0.4202, Adjusted R-squared:  0.4167 
## F-statistic: 121.3 on 3 and 502 DF,  p-value: < 2.2e-16

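#Non-linear model crim ~ chas
#chas is a 0/1 dummy variable, so chas^2 and chas^3 are identical to chas
#and R reports NA for those two coefficients
poly_chas <- lm(crim ~ chas + I(chas^2) + I(chas^3))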
summary(poly_chas)

## 
## Call:
## lm(formula = crim ~ chas + I(chas^2) + I(chas^3))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.738 -3.661 -3.435  0.018 85.232 
## 
## Coefficients: (2 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.7444     0.3961   9.453   <2e-16 ***
## chas         -1.8928     1.5061  -1.257    0.209    
## I(chas^2)         NA         NA      NA       NA    
## I(chas^3)         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.597 on 504 degrees of freedom
## Multiple R-squared:  0.003124,   Adjusted R-squared:  0.001146 
## F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094
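
The "2 not defined because of singularities" note in the chas output arises because chas only takes the values 0 and 1, so squaring or cubing it leaves it unchanged. A quick check, offered as a sketch with illustrative variable names:

```r
library(MASS)  # provides the Boston data set

# Distinct values of the dummy variable
chas_values <- sort(unique(Boston$chas))
print(chas_values)

# The higher powers are identical to chas itself, so lm() cannot
# estimate separate coefficients for them
sq_same   <- all(Boston$chas^2 == Boston$chas)
cube_same <- all(Boston$chas^3 == Boston$chas)
print(c(sq_same, cube_same))
```

For this reason the cubic model for chas collapses to the simple linear model crim ~ chas, and it says nothing about non-linearity.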

The table below summarises, for each predictor, whether the cubic fits above give evidence of a non-linear relationship with the per capita crime rate.

knitr::include_graphics("G:/UCT-MSc. Data Science/Semester 1/Supervised Learning/Assignment/SL_Quiz_2_MRGALE005/1.JPG")

Non-linearity between predictors
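
A summary like this can also be built directly from the fitted models. The sketch below refits each cubic model in a loop and records the p-values of the squared and cubed terms. It assumes that "evidence of non-linearity" means at least one of those two p-values is below 0.05, which may differ from the criterion used for the table in the image; chas is left out because its higher powers cannot be estimated. The names predictors and nonlinear_summary are illustrative.

```r
library(MASS)  # provides the Boston data set

# Every predictor except the response (crim) and the binary dummy (chas)
predictors <- setdiff(names(Boston), c("crim", "chas"))

# Fit crim ~ x + x^2 + x^3 for each predictor and keep the p-values
# of the squared and cubed terms
nonlinear_summary <- do.call(rbind, lapply(predictors, function(p) {
  f <- as.formula(paste0("crim ~ ", p, " + I(", p, "^2) + I(", p, "^3)"))
  coefs <- coef(summary(lm(f, data = Boston)))
  data.frame(
    predictor = p,
    p_squared = coefs[3, "Pr(>|t|)"],
    p_cubed   = coefs[4, "Pr(>|t|)"],
    # Assumed rule: flag the predictor if either higher-order term has p < 0.05
    nonlinear_evidence = any(coefs[3:4, "Pr(>|t|)"] < 0.05)
  )
}))

print(nonlinear_summary, digits = 3)
```

Passing data = Boston to lm() keeps the loop independent of attach(), so it gives the same result whether or not the data frame is attached.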

Published by Alex Mirugwe