Homework Two

Exercise 4 (Page 414)

Suppose that for a particular data set, we perform hierarchical clustering using single linkage and using complete linkage. We obtain two dendrograms.

Part (a)

At a certain point on the single linkage dendrogram, the clusters {1,2,3} and {4,5} fuse. On the complete linkage dendrogram, the clusters {1, 2, 3} and {4, 5} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?

The complete-linkage fusion will tend to occur higher on the tree. Single linkage measures the distance between two clusters by the smallest pairwise distance between their points, while complete linkage uses the largest, so the height at which {1,2,3} and {4,5} fuse under single linkage can never exceed the height under complete linkage; the two heights coincide only if all of the pairwise distances between the two clusters happen to be equal.
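As a quick illustration (a minimal sketch with made-up points, not data from the book), the single-linkage fusion height between two groups is the smallest inter-group distance and the complete-linkage height is the largest:

set.seed(1)
x <- matrix(rnorm(10), ncol = 2)   # five hypothetical points; rows 1-3 and rows 4-5 play the two clusters
d <- as.matrix(dist(x))            # Euclidean distance matrix
between <- d[1:3, 4:5]             # all pairwise distances between {1,2,3} and {4,5}
min(between)                       # single-linkage fusion height
max(between)                       # complete-linkage fusion height (never smaller)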

Part (b)

At a certain point on the single linkage dendrogram, the clusters {5} and {6} fuse. On the complete linkage dendrogram, the clusters {5} and {6} also fuse at a certain point. Which fusion will occur higher on the tree, or will they fuse at the same height, or is there not enough information to tell?

With single observations, the fusion will occur at the same height. Both the smallest and the largest distance between {5} and {6} are simply the single dissimilarity d(5,6), so single and complete linkage fuse these two clusters at the same height.

Exercise 9 (Page 416)

Part(a):

Using hierarchical clustering with complete linkage and Euclidean distance, cluster the states.

hc.complete = hclust(dist(USArrests), method = "complete")
plot(hc.complete, main = "Complete Linkage", xlab = "", sub = "", cex = 0.9)

[Plot of chunk 9.a: "Complete Linkage" dendrogram of the USArrests data]

Part(b):

Cut the dendrogram at a height that results in three distinct clusters. Which states belong to which clusters?

tree <- cutree(hc.complete, k = 3)
tree
##        Alabama         Alaska        Arizona       Arkansas     California 
##              1              1              1              2              1 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##              2              3              1              1              2 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##              3              3              1              3              3 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##              3              3              1              3              1 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##              2              1              3              1              2 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##              3              3              1              3              2 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##              1              1              1              3              3 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##              2              2              3              2              1 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##              3              2              2              3              3 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##              2              2              3              3              2

Part(c):

Hierarchically cluster the states using complete linkage and Euclidean distance, after scaling the variables to have standard deviation one.

USArrests_scaled <- scale(USArrests)
hc.complete = hclust(dist(USArrests_scaled), method = "complete")
plot(hc.complete, main = "Complete Linkage Scaled", xlab = "", sub = "", cex = 0.9)

[Plot of chunk 9.c: "Complete Linkage Scaled" dendrogram of the scaled USArrests data]

Part(d):

Answer

Question Four (Page 120)

I collect a set of data (n = 100 observations) containing a single predictor and a quantitative response. I then fit a linear regression model to the data, as well as a separate cubic regression, i.e. Y = β0 + β1X + β2X² + β3X³ + ε.

Part (a)

Suppose that the true relationship between X and Y is linear, i.e. Y = β0 + β1X + ε. Consider the training residual sum of squares (RSS) for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

For the training data, we would expect the cubic regression to have the lower RSS (or, at worst, the same RSS). The linear model is nested inside the cubic model (it is the special case β2 = β3 = 0), so the least-squares cubic fit can never do worse than the linear fit on the 100 training points, and it will typically do slightly better by fitting noise, even though the true relationship is linear.
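A minimal simulation sketch (with an assumed linear data-generating process, purely illustrative) makes the nesting argument concrete: the cubic fit's training RSS cannot exceed the linear fit's.

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)                        # the true relationship is linear
rss_linear <- sum(resid(lm(y ~ x))^2)
rss_cubic <- sum(resid(lm(y ~ x + I(x^2) + I(x^3)))^2)
c(linear = rss_linear, cubic = rss_cubic)          # cubic training RSS <= linear training RSS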

Part(b)

Answer (a) using test rather than training RSS.

For test data, I would expect the RSS to be smaller for the linear regression than for the cubic regression. Linear regression correctly assumes the true data-generating process and will therefore tend to generalize better to new observations. The cubic regression's extra flexibility lowered training RSS only by fitting noise in the training set, at the expense of future predictive power (it overfit the training data).
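Continuing the sketch (again with an assumed linear data-generating process), the comparison flips once the fits are evaluated on held-out data:

set.seed(2)
train <- data.frame(x = rnorm(100))
train$y <- 1 + 2 * train$x + rnorm(100)
test <- data.frame(x = rnorm(100))
test$y <- 1 + 2 * test$x + rnorm(100)
fit_linear <- lm(y ~ x, data = train)
fit_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = train)
test_rss <- function(fit) sum((test$y - predict(fit, newdata = test))^2)
c(linear = test_rss(fit_linear), cubic = test_rss(fit_cubic))   # linear is usually (not always) lower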

Part (c)

Suppose that the true relationship between X and Y is not linear, but we don't know how far it is from linear. Consider the training RSS for the linear regression, and also the training RSS for the cubic regression. Would we expect one to be lower than the other, would we expect them to be the same, or is there not enough information to tell? Justify your answer.

If the true relationship is not linear, I would expect the cubic regression to better fit the training data because of its increased flexibility. As in part (a), the cubic training RSS can never exceed the linear training RSS, and when the truth is non-linear the gap will typically be larger, since the extra terms can capture real structure rather than just noise.

Part (d)

Answer (c) using test rather than training RSS.

It is difficult to tell whether linear or cubic regression will lead to a lower RSS in the test data, as it depends on the true relationship between X and Y. If the true relationship is far from linear (and especially if it happens to be cubic), cubic regression will lead to lower test RSS because it is a better representation of the data-generating process. On the other hand, if the relationship between X and Y is very close to linear, linear regression will have lower test RSS, for the same reasons as in part (b) above.

Exercise 9 (p.122)

This question involves the use of multiple linear regression on the Auto data set.

Part(a)

Produce a scatterplot matrix which includes all of the variables in the data set.

auto_data = read.csv("Auto.csv", header = T, na.strings = "?")
plot(auto_data)

[Plot of chunk 9_2.a: scatterplot matrix of the Auto variables]

auto_data <- na.omit(auto_data)

Part(b)

Compute the matrix of correlations between the variables using the function cor(). You will need to exclude the name variable, which is qualitative.

library(xtable)
print(xtable((cor(auto_data[1:8]))), type = "html")
               mpg  cylinders  displacement  horsepower  weight  acceleration   year  origin
mpg           1.00      -0.78         -0.81       -0.78   -0.83          0.42   0.58    0.57
cylinders    -0.78       1.00          0.95        0.84    0.90         -0.50  -0.35   -0.57
displacement -0.81       0.95          1.00        0.90    0.93         -0.54  -0.37   -0.61
horsepower   -0.78       0.84          0.90        1.00    0.86         -0.69  -0.42   -0.46
weight       -0.83       0.90          0.93        0.86    1.00         -0.42  -0.31   -0.59
acceleration  0.42      -0.50         -0.54       -0.69   -0.42          1.00   0.29    0.21
year          0.58      -0.35         -0.37       -0.42   -0.31          0.29   1.00    0.18
origin        0.57      -0.57         -0.61       -0.46   -0.59          0.21   0.18    1.00

Part (c)

Use the lm() function to perform a multiple linear regression with mpg as the response and all other variables except name as the predictors. Use the summary() function to print the results. Comment on the output.

i. There is a relationship between the predictors and the response. This is demonstrated by the high R-squared value and the fact that, using the F-statistic, we can reject the null hypothesis that none of these variables helps predict the outcome. However, not every predictor is individually significant.

ii. The predictors with the highest statistical significance are weight, year, and origin. This makes intuitive sense, as lighter cars would logically have better mpg and more modern cars employ better gas-saving technology. Displacement and horsepower are also significant at the .05 level.

iii. The positive coefficient on the year variable suggests that more modern cars have higher mpg. As mentioned above, this makes intuitive sense.

foo <- lm(mpg ~ cylinders + displacement + horsepower + weight + year + origin, 
    data = auto_data)
summary(foo)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight + 
##     year + origin, data = auto_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.760 -2.179 -0.154  1.852 13.121 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.56e+01   4.18e+00   -3.73  0.00022 ***
## cylinders    -5.07e-01   3.23e-01   -1.57  0.11724    
## displacement  1.93e-02   7.47e-03    2.58  0.01029 *  
## horsepower   -2.39e-02   1.08e-02   -2.21  0.02803 *  
## weight       -6.22e-03   5.71e-04  -10.88  < 2e-16 ***
## year          7.48e-01   5.08e-02   14.72  < 2e-16 ***
## origin        1.43e+00   2.78e-01    5.14  4.4e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 3.33 on 385 degrees of freedom
## Multiple R-squared: 0.821,   Adjusted R-squared: 0.818 
## F-statistic:  295 on 6 and 385 DF,  p-value: <2e-16

Part (d)

Use the plot() function to produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

plot(foo)

[Plots of chunk 9_2.d: the four standard lm() diagnostic plots]

The diagnostic plots reveal some problems with the fit. Looking at residuals versus fitted values, one notices a clear 'U' shape: the residuals at small and large fitted values tend to be positive, while the residuals for fitted values in the middle are generally negative. This suggests non-linearity in the data.

The diagnostic plots also reveal a few outliers, the most substantial being point 323, with a standardized residual of about four. Point 14 does not appear to be an outlier based on its standardized residual, but it does have very high leverage. This indicates that while the point is not far from its predicted value, it sits far from the bulk of the predictor values and therefore has a large influence on the fitted model.
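To back this up numerically, one could pull the standardized residuals and leverage values straight from the fitted model (a small sketch using the base-R helpers rstandard() and hatvalues()):

std_res <- rstandard(foo)                       # standardized residuals of the fit above
lev <- hatvalues(foo)                           # leverage of each observation
head(sort(abs(std_res), decreasing = TRUE))     # largest residuals (candidate outliers)
head(sort(lev, decreasing = TRUE))              # highest-leverage observations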

Part(e)

Use the * and : symbols to fit linear regression models with interaction effects. Do any interactions appear to be statistically significant?

foo <- lm(mpg ~ cylinders + displacement + horsepower + weight * year + origin, 
    data = auto_data)
summary(foo)
## 
## Call:
## lm(formula = mpg ~ cylinders + displacement + horsepower + weight * 
##     year + origin, data = auto_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.330 -1.927 -0.172  1.544 11.835 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.14e+02   1.31e+01   -8.72  < 2e-16 ***
## cylinders    -1.52e-01   3.03e-01   -0.50    0.617    
## displacement  1.19e-02   7.00e-03    1.70    0.089 .  
## horsepower   -4.09e-02   1.03e-02   -3.98  8.4e-05 ***
## weight        3.03e-02   4.66e-03    6.49  2.6e-10 ***
## year          2.06e+00   1.73e-01   11.91  < 2e-16 ***
## origin        1.18e+00   2.60e-01    4.54  7.5e-06 ***
## weight:year  -4.80e-04   6.09e-05   -7.88  3.4e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 3.09 on 384 degrees of freedom
## Multiple R-squared: 0.846,   Adjusted R-squared: 0.843 
## F-statistic:  301 on 7 and 384 DF,  p-value: <2e-16

Yes, the interaction between weight and year is statistically significant.

Part (f)

Try a few different transformations of the variables, such as log(X), √X, X². Comment on your findings.

Transformations of the variables that were already statistically significant tended to remain statistically significant. Adding the transformed terms also tended to push the R-squared higher, which makes sense, since adding variables never decreases R-squared.
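For reference, one possible specification along these lines (the particular transformations chosen here are illustrative, not the only reasonable ones):

foo_trans <- lm(mpg ~ log(weight) + sqrt(horsepower) + I(acceleration^2) + year + origin,
    data = auto_data)
summary(foo_trans)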

Exercise 14 (p.125)

This question focuses on the collinearity problem, using simulated data.

set.seed(1)
x1 = runif(100)
x2 = 0.5 * x1 + rnorm(100)/10
y = 2 + 2 * x1 + 0.3 * x2 + rnorm(100)

Part (a)

Write out the form of the linear model. What are the regression coefficients?

Writing \( \varepsilon_1 \) and \( \varepsilon_2 \) for the two rnorm(100) noise draws used to generate x2 and y:

\[ y = 2 + 2x_1 + 0.3x_2 + \varepsilon_2 \\ y = 2 + 2x_1 + 0.3\left(0.5x_1 + \frac{\varepsilon_1}{10}\right) + \varepsilon_2 \\ y = 2 + 2x_1 + 0.15x_1 + 0.03\varepsilon_1 + \varepsilon_2 \\ y = 2 + 2.15x_1 + 0.03\varepsilon_1 + \varepsilon_2 \]

The regression coefficients in the model as written are \( \beta_0 = 2 \), \( \beta_1 = 2 \), and \( \beta_2 = 0.3 \); substituting the definition of \( x_2 \) shows that the combined coefficient on \( x_1 \) alone is 2.15.

Part (b)

What is the correlation between x1 and x2? Create a scatterplot displaying the relationship between the variables.

cor(x1, x2)
## [1] 0.8351
plot(x1, x2, main = "Scatterplot between x1 & x2")

[Plot of chunk 14.b: scatterplot of x1 against x2]

Part (c)

Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are \( \hat{\beta}_0 \), \( \hat{\beta}_1 \), and \( \hat{\beta}_2 \)? How do these relate to the true β0, β1, and β2? Can you reject the null hypothesis H0: β1 = 0? How about the null hypothesis H0: β2 = 0?

foocol <- lm(y ~ x1 + x2)
summary(foocol)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8311 -0.7273 -0.0537  0.6338  2.3359 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.130      0.232    9.19  7.6e-15 ***
## x1             1.440      0.721    2.00    0.049 *  
## x2             1.010      1.134    0.89    0.375    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.06 on 97 degrees of freedom
## Multiple R-squared: 0.209,   Adjusted R-squared: 0.193 
## F-statistic: 12.8 on 2 and 97 DF,  p-value: 1.16e-05

The estimated coefficients are \( \hat{\beta}_0 = 2.13 \), \( \hat{\beta}_1 = 1.44 \), and \( \hat{\beta}_2 = 1.01 \). \( \hat{\beta}_1 \) is well below the true \( \beta_1 = 2 \) and \( \hat{\beta}_2 \) is well above the true \( \beta_2 = 0.3 \), and both standard errors are large, which reflects the collinearity between x1 and x2. We are able to reject the null hypothesis that \( \beta_1 \) = 0 (though only just, at the .05 level), but we are unable to reject the null hypothesis that \( \beta_2 \) = 0.

Part (d)

Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?

foox1 <- lm(y ~ x1)
summary(foox1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8950 -0.6687 -0.0779  0.5922  2.4556 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.112      0.231    9.15  8.3e-15 ***
## x1             1.976      0.396    4.99  2.7e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.06 on 98 degrees of freedom
## Multiple R-squared: 0.202,   Adjusted R-squared: 0.194 
## F-statistic: 24.9 on 1 and 98 DF,  p-value: 2.66e-06

We are able to reject the null hypothesis that \( \beta_1 \) = 0: when y is regressed on x1 alone, the coefficient on x1 is highly significant (p ≈ 2.7e-06).

Part (e)

Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis H0: β1 = 0?

foox2 <- lm(y ~ x2)
summary(foox2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.627 -0.752 -0.036  0.724  2.449 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.390      0.195   12.26  < 2e-16 ***
## x2             2.900      0.633    4.58  1.4e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.07 on 98 degrees of freedom
## Multiple R-squared: 0.176,   Adjusted R-squared: 0.168 
## F-statistic:   21 on 1 and 98 DF,  p-value: 1.37e-05

We are also able to reject the null hypothesis that \( \beta_1 \) = 0 in this regression: when y is regressed on x2 alone, the coefficient on x2 is highly significant (p ≈ 1.4e-05).

Part (f)

Do the results obtained in (c)–(e) contradict each other? Explain your answer.

No, these results do not contradict each other. Both \( x_1 \) and \( x_2 \) should, on their own, have the ability to give good predictions of y, because both terms were included in the data generating process of y. However, as the two variables are highly correlated, when both are included in the model each one explains much of the variation already explained by the other, which inflates the standard errors and lowers the apparent significance of each. This is why it is not contradictory to say that each variable has predictive power on its own while we fail to reject the null hypothesis for x2 when the variables are included together.
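The strength of the collinearity can be made explicit by regressing x2 on x1 (using the simulated variables from above); most of the variation in x2 is already explained by x1, which is what inflates the standard errors when both enter the model together:

summary(lm(x2 ~ x1))$r.squared   # share of x2's variance explained by x1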

Part(g)

Now suppose we obtain one additional observation, which was unfortunately mismeasured. Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.

x1 = c(x1, 0.1)
x2 = c(x2, 0.8)
y = c(y, 6)

foocol <- lm(y ~ x1 + x2)
summary(foocol)
## 
## Call:
## lm(formula = y ~ x1 + x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.7335 -0.6932 -0.0526  0.6638  2.3062 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.227      0.231    9.62  7.9e-16 ***
## x1             0.539      0.592    0.91   0.3646    
## x2             2.515      0.898    2.80   0.0061 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.07 on 98 degrees of freedom
## Multiple R-squared: 0.219,   Adjusted R-squared: 0.203 
## F-statistic: 13.7 on 2 and 98 DF,  p-value: 5.56e-06

foox1 <- lm(y ~ x1)
summary(foox1)
## 
## Call:
## lm(formula = y ~ x1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.890 -0.656 -0.091  0.568  3.567 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.257      0.239    9.44  1.8e-15 ***
## x1             1.766      0.412    4.28  4.3e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.11 on 99 degrees of freedom
## Multiple R-squared: 0.156,   Adjusted R-squared: 0.148 
## F-statistic: 18.3 on 1 and 99 DF,  p-value: 4.29e-05

foox2 <- lm(y ~ x2)
summary(foox2)
## 
## Call:
## lm(formula = y ~ x2)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.647 -0.710 -0.069  0.727  2.381 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.345      0.191   12.26  < 2e-16 ***
## x2             3.119      0.604    5.16  1.3e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 1.07 on 99 degrees of freedom
## Multiple R-squared: 0.212,   Adjusted R-squared: 0.204 
## F-statistic: 26.7 on 1 and 99 DF,  p-value: 1.25e-06
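One way to judge whether the added observation (the 101st) is an outlier and/or a high-leverage point in each refit is to inspect its studentized residual and leverage (a minimal sketch using the base-R helpers rstudent() and hatvalues()):

sapply(list(both = foocol, x1_only = foox1, x2_only = foox2), function(fit)
    c(studentized_resid = unname(rstudent(fit)[101]),   # a large magnitude would flag an outlier
      leverage = unname(hatvalues(fit)[101])))          # compare with the average leverage (p + 1)/n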

Exercise 15 (p.125)

This question involves the use of multiple linear regression on the Boston data set, in which we try to predict the per capita crime rate (crim) using the other variables.

Part (a)

For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

library(MASS)
data(Boston)
summary(lm(crim ~ zn, data = Boston))
## 
## Call:
## lm(formula = crim ~ zn, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -4.43  -4.22  -2.62   1.25  84.52 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.4537     0.4172   10.67  < 2e-16 ***
## zn           -0.0739     0.0161   -4.59  5.5e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.44 on 504 degrees of freedom
## Multiple R-squared: 0.0402,  Adjusted R-squared: 0.0383 
## F-statistic: 21.1 on 1 and 504 DF,  p-value: 5.51e-06
summary(lm(crim ~ indus, data = Boston))
## 
## Call:
## lm(formula = crim ~ indus, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -11.97  -2.70  -0.74   0.71  81.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -2.064      0.667   -3.09   0.0021 ** 
## indus          0.510      0.051    9.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.87 on 504 degrees of freedom
## Multiple R-squared: 0.165,   Adjusted R-squared: 0.164 
## F-statistic: 99.8 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ chas, data = Boston))
## 
## Call:
## lm(formula = crim ~ chas, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3.74  -3.66  -3.44   0.02  85.23 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.744      0.396    9.45   <2e-16 ***
## chas          -1.893      1.506   -1.26     0.21    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.6 on 504 degrees of freedom
## Multiple R-squared: 0.00312, Adjusted R-squared: 0.00115 
## F-statistic: 1.58 on 1 and 504 DF,  p-value: 0.209
summary(lm(crim ~ nox, data = Boston))
## 
## Call:
## lm(formula = crim ~ nox, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12.37  -2.74  -0.97   0.56  81.73 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -13.7        1.7   -8.07  5.1e-15 ***
## nox             31.2        3.0   10.42  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.81 on 504 degrees of freedom
## Multiple R-squared: 0.177,   Adjusted R-squared: 0.176 
## F-statistic:  109 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ rm, data = Boston))
## 
## Call:
## lm(formula = crim ~ rm, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.60  -3.95  -2.65   0.99  87.20 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.482      3.364    6.09  2.3e-09 ***
## rm            -2.684      0.532   -5.04  6.3e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.4 on 504 degrees of freedom
## Multiple R-squared: 0.0481,  Adjusted R-squared: 0.0462 
## F-statistic: 25.5 on 1 and 504 DF,  p-value: 6.35e-07
summary(lm(crim ~ age, data = Boston))
## 
## Call:
## lm(formula = crim ~ age, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.79  -4.26  -1.23   1.53  82.85 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -3.7779     0.9440   -4.00  7.2e-05 ***
## age           0.1078     0.0127    8.46  2.9e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.06 on 504 degrees of freedom
## Multiple R-squared: 0.124,   Adjusted R-squared: 0.123 
## F-statistic: 71.6 on 1 and 504 DF,  p-value: 2.85e-16
summary(lm(crim ~ dis, data = Boston))
## 
## Call:
## lm(formula = crim ~ dis, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.71  -4.13  -1.53   1.52  81.67 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.499      0.730   13.01   <2e-16 ***
## dis           -1.551      0.168   -9.21   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.97 on 504 degrees of freedom
## Multiple R-squared: 0.144,   Adjusted R-squared: 0.142 
## F-statistic: 84.9 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ rad, data = Boston))
## 
## Call:
## lm(formula = crim ~ rad, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10.16  -1.38  -0.14   0.66  76.43 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.2872     0.4435   -5.16  3.6e-07 ***
## rad           0.6179     0.0343   18.00  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 6.72 on 504 degrees of freedom
## Multiple R-squared: 0.391,   Adjusted R-squared: 0.39 
## F-statistic:  324 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ tax, data = Boston))
## 
## Call:
## lm(formula = crim ~ tax, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -12.51  -2.74  -0.19   1.07  77.70 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.52837    0.81581   -10.4   <2e-16 ***
## tax          0.02974    0.00185    16.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7 on 504 degrees of freedom
## Multiple R-squared: 0.34,    Adjusted R-squared: 0.338 
## F-statistic:  259 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ ptratio, data = Boston))
## 
## Call:
## lm(formula = crim ~ ptratio, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -7.65  -3.99  -1.91   1.82  83.35 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -17.647      3.147   -5.61  3.4e-08 ***
## ptratio        1.152      0.169    6.80  2.9e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.24 on 504 degrees of freedom
## Multiple R-squared: 0.0841,  Adjusted R-squared: 0.0823 
## F-statistic: 46.3 on 1 and 504 DF,  p-value: 2.94e-11
summary(lm(crim ~ black, data = Boston))
## 
## Call:
## lm(formula = crim ~ black, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13.76  -2.30  -2.09  -1.30  86.82 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 16.55353    1.42590   11.61   <2e-16 ***
## black       -0.03628    0.00387   -9.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.95 on 504 degrees of freedom
## Multiple R-squared: 0.148,   Adjusted R-squared: 0.147 
## F-statistic: 87.7 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ lstat, data = Boston))
## 
## Call:
## lm(formula = crim ~ lstat, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13.93  -2.82  -0.66   1.08  82.86 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -3.3305     0.6938    -4.8  2.1e-06 ***
## lstat         0.5488     0.0478    11.5  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.66 on 504 degrees of freedom
## Multiple R-squared: 0.208,   Adjusted R-squared: 0.206 
## F-statistic:  132 on 1 and 504 DF,  p-value: <2e-16
summary(lm(crim ~ medv, data = Boston))
## 
## Call:
## lm(formula = crim ~ medv, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.07  -4.02  -2.34   1.30  80.96 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  11.7965     0.9342   12.63   <2e-16 ***
## medv         -0.3632     0.0384   -9.46   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.93 on 504 degrees of freedom
## Multiple R-squared: 0.151,   Adjusted R-squared: 0.149 
## F-statistic: 89.5 on 1 and 504 DF,  p-value: <2e-16

There is a statistically significant relationship between the predictor and the response for every variable except the Charles River dummy (chas). Looking at simple scatter plots of each predictor against crim, one can see why a simple linear regression with these variables would predict crime better than simply using the mean of crim: the data have some slight upward or downward slope rather than forming a random cloud. That said, while almost every predictor is statistically significant, the R-squared values are low, so each predictor on its own explains only a small amount of the variation in the response.
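To summarize all thirteen simple regressions at once, the slope p-values can be collected in a loop (a compact sketch; names(Boston)[-1] is assumed to list exactly the thirteen predictors):

pvals <- sapply(names(Boston)[-1], function(p)
    summary(lm(reformulate(p, response = "crim"), data = Boston))$coefficients[2, 4])
round(sort(pvals), 4)   # every slope p-value except the one for chas is far below 0.05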

plot(crim ~ . - crim, data = Boston)

[Plots of chunk 15.a.2: crim plotted against each of the thirteen predictors]

Part (b)

Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis H0 : βj = 0?

summary(lm(crim ~ . - crim, data = Boston))
## 
## Call:
## lm(formula = crim ~ . - crim, data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.92  -2.12  -0.35   1.02  75.05 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  17.03323    7.23490    2.35   0.0189 *  
## zn            0.04486    0.01873    2.39   0.0170 *  
## indus        -0.06385    0.08341   -0.77   0.4443    
## chas         -0.74913    1.18015   -0.63   0.5259    
## nox         -10.31353    5.27554   -1.95   0.0512 .  
## rm            0.43013    0.61283    0.70   0.4831    
## age           0.00145    0.01793    0.08   0.9355    
## dis          -0.98718    0.28182   -3.50   0.0005 ***
## rad           0.58821    0.08805    6.68  6.5e-11 ***
## tax          -0.00378    0.00516   -0.73   0.4638    
## ptratio      -0.27108    0.18645   -1.45   0.1466    
## black        -0.00754    0.00367   -2.05   0.0407 *  
## lstat         0.12621    0.07572    1.67   0.0962 .  
## medv         -0.19889    0.06052   -3.29   0.0011 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 6.44 on 492 degrees of freedom
## Multiple R-squared: 0.454,   Adjusted R-squared: 0.44 
## F-statistic: 31.5 on 13 and 492 DF,  p-value: <2e-16

When fitting a multiple regression model, only a small number of variables are found to be statistically significant: dis and rad at the .001 level, medv at the .01 level, and zn and black at the .05 level. For every other variable, we now fail to reject the null hypothesis. R-squared is also much higher for the multiple regression model than for any of the predictors on their own, meaning we explain more of the variance in the outcome.

Part (c)

How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

univcof <- lm(crim ~ zn, data = Boston)$coefficients[2]
univcof <- append(univcof, lm(crim ~ indus, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ chas, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ nox, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ rm, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ age, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ dis, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ rad, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ tax, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ ptratio, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ black, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ lstat, data = Boston)$coefficients[2])
univcof <- append(univcof, lm(crim ~ medv, data = Boston)$coefficients[2])
fooBoston <- (lm(crim ~ . - crim, data = Boston))
fooBoston$coefficients[2:14]
##         zn      indus       chas        nox         rm        age 
##   0.044855  -0.063855  -0.749134 -10.313535   0.430131   0.001452 
##        dis        rad        tax    ptratio      black      lstat 
##  -0.987176   0.588209  -0.003780  -0.271081  -0.007538   0.126211 
##       medv 
##  -0.198887
plot(univcof, fooBoston$coefficients[2:14], main = "Univariate vs. Multiple Regression Coefficients", 
    xlab = "Univariate", ylab = "Multiple")

[Plot of chunk 15.c: univariate vs. multiple regression coefficients]

Part (d)

Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor X, fit a model of the form Y = β0 + β1X + β2X² + β3X³ + ε.

library(MASS)
data(Boston)
summary(lm(crim ~ zn + I(zn^2) + I(zn^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ zn + I(zn^2) + I(zn^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -4.82  -4.61  -1.29   0.47  84.13 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.85e+00   4.33e-01   11.19   <2e-16 ***
## zn          -3.32e-01   1.10e-01   -3.03   0.0026 ** 
## I(zn^2)      6.48e-03   3.86e-03    1.68   0.0938 .  
## I(zn^3)     -3.78e-05   3.14e-05   -1.20   0.2295    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.37 on 502 degrees of freedom
## Multiple R-squared: 0.0582,  Adjusted R-squared: 0.0526 
## F-statistic: 10.3 on 3 and 502 DF,  p-value: 1.28e-06
summary(lm(crim ~ indus + I(indus^2) + I(indus^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ indus + I(indus^2) + I(indus^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8.28  -2.51   0.05   0.76  79.71 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.662568   1.573983    2.33     0.02 *  
## indus       -1.965213   0.481990   -4.08  5.3e-05 ***
## I(indus^2)   0.251937   0.039322    6.41  3.4e-10 ***
## I(indus^3)  -0.006976   0.000957   -7.29  1.2e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.42 on 502 degrees of freedom
## Multiple R-squared: 0.26,    Adjusted R-squared: 0.255 
## F-statistic: 58.7 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ chas + I(chas^2) + I(chas^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ chas + I(chas^2) + I(chas^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3.74  -3.66  -3.44   0.02  85.23 
## 
## Coefficients: (2 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    3.744      0.396    9.45   <2e-16 ***
## chas          -1.893      1.506   -1.26     0.21    
## I(chas^2)         NA         NA      NA       NA    
## I(chas^3)         NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.6 on 504 degrees of freedom
## Multiple R-squared: 0.00312, Adjusted R-squared: 0.00115 
## F-statistic: 1.58 on 1 and 504 DF,  p-value: 0.209
summary(lm(crim ~ nox + I(nox^2) + I(nox^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ nox + I(nox^2) + I(nox^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.11  -2.07  -0.25   0.74  78.30 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    233.1       33.6    6.93  1.3e-11 ***
## nox          -1279.4      170.4   -7.51  2.8e-13 ***
## I(nox^2)      2248.5      279.9    8.03  6.8e-15 ***
## I(nox^3)     -1245.7      149.3   -8.34  7.0e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.23 on 502 degrees of freedom
## Multiple R-squared: 0.297,   Adjusted R-squared: 0.293 
## F-statistic: 70.7 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ rm + I(rm^2) + I(rm^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ rm + I(rm^2) + I(rm^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -18.49  -3.47  -2.22  -0.01  87.22 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  112.625     64.517    1.75    0.081 .
## rm           -39.150     31.311   -1.25    0.212  
## I(rm^2)        4.551      5.010    0.91    0.364  
## I(rm^3)       -0.174      0.264   -0.66    0.509  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.33 on 502 degrees of freedom
## Multiple R-squared: 0.0678,  Adjusted R-squared: 0.0622 
## F-statistic: 12.2 on 3 and 502 DF,  p-value: 1.07e-07
summary(lm(crim ~ age + I(age^2) + I(age^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ age + I(age^2) + I(age^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.76  -2.67  -0.52   0.02  82.84 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -2.55e+00   2.77e+00   -0.92   0.3578   
## age          2.74e-01   1.86e-01    1.47   0.1427   
## I(age^2)    -7.23e-03   3.64e-03   -1.99   0.0474 * 
## I(age^3)     5.75e-05   2.11e-05    2.72   0.0067 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.84 on 502 degrees of freedom
## Multiple R-squared: 0.174,   Adjusted R-squared: 0.169 
## F-statistic: 35.3 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ dis + I(dis^2) + I(dis^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ dis + I(dis^2) + I(dis^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10.76  -2.59   0.03   1.27  76.38 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  30.0476     2.4459   12.29  < 2e-16 ***
## dis         -15.5544     1.7360   -8.96  < 2e-16 ***
## I(dis^2)      2.4521     0.3464    7.08  4.9e-12 ***
## I(dis^3)     -0.1186     0.0204   -5.81  1.1e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.33 on 502 degrees of freedom
## Multiple R-squared: 0.278,   Adjusted R-squared: 0.274 
## F-statistic: 64.4 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ rad + I(rad^2) + I(rad^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ rad + I(rad^2) + I(rad^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -10.38  -0.41  -0.27   0.18  76.22 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.60554    2.05011   -0.30     0.77
## rad          0.51274    1.04360    0.49     0.62
## I(rad^2)    -0.07518    0.14854   -0.51     0.61
## I(rad^3)     0.00321    0.00456    0.70     0.48
## 
## Residual standard error: 6.68 on 502 degrees of freedom
## Multiple R-squared:  0.4,    Adjusted R-squared: 0.396 
## F-statistic:  112 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ tax + I(tax^2) + I(tax^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ tax + I(tax^2) + I(tax^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13.27  -1.39   0.05   0.54  76.95 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.92e+01   1.18e+01    1.63     0.10
## tax         -1.53e-01   9.57e-02   -1.60     0.11
## I(tax^2)     3.61e-04   2.43e-04    1.49     0.14
## I(tax^3)    -2.20e-07   1.89e-07   -1.17     0.24
## 
## Residual standard error: 6.85 on 502 degrees of freedom
## Multiple R-squared: 0.369,   Adjusted R-squared: 0.365 
## F-statistic: 97.8 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ ptratio + I(ptratio^2) + I(ptratio^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.83  -4.15  -1.65   1.41  82.70 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  477.1840   156.7950    3.04   0.0025 **
## ptratio      -82.3605    27.6439   -2.98   0.0030 **
## I(ptratio^2)   4.6353     1.6083    2.88   0.0041 **
## I(ptratio^3)  -0.0848     0.0309   -2.74   0.0063 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 8.12 on 502 degrees of freedom
## Multiple R-squared: 0.114,   Adjusted R-squared: 0.108 
## F-statistic: 21.5 on 3 and 502 DF,  p-value: 4.17e-13
summary(lm(crim ~ black + I(black^2) + I(black^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ black + I(black^2) + I(black^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13.10  -2.34  -2.13  -1.44  86.79 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.83e+01   2.30e+00    7.92  1.5e-14 ***
## black       -8.36e-02   5.63e-02   -1.48     0.14    
## I(black^2)   2.14e-04   2.98e-04    0.72     0.47    
## I(black^3)  -2.65e-07   4.36e-07   -0.61     0.54    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.95 on 502 degrees of freedom
## Multiple R-squared: 0.15,    Adjusted R-squared: 0.145 
## F-statistic: 29.5 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ lstat + I(lstat^2) + I(lstat^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -15.23  -2.15  -0.49   0.07  83.35 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  1.200966   2.028645    0.59    0.554  
## lstat       -0.449066   0.464891   -0.97    0.335  
## I(lstat^2)   0.055779   0.030116    1.85    0.065 .
## I(lstat^3)  -0.000857   0.000565   -1.52    0.130  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 7.63 on 502 degrees of freedom
## Multiple R-squared: 0.218,   Adjusted R-squared: 0.213 
## F-statistic: 46.6 on 3 and 502 DF,  p-value: <2e-16
summary(lm(crim ~ medv + I(medv^2) + I(medv^3), data = Boston))
## 
## Call:
## lm(formula = crim ~ medv + I(medv^2) + I(medv^3), data = Boston)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -24.43  -1.98  -0.44   0.44  73.65 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 53.165538   3.356311   15.84   <2e-16 ***
## medv        -5.094831   0.433832  -11.74   <2e-16 ***
## I(medv^2)    0.155496   0.017190    9.05   <2e-16 ***
## I(medv^3)   -0.001490   0.000204   -7.31    1e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 6.57 on 502 degrees of freedom
## Multiple R-squared: 0.42,    Adjusted R-squared: 0.417 
## F-statistic:  121 on 3 and 502 DF,  p-value: <2e-16

The first thing to note is that for the chas variable we get NA values for the squared and cubed terms. This makes sense: chas is a dummy variable composed of only 0s and 1s, so squaring or cubing it leaves it unchanged, and the higher-order terms are perfectly collinear with chas itself (hence the "2 not defined because of singularities" message).

For the variables indus, nox, dis, ptratio, and medv, there is evidence of a non-linear relationship: the squared and cubed terms for each of these variables are statistically significant (we reject the null hypothesis that the coefficients on these higher-order terms are zero). Age also appears to have a non-linear relationship; once squared and cubed age are brought into the model, the linear age term becomes statistically insignificant.

For every other variable, we do not find evidence of a non-linear relationship between the predictor and outcome variables.
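As an aside, each of these cubic fits can be written more compactly with poly(); for example, for nox (raw = TRUE reproduces the untransformed power terms used above):

summary(lm(crim ~ poly(nox, 3, raw = TRUE), data = Boston))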