1 Introduction

The goal of this project is to find the best linear regression model to estimate the median value of owner-occupied homes in the suburbs of Boston. The data set comes from the UCI Machine Learning Repository and has 506 rows and 14 columns. MEDV is the response variable, while the other 13 variables are possible predictors.

Variable Name   Description
CRIM            per capita crime rate by town
ZN              proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS           proportion of non-retail business acres per town
CHAS            Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX             nitric oxides concentration (parts per 10 million)
RM              average number of rooms per dwelling
AGE             proportion of owner-occupied units built prior to 1940
DIS             weighted distances to five Boston employment centers
RAD             index of accessibility to radial highways
TAX             full-value property-tax rate per $10,000
PTRATIO         pupil-teacher ratio by town
B               1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT           % lower status of the population
MEDV            median value of owner-occupied homes in $1000s

2 Data Structure

First we install and load the required packages in the R environment. We need the package usdm to test for multicollinearity, car to diagnose outliers, MASS for stepwise selection and studentized residuals, DAAG to cross validate our models, lmtest to check for heteroscedasticity, and ggplot2 for plotting.

# LOAD PACKAGES
library(usdm) # for testing collinearity 
## Loading required package: sp
## Loading required package: raster
library(car) # for testing outliers
## 
## Attaching package: 'car'
## The following object is masked from 'package:usdm':
## 
##     vif
library(MASS) # for testing studentized residuals
## 
## Attaching package: 'MASS'
## The following objects are masked from 'package:raster':
## 
##     area, select
library(DAAG) # for cross validation of model
## Loading required package: lattice
## 
## Attaching package: 'DAAG'
## The following object is masked from 'package:MASS':
## 
##     hills
## The following object is masked from 'package:car':
## 
##     vif
## The following object is masked from 'package:usdm':
## 
##     vif
library(lmtest) # for checking homoskedasticity/heteroskedasticity
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(ggplot2) # Use for visuals

Then load the Boston Housing data.

# LOAD DATA
Boston <- read.csv("/Users/zhangyueming/Documents/NEU/2017Spring/IE 7280/Project/Mine/housing.csv",
                   header = FALSE,
                   col.names = c("crim", "zn", "indus", "chas", "nox", "rm", "age",
                                 "dis", "rad", "tax", "ptratio", "b", "lstat", "medv")
                   )
attach(Boston)

Take a look at the structure of this data by using str(Boston). The output shows that we have a total of 14 variables and 506 observations, and also shows the datatype of each variable.

str(Boston)
## 'data.frame':    506 obs. of  14 variables:
##  $ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
##  $ chas   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
##  $ rm     : num  6.58 6.42 7.18 7 7.15 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
##  $ dis    : num  4.09 4.97 4.97 6.06 6.06 ...
##  $ rad    : int  1 2 2 3 3 3 5 5 5 5 ...
##  $ tax    : int  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
##  $ b      : num  397 397 393 395 397 ...
##  $ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
##  $ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

Use cor(Boston) to examine the pairwise correlations between the variables. From the last column, ‘rm’ has the highest positive correlation with ‘medv’ (0.70), while ‘lstat’ has the strongest negative correlation (-0.74), followed by ‘ptratio’ (-0.51).

cor(Boston)
##                crim          zn       indus         chas         nox
## crim     1.00000000 -0.20046922  0.40658341 -0.055891582  0.42097171
## zn      -0.20046922  1.00000000 -0.53382819 -0.042696719 -0.51660371
## indus    0.40658341 -0.53382819  1.00000000  0.062938027  0.76365145
## chas    -0.05589158 -0.04269672  0.06293803  1.000000000  0.09120281
## nox      0.42097171 -0.51660371  0.76365145  0.091202807  1.00000000
## rm      -0.21924670  0.31199059 -0.39167585  0.091251225 -0.30218819
## age      0.35273425 -0.56953734  0.64477851  0.086517774  0.73147010
## dis     -0.37967009  0.66440822 -0.70802699 -0.099175780 -0.76923011
## rad      0.62550515 -0.31194783  0.59512927 -0.007368241  0.61144056
## tax      0.58276431 -0.31456332  0.72076018 -0.035586518  0.66802320
## ptratio  0.28994558 -0.39167855  0.38324756 -0.121515174  0.18893268
## b       -0.38506394  0.17552032 -0.35697654  0.048788485 -0.38005064
## lstat    0.45562148 -0.41299457  0.60379972 -0.053929298  0.59087892
## medv    -0.38830461  0.36044534 -0.48372516  0.175260177 -0.42732077
##                  rm         age         dis          rad         tax
## crim    -0.21924670  0.35273425 -0.37967009  0.625505145  0.58276431
## zn       0.31199059 -0.56953734  0.66440822 -0.311947826 -0.31456332
## indus   -0.39167585  0.64477851 -0.70802699  0.595129275  0.72076018
## chas     0.09125123  0.08651777 -0.09917578 -0.007368241 -0.03558652
## nox     -0.30218819  0.73147010 -0.76923011  0.611440563  0.66802320
## rm       1.00000000 -0.24026493  0.20524621 -0.209846668 -0.29204783
## age     -0.24026493  1.00000000 -0.74788054  0.456022452  0.50645559
## dis      0.20524621 -0.74788054  1.00000000 -0.494587930 -0.53443158
## rad     -0.20984667  0.45602245 -0.49458793  1.000000000  0.91022819
## tax     -0.29204783  0.50645559 -0.53443158  0.910228189  1.00000000
## ptratio -0.35550149  0.26151501 -0.23247054  0.464741179  0.46085304
## b        0.12806864 -0.27353398  0.29151167 -0.444412816 -0.44180801
## lstat   -0.61380827  0.60233853 -0.49699583  0.488676335  0.54399341
## medv     0.69535995 -0.37695457  0.24992873 -0.381626231 -0.46853593
##            ptratio           b      lstat       medv
## crim     0.2899456 -0.38506394  0.4556215 -0.3883046
## zn      -0.3916785  0.17552032 -0.4129946  0.3604453
## indus    0.3832476 -0.35697654  0.6037997 -0.4837252
## chas    -0.1215152  0.04878848 -0.0539293  0.1752602
## nox      0.1889327 -0.38005064  0.5908789 -0.4273208
## rm      -0.3555015  0.12806864 -0.6138083  0.6953599
## age      0.2615150 -0.27353398  0.6023385 -0.3769546
## dis     -0.2324705  0.29151167 -0.4969958  0.2499287
## rad      0.4647412 -0.44441282  0.4886763 -0.3816262
## tax      0.4608530 -0.44180801  0.5439934 -0.4685359
## ptratio  1.0000000 -0.17738330  0.3740443 -0.5077867
## b       -0.1773833  1.00000000 -0.3660869  0.3334608
## lstat    0.3740443 -0.36608690  1.0000000 -0.7376627
## medv    -0.5077867  0.33346082 -0.7376627  1.0000000
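
Reading the last column of a 14 x 14 matrix by eye is error-prone, so the ranking can be extracted directly. This is a minimal sketch that assumes the copy of the data shipped with the MASS package (`MASS::Boston`) stands in for the CSV file loaded above; the name `bh` is illustrative, and in that copy the column `b` is called `black`.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above
# (its column "black" corresponds to "b" in this report).
bh <- MASS::Boston

# Correlation of every predictor with medv, excluding medv itself
r_medv <- cor(bh)[, "medv"]
r_medv <- r_medv[names(r_medv) != "medv"]

# Sorted from most negative to most positive
print(round(sort(r_medv), 3))

strongest_negative <- names(which.min(r_medv))
strongest_positive <- names(which.max(r_medv))
```

Sorting makes the sign of each relationship explicit, which matters when deciding which predictors push the price up and which push it down.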

Use summary(Boston) to get a broad overview of the variables.

summary(Boston)
##       crim                zn             indus            chas        
##  Min.   : 0.00632   Min.   :  0.00   Min.   : 0.46   Min.   :0.00000  
##  1st Qu.: 0.08204   1st Qu.:  0.00   1st Qu.: 5.19   1st Qu.:0.00000  
##  Median : 0.25651   Median :  0.00   Median : 9.69   Median :0.00000  
##  Mean   : 3.61352   Mean   : 11.36   Mean   :11.14   Mean   :0.06917  
##  3rd Qu.: 3.67708   3rd Qu.: 12.50   3rd Qu.:18.10   3rd Qu.:0.00000  
##  Max.   :88.97620   Max.   :100.00   Max.   :27.74   Max.   :1.00000  
##       nox               rm             age              dis        
##  Min.   :0.3850   Min.   :3.561   Min.   :  2.90   Min.   : 1.130  
##  1st Qu.:0.4490   1st Qu.:5.886   1st Qu.: 45.02   1st Qu.: 2.100  
##  Median :0.5380   Median :6.208   Median : 77.50   Median : 3.207  
##  Mean   :0.5547   Mean   :6.285   Mean   : 68.57   Mean   : 3.795  
##  3rd Qu.:0.6240   3rd Qu.:6.623   3rd Qu.: 94.08   3rd Qu.: 5.188  
##  Max.   :0.8710   Max.   :8.780   Max.   :100.00   Max.   :12.127  
##       rad              tax           ptratio            b         
##  Min.   : 1.000   Min.   :187.0   Min.   :12.60   Min.   :  0.32  
##  1st Qu.: 4.000   1st Qu.:279.0   1st Qu.:17.40   1st Qu.:375.38  
##  Median : 5.000   Median :330.0   Median :19.05   Median :391.44  
##  Mean   : 9.549   Mean   :408.2   Mean   :18.46   Mean   :356.67  
##  3rd Qu.:24.000   3rd Qu.:666.0   3rd Qu.:20.20   3rd Qu.:396.23  
##  Max.   :24.000   Max.   :711.0   Max.   :22.00   Max.   :396.90  
##      lstat            medv      
##  Min.   : 1.73   Min.   : 5.00  
##  1st Qu.: 6.95   1st Qu.:17.02  
##  Median :11.36   Median :21.20  
##  Mean   :12.65   Mean   :22.53  
##  3rd Qu.:16.95   3rd Qu.:25.00  
##  Max.   :37.97   Max.   :50.00

The variable ‘chas’ is a binary indicator taking the values 0 and 1, so we turn it into a factor variable using the as.factor() method.

Boston$chas <- as.factor(Boston$chas)
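
To see what the conversion does, the following sketch (again assuming `MASS::Boston` as a stand-in for the loaded CSV) tabulates the two levels and shows the dummy column that lm() will later generate for the factor. The `chas1` name it produces is the one that appears in the coefficient tables of this report.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
chas <- factor(MASS::Boston$chas)

# Number of tracts that do not (0) and do (1) bound the Charles River
counts <- table(chas)
print(counts)

# Column names of the design matrix that lm() would build for this factor
dummy_cols <- colnames(model.matrix(~ chas))
print(dummy_cols)
```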

3 Data Analysis

Use scatterplotMatrix() method to get a better picture of the correlation between variables. We need to look for linear relationships between response variable ‘medv’ and the rest of the variables in the data frame.

scatterplotMatrix(~crim+zn+indus+chas+nox+rm+age+dis+rad+tax+ptratio+b+lstat+medv, data = Boston)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth

We can roughly infer that ‘rm’, ‘ptratio’ and ‘lstat’ might be the most relevant in estimating ‘medv’. The plots also hint at multicollinearity among the independent variables. When two predictors are highly correlated, the coefficient estimates become unstable and hard to interpret, so we need to remove this kind of variable.

We use the histogram of ‘medv’ to make the appropriate transformations. The right skewed distribution suggests that a log transformation would be appropriate.

par(mfrow = c(1, 2))
hist(medv)
hist(log(medv))

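We fit linear models with ‘medv’ and ‘log(medv)’ as the response using the lm() method. The model with the log transformation has the higher adjusted R-squared (0.7841 versus 0.7338), which supports using the log transformation of ‘medv’, keeping in mind that R-squared values computed on different response scales are only roughly comparable.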

mod <- lm(medv ~ ., data = Boston)
summary(mod)
## 
## Call:
## lm(formula = medv ~ ., data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.595  -2.730  -0.518   1.777  26.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
## crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
## zn           4.642e-02  1.373e-02   3.382 0.000778 ***
## indus        2.056e-02  6.150e-02   0.334 0.738288    
## chas1        2.687e+00  8.616e-01   3.118 0.001925 ** 
## nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
## rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
## age          6.922e-04  1.321e-02   0.052 0.958229    
## dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
## rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
## tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
## ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
## b            9.312e-03  2.686e-03   3.467 0.000573 ***
## lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7338 
## F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16
logmod <- lm(log(medv) ~ ., data = Boston)
summary(logmod)
## 
## Call:
## lm(formula = log(medv) ~ ., data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73361 -0.09747 -0.01657  0.09629  0.86435 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.1020423  0.2042726  20.081  < 2e-16 ***
## crim        -0.0102715  0.0013155  -7.808 3.52e-14 ***
## zn           0.0011725  0.0005495   2.134 0.033349 *  
## indus        0.0024668  0.0024614   1.002 0.316755    
## chas1        0.1008876  0.0344859   2.925 0.003598 ** 
## nox         -0.7783993  0.1528902  -5.091 5.07e-07 ***
## rm           0.0908331  0.0167280   5.430 8.87e-08 ***
## age          0.0002106  0.0005287   0.398 0.690567    
## dis         -0.0490873  0.0079834  -6.149 1.62e-09 ***
## rad          0.0142673  0.0026556   5.373 1.20e-07 ***
## tax         -0.0006258  0.0001505  -4.157 3.80e-05 ***
## ptratio     -0.0382715  0.0052365  -7.309 1.10e-12 ***
## b            0.0004136  0.0001075   3.847 0.000135 ***
## lstat       -0.0290355  0.0020299 -14.304  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1899 on 492 degrees of freedom
## Multiple R-squared:  0.7896, Adjusted R-squared:  0.7841 
## F-statistic: 142.1 on 13 and 492 DF,  p-value: < 2.2e-16
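
Because the two R-squared values are measured on different scales, one way to compare the fits more directly is to back-transform the log-model predictions and compute the error in the original units ($1000s). The sketch below is an illustration rather than part of the original analysis: it assumes `MASS::Boston` as a stand-in for the loaded CSV, and `rmse` is a helper defined here. The back-transform `exp(fitted)` is the naive one and ignores retransformation bias.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston

m_raw <- lm(medv ~ ., data = bh)
m_log <- lm(log(medv) ~ ., data = bh)

# Illustrative helper: root mean squared error
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Both errors are expressed in the original units of medv
rmse_raw <- rmse(bh$medv, fitted(m_raw))
rmse_log <- rmse(bh$medv, exp(fitted(m_log)))  # naive back-transform

print(c(raw = rmse_raw, log = rmse_log))
```

These are in-sample errors, so they only indicate which response scale fits the training data better.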

We use stepwise regression (stepAIC() with direction = "both") to find variables that do not improve the model. From the output, ‘age’ and ‘indus’ are dropped because removing them lowers the AIC.

mod <- lm(log(medv) ~ ., data = Boston)
step <- stepAIC(mod, direction = "both")
## Start:  AIC=-1667.19
## log(medv) ~ crim + zn + indus + chas + nox + rm + age + dis + 
##     rad + tax + ptratio + b + lstat
## 
##           Df Sum of Sq    RSS     AIC
## - age      1    0.0057 17.755 -1669.0
## - indus    1    0.0362 17.786 -1668.2
## <none>                 17.749 -1667.2
## - zn       1    0.1643 17.914 -1664.5
## - chas     1    0.3088 18.058 -1660.5
## - b        1    0.5339 18.283 -1654.2
## - tax      1    0.6235 18.373 -1651.7
## - nox      1    0.9351 18.684 -1643.2
## - rad      1    1.0413 18.791 -1640.3
## - rm       1    1.0637 18.813 -1639.7
## - dis      1    1.3639 19.113 -1631.7
## - ptratio  1    1.9270 19.676 -1617.0
## - crim     1    2.1995 19.949 -1610.1
## - lstat    1    7.3809 25.130 -1493.2
## 
## Step:  AIC=-1669.03
## log(medv) ~ crim + zn + indus + chas + nox + rm + dis + rad + 
##     tax + ptratio + b + lstat
## 
##           Df Sum of Sq    RSS     AIC
## - indus    1    0.0363 17.791 -1670.0
## <none>                 17.755 -1669.0
## + age      1    0.0057 17.749 -1667.2
## - zn       1    0.1593 17.914 -1666.5
## - chas     1    0.3138 18.069 -1662.2
## - b        1    0.5431 18.298 -1655.8
## - tax      1    0.6205 18.376 -1653.7
## - nox      1    0.9645 18.720 -1644.3
## - rad      1    1.0356 18.791 -1642.3
## - rm       1    1.1452 18.900 -1639.4
## - dis      1    1.5471 19.302 -1628.8
## - ptratio  1    1.9224 19.677 -1619.0
## - crim     1    2.1988 19.954 -1612.0
## - lstat    1    8.1949 25.950 -1479.0
## 
## Step:  AIC=-1670
## log(medv) ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
##     b + lstat
## 
##           Df Sum of Sq    RSS     AIC
## <none>                 17.791 -1670.0
## + indus    1    0.0363 17.755 -1669.0
## + age      1    0.0058 17.786 -1668.2
## - zn       1    0.1451 17.936 -1667.9
## - chas     1    0.3399 18.131 -1662.4
## - b        1    0.5344 18.326 -1657.0
## - tax      1    0.6139 18.405 -1654.8
## - nox      1    0.9350 18.726 -1646.1
## - rad      1    1.0088 18.800 -1644.1
## - rm       1    1.1171 18.909 -1641.2
## - dis      1    1.7385 19.530 -1624.8
## - ptratio  1    1.8862 19.678 -1621.0
## - crim     1    2.2229 20.014 -1612.4
## - lstat    1    8.1604 25.952 -1481.0
step$anova
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## log(medv) ~ crim + zn + indus + chas + nox + rm + age + dis + 
##     rad + tax + ptratio + b + lstat
## 
## Final Model:
## log(medv) ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio + 
##     b + lstat
## 
## 
##      Step Df    Deviance Resid. Df Resid. Dev       AIC
## 1                              492   17.74938 -1667.194
## 2   - age  1 0.005723781       493   17.75510 -1669.031
## 3 - indus  1 0.036264380       494   17.79137 -1669.999
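
The AIC values in the trace can be checked by hand. For a linear model, the quantity reported by stepAIC() is n * log(RSS / n) + 2 * k (up to an additive constant), where k is the number of estimated coefficients. The sketch below assumes `MASS::Boston` as a stand-in for the loaded CSV and compares the manual value with `extractAIC()` from base R.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston
full <- lm(log(medv) ~ ., data = bh)

n   <- nrow(bh)
rss <- sum(resid(full)^2)
k   <- length(coef(full))   # 13 slopes plus the intercept

# AIC as used in the stepwise trace (constant term omitted)
aic_manual <- n * log(rss / n) + 2 * k

# extractAIC() returns c(equivalent degrees of freedom, AIC)
aic_builtin <- extractAIC(full)[2]
print(c(manual = aic_manual, builtin = aic_builtin))
```

Dropping a variable raises the RSS slightly but saves the penalty of 2 per coefficient, which is why ‘age’ and ‘indus’ are removed even though the fit gets marginally worse.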

4 Regression Model

4.1 First Model: All in

For the first model, we use our entire data set to fit the model. From the summary, ‘age’ and ‘indus’ are non-significant, and ‘zn’ is only marginally significant (p = 0.033). The stepwise regression also dropped ‘age’ and ‘indus’, so these two can be removed. Whether to remove ‘zn’ is decided in a later step.

fit1 <- lm(log(medv) ~ ., data = Boston)
summary(fit1)
## 
## Call:
## lm(formula = log(medv) ~ ., data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73361 -0.09747 -0.01657  0.09629  0.86435 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.1020423  0.2042726  20.081  < 2e-16 ***
## crim        -0.0102715  0.0013155  -7.808 3.52e-14 ***
## zn           0.0011725  0.0005495   2.134 0.033349 *  
## indus        0.0024668  0.0024614   1.002 0.316755    
## chas1        0.1008876  0.0344859   2.925 0.003598 ** 
## nox         -0.7783993  0.1528902  -5.091 5.07e-07 ***
## rm           0.0908331  0.0167280   5.430 8.87e-08 ***
## age          0.0002106  0.0005287   0.398 0.690567    
## dis         -0.0490873  0.0079834  -6.149 1.62e-09 ***
## rad          0.0142673  0.0026556   5.373 1.20e-07 ***
## tax         -0.0006258  0.0001505  -4.157 3.80e-05 ***
## ptratio     -0.0382715  0.0052365  -7.309 1.10e-12 ***
## b            0.0004136  0.0001075   3.847 0.000135 ***
## lstat       -0.0290355  0.0020299 -14.304  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1899 on 492 degrees of freedom
## Multiple R-squared:  0.7896, Adjusted R-squared:  0.7841 
## F-statistic: 142.1 on 13 and 492 DF,  p-value: < 2.2e-16

Use vif(fit1) to test the multicollinearity of our model. VIF (variance inflation factor) values higher than 5 are considered problematic. Both ‘rad’ (7.4845) and ‘tax’ (9.0086) exceed this threshold; ‘tax’ has the highest VIF value, so we drop it first.

vif(fit1)
##    crim      zn   indus   chas1     nox      rm     age     dis     rad 
##  1.7922  2.2988  3.9916  1.0740  4.3937  1.9337  3.1008  3.9559  7.4845 
##     tax ptratio       b   lstat 
##  9.0086  1.7991  1.3485  2.9415
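
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors. The following sketch computes it with base R only, assuming `MASS::Boston` as a stand-in for the loaded CSV; `manual_vif` is an illustrative name.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston
predictors <- setdiff(names(bh), "medv")

# Regress each predictor on all the others and convert R-squared to a VIF
manual_vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = bh))$r.squared
  1 / (1 - r2)
})

print(round(sort(manual_vif, decreasing = TRUE), 4))
```

The VIF depends only on the predictors, not on the response, so the same values apply whether the model uses ‘medv’ or ‘log(medv)’.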

4.2 Second Model: Remove variables with large vif values (TAX)

Remove ‘tax’ from our first model fit1, then use vif() and summary() again to diagnose. Now the VIF values are all below 5, so we assume there are no more collinear variables.

fit2 <- update(fit1, ~ . - tax)
vif(fit2)
##    crim      zn   indus   chas1     nox      rm     age     dis     rad 
##  1.7919  2.1842  3.2260  1.0582  4.3693  1.9231  3.0980  3.9544  2.8375 
## ptratio       b   lstat 
##  1.7888  1.3476  2.9408
summary(fit2)
## 
## Call:
## lm(formula = log(medv) ~ crim + zn + indus + chas + nox + rm + 
##     age + dis + rad + ptratio + b + lstat, data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73499 -0.10227 -0.01119  0.09491  0.86954 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.0091581  0.2063732  19.427  < 2e-16 ***
## crim        -0.0102067  0.0013369  -7.635 1.18e-13 ***
## zn           0.0006626  0.0005444   1.217 0.224110    
## indus       -0.0020148  0.0022491  -0.896 0.370779    
## chas1        0.1182635  0.0347924   3.399 0.000731 ***
## nox         -0.8258144  0.1549617  -5.329 1.50e-07 ***
## rm           0.0959990  0.0169551   5.662 2.54e-08 ***
## age          0.0001448  0.0005372   0.270 0.787643    
## dis         -0.0497335  0.0081127  -6.130 1.80e-09 ***
## rad          0.0055679  0.0016619   3.350 0.000869 ***
## ptratio     -0.0399143  0.0053071  -7.521 2.59e-13 ***
## b            0.0004255  0.0001092   3.895 0.000112 ***
## lstat       -0.0289062  0.0020630 -14.012  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.193 on 493 degrees of freedom
## Multiple R-squared:  0.7823, Adjusted R-squared:  0.777 
## F-statistic: 147.6 on 12 and 493 DF,  p-value: < 2.2e-16

Notice that R-squared has decreased only slightly (from 0.7896 to 0.7823), which we take as acceptable.

4.3 Third Model: Remove non-significant variables (AGE, INDUS, ZN)

First we remove ‘age’ and ‘indus’, because these two variables are affirmed to be non-significant variables from above. The summary is as follows:

fit3 <- update(fit2, ~ . - age - indus)
summary(fit3)
## 
## Call:
## lm(formula = log(medv) ~ crim + zn + chas + nox + rm + dis + 
##     rad + ptratio + b + lstat, data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73447 -0.10493 -0.01084  0.09297  0.87348 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.0065321  0.2054382  19.502  < 2e-16 ***
## crim        -0.0101496  0.0013339  -7.609 1.41e-13 ***
## zn           0.0006511  0.0005400   1.206 0.228470    
## chas1        0.1169498  0.0346573   3.374 0.000798 ***
## nox         -0.8609244  0.1397957  -6.158 1.52e-09 ***
## rm           0.0989867  0.0164154   6.030 3.21e-09 ***
## dis         -0.0487057  0.0075256  -6.472 2.33e-10 ***
## rad          0.0053533  0.0016421   3.260 0.001191 ** 
## ptratio     -0.0406652  0.0051938  -7.830 3.00e-14 ***
## b            0.0004321  0.0001088   3.973 8.15e-05 ***
## lstat       -0.0288688  0.0019297 -14.960  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1928 on 495 degrees of freedom
## Multiple R-squared:  0.7819, Adjusted R-squared:  0.7775 
## F-statistic: 177.4 on 10 and 495 DF,  p-value: < 2.2e-16

The p-value for ‘zn’ in the output is larger than 0.05, which means ‘zn’ is also a non-significant variable. So it is necessary to remove the variable ‘zn’. Then we get:

fit3 <- update(fit2, ~ . - age - indus - zn)
summary(fit3)
## 
## Call:
## lm(formula = log(medv) ~ crim + chas + nox + rm + dis + rad + 
##     ptratio + b + lstat, data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.73252 -0.10612 -0.01410  0.09214  0.87773 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.0213020  0.2051665  19.600  < 2e-16 ***
## crim        -0.0100087  0.0013294  -7.529 2.44e-13 ***
## chas1        0.1161515  0.0346669   3.351 0.000868 ***
## nox         -0.8712566  0.1395967  -6.241 9.32e-10 ***
## rm           0.1014707  0.0162931   6.228 1.01e-09 ***
## dis         -0.0442353  0.0065520  -6.751 4.10e-11 ***
## rad          0.0056058  0.0016295   3.440 0.000630 ***
## ptratio     -0.0426888  0.0049175  -8.681  < 2e-16 ***
## b            0.0004319  0.0001088   3.970 8.26e-05 ***
## lstat       -0.0288434  0.0019305 -14.941  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1929 on 496 degrees of freedom
## Multiple R-squared:  0.7812, Adjusted R-squared:  0.7773 
## F-statistic: 196.8 on 9 and 496 DF,  p-value: < 2.2e-16

4.4 Fourth Model: Diagnose for outliers

We use outlierTest() to check for outliers in our data. Observations whose studentized residuals exceed 3 in absolute value are considered problematic, so we remove the first 11 observations listed below from the model.

outlierTest(fit3, cutoff = Inf, n.max = 15)
##      rstudent unadjusted p-value Bonferonni p
## 413  4.777095         2.3470e-06    0.0011876
## 372  4.245108         2.6097e-05    0.0132050
## 373  3.910611         1.0491e-04    0.0530840
## 402 -3.877262         1.1989e-04    0.0606620
## 369  3.803402         1.6056e-04    0.0812420
## 375  3.498285         5.1048e-04    0.2583000
## 215  3.491031         5.2420e-04    0.2652400
## 401 -3.464786         5.7676e-04    0.2918400
## 490 -3.358377         8.4446e-04    0.4272900
## 506 -3.164928         1.6467e-03    0.8332400
## 398 -3.045484         2.4469e-03           NA
## 400 -2.950863         3.3191e-03           NA
## 368  2.891055         4.0083e-03           NA
## 410  2.890862         4.0107e-03           NA
## 365 -2.544832         1.1236e-02           NA
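
The same screening can be sketched with base R, which makes the cutoff rule explicit. The example assumes `MASS::Boston` as a stand-in for the loaded CSV (with `black` in place of `b`) and refits the third model under the illustrative name `fit`. The Bonferroni adjustment multiplies each unadjusted two-sided p-value by the number of observations.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston
fit <- lm(log(medv) ~ crim + chas + nox + rm + dis + rad + ptratio + black + lstat,
          data = bh)

# Externally studentized residuals
rs <- rstudent(fit)

# Observations beyond the |3| cutoff, largest first
flagged <- rs[abs(rs) > 3]
flagged <- flagged[order(-abs(flagged))]
print(round(flagged, 3))

# Unadjusted two-sided p-values and their Bonferroni correction
p_unadj <- 2 * pt(abs(rs), df = df.residual(fit) - 1, lower.tail = FALSE)
p_bonf  <- pmin(1, p_unadj * length(rs))
```

Only a few of the flagged observations remain significant after the Bonferroni correction, so removing every point beyond the cutoff is a deliberate modelling choice rather than a strict test result.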

Remove the outliers from the data frame and update our model into the fourth model.

Boston <- Boston[-c(413, 372, 373, 402, 369, 375, 215, 401, 490, 506, 398),]
fit4 <- lm(log(medv) ~ . - tax - age - indus - zn, data = Boston)
summary(fit4)
## 
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn, data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.55441 -0.09252 -0.01404  0.08823  0.64232 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.580e+00  1.771e-01  20.212  < 2e-16 ***
## crim        -9.324e-03  1.129e-03  -8.261 1.39e-15 ***
## chas1        8.973e-02  2.968e-02   3.023  0.00264 ** 
## nox         -6.568e-01  1.198e-01  -5.483 6.75e-08 ***
## rm           1.301e-01  1.411e-02   9.223  < 2e-16 ***
## dis         -3.788e-02  5.585e-03  -6.783 3.44e-11 ***
## rad          3.280e-03  1.407e-03   2.331  0.02016 *  
## ptratio     -3.783e-02  4.190e-03  -9.029  < 2e-16 ***
## b            5.298e-04  9.379e-05   5.649 2.76e-08 ***
## lstat       -2.789e-02  1.717e-03 -16.243  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1629 on 485 degrees of freedom
## Multiple R-squared:  0.8322, Adjusted R-squared:  0.8291 
## F-statistic: 267.2 on 9 and 485 DF,  p-value: < 2.2e-16

It looks like ‘lstat’ is our most significant variable along with ‘rm’. Our R-squared is 0.8322, so the model explains about 83% of the variance in log(medv). Our residual standard error is fairly small at 0.1629, which is good.
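
Since the response is log(medv), each coefficient is easiest to read as an approximate percentage change in the median home value: a one-unit increase in a predictor multiplies ‘medv’ by exp(coefficient). The sketch below applies this to three estimates copied from the summary above; the vector name `coefs` is illustrative.

```r
# Estimates copied from the summary of the fourth model above
coefs <- c(rm = 1.301e-01, lstat = -2.789e-02, chas1 = 8.973e-02)

# Percentage change in medv for a one-unit increase in each predictor
pct_change <- 100 * (exp(coefs) - 1)
print(round(pct_change, 2))
```

Under this reading, one additional room is associated with a median value roughly 13.9% higher, and one additional percentage point of ‘lstat’ with a value about 2.75% lower, holding the other predictors fixed.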

Now we get residual diagnostics plots using the par() and plot() methods to test our model. From the Residuals vs Fitted plot below, we can infer that there is non-linearity in our model.

par(mfrow = c(2,2))
plot(fit4)

Another assumption for linear regression is normality of the residual terms. We check the normality of the residuals using the studres() and hist() methods, and overlay a standard normal density curve using the lines() method. The residuals of our model look approximately normal, which is consistent with this linear regression assumption.

studentizedResiduals <- studres(fit4)
par(mfrow = c(1,1))
hist(studentizedResiduals, freq = FALSE, main = "Distribution of Studentized Residuals")
xfit <- seq(min(studentizedResiduals), max(studentizedResiduals), length = 40)
yfit <- dnorm(xfit)
lines(xfit, yfit)
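
A histogram can hide departures in the tails, so a Q-Q plot and a formal test are a useful complement. The following sketch is not part of the original analysis: it assumes `MASS::Boston` as a stand-in for the loaded CSV, removes the same 11 rows, and refits the fourth model under the illustrative name `fit`.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston[-c(413, 372, 373, 402, 369, 375, 215, 401, 490, 506, 398), ]
fit <- lm(log(medv) ~ crim + chas + nox + rm + dis + rad + ptratio + black + lstat,
          data = bh)

# rstudent() gives the same studentized residuals as MASS::studres()
rs <- rstudent(fit)

# Q-Q plot against the normal distribution
qqnorm(rs)
qqline(rs)

# Shapiro-Wilk test: a small p-value indicates departure from normality
sw <- shapiro.test(rs)
print(sw)
```

With nearly 500 observations the test is sensitive to small deviations, so the plot is usually the more informative of the two.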

4.5 Fifth Model: Add non-linear terms

Recalling the scatterplots from before, we can see a non-linear relationship between ‘lstat’ and ‘medv’, so we add ‘lstat’-squared to the model. Note that the formula below also drops ‘rad’. In our model summary we see that ‘I(lstat^2)’ is indeed statistically significant. Compared with the fourth model, our R-squared has increased to 0.8358 and our RSE has decreased to 0.1611.

fit5 <- lm(log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2), data = Boston)
summary(fit5)
## 
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2), 
##     data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.60240 -0.09440 -0.01450  0.09271  0.65533 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.571e+00  1.683e-01  21.218  < 2e-16 ***
## crim        -8.890e-03  1.025e-03  -8.678  < 2e-16 ***
## chas1        8.875e-02  2.936e-02   3.022  0.00264 ** 
## nox         -5.004e-01  1.102e-01  -4.543 7.02e-06 ***
## rm           1.221e-01  1.416e-02   8.627  < 2e-16 ***
## dis         -3.819e-02  5.516e-03  -6.923 1.41e-11 ***
## ptratio     -3.036e-02  3.850e-03  -7.886 2.08e-14 ***
## b            4.709e-04  9.155e-05   5.144 3.92e-07 ***
## lstat       -4.543e-02  4.706e-03  -9.653  < 2e-16 ***
## I(lstat^2)   5.172e-04  1.283e-04   4.030 6.48e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1611 on 485 degrees of freedom
## Multiple R-squared:  0.8358, Adjusted R-squared:  0.8328 
## F-statistic: 274.3 on 9 and 485 DF,  p-value: < 2.2e-16

Looking at the residual diagnostics plots for fit5, the curve in the Residuals vs Fitted plot seems to be flatter. Next we include a third-degree polynomial term for ‘lstat’ to see if we can get a better model.

par(mfrow = c(2,2))
plot(fit5)

4.6 Sixth Model: More non-linearity

Here we add ‘I(lstat^3)’ to our model. Given its p-value, it looks like ‘I(lstat^3)’ does not add much to the model. The values of R-squared and RSE are almost the same.

fit6 <- lm(log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2) + I(lstat^3), data = Boston)
summary(fit6)
## 
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2) + 
##     I(lstat^3), data = Boston)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.59951 -0.09217 -0.01623  0.09464  0.64919 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.649e+00  1.866e-01  19.557  < 2e-16 ***
## crim        -8.914e-03  1.025e-03  -8.697  < 2e-16 ***
## chas1        8.670e-02  2.944e-02   2.945  0.00339 ** 
## nox         -5.070e-01  1.104e-01  -4.593 5.57e-06 ***
## rm           1.173e-01  1.502e-02   7.806 3.69e-14 ***
## dis         -3.838e-02  5.520e-03  -6.953 1.16e-11 ***
## ptratio     -3.043e-02  3.851e-03  -7.901 1.88e-14 ***
## b            4.741e-04  9.162e-05   5.174 3.36e-07 ***
## lstat       -5.617e-02  1.207e-02  -4.652 4.24e-06 ***
## I(lstat^2)   1.245e-03  7.642e-04   1.629  0.10390    
## I(lstat^3)  -1.407e-05  1.456e-05  -0.966  0.33438    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1612 on 484 degrees of freedom
## Multiple R-squared:  0.8361, Adjusted R-squared:  0.8327 
## F-statistic: 246.9 on 10 and 484 DF,  p-value: < 2.2e-16
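
Because the fifth model is nested in the sixth, the two can also be compared with a partial F-test. The sketch below assumes `MASS::Boston` as a stand-in for the loaded CSV with the same 11 rows removed; `m2` and `m3` are illustrative names for the quadratic and cubic models.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston[-c(413, 372, 373, 402, 369, 375, 215, 401, 490, 506, 398), ]

m2 <- lm(log(medv) ~ crim + chas + nox + rm + dis + ptratio + black +
           lstat + I(lstat^2), data = bh)
m3 <- update(m2, . ~ . + I(lstat^3))

# Partial F-test: does the cubic term improve on the quadratic model?
cmp <- anova(m2, m3)
print(cmp)

p_cubic <- cmp[["Pr(>F)"]][2]
```

With a single added term, this F-test is equivalent to the t-test on ‘I(lstat^3)’ in the summary, so it gives the same p-value of about 0.33.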

There is no significant improvement from the residual diagnostics plot.

par(mfrow = c(2,2))
plot(fit6)

5 Cross Validation

In this step, we determine which of our models is best at estimating our response variable without overfitting our data. The goal of cross validation is to hold out part of the data to “test” the model during the training phase, which limits problems like overfitting and gives insight into how the model will generalize to an independent dataset. For this step, we use the CVlm() method. We want each of the fold symbols to be as close as possible to the lines on the plot. We call attr() on the result to get more details on the mean squared error of the fit. We compare models 4, 5 and 6.
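
The idea behind k-fold cross validation can be sketched in base R. This is an independent illustration, not the call that produced the output below: it assumes `MASS::Boston` as a stand-in for the loaded CSV, uses 10 folds and an arbitrary seed, and defines the helper `cv_mse` for this example only. Its fold assignment differs from the one in the report, so the numbers will not match.

```r
# Assumption: MASS::Boston stands in for the CSV loaded above.
bh <- MASS::Boston[-c(413, 372, 373, 402, 369, 375, 215, 401, 490, 506, 398), ]

set.seed(1)   # arbitrary seed, for reproducibility only
k <- 10
folds <- sample(rep(seq_len(k), length.out = nrow(bh)))

# Illustrative helper: mean squared prediction error over the held-out folds
cv_mse <- function(form) {
  sq_err <- unlist(lapply(seq_len(k), function(i) {
    train <- bh[folds != i, ]
    test  <- bh[folds == i, ]
    fit   <- lm(form, data = train)
    (log(test$medv) - predict(fit, newdata = test))^2
  }))
  mean(sq_err)
}

forms <- list(
  model4 = log(medv) ~ crim + chas + nox + rm + dis + rad + ptratio + black + lstat,
  model5 = log(medv) ~ crim + chas + nox + rm + dis + ptratio + black +
    lstat + I(lstat^2),
  model6 = log(medv) ~ crim + chas + nox + rm + dis + ptratio + black +
    lstat + I(lstat^2) + I(lstat^3)
)

cv_results <- sapply(forms, cv_mse)
print(round(cv_results, 4))
```

Using the same fold assignment for all three formulas keeps the comparison fair, since every model is tested on identical held-out observations.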

## Analysis of Variance Table
## 
## Response: log(medv)
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## crim        1  22.42   22.42   844.6 < 2e-16 ***
## chas        1   1.03    1.03    38.9 9.8e-10 ***
## nox         1   9.25    9.25   348.4 < 2e-16 ***
## rm          1  18.40   18.40   693.5 < 2e-16 ***
## dis         1   0.35    0.35    13.4 0.00028 ***
## rad         1   0.62    0.62    23.5 1.7e-06 ***
## ptratio     1   3.06    3.06   115.2 < 2e-16 ***
## b           1   1.70    1.70    63.9 9.7e-15 ***
## lstat       1   7.00    7.00   263.8 < 2e-16 ***
## Residuals 485  12.87    0.03                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## fold 1 
## Observations in test set: 49 
##                 5      24      38       44      45     59     72     80
## Predicted   3.343  2.6921  3.1196  3.21010  3.1304 3.0814  3.094  3.151
## cvpred      3.342  2.6912  3.1262  3.21368  3.1337 3.0779  3.100  3.159
## log(medv)   3.589  2.6741  3.0445  3.20680  3.0540 3.1485  3.077  3.011
## CV residual 0.247 -0.0171 -0.0817 -0.00688 -0.0797 0.0705 -0.023 -0.148
##                 97    100    102     104    106    118     131   148
## Predicted    3.173 3.4831 3.2332  2.9923 2.8941  3.167  2.9962 2.416
## cvpred       3.182 3.4880 3.2379  2.9975 2.9008  3.172  2.9976 2.405
## log(medv)    3.063 3.5025 3.2771  2.9601 2.9704  2.955  2.9549 2.681
## CV residual -0.118 0.0146 0.0392 -0.0374 0.0696 -0.217 -0.0427 0.276
##                150     157    168      190    202    213    228     275
## Predicted   2.7015  2.6626 3.0812  3.55566  3.323 3.0335 3.4433  3.5599
## cvpred      2.6927  2.6643 3.0889  3.55791  3.327 3.0398 3.4456  3.5688
## log(medv)   2.7344  2.5726 3.1697  3.55249  3.182 3.1091 3.4532  3.4782
## CV residual 0.0416 -0.0917 0.0808 -0.00543 -0.145 0.0692 0.0076 -0.0907
##                285    287     294     297    298    300    317     321
## Predicted   3.3775 2.9286  3.2600  3.3289 2.9822  3.458 2.8508  3.2034
## cvpred      3.3746 2.9247  3.2638  3.3300 2.9813  3.454 2.8499  3.2066
## log(medv)   3.4720 3.0007  3.1739  3.2995 3.0106  3.367 2.8792  3.1697
## CV residual 0.0973 0.0761 -0.0899 -0.0305 0.0293 -0.087 0.0293 -0.0369
##               328    332     337     339   345    352   392    441    443
## Predicted   2.969  3.009  2.9963  3.0725 3.339 3.1240 2.768  2.480 2.8361
## cvpred      2.968  3.011  2.9980  3.0751 3.336 3.1148 2.757  2.470 2.8239
## log(medv)   3.100  2.839  2.9704  3.0253 3.440 3.1822 3.144  2.351 2.9124
## CV residual 0.132 -0.172 -0.0276 -0.0499 0.104 0.0674 0.387 -0.118 0.0884
##                446     453   468   469    472     482    491    504
## Predicted   2.4283  2.8310 2.736 2.719  3.051  3.2366  2.366  3.291
## cvpred      2.4203  2.8185 2.729 2.715  3.048  3.2327  2.368  3.297
## log(medv)   2.4681  2.7788 2.950 2.950  2.976  3.1655  2.092  3.174
## CV residual 0.0478 -0.0397 0.221 0.235 -0.072 -0.0672 -0.276 -0.123
## 
## Sum of squares = 0.78    Mean square = 0.02    n = 49 
## 
## fold 2 
## Observations in test set: 50 
##                6      10     11      15     20     22    39      47    65
## Predicted   3.25 2.93528  2.902  2.9790  2.933 2.8840 3.097 2.99215 3.172
## cvpred      3.25 2.93463  2.901  2.9710  2.926 2.8765 3.096 2.98927 3.162
## log(medv)   3.36 2.93916  2.708  2.9014  2.901 2.9755 3.207 2.99573 3.497
## CV residual 0.11 0.00453 -0.193 -0.0696 -0.025 0.0991 0.111 0.00646 0.335
##                 66      74      75     81     87   107     120    125
## Predicted    3.371  3.1957  3.2768 3.3279 3.0609 2.828  3.0349 2.8813
## cvpred       3.371  3.1925  3.2771 3.3244 3.0577 2.825  3.0375 2.8746
## log(medv)    3.157  3.1527  3.1822 3.3322 3.1135 2.970  2.9601 2.9339
## CV residual -0.214 -0.0397 -0.0949 0.0078 0.0558 0.146 -0.0774 0.0593
##               144    155   167   182     198    206    216   253    261
## Predicted   2.562  3.025 3.671 3.243  3.4398 3.0914 3.1660 3.271 3.4727
## cvpred      2.555  3.021 3.669 3.242  3.4373 3.0899 3.1638 3.267 3.4753
## log(medv)   2.747  2.833 3.912 3.589  3.4111 3.1179 3.2189 3.388 3.5205
## CV residual 0.192 -0.188 0.243 0.347 -0.0261 0.0281 0.0551 0.121 0.0452
##                274    278    289     296    306     324     347   355
## Predicted   3.5282  3.520  3.248  3.3784 3.2891  2.9723  2.8916 2.804
## cvpred      3.5237  3.521  3.250  3.3780 3.2921  2.9701  2.8834 2.792
## log(medv)   3.5610  3.500  3.105  3.3534 3.3464  2.9178  2.8449 2.901
## CV residual 0.0373 -0.021 -0.146 -0.0246 0.0542 -0.0523 -0.0385 0.109
##                376     391    394    403    405   418   419    424    427
## Predicted    3.008  2.7722  2.884  2.754 2.1220 2.159 1.904 2.4950  2.656
## cvpred       3.009  2.7895  2.899  2.766 2.0956 2.145 1.829 2.4991  2.657
## log(medv)    2.708  2.7147  2.625  2.493 2.1401 2.342 2.175 2.5953  2.322
## CV residual -0.301 -0.0748 -0.274 -0.273 0.0445 0.196 0.346 0.0961 -0.335
##                433    439   449    458     462     488    499
## Predicted    2.920 2.0721  2.76 2.5465  2.9232 3.00203 3.0289
## cvpred       2.929 2.0669  2.78 2.5476  2.9420 3.02315 3.0287
## log(medv)    2.779 2.1282  2.65 2.6027  2.8736 3.02529 3.0540
## CV residual -0.151 0.0613 -0.13 0.0551 -0.0684 0.00215 0.0253
## 
## Sum of squares = 1.17    Mean square = 0.02    n = 50 
## 
## fold 3 
## Observations in test set: 50 
##                  1      7     13    19      35    55      68      85    99
## Predicted    3.423 3.1111 3.0050 2.831  2.6455 2.809  3.1119  3.1871 3.607
## cvpred       3.415 3.1133 3.0128 2.814  2.6529 2.814  3.1042  3.1884 3.613
## log(medv)    3.178 3.1311 3.0773 3.006  2.6027 2.939  3.0910  3.1739 3.780
## CV residual -0.237 0.0179 0.0645 0.192 -0.0502 0.125 -0.0132 -0.0145 0.166
##                108     110    112     117   121     129    137    139
## Predicted   2.9924 2.94962  3.265  3.1388 2.966  2.9276 2.8062  2.691
## cvpred      2.9958 2.95750  3.269  3.1397 2.965  2.9330 2.8078  2.702
## log(medv)   3.0155 2.96527  3.127  3.0540 3.091  2.8904 2.8565  2.588
## CV residual 0.0198 0.00777 -0.142 -0.0857 0.126 -0.0427 0.0487 -0.115
##               147    152     160     165   203   230    236    239    244
## Predicted   2.748 2.9079  3.2222  3.1769 3.633 3.434 3.1618  3.322  3.309
## cvpred      2.728 2.8867  3.2029  3.1722 3.640 3.423 3.1602  3.319  3.301
## log(medv)   2.747 2.9755  3.1485  3.1224 3.745 3.450 3.1781  3.165  3.165
## CV residual 0.019 0.0888 -0.0545 -0.0498 0.105 0.027 0.0178 -0.154 -0.136
##                258     270    313     320     326     340   359   360
## Predicted   3.8022  3.1190  3.097 3.03421  3.2281  3.0288 3.014 2.903
## cvpred      3.8189  3.1209  3.095 3.03491  3.2165  3.0209 2.999 2.893
## log(medv)   3.9120  3.0301  2.965 3.04452  3.2027  2.9444 3.122 3.118
## CV residual 0.0931 -0.0908 -0.129 0.00961 -0.0137 -0.0765 0.123 0.225
##               361   367    397    400   411    420     431    448  455
## Predicted   3.065 2.745  2.825  2.395 2.441  2.524  2.7325  2.794 2.60
## cvpred      3.046 2.722  2.837  2.419 2.391  2.527  2.7271  2.793 2.59
## log(medv)   3.219 3.086  2.526  1.841 2.708  2.128  2.6741  2.534 2.70
## CV residual 0.173 0.364 -0.311 -0.579 0.317 -0.399 -0.0529 -0.259 0.11
##               460   465   467    479    481    487   497
## Predicted   2.852 2.929 2.628  2.817 3.0843 2.9102 2.703
## cvpred      2.847 2.925 2.612  2.824 3.0817 2.9130 2.711
## log(medv)   2.996 3.063 2.944  2.681 3.1355 2.9497 2.981
## CV residual 0.148 0.138 0.333 -0.143 0.0538 0.0367 0.269
## 
## Sum of squares = 1.56    Mean square = 0.03    n = 50 
## 
## fold 4 
## Observations in test set: 50 
##                  3     8     50      51     52     70     77      94
## Predicted   3.4476 2.930 2.8733  3.0208  3.152  3.093  3.129  3.3059
## cvpred      3.4505 2.929 2.8724  3.0200  3.153  3.094  3.124  3.3062
## log(medv)   3.5467 3.300 2.9653  2.9806  3.020  3.040  2.996  3.2189
## CV residual 0.0963 0.371 0.0929 -0.0393 -0.133 -0.054 -0.128 -0.0874
##                 95   101   133   134     141    153    172    175   181
## Predicted    3.191 3.180 3.011 2.829  2.6528  2.983  3.145  3.232 3.501
## cvpred       3.189 3.181 3.018 2.834  2.6520  3.000  3.147  3.231 3.499
## log(medv)    3.025 3.314 3.135 2.912  2.6391  2.728  2.950  3.118 3.684
## CV residual -0.164 0.134 0.118 0.078 -0.0129 -0.273 -0.197 -0.113 0.185
##                197   205    207    222     235     243     248    265
## Predicted    3.612 3.742 3.1278 2.9985  3.4039  3.1352 3.01686 3.5188
## cvpred       3.612 3.739 3.1284 2.9942  3.4039  3.1320 3.01698 3.5226
## log(medv)    3.506 3.912 3.1946 3.0773  3.3673  3.1001 3.02042 3.5973
## CV residual -0.107 0.173 0.0662 0.0831 -0.0366 -0.0319 0.00344 0.0747
##                269     277    283    309     315    318     319     327
## Predicted   3.7040  3.5244 3.7444  3.357  3.2054 2.9030  3.1532  3.1834
## cvpred      3.7072  3.5264 3.7451  3.362  3.2088 2.9045  3.1563  3.1883
## log(medv)   3.7728  3.5025 3.8286  3.127  3.1697 2.9857  3.1398  3.1355
## CV residual 0.0656 -0.0238 0.0835 -0.236 -0.0391 0.0812 -0.0165 -0.0528
##                341    344    346   362   363     377    378    399    406
## Predicted    3.039  3.277  2.978 2.870 2.890  2.6698  2.813  2.092  2.055
## cvpred       3.043  3.279  2.981 2.875 2.901  2.6808  2.819  2.137  2.144
## log(medv)    2.929  3.174  2.862 2.991 3.035  2.6319  2.588  1.609  1.609
## CV residual -0.115 -0.105 -0.119 0.115 0.134 -0.0489 -0.232 -0.528 -0.534
##               414   415    435    450    452    461   466    498
## Predicted   2.398 1.554  2.651  2.731  2.846  2.832 2.851  2.948
## cvpred      2.416 1.585  2.649  2.733  2.849  2.828 2.850  2.951
## log(medv)   2.791 1.946  2.460  2.565  2.721  2.797 2.991  2.907
## CV residual 0.375 0.361 -0.189 -0.168 -0.128 -0.031 0.141 -0.044
## 
## Sum of squares = 1.63    Mean square = 0.03    n = 50 
## 
## fold 5 
## Observations in test set: 50 
##                  2     17     21     41      42     48    56    64     78
## Predicted    3.208 3.0590  2.628 3.4801  3.3509  2.869 3.344 3.085  3.164
## cvpred       3.211 3.0701  2.628 3.4849  3.3540  2.863 3.335 3.076  3.171
## log(medv)    3.073 3.1398  2.610 3.5525  3.2809  2.809 3.567 3.219  3.035
## CV residual -0.138 0.0697 -0.018 0.0676 -0.0731 -0.054 0.232 0.143 -0.136
##                  86     92    151    159   185     188     195   204  209
## Predicted    3.3151  3.288 2.9925  3.362 3.029  3.4952  3.4310 3.693 3.08
## cvpred       3.3208  3.297 3.0014  3.382 3.039  3.5026  3.4359 3.691 3.06
## log(medv)    3.2809  3.091 3.0681  3.190 3.273  3.4657  3.3707 3.882 3.19
## CV residual -0.0399 -0.206 0.0667 -0.191 0.234 -0.0369 -0.0652 0.191 0.13
##               212   249    250     255    259   262      264     272
## Predicted   2.760 3.070 3.1991  3.1756 3.5419 3.576  3.44146  3.2508
## cvpred      2.743 3.065 3.1949  3.1738 3.5476 3.579  3.44268  3.2600
## log(medv)   2.960 3.199 3.2658  3.0865 3.5835 3.764  3.43399  3.2268
## CV residual 0.217 0.134 0.0708 -0.0874 0.0359 0.184 -0.00869 -0.0332
##                 280     291    299     308    310    311    312    314
## Predicted   3.54613  3.4131  3.359  3.3675  3.128  2.900  3.273  3.217
## cvpred      3.55462  3.4185  3.362  3.3738  3.139  2.921  3.288  3.228
## log(medv)   3.55820  3.3499  3.114  3.3393  3.011  2.779  3.096  3.073
## CV residual 0.00358 -0.0686 -0.249 -0.0345 -0.129 -0.142 -0.193 -0.155
##                329    333     338   350    379    382    393    404
## Predicted    3.142  3.163  2.9629 3.168 2.5607  2.722  2.409  2.499
## cvpred       3.149  3.168  2.9667 3.159 2.5672  2.727  2.420  2.519
## log(medv)    2.960  2.965  2.9178 3.281 2.5726  2.389  2.272  2.116
## CV residual -0.189 -0.202 -0.0489 0.122 0.0054 -0.338 -0.148 -0.403
##                 428    437     440     447     451    484     503   505
## Predicted    2.4391  2.550  2.5566  2.7690  2.6677 3.0311  3.0790  3.23
## cvpred       2.4683  2.562  2.5635  2.7743  2.6744 3.0438  3.0904  3.24
## log(medv)    2.3888  2.262  2.5494  2.7014  2.5953 3.0819  3.0253  3.09
## CV residual -0.0795 -0.301 -0.0141 -0.0729 -0.0791 0.0381 -0.0651 -0.15
## 
## Sum of squares = 1.14    Mean square = 0.02    n = 50 
## 
## fold 6 
## Observations in test set: 50 
##                18      31      37      43     53      71     76      79
## Predicted   2.852  2.5785  3.0555  3.2444  3.321  3.2455  3.202  3.0831
## cvpred      2.845  2.5642  3.0626  3.2527  3.326  3.2501  3.208  3.0836
## log(medv)   2.862  2.5416  2.9957  3.2308  3.219  3.1864  3.063  3.0540
## CV residual 0.017 -0.0226 -0.0669 -0.0219 -0.107 -0.0637 -0.144 -0.0296
##                  82      84     89     114    116  123   124   142     143
## Predicted    3.2593  3.1898  3.431  2.9949  2.987 2.88 2.657 2.215  2.6424
## cvpred       3.2595  3.1968  3.428  2.9923  2.988 2.87 2.640 2.197  2.6294
## log(medv)    3.1739  3.1311  3.161  2.9285  2.907 3.02 2.851 2.667  2.5953
## CV residual -0.0857 -0.0657 -0.267 -0.0638 -0.081 0.15 0.211 0.471 -0.0341
##               154    164    171   187    199    211   220     232    233
## Predicted   2.828 3.8398  3.057 3.577 3.5372 3.0119  3.32  3.4911 3.6848
## cvpred      2.820 3.8245  3.061 3.564 3.5329 3.0111  3.33  3.4864 3.6690
## log(medv)   2.965 3.9120  2.856 3.912 3.5439 3.0773  3.14  3.4563 3.7305
## CV residual 0.145 0.0875 -0.204 0.348 0.0109 0.0662 -0.19 -0.0301 0.0615
##               246   254   263   268     273    302    304     322    323
## Predicted   2.715 3.443 3.721 3.706  3.2766  3.304 3.4806  3.2060  3.131
## cvpred      2.713 3.418 3.699 3.687  3.2780  3.307 3.4847  3.2089  3.138
## log(medv)   2.918 3.757 3.888 3.912  3.1946  3.091 3.4995  3.1398  3.016
## CV residual 0.205 0.339 0.189 0.225 -0.0834 -0.216 0.0148 -0.0691 -0.122
##               330     336    357    380    388   412     421     432  436
## Predicted    3.27 3.03950 2.8321  2.646  2.135 2.607  2.8335  2.7213 2.50
## cvpred       3.28 3.04141 2.8321  2.645  2.139 2.596  2.8338  2.7097 2.48
## log(medv)    3.12 3.04927 2.8792  2.322  2.001 2.845  2.8154  2.6462 2.60
## CV residual -0.16 0.00786 0.0471 -0.323 -0.137 0.249 -0.0184 -0.0636 0.12
##                 445     454    464     476     489    494
## Predicted    2.4431  2.9508  3.035  2.6580  2.7778 3.0154
## cvpred       2.4373  2.9327  3.040  2.6543  2.7771 3.0225
## log(medv)    2.3795  2.8792  3.006  2.5878  2.7213 3.0819
## CV residual -0.0578 -0.0535 -0.034 -0.0666 -0.0558 0.0594
## 
## Sum of squares = 1.23    Mean square = 0.02    n = 50 
## 
## fold 7 
## Observations in test set: 49 
##                 12      14      25     46     67      69     83   111
## Predicted    3.062  3.0153  2.7970  3.089  3.150  2.9354  3.233 3.010
## cvpred       3.061  3.0217  2.7900  3.090  3.151  2.9340  3.237 3.003
## log(medv)    2.939  3.0155  2.7473  2.960  2.965  2.8565  3.211 3.077
## CV residual -0.122 -0.0062 -0.0427 -0.129 -0.185 -0.0776 -0.026 0.074
##                135   136    140    145   146   158    161    178  180
## Predicted   2.6890 2.857 2.8089 2.4228 2.506 3.529  3.492  3.363 3.47
## cvpred      2.6869 2.845 2.7946 2.4014 2.494 3.532  3.500  3.366 3.47
## log(medv)   2.7473 2.896 2.8792 2.4681 2.625 3.721  3.296  3.203 3.62
## CV residual 0.0604 0.051 0.0846 0.0667 0.131 0.189 -0.204 -0.163 0.15
##               186    193     194   196   200   210     219     223    240
## Predicted   3.110 3.5491  3.4471 3.708 3.383 2.769 3.07898  3.4001  3.313
## cvpred      3.102 3.5570  3.4511 3.710 3.390 2.751 3.06570  3.3960  3.316
## log(medv)   3.388 3.5946  3.4372 3.912 3.552 2.996 3.06805  3.3142  3.148
## CV residual 0.286 0.0376 -0.0139 0.202 0.162 0.245 0.00235 -0.0818 -0.168
##                241     251    282    284     293    301     334     335
## Predicted    3.237  3.2055 3.5295 3.8246  3.3448  3.409  3.1190  3.0883
## cvpred       3.232  3.2148 3.5329 3.8263  3.3497  3.414  3.1309  3.0985
## log(medv)    3.091  3.1946 3.5667 3.9120  3.3286  3.211  3.1001  3.0301
## CV residual -0.141 -0.0202 0.0338 0.0858 -0.0211 -0.203 -0.0308 -0.0684
##                342     351   374   387   389    396   409    425     438
## Predicted   3.4383 3.11081 2.187 2.179 2.220  2.865 2.540  2.598  2.2624
## cvpred      3.4426 3.12258 2.154 2.160 2.197  2.856 2.519  2.611  2.2620
## log(medv)   3.4874 3.13114 2.625 2.351 2.322  2.573 2.845  2.460  2.1633
## CV residual 0.0448 0.00856 0.471 0.192 0.126 -0.283 0.326 -0.151 -0.0987
##                 456      457   473     475      486    493
## Predicted    2.6602  2.53811 3.016  2.7310  3.05851 2.9684
## cvpred       2.6690  2.55043 3.010  2.7244  3.06124 2.9630
## log(medv)    2.6462  2.54160 3.144  2.6247  3.05400 3.0007
## CV residual -0.0229 -0.00883 0.134 -0.0997 -0.00724 0.0378
## 
## Sum of squares = 1.09    Mean square = 0.02    n = 49 
## 
## fold 8 
## Observations in test set: 49 
##                4     9      26      27      29     34     73     103
## Predicted   3.39 2.548  2.6977 2.80489  2.9632  2.711  3.237  2.9339
## cvpred      3.40 2.536  2.7002 2.80756  2.9716  2.711  3.235  2.9471
## log(medv)   3.51 2.803  2.6319 2.80940  2.9124  2.573  3.127  2.9232
## CV residual 0.11 0.267 -0.0683 0.00184 -0.0592 -0.138 -0.108 -0.0239
##                 113    115     128  149     169    173    174     176
## Predicted    3.0034  3.209 2.78037 2.46  3.2122 3.0596  3.325  3.4246
## cvpred       2.9933  3.202 2.77814 2.45  3.2039 3.0429  3.317  3.4195
## log(medv)    2.9339  2.918 2.78501 2.88  3.1697 3.1398  3.161  3.3810
## CV residual -0.0595 -0.285 0.00688 0.43 -0.0342 0.0969 -0.156 -0.0385
##                 189    221     242      245     252     256   257     276
## Predicted    3.4891  3.434  3.1061  2.87073  3.2527  3.0720 3.577  3.4904
## cvpred       3.4818  3.446  3.1002  2.87039  3.2597  3.0704 3.583  3.4910
## log(medv)    3.3945  3.285  3.0007  2.86790  3.2108  3.0397 3.784  3.4657
## CV residual -0.0872 -0.162 -0.0995 -0.00249 -0.0488 -0.0306 0.201 -0.0252
##               281   292  305    307      325    331    343     353     364
## Predicted   3.682 3.444 3.40 3.4743  3.22363  3.170  3.220  2.9809  2.9010
## cvpred      3.686 3.451 3.41 3.4793  3.22703  3.167  3.222  2.9877  2.9132
## log(medv)   3.816 3.619 3.59 3.5086  3.21888  2.986  2.803  2.9232  2.8214
## CV residual 0.129 0.168 0.18 0.0293 -0.00816 -0.181 -0.419 -0.0645 -0.0918
##                365   371    383    395   407   408    422    429     459
## Predicted    3.591 3.502  2.562  2.768 2.308 2.869  2.800  2.564  2.7602
## cvpred       3.630 3.517  2.552  2.761 2.281 2.860  2.799  2.570  2.7656
## log(medv)    3.086 3.912  2.425  2.542 2.477 3.329  2.653  2.398  2.7014
## CV residual -0.544 0.395 -0.127 -0.219 0.196 0.469 -0.146 -0.172 -0.0643
##               470   471   480     483   495  496     500
## Predicted   2.827 2.917 2.932  3.2954 2.999 2.84  2.9099
## cvpred      2.817 2.913 2.926  3.2996 2.994 2.83  2.9014
## log(medv)   3.001 2.991 3.063  3.2189 3.199 3.14  2.8622
## CV residual 0.184 0.078 0.137 -0.0807 0.205 0.31 -0.0392
## 
## Sum of squares = 1.85    Mean square = 0.04    n = 49 
## 
## fold 9 
## Observations in test set: 49 
##                 36     40    49    58      60     61     62     88     91
## Predicted    3.127 3.3591 2.447 3.425  3.0245 2.8781  2.878  3.223  3.272
## cvpred       3.105 3.3459 2.495 3.419  3.0087 2.8720  2.881  3.205  3.263
## log(medv)    2.939 3.4275 2.667 3.453  2.9755 2.9285  2.773  3.100  3.118
## CV residual -0.166 0.0816 0.172 0.034 -0.0332 0.0565 -0.109 -0.105 -0.146
##                  98     105     109    122    126    166    170  183   184
## Predicted   3.62282  3.0380  3.0806 2.9820 2.9753 3.1769  3.227 3.50 3.393
## cvpred      3.64980  3.0332  3.0853 2.9826 2.9766 3.1520  3.218 3.50 3.375
## log(medv)   3.65584  3.0007  2.9857 3.0106 3.0634 3.2189  3.105 3.63 3.481
## CV residual 0.00604 -0.0325 -0.0996 0.0281 0.0868 0.0669 -0.113 0.14 0.106
##               191   208   214    224   226     227    231     238   267
## Predicted   3.451 2.856 3.190 3.3450 3.711  3.6559 3.1268  3.4855 3.295
## cvpred      3.447 2.865 3.184 3.3290 3.748  3.6675 3.1070  3.4854 3.316
## log(medv)   3.611 3.114 3.336 3.4045 3.912  3.6270 3.1905  3.4500 3.424
## CV residual 0.164 0.249 0.152 0.0756 0.164 -0.0405 0.0835 -0.0354 0.108
##                279     288      290    295     349   354    358   366
## Predicted   3.3295  3.2474  3.21425  3.194  3.2517 3.178 3.0173 2.772
## cvpred      3.3162  3.2311  3.21740  3.180  3.2535 3.189 2.9838 2.637
## log(medv)   3.3707  3.1442  3.21084  3.077  3.1987 3.405 3.0773 3.314
## CV residual 0.0545 -0.0869 -0.00656 -0.103 -0.0548 0.215 0.0935 0.677
##               368  370   385    386    390   410    416  423    426    430
## Predicted   2.498 3.43 2.059  2.262  2.624 2.747  2.252 2.80  2.320  2.478
## cvpred      2.390 3.36 2.035  2.268  2.597 2.756  2.285 2.76  2.319  2.490
## log(medv)   3.140 3.91 2.175  1.974  2.442 3.314  1.974 3.03  2.116  2.251
## CV residual 0.749 0.55 0.139 -0.294 -0.154 0.558 -0.311 0.28 -0.202 -0.239
##                 434   442    463    502
## Predicted    2.7248 2.733 2.9029  3.114
## cvpred       2.7126 2.736 2.8838  3.122
## log(medv)    2.6603 2.839 2.9704  3.109
## CV residual -0.0524 0.103 0.0866 -0.013
## 
## Sum of squares = 2.48    Mean square = 0.05    n = 49 
## 
## fold 10 
## Observations in test set: 49 
##                  16     23      28     30     32    33      54     57
## Predicted    3.0018  2.771  2.7351 3.0113  2.901 2.406  3.1665 3.1947
## cvpred       3.0046  2.773  2.7433 3.0089  2.903 2.422  3.1689 3.1922
## log(medv)    2.9907  2.721  2.6946 3.0445  2.674 2.580  3.1527 3.2068
## CV residual -0.0139 -0.052 -0.0487 0.0356 -0.229 0.159 -0.0161 0.0146
##                  63      90     93       96    119   127     130       132
## Predicted    3.1516  3.4344  3.281  3.34624 2.9869 2.573  2.7305  2.975800
## cvpred       3.1477  3.4290  3.281  3.34683 2.9970 2.587  2.7384  2.976168
## log(medv)    3.1001  3.3569  3.131  3.34639 3.0155 2.754  2.6603  2.975530
## CV residual -0.0476 -0.0721 -0.149 -0.00044 0.0185 0.167 -0.0781 -0.000639
##                138    156  162  163     177    179     192   201     217
## Predicted    2.950  2.887 3.68 3.81  3.2080  3.430  3.4410 3.400  3.1838
## cvpred       2.951  2.916 3.67 3.80  3.2126  3.427  3.4376 3.393  3.1978
## log(medv)    2.839  2.747 3.91 3.91  3.1442  3.398  3.4177 3.493  3.1485
## CV residual -0.112 -0.169 0.24 0.11 -0.0684 -0.029 -0.0199 0.101 -0.0493
##                218   225   229   234    237   247     260    266    271
## Predicted   3.2868 3.669 3.577 3.637  3.345 3.035  3.5031  3.237 3.0164
## cvpred      3.2864 3.653 3.567 3.622  3.351 3.035  3.5022  3.248 3.0220
## log(medv)   3.3569 3.802 3.844 3.877  3.223 3.190  3.4045  3.127 3.0493
## CV residual 0.0705 0.149 0.277 0.255 -0.128 0.156 -0.0977 -0.121 0.0273
##               286     303    316     348   356   381     384   417    444
## Predicted    3.29  3.3074  3.020  3.1893 2.904 2.208  2.5459  2.45  2.764
## cvpred       3.29  3.3071  3.026  3.1859 2.903 2.128  2.5480  2.46  2.755
## log(medv)    3.09  3.2734  2.785  3.1398 3.025 2.342  2.5096  2.01  2.734
## CV residual -0.20 -0.0337 -0.241 -0.0461 0.122 0.214 -0.0384 -0.45 -0.021
##              474     477    478   485    492    501
## Predicted   3.13  2.8913 2.4523 2.937  2.842  2.987
## cvpred      3.12  2.8863 2.4527 2.938  2.849  2.992
## log(medv)   3.39  2.8154 2.4849 3.025  2.610  2.821
## CV residual 0.27 -0.0709 0.0322 0.087 -0.239 -0.171
## 
## Sum of squares = 1.05    Mean square = 0.02    n = 49 
## 
## Overall (Sum over all 49 folds) 
##     ms 
## 0.0283
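As a quick check on the overall figure above, the per-fold sums of squares can be pooled by hand. This sketch assumes the overall ms is the total sum of squared CV residuals divided by the total number of held-out observations. The variable names are illustrative, and the values are copied from the ten fold summaries printed above. Because there are ten folds, the "49" in the header appears to be the size of the last test set rather than a fold count.

```r
# Per-fold values copied from the CVlm output above (first model)
fold_ss <- c(0.78, 1.17, 1.56, 1.63, 1.14, 1.23, 1.09, 1.85, 2.48, 1.05)
fold_n  <- c(49, 50, 50, 50, 50, 50, 49, 49, 49, 49)

total_ss   <- sum(fold_ss)       # pooled sum of squared CV residuals
total_n    <- sum(fold_n)        # total held-out observations across folds
overall_ms <- total_ss / total_n # pooled mean square

print(total_n)
print(round(overall_ms, 4))
```

The pooled value comes out at roughly 0.0282. This matches the reported 0.0283 once you allow for the per-fold sums being rounded to two decimals.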
## Analysis of Variance Table
## 
## Response: log(medv)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## crim         1  22.42   22.42   863.2 < 2e-16 ***
## chas         1   1.03    1.03    39.8 6.5e-10 ***
## nox          1   9.25    9.25   356.1 < 2e-16 ***
## rm           1  18.40   18.40   708.8 < 2e-16 ***
## dis          1   0.35    0.35    13.7 0.00024 ***
## ptratio      1   3.68    3.68   141.7 < 2e-16 ***
## b            1   1.62    1.62    62.4 1.9e-14 ***
## lstat        1   6.93    6.93   267.0 < 2e-16 ***
## I(lstat^2)   1   0.42    0.42    16.2 6.5e-05 ***
## Residuals  485  12.59    0.03                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## fold 1 
## Observations in test set: 49 
##                 5      24     38      44      45     59      72     80
## Predicted   3.375  2.6965  3.131  3.2188  3.1221 3.0940  3.0857  3.146
## cvpred      3.374  2.6901  3.138  3.2221  3.1227 3.0925  3.0897  3.153
## log(medv)   3.589  2.6741  3.045  3.2068  3.0540 3.1485  3.0773  3.011
## CV residual 0.215 -0.0159 -0.093 -0.0153 -0.0687 0.0559 -0.0124 -0.142
##                  97       100     102     104    106    118    131   148
## Predicted    3.1563  3.499393 3.26302  2.9893 2.8873  3.159  3.016 2.508
## cvpred       3.1602  3.502685 3.26738  2.9900 2.8893  3.164  3.013 2.506
## log(medv)    3.0634  3.502550 3.27714  2.9601 2.9704  2.955  2.955 2.681
## CV residual -0.0968 -0.000135 0.00977 -0.0299 0.0811 -0.209 -0.058 0.175
##                150    157   168      190    202   213     228    275   285
## Predicted   2.7195  2.688 3.063  3.55069  3.307 3.007 3.44650  3.604 3.354
## cvpred      2.7091  2.689 3.069  3.55442  3.310 3.008 3.45040  3.619 3.347
## log(medv)   2.7344  2.573 3.170  3.55249  3.182 3.109 3.45316  3.478 3.472
## CV residual 0.0253 -0.116 0.101 -0.00193 -0.128 0.101 0.00276 -0.141 0.125
##               287     294    297    298     300    317     321   328   332
## Predicted   2.900  3.2396  3.316 2.9286  3.4537 2.8324  3.2262 2.954  2.98
## cvpred      2.888  3.2422  3.317 2.9216  3.4522 2.8258  3.2301 2.949  2.98
## log(medv)   3.001  3.1739  3.300 3.0106  3.3673 2.8792  3.1697 3.100  2.84
## CV residual 0.113 -0.0683 -0.017 0.0891 -0.0849 0.0534 -0.0604 0.151 -0.14
##                 337    339    345    352   392   441   443    446     453
## Predicted    3.0085  3.094 3.3715 3.1447 2.725  2.46 2.795 2.4342 2.78578
## cvpred       3.0090  3.096 3.3722 3.1361 2.716  2.45 2.785 2.4284 2.77425
## log(medv)    2.9704  3.025 3.4404 3.1822 3.144  2.35 2.912 2.4681 2.77882
## CV residual -0.0386 -0.071 0.0682 0.0461 0.428 -0.10 0.128 0.0397 0.00457
##               468   469      472     482    491    504
## Predicted   2.685 2.662  2.98559  3.2028  2.464  3.363
## cvpred      2.681 2.655  2.98384  3.2043  2.474  3.370
## log(medv)   2.950 2.950  2.97553  3.1655  2.092  3.174
## CV residual 0.269 0.294 -0.00831 -0.0388 -0.382 -0.196
## 
## Sum of squares = 0.92    Mean square = 0.02    n = 49 
## 
## fold 2 
## Observations in test set: 50 
##                  6     10     11     15      20    22   39    47    65
## Predicted   3.2910 2.8843  2.854  3.000  2.9501 2.888 3.10 2.962 3.167
## cvpred      3.2891 2.8814  2.852  3.002  2.9518 2.890 3.10 2.959 3.164
## log(medv)   3.3569 2.9392  2.708  2.901  2.9014 2.976 3.21 2.996 3.497
## CV residual 0.0678 0.0577 -0.144 -0.101 -0.0504 0.086 0.11 0.037 0.333
##                 66      74     75      81     87   107     120    125
## Predicted    3.385  3.2054  3.290  3.3574 3.0380 2.822  3.0124 2.8812
## cvpred       3.379  3.2023  3.287  3.3551 3.0357 2.824  3.0126 2.8832
## log(medv)    3.157  3.1527  3.182  3.3322 3.1135 2.970  2.9601 2.9339
## CV residual -0.222 -0.0496 -0.105 -0.0229 0.0778 0.147 -0.0525 0.0506
##               144   155   167   182    198    206    216    253   261
## Predicted   2.614  3.03 3.708 3.241 3.3892 3.0838 3.1661 3.3125 3.444
## cvpred      2.620  3.04 3.707 3.240 3.3813 3.0828 3.1652 3.3099 3.444
## log(medv)   2.747  2.83 3.912 3.589 3.4111 3.1179 3.2189 3.3878 3.520
## CV residual 0.128 -0.21 0.205 0.349 0.0298 0.0351 0.0537 0.0778 0.076
##                274     278    289     296    306     324     347    355
## Predicted   3.5413  3.5548  3.228  3.3762 3.2762  2.9661  2.8724 2.8319
## cvpred      3.5473  3.5591  3.223  3.3719 3.2750  2.9656  2.8694 2.8301
## log(medv)   3.5610  3.4995  3.105  3.3534 3.3464  2.9178  2.8449 2.9014
## CV residual 0.0138 -0.0596 -0.118 -0.0185 0.0714 -0.0479 -0.0245 0.0714
##                376     391    394    403      405   418   419   424    427
## Predicted    2.961  2.7300  2.839  2.713 2.155358 2.189 1.914 2.480  2.620
## cvpred       2.957  2.7330  2.841  2.715 2.139813 2.179 1.880 2.477  2.614
## log(medv)    2.708  2.7147  2.625  2.493 2.140066 2.342 2.175 2.595  2.322
## CV residual -0.249 -0.0183 -0.217 -0.222 0.000253 0.162 0.294 0.118 -0.291
##                433     439     449    458     462  488    499
## Predicted    2.884  2.2072  2.7207 2.5278  2.8790 2.95 3.0231
## cvpred       2.881  2.2041  2.7231 2.5271  2.8843 2.96 3.0255
## log(medv)    2.779  2.1282  2.6462 2.6027  2.8736 3.03 3.0540
## CV residual -0.103 -0.0759 -0.0769 0.0756 -0.0107 0.07 0.0285
## 
## Sum of squares = 0.95    Mean square = 0.02    n = 50 
## 
## fold 3 
## Observations in test set: 50 
##                  1      7    13    19     35    55      68       85    99
## Predicted    3.458 3.0704 2.956 2.855  2.658 2.793  3.1147  3.18014 3.653
## cvpred       3.455 3.0669 2.956 2.836  2.662 2.792  3.1069  3.17911 3.666
## log(medv)    3.178 3.1311 3.077 3.006  2.603 2.939  3.0910  3.17388 3.780
## CV residual -0.277 0.0642 0.121 0.169 -0.059 0.147 -0.0159 -0.00523 0.114
##                108    110    112     117   121     129    137    139
## Predicted   2.9879 2.9408  3.252  3.1194 2.968  2.9391 2.8214  2.716
## cvpred      2.9874 2.9438  3.254  3.1168 2.964  2.9409 2.8198  2.726
## log(medv)   3.0155 2.9653  3.127  3.0540 3.091  2.8904 2.8565  2.588
## CV residual 0.0282 0.0215 -0.127 -0.0628 0.127 -0.0505 0.0366 -0.138
##                 147    152    160     165    203     230    236    239
## Predicted   2.76627 2.9240  3.268  3.1523 3.6571  3.4739 3.1347  3.318
## cvpred      2.74446 2.9023  3.253  3.1445 3.6688  3.4705 3.1298  3.316
## log(medv)   2.74727 2.9755  3.148  3.1224 3.7448  3.4500 3.1781  3.165
## CV residual 0.00281 0.0732 -0.105 -0.0221 0.0759 -0.0205 0.0482 -0.151
##                244    258     270    313    320     326     340   359
## Predicted    3.319 3.8062  3.0964  3.091 3.0222  3.2745  3.0406 2.993
## cvpred       3.314 3.8248  3.0918  3.086 3.0188  3.2686  3.0321 2.974
## log(medv)    3.165 3.9120  3.0301  2.965 3.0445  3.2027  2.9444 3.122
## CV residual -0.149 0.0872 -0.0616 -0.121 0.0257 -0.0658 -0.0876 0.148
##               360   361   367    397    400   411    420       431    448
## Predicted   2.877 3.072 2.719  2.778  2.443 2.449  2.516  2.686191  2.755
## cvpred      2.864 3.055 2.694  2.784  2.475 2.396  2.515  2.674507  2.748
## log(medv)   3.118 3.219 3.086  2.526  1.841 2.708  2.128  2.674149  2.534
## CV residual 0.254 0.164 0.392 -0.259 -0.634 0.312 -0.387 -0.000359 -0.215
##               455   460  465   467     479   481   487   497
## Predicted   2.577 2.811 2.88 2.597  2.7603 3.030 2.848 2.702
## cvpred      2.563 2.801 2.87 2.576  2.7604 3.023 2.843 2.708
## log(medv)   2.701 2.996 3.06 2.944  2.6810 3.135 2.950 2.981
## CV residual 0.138 0.194 0.19 0.368 -0.0794 0.112 0.106 0.272
## 
## Sum of squares = 1.69    Mean square = 0.03    n = 50 
## 
## fold 4 
## Observations in test set: 50 
##                  3     8    50      51     52      70   77     94     95
## Predicted   3.4942 2.880 2.840 2.97776  3.131  3.0899  3.1  3.329  3.175
## cvpred      3.4955 2.875 2.837 2.97531  3.131  3.0905  3.1  3.331  3.172
## log(medv)   3.5467 3.300 2.965 2.98062  3.020  3.0397  3.0  3.219  3.025
## CV residual 0.0512 0.424 0.128 0.00531 -0.111 -0.0508 -0.1 -0.112 -0.147
##               101    133    134     141    153    172    175   181    197
## Predicted   3.195 3.0401 2.8464  2.6925  3.006  3.121  3.219 3.502  3.607
## cvpred      3.191 3.0394 2.8448  2.6851  3.020  3.123  3.220 3.497  3.609
## log(medv)   3.314 3.1355 2.9124  2.6391  2.728  2.950  3.118 3.684  3.506
## CV residual 0.123 0.0961 0.0675 -0.0461 -0.292 -0.173 -0.102 0.187 -0.104
##               205    207  222     235    243    248   265    269     277
## Predicted   3.760 3.1161 2.96  3.3936 3.0911 2.9962 3.503 3.7322  3.5333
## cvpred      3.758 3.1139 2.96  3.3948 3.0872 2.9943 3.504 3.7370  3.5351
## log(medv)   3.912 3.1946 3.08  3.3673 3.1001 3.0204 3.597 3.7728  3.5025
## CV residual 0.154 0.0807 0.12 -0.0276 0.0129 0.0261 0.093 0.0358 -0.0325
##               283    309     315   318     319     327    341    344
## Predicted   3.766  3.411  3.2110 2.885  3.1525  3.2181  3.054  3.281
## cvpred      3.770  3.416  3.2108 2.883  3.1520  3.2217  3.056  3.282
## log(medv)   3.829  3.127  3.1697 2.986  3.1398  3.1355  2.929  3.174
## CV residual 0.059 -0.289 -0.0412 0.103 -0.0122 -0.0862 -0.127 -0.108
##                346   362   363     377    378    399    406   414   415
## Predicted    2.968 2.839 2.885  2.6417  2.768  2.161  2.058 2.369 1.762
## cvpred       2.968 2.844 2.898  2.6563  2.777  2.220  2.163 2.399 1.818
## log(medv)    2.862 2.991 3.035  2.6319  2.588  1.609  1.609 2.791 1.946
## CV residual -0.105 0.147 0.137 -0.0244 -0.189 -0.611 -0.554 0.392 0.128
##                435   450    452    461   466     498
## Predicted    2.628  2.69  2.800 2.7915 2.807  2.9397
## cvpred       2.629  2.70  2.802 2.7860 2.809  2.9390
## log(medv)    2.460  2.56  2.721 2.7973 2.991  2.9069
## CV residual -0.169 -0.13 -0.081 0.0113 0.182 -0.0321
## 
## Sum of squares = 1.68    Mean square = 0.03    n = 50 
## 
## fold 5 
## Observations in test set: 50 
##                  2      17      21      41     42      48    56   64
## Predicted    3.205 3.11395  2.6397  3.5513  3.385  2.8346 3.359 3.07
## cvpred       3.209 3.13224  2.6394  3.5642  3.392  2.8243 3.352 3.06
## log(medv)    3.073 3.13983  2.6101  3.5525  3.281  2.8094 3.567 3.22
## CV residual -0.137 0.00759 -0.0293 -0.0117 -0.111 -0.0149 0.214 0.16
##                 78      86     92    151    159   185     188     195
## Predicted    3.148  3.3350  3.298 2.9983  3.380 3.007  3.4793  3.4544
## cvpred       3.156  3.3442  3.309 3.0049  3.403 3.017  3.4869  3.4626
## log(medv)    3.035  3.2809  3.091 3.0681  3.190 3.273  3.4657  3.3707
## CV residual -0.121 -0.0633 -0.218 0.0631 -0.213 0.257 -0.0211 -0.0919
##               204   209   212   249    250     255    259   262    264
## Predicted   3.700 3.053 2.762 3.052 3.2054  3.1797 3.5283 3.565 3.4008
## cvpred      3.698 3.036 2.742 3.048 3.2037  3.1794 3.5318 3.567 3.3965
## log(medv)   3.882 3.195 2.960 3.199 3.2658  3.0865 3.5835 3.764 3.4340
## CV residual 0.184 0.158 0.219 0.151 0.0621 -0.0929 0.0517 0.197 0.0375
##                 272      280    291    299     308    310    311   312
## Predicted    3.2760  3.54963  3.466  3.358  3.3652  3.133  2.902  3.31
## cvpred       3.2896  3.56006  3.479  3.362  3.3734  3.147  2.925  3.34
## log(medv)    3.2268  3.55820  3.350  3.114  3.3393  3.011  2.779  3.10
## CV residual -0.0627 -0.00186 -0.129 -0.249 -0.0341 -0.136 -0.146 -0.24
##                314    329   333     338    350    379    382    393    404
## Predicted    3.237  3.124  3.17  2.9696 3.2040 2.5394  2.682  2.414  2.470
## cvpred       3.251  3.131  3.17  2.9756 3.1987 2.5411  2.681  2.423  2.487
## log(medv)    3.073  2.960  2.97  2.9178 3.2809 2.5726  2.389  2.272  2.116
## CV residual -0.178 -0.171 -0.21 -0.0579 0.0822 0.0315 -0.293 -0.151 -0.371
##                 428    437     440     447     451    484    503    505
## Predicted    2.4281  2.534 2.54047  2.7325  2.6424 2.9819  3.124  3.297
## cvpred       2.4552  2.543 2.54506  2.7351  2.6464 2.9953  3.141  3.312
## log(medv)    2.3888  2.262 2.54945  2.7014  2.5953 3.0819  3.025  3.091
## CV residual -0.0664 -0.282 0.00439 -0.0337 -0.0511 0.0866 -0.115 -0.221
## 
## Sum of squares = 1.18    Mean square = 0.02    n = 50 
## 
## fold 6 
## Observations in test set: 50 
##                  18      31      37     43     53      71     76       79
## Predicted   2.85363  2.5978  3.0491  3.272  3.338  3.2616  3.196  3.05428
## cvpred      2.85452  2.5927  3.0585  3.280  3.340  3.2665  3.201  3.05594
## log(medv)   2.86220  2.5416  2.9957  3.231  3.219  3.1864  3.063  3.05400
## CV residual 0.00768 -0.0511 -0.0627 -0.049 -0.121 -0.0801 -0.138 -0.00194
##                  82      84     89     114    116   123   124   142    143
## Predicted    3.2684  3.2000  3.464  2.9636  2.961 2.878 2.694 2.395  2.698
## cvpred       3.2696  3.2074  3.463  2.9641  2.964 2.879 2.687 2.388  2.697
## log(medv)    3.1739  3.1311  3.161  2.9285  2.907 3.020 2.851 2.667  2.595
## CV residual -0.0957 -0.0762 -0.302 -0.0356 -0.057 0.142 0.164 0.279 -0.101
##               154    164    171  187    199    211    220     232    233
## Predicted   2.840 3.8757  3.026 3.61 3.5023 2.9840  3.301  3.5047 3.7286
## cvpred      2.840 3.8612  3.031 3.60 3.4937 2.9882  3.307  3.4975 3.7104
## log(medv)   2.965 3.9120  2.856 3.91 3.5439 3.0773  3.135  3.4563 3.7305
## CV residual 0.125 0.0508 -0.174 0.31 0.0501 0.0891 -0.172 -0.0412 0.0201
##               246   254   263   268     273    302    304     322    323
## Predicted   2.676 3.474 3.718 3.676  3.2881  3.262 3.4833  3.2326  3.152
## cvpred      2.675 3.448 3.697 3.656  3.2918  3.262 3.4821  3.2374  3.160
## log(medv)   2.918 3.757 3.888 3.912  3.1946  3.091 3.4995  3.1398  3.016
## CV residual 0.243 0.308 0.191 0.256 -0.0972 -0.171 0.0174 -0.0976 -0.145
##               330    336    357    380    388   412   421     432   436
## Predicted    3.27  3.065 2.7979  2.612  2.223 2.575 2.797  2.6756 2.492
## cvpred       3.28  3.070 2.7967  2.610  2.226 2.558 2.793  2.6569 2.469
## log(medv)    3.12  3.049 2.8792  2.322  2.001 2.845 2.815  2.6462 2.595
## CV residual -0.16 -0.021 0.0825 -0.288 -0.225 0.287 0.022 -0.0108 0.126
##                 445      454      464     476     489    494
## Predicted    2.4419 2.898749  3.01081  2.6256  2.7860 3.0162
## cvpred       2.4342 2.878330  3.00950  2.6153  2.7936 3.0270
## log(medv)    2.3795 2.879198  3.00568  2.5878  2.7213 3.0819
## CV residual -0.0546 0.000869 -0.00382 -0.0276 -0.0723 0.0549
## 
## Sum of squares = 1.08    Mean square = 0.02    n = 50 
## 
## fold 7 
## Observations in test set: 49 
##                  12      14      25     46     67     69      83    111
## Predicted    3.0179  3.0528  2.7961  3.078  3.113  2.908  3.2495 3.0084
## cvpred       3.0263  3.0561  2.7950  3.081  3.119  2.911  3.2495 3.0047
## log(medv)    2.9392  3.0155  2.7473  2.960  2.965  2.856  3.2108 3.0773
## CV residual -0.0871 -0.0406 -0.0477 -0.121 -0.153 -0.055 -0.0387 0.0726
##                135    136    140     145   146   158    161    178   180
## Predicted   2.7127 2.8678 2.8227  2.5120 2.581 3.561  3.520  3.377 3.502
## cvpred      2.7136 2.8622 2.8137  2.4859 2.568 3.559  3.520  3.376 3.495
## log(medv)   2.7473 2.8959 2.8792  2.4681 2.625 3.721  3.296  3.203 3.616
## CV residual 0.0337 0.0338 0.0655 -0.0178 0.057 0.162 -0.224 -0.173 0.121
##               186    193     194   196   200   210    219     223    240
## Predicted   3.087 3.5774  3.4597 3.724 3.403 2.765 3.0414  3.3716  3.297
## cvpred      3.085 3.5772  3.4603 3.720 3.407 2.748 3.0353  3.3697  3.301
## log(medv)   3.388 3.5946  3.4372 3.912 3.552 2.996 3.0681  3.3142  3.148
## CV residual 0.303 0.0174 -0.0231 0.192 0.146 0.247 0.0327 -0.0555 -0.152
##                 241     251    282    284     293    301     334    335
## Predicted    3.1869  3.2210 3.5349 3.8372  3.3816  3.389  3.1669  3.124
## cvpred       3.1908  3.2259 3.5340 3.8329  3.3779  3.395  3.1725  3.131
## log(medv)    3.0910  3.1946 3.5667 3.9120  3.3286  3.211  3.1001  3.030
## CV residual -0.0997 -0.0313 0.0327 0.0791 -0.0493 -0.185 -0.0724 -0.101
##               342    351   374   387    389    396   409    425    438
## Predicted   3.447  3.149 2.312 2.219 2.2853  2.817 2.533  2.563  2.296
## cvpred      3.451  3.157 2.252 2.193 2.2485  2.814 2.508  2.574  2.290
## log(medv)   3.487  3.131 2.625 2.351 2.3224  2.573 2.845  2.460  2.163
## CV residual 0.036 -0.026 0.373 0.159 0.0739 -0.242 0.337 -0.115 -0.127
##                  456     457   473     475    486    493
## Predicted   2.633363 2.51941 2.951  2.6765 3.0125 2.9782
## cvpred      2.645220 2.53263 2.950  2.6733 3.0166 2.9766
## log(medv)   2.646175 2.54160 3.144  2.6247 3.0540 3.0007
## CV residual 0.000955 0.00898 0.194 -0.0486 0.0374 0.0241
## 
## Sum of squares = 0.92    Mean square = 0.02    n = 49 
## 
## fold 8 
## Observations in test set: 49 
##                  4     9      26       27      29     34     73    103
## Predicted   3.4541 2.587  2.7047  2.80794  2.9668  2.716  3.269  2.962
## cvpred      3.4579 2.571  2.7068  2.81056  2.9757  2.715  3.263  2.974
## log(medv)   3.5086 2.803  2.6319  2.80940  2.9124  2.573  3.127  2.923
## CV residual 0.0507 0.232 -0.0749 -0.00116 -0.0633 -0.142 -0.136 -0.051
##                 113    115      128   149     169  173    174     176
## Predicted    2.9740  3.198  2.79680 2.540  3.1928 3.02  3.312  3.4481
## cvpred       2.9662  3.193  2.79236 2.518  3.1870 3.01  3.305  3.4402
## log(medv)    2.9339  2.918  2.78501 2.879  3.1697 3.14  3.161  3.3810
## CV residual -0.0323 -0.275 -0.00735 0.361 -0.0173 0.13 -0.143 -0.0592
##                189    221     242    245     252     256   257     276
## Predicted    3.500  3.407  3.0568 2.8417  3.2989  3.0527 3.605  3.5428
## cvpred       3.492  3.421  3.0561 2.8445  3.3013  3.0542 3.610  3.5379
## log(medv)    3.395  3.285  3.0007 2.8679  3.2108  3.0397 3.784  3.4657
## CV residual -0.097 -0.136 -0.0554 0.0234 -0.0905 -0.0145 0.175 -0.0721
##               281   292   305    307    325   331    343     353     364
## Predicted   3.692 3.491 3.402 3.4784  3.258  3.16  3.218  2.9814  2.8721
## cvpred      3.696 3.495 3.408 3.4833  3.258  3.16  3.221  2.9894  2.8848
## log(medv)   3.816 3.619 3.586 3.5086  3.219  2.99  2.803  2.9232  2.8214
## CV residual 0.119 0.124 0.179 0.0252 -0.039 -0.17 -0.418 -0.0662 -0.0634
##               365   371    383    395   407   408    422    429     459
## Predicted    3.60 3.539  2.545  2.726 2.300 2.839  2.763  2.543  2.7228
## cvpred       3.64 3.548  2.533  2.721 2.269 2.830  2.765  2.551  2.7313
## log(medv)    3.09 3.912  2.425  2.542 2.477 3.329  2.653  2.398  2.7014
## CV residual -0.55 0.364 -0.108 -0.179 0.208 0.498 -0.111 -0.153 -0.0299
##               470   471   480     483   495  496     500
## Predicted   2.771 2.851 2.884  3.2662 2.991 2.83  2.9010
## cvpred      2.763 2.851 2.880  3.2716 2.986 2.82  2.8925
## log(medv)   3.001 2.991 3.063  3.2189 3.199 3.14  2.8622
## CV residual 0.238 0.139 0.183 -0.0527 0.212 0.32 -0.0303
## 
## Sum of squares = 1.78    Mean square = 0.04    n = 49 
## 
## fold 9 
## Observations in test set: 49 
##                 36    40    49     58     60     61      62     88     91
## Predicted    3.130 3.400 2.515 3.4354  3.017 2.8487  2.8433  3.228  3.276
## cvpred       3.108 3.385 2.562 3.4355  3.005 2.8467  2.8505  3.209  3.264
## log(medv)    2.939 3.428 2.667 3.4532  2.976 2.9285  2.7726  3.100  3.118
## CV residual -0.168 0.043 0.106 0.0177 -0.029 0.0819 -0.0779 -0.109 -0.146
##                  98     105     109    122    126    166     170   183
## Predicted    3.6580  3.0396  3.0799 2.9844 2.9762 3.1715  3.2035 3.535
## cvpred       3.6765  3.0289  3.0776 2.9743 2.9672 3.1444  3.1933 3.527
## log(medv)    3.6558  3.0007  2.9857 3.0106 3.0634 3.2189  3.1046 3.635
## CV residual -0.0206 -0.0281 -0.0919 0.0363 0.0962 0.0745 -0.0887 0.108
##                184   191   208   214    224   226     227   231     238
## Predicted   3.4243 3.452 2.831 3.190 3.3388 3.722  3.6920 3.096  3.5055
## cvpred      3.4035 3.453 2.839 3.180 3.3277 3.756  3.7027 3.083  3.5064
## log(medv)   3.4812 3.611 3.114 3.336 3.4045 3.912  3.6270 3.190  3.4500
## CV residual 0.0777 0.158 0.275 0.155 0.0768 0.156 -0.0757 0.108 -0.0564
##              267    279     288    290     295     349   354   358   366
## Predicted   3.24 3.3332  3.2322 3.1765  3.1604  3.2597 3.196 2.985 2.801
## cvpred      3.26 3.3214  3.2242 3.1866  3.1530  3.2609 3.206 2.962 2.688
## log(medv)   3.42 3.3707  3.1442 3.2108  3.0773  3.1987 3.405 3.077 3.314
## CV residual 0.16 0.0494 -0.0801 0.0242 -0.0756 -0.0623 0.199 0.115 0.626
##               368   370    385    386    390   410    416   423    426
## Predicted   2.485 3.460 2.1362  2.327  2.592 2.700  2.303 2.759  2.328
## cvpred      2.395 3.408 2.1212  2.342  2.581 2.719  2.335 2.729  2.329
## log(medv)   3.140 3.912 2.1748  1.974  2.442 3.314  1.974 3.035  2.116
## CV residual 0.744 0.504 0.0536 -0.368 -0.139 0.595 -0.361 0.306 -0.213
##                430     434   442   463     502
## Predicted    2.474  2.6957 2.697 2.862  3.1508
## cvpred       2.492  2.6901 2.708 2.855  3.1404
## log(medv)    2.251  2.6603 2.839 2.970  3.1091
## CV residual -0.241 -0.0298 0.131 0.116 -0.0314
## 
## Sum of squares = 2.48    Mean square = 0.05    n = 49 
## 
## fold 10 
## Observations in test set: 49 
##                  16     23      28     30     32     33       54       57
## Predicted    3.0384  2.770  2.7383 3.0180  2.908 2.4745  3.15424  3.21154
## cvpred       3.0396  2.773  2.7482 3.0170  2.911 2.4885  3.15713  3.20868
## log(medv)    2.9907  2.721  2.6946 3.0445  2.674 2.5802  3.15274  3.20680
## CV residual -0.0489 -0.052 -0.0536 0.0275 -0.237 0.0917 -0.00439 -0.00188
##                  63     90     93      96    119   127     130     132
## Predicted    3.1628  3.464  3.283  3.3661 2.9621 2.631  2.7483  2.9985
## cvpred       3.1593  3.458  3.283  3.3663 2.9738 2.642  2.7555  2.9987
## log(medv)    3.1001  3.357  3.131  3.3464 3.0155 2.754  2.6603  2.9755
## CV residual -0.0592 -0.101 -0.152 -0.0199 0.0417 0.112 -0.0952 -0.0232
##                138    156   162    163     177     179     192    201
## Predicted    2.963  2.907 3.748 3.8679  3.1898  3.4328  3.4481 3.4216
## cvpred       2.965  2.938 3.737 3.8613  3.1949  3.4304  3.4444 3.4143
## log(medv)    2.839  2.747 3.912 3.9120  3.1442  3.3979  3.4177 3.4935
## CV residual -0.126 -0.191 0.175 0.0507 -0.0508 -0.0325 -0.0266 0.0792
##                 217    218   225   229  234    237   247     260     266
## Predicted    3.1525 3.2715 3.689 3.606 3.66  3.322 3.022  3.5017  3.2159
## cvpred       3.1683 3.2724 3.675 3.595 3.65  3.329 3.023  3.5006  3.2257
## log(medv)    3.1485 3.3569 3.802 3.844 3.88  3.223 3.190  3.4045  3.1268
## CV residual -0.0198 0.0845 0.127 0.249 0.23 -0.106 0.167 -0.0961 -0.0989
##                271    286      303    316     348    356   381     384
## Predicted   2.9978  3.268  3.27430  3.017  3.1998 2.9567 2.187  2.5356
## cvpred      3.0044  3.269  3.27595  3.023  3.1967 2.9544 2.098  2.5367
## log(medv)   3.0493  3.091  3.27336  2.785  3.1398 3.0253 2.342  2.5096
## CV residual 0.0448 -0.178 -0.00258 -0.238 -0.0569 0.0709 0.244 -0.0271
##                417    444   474    477    478  485    492    501
## Predicted    2.463 2.7253 3.082  2.830 2.4388 2.88  2.847  2.977
## cvpred       2.478 2.7189 3.077  2.828 2.4375 2.89  2.854  2.982
## log(medv)    2.015 2.7344 3.395  2.815 2.4849 3.03  2.610  2.821
## CV residual -0.463 0.0155 0.318 -0.013 0.0475 0.14 -0.243 -0.161
## 
## Sum of squares = 1.01    Mean square = 0.02    n = 49 
## 
## Overall (Sum over all 49 folds) 
##     ms 
## 0.0276
## Analysis of Variance Table
## 
## Response: log(medv)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## crim         1  22.42   22.42  863.08 < 2e-16 ***
## chas         1   1.03    1.03   39.74 6.5e-10 ***
## nox          1   9.25    9.25  356.02 < 2e-16 ***
## rm           1  18.40   18.40  708.67 < 2e-16 ***
## dis          1   0.35    0.35   13.66 0.00024 ***
## ptratio      1   3.68    3.68  141.63 < 2e-16 ***
## b            1   1.62    1.62   62.41 1.9e-14 ***
## lstat        1   6.93    6.93  267.00 < 2e-16 ***
## I(lstat^2)   1   0.42    0.42   16.24 6.5e-05 ***
## I(lstat^3)   1   0.02    0.02    0.93 0.33438    
## Residuals  484  12.57    0.03                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
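
As a rough check on how the table above fits together, the sketch below recomputes each F value as the term's mean square divided by the residual mean square (12.57 / 484). It only uses the rounded figures printed in the table, so small differences from the reported F values and p-values are expected, most visibly for terms with very small sums of squares such as I(lstat^3). The variable names (`sum_sq`, `resid_ms`, `f_value`, `p_value`) are illustrative and do not come from the original analysis.

```r
# Sums of squares copied from the ANOVA table above (rounded to two decimals there).
sum_sq <- c(crim = 22.42, chas = 1.03, nox = 9.25, rm = 18.40, dis = 0.35,
            ptratio = 3.68, b = 1.62, lstat = 6.93,
            "I(lstat^2)" = 0.42, "I(lstat^3)" = 0.02)
resid_ss <- 12.57   # residual sum of squares
resid_df <- 484     # residual degrees of freedom

# Every term has 1 degree of freedom, so its mean square equals its sum of squares.
resid_ms <- resid_ss / resid_df
f_value  <- sum_sq / resid_ms

# Upper-tail probability of the F distribution with (1, resid_df) degrees of freedom.
p_value  <- pf(f_value, df1 = 1, df2 = resid_df, lower.tail = FALSE)

round(cbind(F = f_value, p = p_value), 4)
```

Read this way, the cubic term I(lstat^3) is the only one whose F value is too small to be significant at the 0.05 level, which agrees with the table.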

## 
## fold 1 
## Observations in test set: 49 
##                 5      24      38      44      45     59       72    80
## Predicted   3.375  2.7051  3.1281  3.2170  3.1179 3.0933  3.08201  3.14
## cvpred      3.374  2.6969  3.1354  3.2204  3.1192 3.0919  3.08649  3.15
## log(medv)   3.589  2.6741  3.0445  3.2068  3.0540 3.1485  3.07731  3.01
## CV residual 0.215 -0.0227 -0.0909 -0.0136 -0.0652 0.0565 -0.00918 -0.14
##                  97     100    102     104   106    118    131   148
## Predicted    3.1518 3.49564 3.2578  2.9861 2.890  3.155  3.010 2.511
## cvpred       3.1563 3.49941 3.2630  2.9873 2.891  3.160  3.008 2.510
## log(medv)    3.0634 3.50255 3.2771  2.9601 2.970  2.955  2.955 2.681
## CV residual -0.0929 0.00314 0.0142 -0.0272 0.079 -0.205 -0.053 0.171
##                150    157   168      190    202   213    228    275   285
## Predicted   2.7298  2.691 3.060  3.55107  3.305 3.008 3.4430  3.614 3.348
## cvpred      2.7179  2.691 3.066  3.55449  3.309 3.009 3.4475  3.626 3.341
## log(medv)   2.7344  2.573 3.170  3.55249  3.182 3.109 3.4532  3.478 3.472
## CV residual 0.0165 -0.119 0.104 -0.00201 -0.126 0.101 0.0057 -0.148 0.131
##               287     294     297    298     300    317     321   328
## Predicted   2.895  3.2364  3.3130 2.9309  3.4572 2.8381  3.2237 2.950
## cvpred      2.884  3.2395  3.3138 2.9233  3.4549 2.8303  3.2279 2.946
## log(medv)   3.001  3.1739  3.2995 3.0106  3.3673 2.8792  3.1697 3.100
## CV residual 0.117 -0.0656 -0.0143 0.0873 -0.0876 0.0489 -0.0582 0.155
##                332     337     339    345    352   392    441   443    446
## Predicted    2.981  3.0047  3.0903 3.3760 3.1463 2.730  2.474 2.795 2.4409
## cvpred       2.978  3.0058  3.0934 3.3757 3.1374 2.720  2.460 2.785 2.4340
## log(medv)    2.839  2.9704  3.0253 3.4404 3.1822 3.144  2.351 2.912 2.4681
## CV residual -0.138 -0.0354 -0.0681 0.0647 0.0448 0.424 -0.109 0.127 0.0341
##                 453   468   469      472     482    491    504
## Predicted   2.78667 2.695 2.667  2.98112  3.1973  2.467  3.363
## cvpred      2.77519 2.688 2.659  2.98014  3.1997  2.477  3.369
## log(medv)   2.77882 2.950 2.950  2.97553  3.1655  2.092  3.174
## CV residual 0.00362 0.261 0.291 -0.00461 -0.0343 -0.385 -0.196
## 
## Sum of squares = 0.91    Mean square = 0.02    n = 49 
## 
## fold 2 
## Observations in test set: 50 
##                  6     10     11      15      20    22    39     47   65
## Predicted   3.2951 2.8871  2.861  2.9951  2.9469 2.886 3.093 2.9613 3.16
## cvpred      3.2925 2.8841  2.857  2.9975  2.9490 2.888 3.093 2.9585 3.16
## log(medv)   3.3569 2.9392  2.708  2.9014  2.9014 2.976 3.207 2.9957 3.50
## CV residual 0.0644 0.0551 -0.149 -0.0961 -0.0476 0.088 0.113 0.0372 0.34
##                 66      74     75      81     87   107     120    125
## Predicted    3.393  3.2033  3.290  3.3601 3.0350 2.829  3.0114 2.8858
## cvpred       3.386  3.2005  3.287  3.3571 3.0333 2.830  3.0118 2.8871
## log(medv)    3.157  3.1527  3.182  3.3322 3.1135 2.970  2.9601 2.9339
## CV residual -0.229 -0.0478 -0.105 -0.0249 0.0802 0.141 -0.0517 0.0468
##               144    155   167   182    198    206    216    253    261
## Predicted   2.625  3.025 3.712 3.237 3.3811 3.0802 3.1615 3.3224 3.4341
## cvpred      2.628  3.039 3.711 3.236 3.3745 3.0798 3.1612 3.3180 3.4359
## log(medv)   2.747  2.833 3.912 3.589 3.4111 3.1179 3.2189 3.3878 3.5205
## CV residual 0.119 -0.206 0.201 0.353 0.0366 0.0382 0.0576 0.0697 0.0845
##                274     278    289     296    306     324     347    355
## Predicted   3.5326  3.5604  3.225  3.3755 3.2702  2.9633  2.8690 2.8305
## cvpred      3.5393  3.5636  3.220  3.3713 3.2698  2.9633  2.8666 2.8288
## log(medv)   3.5610  3.4995  3.105  3.3534 3.3464  2.9178  2.8449 2.9014
## CV residual 0.0217 -0.0641 -0.116 -0.0179 0.0766 -0.0456 -0.0217 0.0727
##                376     391    394    403      405   418   419   424    427
## Predicted    2.951  2.7336  2.837  2.719  2.16288 2.200 1.920 2.489  2.620
## cvpred       2.949  2.7362  2.840  2.720  2.14691 2.188 1.887 2.484  2.613
## log(medv)    2.708  2.7147  2.625  2.493  2.14007 2.342 2.175 2.595  2.322
## CV residual -0.241 -0.0215 -0.215 -0.226 -0.00684 0.154 0.288 0.111 -0.291
##                433    439     449    458      462    488    499
## Predicted    2.876  2.172  2.7237 2.5285  2.87502 2.9504 3.0196
## cvpred       2.875  2.173  2.7257 2.5273  2.88080 2.9517 3.0225
## log(medv)    2.779  2.128  2.6462 2.6027  2.87356 3.0253 3.0540
## CV residual -0.096 -0.045 -0.0795 0.0754 -0.00723 0.0736 0.0315
## 
## Sum of squares = 0.94    Mean square = 0.02    n = 50 
## 
## fold 3 
## Observations in test set: 50 
##                  1      7   13    19      35    55     68      85    99
## Predicted    3.462 3.0666 2.96 2.852  2.6656 2.793  3.113 3.17450 3.659
## cvpred       3.461 3.0617 2.96 2.833  2.6719 2.792  3.105 3.17192 3.675
## log(medv)    3.178 3.1311 3.08 3.006  2.6027 2.939  3.091 3.17388 3.780
## CV residual -0.283 0.0694 0.12 0.172 -0.0692 0.147 -0.014 0.00196 0.105
##                108    110    112     117   121     129    137    139
## Predicted   2.9857 2.9403  3.245  3.1147 2.967  2.9368 2.8240  2.726
## cvpred      2.9847 2.9433  3.244  3.1106 2.963  2.9378 2.8231  2.739
## log(medv)   3.0155 2.9653  3.127  3.0540 3.091  2.8904 2.8565  2.588
## CV residual 0.0309 0.0219 -0.117 -0.0566 0.128 -0.0474 0.0333 -0.151
##                 147    152     160     165    203     230    236   239
## Predicted   2.76816 2.9223  3.2632  3.1487 3.6674  3.4851 3.1301  3.32
## cvpred      2.74609 2.8989  3.2456  3.1395 3.6825  3.4849 3.1237  3.32
## log(medv)   2.74727 2.9755  3.1485  3.1224 3.7448  3.4500 3.1781  3.17
## CV residual 0.00118 0.0767 -0.0972 -0.0171 0.0623 -0.0349 0.0543 -0.15
##                244    258     270    313    320     326     340   359
## Predicted    3.323 3.7997  3.0928  3.087 3.0181  3.2790  3.0363 2.984
## cvpred       3.320 3.8162  3.0868  3.080 3.0134  3.2743  3.0263 2.962
## log(medv)    3.165 3.9120  3.0301  2.965 3.0445  3.2027  2.9444 3.122
## CV residual -0.155 0.0958 -0.0567 -0.115 0.0312 -0.0716 -0.0819 0.161
##               360   361  367    397    400  411    420      431    448
## Predicted   2.871 3.066 2.72  2.783  2.440 2.44  2.521  2.68738  2.754
## cvpred      2.855 3.047 2.70  2.790  2.472 2.39  2.522  2.67648  2.747
## log(medv)   3.118 3.219 3.09  2.526  1.841 2.71  2.128  2.67415  2.534
## CV residual 0.262 0.171 0.39 -0.264 -0.631 0.32 -0.394 -0.00233 -0.213
##               455   460   465   467     479  481   487   497
## Predicted   2.577 2.808 2.878 2.599  2.7636 3.02 2.846 2.715
## cvpred      2.563 2.797 2.867 2.579  2.7645 3.02 2.841 2.724
## log(medv)   2.701 2.996 3.063 2.944  2.6810 3.14 2.950 2.981
## CV residual 0.138 0.199 0.197 0.366 -0.0835 0.12 0.109 0.256
## 
## Sum of squares = 1.71    Mean square = 0.03    n = 50 
## 
## fold 4 
## Observations in test set: 50 
##                  3     8    50      51     52      70      77     94    95
## Predicted   3.5007 2.885 2.844 2.97542  3.127  3.0874  3.0976  3.331  3.17
## cvpred      3.5051 2.887 2.844 2.97265  3.124  3.0858  3.0892  3.333  3.16
## log(medv)   3.5467 3.300 2.965 2.98062  3.020  3.0397  2.9957  3.219  3.03
## CV residual 0.0417 0.413 0.121 0.00796 -0.104 -0.0461 -0.0935 -0.114 -0.14
##               101   133    134     141    153    172     175  181    197
## Predicted   3.187 3.033 2.8463  2.7026  3.003  3.117  3.2162 3.49  3.613
## cvpred      3.180 3.029 2.8455  2.7014  3.014  3.117  3.2148 3.48  3.619
## log(medv)   3.314 3.135 2.9124  2.6391  2.728  2.950  3.1179 3.68  3.506
## CV residual 0.134 0.106 0.0668 -0.0624 -0.286 -0.168 -0.0969 0.20 -0.113
##               205    207   222     235    243   248   265    269     277
## Predicted   3.770 3.1103 2.968  3.3859 3.0851 2.990 3.495 3.7426  3.5284
## cvpred      3.774 3.1054 2.970  3.3826 3.0785 2.985 3.493 3.7532  3.5274
## log(medv)   3.912 3.1946 3.077  3.3673 3.1001 3.020 3.597 3.7728  3.5025
## CV residual 0.138 0.0892 0.107 -0.0153 0.0216 0.035 0.104 0.0196 -0.0249
##                283    309     315    318      319     327    341    344
## Predicted   3.7745  3.417  3.2044 2.8869  3.14621  3.2190  3.051  3.277
## cvpred      3.7828  3.425  3.2010 2.8870  3.14275  3.2220  3.049  3.276
## log(medv)   3.8286  3.127  3.1697 2.9857  3.13983  3.1355  2.929  3.174
## CV residual 0.0459 -0.298 -0.0313 0.0987 -0.00292 -0.0865 -0.121 -0.102
##                 346   362   363    377    378    399    406   414  415
## Predicted    2.9635 2.835 2.882  2.649  2.773  2.156  2.068 2.380 1.70
## cvpred       2.9601 2.838 2.892  2.668  2.786  2.205  2.175 2.414 1.70
## log(medv)    2.8622 2.991 3.035  2.632  2.588  1.609  1.609 2.791 1.95
## CV residual -0.0979 0.153 0.142 -0.036 -0.199 -0.596 -0.566 0.378 0.25
##                435    450     452    461   466     498
## Predicted    2.625  2.697  2.8002 2.7884 2.806  2.9387
## cvpred       2.622  2.702  2.8040 2.7824 2.806  2.9382
## log(medv)    2.460  2.565  2.7213 2.7973 2.991  2.9069
## CV residual -0.162 -0.137 -0.0827 0.0149 0.184 -0.0313
## 
## Sum of squares = 1.7    Mean square = 0.03    n = 50 
## 
## fold 5 
## Observations in test set: 50 
##                  2      17      21      41     42      48    56    64
## Predicted    3.200 3.11509  2.6510  3.5726  3.389  2.8409 3.361 3.060
## cvpred       3.201 3.13459  2.6573  3.5971  3.399  2.8340 3.355 3.047
## log(medv)    3.073 3.13983  2.6101  3.5525  3.281  2.8094 3.567 3.219
## CV residual -0.128 0.00524 -0.0472 -0.0447 -0.118 -0.0246 0.211 0.172
##                 78      86     92    151    159   185     188    195   204
## Predicted    3.144  3.3337  3.294 2.9942  3.381 3.007  3.4772  3.462 3.705
## cvpred       3.150  3.3426  3.303 2.9974  3.405 3.017  3.4837  3.475 3.705
## log(medv)    3.035  3.2809  3.091 3.0681  3.190 3.273  3.4657  3.371 3.882
## CV residual -0.115 -0.0617 -0.212 0.0706 -0.214 0.256 -0.0179 -0.104 0.177
##               209   212   249    250     255    259  262    264     272
## Predicted   3.050 2.774 3.046 3.2030  3.1804 3.5200 3.56 3.3899  3.2764
## cvpred      3.032 2.762 3.038 3.2004  3.1807 3.5181 3.55 3.3787  3.2905
## log(medv)   3.195 2.960 3.199 3.2658  3.0865 3.5835 3.76 3.4340  3.2268
## CV residual 0.163 0.198 0.161 0.0654 -0.0942 0.0655 0.21 0.0553 -0.0637
##                 280    291    299     308   310    311    312    314
## Predicted    3.5543  3.479  3.364  3.3604  3.13  2.903  3.317  3.234
## cvpred       3.5671  3.499  3.371  3.3662  3.14  2.927  3.340  3.246
## log(medv)    3.5582  3.350  3.114  3.3393  3.01  2.779  3.096  3.073
## CV residual -0.0089 -0.149 -0.258 -0.0269 -0.13 -0.148 -0.245 -0.174
##                329    333     338    350    379    382    393    404
## Predicted    3.120  3.167  2.9653 3.2026 2.5478  2.688  2.428  2.480
## cvpred       3.126  3.172  2.9694 3.1968 2.5547  2.691  2.446  2.503
## log(medv)    2.960  2.965  2.9178 3.2809 2.5726  2.389  2.272  2.116
## CV residual -0.166 -0.207 -0.0516 0.0841 0.0179 -0.302 -0.174 -0.386
##                 428   437     440     447     451    484    503    505
## Predicted    2.4232  2.53  2.5521  2.7338  2.6401 2.9784  3.119  3.295
## cvpred       2.4470  2.54  2.5632  2.7365  2.6416 2.9904  3.134  3.308
## log(medv)    2.3888  2.26  2.5494  2.7014  2.5953 3.0819  3.025  3.091
## CV residual -0.0582 -0.28 -0.0138 -0.0352 -0.0464 0.0915 -0.109 -0.217
## 
## Sum of squares = 1.21    Mean square = 0.02    n = 50 
## 
## fold 6 
## Observations in test set: 50 
##                18      31      37      43     53      71     76      79
## Predicted   2.852  2.6097  3.0458  3.2747  3.341  3.2608  3.191 3.04973
## cvpred      2.853  2.6084  3.0541  3.2835  3.345  3.2652  3.195 3.05034
## log(medv)   2.862  2.5416  2.9957  3.2308  3.219  3.1864  3.063 3.05400
## CV residual 0.009 -0.0668 -0.0584 -0.0527 -0.126 -0.0788 -0.132 0.00366
##                  82      84     89     114     116   123   124   142
## Predicted    3.2652  3.1983  3.464  2.9666  2.9618 2.883 2.706 2.363
## cvpred       3.2653  3.2050  3.463  2.9681  2.9657 2.885 2.701 2.343
## log(medv)    3.1739  3.1311  3.161  2.9285  2.9069 3.020 2.851 2.667
## CV residual -0.0915 -0.0739 -0.302 -0.0396 -0.0588 0.136 0.149 0.324
##                143   154    164    171   187    199   211    220   232
## Predicted    2.706 2.841 3.8784  3.025 3.615 3.4977 2.986  3.293  3.50
## cvpred       2.706 2.841 3.8644  3.030 3.604 3.4880 2.991  3.297  3.50
## log(medv)    2.595 2.965 3.9120  2.856 3.912 3.5439 3.077  3.135  3.46
## CV residual -0.111 0.125 0.0476 -0.173 0.309 0.0559 0.086 -0.161 -0.04
##                 233   246   254   263   268     273    302    304     322
## Predicted   3.73953 2.683 3.477 3.710 3.665  3.2843  3.256 3.4868  3.2312
## cvpred      3.72405 2.685 3.453 3.686 3.641  3.2865  3.254 3.4862  3.2352
## log(medv)   3.73050 2.918 3.757 3.888 3.912  3.1946  3.091 3.4995  3.1398
## CV residual 0.00645 0.233 0.304 0.202 0.271 -0.0919 -0.163 0.0133 -0.0954
##                323    330     336    357    380    388   412    421    432
## Predicted    3.150  3.271  3.0622 2.7973  2.620  2.211 2.580 2.7929  2.678
## cvpred       3.158  3.275  3.0665 2.7965  2.621  2.209 2.565 2.7888  2.661
## log(medv)    3.016  3.118  3.0493 2.8792  2.322  2.001 2.845 2.8154  2.646
## CV residual -0.142 -0.157 -0.0173 0.0827 -0.298 -0.208 0.279 0.0266 -0.015
##               436    445     454     464     476     489    494
## Predicted   2.498  2.452 2.89323 3.00266  2.6355  2.7933 3.0135
## cvpred      2.477  2.448 2.87177 2.99886  2.6283  2.8031 3.0233
## log(medv)   2.595  2.380 2.87920 3.00568  2.5878  2.7213 3.0819
## CV residual 0.118 -0.068 0.00743 0.00682 -0.0406 -0.0818 0.0586
## 
## Sum of squares = 1.1    Mean square = 0.02    n = 50 
## 
## fold 7 
## Observations in test set: 49 
##                  12      14      25     46    67      69      83    111
## Predicted    3.0148  3.0500  2.7979  3.076  3.11  2.9068  3.2492 3.0044
## cvpred       3.0229  3.0527  2.7969  3.078  3.12  2.9105  3.2492 2.9998
## log(medv)    2.9392  3.0155  2.7473  2.960  2.97  2.8565  3.2108 3.0773
## CV residual -0.0838 -0.0371 -0.0497 -0.118 -0.15 -0.0541 -0.0384 0.0775
##               135   136    140     145    146   158    161    178   180
## Predicted   2.717 2.869 2.8273  2.5165 2.5840 3.566  3.521  3.378 3.505
## cvpred      2.718 2.863 2.8187  2.4903 2.5715 3.565  3.523  3.377 3.499
## log(medv)   2.747 2.896 2.8792  2.4681 2.6247 3.721  3.296  3.203 3.616
## CV residual 0.029 0.033 0.0605 -0.0222 0.0531 0.156 -0.227 -0.174 0.118
##               186      193     194   196  200   210    219     223    240
## Predicted   3.084 3.591248  3.4634 3.734 3.41 2.778 3.0445  3.3613  3.293
## cvpred      3.081 3.594034  3.4650 3.733 3.41 2.764 3.0396  3.3581  3.297
## log(medv)   3.388 3.594569  3.4372 3.912 3.55 2.996 3.0681  3.3142  3.148
## CV residual 0.307 0.000535 -0.0278 0.179 0.14 0.232 0.0285 -0.0439 -0.148
##                 241     251    282    284     293    301     334    335
## Predicted    3.1785  3.2220 3.5398 3.8439  3.3876  3.388  3.1691  3.123
## cvpred       3.1811  3.2271 3.5403 3.8419  3.3850  3.395  3.1750  3.129
## log(medv)    3.0910  3.1946 3.5667 3.9120  3.3286  3.211  3.1001  3.030
## CV residual -0.0901 -0.0325 0.0264 0.0701 -0.0564 -0.184 -0.0749 -0.099
##                342     351   374   387    389    396   409    425    438
## Predicted   3.4463  3.1491 2.276 2.229 2.2836  2.817 2.544  2.567  2.302
## cvpred      3.4508  3.1576 2.206 2.203 2.2450  2.814 2.520  2.580  2.298
## log(medv)   3.4874  3.1311 2.625 2.351 2.3224  2.573 2.845  2.460  2.163
## CV residual 0.0366 -0.0265 0.418 0.148 0.0774 -0.242 0.325 -0.121 -0.134
##                   456     457   473     475    486    493
## Predicted    2.633632 2.52372 2.947  2.6838 3.0059 2.9752
## cvpred       2.646503 2.53881 2.945  2.6820 3.0087 2.9729
## log(medv)    2.646175 2.54160 3.144  2.6247 3.0540 3.0007
## CV residual -0.000328 0.00279 0.199 -0.0573 0.0453 0.0278
## 
## Sum of squares = 0.93    Mean square = 0.02    n = 49 
## 
## fold 8 
## Observations in test set: 49 
##                  4     9      26      27     29     34     73     103
## Predicted   3.4680 2.586  2.7082  2.8077  2.961  2.722  3.274  2.9548
## cvpred      3.4609 2.571  2.7075  2.8105  2.974  2.716  3.264  2.9727
## log(medv)   3.5086 2.803  2.6319  2.8094  2.912  2.573  3.127  2.9232
## CV residual 0.0477 0.232 -0.0756 -0.0011 -0.062 -0.144 -0.137 -0.0495
##                 113    115      128  149    169  173    174   176     189
## Predicted    2.9762  3.193  2.80125 2.55  3.187 3.03  3.307  3.45  3.5075
## cvpred       2.9666  3.192  2.79330 2.52  3.186 3.01  3.304  3.44  3.4931
## log(medv)    2.9339  2.918  2.78501 2.88  3.170 3.14  3.161  3.38  3.3945
## CV residual -0.0328 -0.274 -0.00829 0.36 -0.016 0.13 -0.142 -0.06 -0.0986
##                221     242    245     252     256   257     276   281
## Predicted    3.396  3.0529 2.8398  3.3112  3.0495 3.617  3.5576 3.697
## cvpred       3.419  3.0553 2.8441  3.3040  3.0536 3.612  3.5411 3.697
## log(medv)    3.285  3.0007 2.8679  3.2108  3.0397 3.784  3.4657 3.816
## CV residual -0.134 -0.0546 0.0238 -0.0932 -0.0138 0.172 -0.0754 0.118
##               292  305    307     325    331    343     353     364    365
## Predicted   3.501 3.40 3.4736  3.2587  3.153  3.212  2.9796  2.8684  3.587
## cvpred      3.497 3.41 3.4824  3.2582  3.155  3.220  2.9891  2.8840  3.634
## log(medv)   3.619 3.59 3.5086  3.2189  2.986  2.803  2.9232  2.8214  3.086
## CV residual 0.122 0.18 0.0262 -0.0393 -0.169 -0.417 -0.0659 -0.0626 -0.547
##               371    383    395   407   408    422    429     459   470
## Predicted   3.550  2.557  2.727 2.320 2.836  2.763  2.550  2.7213 2.771
## cvpred      3.550  2.536  2.721 2.273 2.830  2.764  2.553  2.7309 2.763
## log(medv)   3.912  2.425  2.542 2.477 3.329  2.653  2.398  2.7014 3.001
## CV residual 0.362 -0.111 -0.179 0.203 0.499 -0.111 -0.155 -0.0295 0.237
##               471   480     483   495   496     500
## Predicted   2.851 2.879  3.2609 2.989 2.831  2.9027
## cvpred      2.851 2.879  3.2705 2.986 2.821  2.8929
## log(medv)   2.991 3.063  3.2189 3.199 3.140  2.8622
## CV residual 0.139 0.184 -0.0517 0.213 0.319 -0.0307
## 
## Sum of squares = 1.77    Mean square = 0.04    n = 49 
## 
## fold 9 
## Observations in test set: 49 
##                 36     40    49     58      60     61      62     88
## Predicted    3.127 3.4082 2.511 3.4440  3.0133 2.8467  2.8419  3.225
## cvpred       3.106 3.3878 2.560 3.4390  3.0032 2.8458  2.8498  3.208
## log(medv)    2.939 3.4275 2.667 3.4532  2.9755 2.9285  2.7726  3.100
## CV residual -0.167 0.0397 0.107 0.0142 -0.0277 0.0827 -0.0772 -0.108
##                 91      98     105     109   122    126    166    170
## Predicted    3.271  3.6596  3.0352  3.0740 2.983 2.9755 3.1663  3.197
## cvpred       3.262  3.6770  3.0270  3.0751 2.974 2.9668 3.1425  3.191
## log(medv)    3.118  3.6558  3.0007  2.9857 3.011 3.0634 3.2189  3.105
## CV residual -0.144 -0.0211 -0.0263 -0.0894 0.037 0.0966 0.0764 -0.086
##               183    184   191   208   214    224   226     227   231
## Predicted   3.538 3.4264 3.454 2.838 3.184 3.3347 3.718  3.6999 3.092
## cvpred      3.528 3.4044 3.454 2.841 3.178 3.3261 3.755  3.7058 3.081
## log(medv)   3.635 3.4812 3.611 3.114 3.336 3.4045 3.912  3.6270 3.190
## CV residual 0.107 0.0768 0.157 0.272 0.158 0.0784 0.157 -0.0788 0.109
##                 238   267    279     288    290    295     349   354   358
## Predicted    3.5074 3.237 3.3310  3.2312 3.1700  3.156  3.2596 3.201 2.977
## cvpred       3.5072 3.262 3.3205  3.2238 3.1839  3.151  3.2609 3.208 2.959
## log(medv)    3.4500 3.424 3.3707  3.1442 3.2108  3.077  3.1987 3.405 3.077
## CV residual -0.0572 0.163 0.0502 -0.0797 0.0269 -0.074 -0.0623 0.196 0.119
##               366   368   370    385    386    390   410   416   423
## Predicted   2.812 2.491 3.467 2.1365  2.322  2.604 2.703  2.30 2.758
## cvpred      2.693 2.398 3.411 2.1217  2.340  2.585 2.720  2.33 2.729
## log(medv)   3.314 3.140 3.912 2.1748  1.974  2.442 3.314  1.97 3.035
## CV residual 0.621 0.741 0.501 0.0531 -0.366 -0.143 0.594 -0.36 0.306
##                426    430     434  442   463     502
## Predicted    2.337  2.481  2.6932 2.70 2.857  3.1436
## cvpred       2.333  2.495  2.6892 2.71 2.853  3.1376
## log(medv)    2.116  2.251  2.6603 2.84 2.970  3.1091
## CV residual -0.217 -0.244 -0.0289 0.13 0.117 -0.0285
## 
## Sum of squares = 2.47    Mean square = 0.05    n = 49 
## 
## fold 10 
## Observations in test set: 49 
##                  16      23      28    30     32    33       54       57
## Predicted    3.0359  2.7748  2.7410 3.010  2.904 2.480  3.15152  3.21344
## cvpred       3.0375  2.7776  2.7504 3.011  2.908 2.494  3.15457  3.21013
## log(medv)    2.9907  2.7213  2.6946 3.045  2.674 2.580  3.15274  3.20680
## CV residual -0.0468 -0.0563 -0.0557 0.034 -0.234 0.086 -0.00184 -0.00333
##                  63     90     93      96    119   127    130     132
## Predicted    3.1611  3.464  3.279  3.3647 2.9629 2.640  2.755  2.9926
## cvpred       3.1579  3.458  3.279  3.3653 2.9744 2.650  2.761  2.9938
## log(medv)    3.1001  3.357  3.131  3.3464 3.0155 2.754  2.660  2.9755
## CV residual -0.0578 -0.101 -0.148 -0.0189 0.0411 0.103 -0.101 -0.0183
##                138    156   162    163     177     179     192    201
## Predicted    2.959  2.901 3.769 3.8835  3.1857  3.4293  3.4533 3.4261
## cvpred       2.962  2.933 3.755 3.8749  3.1913  3.4276  3.4489 3.4181
## log(medv)    2.839  2.747 3.912 3.9120  3.1442  3.3979  3.4177 3.4935
## CV residual -0.122 -0.186 0.157 0.0371 -0.0472 -0.0297 -0.0311 0.0754
##                 217    218   225   229   234     237   247    260     266
## Predicted    3.1486 3.2643 3.690 3.610 3.663  3.3126 3.018  3.498  3.2136
## cvpred       3.1645 3.2663 3.676 3.600 3.649  3.3214 3.019  3.497  3.2234
## log(medv)    3.1485 3.3569 3.802 3.844 3.877  3.2229 3.190  3.405  3.1268
## CV residual -0.0161 0.0906 0.126 0.244 0.228 -0.0985 0.171 -0.093 -0.0967
##                271    286     303    316    348    356   381     384
## Predicted   2.9956  3.264 3.26913  3.014  3.199 2.9609 2.183  2.5482
## cvpred      3.0024  3.265 3.27143  3.021  3.196 2.9578 2.093  2.5477
## log(medv)   3.0493  3.091 3.27336  2.785  3.140 3.0253 2.342  2.5096
## CV residual 0.0469 -0.174 0.00193 -0.236 -0.056 0.0675 0.249 -0.0381
##                417    444   474     477    478   485    492    501
## Predicted    2.468 2.7278 3.073  2.8331 2.4524 2.878  2.851  2.975
## cvpred       2.482 2.7210 3.069  2.8311 2.4491 2.883  2.858  2.980
## log(medv)    2.015 2.7344 3.395  2.8154 2.4849 3.025  2.610  2.821
## CV residual -0.467 0.0134 0.326 -0.0157 0.0358 0.143 -0.248 -0.159
## 
## Sum of squares = 1    Mean square = 0.02    n = 49 
## 
## Overall (Sum over all 49 folds) 
##     ms 
## 0.0277

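The statistic we take from the CVlm() results is the overall cross-validated Mean Squared Error (MSE), which CVlm() stores in the 'ms' attribute of the object it returns and which we read with attr(). In each fold, the model is refit on the remaining observations and used to predict the held-out fold; the overall MSE is the average of the squared CV residuals across all held-out observations. Because the response is log(medv), this MSE is on the log scale. The smaller the MSE, the better the model predicts data it was not fit on.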
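To make the fold-by-fold computation concrete, here is a minimal base R sketch of k-fold cross-validation. It is not the CVlm() implementation: the helper name cv_mse is hypothetical, the Boston data frame from MASS is assumed to match the data used here, and the formula is illustrative only rather than one of models 4 to 6.

```r
library(MASS)  # assumed source of the Boston data frame (lowercase column names)

# Hypothetical helper: k-fold cross-validated MSE for a linear model.
cv_mse <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)
  n <- nrow(data)
  # Randomly assign each row to one of k folds of near-equal size
  fold <- sample(rep(seq_len(k), length.out = n))
  sq_err <- numeric(n)
  for (i in seq_len(k)) {
    test <- fold == i
    # Fit on every fold except fold i
    fit <- lm(formula, data = data[!test, ])
    # Predict the held-out fold and compare with the observed response
    pred <- predict(fit, newdata = data[test, ])
    obs <- model.response(model.frame(formula, data = data[test, ]))
    sq_err[test] <- (obs - pred)^2
  }
  # Average squared CV residual over all held-out observations
  mean(sq_err)
}

# Illustrative formula only; the response is on the log scale, as in the report
mse_example <- cv_mse(log(medv) ~ rm + lstat + ptratio, data = Boston)
```

The fold assignment depends on the seed, so the value will differ slightly from run to run unless the seed is fixed, and it will not reproduce the CVlm() figures exactly.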

attr(fit4_CV, 'ms')
## [1] 0.0283
attr(fit5_CV, 'ms')
## [1] 0.0276
attr(fit6_CV, 'ms')
## [1] 0.0277

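The cross-validated MSE is 0.0283 for model 4, 0.0276 for model 5, and 0.0277 for model 6. Model 5 has the lowest MSE, although its margin over model 6 is small (0.0001), so by this criterion model 5 is the best of the three.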
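The comparison can also be done programmatically, and the winning MSE can be put on a more interpretable scale. The sketch below uses only the three values reported above; the variable names are hypothetical, and the back-transformation is a rough reading of a log-scale error rather than a formal result.

```r
# Cross-validated MSE values as reported by attr(..., 'ms') above
cv_ms <- c(model4 = 0.0283, model5 = 0.0276, model6 = 0.0277)

# Name of the model with the smallest cross-validated MSE
best <- names(which.min(cv_ms))   # "model5"

# Root mean squared error on the log(medv) scale
rmse_log <- sqrt(cv_ms[[best]])   # about 0.166

# Rough multiplicative error on the original medv scale
ratio <- exp(rmse_log)            # about 1.18
```

Read loosely, a typical cross-validated prediction from model 5 is off by a factor of about 1.18 in medv, that is, by roughly 18 percent.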

6 Conclusion

With an R-squared value of 0.8379 and the lowest cross-validated Mean Squared Error of the three candidate models (0.0276, on the log(medv) scale), model 5 is the best.