The goal of this project is to find the best linear regression model for estimating the median value of owner-occupied homes in the suburbs of Boston. The data set comes from the UCI Machine Learning Repository and has 506 rows and 14 columns. MEDV is the response variable, and the other 13 variables are candidate predictors.
| Variable Name | Description |
|---|---|
| CRIM | per capita crime rate by town |
| ZN | proportion of residential land zoned for lots over 25,000 sq. ft. |
| INDUS | proportion of non-retail business acres per town |
| CHAS | Charles River dummy variable (= 1 if tract bounds river; 0 otherwise) |
| NOX | nitric oxides concentration (parts per 10 million) |
| RM | average number of rooms per dwelling |
| AGE | proportion of owner-occupied units built prior to 1940 |
| DIS | weighted distances to five Boston employment centers |
| RAD | index of accessibility to radial highways |
| TAX | full-value property-tax rate per $10,000 |
| PTRATIO | pupil-teacher ratio by town |
| B | 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town |
| LSTAT | % lower status of the population |
| MEDV | median value of owner-occupied homes in $1000s |
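First we install and load the required packages in the R environment. We need usdm to test for multicollinearity, car to diagnose outliers, MASS for studentized residuals and stepwise selection, DAAG to cross-validate our models, lmtest to check for heteroscedasticity, and ggplot2 for plotting. As the load messages show, usdm, car and DAAG each define a vif() function; because DAAG is attached last, its version is the one called later in the analysis.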
# LOAD PACKAGES
library(usdm) # for testing collinearity
## Loading required package: sp
## Loading required package: raster
library(car) # for testing outliers
##
## Attaching package: 'car'
## The following object is masked from 'package:usdm':
##
## vif
library(MASS) # for testing studentized residuals
##
## Attaching package: 'MASS'
## The following objects are masked from 'package:raster':
##
## area, select
library(DAAG) # for cross validation of model
## Loading required package: lattice
##
## Attaching package: 'DAAG'
## The following object is masked from 'package:MASS':
##
## hills
## The following object is masked from 'package:car':
##
## vif
## The following object is masked from 'package:usdm':
##
## vif
library(lmtest) # for checking homoskedasticity/heteroskedasticity
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(ggplot2) # Use for visuals
Then load the Boston Housing data.
# LOAD DATA
Boston <- read.csv("/Users/zhangyueming/Documents/NEU/2017Spring/IE 7280/Project/Mine/housing.csv",
header = FALSE,
col.names = c("crim", "zn", "indus", "chas", "nox", "rm", "age",
"dis", "rad", "tax", "ptratio", "b", "lstat", "medv")
)
attach(Boston)
Take a look at the structure of the data with str(Boston). The output confirms 506 observations of 14 variables and shows the data type of each variable.
str(Boston)
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : int 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ b : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
Use cor(Boston) to examine the pairwise correlations between variables. Reading the last column, ‘rm’ has the strongest positive correlation with ‘medv’ (0.70), while ‘lstat’ has the strongest negative correlation (-0.74), followed by ‘ptratio’ (-0.51).
cor(Boston)
## crim zn indus chas nox
## crim 1.00000000 -0.20046922 0.40658341 -0.055891582 0.42097171
## zn -0.20046922 1.00000000 -0.53382819 -0.042696719 -0.51660371
## indus 0.40658341 -0.53382819 1.00000000 0.062938027 0.76365145
## chas -0.05589158 -0.04269672 0.06293803 1.000000000 0.09120281
## nox 0.42097171 -0.51660371 0.76365145 0.091202807 1.00000000
## rm -0.21924670 0.31199059 -0.39167585 0.091251225 -0.30218819
## age 0.35273425 -0.56953734 0.64477851 0.086517774 0.73147010
## dis -0.37967009 0.66440822 -0.70802699 -0.099175780 -0.76923011
## rad 0.62550515 -0.31194783 0.59512927 -0.007368241 0.61144056
## tax 0.58276431 -0.31456332 0.72076018 -0.035586518 0.66802320
## ptratio 0.28994558 -0.39167855 0.38324756 -0.121515174 0.18893268
## b -0.38506394 0.17552032 -0.35697654 0.048788485 -0.38005064
## lstat 0.45562148 -0.41299457 0.60379972 -0.053929298 0.59087892
## medv -0.38830461 0.36044534 -0.48372516 0.175260177 -0.42732077
## rm age dis rad tax
## crim -0.21924670 0.35273425 -0.37967009 0.625505145 0.58276431
## zn 0.31199059 -0.56953734 0.66440822 -0.311947826 -0.31456332
## indus -0.39167585 0.64477851 -0.70802699 0.595129275 0.72076018
## chas 0.09125123 0.08651777 -0.09917578 -0.007368241 -0.03558652
## nox -0.30218819 0.73147010 -0.76923011 0.611440563 0.66802320
## rm 1.00000000 -0.24026493 0.20524621 -0.209846668 -0.29204783
## age -0.24026493 1.00000000 -0.74788054 0.456022452 0.50645559
## dis 0.20524621 -0.74788054 1.00000000 -0.494587930 -0.53443158
## rad -0.20984667 0.45602245 -0.49458793 1.000000000 0.91022819
## tax -0.29204783 0.50645559 -0.53443158 0.910228189 1.00000000
## ptratio -0.35550149 0.26151501 -0.23247054 0.464741179 0.46085304
## b 0.12806864 -0.27353398 0.29151167 -0.444412816 -0.44180801
## lstat -0.61380827 0.60233853 -0.49699583 0.488676335 0.54399341
## medv 0.69535995 -0.37695457 0.24992873 -0.381626231 -0.46853593
## ptratio b lstat medv
## crim 0.2899456 -0.38506394 0.4556215 -0.3883046
## zn -0.3916785 0.17552032 -0.4129946 0.3604453
## indus 0.3832476 -0.35697654 0.6037997 -0.4837252
## chas -0.1215152 0.04878848 -0.0539293 0.1752602
## nox 0.1889327 -0.38005064 0.5908789 -0.4273208
## rm -0.3555015 0.12806864 -0.6138083 0.6953599
## age 0.2615150 -0.27353398 0.6023385 -0.3769546
## dis -0.2324705 0.29151167 -0.4969958 0.2499287
## rad 0.4647412 -0.44441282 0.4886763 -0.3816262
## tax 0.4608530 -0.44180801 0.5439934 -0.4685359
## ptratio 1.0000000 -0.17738330 0.3740443 -0.5077867
## b -0.1773833 1.00000000 -0.3660869 0.3334608
## lstat 0.3740443 -0.36608690 1.0000000 -0.7376627
## medv -0.5077867 0.33346082 -0.7376627 1.0000000
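Reading one column of a 14-by-14 matrix by eye is error-prone, so it can help to pull that column out and sort it. The following is a minimal sketch on a small simulated data frame; the names `toy`, `x1`, `x2` and `y` are hypothetical and are not part of the housing data.

```r
# Toy data: y rises with x1 and falls with x2 (hypothetical variables).
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
toy$y <- 2 * toy$x1 - 3 * toy$x2 + rnorm(200)

# Correlation of every variable with the response, response itself dropped.
r <- cor(toy)[, "y"]
r <- r[names(r) != "y"]

# Sort by signed value: strongest negative first, strongest positive last.
ranked <- sort(r)
print(ranked)
```

Applied to the matrix printed above, the same idea, `sort(cor(Boston)[, "medv"])`, should place ‘lstat’ at the negative end and ‘rm’ at the positive end among the predictors.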
Use summary(Boston) to get a broad overview of the variables.
summary(Boston)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08204 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio b
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
The variable ‘chas’ is a binary indicator coded 0/1, so we convert it to a factor with as.factor().
Boston$chas <- as.factor(Boston$chas)
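To see what the conversion changes for modelling, here is a small sketch with a made-up indicator vector (`river` is a hypothetical name). It shows the design columns that lm() would build for a two-level factor, which is why the model summaries label the coefficient `chas1` rather than `chas`.

```r
# Hypothetical 0/1 indicator, converted to a factor as in the text.
river <- as.factor(c(0, 1, 0, 0, 1))
levels(river)

# model.matrix() shows the design columns lm() would use:
# an intercept plus one dummy column for the non-reference level "1".
X <- model.matrix(~ river)
colnames(X)
X
```

For a two-level indicator the fitted coefficient is the same whether the column is numeric 0/1 or a factor; the factor form mainly documents that the variable is categorical.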
Use scatterplotMatrix() to get a visual picture of the relationships between variables. We are looking for linear relationships between the response variable ‘medv’ and the other variables in the data frame.
scatterplotMatrix(~crim+zn+indus+chas+nox+rm+age+dis+rad+tax+ptratio+b+lstat+medv, data = Boston)
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
## Warning in smoother(x, y, col = col[2], log.x = FALSE, log.y = FALSE,
## spread = spread, : could not fit smooth
From the plots we can roughly infer that ‘rm’, ‘ptratio’ and ‘lstat’ are likely to be the most relevant for estimating ‘medv’. The plots also hint at multicollinearity among the independent variables. When two predictors are highly correlated, their coefficient estimates become unstable and their standard errors inflate, so we will need to remove one variable from each such pair.
We use a histogram of ‘medv’ to decide whether the response needs a transformation. The distribution is right-skewed, which suggests that a log transformation would be appropriate.
par(mfrow = c(1, 2))
hist(medv)
hist(log(medv))
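The visual impression from the histograms can be backed by a number. The sketch below computes sample skewness before and after a log transformation on simulated log-normal values; `price` and the helper `skewness()` are hypothetical names introduced for illustration, and the data are not the housing data.

```r
# Sample skewness: third central moment divided by the 1.5 power of the variance.
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

# Hypothetical right-skewed values (log-normal), standing in for a price variable.
set.seed(42)
price <- rlnorm(500, meanlog = 3, sdlog = 0.4)

skew_raw <- skewness(price)
skew_log <- skewness(log(price))
c(raw = skew_raw, log = skew_log)
```

A value near zero after the transformation indicates a roughly symmetric distribution; calling `skewness(Boston$medv)` and `skewness(log(Boston$medv))` would give the corresponding figures for the response used here.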
Fit linear models to ‘medv’ and ‘log(medv)’ using lm(). The log-transformed model has the higher adjusted R-squared (0.7841 versus 0.7338). Because the two responses are on different scales this comparison is only indicative, but together with the histograms it supports using log(medv) as the response.
mod <- lm(medv ~ ., data = Boston)
summary(mod)
##
## Call:
## lm(formula = medv ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.595 -2.730 -0.518 1.777 26.199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.646e+01 5.103e+00 7.144 3.28e-12 ***
## crim -1.080e-01 3.286e-02 -3.287 0.001087 **
## zn 4.642e-02 1.373e-02 3.382 0.000778 ***
## indus 2.056e-02 6.150e-02 0.334 0.738288
## chas1 2.687e+00 8.616e-01 3.118 0.001925 **
## nox -1.777e+01 3.820e+00 -4.651 4.25e-06 ***
## rm 3.810e+00 4.179e-01 9.116 < 2e-16 ***
## age 6.922e-04 1.321e-02 0.052 0.958229
## dis -1.476e+00 1.995e-01 -7.398 6.01e-13 ***
## rad 3.060e-01 6.635e-02 4.613 5.07e-06 ***
## tax -1.233e-02 3.760e-03 -3.280 0.001112 **
## ptratio -9.527e-01 1.308e-01 -7.283 1.31e-12 ***
## b 9.312e-03 2.686e-03 3.467 0.000573 ***
## lstat -5.248e-01 5.072e-02 -10.347 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared: 0.7406, Adjusted R-squared: 0.7338
## F-statistic: 108.1 on 13 and 492 DF, p-value: < 2.2e-16
logmod <- lm(log(medv) ~ ., data = Boston)
summary(logmod)
##
## Call:
## lm(formula = log(medv) ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73361 -0.09747 -0.01657 0.09629 0.86435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1020423 0.2042726 20.081 < 2e-16 ***
## crim -0.0102715 0.0013155 -7.808 3.52e-14 ***
## zn 0.0011725 0.0005495 2.134 0.033349 *
## indus 0.0024668 0.0024614 1.002 0.316755
## chas1 0.1008876 0.0344859 2.925 0.003598 **
## nox -0.7783993 0.1528902 -5.091 5.07e-07 ***
## rm 0.0908331 0.0167280 5.430 8.87e-08 ***
## age 0.0002106 0.0005287 0.398 0.690567
## dis -0.0490873 0.0079834 -6.149 1.62e-09 ***
## rad 0.0142673 0.0026556 5.373 1.20e-07 ***
## tax -0.0006258 0.0001505 -4.157 3.80e-05 ***
## ptratio -0.0382715 0.0052365 -7.309 1.10e-12 ***
## b 0.0004136 0.0001075 3.847 0.000135 ***
## lstat -0.0290355 0.0020299 -14.304 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1899 on 492 degrees of freedom
## Multiple R-squared: 0.7896, Adjusted R-squared: 0.7841
## F-statistic: 142.1 on 13 and 492 DF, p-value: < 2.2e-16
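R-squared values from models with different responses are each measured against a different total variance. One way to compare like with like is to back-transform the fitted values of the log model and compute R-squared on the original scale for both. This is a sketch on simulated data (`x`, `y` and `r2_original_scale()` are hypothetical), not the author's procedure.

```r
# Hypothetical data where the response is multiplicative in x.
set.seed(7)
x <- runif(300, 0, 10)
y <- exp(1 + 0.2 * x + rnorm(300, sd = 0.3))

fit_raw <- lm(y ~ x)
fit_log <- lm(log(y) ~ x)

# R-squared computed on the ORIGINAL scale of y.
r2_original_scale <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

r2_raw <- r2_original_scale(fitted(fit_raw), y)
# exp() of the fitted log values is a simple, slightly biased back-transform.
r2_log <- r2_original_scale(exp(fitted(fit_log)), y)
c(raw = r2_raw, log_backtransformed = r2_log)
```

For the raw model this reproduces the R-squared reported by summary(); for the log model it gives a figure that can be set beside it directly.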
We use stepwise regression with stepAIC() to find predictors that can be dropped. From the output, removing ‘age’ and then ‘indus’ lowers the AIC, so both are candidates for removal.
mod <- lm(log(medv) ~ ., data = Boston)
step <- stepAIC(mod, direction = "both")
## Start: AIC=-1667.19
## log(medv) ~ crim + zn + indus + chas + nox + rm + age + dis +
## rad + tax + ptratio + b + lstat
##
## Df Sum of Sq RSS AIC
## - age 1 0.0057 17.755 -1669.0
## - indus 1 0.0362 17.786 -1668.2
## <none> 17.749 -1667.2
## - zn 1 0.1643 17.914 -1664.5
## - chas 1 0.3088 18.058 -1660.5
## - b 1 0.5339 18.283 -1654.2
## - tax 1 0.6235 18.373 -1651.7
## - nox 1 0.9351 18.684 -1643.2
## - rad 1 1.0413 18.791 -1640.3
## - rm 1 1.0637 18.813 -1639.7
## - dis 1 1.3639 19.113 -1631.7
## - ptratio 1 1.9270 19.676 -1617.0
## - crim 1 2.1995 19.949 -1610.1
## - lstat 1 7.3809 25.130 -1493.2
##
## Step: AIC=-1669.03
## log(medv) ~ crim + zn + indus + chas + nox + rm + dis + rad +
## tax + ptratio + b + lstat
##
## Df Sum of Sq RSS AIC
## - indus 1 0.0363 17.791 -1670.0
## <none> 17.755 -1669.0
## + age 1 0.0057 17.749 -1667.2
## - zn 1 0.1593 17.914 -1666.5
## - chas 1 0.3138 18.069 -1662.2
## - b 1 0.5431 18.298 -1655.8
## - tax 1 0.6205 18.376 -1653.7
## - nox 1 0.9645 18.720 -1644.3
## - rad 1 1.0356 18.791 -1642.3
## - rm 1 1.1452 18.900 -1639.4
## - dis 1 1.5471 19.302 -1628.8
## - ptratio 1 1.9224 19.677 -1619.0
## - crim 1 2.1988 19.954 -1612.0
## - lstat 1 8.1949 25.950 -1479.0
##
## Step: AIC=-1670
## log(medv) ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio +
## b + lstat
##
## Df Sum of Sq RSS AIC
## <none> 17.791 -1670.0
## + indus 1 0.0363 17.755 -1669.0
## + age 1 0.0058 17.786 -1668.2
## - zn 1 0.1451 17.936 -1667.9
## - chas 1 0.3399 18.131 -1662.4
## - b 1 0.5344 18.326 -1657.0
## - tax 1 0.6139 18.405 -1654.8
## - nox 1 0.9350 18.726 -1646.1
## - rad 1 1.0088 18.800 -1644.1
## - rm 1 1.1171 18.909 -1641.2
## - dis 1 1.7385 19.530 -1624.8
## - ptratio 1 1.8862 19.678 -1621.0
## - crim 1 2.2229 20.014 -1612.4
## - lstat 1 8.1604 25.952 -1481.0
step$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## log(medv) ~ crim + zn + indus + chas + nox + rm + age + dis +
## rad + tax + ptratio + b + lstat
##
## Final Model:
## log(medv) ~ crim + zn + chas + nox + rm + dis + rad + tax + ptratio +
## b + lstat
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 492 17.74938 -1667.194
## 2 - age 1 0.005723781 493 17.75510 -1669.031
## 3 - indus 1 0.036264380 494 17.79137 -1669.999
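The AIC values in the trace can be reproduced from the residual sums of squares in the last table. For a linear model the stepwise routine reports n * log(RSS / n) + 2 * k, where k is the number of estimated coefficients including the intercept. The helper name `step_aic()` below is hypothetical; the RSS figures are copied from the output above.

```r
# AIC as shown in the stepwise trace: n * log(RSS / n) + 2 * k.
step_aic <- function(rss, n, k) n * log(rss / n) + 2 * k

n <- 506                                       # observations in the data set
aic_full   <- step_aic(17.74938, n, k = 14)    # 13 predictors + intercept
aic_no_age <- step_aic(17.75510, n, k = 13)    # after dropping age
aic_final  <- step_aic(17.79137, n, k = 12)    # after dropping age and indus

round(c(full = aic_full, no_age = aic_no_age, final = aic_final), 3)
```

Dropping ‘age’ barely changes the RSS but saves the penalty of 2 for one coefficient, which is why the AIC falls. Note that this quantity differs from the value returned by AIC() by a constant that depends only on n, so model rankings are unaffected.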
For the first model, we fit the full set of predictors to the entire data set. In the summary, ‘age’ and ‘indus’ are clearly non-significant, and ‘zn’ is only marginally significant (p = 0.033). This agrees with the stepwise result, so ‘age’ and ‘indus’ will be removed. Whether to remove ‘zn’ is decided further on.
fit1 <- lm(log(medv) ~ ., data = Boston)
summary(fit1)
##
## Call:
## lm(formula = log(medv) ~ ., data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73361 -0.09747 -0.01657 0.09629 0.86435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.1020423 0.2042726 20.081 < 2e-16 ***
## crim -0.0102715 0.0013155 -7.808 3.52e-14 ***
## zn 0.0011725 0.0005495 2.134 0.033349 *
## indus 0.0024668 0.0024614 1.002 0.316755
## chas1 0.1008876 0.0344859 2.925 0.003598 **
## nox -0.7783993 0.1528902 -5.091 5.07e-07 ***
## rm 0.0908331 0.0167280 5.430 8.87e-08 ***
## age 0.0002106 0.0005287 0.398 0.690567
## dis -0.0490873 0.0079834 -6.149 1.62e-09 ***
## rad 0.0142673 0.0026556 5.373 1.20e-07 ***
## tax -0.0006258 0.0001505 -4.157 3.80e-05 ***
## ptratio -0.0382715 0.0052365 -7.309 1.10e-12 ***
## b 0.0004136 0.0001075 3.847 0.000135 ***
## lstat -0.0290355 0.0020299 -14.304 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1899 on 492 degrees of freedom
## Multiple R-squared: 0.7896, Adjusted R-squared: 0.7841
## F-statistic: 142.1 on 13 and 492 DF, p-value: < 2.2e-16
Use vif(fit1) to test for multicollinearity. VIF (variance inflation factor) values higher than 5 are considered problematic, and here both ‘rad’ (7.4845) and ‘tax’ (9.0086) exceed that threshold. ‘tax’ has the highest value, so we drop it first.
vif(fit1)
## crim zn indus chas1 nox rm age dis rad
## 1.7922 2.2988 3.9916 1.0740 4.3937 1.9337 3.1008 3.9559 7.4845
## tax ptratio b lstat
## 9.0086 1.7991 1.3485 2.9415
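The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The sketch below computes it by hand on simulated numeric predictors (`x1`, `x2`, `x3` are hypothetical, with `x2` deliberately built from `x1`), and is meant to show the definition rather than to replace vif().

```r
# Hypothetical predictors: x2 is constructed to be strongly related to x1.
set.seed(3)
n  <- 400
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing predictor j on the rest.
vif_by_hand <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```

The correlated pair gets large values while the independent predictor stays near 1. Removing one member of the pair brings the other back down, which is the pattern seen with ‘tax’ and ‘rad’ (their correlation is 0.91 in the matrix above).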
Remove ‘tax’ from the first model fit1, then run vif() and summary() again. The VIF values are now all below 5, so we assume that no problematic collinearity remains.
fit2 <- update(fit1, ~ . - tax)
vif(fit2)
## crim zn indus chas1 nox rm age dis rad
## 1.7919 2.1842 3.2260 1.0582 4.3693 1.9231 3.0980 3.9544 2.8375
## ptratio b lstat
## 1.7888 1.3476 2.9408
summary(fit2)
##
## Call:
## lm(formula = log(medv) ~ crim + zn + indus + chas + nox + rm +
## age + dis + rad + ptratio + b + lstat, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73499 -0.10227 -0.01119 0.09491 0.86954
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0091581 0.2063732 19.427 < 2e-16 ***
## crim -0.0102067 0.0013369 -7.635 1.18e-13 ***
## zn 0.0006626 0.0005444 1.217 0.224110
## indus -0.0020148 0.0022491 -0.896 0.370779
## chas1 0.1182635 0.0347924 3.399 0.000731 ***
## nox -0.8258144 0.1549617 -5.329 1.50e-07 ***
## rm 0.0959990 0.0169551 5.662 2.54e-08 ***
## age 0.0001448 0.0005372 0.270 0.787643
## dis -0.0497335 0.0081127 -6.130 1.80e-09 ***
## rad 0.0055679 0.0016619 3.350 0.000869 ***
## ptratio -0.0399143 0.0053071 -7.521 2.59e-13 ***
## b 0.0004255 0.0001092 3.895 0.000112 ***
## lstat -0.0289062 0.0020630 -14.012 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.193 on 493 degrees of freedom
## Multiple R-squared: 0.7823, Adjusted R-squared: 0.777
## F-statistic: 147.6 on 12 and 493 DF, p-value: < 2.2e-16
Notice that R-squared has dropped slightly, from 0.7896 to 0.7823. We consider this loss acceptable.
Next we remove ‘age’ and ‘indus’, which were identified above as non-significant. The summary is as follows:
fit3 <- update(fit2, ~ . - age - indus)
summary(fit3)
##
## Call:
## lm(formula = log(medv) ~ crim + zn + chas + nox + rm + dis +
## rad + ptratio + b + lstat, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73447 -0.10493 -0.01084 0.09297 0.87348
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0065321 0.2054382 19.502 < 2e-16 ***
## crim -0.0101496 0.0013339 -7.609 1.41e-13 ***
## zn 0.0006511 0.0005400 1.206 0.228470
## chas1 0.1169498 0.0346573 3.374 0.000798 ***
## nox -0.8609244 0.1397957 -6.158 1.52e-09 ***
## rm 0.0989867 0.0164154 6.030 3.21e-09 ***
## dis -0.0487057 0.0075256 -6.472 2.33e-10 ***
## rad 0.0053533 0.0016421 3.260 0.001191 **
## ptratio -0.0406652 0.0051938 -7.830 3.00e-14 ***
## b 0.0004321 0.0001088 3.973 8.15e-05 ***
## lstat -0.0288688 0.0019297 -14.960 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1928 on 495 degrees of freedom
## Multiple R-squared: 0.7819, Adjusted R-squared: 0.7775
## F-statistic: 177.4 on 10 and 495 DF, p-value: < 2.2e-16
The p-value for ‘zn’ in this output is 0.228, well above 0.05, so ‘zn’ is also non-significant and we remove it too. Refitting gives:
fit3 <- update(fit2, ~ . - age - indus - zn)
summary(fit3)
##
## Call:
## lm(formula = log(medv) ~ crim + chas + nox + rm + dis + rad +
## ptratio + b + lstat, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.73252 -0.10612 -0.01410 0.09214 0.87773
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.0213020 0.2051665 19.600 < 2e-16 ***
## crim -0.0100087 0.0013294 -7.529 2.44e-13 ***
## chas1 0.1161515 0.0346669 3.351 0.000868 ***
## nox -0.8712566 0.1395967 -6.241 9.32e-10 ***
## rm 0.1014707 0.0162931 6.228 1.01e-09 ***
## dis -0.0442353 0.0065520 -6.751 4.10e-11 ***
## rad 0.0056058 0.0016295 3.440 0.000630 ***
## ptratio -0.0426888 0.0049175 -8.681 < 2e-16 ***
## b 0.0004319 0.0001088 3.970 8.26e-05 ***
## lstat -0.0288434 0.0019305 -14.941 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1929 on 496 degrees of freedom
## Multiple R-squared: 0.7812, Adjusted R-squared: 0.7773
## F-statistic: 196.8 on 9 and 496 DF, p-value: < 2.2e-16
Use outlierTest() to check for outliers. The function reports the observations with the largest studentized residuals (it does not measure leverage). Observations whose studentized residuals exceed 3 in absolute value are considered problematic, so we remove the first 11 observations listed below from the data.
outlierTest(fit3, cutoff = Inf, n.max = 15)
## rstudent unadjusted p-value Bonferonni p
## 413 4.777095 2.3470e-06 0.0011876
## 372 4.245108 2.6097e-05 0.0132050
## 373 3.910611 1.0491e-04 0.0530840
## 402 -3.877262 1.1989e-04 0.0606620
## 369 3.803402 1.6056e-04 0.0812420
## 375 3.498285 5.1048e-04 0.2583000
## 215 3.491031 5.2420e-04 0.2652400
## 401 -3.464786 5.7676e-04 0.2918400
## 490 -3.358377 8.4446e-04 0.4272900
## 506 -3.164928 1.6467e-03 0.8332400
## 398 -3.045484 2.4469e-03 NA
## 400 -2.950863 3.3191e-03 NA
## 368 2.891055 4.0083e-03 NA
## 410 2.890862 4.0107e-03 NA
## 365 -2.544832 1.1236e-02 NA
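The columns of this table can be approximated with base R. The sketch below uses simulated data with one planted outlier (all names are hypothetical): it takes externally studentized residuals from rstudent(), converts each to a two-sided t-test p-value, and applies the Bonferroni correction by multiplying by the number of observations.

```r
# Hypothetical data with one planted outlier in row 10.
set.seed(11)
d <- data.frame(x = rnorm(100))
d$y <- 1 + 2 * d$x + rnorm(100, sd = 0.5)
d$y[10] <- d$y[10] + 5

fit <- lm(y ~ x, data = d)
rs  <- rstudent(fit)                     # externally studentized residuals

# Two-sided p-value per residual, then Bonferroni: multiply by n, cap at 1.
p_unadj <- 2 * pt(-abs(rs), df = df.residual(fit) - 1)
p_bonf  <- pmin(1, p_unadj * length(rs))

flagged <- which(abs(rs) > 3)
data.frame(row = flagged, rstudent = rs[flagged], bonferroni_p = p_bonf[flagged])
```

The same arithmetic is visible in the table above: for observation 413, 2.3470e-06 × 506 = 0.0011876. Rows where the product would exceed 1 are printed there as NA, whereas this sketch caps them at 1. Only observations 413 and 372 have a Bonferroni p-value below 0.05, so the |rstudent| > 3 rule used in the text is the more aggressive of the two criteria.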
Remove the outliers from the data frame and update our model into the fourth model.
Boston <- Boston[-c(413, 372, 373, 402, 369, 375, 215, 401, 490, 506, 398),]
fit4 <- lm(log(medv) ~ . - tax - age - indus - zn, data = Boston)
summary(fit4)
##
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.55441 -0.09252 -0.01404 0.08823 0.64232
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.580e+00 1.771e-01 20.212 < 2e-16 ***
## crim -9.324e-03 1.129e-03 -8.261 1.39e-15 ***
## chas1 8.973e-02 2.968e-02 3.023 0.00264 **
## nox -6.568e-01 1.198e-01 -5.483 6.75e-08 ***
## rm 1.301e-01 1.411e-02 9.223 < 2e-16 ***
## dis -3.788e-02 5.585e-03 -6.783 3.44e-11 ***
## rad 3.280e-03 1.407e-03 2.331 0.02016 *
## ptratio -3.783e-02 4.190e-03 -9.029 < 2e-16 ***
## b 5.298e-04 9.379e-05 5.649 2.76e-08 ***
## lstat -2.789e-02 1.717e-03 -16.243 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1629 on 485 degrees of freedom
## Multiple R-squared: 0.8322, Adjusted R-squared: 0.8291
## F-statistic: 267.2 on 9 and 485 DF, p-value: < 2.2e-16
‘lstat’ appears to be our most significant variable, along with ‘rm’. R-squared is 0.8322, so the model explains a large part of the variation in the response, although some of the increase over fit3 comes simply from having removed the worst-fitting observations. The residual standard error is fairly small at 0.1629 on the log scale.
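Because the response is log(medv), each coefficient is easier to read as a percent change in ‘medv’ per one-unit increase in the predictor, with the other predictors held fixed: 100 × (exp(β) − 1). The values below are copied from the fit4 summary; the vector name `beta` is just for illustration.

```r
# Coefficients copied from the fit4 summary (log(medv) scale).
beta <- c(rm = 1.301e-01, lstat = -2.789e-02, chas1 = 8.973e-02)

# Percent change in medv for a one-unit increase in each predictor.
pct_change <- 100 * (exp(beta) - 1)
round(pct_change, 2)
```

Under this model, one additional room is associated with roughly a 13.9% higher median value, one additional percentage point of ‘lstat’ with roughly a 2.8% lower value, and bordering the Charles River with roughly a 9.4% higher value.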
Now we draw the residual diagnostic plots using par() and plot() to check the model. From the Residuals vs Fitted plot, we can infer that some non-linearity remains in our model.
par(mfrow = c(2,2))
plot(fit4)
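The diagnostic plots can be complemented by a formal test for non-constant variance. lmtest, loaded earlier for this purpose, provides bptest(). The sketch below computes a studentized (Koenker) Breusch-Pagan statistic by hand in base R so that it runs without extra packages; the data are simulated and every name is hypothetical.

```r
# Hypothetical data whose error spread grows with x (heteroscedastic by design).
set.seed(5)
x <- runif(300, 1, 10)
y <- 2 + 0.5 * x + rnorm(300, sd = 0.2 * x)
fit <- lm(y ~ x)

# Regress squared residuals on the predictors; LM = n * R^2 is compared
# with a chi-square distribution with k degrees of freedom (k = 1 here).
aux     <- lm(resid(fit)^2 ~ x)
lm_stat <- length(x) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
c(statistic = lm_stat, p.value = p_value)
```

A small p-value is evidence against constant variance. On the housing model, `bptest(fit4)` would be the direct way to run the check.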
Another assumption of linear regression is normality of the residuals. We check it using studres() and hist(), and overlay a standard normal density curve using lines(). The studentized residuals look approximately normal, which is consistent with this assumption.
studentizedResiduals <- studres(fit4)
par(mfrow = c(1,1))
hist(studentizedResiduals, freq = FALSE, main = "Distribution of Studentized Residuals")
xfit <- seq(min(studentizedResiduals), max(studentizedResiduals), length.out = 40)
yfit <- dnorm(xfit)
lines(xfit, yfit)
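A histogram can hide departures in the tails, so a numerical check is a useful companion. The sketch below runs the Shapiro-Wilk test from base R on two simulated samples, one normal and one heavy-tailed; both samples and their names are hypothetical stand-ins for residuals.

```r
# Hypothetical residual-like samples: one normal, one heavy-tailed (t, 3 df).
set.seed(9)
normal_like <- rnorm(400)
heavy_tail  <- rt(400, df = 3)

# Shapiro-Wilk test: the null hypothesis is that the sample is normal.
p_normal <- shapiro.test(normal_like)$p.value
p_heavy  <- shapiro.test(heavy_tail)$p.value
c(normal = p_normal, heavy_tail = p_heavy)
```

For the model here, `shapiro.test(studentizedResiduals)` gives the test, and `qqnorm(studentizedResiduals); qqline(studentizedResiduals)` draws the matching Q-Q plot. With several hundred observations the test can reject for small deviations, so it is best read together with the plot.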
Recalling the scatterplots above, there appears to be a non-linear relationship between ‘lstat’ and ‘medv’, so we add ‘lstat’ squared to the model. The formula also drops ‘rad’, the weakest remaining predictor in fit4. In the model summary, ‘I(lstat^2)’ is indeed statistically significant. R-squared has increased from 0.8322 to 0.8358 and the RSE has decreased from 0.1629 to 0.1611.
fit5 <- lm(log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2), data = Boston)
summary(fit5)
##
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2),
## data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60240 -0.09440 -0.01450 0.09271 0.65533
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.571e+00 1.683e-01 21.218 < 2e-16 ***
## crim -8.890e-03 1.025e-03 -8.678 < 2e-16 ***
## chas1 8.875e-02 2.936e-02 3.022 0.00264 **
## nox -5.004e-01 1.102e-01 -4.543 7.02e-06 ***
## rm 1.221e-01 1.416e-02 8.627 < 2e-16 ***
## dis -3.819e-02 5.516e-03 -6.923 1.41e-11 ***
## ptratio -3.036e-02 3.850e-03 -7.886 2.08e-14 ***
## b 4.709e-04 9.155e-05 5.144 3.92e-07 ***
## lstat -4.543e-02 4.706e-03 -9.653 < 2e-16 ***
## I(lstat^2) 5.172e-04 1.283e-04 4.030 6.48e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1611 on 485 degrees of freedom
## Multiple R-squared: 0.8358, Adjusted R-squared: 0.8328
## F-statistic: 274.3 on 9 and 485 DF, p-value: < 2.2e-16
Now look at the residual diagnostic plots for fit5. The curve in the Residuals vs Fitted plot seems to be flatter. Next we add a third-degree polynomial term for ‘lstat’ to see whether the model improves further.
par(mfrow = c(2,2))
plot(fit5)
Here we add ‘I(lstat^3)’ to our model. Given its p-value of 0.334, ‘I(lstat^3)’ does not add much to the model. R-squared (0.8361) and RSE (0.1612) are almost unchanged, and adjusted R-squared is marginally lower (0.8327 versus 0.8328).
fit6 <- lm(log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2) + I(lstat^3), data = Boston)
summary(fit6)
##
## Call:
## lm(formula = log(medv) ~ . - tax - age - indus - zn - rad + I(lstat^2) +
## I(lstat^3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.59951 -0.09217 -0.01623 0.09464 0.64919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.649e+00 1.866e-01 19.557 < 2e-16 ***
## crim -8.914e-03 1.025e-03 -8.697 < 2e-16 ***
## chas1 8.670e-02 2.944e-02 2.945 0.00339 **
## nox -5.070e-01 1.104e-01 -4.593 5.57e-06 ***
## rm 1.173e-01 1.502e-02 7.806 3.69e-14 ***
## dis -3.838e-02 5.520e-03 -6.953 1.16e-11 ***
## ptratio -3.043e-02 3.851e-03 -7.901 1.88e-14 ***
## b 4.741e-04 9.162e-05 5.174 3.36e-07 ***
## lstat -5.617e-02 1.207e-02 -4.652 4.24e-06 ***
## I(lstat^2) 1.245e-03 7.642e-04 1.629 0.10390
## I(lstat^3) -1.407e-05 1.456e-05 -0.966 0.33438
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1612 on 484 degrees of freedom
## Multiple R-squared: 0.8361, Adjusted R-squared: 0.8327
## F-statistic: 246.9 on 10 and 484 DF, p-value: < 2.2e-16
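Whether a higher-order term is worth keeping can also be judged with a nested-model F-test via anova(). The sketch below uses simulated data with a genuinely quadratic effect; the variables and model names are hypothetical and stand in for the linear, quadratic and cubic ‘lstat’ models.

```r
# Hypothetical data with a quadratic (but not cubic) effect of x.
set.seed(21)
x <- runif(300, 0, 10)
y <- 3 - 0.5 * x + 0.04 * x^2 + rnorm(300, sd = 0.3)

fit_lin  <- lm(y ~ x)
fit_quad <- lm(y ~ x + I(x^2))
fit_cub  <- lm(y ~ x + I(x^2) + I(x^3))

# Sequential F-tests: each row tests the terms added relative to the row above.
tab <- anova(fit_lin, fit_quad, fit_cub)
tab
p_quad <- tab[["Pr(>F)"]][2]   # does x^2 improve on the linear model?
p_cub  <- tab[["Pr(>F)"]][3]   # does x^3 improve on the quadratic model?
```

For the models in the text the equivalent call would be `anova(fit5, fit6)`. Raw powers of the same variable are strongly correlated with one another, which is one plausible reason ‘I(lstat^2)’ loses significance once ‘I(lstat^3)’ is added; `poly(lstat, 3)` builds orthogonal polynomial terms that avoid this.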
The residual diagnostic plots show no noticeable improvement either.
par(mfrow = c(2,2))
plot(fit6)
In this step, we determine which of our models is best at estimating the response variable without overfitting the data. Cross-validation holds out part of the data as a “test” set during the training phase, which limits problems such as overfitting and gives insight into how the model will generalize to an independent data set. We use the CVlm() function for this. In its plot, we want each of the fold symbols to be as close as possible to the fitted lines. We call attr() on the result to get the mean squared error of the fit. We will compare models 4, 5 and 6.
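To make the mechanics of k-fold cross-validation concrete, here is a minimal base R sketch. It is not the CVlm() call used for the results in this report; the data are simulated and `cv_mse()` is a hypothetical helper. Each observation is assigned to one of k folds, the model is refitted without that fold, and the held-out fold is predicted.

```r
# Hypothetical data; in the text the candidate models are fit4, fit5 and fit6.
set.seed(123)
d <- data.frame(x = runif(200, 0, 10))
d$y <- 3 - 0.5 * d$x + 0.04 * d$x^2 + rnorm(200, sd = 0.3)

# Mean squared prediction error from k-fold cross-validation.
cv_mse <- function(formula, data, k = 10) {
  obs    <- model.response(model.frame(formula, data))       # observed response
  fold   <- sample(rep(seq_len(k), length.out = nrow(data)))  # random fold labels
  sq_err <- numeric(nrow(data))
  for (i in seq_len(k)) {
    test <- fold == i
    fit  <- lm(formula, data = data[!test, , drop = FALSE])     # train on other folds
    pred <- predict(fit, newdata = data[test, , drop = FALSE])  # predict held-out fold
    sq_err[test] <- (obs[test] - pred)^2
  }
  mean(sq_err)
}

set.seed(123)   # same seed before each call so both models see the same folds
mse_lin  <- cv_mse(y ~ x, d)
set.seed(123)
mse_quad <- cv_mse(y ~ x + I(x^2), d)
c(linear = mse_lin, quadratic = mse_quad)
```

The model with the lower cross-validated mean squared error is preferred. The listing that follows is the CVlm() output for model 4, not the output of this sketch; its test sets of 49 to 50 observations out of 495 indicate ten folds.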
## Analysis of Variance Table
##
## Response: log(medv)
## Df Sum Sq Mean Sq F value Pr(>F)
## crim 1 22.42 22.42 844.6 < 2e-16 ***
## chas 1 1.03 1.03 38.9 9.8e-10 ***
## nox 1 9.25 9.25 348.4 < 2e-16 ***
## rm 1 18.40 18.40 693.5 < 2e-16 ***
## dis 1 0.35 0.35 13.4 0.00028 ***
## rad 1 0.62 0.62 23.5 1.7e-06 ***
## ptratio 1 3.06 3.06 115.2 < 2e-16 ***
## b 1 1.70 1.70 63.9 9.7e-15 ***
## lstat 1 7.00 7.00 263.8 < 2e-16 ***
## Residuals 485 12.87 0.03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## fold 1
## Observations in test set: 49
## 5 24 38 44 45 59 72 80
## Predicted 3.343 2.6921 3.1196 3.21010 3.1304 3.0814 3.094 3.151
## cvpred 3.342 2.6912 3.1262 3.21368 3.1337 3.0779 3.100 3.159
## log(medv) 3.589 2.6741 3.0445 3.20680 3.0540 3.1485 3.077 3.011
## CV residual 0.247 -0.0171 -0.0817 -0.00688 -0.0797 0.0705 -0.023 -0.148
## 97 100 102 104 106 118 131 148
## Predicted 3.173 3.4831 3.2332 2.9923 2.8941 3.167 2.9962 2.416
## cvpred 3.182 3.4880 3.2379 2.9975 2.9008 3.172 2.9976 2.405
## log(medv) 3.063 3.5025 3.2771 2.9601 2.9704 2.955 2.9549 2.681
## CV residual -0.118 0.0146 0.0392 -0.0374 0.0696 -0.217 -0.0427 0.276
## 150 157 168 190 202 213 228 275
## Predicted 2.7015 2.6626 3.0812 3.55566 3.323 3.0335 3.4433 3.5599
## cvpred 2.6927 2.6643 3.0889 3.55791 3.327 3.0398 3.4456 3.5688
## log(medv) 2.7344 2.5726 3.1697 3.55249 3.182 3.1091 3.4532 3.4782
## CV residual 0.0416 -0.0917 0.0808 -0.00543 -0.145 0.0692 0.0076 -0.0907
## 285 287 294 297 298 300 317 321
## Predicted 3.3775 2.9286 3.2600 3.3289 2.9822 3.458 2.8508 3.2034
## cvpred 3.3746 2.9247 3.2638 3.3300 2.9813 3.454 2.8499 3.2066
## log(medv) 3.4720 3.0007 3.1739 3.2995 3.0106 3.367 2.8792 3.1697
## CV residual 0.0973 0.0761 -0.0899 -0.0305 0.0293 -0.087 0.0293 -0.0369
## 328 332 337 339 345 352 392 441 443
## Predicted 2.969 3.009 2.9963 3.0725 3.339 3.1240 2.768 2.480 2.8361
## cvpred 2.968 3.011 2.9980 3.0751 3.336 3.1148 2.757 2.470 2.8239
## log(medv) 3.100 2.839 2.9704 3.0253 3.440 3.1822 3.144 2.351 2.9124
## CV residual 0.132 -0.172 -0.0276 -0.0499 0.104 0.0674 0.387 -0.118 0.0884
## 446 453 468 469 472 482 491 504
## Predicted 2.4283 2.8310 2.736 2.719 3.051 3.2366 2.366 3.291
## cvpred 2.4203 2.8185 2.729 2.715 3.048 3.2327 2.368 3.297
## log(medv) 2.4681 2.7788 2.950 2.950 2.976 3.1655 2.092 3.174
## CV residual 0.0478 -0.0397 0.221 0.235 -0.072 -0.0672 -0.276 -0.123
##
## Sum of squares = 0.78 Mean square = 0.02 n = 49
##
## fold 2
## Observations in test set: 50
## 6 10 11 15 20 22 39 47 65
## Predicted 3.25 2.93528 2.902 2.9790 2.933 2.8840 3.097 2.99215 3.172
## cvpred 3.25 2.93463 2.901 2.9710 2.926 2.8765 3.096 2.98927 3.162
## log(medv) 3.36 2.93916 2.708 2.9014 2.901 2.9755 3.207 2.99573 3.497
## CV residual 0.11 0.00453 -0.193 -0.0696 -0.025 0.0991 0.111 0.00646 0.335
## 66 74 75 81 87 107 120 125
## Predicted 3.371 3.1957 3.2768 3.3279 3.0609 2.828 3.0349 2.8813
## cvpred 3.371 3.1925 3.2771 3.3244 3.0577 2.825 3.0375 2.8746
## log(medv) 3.157 3.1527 3.1822 3.3322 3.1135 2.970 2.9601 2.9339
## CV residual -0.214 -0.0397 -0.0949 0.0078 0.0558 0.146 -0.0774 0.0593
## 144 155 167 182 198 206 216 253 261
## Predicted 2.562 3.025 3.671 3.243 3.4398 3.0914 3.1660 3.271 3.4727
## cvpred 2.555 3.021 3.669 3.242 3.4373 3.0899 3.1638 3.267 3.4753
## log(medv) 2.747 2.833 3.912 3.589 3.4111 3.1179 3.2189 3.388 3.5205
## CV residual 0.192 -0.188 0.243 0.347 -0.0261 0.0281 0.0551 0.121 0.0452
## 274 278 289 296 306 324 347 355
## Predicted 3.5282 3.520 3.248 3.3784 3.2891 2.9723 2.8916 2.804
## cvpred 3.5237 3.521 3.250 3.3780 3.2921 2.9701 2.8834 2.792
## log(medv) 3.5610 3.500 3.105 3.3534 3.3464 2.9178 2.8449 2.901
## CV residual 0.0373 -0.021 -0.146 -0.0246 0.0542 -0.0523 -0.0385 0.109
## 376 391 394 403 405 418 419 424 427
## Predicted 3.008 2.7722 2.884 2.754 2.1220 2.159 1.904 2.4950 2.656
## cvpred 3.009 2.7895 2.899 2.766 2.0956 2.145 1.829 2.4991 2.657
## log(medv) 2.708 2.7147 2.625 2.493 2.1401 2.342 2.175 2.5953 2.322
## CV residual -0.301 -0.0748 -0.274 -0.273 0.0445 0.196 0.346 0.0961 -0.335
## 433 439 449 458 462 488 499
## Predicted 2.920 2.0721 2.76 2.5465 2.9232 3.00203 3.0289
## cvpred 2.929 2.0669 2.78 2.5476 2.9420 3.02315 3.0287
## log(medv) 2.779 2.1282 2.65 2.6027 2.8736 3.02529 3.0540
## CV residual -0.151 0.0613 -0.13 0.0551 -0.0684 0.00215 0.0253
##
## Sum of squares = 1.17 Mean square = 0.02 n = 50
##
## fold 3
## Observations in test set: 50
## 1 7 13 19 35 55 68 85 99
## Predicted 3.423 3.1111 3.0050 2.831 2.6455 2.809 3.1119 3.1871 3.607
## cvpred 3.415 3.1133 3.0128 2.814 2.6529 2.814 3.1042 3.1884 3.613
## log(medv) 3.178 3.1311 3.0773 3.006 2.6027 2.939 3.0910 3.1739 3.780
## CV residual -0.237 0.0179 0.0645 0.192 -0.0502 0.125 -0.0132 -0.0145 0.166
## 108 110 112 117 121 129 137 139
## Predicted 2.9924 2.94962 3.265 3.1388 2.966 2.9276 2.8062 2.691
## cvpred 2.9958 2.95750 3.269 3.1397 2.965 2.9330 2.8078 2.702
## log(medv) 3.0155 2.96527 3.127 3.0540 3.091 2.8904 2.8565 2.588
## CV residual 0.0198 0.00777 -0.142 -0.0857 0.126 -0.0427 0.0487 -0.115
## 147 152 160 165 203 230 236 239 244
## Predicted 2.748 2.9079 3.2222 3.1769 3.633 3.434 3.1618 3.322 3.309
## cvpred 2.728 2.8867 3.2029 3.1722 3.640 3.423 3.1602 3.319 3.301
## log(medv) 2.747 2.9755 3.1485 3.1224 3.745 3.450 3.1781 3.165 3.165
## CV residual 0.019 0.0888 -0.0545 -0.0498 0.105 0.027 0.0178 -0.154 -0.136
## 258 270 313 320 326 340 359 360
## Predicted 3.8022 3.1190 3.097 3.03421 3.2281 3.0288 3.014 2.903
## cvpred 3.8189 3.1209 3.095 3.03491 3.2165 3.0209 2.999 2.893
## log(medv) 3.9120 3.0301 2.965 3.04452 3.2027 2.9444 3.122 3.118
## CV residual 0.0931 -0.0908 -0.129 0.00961 -0.0137 -0.0765 0.123 0.225
## 361 367 397 400 411 420 431 448 455
## Predicted 3.065 2.745 2.825 2.395 2.441 2.524 2.7325 2.794 2.60
## cvpred 3.046 2.722 2.837 2.419 2.391 2.527 2.7271 2.793 2.59
## log(medv) 3.219 3.086 2.526 1.841 2.708 2.128 2.6741 2.534 2.70
## CV residual 0.173 0.364 -0.311 -0.579 0.317 -0.399 -0.0529 -0.259 0.11
## 460 465 467 479 481 487 497
## Predicted 2.852 2.929 2.628 2.817 3.0843 2.9102 2.703
## cvpred 2.847 2.925 2.612 2.824 3.0817 2.9130 2.711
## log(medv) 2.996 3.063 2.944 2.681 3.1355 2.9497 2.981
## CV residual 0.148 0.138 0.333 -0.143 0.0538 0.0367 0.269
##
## Sum of squares = 1.56 Mean square = 0.03 n = 50
##
## fold 4
## Observations in test set: 50
## 3 8 50 51 52 70 77 94
## Predicted 3.4476 2.930 2.8733 3.0208 3.152 3.093 3.129 3.3059
## cvpred 3.4505 2.929 2.8724 3.0200 3.153 3.094 3.124 3.3062
## log(medv) 3.5467 3.300 2.9653 2.9806 3.020 3.040 2.996 3.2189
## CV residual 0.0963 0.371 0.0929 -0.0393 -0.133 -0.054 -0.128 -0.0874
## 95 101 133 134 141 153 172 175 181
## Predicted 3.191 3.180 3.011 2.829 2.6528 2.983 3.145 3.232 3.501
## cvpred 3.189 3.181 3.018 2.834 2.6520 3.000 3.147 3.231 3.499
## log(medv) 3.025 3.314 3.135 2.912 2.6391 2.728 2.950 3.118 3.684
## CV residual -0.164 0.134 0.118 0.078 -0.0129 -0.273 -0.197 -0.113 0.185
## 197 205 207 222 235 243 248 265
## Predicted 3.612 3.742 3.1278 2.9985 3.4039 3.1352 3.01686 3.5188
## cvpred 3.612 3.739 3.1284 2.9942 3.4039 3.1320 3.01698 3.5226
## log(medv) 3.506 3.912 3.1946 3.0773 3.3673 3.1001 3.02042 3.5973
## CV residual -0.107 0.173 0.0662 0.0831 -0.0366 -0.0319 0.00344 0.0747
## 269 277 283 309 315 318 319 327
## Predicted 3.7040 3.5244 3.7444 3.357 3.2054 2.9030 3.1532 3.1834
## cvpred 3.7072 3.5264 3.7451 3.362 3.2088 2.9045 3.1563 3.1883
## log(medv) 3.7728 3.5025 3.8286 3.127 3.1697 2.9857 3.1398 3.1355
## CV residual 0.0656 -0.0238 0.0835 -0.236 -0.0391 0.0812 -0.0165 -0.0528
## 341 344 346 362 363 377 378 399 406
## Predicted 3.039 3.277 2.978 2.870 2.890 2.6698 2.813 2.092 2.055
## cvpred 3.043 3.279 2.981 2.875 2.901 2.6808 2.819 2.137 2.144
## log(medv) 2.929 3.174 2.862 2.991 3.035 2.6319 2.588 1.609 1.609
## CV residual -0.115 -0.105 -0.119 0.115 0.134 -0.0489 -0.232 -0.528 -0.534
## 414 415 435 450 452 461 466 498
## Predicted 2.398 1.554 2.651 2.731 2.846 2.832 2.851 2.948
## cvpred 2.416 1.585 2.649 2.733 2.849 2.828 2.850 2.951
## log(medv) 2.791 1.946 2.460 2.565 2.721 2.797 2.991 2.907
## CV residual 0.375 0.361 -0.189 -0.168 -0.128 -0.031 0.141 -0.044
##
## Sum of squares = 1.63 Mean square = 0.03 n = 50
##
## fold 5
## Observations in test set: 50
## 2 17 21 41 42 48 56 64 78
## Predicted 3.208 3.0590 2.628 3.4801 3.3509 2.869 3.344 3.085 3.164
## cvpred 3.211 3.0701 2.628 3.4849 3.3540 2.863 3.335 3.076 3.171
## log(medv) 3.073 3.1398 2.610 3.5525 3.2809 2.809 3.567 3.219 3.035
## CV residual -0.138 0.0697 -0.018 0.0676 -0.0731 -0.054 0.232 0.143 -0.136
## 86 92 151 159 185 188 195 204 209
## Predicted 3.3151 3.288 2.9925 3.362 3.029 3.4952 3.4310 3.693 3.08
## cvpred 3.3208 3.297 3.0014 3.382 3.039 3.5026 3.4359 3.691 3.06
## log(medv) 3.2809 3.091 3.0681 3.190 3.273 3.4657 3.3707 3.882 3.19
## CV residual -0.0399 -0.206 0.0667 -0.191 0.234 -0.0369 -0.0652 0.191 0.13
## 212 249 250 255 259 262 264 272
## Predicted 2.760 3.070 3.1991 3.1756 3.5419 3.576 3.44146 3.2508
## cvpred 2.743 3.065 3.1949 3.1738 3.5476 3.579 3.44268 3.2600
## log(medv) 2.960 3.199 3.2658 3.0865 3.5835 3.764 3.43399 3.2268
## CV residual 0.217 0.134 0.0708 -0.0874 0.0359 0.184 -0.00869 -0.0332
## 280 291 299 308 310 311 312 314
## Predicted 3.54613 3.4131 3.359 3.3675 3.128 2.900 3.273 3.217
## cvpred 3.55462 3.4185 3.362 3.3738 3.139 2.921 3.288 3.228
## log(medv) 3.55820 3.3499 3.114 3.3393 3.011 2.779 3.096 3.073
## CV residual 0.00358 -0.0686 -0.249 -0.0345 -0.129 -0.142 -0.193 -0.155
## 329 333 338 350 379 382 393 404
## Predicted 3.142 3.163 2.9629 3.168 2.5607 2.722 2.409 2.499
## cvpred 3.149 3.168 2.9667 3.159 2.5672 2.727 2.420 2.519
## log(medv) 2.960 2.965 2.9178 3.281 2.5726 2.389 2.272 2.116
## CV residual -0.189 -0.202 -0.0489 0.122 0.0054 -0.338 -0.148 -0.403
## 428 437 440 447 451 484 503 505
## Predicted 2.4391 2.550 2.5566 2.7690 2.6677 3.0311 3.0790 3.23
## cvpred 2.4683 2.562 2.5635 2.7743 2.6744 3.0438 3.0904 3.24
## log(medv) 2.3888 2.262 2.5494 2.7014 2.5953 3.0819 3.0253 3.09
## CV residual -0.0795 -0.301 -0.0141 -0.0729 -0.0791 0.0381 -0.0651 -0.15
##
## Sum of squares = 1.14 Mean square = 0.02 n = 50
##
## fold 6
## Observations in test set: 50
## 18 31 37 43 53 71 76 79
## Predicted 2.852 2.5785 3.0555 3.2444 3.321 3.2455 3.202 3.0831
## cvpred 2.845 2.5642 3.0626 3.2527 3.326 3.2501 3.208 3.0836
## log(medv) 2.862 2.5416 2.9957 3.2308 3.219 3.1864 3.063 3.0540
## CV residual 0.017 -0.0226 -0.0669 -0.0219 -0.107 -0.0637 -0.144 -0.0296
## 82 84 89 114 116 123 124 142 143
## Predicted 3.2593 3.1898 3.431 2.9949 2.987 2.88 2.657 2.215 2.6424
## cvpred 3.2595 3.1968 3.428 2.9923 2.988 2.87 2.640 2.197 2.6294
## log(medv) 3.1739 3.1311 3.161 2.9285 2.907 3.02 2.851 2.667 2.5953
## CV residual -0.0857 -0.0657 -0.267 -0.0638 -0.081 0.15 0.211 0.471 -0.0341
## 154 164 171 187 199 211 220 232 233
## Predicted 2.828 3.8398 3.057 3.577 3.5372 3.0119 3.32 3.4911 3.6848
## cvpred 2.820 3.8245 3.061 3.564 3.5329 3.0111 3.33 3.4864 3.6690
## log(medv) 2.965 3.9120 2.856 3.912 3.5439 3.0773 3.14 3.4563 3.7305
## CV residual 0.145 0.0875 -0.204 0.348 0.0109 0.0662 -0.19 -0.0301 0.0615
## 246 254 263 268 273 302 304 322 323
## Predicted 2.715 3.443 3.721 3.706 3.2766 3.304 3.4806 3.2060 3.131
## cvpred 2.713 3.418 3.699 3.687 3.2780 3.307 3.4847 3.2089 3.138
## log(medv) 2.918 3.757 3.888 3.912 3.1946 3.091 3.4995 3.1398 3.016
## CV residual 0.205 0.339 0.189 0.225 -0.0834 -0.216 0.0148 -0.0691 -0.122
## 330 336 357 380 388 412 421 432 436
## Predicted 3.27 3.03950 2.8321 2.646 2.135 2.607 2.8335 2.7213 2.50
## cvpred 3.28 3.04141 2.8321 2.645 2.139 2.596 2.8338 2.7097 2.48
## log(medv) 3.12 3.04927 2.8792 2.322 2.001 2.845 2.8154 2.6462 2.60
## CV residual -0.16 0.00786 0.0471 -0.323 -0.137 0.249 -0.0184 -0.0636 0.12
## 445 454 464 476 489 494
## Predicted 2.4431 2.9508 3.035 2.6580 2.7778 3.0154
## cvpred 2.4373 2.9327 3.040 2.6543 2.7771 3.0225
## log(medv) 2.3795 2.8792 3.006 2.5878 2.7213 3.0819
## CV residual -0.0578 -0.0535 -0.034 -0.0666 -0.0558 0.0594
##
## Sum of squares = 1.23 Mean square = 0.02 n = 50
##
## fold 7
## Observations in test set: 49
## 12 14 25 46 67 69 83 111
## Predicted 3.062 3.0153 2.7970 3.089 3.150 2.9354 3.233 3.010
## cvpred 3.061 3.0217 2.7900 3.090 3.151 2.9340 3.237 3.003
## log(medv) 2.939 3.0155 2.7473 2.960 2.965 2.8565 3.211 3.077
## CV residual -0.122 -0.0062 -0.0427 -0.129 -0.185 -0.0776 -0.026 0.074
## 135 136 140 145 146 158 161 178 180
## Predicted 2.6890 2.857 2.8089 2.4228 2.506 3.529 3.492 3.363 3.47
## cvpred 2.6869 2.845 2.7946 2.4014 2.494 3.532 3.500 3.366 3.47
## log(medv) 2.7473 2.896 2.8792 2.4681 2.625 3.721 3.296 3.203 3.62
## CV residual 0.0604 0.051 0.0846 0.0667 0.131 0.189 -0.204 -0.163 0.15
## 186 193 194 196 200 210 219 223 240
## Predicted 3.110 3.5491 3.4471 3.708 3.383 2.769 3.07898 3.4001 3.313
## cvpred 3.102 3.5570 3.4511 3.710 3.390 2.751 3.06570 3.3960 3.316
## log(medv) 3.388 3.5946 3.4372 3.912 3.552 2.996 3.06805 3.3142 3.148
## CV residual 0.286 0.0376 -0.0139 0.202 0.162 0.245 0.00235 -0.0818 -0.168
## 241 251 282 284 293 301 334 335
## Predicted 3.237 3.2055 3.5295 3.8246 3.3448 3.409 3.1190 3.0883
## cvpred 3.232 3.2148 3.5329 3.8263 3.3497 3.414 3.1309 3.0985
## log(medv) 3.091 3.1946 3.5667 3.9120 3.3286 3.211 3.1001 3.0301
## CV residual -0.141 -0.0202 0.0338 0.0858 -0.0211 -0.203 -0.0308 -0.0684
## 342 351 374 387 389 396 409 425 438
## Predicted 3.4383 3.11081 2.187 2.179 2.220 2.865 2.540 2.598 2.2624
## cvpred 3.4426 3.12258 2.154 2.160 2.197 2.856 2.519 2.611 2.2620
## log(medv) 3.4874 3.13114 2.625 2.351 2.322 2.573 2.845 2.460 2.1633
## CV residual 0.0448 0.00856 0.471 0.192 0.126 -0.283 0.326 -0.151 -0.0987
## 456 457 473 475 486 493
## Predicted 2.6602 2.53811 3.016 2.7310 3.05851 2.9684
## cvpred 2.6690 2.55043 3.010 2.7244 3.06124 2.9630
## log(medv) 2.6462 2.54160 3.144 2.6247 3.05400 3.0007
## CV residual -0.0229 -0.00883 0.134 -0.0997 -0.00724 0.0378
##
## Sum of squares = 1.09 Mean square = 0.02 n = 49
##
## fold 8
## Observations in test set: 49
## 4 9 26 27 29 34 73 103
## Predicted 3.39 2.548 2.6977 2.80489 2.9632 2.711 3.237 2.9339
## cvpred 3.40 2.536 2.7002 2.80756 2.9716 2.711 3.235 2.9471
## log(medv) 3.51 2.803 2.6319 2.80940 2.9124 2.573 3.127 2.9232
## CV residual 0.11 0.267 -0.0683 0.00184 -0.0592 -0.138 -0.108 -0.0239
## 113 115 128 149 169 173 174 176
## Predicted 3.0034 3.209 2.78037 2.46 3.2122 3.0596 3.325 3.4246
## cvpred 2.9933 3.202 2.77814 2.45 3.2039 3.0429 3.317 3.4195
## log(medv) 2.9339 2.918 2.78501 2.88 3.1697 3.1398 3.161 3.3810
## CV residual -0.0595 -0.285 0.00688 0.43 -0.0342 0.0969 -0.156 -0.0385
## 189 221 242 245 252 256 257 276
## Predicted 3.4891 3.434 3.1061 2.87073 3.2527 3.0720 3.577 3.4904
## cvpred 3.4818 3.446 3.1002 2.87039 3.2597 3.0704 3.583 3.4910
## log(medv) 3.3945 3.285 3.0007 2.86790 3.2108 3.0397 3.784 3.4657
## CV residual -0.0872 -0.162 -0.0995 -0.00249 -0.0488 -0.0306 0.201 -0.0252
## 281 292 305 307 325 331 343 353 364
## Predicted 3.682 3.444 3.40 3.4743 3.22363 3.170 3.220 2.9809 2.9010
## cvpred 3.686 3.451 3.41 3.4793 3.22703 3.167 3.222 2.9877 2.9132
## log(medv) 3.816 3.619 3.59 3.5086 3.21888 2.986 2.803 2.9232 2.8214
## CV residual 0.129 0.168 0.18 0.0293 -0.00816 -0.181 -0.419 -0.0645 -0.0918
## 365 371 383 395 407 408 422 429 459
## Predicted 3.591 3.502 2.562 2.768 2.308 2.869 2.800 2.564 2.7602
## cvpred 3.630 3.517 2.552 2.761 2.281 2.860 2.799 2.570 2.7656
## log(medv) 3.086 3.912 2.425 2.542 2.477 3.329 2.653 2.398 2.7014
## CV residual -0.544 0.395 -0.127 -0.219 0.196 0.469 -0.146 -0.172 -0.0643
## 470 471 480 483 495 496 500
## Predicted 2.827 2.917 2.932 3.2954 2.999 2.84 2.9099
## cvpred 2.817 2.913 2.926 3.2996 2.994 2.83 2.9014
## log(medv) 3.001 2.991 3.063 3.2189 3.199 3.14 2.8622
## CV residual 0.184 0.078 0.137 -0.0807 0.205 0.31 -0.0392
##
## Sum of squares = 1.85 Mean square = 0.04 n = 49
##
## fold 9
## Observations in test set: 49
## 36 40 49 58 60 61 62 88 91
## Predicted 3.127 3.3591 2.447 3.425 3.0245 2.8781 2.878 3.223 3.272
## cvpred 3.105 3.3459 2.495 3.419 3.0087 2.8720 2.881 3.205 3.263
## log(medv) 2.939 3.4275 2.667 3.453 2.9755 2.9285 2.773 3.100 3.118
## CV residual -0.166 0.0816 0.172 0.034 -0.0332 0.0565 -0.109 -0.105 -0.146
## 98 105 109 122 126 166 170 183 184
## Predicted 3.62282 3.0380 3.0806 2.9820 2.9753 3.1769 3.227 3.50 3.393
## cvpred 3.64980 3.0332 3.0853 2.9826 2.9766 3.1520 3.218 3.50 3.375
## log(medv) 3.65584 3.0007 2.9857 3.0106 3.0634 3.2189 3.105 3.63 3.481
## CV residual 0.00604 -0.0325 -0.0996 0.0281 0.0868 0.0669 -0.113 0.14 0.106
## 191 208 214 224 226 227 231 238 267
## Predicted 3.451 2.856 3.190 3.3450 3.711 3.6559 3.1268 3.4855 3.295
## cvpred 3.447 2.865 3.184 3.3290 3.748 3.6675 3.1070 3.4854 3.316
## log(medv) 3.611 3.114 3.336 3.4045 3.912 3.6270 3.1905 3.4500 3.424
## CV residual 0.164 0.249 0.152 0.0756 0.164 -0.0405 0.0835 -0.0354 0.108
## 279 288 290 295 349 354 358 366
## Predicted 3.3295 3.2474 3.21425 3.194 3.2517 3.178 3.0173 2.772
## cvpred 3.3162 3.2311 3.21740 3.180 3.2535 3.189 2.9838 2.637
## log(medv) 3.3707 3.1442 3.21084 3.077 3.1987 3.405 3.0773 3.314
## CV residual 0.0545 -0.0869 -0.00656 -0.103 -0.0548 0.215 0.0935 0.677
## 368 370 385 386 390 410 416 423 426 430
## Predicted 2.498 3.43 2.059 2.262 2.624 2.747 2.252 2.80 2.320 2.478
## cvpred 2.390 3.36 2.035 2.268 2.597 2.756 2.285 2.76 2.319 2.490
## log(medv) 3.140 3.91 2.175 1.974 2.442 3.314 1.974 3.03 2.116 2.251
## CV residual 0.749 0.55 0.139 -0.294 -0.154 0.558 -0.311 0.28 -0.202 -0.239
## 434 442 463 502
## Predicted 2.7248 2.733 2.9029 3.114
## cvpred 2.7126 2.736 2.8838 3.122
## log(medv) 2.6603 2.839 2.9704 3.109
## CV residual -0.0524 0.103 0.0866 -0.013
##
## Sum of squares = 2.48 Mean square = 0.05 n = 49
##
## fold 10
## Observations in test set: 49
## 16 23 28 30 32 33 54 57
## Predicted 3.0018 2.771 2.7351 3.0113 2.901 2.406 3.1665 3.1947
## cvpred 3.0046 2.773 2.7433 3.0089 2.903 2.422 3.1689 3.1922
## log(medv) 2.9907 2.721 2.6946 3.0445 2.674 2.580 3.1527 3.2068
## CV residual -0.0139 -0.052 -0.0487 0.0356 -0.229 0.159 -0.0161 0.0146
## 63 90 93 96 119 127 130 132
## Predicted 3.1516 3.4344 3.281 3.34624 2.9869 2.573 2.7305 2.975800
## cvpred 3.1477 3.4290 3.281 3.34683 2.9970 2.587 2.7384 2.976168
## log(medv) 3.1001 3.3569 3.131 3.34639 3.0155 2.754 2.6603 2.975530
## CV residual -0.0476 -0.0721 -0.149 -0.00044 0.0185 0.167 -0.0781 -0.000639
## 138 156 162 163 177 179 192 201 217
## Predicted 2.950 2.887 3.68 3.81 3.2080 3.430 3.4410 3.400 3.1838
## cvpred 2.951 2.916 3.67 3.80 3.2126 3.427 3.4376 3.393 3.1978
## log(medv) 2.839 2.747 3.91 3.91 3.1442 3.398 3.4177 3.493 3.1485
## CV residual -0.112 -0.169 0.24 0.11 -0.0684 -0.029 -0.0199 0.101 -0.0493
## 218 225 229 234 237 247 260 266 271
## Predicted 3.2868 3.669 3.577 3.637 3.345 3.035 3.5031 3.237 3.0164
## cvpred 3.2864 3.653 3.567 3.622 3.351 3.035 3.5022 3.248 3.0220
## log(medv) 3.3569 3.802 3.844 3.877 3.223 3.190 3.4045 3.127 3.0493
## CV residual 0.0705 0.149 0.277 0.255 -0.128 0.156 -0.0977 -0.121 0.0273
## 286 303 316 348 356 381 384 417 444
## Predicted 3.29 3.3074 3.020 3.1893 2.904 2.208 2.5459 2.45 2.764
## cvpred 3.29 3.3071 3.026 3.1859 2.903 2.128 2.5480 2.46 2.755
## log(medv) 3.09 3.2734 2.785 3.1398 3.025 2.342 2.5096 2.01 2.734
## CV residual -0.20 -0.0337 -0.241 -0.0461 0.122 0.214 -0.0384 -0.45 -0.021
## 474 477 478 485 492 501
## Predicted 3.13 2.8913 2.4523 2.937 2.842 2.987
## cvpred 3.12 2.8863 2.4527 2.938 2.849 2.992
## log(medv) 3.39 2.8154 2.4849 3.025 2.610 2.821
## CV residual 0.27 -0.0709 0.0322 0.087 -0.239 -0.171
##
## Sum of squares = 1.05 Mean square = 0.02 n = 49
##
## Overall (Sum over all 49 folds)
## ms
## 0.0283
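
The overall `ms` printed above is easier to interpret once it is put back on the scale of the response. The sketch below is an illustration, not part of the original analysis. It assumes the overall figure is the pooled cross-validation sum of squares divided by the total number of held-out observations. It then converts the reported 0.0283 into a root mean squared error on the log scale and an approximate multiplicative error on `medv`. The names `fold_ss`, `fold_n` and `pooled_ms` are hypothetical helpers introduced here. The per-fold numbers are copied from folds 2 to 10 of the output above; fold 1 lies outside this excerpt, and the printed sums are rounded, so the pooled value will not match 0.0283 exactly. The header says "Sum over all 49 folds" although ten folds were run. Since 49 matches the size of the last test fold, this is probably a labelling quirk in the printout rather than a fold count.

```r
# Per-fold sums of squares and test-set sizes, copied from the printout above
# (folds 2 to 10 only; fold 1 is not shown in this excerpt).
fold_ss <- c(1.17, 1.56, 1.63, 1.14, 1.23, 1.09, 1.85, 2.48, 1.05)
fold_n  <- c(50, 50, 50, 50, 50, 49, 49, 49, 49)

# Assumed pooling rule: total squared CV error divided by total held-out rows.
pooled_ms <- function(ss, n) sum(ss) / sum(n)

# Pooled mean square over the nine visible folds (approximate, see note above).
ms_visible <- pooled_ms(fold_ss, fold_n)

# Overall mean square as reported by the cross-validation output.
ms_reported <- 0.0283

# Root mean squared CV error on the log(medv) scale.
rmse_log <- sqrt(ms_reported)

# Because the response is log(medv), an error of rmse_log in logs corresponds
# to a multiplicative factor of exp(rmse_log) on medv itself.
ratio <- exp(rmse_log)

print(c(ms_visible = ms_visible, rmse_log = rmse_log, ratio = ratio))
```

Under these assumptions, a typical cross-validated prediction differs from the observed median value by a multiplicative factor of roughly `exp(rmse_log)`. A relative error like this is easier to compare across models than the raw mean square on the log scale.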
## Analysis of Variance Table
##
## Response: log(medv)
## Df Sum Sq Mean Sq F value Pr(>F)
## crim 1 22.42 22.42 863.2 < 2e-16 ***
## chas 1 1.03 1.03 39.8 6.5e-10 ***
## nox 1 9.25 9.25 356.1 < 2e-16 ***
## rm 1 18.40 18.40 708.8 < 2e-16 ***
## dis 1 0.35 0.35 13.7 0.00024 ***
## ptratio 1 3.68 3.68 141.7 < 2e-16 ***
## b 1 1.62 1.62 62.4 1.9e-14 ***
## lstat 1 6.93 6.93 267.0 < 2e-16 ***
## I(lstat^2) 1 0.42 0.42 16.2 6.5e-05 ***
## Residuals 485 12.59 0.03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## fold 1
## Observations in test set: 49
## 5 24 38 44 45 59 72 80
## Predicted 3.375 2.6965 3.131 3.2188 3.1221 3.0940 3.0857 3.146
## cvpred 3.374 2.6901 3.138 3.2221 3.1227 3.0925 3.0897 3.153
## log(medv) 3.589 2.6741 3.045 3.2068 3.0540 3.1485 3.0773 3.011
## CV residual 0.215 -0.0159 -0.093 -0.0153 -0.0687 0.0559 -0.0124 -0.142
## 97 100 102 104 106 118 131 148
## Predicted 3.1563 3.499393 3.26302 2.9893 2.8873 3.159 3.016 2.508
## cvpred 3.1602 3.502685 3.26738 2.9900 2.8893 3.164 3.013 2.506
## log(medv) 3.0634 3.502550 3.27714 2.9601 2.9704 2.955 2.955 2.681
## CV residual -0.0968 -0.000135 0.00977 -0.0299 0.0811 -0.209 -0.058 0.175
## 150 157 168 190 202 213 228 275 285
## Predicted 2.7195 2.688 3.063 3.55069 3.307 3.007 3.44650 3.604 3.354
## cvpred 2.7091 2.689 3.069 3.55442 3.310 3.008 3.45040 3.619 3.347
## log(medv) 2.7344 2.573 3.170 3.55249 3.182 3.109 3.45316 3.478 3.472
## CV residual 0.0253 -0.116 0.101 -0.00193 -0.128 0.101 0.00276 -0.141 0.125
## 287 294 297 298 300 317 321 328 332
## Predicted 2.900 3.2396 3.316 2.9286 3.4537 2.8324 3.2262 2.954 2.98
## cvpred 2.888 3.2422 3.317 2.9216 3.4522 2.8258 3.2301 2.949 2.98
## log(medv) 3.001 3.1739 3.300 3.0106 3.3673 2.8792 3.1697 3.100 2.84
## CV residual 0.113 -0.0683 -0.017 0.0891 -0.0849 0.0534 -0.0604 0.151 -0.14
## 337 339 345 352 392 441 443 446 453
## Predicted 3.0085 3.094 3.3715 3.1447 2.725 2.46 2.795 2.4342 2.78578
## cvpred 3.0090 3.096 3.3722 3.1361 2.716 2.45 2.785 2.4284 2.77425
## log(medv) 2.9704 3.025 3.4404 3.1822 3.144 2.35 2.912 2.4681 2.77882
## CV residual -0.0386 -0.071 0.0682 0.0461 0.428 -0.10 0.128 0.0397 0.00457
## 468 469 472 482 491 504
## Predicted 2.685 2.662 2.98559 3.2028 2.464 3.363
## cvpred 2.681 2.655 2.98384 3.2043 2.474 3.370
## log(medv) 2.950 2.950 2.97553 3.1655 2.092 3.174
## CV residual 0.269 0.294 -0.00831 -0.0388 -0.382 -0.196
##
## Sum of squares = 0.92 Mean square = 0.02 n = 49
##
## fold 2
## Observations in test set: 50
## 6 10 11 15 20 22 39 47 65
## Predicted 3.2910 2.8843 2.854 3.000 2.9501 2.888 3.10 2.962 3.167
## cvpred 3.2891 2.8814 2.852 3.002 2.9518 2.890 3.10 2.959 3.164
## log(medv) 3.3569 2.9392 2.708 2.901 2.9014 2.976 3.21 2.996 3.497
## CV residual 0.0678 0.0577 -0.144 -0.101 -0.0504 0.086 0.11 0.037 0.333
## 66 74 75 81 87 107 120 125
## Predicted 3.385 3.2054 3.290 3.3574 3.0380 2.822 3.0124 2.8812
## cvpred 3.379 3.2023 3.287 3.3551 3.0357 2.824 3.0126 2.8832
## log(medv) 3.157 3.1527 3.182 3.3322 3.1135 2.970 2.9601 2.9339
## CV residual -0.222 -0.0496 -0.105 -0.0229 0.0778 0.147 -0.0525 0.0506
## 144 155 167 182 198 206 216 253 261
## Predicted 2.614 3.03 3.708 3.241 3.3892 3.0838 3.1661 3.3125 3.444
## cvpred 2.620 3.04 3.707 3.240 3.3813 3.0828 3.1652 3.3099 3.444
## log(medv) 2.747 2.83 3.912 3.589 3.4111 3.1179 3.2189 3.3878 3.520
## CV residual 0.128 -0.21 0.205 0.349 0.0298 0.0351 0.0537 0.0778 0.076
## 274 278 289 296 306 324 347 355
## Predicted 3.5413 3.5548 3.228 3.3762 3.2762 2.9661 2.8724 2.8319
## cvpred 3.5473 3.5591 3.223 3.3719 3.2750 2.9656 2.8694 2.8301
## log(medv) 3.5610 3.4995 3.105 3.3534 3.3464 2.9178 2.8449 2.9014
## CV residual 0.0138 -0.0596 -0.118 -0.0185 0.0714 -0.0479 -0.0245 0.0714
## 376 391 394 403 405 418 419 424 427
## Predicted 2.961 2.7300 2.839 2.713 2.155358 2.189 1.914 2.480 2.620
## cvpred 2.957 2.7330 2.841 2.715 2.139813 2.179 1.880 2.477 2.614
## log(medv) 2.708 2.7147 2.625 2.493 2.140066 2.342 2.175 2.595 2.322
## CV residual -0.249 -0.0183 -0.217 -0.222 0.000253 0.162 0.294 0.118 -0.291
## 433 439 449 458 462 488 499
## Predicted 2.884 2.2072 2.7207 2.5278 2.8790 2.95 3.0231
## cvpred 2.881 2.2041 2.7231 2.5271 2.8843 2.96 3.0255
## log(medv) 2.779 2.1282 2.6462 2.6027 2.8736 3.03 3.0540
## CV residual -0.103 -0.0759 -0.0769 0.0756 -0.0107 0.07 0.0285
##
## Sum of squares = 0.95 Mean square = 0.02 n = 50
##
## fold 3
## Observations in test set: 50
## 1 7 13 19 35 55 68 85 99
## Predicted 3.458 3.0704 2.956 2.855 2.658 2.793 3.1147 3.18014 3.653
## cvpred 3.455 3.0669 2.956 2.836 2.662 2.792 3.1069 3.17911 3.666
## log(medv) 3.178 3.1311 3.077 3.006 2.603 2.939 3.0910 3.17388 3.780
## CV residual -0.277 0.0642 0.121 0.169 -0.059 0.147 -0.0159 -0.00523 0.114
## 108 110 112 117 121 129 137 139
## Predicted 2.9879 2.9408 3.252 3.1194 2.968 2.9391 2.8214 2.716
## cvpred 2.9874 2.9438 3.254 3.1168 2.964 2.9409 2.8198 2.726
## log(medv) 3.0155 2.9653 3.127 3.0540 3.091 2.8904 2.8565 2.588
## CV residual 0.0282 0.0215 -0.127 -0.0628 0.127 -0.0505 0.0366 -0.138
## 147 152 160 165 203 230 236 239
## Predicted 2.76627 2.9240 3.268 3.1523 3.6571 3.4739 3.1347 3.318
## cvpred 2.74446 2.9023 3.253 3.1445 3.6688 3.4705 3.1298 3.316
## log(medv) 2.74727 2.9755 3.148 3.1224 3.7448 3.4500 3.1781 3.165
## CV residual 0.00281 0.0732 -0.105 -0.0221 0.0759 -0.0205 0.0482 -0.151
## 244 258 270 313 320 326 340 359
## Predicted 3.319 3.8062 3.0964 3.091 3.0222 3.2745 3.0406 2.993
## cvpred 3.314 3.8248 3.0918 3.086 3.0188 3.2686 3.0321 2.974
## log(medv) 3.165 3.9120 3.0301 2.965 3.0445 3.2027 2.9444 3.122
## CV residual -0.149 0.0872 -0.0616 -0.121 0.0257 -0.0658 -0.0876 0.148
## 360 361 367 397 400 411 420 431 448
## Predicted 2.877 3.072 2.719 2.778 2.443 2.449 2.516 2.686191 2.755
## cvpred 2.864 3.055 2.694 2.784 2.475 2.396 2.515 2.674507 2.748
## log(medv) 3.118 3.219 3.086 2.526 1.841 2.708 2.128 2.674149 2.534
## CV residual 0.254 0.164 0.392 -0.259 -0.634 0.312 -0.387 -0.000359 -0.215
## 455 460 465 467 479 481 487 497
## Predicted 2.577 2.811 2.88 2.597 2.7603 3.030 2.848 2.702
## cvpred 2.563 2.801 2.87 2.576 2.7604 3.023 2.843 2.708
## log(medv) 2.701 2.996 3.06 2.944 2.6810 3.135 2.950 2.981
## CV residual 0.138 0.194 0.19 0.368 -0.0794 0.112 0.106 0.272
##
## Sum of squares = 1.69 Mean square = 0.03 n = 50
##
## fold 4
## Observations in test set: 50
## 3 8 50 51 52 70 77 94 95
## Predicted 3.4942 2.880 2.840 2.97776 3.131 3.0899 3.1 3.329 3.175
## cvpred 3.4955 2.875 2.837 2.97531 3.131 3.0905 3.1 3.331 3.172
## log(medv) 3.5467 3.300 2.965 2.98062 3.020 3.0397 3.0 3.219 3.025
## CV residual 0.0512 0.424 0.128 0.00531 -0.111 -0.0508 -0.1 -0.112 -0.147
## 101 133 134 141 153 172 175 181 197
## Predicted 3.195 3.0401 2.8464 2.6925 3.006 3.121 3.219 3.502 3.607
## cvpred 3.191 3.0394 2.8448 2.6851 3.020 3.123 3.220 3.497 3.609
## log(medv) 3.314 3.1355 2.9124 2.6391 2.728 2.950 3.118 3.684 3.506
## CV residual 0.123 0.0961 0.0675 -0.0461 -0.292 -0.173 -0.102 0.187 -0.104
## 205 207 222 235 243 248 265 269 277
## Predicted 3.760 3.1161 2.96 3.3936 3.0911 2.9962 3.503 3.7322 3.5333
## cvpred 3.758 3.1139 2.96 3.3948 3.0872 2.9943 3.504 3.7370 3.5351
## log(medv) 3.912 3.1946 3.08 3.3673 3.1001 3.0204 3.597 3.7728 3.5025
## CV residual 0.154 0.0807 0.12 -0.0276 0.0129 0.0261 0.093 0.0358 -0.0325
## 283 309 315 318 319 327 341 344
## Predicted 3.766 3.411 3.2110 2.885 3.1525 3.2181 3.054 3.281
## cvpred 3.770 3.416 3.2108 2.883 3.1520 3.2217 3.056 3.282
## log(medv) 3.829 3.127 3.1697 2.986 3.1398 3.1355 2.929 3.174
## CV residual 0.059 -0.289 -0.0412 0.103 -0.0122 -0.0862 -0.127 -0.108
## 346 362 363 377 378 399 406 414 415
## Predicted 2.968 2.839 2.885 2.6417 2.768 2.161 2.058 2.369 1.762
## cvpred 2.968 2.844 2.898 2.6563 2.777 2.220 2.163 2.399 1.818
## log(medv) 2.862 2.991 3.035 2.6319 2.588 1.609 1.609 2.791 1.946
## CV residual -0.105 0.147 0.137 -0.0244 -0.189 -0.611 -0.554 0.392 0.128
## 435 450 452 461 466 498
## Predicted 2.628 2.69 2.800 2.7915 2.807 2.9397
## cvpred 2.629 2.70 2.802 2.7860 2.809 2.9390
## log(medv) 2.460 2.56 2.721 2.7973 2.991 2.9069
## CV residual -0.169 -0.13 -0.081 0.0113 0.182 -0.0321
##
## Sum of squares = 1.68 Mean square = 0.03 n = 50
##
## fold 5
## Observations in test set: 50
## 2 17 21 41 42 48 56 64
## Predicted 3.205 3.11395 2.6397 3.5513 3.385 2.8346 3.359 3.07
## cvpred 3.209 3.13224 2.6394 3.5642 3.392 2.8243 3.352 3.06
## log(medv) 3.073 3.13983 2.6101 3.5525 3.281 2.8094 3.567 3.22
## CV residual -0.137 0.00759 -0.0293 -0.0117 -0.111 -0.0149 0.214 0.16
## 78 86 92 151 159 185 188 195
## Predicted 3.148 3.3350 3.298 2.9983 3.380 3.007 3.4793 3.4544
## cvpred 3.156 3.3442 3.309 3.0049 3.403 3.017 3.4869 3.4626
## log(medv) 3.035 3.2809 3.091 3.0681 3.190 3.273 3.4657 3.3707
## CV residual -0.121 -0.0633 -0.218 0.0631 -0.213 0.257 -0.0211 -0.0919
## 204 209 212 249 250 255 259 262 264
## Predicted 3.700 3.053 2.762 3.052 3.2054 3.1797 3.5283 3.565 3.4008
## cvpred 3.698 3.036 2.742 3.048 3.2037 3.1794 3.5318 3.567 3.3965
## log(medv) 3.882 3.195 2.960 3.199 3.2658 3.0865 3.5835 3.764 3.4340
## CV residual 0.184 0.158 0.219 0.151 0.0621 -0.0929 0.0517 0.197 0.0375
## 272 280 291 299 308 310 311 312
## Predicted 3.2760 3.54963 3.466 3.358 3.3652 3.133 2.902 3.31
## cvpred 3.2896 3.56006 3.479 3.362 3.3734 3.147 2.925 3.34
## log(medv) 3.2268 3.55820 3.350 3.114 3.3393 3.011 2.779 3.10
## CV residual -0.0627 -0.00186 -0.129 -0.249 -0.0341 -0.136 -0.146 -0.24
## 314 329 333 338 350 379 382 393 404
## Predicted 3.237 3.124 3.17 2.9696 3.2040 2.5394 2.682 2.414 2.470
## cvpred 3.251 3.131 3.17 2.9756 3.1987 2.5411 2.681 2.423 2.487
## log(medv) 3.073 2.960 2.97 2.9178 3.2809 2.5726 2.389 2.272 2.116
## CV residual -0.178 -0.171 -0.21 -0.0579 0.0822 0.0315 -0.293 -0.151 -0.371
## 428 437 440 447 451 484 503 505
## Predicted 2.4281 2.534 2.54047 2.7325 2.6424 2.9819 3.124 3.297
## cvpred 2.4552 2.543 2.54506 2.7351 2.6464 2.9953 3.141 3.312
## log(medv) 2.3888 2.262 2.54945 2.7014 2.5953 3.0819 3.025 3.091
## CV residual -0.0664 -0.282 0.00439 -0.0337 -0.0511 0.0866 -0.115 -0.221
##
## Sum of squares = 1.18 Mean square = 0.02 n = 50
##
## fold 6
## Observations in test set: 50
## 18 31 37 43 53 71 76 79
## Predicted 2.85363 2.5978 3.0491 3.272 3.338 3.2616 3.196 3.05428
## cvpred 2.85452 2.5927 3.0585 3.280 3.340 3.2665 3.201 3.05594
## log(medv) 2.86220 2.5416 2.9957 3.231 3.219 3.1864 3.063 3.05400
## CV residual 0.00768 -0.0511 -0.0627 -0.049 -0.121 -0.0801 -0.138 -0.00194
## 82 84 89 114 116 123 124 142 143
## Predicted 3.2684 3.2000 3.464 2.9636 2.961 2.878 2.694 2.395 2.698
## cvpred 3.2696 3.2074 3.463 2.9641 2.964 2.879 2.687 2.388 2.697
## log(medv) 3.1739 3.1311 3.161 2.9285 2.907 3.020 2.851 2.667 2.595
## CV residual -0.0957 -0.0762 -0.302 -0.0356 -0.057 0.142 0.164 0.279 -0.101
## 154 164 171 187 199 211 220 232 233
## Predicted 2.840 3.8757 3.026 3.61 3.5023 2.9840 3.301 3.5047 3.7286
## cvpred 2.840 3.8612 3.031 3.60 3.4937 2.9882 3.307 3.4975 3.7104
## log(medv) 2.965 3.9120 2.856 3.91 3.5439 3.0773 3.135 3.4563 3.7305
## CV residual 0.125 0.0508 -0.174 0.31 0.0501 0.0891 -0.172 -0.0412 0.0201
## 246 254 263 268 273 302 304 322 323
## Predicted 2.676 3.474 3.718 3.676 3.2881 3.262 3.4833 3.2326 3.152
## cvpred 2.675 3.448 3.697 3.656 3.2918 3.262 3.4821 3.2374 3.160
## log(medv) 2.918 3.757 3.888 3.912 3.1946 3.091 3.4995 3.1398 3.016
## CV residual 0.243 0.308 0.191 0.256 -0.0972 -0.171 0.0174 -0.0976 -0.145
## 330 336 357 380 388 412 421 432 436
## Predicted 3.27 3.065 2.7979 2.612 2.223 2.575 2.797 2.6756 2.492
## cvpred 3.28 3.070 2.7967 2.610 2.226 2.558 2.793 2.6569 2.469
## log(medv) 3.12 3.049 2.8792 2.322 2.001 2.845 2.815 2.6462 2.595
## CV residual -0.16 -0.021 0.0825 -0.288 -0.225 0.287 0.022 -0.0108 0.126
## 445 454 464 476 489 494
## Predicted 2.4419 2.898749 3.01081 2.6256 2.7860 3.0162
## cvpred 2.4342 2.878330 3.00950 2.6153 2.7936 3.0270
## log(medv) 2.3795 2.879198 3.00568 2.5878 2.7213 3.0819
## CV residual -0.0546 0.000869 -0.00382 -0.0276 -0.0723 0.0549
##
## Sum of squares = 1.08 Mean square = 0.02 n = 50
##
## fold 7
## Observations in test set: 49
## 12 14 25 46 67 69 83 111
## Predicted 3.0179 3.0528 2.7961 3.078 3.113 2.908 3.2495 3.0084
## cvpred 3.0263 3.0561 2.7950 3.081 3.119 2.911 3.2495 3.0047
## log(medv) 2.9392 3.0155 2.7473 2.960 2.965 2.856 3.2108 3.0773
## CV residual -0.0871 -0.0406 -0.0477 -0.121 -0.153 -0.055 -0.0387 0.0726
## 135 136 140 145 146 158 161 178 180
## Predicted 2.7127 2.8678 2.8227 2.5120 2.581 3.561 3.520 3.377 3.502
## cvpred 2.7136 2.8622 2.8137 2.4859 2.568 3.559 3.520 3.376 3.495
## log(medv) 2.7473 2.8959 2.8792 2.4681 2.625 3.721 3.296 3.203 3.616
## CV residual 0.0337 0.0338 0.0655 -0.0178 0.057 0.162 -0.224 -0.173 0.121
## 186 193 194 196 200 210 219 223 240
## Predicted 3.087 3.5774 3.4597 3.724 3.403 2.765 3.0414 3.3716 3.297
## cvpred 3.085 3.5772 3.4603 3.720 3.407 2.748 3.0353 3.3697 3.301
## log(medv) 3.388 3.5946 3.4372 3.912 3.552 2.996 3.0681 3.3142 3.148
## CV residual 0.303 0.0174 -0.0231 0.192 0.146 0.247 0.0327 -0.0555 -0.152
## 241 251 282 284 293 301 334 335
## Predicted 3.1869 3.2210 3.5349 3.8372 3.3816 3.389 3.1669 3.124
## cvpred 3.1908 3.2259 3.5340 3.8329 3.3779 3.395 3.1725 3.131
## log(medv) 3.0910 3.1946 3.5667 3.9120 3.3286 3.211 3.1001 3.030
## CV residual -0.0997 -0.0313 0.0327 0.0791 -0.0493 -0.185 -0.0724 -0.101
## 342 351 374 387 389 396 409 425 438
## Predicted 3.447 3.149 2.312 2.219 2.2853 2.817 2.533 2.563 2.296
## cvpred 3.451 3.157 2.252 2.193 2.2485 2.814 2.508 2.574 2.290
## log(medv) 3.487 3.131 2.625 2.351 2.3224 2.573 2.845 2.460 2.163
## CV residual 0.036 -0.026 0.373 0.159 0.0739 -0.242 0.337 -0.115 -0.127
## 456 457 473 475 486 493
## Predicted 2.633363 2.51941 2.951 2.6765 3.0125 2.9782
## cvpred 2.645220 2.53263 2.950 2.6733 3.0166 2.9766
## log(medv) 2.646175 2.54160 3.144 2.6247 3.0540 3.0007
## CV residual 0.000955 0.00898 0.194 -0.0486 0.0374 0.0241
##
## Sum of squares = 0.92 Mean square = 0.02 n = 49
##
## fold 8
## Observations in test set: 49
## 4 9 26 27 29 34 73 103
## Predicted 3.4541 2.587 2.7047 2.80794 2.9668 2.716 3.269 2.962
## cvpred 3.4579 2.571 2.7068 2.81056 2.9757 2.715 3.263 2.974
## log(medv) 3.5086 2.803 2.6319 2.80940 2.9124 2.573 3.127 2.923
## CV residual 0.0507 0.232 -0.0749 -0.00116 -0.0633 -0.142 -0.136 -0.051
## 113 115 128 149 169 173 174 176
## Predicted 2.9740 3.198 2.79680 2.540 3.1928 3.02 3.312 3.4481
## cvpred 2.9662 3.193 2.79236 2.518 3.1870 3.01 3.305 3.4402
## log(medv) 2.9339 2.918 2.78501 2.879 3.1697 3.14 3.161 3.3810
## CV residual -0.0323 -0.275 -0.00735 0.361 -0.0173 0.13 -0.143 -0.0592
## 189 221 242 245 252 256 257 276
## Predicted 3.500 3.407 3.0568 2.8417 3.2989 3.0527 3.605 3.5428
## cvpred 3.492 3.421 3.0561 2.8445 3.3013 3.0542 3.610 3.5379
## log(medv) 3.395 3.285 3.0007 2.8679 3.2108 3.0397 3.784 3.4657
## CV residual -0.097 -0.136 -0.0554 0.0234 -0.0905 -0.0145 0.175 -0.0721
## 281 292 305 307 325 331 343 353 364
## Predicted 3.692 3.491 3.402 3.4784 3.258 3.16 3.218 2.9814 2.8721
## cvpred 3.696 3.495 3.408 3.4833 3.258 3.16 3.221 2.9894 2.8848
## log(medv) 3.816 3.619 3.586 3.5086 3.219 2.99 2.803 2.9232 2.8214
## CV residual 0.119 0.124 0.179 0.0252 -0.039 -0.17 -0.418 -0.0662 -0.0634
## 365 371 383 395 407 408 422 429 459
## Predicted 3.60 3.539 2.545 2.726 2.300 2.839 2.763 2.543 2.7228
## cvpred 3.64 3.548 2.533 2.721 2.269 2.830 2.765 2.551 2.7313
## log(medv) 3.09 3.912 2.425 2.542 2.477 3.329 2.653 2.398 2.7014
## CV residual -0.55 0.364 -0.108 -0.179 0.208 0.498 -0.111 -0.153 -0.0299
## 470 471 480 483 495 496 500
## Predicted 2.771 2.851 2.884 3.2662 2.991 2.83 2.9010
## cvpred 2.763 2.851 2.880 3.2716 2.986 2.82 2.8925
## log(medv) 3.001 2.991 3.063 3.2189 3.199 3.14 2.8622
## CV residual 0.238 0.139 0.183 -0.0527 0.212 0.32 -0.0303
##
## Sum of squares = 1.78 Mean square = 0.04 n = 49
##
## fold 9
## Observations in test set: 49
## 36 40 49 58 60 61 62 88 91
## Predicted 3.130 3.400 2.515 3.4354 3.017 2.8487 2.8433 3.228 3.276
## cvpred 3.108 3.385 2.562 3.4355 3.005 2.8467 2.8505 3.209 3.264
## log(medv) 2.939 3.428 2.667 3.4532 2.976 2.9285 2.7726 3.100 3.118
## CV residual -0.168 0.043 0.106 0.0177 -0.029 0.0819 -0.0779 -0.109 -0.146
## 98 105 109 122 126 166 170 183
## Predicted 3.6580 3.0396 3.0799 2.9844 2.9762 3.1715 3.2035 3.535
## cvpred 3.6765 3.0289 3.0776 2.9743 2.9672 3.1444 3.1933 3.527
## log(medv) 3.6558 3.0007 2.9857 3.0106 3.0634 3.2189 3.1046 3.635
## CV residual -0.0206 -0.0281 -0.0919 0.0363 0.0962 0.0745 -0.0887 0.108
## 184 191 208 214 224 226 227 231 238
## Predicted 3.4243 3.452 2.831 3.190 3.3388 3.722 3.6920 3.096 3.5055
## cvpred 3.4035 3.453 2.839 3.180 3.3277 3.756 3.7027 3.083 3.5064
## log(medv) 3.4812 3.611 3.114 3.336 3.4045 3.912 3.6270 3.190 3.4500
## CV residual 0.0777 0.158 0.275 0.155 0.0768 0.156 -0.0757 0.108 -0.0564
## 267 279 288 290 295 349 354 358 366
## Predicted 3.24 3.3332 3.2322 3.1765 3.1604 3.2597 3.196 2.985 2.801
## cvpred 3.26 3.3214 3.2242 3.1866 3.1530 3.2609 3.206 2.962 2.688
## log(medv) 3.42 3.3707 3.1442 3.2108 3.0773 3.1987 3.405 3.077 3.314
## CV residual 0.16 0.0494 -0.0801 0.0242 -0.0756 -0.0623 0.199 0.115 0.626
## 368 370 385 386 390 410 416 423 426
## Predicted 2.485 3.460 2.1362 2.327 2.592 2.700 2.303 2.759 2.328
## cvpred 2.395 3.408 2.1212 2.342 2.581 2.719 2.335 2.729 2.329
## log(medv) 3.140 3.912 2.1748 1.974 2.442 3.314 1.974 3.035 2.116
## CV residual 0.744 0.504 0.0536 -0.368 -0.139 0.595 -0.361 0.306 -0.213
## 430 434 442 463 502
## Predicted 2.474 2.6957 2.697 2.862 3.1508
## cvpred 2.492 2.6901 2.708 2.855 3.1404
## log(medv) 2.251 2.6603 2.839 2.970 3.1091
## CV residual -0.241 -0.0298 0.131 0.116 -0.0314
##
## Sum of squares = 2.48 Mean square = 0.05 n = 49
##
## fold 10
## Observations in test set: 49
## 16 23 28 30 32 33 54 57
## Predicted 3.0384 2.770 2.7383 3.0180 2.908 2.4745 3.15424 3.21154
## cvpred 3.0396 2.773 2.7482 3.0170 2.911 2.4885 3.15713 3.20868
## log(medv) 2.9907 2.721 2.6946 3.0445 2.674 2.5802 3.15274 3.20680
## CV residual -0.0489 -0.052 -0.0536 0.0275 -0.237 0.0917 -0.00439 -0.00188
## 63 90 93 96 119 127 130 132
## Predicted 3.1628 3.464 3.283 3.3661 2.9621 2.631 2.7483 2.9985
## cvpred 3.1593 3.458 3.283 3.3663 2.9738 2.642 2.7555 2.9987
## log(medv) 3.1001 3.357 3.131 3.3464 3.0155 2.754 2.6603 2.9755
## CV residual -0.0592 -0.101 -0.152 -0.0199 0.0417 0.112 -0.0952 -0.0232
## 138 156 162 163 177 179 192 201
## Predicted 2.963 2.907 3.748 3.8679 3.1898 3.4328 3.4481 3.4216
## cvpred 2.965 2.938 3.737 3.8613 3.1949 3.4304 3.4444 3.4143
## log(medv) 2.839 2.747 3.912 3.9120 3.1442 3.3979 3.4177 3.4935
## CV residual -0.126 -0.191 0.175 0.0507 -0.0508 -0.0325 -0.0266 0.0792
## 217 218 225 229 234 237 247 260 266
## Predicted 3.1525 3.2715 3.689 3.606 3.66 3.322 3.022 3.5017 3.2159
## cvpred 3.1683 3.2724 3.675 3.595 3.65 3.329 3.023 3.5006 3.2257
## log(medv) 3.1485 3.3569 3.802 3.844 3.88 3.223 3.190 3.4045 3.1268
## CV residual -0.0198 0.0845 0.127 0.249 0.23 -0.106 0.167 -0.0961 -0.0989
## 271 286 303 316 348 356 381 384
## Predicted 2.9978 3.268 3.27430 3.017 3.1998 2.9567 2.187 2.5356
## cvpred 3.0044 3.269 3.27595 3.023 3.1967 2.9544 2.098 2.5367
## log(medv) 3.0493 3.091 3.27336 2.785 3.1398 3.0253 2.342 2.5096
## CV residual 0.0448 -0.178 -0.00258 -0.238 -0.0569 0.0709 0.244 -0.0271
## 417 444 474 477 478 485 492 501
## Predicted 2.463 2.7253 3.082 2.830 2.4388 2.88 2.847 2.977
## cvpred 2.478 2.7189 3.077 2.828 2.4375 2.89 2.854 2.982
## log(medv) 2.015 2.7344 3.395 2.815 2.4849 3.03 2.610 2.821
## CV residual -0.463 0.0155 0.318 -0.013 0.0475 0.14 -0.243 -0.161
##
## Sum of squares = 1.01 Mean square = 0.02 n = 49
##
## Overall (Sum over all 49 folds)
## ms
## 0.0276
## Analysis of Variance Table
##
## Response: log(medv)
## Df Sum Sq Mean Sq F value Pr(>F)
## crim 1 22.42 22.42 863.08 < 2e-16 ***
## chas 1 1.03 1.03 39.74 6.5e-10 ***
## nox 1 9.25 9.25 356.02 < 2e-16 ***
## rm 1 18.40 18.40 708.67 < 2e-16 ***
## dis 1 0.35 0.35 13.66 0.00024 ***
## ptratio 1 3.68 3.68 141.63 < 2e-16 ***
## b 1 1.62 1.62 62.41 1.9e-14 ***
## lstat 1 6.93 6.93 267.00 < 2e-16 ***
## I(lstat^2) 1 0.42 0.42 16.24 6.5e-05 ***
## I(lstat^3) 1 0.02 0.02 0.93 0.33438
## Residuals 484 12.57 0.03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## fold 1
## Observations in test set: 49
## 5 24 38 44 45 59 72 80
## Predicted 3.375 2.7051 3.1281 3.2170 3.1179 3.0933 3.08201 3.14
## cvpred 3.374 2.6969 3.1354 3.2204 3.1192 3.0919 3.08649 3.15
## log(medv) 3.589 2.6741 3.0445 3.2068 3.0540 3.1485 3.07731 3.01
## CV residual 0.215 -0.0227 -0.0909 -0.0136 -0.0652 0.0565 -0.00918 -0.14
## 97 100 102 104 106 118 131 148
## Predicted 3.1518 3.49564 3.2578 2.9861 2.890 3.155 3.010 2.511
## cvpred 3.1563 3.49941 3.2630 2.9873 2.891 3.160 3.008 2.510
## log(medv) 3.0634 3.50255 3.2771 2.9601 2.970 2.955 2.955 2.681
## CV residual -0.0929 0.00314 0.0142 -0.0272 0.079 -0.205 -0.053 0.171
## 150 157 168 190 202 213 228 275 285
## Predicted 2.7298 2.691 3.060 3.55107 3.305 3.008 3.4430 3.614 3.348
## cvpred 2.7179 2.691 3.066 3.55449 3.309 3.009 3.4475 3.626 3.341
## log(medv) 2.7344 2.573 3.170 3.55249 3.182 3.109 3.4532 3.478 3.472
## CV residual 0.0165 -0.119 0.104 -0.00201 -0.126 0.101 0.0057 -0.148 0.131
## 287 294 297 298 300 317 321 328
## Predicted 2.895 3.2364 3.3130 2.9309 3.4572 2.8381 3.2237 2.950
## cvpred 2.884 3.2395 3.3138 2.9233 3.4549 2.8303 3.2279 2.946
## log(medv) 3.001 3.1739 3.2995 3.0106 3.3673 2.8792 3.1697 3.100
## CV residual 0.117 -0.0656 -0.0143 0.0873 -0.0876 0.0489 -0.0582 0.155
## 332 337 339 345 352 392 441 443 446
## Predicted 2.981 3.0047 3.0903 3.3760 3.1463 2.730 2.474 2.795 2.4409
## cvpred 2.978 3.0058 3.0934 3.3757 3.1374 2.720 2.460 2.785 2.4340
## log(medv) 2.839 2.9704 3.0253 3.4404 3.1822 3.144 2.351 2.912 2.4681
## CV residual -0.138 -0.0354 -0.0681 0.0647 0.0448 0.424 -0.109 0.127 0.0341
## 453 468 469 472 482 491 504
## Predicted 2.78667 2.695 2.667 2.98112 3.1973 2.467 3.363
## cvpred 2.77519 2.688 2.659 2.98014 3.1997 2.477 3.369
## log(medv) 2.77882 2.950 2.950 2.97553 3.1655 2.092 3.174
## CV residual 0.00362 0.261 0.291 -0.00461 -0.0343 -0.385 -0.196
##
## Sum of squares = 0.91 Mean square = 0.02 n = 49
##
## fold 2
## Observations in test set: 50
## 6 10 11 15 20 22 39 47 65
## Predicted 3.2951 2.8871 2.861 2.9951 2.9469 2.886 3.093 2.9613 3.16
## cvpred 3.2925 2.8841 2.857 2.9975 2.9490 2.888 3.093 2.9585 3.16
## log(medv) 3.3569 2.9392 2.708 2.9014 2.9014 2.976 3.207 2.9957 3.50
## CV residual 0.0644 0.0551 -0.149 -0.0961 -0.0476 0.088 0.113 0.0372 0.34
## 66 74 75 81 87 107 120 125
## Predicted 3.393 3.2033 3.290 3.3601 3.0350 2.829 3.0114 2.8858
## cvpred 3.386 3.2005 3.287 3.3571 3.0333 2.830 3.0118 2.8871
## log(medv) 3.157 3.1527 3.182 3.3322 3.1135 2.970 2.9601 2.9339
## CV residual -0.229 -0.0478 -0.105 -0.0249 0.0802 0.141 -0.0517 0.0468
## 144 155 167 182 198 206 216 253 261
## Predicted 2.625 3.025 3.712 3.237 3.3811 3.0802 3.1615 3.3224 3.4341
## cvpred 2.628 3.039 3.711 3.236 3.3745 3.0798 3.1612 3.3180 3.4359
## log(medv) 2.747 2.833 3.912 3.589 3.4111 3.1179 3.2189 3.3878 3.5205
## CV residual 0.119 -0.206 0.201 0.353 0.0366 0.0382 0.0576 0.0697 0.0845
## 274 278 289 296 306 324 347 355
## Predicted 3.5326 3.5604 3.225 3.3755 3.2702 2.9633 2.8690 2.8305
## cvpred 3.5393 3.5636 3.220 3.3713 3.2698 2.9633 2.8666 2.8288
## log(medv) 3.5610 3.4995 3.105 3.3534 3.3464 2.9178 2.8449 2.9014
## CV residual 0.0217 -0.0641 -0.116 -0.0179 0.0766 -0.0456 -0.0217 0.0727
## 376 391 394 403 405 418 419 424 427
## Predicted 2.951 2.7336 2.837 2.719 2.16288 2.200 1.920 2.489 2.620
## cvpred 2.949 2.7362 2.840 2.720 2.14691 2.188 1.887 2.484 2.613
## log(medv) 2.708 2.7147 2.625 2.493 2.14007 2.342 2.175 2.595 2.322
## CV residual -0.241 -0.0215 -0.215 -0.226 -0.00684 0.154 0.288 0.111 -0.291
## 433 439 449 458 462 488 499
## Predicted 2.876 2.172 2.7237 2.5285 2.87502 2.9504 3.0196
## cvpred 2.875 2.173 2.7257 2.5273 2.88080 2.9517 3.0225
## log(medv) 2.779 2.128 2.6462 2.6027 2.87356 3.0253 3.0540
## CV residual -0.096 -0.045 -0.0795 0.0754 -0.00723 0.0736 0.0315
##
## Sum of squares = 0.94 Mean square = 0.02 n = 50
##
## fold 3
## Observations in test set: 50
## 1 7 13 19 35 55 68 85 99
## Predicted 3.462 3.0666 2.96 2.852 2.6656 2.793 3.113 3.17450 3.659
## cvpred 3.461 3.0617 2.96 2.833 2.6719 2.792 3.105 3.17192 3.675
## log(medv) 3.178 3.1311 3.08 3.006 2.6027 2.939 3.091 3.17388 3.780
## CV residual -0.283 0.0694 0.12 0.172 -0.0692 0.147 -0.014 0.00196 0.105
## 108 110 112 117 121 129 137 139
## Predicted 2.9857 2.9403 3.245 3.1147 2.967 2.9368 2.8240 2.726
## cvpred 2.9847 2.9433 3.244 3.1106 2.963 2.9378 2.8231 2.739
## log(medv) 3.0155 2.9653 3.127 3.0540 3.091 2.8904 2.8565 2.588
## CV residual 0.0309 0.0219 -0.117 -0.0566 0.128 -0.0474 0.0333 -0.151
## 147 152 160 165 203 230 236 239
## Predicted 2.76816 2.9223 3.2632 3.1487 3.6674 3.4851 3.1301 3.32
## cvpred 2.74609 2.8989 3.2456 3.1395 3.6825 3.4849 3.1237 3.32
## log(medv) 2.74727 2.9755 3.1485 3.1224 3.7448 3.4500 3.1781 3.17
## CV residual 0.00118 0.0767 -0.0972 -0.0171 0.0623 -0.0349 0.0543 -0.15
## 244 258 270 313 320 326 340 359
## Predicted 3.323 3.7997 3.0928 3.087 3.0181 3.2790 3.0363 2.984
## cvpred 3.320 3.8162 3.0868 3.080 3.0134 3.2743 3.0263 2.962
## log(medv) 3.165 3.9120 3.0301 2.965 3.0445 3.2027 2.9444 3.122
## CV residual -0.155 0.0958 -0.0567 -0.115 0.0312 -0.0716 -0.0819 0.161
## 360 361 367 397 400 411 420 431 448
## Predicted 2.871 3.066 2.72 2.783 2.440 2.44 2.521 2.68738 2.754
## cvpred 2.855 3.047 2.70 2.790 2.472 2.39 2.522 2.67648 2.747
## log(medv) 3.118 3.219 3.09 2.526 1.841 2.71 2.128 2.67415 2.534
## CV residual 0.262 0.171 0.39 -0.264 -0.631 0.32 -0.394 -0.00233 -0.213
## 455 460 465 467 479 481 487 497
## Predicted 2.577 2.808 2.878 2.599 2.7636 3.02 2.846 2.715
## cvpred 2.563 2.797 2.867 2.579 2.7645 3.02 2.841 2.724
## log(medv) 2.701 2.996 3.063 2.944 2.6810 3.14 2.950 2.981
## CV residual 0.138 0.199 0.197 0.366 -0.0835 0.12 0.109 0.256
##
## Sum of squares = 1.71 Mean square = 0.03 n = 50
##
## fold 4
## Observations in test set: 50
## 3 8 50 51 52 70 77 94 95
## Predicted 3.5007 2.885 2.844 2.97542 3.127 3.0874 3.0976 3.331 3.17
## cvpred 3.5051 2.887 2.844 2.97265 3.124 3.0858 3.0892 3.333 3.16
## log(medv) 3.5467 3.300 2.965 2.98062 3.020 3.0397 2.9957 3.219 3.03
## CV residual 0.0417 0.413 0.121 0.00796 -0.104 -0.0461 -0.0935 -0.114 -0.14
## 101 133 134 141 153 172 175 181 197
## Predicted 3.187 3.033 2.8463 2.7026 3.003 3.117 3.2162 3.49 3.613
## cvpred 3.180 3.029 2.8455 2.7014 3.014 3.117 3.2148 3.48 3.619
## log(medv) 3.314 3.135 2.9124 2.6391 2.728 2.950 3.1179 3.68 3.506
## CV residual 0.134 0.106 0.0668 -0.0624 -0.286 -0.168 -0.0969 0.20 -0.113
## 205 207 222 235 243 248 265 269 277
## Predicted 3.770 3.1103 2.968 3.3859 3.0851 2.990 3.495 3.7426 3.5284
## cvpred 3.774 3.1054 2.970 3.3826 3.0785 2.985 3.493 3.7532 3.5274
## log(medv) 3.912 3.1946 3.077 3.3673 3.1001 3.020 3.597 3.7728 3.5025
## CV residual 0.138 0.0892 0.107 -0.0153 0.0216 0.035 0.104 0.0196 -0.0249
## 283 309 315 318 319 327 341 344
## Predicted 3.7745 3.417 3.2044 2.8869 3.14621 3.2190 3.051 3.277
## cvpred 3.7828 3.425 3.2010 2.8870 3.14275 3.2220 3.049 3.276
## log(medv) 3.8286 3.127 3.1697 2.9857 3.13983 3.1355 2.929 3.174
## CV residual 0.0459 -0.298 -0.0313 0.0987 -0.00292 -0.0865 -0.121 -0.102
## 346 362 363 377 378 399 406 414 415
## Predicted 2.9635 2.835 2.882 2.649 2.773 2.156 2.068 2.380 1.70
## cvpred 2.9601 2.838 2.892 2.668 2.786 2.205 2.175 2.414 1.70
## log(medv) 2.8622 2.991 3.035 2.632 2.588 1.609 1.609 2.791 1.95
## CV residual -0.0979 0.153 0.142 -0.036 -0.199 -0.596 -0.566 0.378 0.25
## 435 450 452 461 466 498
## Predicted 2.625 2.697 2.8002 2.7884 2.806 2.9387
## cvpred 2.622 2.702 2.8040 2.7824 2.806 2.9382
## log(medv) 2.460 2.565 2.7213 2.7973 2.991 2.9069
## CV residual -0.162 -0.137 -0.0827 0.0149 0.184 -0.0313
##
## Sum of squares = 1.7 Mean square = 0.03 n = 50
##
## fold 5
## Observations in test set: 50
## 2 17 21 41 42 48 56 64
## Predicted 3.200 3.11509 2.6510 3.5726 3.389 2.8409 3.361 3.060
## cvpred 3.201 3.13459 2.6573 3.5971 3.399 2.8340 3.355 3.047
## log(medv) 3.073 3.13983 2.6101 3.5525 3.281 2.8094 3.567 3.219
## CV residual -0.128 0.00524 -0.0472 -0.0447 -0.118 -0.0246 0.211 0.172
## 78 86 92 151 159 185 188 195 204
## Predicted 3.144 3.3337 3.294 2.9942 3.381 3.007 3.4772 3.462 3.705
## cvpred 3.150 3.3426 3.303 2.9974 3.405 3.017 3.4837 3.475 3.705
## log(medv) 3.035 3.2809 3.091 3.0681 3.190 3.273 3.4657 3.371 3.882
## CV residual -0.115 -0.0617 -0.212 0.0706 -0.214 0.256 -0.0179 -0.104 0.177
## 209 212 249 250 255 259 262 264 272
## Predicted 3.050 2.774 3.046 3.2030 3.1804 3.5200 3.56 3.3899 3.2764
## cvpred 3.032 2.762 3.038 3.2004 3.1807 3.5181 3.55 3.3787 3.2905
## log(medv) 3.195 2.960 3.199 3.2658 3.0865 3.5835 3.76 3.4340 3.2268
## CV residual 0.163 0.198 0.161 0.0654 -0.0942 0.0655 0.21 0.0553 -0.0637
## 280 291 299 308 310 311 312 314
## Predicted 3.5543 3.479 3.364 3.3604 3.13 2.903 3.317 3.234
## cvpred 3.5671 3.499 3.371 3.3662 3.14 2.927 3.340 3.246
## log(medv) 3.5582 3.350 3.114 3.3393 3.01 2.779 3.096 3.073
## CV residual -0.0089 -0.149 -0.258 -0.0269 -0.13 -0.148 -0.245 -0.174
## 329 333 338 350 379 382 393 404
## Predicted 3.120 3.167 2.9653 3.2026 2.5478 2.688 2.428 2.480
## cvpred 3.126 3.172 2.9694 3.1968 2.5547 2.691 2.446 2.503
## log(medv) 2.960 2.965 2.9178 3.2809 2.5726 2.389 2.272 2.116
## CV residual -0.166 -0.207 -0.0516 0.0841 0.0179 -0.302 -0.174 -0.386
## 428 437 440 447 451 484 503 505
## Predicted 2.4232 2.53 2.5521 2.7338 2.6401 2.9784 3.119 3.295
## cvpred 2.4470 2.54 2.5632 2.7365 2.6416 2.9904 3.134 3.308
## log(medv) 2.3888 2.26 2.5494 2.7014 2.5953 3.0819 3.025 3.091
## CV residual -0.0582 -0.28 -0.0138 -0.0352 -0.0464 0.0915 -0.109 -0.217
##
## Sum of squares = 1.21 Mean square = 0.02 n = 50
##
## fold 6
## Observations in test set: 50
## 18 31 37 43 53 71 76 79
## Predicted 2.852 2.6097 3.0458 3.2747 3.341 3.2608 3.191 3.04973
## cvpred 2.853 2.6084 3.0541 3.2835 3.345 3.2652 3.195 3.05034
## log(medv) 2.862 2.5416 2.9957 3.2308 3.219 3.1864 3.063 3.05400
## CV residual 0.009 -0.0668 -0.0584 -0.0527 -0.126 -0.0788 -0.132 0.00366
## 82 84 89 114 116 123 124 142
## Predicted 3.2652 3.1983 3.464 2.9666 2.9618 2.883 2.706 2.363
## cvpred 3.2653 3.2050 3.463 2.9681 2.9657 2.885 2.701 2.343
## log(medv) 3.1739 3.1311 3.161 2.9285 2.9069 3.020 2.851 2.667
## CV residual -0.0915 -0.0739 -0.302 -0.0396 -0.0588 0.136 0.149 0.324
## 143 154 164 171 187 199 211 220 232
## Predicted 2.706 2.841 3.8784 3.025 3.615 3.4977 2.986 3.293 3.50
## cvpred 2.706 2.841 3.8644 3.030 3.604 3.4880 2.991 3.297 3.50
## log(medv) 2.595 2.965 3.9120 2.856 3.912 3.5439 3.077 3.135 3.46
## CV residual -0.111 0.125 0.0476 -0.173 0.309 0.0559 0.086 -0.161 -0.04
## 233 246 254 263 268 273 302 304 322
## Predicted 3.73953 2.683 3.477 3.710 3.665 3.2843 3.256 3.4868 3.2312
## cvpred 3.72405 2.685 3.453 3.686 3.641 3.2865 3.254 3.4862 3.2352
## log(medv) 3.73050 2.918 3.757 3.888 3.912 3.1946 3.091 3.4995 3.1398
## CV residual 0.00645 0.233 0.304 0.202 0.271 -0.0919 -0.163 0.0133 -0.0954
## 323 330 336 357 380 388 412 421 432
## Predicted 3.150 3.271 3.0622 2.7973 2.620 2.211 2.580 2.7929 2.678
## cvpred 3.158 3.275 3.0665 2.7965 2.621 2.209 2.565 2.7888 2.661
## log(medv) 3.016 3.118 3.0493 2.8792 2.322 2.001 2.845 2.8154 2.646
## CV residual -0.142 -0.157 -0.0173 0.0827 -0.298 -0.208 0.279 0.0266 -0.015
## 436 445 454 464 476 489 494
## Predicted 2.498 2.452 2.89323 3.00266 2.6355 2.7933 3.0135
## cvpred 2.477 2.448 2.87177 2.99886 2.6283 2.8031 3.0233
## log(medv) 2.595 2.380 2.87920 3.00568 2.5878 2.7213 3.0819
## CV residual 0.118 -0.068 0.00743 0.00682 -0.0406 -0.0818 0.0586
##
## Sum of squares = 1.1 Mean square = 0.02 n = 50
##
## fold 7
## Observations in test set: 49
## 12 14 25 46 67 69 83 111
## Predicted 3.0148 3.0500 2.7979 3.076 3.11 2.9068 3.2492 3.0044
## cvpred 3.0229 3.0527 2.7969 3.078 3.12 2.9105 3.2492 2.9998
## log(medv) 2.9392 3.0155 2.7473 2.960 2.97 2.8565 3.2108 3.0773
## CV residual -0.0838 -0.0371 -0.0497 -0.118 -0.15 -0.0541 -0.0384 0.0775
## 135 136 140 145 146 158 161 178 180
## Predicted 2.717 2.869 2.8273 2.5165 2.5840 3.566 3.521 3.378 3.505
## cvpred 2.718 2.863 2.8187 2.4903 2.5715 3.565 3.523 3.377 3.499
## log(medv) 2.747 2.896 2.8792 2.4681 2.6247 3.721 3.296 3.203 3.616
## CV residual 0.029 0.033 0.0605 -0.0222 0.0531 0.156 -0.227 -0.174 0.118
## 186 193 194 196 200 210 219 223 240
## Predicted 3.084 3.591248 3.4634 3.734 3.41 2.778 3.0445 3.3613 3.293
## cvpred 3.081 3.594034 3.4650 3.733 3.41 2.764 3.0396 3.3581 3.297
## log(medv) 3.388 3.594569 3.4372 3.912 3.55 2.996 3.0681 3.3142 3.148
## CV residual 0.307 0.000535 -0.0278 0.179 0.14 0.232 0.0285 -0.0439 -0.148
## 241 251 282 284 293 301 334 335
## Predicted 3.1785 3.2220 3.5398 3.8439 3.3876 3.388 3.1691 3.123
## cvpred 3.1811 3.2271 3.5403 3.8419 3.3850 3.395 3.1750 3.129
## log(medv) 3.0910 3.1946 3.5667 3.9120 3.3286 3.211 3.1001 3.030
## CV residual -0.0901 -0.0325 0.0264 0.0701 -0.0564 -0.184 -0.0749 -0.099
## 342 351 374 387 389 396 409 425 438
## Predicted 3.4463 3.1491 2.276 2.229 2.2836 2.817 2.544 2.567 2.302
## cvpred 3.4508 3.1576 2.206 2.203 2.2450 2.814 2.520 2.580 2.298
## log(medv) 3.4874 3.1311 2.625 2.351 2.3224 2.573 2.845 2.460 2.163
## CV residual 0.0366 -0.0265 0.418 0.148 0.0774 -0.242 0.325 -0.121 -0.134
## 456 457 473 475 486 493
## Predicted 2.633632 2.52372 2.947 2.6838 3.0059 2.9752
## cvpred 2.646503 2.53881 2.945 2.6820 3.0087 2.9729
## log(medv) 2.646175 2.54160 3.144 2.6247 3.0540 3.0007
## CV residual -0.000328 0.00279 0.199 -0.0573 0.0453 0.0278
##
## Sum of squares = 0.93 Mean square = 0.02 n = 49
##
## fold 8
## Observations in test set: 49
## 4 9 26 27 29 34 73 103
## Predicted 3.4680 2.586 2.7082 2.8077 2.961 2.722 3.274 2.9548
## cvpred 3.4609 2.571 2.7075 2.8105 2.974 2.716 3.264 2.9727
## log(medv) 3.5086 2.803 2.6319 2.8094 2.912 2.573 3.127 2.9232
## CV residual 0.0477 0.232 -0.0756 -0.0011 -0.062 -0.144 -0.137 -0.0495
## 113 115 128 149 169 173 174 176 189
## Predicted 2.9762 3.193 2.80125 2.55 3.187 3.03 3.307 3.45 3.5075
## cvpred 2.9666 3.192 2.79330 2.52 3.186 3.01 3.304 3.44 3.4931
## log(medv) 2.9339 2.918 2.78501 2.88 3.170 3.14 3.161 3.38 3.3945
## CV residual -0.0328 -0.274 -0.00829 0.36 -0.016 0.13 -0.142 -0.06 -0.0986
## 221 242 245 252 256 257 276 281
## Predicted 3.396 3.0529 2.8398 3.3112 3.0495 3.617 3.5576 3.697
## cvpred 3.419 3.0553 2.8441 3.3040 3.0536 3.612 3.5411 3.697
## log(medv) 3.285 3.0007 2.8679 3.2108 3.0397 3.784 3.4657 3.816
## CV residual -0.134 -0.0546 0.0238 -0.0932 -0.0138 0.172 -0.0754 0.118
## 292 305 307 325 331 343 353 364 365
## Predicted 3.501 3.40 3.4736 3.2587 3.153 3.212 2.9796 2.8684 3.587
## cvpred 3.497 3.41 3.4824 3.2582 3.155 3.220 2.9891 2.8840 3.634
## log(medv) 3.619 3.59 3.5086 3.2189 2.986 2.803 2.9232 2.8214 3.086
## CV residual 0.122 0.18 0.0262 -0.0393 -0.169 -0.417 -0.0659 -0.0626 -0.547
## 371 383 395 407 408 422 429 459 470
## Predicted 3.550 2.557 2.727 2.320 2.836 2.763 2.550 2.7213 2.771
## cvpred 3.550 2.536 2.721 2.273 2.830 2.764 2.553 2.7309 2.763
## log(medv) 3.912 2.425 2.542 2.477 3.329 2.653 2.398 2.7014 3.001
## CV residual 0.362 -0.111 -0.179 0.203 0.499 -0.111 -0.155 -0.0295 0.237
## 471 480 483 495 496 500
## Predicted 2.851 2.879 3.2609 2.989 2.831 2.9027
## cvpred 2.851 2.879 3.2705 2.986 2.821 2.8929
## log(medv) 2.991 3.063 3.2189 3.199 3.140 2.8622
## CV residual 0.139 0.184 -0.0517 0.213 0.319 -0.0307
##
## Sum of squares = 1.77 Mean square = 0.04 n = 49
##
## fold 9
## Observations in test set: 49
## 36 40 49 58 60 61 62 88
## Predicted 3.127 3.4082 2.511 3.4440 3.0133 2.8467 2.8419 3.225
## cvpred 3.106 3.3878 2.560 3.4390 3.0032 2.8458 2.8498 3.208
## log(medv) 2.939 3.4275 2.667 3.4532 2.9755 2.9285 2.7726 3.100
## CV residual -0.167 0.0397 0.107 0.0142 -0.0277 0.0827 -0.0772 -0.108
## 91 98 105 109 122 126 166 170
## Predicted 3.271 3.6596 3.0352 3.0740 2.983 2.9755 3.1663 3.197
## cvpred 3.262 3.6770 3.0270 3.0751 2.974 2.9668 3.1425 3.191
## log(medv) 3.118 3.6558 3.0007 2.9857 3.011 3.0634 3.2189 3.105
## CV residual -0.144 -0.0211 -0.0263 -0.0894 0.037 0.0966 0.0764 -0.086
## 183 184 191 208 214 224 226 227 231
## Predicted 3.538 3.4264 3.454 2.838 3.184 3.3347 3.718 3.6999 3.092
## cvpred 3.528 3.4044 3.454 2.841 3.178 3.3261 3.755 3.7058 3.081
## log(medv) 3.635 3.4812 3.611 3.114 3.336 3.4045 3.912 3.6270 3.190
## CV residual 0.107 0.0768 0.157 0.272 0.158 0.0784 0.157 -0.0788 0.109
## 238 267 279 288 290 295 349 354 358
## Predicted 3.5074 3.237 3.3310 3.2312 3.1700 3.156 3.2596 3.201 2.977
## cvpred 3.5072 3.262 3.3205 3.2238 3.1839 3.151 3.2609 3.208 2.959
## log(medv) 3.4500 3.424 3.3707 3.1442 3.2108 3.077 3.1987 3.405 3.077
## CV residual -0.0572 0.163 0.0502 -0.0797 0.0269 -0.074 -0.0623 0.196 0.119
## 366 368 370 385 386 390 410 416 423
## Predicted 2.812 2.491 3.467 2.1365 2.322 2.604 2.703 2.30 2.758
## cvpred 2.693 2.398 3.411 2.1217 2.340 2.585 2.720 2.33 2.729
## log(medv) 3.314 3.140 3.912 2.1748 1.974 2.442 3.314 1.97 3.035
## CV residual 0.621 0.741 0.501 0.0531 -0.366 -0.143 0.594 -0.36 0.306
## 426 430 434 442 463 502
## Predicted 2.337 2.481 2.6932 2.70 2.857 3.1436
## cvpred 2.333 2.495 2.6892 2.71 2.853 3.1376
## log(medv) 2.116 2.251 2.6603 2.84 2.970 3.1091
## CV residual -0.217 -0.244 -0.0289 0.13 0.117 -0.0285
##
## Sum of squares = 2.47 Mean square = 0.05 n = 49
##
## fold 10
## Observations in test set: 49
## 16 23 28 30 32 33 54 57
## Predicted 3.0359 2.7748 2.7410 3.010 2.904 2.480 3.15152 3.21344
## cvpred 3.0375 2.7776 2.7504 3.011 2.908 2.494 3.15457 3.21013
## log(medv) 2.9907 2.7213 2.6946 3.045 2.674 2.580 3.15274 3.20680
## CV residual -0.0468 -0.0563 -0.0557 0.034 -0.234 0.086 -0.00184 -0.00333
## 63 90 93 96 119 127 130 132
## Predicted 3.1611 3.464 3.279 3.3647 2.9629 2.640 2.755 2.9926
## cvpred 3.1579 3.458 3.279 3.3653 2.9744 2.650 2.761 2.9938
## log(medv) 3.1001 3.357 3.131 3.3464 3.0155 2.754 2.660 2.9755
## CV residual -0.0578 -0.101 -0.148 -0.0189 0.0411 0.103 -0.101 -0.0183
## 138 156 162 163 177 179 192 201
## Predicted 2.959 2.901 3.769 3.8835 3.1857 3.4293 3.4533 3.4261
## cvpred 2.962 2.933 3.755 3.8749 3.1913 3.4276 3.4489 3.4181
## log(medv) 2.839 2.747 3.912 3.9120 3.1442 3.3979 3.4177 3.4935
## CV residual -0.122 -0.186 0.157 0.0371 -0.0472 -0.0297 -0.0311 0.0754
## 217 218 225 229 234 237 247 260 266
## Predicted 3.1486 3.2643 3.690 3.610 3.663 3.3126 3.018 3.498 3.2136
## cvpred 3.1645 3.2663 3.676 3.600 3.649 3.3214 3.019 3.497 3.2234
## log(medv) 3.1485 3.3569 3.802 3.844 3.877 3.2229 3.190 3.405 3.1268
## CV residual -0.0161 0.0906 0.126 0.244 0.228 -0.0985 0.171 -0.093 -0.0967
## 271 286 303 316 348 356 381 384
## Predicted 2.9956 3.264 3.26913 3.014 3.199 2.9609 2.183 2.5482
## cvpred 3.0024 3.265 3.27143 3.021 3.196 2.9578 2.093 2.5477
## log(medv) 3.0493 3.091 3.27336 2.785 3.140 3.0253 2.342 2.5096
## CV residual 0.0469 -0.174 0.00193 -0.236 -0.056 0.0675 0.249 -0.0381
## 417 444 474 477 478 485 492 501
## Predicted 2.468 2.7278 3.073 2.8331 2.4524 2.878 2.851 2.975
## cvpred 2.482 2.7210 3.069 2.8311 2.4491 2.883 2.858 2.980
## log(medv) 2.015 2.7344 3.395 2.8154 2.4849 3.025 2.610 2.821
## CV residual -0.467 0.0134 0.326 -0.0157 0.0358 0.143 -0.248 -0.159
##
## Sum of squares = 1 Mean square = 0.02 n = 49
##
## Overall (Sum over all 49 folds)
## ms
## 0.0277
The statistic we take from each CVlm() result, retrieved with attr(), is the cross-validated mean squared error (MSE). It is the average squared prediction error over the ten held-out folds, where each fold is predicted by a model fitted to the other nine. Because the response is log(medv), the MSE is on the log scale. The smaller the MSE, the better we expect the model to predict data it has not seen.
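To make the statistic concrete, here is a minimal base-R sketch of how a k-fold cross-validated MSE can be computed by hand. The helper name `cv_mse` and its arguments are hypothetical, and the demonstration uses the built-in `mtcars` data rather than the Boston data; it is not the code DAAG runs internally, only an illustration of the same idea (sum of squared held-out residuals divided by n).

```r
# Hypothetical helper (not part of DAAG): k-fold cross-validated MSE for lm().
cv_mse <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)
  # Randomly assign every row to one of k folds of near-equal size
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  sq_err <- numeric(nrow(data))
  for (i in seq_len(k)) {
    test_idx <- which(folds == i)
    train <- data[-test_idx, , drop = FALSE]
    test <- data[test_idx, , drop = FALSE]
    # Fit on the other k - 1 folds, predict the held-out fold
    fit <- lm(formula, data = train)
    pred <- predict(fit, newdata = test)
    # Observed response on the model's scale (handles log() on the left side)
    obs <- model.response(model.frame(formula, data = test))
    sq_err[test_idx] <- (obs - pred)^2
  }
  # Sum of squared held-out residuals divided by n
  mean(sq_err)
}

# Illustration on built-in data (an assumption; not the Boston models above)
mse_demo <- cv_mse(log(mpg) ~ wt + hp, data = mtcars, k = 5, seed = 42)
print(mse_demo)
```

Fixing the seed keeps the fold assignment the same across candidate models, so differences in MSE reflect the models rather than the random split.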
attr(fit4_CV, 'ms')
## [1] 0.0283
attr(fit5_CV, 'ms')
## [1] 0.0276
attr(fit6_CV, 'ms')
## [1] 0.0277
The cross-validated MSE is 0.0283 for model 4, 0.0276 for model 5, and 0.0277 for model 6, so model 5 is the best model by this criterion.
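The comparison can also be done programmatically. The sketch below hard-codes the three MSE values reported above into a named vector (the names `model4`, `model5`, `model6` are labels chosen for illustration) and picks the smallest; taking the square root gives a root mean squared error in the same units as log(medv).

```r
# Cross-validated MSE values copied from the attr(..., 'ms') output above
cv_ms <- c(model4 = 0.0283, model5 = 0.0276, model6 = 0.0277)

# Name of the model with the smallest cross-validated MSE
best <- names(which.min(cv_ms))
print(best)  # "model5"

# Root mean squared error on the log(medv) scale, for easier reading
cv_rmse <- sqrt(cv_ms)
print(round(cv_rmse, 4))
```

Note that the gap between models 5 and 6 is only 0.0001, while the per-fold mean squares printed above range from about 0.02 to 0.05, so the ranking of those two models may depend on the particular random fold assignment.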
With an R-squared value of 0.8379 and a cross-validated MSE of 0.0276, model 5 is the best.