library(readr)
library(mosaic)
library(dplyr)
library(car)
library(leaps)
AmesTrain18 <- read_csv("AmesTrain18.csv")
AmesTest5 <- read_csv("AmesTest5.csv")
I cross-validated the final model from Assignment #3 (model3, fit on AmesTrain18) by using it to predict Price. Because the response was log-transformed in that model, the predictions are on the log scale, so I back-transformed them with exp(), the inverse of the natural log.
model3 = lm(log(Price) ~ Quality + FirstSF + SecondSF + BasementFinSF +
  I(2010-YearBuilt) + GarageSF + Condition + LotArea +
  I(Bedroom^4) + TotalRooms + BasementSF + OpenPorchSF + Fireplaces +
  I(2010-YearRemodel) + ScreenPorchSF + EnclosedPorchSF, data = AmesTrain18)
# Undo the log transformation of the response: exp() is the inverse of log()
AmesTrain18$predictedPrice = exp(predict(model3, AmesTrain18))
head(AmesTrain18[c("Price", "predictedPrice")])
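As a quick check of the back-transformation, the sketch below uses a few hypothetical prices (in thousands, not taken from AmesTrain18): exp() recovers the original values from their natural logs, while cubing the logs does not.

```r
# Hypothetical prices in thousands of dollars; not taken from AmesTrain18
price <- c(125, 180, 310)

# Natural log, the same transformation used in lm(log(Price) ~ ...)
logPrice <- log(price)

# exp() undoes log(): returns the original prices (up to floating-point rounding)
exp(logPrice)

# Cubing the logs gives values that are nowhere near the original prices
logPrice^3
```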
I calculated the residuals (actual price minus predicted price).
AmesTrain18$resid = AmesTrain18$Price - AmesTrain18$predictedPrice
head(AmesTrain18[c("Price", "predictedPrice", "resid")])
I calculated the mean and standard deviation of the residuals and created a histogram to see their distribution.
mean(AmesTrain18$resid)
## [1] 44.10818
sd(AmesTrain18$resid)
## [1] 47.3998
hist(AmesTrain18$resid, breaks = 20)
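# House with the largest positive residual (price most underpredicted)
AmesTrain18[which.max(AmesTrain18$resid),]
# House with the most negative residual (price most overpredicted)
AmesTrain18[which.min(AmesTrain18$resid),]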
cor(AmesTrain18$Price, AmesTrain18$predictedPrice)
## [1] 0.9473928
Lastly, I computed the cross-validation correlation between the actual and predicted prices and used it to calculate the shrinkage. The shrinkage is 0.04315, which is small: the squared correlation is only about 0.043 below the model's training R-squared, which suggests the model should still predict reasonably well on new data.
crosscorr = cor(AmesTrain18$Price, AmesTrain18$predictedPrice)
# Shrinkage = training R-squared of model3 (0.9407) minus the squared cross-validation correlation
shrinkage = 0.9407 - crosscorr^2
shrinkage
## [1] 0.0431468
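A minimal sketch of holdout cross-validation with shrinkage, assuming simulated data (the names houses, train, holdout, and size are hypothetical, not Ames variables): the model is fit on one part of the data, predictions are made on the rest and back-transformed with exp(), and shrinkage is the training R-squared minus the squared cross-validation correlation. Here the training R-squared is read from summary()$r.squared instead of being typed in by hand.

```r
# Simulated housing data; all names here are hypothetical
set.seed(1)
n <- 200
size <- runif(n, 800, 3000)
price <- exp(3 + 0.0008 * size + rnorm(n, sd = 0.15))
houses <- data.frame(size = size, price = price)

# Split into a training set and a holdout set
train <- houses[1:150, ]
holdout <- houses[151:200, ]

# Fit the log-response model on the training data only
fit <- lm(log(price) ~ size, data = train)

# Predict the holdout prices and undo the log with exp()
predHoldout <- exp(predict(fit, newdata = holdout))

# Cross-validation correlation between actual and predicted holdout prices
crosscorr <- cor(holdout$price, predHoldout)

# Shrinkage: training R-squared minus the squared cross-validation correlation
trainR2 <- summary(fit)$r.squared
shrinkage <- trainR2 - crosscorr^2
shrinkage
```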
Below is the final ("fancy") model from parts 7 and 8.
transFinal = lm(sqrt(Price) ~ LotFrontage + sqrt(LotArea) + factor(Quality) +
  factor(Condition) + YearBuilt + BasementSF + GroundSF + BasementFBath +
  Bedroom + Fireplaces + I(GarageCars^2) + WoodDeckSF + OpenPorchSF +
  EnclosedPorchSF + ScreenPorchSF + LotConfig + ExteriorQ + BasementHt +
  HeatingQC + KitchenQ + factor(Quality)*sqrt(LotArea), data = AmesTest5)
summary(transFinal)
##
## Call:
## lm(formula = sqrt(Price) ~ LotFrontage + sqrt(LotArea) + factor(Quality) +
## factor(Condition) + YearBuilt + BasementSF + GroundSF + BasementFBath +
## Bedroom + Fireplaces + I(GarageCars^2) + WoodDeckSF + OpenPorchSF +
## EnclosedPorchSF + ScreenPorchSF + LotConfig + ExteriorQ +
## BasementHt + HeatingQC + KitchenQ + factor(Quality) * sqrt(LotArea),
## data = AmesTest5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9468 -0.2988 0.0000 0.3881 1.4635
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.999e+01 7.833e+00 -3.828 0.000190 ***
## LotFrontage -2.219e-03 2.086e-03 -1.064 0.289182
## sqrt(LotArea) 3.984e-02 1.686e-02 2.363 0.019425 *
## factor(Quality)4 3.241e+00 1.763e+00 1.838 0.068018 .
## factor(Quality)5 2.988e+00 1.779e+00 1.679 0.095170 .
## factor(Quality)6 3.089e+00 1.680e+00 1.839 0.067948 .
## factor(Quality)7 3.117e+00 1.693e+00 1.841 0.067593 .
## factor(Quality)8 2.006e+00 1.845e+00 1.087 0.278702
## factor(Quality)9 1.208e+00 3.487e+00 0.346 0.729583
## factor(Quality)10 2.760e+02 1.339e+03 0.206 0.837031
## factor(Condition)3 -4.077e-01 1.145e+00 -0.356 0.722292
## factor(Condition)4 -4.616e-01 1.023e+00 -0.451 0.652522
## factor(Condition)5 5.121e-01 1.029e+00 0.498 0.619446
## factor(Condition)6 8.705e-01 1.004e+00 0.867 0.387370
## factor(Condition)7 1.487e+00 1.043e+00 1.426 0.155985
## factor(Condition)8 1.414e+00 1.050e+00 1.347 0.179890
## factor(Condition)9 2.253e+00 1.325e+00 1.700 0.091192 .
## YearBuilt 1.763e-02 3.889e-03 4.534 1.19e-05 ***
## BasementSF 1.312e-03 2.319e-04 5.659 7.63e-08 ***
## GroundSF 2.543e-03 2.432e-04 10.460 < 2e-16 ***
## BasementFBath 2.539e-01 1.107e-01 2.293 0.023278 *
## Bedroom -3.707e-01 1.044e-01 -3.552 0.000513 ***
## Fireplaces 3.319e-01 1.087e-01 3.052 0.002698 **
## I(GarageCars^2) 3.024e-02 2.839e-02 1.065 0.288492
## WoodDeckSF 5.464e-05 5.380e-04 0.102 0.919252
## OpenPorchSF 5.454e-04 9.928e-04 0.549 0.583552
## EnclosedPorchSF -7.741e-04 1.114e-03 -0.695 0.488267
## ScreenPorchSF 3.071e-03 1.155e-03 2.660 0.008687 **
## LotConfigCulDSac 1.678e-01 2.687e-01 0.625 0.533212
## LotConfigFR2 -5.506e-02 2.643e-01 -0.208 0.835287
## LotConfigFR3 -9.493e-01 7.633e-01 -1.244 0.215615
## LotConfigInside 2.042e-01 1.480e-01 1.380 0.169587
## ExteriorQFa -9.805e-01 6.673e-01 -1.469 0.143829
## ExteriorQGd -8.071e-01 4.120e-01 -1.959 0.051976 .
## ExteriorQTA -6.531e-01 4.677e-01 -1.396 0.164692
## BasementHtFa -1.499e+00 5.637e-01 -2.659 0.008704 **
## BasementHtGd -5.716e-01 2.362e-01 -2.420 0.016748 *
## BasementHtNone -1.084e+00 4.803e-01 -2.256 0.025544 *
## BasementHtTA -9.392e-01 3.177e-01 -2.957 0.003622 **
## HeatingQCFa 3.555e-01 3.343e-01 1.063 0.289380
## HeatingQCGd 6.864e-02 1.773e-01 0.387 0.699210
## HeatingQCTA 2.239e-01 1.724e-01 1.298 0.196213
## KitchenQFa -6.431e-01 6.697e-01 -0.960 0.338470
## KitchenQGd 1.467e-01 4.007e-01 0.366 0.714723
## KitchenQTA -5.296e-02 4.130e-01 -0.128 0.898125
## sqrt(LotArea):factor(Quality)4 -3.096e-02 1.784e-02 -1.736 0.084689 .
## sqrt(LotArea):factor(Quality)5 -2.887e-02 1.807e-02 -1.598 0.112203
## sqrt(LotArea):factor(Quality)6 -2.871e-02 1.718e-02 -1.671 0.096876 .
## sqrt(LotArea):factor(Quality)7 -2.575e-02 1.701e-02 -1.514 0.132208
## sqrt(LotArea):factor(Quality)8 -7.213e-03 1.876e-02 -0.384 0.701196
## sqrt(LotArea):factor(Quality)9 2.155e-03 3.349e-02 0.064 0.948774
## sqrt(LotArea):factor(Quality)10 -2.374e+00 1.154e+01 -0.206 0.837235
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6649 on 148 degrees of freedom
## Multiple R-squared: 0.9582, Adjusted R-squared: 0.9439
## F-statistic: 66.59 on 51 and 148 DF, p-value: < 2.2e-16
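In R's formula syntax, a*b expands to a + b + a:b, so the separate sqrt(LotArea) and factor(Quality) terms in transFinal are already implied by factor(Quality)*sqrt(LotArea). Listing them twice is harmless, as this sketch on the built-in mtcars data shows (the variables are illustrative only).

```r
# Main effects listed explicitly and again through the * operator
m1 <- lm(mpg ~ wt + factor(cyl) + factor(cyl) * wt, data = mtcars)

# Main effects implied only by the * operator
m2 <- lm(mpg ~ factor(cyl) * wt, data = mtcars)

# Both models have the same terms, so the fitted values match;
# only the labels of the interaction coefficients may be ordered differently
all.equal(fitted(m1), fitted(m2))
length(coef(m1)) == length(coef(m2))
```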
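Next, I plugged the characteristics of a new house into the final model to find a 95% confidence interval for the mean price and a 95% prediction interval for an individual house.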
NewHouseData = data.frame(HouseStyle = "2Story", TotalRooms = 9, YearBuilt = 1995,
  YearRemodel = 2003, LotArea = 11060, LotConfig = "Corner", LotFrontage = 90,
  Quality = 7, Condition = 5, ExteriorQ = "Gd", ExteriorC = "Gd",
  FirstSF = 1164, SecondSF = 1150, GroundSF = 2314, GarageSF = 502,
  GarageQ = "TA", GarageCars = 2, BasementFBath = 2, BasementSF = 1150,
  Bedroom = 3, Fireplaces = 1, OpenPorchSF = 274, EnclosedPorchSF = 0,
  ScreenPorchSF = 0, WoodDeckSF = 0, BasementHt = "Ex", HeatingQC = "Ex",
  KitchenQ = "Gd")
predict.lm(transFinal, NewHouseData, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 16.83001 16.12832 17.5317
predict.lm(transFinal, NewHouseData, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 16.83001 15.34042 18.31961
# Square the fit and interval endpoints to undo the sqrt() transformation of Price
16.83001^2
## [1] 283.2492
16.12832^2
## [1] 260.1227
17.5317^2
## [1] 307.3605
15.34042^2
## [1] 235.3285
18.31961^2
## [1] 335.6081
The predicted price of a house with these characteristics is $283,249.20 (Price is recorded in thousands of dollars, and each value was squared to undo the square-root transformation). We are 95% confident that the mean price of all houses with these characteristics is between $260,122.70 and $307,360.50, and 95% confident that the price of an individual house with these characteristics is between $235,328.50 and $335,608.10.
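The manual squaring above can also be done in one step, since predict() returns a matrix with fit, lwr, and upr columns. This sketch uses the built-in mtcars data and a hypothetical new car (newCar) purely for illustration.

```r
# Model on the square-root scale (mtcars used only as an illustration)
fit <- lm(sqrt(mpg) ~ wt + hp, data = mtcars)

# Hypothetical new observation
newCar <- data.frame(wt = 3, hp = 150)

# predict() returns a one-row matrix with columns fit, lwr, upr;
# squaring the matrix back-transforms all three values at once
confInt <- predict(fit, newCar, interval = "confidence", level = 0.95)^2
predInt <- predict(fit, newCar, interval = "prediction", level = 0.95)^2
confInt
predInt
```

Squaring the endpoints is valid for the prediction interval because squaring is monotone for positive values. For the confidence interval, squaring gives an interval for the square of the mean of sqrt(Price), which approximates, but is not exactly, an interval for the mean price.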