Final Project 2b - Kaggle

Join the Kaggle competition House Prices Advanced Regression Techniques and build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score. Provide a screen snapshot of your score with your name identifiable.

This is a long assignment with detailed code outputs. Please use the floating Table of Contents to follow the structure of the response, and note that commentary precedes the code and output.


Data

Our data is publicly available as the Ames Housing Data.

The target value is SalePrice. There are 79 predictors and one ID column, with 1460 rows in our train data and 1459 rows in test. The test data does not contain the target value and is ultimately what we need to predict and submit for the competition.


Read Data

We’ve uploaded the competition data to our GitHub and read it from there.

datalocation_train = 'https://raw.githubusercontent.com/pkofy/DATA605/main/Final%20Project/train.csv'
datalocation_test = 'https://raw.githubusercontent.com/pkofy/DATA605/main/Final%20Project/test.csv'
train <- read.csv(file=datalocation_train)
test <- read.csv(file=datalocation_test)
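
A quick structural check can confirm that the two files differ only by the target column. The sketch below uses two small made-up data frames (toy_train and toy_test are hypothetical stand-ins, not the competition files) to show how setdiff() lists the columns that train has and test lacks.

```r
# Toy stand-ins for the real files (values are made up for illustration).
toy_train <- data.frame(Id = 1:3,
                        LotArea = c(8450, 9600, 11250),
                        SalePrice = c(208500, 181500, 223500))
toy_test <- data.frame(Id = 4:5,
                       LotArea = c(11622, 14267))

# Columns present in train but absent from test; only the target is expected.
missing_in_test <- setdiff(names(toy_train), names(toy_test))
print(missing_in_test)

# Row and column counts, comparable to the figures quoted for the real data.
print(dim(toy_train))
print(dim(toy_test))
```

The same two calls could be run on train and test once they are loaded.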


Scale Target Variable

We’re going to scale the target variable to a 0-100 range so that it’s easier to evaluate the model iterations.

max_sp <- max(train$SalePrice)
min_sp <- min(train$SalePrice)
range <- max_sp - min_sp
train$ssp <- 100 * (train$SalePrice - min_sp) / range
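
Because the model is fitted on the scaled target, any prediction it makes is on the 0-100 scale and would presumably need converting back to dollars before submission. The sketch below shows the forward transform and its inverse on a few made-up prices; the names prices, min_p and span_p are illustrative and not taken from the code above.

```r
# Made-up prices standing in for train$SalePrice.
prices <- c(34900, 129900, 208500, 755000)

min_p <- min(prices)
span_p <- max(prices) - min_p   # named span_p to avoid masking base::range()

# Forward transform: the cheapest house maps to 0, the most expensive to 100.
scaled <- 100 * (prices - min_p) / span_p

# Inverse transform: recover dollar values from the 0-100 scale.
unscaled <- min_p + scaled * span_p / 100

print(scaled)
print(all.equal(unscaled, prices))
```

The inverse relies on the minimum and range computed from the training data, so those two values have to be kept around until prediction time.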


Commonsense Eliminations

We’re trying to figure out which predictors we can exclude before we start backwards elimination.

For the numeric predictors we’re going to eliminate any that don’t appear to have a linear relationship when we compare them to scaled sale price, ssp.

For the non-numeric predictors we’re going to eliminate any that have a lot of N/A values or that have only two unique values. A two-valued predictor can collapse to a single value once lm() drops the rows with missing data, which would break the linear regression model function.
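
One way a two-valued predictor can break lm() is sketched below on made-up data. This is an assumption about the likely cause, not a reproduction of the failure seen with the housing data: the rare value of f occurs only in a row that has a missing value elsewhere, so after lm() drops incomplete rows f has a single level left and the fit is expected to stop with an error about contrasts.

```r
# Made-up data: f has two values, but "b" appears only in the incomplete row.
toy <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, 6.0),
  x = c(1, 2, 3, 4, 5, NA),             # the NA causes row 6 to be dropped
  f = c("a", "a", "a", "a", "a", "b")
)

# Capture the error message instead of letting the script stop.
result <- tryCatch(
  lm(y ~ x + f, data = toy),
  error = function(e) conditionMessage(e)
)
print(result)
```

A predictor with many levels can run into the same issue for an individual level, but it is far more likely with only two.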


Variable Overview

str(train)
## 'data.frame':    1460 obs. of  82 variables:
##  $ Id           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ MSSubClass   : int  60 20 60 70 60 50 20 60 50 190 ...
##  $ MSZoning     : chr  "RL" "RL" "RL" "RL" ...
##  $ LotFrontage  : int  65 80 68 60 84 85 75 NA 51 50 ...
##  $ LotArea      : int  8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
##  $ Street       : chr  "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley        : chr  NA NA NA NA ...
##  $ LotShape     : chr  "Reg" "Reg" "IR1" "IR1" ...
##  $ LandContour  : chr  "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities    : chr  "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ LotConfig    : chr  "Inside" "FR2" "Inside" "Corner" ...
##  $ LandSlope    : chr  "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood : chr  "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
##  $ Condition1   : chr  "Norm" "Feedr" "Norm" "Norm" ...
##  $ Condition2   : chr  "Norm" "Norm" "Norm" "Norm" ...
##  $ BldgType     : chr  "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ HouseStyle   : chr  "2Story" "1Story" "2Story" "2Story" ...
##  $ OverallQual  : int  7 6 7 7 8 5 8 7 7 5 ...
##  $ OverallCond  : int  5 8 5 5 5 5 5 6 5 6 ...
##  $ YearBuilt    : int  2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
##  $ YearRemodAdd : int  2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
##  $ RoofStyle    : chr  "Gable" "Gable" "Gable" "Gable" ...
##  $ RoofMatl     : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior1st  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
##  $ Exterior2nd  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
##  $ MasVnrType   : chr  "BrkFace" "None" "BrkFace" "None" ...
##  $ MasVnrArea   : int  196 0 162 0 350 0 186 240 0 0 ...
##  $ ExterQual    : chr  "Gd" "TA" "Gd" "TA" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ Foundation   : chr  "PConc" "CBlock" "PConc" "BrkTil" ...
##  $ BsmtQual     : chr  "Gd" "Gd" "Gd" "TA" ...
##  $ BsmtCond     : chr  "TA" "TA" "TA" "Gd" ...
##  $ BsmtExposure : chr  "No" "Gd" "Mn" "No" ...
##  $ BsmtFinType1 : chr  "GLQ" "ALQ" "GLQ" "ALQ" ...
##  $ BsmtFinSF1   : int  706 978 486 216 655 732 1369 859 0 851 ...
##  $ BsmtFinType2 : chr  "Unf" "Unf" "Unf" "Unf" ...
##  $ BsmtFinSF2   : int  0 0 0 0 0 0 0 32 0 0 ...
##  $ BsmtUnfSF    : int  150 284 434 540 490 64 317 216 952 140 ...
##  $ TotalBsmtSF  : int  856 1262 920 756 1145 796 1686 1107 952 991 ...
##  $ Heating      : chr  "GasA" "GasA" "GasA" "GasA" ...
##  $ HeatingQC    : chr  "Ex" "Ex" "Ex" "Gd" ...
##  $ CentralAir   : chr  "Y" "Y" "Y" "Y" ...
##  $ Electrical   : chr  "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ X1stFlrSF    : int  856 1262 920 961 1145 796 1694 1107 1022 1077 ...
##  $ X2ndFlrSF    : int  854 0 866 756 1053 566 0 983 752 0 ...
##  $ LowQualFinSF : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ GrLivArea    : int  1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
##  $ BsmtFullBath : int  1 0 1 1 1 1 1 1 0 1 ...
##  $ BsmtHalfBath : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ FullBath     : int  2 2 2 1 2 1 2 2 2 1 ...
##  $ HalfBath     : int  1 0 1 0 1 1 0 1 0 0 ...
##  $ BedroomAbvGr : int  3 3 3 3 4 1 3 3 2 2 ...
##  $ KitchenAbvGr : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ KitchenQual  : chr  "Gd" "TA" "Gd" "Gd" ...
##  $ TotRmsAbvGrd : int  8 6 6 7 9 5 7 7 8 5 ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Typ" ...
##  $ Fireplaces   : int  0 1 1 1 1 0 1 2 2 2 ...
##  $ FireplaceQu  : chr  NA "TA" "TA" "Gd" ...
##  $ GarageType   : chr  "Attchd" "Attchd" "Attchd" "Detchd" ...
##  $ GarageYrBlt  : int  2003 1976 2001 1998 2000 1993 2004 1973 1931 1939 ...
##  $ GarageFinish : chr  "RFn" "RFn" "RFn" "Unf" ...
##  $ GarageCars   : int  2 2 2 3 3 2 2 2 2 1 ...
##  $ GarageArea   : int  548 460 608 642 836 480 636 484 468 205 ...
##  $ GarageQual   : chr  "TA" "TA" "TA" "TA" ...
##  $ GarageCond   : chr  "TA" "TA" "TA" "TA" ...
##  $ PavedDrive   : chr  "Y" "Y" "Y" "Y" ...
##  $ WoodDeckSF   : int  0 298 0 0 192 40 255 235 90 0 ...
##  $ OpenPorchSF  : int  61 0 42 35 84 30 57 204 0 4 ...
##  $ EnclosedPorch: int  0 0 0 272 0 0 0 228 205 0 ...
##  $ X3SsnPorch   : int  0 0 0 0 0 320 0 0 0 0 ...
##  $ ScreenPorch  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolArea     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolQC       : chr  NA NA NA NA ...
##  $ Fence        : chr  NA NA NA NA ...
##  $ MiscFeature  : chr  NA NA NA NA ...
##  $ MiscVal      : int  0 0 0 0 0 700 0 350 0 0 ...
##  $ MoSold       : int  2 5 9 2 12 10 8 11 4 1 ...
##  $ YrSold       : int  2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
##  $ SaleType     : chr  "WD" "WD" "WD" "WD" ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Abnorml" ...
##  $ SalePrice    : int  208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
##  $ ssp          : num  24.1 20.4 26.2 14.6 29.9 ...


Pairs charts

We use pairs charts to look for linear relationships between the variables. The row to read is the top one, which shows ssp on the y-axis and the predictor in that column on the x-axis. We tried setting verInd=1 to show only the first row, but we couldn’t fix the distortion: the charts were still stretched on the y-axis for the whole length of the chart.
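
An alternative to pairs() that draws only the row of interest is to loop over the predictors and plot each one against ssp in its own panel. The sketch below is one possible approach, not the method used in this report; it runs on made-up data whose column names merely mirror the real ones.

```r
set.seed(1)
# Made-up data standing in for train.
toy <- data.frame(ssp = runif(50, 0, 100))
toy$LotArea <- 5000 + 80 * toy$ssp + rnorm(50, sd = 1500)
toy$OverallQual <- pmin(10, pmax(1, round(1 + toy$ssp / 11 + rnorm(50))))
toy$MoSold <- sample(1:12, 50, replace = TRUE)

predictors <- setdiff(names(toy), "ssp")

# One panel per predictor, laid out in a single row; ssp is always the y-axis.
old_par <- par(mfrow = c(1, length(predictors)), mar = c(4, 4, 2, 1))
for (p in predictors) {
  plot(toy[[p]], toy$ssp, xlab = p, ylab = "ssp", main = p)
}
par(old_par)  # restore the previous layout settings
```

Each panel gets the full plot height, which avoids the stretching described above.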


Pairs chart 1

From this we want to eliminate MSSubClass. The risk in eliminating a variable this way is that it could have predictive value in conjunction with another variable, but we’ll set that aside for now.

pairs1 <- train[c("ssp", "MSSubClass", "LotFrontage", "LotArea", "OverallQual", "OverallCond")]
pairs(pairs1, gap=.5)


Pairs chart 2

From this set of pairs we’re going to eliminate MasVnrArea and BsmtFinSF2. The first looks like a cloud with a lot of zeros and the second looks like it has a lot of zero values.

pairs2 <- train[c("ssp", "YearBuilt", "YearRemodAdd", "MasVnrArea", "BsmtFinSF2", "BsmtUnfSF")]
pairs(pairs2, gap=.5)


Pairs chart 3

We’re going to eliminate X2ndFlrSF and LowQualFinSF because both have lots of zero values; LowQualFinSF also doesn’t look like it has a linear relationship with ssp.

pairs3 <- train[c("ssp", "TotalBsmtSF", "X1stFlrSF", "X2ndFlrSF", "LowQualFinSF", "GrLivArea")]
pairs(pairs3, gap=.5)


Pairs chart 4

We’re going to eliminate all of the variables from pairs4 because they don’t seem to have linear relationships with scaled sale price. It may be that we should treat them as qualitative variables and only exclude them from the backwards elimination step if they have too many 0 or N/A values, but we’ll set that aside for now. FullBath looks linear, but it may be a proxy for another variable such as the size of the house.

pairs4 <- train[c("ssp", "BsmtFullBath", "BsmtHalfBath", "FullBath", "HalfBath", "BedroomAbvGr")]
pairs(pairs4, gap=.5)


Pairs chart 5

We’re going to keep TotRmsAbvGrd because it seems linear and doesn’t have too many zeros, and exclude the rest. In retrospect, Fireplaces and GarageCars seem linear (if you exclude houses with 3 fireplaces and 4-car garages), but we’re okay with the judgement call we made at the time.

pairs5 <- train[c("ssp", "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces", "GarageYrBlt", "GarageCars")]
pairs(pairs5, gap=.5)


Pairs chart 6

From the sixth group we’ll keep GarageArea; the rest seem to have a lot of zero values or don’t seem linear.

pairs6 <- train[c("ssp", "GarageArea", "WoodDeckSF", "OpenPorchSF", "EnclosedPorch", "X3SsnPorch")]
pairs(pairs6, gap=.5)


Pairs chart 7

We’re not going to keep any of these predictors, either because they have too many zeros or because they aren’t linear enough. By linear we mean a line from the bottom left corner to the top right corner or, if inversely linear, a line from the top left corner to the bottom right corner.

pairs7 <- train[c("ssp", "ScreenPorch", "PoolArea", "MiscVal", "MoSold", "YrSold")]
pairs(pairs7, gap=.5)
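
The two criteria used across the pairs charts, many zero values and no visible linear trend, can also be summarised numerically as a cross-check on the visual judgement. The sketch below computes the share of zeros and the Pearson correlation with ssp for each predictor. It uses made-up data with column names borrowed from the real set, and it is offered as a complement to the charts rather than as the procedure followed here.

```r
set.seed(2)
n <- 200
# Made-up data: one roughly linear predictor, one mostly zero, one unrelated.
toy <- data.frame(ssp = runif(n, 0, 100))
toy$GrLivArea <- 800 + 20 * toy$ssp + rnorm(n, sd = 200)
toy$PoolArea <- ifelse(runif(n) < 0.97, 0, 500)
toy$MoSold <- sample(1:12, n, replace = TRUE)

predictors <- setdiff(names(toy), "ssp")

# One row per predictor: share of zero values and correlation with the target.
screen <- data.frame(
  predictor = predictors,
  zero_share = sapply(predictors, function(p) mean(toy[[p]] == 0, na.rm = TRUE)),
  cor_with_ssp = sapply(predictors, function(p)
    cor(toy[[p]], toy$ssp, use = "complete.obs")),
  row.names = NULL
)
print(screen)
```

Correlation only measures linear association, so a predictor with a strong curved relationship could still score low here.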


Non-numeric Eliminations

Now we have to evaluate the non-numeric predictors and eliminate those that have too many missing values or too few unique values. We could also look at the number of occurrences of each unique value, but we’ll leave that aside.


Too Many Missing Values

Here we check the number of missing values in the non-numeric predictors. From this we can eliminate Alley, FireplaceQu, PoolQC, Fence, MiscFeature.

not_numeric <- c("MSZoning", "Street", "Alley", "LotShape", "LandContour", "Utilities", "LotConfig", "LandSlope", "Neighborhood", "Condition1", "Condition2", "BldgType", "HouseStyle", "RoofStyle", "RoofMatl", "Exterior1st", "Exterior2nd", "MasVnrType", "ExterQual", "ExterCond", "Foundation", "BsmtQual", "BsmtCond", "BsmtExposure", "BsmtFinType1", "BsmtFinType2", "Heating", "HeatingQC", "CentralAir", "Electrical", "KitchenQual", "Functional", "FireplaceQu", "GarageType", "GarageFinish", "GarageQual", "GarageCond", "PavedDrive", "PoolQC", "Fence", "MiscFeature", "SaleType", "SaleCondition")

colSums(is.na(train[not_numeric]))
##      MSZoning        Street         Alley      LotShape   LandContour 
##             0             0          1369             0             0 
##     Utilities     LotConfig     LandSlope  Neighborhood    Condition1 
##             0             0             0             0             0 
##    Condition2      BldgType    HouseStyle     RoofStyle      RoofMatl 
##             0             0             0             0             0 
##   Exterior1st   Exterior2nd    MasVnrType     ExterQual     ExterCond 
##             0             0             8             0             0 
##    Foundation      BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1 
##             0            37            37            38            37 
##  BsmtFinType2       Heating     HeatingQC    CentralAir    Electrical 
##            38             0             0             0             1 
##   KitchenQual    Functional   FireplaceQu    GarageType  GarageFinish 
##             0             0           690            81            81 
##    GarageQual    GarageCond    PavedDrive        PoolQC         Fence 
##            81            81             0          1453          1179 
##   MiscFeature      SaleType SaleCondition 
##          1406             0             0
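
In the counts above, the five eliminated columns each have at least 690 missing values out of 1460 rows, while the next highest count is 81. The sketch below shows how the same selection could be expressed as a share of rows with a cutoff. The data is made up and the threshold value is a hypothetical choice for this example only.

```r
# Made-up data with column names borrowed from the real set.
toy <- data.frame(
  Alley = c(NA, NA, NA, "Grvl", NA),
  MSZoning = c("RL", "RL", "RM", "RL", "FV"),
  FireplaceQu = c(NA, "TA", NA, "Gd", NA),
  stringsAsFactors = FALSE
)

# Share of missing values per column.
na_share <- colMeans(is.na(toy))

threshold <- 0.4   # hypothetical cutoff, chosen only for this example
too_sparse <- names(na_share)[na_share > threshold]

print(round(na_share, 2))
print(too_sparse)
```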


Not Enough Unique Values

Here we take out three of the remaining non-numeric predictors, Street, Utilities, and CentralAir, because they only have two unique values. We only display the output for these three, but we’ve kept the code for the rest below.

length(unique(train$Street))
## [1] 2
length(unique(train$Utilities))
## [1] 2
length(unique(train$CentralAir))
## [1] 2
length(unique(train$MSZoning))
length(unique(train$LotShape))
length(unique(train$LandContour))
length(unique(train$LotConfig))
length(unique(train$LandSlope))
length(unique(train$Neighborhood))
length(unique(train$Condition1))
length(unique(train$Condition2))
length(unique(train$BldgType))
length(unique(train$HouseStyle))
length(unique(train$RoofStyle))
length(unique(train$RoofMatl))
length(unique(train$Exterior1st))
length(unique(train$Exterior2nd))
length(unique(train$MasVnrType))
length(unique(train$ExterQual))
length(unique(train$ExterCond))
length(unique(train$Foundation))
length(unique(train$BsmtQual))
length(unique(train$BsmtCond))
length(unique(train$BsmtExposure))
length(unique(train$BsmtFinType1))
length(unique(train$BsmtFinType2))
length(unique(train$Heating))
length(unique(train$HeatingQC))
length(unique(train$Electrical))
length(unique(train$KitchenQual))
length(unique(train$Functional))
length(unique(train$GarageType))
length(unique(train$GarageFinish))
length(unique(train$GarageQual))
length(unique(train$GarageCond))
length(unique(train$PavedDrive))
length(unique(train$SaleType))
length(unique(train$SaleCondition))
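
The repeated calls above could be condensed with sapply(), which applies the same count to every column at once. The sketch below uses made-up data. Note that unique() counts NA as a value of its own, so the second line shows a variant that excludes missing values first; which of the two is wanted depends on how missing values are treated later.

```r
# Made-up data with column names borrowed from the real set.
toy <- data.frame(
  Street = c("Pave", "Pave", "Grvl", "Pave"),
  MSZoning = c("RL", "RM", "FV", "RL"),
  BsmtQual = c("Gd", NA, "TA", "Gd"),
  stringsAsFactors = FALSE
)
cols <- c("Street", "MSZoning", "BsmtQual")

# Number of distinct values per column, with NA counted as a value.
n_unique <- sapply(toy[cols], function(x) length(unique(x)))

# Same count with missing values excluded.
n_unique_no_na <- sapply(toy[cols], function(x) length(unique(x[!is.na(x)])))

print(n_unique)
print(n_unique_no_na)
print(names(n_unique)[n_unique <= 2])  # candidates with at most two values
```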


Start Simplifying

Here we perform Backwards Elimination using the variables remaining after our commonsense eliminations.


Batch Eliminations

In Backwards Elimination you remove one predictor at a time and reevaluate your model between each elimination; however, we’ll batch some of the earlier eliminations for efficiency.
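
A single step of the one-at-a-time procedure can be sketched with drop1() from base R, which reports one F test per term, so a multi-level categorical predictor gets a single p-value rather than one per level. The example below uses the built-in mtcars data as a stand-in; it illustrates the general technique and is not the code used for the models in this report.

```r
# mtcars ships with R and stands in for the housing data here.
toy_cars <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt + hp + qsec + cyl, data = toy_cars)

# One row per term; the factor cyl gets a single F test across its levels.
tab <- drop1(fit, test = "F")
print(tab)

# Term with the largest p-value (the "<none>" row is NA and is skipped).
worst <- rownames(tab)[which.max(tab[["Pr(>F)"]])]

# Refit without that term; the inspection would then repeat on the new model.
fit_next <- update(fit, as.formula(paste(". ~ . -", worst)))
print(formula(fit_next))
```

Batching, as done here, amounts to removing several of the weakest terms from one such table before refitting.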


Model 1

After running our first model, we can remove the following predictors in our first batch elimination because the p-values of their coefficients are high. We’re not setting a threshold for p in advance to make our eliminations.

  • YearRemodAdd
  • TotRmsAbvGrd
  • LotShape
  • RoofStyle
  • MasVnrType
  • ExterCond
  • BsmtCond
  • Heating
  • HeatingQC
  • Electrical
  • Functional
  • GarageFinish
  • PavedDrive
  • SaleType
  • SaleCondition

lm1 <- lm(ssp ~ LotFrontage + LotArea + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + TotRmsAbvGrd + GarageArea + MSZoning + LotShape + LandContour + LotConfig + LandSlope + Neighborhood + Condition1 + Condition2 + BldgType + HouseStyle + RoofStyle + RoofMatl + Exterior1st + Exterior2nd + MasVnrType + ExterQual + ExterCond + Foundation + BsmtQual + BsmtCond + BsmtExposure + BsmtFinType1 + BsmtFinType2 + Heating + HeatingQC + Electrical + KitchenQual + Functional + GarageType + GarageFinish + GarageQual + GarageCond + PavedDrive + SaleType + SaleCondition, data=train)
summary(lm1)
## 
## Call:
## lm(formula = ssp ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + 
##     GrLivArea + TotRmsAbvGrd + GarageArea + MSZoning + LotShape + 
##     LandContour + LotConfig + LandSlope + Neighborhood + Condition1 + 
##     Condition2 + BldgType + HouseStyle + RoofStyle + RoofMatl + 
##     Exterior1st + Exterior2nd + MasVnrType + ExterQual + ExterCond + 
##     Foundation + BsmtQual + BsmtCond + BsmtExposure + BsmtFinType1 + 
##     BsmtFinType2 + Heating + HeatingQC + Electrical + KitchenQual + 
##     Functional + GarageType + GarageFinish + GarageQual + GarageCond + 
##     PavedDrive + SaleType + SaleCondition, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -26.1684  -1.3217   0.0134   1.3400  26.1684 
## 
## Coefficients: (2 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -2.464e+02  3.247e+01  -7.589 8.04e-14 ***
## LotFrontage           1.324e-02  7.930e-03   1.670 0.095276 .  
## LotArea               1.169e-04  2.179e-05   5.365 1.03e-07 ***
## OverallQual           1.060e+00  1.831e-01   5.792 9.61e-09 ***
## OverallCond           9.498e-01  1.564e-01   6.071 1.87e-09 ***
## YearBuilt             5.139e-02  1.323e-02   3.883 0.000111 ***
## YearRemodAdd          1.289e-02  9.955e-03   1.295 0.195670    
## BsmtUnfSF            -2.826e-03  4.554e-04  -6.205 8.32e-10 ***
## TotalBsmtSF           6.789e-03  9.644e-04   7.039 3.83e-12 ***
## X1stFlrSF            -4.519e-03  1.321e-03  -3.422 0.000650 ***
## GrLivArea             1.100e-02  9.087e-04  12.107  < 2e-16 ***
## TotRmsAbvGrd         -4.350e-02  1.509e-01  -0.288 0.773217    
## GarageArea            3.269e-03  9.230e-04   3.542 0.000418 ***
## MSZoningFV            6.449e+00  1.987e+00   3.246 0.001212 ** 
## MSZoningRH            4.881e+00  2.128e+00   2.293 0.022071 *  
## MSZoningRL            4.756e+00  1.744e+00   2.728 0.006500 ** 
## MSZoningRM            4.216e+00  1.617e+00   2.607 0.009276 ** 
## LotShapeIR2           8.978e-01  8.174e-01   1.098 0.272322    
## LotShapeIR3           1.077e+00  1.819e+00   0.592 0.553784    
## LotShapeReg           3.086e-01  2.945e-01   1.048 0.295049    
## LandContourHLS        1.314e+00  9.216e-01   1.426 0.154189    
## LandContourLow       -3.208e+00  1.383e+00  -2.320 0.020574 *  
## LandContourLvl        8.046e-01  6.921e-01   1.163 0.245324    
## LotConfigCulDSac      1.992e+00  7.247e-01   2.749 0.006091 ** 
## LotConfigFR2         -1.594e+00  7.654e-01  -2.082 0.037582 *  
## LotConfigFR3         -2.188e+00  1.914e+00  -1.143 0.253192    
## LotConfigInside      -5.342e-02  3.244e-01  -0.165 0.869267    
## LandSlopeMod          9.145e-01  7.265e-01   1.259 0.208408    
## LandSlopeSev         -6.162e+00  2.283e+00  -2.700 0.007074 ** 
## NeighborhoodBlueste   4.405e-01  2.957e+00   0.149 0.881587    
## NeighborhoodBrDale    1.324e+00  1.790e+00   0.740 0.459515    
## NeighborhoodBrkSide   1.231e-01  1.636e+00   0.075 0.940048    
## NeighborhoodClearCr  -1.907e+00  1.712e+00  -1.114 0.265745    
## NeighborhoodCollgCr  -1.386e+00  1.179e+00  -1.175 0.240124    
## NeighborhoodCrawfor   2.065e+00  1.433e+00   1.441 0.149857    
## NeighborhoodEdwards  -2.809e+00  1.321e+00  -2.127 0.033655 *  
## NeighborhoodGilbert  -1.309e+00  1.291e+00  -1.014 0.310779    
## NeighborhoodIDOTRR    4.482e-02  1.854e+00   0.024 0.980717    
## NeighborhoodMeadowV  -7.656e-01  1.899e+00  -0.403 0.686960    
## NeighborhoodMitchel  -2.367e+00  1.387e+00  -1.706 0.088303 .  
## NeighborhoodNAmes    -2.116e+00  1.278e+00  -1.656 0.098112 .  
## NeighborhoodNoRidge   4.103e+00  1.358e+00   3.021 0.002591 ** 
## NeighborhoodNPkVill   1.214e+00  2.786e+00   0.436 0.663218    
## NeighborhoodNridgHt   2.634e+00  1.195e+00   2.204 0.027764 *  
## NeighborhoodNWAmes   -2.323e+00  1.350e+00  -1.720 0.085764 .  
## NeighborhoodOldTown  -1.090e+00  1.635e+00  -0.667 0.505243    
## NeighborhoodSawyer   -9.059e-01  1.364e+00  -0.664 0.506723    
## NeighborhoodSawyerW  -2.794e-01  1.298e+00  -0.215 0.829670    
## NeighborhoodSomerst  -3.999e-01  1.428e+00  -0.280 0.779528    
## NeighborhoodStoneBr   6.329e+00  1.372e+00   4.614 4.52e-06 ***
## NeighborhoodSWISU    -6.314e-01  1.622e+00  -0.389 0.697206    
## NeighborhoodTimber   -1.828e+00  1.335e+00  -1.369 0.171185    
## NeighborhoodVeenker   1.133e+00  1.876e+00   0.604 0.546017    
## Condition1Feedr       3.456e-01  8.844e-01   0.391 0.696055    
## Condition1Norm        1.786e+00  6.961e-01   2.566 0.010443 *  
## Condition1PosA        9.717e-01  2.396e+00   0.406 0.685190    
## Condition1PosN        3.099e-01  1.615e+00   0.192 0.847852    
## Condition1RRAe       -1.820e+00  1.644e+00  -1.107 0.268530    
## Condition1RRAn        1.251e+00  1.127e+00   1.109 0.267610    
## Condition1RRNe        1.514e+00  3.595e+00   0.421 0.673823    
## Condition1RRNn       -3.425e-01  2.293e+00  -0.149 0.881274    
## Condition2Feedr       4.975e-01  3.767e+00   0.132 0.894971    
## Condition2Norm        1.367e+00  3.231e+00   0.423 0.672175    
## Condition2PosA        7.034e+00  6.371e+00   1.104 0.269800    
## Condition2PosN       -3.095e+01  4.424e+00  -6.996 5.14e-12 ***
## Condition2RRNn        2.767e+00  4.222e+00   0.655 0.512387    
## BldgType2fmCon       -2.009e+00  1.041e+00  -1.930 0.053880 .  
## BldgTypeDuplex       -4.347e+00  9.715e-01  -4.474 8.64e-06 ***
## BldgTypeTwnhs        -3.283e+00  9.503e-01  -3.454 0.000578 ***
## BldgTypeTwnhsE       -2.432e+00  6.501e-01  -3.741 0.000195 ***
## HouseStyle1.5Unf      2.319e+00  1.475e+00   1.573 0.116102    
## HouseStyle1Story      2.001e+00  6.921e-01   2.891 0.003934 ** 
## HouseStyle2.5Fin     -6.240e+00  1.932e+00  -3.231 0.001279 ** 
## HouseStyle2.5Unf     -2.870e+00  1.641e+00  -1.749 0.080557 .  
## HouseStyle2Story     -7.926e-01  5.716e-01  -1.386 0.165951    
## HouseStyleSFoyer      1.248e+00  1.102e+00   1.132 0.257933    
## HouseStyleSLvl        1.049e+00  9.098e-01   1.153 0.249392    
## RoofStyleGable        4.587e+00  4.285e+00   1.071 0.284593    
## RoofStyleGambrel      4.729e+00  4.498e+00   1.052 0.293310    
## RoofStyleHip          4.490e+00  4.291e+00   1.046 0.295621    
## RoofStyleMansard      7.219e+00  4.795e+00   1.506 0.132515    
## RoofMatlCompShg       9.547e+01  5.180e+00  18.432  < 2e-16 ***
## RoofMatlMembran       1.130e+02  8.243e+00  13.706  < 2e-16 ***
## RoofMatlRoll          9.720e+01  6.489e+00  14.978  < 2e-16 ***
## RoofMatlTar&Grv       9.638e+01  6.389e+00  15.086  < 2e-16 ***
## RoofMatlWdShake       9.253e+01  6.432e+00  14.387  < 2e-16 ***
## RoofMatlWdShngl       1.034e+02  5.365e+00  19.266  < 2e-16 ***
## Exterior1stBrkComm   -7.932e+00  5.570e+00  -1.424 0.154801    
## Exterior1stBrkFace    4.660e-01  2.178e+00   0.214 0.830612    
## Exterior1stCBlock     3.672e-02  4.348e+00   0.008 0.993263    
## Exterior1stCemntBd   -2.684e+00  3.511e+00  -0.765 0.444754    
## Exterior1stHdBoard   -2.654e+00  2.202e+00  -1.205 0.228489    
## Exterior1stImStucc   -8.850e+00  4.267e+00  -2.074 0.038364 *  
## Exterior1stMetalSd   -2.375e-01  2.543e+00  -0.093 0.925586    
## Exterior1stPlywood   -2.883e+00  2.194e+00  -1.314 0.189053    
## Exterior1stStone     -2.088e+00  5.781e+00  -0.361 0.718061    
## Exterior1stStucco    -1.846e+00  2.430e+00  -0.760 0.447731    
## Exterior1stVinylSd   -2.280e+00  2.214e+00  -1.030 0.303230    
## Exterior1stWd Sdng   -1.724e+00  2.116e+00  -0.815 0.415539    
## Exterior1stWdShing   -1.467e+00  2.257e+00  -0.650 0.515871    
## Exterior2ndAsphShn    2.015e+00  3.662e+00   0.550 0.582333    
## Exterior2ndBrk Cmn    3.872e+00  3.647e+00   1.061 0.288756    
## Exterior2ndBrkFace    1.186e+00  2.265e+00   0.524 0.600594    
## Exterior2ndCBlock            NA         NA      NA       NA    
## Exterior2ndCmentBd    3.092e+00  3.445e+00   0.898 0.369654    
## Exterior2ndHdBoard    2.270e+00  2.136e+00   1.063 0.288172    
## Exterior2ndImStucc    6.082e+00  2.366e+00   2.571 0.010310 *  
## Exterior2ndMetalSd    9.759e-01  2.475e+00   0.394 0.693509    
## Exterior2ndOther     -2.023e+00  4.215e+00  -0.480 0.631350    
## Exterior2ndPlywood    2.157e+00  2.063e+00   1.046 0.296056    
## Exterior2ndStone      6.028e-01  4.231e+00   0.142 0.886740    
## Exterior2ndStucco     2.186e+00  2.349e+00   0.930 0.352393    
## Exterior2ndVinylSd    2.566e+00  2.135e+00   1.202 0.229715    
## Exterior2ndWd Sdng    2.252e+00  2.028e+00   1.110 0.267266    
## Exterior2ndWd Shng    1.683e+00  2.109e+00   0.798 0.425012    
## MasVnrTypeBrkFace     1.228e+00  1.296e+00   0.947 0.343711    
## MasVnrTypeNone        1.098e+00  1.279e+00   0.859 0.390785    
## MasVnrTypeStone       2.145e+00  1.334e+00   1.607 0.108319    
## ExterQualFa          -3.481e+00  2.106e+00  -1.653 0.098720 .  
## ExterQualGd          -3.529e+00  7.840e-01  -4.501 7.65e-06 ***
## ExterQualTA          -3.266e+00  8.900e-01  -3.670 0.000257 ***
## ExterCondFa           7.266e-01  3.982e+00   0.183 0.855229    
## ExterCondGd          -5.676e-01  3.816e+00  -0.149 0.881779    
## ExterCondTA          -2.909e-02  3.811e+00  -0.008 0.993910    
## FoundationCBlock      7.934e-01  5.612e-01   1.414 0.157752    
## FoundationPConc       6.422e-01  6.046e-01   1.062 0.288445    
## FoundationStone       5.987e-01  1.723e+00   0.347 0.728351    
## FoundationWood       -5.060e+00  2.743e+00  -1.845 0.065377 .  
## BsmtQualFa           -1.424e+00  1.035e+00  -1.376 0.169073    
## BsmtQualGd           -2.479e+00  5.340e-01  -4.643 3.95e-06 ***
## BsmtQualTA           -2.432e+00  6.869e-01  -3.541 0.000419 ***
## BsmtCondGd           -2.889e-01  9.055e-01  -0.319 0.749741    
## BsmtCondPo            5.890e+00  5.996e+00   0.982 0.326253    
## BsmtCondTA            2.596e-02  7.288e-01   0.036 0.971596    
## BsmtExposureGd        1.897e+00  5.253e-01   3.611 0.000321 ***
## BsmtExposureMn       -8.482e-01  5.157e-01  -1.645 0.100375    
## BsmtExposureNo       -1.053e+00  3.698e-01  -2.848 0.004496 ** 
## BsmtFinType1BLQ       8.549e-02  4.970e-01   0.172 0.863459    
## BsmtFinType1GLQ       7.884e-01  4.413e-01   1.787 0.074351 .  
## BsmtFinType1LwQ      -8.771e-01  6.505e-01  -1.348 0.177934    
## BsmtFinType1Rec      -1.649e-01  5.108e-01  -0.323 0.746982    
## BsmtFinType1Unf       2.494e-01  5.063e-01   0.493 0.622435    
## BsmtFinType2BLQ      -1.818e-01  1.334e+00  -0.136 0.891607    
## BsmtFinType2GLQ       1.069e-01  1.631e+00   0.066 0.947769    
## BsmtFinType2LwQ      -3.113e-01  1.295e+00  -0.240 0.810120    
## BsmtFinType2Rec      -4.743e-01  1.264e+00  -0.375 0.707562    
## BsmtFinType2Unf       5.369e-01  1.130e+00   0.475 0.634658    
## HeatingGasW          -5.862e-01  1.158e+00  -0.506 0.612922    
## HeatingGrav           2.426e+00  3.430e+00   0.707 0.479459    
## HeatingOthW          -5.547e-01  4.170e+00  -0.133 0.894210    
## HeatingQCFa           2.954e-01  8.646e-01   0.342 0.732715    
## HeatingQCGd          -4.381e-01  3.632e-01  -1.206 0.228072    
## HeatingQCPo           5.401e-02  4.215e+00   0.013 0.989779    
## HeatingQCTA          -3.809e-01  3.674e-01  -1.037 0.300162    
## ElectricalFuseF      -4.577e-01  1.173e+00  -0.390 0.696483    
## ElectricalFuseP      -1.970e-01  3.744e+00  -0.053 0.958041    
## ElectricalMix                NA         NA      NA       NA    
## ElectricalSBrkr       3.245e-01  5.238e-01   0.619 0.535779    
## KitchenQualFa        -2.778e+00  1.122e+00  -2.475 0.013489 *  
## KitchenQualGd        -3.364e+00  5.699e-01  -5.903 5.05e-09 ***
## KitchenQualTA        -3.071e+00  6.661e-01  -4.611 4.58e-06 ***
## FunctionalMaj2       -2.709e+00  2.561e+00  -1.058 0.290504    
## FunctionalMin1        5.606e-01  1.482e+00   0.378 0.705220    
## FunctionalMin2        1.088e-01  1.451e+00   0.075 0.940276    
## FunctionalMod        -8.693e-01  1.875e+00  -0.464 0.643042    
## FunctionalTyp         1.754e+00  1.254e+00   1.399 0.162172    
## GarageTypeAttchd      3.044e+00  1.802e+00   1.689 0.091639 .  
## GarageTypeBasment     3.123e+00  2.120e+00   1.473 0.141103    
## GarageTypeBuiltIn     2.497e+00  1.900e+00   1.314 0.189078    
## GarageTypeCarPort     5.387e+00  2.552e+00   2.111 0.035040 *  
## GarageTypeDetchd      3.627e+00  1.796e+00   2.020 0.043702 *  
## GarageFinishRFn      -3.433e-01  3.387e-01  -1.014 0.310978    
## GarageFinishUnf      -2.593e-01  4.242e-01  -0.611 0.541082    
## GarageQualFa         -1.360e+01  4.583e+00  -2.967 0.003091 ** 
## GarageQualGd         -1.190e+01  4.741e+00  -2.511 0.012216 *  
## GarageQualPo         -1.776e+01  6.453e+00  -2.752 0.006045 ** 
## GarageQualTA         -1.304e+01  4.530e+00  -2.879 0.004085 ** 
## GarageCondFa          1.299e+01  5.234e+00   2.482 0.013251 *  
## GarageCondGd          1.292e+01  5.567e+00   2.320 0.020567 *  
## GarageCondPo          1.309e+01  5.742e+00   2.279 0.022881 *  
## GarageCondTA          1.338e+01  5.169e+00   2.589 0.009785 ** 
## PavedDriveP          -5.227e-01  1.028e+00  -0.509 0.611142    
## PavedDriveY          -2.432e-02  6.949e-01  -0.035 0.972083    
## SaleTypeCon           3.046e+00  2.734e+00   1.114 0.265397    
## SaleTypeConLD         2.356e+00  2.057e+00   1.146 0.252271    
## SaleTypeConLI         5.484e-02  2.303e+00   0.024 0.981010    
## SaleTypeConLw         7.081e-01  2.058e+00   0.344 0.730854    
## SaleTypeCWD           1.139e+00  1.998e+00   0.570 0.568707    
## SaleTypeNew           1.154e+00  2.539e+00   0.455 0.649541    
## SaleTypeOth           3.038e+00  3.558e+00   0.854 0.393421    
## SaleTypeWD           -1.985e-01  7.402e-01  -0.268 0.788589    
## SaleConditionAdjLand  4.699e+00  3.870e+00   1.214 0.224932    
## SaleConditionAlloca   9.881e-01  1.775e+00   0.557 0.577802    
## SaleConditionFamily  -7.086e-01  1.005e+00  -0.705 0.480904    
## SaleConditionNormal   5.878e-01  5.182e-01   1.134 0.256994    
## SaleConditionPartial  1.210e+00  2.447e+00   0.494 0.621135    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.417 on 900 degrees of freedom
##   (366 observations deleted due to missingness)
## Multiple R-squared:  0.9279, Adjusted R-squared:  0.9125 
## F-statistic: 60.04 on 193 and 900 DF,  p-value: < 2.2e-16
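
The summary notes that 366 observations were deleted due to missingness, which means the model was fitted on 1094 of the 1460 rows. By default lm() drops every row that has a missing value in any variable used by the formula. The sketch below shows on made-up data how to count the rows a fit actually used.

```r
# Made-up data with one missing value in each predictor, in different rows.
toy <- data.frame(
  y = c(1.0, 2.2, 2.9, 4.1, 5.0, 6.2, 6.9, 8.1),
  a = c(1, 2, 3, NA, 5, 6, 7, 8),
  b = c(2, 1, NA, 4, 3, 6, 5, 8)
)

fit <- lm(y ~ a + b, data = toy)

rows_total <- nrow(toy)
rows_used <- nobs(fit)   # rows that survived the default NA handling
rows_complete <- sum(complete.cases(toy[c("y", "a", "b")]))

print(c(total = rows_total, used = rows_used, dropped = rows_total - rows_used))
```

Removing a predictor that contains missing values can therefore change the number of rows used from one model to the next.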


Model 2

From model 2 below we identify five more predictors to batch eliminate based on p-values that are too high.

  • Exterior1st
  • Exterior2nd
  • Foundation
  • BsmtFinType1
  • BsmtFinType2

lm2 <- update(lm1, .~. -YearRemodAdd -TotRmsAbvGrd -LotShape -RoofStyle -MasVnrType -ExterCond -BsmtCond -Heating -HeatingQC -Electrical -Functional -GarageFinish -PavedDrive -SaleType -SaleCondition, data=train)
summary(lm2)
## 
## Call:
## lm(formula = ssp ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + 
##     GarageArea + MSZoning + LandContour + LotConfig + LandSlope + 
##     Neighborhood + Condition1 + Condition2 + BldgType + HouseStyle + 
##     RoofMatl + Exterior1st + Exterior2nd + ExterQual + Foundation + 
##     BsmtQual + BsmtExposure + BsmtFinType1 + BsmtFinType2 + KitchenQual + 
##     GarageType + GarageQual + GarageCond, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.395  -1.380   0.058   1.368  25.395 
## 
## Coefficients: (1 not defined because of singularities)
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.318e+02  2.617e+01  -8.860  < 2e-16 ***
## LotFrontage          1.296e-02  7.716e-03   1.680 0.093269 .  
## LotArea              1.193e-04  1.971e-05   6.052 2.05e-09 ***
## OverallQual          1.044e+00  1.760e-01   5.932 4.17e-09 ***
## OverallCond          1.052e+00  1.321e-01   7.964 4.70e-15 ***
## YearBuilt            6.308e-02  1.225e-02   5.148 3.20e-07 ***
## BsmtUnfSF           -2.890e-03  4.430e-04  -6.524 1.11e-10 ***
## TotalBsmtSF          6.868e-03  9.136e-04   7.518 1.28e-13 ***
## X1stFlrSF           -4.457e-03  1.242e-03  -3.588 0.000351 ***
## GrLivArea            1.050e-02  8.303e-04  12.651  < 2e-16 ***
## GarageArea           3.720e-03  8.963e-04   4.150 3.61e-05 ***
## MSZoningFV           5.693e+00  1.903e+00   2.991 0.002851 ** 
## MSZoningRH           4.887e+00  2.077e+00   2.353 0.018849 *  
## MSZoningRL           4.671e+00  1.670e+00   2.797 0.005256 ** 
## MSZoningRM           4.100e+00  1.534e+00   2.672 0.007664 ** 
## LandContourHLS       1.320e+00  8.948e-01   1.475 0.140487    
## LandContourLow      -3.149e+00  1.336e+00  -2.358 0.018569 *  
## LandContourLvl       7.412e-01  6.618e-01   1.120 0.263032    
## LotConfigCulDSac     1.910e+00  6.924e-01   2.758 0.005931 ** 
## LotConfigFR2        -1.627e+00  7.517e-01  -2.165 0.030637 *  
## LotConfigFR3        -2.463e+00  1.912e+00  -1.288 0.198033    
## LotConfigInside     -3.349e-02  3.166e-01  -0.106 0.915771    
## LandSlopeMod         5.300e-01  7.046e-01   0.752 0.452087    
## LandSlopeSev        -6.814e+00  2.143e+00  -3.180 0.001520 ** 
## NeighborhoodBlueste  1.322e-01  2.907e+00   0.045 0.963748    
## NeighborhoodBrDale   5.991e-01  1.743e+00   0.344 0.731075    
## NeighborhoodBrkSide -6.242e-01  1.551e+00  -0.402 0.687420    
## NeighborhoodClearCr -2.809e+00  1.660e+00  -1.693 0.090841 .  
## NeighborhoodCollgCr -2.108e+00  1.122e+00  -1.878 0.060688 .  
## NeighborhoodCrawfor  1.498e+00  1.373e+00   1.092 0.275318    
## NeighborhoodEdwards -3.437e+00  1.261e+00  -2.726 0.006536 ** 
## NeighborhoodGilbert -1.909e+00  1.233e+00  -1.548 0.122032    
## NeighborhoodIDOTRR  -7.177e-01  1.754e+00  -0.409 0.682541    
## NeighborhoodMeadowV -1.846e+00  1.849e+00  -0.998 0.318426    
## NeighborhoodMitchel -3.482e+00  1.336e+00  -2.607 0.009282 ** 
## NeighborhoodNAmes   -3.067e+00  1.224e+00  -2.506 0.012381 *  
## NeighborhoodNoRidge  3.290e+00  1.305e+00   2.521 0.011849 *  
## NeighborhoodNPkVill  3.613e-01  2.740e+00   0.132 0.895146    
## NeighborhoodNridgHt  2.510e+00  1.136e+00   2.211 0.027301 *  
## NeighborhoodNWAmes  -3.264e+00  1.298e+00  -2.515 0.012069 *  
## NeighborhoodOldTown -2.010e+00  1.549e+00  -1.298 0.194736    
## NeighborhoodSawyer  -1.913e+00  1.311e+00  -1.459 0.144942    
## NeighborhoodSawyerW -1.168e+00  1.237e+00  -0.945 0.345094    
## NeighborhoodSomerst  8.796e-02  1.350e+00   0.065 0.948077    
## NeighborhoodStoneBr  5.845e+00  1.333e+00   4.386 1.28e-05 ***
## NeighborhoodSWISU   -1.352e+00  1.553e+00  -0.870 0.384506    
## NeighborhoodTimber  -2.443e+00  1.298e+00  -1.882 0.060100 .  
## NeighborhoodVeenker  4.908e-01  1.802e+00   0.272 0.785362    
## Condition1Feedr      3.332e-01  8.636e-01   0.386 0.699742    
## Condition1Norm       1.683e+00  6.678e-01   2.521 0.011869 *  
## Condition1PosA       5.758e-01  2.297e+00   0.251 0.802111    
## Condition1PosN       5.628e-01  1.583e+00   0.355 0.722301    
## Condition1RRAe      -2.016e+00  1.494e+00  -1.350 0.177481    
## Condition1RRAn       1.326e+00  1.086e+00   1.221 0.222318    
## Condition1RRNe       1.325e+00  3.599e+00   0.368 0.712904    
## Condition1RRNn       5.144e-03  2.245e+00   0.002 0.998173    
## Condition2Feedr      8.947e-01  3.680e+00   0.243 0.807977    
## Condition2Norm       2.018e+00  3.148e+00   0.641 0.521519    
## Condition2PosA       7.881e+00  5.006e+00   1.574 0.115775    
## Condition2PosN      -2.969e+01  4.323e+00  -6.867 1.18e-11 ***
## Condition2RRNn       3.297e+00  4.167e+00   0.791 0.428966    
## BldgType2fmCon      -1.505e+00  9.432e-01  -1.596 0.110913    
## BldgTypeDuplex      -4.325e+00  8.965e-01  -4.824 1.64e-06 ***
## BldgTypeTwnhs       -3.419e+00  8.996e-01  -3.800 0.000154 ***
## BldgTypeTwnhsE      -2.521e+00  6.115e-01  -4.123 4.06e-05 ***
## HouseStyle1.5Unf     2.446e+00  1.334e+00   1.834 0.066952 .  
## HouseStyle1Story     2.023e+00  6.655e-01   3.040 0.002433 ** 
## HouseStyle2.5Fin    -5.366e+00  1.890e+00  -2.839 0.004626 ** 
## HouseStyle2.5Unf    -2.331e+00  1.456e+00  -1.601 0.109747    
## HouseStyle2Story    -6.128e-01  5.476e-01  -1.119 0.263369    
## HouseStyleSFoyer     1.104e+00  1.074e+00   1.028 0.304368    
## HouseStyleSLvl       8.281e-01  8.721e-01   0.950 0.342596    
## RoofMatlCompShg      9.225e+01  4.900e+00  18.828  < 2e-16 ***
## RoofMatlMembran      1.041e+02  6.741e+00  15.441  < 2e-16 ***
## RoofMatlRoll         9.474e+01  6.235e+00  15.195  < 2e-16 ***
## RoofMatlTar&Grv      8.969e+01  5.223e+00  17.173  < 2e-16 ***
## RoofMatlWdShake      9.236e+01  5.795e+00  15.937  < 2e-16 ***
## RoofMatlWdShngl      9.975e+01  5.112e+00  19.511  < 2e-16 ***
## Exterior1stBrkComm  -1.038e+01  5.061e+00  -2.051 0.040540 *  
## Exterior1stBrkFace   4.492e-01  2.119e+00   0.212 0.832194    
## Exterior1stCBlock    1.625e+00  4.271e+00   0.381 0.703632    
## Exterior1stCemntBd  -2.250e+00  3.396e+00  -0.662 0.507839    
## Exterior1stHdBoard  -2.161e+00  2.133e+00  -1.013 0.311358    
## Exterior1stImStucc  -7.692e+00  4.233e+00  -1.817 0.069524 .  
## Exterior1stMetalSd  -7.480e-01  2.454e+00  -0.305 0.760613    
## Exterior1stPlywood  -2.369e+00  2.124e+00  -1.115 0.265057    
## Exterior1stStone    -1.364e+00  5.686e+00  -0.240 0.810476    
## Exterior1stStucco   -1.177e+00  2.329e+00  -0.506 0.613302    
## Exterior1stVinylSd  -2.044e+00  2.163e+00  -0.945 0.344781    
## Exterior1stWd Sdng  -1.587e+00  2.071e+00  -0.766 0.443780    
## Exterior1stWdShing  -8.988e-01  2.201e+00  -0.408 0.683067    
## Exterior2ndAsphShn   4.044e+00  3.281e+00   1.233 0.218022    
## Exterior2ndBrk Cmn   3.863e+00  3.568e+00   1.083 0.279149    
## Exterior2ndBrkFace   1.638e+00  2.208e+00   0.742 0.458239    
## Exterior2ndCBlock           NA         NA      NA       NA    
## Exterior2ndCmentBd   3.095e+00  3.336e+00   0.928 0.353797    
## Exterior2ndHdBoard   1.901e+00  2.048e+00   0.928 0.353576    
## Exterior2ndImStucc   5.069e+00  2.281e+00   2.222 0.026492 *  
## Exterior2ndMetalSd   1.584e+00  2.382e+00   0.665 0.506316    
## Exterior2ndOther    -5.194e-01  4.170e+00  -0.125 0.900897    
## Exterior2ndPlywood   1.770e+00  1.982e+00   0.893 0.371889    
## Exterior2ndStone     1.218e-01  4.193e+00   0.029 0.976829    
## Exterior2ndStucco    1.646e+00  2.201e+00   0.748 0.454785    
## Exterior2ndVinylSd   2.688e+00  2.080e+00   1.292 0.196634    
## Exterior2ndWd Sdng   2.208e+00  1.964e+00   1.124 0.261161    
## Exterior2ndWd Shng   1.247e+00  2.052e+00   0.608 0.543636    
## ExterQualFa         -3.922e+00  1.999e+00  -1.963 0.049988 *  
## ExterQualGd         -3.864e+00  7.633e-01  -5.062 4.97e-07 ***
## ExterQualTA         -3.626e+00  8.694e-01  -4.171 3.30e-05 ***
## FoundationCBlock     5.432e-01  5.378e-01   1.010 0.312777    
## FoundationPConc      5.290e-01  5.836e-01   0.906 0.364908    
## FoundationStone      4.195e-01  1.599e+00   0.262 0.793103    
## FoundationWood      -5.557e+00  2.727e+00  -2.038 0.041800 *  
## BsmtQualFa          -2.233e+00  9.789e-01  -2.281 0.022782 *  
## BsmtQualGd          -3.016e+00  5.251e-01  -5.743 1.25e-08 ***
## BsmtQualTA          -2.946e+00  6.680e-01  -4.410 1.15e-05 ***
## BsmtExposureGd       1.985e+00  5.177e-01   3.834 0.000134 ***
## BsmtExposureMn      -8.730e-01  5.111e-01  -1.708 0.087930 .  
## BsmtExposureNo      -1.170e+00  3.640e-01  -3.214 0.001354 ** 
## BsmtFinType1BLQ      2.138e-01  4.868e-01   0.439 0.660649    
## BsmtFinType1GLQ      9.523e-01  4.333e-01   2.198 0.028206 *  
## BsmtFinType1LwQ     -9.165e-01  6.341e-01  -1.445 0.148684    
## BsmtFinType1Rec     -1.863e-01  4.974e-01  -0.375 0.708096    
## BsmtFinType1Unf      4.898e-01  4.910e-01   0.998 0.318761    
## BsmtFinType2BLQ     -2.780e-01  1.304e+00  -0.213 0.831225    
## BsmtFinType2GLQ     -3.146e-01  1.596e+00  -0.197 0.843769    
## BsmtFinType2LwQ     -4.751e-01  1.270e+00  -0.374 0.708361    
## BsmtFinType2Rec     -7.747e-01  1.236e+00  -0.627 0.531041    
## BsmtFinType2Unf      4.348e-01  1.093e+00   0.398 0.690910    
## KitchenQualFa       -3.201e+00  1.070e+00  -2.993 0.002837 ** 
## KitchenQualGd       -3.464e+00  5.620e-01  -6.164 1.05e-09 ***
## KitchenQualTA       -3.386e+00  6.493e-01  -5.215 2.25e-07 ***
## GarageTypeAttchd     3.805e+00  1.768e+00   2.153 0.031606 *  
## GarageTypeBasment    4.514e+00  2.059e+00   2.193 0.028575 *  
## GarageTypeBuiltIn    3.600e+00  1.853e+00   1.943 0.052327 .  
## GarageTypeCarPort    6.391e+00  2.479e+00   2.578 0.010074 *  
## GarageTypeDetchd     4.257e+00  1.754e+00   2.428 0.015386 *  
## GarageQualFa        -1.453e+01  4.474e+00  -3.248 0.001204 ** 
## GarageQualGd        -1.298e+01  4.624e+00  -2.807 0.005106 ** 
## GarageQualPo        -1.682e+01  5.490e+00  -3.064 0.002246 ** 
## GarageQualTA        -1.404e+01  4.422e+00  -3.175 0.001547 ** 
## GarageCondFa         1.383e+01  5.129e+00   2.697 0.007130 ** 
## GarageCondGd         1.371e+01  5.461e+00   2.511 0.012194 *  
## GarageCondPo         1.334e+01  5.604e+00   2.380 0.017502 *  
## GarageCondTA         1.380e+01  5.064e+00   2.724 0.006569 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.449 on 957 degrees of freedom
##   (359 observations deleted due to missingness)
## Multiple R-squared:  0.9227, Adjusted R-squared:  0.9112 
## F-statistic: 79.94 on 143 and 957 DF,  p-value: < 2.2e-16


Model 3

From model 3 below we identify two more predictors to batch-eliminate because their p-values are too high. They were borderline candidates for elimination in model 2, and they appear to have even less impact in model 3.

  • LotFrontage
  • LandContour
lm3 <- update(lm2, .~. -Exterior1st -Exterior2nd -Foundation -BsmtFinType1 -BsmtFinType2, data=train)
summary(lm3)
## 
## Call:
## lm(formula = ssp ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + 
##     GarageArea + MSZoning + LandContour + LotConfig + LandSlope + 
##     Neighborhood + Condition1 + Condition2 + BldgType + HouseStyle + 
##     RoofMatl + ExterQual + BsmtQual + BsmtExposure + KitchenQual + 
##     GarageType + GarageQual + GarageCond, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.4647  -1.5244   0.0051   1.3967  25.4647 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.393e+02  2.277e+01 -10.513  < 2e-16 ***
## LotFrontage          1.002e-02  7.503e-03   1.336 0.181847    
## LotArea              1.178e-04  1.906e-05   6.182 9.22e-10 ***
## OverallQual          1.085e+00  1.697e-01   6.392 2.50e-10 ***
## OverallCond          1.031e+00  1.267e-01   8.133 1.23e-15 ***
## YearBuilt            6.969e-02  1.052e-02   6.625 5.65e-11 ***
## BsmtUnfSF           -2.628e-03  3.026e-04  -8.685  < 2e-16 ***
## TotalBsmtSF          6.187e-03  7.995e-04   7.738 2.46e-14 ***
## X1stFlrSF           -3.938e-03  1.189e-03  -3.313 0.000958 ***
## GrLivArea            1.041e-02  8.073e-04  12.891  < 2e-16 ***
## GarageArea           3.722e-03  8.776e-04   4.241 2.43e-05 ***
## MSZoningFV           5.122e+00  1.870e+00   2.738 0.006283 ** 
## MSZoningRH           4.400e+00  2.035e+00   2.162 0.030885 *  
## MSZoningRL           4.121e+00  1.629e+00   2.530 0.011545 *  
## MSZoningRM           3.581e+00  1.510e+00   2.372 0.017893 *  
## LandContourHLS       1.370e+00  8.610e-01   1.592 0.111791    
## LandContourLow      -2.482e+00  1.321e+00  -1.878 0.060666 .  
## LandContourLvl       7.501e-01  6.358e-01   1.180 0.238395    
## LotConfigCulDSac     1.618e+00  6.853e-01   2.361 0.018393 *  
## LotConfigFR2        -1.691e+00  7.430e-01  -2.276 0.023083 *  
## LotConfigFR3        -2.155e+00  1.914e+00  -1.126 0.260601    
## LotConfigInside     -1.040e-01  3.109e-01  -0.335 0.737979    
## LandSlopeMod         3.104e-01  6.796e-01   0.457 0.647957    
## LandSlopeSev        -6.748e+00  2.138e+00  -3.156 0.001647 ** 
## NeighborhoodBlueste -5.888e-01  2.796e+00  -0.211 0.833224    
## NeighborhoodBrDale   1.427e-01  1.676e+00   0.085 0.932183    
## NeighborhoodBrkSide -1.717e-01  1.507e+00  -0.114 0.909267    
## NeighborhoodClearCr -2.453e+00  1.611e+00  -1.523 0.128066    
## NeighborhoodCollgCr -1.794e+00  1.115e+00  -1.609 0.107896    
## NeighborhoodCrawfor  1.870e+00  1.338e+00   1.398 0.162335    
## NeighborhoodEdwards -3.404e+00  1.242e+00  -2.742 0.006217 ** 
## NeighborhoodGilbert -1.727e+00  1.218e+00  -1.418 0.156422    
## NeighborhoodIDOTRR  -9.836e-02  1.708e+00  -0.058 0.954082    
## NeighborhoodMeadowV -1.589e+00  1.701e+00  -0.934 0.350452    
## NeighborhoodMitchel -3.526e+00  1.309e+00  -2.693 0.007191 ** 
## NeighborhoodNAmes   -2.924e+00  1.193e+00  -2.452 0.014364 *  
## NeighborhoodNoRidge  3.758e+00  1.274e+00   2.950 0.003256 ** 
## NeighborhoodNPkVill  4.216e-01  1.725e+00   0.244 0.806937    
## NeighborhoodNridgHt  2.425e+00  1.133e+00   2.141 0.032502 *  
## NeighborhoodNWAmes  -3.690e+00  1.245e+00  -2.964 0.003113 ** 
## NeighborhoodOldTown -1.670e+00  1.514e+00  -1.103 0.270287    
## NeighborhoodSawyer  -2.263e+00  1.286e+00  -1.759 0.078814 .  
## NeighborhoodSawyerW -1.479e+00  1.194e+00  -1.239 0.215619    
## NeighborhoodSomerst  3.526e-01  1.340e+00   0.263 0.792460    
## NeighborhoodStoneBr  5.593e+00  1.298e+00   4.309 1.80e-05 ***
## NeighborhoodSWISU   -1.194e+00  1.537e+00  -0.777 0.437375    
## NeighborhoodTimber  -2.345e+00  1.277e+00  -1.836 0.066625 .  
## NeighborhoodVeenker  7.326e-01  1.723e+00   0.425 0.670796    
## Condition1Feedr      2.537e-01  8.495e-01   0.299 0.765297    
## Condition1Norm       1.443e+00  6.586e-01   2.191 0.028698 *  
## Condition1PosA       1.924e+00  2.158e+00   0.892 0.372840    
## Condition1PosN       1.136e+00  1.560e+00   0.728 0.466876    
## Condition1RRAe      -1.531e+00  1.488e+00  -1.029 0.303957    
## Condition1RRAn       1.106e+00  1.074e+00   1.030 0.303358    
## Condition1RRNe       1.157e+00  3.606e+00   0.321 0.748414    
## Condition1RRNn      -5.344e-02  2.149e+00  -0.025 0.980171    
## Condition2Feedr      4.055e-01  3.649e+00   0.111 0.911549    
## Condition2Norm       1.509e+00  3.124e+00   0.483 0.629228    
## Condition2PosA       7.333e+00  4.985e+00   1.471 0.141621    
## Condition2PosN      -3.044e+01  4.287e+00  -7.100 2.37e-12 ***
## Condition2RRNn       1.849e+00  4.119e+00   0.449 0.653717    
## BldgType2fmCon      -1.440e+00  9.224e-01  -1.561 0.118742    
## BldgTypeDuplex      -4.359e+00  8.730e-01  -4.993 7.01e-07 ***
## BldgTypeTwnhs       -3.232e+00  8.689e-01  -3.720 0.000210 ***
## BldgTypeTwnhsE      -2.363e+00  5.866e-01  -4.029 6.02e-05 ***
## HouseStyle1.5Unf     2.664e+00  1.326e+00   2.009 0.044792 *  
## HouseStyle1Story     2.008e+00  6.432e-01   3.122 0.001848 ** 
## HouseStyle2.5Fin    -4.888e+00  1.856e+00  -2.634 0.008563 ** 
## HouseStyle2.5Unf    -2.398e+00  1.423e+00  -1.685 0.092274 .  
## HouseStyle2Story    -5.532e-01  5.269e-01  -1.050 0.294062    
## HouseStyleSFoyer     1.559e+00  1.047e+00   1.489 0.136689    
## HouseStyleSLvl       8.815e-01  8.331e-01   1.058 0.290236    
## RoofMatlCompShg      9.029e+01  4.616e+00  19.559  < 2e-16 ***
## RoofMatlMembran      9.885e+01  6.364e+00  15.533  < 2e-16 ***
## RoofMatlRoll         9.243e+01  5.888e+00  15.700  < 2e-16 ***
## RoofMatlTar&Grv      8.730e+01  4.938e+00  17.680  < 2e-16 ***
## RoofMatlWdShake      8.959e+01  5.564e+00  16.103  < 2e-16 ***
## RoofMatlWdShngl      9.697e+01  4.833e+00  20.063  < 2e-16 ***
## ExterQualFa         -4.193e+00  1.789e+00  -2.344 0.019280 *  
## ExterQualGd         -3.946e+00  7.476e-01  -5.279 1.59e-07 ***
## ExterQualTA         -3.923e+00  8.490e-01  -4.621 4.32e-06 ***
## BsmtQualFa          -2.518e+00  9.632e-01  -2.615 0.009069 ** 
## BsmtQualGd          -3.327e+00  5.108e-01  -6.512 1.17e-10 ***
## BsmtQualTA          -3.306e+00  6.437e-01  -5.136 3.37e-07 ***
## BsmtExposureGd       2.252e+00  5.091e-01   4.424 1.07e-05 ***
## BsmtExposureMn      -7.599e-01  5.054e-01  -1.503 0.133057    
## BsmtExposureNo      -1.105e+00  3.588e-01  -3.081 0.002121 ** 
## KitchenQualFa       -3.491e+00  1.048e+00  -3.332 0.000894 ***
## KitchenQualGd       -3.678e+00  5.500e-01  -6.688 3.77e-11 ***
## KitchenQualTA       -3.762e+00  6.316e-01  -5.957 3.56e-09 ***
## GarageTypeAttchd     3.664e+00  1.697e+00   2.160 0.031024 *  
## GarageTypeBasment    4.101e+00  1.978e+00   2.074 0.038354 *  
## GarageTypeBuiltIn    3.547e+00  1.785e+00   1.986 0.047269 *  
## GarageTypeCarPort    5.238e+00  2.348e+00   2.231 0.025921 *  
## GarageTypeDetchd     3.976e+00  1.680e+00   2.366 0.018174 *  
## GarageQualFa        -1.422e+01  4.480e+00  -3.174 0.001547 ** 
## GarageQualGd        -1.312e+01  4.622e+00  -2.838 0.004629 ** 
## GarageQualPo        -1.685e+01  5.465e+00  -3.084 0.002100 ** 
## GarageQualTA        -1.395e+01  4.425e+00  -3.153 0.001664 ** 
## GarageCondFa         1.404e+01  5.124e+00   2.741 0.006243 ** 
## GarageCondGd         1.351e+01  5.440e+00   2.483 0.013195 *  
## GarageCondPo         1.377e+01  5.591e+00   2.462 0.013979 *  
## GarageCondTA         1.412e+01  5.061e+00   2.790 0.005371 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.48 on 999 degrees of freedom
##   (358 observations deleted due to missingness)
## Multiple R-squared:  0.918,  Adjusted R-squared:  0.9096 
## F-statistic: 109.6 on 102 and 999 DF,  p-value: < 2.2e-16


Model 4

From model 4 below we identify one more predictor to eliminate based on p-values that are too high. Although one of the predictor’s levels has a p-value below 0.05, the average across all four levels is well above 0.05. (Note: the predictor has a fifth level, but it is the baseline and has no coefficient in the model. It is the equivalent of selecting none of the other four options.)

However, there are six more predictors to consider removing. In the Predictor Triage section, we refit the model without each one in turn and compare just the resulting Adjusted R-Squared and the number of observations dropped for missing values against Model 4, so we can work out an order in which to remove them.

  • LotConfig
  • LandSlope
  • Neighborhood
  • Condition1
  • Condition2
  • HouseStyle
  • GarageType
lm4 <- update(lm3, .~. -LotFrontage -LandContour, data=train)
summary(lm4)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + GarageArea + 
##     MSZoning + LotConfig + LandSlope + Neighborhood + Condition1 + 
##     Condition2 + BldgType + HouseStyle + RoofMatl + ExterQual + 
##     BsmtQual + BsmtExposure + KitchenQual + GarageType + GarageQual + 
##     GarageCond, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.487  -1.397   0.000   1.467  25.487 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.258e+02  2.038e+01 -11.075  < 2e-16 ***
## LotArea              9.522e-05  1.360e-05   7.000 4.18e-12 ***
## OverallQual          1.176e+00  1.468e-01   8.010 2.61e-15 ***
## OverallCond          9.617e-01  1.094e-01   8.787  < 2e-16 ***
## YearBuilt            6.657e-02  9.484e-03   7.020 3.64e-12 ***
## BsmtUnfSF           -2.390e-03  2.703e-04  -8.841  < 2e-16 ***
## TotalBsmtSF          5.639e-03  6.842e-04   8.241 4.27e-16 ***
## X1stFlrSF           -3.897e-03  1.013e-03  -3.847 0.000126 ***
## GrLivArea            1.054e-02  6.977e-04  15.101  < 2e-16 ***
## GarageArea           3.965e-03  7.701e-04   5.148 3.05e-07 ***
## MSZoningFV           5.256e+00  1.765e+00   2.978 0.002956 ** 
## MSZoningRH           4.316e+00  1.871e+00   2.306 0.021272 *  
## MSZoningRL           4.325e+00  1.528e+00   2.831 0.004716 ** 
## MSZoningRM           3.822e+00  1.436e+00   2.662 0.007861 ** 
## LotConfigCulDSac     8.916e-01  4.554e-01   1.958 0.050492 .  
## LotConfigFR2        -1.344e+00  5.950e-01  -2.259 0.024083 *  
## LotConfigFR3        -2.130e+00  1.857e+00  -1.147 0.251663    
## LotConfigInside     -9.965e-02  2.573e-01  -0.387 0.698632    
## LandSlopeMod         1.860e-01  5.231e-01   0.356 0.722266    
## LandSlopeSev        -6.043e+00  1.494e+00  -4.045 5.55e-05 ***
## NeighborhoodBlueste -8.443e-01  2.686e+00  -0.314 0.753367    
## NeighborhoodBrDale  -5.399e-02  1.533e+00  -0.035 0.971902    
## NeighborhoodBrkSide -1.921e-01  1.339e+00  -0.143 0.885946    
## NeighborhoodClearCr -2.528e+00  1.265e+00  -1.998 0.045939 *  
## NeighborhoodCollgCr -1.860e+00  9.921e-01  -1.875 0.060996 .  
## NeighborhoodCrawfor  1.812e+00  1.168e+00   1.551 0.121129    
## NeighborhoodEdwards -3.137e+00  1.112e+00  -2.821 0.004870 ** 
## NeighborhoodGilbert -1.533e+00  1.061e+00  -1.444 0.149051    
## NeighborhoodIDOTRR  -4.464e-01  1.530e+00  -0.292 0.770521    
## NeighborhoodMeadowV -1.541e+00  1.502e+00  -1.026 0.305018    
## NeighborhoodMitchel -3.306e+00  1.143e+00  -2.891 0.003904 ** 
## NeighborhoodNAmes   -2.974e+00  1.059e+00  -2.807 0.005073 ** 
## NeighborhoodNoRidge  3.624e+00  1.128e+00   3.214 0.001343 ** 
## NeighborhoodNPkVill  1.922e-01  1.496e+00   0.128 0.897825    
## NeighborhoodNridgHt  2.336e+00  1.023e+00   2.284 0.022531 *  
## NeighborhoodNWAmes  -3.389e+00  1.088e+00  -3.116 0.001876 ** 
## NeighborhoodOldTown -1.901e+00  1.354e+00  -1.404 0.160697    
## NeighborhoodSawyer  -2.414e+00  1.122e+00  -2.151 0.031664 *  
## NeighborhoodSawyerW -1.529e+00  1.072e+00  -1.427 0.153965    
## NeighborhoodSomerst  1.031e-01  1.234e+00   0.084 0.933435    
## NeighborhoodStoneBr  4.208e+00  1.131e+00   3.722 0.000206 ***
## NeighborhoodSWISU   -1.797e+00  1.393e+00  -1.290 0.197257    
## NeighborhoodTimber  -2.497e+00  1.124e+00  -2.222 0.026475 *  
## NeighborhoodVeenker -4.497e-01  1.423e+00  -0.316 0.752062    
## Condition1Feedr      4.659e-01  7.842e-01   0.594 0.552578    
## Condition1Norm       1.392e+00  6.281e-01   2.217 0.026837 *  
## Condition1PosA       9.222e-01  1.425e+00   0.647 0.517597    
## Condition1PosN       1.598e+00  1.077e+00   1.484 0.138065    
## Condition1RRAe      -2.244e+00  1.320e+00  -1.700 0.089435 .  
## Condition1RRAn       1.028e+00  1.001e+00   1.028 0.304370    
## Condition1RRNe      -2.716e-01  2.567e+00  -0.106 0.915776    
## Condition1RRNn       8.310e-01  1.841e+00   0.451 0.651790    
## Condition2Feedr     -1.495e+00  3.481e+00  -0.429 0.667729    
## Condition2Norm       2.510e-02  2.974e+00   0.008 0.993268    
## Condition2PosA       5.007e+00  4.775e+00   1.049 0.294513    
## Condition2PosN      -3.191e+01  4.010e+00  -7.958 3.89e-15 ***
## Condition2RRAe      -3.081e+00  4.719e+00  -0.653 0.514028    
## Condition2RRAn      -1.289e-01  4.660e+00  -0.028 0.977933    
## Condition2RRNn       4.235e-01  3.942e+00   0.107 0.914450    
## BldgType2fmCon      -1.818e+00  8.560e-01  -2.124 0.033874 *  
## BldgTypeDuplex      -4.124e+00  7.948e-01  -5.189 2.47e-07 ***
## BldgTypeTwnhs       -3.566e+00  7.667e-01  -4.651 3.65e-06 ***
## BldgTypeTwnhsE      -2.569e+00  5.034e-01  -5.103 3.86e-07 ***
## HouseStyle1.5Unf     2.598e+00  1.233e+00   2.107 0.035329 *  
## HouseStyle1Story     2.118e+00  5.693e-01   3.721 0.000208 ***
## HouseStyle2.5Fin    -4.996e+00  1.759e+00  -2.840 0.004586 ** 
## HouseStyle2.5Unf    -1.762e+00  1.278e+00  -1.379 0.168287    
## HouseStyle2Story    -6.911e-01  4.657e-01  -1.484 0.138068    
## HouseStyleSFoyer     1.412e+00  9.138e-01   1.546 0.122466    
## HouseStyleSLvl       1.076e+00  6.932e-01   1.552 0.120941    
## RoofMatlCompShg      8.625e+01  4.136e+00  20.855  < 2e-16 ***
## RoofMatlMembran      9.250e+01  5.647e+00  16.380  < 2e-16 ***
## RoofMatlMetal        9.308e+01  5.708e+00  16.307  < 2e-16 ***
## RoofMatlRoll         8.869e+01  5.480e+00  16.183  < 2e-16 ***
## RoofMatlTar&Grv      8.452e+01  4.355e+00  19.407  < 2e-16 ***
## RoofMatlWdShake      8.759e+01  4.515e+00  19.399  < 2e-16 ***
## RoofMatlWdShngl      9.218e+01  4.323e+00  21.321  < 2e-16 ***
## ExterQualFa         -3.686e+00  1.673e+00  -2.204 0.027725 *  
## ExterQualGd         -3.429e+00  6.742e-01  -5.086 4.22e-07 ***
## ExterQualTA         -3.461e+00  7.502e-01  -4.614 4.37e-06 ***
## BsmtQualFa          -2.743e+00  9.053e-01  -3.030 0.002499 ** 
## BsmtQualGd          -3.521e+00  4.678e-01  -7.527 9.94e-14 ***
## BsmtQualTA          -3.434e+00  5.743e-01  -5.980 2.91e-09 ***
## BsmtExposureGd       2.342e+00  4.339e-01   5.398 8.05e-08 ***
## BsmtExposureMn      -5.742e-01  4.435e-01  -1.295 0.195730    
## BsmtExposureNo      -1.101e+00  3.162e-01  -3.483 0.000513 ***
## KitchenQualFa       -3.929e+00  9.490e-01  -4.140 3.71e-05 ***
## KitchenQualGd       -3.906e+00  5.044e-01  -7.743 2.00e-14 ***
## KitchenQualTA       -4.081e+00  5.658e-01  -7.213 9.46e-13 ***
## GarageTypeAttchd     3.385e+00  1.527e+00   2.217 0.026835 *  
## GarageTypeBasment    3.200e+00  1.756e+00   1.822 0.068681 .  
## GarageTypeBuiltIn    3.276e+00  1.597e+00   2.051 0.040453 *  
## GarageTypeCarPort    2.539e+00  2.102e+00   1.208 0.227334    
## GarageTypeDetchd     3.520e+00  1.514e+00   2.325 0.020206 *  
## GarageQualFa        -1.434e+01  4.311e+00  -3.327 0.000903 ***
## GarageQualGd        -1.307e+01  4.420e+00  -2.958 0.003158 ** 
## GarageQualPo        -1.662e+01  5.145e+00  -3.230 0.001269 ** 
## GarageQualTA        -1.422e+01  4.263e+00  -3.335 0.000878 ***
## GarageCondFa         1.379e+01  4.965e+00   2.778 0.005560 ** 
## GarageCondGd         1.342e+01  5.138e+00   2.612 0.009099 ** 
## GarageCondPo         1.361e+01  5.312e+00   2.563 0.010508 *  
## GarageCondTA         1.417e+01  4.911e+00   2.886 0.003974 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.418 on 1246 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9103, Adjusted R-squared:  0.903 
## F-statistic: 125.2 on 101 and 1246 DF,  p-value: < 2.2e-16
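
Averaging the per-level p-values of a factor is an informal shortcut. A more direct check is a single F test per predictor, which base R provides through `drop1()`. The sketch below is only an illustration on simulated data: the data frame `toy` and its columns `area`, `style` and `zone` are made-up stand-ins, not the Ames variables.

```r
# Simulated stand-in data; column names are illustrative, not the Ames data.
set.seed(1)
n <- 200
toy <- data.frame(
  area  = runif(n, 500, 3000),
  style = factor(sample(c("A", "B", "C", "D", "E"), n, replace = TRUE)),
  zone  = factor(sample(c("X", "Y", "Z"), n, replace = TRUE))
)

# Only area and zone drive the response; style is pure noise.
zone_effect <- unname(c(X = 0, Y = 5, Z = 10)[as.character(toy$zone)])
toy$y <- 0.01 * toy$area + zone_effect + rnorm(n)

fit <- lm(y ~ area + style + zone, data = toy)

# One row per predictor: an F test for dropping the whole term,
# instead of one t test per factor level as in summary().
term_tests <- drop1(fit, test = "F")
print(term_tests)

# Collect the per-predictor p-values; the first row is "<none>".
p_vals <- term_tests[["Pr(>F)"]][-1]
names(p_vals) <- rownames(term_tests)[-1]

# Largest p-value first: the weakest candidates to keep.
print(sort(p_vals, decreasing = TRUE))
```

With the models from this report in the session, the analogous call would presumably be `drop1(lm4, test = "F")`.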


Predictor Triage

After running the seven candidate versions of model 5 below, we note the following Adjusted R-Squared values. None of the refits recovered any records previously dropped for missing values, so the Adjusted R-Squared values are directly comparable.


  • LotConfig: 0.9023538
  • LandSlope: 0.9018605
  • Neighborhood: 0.8860973
  • Condition1: 0.9022634
  • Condition2: 0.8922981
  • HouseStyle: 0.9016426
  • GarageType: 0.9029784


Model 5a

Here we try removing LotConfig.

lm5a <- update(lm4, .~. -LotConfig, data=train)

summary(lm5a)$adj.r.squared
## [1] 0.9023538
length(summary(lm5a)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5b

Here we try removing LandSlope.

lm5b <- update(lm4, .~. -LandSlope, data=train)

summary(lm5b)$adj.r.squared
## [1] 0.9018605
length(summary(lm5b)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5c

Here we try removing Neighborhood.

lm5c <- update(lm4, .~. -Neighborhood, data=train)

summary(lm5c)$adj.r.squared
## [1] 0.8860973
length(summary(lm5c)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5d

Here we try removing Condition1.

lm5d <- update(lm4, .~. -Condition1, data=train)

summary(lm5d)$adj.r.squared
## [1] 0.9022634
length(summary(lm5d)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5e

Here we try removing Condition2.

lm5e <- update(lm4, .~. -Condition2, data=train)

summary(lm5e)$adj.r.squared
## [1] 0.8922981
length(summary(lm5e)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5f

Here we try removing HouseStyle.

lm5f <- update(lm4, .~. -HouseStyle, data=train)

summary(lm5f)$adj.r.squared
## [1] 0.9016426
length(summary(lm5f)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112


Model 5g

Here we try removing GarageType.

lm5g <- update(lm4, .~. -GarageType, data=train)

summary(lm5g)$adj.r.squared
## [1] 0.9029784
length(summary(lm5g)$na.action)
## [1] 112
summary(lm4)$adj.r.squared
## [1] 0.9030483
length(summary(lm4)$na.action)
## [1] 112
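
The seven near-identical refits above could also be written as a loop, which avoids copy-paste slips when checking each model. The following is a minimal sketch on simulated data, not the code used for this report: `toy`, `x1`, `f1`, `f2`, `base_fit` and `candidates` are hypothetical names chosen for the illustration.

```r
# Simulated stand-in data; names are illustrative, not the Ames columns.
set.seed(2)
n <- 300
toy <- data.frame(
  x1 = rnorm(n),
  f1 = factor(sample(letters[1:4], n, replace = TRUE)),
  f2 = factor(sample(LETTERS[1:3], n, replace = TRUE))
)
toy$y <- 2 * toy$x1 + as.integer(toy$f1) + rnorm(n)  # f2 is pure noise
toy$x1[sample(n, 20)] <- NA                           # introduce missing values

base_fit   <- lm(y ~ x1 + f1 + f2, data = toy)
candidates <- c("f1", "f2")

# Refit once per candidate with that predictor removed, and record the
# adjusted R-squared and the number of rows dropped for missingness.
triage <- do.call(rbind, lapply(candidates, function(v) {
  fit <- update(base_fit, as.formula(paste(". ~ . -", v)))
  data.frame(
    removed   = v,
    adj_r2    = summary(fit)$adj.r.squared,
    n_dropped = length(fit$na.action)
  )
}))

# Highest adjusted R-squared first: the least costly predictor to remove.
triage <- triage[order(triage$adj_r2, decreasing = TRUE), ]
print(triage)
```

Keeping `n_dropped` next to `adj_r2` makes it easy to confirm that every refit used the same rows before comparing the values.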


Remove the Triaged Predictors

We have seven predictors we want to try eliminating, listed below in the order we’ll try them. We’ll note any observations after each elimination.


  • GarageType: 0.9029784
  • Condition1: 0.9022634
  • LotConfig: 0.9023538
  • LandSlope: 0.9018605
  • HouseStyle: 0.9016426
  • Condition2: 0.8922981
  • Neighborhood: 0.8860973


Model 5

Here we remove GarageType. There is nothing notable to observe, so we continue.

lm5 <- update(lm4, .~. -GarageType, data=train)
summary(lm5)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + GarageArea + 
##     MSZoning + LotConfig + LandSlope + Neighborhood + Condition1 + 
##     Condition2 + BldgType + HouseStyle + RoofMatl + ExterQual + 
##     BsmtQual + BsmtExposure + KitchenQual + GarageQual + GarageCond, 
##     data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5628  -1.4026  -0.0165   1.4816  25.5628 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.227e+02  1.986e+01 -11.211  < 2e-16 ***
## LotArea              9.390e-05  1.356e-05   6.923 7.07e-12 ***
## OverallQual          1.196e+00  1.457e-01   8.212 5.39e-16 ***
## OverallCond          9.760e-01  1.092e-01   8.935  < 2e-16 ***
## YearBuilt            6.667e-02  9.210e-03   7.239 7.87e-13 ***
## BsmtUnfSF           -2.410e-03  2.687e-04  -8.970  < 2e-16 ***
## TotalBsmtSF          5.700e-03  6.730e-04   8.470  < 2e-16 ***
## X1stFlrSF           -3.952e-03  9.623e-04  -4.107 4.27e-05 ***
## GrLivArea            1.052e-02  6.449e-04  16.309  < 2e-16 ***
## GarageArea           3.660e-03  7.388e-04   4.954 8.26e-07 ***
## MSZoningFV           5.221e+00  1.757e+00   2.972 0.003019 ** 
## MSZoningRH           3.971e+00  1.859e+00   2.136 0.032861 *  
## MSZoningRL           4.311e+00  1.522e+00   2.832 0.004697 ** 
## MSZoningRM           3.784e+00  1.431e+00   2.644 0.008294 ** 
## LotConfigCulDSac     8.741e-01  4.554e-01   1.919 0.055165 .  
## LotConfigFR2        -1.361e+00  5.942e-01  -2.291 0.022116 *  
## LotConfigFR3        -2.084e+00  1.856e+00  -1.123 0.261643    
## LotConfigInside     -1.315e-01  2.563e-01  -0.513 0.607983    
## LandSlopeMod         8.828e-02  5.175e-01   0.171 0.864572    
## LandSlopeSev        -5.878e+00  1.491e+00  -3.942 8.53e-05 ***
## NeighborhoodBlueste -8.392e-01  2.683e+00  -0.313 0.754476    
## NeighborhoodBrDale   7.230e-02  1.530e+00   0.047 0.962306    
## NeighborhoodBrkSide -7.662e-02  1.332e+00  -0.058 0.954141    
## NeighborhoodClearCr -2.483e+00  1.264e+00  -1.965 0.049656 *  
## NeighborhoodCollgCr -1.845e+00  9.914e-01  -1.861 0.063013 .  
## NeighborhoodCrawfor  1.823e+00  1.165e+00   1.564 0.118066    
## NeighborhoodEdwards -3.083e+00  1.104e+00  -2.792 0.005325 ** 
## NeighborhoodGilbert -1.577e+00  1.060e+00  -1.487 0.137138    
## NeighborhoodIDOTRR  -3.559e-01  1.528e+00  -0.233 0.815879    
## NeighborhoodMeadowV -1.541e+00  1.493e+00  -1.033 0.302015    
## NeighborhoodMitchel -3.404e+00  1.142e+00  -2.980 0.002936 ** 
## NeighborhoodNAmes   -2.960e+00  1.059e+00  -2.794 0.005281 ** 
## NeighborhoodNoRidge  3.659e+00  1.125e+00   3.252 0.001176 ** 
## NeighborhoodNPkVill  1.883e-01  1.493e+00   0.126 0.899603    
## NeighborhoodNridgHt  2.348e+00  1.020e+00   2.301 0.021529 *  
## NeighborhoodNWAmes  -3.379e+00  1.087e+00  -3.109 0.001922 ** 
## NeighborhoodOldTown -1.799e+00  1.352e+00  -1.330 0.183659    
## NeighborhoodSawyer  -2.395e+00  1.123e+00  -2.133 0.033091 *  
## NeighborhoodSawyerW -1.507e+00  1.072e+00  -1.406 0.160019    
## NeighborhoodSomerst  1.715e-01  1.234e+00   0.139 0.889419    
## NeighborhoodStoneBr  4.209e+00  1.129e+00   3.727 0.000202 ***
## NeighborhoodSWISU   -1.675e+00  1.384e+00  -1.210 0.226484    
## NeighborhoodTimber  -2.573e+00  1.123e+00  -2.292 0.022050 *  
## NeighborhoodVeenker -4.570e-01  1.423e+00  -0.321 0.748150    
## Condition1Feedr      4.234e-01  7.827e-01   0.541 0.588591    
## Condition1Norm       1.336e+00  6.276e-01   2.128 0.033496 *  
## Condition1PosA       9.347e-01  1.425e+00   0.656 0.511873    
## Condition1PosN       1.552e+00  1.077e+00   1.442 0.149676    
## Condition1RRAe      -2.231e+00  1.309e+00  -1.704 0.088677 .  
## Condition1RRAn       9.686e-01  1.000e+00   0.968 0.333112    
## Condition1RRNe      -3.860e-01  2.567e+00  -0.150 0.880498    
## Condition1RRNn       5.534e-01  1.820e+00   0.304 0.761160    
## Condition2Feedr     -9.538e-01  3.359e+00  -0.284 0.776518    
## Condition2Norm       5.517e-01  2.836e+00   0.195 0.845810    
## Condition2PosA       5.444e+00  4.719e+00   1.154 0.248854    
## Condition2PosN      -3.133e+01  3.905e+00  -8.024 2.34e-15 ***
## Condition2RRAe      -2.464e+00  4.603e+00  -0.535 0.592573    
## Condition2RRAn       7.167e-01  4.541e+00   0.158 0.874616    
## Condition2RRNn       1.046e+00  3.828e+00   0.273 0.784636    
## BldgType2fmCon      -1.921e+00  8.523e-01  -2.254 0.024388 *  
## BldgTypeDuplex      -4.297e+00  7.782e-01  -5.521 4.09e-08 ***
## BldgTypeTwnhs       -3.580e+00  7.583e-01  -4.721 2.61e-06 ***
## BldgTypeTwnhsE      -2.589e+00  4.994e-01  -5.184 2.52e-07 ***
## HouseStyle1.5Unf     2.662e+00  1.222e+00   2.179 0.029501 *  
## HouseStyle1Story     2.103e+00  5.547e-01   3.792 0.000157 ***
## HouseStyle2.5Fin    -4.932e+00  1.740e+00  -2.834 0.004664 ** 
## HouseStyle2.5Unf    -1.711e+00  1.274e+00  -1.343 0.179588    
## HouseStyle2Story    -7.253e-01  4.625e-01  -1.568 0.117065    
## HouseStyleSFoyer     1.461e+00  9.023e-01   1.620 0.105548    
## HouseStyleSLvl       1.055e+00  6.728e-01   1.567 0.117295    
## RoofMatlCompShg      8.609e+01  4.134e+00  20.828  < 2e-16 ***
## RoofMatlMembran      9.207e+01  5.645e+00  16.312  < 2e-16 ***
## RoofMatlMetal        9.275e+01  5.704e+00  16.261  < 2e-16 ***
## RoofMatlRoll         8.883e+01  5.477e+00  16.219  < 2e-16 ***
## RoofMatlTar&Grv      8.426e+01  4.348e+00  19.377  < 2e-16 ***
## RoofMatlWdShake      8.740e+01  4.514e+00  19.361  < 2e-16 ***
## RoofMatlWdShngl      9.206e+01  4.322e+00  21.302  < 2e-16 ***
## ExterQualFa         -3.774e+00  1.632e+00  -2.312 0.020920 *  
## ExterQualGd         -3.451e+00  6.739e-01  -5.121 3.51e-07 ***
## ExterQualTA         -3.483e+00  7.491e-01  -4.649 3.68e-06 ***
## BsmtQualFa          -2.756e+00  9.043e-01  -3.048 0.002353 ** 
## BsmtQualGd          -3.579e+00  4.664e-01  -7.673 3.37e-14 ***
## BsmtQualTA          -3.530e+00  5.715e-01  -6.176 8.87e-10 ***
## BsmtExposureGd       2.345e+00  4.331e-01   5.416 7.31e-08 ***
## BsmtExposureMn      -5.504e-01  4.434e-01  -1.241 0.214756    
## BsmtExposureNo      -1.090e+00  3.161e-01  -3.449 0.000582 ***
## KitchenQualFa       -3.827e+00  9.445e-01  -4.052 5.40e-05 ***
## KitchenQualGd       -3.826e+00  5.017e-01  -7.626 4.76e-14 ***
## KitchenQualTA       -3.997e+00  5.631e-01  -7.098 2.11e-12 ***
## GarageQualFa        -1.451e+01  4.305e+00  -3.370 0.000775 ***
## GarageQualGd        -1.305e+01  4.414e+00  -2.956 0.003179 ** 
## GarageQualPo        -1.673e+01  5.140e+00  -3.254 0.001168 ** 
## GarageQualTA        -1.427e+01  4.257e+00  -3.351 0.000829 ***
## GarageCondFa         1.371e+01  4.960e+00   2.764 0.005785 ** 
## GarageCondGd         1.328e+01  5.130e+00   2.589 0.009746 ** 
## GarageCondPo         1.361e+01  5.306e+00   2.565 0.010430 *  
## GarageCondTA         1.408e+01  4.907e+00   2.870 0.004169 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.419 on 1251 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9099, Adjusted R-squared:  0.903 
## F-statistic: 131.6 on 96 and 1251 DF,  p-value: < 2.2e-16


Model 6

Here we remove LotConfig. There is nothing notable to observe in the output, so we continue.

lm6 <- update(lm5, .~. -LotConfig, data=train)
summary(lm6)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + GarageArea + 
##     MSZoning + LandSlope + Neighborhood + Condition1 + Condition2 + 
##     BldgType + HouseStyle + RoofMatl + ExterQual + BsmtQual + 
##     BsmtExposure + KitchenQual + GarageQual + GarageCond, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.7763  -1.3754  -0.0038   1.4847  25.7763 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.251e+02  1.990e+01 -11.312  < 2e-16 ***
## LotArea              9.729e-05  1.347e-05   7.224 8.72e-13 ***
## OverallQual          1.198e+00  1.461e-01   8.197 6.01e-16 ***
## OverallCond          9.867e-01  1.095e-01   9.014  < 2e-16 ***
## YearBuilt            6.749e-02  9.229e-03   7.313 4.64e-13 ***
## BsmtUnfSF           -2.389e-03  2.694e-04  -8.869  < 2e-16 ***
## TotalBsmtSF          5.681e-03  6.746e-04   8.422  < 2e-16 ***
## X1stFlrSF           -3.847e-03  9.652e-04  -3.985 7.12e-05 ***
## GrLivArea            1.052e-02  6.468e-04  16.271  < 2e-16 ***
## GarageArea           3.672e-03  7.404e-04   4.959 8.06e-07 ***
## MSZoningFV           5.262e+00  1.761e+00   2.989 0.002854 ** 
## MSZoningRH           4.174e+00  1.864e+00   2.240 0.025292 *  
## MSZoningRL           4.496e+00  1.525e+00   2.948 0.003255 ** 
## MSZoningRM           3.804e+00  1.436e+00   2.650 0.008161 ** 
## LandSlopeMod         9.149e-02  5.179e-01   0.177 0.859820    
## LandSlopeSev        -5.898e+00  1.495e+00  -3.945 8.43e-05 ***
## NeighborhoodBlueste -5.885e-01  2.691e+00  -0.219 0.826951    
## NeighborhoodBrDale   3.067e-01  1.533e+00   0.200 0.841498    
## NeighborhoodBrkSide  2.040e-02  1.335e+00   0.015 0.987812    
## NeighborhoodClearCr -2.398e+00  1.267e+00  -1.892 0.058715 .  
## NeighborhoodCollgCr -1.799e+00  9.926e-01  -1.813 0.070116 .  
## NeighborhoodCrawfor  1.833e+00  1.169e+00   1.568 0.117080    
## NeighborhoodEdwards -3.035e+00  1.107e+00  -2.742 0.006185 ** 
## NeighborhoodGilbert -1.553e+00  1.061e+00  -1.464 0.143456    
## NeighborhoodIDOTRR  -1.285e-01  1.531e+00  -0.084 0.933087    
## NeighborhoodMeadowV -1.344e+00  1.496e+00  -0.898 0.369131    
## NeighborhoodMitchel -3.292e+00  1.140e+00  -2.887 0.003951 ** 
## NeighborhoodNAmes   -2.956e+00  1.061e+00  -2.786 0.005414 ** 
## NeighborhoodNoRidge  3.701e+00  1.124e+00   3.293 0.001020 ** 
## NeighborhoodNPkVill  1.185e-01  1.497e+00   0.079 0.936909    
## NeighborhoodNridgHt  2.222e+00  1.020e+00   2.179 0.029529 *  
## NeighborhoodNWAmes  -3.363e+00  1.089e+00  -3.088 0.002057 ** 
## NeighborhoodOldTown -1.600e+00  1.354e+00  -1.182 0.237533    
## NeighborhoodSawyer  -2.302e+00  1.124e+00  -2.047 0.040851 *  
## NeighborhoodSawyerW -1.475e+00  1.075e+00  -1.373 0.169985    
## NeighborhoodSomerst  2.447e-01  1.235e+00   0.198 0.842956    
## NeighborhoodStoneBr  4.421e+00  1.129e+00   3.916 9.50e-05 ***
## NeighborhoodSWISU   -1.606e+00  1.388e+00  -1.157 0.247468    
## NeighborhoodTimber  -2.530e+00  1.124e+00  -2.251 0.024584 *  
## NeighborhoodVeenker -4.637e-01  1.416e+00  -0.327 0.743448    
## Condition1Feedr      2.440e-01  7.819e-01   0.312 0.755051    
## Condition1Norm       1.343e+00  6.296e-01   2.132 0.033173 *  
## Condition1PosA       8.511e-01  1.429e+00   0.596 0.551554    
## Condition1PosN       1.732e+00  1.078e+00   1.607 0.108345    
## Condition1RRAe      -2.017e+00  1.309e+00  -1.540 0.123726    
## Condition1RRAn       1.103e+00  1.001e+00   1.103 0.270450    
## Condition1RRNe       1.219e-01  2.567e+00   0.047 0.962134    
## Condition1RRNn       1.802e-01  1.793e+00   0.100 0.919975    
## Condition2Feedr     -1.384e+00  3.349e+00  -0.413 0.679537    
## Condition2Norm       5.223e-01  2.838e+00   0.184 0.854022    
## Condition2PosA       5.123e+00  4.726e+00   1.084 0.278509    
## Condition2PosN      -3.171e+01  3.907e+00  -8.118 1.13e-15 ***
## Condition2RRAe      -2.375e+00  4.609e+00  -0.515 0.606464    
## Condition2RRAn       9.479e-01  4.556e+00   0.208 0.835228    
## Condition2RRNn       1.299e+00  3.827e+00   0.339 0.734397    
## BldgType2fmCon      -1.990e+00  8.543e-01  -2.329 0.020000 *  
## BldgTypeDuplex      -4.365e+00  7.789e-01  -5.604 2.58e-08 ***
## BldgTypeTwnhs       -3.672e+00  7.584e-01  -4.842 1.45e-06 ***
## BldgTypeTwnhsE      -2.577e+00  5.005e-01  -5.149 3.04e-07 ***
## HouseStyle1.5Unf     2.686e+00  1.225e+00   2.192 0.028595 *  
## HouseStyle1Story     2.085e+00  5.561e-01   3.750 0.000185 ***
## HouseStyle2.5Fin    -5.052e+00  1.746e+00  -2.894 0.003869 ** 
## HouseStyle2.5Unf    -1.694e+00  1.278e+00  -1.325 0.185252    
## HouseStyle2Story    -6.874e-01  4.639e-01  -1.482 0.138635    
## HouseStyleSFoyer     1.660e+00  9.038e-01   1.837 0.066453 .  
## HouseStyleSLvl       1.030e+00  6.749e-01   1.526 0.127209    
## RoofMatlCompShg      8.636e+01  4.142e+00  20.852  < 2e-16 ***
## RoofMatlMembran      9.323e+01  5.639e+00  16.533  < 2e-16 ***
## RoofMatlMetal        9.396e+01  5.696e+00  16.494  < 2e-16 ***
## RoofMatlRoll         8.948e+01  5.492e+00  16.291  < 2e-16 ***
## RoofMatlTar&Grv      8.475e+01  4.353e+00  19.469  < 2e-16 ***
## RoofMatlWdShake      8.761e+01  4.528e+00  19.350  < 2e-16 ***
## RoofMatlWdShngl      9.221e+01  4.334e+00  21.275  < 2e-16 ***
## ExterQualFa         -3.693e+00  1.637e+00  -2.256 0.024256 *  
## ExterQualGd         -3.538e+00  6.736e-01  -5.253 1.76e-07 ***
## ExterQualTA         -3.569e+00  7.492e-01  -4.763 2.13e-06 ***
## BsmtQualFa          -2.747e+00  9.042e-01  -3.038 0.002432 ** 
## BsmtQualGd          -3.572e+00  4.669e-01  -7.650 3.98e-14 ***
## BsmtQualTA          -3.465e+00  5.712e-01  -6.067 1.72e-09 ***
## BsmtExposureGd       2.344e+00  4.338e-01   5.404 7.80e-08 ***
## BsmtExposureMn      -4.632e-01  4.441e-01  -1.043 0.297223    
## BsmtExposureNo      -1.068e+00  3.167e-01  -3.372 0.000768 ***
## KitchenQualFa       -3.823e+00  9.463e-01  -4.040 5.67e-05 ***
## KitchenQualGd       -3.743e+00  5.025e-01  -7.449 1.75e-13 ***
## KitchenQualTA       -3.925e+00  5.640e-01  -6.959 5.51e-12 ***
## GarageQualFa        -1.457e+01  4.312e+00  -3.378 0.000753 ***
## GarageQualGd        -1.307e+01  4.425e+00  -2.954 0.003196 ** 
## GarageQualPo        -1.673e+01  5.153e+00  -3.247 0.001195 ** 
## GarageQualTA        -1.434e+01  4.266e+00  -3.363 0.000794 ***
## GarageCondFa         1.375e+01  4.973e+00   2.764 0.005789 ** 
## GarageCondGd         1.362e+01  5.141e+00   2.648 0.008188 ** 
## GarageCondPo         1.372e+01  5.322e+00   2.577 0.010075 *  
## GarageCondTA         1.416e+01  4.920e+00   2.879 0.004063 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.432 on 1255 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.909,  Adjusted R-squared:  0.9023 
## F-statistic: 136.2 on 92 and 1255 DF,  p-value: < 2.2e-16
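
One way to back up a removal like this is a partial F-test on the two nested fits, which base R’s anova() provides. The sketch below is illustrative only: it uses the built-in mtcars data instead of the housing data, and the names cars, full, and reduced are stand-ins we made up for train, lm5, and lm6. It assumes both models are fit to the same rows, which appears to hold here since both summaries report 112 observations deleted.

```r
# Toy stand-in for the housing data: mtcars ships with base R.
# 'full' and 'reduced' are hypothetical names playing the roles of lm5 and lm6.
cars <- transform(mtcars, cyl = factor(cyl))
full <- lm(mpg ~ wt + hp + cyl, data = cars)
reduced <- update(full, . ~ . - cyl)

# Partial F-test: does dropping the whole factor significantly worsen the fit?
ftest <- anova(reduced, full)
print(ftest)

# The p-value for the dropped term sits in the second row of the table.
p_drop <- ftest[["Pr(>F)"]][2]
```

On the housing models the equivalent call would be anova(lm6, lm5). A large p-value would support leaving LotConfig out.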


Model 7

Here we remove Condition1. There’s no real shift yet in the two main metrics we’re watching:

* The interquartile range of the residuals is still tight and evenly borders zero.
* Adjusted R-squared doesn’t drop precipitously (it moves from 0.9023 in Model 6 to 0.9015 here).

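We know there are no NA values in these seven variables among the records that are included, so every model is fit to the same rows and we can compare Adjusted R-squared directly across Models 5 through 11. Consistent with this, the summaries for Models 5, 6, and 7 all report 112 observations deleted due to missingness.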

lm7 <- update(lm6, .~. -Condition1, data=train)
summary(lm7)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + GarageArea + 
##     MSZoning + LandSlope + Neighborhood + Condition2 + BldgType + 
##     HouseStyle + RoofMatl + ExterQual + BsmtQual + BsmtExposure + 
##     KitchenQual + GarageQual + GarageCond, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.8004  -1.4255   0.0177   1.5167  25.8004 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.260e+02  1.980e+01 -11.413  < 2e-16 ***
## LotArea              9.576e-05  1.347e-05   7.107 1.98e-12 ***
## OverallQual          1.206e+00  1.462e-01   8.253 3.86e-16 ***
## OverallCond          9.846e-01  1.095e-01   8.991  < 2e-16 ***
## YearBuilt            6.733e-02  9.197e-03   7.321 4.38e-13 ***
## BsmtUnfSF           -2.350e-03  2.692e-04  -8.731  < 2e-16 ***
## TotalBsmtSF          5.739e-03  6.738e-04   8.517  < 2e-16 ***
## X1stFlrSF           -3.847e-03  9.616e-04  -4.000 6.69e-05 ***
## GrLivArea            1.051e-02  6.480e-04  16.218  < 2e-16 ***
## GarageArea           3.584e-03  7.415e-04   4.834 1.50e-06 ***
## MSZoningFV           5.514e+00  1.761e+00   3.132 0.001777 ** 
## MSZoningRH           4.246e+00  1.866e+00   2.275 0.023048 *  
## MSZoningRL           4.450e+00  1.526e+00   2.917 0.003602 ** 
## MSZoningRM           3.629e+00  1.437e+00   2.526 0.011649 *  
## LandSlopeMod         9.128e-02  5.192e-01   0.176 0.860478    
## LandSlopeSev        -5.670e+00  1.495e+00  -3.793 0.000156 ***
## NeighborhoodBlueste -3.739e-01  2.701e+00  -0.138 0.889915    
## NeighborhoodBrDale   5.452e-01  1.536e+00   0.355 0.722744    
## NeighborhoodBrkSide  2.748e-01  1.329e+00   0.207 0.836180    
## NeighborhoodClearCr -2.329e+00  1.272e+00  -1.832 0.067229 .  
## NeighborhoodCollgCr -1.667e+00  9.957e-01  -1.674 0.094420 .  
## NeighborhoodCrawfor  1.990e+00  1.170e+00   1.701 0.089275 .  
## NeighborhoodEdwards -2.925e+00  1.110e+00  -2.635 0.008512 ** 
## NeighborhoodGilbert -1.461e+00  1.062e+00  -1.376 0.169104    
## NeighborhoodIDOTRR   7.609e-02  1.527e+00   0.050 0.960272    
## NeighborhoodMeadowV -1.130e+00  1.500e+00  -0.753 0.451338    
## NeighborhoodMitchel -3.102e+00  1.143e+00  -2.714 0.006732 ** 
## NeighborhoodNAmes   -2.922e+00  1.063e+00  -2.748 0.006082 ** 
## NeighborhoodNoRidge  3.826e+00  1.127e+00   3.394 0.000710 ***
## NeighborhoodNPkVill  1.824e-01  1.502e+00   0.121 0.903399    
## NeighborhoodNridgHt  2.300e+00  1.023e+00   2.247 0.024800 *  
## NeighborhoodNWAmes  -3.296e+00  1.085e+00  -3.038 0.002427 ** 
## NeighborhoodOldTown -1.611e+00  1.353e+00  -1.191 0.233950    
## NeighborhoodSawyer  -2.544e+00  1.125e+00  -2.262 0.023875 *  
## NeighborhoodSawyerW -1.710e+00  1.072e+00  -1.595 0.110940    
## NeighborhoodSomerst  3.622e-02  1.221e+00   0.030 0.976336    
## NeighborhoodStoneBr  4.477e+00  1.133e+00   3.951 8.20e-05 ***
## NeighborhoodSWISU   -1.624e+00  1.389e+00  -1.170 0.242422    
## NeighborhoodTimber  -2.402e+00  1.128e+00  -2.129 0.033412 *  
## NeighborhoodVeenker -6.486e-01  1.418e+00  -0.458 0.647388    
## Condition2Feedr     -4.030e-01  3.194e+00  -0.126 0.899621    
## Condition2Norm       1.804e+00  2.769e+00   0.652 0.514797    
## Condition2PosA       5.045e+00  4.740e+00   1.064 0.287383    
## Condition2PosN      -3.002e+01  3.757e+00  -7.991 2.99e-15 ***
## Condition2RRAe      -1.301e+00  4.552e+00  -0.286 0.775041    
## Condition2RRAn       1.516e+00  4.506e+00   0.337 0.736520    
## Condition2RRNn       1.646e+00  3.756e+00   0.438 0.661298    
## BldgType2fmCon      -1.889e+00  8.537e-01  -2.212 0.027122 *  
## BldgTypeDuplex      -4.699e+00  7.738e-01  -6.073 1.66e-09 ***
## BldgTypeTwnhs       -3.612e+00  7.609e-01  -4.747 2.29e-06 ***
## BldgTypeTwnhsE      -2.490e+00  5.015e-01  -4.965 7.79e-07 ***
## HouseStyle1.5Unf     2.440e+00  1.225e+00   1.992 0.046596 *  
## HouseStyle1Story     2.229e+00  5.520e-01   4.038 5.71e-05 ***
## HouseStyle2.5Fin    -4.712e+00  1.748e+00  -2.696 0.007105 ** 
## HouseStyle2.5Unf    -1.367e+00  1.266e+00  -1.079 0.280594    
## HouseStyle2Story    -5.355e-01  4.588e-01  -1.167 0.243320    
## HouseStyleSFoyer     1.834e+00  9.009e-01   2.036 0.041964 *  
## HouseStyleSLvl       1.242e+00  6.712e-01   1.851 0.064425 .  
## RoofMatlCompShg      8.752e+01  4.119e+00  21.250  < 2e-16 ***
## RoofMatlMembran      9.428e+01  5.628e+00  16.753  < 2e-16 ***
## RoofMatlMetal        9.498e+01  5.683e+00  16.714  < 2e-16 ***
## RoofMatlRoll         9.004e+01  5.508e+00  16.346  < 2e-16 ***
## RoofMatlTar&Grv      8.590e+01  4.326e+00  19.856  < 2e-16 ***
## RoofMatlWdShake      8.874e+01  4.493e+00  19.752  < 2e-16 ***
## RoofMatlWdShngl      9.334e+01  4.316e+00  21.625  < 2e-16 ***
## ExterQualFa         -4.164e+00  1.621e+00  -2.569 0.010311 *  
## ExterQualGd         -3.508e+00  6.750e-01  -5.198 2.35e-07 ***
## ExterQualTA         -3.561e+00  7.505e-01  -4.745 2.32e-06 ***
## BsmtQualFa          -2.845e+00  9.051e-01  -3.143 0.001712 ** 
## BsmtQualGd          -3.575e+00  4.677e-01  -7.643 4.17e-14 ***
## BsmtQualTA          -3.504e+00  5.717e-01  -6.130 1.17e-09 ***
## BsmtExposureGd       2.323e+00  4.353e-01   5.336 1.13e-07 ***
## BsmtExposureMn      -5.126e-01  4.435e-01  -1.156 0.248048    
## BsmtExposureNo      -1.074e+00  3.175e-01  -3.383 0.000738 ***
## KitchenQualFa       -3.895e+00  9.466e-01  -4.115 4.12e-05 ***
## KitchenQualGd       -3.769e+00  5.038e-01  -7.482 1.37e-13 ***
## KitchenQualTA       -3.942e+00  5.653e-01  -6.974 4.97e-12 ***
## GarageQualFa        -1.325e+01  4.292e+00  -3.088 0.002061 ** 
## GarageQualGd        -1.175e+01  4.408e+00  -2.667 0.007760 ** 
## GarageQualPo        -1.470e+01  5.098e+00  -2.884 0.003999 ** 
## GarageQualTA        -1.318e+01  4.253e+00  -3.099 0.001987 ** 
## GarageCondFa         1.241e+01  4.967e+00   2.498 0.012613 *  
## GarageCondGd         1.235e+01  5.135e+00   2.405 0.016337 *  
## GarageCondPo         1.180e+01  5.280e+00   2.235 0.025566 *  
## GarageCondTA         1.285e+01  4.913e+00   2.615 0.009026 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.445 on 1263 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9077, Adjusted R-squared:  0.9015 
## F-statistic: 147.8 on 84 and 1263 DF,  p-value: < 2.2e-16
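
Since we compare Adjusted R-squared across several models, it can help to collect the numbers in one table and confirm that each fit used the same number of rows. This is a minimal sketch, not part of the original analysis: compare_fits is a hypothetical helper, and the demo uses the built-in airquality data (which has missing values) rather than the housing data.

```r
# Hypothetical helper: one row per model with the number of rows actually
# used in the fit and the adjusted R-squared.
compare_fits <- function(models) {
  data.frame(
    model = names(models),
    n_used = vapply(models, nobs, numeric(1)),
    adj_r2 = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1)),
    row.names = NULL
  )
}

# Toy demonstration: airquality has NAs in Ozone and Solar.R,
# so lm() silently drops incomplete rows, as it does with train.
m_a <- lm(Ozone ~ Solar.R + Wind + Temp, data = airquality)
m_b <- update(m_a, . ~ . - Wind)

fits <- compare_fits(list(m_a = m_a, m_b = m_b))
print(fits)

# Adjusted R-squared values are directly comparable only when n_used matches.
same_rows <- length(unique(fits$n_used)) == 1
```

With the housing models this would be compare_fits(list(lm5 = lm5, lm6 = lm6, lm7 = lm7)).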


Model 8

In Model 8 we remove LandSlope. Notice in the output below that X1stFlrSF loses some significance once LandSlope is removed: its p-value rises from 6.69e-05 in Model 7 to 0.000106 in Model 8, although it still carries three stars. My estimation is that our models so far are overfit, meaning we’re modeling a lot of the noise in the data and not just the underlying general predictors. Once we removed LandSlope, X1stFlrSF became less useful for fitting the noise, so its significance slipped. This tells me that we need to keep going and look for more variables that drop off in significance.

lm8 <- update(lm7, .~. -LandSlope, data=train)
summary(lm8)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + GrLivArea + GarageArea + 
##     MSZoning + Neighborhood + Condition2 + BldgType + HouseStyle + 
##     RoofMatl + ExterQual + BsmtQual + BsmtExposure + KitchenQual + 
##     GarageQual + GarageCond, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.5290  -1.4056   0.0139   1.5141  25.5290 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.219e+02  1.987e+01 -11.164  < 2e-16 ***
## LotArea              6.638e-05  1.112e-05   5.971 3.05e-09 ***
## OverallQual          1.224e+00  1.467e-01   8.344  < 2e-16 ***
## OverallCond          9.814e-01  1.101e-01   8.918  < 2e-16 ***
## YearBuilt            6.587e-02  9.235e-03   7.133 1.65e-12 ***
## BsmtUnfSF           -2.339e-03  2.702e-04  -8.658  < 2e-16 ***
## TotalBsmtSF          5.702e-03  6.749e-04   8.449  < 2e-16 ***
## X1stFlrSF           -3.755e-03  9.657e-04  -3.889 0.000106 ***
## GrLivArea            1.056e-02  6.509e-04  16.219  < 2e-16 ***
## GarageArea           3.639e-03  7.451e-04   4.884 1.17e-06 ***
## MSZoningFV           5.332e+00  1.766e+00   3.019 0.002587 ** 
## MSZoningRH           4.070e+00  1.871e+00   2.176 0.029767 *  
## MSZoningRL           4.303e+00  1.530e+00   2.812 0.004996 ** 
## MSZoningRM           3.542e+00  1.440e+00   2.460 0.014024 *  
## NeighborhoodBlueste -4.048e-01  2.714e+00  -0.149 0.881469    
## NeighborhoodBrDale   5.897e-01  1.544e+00   0.382 0.702573    
## NeighborhoodBrkSide  2.561e-01  1.335e+00   0.192 0.847905    
## NeighborhoodClearCr -2.670e+00  1.269e+00  -2.105 0.035496 *  
## NeighborhoodCollgCr -1.596e+00  1.001e+00  -1.596 0.110842    
## NeighborhoodCrawfor  2.078e+00  1.170e+00   1.776 0.076019 .  
## NeighborhoodEdwards -2.816e+00  1.115e+00  -2.526 0.011644 *  
## NeighborhoodGilbert -1.346e+00  1.067e+00  -1.261 0.207415    
## NeighborhoodIDOTRR   5.870e-02  1.534e+00   0.038 0.969476    
## NeighborhoodMeadowV -1.111e+00  1.508e+00  -0.737 0.461258    
## NeighborhoodMitchel -2.931e+00  1.143e+00  -2.564 0.010463 *  
## NeighborhoodNAmes   -2.824e+00  1.068e+00  -2.643 0.008308 ** 
## NeighborhoodNoRidge  3.897e+00  1.133e+00   3.440 0.000601 ***
## NeighborhoodNPkVill  2.450e-01  1.510e+00   0.162 0.871101    
## NeighborhoodNridgHt  2.383e+00  1.028e+00   2.318 0.020626 *  
## NeighborhoodNWAmes  -3.199e+00  1.090e+00  -2.935 0.003395 ** 
## NeighborhoodOldTown -1.761e+00  1.359e+00  -1.296 0.195352    
## NeighborhoodSawyer  -2.415e+00  1.130e+00  -2.137 0.032752 *  
## NeighborhoodSawyerW -1.630e+00  1.077e+00  -1.514 0.130263    
## NeighborhoodSomerst  1.032e-01  1.227e+00   0.084 0.932952    
## NeighborhoodStoneBr  4.592e+00  1.137e+00   4.040 5.67e-05 ***
## NeighborhoodSWISU   -1.613e+00  1.396e+00  -1.155 0.248223    
## NeighborhoodTimber  -2.365e+00  1.134e+00  -2.086 0.037184 *  
## NeighborhoodVeenker -4.455e-01  1.423e+00  -0.313 0.754300    
## Condition2Feedr     -4.597e-01  3.200e+00  -0.144 0.885817    
## Condition2Norm       1.773e+00  2.774e+00   0.639 0.522863    
## Condition2PosA       4.921e+00  4.761e+00   1.033 0.301574    
## Condition2PosN      -2.977e+01  3.764e+00  -7.908 5.65e-15 ***
## Condition2RRAe      -1.167e+00  4.569e+00  -0.255 0.798505    
## Condition2RRAn       1.533e+00  4.526e+00   0.339 0.734882    
## Condition2RRNn       1.807e+00  3.768e+00   0.480 0.631571    
## BldgType2fmCon      -1.817e+00  8.579e-01  -2.118 0.034355 *  
## BldgTypeDuplex      -4.733e+00  7.774e-01  -6.089 1.50e-09 ***
## BldgTypeTwnhs       -3.808e+00  7.631e-01  -4.990 6.89e-07 ***
## BldgTypeTwnhsE      -2.642e+00  5.025e-01  -5.257 1.72e-07 ***
## HouseStyle1.5Unf     2.405e+00  1.231e+00   1.954 0.050934 .  
## HouseStyle1Story     2.230e+00  5.548e-01   4.020 6.16e-05 ***
## HouseStyle2.5Fin    -4.718e+00  1.749e+00  -2.698 0.007061 ** 
## HouseStyle2.5Unf    -1.392e+00  1.272e+00  -1.094 0.274007    
## HouseStyle2Story    -5.351e-01  4.611e-01  -1.160 0.246073    
## HouseStyleSFoyer     1.853e+00  9.027e-01   2.053 0.040315 *  
## HouseStyleSLvl       1.294e+00  6.733e-01   1.922 0.054838 .  
## RoofMatlCompShg      8.637e+01  4.128e+00  20.922  < 2e-16 ***
## RoofMatlMembran      8.869e+01  5.465e+00  16.228  < 2e-16 ***
## RoofMatlMetal        8.894e+01  5.488e+00  16.206  < 2e-16 ***
## RoofMatlRoll         8.892e+01  5.528e+00  16.084  < 2e-16 ***
## RoofMatlTar&Grv      8.396e+01  4.316e+00  19.454  < 2e-16 ***
## RoofMatlWdShake      8.656e+01  4.479e+00  19.324  < 2e-16 ***
## RoofMatlWdShngl      9.277e+01  4.335e+00  21.402  < 2e-16 ***
## ExterQualFa         -4.088e+00  1.620e+00  -2.523 0.011750 *  
## ExterQualGd         -3.551e+00  6.783e-01  -5.235 1.93e-07 ***
## ExterQualTA         -3.620e+00  7.542e-01  -4.799 1.78e-06 ***
## BsmtQualFa          -2.859e+00  9.097e-01  -3.143 0.001713 ** 
## BsmtQualGd          -3.579e+00  4.696e-01  -7.622 4.87e-14 ***
## BsmtQualTA          -3.555e+00  5.744e-01  -6.188 8.20e-10 ***
## BsmtExposureGd       2.329e+00  4.321e-01   5.389 8.44e-08 ***
## BsmtExposureMn      -4.716e-01  4.446e-01  -1.061 0.288956    
## BsmtExposureNo      -1.045e+00  3.170e-01  -3.296 0.001007 ** 
## KitchenQualFa       -4.086e+00  9.502e-01  -4.300 1.84e-05 ***
## KitchenQualGd       -3.684e+00  5.058e-01  -7.283 5.72e-13 ***
## KitchenQualTA       -3.858e+00  5.678e-01  -6.795 1.66e-11 ***
## GarageQualFa        -1.288e+01  4.309e+00  -2.989 0.002854 ** 
## GarageQualGd        -1.141e+01  4.426e+00  -2.578 0.010055 *  
## GarageQualPo        -1.435e+01  5.121e+00  -2.802 0.005154 ** 
## GarageQualTA        -1.288e+01  4.270e+00  -3.017 0.002604 ** 
## GarageCondFa         1.213e+01  4.990e+00   2.431 0.015202 *  
## GarageCondGd         1.210e+01  5.157e+00   2.346 0.019148 *  
## GarageCondPo         1.155e+01  5.304e+00   2.177 0.029654 *  
## GarageCondTA         1.256e+01  4.934e+00   2.546 0.011028 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.463 on 1265 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9066, Adjusted R-squared:  0.9005 
## F-statistic: 149.7 on 82 and 1265 DF,  p-value: < 2.2e-16
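
To track how one predictor’s significance moves from model to model without scanning the full summaries, we can pull its row out of the coefficient table. This is a small sketch under assumptions: coef_row is a hypothetical helper, and mtcars stands in for the housing data, with hp playing the role of X1stFlrSF and qsec the role of LandSlope.

```r
# Hypothetical helper: pull one coefficient's estimate and p-value from a fit.
coef_row <- function(model, term) {
  coef(summary(model))[term, c("Estimate", "Pr(>|t|)")]
}

# Toy stand-in (mtcars, base R) for the Model 7 -> Model 8 step.
before <- lm(mpg ~ wt + hp + qsec, data = mtcars)
after <- update(before, . ~ . - qsec)

# Stack the two rows so the change is easy to read side by side.
shift <- rbind(before = coef_row(before, "hp"),
               after = coef_row(after, "hp"))
print(shift)
```

For the step discussed in this section the calls would be coef_row(lm7, "X1stFlrSF") and coef_row(lm8, "X1stFlrSF").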


Model 9

In Model 9 below we remove X1stFlrSF. We’re starting to see an uptick in the residual spread, but only in the extremes (the max rises from 25.529 to 26.279). We can safely ignore that, since we’re only concerned that the interquartile spread stays tight and evenly borders zero.

lm9 <- update(lm8, .~. -X1stFlrSF, data=train)
summary(lm9)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     Neighborhood + Condition2 + BldgType + HouseStyle + RoofMatl + 
##     ExterQual + BsmtQual + BsmtExposure + KitchenQual + GarageQual + 
##     GarageCond, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -25.119  -1.413   0.061   1.463  26.279 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.238e+02  1.998e+01 -11.204  < 2e-16 ***
## LotArea              6.418e-05  1.116e-05   5.749 1.12e-08 ***
## OverallQual          1.184e+00  1.472e-01   8.046 1.95e-15 ***
## OverallCond          9.913e-01  1.106e-01   8.960  < 2e-16 ***
## YearBuilt            6.801e-02  9.270e-03   7.337 3.90e-13 ***
## BsmtUnfSF           -2.327e-03  2.717e-04  -8.564  < 2e-16 ***
## TotalBsmtSF          4.215e-03  5.593e-04   7.537 9.10e-14 ***
## GrLivArea            8.691e-03  4.423e-04  19.650  < 2e-16 ***
## GarageArea           3.557e-03  7.490e-04   4.750 2.27e-06 ***
## MSZoningFV           5.435e+00  1.776e+00   3.061 0.002256 ** 
## MSZoningRH           4.302e+00  1.880e+00   2.288 0.022291 *  
## MSZoningRL           4.349e+00  1.539e+00   2.827 0.004775 ** 
## MSZoningRM           3.593e+00  1.448e+00   2.481 0.013216 *  
## NeighborhoodBlueste -2.753e-01  2.729e+00  -0.101 0.919662    
## NeighborhoodBrDale   4.020e-01  1.552e+00   0.259 0.795618    
## NeighborhoodBrkSide  2.516e-01  1.343e+00   0.187 0.851382    
## NeighborhoodClearCr -2.312e+00  1.272e+00  -1.817 0.069380 .  
## NeighborhoodCollgCr -1.457e+00  1.005e+00  -1.449 0.147665    
## NeighborhoodCrawfor  1.961e+00  1.176e+00   1.667 0.095718 .  
## NeighborhoodEdwards -2.825e+00  1.121e+00  -2.521 0.011835 *  
## NeighborhoodGilbert -1.227e+00  1.072e+00  -1.144 0.252680    
## NeighborhoodIDOTRR   1.265e-01  1.542e+00   0.082 0.934632    
## NeighborhoodMeadowV -1.067e+00  1.516e+00  -0.704 0.481672    
## NeighborhoodMitchel -2.800e+00  1.149e+00  -2.436 0.014968 *  
## NeighborhoodNAmes   -2.761e+00  1.074e+00  -2.570 0.010291 *  
## NeighborhoodNoRidge  4.285e+00  1.135e+00   3.776 0.000167 ***
## NeighborhoodNPkVill  3.024e-01  1.518e+00   0.199 0.842131    
## NeighborhoodNridgHt  2.507e+00  1.034e+00   2.426 0.015410 *  
## NeighborhoodNWAmes  -3.231e+00  1.096e+00  -2.949 0.003250 ** 
## NeighborhoodOldTown -1.684e+00  1.367e+00  -1.232 0.218014    
## NeighborhoodSawyer  -2.404e+00  1.136e+00  -2.116 0.034577 *  
## NeighborhoodSawyerW -1.610e+00  1.083e+00  -1.487 0.137336    
## NeighborhoodSomerst  1.199e-01  1.234e+00   0.097 0.922605    
## NeighborhoodStoneBr  4.800e+00  1.142e+00   4.204 2.81e-05 ***
## NeighborhoodSWISU   -1.685e+00  1.404e+00  -1.200 0.230228    
## NeighborhoodTimber  -2.325e+00  1.140e+00  -2.040 0.041562 *  
## NeighborhoodVeenker -3.800e-01  1.431e+00  -0.266 0.790653    
## Condition2Feedr     -7.199e-01  3.217e+00  -0.224 0.822980    
## Condition2Norm       1.648e+00  2.789e+00   0.591 0.554755    
## Condition2PosA       5.407e+00  4.786e+00   1.130 0.258754    
## Condition2PosN      -2.964e+01  3.785e+00  -7.831 1.02e-14 ***
## Condition2RRAe      -2.972e-02  4.585e+00  -0.006 0.994830    
## Condition2RRAn       1.367e+00  4.550e+00   0.300 0.763852    
## Condition2RRNn       1.924e+00  3.789e+00   0.508 0.611728    
## BldgType2fmCon      -2.007e+00  8.612e-01  -2.331 0.019919 *  
## BldgTypeDuplex      -4.705e+00  7.816e-01  -6.019 2.30e-09 ***
## BldgTypeTwnhs       -3.925e+00  7.667e-01  -5.119 3.56e-07 ***
## BldgTypeTwnhsE      -2.754e+00  5.045e-01  -5.459 5.74e-08 ***
## HouseStyle1.5Unf     1.475e+00  1.214e+00   1.215 0.224644    
## HouseStyle1Story     1.078e+00  4.716e-01   2.286 0.022442 *  
## HouseStyle2.5Fin    -3.806e+00  1.742e+00  -2.184 0.029109 *  
## HouseStyle2.5Unf    -8.240e-01  1.270e+00  -0.649 0.516709    
## HouseStyle2Story     7.708e-02  4.358e-01   0.177 0.859656    
## HouseStyleSFoyer     8.558e-01  8.703e-01   0.983 0.325638    
## HouseStyleSLvl       2.705e-01  6.231e-01   0.434 0.664240    
## RoofMatlCompShg      8.522e+01  4.140e+00  20.582  < 2e-16 ***
## RoofMatlMembran      8.753e+01  5.487e+00  15.951  < 2e-16 ***
## RoofMatlMetal        8.763e+01  5.508e+00  15.910  < 2e-16 ***
## RoofMatlRoll         8.817e+01  5.556e+00  15.870  < 2e-16 ***
## RoofMatlTar&Grv      8.230e+01  4.319e+00  19.056  < 2e-16 ***
## RoofMatlWdShake      8.484e+01  4.482e+00  18.928  < 2e-16 ***
## RoofMatlWdShngl      9.177e+01  4.351e+00  21.091  < 2e-16 ***
## ExterQualFa         -3.966e+00  1.629e+00  -2.435 0.015031 *  
## ExterQualGd         -3.548e+00  6.820e-01  -5.202 2.29e-07 ***
## ExterQualTA         -3.693e+00  7.581e-01  -4.872 1.25e-06 ***
## BsmtQualFa          -3.187e+00  9.108e-01  -3.499 0.000484 ***
## BsmtQualGd          -3.647e+00  4.718e-01  -7.729 2.19e-14 ***
## BsmtQualTA          -3.644e+00  5.771e-01  -6.313 3.77e-10 ***
## BsmtExposureGd       2.353e+00  4.345e-01   5.415 7.33e-08 ***
## BsmtExposureMn      -3.644e-01  4.462e-01  -0.817 0.414241    
## BsmtExposureNo      -9.608e-01  3.180e-01  -3.022 0.002565 ** 
## KitchenQualFa       -3.950e+00  9.548e-01  -4.137 3.75e-05 ***
## KitchenQualGd       -3.676e+00  5.086e-01  -7.226 8.56e-13 ***
## KitchenQualTA       -3.820e+00  5.709e-01  -6.692 3.30e-11 ***
## GarageQualFa        -1.407e+01  4.322e+00  -3.256 0.001161 ** 
## GarageQualGd        -1.290e+01  4.433e+00  -2.911 0.003669 ** 
## GarageQualPo        -1.492e+01  5.147e+00  -2.899 0.003805 ** 
## GarageQualTA        -1.424e+01  4.279e+00  -3.329 0.000897 ***
## GarageCondFa         1.312e+01  5.012e+00   2.618 0.008944 ** 
## GarageCondGd         1.335e+01  5.175e+00   2.579 0.010011 *  
## GarageCondPo         1.235e+01  5.329e+00   2.318 0.020601 *  
## GarageCondTA         1.362e+01  4.954e+00   2.750 0.006044 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.482 on 1266 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9054, Adjusted R-squared:  0.8994 
## F-statistic: 149.7 on 81 and 1266 DF,  p-value: < 2.2e-16
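
The residual check we keep making by eye can also be computed directly. The sketch below is an illustration on the built-in mtcars data. resid_spread is a hypothetical helper that separates the interquartile range, which we care about, from the full min-to-max range, which we ignore.

```r
# Hypothetical helper: quartiles, IQR, and full range of a model's residuals.
resid_spread <- function(model) {
  q <- quantile(residuals(model), probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(min = q[1], q1 = q[2], median = q[3], q3 = q[4], max = q[5],
    iqr = q[4] - q[2], range = q[5] - q[1])
}

# Toy fit on mtcars (base R) standing in for lm9.
fit <- lm(mpg ~ wt + hp, data = mtcars)
spread <- resid_spread(fit)
print(round(spread, 3))
```

Calling resid_spread(lm9) should reproduce the quartiles shown in the Residuals block of summary(lm9), with the IQR and range added.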


Model 10

In Model 10 below we remove HouseStyle. There are no meaningful changes, so let’s keep simplifying this model.
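
The per-level rows in summary() make it hard to judge a whole factor such as HouseStyle at once. One option, which is an alternative we’re sketching here and not the method used in this write-up, is drop1() with an F-test, which reports one test per term. The example uses the built-in mtcars data, and cars, fit, tests, and weakest are illustrative names.

```r
# Toy stand-in for the housing data, with two factor predictors.
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))
fit <- lm(mpg ~ wt + hp + cyl + gear, data = cars)

# One F-test per term: each multi-level factor is tested as a whole.
tests <- drop1(fit, test = "F")
print(tests)

# Candidate for the next removal: the term with the largest p-value.
# The "<none>" row has no p-value (NA), and which.max() skips NAs.
p <- tests[["Pr(>F)"]]
weakest <- rownames(tests)[which.max(p)]
print(weakest)
```

On the housing data the equivalent call would be drop1(lm9, test = "F").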

lm10 <- update(lm9, .~. -HouseStyle, data=train)
summary(lm10)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     Neighborhood + Condition2 + BldgType + RoofMatl + ExterQual + 
##     BsmtQual + BsmtExposure + KitchenQual + GarageQual + GarageCond, 
##     data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.9550  -1.4096   0.0264   1.4135  27.0198 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.277e+02  1.979e+01 -11.508  < 2e-16 ***
## LotArea              6.488e-05  1.115e-05   5.821 7.38e-09 ***
## OverallQual          1.171e+00  1.448e-01   8.086 1.43e-15 ***
## OverallCond          9.947e-01  1.106e-01   8.991  < 2e-16 ***
## YearBuilt            6.977e-02  9.138e-03   7.635 4.42e-14 ***
## BsmtUnfSF           -2.339e-03  2.709e-04  -8.634  < 2e-16 ***
## TotalBsmtSF          5.186e-03  3.896e-04  13.310  < 2e-16 ***
## GrLivArea            7.805e-03  2.936e-04  26.586  < 2e-16 ***
## GarageArea           3.662e-03  7.419e-04   4.936 9.02e-07 ***
## MSZoningFV           5.377e+00  1.773e+00   3.033 0.002467 ** 
## MSZoningRH           4.479e+00  1.874e+00   2.390 0.016997 *  
## MSZoningRL           4.415e+00  1.536e+00   2.875 0.004106 ** 
## MSZoningRM           3.706e+00  1.445e+00   2.565 0.010443 *  
## NeighborhoodBlueste -6.045e-01  2.727e+00  -0.222 0.824609    
## NeighborhoodBrDale  -9.021e-02  1.533e+00  -0.059 0.953094    
## NeighborhoodBrkSide -8.693e-02  1.328e+00  -0.065 0.947807    
## NeighborhoodClearCr -2.545e+00  1.270e+00  -2.003 0.045340 *  
## NeighborhoodCollgCr -1.635e+00  1.005e+00  -1.627 0.103952    
## NeighborhoodCrawfor  1.874e+00  1.177e+00   1.592 0.111602    
## NeighborhoodEdwards -3.047e+00  1.117e+00  -2.729 0.006447 ** 
## NeighborhoodGilbert -1.457e+00  1.069e+00  -1.363 0.173049    
## NeighborhoodIDOTRR  -1.694e-01  1.534e+00  -0.110 0.912091    
## NeighborhoodMeadowV -1.328e+00  1.505e+00  -0.882 0.377816    
## NeighborhoodMitchel -3.000e+00  1.146e+00  -2.618 0.008938 ** 
## NeighborhoodNAmes   -2.856e+00  1.072e+00  -2.665 0.007800 ** 
## NeighborhoodNoRidge  4.210e+00  1.133e+00   3.715 0.000212 ***
## NeighborhoodNPkVill  3.192e-02  1.512e+00   0.021 0.983159    
## NeighborhoodNridgHt  2.291e+00  1.031e+00   2.221 0.026497 *  
## NeighborhoodNWAmes  -3.303e+00  1.094e+00  -3.020 0.002575 ** 
## NeighborhoodOldTown -1.988e+00  1.360e+00  -1.462 0.144104    
## NeighborhoodSawyer  -2.462e+00  1.133e+00  -2.174 0.029890 *  
## NeighborhoodSawyerW -1.759e+00  1.081e+00  -1.627 0.103918    
## NeighborhoodSomerst -5.575e-02  1.234e+00  -0.045 0.963981    
## NeighborhoodStoneBr  4.708e+00  1.143e+00   4.121 4.02e-05 ***
## NeighborhoodSWISU   -2.401e+00  1.379e+00  -1.741 0.081888 .  
## NeighborhoodTimber  -2.472e+00  1.140e+00  -2.168 0.030317 *  
## NeighborhoodVeenker -4.751e-01  1.431e+00  -0.332 0.739941    
## Condition2Feedr     -7.017e-01  3.148e+00  -0.223 0.823657    
## Condition2Norm       1.440e+00  2.725e+00   0.529 0.597208    
## Condition2PosA       4.731e+00  4.590e+00   1.031 0.302874    
## Condition2PosN      -2.970e+01  3.748e+00  -7.924 4.97e-15 ***
## Condition2RRAe      -2.197e-01  4.514e+00  -0.049 0.961194    
## Condition2RRAn       1.104e+00  4.513e+00   0.245 0.806839    
## Condition2RRNn       1.870e+00  3.735e+00   0.501 0.616768    
## BldgType2fmCon      -2.013e+00  8.586e-01  -2.345 0.019203 *  
## BldgTypeDuplex      -4.707e+00  7.603e-01  -6.191 8.06e-10 ***
## BldgTypeTwnhs       -4.020e+00  7.671e-01  -5.241 1.87e-07 ***
## BldgTypeTwnhsE      -2.748e+00  5.024e-01  -5.469 5.45e-08 ***
## RoofMatlCompShg      8.702e+01  4.030e+00  21.593  < 2e-16 ***
## RoofMatlMembran      8.961e+01  5.387e+00  16.634  < 2e-16 ***
## RoofMatlMetal        8.893e+01  5.444e+00  16.335  < 2e-16 ***
## RoofMatlRoll         8.997e+01  5.478e+00  16.425  < 2e-16 ***
## RoofMatlTar&Grv      8.447e+01  4.187e+00  20.175  < 2e-16 ***
## RoofMatlWdShake      8.727e+01  4.329e+00  20.160  < 2e-16 ***
## RoofMatlWdShngl      9.357e+01  4.253e+00  22.002  < 2e-16 ***
## ExterQualFa         -3.887e+00  1.622e+00  -2.397 0.016693 *  
## ExterQualGd         -3.581e+00  6.821e-01  -5.250 1.78e-07 ***
## ExterQualTA         -3.706e+00  7.587e-01  -4.885 1.16e-06 ***
## BsmtQualFa          -3.027e+00  9.068e-01  -3.337 0.000870 ***
## BsmtQualGd          -3.669e+00  4.694e-01  -7.816 1.14e-14 ***
## BsmtQualTA          -3.620e+00  5.758e-01  -6.286 4.46e-10 ***
## BsmtExposureGd       2.320e+00  4.347e-01   5.338 1.11e-07 ***
## BsmtExposureMn      -2.886e-01  4.297e-01  -0.672 0.501969    
## BsmtExposureNo      -8.966e-01  2.947e-01  -3.042 0.002397 ** 
## KitchenQualFa       -3.907e+00  9.541e-01  -4.095 4.49e-05 ***
## KitchenQualGd       -3.681e+00  5.083e-01  -7.243 7.60e-13 ***
## KitchenQualTA       -3.897e+00  5.702e-01  -6.835 1.27e-11 ***
## GarageQualFa        -1.154e+01  4.003e+00  -2.884 0.003994 ** 
## GarageQualGd        -1.033e+01  4.102e+00  -2.519 0.011895 *  
## GarageQualPo        -1.225e+01  4.890e+00  -2.506 0.012342 *  
## GarageQualTA        -1.184e+01  3.967e+00  -2.984 0.002896 ** 
## GarageCondFa         1.052e+01  4.757e+00   2.211 0.027232 *  
## GarageCondGd         1.043e+01  4.862e+00   2.145 0.032106 *  
## GarageCondPo         9.753e+00  5.078e+00   1.920 0.055032 .  
## GarageCondTA         1.105e+01  4.694e+00   2.355 0.018669 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.488 on 1273 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.9046, Adjusted R-squared:  0.899 
## F-statistic: 163.1 on 74 and 1273 DF,  p-value: < 2.2e-16


Model 11

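In Model 11 below we remove Condition2. Adjusted R-squared drops from 0.899 to 0.8878 and the minimum residual widens from about -25 to about -52, but we’ll keep simplifying the model.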

lm11 <- update(lm10, .~. -Condition2, data=train)
summary(lm11)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     Neighborhood + BldgType + RoofMatl + ExterQual + BsmtQual + 
##     BsmtExposure + KitchenQual + GarageQual + GarageCond, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -52.499  -1.417  -0.014   1.499  28.300 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -2.097e+02  2.033e+01 -10.315  < 2e-16 ***
## LotArea              6.009e-05  1.173e-05   5.122 3.48e-07 ***
## OverallQual          1.138e+00  1.514e-01   7.519 1.04e-13 ***
## OverallCond          9.780e-01  1.164e-01   8.400  < 2e-16 ***
## YearBuilt            6.456e-02  9.545e-03   6.763 2.05e-11 ***
## BsmtUnfSF           -2.127e-03  2.842e-04  -7.487 1.31e-13 ***
## TotalBsmtSF          4.759e-03  4.081e-04  11.661  < 2e-16 ***
## GrLivArea            7.500e-03  3.070e-04  24.430  < 2e-16 ***
## GarageArea           3.362e-03  7.769e-04   4.327 1.63e-05 ***
## MSZoningFV           4.972e+00  1.866e+00   2.664 0.007814 ** 
## MSZoningRH           4.189e+00  1.972e+00   2.123 0.033906 *  
## MSZoningRL           4.247e+00  1.616e+00   2.629 0.008670 ** 
## MSZoningRM           3.709e+00  1.523e+00   2.435 0.015012 *  
## NeighborhoodBlueste -8.735e-01  2.873e+00  -0.304 0.761176    
## NeighborhoodBrDale  -5.992e-01  1.613e+00  -0.372 0.710269    
## NeighborhoodBrkSide -9.817e-01  1.385e+00  -0.709 0.478451    
## NeighborhoodClearCr -2.548e+00  1.338e+00  -1.904 0.057184 .  
## NeighborhoodCollgCr -1.689e+00  1.059e+00  -1.596 0.110843    
## NeighborhoodCrawfor  1.659e+00  1.240e+00   1.338 0.181124    
## NeighborhoodEdwards -3.819e+00  1.175e+00  -3.251 0.001182 ** 
## NeighborhoodGilbert -1.573e+00  1.126e+00  -1.397 0.162634    
## NeighborhoodIDOTRR  -9.880e-01  1.610e+00  -0.614 0.539459    
## NeighborhoodMeadowV -1.785e+00  1.584e+00  -1.127 0.259971    
## NeighborhoodMitchel -3.121e+00  1.207e+00  -2.585 0.009846 ** 
## NeighborhoodNAmes   -3.149e+00  1.129e+00  -2.789 0.005368 ** 
## NeighborhoodNoRidge  4.667e+00  1.193e+00   3.912 9.63e-05 ***
## NeighborhoodNPkVill -8.470e-02  1.594e+00  -0.053 0.957624    
## NeighborhoodNridgHt  2.558e+00  1.086e+00   2.355 0.018682 *  
## NeighborhoodNWAmes  -3.338e+00  1.151e+00  -2.899 0.003804 ** 
## NeighborhoodOldTown -2.617e+00  1.430e+00  -1.830 0.067535 .  
## NeighborhoodSawyer  -2.821e+00  1.192e+00  -2.367 0.018068 *  
## NeighborhoodSawyerW -1.814e+00  1.139e+00  -1.592 0.111559    
## NeighborhoodSomerst  1.877e-01  1.301e+00   0.144 0.885251    
## NeighborhoodStoneBr  5.158e+00  1.203e+00   4.286 1.95e-05 ***
## NeighborhoodSWISU   -2.902e+00  1.452e+00  -1.998 0.045889 *  
## NeighborhoodTimber  -2.255e+00  1.201e+00  -1.878 0.060675 .  
## NeighborhoodVeenker -4.527e-01  1.508e+00  -0.300 0.764143    
## BldgType2fmCon      -2.005e+00  8.677e-01  -2.310 0.021036 *  
## BldgTypeDuplex      -4.553e+00  7.737e-01  -5.884 5.11e-09 ***
## BldgTypeTwnhs       -4.237e+00  8.080e-01  -5.244 1.84e-07 ***
## BldgTypeTwnhsE      -2.895e+00  5.287e-01  -5.475 5.26e-08 ***
## RoofMatlCompShg      8.149e+01  4.217e+00  19.326  < 2e-16 ***
## RoofMatlMembran      8.416e+01  5.656e+00  14.879  < 2e-16 ***
## RoofMatlMetal        8.314e+01  5.713e+00  14.553  < 2e-16 ***
## RoofMatlRoll         8.446e+01  5.747e+00  14.698  < 2e-16 ***
## RoofMatlTar&Grv      7.917e+01  4.386e+00  18.050  < 2e-16 ***
## RoofMatlWdShake      8.186e+01  4.536e+00  18.048  < 2e-16 ***
## RoofMatlWdShngl      8.871e+01  4.460e+00  19.890  < 2e-16 ***
## ExterQualFa         -3.249e+00  1.702e+00  -1.908 0.056569 .  
## ExterQualGd         -2.777e+00  7.046e-01  -3.941 8.56e-05 ***
## ExterQualTA         -3.002e+00  7.860e-01  -3.819 0.000141 ***
## BsmtQualFa          -3.338e+00  9.502e-01  -3.513 0.000459 ***
## BsmtQualGd          -3.696e+00  4.938e-01  -7.484 1.33e-13 ***
## BsmtQualTA          -3.641e+00  6.053e-01  -6.016 2.33e-09 ***
## BsmtExposureGd       2.486e+00  4.577e-01   5.433 6.64e-08 ***
## BsmtExposureMn      -1.271e-01  4.526e-01  -0.281 0.778904    
## BsmtExposureNo      -7.870e-01  3.105e-01  -2.535 0.011367 *  
## KitchenQualFa       -3.889e+00  9.971e-01  -3.900 0.000101 ***
## KitchenQualGd       -3.621e+00  5.339e-01  -6.782 1.80e-11 ***
## KitchenQualTA       -3.894e+00  5.995e-01  -6.497 1.17e-10 ***
## GarageQualFa        -1.158e+01  4.218e+00  -2.746 0.006114 ** 
## GarageQualGd        -1.007e+01  4.303e+00  -2.341 0.019400 *  
## GarageQualPo        -1.237e+01  5.151e+00  -2.401 0.016508 *  
## GarageQualTA        -1.181e+01  4.179e+00  -2.826 0.004790 ** 
## GarageCondFa         1.033e+01  5.013e+00   2.061 0.039459 *  
## GarageCondGd         1.039e+01  5.120e+00   2.030 0.042566 *  
## GarageCondPo         9.800e+00  5.352e+00   1.831 0.067323 .  
## GarageCondTA         1.106e+01  4.946e+00   2.236 0.025515 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.677 on 1280 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.8934, Adjusted R-squared:  0.8878 
## F-statistic: 160.1 on 67 and 1280 DF,  p-value: < 2.2e-16
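
The residual range widens noticeably between the two summaries above: the minimum goes from about -25 in Model 10 to about -52 in Model 11. One hedged reading is that a Condition2 level had been absorbing one or two unusual houses. A minimal sketch for listing the largest residuals follows; `largest_residuals` is a hypothetical helper, and the built-in `mtcars` data stands in for `train` so the block runs on its own.

```r
# Hypothetical helper: return the k observations with the largest absolute
# residuals from a fitted lm, largest first.
largest_residuals <- function(fit, k = 3) {
  res <- residuals(fit)
  idx <- order(abs(res), decreasing = TRUE)[seq_len(min(k, length(res)))]
  data.frame(
    row      = names(res)[idx],
    residual = unname(res[idx]),
    fitted   = unname(fitted(fit)[idx])
  )
}

# Stand-in model on built-in data; in this project the call would be
# largest_residuals(lm11), which is an assumption about intended use.
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
top3 <- largest_residuals(demo_fit, k = 3)
print(top3)
```

The `row` column carries the row names of the data used in the fit, so the flagged houses can be looked up directly in `train`.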


Backwards Elimination

Look in the output below at how GarageCond, GarageQual, and MSZoning have lost significance, with GarageCond now the weakest, then GarageQual, then MSZoning. That tells me we did have overfitting from too many variables. We have our next three to remove and the order in which to remove them.

Otherwise, our residuals look great, with the median essentially zero and the 1st and 3rd quartiles tight around the median (remember we scaled our target value, SalePrice, to 0-100). Adjusted R-squared is going down, but that’s OK; housing is a complicated market. We want a generalizable model, not a perfect model.

My only concern at this stage is whether it would have been better to apply a power transformation to any of our variables. Maybe we can go back and evaluate pairs() and residuals when we get to the end of this Backwards Elimination process.
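
One way to check whether a squared term helps is to add it and compare the two fits. The sketch below does that on the built-in `mtcars` data as a stand-in; in this project the analogous step would add something like `I(GrLivArea^2)` to the current model, which is an illustration rather than a step the analysis has taken.

```r
# Stand-in data: mpg against horsepower is visibly curved in mtcars.
fit_linear    <- lm(mpg ~ hp, data = mtcars)
# I() protects ^ so it means arithmetic squaring inside a formula.
fit_quadratic <- update(fit_linear, . ~ . + I(hp^2))

adj_linear    <- summary(fit_linear)$adj.r.squared
adj_quadratic <- summary(fit_quadratic)$adj.r.squared
print(c(linear = adj_linear, quadratic = adj_quadratic))

# The models are nested and fit to the same rows, so an F-test on the
# extra term is a fair comparison.
power_test <- anova(fit_linear, fit_quadratic)
print(power_test)
```

If the squared term earns its place, Adjusted R-squared rises and the F-test p-value is small; if not, the linear form is the simpler choice.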


Model 12

In Model 12 below we remove Neighborhood. It’s the last of our triaged predictors to be removed, and its output is where we identify the next three to try removing. Out of its many levels, only a few values of Neighborhood have a definitive effect on the price. Individual p-values can’t tell us whether the factor as a whole is significant; that takes a joint test across all of its levels (a partial F-test), which we haven’t run at this point, so my guess that the combined p-value would not meet our significance criteria is untested.

lm12 <- update(lm11, .~. -Neighborhood, data=train)
summary(lm12)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     BldgType + RoofMatl + ExterQual + BsmtQual + BsmtExposure + 
##     KitchenQual + GarageQual + GarageCond, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.616  -1.609   0.013   1.546  29.672 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -1.949e+02  1.653e+01 -11.792  < 2e-16 ***
## LotArea          5.605e-05  1.207e-05   4.643 3.78e-06 ***
## OverallQual      1.556e+00  1.572e-01   9.898  < 2e-16 ***
## OverallCond      9.665e-01  1.227e-01   7.880 6.85e-15 ***
## YearBuilt        5.184e-02  7.715e-03   6.719 2.73e-11 ***
## BsmtUnfSF       -2.206e-03  2.938e-04  -7.510 1.10e-13 ***
## TotalBsmtSF      5.061e-03  4.201e-04  12.047  < 2e-16 ***
## GrLivArea        7.964e-03  3.097e-04  25.711  < 2e-16 ***
## GarageArea       4.132e-03  8.097e-04   5.103 3.83e-07 ***
## MSZoningFV       3.776e+00  1.623e+00   2.327 0.020143 *  
## MSZoningRH       2.932e+00  1.918e+00   1.529 0.126581    
## MSZoningRL       2.807e+00  1.529e+00   1.836 0.066569 .  
## MSZoningRM       2.076e+00  1.533e+00   1.355 0.175732    
## BldgType2fmCon  -2.235e+00  9.218e-01  -2.424 0.015475 *  
## BldgTypeDuplex  -4.737e+00  8.270e-01  -5.728 1.26e-08 ***
## BldgTypeTwnhs   -2.522e+00  7.165e-01  -3.520 0.000447 ***
## BldgTypeTwnhsE  -1.437e+00  4.548e-01  -3.159 0.001620 ** 
## RoofMatlCompShg  9.029e+01  4.456e+00  20.263  < 2e-16 ***
## RoofMatlMembran  9.204e+01  5.989e+00  15.367  < 2e-16 ***
## RoofMatlMetal    9.233e+01  6.035e+00  15.299  < 2e-16 ***
## RoofMatlRoll     9.231e+01  6.139e+00  15.037  < 2e-16 ***
## RoofMatlTar&Grv  8.678e+01  4.638e+00  18.709  < 2e-16 ***
## RoofMatlWdShake  8.817e+01  4.790e+00  18.406  < 2e-16 ***
## RoofMatlWdShngl  9.644e+01  4.723e+00  20.421  < 2e-16 ***
## ExterQualFa     -3.707e+00  1.833e+00  -2.023 0.043282 *  
## ExterQualGd     -2.976e+00  7.517e-01  -3.959 7.95e-05 ***
## ExterQualTA     -4.117e+00  8.362e-01  -4.923 9.59e-07 ***
## BsmtQualFa      -4.175e+00  1.006e+00  -4.148 3.57e-05 ***
## BsmtQualGd      -4.125e+00  5.136e-01  -8.031 2.14e-15 ***
## BsmtQualTA      -4.639e+00  6.271e-01  -7.397 2.48e-13 ***
## BsmtExposureGd   2.202e+00  4.877e-01   4.516 6.86e-06 ***
## BsmtExposureMn  -1.468e-01  4.878e-01  -0.301 0.763518    
## BsmtExposureNo  -7.930e-01  3.291e-01  -2.410 0.016098 *  
## KitchenQualFa   -3.759e+00  1.071e+00  -3.511 0.000463 ***
## KitchenQualGd   -3.958e+00  5.692e-01  -6.954 5.60e-12 ***
## KitchenQualTA   -4.338e+00  6.369e-01  -6.811 1.48e-11 ***
## GarageQualFa    -7.458e+00  4.566e+00  -1.633 0.102671    
## GarageQualGd    -6.087e+00  4.658e+00  -1.307 0.191532    
## GarageQualPo    -6.209e+00  5.557e+00  -1.117 0.264011    
## GarageQualTA    -7.759e+00  4.522e+00  -1.716 0.086444 .  
## GarageCondFa     6.086e+00  5.413e+00   1.124 0.261059    
## GarageCondGd     5.416e+00  5.525e+00   0.980 0.327127    
## GarageCondPo     4.580e+00  5.772e+00   0.793 0.427688    
## GarageCondTA     6.321e+00  5.338e+00   1.184 0.236543    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.008 on 1304 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.871,  Adjusted R-squared:  0.8667 
## F-statistic: 204.7 on 43 and 1304 DF,  p-value: < 2.2e-16
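
The question of a single p-value for all Neighborhood levels together is what a partial F-test answers. The two summaries above print enough to compute it by hand, since the residual sum of squares is the squared residual standard error times the residual degrees of freedom. The variable names below are made up for the illustration; the numbers are copied from the Model 11 and Model 12 outputs.

```r
# Values copied from the printed summaries of Model 11 (with Neighborhood)
# and Model 12 (without it). Both fits use 1348 rows.
sigma_full    <- 3.677; df_full    <- 1280
sigma_reduced <- 4.008; df_reduced <- 1304

# Residual sum of squares = (residual standard error)^2 * residual df.
rss_full    <- sigma_full^2 * df_full
rss_reduced <- sigma_reduced^2 * df_reduced

df_dropped <- df_reduced - df_full   # 24 Neighborhood coefficients
f_stat  <- ((rss_reduced - rss_full) / df_dropped) / (rss_full / df_full)
p_value <- pf(f_stat, df_dropped, df_full, lower.tail = FALSE)

print(c(F = f_stat, df1 = df_dropped, df2 = df_full, p = p_value))
```

With these rounded inputs the F statistic comes out at roughly 11 on 24 and 1280 degrees of freedom, far beyond conventional thresholds. That suggests Neighborhood as a whole is informative even though most individual levels are not, so removing it is a simplicity trade-off rather than a significance call. Assuming both models were fit to the same rows, `anova(lm12, lm11)` should report the same test from the fitted objects.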


Model 13

In Model 13 below, we remove GarageCond. Look in the output below at how Adjusted R-squared increased for the first time (0.8667 to 0.8669). The number of records excluded due to missing values is still 112, so the two Adjusted R-squareds are directly comparable. This is an improvement we’ll keep, and the residual interquartile range is still tightly banded around a median near zero.

lm13 <- update(lm12, .~. -GarageCond, data=train)
summary(lm13)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     BldgType + RoofMatl + ExterQual + BsmtQual + BsmtExposure + 
##     KitchenQual + GarageQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.623  -1.661   0.005   1.564  29.611 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -1.937e+02  1.632e+01 -11.864  < 2e-16 ***
## LotArea          5.621e-05  1.206e-05   4.662 3.46e-06 ***
## OverallQual      1.566e+00  1.569e-01   9.987  < 2e-16 ***
## OverallCond      9.773e-01  1.221e-01   8.007 2.59e-15 ***
## YearBuilt        5.203e-02  7.673e-03   6.781 1.80e-11 ***
## BsmtUnfSF       -2.195e-03  2.932e-04  -7.486 1.30e-13 ***
## TotalBsmtSF      5.043e-03  4.191e-04  12.031  < 2e-16 ***
## GrLivArea        7.981e-03  3.086e-04  25.859  < 2e-16 ***
## GarageArea       4.139e-03  8.091e-04   5.116 3.59e-07 ***
## MSZoningFV       3.776e+00  1.621e+00   2.330 0.019978 *  
## MSZoningRH       2.956e+00  1.917e+00   1.542 0.123243    
## MSZoningRL       2.808e+00  1.526e+00   1.840 0.065964 .  
## MSZoningRM       2.095e+00  1.530e+00   1.369 0.171168    
## BldgType2fmCon  -2.199e+00  9.194e-01  -2.392 0.016914 *  
## BldgTypeDuplex  -4.703e+00  8.259e-01  -5.695 1.53e-08 ***
## BldgTypeTwnhs   -2.515e+00  7.159e-01  -3.513 0.000458 ***
## BldgTypeTwnhsE  -1.420e+00  4.542e-01  -3.126 0.001811 ** 
## RoofMatlCompShg  9.026e+01  4.453e+00  20.272  < 2e-16 ***
## RoofMatlMembran  9.204e+01  5.985e+00  15.378  < 2e-16 ***
## RoofMatlMetal    9.235e+01  6.031e+00  15.311  < 2e-16 ***
## RoofMatlRoll     9.203e+01  6.081e+00  15.133  < 2e-16 ***
## RoofMatlTar&Grv  8.670e+01  4.634e+00  18.710  < 2e-16 ***
## RoofMatlWdShake  8.813e+01  4.787e+00  18.412  < 2e-16 ***
## RoofMatlWdShngl  9.715e+01  4.684e+00  20.739  < 2e-16 ***
## ExterQualFa     -3.652e+00  1.829e+00  -1.997 0.045996 *  
## ExterQualGd     -2.941e+00  7.498e-01  -3.923 9.22e-05 ***
## ExterQualTA     -4.094e+00  8.347e-01  -4.904 1.05e-06 ***
## BsmtQualFa      -4.177e+00  1.005e+00  -4.155 3.47e-05 ***
## BsmtQualGd      -4.089e+00  5.125e-01  -7.979 3.19e-15 ***
## BsmtQualTA      -4.584e+00  6.248e-01  -7.337 3.83e-13 ***
## BsmtExposureGd   2.180e+00  4.868e-01   4.479 8.15e-06 ***
## BsmtExposureMn  -1.266e-01  4.854e-01  -0.261 0.794242    
## BsmtExposureNo  -7.977e-01  3.286e-01  -2.427 0.015351 *  
## KitchenQualFa   -3.827e+00  1.068e+00  -3.582 0.000354 ***
## KitchenQualGd   -4.002e+00  5.654e-01  -7.078 2.39e-12 ***
## KitchenQualTA   -4.378e+00  6.332e-01  -6.914 7.35e-12 ***
## GarageQualFa    -3.168e+00  2.488e+00  -1.273 0.203214    
## GarageQualGd    -1.834e+00  2.645e+00  -0.693 0.488131    
## GarageQualPo    -3.381e+00  3.408e+00  -0.992 0.321290    
## GarageQualTA    -3.242e+00  2.417e+00  -1.341 0.180071    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.005 on 1308 degrees of freedom
##   (112 observations deleted due to missingness)
## Multiple R-squared:  0.8707, Adjusted R-squared:  0.8669 
## F-statistic: 225.9 on 39 and 1308 DF,  p-value: < 2.2e-16


Model 14

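In Model 14 below, we remove GarageQual. With both GarageCond and GarageQual gone, the records excluded due to missing values drop from 112 to 38, so this Adjusted R-squared (0.8702) is not directly comparable with the earlier ones. Let’s continue after.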

lm14 <- update(lm13, .~. -GarageQual, data=train)
summary(lm14)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + MSZoning + 
##     BldgType + RoofMatl + ExterQual + BsmtQual + BsmtExposure + 
##     KitchenQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.997  -1.668   0.026   1.615  30.387 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -1.903e+02  1.480e+01 -12.856  < 2e-16 ***
## LotArea          5.578e-05  1.188e-05   4.696 2.92e-06 ***
## OverallQual      1.495e+00  1.470e-01  10.166  < 2e-16 ***
## OverallCond      9.202e-01  1.127e-01   8.166 7.09e-16 ***
## YearBuilt        5.036e-02  7.002e-03   7.192 1.04e-12 ***
## BsmtUnfSF       -2.206e-03  2.849e-04  -7.745 1.83e-14 ***
## TotalBsmtSF      4.938e-03  4.091e-04  12.069  < 2e-16 ***
## GrLivArea        7.813e-03  2.937e-04  26.604  < 2e-16 ***
## GarageArea       3.642e-03  6.783e-04   5.369 9.27e-08 ***
## MSZoningFV       3.728e+00  1.458e+00   2.557 0.010677 *  
## MSZoningRH       2.465e+00  1.664e+00   1.482 0.138592    
## MSZoningRL       2.828e+00  1.355e+00   2.088 0.036981 *  
## MSZoningRM       2.042e+00  1.359e+00   1.503 0.133045    
## BldgType2fmCon  -2.061e+00  7.822e-01  -2.635 0.008517 ** 
## BldgTypeDuplex  -3.754e+00  7.016e-01  -5.351 1.02e-07 ***
## BldgTypeTwnhs   -2.501e+00  6.732e-01  -3.715 0.000212 ***
## BldgTypeTwnhsE  -1.525e+00  4.439e-01  -3.436 0.000607 ***
## RoofMatlCompShg  8.889e+01  4.396e+00  20.220  < 2e-16 ***
## RoofMatlMembran  9.079e+01  5.917e+00  15.343  < 2e-16 ***
## RoofMatlMetal    9.091e+01  5.964e+00  15.242  < 2e-16 ***
## RoofMatlRoll     8.966e+01  5.996e+00  14.953  < 2e-16 ***
## RoofMatlTar&Grv  8.590e+01  4.565e+00  18.816  < 2e-16 ***
## RoofMatlWdShake  8.690e+01  4.729e+00  18.377  < 2e-16 ***
## RoofMatlWdShngl  9.661e+01  4.619e+00  20.914  < 2e-16 ***
## ExterQualFa     -3.782e+00  1.529e+00  -2.474 0.013480 *  
## ExterQualGd     -3.219e+00  7.404e-01  -4.348 1.47e-05 ***
## ExterQualTA     -4.589e+00  8.193e-01  -5.601 2.57e-08 ***
## BsmtQualFa      -4.396e+00  9.537e-01  -4.609 4.42e-06 ***
## BsmtQualGd      -4.275e+00  5.055e-01  -8.456  < 2e-16 ***
## BsmtQualTA      -4.810e+00  6.071e-01  -7.922 4.74e-15 ***
## BsmtExposureGd   2.234e+00  4.726e-01   4.727 2.51e-06 ***
## BsmtExposureMn  -5.700e-02  4.723e-01  -0.121 0.903950    
## BsmtExposureNo  -7.758e-01  3.189e-01  -2.432 0.015128 *  
## KitchenQualFa   -3.555e+00  9.506e-01  -3.740 0.000192 ***
## KitchenQualGd   -3.905e+00  5.459e-01  -7.155 1.36e-12 ***
## KitchenQualTA   -4.354e+00  6.075e-01  -7.168 1.24e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.974 on 1386 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.8734, Adjusted R-squared:  0.8702 
## F-statistic: 273.1 on 35 and 1386 DF,  p-value: < 2.2e-16
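
Dropping GarageQual changes the rows in the fit (112 excluded before, 38 after), so the Adjusted R-squared values on either side are computed on different data. A hedged sketch of one way to make the comparison like-for-like is to refit the smaller model on the rows the larger model used. The built-in `airquality` data stands in for `train` because it also has missing values; the object names are illustrative.

```r
# Larger model: rows with NA in Ozone or Solar.R are dropped by lm().
fit_big   <- lm(Ozone ~ Temp + Wind + Solar.R, data = airquality)
# Smaller model refit naively: Solar.R's NAs no longer matter, so more
# rows are used.
fit_small <- update(fit_big, . ~ . - Solar.R)

# Keep only rows that are complete for every variable in the larger model.
vars_big  <- all.vars(formula(fit_big))
same_data <- airquality[complete.cases(airquality[, vars_big]), ]

# Smaller model restricted to those rows.
fit_small_same <- update(fit_small, data = same_data)

print(c(big = nobs(fit_big), small = nobs(fit_small),
        small_same_rows = nobs(fit_small_same)))
print(c(big = summary(fit_big)$adj.r.squared,
        small_same_rows = summary(fit_small_same)$adj.r.squared))
```

In this project the analogous step would subset `train` to the rows used by `lm13` before refitting the model without GarageQual.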


Model 15

After removing MSZoning below, Adjusted R-squared barely moves (0.8702 to 0.8695) with four fewer coefficients, so it looks like Model 15 is our best so far.

Let’s keep going to see if we’ve hit the inflection point between simplicity and accuracy.

lm15 <- update(lm14, .~. -MSZoning, data=train)
summary(lm15)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + BldgType + 
##     RoofMatl + ExterQual + BsmtQual + BsmtExposure + KitchenQual, 
##     data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -59.179  -1.679   0.000   1.623  30.292 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.034e+02  1.387e+01 -14.665  < 2e-16 ***
## LotArea          5.830e-05  1.185e-05   4.919 9.71e-07 ***
## OverallQual      1.525e+00  1.466e-01  10.398  < 2e-16 ***
## OverallCond      9.316e-01  1.122e-01   8.302 2.40e-16 ***
## YearBuilt        5.797e-02  6.501e-03   8.917  < 2e-16 ***
## BsmtUnfSF       -2.269e-03  2.840e-04  -7.989 2.83e-15 ***
## TotalBsmtSF      4.988e-03  4.009e-04  12.445  < 2e-16 ***
## GrLivArea        7.824e-03  2.939e-04  26.625  < 2e-16 ***
## GarageArea       3.542e-03  6.748e-04   5.248 1.77e-07 ***
## BldgType2fmCon  -2.060e+00  7.823e-01  -2.633 0.008548 ** 
## BldgTypeDuplex  -3.721e+00  7.008e-01  -5.310 1.27e-07 ***
## BldgTypeTwnhs   -2.782e+00  6.453e-01  -4.311 1.74e-05 ***
## BldgTypeTwnhsE  -1.717e+00  4.262e-01  -4.028 5.93e-05 ***
## RoofMatlCompShg  8.927e+01  4.386e+00  20.351  < 2e-16 ***
## RoofMatlMembran  9.137e+01  5.916e+00  15.443  < 2e-16 ***
## RoofMatlMetal    9.144e+01  5.962e+00  15.338  < 2e-16 ***
## RoofMatlRoll     9.025e+01  5.987e+00  15.074  < 2e-16 ***
## RoofMatlTar&Grv  8.644e+01  4.554e+00  18.982  < 2e-16 ***
## RoofMatlWdShake  8.730e+01  4.717e+00  18.505  < 2e-16 ***
## RoofMatlWdShngl  9.697e+01  4.614e+00  21.014  < 2e-16 ***
## ExterQualFa     -4.444e+00  1.501e+00  -2.960 0.003126 ** 
## ExterQualGd     -3.209e+00  7.421e-01  -4.324 1.64e-05 ***
## ExterQualTA     -4.582e+00  8.196e-01  -5.591 2.71e-08 ***
## BsmtQualFa      -4.128e+00  9.515e-01  -4.338 1.54e-05 ***
## BsmtQualGd      -4.169e+00  5.053e-01  -8.251 3.62e-16 ***
## BsmtQualTA      -4.651e+00  6.056e-01  -7.681 2.97e-14 ***
## BsmtExposureGd   2.190e+00  4.729e-01   4.631 3.99e-06 ***
## BsmtExposureMn   2.576e-02  4.722e-01   0.055 0.956493    
## BsmtExposureNo  -6.799e-01  3.180e-01  -2.138 0.032712 *  
## KitchenQualFa   -3.435e+00  9.523e-01  -3.607 0.000321 ***
## KitchenQualGd   -3.856e+00  5.467e-01  -7.053 2.76e-12 ***
## KitchenQualTA   -4.334e+00  6.090e-01  -7.117 1.77e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.985 on 1390 degrees of freedom
##   (38 observations deleted due to missingness)
## Multiple R-squared:  0.8723, Adjusted R-squared:  0.8695 
## F-statistic: 306.3 on 31 and 1390 DF,  p-value: < 2.2e-16


Model 16

Removing BsmtExposure in Model 16 below gave us back an additional observation, our Adjusted-R-squared didn’t drop meaningfully, and the residual interquartile range is still tight.

lm16 <- update(lm15, .~. -BsmtExposure, data=train)
summary(lm16)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + BldgType + 
##     RoofMatl + ExterQual + BsmtQual + KitchenQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.322  -1.702   0.000   1.592  29.655 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.054e+02  1.402e+01 -14.657  < 2e-16 ***
## LotArea          7.263e-05  1.181e-05   6.152 9.98e-10 ***
## OverallQual      1.600e+00  1.483e-01  10.788  < 2e-16 ***
## OverallCond      9.463e-01  1.138e-01   8.318  < 2e-16 ***
## YearBuilt        5.786e-02  6.570e-03   8.808  < 2e-16 ***
## BsmtUnfSF       -2.741e-03  2.791e-04  -9.821  < 2e-16 ***
## TotalBsmtSF      5.618e-03  3.954e-04  14.208  < 2e-16 ***
## GrLivArea        7.644e-03  2.952e-04  25.899  < 2e-16 ***
## GarageArea       3.566e-03  6.831e-04   5.221 2.05e-07 ***
## BldgType2fmCon  -1.845e+00  7.903e-01  -2.334 0.019722 *  
## BldgTypeDuplex  -3.290e+00  7.071e-01  -4.653 3.58e-06 ***
## BldgTypeTwnhs   -2.806e+00  6.538e-01  -4.292 1.89e-05 ***
## BldgTypeTwnhsE  -1.652e+00  4.321e-01  -3.823 0.000138 ***
## RoofMatlCompShg  9.066e+01  4.445e+00  20.396  < 2e-16 ***
## RoofMatlMembran  9.423e+01  5.987e+00  15.739  < 2e-16 ***
## RoofMatlMetal    9.503e+01  6.025e+00  15.773  < 2e-16 ***
## RoofMatlRoll     9.127e+01  6.073e+00  15.030  < 2e-16 ***
## RoofMatlTar&Grv  8.951e+01  4.598e+00  19.468  < 2e-16 ***
## RoofMatlWdShake  8.886e+01  4.781e+00  18.589  < 2e-16 ***
## RoofMatlWdShngl  9.930e+01  4.668e+00  21.273  < 2e-16 ***
## ExterQualFa     -4.057e+00  1.522e+00  -2.666 0.007769 ** 
## ExterQualGd     -3.395e+00  7.501e-01  -4.526 6.53e-06 ***
## ExterQualTA     -4.663e+00  8.296e-01  -5.620 2.30e-08 ***
## BsmtQualFa      -4.413e+00  9.634e-01  -4.581 5.04e-06 ***
## BsmtQualGd      -4.433e+00  5.111e-01  -8.674  < 2e-16 ***
## BsmtQualTA      -5.111e+00  6.095e-01  -8.386  < 2e-16 ***
## KitchenQualFa   -3.283e+00  9.653e-01  -3.401 0.000691 ***
## KitchenQualGd   -3.673e+00  5.522e-01  -6.652 4.15e-11 ***
## KitchenQualTA   -4.250e+00  6.164e-01  -6.894 8.18e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.043 on 1394 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.8682, Adjusted R-squared:  0.8655 
## F-statistic: 327.8 on 28 and 1394 DF,  p-value: < 2.2e-16


Poke Holes in our Model

Model 16 is good; however, we want to compare it to what Model 17 could be by looking at the Adjusted R-squared and the number of excluded records for each predictor we might remove next.


Model 17a

In reviewing Model 16’s output above, between BldgType and ExterQual, it looks like BldgType has less significance for the model so we’ll try removing that first.

lm17a <- update(lm16, .~. -BldgType, data=train)

summary(lm17a)$adj.r.squared
## [1] 0.8605953
length(summary(lm17a)$na.action)
## [1] 37
summary(lm16)$adj.r.squared
## [1] 0.8655162
length(summary(lm16)$na.action)
## [1] 37


Model 17b

Removing either of them didn’t have a significant impact on Adjusted R-squared (0.8606 without BldgType and 0.8626 without ExterQual, against 0.8655 for Model 16), so it looks like we can remove both of them.

lm17b <- update(lm16, .~. -ExterQual, data=train)

summary(lm17b)$adj.r.squared
## [1] 0.8626119
length(summary(lm17b)$na.action)
## [1] 37
summary(lm16)$adj.r.squared
## [1] 0.8655162
length(summary(lm16)$na.action)
## [1] 37
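
The side-by-side checks above can be collected into one table. `compare_models` below is a hypothetical helper, not part of the original analysis, and it is demonstrated on the built-in `airquality` data so the block runs without the competition files.

```r
# Hypothetical helper: one row per model with the numbers compared above.
compare_models <- function(models) {
  data.frame(
    model      = names(models),
    adj_r2     = sapply(models, function(m) summary(m)$adj.r.squared),
    n_excluded = sapply(models, function(m) length(m$na.action)),
    n_used     = sapply(models, nobs),
    row.names  = NULL
  )
}

# Stand-in models: a base fit and two single-term removals.
base_fit <- lm(Ozone ~ Temp + Wind + Solar.R, data = airquality)
candidates <- list(
  base     = base_fit,
  no_solar = update(base_fit, . ~ . - Solar.R),
  no_wind  = update(base_fit, . ~ . - Wind)
)
comparison <- compare_models(candidates)
print(comparison)
```

Here the analogous call would be `compare_models(list(lm16 = lm16, lm17a = lm17a, lm17b = lm17b))`. The `n_excluded` column makes it obvious when two Adjusted R-squared values come from different sets of rows.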


Model 17

Removing ExterQual actually had less of an impact than removing BldgType so we’ll take out ExterQual first.

lm17 <- update(lm16, .~. -ExterQual, data=train)
summary(lm17)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + BldgType + 
##     RoofMatl + BsmtQual + KitchenQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.116  -1.746  -0.048   1.598  27.672 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.153e+02  1.395e+01 -15.430  < 2e-16 ***
## LotArea          7.164e-05  1.193e-05   6.007 2.41e-09 ***
## OverallQual      1.761e+00  1.456e-01  12.096  < 2e-16 ***
## OverallCond      9.617e-01  1.138e-01   8.453  < 2e-16 ***
## YearBuilt        6.131e-02  6.569e-03   9.334  < 2e-16 ***
## BsmtUnfSF       -2.613e-03  2.805e-04  -9.314  < 2e-16 ***
## TotalBsmtSF      5.756e-03  3.986e-04  14.443  < 2e-16 ***
## GrLivArea        7.693e-03  2.980e-04  25.813  < 2e-16 ***
## GarageArea       3.648e-03  6.898e-04   5.288 1.44e-07 ***
## BldgType2fmCon  -1.747e+00  7.939e-01  -2.200 0.027965 *  
## BldgTypeDuplex  -3.436e+00  7.131e-01  -4.818 1.61e-06 ***
## BldgTypeTwnhs   -2.747e+00  6.607e-01  -4.158 3.41e-05 ***
## BldgTypeTwnhsE  -1.493e+00  4.355e-01  -3.428 0.000625 ***
## RoofMatlCompShg  8.988e+01  4.485e+00  20.041  < 2e-16 ***
## RoofMatlMembran  9.309e+01  6.045e+00  15.400  < 2e-16 ***
## RoofMatlMetal    9.416e+01  6.084e+00  15.478  < 2e-16 ***
## RoofMatlRoll     9.064e+01  6.131e+00  14.784  < 2e-16 ***
## RoofMatlTar&Grv  8.848e+01  4.639e+00  19.074  < 2e-16 ***
## RoofMatlWdShake  8.826e+01  4.821e+00  18.307  < 2e-16 ***
## RoofMatlWdShngl  9.831e+01  4.709e+00  20.875  < 2e-16 ***
## BsmtQualFa      -5.050e+00  9.642e-01  -5.237 1.88e-07 ***
## BsmtQualGd      -5.110e+00  4.950e-01 -10.322  < 2e-16 ***
## BsmtQualTA      -5.853e+00  6.004e-01  -9.748  < 2e-16 ***
## KitchenQualFa   -4.261e+00  9.434e-01  -4.517 6.80e-06 ***
## KitchenQualGd   -4.549e+00  5.223e-01  -8.710  < 2e-16 ***
## KitchenQualTA   -5.492e+00  5.836e-01  -9.411  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.086 on 1397 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.865,  Adjusted R-squared:  0.8626 
## F-statistic: 358.1 on 25 and 1397 DF,  p-value: < 2.2e-16


Model 18

After removing BldgType in Model 18 below, we’ve removed all of the predictors that don’t clearly impact the model; every remaining coefficient has a strong (low) p-value.

lm18 <- update(lm17, .~. -BldgType, data=train)
summary(lm18)
## 
## Call:
## lm(formula = ssp ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + 
##     BsmtQual + KitchenQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.125  -1.839  -0.052   1.714  27.907 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.101e+02  1.404e+01 -14.963  < 2e-16 ***
## LotArea          8.286e-05  1.190e-05   6.963 5.10e-12 ***
## OverallQual      1.805e+00  1.460e-01  12.359  < 2e-16 ***
## OverallCond      1.028e+00  1.151e-01   8.939  < 2e-16 ***
## YearBuilt        5.798e-02  6.583e-03   8.808  < 2e-16 ***
## BsmtUnfSF       -2.539e-03  2.845e-04  -8.925  < 2e-16 ***
## TotalBsmtSF      5.670e-03  3.982e-04  14.237  < 2e-16 ***
## GrLivArea        7.663e-03  2.964e-04  25.854  < 2e-16 ***
## GarageArea       4.136e-03  6.959e-04   5.943 3.52e-09 ***
## RoofMatlCompShg  9.010e+01  4.535e+00  19.868  < 2e-16 ***
## RoofMatlMembran  9.338e+01  6.130e+00  15.233  < 2e-16 ***
## RoofMatlMetal    9.459e+01  6.167e+00  15.339  < 2e-16 ***
## RoofMatlRoll     8.777e+01  6.149e+00  14.273  < 2e-16 ***
## RoofMatlTar&Grv  8.859e+01  4.689e+00  18.892  < 2e-16 ***
## RoofMatlWdShake  8.873e+01  4.879e+00  18.185  < 2e-16 ***
## RoofMatlWdShngl  9.841e+01  4.770e+00  20.630  < 2e-16 ***
## BsmtQualFa      -4.968e+00  9.797e-01  -5.071 4.48e-07 ***
## BsmtQualGd      -5.218e+00  5.030e-01 -10.374  < 2e-16 ***
## BsmtQualTA      -5.765e+00  6.086e-01  -9.472  < 2e-16 ***
## KitchenQualFa   -4.290e+00  9.565e-01  -4.485 7.87e-06 ***
## KitchenQualGd   -4.494e+00  5.311e-01  -8.463  < 2e-16 ***
## KitchenQualTA   -5.619e+00  5.914e-01  -9.502  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.156 on 1401 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:   0.86,  Adjusted R-squared:  0.8579 
## F-statistic: 409.7 on 21 and 1401 DF,  p-value: < 2.2e-16


Evaluate Proposed Model

We could try removing each of the variables in turn and comparing the result to our proposed final Model 18, as we did twice before. Instead, let’s examine our remaining predictors as we did in the Commonsense Eliminations step.
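
For reference, the one-at-a-time removal we’re skipping could be sketched with drop1() from base R. This is a minimal sketch, not what we ran: it uses the built-in mtcars data as a stand-in for train, and the formula is made up for illustration.

```r
# Stand-in data: the built-in mtcars set, not the Ames housing data.
fit <- lm(mpg ~ wt + hp + qsec + factor(cyl), data = mtcars)

# drop1() refits the model once per term, each time leaving that term out,
# and reports an F-test for the whole term (all levels of a factor at once).
single_drops <- drop1(fit, test = "F")
print(single_drops)

# Terms ordered from weakest to strongest evidence; the "<none>" row is the
# full model and has no p-value, so it is filtered out here.
p_values <- single_drops[["Pr(>F)"]]
names(p_values) <- rownames(single_drops)
sort(p_values[!is.na(p_values)], decreasing = TRUE)
```

One advantage over reading summary() is that a factor such as RoofMatl gets a single test for the whole term rather than one p-value per level.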


Evaluate Numeric Predictors

First we need to identify and set aside the non-numeric predictors (RoofMatl, BsmtQual and KitchenQual) so we can chart the pairwise relationships between our target variable, ssp, on the y-axis and our numeric predictors on the x-axes.

Looking at the pairwise chart below, a number of the relationships look curved rather than linear. Let’s try a non-linear transformation on our target variable, SalePrice, and see if any of the relationships with the predictors straighten out.

Code Summary + Identify Numeric Predictors + Review Pairwise Plots


Identify Numeric Predictors

str(train[c("ssp", "LotArea", "OverallQual", "OverallCond", "YearBuilt", "BsmtUnfSF", "TotalBsmtSF", "GrLivArea", "GarageArea", "RoofMatl", "BsmtQual", "KitchenQual")])
## 'data.frame':    1460 obs. of  12 variables:
##  $ ssp        : num  24.1 20.4 26.2 14.6 29.9 ...
##  $ LotArea    : int  8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
##  $ OverallQual: int  7 6 7 7 8 5 8 7 7 5 ...
##  $ OverallCond: int  5 8 5 5 5 5 5 6 5 6 ...
##  $ YearBuilt  : int  2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
##  $ BsmtUnfSF  : int  150 284 434 540 490 64 317 216 952 140 ...
##  $ TotalBsmtSF: int  856 1262 920 756 1145 796 1686 1107 952 991 ...
##  $ GrLivArea  : int  1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
##  $ GarageArea : int  548 460 608 642 836 480 636 484 468 205 ...
##  $ RoofMatl   : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ BsmtQual   : chr  "Gd" "Gd" "Gd" "TA" ...
##  $ KitchenQual: chr  "Gd" "TA" "Gd" "Gd" ...
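
Rather than reading the types off the str() output, the numeric columns could also be picked out programmatically. A minimal sketch, assuming a toy data frame (the hypothetical name toy) built from the first four rows shown above:

```r
# Toy stand-in for a slice of the training data.
toy <- data.frame(
  ssp = c(24.1, 20.4, 26.2, 14.6),
  LotArea = c(8450L, 9600L, 11250L, 9550L),
  OverallQual = c(7L, 6L, 7L, 7L),
  RoofMatl = c("CompShg", "CompShg", "CompShg", "CompShg"),
  BsmtQual = c("Gd", "Gd", "Gd", "TA"),
  stringsAsFactors = FALSE
)

# TRUE for integer and double columns, FALSE for character columns.
is_num <- vapply(toy, is.numeric, logical(1))
numeric_cols <- names(toy)[is_num]
numeric_cols

# Numeric-only frame, ready to hand to pairs().
toy_numeric <- toy[numeric_cols]
```

The same two lines applied to the model’s columns in train would give the list of names we type out by hand for pairs18.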


Review Pairwise Plots

pairs18 <- train[c("ssp", "LotArea", "OverallQual", "OverallCond", "YearBuilt", "BsmtUnfSF", "TotalBsmtSF", "GrLivArea", "GarageArea")]
pairs(pairs18,gap=.5)


Log-Transform Target

Based on the previous section we’re going to take a log transformation of the target variable. We need to start with the original variable so we can easily undo the transformation on our predicted values if we end up keeping the transformation for our model.

train$sspln <- log(train$SalePrice)

max_sspln <- max(train$sspln)
min_sspln <- min(train$sspln)
range <- max_sspln - min_sspln
train$sspln <- 100 * (train$sspln - min_sspln) / range
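
To show what undoing the transformation involves, here is a minimal round-trip sketch. The prices are made up for illustration, and lo and span are hypothetical names for the stored minimum and range of the logged values.

```r
# Illustrative prices only, not taken from the competition data.
price <- c(34900, 129500, 163000, 214000, 755000)

# Forward: natural log, then min-max scale to 0-100 (as in the code above).
log_price <- log(price)
lo <- min(log_price)
span <- max(log_price) - lo
scaled <- 100 * (log_price - lo) / span

# Inverse: undo the scaling first, then undo the log.
recovered <- exp(scaled / 100 * span + lo)

all.equal(recovered, price)
```

The inverse needs the minimum and range of the logged training values, which is one reason to drop the 0-100 scaling before making predictions.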


Redo Pairwise Plots

It’s hard to tell with certainty but it does look like the obvious curve in OverallQual has linearized so we’ll keep this change for our model.

pairs18 <- train[c("sspln", "LotArea", "OverallQual", "OverallCond", "YearBuilt", "BsmtUnfSF", "TotalBsmtSF", "GrLivArea", "GarageArea")]
pairs(pairs18,gap=.5)


Retrain Model

Let’s retrain our model using log-transformed (and then scaled to 100) SalePrice to compare against our previous model.

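It looks like Adjusted R-Squared has gone up (0.874 versus 0.8579 for Model 18), but the median residual is now slightly positive (0.316 versus -0.052), meaning our model now slightly underestimates the value of the typical home.

Let’s keep this model, submit it, and see how we did.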

lm19 <- lm(sspln ~ LotArea + OverallQual + OverallCond + YearBuilt + 
    BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + 
    BsmtQual + KitchenQual, data=train)
summary(lm19)
## 
## Call:
## lm(formula = sspln ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + 
##     BsmtQual + KitchenQual, data = train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -58.642  -1.875   0.316   2.397  15.579 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -2.954e+02  1.536e+01 -19.228  < 2e-16 ***
## LotArea          9.797e-05  1.302e-05   7.524 9.46e-14 ***
## OverallQual      2.439e+00  1.598e-01  15.262  < 2e-16 ***
## OverallCond      1.901e+00  1.259e-01  15.097  < 2e-16 ***
## YearBuilt        1.039e-01  7.203e-03  14.430  < 2e-16 ***
## BsmtUnfSF       -2.709e-03  3.113e-04  -8.703  < 2e-16 ***
## TotalBsmtSF      6.286e-03  4.357e-04  14.426  < 2e-16 ***
## GrLivArea        8.815e-03  3.243e-04  27.180  < 2e-16 ***
## GarageArea       6.491e-03  7.614e-04   8.525  < 2e-16 ***
## RoofMatlCompShg  9.753e+01  4.962e+00  19.656  < 2e-16 ***
## RoofMatlMembran  1.023e+02  6.708e+00  15.255  < 2e-16 ***
## RoofMatlMetal    1.053e+02  6.748e+00  15.599  < 2e-16 ***
## RoofMatlRoll     9.666e+01  6.729e+00  14.366  < 2e-16 ***
## RoofMatlTar&Grv  9.700e+01  5.131e+00  18.904  < 2e-16 ***
## RoofMatlWdShake  9.586e+01  5.339e+00  17.955  < 2e-16 ***
## RoofMatlWdShngl  9.844e+01  5.219e+00  18.860  < 2e-16 ***
## BsmtQualFa      -2.702e+00  1.072e+00  -2.520 0.011832 *  
## BsmtQualGd      -1.832e+00  5.504e-01  -3.328 0.000899 ***
## BsmtQualTA      -2.711e+00  6.660e-01  -4.071 4.94e-05 ***
## KitchenQualFa   -2.996e+00  1.047e+00  -2.863 0.004262 ** 
## KitchenQualGd   -1.329e+00  5.811e-01  -2.288 0.022304 *  
## KitchenQualTA   -2.778e+00  6.471e-01  -4.293 1.89e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.548 on 1401 degrees of freedom
##   (37 observations deleted due to missingness)
## Multiple R-squared:  0.8758, Adjusted R-squared:  0.874 
## F-statistic: 470.5 on 21 and 1401 DF,  p-value: < 2.2e-16
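
One caveat on the comparison: the R-squared values for lm18 and lm19 are computed on differently transformed targets, so they aren’t strictly like for like. A way to compare on equal footing is to back-transform the fitted values and measure error in the original units. This is a minimal sketch of that idea, assuming mtcars as stand-in data with mpg playing the role of SalePrice; rmse is a helper defined here, not a base R function.

```r
# Stand-in data: mtcars, with mpg playing the role of SalePrice.
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
fit_log <- lm(log(mpg) ~ wt + hp, data = mtcars)

# Put both sets of fitted values on the original scale of the response.
pred_raw <- fitted(fit_raw)
pred_log <- exp(fitted(fit_log))

# Helper defined for this sketch: root mean squared error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

comparison <- c(raw = rmse(mtcars$mpg, pred_raw),
                log = rmse(mtcars$mpg, pred_log))
comparison
```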


Final Model

It’s time to train the final model, predict the target values for the test data, and submit our results to the contest!


Train Final Model

Here we train the final model without the 0-100 scaling, which we only used to make the residual quartiles easier to compare between model iterations.

train$spln <- log(train$SalePrice)

lm20 <- lm(spln ~ LotArea + OverallQual + OverallCond + YearBuilt + 
    BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + 
    BsmtQual + KitchenQual, data=train)


Predict Test Data Targets

This code produces target values for the test data, removes the log-transformation in the predicted sale price, and writes it to a file so we can submit it.

test_targetsln <- predict(lm20, newdata=test)
test_targets <- exp(test_targetsln)
targets <- data.frame(cbind(test_targets))
test$SalePrice <- targets[,1]
sub = data.frame(test$Id,test$SalePrice)
colnames(sub)[1] ="Id"
colnames(sub)[2] ="SalePrice"
write.csv(sub, file="./submission.csv", row.names=FALSE)


Our Submission Failed

Our submission failed because the predictions contained NA values. It looks like the majority of the NA values come from the BsmtQual predictor, so we’re going to adjust our model one more time.
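
A check like the following before writing the file would have caught this, since predict() returns NA for any test row that is missing a predictor the model uses. This is a minimal sketch with a made-up four-row submission frame; the Ids and prices are illustrative only.

```r
# Hypothetical submission frame with one missing prediction.
sub <- data.frame(Id = 1461:1464,
                  SalePrice = c(120000, NA, 185000, 142500))

# Count missing predictions and list the affected Ids.
n_missing <- sum(is.na(sub$SalePrice))
missing_ids <- sub$Id[is.na(sub$SalePrice)]
n_missing
missing_ids

# Only write the file when every row has a prediction.
if (n_missing == 0) {
  write.csv(sub, file = "./submission.csv", row.names = FALSE)
} else {
  message(n_missing, " row(s) have no prediction; fix these before submitting.")
}
```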


Where are the NAs

Here we take just the rows where the predicted SalePrice is NA and look at only the predictors used in the model. Currently there are 46 rows with a missing prediction, and nearly all of them are missing BsmtQual, which is why we need to remove BsmtQual from our model.

new_test <- test[is.na(test$SalePrice),]
new_test[c('Id', 'SalePrice', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'BsmtUnfSF', 'TotalBsmtSF', 'GrLivArea', 'GarageArea', 'RoofMatl', 'BsmtQual', 'KitchenQual')]
##        Id SalePrice LotArea OverallQual OverallCond YearBuilt BsmtUnfSF
## 96   1556        NA   10632           5           3      1917       689
## 126  1586        NA    8777           3           6      1945         0
## 134  1594        NA    7200           4           6      1967         0
## 270  1730        NA    8250           6           7      1981         0
## 319  1779        NA    9533           5           5      1953         0
## 355  1815        NA    5925           2           4      1940         0
## 388  1848        NA    9000           2           2      1947         0
## 389  1849        NA   15635           4           5      1954         0
## 397  1857        NA   26400           5           7      1880         0
## 398  1858        NA    7018           5           5      1979         0
## 399  1859        NA    7018           5           5      1979         0
## 401  1861        NA    7007           5           5      1979         0
## 456  1916        NA   21780           2           4      1910         0
## 591  2051        NA    7785           5           5      1956         0
## 607  2067        NA    8838           5           3      1957         0
## 609  2069        NA   10122           4           6      1948         0
## 661  2121        NA    5940           4           7      1946        NA
## 663  2123        NA    6120           5           6      1945         0
## 729  2189        NA   47007           5           7      1959         0
## 730  2190        NA    6012           4           5      1955         0
## 731  2191        NA    6845           4           5      1955         0
## 734  2194        NA    8050           5           8      1947         0
## 757  2217        NA   14584           1           5      1952         0
## 758  2218        NA    5280           4           7      1895       173
## 759  2219        NA    5150           4           7      1910       356
## 765  2225        NA   10260           5           4      1976         0
## 928  2388        NA   10899           4           5      1964         0
## 976  2436        NA    7000           5           6      1961         0
## 993  2453        NA    8626           4           6      1956         0
## 994  2454        NA   11800           4           7      1949         0
## 1031 2491        NA    9000           4           7      1945         0
## 1039 2499        NA   11515           4           5      1958         0
## 1088 2548        NA    9555           5           6      1979         0
## 1093 2553        NA    6882           4           3      1955         0
## 1105 2565        NA   13108           5           5      1951         0
## 1117 2577        NA    9060           5           6      1923       311
## 1119 2579        NA   11067           2           4      1939         0
## 1140 2600        NA   43500           3           5      1953         0
## 1243 2703        NA    8927           6           6      1977         0
## 1304 2764        NA   11650           7           5      1959         0
## 1307 2767        NA    8544           3           4      1950         0
## 1344 2804        NA   21370           5           5      1950         0
## 1345 2805        NA    8250           5           7      1935         0
## 1365 2825        NA   12048           5           6      1952         0
## 1432 2892        NA   12366           3           5      1945         0
## 1445 2905        NA   31250           1           3      1951         0
##      TotalBsmtSF GrLivArea GarageArea RoofMatl BsmtQual KitchenQual
## 96           689      1224        180  CompShg       Gd        <NA>
## 126            0       640        240  CompShg     <NA>          TA
## 134            0      2650          0  Tar&Grv     <NA>          TA
## 270            0      1882        612  CompShg     <NA>          TA
## 319            0      1210        616  CompShg     <NA>          TA
## 355            0       612        308  CompShg     <NA>          TA
## 388            0       660          0  CompShg     <NA>          Fa
## 389            0      1383        498  CompShg     <NA>          TA
## 397            0      2016        576  CompShg     <NA>          TA
## 398            0      2228        720  CompShg     <NA>          TA
## 399            0      1535        400  CompShg     <NA>          TA
## 401            0      1513        400  CompShg     <NA>          TA
## 456            0       810        280  CompShg     <NA>          TA
## 591            0      1014        267  CompShg     <NA>          TA
## 607            0      1764        301  CompShg     <NA>          TA
## 609            0       869        390  CompShg     <NA>          TA
## 661           NA       896        280  CompShg     <NA>          TA
## 663            0       808        164  CompShg     <NA>          TA
## 729            0      3820        624  CompShg     <NA>          Ex
## 730            0      1152          0  CompShg     <NA>          TA
## 731            0      1152          0  CompShg     <NA>          TA
## 734            0      1137          0  CompShg     <NA>          TA
## 757            0       733        487  CompShg     <NA>          Fa
## 758          173      1361        185  CompShg     <NA>          TA
## 759          356      1049        195  CompShg     <NA>          TA
## 765            0      1872        484  CompShg     <NA>          TA
## 928            0      1224        530  CompShg     <NA>          TA
## 976            0       925        300  CompShg     <NA>          TA
## 993            0       968        331  CompShg     <NA>          TA
## 994            0      1382        384  CompShg     <NA>          TA
## 1031           0       998        460  CompShg     <NA>          TA
## 1039           0       943        308  CompShg     <NA>          Gd
## 1088           0      2233        579  CompShg     <NA>          TA
## 1093           0      1152          0  CompShg     <NA>          Fa
## 1105           0      1226        400  CompShg     <NA>          TA
## 1117         859      1828         NA  CompShg       Gd          Gd
## 1119           0       845        256  CompShg     <NA>          TA
## 1140           0      2034       1041  CompShg     <NA>          TA
## 1243           0      1654        528  CompShg     <NA>          TA
## 1304           0      1472        484  CompShg     <NA>          Gd
## 1307           0      1040        400  CompShg     <NA>          TA
## 1344           0      1640        394  CompShg     <NA>          TA
## 1345           0      1032        260  CompShg     <NA>          TA
## 1365           0      1488        569  CompShg     <NA>          TA
## 1432           0       729          0  CompShg     <NA>          TA
## 1445           0      1600        270  CompShg     <NA>          TA
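
Instead of scanning the table by eye, the missing values could be counted per predictor. A minimal sketch, assuming a small made-up frame (toy_test, a hypothetical name) in place of new_test:

```r
# Toy stand-in for the test rows; the values are illustrative.
toy_test <- data.frame(
  BsmtUnfSF   = c(689, 0, NA, 311),
  TotalBsmtSF = c(689, 0, NA, 859),
  GarageArea  = c(180, 240, 280, NA),
  BsmtQual    = c("Gd", NA, NA, "Gd"),
  KitchenQual = c(NA, "TA", "TA", "Gd"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column, largest first.
na_counts <- sort(colSums(is.na(toy_test)), decreasing = TRUE)
na_counts
```

Run on the model’s columns in test, the same call would show at a glance which predictor accounts for most of the NAs.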


Model 21

Originally I was going to train a second model without BsmtQual and run it on just the 46 rows with missing predictions. That is how I’d like to proceed, but then the submission wouldn’t come from one model, so I’ll save that approach for real-world applications. Instead we drop BsmtQual from the model entirely. Looking at the model output, it still looks great.

lm21 <- lm(spln ~ LotArea + OverallQual + OverallCond + YearBuilt + 
    BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + KitchenQual, data=train)
summary(lm21)
## 
## Call:
## lm(formula = spln ~ LotArea + OverallQual + OverallCond + YearBuilt + 
##     BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea + RoofMatl + 
##     KitchenQual, data = train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.79879 -0.05970  0.00734  0.07420  0.47087 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      8.610e-01  4.011e-01   2.147 0.031979 *  
## LotArea          3.009e-06  3.998e-07   7.525 9.21e-14 ***
## OverallQual      7.898e-02  4.688e-03  16.846  < 2e-16 ***
## OverallCond      5.603e-02  3.782e-03  14.815  < 2e-16 ***
## YearBuilt        3.417e-03  1.865e-04  18.326  < 2e-16 ***
## BsmtUnfSF       -8.465e-05  9.563e-06  -8.852  < 2e-16 ***
## TotalBsmtSF      1.970e-04  1.211e-05  16.267  < 2e-16 ***
## GrLivArea        2.725e-04  9.778e-06  27.870  < 2e-16 ***
## GarageArea       2.089e-04  2.285e-05   9.144  < 2e-16 ***
## RoofMatlCompShg  3.013e+00  1.519e-01  19.834  < 2e-16 ***
## RoofMatlMembran  3.173e+00  2.061e-01  15.399  < 2e-16 ***
## RoofMatlMetal    3.233e+00  2.072e-01  15.608  < 2e-16 ***
## RoofMatlRoll     2.979e+00  2.065e-01  14.424  < 2e-16 ***
## RoofMatlTar&Grv  3.004e+00  1.568e-01  19.160  < 2e-16 ***
## RoofMatlWdShake  2.964e+00  1.636e-01  18.114  < 2e-16 ***
## RoofMatlWdShngl  3.040e+00  1.601e-01  18.986  < 2e-16 ***
## KitchenQualFa   -1.226e-01  3.101e-02  -3.955 8.02e-05 ***
## KitchenQualGd   -6.098e-02  1.650e-02  -3.695 0.000228 ***
## KitchenQualTA   -1.077e-01  1.883e-02  -5.719 1.30e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1403 on 1441 degrees of freedom
## Multiple R-squared:  0.8781, Adjusted R-squared:  0.8766 
## F-statistic: 576.7 on 18 and 1441 DF,  p-value: < 2.2e-16
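
For completeness, the two-model approach we set aside could look something like the sketch below. It is not what we submitted. It assumes mtcars as stand-in data, with cyl playing the role of BsmtQual, and full_fit, reduced_fit and new_df are hypothetical names.

```r
# Stand-in data: mtcars, with a factor predictor that is sometimes missing
# at prediction time (playing the role of BsmtQual).
train_df <- mtcars
train_df$cyl <- factor(train_df$cyl)

new_df <- train_df[1:6, c("wt", "hp", "cyl")]
new_df$cyl[c(2, 5)] <- NA

full_fit    <- lm(log(mpg) ~ wt + hp + cyl, data = train_df)
reduced_fit <- lm(log(mpg) ~ wt + hp, data = train_df)

# predict() returns NA wherever a predictor used by the model is missing.
pred <- predict(full_fit, newdata = new_df)

# Fill only those gaps from the model that does not need the missing predictor.
gaps <- is.na(pred)
pred[gaps] <- predict(reduced_fit, newdata = new_df[gaps, ])

# Back to the original units.
exp(pred)
```

The rows with complete data keep the benefit of the extra predictor, at the cost of maintaining two fitted models.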


Any NAs left

We know there will still be a few NAs. We’re going to replace them with the median predicted SalePrice.

Good news! We’re down to 3 NA predictions, with the missing values spread over four predictors (KitchenQual, BsmtUnfSF, TotalBsmtSF and GarageArea).

Let’s swap in the median value, since we’re not going to train a model without the offending predictors just for these few rows.

Code Summary + predict test values with new Model 21 + generate table with the NA records and their predictors + overwrite their values with the median predicted sale price, $158,008.5

test_targetsln <- predict(lm21, newdata=test)
test_targets <- exp(test_targetsln)
targets <- data.frame(cbind(test_targets))
test$SalePrice <- targets[,1]
new_test <- test[is.na(test$SalePrice),]
new_test[c('Id', 'SalePrice', 'LotArea', 'OverallQual', 'OverallCond', 'YearBuilt', 'BsmtUnfSF', 'TotalBsmtSF', 'GrLivArea', 'GarageArea', 'RoofMatl', 'KitchenQual')]
##        Id SalePrice LotArea OverallQual OverallCond YearBuilt BsmtUnfSF
## 96   1556        NA   10632           5           3      1917       689
## 661  2121        NA    5940           4           7      1946        NA
## 1117 2577        NA    9060           5           6      1923       311
##      TotalBsmtSF GrLivArea GarageArea RoofMatl KitchenQual
## 96           689      1224        180  CompShg        <NA>
## 661           NA       896        280  CompShg          TA
## 1117         859      1828         NA  CompShg          Gd
# Median of the non-missing predictions
medianSalesPrice <- median(test$SalePrice, na.rm=TRUE)
# Fill every missing prediction (rows 96, 661 and 1117) with that median
test$SalePrice[is.na(test$SalePrice)] <- medianSalesPrice


Final Submission

Here we write Model 21’s predictions for the test data to a CSV file for submission.

sub = data.frame(test$Id,test$SalePrice)
colnames(sub)[1] ="Id"
colnames(sub)[2] ="SalePrice"
write.csv(sub, file="./submission.csv", row.names=FALSE)


Score

We got a score of 0.15197.

From a Google search, “the majority of scores are between 0 and 0.25, with a median value of 0.1446”, so we did OK!
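
The competition scores submissions by the root mean squared error between the log of the predicted price and the log of the observed price, so a slice of train held back from fitting can give a rough estimate of the score before submitting. This is a minimal sketch of that check, assuming mtcars as stand-in data with mpg playing the role of SalePrice; the seed and the hold-out size are arbitrary choices.

```r
# Stand-in data: mtcars, with mpg playing the role of SalePrice.
set.seed(605)  # arbitrary seed so the split is reproducible
holdout_rows <- sample(nrow(mtcars), size = 8)
fit_df  <- mtcars[-holdout_rows, ]
hold_df <- mtcars[holdout_rows, ]

fit <- lm(log(mpg) ~ wt + hp, data = fit_df)

# Back-transform to original units, as would be done for a submission.
pred <- exp(predict(fit, newdata = hold_df))

# Root mean squared error between log(predicted) and log(actual).
rmsle <- sqrt(mean((log(pred) - log(hold_df$mpg))^2))
rmsle
```

On the same log scale, Model 21’s residual standard error on the training data was 0.1403, so a leaderboard score of 0.15197 on unseen homes is in the range we might expect.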