Data 605 Final Project

Abstract:

House Prices: Advanced Regression Techniques

https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Predict sales prices and practice feature engineering

Competition Description:

Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition’s dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

Goal: predict the sale price of each house. For each Id in the test set, you must predict the value of the SalePrice variable.

Train Data

First, I read the data into R and use the str function to get a general overview of the training data: the number of observations, the variable names, and their data types.

train<-read.csv("https://raw.githubusercontent.com/czhu505/Data605-/master/train.csv",stringsAsFactors = F)
str(train)
## 'data.frame':    1460 obs. of  81 variables:
##  $ Id           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ MSSubClass   : int  60 20 60 70 60 50 20 60 50 190 ...
##  $ MSZoning     : chr  "RL" "RL" "RL" "RL" ...
##  $ LotFrontage  : int  65 80 68 60 84 85 75 NA 51 50 ...
##  $ LotArea      : int  8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
##  $ Street       : chr  "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley        : chr  NA NA NA NA ...
##  $ LotShape     : chr  "Reg" "Reg" "IR1" "IR1" ...
##  $ LandContour  : chr  "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities    : chr  "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ LotConfig    : chr  "Inside" "FR2" "Inside" "Corner" ...
##  $ LandSlope    : chr  "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood : chr  "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
##  $ Condition1   : chr  "Norm" "Feedr" "Norm" "Norm" ...
##  $ Condition2   : chr  "Norm" "Norm" "Norm" "Norm" ...
##  $ BldgType     : chr  "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ HouseStyle   : chr  "2Story" "1Story" "2Story" "2Story" ...
##  $ OverallQual  : int  7 6 7 7 8 5 8 7 7 5 ...
##  $ OverallCond  : int  5 8 5 5 5 5 5 6 5 6 ...
##  $ YearBuilt    : int  2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
##  $ YearRemodAdd : int  2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
##  $ RoofStyle    : chr  "Gable" "Gable" "Gable" "Gable" ...
##  $ RoofMatl     : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior1st  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
##  $ Exterior2nd  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
##  $ MasVnrType   : chr  "BrkFace" "None" "BrkFace" "None" ...
##  $ MasVnrArea   : int  196 0 162 0 350 0 186 240 0 0 ...
##  $ ExterQual    : chr  "Gd" "TA" "Gd" "TA" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ Foundation   : chr  "PConc" "CBlock" "PConc" "BrkTil" ...
##  $ BsmtQual     : chr  "Gd" "Gd" "Gd" "TA" ...
##  $ BsmtCond     : chr  "TA" "TA" "TA" "Gd" ...
##  $ BsmtExposure : chr  "No" "Gd" "Mn" "No" ...
##  $ BsmtFinType1 : chr  "GLQ" "ALQ" "GLQ" "ALQ" ...
##  $ BsmtFinSF1   : int  706 978 486 216 655 732 1369 859 0 851 ...
##  $ BsmtFinType2 : chr  "Unf" "Unf" "Unf" "Unf" ...
##  $ BsmtFinSF2   : int  0 0 0 0 0 0 0 32 0 0 ...
##  $ BsmtUnfSF    : int  150 284 434 540 490 64 317 216 952 140 ...
##  $ TotalBsmtSF  : int  856 1262 920 756 1145 796 1686 1107 952 991 ...
##  $ Heating      : chr  "GasA" "GasA" "GasA" "GasA" ...
##  $ HeatingQC    : chr  "Ex" "Ex" "Ex" "Gd" ...
##  $ CentralAir   : chr  "Y" "Y" "Y" "Y" ...
##  $ Electrical   : chr  "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ X1stFlrSF    : int  856 1262 920 961 1145 796 1694 1107 1022 1077 ...
##  $ X2ndFlrSF    : int  854 0 866 756 1053 566 0 983 752 0 ...
##  $ LowQualFinSF : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ GrLivArea    : int  1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
##  $ BsmtFullBath : int  1 0 1 1 1 1 1 1 0 1 ...
##  $ BsmtHalfBath : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ FullBath     : int  2 2 2 1 2 1 2 2 2 1 ...
##  $ HalfBath     : int  1 0 1 0 1 1 0 1 0 0 ...
##  $ BedroomAbvGr : int  3 3 3 3 4 1 3 3 2 2 ...
##  $ KitchenAbvGr : int  1 1 1 1 1 1 1 1 2 2 ...
##  $ KitchenQual  : chr  "Gd" "TA" "Gd" "Gd" ...
##  $ TotRmsAbvGrd : int  8 6 6 7 9 5 7 7 8 5 ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Typ" ...
##  $ Fireplaces   : int  0 1 1 1 1 0 1 2 2 2 ...
##  $ FireplaceQu  : chr  NA "TA" "TA" "Gd" ...
##  $ GarageType   : chr  "Attchd" "Attchd" "Attchd" "Detchd" ...
##  $ GarageYrBlt  : int  2003 1976 2001 1998 2000 1993 2004 1973 1931 1939 ...
##  $ GarageFinish : chr  "RFn" "RFn" "RFn" "Unf" ...
##  $ GarageCars   : int  2 2 2 3 3 2 2 2 2 1 ...
##  $ GarageArea   : int  548 460 608 642 836 480 636 484 468 205 ...
##  $ GarageQual   : chr  "TA" "TA" "TA" "TA" ...
##  $ GarageCond   : chr  "TA" "TA" "TA" "TA" ...
##  $ PavedDrive   : chr  "Y" "Y" "Y" "Y" ...
##  $ WoodDeckSF   : int  0 298 0 0 192 40 255 235 90 0 ...
##  $ OpenPorchSF  : int  61 0 42 35 84 30 57 204 0 4 ...
##  $ EnclosedPorch: int  0 0 0 272 0 0 0 228 205 0 ...
##  $ X3SsnPorch   : int  0 0 0 0 0 320 0 0 0 0 ...
##  $ ScreenPorch  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolArea     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolQC       : chr  NA NA NA NA ...
##  $ Fence        : chr  NA NA NA NA ...
##  $ MiscFeature  : chr  NA NA NA NA ...
##  $ MiscVal      : int  0 0 0 0 0 700 0 350 0 0 ...
##  $ MoSold       : int  2 5 9 2 12 10 8 11 4 1 ...
##  $ YrSold       : int  2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
##  $ SaleType     : chr  "WD" "WD" "WD" "WD" ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Abnorml" ...
##  $ SalePrice    : int  208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...

Since the goal is to predict sale prices, SalePrice is the dependent variable. I keep only the numeric variables for the analysis.

library(purrr)
## Warning: package 'purrr' was built under R version 3.3.3
traindata<-train%>%keep(is.numeric)
summary(traindata)
##        Id           MSSubClass     LotFrontage        LotArea      
##  Min.   :   1.0   Min.   : 20.0   Min.   : 21.00   Min.   :  1300  
##  1st Qu.: 365.8   1st Qu.: 20.0   1st Qu.: 59.00   1st Qu.:  7554  
##  Median : 730.5   Median : 50.0   Median : 69.00   Median :  9478  
##  Mean   : 730.5   Mean   : 56.9   Mean   : 70.05   Mean   : 10517  
##  3rd Qu.:1095.2   3rd Qu.: 70.0   3rd Qu.: 80.00   3rd Qu.: 11602  
##  Max.   :1460.0   Max.   :190.0   Max.   :313.00   Max.   :215245  
##                                   NA's   :259                      
##   OverallQual      OverallCond      YearBuilt     YearRemodAdd 
##  Min.   : 1.000   Min.   :1.000   Min.   :1872   Min.   :1950  
##  1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967  
##  Median : 6.000   Median :5.000   Median :1973   Median :1994  
##  Mean   : 6.099   Mean   :5.575   Mean   :1971   Mean   :1985  
##  3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004  
##  Max.   :10.000   Max.   :9.000   Max.   :2010   Max.   :2010  
##                                                                
##    MasVnrArea       BsmtFinSF1       BsmtFinSF2        BsmtUnfSF     
##  Min.   :   0.0   Min.   :   0.0   Min.   :   0.00   Min.   :   0.0  
##  1st Qu.:   0.0   1st Qu.:   0.0   1st Qu.:   0.00   1st Qu.: 223.0  
##  Median :   0.0   Median : 383.5   Median :   0.00   Median : 477.5  
##  Mean   : 103.7   Mean   : 443.6   Mean   :  46.55   Mean   : 567.2  
##  3rd Qu.: 166.0   3rd Qu.: 712.2   3rd Qu.:   0.00   3rd Qu.: 808.0  
##  Max.   :1600.0   Max.   :5644.0   Max.   :1474.00   Max.   :2336.0  
##  NA's   :8                                                           
##   TotalBsmtSF       X1stFlrSF      X2ndFlrSF     LowQualFinSF    
##  Min.   :   0.0   Min.   : 334   Min.   :   0   Min.   :  0.000  
##  1st Qu.: 795.8   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000  
##  Median : 991.5   Median :1087   Median :   0   Median :  0.000  
##  Mean   :1057.4   Mean   :1163   Mean   : 347   Mean   :  5.845  
##  3rd Qu.:1298.2   3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000  
##  Max.   :6110.0   Max.   :4692   Max.   :2065   Max.   :572.000  
##                                                                  
##    GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath    
##  Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :1464   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565  
##  3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000  
##                                                                   
##     HalfBath       BedroomAbvGr    KitchenAbvGr    TotRmsAbvGrd   
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Min.   : 2.000  
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   1st Qu.: 5.000  
##  Median :0.0000   Median :3.000   Median :1.000   Median : 6.000  
##  Mean   :0.3829   Mean   :2.866   Mean   :1.047   Mean   : 6.518  
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000   3rd Qu.: 7.000  
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000   Max.   :14.000  
##                                                                   
##    Fireplaces     GarageYrBlt     GarageCars      GarageArea    
##  Min.   :0.000   Min.   :1900   Min.   :0.000   Min.   :   0.0  
##  1st Qu.:0.000   1st Qu.:1961   1st Qu.:1.000   1st Qu.: 334.5  
##  Median :1.000   Median :1980   Median :2.000   Median : 480.0  
##  Mean   :0.613   Mean   :1979   Mean   :1.767   Mean   : 473.0  
##  3rd Qu.:1.000   3rd Qu.:2002   3rd Qu.:2.000   3rd Qu.: 576.0  
##  Max.   :3.000   Max.   :2010   Max.   :4.000   Max.   :1418.0  
##                  NA's   :81                                     
##    WoodDeckSF      OpenPorchSF     EnclosedPorch      X3SsnPorch    
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Median :  0.00   Median : 25.00   Median :  0.00   Median :  0.00  
##  Mean   : 94.24   Mean   : 46.66   Mean   : 21.95   Mean   :  3.41  
##  3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00   3rd Qu.:  0.00  
##  Max.   :857.00   Max.   :547.00   Max.   :552.00   Max.   :508.00  
##                                                                     
##   ScreenPorch        PoolArea          MiscVal             MoSold      
##  Min.   :  0.00   Min.   :  0.000   Min.   :    0.00   Min.   : 1.000  
##  1st Qu.:  0.00   1st Qu.:  0.000   1st Qu.:    0.00   1st Qu.: 5.000  
##  Median :  0.00   Median :  0.000   Median :    0.00   Median : 6.000  
##  Mean   : 15.06   Mean   :  2.759   Mean   :   43.49   Mean   : 6.322  
##  3rd Qu.:  0.00   3rd Qu.:  0.000   3rd Qu.:    0.00   3rd Qu.: 8.000  
##  Max.   :480.00   Max.   :738.000   Max.   :15500.00   Max.   :12.000  
##                                                                        
##      YrSold       SalePrice     
##  Min.   :2006   Min.   : 34900  
##  1st Qu.:2007   1st Qu.:129975  
##  Median :2008   Median :163000  
##  Mean   :2008   Mean   :180921  
##  3rd Qu.:2009   3rd Qu.:214000  
##  Max.   :2010   Max.   :755000  
## 
correlation<-cor(traindata, use="pairwise")
correlation<-correlation[,ncol(correlation)]
sort(correlation)
##  KitchenAbvGr EnclosedPorch    MSSubClass   OverallCond        YrSold 
##   -0.13590737   -0.12857796   -0.08428414   -0.07785589   -0.02892259 
##  LowQualFinSF            Id       MiscVal  BsmtHalfBath    BsmtFinSF2 
##   -0.02560613   -0.02191672   -0.02118958   -0.01684415   -0.01137812 
##    X3SsnPorch        MoSold      PoolArea   ScreenPorch  BedroomAbvGr 
##    0.04458367    0.04643225    0.09240355    0.11144657    0.16821315 
##     BsmtUnfSF  BsmtFullBath       LotArea      HalfBath   OpenPorchSF 
##    0.21447911    0.22712223    0.26384335    0.28410768    0.31585623 
##     X2ndFlrSF    WoodDeckSF   LotFrontage    BsmtFinSF1    Fireplaces 
##    0.31933380    0.32441344    0.35179910    0.38641981    0.46692884 
##    MasVnrArea   GarageYrBlt  YearRemodAdd     YearBuilt  TotRmsAbvGrd 
##    0.47749305    0.48636168    0.50710097    0.52289733    0.53372316 
##      FullBath     X1stFlrSF   TotalBsmtSF    GarageArea    GarageCars 
##    0.56066376    0.60585218    0.61358055    0.62343144    0.64040920 
##     GrLivArea   OverallQual     SalePrice 
##    0.70862448    0.79098160    1.00000000

The last column of the correlation matrix gives the correlation between SalePrice and every other numeric variable. Sorting these values makes it easy to find the features most strongly related to SalePrice. Here, I look at candidates starting from the third-largest value: GrLivArea, GarageCars, GarageArea, and TotalBsmtSF.

Pick one of the quantitative independent variables from the training data set (train.csv) , and define that variable as X. Make sure this variable is skewed to the right!

The following histograms show the skewness of the candidate variables.

par(mfrow=c(2,2)) 
hist(traindata$GrLivArea )
hist(traindata$GarageCars)
hist(traindata$GarageArea)
hist(traindata$TotalBsmtSF)

I select “TotalBsmtSF”, which has a right-skewed distribution, as the independent variable X. Pick the dependent variable “SalePrice” and define it as Y.

Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the 1st quartile of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities. In addition, make a table of counts as shown below.

a. P(X>x | Y>y)

This is the probability that X is above its 1st quartile given that Y is above its 1st quartile. By the definition of conditional probability, P(X>x | Y>y) = P(X>x, Y>y)/P(Y>y).

library(data.table)
library(dplyr)
X<-traindata$TotalBsmtSF
Y<-traindata$SalePrice
t<-data.table(X,Y)
b<-t[ which(Y>quantile(Y,0.25)),]
a<-b[ which(X>quantile(X,0.25)),]
nrow(t)
## [1] 1460
nrow(b)
## [1] 1095
nrow(a)
## [1] 810
nrow(a)/nrow(b)
## [1] 0.739726
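
The same conditional probability can be written with logical vectors, taking both quartile thresholds once from the full columns so that it is always clear which data a quantile() call is evaluated on. This is a minimal sketch on simulated data; the vectors x_sim and y_sim and their distributions are assumptions made for illustration, not the Ames data.

```r
# Simulated stand-ins for X and Y (hypothetical data, not train.csv)
set.seed(605)
x_sim <- rexp(1000, rate = 1 / 1000)
y_sim <- 100 * x_sim + rnorm(1000, sd = 20000)

# Quartile thresholds computed once, on the full columns
x_q1 <- quantile(x_sim, 0.25)
y_q1 <- quantile(y_sim, 0.25)

above_x <- x_sim > x_q1
above_y <- y_sim > y_q1

p_joint    <- mean(above_x & above_y)   # P(X>x, Y>y)
p_cond     <- p_joint / mean(above_y)   # P(X>x | Y>y) from the definition
p_cond_alt <- mean(above_x[above_y])    # same quantity, by subsetting on Y>y
c(joint = p_joint, conditional = p_cond)
```

Both routes to the conditional probability give the same number, which is a convenient check that the subsetting was done as intended.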

b. P(X>x, Y>y)

This is the joint probability that X is above its 1st quartile and Y is above its 1st quartile.

nrow(a)/nrow(t)
## [1] 0.5547945

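c. P(X<x | Y>y)

This is the probability that X is below its 1st quartile given that Y is above its 1st quartile: P(X<x | Y>y) = P(X<x, Y>y)/P(Y>y). Of the 1095 observations with Y>y, 1095 - 810 = 285 have X below x, so P(X<x | Y>y) = 285/1095 = 0.260274.

The counts below are consistent with n = 1460, with 1095 observations above each 1st quartile and 810 above both.

| x/y | <=1st quartile | >1st quartile | Total |
|---|---|---|---|
| <=1st quartile | 80 | 285 | 365 |
| >1st quartile | 285 | 810 | 1095 |
| Total | 365 | 1095 | 1460 |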
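
A table of counts like this one can be produced directly with table() and addmargins() rather than assembled by hand. The sketch below uses simulated data; x_sim, y_sim, and the group labels are assumptions for illustration only.

```r
# Simulated stand-ins for X and Y (hypothetical data, not train.csv)
set.seed(605)
x_sim <- rexp(1000, rate = 1 / 1000)
y_sim <- 100 * x_sim + rnorm(1000, sd = 20000)

# Label each observation by its position relative to the 1st quartile
x_grp <- ifelse(x_sim > quantile(x_sim, 0.25), ">1st quartile", "<=1st quartile")
y_grp <- ifelse(y_sim > quantile(y_sim, 0.25), ">1st quartile", "<=1st quartile")

counts <- table(x = x_grp, y = y_grp)     # 2 x 2 cross-tabulation
counts_with_totals <- addmargins(counts)  # adds the "Sum" row and column
counts_with_totals
```

Because the margins are computed from the cells, the row totals, column totals, and grand total always agree with the number of observations.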

Does splitting the training data in this fashion make them independent? Let A be the new variable counting those observations above the 1st quartile for X, and let B be the new variable counting those observations above the 1st quartile for Y. Does P(AB)=P(A)P(B)? Check mathematically, and then evaluate by running a Chi Square test for association.

A<-sum(X>quantile(X,0.25))
B<-sum(Y>quantile(Y,0.25))
PA<-A/length(X)
PB<-B/length(Y)
PA*PB
## [1] 0.5625

Compared with the earlier calculation of P(X>x, Y>y) = 0.55479, the product P(X>x)P(Y>y) = 0.5625 is slightly larger, so P(AB) != P(A)P(B) and the split does not make the variables independent. The chi-square test below checks this formally. The null hypothesis is that “TotalBsmtSF” is independent of “SalePrice”, tested at the .05 significance level.

chisq.test(X,Y) 
## Warning in chisq.test(X, Y): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  X and Y
## X-squared = 509710, df = 476640, p-value < 2.2e-16

As the p-value is smaller than .05, I reject the null hypothesis that “TotalBsmtSF” is independent of “SalePrice” at the .05 significance level, which matches the previous result that P(X>x, Y>y) != P(X>x)P(Y>y).
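
Note that chisq.test(X, Y) on the raw values treats every distinct value as its own category, which is where the very large degrees of freedom and the approximation warning come from. Since A and B are defined as above-quartile indicators, an alternative is to test the 2 x 2 table of those indicators. The following is a sketch of that alternative on simulated data (x_sim, y_sim, A_ind, and B_ind are hypothetical names, and the data are not the Ames data).

```r
# Simulated stand-ins for X and Y (hypothetical data, not train.csv)
set.seed(605)
x_sim <- rexp(1000, rate = 1 / 1000)
y_sim <- 100 * x_sim + rnorm(1000, sd = 20000)

# Indicators for "above the 1st quartile"
A_ind <- x_sim > quantile(x_sim, 0.25)
B_ind <- y_sim > quantile(y_sim, 0.25)

tab <- table(A = A_ind, B = B_ind)  # 2 x 2 contingency table
test_2x2 <- chisq.test(tab)         # Pearson test with Yates' continuity correction
test_2x2$parameter                  # degrees of freedom: (2 - 1) * (2 - 1)
test_2x2$p.value
```

With a 2 x 2 table the test has a single degree of freedom and the expected counts are large, so the chi-squared approximation is on much firmer ground.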

Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot of X and Y. Derive a correlation matrix for any THREE quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide a 92% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?

plot(Y ~ X, col = "blue",xlab="TotalBsmtSF", ylab="SalePrice")

The scatter plot shows a roughly linear relationship between “TotalBsmtSF” and SalePrice. However, there are many outliers and a lot of overlapping points.

Next, I pick three variables: “TotalBsmtSF”, “GrLivArea”, and “SalePrice”. Their correlation matrix is shown below.

corrmatrix<-cor(subset(traindata, select = c("TotalBsmtSF", "GrLivArea", "SalePrice")))
corrmatrix
##             TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF   1.0000000 0.4548682 0.6135806
## GrLivArea     0.4548682 1.0000000 0.7086245
## SalePrice     0.6135806 0.7086245 1.0000000

The correlation matrix shows the pairwise relationships among the three variables. None of the variables appears to be independent of the others, since no correlation is zero or close to zero.

Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide a 92% confidence interval.

H0: true correlation of TotalBsmtSF and GrLivArea is equal to 0.

H1: true correlation of TotalBsmtSF and GrLivArea is not equal to 0.

t1<-subset(traindata, select = c("TotalBsmtSF", "GrLivArea", "SalePrice")) 
cor.test(~ TotalBsmtSF+ GrLivArea, data = t1, conf.level = 0.92)
## 
##  Pearson's product-moment correlation
## 
## data:  TotalBsmtSF and GrLivArea
## t = 19.503, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
##  0.4177447 0.4904754
## sample estimates:
##       cor 
## 0.4548682

The p-value is below 2.2e-16 and the 92% confidence interval (0.418, 0.490) excludes 0, so I reject H0 that the correlation between TotalBsmtSF and GrLivArea is 0.
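
For reference, the confidence interval that cor.test reports for a Pearson correlation is based on Fisher's Z transform, so the 92% interval can be reproduced by hand from the reported correlation and sample size. The sketch below uses only the numbers printed above (r = 0.4548682 and n = df + 2 = 1460); the variable names are illustrative.

```r
r <- 0.4548682       # sample correlation reported by cor.test
n <- 1460            # sample size (df + 2)
conf_level <- 0.92

z      <- atanh(r)                          # Fisher's Z transform of r
se     <- 1 / sqrt(n - 3)                   # approximate standard error of z
z_crit <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value

ci <- tanh(z + c(-1, 1) * z_crit * se)      # back-transform to the r scale
ci
```

The result agrees with the interval in the cor.test output to about four decimal places, the small difference coming from the rounded value of r used here.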

I use the same method to test TotalBsmtSF against SalePrice, and GrLivArea against SalePrice.

cor.test(~ TotalBsmtSF+ SalePrice, data = t1, conf.level = 0.92)
## 
##  Pearson's product-moment correlation
## 
## data:  TotalBsmtSF and SalePrice
## t = 29.671, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
##  0.5841762 0.6413763
## sample estimates:
##       cor 
## 0.6135806
cor.test(~ GrLivArea+ SalePrice, data = t1, conf.level = 0.92)
## 
##  Pearson's product-moment correlation
## 
## data:  GrLivArea and SalePrice
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
##  0.6850407 0.7307245
## sample estimates:
##       cor 
## 0.7086245

Both tests give p-values below 2.2e-16, and both 92% confidence intervals exclude 0, so I reject H0 for each pair: TotalBsmtSF and SalePrice are correlated, and so are GrLivArea and SalePrice.
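
One way to put numbers on the familywise error question is to compute how the error rate grows over three tests and to apply a Bonferroni adjustment. This is a sketch under the simplifying assumption that the three tests are independent; the p-values are the upper bounds printed by cor.test, not exact values.

```r
alpha <- 1 - 0.92   # per-test error rate implied by a 92% confidence level
m     <- 3          # number of pairwise tests

# Chance of at least one false rejection if all nulls were true
# (assumes independent tests)
fwer <- 1 - (1 - alpha)^m

# Bonferroni: test each hypothesis at alpha / m instead
bonferroni_alpha <- alpha / m

# Upper bounds on the three reported p-values
p_values <- c(2.2e-16, 2.2e-16, 2.2e-16)
p_adj    <- p.adjust(p_values, method = "bonferroni")

c(fwer = fwer, bonferroni_alpha = bonferroni_alpha)
all(p_adj < alpha)
```

The uncorrected familywise rate is noticeably higher than the per-test rate, but the reported p-values are so small that they remain far below the threshold after adjustment, so the conclusions would not change.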

Linear Algebra and Correlation. Invert your 3 x 3 correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.

In general, the precision matrix captures partial correlations: it describes the direct covariation between two variables, conditional on the remaining variables.

First I need to check whether the determinant is 0. If it is not 0, the matrix is invertible and I can directly use inv(A) to find the inverse matrix.

det(corrmatrix)
## [1] 0.3100169
library("matlib")
precisionMatrix<-function(A){
  n=nrow(A)
  m<-matrix(0,nrow=n,ncol=n)
  for (i in 1:n)
    for(j in 1:n)
      m[i,j]=cofactor(A,i,j)
  return(t(m)/det(A)) 
}
precision<-precisionMatrix(corrmatrix)
precision
##             [,1]        [,2]       [,3]
## [1,]  1.60588442 -0.06473842 -0.9394642
## [2,] -0.06473842  2.01124151 -1.3854927
## [3,] -0.93946422 -1.38549273  2.5582310

The precision matrix above has nonzero off-diagonal entries, so the partial correlations between all pairs of variables are nonzero, which matches the earlier Pearson test results.

Check that the precision matrix multiplied by the correlation matrix gives the identity matrix.

round(precision%*%corrmatrix,5)
##      TotalBsmtSF GrLivArea SalePrice
## [1,]           1         0         0
## [2,]           0         1         0
## [3,]           0         0         1

I conduct LDU decomposition on the corrmatrix.

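# Swap row i with the first row at or below it that has a nonzero entry in column j.
swap<-function(My_matrix,i,m,j=i){
   r=i
   while((r<m) && (My_matrix[r,j]==0)) r=r+1
   if((r!=i) && (My_matrix[r,j]!=0)){
       temp=My_matrix[i,]
       My_matrix[i,]=My_matrix[r,]
       My_matrix[r,]=temp
   }
   return(My_matrix)
}

# Upper-triangular factor with a unit diagonal.
U<-function(My_matrix) {
    M=nrow(My_matrix)
    N=ncol(My_matrix)
    i=1
    j=1
    while(i<=M && j<=N){
        # Bring a nonzero pivot into position (i,j) if one exists
        if(My_matrix[i,j]==0) My_matrix<-swap(My_matrix,i,M,j)
        if(My_matrix[i,j]==0) {
          # No pivot in this column; move to the next column
          j=j+1
        } else {
              # Scale the pivot row, then clear the entries below the pivot
              My_matrix[i,]<-My_matrix[i,]/My_matrix[i,j]
              k=i+1
              while(k<=M){
                   My_matrix[k,] <-My_matrix[k,]-My_matrix[k,j]/My_matrix[i,j]*My_matrix[i,]
                   k=k+1
              }
              i=i+1
              j=j+1
        }
    }
    return(round(My_matrix,2))
}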




# Diagonal factor: eliminate below and then above the diagonal, leaving the pivots.
D<-function(My_matrix) {
      M=nrow(My_matrix)
      for(i in 1:M){
          # Keep the result of the row swap (it was previously discarded)
          if(My_matrix[i,i]==0) My_matrix<-swap(My_matrix,i,M)
          j=i+1
          while(j<=M){
              if(My_matrix[i,i]!=0) My_matrix[j,] <-My_matrix[j,] -
                                My_matrix[j,i]/My_matrix[i,i]*My_matrix[i,]
              j=j+1
          }
      }
      for(a in M:2){
              for (b in (a-1):1){
                if(My_matrix[a,a]!=0) My_matrix[b,] <-My_matrix[b,] -
                           My_matrix[b,a]/My_matrix[a,a]*My_matrix[a,]
              }
        }
       return(round(My_matrix,2))
}

# Lower-triangular factor with a unit diagonal: records the elimination multipliers.
L<-function(My_matrix) {
      M=nrow(My_matrix)
      temp=matrix(0,nrow=M,ncol=M)
      for(i in 1:M){
          # Keep the result of the row swap (it was previously discarded)
          if(My_matrix[i,i]==0) My_matrix<-swap(My_matrix,i,M)
          temp[i,i]<-1
          j=i+1
          while(j<=M){
              if(My_matrix[i,i]!=0) {
                # Store the multiplier, then eliminate the entry below the pivot
                temp[j,i]<-My_matrix[j,i]/My_matrix[i,i]
                My_matrix[j,] <-My_matrix[j,] -
                                My_matrix[j,i]/My_matrix[i,i]*My_matrix[i,]
              }
              j=j+1
          }
      }
      return(round(temp,2))
}

L(corrmatrix)
##      [,1] [,2] [,3]
## [1,] 1.00 0.00    0
## [2,] 0.45 1.00    0
## [3,] 0.61 0.54    1
D(corrmatrix)
##             TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF           1      0.00      0.00
## GrLivArea             0      0.79      0.00
## SalePrice             0      0.00      0.39
U(corrmatrix)
##             TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF           1      0.45      0.61
## GrLivArea             0      1.00      0.54
## SalePrice             0      0.00      1.00
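
A simple way to validate a factorization like this is to multiply the factors back together and compare with the original matrix. The sketch below is a compact LDU routine without row swaps (it assumes every pivot is nonzero, which holds for this correlation matrix); the function name ldu and the matrix A3 are illustrative and not part of the code above.

```r
# Compact LDU factorization without pivoting (assumes nonzero pivots)
ldu <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (i in seq_len(n - 1)) {
    for (k in (i + 1):n) {
      L[k, i] <- U[k, i] / U[i, i]        # elimination multiplier
      U[k, ] <- U[k, ] - L[k, i] * U[i, ] # clear the entry below the pivot
    }
  }
  D <- diag(diag(U))   # pivots on the diagonal
  U <- U / diag(U)     # divide each row by its pivot to get a unit diagonal
  list(L = L, D = D, U = U)
}

# Correlation matrix copied from the output above
A3 <- matrix(c(1.0000000, 0.4548682, 0.6135806,
               0.4548682, 1.0000000, 0.7086245,
               0.6135806, 0.7086245, 1.0000000),
             nrow = 3, byrow = TRUE)

f <- ldu(A3)
reconstruction_error <- max(abs(f$L %*% f$D %*% f$U - A3))
round(diag(f$D), 2)
reconstruction_error
```

The product L D U recovers the original matrix up to floating-point error, and because the matrix is symmetric, U is the transpose of L.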

Calculus-Based Probability & Statistics. Many times, it makes sense to fit a closed form distribution to data. For the first variable that you selected which is skewed to the right, shift it so that the minimum value is above zero as necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of λ for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, λ)). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.

Returning to the first variable, X = “TotalBsmtSF”: it is non-negative with a right-skewed distribution. Below, I load the MASS package, run fitdistr to fit an exponential pdf, use the MLE to estimate lambda, and compute the 5th and 95th percentiles.

library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
library(survival)
## Warning: package 'survival' was built under R version 3.3.3
library(fitdistrplus)
## Warning: package 'fitdistrplus' was built under R version 3.3.3
ft<-fitdistr(traindata$TotalBsmtSF, densfun="exponential") 
ft$estimate
##         rate 
## 0.0009456896
lambda<-1/mean(traindata$TotalBsmtSF)  # MLE of the rate is 1/mean; matches ft$estimate
set.seed(123)
s<-rexp(1000, lambda)
hist(s, pch=20, breaks=25, prob=FALSE, main="")

log(.05)/lambda * -1  # 95th percentile: solves 1 - exp(-lambda*x) = 0.95
## [1] 3167.776
log(.95)/lambda * -1  # 5th percentile: solves 1 - exp(-lambda*x) = 0.05
## [1] 54.23904
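
The same percentiles can be obtained from the built-in quantile function qexp(), which inverts the exponential CDF F(x) = 1 - exp(-lambda x). The sketch below uses the rate printed by fitdistr above; lambda_hat is an illustrative name.

```r
lambda_hat <- 0.0009456896   # rate estimate reported by fitdistr

# Quantile function of the exponential distribution
q05 <- qexp(0.05, rate = lambda_hat)
q95 <- qexp(0.95, rate = lambda_hat)

# Closed form from inverting the CDF: x = -log(1 - p) / lambda
closed_form <- -log(1 - c(0.05, 0.95)) / lambda_hat

c(q05 = q05, q95 = q95)
closed_form
```

Using qexp() makes it explicit which probability each value corresponds to, which is easy to mix up when writing the logarithms by hand.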

Compare with the empirical 5th and 95th percentiles of TotalBsmtSF:

quantile(traindata$TotalBsmtSF,0.95)
##  95% 
## 1753
quantile(traindata$TotalBsmtSF,0.05)
##    5% 
## 519.3

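The fitted exponential distribution is shifted to the left compared to the real data in the training set: its 5th percentile (54) is far below the empirical one (519), while its 95th percentile (3168) is far above the empirical one (1753). The exponential is therefore a poor fit for TotalBsmtSF, and a different distribution or a transformation of the data would be needed for a closer match.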
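
The prompt also asks for a 95% confidence interval from the empirical data assuming normality. One common reading is a normal-theory interval for the mean, mean ± z * sd / sqrt(n). The sketch below shows that calculation on simulated data; x_sim is a hypothetical stand-in for TotalBsmtSF, and the interpretation of the prompt is itself an assumption.

```r
# Simulated stand-in for TotalBsmtSF (hypothetical data, not train.csv)
set.seed(605)
x_sim <- rexp(1460, rate = 1 / 1000)

n <- length(x_sim)
m <- mean(x_sim)
s <- sd(x_sim)

# Normal-theory 95% interval for the mean
z_crit  <- qnorm(0.975)
ci_mean <- m + c(-1, 1) * z_crit * s / sqrt(n)

# Empirical 5th and 95th percentiles, for comparison
emp_q <- quantile(x_sim, c(0.05, 0.95))

ci_mean
emp_q
```

The interval for the mean is much narrower than the spread between the empirical percentiles, because it describes uncertainty about the average rather than the range of individual observations.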

traindata<-data.frame(select_if(train, is.numeric))

Modeling. Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.

Flag outliers in SalePrice with scores() at prob=0.95 and remove them, then remove all rows with missing values from traindata.

library(outliers)
traindata['t']<-scores(traindata$SalePrice, type="t", prob=0.95)
traindata<-traindata[!traindata$t,]
nrow(traindata<-na.omit(traindata))
## [1] 1023

Fit a linear model with all remaining variables as predictors using lm().

rmodel<-lm(SalePrice~ . , data=traindata)

Next, stepAIC selects a model based on the Akaike Information Criterion (AIC). The goal is to find the model with the smallest AIC by removing or adding variables within the scope.
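
To make the AIC column in the stepwise trace easier to read: stepAIC relies on extractAIC(), which for a linear model computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks this on the built-in mtcars data, chosen purely for illustration; the model formula is arbitrary.

```r
# Small linear model on a built-in dataset (illustration only)
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)   # residual sum of squares
edf <- n - df.residual(fit)    # number of estimated coefficients

aic_manual <- n * log(rss / n) + 2 * edf
aic_step   <- extractAIC(fit)[2]   # value used in stepwise selection

c(manual = aic_manual, extractAIC = aic_step)
```

This differs from the value returned by AIC() by an additive constant that depends only on n, so the ranking of models fitted to the same data is the same.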

library(MASS)
step <- stepAIC(rmodel, direction="both")
## Start:  AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + 
##     LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold + t
## 
## 
## Step:  AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + 
##     LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold
## 
## 
## Step:  AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + 
##     LowQualFinSF + BsmtFullBath + BsmtHalfBath + FullBath + HalfBath + 
##     BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + 
##     EnclosedPorch + X3SsnPorch + ScreenPorch + PoolArea + MiscVal + 
##     MoSold + YrSold
## 
## 
## Step:  AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     BsmtFullBath + BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr + 
##     KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt + 
##     GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch + 
##     X3SsnPorch + ScreenPorch + PoolArea + MiscVal + MoSold + 
##     YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - BsmtHalfBath   1 2.2071e+06 5.4498e+11 20626
## - MiscVal        1 6.7276e+06 5.4498e+11 20626
## - X3SsnPorch     1 8.0866e+06 5.4498e+11 20626
## - MasVnrArea     1 9.8221e+06 5.4499e+11 20626
## - LotFrontage    1 1.0112e+07 5.4499e+11 20626
## - MoSold         1 1.6713e+08 5.4514e+11 20626
## - BedroomAbvGr   1 1.8968e+08 5.4517e+11 20626
## - HalfBath       1 2.7581e+08 5.4525e+11 20626
## - LowQualFinSF   1 3.3293e+08 5.4531e+11 20626
## - LotArea        1 3.8867e+08 5.4536e+11 20626
## - GarageYrBlt    1 4.2682e+08 5.4540e+11 20627
## - Id             1 7.2155e+08 5.4570e+11 20627
## - OpenPorchSF    1 7.6264e+08 5.4574e+11 20627
## <none>                        5.4498e+11 20628
## - YrSold         1 1.1136e+09 5.4609e+11 20628
## - EnclosedPorch  1 1.3720e+09 5.4635e+11 20628
## - GarageArea     1 1.4733e+09 5.4645e+11 20628
## - BsmtFinSF2     1 1.7679e+09 5.4674e+11 20629
## - BsmtUnfSF      1 1.8214e+09 5.4680e+11 20629
## - BsmtFinSF1     1 2.0599e+09 5.4704e+11 20630
## - WoodDeckSF     1 3.1320e+09 5.4811e+11 20632
## - TotRmsAbvGrd   1 3.2194e+09 5.4820e+11 20632
## - KitchenAbvGr   1 7.6518e+09 5.5263e+11 20640
## - YearRemodAdd   1 8.7108e+09 5.5369e+11 20642
## - FullBath       1 8.7931e+09 5.5377e+11 20642
## - ScreenPorch    1 9.7814e+09 5.5476e+11 20644
## - GarageCars     1 9.9027e+09 5.5488e+11 20644
## - X1stFlrSF      1 1.0189e+10 5.5517e+11 20645
## - BsmtFullBath   1 1.3390e+10 5.5837e+11 20651
## - PoolArea       1 1.3578e+10 5.5855e+11 20651
## - OverallCond    1 1.5313e+10 5.6029e+11 20654
## - Fireplaces     1 1.5468e+10 5.6044e+11 20654
## - X2ndFlrSF      1 1.6501e+10 5.6148e+11 20656
## - MSSubClass     1 1.7511e+10 5.6249e+11 20658
## - YearBuilt      1 2.2817e+10 5.6779e+11 20668
## - OverallQual    1 8.9026e+10 6.3400e+11 20781
## 
## Step:  AIC=20625.67
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea + 
##     WoodDeckSF + OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - MiscVal        1 6.6174e+06 5.4498e+11 20624
## - X3SsnPorch     1 8.5351e+06 5.4499e+11 20624
## - LotFrontage    1 1.0076e+07 5.4499e+11 20624
## - MasVnrArea     1 1.0112e+07 5.4499e+11 20624
## - MoSold         1 1.6892e+08 5.4515e+11 20624
## - BedroomAbvGr   1 1.8763e+08 5.4517e+11 20624
## - HalfBath       1 2.7534e+08 5.4525e+11 20624
## - LowQualFinSF   1 3.3423e+08 5.4531e+11 20624
## - LotArea        1 3.8783e+08 5.4537e+11 20624
## - GarageYrBlt    1 4.3122e+08 5.4541e+11 20625
## - Id             1 7.2312e+08 5.4570e+11 20625
## - OpenPorchSF    1 7.6120e+08 5.4574e+11 20625
## <none>                        5.4498e+11 20626
## - YrSold         1 1.1179e+09 5.4610e+11 20626
## - EnclosedPorch  1 1.3739e+09 5.4635e+11 20626
## - GarageArea     1 1.4741e+09 5.4645e+11 20626
## - BsmtFinSF2     1 1.7952e+09 5.4677e+11 20627
## - BsmtUnfSF      1 1.8192e+09 5.4680e+11 20627
## - BsmtFinSF1     1 2.1154e+09 5.4709e+11 20628
## + BsmtHalfBath   1 2.2071e+06 5.4498e+11 20628
## - WoodDeckSF     1 3.1611e+09 5.4814e+11 20630
## - TotRmsAbvGrd   1 3.2199e+09 5.4820e+11 20630
## - KitchenAbvGr   1 7.6496e+09 5.5263e+11 20638
## - YearRemodAdd   1 8.7543e+09 5.5373e+11 20640
## - FullBath       1 8.8690e+09 5.5385e+11 20640
## - ScreenPorch    1 9.7793e+09 5.5476e+11 20642
## - GarageCars     1 9.9301e+09 5.5491e+11 20642
## - X1stFlrSF      1 1.0187e+10 5.5517e+11 20643
## - PoolArea       1 1.3576e+10 5.5855e+11 20649
## - BsmtFullBath   1 1.4477e+10 5.5946e+11 20651
## - OverallCond    1 1.5412e+10 5.6039e+11 20652
## - Fireplaces     1 1.5514e+10 5.6049e+11 20652
## - X2ndFlrSF      1 1.6509e+10 5.6149e+11 20654
## - MSSubClass     1 1.7542e+10 5.6252e+11 20656
## - YearBuilt      1 2.2895e+10 5.6787e+11 20666
## - OverallQual    1 8.9025e+10 6.3400e+11 20779
## 
## Step:  AIC=20623.68
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea + 
##     WoodDeckSF + OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MoSold + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - X3SsnPorch     1 9.3457e+06 5.4499e+11 20622
## - LotFrontage    1 9.5398e+06 5.4499e+11 20622
## - MasVnrArea     1 1.0551e+07 5.4500e+11 20622
## - MoSold         1 1.7146e+08 5.4516e+11 20622
## - BedroomAbvGr   1 1.8772e+08 5.4517e+11 20622
## - HalfBath       1 2.6989e+08 5.4525e+11 20622
## - LowQualFinSF   1 3.3288e+08 5.4532e+11 20622
## - LotArea        1 3.8796e+08 5.4537e+11 20622
## - GarageYrBlt    1 4.2582e+08 5.4541e+11 20623
## - Id             1 7.2006e+08 5.4571e+11 20623
## - OpenPorchSF    1 7.7021e+08 5.4576e+11 20623
## <none>                        5.4498e+11 20624
## - YrSold         1 1.1126e+09 5.4610e+11 20624
## - EnclosedPorch  1 1.3784e+09 5.4636e+11 20624
## - GarageArea     1 1.4773e+09 5.4646e+11 20624
## - BsmtFinSF2     1 1.7931e+09 5.4678e+11 20625
## - BsmtUnfSF      1 1.8221e+09 5.4681e+11 20625
## - BsmtFinSF1     1 2.1149e+09 5.4710e+11 20626
## + MiscVal        1 6.6174e+06 5.4498e+11 20626
## + BsmtHalfBath   1 2.0968e+06 5.4498e+11 20626
## - WoodDeckSF     1 3.1661e+09 5.4815e+11 20628
## - TotRmsAbvGrd   1 3.2499e+09 5.4823e+11 20628
## - KitchenAbvGr   1 7.6472e+09 5.5263e+11 20636
## - YearRemodAdd   1 8.7478e+09 5.5373e+11 20638
## - FullBath       1 8.8656e+09 5.5385e+11 20638
## - GarageCars     1 9.9245e+09 5.5491e+11 20640
## - ScreenPorch    1 1.0152e+10 5.5514e+11 20641
## - X1stFlrSF      1 1.0186e+10 5.5517e+11 20641
## - PoolArea       1 1.3758e+10 5.5874e+11 20647
## - BsmtFullBath   1 1.4533e+10 5.5952e+11 20649
## - OverallCond    1 1.5575e+10 5.6056e+11 20651
## - Fireplaces     1 1.5649e+10 5.6063e+11 20651
## - X2ndFlrSF      1 1.6520e+10 5.6151e+11 20652
## - MSSubClass     1 1.7610e+10 5.6259e+11 20654
## - YearBuilt      1 2.2890e+10 5.6787e+11 20664
## - OverallQual    1 8.9168e+10 6.3415e+11 20777
## 
## Step:  AIC=20621.7
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea + 
##     WoodDeckSF + OpenPorchSF + EnclosedPorch + ScreenPorch + 
##     PoolArea + MoSold + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - LotFrontage    1 1.0575e+07 5.4500e+11 20620
## - MasVnrArea     1 1.0578e+07 5.4500e+11 20620
## - MoSold         1 1.7239e+08 5.4517e+11 20620
## - BedroomAbvGr   1 1.8916e+08 5.4518e+11 20620
## - HalfBath       1 2.7550e+08 5.4527e+11 20620
## - LowQualFinSF   1 3.3401e+08 5.4533e+11 20620
## - LotArea        1 3.8731e+08 5.4538e+11 20620
## - GarageYrBlt    1 4.2591e+08 5.4542e+11 20621
## - Id             1 7.3991e+08 5.4573e+11 20621
## - OpenPorchSF    1 7.6592e+08 5.4576e+11 20621
## <none>                        5.4499e+11 20622
## - YrSold         1 1.1077e+09 5.4610e+11 20622
## - EnclosedPorch  1 1.3718e+09 5.4637e+11 20622
## - GarageArea     1 1.4747e+09 5.4647e+11 20623
## - BsmtFinSF2     1 1.7887e+09 5.4678e+11 20623
## - BsmtUnfSF      1 1.8212e+09 5.4682e+11 20623
## - BsmtFinSF1     1 2.1162e+09 5.4711e+11 20624
## + X3SsnPorch     1 9.3457e+06 5.4498e+11 20624
## + MiscVal        1 7.4280e+06 5.4499e+11 20624
## + BsmtHalfBath   1 2.5497e+06 5.4499e+11 20624
## - WoodDeckSF     1 3.1575e+09 5.4815e+11 20626
## - TotRmsAbvGrd   1 3.2420e+09 5.4824e+11 20626
## - KitchenAbvGr   1 7.6628e+09 5.5266e+11 20634
## - YearRemodAdd   1 8.7461e+09 5.5374e+11 20636
## - FullBath       1 8.9324e+09 5.5393e+11 20636
## - GarageCars     1 9.9353e+09 5.5493e+11 20638
## - ScreenPorch    1 1.0144e+10 5.5514e+11 20639
## - X1stFlrSF      1 1.0205e+10 5.5520e+11 20639
## - PoolArea       1 1.3761e+10 5.5875e+11 20645
## - BsmtFullBath   1 1.4525e+10 5.5952e+11 20647
## - OverallCond    1 1.5581e+10 5.6058e+11 20649
## - Fireplaces     1 1.5641e+10 5.6064e+11 20649
## - X2ndFlrSF      1 1.6516e+10 5.6151e+11 20650
## - MSSubClass     1 1.7626e+10 5.6262e+11 20652
## - YearBuilt      1 2.2881e+10 5.6788e+11 20662
## - OverallQual    1 8.9168e+10 6.3416e+11 20775
## 
## Step:  AIC=20619.72
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + ScreenPorch + PoolArea + MoSold + 
##     YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - MasVnrArea     1 1.0208e+07 5.4502e+11 20618
## - MoSold         1 1.7269e+08 5.4518e+11 20618
## - BedroomAbvGr   1 1.8372e+08 5.4519e+11 20618
## - HalfBath       1 2.7484e+08 5.4528e+11 20618
## - LowQualFinSF   1 3.3404e+08 5.4534e+11 20618
## - GarageYrBlt    1 4.4020e+08 5.4545e+11 20619
## - LotArea        1 5.2345e+08 5.4553e+11 20619
## - Id             1 7.3902e+08 5.4574e+11 20619
## - OpenPorchSF    1 7.6702e+08 5.4577e+11 20619
## <none>                        5.4500e+11 20620
## - YrSold         1 1.0987e+09 5.4610e+11 20620
## - EnclosedPorch  1 1.3845e+09 5.4639e+11 20620
## - GarageArea     1 1.5143e+09 5.4652e+11 20621
## - BsmtFinSF2     1 1.7838e+09 5.4679e+11 20621
## - BsmtUnfSF      1 1.8110e+09 5.4682e+11 20621
## - BsmtFinSF1     1 2.1154e+09 5.4712e+11 20622
## + LotFrontage    1 1.0575e+07 5.4499e+11 20622
## + X3SsnPorch     1 1.0380e+07 5.4499e+11 20622
## + MiscVal        1 6.8764e+06 5.4500e+11 20622
## + BsmtHalfBath   1 2.5412e+06 5.4500e+11 20622
## - WoodDeckSF     1 3.1491e+09 5.4815e+11 20624
## - TotRmsAbvGrd   1 3.2669e+09 5.4827e+11 20624
## - KitchenAbvGr   1 7.6526e+09 5.5266e+11 20632
## - YearRemodAdd   1 8.7368e+09 5.5374e+11 20634
## - FullBath       1 8.9276e+09 5.5393e+11 20634
## - GarageCars     1 9.9388e+09 5.5494e+11 20636
## - ScreenPorch    1 1.0145e+10 5.5515e+11 20637
## - X1stFlrSF      1 1.0392e+10 5.5540e+11 20637
## - PoolArea       1 1.3817e+10 5.5882e+11 20643
## - BsmtFullBath   1 1.4515e+10 5.5952e+11 20645
## - OverallCond    1 1.5631e+10 5.6064e+11 20647
## - Fireplaces     1 1.5651e+10 5.6066e+11 20647
## - X2ndFlrSF      1 1.6580e+10 5.6158e+11 20648
## - MSSubClass     1 1.9362e+10 5.6437e+11 20653
## - YearBuilt      1 2.3234e+10 5.6824e+11 20660
## - OverallQual    1 8.9175e+10 6.3418e+11 20773
## 
## Step:  AIC=20617.74
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath + 
##     HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + 
##     EnclosedPorch + ScreenPorch + PoolArea + MoSold + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - MoSold         1 1.7086e+08 5.4519e+11 20616
## - BedroomAbvGr   1 1.8471e+08 5.4520e+11 20616
## - HalfBath       1 2.7281e+08 5.4529e+11 20616
## - LowQualFinSF   1 3.3803e+08 5.4535e+11 20616
## - GarageYrBlt    1 4.3578e+08 5.4545e+11 20617
## - LotArea        1 5.3867e+08 5.4555e+11 20617
## - Id             1 7.3075e+08 5.4575e+11 20617
## - OpenPorchSF    1 7.7773e+08 5.4579e+11 20617
## <none>                        5.4502e+11 20618
## - YrSold         1 1.1001e+09 5.4612e+11 20618
## - EnclosedPorch  1 1.3914e+09 5.4641e+11 20618
## - GarageArea     1 1.5050e+09 5.4652e+11 20619
## - BsmtFinSF2     1 1.7822e+09 5.4680e+11 20619
## - BsmtUnfSF      1 1.8019e+09 5.4682e+11 20619
## - BsmtFinSF1     1 2.1121e+09 5.4713e+11 20620
## + X3SsnPorch     1 1.0389e+07 5.4500e+11 20620
## + MasVnrArea     1 1.0208e+07 5.4500e+11 20620
## + LotFrontage    1 1.0205e+07 5.4500e+11 20620
## + MiscVal        1 7.3263e+06 5.4501e+11 20620
## + BsmtHalfBath   1 2.8493e+06 5.4501e+11 20620
## - WoodDeckSF     1 3.1561e+09 5.4817e+11 20622
## - TotRmsAbvGrd   1 3.2596e+09 5.4827e+11 20622
## - KitchenAbvGr   1 7.6454e+09 5.5266e+11 20630
## - YearRemodAdd   1 8.8359e+09 5.5385e+11 20632
## - FullBath       1 8.9929e+09 5.5401e+11 20633
## - GarageCars     1 9.9344e+09 5.5495e+11 20634
## - ScreenPorch    1 1.0137e+10 5.5515e+11 20635
## - X1stFlrSF      1 1.0401e+10 5.5542e+11 20635
## - PoolArea       1 1.3809e+10 5.5882e+11 20641
## - BsmtFullBath   1 1.4647e+10 5.5966e+11 20643
## - OverallCond    1 1.5624e+10 5.6064e+11 20645
## - Fireplaces     1 1.5653e+10 5.6067e+11 20645
## - X2ndFlrSF      1 1.6600e+10 5.6162e+11 20646
## - MSSubClass     1 1.9537e+10 5.6455e+11 20652
## - YearBuilt      1 2.3462e+10 5.6848e+11 20659
## - OverallQual    1 8.9238e+10 6.3425e+11 20771
## 
## Step:  AIC=20616.06
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath + 
##     HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + 
##     EnclosedPorch + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - BedroomAbvGr   1 1.7700e+08 5.4536e+11 20614
## - HalfBath       1 2.5897e+08 5.4544e+11 20615
## - LowQualFinSF   1 3.2825e+08 5.4551e+11 20615
## - GarageYrBlt    1 4.4697e+08 5.4563e+11 20615
## - LotArea        1 5.2866e+08 5.4571e+11 20615
## - Id             1 7.2191e+08 5.4591e+11 20615
## - OpenPorchSF    1 8.3698e+08 5.4602e+11 20616
## <none>                        5.4519e+11 20616
## - YrSold         1 1.2573e+09 5.4644e+11 20616
## - EnclosedPorch  1 1.3278e+09 5.4651e+11 20617
## - GarageArea     1 1.4997e+09 5.4669e+11 20617
## - BsmtFinSF2     1 1.7412e+09 5.4693e+11 20617
## - BsmtUnfSF      1 1.7486e+09 5.4693e+11 20617
## + MoSold         1 1.7086e+08 5.4502e+11 20618
## - BsmtFinSF1     1 2.0677e+09 5.4725e+11 20618
## + X3SsnPorch     1 1.1372e+07 5.4517e+11 20618
## + LotFrontage    1 1.0536e+07 5.4518e+11 20618
## + MiscVal        1 9.9699e+06 5.4518e+11 20618
## + MasVnrArea     1 8.3841e+06 5.4518e+11 20618
## + BsmtHalfBath   1 4.8184e+06 5.4518e+11 20618
## - TotRmsAbvGrd   1 3.2084e+09 5.4839e+11 20620
## - WoodDeckSF     1 3.2276e+09 5.4841e+11 20620
## - KitchenAbvGr   1 7.5495e+09 5.5274e+11 20628
## - YearRemodAdd   1 8.8480e+09 5.5403e+11 20631
## - FullBath       1 9.0547e+09 5.5424e+11 20631
## - GarageCars     1 9.9742e+09 5.5516e+11 20633
## - ScreenPorch    1 1.0139e+10 5.5533e+11 20633
## - X1stFlrSF      1 1.0445e+10 5.5563e+11 20634
## - PoolArea       1 1.4057e+10 5.5924e+11 20640
## - BsmtFullBath   1 1.4627e+10 5.5981e+11 20641
## - OverallCond    1 1.5535e+10 5.6072e+11 20643
## - Fireplaces     1 1.5721e+10 5.6091e+11 20643
## - X2ndFlrSF      1 1.6673e+10 5.6186e+11 20645
## - MSSubClass     1 1.9861e+10 5.6505e+11 20651
## - YearBuilt      1 2.3331e+10 5.6852e+11 20657
## - OverallQual    1 9.0670e+10 6.3586e+11 20771
## 
## Step:  AIC=20614.39
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath + 
##     HalfBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt + 
##     GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch + 
##     ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - HalfBath       1 2.6203e+08 5.4562e+11 20613
## - LowQualFinSF   1 3.1357e+08 5.4568e+11 20613
## - GarageYrBlt    1 4.2622e+08 5.4579e+11 20613
## - LotArea        1 5.1285e+08 5.4588e+11 20613
## - Id             1 7.2592e+08 5.4609e+11 20614
## - OpenPorchSF    1 8.8702e+08 5.4625e+11 20614
## <none>                        5.4536e+11 20614
## - YrSold         1 1.2569e+09 5.4662e+11 20615
## - EnclosedPorch  1 1.2997e+09 5.4666e+11 20615
## - GarageArea     1 1.5173e+09 5.4688e+11 20615
## - BsmtFinSF2     1 1.6886e+09 5.4705e+11 20616
## - BsmtUnfSF      1 1.7219e+09 5.4708e+11 20616
## + BedroomAbvGr   1 1.7700e+08 5.4519e+11 20616
## + MoSold         1 1.6316e+08 5.4520e+11 20616
## - BsmtFinSF1     1 2.0882e+09 5.4745e+11 20616
## + X3SsnPorch     1 1.2529e+07 5.4535e+11 20616
## + MiscVal        1 1.0298e+07 5.4535e+11 20616
## + MasVnrArea     1 9.3029e+06 5.4535e+11 20616
## + LotFrontage    1 5.1703e+06 5.4536e+11 20616
## + BsmtHalfBath   1 1.3014e+06 5.4536e+11 20616
## - TotRmsAbvGrd   1 3.1753e+09 5.4854e+11 20618
## - WoodDeckSF     1 3.1903e+09 5.4855e+11 20618
## - KitchenAbvGr   1 7.5009e+09 5.5286e+11 20626
## - FullBath       1 8.8975e+09 5.5426e+11 20629
## - YearRemodAdd   1 9.7153e+09 5.5508e+11 20631
## - GarageCars     1 9.9867e+09 5.5535e+11 20631
## - ScreenPorch    1 1.0018e+10 5.5538e+11 20631
## - X1stFlrSF      1 1.0397e+10 5.5576e+11 20632
## - PoolArea       1 1.4091e+10 5.5945e+11 20639
## - BsmtFullBath   1 1.4605e+10 5.5997e+11 20639
## - OverallCond    1 1.5358e+10 5.6072e+11 20641
## - Fireplaces     1 1.6158e+10 5.6152e+11 20642
## - X2ndFlrSF      1 1.6507e+10 5.6187e+11 20643
## - MSSubClass     1 1.9758e+10 5.6512e+11 20649
## - YearBuilt      1 2.3170e+10 5.6853e+11 20655
## - OverallQual    1 9.3163e+10 6.3853e+11 20774
## 
## Step:  AIC=20612.88
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath + 
##     KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt + 
##     GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch + 
##     ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - LowQualFinSF   1 3.1847e+08 5.4594e+11 20612
## - GarageYrBlt    1 4.5512e+08 5.4608e+11 20612
## - LotArea        1 5.0763e+08 5.4613e+11 20612
## - Id             1 7.2860e+08 5.4635e+11 20612
## - OpenPorchSF    1 9.1530e+08 5.4654e+11 20613
## <none>                        5.4562e+11 20613
## - EnclosedPorch  1 1.2536e+09 5.4688e+11 20613
## - YrSold         1 1.2575e+09 5.4688e+11 20613
## - GarageArea     1 1.4623e+09 5.4709e+11 20614
## - BsmtUnfSF      1 1.7201e+09 5.4735e+11 20614
## - BsmtFinSF2     1 1.7790e+09 5.4740e+11 20614
## + HalfBath       1 2.6203e+08 5.4536e+11 20614
## + BedroomAbvGr   1 1.8007e+08 5.4544e+11 20615
## + MoSold         1 1.4951e+08 5.4548e+11 20615
## + X3SsnPorch     1 1.8657e+07 5.4561e+11 20615
## - BsmtFinSF1     1 2.1219e+09 5.4775e+11 20615
## + MasVnrArea     1 7.4802e+06 5.4562e+11 20615
## + LotFrontage    1 4.7014e+06 5.4562e+11 20615
## + MiscVal        1 3.0754e+06 5.4562e+11 20615
## + BsmtHalfBath   1 9.9622e+05 5.4562e+11 20615
## - WoodDeckSF     1 3.2003e+09 5.4883e+11 20617
## - TotRmsAbvGrd   1 3.2270e+09 5.4885e+11 20617
## - KitchenAbvGr   1 7.6253e+09 5.5325e+11 20625
## - FullBath       1 9.2673e+09 5.5489e+11 20628
## - YearRemodAdd   1 9.7939e+09 5.5542e+11 20629
## - ScreenPorch    1 1.0242e+10 5.5587e+11 20630
## - GarageCars     1 1.0326e+10 5.5595e+11 20630
## - X1stFlrSF      1 1.0456e+10 5.5608e+11 20630
## - PoolArea       1 1.4182e+10 5.5981e+11 20637
## - BsmtFullBath   1 1.4440e+10 5.6006e+11 20638
## - OverallCond    1 1.5273e+10 5.6090e+11 20639
## - Fireplaces     1 1.6748e+10 5.6237e+11 20642
## - MSSubClass     1 1.9679e+10 5.6530e+11 20647
## - X2ndFlrSF      1 2.3916e+10 5.6954e+11 20655
## - YearBuilt      1 2.7193e+10 5.7282e+11 20661
## - OverallQual    1 9.2902e+10 6.3853e+11 20772
## 
## Step:  AIC=20611.48
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea + 
##     WoodDeckSF + OpenPorchSF + EnclosedPorch + ScreenPorch + 
##     PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - GarageYrBlt    1 3.9982e+08 5.4634e+11 20610
## - LotArea        1 5.0286e+08 5.4645e+11 20610
## - Id             1 7.7940e+08 5.4672e+11 20611
## - OpenPorchSF    1 9.4543e+08 5.4689e+11 20611
## <none>                        5.4594e+11 20612
## - EnclosedPorch  1 1.2324e+09 5.4718e+11 20612
## - YrSold         1 1.2458e+09 5.4719e+11 20612
## - GarageArea     1 1.5017e+09 5.4745e+11 20612
## - BsmtUnfSF      1 1.6770e+09 5.4762e+11 20613
## - BsmtFinSF2     1 1.7646e+09 5.4771e+11 20613
## + LowQualFinSF   1 3.1847e+08 5.4562e+11 20613
## + GrLivArea      1 3.1847e+08 5.4562e+11 20613
## + HalfBath       1 2.6693e+08 5.4568e+11 20613
## + BedroomAbvGr   1 1.6517e+08 5.4578e+11 20613
## + MoSold         1 1.4069e+08 5.4580e+11 20613
## - BsmtFinSF1     1 2.0630e+09 5.4801e+11 20613
## + X3SsnPorch     1 2.0175e+07 5.4592e+11 20613
## + MasVnrArea     1 1.0823e+07 5.4593e+11 20614
## + LotFrontage    1 4.8440e+06 5.4594e+11 20614
## + MiscVal        1 2.1890e+06 5.4594e+11 20614
## + BsmtHalfBath   1 2.0437e+06 5.4594e+11 20614
## - WoodDeckSF     1 3.2001e+09 5.4914e+11 20616
## - TotRmsAbvGrd   1 3.5533e+09 5.4950e+11 20616
## - KitchenAbvGr   1 8.0622e+09 5.5401e+11 20625
## - FullBath       1 9.3865e+09 5.5533e+11 20627
## - YearRemodAdd   1 9.8565e+09 5.5580e+11 20628
## - ScreenPorch    1 1.0108e+10 5.5605e+11 20628
## - GarageCars     1 1.0246e+10 5.5619e+11 20629
## - X1stFlrSF      1 1.0317e+10 5.5626e+11 20629
## - PoolArea       1 1.3870e+10 5.5981e+11 20635
## - BsmtFullBath   1 1.4584e+10 5.6053e+11 20636
## - OverallCond    1 1.5045e+10 5.6099e+11 20637
## - Fireplaces     1 1.6780e+10 5.6272e+11 20640
## - MSSubClass     1 1.9488e+10 5.6543e+11 20645
## - X2ndFlrSF      1 2.3605e+10 5.6955e+11 20653
## - YearBuilt      1 2.7011e+10 5.7295e+11 20659
## - OverallQual    1 9.3201e+10 6.3914e+11 20771
## 
## Step:  AIC=20610.23
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - LotArea        1 5.7428e+08 5.4692e+11 20609
## - Id             1 8.3474e+08 5.4718e+11 20610
## - OpenPorchSF    1 9.1880e+08 5.4726e+11 20610
## <none>                        5.4634e+11 20610
## - GarageArea     1 1.1673e+09 5.4751e+11 20610
## - EnclosedPorch  1 1.2550e+09 5.4760e+11 20611
## - YrSold         1 1.2769e+09 5.4762e+11 20611
## - BsmtUnfSF      1 1.7219e+09 5.4807e+11 20611
## + GarageYrBlt    1 3.9982e+08 5.4594e+11 20612
## - BsmtFinSF2     1 1.8164e+09 5.4816e+11 20612
## + HalfBath       1 2.9383e+08 5.4605e+11 20612
## + LowQualFinSF   1 2.6317e+08 5.4608e+11 20612
## + GrLivArea      1 2.6317e+08 5.4608e+11 20612
## + MoSold         1 1.5086e+08 5.4619e+11 20612
## + BedroomAbvGr   1 1.4706e+08 5.4620e+11 20612
## - BsmtFinSF1     1 2.1180e+09 5.4846e+11 20612
## + X3SsnPorch     1 2.1283e+07 5.4632e+11 20612
## + LotFrontage    1 1.5391e+07 5.4633e+11 20612
## + BsmtHalfBath   1 6.2942e+06 5.4634e+11 20612
## + MasVnrArea     1 5.9863e+06 5.4634e+11 20612
## + MiscVal        1 3.4500e+03 5.4634e+11 20612
## - WoodDeckSF     1 3.0039e+09 5.4935e+11 20614
## - TotRmsAbvGrd   1 3.4744e+09 5.4982e+11 20615
## - KitchenAbvGr   1 7.8334e+09 5.5418e+11 20623
## - FullBath       1 9.2370e+09 5.5558e+11 20625
## - YearRemodAdd   1 9.4831e+09 5.5583e+11 20626
## - ScreenPorch    1 1.0185e+10 5.5653e+11 20627
## - GarageCars     1 1.0327e+10 5.5667e+11 20627
## - X1stFlrSF      1 1.0607e+10 5.5695e+11 20628
## - PoolArea       1 1.3745e+10 5.6009e+11 20634
## - BsmtFullBath   1 1.4782e+10 5.6113e+11 20636
## - OverallCond    1 1.5825e+10 5.6217e+11 20637
## - Fireplaces     1 1.7532e+10 5.6388e+11 20641
## - MSSubClass     1 2.0007e+10 5.6635e+11 20645
## - X2ndFlrSF      1 2.4032e+10 5.7038e+11 20652
## - YearBuilt      1 3.1459e+10 5.7780e+11 20666
## - OverallQual    1 9.3129e+10 6.3947e+11 20769
## 
## Step:  AIC=20609.3
## SalePrice ~ Id + MSSubClass + OverallQual + OverallCond + YearBuilt + 
##     YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + 
##     EnclosedPorch + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - Id             1 8.7315e+08 5.4779e+11 20609
## - OpenPorchSF    1 9.4972e+08 5.4787e+11 20609
## <none>                        5.4692e+11 20609
## - EnclosedPorch  1 1.1752e+09 5.4809e+11 20610
## - GarageArea     1 1.3136e+09 5.4823e+11 20610
## - YrSold         1 1.3307e+09 5.4825e+11 20610
## + LotArea        1 5.7428e+08 5.4634e+11 20610
## + GarageYrBlt    1 4.7125e+08 5.4645e+11 20610
## - BsmtUnfSF      1 1.7495e+09 5.4867e+11 20611
## + HalfBath       1 2.9037e+08 5.4663e+11 20611
## - BsmtFinSF2     1 1.8635e+09 5.4878e+11 20611
## + LowQualFinSF   1 2.5407e+08 5.4666e+11 20611
## + GrLivArea      1 2.5407e+08 5.4666e+11 20611
## + LotFrontage    1 1.7630e+08 5.4674e+11 20611
## + MoSold         1 1.4250e+08 5.4678e+11 20611
## + BedroomAbvGr   1 1.3046e+08 5.4679e+11 20611
## + X3SsnPorch     1 2.5012e+07 5.4689e+11 20611
## + MasVnrArea     1 1.8386e+07 5.4690e+11 20611
## + BsmtHalfBath   1 5.1087e+06 5.4691e+11 20611
## + MiscVal        1 1.9594e+05 5.4692e+11 20611
## - BsmtFinSF1     1 2.2527e+09 5.4917e+11 20612
## - WoodDeckSF     1 3.0724e+09 5.4999e+11 20613
## - TotRmsAbvGrd   1 3.4224e+09 5.5034e+11 20614
## - KitchenAbvGr   1 7.8106e+09 5.5473e+11 20622
## - FullBath       1 9.1584e+09 5.5608e+11 20624
## - YearRemodAdd   1 9.4527e+09 5.5637e+11 20625
## - GarageCars     1 1.0270e+10 5.5719e+11 20626
## - ScreenPorch    1 1.0297e+10 5.5721e+11 20626
## - X1stFlrSF      1 1.1804e+10 5.5872e+11 20629
## - PoolArea       1 1.3294e+10 5.6021e+11 20632
## - BsmtFullBath   1 1.4958e+10 5.6188e+11 20635
## - OverallCond    1 1.5622e+10 5.6254e+11 20636
## - Fireplaces     1 1.8628e+10 5.6555e+11 20642
## - MSSubClass     1 2.3824e+10 5.7074e+11 20651
## - X2ndFlrSF      1 2.6244e+10 5.7316e+11 20655
## - YearBuilt      1 3.1047e+10 5.7796e+11 20664
## - OverallQual    1 9.2561e+10 6.3948e+11 20767
## 
## Step:  AIC=20608.93
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
##     YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + 
##     EnclosedPorch + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - OpenPorchSF    1 8.9172e+08 5.4868e+11 20609
## <none>                        5.4779e+11 20609
## - EnclosedPorch  1 1.1448e+09 5.4894e+11 20609
## + Id             1 8.7315e+08 5.4692e+11 20609
## - YrSold         1 1.3168e+09 5.4911e+11 20609
## - GarageArea     1 1.4087e+09 5.4920e+11 20610
## + LotArea        1 6.1269e+08 5.4718e+11 20610
## + GarageYrBlt    1 5.3498e+08 5.4726e+11 20610
## - BsmtUnfSF      1 1.7463e+09 5.4954e+11 20610
## + LowQualFinSF   1 2.9847e+08 5.4749e+11 20610
## + GrLivArea      1 2.9847e+08 5.4749e+11 20610
## + HalfBath       1 2.9572e+08 5.4749e+11 20610
## - BsmtFinSF2     1 1.8825e+09 5.4967e+11 20610
## + LotFrontage    1 1.8473e+08 5.4761e+11 20611
## + MoSold         1 1.3321e+08 5.4766e+11 20611
## + BedroomAbvGr   1 1.3151e+08 5.4766e+11 20611
## + X3SsnPorch     1 5.7416e+07 5.4773e+11 20611
## + BsmtHalfBath   1 8.5136e+06 5.4778e+11 20611
## + MasVnrArea     1 5.6666e+06 5.4779e+11 20611
## + MiscVal        1 1.6268e+06 5.4779e+11 20611
## - BsmtFinSF1     1 2.3504e+09 5.5014e+11 20611
## - WoodDeckSF     1 3.1087e+09 5.5090e+11 20613
## - TotRmsAbvGrd   1 3.2398e+09 5.5103e+11 20613
## - KitchenAbvGr   1 7.6421e+09 5.5543e+11 20621
## - FullBath       1 9.0367e+09 5.5683e+11 20624
## - YearRemodAdd   1 9.5639e+09 5.5735e+11 20625
## - GarageCars     1 1.0001e+10 5.5779e+11 20625
## - ScreenPorch    1 1.0140e+10 5.5793e+11 20626
## - X1stFlrSF      1 1.1773e+10 5.5956e+11 20629
## - PoolArea       1 1.3617e+10 5.6141e+11 20632
## - BsmtFullBath   1 1.4561e+10 5.6235e+11 20634
## - OverallCond    1 1.5475e+10 5.6327e+11 20635
## - Fireplaces     1 1.8702e+10 5.6649e+11 20641
## - MSSubClass     1 2.4158e+10 5.7195e+11 20651
## - X2ndFlrSF      1 2.6451e+10 5.7424e+11 20655
## - YearBuilt      1 3.0866e+10 5.7866e+11 20663
## - OverallQual    1 9.4215e+10 6.4201e+11 20769
## 
## Step:  AIC=20608.6
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
##     YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + EnclosedPorch + 
##     ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - EnclosedPorch  1 9.5826e+08 5.4964e+11 20608
## <none>                        5.4868e+11 20609
## + OpenPorchSF    1 8.9172e+08 5.4779e+11 20609
## + Id             1 8.1514e+08 5.4787e+11 20609
## - YrSold         1 1.4324e+09 5.5011e+11 20609
## + LotArea        1 6.4228e+08 5.4804e+11 20609
## - GarageArea     1 1.5888e+09 5.5027e+11 20610
## + GarageYrBlt    1 5.0451e+08 5.4818e+11 20610
## + LowQualFinSF   1 3.2670e+08 5.4836e+11 20610
## + GrLivArea      1 3.2670e+08 5.4836e+11 20610
## + HalfBath       1 3.2432e+08 5.4836e+11 20610
## + LotFrontage    1 1.9055e+08 5.4849e+11 20610
## + MoSold         1 1.8567e+08 5.4850e+11 20610
## + BedroomAbvGr   1 1.7490e+08 5.4851e+11 20610
## - BsmtUnfSF      1 2.0581e+09 5.5074e+11 20610
## + X3SsnPorch     1 4.5272e+07 5.4864e+11 20611
## + MasVnrArea     1 1.5796e+07 5.4867e+11 20611
## + BsmtHalfBath   1 4.5271e+06 5.4868e+11 20611
## + MiscVal        1 6.9648e+04 5.4868e+11 20611
## - BsmtFinSF2     1 2.1635e+09 5.5085e+11 20611
## - BsmtFinSF1     1 2.6500e+09 5.5133e+11 20612
## - WoodDeckSF     1 2.9029e+09 5.5159e+11 20612
## - TotRmsAbvGrd   1 3.0458e+09 5.5173e+11 20612
## - KitchenAbvGr   1 7.7107e+09 5.5639e+11 20621
## - FullBath       1 9.0846e+09 5.5777e+11 20623
## - GarageCars     1 9.6777e+09 5.5836e+11 20625
## - YearRemodAdd   1 1.0124e+10 5.5881e+11 20625
## - ScreenPorch    1 1.0487e+10 5.5917e+11 20626
## - X1stFlrSF      1 1.2176e+10 5.6086e+11 20629
## - PoolArea       1 1.3976e+10 5.6266e+11 20632
## - BsmtFullBath   1 1.4892e+10 5.6357e+11 20634
## - OverallCond    1 1.5160e+10 5.6384e+11 20635
## - Fireplaces     1 1.8865e+10 5.6755e+11 20641
## - MSSubClass     1 2.4349e+10 5.7303e+11 20651
## - X2ndFlrSF      1 2.9315e+10 5.7800e+11 20660
## - YearBuilt      1 3.1013e+10 5.7969e+11 20663
## - OverallQual    1 9.4452e+10 6.4313e+11 20769
## 
## Step:  AIC=20608.38
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
##     YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + 
##     PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## <none>                        5.4964e+11 20608
## + EnclosedPorch  1 9.5826e+08 5.4868e+11 20609
## + Id             1 7.9402e+08 5.4885e+11 20609
## + OpenPorchSF    1 7.0523e+08 5.4894e+11 20609
## - YrSold         1 1.4495e+09 5.5109e+11 20609
## + LotArea        1 5.6133e+08 5.4908e+11 20609
## + GarageYrBlt    1 5.2360e+08 5.4912e+11 20609
## - GarageArea     1 1.6663e+09 5.5131e+11 20610
## + LowQualFinSF   1 3.0316e+08 5.4934e+11 20610
## + GrLivArea      1 3.0316e+08 5.4934e+11 20610
## + HalfBath       1 2.7642e+08 5.4936e+11 20610
## + LotFrontage    1 2.0859e+08 5.4943e+11 20610
## + BedroomAbvGr   1 1.4683e+08 5.4949e+11 20610
## + MoSold         1 1.2523e+08 5.4952e+11 20610
## - BsmtUnfSF      1 2.0301e+09 5.5167e+11 20610
## + X3SsnPorch     1 3.0487e+07 5.4961e+11 20610
## + MasVnrArea     1 2.0544e+07 5.4962e+11 20610
## + BsmtHalfBath   1 6.9368e+06 5.4963e+11 20610
## + MiscVal        1 4.9818e+05 5.4964e+11 20610
## - BsmtFinSF2     1 2.2388e+09 5.5188e+11 20611
## - BsmtFinSF1     1 2.5428e+09 5.5218e+11 20611
## - WoodDeckSF     1 2.7191e+09 5.5236e+11 20611
## - TotRmsAbvGrd   1 2.7904e+09 5.5243e+11 20612
## - KitchenAbvGr   1 7.9493e+09 5.5759e+11 20621
## - FullBath       1 9.0684e+09 5.5871e+11 20623
## - GarageCars     1 9.7097e+09 5.5935e+11 20624
## - ScreenPorch    1 9.8054e+09 5.5945e+11 20625
## - YearRemodAdd   1 1.0465e+10 5.6011e+11 20626
## - X1stFlrSF      1 1.2302e+10 5.6194e+11 20629
## - PoolArea       1 1.3435e+10 5.6308e+11 20631
## - OverallCond    1 1.4373e+10 5.6401e+11 20633
## - BsmtFullBath   1 1.5378e+10 5.6502e+11 20635
## - Fireplaces     1 1.8846e+10 5.6849e+11 20641
## - MSSubClass     1 2.4785e+10 5.7443e+11 20652
## - X2ndFlrSF      1 3.0286e+10 5.7993e+11 20661
## - YearBuilt      1 3.1322e+10 5.8096e+11 20663
## - OverallQual    1 9.7972e+10 6.4761e+11 20774
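
Each table in the trace above lists the candidate moves from the current model. A `-` row gives the RSS and AIC if that term is dropped, a `+` row gives them if a previously removed term is added back, and `<none>` is the current model. Rows are sorted by AIC, and the top move is taken only while it beats `<none>`. A minimal sketch of one such step is shown here on the built-in mtcars data rather than the housing data, using base R's drop1(); the chosen predictors are only for illustration.

```r
# Illustration on the built-in mtcars data, not the housing data.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# drop1() reports the RSS and AIC obtained by deleting each term in turn;
# the "<none>" row is the current model.
tab <- drop1(full)
print(tab)

# The candidate with the smallest AIC is the one a backward step would remove,
# provided its AIC is below that of "<none>".
candidates <- tab[rownames(tab) != "<none>", ]
best       <- rownames(candidates)[which.min(candidates$AIC)]
improves   <- min(candidates$AIC) < tab["<none>", "AIC"]
cat("best single deletion:", best, "- improves AIC:", improves, "\n")
```

The printed AIC values in the trace are rounded, so several rows can look tied even though the underlying values differ.
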
step$anova 
## Stepwise Model Path 
## Analysis of Deviance Table
## 
## Initial Model:
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + 
##     LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold + t
## 
## Final Model:
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
##     YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
##     X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + 
##     PoolArea + YrSold
## 
## 
##               Step Df  Deviance Resid. Df   Resid. Dev      AIC
## 1                                     987 544976140366 20627.66
## 2              - t  0         0       987 544976140366 20627.66
## 3      - GrLivArea  0         0       987 544976140366 20627.66
## 4    - TotalBsmtSF  0         0       987 544976140366 20627.66
## 5   - BsmtHalfBath  1   2207060       988 544978347426 20625.67
## 6        - MiscVal  1   6617357       989 544984964783 20623.68
## 7     - X3SsnPorch  1   9345745       990 544994310528 20621.70
## 8    - LotFrontage  1  10574577       991 545004885105 20619.72
## 9     - MasVnrArea  1  10208382       992 545015093487 20617.74
## 10        - MoSold  1 170864174       993 545185957661 20616.06
## 11  - BedroomAbvGr  1 177002598       994 545362960259 20614.39
## 12      - HalfBath  1 262034782       995 545624995040 20612.88
## 13  - LowQualFinSF  1 318465833       996 545943460873 20611.48
## 14   - GarageYrBlt  1 399824726       997 546343285599 20610.23
## 15       - LotArea  1 574282008       998 546917567607 20609.30
## 16            - Id  1 873148554       999 547790716162 20608.93
## 17   - OpenPorchSF  1 891715890      1000 548682432052 20608.60
## 18 - EnclosedPorch  1 958263789      1001 549640695841 20608.38
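
Steps 2 to 4 of the path above drop t, GrLivArea and TotalBsmtSF with 0 degrees of freedom and no change in deviance. That pattern is what one would expect when a term is an exact linear combination of other predictors, so lm() cannot estimate a separate coefficient for it. The following is a minimal sketch of that effect on made-up data (the columns a, b, total and y are hypothetical and not part of the housing data); alias() lists the terms that are fully determined by others.

```r
set.seed(1)
# Hypothetical data: 'total' is exactly a + b, so it adds no new information
d <- data.frame(a = rnorm(50), b = rnorm(50))
d$total <- d$a + d$b
d$y <- 2 * d$a - d$b + rnorm(50)

fit <- lm(y ~ a + b + total, data = d)
coef(fit)             # the coefficient for 'total' is NA (aliased)
alias(fit)$Complete   # expresses 'total' in terms of the other columns
```

Dropping such a term costs nothing in fit, which is consistent with the unchanged residual deviance in those rows.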

The model suggested by the stepwise selection is:

SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + PoolArea + YrSold

Next, use summary() to check the p-value of each variable. Starting with the highest p-value, remove one variable at a time and refit until no remaining variable has a p-value greater than 0.05.

finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
    YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
    X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
    Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + 
    PoolArea + YrSold, data=traindata)
summary(finalmodel)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     ScreenPorch + PoolArea + YrSold, data = traindata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -238541  -13301    -889   11862   73178 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6.993e+05  1.116e+06   0.627  0.53110    
## MSSubClass   -1.375e+02  2.046e+01  -6.718 3.08e-11 ***
## OverallQual   1.341e+04  1.004e+03  13.358  < 2e-16 ***
## OverallCond   4.455e+03  8.708e+02   5.116 3.73e-07 ***
## YearBuilt     3.255e+02  4.309e+01   7.553 9.59e-14 ***
## YearRemodAdd  2.294e+02  5.254e+01   4.366 1.40e-05 ***
## BsmtFinSF1    8.093e+00  3.761e+00   2.152  0.03164 *  
## BsmtFinSF2    1.132e+01  5.605e+00   2.019  0.04373 *  
## BsmtUnfSF     6.543e+00  3.403e+00   1.923  0.05479 .  
## X1stFlrSF     2.312e+01  4.885e+00   4.733 2.53e-06 ***
## X2ndFlrSF     2.732e+01  3.678e+00   7.427 2.38e-13 ***
## BsmtFullBath  1.049e+04  1.982e+03   5.292 1.49e-07 ***
## FullBath      8.644e+03  2.127e+03   4.064 5.20e-05 ***
## KitchenAbvGr -1.625e+04  4.272e+03  -3.805  0.00015 ***
## TotRmsAbvGrd  2.109e+03  9.355e+02   2.254  0.02439 *  
## Fireplaces    8.325e+03  1.421e+03   5.859 6.33e-09 ***
## GarageCars    9.782e+03  2.326e+03   4.205 2.84e-05 ***
## GarageArea    1.353e+01  7.766e+00   1.742  0.08181 .  
## WoodDeckSF    1.528e+01  6.866e+00   2.225  0.02628 *  
## ScreenPorch   5.760e+01  1.363e+01   4.226 2.60e-05 ***
## PoolArea     -9.918e+01  2.005e+01  -4.947 8.86e-07 ***
## YrSold       -9.028e+02  5.557e+02  -1.625  0.10454    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23430 on 1001 degrees of freedom
## Multiple R-squared:  0.8056, Adjusted R-squared:  0.8015 
## F-statistic: 197.6 on 21 and 1001 DF,  p-value: < 2.2e-16

Remove YrSold, which has the highest p-value (0.105).

finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
    YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
    X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
    Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + 
    PoolArea , data=traindata)
summary(finalmodel)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     ScreenPorch + PoolArea, data = traindata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -239944  -13223    -705   11666   72101 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.107e+06  9.782e+04 -11.318  < 2e-16 ***
## MSSubClass   -1.364e+02  2.047e+01  -6.665 4.37e-11 ***
## OverallQual   1.341e+04  1.005e+03  13.349  < 2e-16 ***
## OverallCond   4.431e+03  8.714e+02   5.085 4.38e-07 ***
## YearBuilt     3.263e+02  4.313e+01   7.567 8.62e-14 ***
## YearRemodAdd  2.254e+02  5.253e+01   4.291 1.95e-05 ***
## BsmtFinSF1    8.284e+00  3.762e+00   2.202 0.027902 *  
## BsmtFinSF2    1.124e+01  5.609e+00   2.003 0.045401 *  
## BsmtUnfSF     6.572e+00  3.405e+00   1.930 0.053914 .  
## X1stFlrSF     2.306e+01  4.889e+00   4.717 2.73e-06 ***
## X2ndFlrSF     2.721e+01  3.681e+00   7.394 3.01e-13 ***
## BsmtFullBath  1.027e+04  1.979e+03   5.190 2.54e-07 ***
## FullBath      8.550e+03  2.128e+03   4.018 6.31e-05 ***
## KitchenAbvGr -1.663e+04  4.269e+03  -3.895 0.000105 ***
## TotRmsAbvGrd  2.166e+03  9.356e+02   2.315 0.020835 *  
## Fireplaces    8.365e+03  1.422e+03   5.883 5.49e-09 ***
## GarageCars    9.989e+03  2.325e+03   4.297 1.90e-05 ***
## GarageArea    1.316e+01  7.769e+00   1.694 0.090567 .  
## WoodDeckSF    1.506e+01  6.870e+00   2.192 0.028615 *  
## ScreenPorch   5.727e+01  1.364e+01   4.198 2.93e-05 ***
## PoolArea     -9.745e+01  2.004e+01  -4.863 1.34e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23450 on 1002 degrees of freedom
## Multiple R-squared:  0.8051, Adjusted R-squared:  0.8012 
## F-statistic:   207 on 20 and 1002 DF,  p-value: < 2.2e-16

Remove GarageArea, which now has the highest p-value (0.091).

finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + 
    YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + 
    X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + 
    Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + 
    PoolArea , data=traindata)
summary(finalmodel)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + 
##     PoolArea, data = traindata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -235931  -13269   -1062   12134   72295 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.108e+06  9.791e+04 -11.319  < 2e-16 ***
## MSSubClass   -1.391e+02  2.043e+01  -6.808 1.70e-11 ***
## OverallQual   1.340e+04  1.006e+03  13.326  < 2e-16 ***
## OverallCond   4.464e+03  8.720e+02   5.119 3.68e-07 ***
## YearBuilt     3.290e+02  4.314e+01   7.628 5.54e-14 ***
## YearRemodAdd  2.234e+02  5.256e+01   4.249 2.34e-05 ***
## BsmtFinSF1    8.887e+00  3.749e+00   2.371 0.017937 *  
## BsmtFinSF2    1.161e+01  5.610e+00   2.070 0.038694 *  
## BsmtUnfSF     6.950e+00  3.401e+00   2.044 0.041263 *  
## X1stFlrSF     2.419e+01  4.848e+00   4.989 7.17e-07 ***
## X2ndFlrSF     2.797e+01  3.657e+00   7.647 4.81e-14 ***
## BsmtFullBath  1.036e+04  1.980e+03   5.232 2.05e-07 ***
## FullBath      8.408e+03  2.128e+03   3.951 8.34e-05 ***
## KitchenAbvGr -1.663e+04  4.273e+03  -3.891 0.000106 ***
## TotRmsAbvGrd  2.066e+03  9.347e+02   2.211 0.027277 *  
## Fireplaces    8.072e+03  1.413e+03   5.714 1.46e-08 ***
## GarageCars    1.287e+04  1.586e+03   8.112 1.44e-15 ***
## WoodDeckSF    1.496e+01  6.876e+00   2.176 0.029776 *  
## ScreenPorch   5.652e+01  1.365e+01   4.142 3.73e-05 ***
## PoolArea     -9.455e+01  1.998e+01  -4.731 2.55e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23470 on 1003 degrees of freedom
## Multiple R-squared:  0.8046, Adjusted R-squared:  0.8009 
## F-statistic: 217.3 on 19 and 1003 DF,  p-value: < 2.2e-16

The final model:

SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + PoolArea
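
The manual elimination above (drop the variable with the highest p-value, refit, repeat) can also be written as a loop. The following is a rough sketch on the built-in mtcars data rather than the housing data; the helper name backward_by_p is hypothetical, and it assumes every predictor is numeric so that coefficient names match term names.

```r
# Hypothetical helper: repeatedly drop the predictor with the largest p-value
backward_by_p <- function(fit, alpha = 0.05) {
  repeat {
    # p-values of all terms except the intercept, kept as a one-column matrix
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    if (nrow(p) == 0 || max(p) <= alpha) break
    worst <- rownames(p)[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full <- lm(mpg ~ ., data = mtcars)   # stand-in for the stepwise model
final <- backward_by_p(full)
formula(final)
```

Removing one variable per refit matters because the p-values of the remaining terms change after each removal, as BsmtUnfSF shows above when GarageArea is dropped.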

Check the residuals of the lm() fit.

plot(finalmodel)
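
Alongside the diagnostic plots, the same quantities can be inspected numerically. This is a small sketch on mtcars as a stand-in model; the cutoffs (3 standard deviations, Cook's distance above 4/n) are common rules of thumb and an assumption here, not thresholds used in this project.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for finalmodel

std_res <- rstandard(fit)        # standardized residuals
cooks <- cooks.distance(fit)     # influence of each observation on the fit

# Observations more than 3 standard deviations away from the fitted value
names(std_res)[abs(std_res) > 3]
# Observations flagged by the 4/n rule of thumb for Cook's distance
names(cooks)[cooks > 4 / nrow(mtcars)]
```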

Although a few outliers remain, most of the standardized residuals lie along the reference line, so the model is probably a reasonable fit for the data. Now extract the coefficients from the summary.

coef<-summary(finalmodel)$coefficients[1:20]
coef
##  [1] -1.108198e+06 -1.390496e+02  1.340014e+04  4.463995e+03  3.290395e+02
##  [6]  2.233629e+02  8.887390e+00  1.161355e+01  6.950429e+00  2.418612e+01
## [11]  2.796644e+01  1.035946e+04  8.408329e+03 -1.662665e+04  2.066350e+03
## [16]  8.071619e+03  1.286920e+04  1.496366e+01  5.652352e+01 -9.454670e+01
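
Indexing the coefficient matrix with [1:20] works because R stores a matrix column by column, so the first 20 entries are the Estimate column (intercept plus 19 predictors). A sketch of the same extraction by name, using mtcars as a stand-in model, keeps each value attached to its variable:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in for finalmodel

est_by_position <- summary(fit)$coefficients[1:3]        # first column, by position
est_by_name <- summary(fit)$coefficients[, "Estimate"]   # same values, with names
coef(fit)                                                # shortcut for the named vector
```

With names attached, a coefficient can be looked up as coef(fit)[["wt"]] instead of by its position in the vector.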

The next step is to use the test set data to predict SalePrice.

test<-read.csv("https://raw.githubusercontent.com/czhu505/Data605-/master/test.csv",stringsAsFactors = F)

SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + PoolArea

colnames<-c('Id','MSSubClass','OverallQual','OverallCond',
    'YearBuilt' ,'YearRemodAdd' , 'BsmtFinSF1' , 'BsmtFinSF2' , 'BsmtUnfSF' ,
    'X1stFlrSF', 'X2ndFlrSF' , 'BsmtFullBath' , 'FullBath' , 'KitchenAbvGr' ,
    'TotRmsAbvGrd' , 'Fireplaces' , 'GarageCars' , 'WoodDeckSF' , 'ScreenPorch', 
    'PoolArea')
test1<- data.frame(test[, colnames])
summary(test1)
##        Id         MSSubClass      OverallQual      OverallCond   
##  Min.   :1461   Min.   : 20.00   Min.   : 1.000   Min.   :1.000  
##  1st Qu.:1826   1st Qu.: 20.00   1st Qu.: 5.000   1st Qu.:5.000  
##  Median :2190   Median : 50.00   Median : 6.000   Median :5.000  
##  Mean   :2190   Mean   : 57.38   Mean   : 6.079   Mean   :5.554  
##  3rd Qu.:2554   3rd Qu.: 70.00   3rd Qu.: 7.000   3rd Qu.:6.000  
##  Max.   :2919   Max.   :190.00   Max.   :10.000   Max.   :9.000  
##                                                                  
##    YearBuilt     YearRemodAdd    BsmtFinSF1       BsmtFinSF2     
##  Min.   :1879   Min.   :1950   Min.   :   0.0   Min.   :   0.00  
##  1st Qu.:1953   1st Qu.:1963   1st Qu.:   0.0   1st Qu.:   0.00  
##  Median :1973   Median :1992   Median : 350.5   Median :   0.00  
##  Mean   :1971   Mean   :1984   Mean   : 439.2   Mean   :  52.62  
##  3rd Qu.:2001   3rd Qu.:2004   3rd Qu.: 753.5   3rd Qu.:   0.00  
##  Max.   :2010   Max.   :2010   Max.   :4010.0   Max.   :1526.00  
##                                NA's   :1        NA's   :1        
##    BsmtUnfSF        X1stFlrSF        X2ndFlrSF     BsmtFullBath   
##  Min.   :   0.0   Min.   : 407.0   Min.   :   0   Min.   :0.0000  
##  1st Qu.: 219.2   1st Qu.: 873.5   1st Qu.:   0   1st Qu.:0.0000  
##  Median : 460.0   Median :1079.0   Median :   0   Median :0.0000  
##  Mean   : 554.3   Mean   :1156.5   Mean   : 326   Mean   :0.4345  
##  3rd Qu.: 797.8   3rd Qu.:1382.5   3rd Qu.: 676   3rd Qu.:1.0000  
##  Max.   :2140.0   Max.   :5095.0   Max.   :1862   Max.   :3.0000  
##  NA's   :1                                        NA's   :2       
##     FullBath      KitchenAbvGr    TotRmsAbvGrd      Fireplaces    
##  Min.   :0.000   Min.   :0.000   Min.   : 3.000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.: 5.000   1st Qu.:0.0000  
##  Median :2.000   Median :1.000   Median : 6.000   Median :0.0000  
##  Mean   :1.571   Mean   :1.042   Mean   : 6.385   Mean   :0.5812  
##  3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.: 7.000   3rd Qu.:1.0000  
##  Max.   :4.000   Max.   :2.000   Max.   :15.000   Max.   :4.0000  
##                                                                   
##    GarageCars      WoodDeckSF       ScreenPorch        PoolArea      
##  Min.   :0.000   Min.   :   0.00   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.:1.000   1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.:  0.000  
##  Median :2.000   Median :   0.00   Median :  0.00   Median :  0.000  
##  Mean   :1.766   Mean   :  93.17   Mean   : 17.06   Mean   :  1.744  
##  3rd Qu.:2.000   3rd Qu.: 168.00   3rd Qu.:  0.00   3rd Qu.:  0.000  
##  Max.   :5.000   Max.   :1424.00   Max.   :576.00   Max.   :800.000  
##  NA's   :1

Now replace all NA values with 0, in preparation for adding a new “SalePrice” column.

test1[is.na(test1)] <- 0
head(test1)
##     Id MSSubClass OverallQual OverallCond YearBuilt YearRemodAdd
## 1 1461         20           5           6      1961         1961
## 2 1462         20           6           6      1958         1958
## 3 1463         60           5           5      1997         1998
## 4 1464         60           6           6      1998         1998
## 5 1465        120           8           5      1992         1992
## 6 1466         60           6           5      1993         1994
##   BsmtFinSF1 BsmtFinSF2 BsmtUnfSF X1stFlrSF X2ndFlrSF BsmtFullBath
## 1        468        144       270       896         0            0
## 2        923          0       406      1329         0            0
## 3        791          0       137       928       701            0
## 4        602          0       324       926       678            0
## 5        263          0      1017      1280         0            0
## 6          0          0       763       763       892            0
##   FullBath KitchenAbvGr TotRmsAbvGrd Fireplaces GarageCars WoodDeckSF
## 1        1            1            5          0          1        140
## 2        1            1            6          0          1        393
## 3        2            1            6          1          2        212
## 4        2            1            7          1          2        360
## 5        2            1            5          0          2          0
## 6        2            1            7          1          2        157
##   ScreenPorch PoolArea
## 1         120        0
## 2           0        0
## 3           0        0
## 4           0        0
## 5         144        0
## 6           0        0
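
Before replacing missing values it can help to see where they are. The sketch below uses a hypothetical miniature of test1 (the values are invented). Zero is a natural fill for areas and counts when NA stands for an absent feature; whether that holds for every affected test row is an assumption.

```r
# Hypothetical miniature of test1 with a few missing entries
df <- data.frame(Id = 1:4,
                 BsmtFinSF1 = c(468, NA, 791, 602),
                 GarageCars = c(1, 2, NA, 2))

colSums(is.na(df))     # number of missing values per column
df[is.na(df)] <- 0     # same replacement as applied to test1
df
```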

Use the coefficients of finalmodel to calculate SalePrice from the test set data.

# Start SalePrice at the intercept, then scale each predictor column by its coefficient
test1["SalePrice"] <- coef[1]
test1$MSSubClass<-test1$MSSubClass*coef[2]
test1$OverallQual<-test1$OverallQual*coef[3]
test1$OverallCond<-test1$OverallCond*coef[4]
# YearBuilt must be scaled in its own column; storing this product in OverallCond
# would overwrite that term and leave YearBuilt unscaled in the sum
test1$YearBuilt<-test1$YearBuilt*coef[5]
test1$YearRemodAdd<-test1$YearRemodAdd*coef[6]
test1$BsmtFinSF1<-test1$BsmtFinSF1*coef[7]
test1$BsmtFinSF2<-test1$BsmtFinSF2*coef[8]
test1$BsmtUnfSF<-test1$BsmtUnfSF*coef[9]
test1$X1stFlrSF<-test1$X1stFlrSF*coef[10]
test1$X2ndFlrSF<-test1$X2ndFlrSF*coef[11]
test1$BsmtFullBath<-test1$BsmtFullBath*coef[12]
test1$FullBath<-test1$FullBath*coef[13]
test1$KitchenAbvGr<-test1$KitchenAbvGr*coef[14]
test1$TotRmsAbvGrd<-test1$TotRmsAbvGrd*coef[15]
test1$Fireplaces<-test1$Fireplaces*coef[16]
test1$GarageCars<-test1$GarageCars*coef[17]
test1$WoodDeckSF<-test1$WoodDeckSF*coef[18]
test1$ScreenPorch<-test1$ScreenPorch*coef[19]
test1$PoolArea<-test1$PoolArea*coef[20]
# Sum the scaled columns and the intercept to get the predicted price
test1["SalePrice"]<-test1$MSSubClass + test1$OverallQual + test1$OverallCond + 
    test1$YearBuilt + test1$YearRemodAdd + test1$BsmtFinSF1 + test1$BsmtFinSF2 +
    test1$BsmtUnfSF + test1$X1stFlrSF + test1$X2ndFlrSF + test1$BsmtFullBath +
    test1$FullBath + test1$KitchenAbvGr + test1$TotRmsAbvGrd + test1$Fireplaces +
    test1$GarageCars + test1$WoodDeckSF + test1$ScreenPorch + test1$PoolArea +
    test1$SalePrice
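
The column-by-column arithmetic above is the linear predictor written out by hand: intercept plus each coefficient times its variable. As a cross-check, predict() computes the same quantity directly from a fitted model. The sketch below shows the equivalence on mtcars as a stand-in; the rows in newdata are invented.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)                      # stand-in for finalmodel
newdata <- data.frame(wt = c(2.5, 3.2), hp = c(110, 150))    # hypothetical new rows

# Manual calculation: intercept + coefficient * variable, term by term
b <- coef(fit)
manual <- b[["(Intercept)"]] + b[["wt"]] * newdata$wt + b[["hp"]] * newdata$hp

# Same calculation done by predict()
auto <- predict(fit, newdata = newdata)
all.equal(manual, unname(auto))
```

If the two ever disagree, the manual arithmetic is the first place to look, for example a product assigned to the wrong column.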

Write the predicted SalePrice and Id to a csv file.

predict<- data.frame(test1$Id, test1$SalePrice)
head(predict)
##   test1.Id test1.SalePrice
## 1     1461        94482.97
## 2     1462       119081.49
## 3     1463       155429.19
## 4     1464       172368.75
## 5     1465       169458.25
## 6     1466       166530.99
#write.csv(predict, file = "C:/Users/czhu5/OneDrive/Desktop/605/predict.csv")
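
The competition expects a submission file with the header Id,SalePrice. A minimal sketch of writing such a file follows; the prices are invented and the path is a temporary placeholder, not the path used for the actual submission. Naming the columns explicitly avoids headers such as test1.Id, and row.names = FALSE keeps write.csv() from adding an extra column of row numbers.

```r
# Hypothetical predictions standing in for test1$Id and test1$SalePrice
submission <- data.frame(Id = 1461:1463,
                         SalePrice = c(120000, 135500, 180250))

out_file <- tempfile(fileext = ".csv")    # placeholder path
write.csv(submission, out_file, row.names = FALSE)
readLines(out_file, n = 2)                # header line and first data row
```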

My Score: 0.30252

Kaggle Score https://www.kaggle.com/c/house-prices-advanced-regression-techniques/leaderboard
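
For context on the score: this competition evaluates submissions by the root mean squared error between the logarithm of the predicted price and the logarithm of the observed price, so lower is better. The sketch below computes that metric on invented numbers; the function name rmsle and the example prices are hypothetical.

```r
# Root mean squared error on the log scale
rmsle <- function(actual, predicted) {
  sqrt(mean((log(predicted) - log(actual))^2))
}

actual <- c(200000, 150000, 310000)       # hypothetical observed prices
predicted <- c(180000, 165000, 300000)    # hypothetical predicted prices
rmsle(actual, predicted)
```

Because the metric takes logarithms, a linear model that produces a zero or negative price for some house would yield an undefined score for that row, which is one reason to inspect the range of the predictions before submitting.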