House Prices: Advanced Regression Techniques
https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Predict sales prices and practice feature engineering
Competition Description:
Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition’s dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
Goal: It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
First, I read the data into R and use the str function to get an overview of the training data: the number of observations, the variable names, and their data types.
train<-read.csv("https://raw.githubusercontent.com/czhu505/Data605-/master/train.csv",stringsAsFactors = F)
str(train)
## 'data.frame': 1460 obs. of 81 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ MSZoning : chr "RL" "RL" "RL" "RL" ...
## $ LotFrontage : int 65 80 68 60 84 85 75 NA 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr NA NA NA NA ...
## $ LotShape : chr "Reg" "Reg" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "FR2" "Inside" "Corner" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
## $ Condition1 : chr "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "2Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ RoofStyle : chr "Gable" "Gable" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
## $ Exterior2nd : chr "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
## $ MasVnrType : chr "BrkFace" "None" "BrkFace" "None" ...
## $ MasVnrArea : int 196 0 162 0 350 0 186 240 0 0 ...
## $ ExterQual : chr "Gd" "TA" "Gd" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "PConc" "CBlock" "PConc" "BrkTil" ...
## $ BsmtQual : chr "Gd" "Gd" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "Gd" ...
## $ BsmtExposure : chr "No" "Gd" "Mn" "No" ...
## $ BsmtFinType1 : chr "GLQ" "ALQ" "GLQ" "ALQ" ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinType2 : chr "Unf" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "Ex" "Ex" "Ex" "Gd" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ KitchenQual : chr "Gd" "TA" "Gd" "Gd" ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ FireplaceQu : chr NA "TA" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Detchd" ...
## $ GarageYrBlt : int 2003 1976 2001 1998 2000 1993 2004 1973 1931 1939 ...
## $ GarageFinish : chr "RFn" "RFn" "RFn" "Unf" ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch: int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr NA NA NA NA ...
## $ Fence : chr NA NA NA NA ...
## $ MiscFeature : chr NA NA NA NA ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition: chr "Normal" "Normal" "Normal" "Abnorml" ...
## $ SalePrice : int 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
Since the goal is to predict sale prices, SalePrice is the dependent variable. I keep only the numeric variables for the analysis.
library(purrr)
## Warning: package 'purrr' was built under R version 3.3.3
traindata<-train%>%keep(is.numeric)
summary(traindata)
## Id MSSubClass LotFrontage LotArea
## Min. : 1.0 Min. : 20.0 Min. : 21.00 Min. : 1300
## 1st Qu.: 365.8 1st Qu.: 20.0 1st Qu.: 59.00 1st Qu.: 7554
## Median : 730.5 Median : 50.0 Median : 69.00 Median : 9478
## Mean : 730.5 Mean : 56.9 Mean : 70.05 Mean : 10517
## 3rd Qu.:1095.2 3rd Qu.: 70.0 3rd Qu.: 80.00 3rd Qu.: 11602
## Max. :1460.0 Max. :190.0 Max. :313.00 Max. :215245
## NA's :259
## OverallQual OverallCond YearBuilt YearRemodAdd
## Min. : 1.000 Min. :1.000 Min. :1872 Min. :1950
## 1st Qu.: 5.000 1st Qu.:5.000 1st Qu.:1954 1st Qu.:1967
## Median : 6.000 Median :5.000 Median :1973 Median :1994
## Mean : 6.099 Mean :5.575 Mean :1971 Mean :1985
## 3rd Qu.: 7.000 3rd Qu.:6.000 3rd Qu.:2000 3rd Qu.:2004
## Max. :10.000 Max. :9.000 Max. :2010 Max. :2010
##
## MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF
## Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.0 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 223.0
## Median : 0.0 Median : 383.5 Median : 0.00 Median : 477.5
## Mean : 103.7 Mean : 443.6 Mean : 46.55 Mean : 567.2
## 3rd Qu.: 166.0 3rd Qu.: 712.2 3rd Qu.: 0.00 3rd Qu.: 808.0
## Max. :1600.0 Max. :5644.0 Max. :1474.00 Max. :2336.0
## NA's :8
## TotalBsmtSF X1stFlrSF X2ndFlrSF LowQualFinSF
## Min. : 0.0 Min. : 334 Min. : 0 Min. : 0.000
## 1st Qu.: 795.8 1st Qu.: 882 1st Qu.: 0 1st Qu.: 0.000
## Median : 991.5 Median :1087 Median : 0 Median : 0.000
## Mean :1057.4 Mean :1163 Mean : 347 Mean : 5.845
## 3rd Qu.:1298.2 3rd Qu.:1391 3rd Qu.: 728 3rd Qu.: 0.000
## Max. :6110.0 Max. :4692 Max. :2065 Max. :572.000
##
## GrLivArea BsmtFullBath BsmtHalfBath FullBath
## Min. : 334 Min. :0.0000 Min. :0.00000 Min. :0.000
## 1st Qu.:1130 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.000
## Median :1464 Median :0.0000 Median :0.00000 Median :2.000
## Mean :1515 Mean :0.4253 Mean :0.05753 Mean :1.565
## 3rd Qu.:1777 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:2.000
## Max. :5642 Max. :3.0000 Max. :2.00000 Max. :3.000
##
## HalfBath BedroomAbvGr KitchenAbvGr TotRmsAbvGrd
## Min. :0.0000 Min. :0.000 Min. :0.000 Min. : 2.000
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.: 5.000
## Median :0.0000 Median :3.000 Median :1.000 Median : 6.000
## Mean :0.3829 Mean :2.866 Mean :1.047 Mean : 6.518
## 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:1.000 3rd Qu.: 7.000
## Max. :2.0000 Max. :8.000 Max. :3.000 Max. :14.000
##
## Fireplaces GarageYrBlt GarageCars GarageArea
## Min. :0.000 Min. :1900 Min. :0.000 Min. : 0.0
## 1st Qu.:0.000 1st Qu.:1961 1st Qu.:1.000 1st Qu.: 334.5
## Median :1.000 Median :1980 Median :2.000 Median : 480.0
## Mean :0.613 Mean :1979 Mean :1.767 Mean : 473.0
## 3rd Qu.:1.000 3rd Qu.:2002 3rd Qu.:2.000 3rd Qu.: 576.0
## Max. :3.000 Max. :2010 Max. :4.000 Max. :1418.0
## NA's :81
## WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 0.00 Median : 25.00 Median : 0.00 Median : 0.00
## Mean : 94.24 Mean : 46.66 Mean : 21.95 Mean : 3.41
## 3rd Qu.:168.00 3rd Qu.: 68.00 3rd Qu.: 0.00 3rd Qu.: 0.00
## Max. :857.00 Max. :547.00 Max. :552.00 Max. :508.00
##
## ScreenPorch PoolArea MiscVal MoSold
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 1.000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 5.000
## Median : 0.00 Median : 0.000 Median : 0.00 Median : 6.000
## Mean : 15.06 Mean : 2.759 Mean : 43.49 Mean : 6.322
## 3rd Qu.: 0.00 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 8.000
## Max. :480.00 Max. :738.000 Max. :15500.00 Max. :12.000
##
## YrSold SalePrice
## Min. :2006 Min. : 34900
## 1st Qu.:2007 1st Qu.:129975
## Median :2008 Median :163000
## Mean :2008 Mean :180921
## 3rd Qu.:2009 3rd Qu.:214000
## Max. :2010 Max. :755000
##
correlation<-cor(traindata, use="pairwise")
correlation<-correlation[,ncol(correlation)]
sort(correlation)
## KitchenAbvGr EnclosedPorch MSSubClass OverallCond YrSold
## -0.13590737 -0.12857796 -0.08428414 -0.07785589 -0.02892259
## LowQualFinSF Id MiscVal BsmtHalfBath BsmtFinSF2
## -0.02560613 -0.02191672 -0.02118958 -0.01684415 -0.01137812
## X3SsnPorch MoSold PoolArea ScreenPorch BedroomAbvGr
## 0.04458367 0.04643225 0.09240355 0.11144657 0.16821315
## BsmtUnfSF BsmtFullBath LotArea HalfBath OpenPorchSF
## 0.21447911 0.22712223 0.26384335 0.28410768 0.31585623
## X2ndFlrSF WoodDeckSF LotFrontage BsmtFinSF1 Fireplaces
## 0.31933380 0.32441344 0.35179910 0.38641981 0.46692884
## MasVnrArea GarageYrBlt YearRemodAdd YearBuilt TotRmsAbvGrd
## 0.47749305 0.48636168 0.50710097 0.52289733 0.53372316
## FullBath X1stFlrSF TotalBsmtSF GarageArea GarageCars
## 0.56066376 0.60585218 0.61358055 0.62343144 0.64040920
## GrLivArea OverallQual SalePrice
## 0.70862448 0.79098160 1.00000000
The last column of the correlation matrix holds the correlation between SalePrice and every other variable. Sorting it shows which features have the strongest linear relationship with SalePrice. Here I consider candidates starting from the third-largest value, GrLivArea.
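The ranking step can be wrapped in a small function that also drops the target's correlation with itself (always 1). This is a minimal sketch; the helper name `top_correlates` is hypothetical, and it uses the same pairwise-complete correlations as the code above.

```r
# Hypothetical helper: rank columns by their correlation with a target column,
# excluding the target's correlation with itself.
top_correlates <- function(df, target, n = 5) {
  r <- cor(df, use = "pairwise.complete.obs")[, target]
  r <- r[names(r) != target]          # drop the trivial self-correlation of 1
  head(sort(r, decreasing = TRUE), n)
}

# Usage with the document's data (assumes traindata from above):
# top_correlates(traindata, "SalePrice")
```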
Pick one of the quantitative independent variables from the training data set (train.csv), and define that variable as X. Make sure this variable is skewed to the right!
The histograms below show the distribution of each candidate variable, so I can check which ones are skewed to the right.
par(mfrow=c(2,2))
hist(traindata$GrLivArea )
hist(traindata$GarageCars)
hist(traindata$GarageArea)
hist(traindata$TotalBsmtSF)
I select “TotalBsmtSF”, whose distribution is skewed to the right, as the independent variable X, and the dependent variable “SalePrice” as Y.
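To back the visual check with a number, here is a minimal sketch of the moment-based sample skewness (the g1 estimator: mean cubed deviation over the cubed population standard deviation). The helper name `sample_skewness` is hypothetical; positive values indicate a longer right tail.

```r
# Hypothetical helper: moment-based sample skewness (g1), ignoring NAs.
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^(3 / 2)
}

# Usage with the document's data (assumes traindata from above):
# sapply(traindata[c("GrLivArea", "GarageCars", "GarageArea", "TotalBsmtSF")], sample_skewness)
```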
Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the 1st quartile of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities. In addition, make a table of counts as shown below.
a. P(X>x | Y>y)
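This is the probability that X exceeds its first quartile given that Y exceeds its first quartile. By the definition of conditional probability, P(X>x | Y>y) = P(X>x, Y>y)/P(Y>y), which I estimate as the number of rows with X>x and Y>y divided by the number of rows with Y>y.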
library(data.table)
library(dplyr)
X<-traindata$TotalBsmtSF
Y<-traindata$SalePrice
t<-data.table(X,Y)
b<-t[ which(Y>quantile(Y,0.25)),]
a<-b[ which(X>quantile(X,0.25)),]
nrow(t)
## [1] 1460
nrow(b)
## [1] 1095
nrow(a)
## [1] 810
nrow(a)/nrow(b)
## [1] 0.739726
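One detail to watch: inside data.table's `[`, column names are resolved in the table's own scope, so in `b[which(X>quantile(X,0.25)),]` the call `quantile(X,0.25)` is computed on the rows of `b` only, not on the full TotalBsmtSF column. A minimal sketch that fixes both cut-offs from the full data first and then estimates the probabilities from counts (the helper name `quartile_probs` is hypothetical):

```r
# Hypothetical helper: quartile-split probabilities, with x and y computed once
# from the full vectors rather than from a filtered subset.
quartile_probs <- function(X, Y) {
  x <- quantile(X, 0.25)              # first quartile of all of X
  y <- quantile(Y, 0.25)              # first quartile of all of Y
  n    <- length(X)
  n_y  <- sum(Y > y)                  # rows with Y > y
  n_xy <- sum(X > x & Y > y)          # rows with X > x and Y > y
  c(cond       = n_xy / n_y,          # P(X > x | Y > y)
    joint      = n_xy / n,            # P(X > x, Y > y)
    cond_below = (n_y - n_xy) / n_y)  # P(X <= x | Y > y)
}

# Usage with the document's data (assumes traindata from above):
# quartile_probs(traindata$TotalBsmtSF, traindata$SalePrice)
```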
b. P(X>x, Y>y)
This is the joint probability that X exceeds its first quartile and Y exceeds its first quartile, estimated as the number of such rows divided by the total number of rows.
nrow(a)/nrow(t)
## [1] 0.5547945
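c. P(X<x | Y>y)
This is the probability that X is below its first quartile given that Y is above its first quartile. It is the complement of the probability in a., so P(X<x | Y>y) = 1 - P(X>x | Y>y) = 1 - 0.7397 ≈ 0.2603.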
|>1st quartile | 285 | 810 | 1095 |
|Total | 400 | 1095 | 1495 |
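The full table of counts can be produced directly with base R's table() and addmargins(). This is a minimal sketch that splits both variables at the first quartile of the full column; the helper name `quartile_table` is hypothetical.

```r
# Hypothetical helper: 2x2 table of counts split at each variable's first
# quartile, with row and column totals added as "Sum".
quartile_table <- function(X, Y) {
  x <- quantile(X, 0.25)
  y <- quantile(Y, 0.25)
  counts <- table(X_gt_x = X > x, Y_gt_y = Y > y)
  addmargins(counts)
}

# Usage with the document's data (assumes traindata from above):
# quartile_table(traindata$TotalBsmtSF, traindata$SalePrice)
```

The grand total in the "Sum" row and column always equals the number of observations, which makes it a quick consistency check on hand-built tables.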
Does splitting the training data in this fashion make them independent? Let A be the new variable counting those observations above the 1st quartile for X, and let B be the new variable counting those observations above the 1st quartile for Y. Does P(AB)=P(A)P(B)? Check mathematically, and then evaluate by running a Chi Square test for association.
A<-sum(X>quantile(X,0.25))
B<-sum(Y>quantile(Y,0.25))
PA<-A/length(X)
PB<-B/length(Y)
PA*PB
## [1] 0.5625
Compared with the earlier estimate P(X>x, Y>y) = 0.5548, the product P(X>x)P(Y>y) = 0.5625 is slightly larger, so P(AB) ≠ P(A)P(B) in this sample and the covariance of the two indicators is not zero. To judge whether the difference matters, I run a chi-square test of the hypothesis that “TotalBsmtSF” is independent of “SalePrice” at the .05 significance level.
chisq.test(X,Y)
## Warning in chisq.test(X, Y): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: X and Y
## X-squared = 509710, df = 476640, p-value < 2.2e-16
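As the p-value is smaller than the .05 significance level, I reject the null hypothesis that “TotalBsmtSF” is independent of “SalePrice”, which agrees with the earlier finding that P(X>x, Y>y) ≠ P(X>x)P(Y>y). Note the warning, though: given two numeric vectors, chisq.test cross-tabulates every distinct value (hence df = 476640), so most expected cell counts are tiny and the chi-square approximation is unreliable.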
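The question about A and B concerns the two quartile indicators, so a test on their 2x2 table matches it more directly than a test on every distinct value of X and Y. A minimal sketch, with a hypothetical helper name; chisq.test() applies Yates' continuity correction to 2x2 tables by default.

```r
# Hypothetical helper: chi-square test of association between the indicators
# A = (X > first quartile of X) and B = (Y > first quartile of Y).
quartile_chisq <- function(X, Y) {
  A <- X > quantile(X, 0.25)
  B <- Y > quantile(Y, 0.25)
  chisq.test(table(A, B))   # 2x2 table, so df = 1
}

# Usage with the document's data (assumes traindata from above):
# quartile_chisq(traindata$TotalBsmtSF, traindata$SalePrice)
```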
Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot of X and Y. Derive a correlation matrix for any THREE quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide a 92% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
options(scipen = 999)  # print axis labels without scientific notation
plot(Y ~ X, col = "blue", xlab = "TotalBsmtSF", ylab = "SalePrice")
The scatter plot shows a roughly linear, positive relationship between TotalBsmtSF and SalePrice. However, there are several outliers, and many points overlap.
Next, I pick three variables, “TotalBsmtSF”, “GrLivArea” and “SalePrice”, and compute their correlation matrix.
corrmatrix<-cor(subset(traindata, select = c("TotalBsmtSF", "GrLivArea", "SalePrice")))
corrmatrix
## TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF 1.0000000 0.4548682 0.6135806
## GrLivArea 0.4548682 1.0000000 0.7086245
## SalePrice 0.6135806 0.7086245 1.0000000
The correlation matrix shows the pairwise relationships among the three variables. None of the correlations is zero or close to zero, which suggests the variables are not independent of one another.
Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide a 92% confidence interval.
H0: true correlation of TotalBsmtSF and GrLivArea is equal to 0.
H1: true correlation of TotalBsmtSF and GrLivArea is not equal to 0.
t1<-subset(traindata, select = c("TotalBsmtSF", "GrLivArea", "SalePrice"))
cor.test(~ TotalBsmtSF+ GrLivArea, data = t1, conf.level = 0.92)
##
## Pearson's product-moment correlation
##
## data: TotalBsmtSF and GrLivArea
## t = 19.503, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
## 0.4177447 0.4904754
## sample estimates:
## cor
## 0.4548682
The p-value is far below 0.05 (and below the 0.08 level that matches a 92% confidence interval), and the 92% confidence interval (0.418, 0.490) excludes 0, so I reject H0: TotalBsmtSF and GrLivArea are correlated.
I use the same method to test TotalBsmtSF against SalePrice and GrLivArea against SalePrice.
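The numbers cor.test() reports can be reproduced from r and n alone: the test statistic is t = r·sqrt(n - 2)/sqrt(1 - r²) with n - 2 degrees of freedom, and the interval comes from the Fisher z-transform, atanh(r) ± z·(1/sqrt(n - 3)), mapped back with tanh. A minimal sketch (hypothetical helper name):

```r
# Hypothetical helper: t statistic and Fisher-z confidence interval for a
# Pearson correlation r estimated from n observations.
pearson_summary <- function(r, n, conf.level = 0.92) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)               # t with n - 2 df
  z <- atanh(r)                                           # Fisher z-transform
  half <- qnorm((1 + conf.level) / 2) / sqrt(n - 3)       # half-width on the z scale
  list(t = t_stat, ci = tanh(c(z - half, z + half)))      # back to the r scale
}

# pearson_summary(0.4548682, 1460) should be close to t = 19.503 and the
# 92 percent interval (0.4177, 0.4905) reported above.
```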
cor.test(~ TotalBsmtSF+ SalePrice, data = t1, conf.level = 0.92)
##
## Pearson's product-moment correlation
##
## data: TotalBsmtSF and SalePrice
## t = 29.671, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
## 0.5841762 0.6413763
## sample estimates:
## cor
## 0.6135806
cor.test(~ GrLivArea+ SalePrice, data = t1, conf.level = 0.92)
##
## Pearson's product-moment correlation
##
## data: GrLivArea and SalePrice
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 92 percent confidence interval:
## 0.6850407 0.7307245
## sample estimates:
## cor
## 0.7086245
Both tests also have p-values far below 0.05 and 92% confidence intervals that exclude 0, so I reject H0 in each case: TotalBsmtSF and SalePrice are correlated, and so are GrLivArea and SalePrice.
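On familywise error: with three tests at level α, the chance of at least one false rejection can reach about 3α by the union bound. Here every p-value is below 2.2e-16, so even a Bonferroni adjustment (multiplying each p-value by 3) leaves them far below 0.05. A minimal sketch that runs all pairwise tests and adjusts them with base R's p.adjust() (hypothetical helper name):

```r
# Hypothetical helper: pairwise Pearson tests on all columns of a data frame,
# with p-values adjusted for multiple comparisons.
pairwise_cor_tests <- function(df, method = "bonferroni") {
  pairs <- combn(names(df), 2)                     # one column per variable pair
  p <- apply(pairs, 2, function(v) cor.test(df[[v[1]]], df[[v[2]]])$p.value)
  data.frame(var1 = pairs[1, ], var2 = pairs[2, ],
             p = p, p_adjusted = p.adjust(p, method = method))
}

# Usage with the document's data (assumes t1 from above):
# pairwise_cor_tests(t1)
```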
Linear Algebra and Correlation. Invert your 3 x 3 correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.
The precision matrix P captures the direct association between two variables after conditioning on all the others: the partial correlation of variables i and j is -P_ij / sqrt(P_ii P_jj).
First, I check that the determinant is nonzero, which guarantees that the inverse exists. I then compute the inverse as the adjugate (the transposed matrix of cofactors) divided by the determinant.
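The partial-correlation formula can be applied to a whole matrix at once: base R's cov2cor() scales P to P_ij / sqrt(P_ii P_jj), so negating its result gives the partial correlations. A minimal sketch using solve() for the inverse (the helper name `partial_cor` is hypothetical):

```r
# Hypothetical helper: partial correlations from a correlation matrix R
# via its precision matrix P = solve(R).
partial_cor <- function(R) {
  P <- solve(R)
  pc <- -cov2cor(P)   # -P_ij / sqrt(P_ii * P_jj)
  diag(pc) <- 1       # by convention, each variable correlates 1 with itself
  pc
}

# Usage with the document's data (assumes corrmatrix from above):
# partial_cor(corrmatrix)
```

From the precision entries printed below, the TotalBsmtSF/GrLivArea partial correlation works out to roughly 0.036, much weaker than their raw correlation of 0.455.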
det(corrmatrix)
## [1] 0.3100169
library("matlib")
precisionMatrix<-function(A){
  # Inverse via the adjugate: inv(A) = t(matrix of cofactors) / det(A)
  n=nrow(A)
  m<-matrix(0,nrow=n,ncol=n)
  for (i in 1:n)
    for(j in 1:n)
      m[i,j]=cofactor(A,i,j)   # matlib::cofactor(A, i, j)
  return(t(m)/det(A))
}
precision<-precisionMatrix(corrmatrix)
precision
## [,1] [,2] [,3]
## [1,] 1.60588442 -0.06473842 -0.9394642
## [2,] -0.06473842 2.01124151 -1.3854927
## [3,] -0.93946422 -1.38549273 2.5582310
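The precision matrix is not a covariance matrix: its off-diagonal entries measure conditional (partial) association. All of them are nonzero, so each pair of variables remains associated after conditioning on the third, consistent with the Pearson tests above. The TotalBsmtSF/GrLivArea entry (-0.065) is much smaller in magnitude than the other two, so their direct association is weak once SalePrice is held fixed.
Next, I check that the precision matrix is the inverse of the correlation matrix: the product P·R should be the identity matrix I.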
round(precision%*%corrmatrix,5)
## TotalBsmtSF GrLivArea SalePrice
## [1,] 1 0 0
## [2,] 0 1 0
## [3,] 0 0 1
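The assignment asks for the product in both orders. A minimal sketch that checks P·R and R·P against the identity and compares the cofactor-based inverse with base R's solve() (the helper name `check_inverse` is hypothetical):

```r
# Hypothetical helper: verify that P is the inverse of R in both orders and
# that it agrees with solve(R), up to a numerical tolerance.
check_inverse <- function(R, P, tol = 1e-8) {
  I <- diag(nrow(R))
  c(P_R           = max(abs(P %*% R - I)) < tol,
    R_P           = max(abs(R %*% P - I)) < tol,
    matches_solve = max(abs(P - solve(R))) < tol)
}

# Usage with the document's data (assumes corrmatrix and precision from above):
# check_inverse(corrmatrix, precision)
```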
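I conduct an LDU decomposition of corrmatrix, A = LDU, where L is unit lower-triangular, D is diagonal, and U is unit upper-triangular. Because corrmatrix is symmetric, U is the transpose of L.
swap<-function(My_matrix,i,m){
  # Swap row i with the first row at or below i that has a nonzero entry in column i.
  # Row swaps are not recorded, so after pivoting the factors describe the row-permuted matrix.
  j=i
  temp=My_matrix
  while((i<=m) && (My_matrix[i,j]==0)) i=i+1   # check the bound first to avoid indexing past row m
  if(i<=m){
    My_matrix[j,]=My_matrix[i,]
    My_matrix[i,]=temp[j,]
  }
  return(My_matrix)
}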
U<-function(My_matrix) {
  # Row-reduce to an upper-triangular matrix with 1s on the diagonal (the U in A = LDU)
  M=nrow(My_matrix)
  for(i in 1:M){
    if(My_matrix[i,i]==0) My_matrix<-swap(My_matrix,i,M)  # swap() returns a new matrix, so reassign
    if(My_matrix[i,i]==0) next                            # no nonzero pivot in this column
    My_matrix[i,]<-My_matrix[i,]/My_matrix[i,i]           # scale the pivot row so the pivot is 1
    k=i+1
    while(k<=M){
      My_matrix[k,]<-My_matrix[k,]-My_matrix[k,i]*My_matrix[i,]  # eliminate below the pivot
      k=k+1
    }
  }
  return(round(My_matrix,2))
}
D<-function(My_matrix) {
  # Eliminate below, then above, the diagonal, leaving only the pivots (the D in A = LDU)
  M=nrow(My_matrix)
  for(i in 1:M){
    if(My_matrix[i,i]==0) My_matrix<-swap(My_matrix,i,M)  # swap() returns a new matrix, so reassign
    j=i+1
    while(j<=M){
      if(My_matrix[i,i]!=0) My_matrix[j,] <-My_matrix[j,] -
        My_matrix[j,i]/My_matrix[i,i]*My_matrix[i,]
      j=j+1
    }
  }
  for(a in M:2){
    for (b in (a-1):1){
      if(My_matrix[a,a]!=0) My_matrix[b,] <-My_matrix[b,] -
        My_matrix[b,a]/My_matrix[a,a]*My_matrix[a,]
    }
  }
  return(round(My_matrix,2))
}
L<-function(My_matrix) {
  # Unit lower-triangular factor: entry (j,i) stores the multiplier used to eliminate My_matrix[j,i]
  M=nrow(My_matrix)
  temp=matrix(0,nrow=M,ncol=M)
  for(i in 1:M){
    if(My_matrix[i,i]==0) My_matrix<-swap(My_matrix,i,M)  # swap() returns a new matrix, so reassign
    temp[i,i]<-1
    j=i+1
    while(j<=M){
      if(My_matrix[i,i]!=0) {
        temp[j,i]<-My_matrix[j,i]/My_matrix[i,i]
        My_matrix[j,] <-My_matrix[j,] -
          My_matrix[j,i]/My_matrix[i,i]*My_matrix[i,]
      }
      j=j+1
    }
  }
  return(round(temp,2))
}
L(corrmatrix)
## [,1] [,2] [,3]
## [1,] 1.00 0.00 0
## [2,] 0.45 1.00 0
## [3,] 0.61 0.54 1
D(corrmatrix)
## TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF 1 0.00 0.00
## GrLivArea 0 0.79 0.00
## SalePrice 0 0.00 0.39
U(corrmatrix)
## TotalBsmtSF GrLivArea SalePrice
## TotalBsmtSF 1 0.45 0.61
## GrLivArea 0 1.00 0.54
## SalePrice 0 0.00 1.00
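As a cross-check on the hand-written L, D and U functions: for a symmetric positive-definite matrix, the LDU factors can also be read off base R's Cholesky factorization. chol() returns an upper-triangular C with R = t(C) %*% C; dividing each row of C by its diagonal entry gives U, D holds the squared diagonal, and L = t(U). A minimal sketch (the helper name `ldu_via_chol` is hypothetical):

```r
# Hypothetical helper: LDU factors of a symmetric positive-definite matrix via chol().
ldu_via_chol <- function(R) {
  C <- chol(R)                        # upper triangular, R = t(C) %*% C
  d <- diag(C)
  U <- C / d                          # recycling divides row i by d[i]: unit diagonal
  D <- diag(d^2, nrow = length(d))    # squared Cholesky diagonal = LDU pivots
  list(L = t(U), D = D, U = U)
}

# Usage with the document's data (assumes corrmatrix from above):
# ldu_via_chol(corrmatrix)
```

Up to rounding, this should reproduce the L, D and U printed above, and prod(diag(D)) equals the determinant (0.31).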
Calculus-Based Probability & Statistics. Many times, it makes sense to fit a closed form distribution to data. For the first variable that you selected which is skewed to the right, shift it so that the minimum value is above zero as necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of λ for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, λ)). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.
Back to the variable picked earlier as X, “TotalBsmtSF”: it is right-skewed and non-negative (its minimum is 0), which lies within the support of the exponential distribution, so no shift is needed for fitdistr. Below, I load the MASS package and run fitdistr to fit an exponential pdf, which estimates λ by maximum likelihood, and then compute the 5th and 95th percentiles.
library(MASS)
## Warning: package 'MASS' was built under R version 3.3.3
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
library(survival)
## Warning: package 'survival' was built under R version 3.3.3
library(fitdistrplus)
## Warning: package 'fitdistrplus' was built under R version 3.3.3
ft<-fitdistr(traindata$TotalBsmtSF, densfun="exponential")
ft$estimate
## rate
## 0.0009456896
lambda <- unname(ft$estimate)  # MLE of the rate from fitdistr, equal to 1/mean(TotalBsmtSF)
set.seed(123)
s<-rexp(1000, lambda)
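par(mfrow=c(1,2))  # simulated sample next to the original variable
hist(s, breaks=25, main="Simulated exponential sample")
hist(traindata$TotalBsmtSF, breaks=25, main="TotalBsmtSF")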
log(.05)/lambda * -1   # 95th percentile: solves 1 - exp(-lambda*q) = 0.95
## [1] 3167.776
log(.95)/lambda * -1   # 5th percentile: solves 1 - exp(-lambda*q) = 0.05
## [1] 54.23904
For comparison, the empirical 95th and 5th percentiles of TotalBsmtSF:
quantile(traindata$TotalBsmtSF,0.95)
## 95%
## 1753
quantile(traindata$TotalBsmtSF,0.05)
## 5%
## 519.3
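The fitted exponential distribution fits the data poorly: its 5th percentile (about 54) lies far below the empirical 5th percentile (519.3), and its 95th percentile (about 3168) lies far above the empirical 95th percentile (1753). The exponential model spreads the values much more widely than the training data, so a different distribution or a transformation of the data would be needed for a closer fit.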
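The remaining pieces of the assignment can be gathered in one place. This minimal sketch uses qexp(), which inverts the exponential CDF directly, so qexp(0.05, λ) equals -log(0.95)/λ as computed above. It reads "95% confidence interval assuming normality" as the normal-theory interval for the mean, mean ± 1.96·sd/sqrt(n); that is an interpretation, not the only possible one. The helper name `exp_vs_empirical` is hypothetical.

```r
# Hypothetical helper: exponential-model percentiles, a normal-theory 95% CI
# for the mean, and the empirical percentiles of a non-negative vector v.
exp_vs_empirical <- function(v) {
  lambda <- 1 / mean(v)                                   # MLE of the exponential rate
  list(
    exp_pct   = qexp(c(0.05, 0.95), rate = lambda),       # model 5th / 95th percentiles
    mean_ci95 = mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v)),
    emp_pct   = quantile(v, c(0.05, 0.95))                # empirical 5th / 95th percentiles
  )
}

# Usage with the document's data (assumes traindata from above):
# exp_vs_empirical(traindata$TotalBsmtSF)
```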
traindata<-data.frame(select_if(train, is.numeric))
Modeling. Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.
I flag SalePrice values whose t-score exceeds the 95% probability cut-off as outliers, drop those rows, and then drop all rows with missing values.
library(outliers)
traindata['t']<-scores(traindata$SalePrice, type="t", prob=0.95)
traindata<-traindata[!traindata$t,]
nrow(traindata<-na.omit(traindata))
## [1] 1023
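Put all numeric variables into lm(). The formula SalePrice ~ . also picks up Id and the helper column t. Because t is FALSE for every remaining row, and GrLivArea and TotalBsmtSF are exact sums of other columns (X1stFlrSF + X2ndFlrSF + LowQualFinSF, and BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF), those three terms are aliased, and stepAIC drops them in its first steps without changing the AIC.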
rmodel<-lm(SalePrice~ . , data=traindata)
Next, I use stepAIC to select a model by the Akaike Information Criterion (AIC). With direction = "both", each step drops or adds the single variable that lowers the AIC the most, and the search stops when no such change lowers it further.
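For a linear model, the AIC that stepAIC compares comes from extractAIC(): n·log(RSS/n) + 2·edf, where edf counts the estimated (non-aliased) coefficients. It differs from stats::AIC() only by a constant for a fixed n, so only differences between candidate models matter. For the start model below, RSS ≈ 5.4498e11 with n = 1023 rows and 36 coefficients (35 non-aliased predictors plus the intercept) gives roughly 20627.7, in line with the reported start AIC. A minimal sketch (hypothetical helper name):

```r
# Hypothetical helper: the AIC used by extractAIC()/stepAIC for an lm fit.
aic_lm <- function(fit) {
  n   <- length(residuals(fit))
  edf <- n - df.residual(fit)        # aliased (NA) coefficients are not counted
  rss <- sum(residuals(fit)^2)
  n * log(rss / n) + 2 * edf
}

# Example: aic_lm(lm(mpg ~ wt + hp, data = mtcars)) matches extractAIC() on the same fit.
```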
library(MASS)
step <- stepAIC(rmodel, direction="both")
## Start: AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF +
## LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath +
## FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF +
## OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch +
## PoolArea + MiscVal + MoSold + YrSold + t
##
##
## Step: AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF +
## LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath +
## FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF +
## OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch +
## PoolArea + MiscVal + MoSold + YrSold
##
##
## Step: AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF +
## LowQualFinSF + BsmtFullBath + BsmtHalfBath + FullBath + HalfBath +
## BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces +
## GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF +
## EnclosedPorch + X3SsnPorch + ScreenPorch + PoolArea + MiscVal +
## MoSold + YrSold
##
##
## Step: AIC=20627.66
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF +
## BsmtFullBath + BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr +
## KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt +
## GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch +
## X3SsnPorch + ScreenPorch + PoolArea + MiscVal + MoSold +
## YrSold
##
## Df Sum of Sq RSS AIC
## - BsmtHalfBath 1 2.2071e+06 5.4498e+11 20626
## - MiscVal 1 6.7276e+06 5.4498e+11 20626
## - X3SsnPorch 1 8.0866e+06 5.4498e+11 20626
## - MasVnrArea 1 9.8221e+06 5.4499e+11 20626
## - LotFrontage 1 1.0112e+07 5.4499e+11 20626
## - MoSold 1 1.6713e+08 5.4514e+11 20626
## - BedroomAbvGr 1 1.8968e+08 5.4517e+11 20626
## - HalfBath 1 2.7581e+08 5.4525e+11 20626
## - LowQualFinSF 1 3.3293e+08 5.4531e+11 20626
## - LotArea 1 3.8867e+08 5.4536e+11 20626
## - GarageYrBlt 1 4.2682e+08 5.4540e+11 20627
## - Id 1 7.2155e+08 5.4570e+11 20627
## - OpenPorchSF 1 7.6264e+08 5.4574e+11 20627
## <none> 5.4498e+11 20628
## - YrSold 1 1.1136e+09 5.4609e+11 20628
## - EnclosedPorch 1 1.3720e+09 5.4635e+11 20628
## - GarageArea 1 1.4733e+09 5.4645e+11 20628
## - BsmtFinSF2 1 1.7679e+09 5.4674e+11 20629
## - BsmtUnfSF 1 1.8214e+09 5.4680e+11 20629
## - BsmtFinSF1 1 2.0599e+09 5.4704e+11 20630
## - WoodDeckSF 1 3.1320e+09 5.4811e+11 20632
## - TotRmsAbvGrd 1 3.2194e+09 5.4820e+11 20632
## - KitchenAbvGr 1 7.6518e+09 5.5263e+11 20640
## - YearRemodAdd 1 8.7108e+09 5.5369e+11 20642
## - FullBath 1 8.7931e+09 5.5377e+11 20642
## - ScreenPorch 1 9.7814e+09 5.5476e+11 20644
## - GarageCars 1 9.9027e+09 5.5488e+11 20644
## - X1stFlrSF 1 1.0189e+10 5.5517e+11 20645
## - BsmtFullBath 1 1.3390e+10 5.5837e+11 20651
## - PoolArea 1 1.3578e+10 5.5855e+11 20651
## - OverallCond 1 1.5313e+10 5.6029e+11 20654
## - Fireplaces 1 1.5468e+10 5.6044e+11 20654
## - X2ndFlrSF 1 1.6501e+10 5.6148e+11 20656
## - MSSubClass 1 1.7511e+10 5.6249e+11 20658
## - YearBuilt 1 2.2817e+10 5.6779e+11 20668
## - OverallQual 1 8.9026e+10 6.3400e+11 20781
##
## Step: AIC=20625.67
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF +
## BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea +
## WoodDeckSF + OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch +
## PoolArea + MiscVal + MoSold + YrSold
##
## Df Sum of Sq RSS AIC
## - MiscVal 1 6.6174e+06 5.4498e+11 20624
## - X3SsnPorch 1 8.5351e+06 5.4499e+11 20624
## - LotFrontage 1 1.0076e+07 5.4499e+11 20624
## - MasVnrArea 1 1.0112e+07 5.4499e+11 20624
## - MoSold 1 1.6892e+08 5.4515e+11 20624
## - BedroomAbvGr 1 1.8763e+08 5.4517e+11 20624
## - HalfBath 1 2.7534e+08 5.4525e+11 20624
## - LowQualFinSF 1 3.3423e+08 5.4531e+11 20624
## - LotArea 1 3.8783e+08 5.4537e+11 20624
## - GarageYrBlt 1 4.3122e+08 5.4541e+11 20625
## - Id 1 7.2312e+08 5.4570e+11 20625
## - OpenPorchSF 1 7.6120e+08 5.4574e+11 20625
## <none> 5.4498e+11 20626
## - YrSold 1 1.1179e+09 5.4610e+11 20626
## - EnclosedPorch 1 1.3739e+09 5.4635e+11 20626
## - GarageArea 1 1.4741e+09 5.4645e+11 20626
## - BsmtFinSF2 1 1.7952e+09 5.4677e+11 20627
## - BsmtUnfSF 1 1.8192e+09 5.4680e+11 20627
## - BsmtFinSF1 1 2.1154e+09 5.4709e+11 20628
## + BsmtHalfBath 1 2.2071e+06 5.4498e+11 20628
## - WoodDeckSF 1 3.1611e+09 5.4814e+11 20630
## - TotRmsAbvGrd 1 3.2199e+09 5.4820e+11 20630
## - KitchenAbvGr 1 7.6496e+09 5.5263e+11 20638
## - YearRemodAdd 1 8.7543e+09 5.5373e+11 20640
## - FullBath 1 8.8690e+09 5.5385e+11 20640
## - ScreenPorch 1 9.7793e+09 5.5476e+11 20642
## - GarageCars 1 9.9301e+09 5.5491e+11 20642
## - X1stFlrSF 1 1.0187e+10 5.5517e+11 20643
## - PoolArea 1 1.3576e+10 5.5855e+11 20649
## - BsmtFullBath 1 1.4477e+10 5.5946e+11 20651
## - OverallCond 1 1.5412e+10 5.6039e+11 20652
## - Fireplaces 1 1.5514e+10 5.6049e+11 20652
## - X2ndFlrSF 1 1.6509e+10 5.6149e+11 20654
## - MSSubClass 1 1.7542e+10 5.6252e+11 20656
## - YearBuilt 1 2.2895e+10 5.6787e+11 20666
## - OverallQual 1 8.9025e+10 6.3400e+11 20779
##
## Step: AIC=20623.68
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF +
## BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea +
## WoodDeckSF + OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch +
## PoolArea + MoSold + YrSold
##
## Df Sum of Sq RSS AIC
## - X3SsnPorch 1 9.3457e+06 5.4499e+11 20622
## - LotFrontage 1 9.5398e+06 5.4499e+11 20622
## - MasVnrArea 1 1.0551e+07 5.4500e+11 20622
## - MoSold 1 1.7146e+08 5.4516e+11 20622
## - BedroomAbvGr 1 1.8772e+08 5.4517e+11 20622
## - HalfBath 1 2.6989e+08 5.4525e+11 20622
## - LowQualFinSF 1 3.3288e+08 5.4532e+11 20622
## - LotArea 1 3.8796e+08 5.4537e+11 20622
## - GarageYrBlt 1 4.2582e+08 5.4541e+11 20623
## - Id 1 7.2006e+08 5.4571e+11 20623
## - OpenPorchSF 1 7.7021e+08 5.4576e+11 20623
## <none> 5.4498e+11 20624
## - YrSold 1 1.1126e+09 5.4610e+11 20624
## - EnclosedPorch 1 1.3784e+09 5.4636e+11 20624
## - GarageArea 1 1.4773e+09 5.4646e+11 20624
## - BsmtFinSF2 1 1.7931e+09 5.4678e+11 20625
## - BsmtUnfSF 1 1.8221e+09 5.4681e+11 20625
## - BsmtFinSF1 1 2.1149e+09 5.4710e+11 20626
## + MiscVal 1 6.6174e+06 5.4498e+11 20626
## + BsmtHalfBath 1 2.0968e+06 5.4498e+11 20626
## - WoodDeckSF 1 3.1661e+09 5.4815e+11 20628
## - TotRmsAbvGrd 1 3.2499e+09 5.4823e+11 20628
## - KitchenAbvGr 1 7.6472e+09 5.5263e+11 20636
## - YearRemodAdd 1 8.7478e+09 5.5373e+11 20638
## - FullBath 1 8.8656e+09 5.5385e+11 20638
## - GarageCars 1 9.9245e+09 5.5491e+11 20640
## - ScreenPorch 1 1.0152e+10 5.5514e+11 20641
## - X1stFlrSF 1 1.0186e+10 5.5517e+11 20641
## - PoolArea 1 1.3758e+10 5.5874e+11 20647
## - BsmtFullBath 1 1.4533e+10 5.5952e+11 20649
## - OverallCond 1 1.5575e+10 5.6056e+11 20651
## - Fireplaces 1 1.5649e+10 5.6063e+11 20651
## - X2ndFlrSF 1 1.6520e+10 5.6151e+11 20652
## - MSSubClass 1 1.7610e+10 5.6259e+11 20654
## - YearBuilt 1 2.2890e+10 5.6787e+11 20664
## - OverallQual 1 8.9168e+10 6.3415e+11 20777
##
## Step: AIC=20621.7
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF +
## BsmtFullBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea +
## WoodDeckSF + OpenPorchSF + EnclosedPorch + ScreenPorch +
## PoolArea + MoSold + YrSold
##
## Df Sum of Sq RSS AIC
## - LotFrontage 1 1.0575e+07 5.4500e+11 20620
## - MasVnrArea 1 1.0578e+07 5.4500e+11 20620
## - MoSold 1 1.7239e+08 5.4517e+11 20620
## - BedroomAbvGr 1 1.8916e+08 5.4518e+11 20620
## - HalfBath 1 2.7550e+08 5.4527e+11 20620
## - LowQualFinSF 1 3.3401e+08 5.4533e+11 20620
## - LotArea 1 3.8731e+08 5.4538e+11 20620
## - GarageYrBlt 1 4.2591e+08 5.4542e+11 20621
## - Id 1 7.3991e+08 5.4573e+11 20621
## - OpenPorchSF 1 7.6592e+08 5.4576e+11 20621
## <none> 5.4499e+11 20622
## - YrSold 1 1.1077e+09 5.4610e+11 20622
## - EnclosedPorch 1 1.3718e+09 5.4637e+11 20622
## - GarageArea 1 1.4747e+09 5.4647e+11 20623
## - BsmtFinSF2 1 1.7887e+09 5.4678e+11 20623
## - BsmtUnfSF 1 1.8212e+09 5.4682e+11 20623
## - BsmtFinSF1 1 2.1162e+09 5.4711e+11 20624
## + X3SsnPorch 1 9.3457e+06 5.4498e+11 20624
## + MiscVal 1 7.4280e+06 5.4499e+11 20624
## + BsmtHalfBath 1 2.5497e+06 5.4499e+11 20624
## - WoodDeckSF 1 3.1575e+09 5.4815e+11 20626
## - TotRmsAbvGrd 1 3.2420e+09 5.4824e+11 20626
## - KitchenAbvGr 1 7.6628e+09 5.5266e+11 20634
## - YearRemodAdd 1 8.7461e+09 5.5374e+11 20636
## - FullBath 1 8.9324e+09 5.5393e+11 20636
## - GarageCars 1 9.9353e+09 5.5493e+11 20638
## - ScreenPorch 1 1.0144e+10 5.5514e+11 20639
## - X1stFlrSF 1 1.0205e+10 5.5520e+11 20639
## - PoolArea 1 1.3761e+10 5.5875e+11 20645
## - BsmtFullBath 1 1.4525e+10 5.5952e+11 20647
## - OverallCond 1 1.5581e+10 5.6058e+11 20649
## - Fireplaces 1 1.5641e+10 5.6064e+11 20649
## - X2ndFlrSF 1 1.6516e+10 5.6151e+11 20650
## - MSSubClass 1 1.7626e+10 5.6262e+11 20652
## - YearBuilt 1 2.2881e+10 5.6788e+11 20662
## - OverallQual 1 8.9168e+10 6.3416e+11 20775
##
## Step: AIC=20619.72
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 +
## BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath +
## FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF +
## OpenPorchSF + EnclosedPorch + ScreenPorch + PoolArea + MoSold +
## YrSold
##
## Df Sum of Sq RSS AIC
## - MasVnrArea 1 1.0208e+07 5.4502e+11 20618
## - MoSold 1 1.7269e+08 5.4518e+11 20618
## - BedroomAbvGr 1 1.8372e+08 5.4519e+11 20618
## - HalfBath 1 2.7484e+08 5.4528e+11 20618
## - LowQualFinSF 1 3.3404e+08 5.4534e+11 20618
## - GarageYrBlt 1 4.4020e+08 5.4545e+11 20619
## - LotArea 1 5.2345e+08 5.4553e+11 20619
## - Id 1 7.3902e+08 5.4574e+11 20619
## - OpenPorchSF 1 7.6702e+08 5.4577e+11 20619
## <none> 5.4500e+11 20620
## - YrSold 1 1.0987e+09 5.4610e+11 20620
## - EnclosedPorch 1 1.3845e+09 5.4639e+11 20620
## - GarageArea 1 1.5143e+09 5.4652e+11 20621
## - BsmtFinSF2 1 1.7838e+09 5.4679e+11 20621
## - BsmtUnfSF 1 1.8110e+09 5.4682e+11 20621
## - BsmtFinSF1 1 2.1154e+09 5.4712e+11 20622
## + LotFrontage 1 1.0575e+07 5.4499e+11 20622
## + X3SsnPorch 1 1.0380e+07 5.4499e+11 20622
## + MiscVal 1 6.8764e+06 5.4500e+11 20622
## + BsmtHalfBath 1 2.5412e+06 5.4500e+11 20622
## - WoodDeckSF 1 3.1491e+09 5.4815e+11 20624
## - TotRmsAbvGrd 1 3.2669e+09 5.4827e+11 20624
## - KitchenAbvGr 1 7.6526e+09 5.5266e+11 20632
## - YearRemodAdd 1 8.7368e+09 5.5374e+11 20634
## - FullBath 1 8.9276e+09 5.5393e+11 20634
## - GarageCars 1 9.9388e+09 5.5494e+11 20636
## - ScreenPorch 1 1.0145e+10 5.5515e+11 20637
## - X1stFlrSF 1 1.0392e+10 5.5540e+11 20637
## - PoolArea 1 1.3817e+10 5.5882e+11 20643
## - BsmtFullBath 1 1.4515e+10 5.5952e+11 20645
## - OverallCond 1 1.5631e+10 5.6064e+11 20647
## - Fireplaces 1 1.5651e+10 5.6066e+11 20647
## - X2ndFlrSF 1 1.6580e+10 5.6158e+11 20648
## - MSSubClass 1 1.9362e+10 5.6437e+11 20653
## - YearBuilt 1 2.3234e+10 5.6824e+11 20660
## - OverallQual 1 8.9175e+10 6.3418e+11 20773
##
## Step: AIC=20617.74
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath +
## HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces +
## GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF +
## EnclosedPorch + ScreenPorch + PoolArea + MoSold + YrSold
##
## Df Sum of Sq RSS AIC
## - MoSold 1 1.7086e+08 5.4519e+11 20616
## - BedroomAbvGr 1 1.8471e+08 5.4520e+11 20616
## - HalfBath 1 2.7281e+08 5.4529e+11 20616
## - LowQualFinSF 1 3.3803e+08 5.4535e+11 20616
## - GarageYrBlt 1 4.3578e+08 5.4545e+11 20617
## - LotArea 1 5.3867e+08 5.4555e+11 20617
## - Id 1 7.3075e+08 5.4575e+11 20617
## - OpenPorchSF 1 7.7773e+08 5.4579e+11 20617
## <none> 5.4502e+11 20618
## - YrSold 1 1.1001e+09 5.4612e+11 20618
## - EnclosedPorch 1 1.3914e+09 5.4641e+11 20618
## - GarageArea 1 1.5050e+09 5.4652e+11 20619
## - BsmtFinSF2 1 1.7822e+09 5.4680e+11 20619
## - BsmtUnfSF 1 1.8019e+09 5.4682e+11 20619
## - BsmtFinSF1 1 2.1121e+09 5.4713e+11 20620
## + X3SsnPorch 1 1.0389e+07 5.4500e+11 20620
## + MasVnrArea 1 1.0208e+07 5.4500e+11 20620
## + LotFrontage 1 1.0205e+07 5.4500e+11 20620
## + MiscVal 1 7.3263e+06 5.4501e+11 20620
## + BsmtHalfBath 1 2.8493e+06 5.4501e+11 20620
## - WoodDeckSF 1 3.1561e+09 5.4817e+11 20622
## - TotRmsAbvGrd 1 3.2596e+09 5.4827e+11 20622
## - KitchenAbvGr 1 7.6454e+09 5.5266e+11 20630
## - YearRemodAdd 1 8.8359e+09 5.5385e+11 20632
## - FullBath 1 8.9929e+09 5.5401e+11 20633
## - GarageCars 1 9.9344e+09 5.5495e+11 20634
## - ScreenPorch 1 1.0137e+10 5.5515e+11 20635
## - X1stFlrSF 1 1.0401e+10 5.5542e+11 20635
## - PoolArea 1 1.3809e+10 5.5882e+11 20641
## - BsmtFullBath 1 1.4647e+10 5.5966e+11 20643
## - OverallCond 1 1.5624e+10 5.6064e+11 20645
## - Fireplaces 1 1.5653e+10 5.6067e+11 20645
## - X2ndFlrSF 1 1.6600e+10 5.6162e+11 20646
## - MSSubClass 1 1.9537e+10 5.6455e+11 20652
## - YearBuilt 1 2.3462e+10 5.6848e+11 20659
## - OverallQual 1 8.9238e+10 6.3425e+11 20771
##
## Step: AIC=20616.06
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath +
## HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces +
## GarageYrBlt + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF +
## EnclosedPorch + ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - BedroomAbvGr 1 1.7700e+08 5.4536e+11 20614
## - HalfBath 1 2.5897e+08 5.4544e+11 20615
## - LowQualFinSF 1 3.2825e+08 5.4551e+11 20615
## - GarageYrBlt 1 4.4697e+08 5.4563e+11 20615
## - LotArea 1 5.2866e+08 5.4571e+11 20615
## - Id 1 7.2191e+08 5.4591e+11 20615
## - OpenPorchSF 1 8.3698e+08 5.4602e+11 20616
## <none> 5.4519e+11 20616
## - YrSold 1 1.2573e+09 5.4644e+11 20616
## - EnclosedPorch 1 1.3278e+09 5.4651e+11 20617
## - GarageArea 1 1.4997e+09 5.4669e+11 20617
## - BsmtFinSF2 1 1.7412e+09 5.4693e+11 20617
## - BsmtUnfSF 1 1.7486e+09 5.4693e+11 20617
## + MoSold 1 1.7086e+08 5.4502e+11 20618
## - BsmtFinSF1 1 2.0677e+09 5.4725e+11 20618
## + X3SsnPorch 1 1.1372e+07 5.4517e+11 20618
## + LotFrontage 1 1.0536e+07 5.4518e+11 20618
## + MiscVal 1 9.9699e+06 5.4518e+11 20618
## + MasVnrArea 1 8.3841e+06 5.4518e+11 20618
## + BsmtHalfBath 1 4.8184e+06 5.4518e+11 20618
## - TotRmsAbvGrd 1 3.2084e+09 5.4839e+11 20620
## - WoodDeckSF 1 3.2276e+09 5.4841e+11 20620
## - KitchenAbvGr 1 7.5495e+09 5.5274e+11 20628
## - YearRemodAdd 1 8.8480e+09 5.5403e+11 20631
## - FullBath 1 9.0547e+09 5.5424e+11 20631
## - GarageCars 1 9.9742e+09 5.5516e+11 20633
## - ScreenPorch 1 1.0139e+10 5.5533e+11 20633
## - X1stFlrSF 1 1.0445e+10 5.5563e+11 20634
## - PoolArea 1 1.4057e+10 5.5924e+11 20640
## - BsmtFullBath 1 1.4627e+10 5.5981e+11 20641
## - OverallCond 1 1.5535e+10 5.6072e+11 20643
## - Fireplaces 1 1.5721e+10 5.6091e+11 20643
## - X2ndFlrSF 1 1.6673e+10 5.6186e+11 20645
## - MSSubClass 1 1.9861e+10 5.6505e+11 20651
## - YearBuilt 1 2.3331e+10 5.6852e+11 20657
## - OverallQual 1 9.0670e+10 6.3586e+11 20771
##
## Step: AIC=20614.39
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath +
## HalfBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt +
## GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch +
## ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - HalfBath 1 2.6203e+08 5.4562e+11 20613
## - LowQualFinSF 1 3.1357e+08 5.4568e+11 20613
## - GarageYrBlt 1 4.2622e+08 5.4579e+11 20613
## - LotArea 1 5.1285e+08 5.4588e+11 20613
## - Id 1 7.2592e+08 5.4609e+11 20614
## - OpenPorchSF 1 8.8702e+08 5.4625e+11 20614
## <none> 5.4536e+11 20614
## - YrSold 1 1.2569e+09 5.4662e+11 20615
## - EnclosedPorch 1 1.2997e+09 5.4666e+11 20615
## - GarageArea 1 1.5173e+09 5.4688e+11 20615
## - BsmtFinSF2 1 1.6886e+09 5.4705e+11 20616
## - BsmtUnfSF 1 1.7219e+09 5.4708e+11 20616
## + BedroomAbvGr 1 1.7700e+08 5.4519e+11 20616
## + MoSold 1 1.6316e+08 5.4520e+11 20616
## - BsmtFinSF1 1 2.0882e+09 5.4745e+11 20616
## + X3SsnPorch 1 1.2529e+07 5.4535e+11 20616
## + MiscVal 1 1.0298e+07 5.4535e+11 20616
## + MasVnrArea 1 9.3029e+06 5.4535e+11 20616
## + LotFrontage 1 5.1703e+06 5.4536e+11 20616
## + BsmtHalfBath 1 1.3014e+06 5.4536e+11 20616
## - TotRmsAbvGrd 1 3.1753e+09 5.4854e+11 20618
## - WoodDeckSF 1 3.1903e+09 5.4855e+11 20618
## - KitchenAbvGr 1 7.5009e+09 5.5286e+11 20626
## - FullBath 1 8.8975e+09 5.5426e+11 20629
## - YearRemodAdd 1 9.7153e+09 5.5508e+11 20631
## - GarageCars 1 9.9867e+09 5.5535e+11 20631
## - ScreenPorch 1 1.0018e+10 5.5538e+11 20631
## - X1stFlrSF 1 1.0397e+10 5.5576e+11 20632
## - PoolArea 1 1.4091e+10 5.5945e+11 20639
## - BsmtFullBath 1 1.4605e+10 5.5997e+11 20639
## - OverallCond 1 1.5358e+10 5.6072e+11 20641
## - Fireplaces 1 1.6158e+10 5.6152e+11 20642
## - X2ndFlrSF 1 1.6507e+10 5.6187e+11 20643
## - MSSubClass 1 1.9758e+10 5.6512e+11 20649
## - YearBuilt 1 2.3170e+10 5.6853e+11 20655
## - OverallQual 1 9.3163e+10 6.3853e+11 20774
##
## Step: AIC=20612.88
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + FullBath +
## KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageYrBlt +
## GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch +
## ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - LowQualFinSF 1 3.1847e+08 5.4594e+11 20612
## - GarageYrBlt 1 4.5512e+08 5.4608e+11 20612
## - LotArea 1 5.0763e+08 5.4613e+11 20612
## - Id 1 7.2860e+08 5.4635e+11 20612
## - OpenPorchSF 1 9.1530e+08 5.4654e+11 20613
## <none> 5.4562e+11 20613
## - EnclosedPorch 1 1.2536e+09 5.4688e+11 20613
## - YrSold 1 1.2575e+09 5.4688e+11 20613
## - GarageArea 1 1.4623e+09 5.4709e+11 20614
## - BsmtUnfSF 1 1.7201e+09 5.4735e+11 20614
## - BsmtFinSF2 1 1.7790e+09 5.4740e+11 20614
## + HalfBath 1 2.6203e+08 5.4536e+11 20614
## + BedroomAbvGr 1 1.8007e+08 5.4544e+11 20615
## + MoSold 1 1.4951e+08 5.4548e+11 20615
## + X3SsnPorch 1 1.8657e+07 5.4561e+11 20615
## - BsmtFinSF1 1 2.1219e+09 5.4775e+11 20615
## + MasVnrArea 1 7.4802e+06 5.4562e+11 20615
## + LotFrontage 1 4.7014e+06 5.4562e+11 20615
## + MiscVal 1 3.0754e+06 5.4562e+11 20615
## + BsmtHalfBath 1 9.9622e+05 5.4562e+11 20615
## - WoodDeckSF 1 3.2003e+09 5.4883e+11 20617
## - TotRmsAbvGrd 1 3.2270e+09 5.4885e+11 20617
## - KitchenAbvGr 1 7.6253e+09 5.5325e+11 20625
## - FullBath 1 9.2673e+09 5.5489e+11 20628
## - YearRemodAdd 1 9.7939e+09 5.5542e+11 20629
## - ScreenPorch 1 1.0242e+10 5.5587e+11 20630
## - GarageCars 1 1.0326e+10 5.5595e+11 20630
## - X1stFlrSF 1 1.0456e+10 5.5608e+11 20630
## - PoolArea 1 1.4182e+10 5.5981e+11 20637
## - BsmtFullBath 1 1.4440e+10 5.6006e+11 20638
## - OverallCond 1 1.5273e+10 5.6090e+11 20639
## - Fireplaces 1 1.6748e+10 5.6237e+11 20642
## - MSSubClass 1 1.9679e+10 5.6530e+11 20647
## - X2ndFlrSF 1 2.3916e+10 5.6954e+11 20655
## - YearBuilt 1 2.7193e+10 5.7282e+11 20661
## - OverallQual 1 9.2902e+10 6.3853e+11 20772
##
## Step: AIC=20611.48
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageYrBlt + GarageCars + GarageArea +
## WoodDeckSF + OpenPorchSF + EnclosedPorch + ScreenPorch +
## PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - GarageYrBlt 1 3.9982e+08 5.4634e+11 20610
## - LotArea 1 5.0286e+08 5.4645e+11 20610
## - Id 1 7.7940e+08 5.4672e+11 20611
## - OpenPorchSF 1 9.4543e+08 5.4689e+11 20611
## <none> 5.4594e+11 20612
## - EnclosedPorch 1 1.2324e+09 5.4718e+11 20612
## - YrSold 1 1.2458e+09 5.4719e+11 20612
## - GarageArea 1 1.5017e+09 5.4745e+11 20612
## - BsmtUnfSF 1 1.6770e+09 5.4762e+11 20613
## - BsmtFinSF2 1 1.7646e+09 5.4771e+11 20613
## + LowQualFinSF 1 3.1847e+08 5.4562e+11 20613
## + GrLivArea 1 3.1847e+08 5.4562e+11 20613
## + HalfBath 1 2.6693e+08 5.4568e+11 20613
## + BedroomAbvGr 1 1.6517e+08 5.4578e+11 20613
## + MoSold 1 1.4069e+08 5.4580e+11 20613
## - BsmtFinSF1 1 2.0630e+09 5.4801e+11 20613
## + X3SsnPorch 1 2.0175e+07 5.4592e+11 20613
## + MasVnrArea 1 1.0823e+07 5.4593e+11 20614
## + LotFrontage 1 4.8440e+06 5.4594e+11 20614
## + MiscVal 1 2.1890e+06 5.4594e+11 20614
## + BsmtHalfBath 1 2.0437e+06 5.4594e+11 20614
## - WoodDeckSF 1 3.2001e+09 5.4914e+11 20616
## - TotRmsAbvGrd 1 3.5533e+09 5.4950e+11 20616
## - KitchenAbvGr 1 8.0622e+09 5.5401e+11 20625
## - FullBath 1 9.3865e+09 5.5533e+11 20627
## - YearRemodAdd 1 9.8565e+09 5.5580e+11 20628
## - ScreenPorch 1 1.0108e+10 5.5605e+11 20628
## - GarageCars 1 1.0246e+10 5.5619e+11 20629
## - X1stFlrSF 1 1.0317e+10 5.5626e+11 20629
## - PoolArea 1 1.3870e+10 5.5981e+11 20635
## - BsmtFullBath 1 1.4584e+10 5.6053e+11 20636
## - OverallCond 1 1.5045e+10 5.6099e+11 20637
## - Fireplaces 1 1.6780e+10 5.6272e+11 20640
## - MSSubClass 1 1.9488e+10 5.6543e+11 20645
## - X2ndFlrSF 1 2.3605e+10 5.6955e+11 20653
## - YearBuilt 1 2.7011e+10 5.7295e+11 20659
## - OverallQual 1 9.3201e+10 6.3914e+11 20771
##
## Step: AIC=20610.23
## SalePrice ~ Id + MSSubClass + LotArea + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF +
## OpenPorchSF + EnclosedPorch + ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - LotArea 1 5.7428e+08 5.4692e+11 20609
## - Id 1 8.3474e+08 5.4718e+11 20610
## - OpenPorchSF 1 9.1880e+08 5.4726e+11 20610
## <none> 5.4634e+11 20610
## - GarageArea 1 1.1673e+09 5.4751e+11 20610
## - EnclosedPorch 1 1.2550e+09 5.4760e+11 20611
## - YrSold 1 1.2769e+09 5.4762e+11 20611
## - BsmtUnfSF 1 1.7219e+09 5.4807e+11 20611
## + GarageYrBlt 1 3.9982e+08 5.4594e+11 20612
## - BsmtFinSF2 1 1.8164e+09 5.4816e+11 20612
## + HalfBath 1 2.9383e+08 5.4605e+11 20612
## + LowQualFinSF 1 2.6317e+08 5.4608e+11 20612
## + GrLivArea 1 2.6317e+08 5.4608e+11 20612
## + MoSold 1 1.5086e+08 5.4619e+11 20612
## + BedroomAbvGr 1 1.4706e+08 5.4620e+11 20612
## - BsmtFinSF1 1 2.1180e+09 5.4846e+11 20612
## + X3SsnPorch 1 2.1283e+07 5.4632e+11 20612
## + LotFrontage 1 1.5391e+07 5.4633e+11 20612
## + BsmtHalfBath 1 6.2942e+06 5.4634e+11 20612
## + MasVnrArea 1 5.9863e+06 5.4634e+11 20612
## + MiscVal 1 3.4500e+03 5.4634e+11 20612
## - WoodDeckSF 1 3.0039e+09 5.4935e+11 20614
## - TotRmsAbvGrd 1 3.4744e+09 5.4982e+11 20615
## - KitchenAbvGr 1 7.8334e+09 5.5418e+11 20623
## - FullBath 1 9.2370e+09 5.5558e+11 20625
## - YearRemodAdd 1 9.4831e+09 5.5583e+11 20626
## - ScreenPorch 1 1.0185e+10 5.5653e+11 20627
## - GarageCars 1 1.0327e+10 5.5667e+11 20627
## - X1stFlrSF 1 1.0607e+10 5.5695e+11 20628
## - PoolArea 1 1.3745e+10 5.6009e+11 20634
## - BsmtFullBath 1 1.4782e+10 5.6113e+11 20636
## - OverallCond 1 1.5825e+10 5.6217e+11 20637
## - Fireplaces 1 1.7532e+10 5.6388e+11 20641
## - MSSubClass 1 2.0007e+10 5.6635e+11 20645
## - X2ndFlrSF 1 2.4032e+10 5.7038e+11 20652
## - YearBuilt 1 3.1459e+10 5.7780e+11 20666
## - OverallQual 1 9.3129e+10 6.3947e+11 20769
##
## Step: AIC=20609.3
## SalePrice ~ Id + MSSubClass + OverallQual + OverallCond + YearBuilt +
## YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
## X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF +
## EnclosedPorch + ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - Id 1 8.7315e+08 5.4779e+11 20609
## - OpenPorchSF 1 9.4972e+08 5.4787e+11 20609
## <none> 5.4692e+11 20609
## - EnclosedPorch 1 1.1752e+09 5.4809e+11 20610
## - GarageArea 1 1.3136e+09 5.4823e+11 20610
## - YrSold 1 1.3307e+09 5.4825e+11 20610
## + LotArea 1 5.7428e+08 5.4634e+11 20610
## + GarageYrBlt 1 4.7125e+08 5.4645e+11 20610
## - BsmtUnfSF 1 1.7495e+09 5.4867e+11 20611
## + HalfBath 1 2.9037e+08 5.4663e+11 20611
## - BsmtFinSF2 1 1.8635e+09 5.4878e+11 20611
## + LowQualFinSF 1 2.5407e+08 5.4666e+11 20611
## + GrLivArea 1 2.5407e+08 5.4666e+11 20611
## + LotFrontage 1 1.7630e+08 5.4674e+11 20611
## + MoSold 1 1.4250e+08 5.4678e+11 20611
## + BedroomAbvGr 1 1.3046e+08 5.4679e+11 20611
## + X3SsnPorch 1 2.5012e+07 5.4689e+11 20611
## + MasVnrArea 1 1.8386e+07 5.4690e+11 20611
## + BsmtHalfBath 1 5.1087e+06 5.4691e+11 20611
## + MiscVal 1 1.9594e+05 5.4692e+11 20611
## - BsmtFinSF1 1 2.2527e+09 5.4917e+11 20612
## - WoodDeckSF 1 3.0724e+09 5.4999e+11 20613
## - TotRmsAbvGrd 1 3.4224e+09 5.5034e+11 20614
## - KitchenAbvGr 1 7.8106e+09 5.5473e+11 20622
## - FullBath 1 9.1584e+09 5.5608e+11 20624
## - YearRemodAdd 1 9.4527e+09 5.5637e+11 20625
## - GarageCars 1 1.0270e+10 5.5719e+11 20626
## - ScreenPorch 1 1.0297e+10 5.5721e+11 20626
## - X1stFlrSF 1 1.1804e+10 5.5872e+11 20629
## - PoolArea 1 1.3294e+10 5.6021e+11 20632
## - BsmtFullBath 1 1.4958e+10 5.6188e+11 20635
## - OverallCond 1 1.5622e+10 5.6254e+11 20636
## - Fireplaces 1 1.8628e+10 5.6555e+11 20642
## - MSSubClass 1 2.3824e+10 5.7074e+11 20651
## - X2ndFlrSF 1 2.6244e+10 5.7316e+11 20655
## - YearBuilt 1 3.1047e+10 5.7796e+11 20664
## - OverallQual 1 9.2561e+10 6.3948e+11 20767
##
## Step: AIC=20608.93
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
## YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
## X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageCars + GarageArea + WoodDeckSF + OpenPorchSF +
## EnclosedPorch + ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - OpenPorchSF 1 8.9172e+08 5.4868e+11 20609
## <none> 5.4779e+11 20609
## - EnclosedPorch 1 1.1448e+09 5.4894e+11 20609
## + Id 1 8.7315e+08 5.4692e+11 20609
## - YrSold 1 1.3168e+09 5.4911e+11 20609
## - GarageArea 1 1.4087e+09 5.4920e+11 20610
## + LotArea 1 6.1269e+08 5.4718e+11 20610
## + GarageYrBlt 1 5.3498e+08 5.4726e+11 20610
## - BsmtUnfSF 1 1.7463e+09 5.4954e+11 20610
## + LowQualFinSF 1 2.9847e+08 5.4749e+11 20610
## + GrLivArea 1 2.9847e+08 5.4749e+11 20610
## + HalfBath 1 2.9572e+08 5.4749e+11 20610
## - BsmtFinSF2 1 1.8825e+09 5.4967e+11 20610
## + LotFrontage 1 1.8473e+08 5.4761e+11 20611
## + MoSold 1 1.3321e+08 5.4766e+11 20611
## + BedroomAbvGr 1 1.3151e+08 5.4766e+11 20611
## + X3SsnPorch 1 5.7416e+07 5.4773e+11 20611
## + BsmtHalfBath 1 8.5136e+06 5.4778e+11 20611
## + MasVnrArea 1 5.6666e+06 5.4779e+11 20611
## + MiscVal 1 1.6268e+06 5.4779e+11 20611
## - BsmtFinSF1 1 2.3504e+09 5.5014e+11 20611
## - WoodDeckSF 1 3.1087e+09 5.5090e+11 20613
## - TotRmsAbvGrd 1 3.2398e+09 5.5103e+11 20613
## - KitchenAbvGr 1 7.6421e+09 5.5543e+11 20621
## - FullBath 1 9.0367e+09 5.5683e+11 20624
## - YearRemodAdd 1 9.5639e+09 5.5735e+11 20625
## - GarageCars 1 1.0001e+10 5.5779e+11 20625
## - ScreenPorch 1 1.0140e+10 5.5793e+11 20626
## - X1stFlrSF 1 1.1773e+10 5.5956e+11 20629
## - PoolArea 1 1.3617e+10 5.6141e+11 20632
## - BsmtFullBath 1 1.4561e+10 5.6235e+11 20634
## - OverallCond 1 1.5475e+10 5.6327e+11 20635
## - Fireplaces 1 1.8702e+10 5.6649e+11 20641
## - MSSubClass 1 2.4158e+10 5.7195e+11 20651
## - X2ndFlrSF 1 2.6451e+10 5.7424e+11 20655
## - YearBuilt 1 3.0866e+10 5.7866e+11 20663
## - OverallQual 1 9.4215e+10 6.4201e+11 20769
##
## Step: AIC=20608.6
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
## YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
## X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageCars + GarageArea + WoodDeckSF + EnclosedPorch +
## ScreenPorch + PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## - EnclosedPorch 1 9.5826e+08 5.4964e+11 20608
## <none> 5.4868e+11 20609
## + OpenPorchSF 1 8.9172e+08 5.4779e+11 20609
## + Id 1 8.1514e+08 5.4787e+11 20609
## - YrSold 1 1.4324e+09 5.5011e+11 20609
## + LotArea 1 6.4228e+08 5.4804e+11 20609
## - GarageArea 1 1.5888e+09 5.5027e+11 20610
## + GarageYrBlt 1 5.0451e+08 5.4818e+11 20610
## + LowQualFinSF 1 3.2670e+08 5.4836e+11 20610
## + GrLivArea 1 3.2670e+08 5.4836e+11 20610
## + HalfBath 1 3.2432e+08 5.4836e+11 20610
## + LotFrontage 1 1.9055e+08 5.4849e+11 20610
## + MoSold 1 1.8567e+08 5.4850e+11 20610
## + BedroomAbvGr 1 1.7490e+08 5.4851e+11 20610
## - BsmtUnfSF 1 2.0581e+09 5.5074e+11 20610
## + X3SsnPorch 1 4.5272e+07 5.4864e+11 20611
## + MasVnrArea 1 1.5796e+07 5.4867e+11 20611
## + BsmtHalfBath 1 4.5271e+06 5.4868e+11 20611
## + MiscVal 1 6.9648e+04 5.4868e+11 20611
## - BsmtFinSF2 1 2.1635e+09 5.5085e+11 20611
## - BsmtFinSF1 1 2.6500e+09 5.5133e+11 20612
## - WoodDeckSF 1 2.9029e+09 5.5159e+11 20612
## - TotRmsAbvGrd 1 3.0458e+09 5.5173e+11 20612
## - KitchenAbvGr 1 7.7107e+09 5.5639e+11 20621
## - FullBath 1 9.0846e+09 5.5777e+11 20623
## - GarageCars 1 9.6777e+09 5.5836e+11 20625
## - YearRemodAdd 1 1.0124e+10 5.5881e+11 20625
## - ScreenPorch 1 1.0487e+10 5.5917e+11 20626
## - X1stFlrSF 1 1.2176e+10 5.6086e+11 20629
## - PoolArea 1 1.3976e+10 5.6266e+11 20632
## - BsmtFullBath 1 1.4892e+10 5.6357e+11 20634
## - OverallCond 1 1.5160e+10 5.6384e+11 20635
## - Fireplaces 1 1.8865e+10 5.6755e+11 20641
## - MSSubClass 1 2.4349e+10 5.7303e+11 20651
## - X2ndFlrSF 1 2.9315e+10 5.7800e+11 20660
## - YearBuilt 1 3.1013e+10 5.7969e+11 20663
## - OverallQual 1 9.4452e+10 6.4313e+11 20769
##
## Step: AIC=20608.38
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
## YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
## X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch +
## PoolArea + YrSold
##
## Df Sum of Sq RSS AIC
## <none> 5.4964e+11 20608
## + EnclosedPorch 1 9.5826e+08 5.4868e+11 20609
## + Id 1 7.9402e+08 5.4885e+11 20609
## + OpenPorchSF 1 7.0523e+08 5.4894e+11 20609
## - YrSold 1 1.4495e+09 5.5109e+11 20609
## + LotArea 1 5.6133e+08 5.4908e+11 20609
## + GarageYrBlt 1 5.2360e+08 5.4912e+11 20609
## - GarageArea 1 1.6663e+09 5.5131e+11 20610
## + LowQualFinSF 1 3.0316e+08 5.4934e+11 20610
## + GrLivArea 1 3.0316e+08 5.4934e+11 20610
## + HalfBath 1 2.7642e+08 5.4936e+11 20610
## + LotFrontage 1 2.0859e+08 5.4943e+11 20610
## + BedroomAbvGr 1 1.4683e+08 5.4949e+11 20610
## + MoSold 1 1.2523e+08 5.4952e+11 20610
## - BsmtUnfSF 1 2.0301e+09 5.5167e+11 20610
## + X3SsnPorch 1 3.0487e+07 5.4961e+11 20610
## + MasVnrArea 1 2.0544e+07 5.4962e+11 20610
## + BsmtHalfBath 1 6.9368e+06 5.4963e+11 20610
## + MiscVal 1 4.9818e+05 5.4964e+11 20610
## - BsmtFinSF2 1 2.2388e+09 5.5188e+11 20611
## - BsmtFinSF1 1 2.5428e+09 5.5218e+11 20611
## - WoodDeckSF 1 2.7191e+09 5.5236e+11 20611
## - TotRmsAbvGrd 1 2.7904e+09 5.5243e+11 20612
## - KitchenAbvGr 1 7.9493e+09 5.5759e+11 20621
## - FullBath 1 9.0684e+09 5.5871e+11 20623
## - GarageCars 1 9.7097e+09 5.5935e+11 20624
## - ScreenPorch 1 9.8054e+09 5.5945e+11 20625
## - YearRemodAdd 1 1.0465e+10 5.6011e+11 20626
## - X1stFlrSF 1 1.2302e+10 5.6194e+11 20629
## - PoolArea 1 1.3435e+10 5.6308e+11 20631
## - OverallCond 1 1.4373e+10 5.6401e+11 20633
## - BsmtFullBath 1 1.5378e+10 5.6502e+11 20635
## - Fireplaces 1 1.8846e+10 5.6849e+11 20641
## - MSSubClass 1 2.4785e+10 5.7443e+11 20652
## - X2ndFlrSF 1 3.0286e+10 5.7993e+11 20661
## - YearBuilt 1 3.1322e+10 5.8096e+11 20663
## - OverallQual 1 9.7972e+10 6.4761e+11 20774
step$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## SalePrice ~ Id + MSSubClass + LotFrontage + LotArea + OverallQual +
## OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 +
## BsmtFinSF2 + BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF +
## LowQualFinSF + GrLivArea + BsmtFullBath + BsmtHalfBath +
## FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageYrBlt + GarageCars + GarageArea + WoodDeckSF +
## OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch +
## PoolArea + MiscVal + MoSold + YrSold + t
##
## Final Model:
## SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
## YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
## X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
## Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch +
## PoolArea + YrSold
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 987 544976140366 20627.66
## 2 - t 0 0 987 544976140366 20627.66
## 3 - GrLivArea 0 0 987 544976140366 20627.66
## 4 - TotalBsmtSF 0 0 987 544976140366 20627.66
## 5 - BsmtHalfBath 1 2207060 988 544978347426 20625.67
## 6 - MiscVal 1 6617357 989 544984964783 20623.68
## 7 - X3SsnPorch 1 9345745 990 544994310528 20621.70
## 8 - LotFrontage 1 10574577 991 545004885105 20619.72
## 9 - MasVnrArea 1 10208382 992 545015093487 20617.74
## 10 - MoSold 1 170864174 993 545185957661 20616.06
## 11 - BedroomAbvGr 1 177002598 994 545362960259 20614.39
## 12 - HalfBath 1 262034782 995 545624995040 20612.88
## 13 - LowQualFinSF 1 318465833 996 545943460873 20611.48
## 14 - GarageYrBlt 1 399824726 997 546343285599 20610.23
## 15 - LotArea 1 574282008 998 546917567607 20609.30
## 16 - Id 1 873148554 999 547790716162 20608.93
## 17 - OpenPorchSF 1 891715890 1000 548682432052 20608.60
## 18 - EnclosedPorch 1 958263789 1001 549640695841 20608.38
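The AIC column in this table can be reproduced by hand. Assuming `step` came from MASS::stepAIC() (stats::step() behaves the same way here), each linear model is scored with extractAIC(), which for an unweighted lm fit computes `n * log(RSS / n) + 2 * edf`, where edf is the number of estimated coefficients. The sketch below applies that formula to the last row of the table. It takes n = 1023 as the residual degrees of freedom (1001) plus the 22 estimated coefficients (intercept and 21 predictors), which agrees with the "21 and 1001 DF" reported by the summary further down. `aic_lm` is a helper name introduced only for this illustration, and the second half checks the same identity on the built-in mtcars data.

```r
# Hypothetical helper reproducing extractAIC() for an unweighted lm fit:
#   AIC = n * log(RSS / n) + 2 * edf
aic_lm <- function(rss, n, edf) n * log(rss / n) + 2 * edf

# Last row of the step$anova table: Resid. Dev = 549640695841, Resid. Df = 1001,
# and 22 estimated coefficients (intercept + 21 predictors), so n = 1023.
edf_final <- 22
n_final   <- 1001 + edf_final
aic_final <- aic_lm(549640695841, n_final, edf_final)
round(aic_final, 2)  # 20608.38, as in the AIC column

# Same identity on a model that can be refit directly (built-in mtcars data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
rss <- sum(residuals(fit)^2)
aic_manual  <- aic_lm(rss, nrow(mtcars), length(coef(fit)))
aic_builtin <- unname(extractAIC(fit)[2])  # extractAIC returns c(edf, AIC)
all.equal(aic_manual, aic_builtin)         # TRUE
```

The formula also explains why removing a weak term lowers AIC even though RSS rises: the penalty drops by 2 per removed coefficient, which outweighs a small increase in `n * log(RSS / n)` (for example, dropping LotFrontage raised RSS slightly but lowered AIC from 20621.70 to 20619.72).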
The suggested model is:
SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch + PoolArea + YrSold
The following steps use summary() to check the p-value of each variable, one at a time: at each step, remove the variable with the highest p-value if it is greater than 0.05, then refit the model.
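This procedure can also be automated. The sketch below is one possible implementation, not the author's code: `backward_by_pvalue` is a hypothetical helper that refits the model, drops the non-intercept term with the largest p-value while that p-value exceeds `alpha`, and stops otherwise. It assumes every predictor is numeric, so each column name matches exactly one coefficient name (true for the integer columns used here, but not for factor columns). The usage example runs it on simulated data so it is self-contained.

```r
# Hypothetical helper: backward elimination by p-value for an lm() fit.
# Assumes numeric predictors, so each column name is also a coefficient name.
backward_by_pvalue <- function(response, predictors, data, alpha = 0.05) {
  repeat {
    if (length(predictors) == 0) {
      # Nothing left to drop: fall back to an intercept-only model
      return(lm(reformulate("1", response = response), data = data))
    }
    fit <- lm(reformulate(predictors, response = response), data = data)
    coefs <- summary(fit)$coefficients
    # p-values of the non-intercept terms, kept as a one-column matrix
    pvals <- coefs[rownames(coefs) != "(Intercept)", "Pr(>|t|)", drop = FALSE]
    if (nrow(pvals) == 0) return(fit)  # every predictor was aliased
    worst <- which.max(pvals[, 1])
    # Stop once even the least significant remaining term passes the threshold
    if (pvals[worst, 1] <= alpha) return(fit)
    predictors <- setdiff(predictors, rownames(pvals)[worst])
  }
}

# Usage on simulated data: x1 and x2 drive y, `noise` does not
set.seed(1)
n   <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- 3 * sim$x1 - 2 * sim$x2 + rnorm(n)
fit <- backward_by_pvalue("y", c("x1", "x2", "noise"), sim)
formula(fit)
```

With this document's objects, the call would pass "SalePrice", the 21 term names of the suggested model as a character vector, and traindata. Because it applies the same drop-the-largest-p-value rule, it should end at the same model as the manual refits below.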
finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch +
PoolArea + YrSold, data=traindata)
summary(finalmodel)
##
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF +
## ScreenPorch + PoolArea + YrSold, data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -238541 -13301 -889 11862 73178
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.993e+05 1.116e+06 0.627 0.53110
## MSSubClass -1.375e+02 2.046e+01 -6.718 3.08e-11 ***
## OverallQual 1.341e+04 1.004e+03 13.358 < 2e-16 ***
## OverallCond 4.455e+03 8.708e+02 5.116 3.73e-07 ***
## YearBuilt 3.255e+02 4.309e+01 7.553 9.59e-14 ***
## YearRemodAdd 2.294e+02 5.254e+01 4.366 1.40e-05 ***
## BsmtFinSF1 8.093e+00 3.761e+00 2.152 0.03164 *
## BsmtFinSF2 1.132e+01 5.605e+00 2.019 0.04373 *
## BsmtUnfSF 6.543e+00 3.403e+00 1.923 0.05479 .
## X1stFlrSF 2.312e+01 4.885e+00 4.733 2.53e-06 ***
## X2ndFlrSF 2.732e+01 3.678e+00 7.427 2.38e-13 ***
## BsmtFullBath 1.049e+04 1.982e+03 5.292 1.49e-07 ***
## FullBath 8.644e+03 2.127e+03 4.064 5.20e-05 ***
## KitchenAbvGr -1.625e+04 4.272e+03 -3.805 0.00015 ***
## TotRmsAbvGrd 2.109e+03 9.355e+02 2.254 0.02439 *
## Fireplaces 8.325e+03 1.421e+03 5.859 6.33e-09 ***
## GarageCars 9.782e+03 2.326e+03 4.205 2.84e-05 ***
## GarageArea 1.353e+01 7.766e+00 1.742 0.08181 .
## WoodDeckSF 1.528e+01 6.866e+00 2.225 0.02628 *
## ScreenPorch 5.760e+01 1.363e+01 4.226 2.60e-05 ***
## PoolArea -9.918e+01 2.005e+01 -4.947 8.86e-07 ***
## YrSold -9.028e+02 5.557e+02 -1.625 0.10454
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23430 on 1001 degrees of freedom
## Multiple R-squared: 0.8056, Adjusted R-squared: 0.8015
## F-statistic: 197.6 on 21 and 1001 DF, p-value: < 2.2e-16
YrSold has the highest p-value (0.105 > 0.05), so remove it and refit.
finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
Fireplaces + GarageCars + GarageArea + WoodDeckSF + ScreenPorch +
PoolArea , data=traindata)
summary(finalmodel)
##
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF +
## ScreenPorch + PoolArea, data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -239944 -13223 -705 11666 72101
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.107e+06 9.782e+04 -11.318 < 2e-16 ***
## MSSubClass -1.364e+02 2.047e+01 -6.665 4.37e-11 ***
## OverallQual 1.341e+04 1.005e+03 13.349 < 2e-16 ***
## OverallCond 4.431e+03 8.714e+02 5.085 4.38e-07 ***
## YearBuilt 3.263e+02 4.313e+01 7.567 8.62e-14 ***
## YearRemodAdd 2.254e+02 5.253e+01 4.291 1.95e-05 ***
## BsmtFinSF1 8.284e+00 3.762e+00 2.202 0.027902 *
## BsmtFinSF2 1.124e+01 5.609e+00 2.003 0.045401 *
## BsmtUnfSF 6.572e+00 3.405e+00 1.930 0.053914 .
## X1stFlrSF 2.306e+01 4.889e+00 4.717 2.73e-06 ***
## X2ndFlrSF 2.721e+01 3.681e+00 7.394 3.01e-13 ***
## BsmtFullBath 1.027e+04 1.979e+03 5.190 2.54e-07 ***
## FullBath 8.550e+03 2.128e+03 4.018 6.31e-05 ***
## KitchenAbvGr -1.663e+04 4.269e+03 -3.895 0.000105 ***
## TotRmsAbvGrd 2.166e+03 9.356e+02 2.315 0.020835 *
## Fireplaces 8.365e+03 1.422e+03 5.883 5.49e-09 ***
## GarageCars 9.989e+03 2.325e+03 4.297 1.90e-05 ***
## GarageArea 1.316e+01 7.769e+00 1.694 0.090567 .
## WoodDeckSF 1.506e+01 6.870e+00 2.192 0.028615 *
## ScreenPorch 5.727e+01 1.364e+01 4.198 2.93e-05 ***
## PoolArea -9.745e+01 2.004e+01 -4.863 1.34e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23450 on 1002 degrees of freedom
## Multiple R-squared: 0.8051, Adjusted R-squared: 0.8012
## F-statistic: 207 on 20 and 1002 DF, p-value: < 2.2e-16
GarageArea now has the highest p-value (0.091 > 0.05), so remove it and refit.
finalmodel<-lm(SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt +
YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF +
X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd +
Fireplaces + GarageCars + WoodDeckSF + ScreenPorch +
PoolArea , data=traindata)
summary(finalmodel)
##
## Call:
## lm(formula = SalePrice ~ MSSubClass + OverallQual + OverallCond +
## YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF +
## X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr +
## TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch +
## PoolArea, data = traindata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -235931 -13269 -1062 12134 72295
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.108e+06 9.791e+04 -11.319 < 2e-16 ***
## MSSubClass -1.391e+02 2.043e+01 -6.808 1.70e-11 ***
## OverallQual 1.340e+04 1.006e+03 13.326 < 2e-16 ***
## OverallCond 4.464e+03 8.720e+02 5.119 3.68e-07 ***
## YearBuilt 3.290e+02 4.314e+01 7.628 5.54e-14 ***
## YearRemodAdd 2.234e+02 5.256e+01 4.249 2.34e-05 ***
## BsmtFinSF1 8.887e+00 3.749e+00 2.371 0.017937 *
## BsmtFinSF2 1.161e+01 5.610e+00 2.070 0.038694 *
## BsmtUnfSF 6.950e+00 3.401e+00 2.044 0.041263 *
## X1stFlrSF 2.419e+01 4.848e+00 4.989 7.17e-07 ***
## X2ndFlrSF 2.797e+01 3.657e+00 7.647 4.81e-14 ***
## BsmtFullBath 1.036e+04 1.980e+03 5.232 2.05e-07 ***
## FullBath 8.408e+03 2.128e+03 3.951 8.34e-05 ***
## KitchenAbvGr -1.663e+04 4.273e+03 -3.891 0.000106 ***
## TotRmsAbvGrd 2.066e+03 9.347e+02 2.211 0.027277 *
## Fireplaces 8.072e+03 1.413e+03 5.714 1.46e-08 ***
## GarageCars 1.287e+04 1.586e+03 8.112 1.44e-15 ***
## WoodDeckSF 1.496e+01 6.876e+00 2.176 0.029776 *
## ScreenPorch 5.652e+01 1.365e+01 4.142 3.73e-05 ***
## PoolArea -9.455e+01 1.998e+01 -4.731 2.55e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23470 on 1003 degrees of freedom
## Multiple R-squared: 0.8046, Adjusted R-squared: 0.8009
## F-statistic: 217.3 on 19 and 1003 DF, p-value: < 2.2e-16
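As a consistency check on the printed summary, you can recompute Adjusted R-squared and the F-statistic from Multiple R-squared, the 19 predictors and the 1003 residual degrees of freedom. The fit therefore used n = 1003 + 19 + 1 = 1023 observations. The sketch below is a worked example of those formulas.

```r
# Recompute Adjusted R-squared and the F-statistic from the printed summary
r2       <- 0.8046   # Multiple R-squared
p        <- 19       # number of predictors (numerator df of the F-statistic)
df_resid <- 1003     # residual degrees of freedom
n        <- df_resid + p + 1   # observations used in the fit: 1023

adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

round(adj_r2, 4)  # 0.8009
f_stat
```

The F value comes out at about 217, which matches the reported 217.3 up to the rounding of R-squared.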
The final model (finalmodel) is:
SalePrice ~ MSSubClass + OverallQual + OverallCond + YearBuilt + YearRemodAdd + BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + PoolArea
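The same 19 predictor names are typed out again later, when the test-set columns are selected. A sketch that keeps them in one character vector and builds the formula with `stats::reformulate()`, so the model and the column selection cannot drift apart. `predictors` is a hypothetical name.

```r
# Keep the predictor names in one place and build the formula from them
predictors <- c("MSSubClass", "OverallQual", "OverallCond", "YearBuilt",
                "YearRemodAdd", "BsmtFinSF1", "BsmtFinSF2", "BsmtUnfSF",
                "X1stFlrSF", "X2ndFlrSF", "BsmtFullBath", "FullBath",
                "KitchenAbvGr", "TotRmsAbvGrd", "Fireplaces", "GarageCars",
                "WoodDeckSF", "ScreenPorch", "PoolArea")
f <- reformulate(predictors, response = "SalePrice")
f
```

With this section's objects, `lm(f, data = traindata)` would fit the same model, and `test[, c("Id", predictors)]` would select the matching test columns.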
Check the residuals with the diagnostic plots that plot() draws for an lm fit.
plot(finalmodel)
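You can also inspect the standardized residuals numerically, alongside the diagnostic plots. The sketch below flags points whose absolute standardized residual exceeds a cutoff. The default of 3 is a common rule of thumb, not something this analysis specifies. The demo uses built-in `mtcars` as a stand-in with a looser cutoff of 2, and `flag_outliers` is a hypothetical helper.

```r
# Flag observations whose standardized residual exceeds a cutoff
flag_outliers <- function(fit, cutoff = 3) {
  r <- rstandard(fit)        # internally studentized residuals
  which(abs(r) > cutoff)     # row indices (named) of flagged points
}

# Stand-in model; with this section's model: flag_outliers(finalmodel)
fit <- lm(mpg ~ wt + hp, data = mtcars)
out <- flag_outliers(fit, cutoff = 2)
rstandard(fit)[out]
```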
Although a few outliers remain, most of the standardized residuals lie close to the reference line in the normal Q-Q plot, so the model appears to fit the data reasonably well. Next, extract the coefficient estimates from the summary.
coef<-summary(finalmodel)$coefficients[1:20]
coef
## [1] -1.108198e+06 -1.390496e+02 1.340014e+04 4.463995e+03 3.290395e+02
## [6] 2.233629e+02 8.887390e+00 1.161355e+01 6.950429e+00 2.418612e+01
## [11] 2.796644e+01 1.035946e+04 8.408329e+03 -1.662665e+04 2.066350e+03
## [16] 8.071619e+03 1.286920e+04 1.496366e+01 5.652352e+01 -9.454670e+01
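`summary(fit)$coefficients` is a matrix with the columns Estimate, Std. Error, t value and Pr(>|t|), stored column by column. Indexing its first 20 elements therefore returns the Estimate column. The sketch below uses `mtcars` as a stand-in to show that `coef()` returns the same numbers as a named vector. That is less fragile than counting positions.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
m   <- summary(fit)$coefficients
k   <- nrow(m)                  # number of coefficients, including intercept

by_position <- m[1:k]           # column-major order: first k entries = Estimate
by_name     <- m[, "Estimate"]  # same values, named by term
est         <- coef(fit)        # the usual accessor

all.equal(by_position, unname(by_name))
all.equal(by_name, est)
```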
Next, use the test set to predict SalePrice.
test<-read.csv("https://raw.githubusercontent.com/czhu505/Data605-/master/test.csv",stringsAsFactors = F)
Keep only Id and the predictors used in finalmodel.
colnames<-c('Id','MSSubClass','OverallQual','OverallCond',
'YearBuilt' ,'YearRemodAdd' , 'BsmtFinSF1' , 'BsmtFinSF2' , 'BsmtUnfSF' ,
'X1stFlrSF', 'X2ndFlrSF' , 'BsmtFullBath' , 'FullBath' , 'KitchenAbvGr' ,
'TotRmsAbvGrd' , 'Fireplaces' , 'GarageCars' , 'WoodDeckSF' , 'ScreenPorch',
'PoolArea')
test1<- data.frame(test[, colnames])
summary(test1)
## Id MSSubClass OverallQual OverallCond
## Min. :1461 Min. : 20.00 Min. : 1.000 Min. :1.000
## 1st Qu.:1826 1st Qu.: 20.00 1st Qu.: 5.000 1st Qu.:5.000
## Median :2190 Median : 50.00 Median : 6.000 Median :5.000
## Mean :2190 Mean : 57.38 Mean : 6.079 Mean :5.554
## 3rd Qu.:2554 3rd Qu.: 70.00 3rd Qu.: 7.000 3rd Qu.:6.000
## Max. :2919 Max. :190.00 Max. :10.000 Max. :9.000
##
## YearBuilt YearRemodAdd BsmtFinSF1 BsmtFinSF2
## Min. :1879 Min. :1950 Min. : 0.0 Min. : 0.00
## 1st Qu.:1953 1st Qu.:1963 1st Qu.: 0.0 1st Qu.: 0.00
## Median :1973 Median :1992 Median : 350.5 Median : 0.00
## Mean :1971 Mean :1984 Mean : 439.2 Mean : 52.62
## 3rd Qu.:2001 3rd Qu.:2004 3rd Qu.: 753.5 3rd Qu.: 0.00
## Max. :2010 Max. :2010 Max. :4010.0 Max. :1526.00
## NA's :1 NA's :1
## BsmtUnfSF X1stFlrSF X2ndFlrSF BsmtFullBath
## Min. : 0.0 Min. : 407.0 Min. : 0 Min. :0.0000
## 1st Qu.: 219.2 1st Qu.: 873.5 1st Qu.: 0 1st Qu.:0.0000
## Median : 460.0 Median :1079.0 Median : 0 Median :0.0000
## Mean : 554.3 Mean :1156.5 Mean : 326 Mean :0.4345
## 3rd Qu.: 797.8 3rd Qu.:1382.5 3rd Qu.: 676 3rd Qu.:1.0000
## Max. :2140.0 Max. :5095.0 Max. :1862 Max. :3.0000
## NA's :1 NA's :2
## FullBath KitchenAbvGr TotRmsAbvGrd Fireplaces
## Min. :0.000 Min. :0.000 Min. : 3.000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.: 5.000 1st Qu.:0.0000
## Median :2.000 Median :1.000 Median : 6.000 Median :0.0000
## Mean :1.571 Mean :1.042 Mean : 6.385 Mean :0.5812
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.: 7.000 3rd Qu.:1.0000
## Max. :4.000 Max. :2.000 Max. :15.000 Max. :4.0000
##
## GarageCars WoodDeckSF ScreenPorch PoolArea
## Min. :0.000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median :2.000 Median : 0.00 Median : 0.00 Median : 0.000
## Mean :1.766 Mean : 93.17 Mean : 17.06 Mean : 1.744
## 3rd Qu.:2.000 3rd Qu.: 168.00 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :5.000 Max. :1424.00 Max. :576.00 Max. :800.000
## NA's :1
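The summary above reports a few NAs in the basement columns (BsmtFinSF1, BsmtFinSF2, BsmtUnfSF, BsmtFullBath) and in GarageCars. A compact way to list them is to count missing values per column. The sketch uses a small made-up data frame, and `na_counts` is a hypothetical helper. With this section's data the call would be `na_counts(test1)`.

```r
# Count missing values per column and keep only columns that have any
na_counts <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6), c = c(NA, NA, 9))
na_counts(toy)  # a = 1, c = 2
```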
Now replace all NA values with 0.
test1[is.na(test1)] <- 0
head(test1)
## Id MSSubClass OverallQual OverallCond YearBuilt YearRemodAdd
## 1 1461 20 5 6 1961 1961
## 2 1462 20 6 6 1958 1958
## 3 1463 60 5 5 1997 1998
## 4 1464 60 6 6 1998 1998
## 5 1465 120 8 5 1992 1992
## 6 1466 60 6 5 1993 1994
## BsmtFinSF1 BsmtFinSF2 BsmtUnfSF X1stFlrSF X2ndFlrSF BsmtFullBath
## 1 468 144 270 896 0 0
## 2 923 0 406 1329 0 0
## 3 791 0 137 928 701 0
## 4 602 0 324 926 678 0
## 5 263 0 1017 1280 0 0
## 6 0 0 763 763 892 0
## FullBath KitchenAbvGr TotRmsAbvGrd Fireplaces GarageCars WoodDeckSF
## 1 1 1 5 0 1 140
## 2 1 1 6 0 1 393
## 3 2 1 6 1 2 212
## 4 2 1 7 1 2 360
## 5 2 1 5 0 2 0
## 6 2 1 7 1 2 157
## ScreenPorch PoolArea
## 1 120 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 144 0
## 6 0 0
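Use the finalmodel coefficients to calculate SalePrice for the test set. Start a new SalePrice column at the intercept, multiply each predictor by its coefficient, then add everything up.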
test1["SalePrice"] <- coef[1]
test1$MSSubClass<-test1$MSSubClass*coef[2]
test1$OverallQual<-test1$OverallQual*coef[3]
test1$OverallCond<-test1$OverallCond*coef[4]
test1$YearBuilt<-test1$YearBuilt*coef[5]
test1$YearRemodAdd<-test1$YearRemodAdd*coef[6]
test1$BsmtFinSF1<-test1$BsmtFinSF1*coef[7]
test1$BsmtFinSF2<-test1$BsmtFinSF2*coef[8]
test1$BsmtUnfSF<-test1$BsmtUnfSF*coef[9]
test1$X1stFlrSF<-test1$X1stFlrSF*coef[10]
test1$X2ndFlrSF<-test1$X2ndFlrSF*coef[11]
test1$BsmtFullBath<-test1$BsmtFullBath*coef[12]
test1$FullBath<-test1$FullBath*coef[13]
test1$KitchenAbvGr<-test1$KitchenAbvGr*coef[14]
test1$TotRmsAbvGrd<-test1$TotRmsAbvGrd*coef[15]
test1$Fireplaces<-test1$Fireplaces*coef[16]
test1$GarageCars<-test1$GarageCars*coef[17]
test1$WoodDeckSF<-test1$WoodDeckSF*coef[18]
test1$ScreenPorch<-test1$ScreenPorch*coef[19]
test1$PoolArea<-test1$PoolArea*coef[20]
test1["SalePrice"]<-test1$MSSubClass + test1$OverallQual + test1$OverallCond +
test1$YearBuilt + test1$YearRemodAdd + test1$BsmtFinSF1 + test1$BsmtFinSF2 + test1$BsmtUnfSF + test1$X1stFlrSF + test1$X2ndFlrSF + test1$BsmtFullBath + test1$FullBath + test1$KitchenAbvGr + test1$TotRmsAbvGrd + test1$Fireplaces + test1$GarageCars + test1$WoodDeckSF + test1$ScreenPorch + test1$PoolArea+test1$SalePrice
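The column-by-column arithmetic above computes the intercept plus the sum of each coefficient times its predictor. R's `predict()` does the same calculation from the fitted model, matching coefficients to columns by name, so a column cannot be paired with the wrong coefficient. The sketch below checks on stand-in `mtcars` data that the two agree. With this section's objects the call would be `predict(finalmodel, newdata = test1)` on the NA-filled test set, before any columns are overwritten.

```r
fit     <- lm(mpg ~ wt + hp + qsec, data = mtcars)
newdata <- mtcars[1:5, c("wt", "hp", "qsec")]

# Manual linear predictor: intercept + sum of coefficient * predictor
b      <- coef(fit)
X      <- as.matrix(newdata[, names(b)[-1]])  # columns in coefficient order
manual <- b[1] + drop(X %*% b[-1])

# Built-in equivalent
auto <- predict(fit, newdata = newdata)

all.equal(unname(manual), unname(auto))
```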
Write the predicted SalePrice and Id to a CSV file.
predict<- data.frame(test1$Id, test1$SalePrice)
head(predict)
## test1.Id test1.SalePrice
## 1 1461 94482.97
## 2 1462 119081.49
## 3 1463 155429.19
## 4 1464 172368.75
## 5 1465 169458.25
## 6 1466 166530.99
#write.csv(predict, file = "C:/Users/czhu5/OneDrive/Desktop/605/predict.csv")
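Kaggle's sample submission for this competition has the header row `Id,SalePrice`. By default, `write.csv()` also writes a column of row names. As the printed head shows, `data.frame(test1$Id, test1$SalePrice)` names its columns test1.Id and test1.SalePrice. The sketch below writes a file in the expected shape to a temporary path, using made-up values. `write_submission` is a hypothetical helper.

```r
# Write a submission with exactly the columns Id and SalePrice, no row names
write_submission <- function(ids, prices, path) {
  sub <- data.frame(Id = ids, SalePrice = prices)
  write.csv(sub, file = path, row.names = FALSE)
  invisible(sub)
}

path <- tempfile(fileext = ".csv")
write_submission(c(1461, 1462), c(120000, 135000), path)
readLines(path)
```

With this section's objects: `write_submission(test1$Id, test1$SalePrice, "predict.csv")`.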
My Score: 0.30252
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/leaderboard
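Kaggle scores this competition by the root-mean-squared error between the logarithm of the predicted price and the logarithm of the observed price, so lower is better. The sketch below computes the same metric locally, for example on a held-out slice of the training data. `rmse_log` and `holdout` are hypothetical names. A linear model can produce non-positive predictions, which the log cannot handle, so the function rejects them.

```r
# Root-mean-squared error between log predictions and log actual prices
rmse_log <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual),
            all(predicted > 0), all(actual > 0))  # log needs positive values
  sqrt(mean((log(predicted) - log(actual))^2))
}

rmse_log(c(100000, 200000), c(100000, 200000))  # 0
rmse_log(c(110000, 180000), c(100000, 200000))
```

For example: `rmse_log(predict(finalmodel, newdata = holdout), holdout$SalePrice)`.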