Please refer to the Final Exam Document.

2 Problem 1

2.1 Part 1

Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with mean and standard deviation \(\mu = \sigma = (N+1)/2\).

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.002   3.186   5.489   5.476   7.718   9.999

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -14.677   1.720   5.419   5.450   9.122  26.899
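
A minimal sketch of how two samples like these could be generated. It assumes N = 10 (inferred from the summaries above, where X ranges from about 1 to 10 and both means are close to 5.5) and an arbitrary seed; neither value is stated in the original, so the numbers it prints will not match the output above exactly.

```r
set.seed(123)   # hypothetical seed, used only to make the sketch reproducible
N <- 10         # assumed value; the exercise allows any N >= 6
n <- 10000

X <- runif(n, min = 1, max = N)                       # uniform on [1, N]
Y <- rnorm(n, mean = (N + 1) / 2, sd = (N + 1) / 2)   # mu = sigma = (N + 1) / 2

summary(X)
summary(Y)
```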

2.2 Part 2

Probability: Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.

5 points. a. P(X>x | X>y) b. P(X>x, Y>y) c. P(X<x | X>y)
5 points. Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.

5 points. Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

## [1] 5.489
## [1] 1.72

2.2.1 Part 2a

\[P(X>x|X>y) = \frac{P(X>x,X>y)}{P(X>y)}\]

## [1] 0.5453157
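
The conditional probability above can be estimated directly from the samples as a ratio of two proportions. This is a sketch under assumptions: N = 10 and the seed are not taken from the original, so the result will be close to, but not identical to, 0.5453.

```r
set.seed(123)   # hypothetical seed
N <- 10         # assumed value of N
X <- runif(10000, min = 1, max = N)
Y <- rnorm(10000, mean = (N + 1) / 2, sd = (N + 1) / 2)

x <- median(X)                      # "x": median of X
y <- unname(quantile(Y, 0.25))      # "y": first quartile of Y

# P(X > x | X > y) = P(X > x and X > y) / P(X > y)
p_a <- mean(X > x & X > y) / mean(X > y)
p_a
```

Read as an interpretation: among the observations of X that exceed the first quartile of Y, a little over half also exceed the median of X.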

2.2.2 Part 2b P(X>x, Y>y)

We know that x is the median of X, so P(X>x) is about 0.5.

Also, we know that y is the 1st quartile of Y, so P(Y>y) is about 0.75.

Because X and Y were generated independently, the joint probability factorizes: \(P(X>x, Y>y) = P(X>x)\cdot P(Y>y) = 0.5 \cdot 0.75 = 0.375\)

## [1] 0.375
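
The value 0.375 comes from the product rule. As a cross-check, the joint probability can also be counted directly from the samples; the sketch below does this under the assumption N = 10 with an arbitrary seed (neither is given in the original).

```r
set.seed(123)   # hypothetical seed
N <- 10         # assumed value of N
X <- runif(10000, min = 1, max = N)
Y <- rnorm(10000, mean = (N + 1) / 2, sd = (N + 1) / 2)

x <- median(X)
y <- unname(quantile(Y, 0.25))

joint_empirical <- mean(X > x & Y > y)        # share of pairs with both events
joint_product   <- mean(X > x) * mean(Y > y)  # product of the two marginals

c(empirical = joint_empirical, product = joint_product)
```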

2.2.3 Part 2c P(X<x | X>y)

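\[P(X<x|X>y) = \frac{P(X<x,X>y)}{P(X>y)}\]

Because X is continuous, P(X = x) = 0, so given X > y the events X > x and X < x are complementary and Part c = 1 - Part a. The calculation below confirms this: the two conditional probabilities sum to 1.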

## [1] 0.4546843
## [1] 1
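
Written out, the complement relation between Part a and Part c follows from adding the two conditional probabilities, assuming only that \(P(X = x) = 0\):

\[P(X>x|X>y) + P(X<x|X>y) = \frac{P(X>x,X>y) + P(X<x,X>y)}{P(X>y)} = \frac{P(X \neq x, X>y)}{P(X>y)} = 1\]

With the values above, \(0.5453157 + 0.4546843 = 1\).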

2.2.4 Part 2d

Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.

The joint probability P(X>x, Y>y) from the table is 0.3756, which is close to the 0.375 obtained in Part 2b.

The difference is only 0.0006, so I conclude that \(P(X>x\;and\;Y>y) = P(X>x)P(Y>y)\) holds to within sampling error.
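
One way to build such a table is sketched below, assuming N = 10 and an arbitrary seed (neither comes from the original). The interior cells of the table are joint probabilities and the `Sum` row and column are the marginals.

```r
set.seed(123)   # hypothetical seed
N <- 10         # assumed value of N
X <- runif(10000, min = 1, max = N)
Y <- rnorm(10000, mean = (N + 1) / 2, sd = (N + 1) / 2)
x <- median(X)
y <- unname(quantile(Y, 0.25))

counts <- table(X_gt_x = X > x, Y_gt_y = Y > y)   # 2 x 2 contingency table
probs  <- prop.table(counts)                      # joint probabilities
addmargins(probs)                                 # adds marginal row and column

joint   <- probs["TRUE", "TRUE"]
product <- sum(probs["TRUE", ]) * sum(probs[, "TRUE"])
c(joint = joint, product = product, difference = joint - product)
```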

2.2.5 Part 2e

Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

Fisher’s Exact Test is a statistical test used to determine whether there is a nonrandom association between two categorical variables. It computes an exact p-value, so it is typically used when the sample is small, for example when some cells of the contingency table have expected counts below 5.

The Chi-Square Test is a statistical test commonly used for testing the relationship between categorical variables. It relies on a large-sample approximation, so it requires each cell of the contingency table to have an expected count of at least 5.

The null hypothesis for both tests is that there is no relationship between the categorical variables in the population, i.e. they are independent. If we reject the null hypothesis, we have evidence that the categorical variables are dependent.

## 
##  Fisher's Exact Test for Count Data
## 
## data:  table2
## p-value = 0.7995
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.9242273 1.1100187
## sample estimates:
## odds ratio 
##   1.012883
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table2
## X-squared = 0.064533, df = 1, p-value = 0.7995

The p-value from Fisher’s Exact Test is 0.7995, which is greater than 0.05. We fail to reject the null hypothesis: the data give no evidence that the variables are dependent.

The p-value from the Chi-Square Test is also 0.7995, which is greater than 0.05. Again we fail to reject the null hypothesis of independence.

We have a large sample (10,000 observations) with over 1,000 counts in each cell of table2, so the Chi-Square approximation is reliable here. Fisher’s Exact Test is mainly needed for small samples, so the Chi-Square Test is the more appropriate choice for this question.
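
Both tests can be run on the same 2 x 2 table with base R. The sketch below assumes N = 10 and an arbitrary seed, so its p-values will differ from the 0.7995 reported above; the name `table2` follows the output shown earlier.

```r
set.seed(123)   # hypothetical seed
N <- 10         # assumed value of N
X <- runif(10000, min = 1, max = N)
Y <- rnorm(10000, mean = (N + 1) / 2, sd = (N + 1) / 2)
x <- median(X)
y <- unname(quantile(Y, 0.25))

table2 <- table(X > x, Y > y)

fisher_res <- fisher.test(table2)   # exact test for a 2 x 2 table
chisq_res  <- chisq.test(table2)    # Yates' correction is the default for 2 x 2

c(fisher = fisher_res$p.value, chisq = chisq_res$p.value)
chisq_res$expected                  # expected counts; all should be well above 5
```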

3 Problem 2

You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. I want you to do the following.

5 points. Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?

5 points. Linear Algebra and Correlation. Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.

5 points. Calculus-Based Probability & Statistics. Many times, it makes sense to fit a closed form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of \(\lambda\) for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, \(\lambda\))). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.

10 points. Modeling. Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.

3.1 Data Description

Please refer to the data description file.

3.2 Read Data

## Warning: package 'corrplot' was built under R version 3.6.3
## [1] 1460   81
##        Id           MSSubClass       MSZoning     LotFrontage    
##  Min.   :   1.0   Min.   : 20.0   C (all):  10   Min.   : 21.00  
##  1st Qu.: 365.8   1st Qu.: 20.0   FV     :  65   1st Qu.: 59.00  
##  Median : 730.5   Median : 50.0   RH     :  16   Median : 69.00  
##  Mean   : 730.5   Mean   : 56.9   RL     :1151   Mean   : 70.05  
##  3rd Qu.:1095.2   3rd Qu.: 70.0   RM     : 218   3rd Qu.: 80.00  
##  Max.   :1460.0   Max.   :190.0                  Max.   :313.00  
##                                                  NA's   :259     
##     LotArea        Street      Alley      LotShape  LandContour
##  Min.   :  1300   Grvl:   6   Grvl:  50   IR1:484   Bnk:  63   
##  1st Qu.:  7554   Pave:1454   Pave:  41   IR2: 41   HLS:  50   
##  Median :  9478               NA's:1369   IR3: 10   Low:  36   
##  Mean   : 10517                           Reg:925   Lvl:1311   
##  3rd Qu.: 11602                                                
##  Max.   :215245                                                
##                                                                
##   Utilities      LotConfig    LandSlope   Neighborhood   Condition1  
##  AllPub:1459   Corner : 263   Gtl:1382   NAmes  :225   Norm   :1260  
##  NoSeWa:   1   CulDSac:  94   Mod:  65   CollgCr:150   Feedr  :  81  
##                FR2    :  47   Sev:  13   OldTown:113   Artery :  48  
##                FR3    :   4              Edwards:100   RRAn   :  26  
##                Inside :1052              Somerst: 86   PosN   :  19  
##                                          Gilbert: 79   RRAe   :  11  
##                                          (Other):707   (Other):  15  
##    Condition2     BldgType      HouseStyle   OverallQual    
##  Norm   :1445   1Fam  :1220   1Story :726   Min.   : 1.000  
##  Feedr  :   6   2fmCon:  31   2Story :445   1st Qu.: 5.000  
##  Artery :   2   Duplex:  52   1.5Fin :154   Median : 6.000  
##  PosN   :   2   Twnhs :  43   SLvl   : 65   Mean   : 6.099  
##  RRNn   :   2   TwnhsE: 114   SFoyer : 37   3rd Qu.: 7.000  
##  PosA   :   1                 1.5Unf : 14   Max.   :10.000  
##  (Other):   2                 (Other): 19                   
##   OverallCond      YearBuilt     YearRemodAdd    RoofStyle   
##  Min.   :1.000   Min.   :1872   Min.   :1950   Flat   :  13  
##  1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967   Gable  :1141  
##  Median :5.000   Median :1973   Median :1994   Gambrel:  11  
##  Mean   :5.575   Mean   :1971   Mean   :1985   Hip    : 286  
##  3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004   Mansard:   7  
##  Max.   :9.000   Max.   :2010   Max.   :2010   Shed   :   2  
##                                                              
##     RoofMatl     Exterior1st   Exterior2nd    MasVnrType    MasVnrArea    
##  CompShg:1434   VinylSd:515   VinylSd:504   BrkCmn : 15   Min.   :   0.0  
##  Tar&Grv:  11   HdBoard:222   MetalSd:214   BrkFace:445   1st Qu.:   0.0  
##  WdShngl:   6   MetalSd:220   HdBoard:207   None   :864   Median :   0.0  
##  WdShake:   5   Wd Sdng:206   Wd Sdng:197   Stone  :128   Mean   : 103.7  
##  ClyTile:   1   Plywood:108   Plywood:142   NA's   :  8   3rd Qu.: 166.0  
##  Membran:   1   CemntBd: 61   CmentBd: 60                 Max.   :1600.0  
##  (Other):   2   (Other):128   (Other):136                 NA's   :8       
##  ExterQual ExterCond  Foundation  BsmtQual   BsmtCond    BsmtExposure
##  Ex: 52    Ex:   3   BrkTil:146   Ex  :121   Fa  :  45   Av  :221    
##  Fa: 14    Fa:  28   CBlock:634   Fa  : 35   Gd  :  65   Gd  :134    
##  Gd:488    Gd: 146   PConc :647   Gd  :618   Po  :   2   Mn  :114    
##  TA:906    Po:   1   Slab  : 24   TA  :649   TA  :1311   No  :953    
##            TA:1282   Stone :  6   NA's: 37   NA's:  37   NA's: 38    
##                      Wood  :  3                                      
##                                                                      
##  BsmtFinType1   BsmtFinSF1     BsmtFinType2   BsmtFinSF2     
##  ALQ :220     Min.   :   0.0   ALQ :  19    Min.   :   0.00  
##  BLQ :148     1st Qu.:   0.0   BLQ :  33    1st Qu.:   0.00  
##  GLQ :418     Median : 383.5   GLQ :  14    Median :   0.00  
##  LwQ : 74     Mean   : 443.6   LwQ :  46    Mean   :  46.55  
##  Rec :133     3rd Qu.: 712.2   Rec :  54    3rd Qu.:   0.00  
##  Unf :430     Max.   :5644.0   Unf :1256    Max.   :1474.00  
##  NA's: 37                      NA's:  38                     
##    BsmtUnfSF       TotalBsmtSF      Heating     HeatingQC CentralAir
##  Min.   :   0.0   Min.   :   0.0   Floor:   1   Ex:741    N:  95    
##  1st Qu.: 223.0   1st Qu.: 795.8   GasA :1428   Fa: 49    Y:1365    
##  Median : 477.5   Median : 991.5   GasW :  18   Gd:241              
##  Mean   : 567.2   Mean   :1057.4   Grav :   7   Po:  1              
##  3rd Qu.: 808.0   3rd Qu.:1298.2   OthW :   2   TA:428              
##  Max.   :2336.0   Max.   :6110.0   Wall :   4                       
##                                                                     
##  Electrical     X1stFlrSF      X2ndFlrSF     LowQualFinSF    
##  FuseA:  94   Min.   : 334   Min.   :   0   Min.   :  0.000  
##  FuseF:  27   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000  
##  FuseP:   3   Median :1087   Median :   0   Median :  0.000  
##  Mix  :   1   Mean   :1163   Mean   : 347   Mean   :  5.845  
##  SBrkr:1334   3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000  
##  NA's :   1   Max.   :4692   Max.   :2065   Max.   :572.000  
##                                                              
##    GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath    
##  Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :1464   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565  
##  3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000  
##                                                                   
##     HalfBath       BedroomAbvGr    KitchenAbvGr   KitchenQual
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Ex:100     
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Fa: 39     
##  Median :0.0000   Median :3.000   Median :1.000   Gd:586     
##  Mean   :0.3829   Mean   :2.866   Mean   :1.047   TA:735     
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000              
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000              
##                                                              
##   TotRmsAbvGrd    Functional    Fireplaces    FireplaceQu   GarageType 
##  Min.   : 2.000   Maj1:  14   Min.   :0.000   Ex  : 24    2Types :  6  
##  1st Qu.: 5.000   Maj2:   5   1st Qu.:0.000   Fa  : 33    Attchd :870  
##  Median : 6.000   Min1:  31   Median :1.000   Gd  :380    Basment: 19  
##  Mean   : 6.518   Min2:  34   Mean   :0.613   Po  : 20    BuiltIn: 88  
##  3rd Qu.: 7.000   Mod :  15   3rd Qu.:1.000   TA  :313    CarPort:  9  
##  Max.   :14.000   Sev :   1   Max.   :3.000   NA's:690    Detchd :387  
##                   Typ :1360                               NA's   : 81  
##   GarageYrBlt   GarageFinish   GarageCars      GarageArea     GarageQual 
##  Min.   :1900   Fin :352     Min.   :0.000   Min.   :   0.0   Ex  :   3  
##  1st Qu.:1961   RFn :422     1st Qu.:1.000   1st Qu.: 334.5   Fa  :  48  
##  Median :1980   Unf :605     Median :2.000   Median : 480.0   Gd  :  14  
##  Mean   :1979   NA's: 81     Mean   :1.767   Mean   : 473.0   Po  :   3  
##  3rd Qu.:2002                3rd Qu.:2.000   3rd Qu.: 576.0   TA  :1311  
##  Max.   :2010                Max.   :4.000   Max.   :1418.0   NA's:  81  
##  NA's   :81                                                              
##  GarageCond  PavedDrive   WoodDeckSF      OpenPorchSF     EnclosedPorch   
##  Ex  :   2   N:  90     Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  Fa  :  35   P:  30     1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Gd  :   9   Y:1340     Median :  0.00   Median : 25.00   Median :  0.00  
##  Po  :   7              Mean   : 94.24   Mean   : 46.66   Mean   : 21.95  
##  TA  :1326              3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00  
##  NA's:  81              Max.   :857.00   Max.   :547.00   Max.   :552.00  
##                                                                           
##    X3SsnPorch      ScreenPorch        PoolArea        PoolQC    
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Ex  :   2  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   Fa  :   2  
##  Median :  0.00   Median :  0.00   Median :  0.000   Gd  :   3  
##  Mean   :  3.41   Mean   : 15.06   Mean   :  2.759   NA's:1453  
##  3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000              
##  Max.   :508.00   Max.   :480.00   Max.   :738.000              
##                                                                 
##    Fence      MiscFeature    MiscVal             MoSold      
##  GdPrv:  59   Gar2:   2   Min.   :    0.00   Min.   : 1.000  
##  GdWo :  54   Othr:   2   1st Qu.:    0.00   1st Qu.: 5.000  
##  MnPrv: 157   Shed:  49   Median :    0.00   Median : 6.000  
##  MnWw :  11   TenC:   1   Mean   :   43.49   Mean   : 6.322  
##  NA's :1179   NA's:1406   3rd Qu.:    0.00   3rd Qu.: 8.000  
##                           Max.   :15500.00   Max.   :12.000  
##                                                              
##      YrSold        SaleType    SaleCondition    SalePrice     
##  Min.   :2006   WD     :1267   Abnorml: 101   Min.   : 34900  
##  1st Qu.:2007   New    : 122   AdjLand:   4   1st Qu.:129975  
##  Median :2008   COD    :  43   Alloca :  12   Median :163000  
##  Mean   :2008   ConLD  :   9   Family :  20   Mean   :180921  
##  3rd Qu.:2009   ConLI  :   5   Normal :1198   3rd Qu.:214000  
##  Max.   :2010   ConLw  :   5   Partial: 125   Max.   :755000  
##                 (Other):   9

We have 81 variables and 1460 observations in the training set, where SalePrice is the response variable.

Check Data Type

From the data description, we know that MSSubClass is a categorical variable that identifies the type of dwelling involved in the sale. Thus, we need to change its data type in both the training set and the testing set.
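
The conversion itself is a single call per data frame. The sketch below uses two tiny made-up data frames named `train` and `test` as stand-ins for the Kaggle files; the values are illustrative only.

```r
# Hypothetical stand-ins for the Kaggle training and testing sets
train <- data.frame(MSSubClass = c(20, 60, 20, 50),
                    SalePrice  = c(208500, 181500, 223500, 140000))
test  <- data.frame(MSSubClass = c(20, 120, 60))

# Treat the dwelling-type code as a category rather than a number
train$MSSubClass <- as.factor(train$MSSubClass)
test$MSSubClass  <- as.factor(test$MSSubClass)

str(train$MSSubClass)
summary(train$MSSubClass)   # now reports counts per level instead of quartiles
```

The two sets may end up with different factor levels if a code appears in only one of them, which is worth checking before fitting a model.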

3.3 Descriptive and Inferential Statistics

3.3.1 Plots

  • summary of training set
##        Id           MSSubClass     MSZoning     LotFrontage    
##  Min.   :   1.0   20     :536   C (all):  10   Min.   : 21.00  
##  1st Qu.: 365.8   60     :299   FV     :  65   1st Qu.: 59.00  
##  Median : 730.5   50     :144   RH     :  16   Median : 69.00  
##  Mean   : 730.5   120    : 87   RL     :1151   Mean   : 70.05  
##  3rd Qu.:1095.2   30     : 69   RM     : 218   3rd Qu.: 80.00  
##  Max.   :1460.0   160    : 63                  Max.   :313.00  
##                   (Other):262                  NA's   :259     
##     LotArea        Street      Alley      LotShape  LandContour
##  Min.   :  1300   Grvl:   6   Grvl:  50   IR1:484   Bnk:  63   
##  1st Qu.:  7554   Pave:1454   Pave:  41   IR2: 41   HLS:  50   
##  Median :  9478               NA's:1369   IR3: 10   Low:  36   
##  Mean   : 10517                           Reg:925   Lvl:1311   
##  3rd Qu.: 11602                                                
##  Max.   :215245                                                
##                                                                
##   Utilities      LotConfig    LandSlope   Neighborhood   Condition1  
##  AllPub:1459   Corner : 263   Gtl:1382   NAmes  :225   Norm   :1260  
##  NoSeWa:   1   CulDSac:  94   Mod:  65   CollgCr:150   Feedr  :  81  
##                FR2    :  47   Sev:  13   OldTown:113   Artery :  48  
##                FR3    :   4              Edwards:100   RRAn   :  26  
##                Inside :1052              Somerst: 86   PosN   :  19  
##                                          Gilbert: 79   RRAe   :  11  
##                                          (Other):707   (Other):  15  
##    Condition2     BldgType      HouseStyle   OverallQual    
##  Norm   :1445   1Fam  :1220   1Story :726   Min.   : 1.000  
##  Feedr  :   6   2fmCon:  31   2Story :445   1st Qu.: 5.000  
##  Artery :   2   Duplex:  52   1.5Fin :154   Median : 6.000  
##  PosN   :   2   Twnhs :  43   SLvl   : 65   Mean   : 6.099  
##  RRNn   :   2   TwnhsE: 114   SFoyer : 37   3rd Qu.: 7.000  
##  PosA   :   1                 1.5Unf : 14   Max.   :10.000  
##  (Other):   2                 (Other): 19                   
##   OverallCond      YearBuilt     YearRemodAdd    RoofStyle   
##  Min.   :1.000   Min.   :1872   Min.   :1950   Flat   :  13  
##  1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967   Gable  :1141  
##  Median :5.000   Median :1973   Median :1994   Gambrel:  11  
##  Mean   :5.575   Mean   :1971   Mean   :1985   Hip    : 286  
##  3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004   Mansard:   7  
##  Max.   :9.000   Max.   :2010   Max.   :2010   Shed   :   2  
##                                                              
##     RoofMatl     Exterior1st   Exterior2nd    MasVnrType    MasVnrArea    
##  CompShg:1434   VinylSd:515   VinylSd:504   BrkCmn : 15   Min.   :   0.0  
##  Tar&Grv:  11   HdBoard:222   MetalSd:214   BrkFace:445   1st Qu.:   0.0  
##  WdShngl:   6   MetalSd:220   HdBoard:207   None   :864   Median :   0.0  
##  WdShake:   5   Wd Sdng:206   Wd Sdng:197   Stone  :128   Mean   : 103.7  
##  ClyTile:   1   Plywood:108   Plywood:142   NA's   :  8   3rd Qu.: 166.0  
##  Membran:   1   CemntBd: 61   CmentBd: 60                 Max.   :1600.0  
##  (Other):   2   (Other):128   (Other):136                 NA's   :8       
##  ExterQual ExterCond  Foundation  BsmtQual   BsmtCond    BsmtExposure
##  Ex: 52    Ex:   3   BrkTil:146   Ex  :121   Fa  :  45   Av  :221    
##  Fa: 14    Fa:  28   CBlock:634   Fa  : 35   Gd  :  65   Gd  :134    
##  Gd:488    Gd: 146   PConc :647   Gd  :618   Po  :   2   Mn  :114    
##  TA:906    Po:   1   Slab  : 24   TA  :649   TA  :1311   No  :953    
##            TA:1282   Stone :  6   NA's: 37   NA's:  37   NA's: 38    
##                      Wood  :  3                                      
##                                                                      
##  BsmtFinType1   BsmtFinSF1     BsmtFinType2   BsmtFinSF2     
##  ALQ :220     Min.   :   0.0   ALQ :  19    Min.   :   0.00  
##  BLQ :148     1st Qu.:   0.0   BLQ :  33    1st Qu.:   0.00  
##  GLQ :418     Median : 383.5   GLQ :  14    Median :   0.00  
##  LwQ : 74     Mean   : 443.6   LwQ :  46    Mean   :  46.55  
##  Rec :133     3rd Qu.: 712.2   Rec :  54    3rd Qu.:   0.00  
##  Unf :430     Max.   :5644.0   Unf :1256    Max.   :1474.00  
##  NA's: 37                      NA's:  38                     
##    BsmtUnfSF       TotalBsmtSF      Heating     HeatingQC CentralAir
##  Min.   :   0.0   Min.   :   0.0   Floor:   1   Ex:741    N:  95    
##  1st Qu.: 223.0   1st Qu.: 795.8   GasA :1428   Fa: 49    Y:1365    
##  Median : 477.5   Median : 991.5   GasW :  18   Gd:241              
##  Mean   : 567.2   Mean   :1057.4   Grav :   7   Po:  1              
##  3rd Qu.: 808.0   3rd Qu.:1298.2   OthW :   2   TA:428              
##  Max.   :2336.0   Max.   :6110.0   Wall :   4                       
##                                                                     
##  Electrical     X1stFlrSF      X2ndFlrSF     LowQualFinSF    
##  FuseA:  94   Min.   : 334   Min.   :   0   Min.   :  0.000  
##  FuseF:  27   1st Qu.: 882   1st Qu.:   0   1st Qu.:  0.000  
##  FuseP:   3   Median :1087   Median :   0   Median :  0.000  
##  Mix  :   1   Mean   :1163   Mean   : 347   Mean   :  5.845  
##  SBrkr:1334   3rd Qu.:1391   3rd Qu.: 728   3rd Qu.:  0.000  
##  NA's :   1   Max.   :4692   Max.   :2065   Max.   :572.000  
##                                                              
##    GrLivArea     BsmtFullBath     BsmtHalfBath        FullBath    
##  Min.   : 334   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:1130   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :1464   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :1515   Mean   :0.4253   Mean   :0.05753   Mean   :1.565  
##  3rd Qu.:1777   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :5642   Max.   :3.0000   Max.   :2.00000   Max.   :3.000  
##                                                                   
##     HalfBath       BedroomAbvGr    KitchenAbvGr   KitchenQual
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Ex:100     
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Fa: 39     
##  Median :0.0000   Median :3.000   Median :1.000   Gd:586     
##  Mean   :0.3829   Mean   :2.866   Mean   :1.047   TA:735     
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000              
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000              
##                                                              
##   TotRmsAbvGrd    Functional    Fireplaces    FireplaceQu   GarageType 
##  Min.   : 2.000   Maj1:  14   Min.   :0.000   Ex  : 24    2Types :  6  
##  1st Qu.: 5.000   Maj2:   5   1st Qu.:0.000   Fa  : 33    Attchd :870  
##  Median : 6.000   Min1:  31   Median :1.000   Gd  :380    Basment: 19  
##  Mean   : 6.518   Min2:  34   Mean   :0.613   Po  : 20    BuiltIn: 88  
##  3rd Qu.: 7.000   Mod :  15   3rd Qu.:1.000   TA  :313    CarPort:  9  
##  Max.   :14.000   Sev :   1   Max.   :3.000   NA's:690    Detchd :387  
##                   Typ :1360                               NA's   : 81  
##   GarageYrBlt   GarageFinish   GarageCars      GarageArea     GarageQual 
##  Min.   :1900   Fin :352     Min.   :0.000   Min.   :   0.0   Ex  :   3  
##  1st Qu.:1961   RFn :422     1st Qu.:1.000   1st Qu.: 334.5   Fa  :  48  
##  Median :1980   Unf :605     Median :2.000   Median : 480.0   Gd  :  14  
##  Mean   :1979   NA's: 81     Mean   :1.767   Mean   : 473.0   Po  :   3  
##  3rd Qu.:2002                3rd Qu.:2.000   3rd Qu.: 576.0   TA  :1311  
##  Max.   :2010                Max.   :4.000   Max.   :1418.0   NA's:  81  
##  NA's   :81                                                              
##  GarageCond  PavedDrive   WoodDeckSF      OpenPorchSF     EnclosedPorch   
##  Ex  :   2   N:  90     Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  Fa  :  35   P:  30     1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00  
##  Gd  :   9   Y:1340     Median :  0.00   Median : 25.00   Median :  0.00  
##  Po  :   7              Mean   : 94.24   Mean   : 46.66   Mean   : 21.95  
##  TA  :1326              3rd Qu.:168.00   3rd Qu.: 68.00   3rd Qu.:  0.00  
##  NA's:  81              Max.   :857.00   Max.   :547.00   Max.   :552.00  
##                                                                           
##    X3SsnPorch      ScreenPorch        PoolArea        PoolQC    
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Ex  :   2  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000   Fa  :   2  
##  Median :  0.00   Median :  0.00   Median :  0.000   Gd  :   3  
##  Mean   :  3.41   Mean   : 15.06   Mean   :  2.759   NA's:1453  
##  3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000              
##  Max.   :508.00   Max.   :480.00   Max.   :738.000              
##                                                                 
##    Fence      MiscFeature    MiscVal             MoSold      
##  GdPrv:  59   Gar2:   2   Min.   :    0.00   Min.   : 1.000  
##  GdWo :  54   Othr:   2   1st Qu.:    0.00   1st Qu.: 5.000  
##  MnPrv: 157   Shed:  49   Median :    0.00   Median : 6.000  
##  MnWw :  11   TenC:   1   Mean   :   43.49   Mean   : 6.322  
##  NA's :1179   NA's:1406   3rd Qu.:    0.00   3rd Qu.: 8.000  
##                           Max.   :15500.00   Max.   :12.000  
##                                                              
##      YrSold        SaleType    SaleCondition    SalePrice     
##  Min.   :2006   WD     :1267   Abnorml: 101   Min.   : 34900  
##  1st Qu.:2007   New    : 122   AdjLand:   4   1st Qu.:129975  
##  Median :2008   COD    :  43   Alloca :  12   Median :163000  
##  Mean   :2008   ConLD  :   9   Family :  20   Mean   :180921  
##  3rd Qu.:2009   ConLI  :   5   Normal :1198   3rd Qu.:214000  
##  Max.   :2010   ConLw  :   5   Partial: 125   Max.   :755000  
##                 (Other):   9
  • SalePrice

It is our response variable. From the histogram, we can see that it is right skewed, with most houses sold below $200,000 and some between $200,000 and $400,000.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   34900  129975  163000  180921  214000  755000

  • BldgType

Most buildings are single-family detached homes (1,220 of 1,460); two-family conversions, duplexes, townhouse end units and townhouse inside units are comparatively rare.

##   1Fam 2fmCon Duplex  Twnhs TwnhsE 
##   1220     31     52     43    114

  • HouseStyle

Most houses are one-story (726) or two-story (445).

## 1.5Fin 1.5Unf 1Story 2.5Fin 2.5Unf 2Story SFoyer   SLvl 
##    154     14    726      8     11    445     37     65

  • OverallQual

Overall quality is mostly between average and good: ratings of 5, 6 and 7 account for 1,090 of the 1,460 houses.

##   1   2   3   4   5   6   7   8   9  10 
##   2   3  20 116 397 374 319 168  43  18

  • OverallCond

The overall condition of most houses is average: 821 of the 1,460 houses are rated 5.

##   1   2   3   4   5   6   7   8   9 
##   1   5  25  57 821 252 205  72  22

  • YearBuilt

Most of the houses were built after 1950: the first quartile of YearBuilt is 1954, so about 75% were built in 1954 or later.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1872    1954    1973    1971    2000    2010

3.3.2 Scatterplot matrix and correlation matrix

From the plots below, OverallQual is highly correlated with our dependent variable SalePrice, with a correlation coefficient of 0.79. The independent variables GrLivArea and FullBath are also correlated with SalePrice, with correlation coefficients of 0.71 and 0.56 respectively. These values make sense: a larger above-ground living area and more full bathrooms come with a bigger house, and the bigger the house, the higher the sale price.
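
A scatterplot matrix and a correlation matrix can both be produced with base R. The sketch below runs on simulated data (the data frame `toy` and its coefficients are made up for illustration, not taken from the Kaggle data), so the correlations it prints are not the ones quoted above.

```r
set.seed(1)   # hypothetical seed
n <- 200

# Simulated stand-in for three columns of the training data
toy <- data.frame(GrLivArea = round(runif(n, 600, 3000)))
toy$OverallQual <- pmin(10, pmax(1, round(toy$GrLivArea / 400 + rnorm(n))))
toy$SalePrice   <- 50 * toy$GrLivArea + 15000 * toy$OverallQual +
  rnorm(n, sd = 20000)

# Draw the scatterplot matrix only when a graphics device is available
if (interactive()) pairs(toy, main = "Scatterplot matrix (simulated data)")

cor_mat <- cor(toy)   # Pearson correlations between all pairs of columns
round(cor_mat, 2)
```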

3.3.3 Test the hypothesis

Limit to three quantitative variables. Test the hypothesis that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval.

  1. SalePrice vs GrLivArea

The p-value is nearly 0 and the 80% confidence interval does not include 0, so we reject the null hypothesis that the true correlation is zero.

## 
##  Pearson's product-moment correlation
## 
## data:  train$SalePrice and train$GrLivArea
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.6915087 0.7249450
## sample estimates:
##       cor 
## 0.7086245
  2. SalePrice vs OverallQual

The p-value is nearly 0 and the 80% confidence interval does not include 0, so we reject the null hypothesis that the true correlation is zero.

## 
##  Pearson's product-moment correlation
## 
## data:  train$SalePrice and train$OverallQual
## t = 49.364, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.7780752 0.8032204
## sample estimates:
##       cor 
## 0.7909816
  3. GrLivArea vs OverallQual

The p-value is nearly 0 and the 80% confidence interval does not include 0, so we reject the null hypothesis that the true correlation is zero.

## 
##  Pearson's product-moment correlation
## 
## data:  train$GrLivArea and train$OverallQual
## t = 28.121, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.5708061 0.6143422
## sample estimates:
##       cor 
## 0.5930074
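
Each of the three tests above has the form of a `cor.test` call with `conf.level = 0.80`. The sketch below shows the pattern on simulated data; the vectors `area` and `price` are made up for illustration and are not the Kaggle columns.

```r
set.seed(1)   # hypothetical seed
n <- 200
area  <- runif(n, 600, 3000)                 # simulated living area
price <- 100 * area + rnorm(n, sd = 40000)   # simulated sale price

# Test H0: true correlation = 0, and report an 80% confidence interval
res <- cor.test(price, area, conf.level = 0.80)

res$estimate   # sample correlation
res$conf.int   # 80% confidence interval
res$p.value    # p-value of the two-sided test
```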

Although these variables are highly correlated with each other, correlation does not imply causation. Still, it is commonly believed that larger, better-quality houses sell at higher prices, so we can expect the independent variables to help explain the response variable.

Familywise error rate (FWE or FWER) is the probability of coming to at least one false conclusion in a series of hypothesis tests, i.e. the probability of making at least one Type I error. It is also called alpha inflation or cumulative Type I error.

\(FWE \leq 1-(1-\alpha)^{c}\) where \(\alpha\) is the alpha level for an individual test (e.g. 0.05) and \(c\) is the number of comparisons/tests.

Thus, for this question, \(\alpha = 0.2\) for an 80% confidence interval and \(c=3\) for the three pairwise hypothesis tests.

\[FWE \leq 1-(1-0.2)^{3} = 0.488\]

This means we could have up to about a 48.8% chance of making at least one Type I error across the three hypothesis tests.

I would not be worried: all three p-values are below 2.2e-16, so the conclusions would not change even at a much stricter alpha. If needed, we can reduce the alpha, i.e. increase the confidence level, for all three tests to reduce the familywise error rate.
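
The arithmetic behind the 0.488 figure, together with a Bonferroni-style adjustment as one common way of lowering the per-test alpha, can be sketched as follows. The Bonferroni step is an illustration and is not part of the original analysis.

```r
alpha <- 0.20   # per-test alpha implied by an 80% confidence level
m     <- 3      # number of pairwise correlation tests

# Familywise error rate when each test is run at level alpha
fwer <- 1 - (1 - alpha)^m
fwer

# Bonferroni adjustment: run each test at alpha / m instead
alpha_adj <- alpha / m
level_adj <- 1 - alpha_adj   # matching per-test confidence level
c(alpha_adj = alpha_adj, level_adj = level_adj)
```

Since every reported p-value is below 2.2e-16, each null hypothesis would still be rejected at the adjusted alpha.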

3.4 Linear Algebra and Correlation

  1. Invert the correlation matrix from above. This is known as the precision matrix and contains variance inflation factors on the diagonal.
##             OverallQual  GrLivArea SalePrice
## OverallQual   2.6865350 -0.1753704 -2.000728
## GrLivArea    -0.1753704  2.0200794 -1.292763
## SalePrice    -2.0007280 -1.2927630  3.498623
  2. Multiply the correlation matrix by the precision matrix.
##             OverallQual GrLivArea SalePrice
## OverallQual           1         0         0
## GrLivArea             0         1         0
## SalePrice             0         0         1
  3. Multiply the precision matrix by the correlation matrix.
##             OverallQual GrLivArea SalePrice
## OverallQual           1         0         0
## GrLivArea             0         1         0
## SalePrice             0         0         1
  1. Conduct LU decomposition on the matrix.
## $L
##           [,1]      [,2] [,3]
## [1,] 1.0000000 0.0000000    0
## [2,] 0.5930074 1.0000000    0
## [3,] 0.7909816 0.3695063    1
## 
## $U
##      [,1]          [,2]      [,3]
## [1,]    1  5.930074e-01 0.7909816
## [2,]    0  6.483422e-01 0.2395665
## [3,]    0 -2.775558e-17 0.2858268

Check the result (a zero matrix indicates that the L and U factors reproduce the original matrix):

##             OverallQual GrLivArea SalePrice
## OverallQual           0         0         0
## GrLivArea             0         0         0
## SalePrice             0         0         0
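
The four steps above can be sketched in base R. The original `cor()` call is not shown in this section, so the correlation matrix is rebuilt here from the values printed in the L and U factors (an assumption), and `lu_doolittle` is a hypothetical helper written for illustration, not necessarily the function that produced the output.

```r
# Rebuild the 3x3 correlation matrix from values printed in the LU output
r12 <- 0.5930074                      # OverallQual vs GrLivArea (L[2,1])
r13 <- 0.7909816                      # OverallQual vs SalePrice (L[3,1])
r23 <- r12 * r13 + 0.2395665          # L[2,1] * U[1,3] + U[2,3]
vars <- c("OverallQual", "GrLivArea", "SalePrice")
cor_mat <- matrix(c(1,   r12, r13,
                    r12, 1,   r23,
                    r13, r23, 1),
                  nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Steps 1-3: precision matrix and both products (each should be the identity)
prec_mat <- solve(cor_mat)
round(cor_mat %*% prec_mat, 6)
round(prec_mat %*% cor_mat, 6)

# Step 4: Doolittle LU decomposition without pivoting (hypothetical helper)
lu_doolittle <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- unname(A)
  for (k in seq_len(n - 1)) {
    for (i in (k + 1):n) {
      L[i, k] <- U[i, k] / U[k, k]    # multiplier that zeroes U[i, k]
      U[i, ] <- U[i, ] - L[i, k] * U[k, ]
    }
  }
  list(L = L, U = U)
}

lu <- lu_doolittle(cor_mat)
round(lu$L %*% lu$U - cor_mat, 6)     # zero matrix if the factors are correct
```

Skipping pivoting is safe here only because a correlation matrix of non-collinear variables is positive definite, so no zero pivot can occur; a general-purpose routine would need row exchanges.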

3.5 Calculus-Based Probability & Statistics

To find a right-skewed variable in the training set, first restrict attention to variables that have no NA values and examine their skewness.

## [1] 12.18262
## [1] 1.682041
## [1] 4.246521
## [1] 0.9183784
## [1] 1.521124
## [1] 1.373929
## [1] 0.81136
## [1] 0.1796113
## [1] 1.53821
## [1] 2.359486
## [1] 3.083526
## [1] 10.28318
## [1] 4.113747
## [1] 14.79792
##            Id    MSSubClass      MSZoning   LotFrontage       LotArea 
##             0             0             0            NA             0 
##        Street         Alley      LotShape   LandContour     Utilities 
##             0            NA             0             0             0 
##     LotConfig     LandSlope  Neighborhood    Condition1    Condition2 
##             0             0             0             0             0 
##      BldgType    HouseStyle   OverallQual   OverallCond     YearBuilt 
##             0             0             0             0             0 
##  YearRemodAdd     RoofStyle      RoofMatl   Exterior1st   Exterior2nd 
##             0             0             0             0             0 
##    MasVnrType    MasVnrArea     ExterQual     ExterCond    Foundation 
##            NA            NA             0             0             0 
##      BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1    BsmtFinSF1 
##            NA            NA            NA            NA           467 
##  BsmtFinType2    BsmtFinSF2     BsmtUnfSF   TotalBsmtSF       Heating 
##            NA          1293           118            37             0 
##     HeatingQC    CentralAir    Electrical     X1stFlrSF     X2ndFlrSF 
##             0             0            NA             0           829 
##  LowQualFinSF     GrLivArea  BsmtFullBath  BsmtHalfBath      FullBath 
##          1434             0           856          1378             9 
##      HalfBath  BedroomAbvGr  KitchenAbvGr   KitchenQual  TotRmsAbvGrd 
##           913             6             1             0             0 
##    Functional    Fireplaces   FireplaceQu    GarageType   GarageYrBlt 
##             0           690            NA            NA            NA 
##  GarageFinish    GarageCars    GarageArea    GarageQual    GarageCond 
##            NA            81            81            NA            NA 
##    PavedDrive    WoodDeckSF   OpenPorchSF EnclosedPorch    X3SsnPorch 
##             0           761           656          1252          1436 
##   ScreenPorch      PoolArea        PoolQC         Fence   MiscFeature 
##          1344          1453            NA            NA            NA 
##       MiscVal        MoSold        YrSold      SaleType SaleCondition 
##          1408             0             0             0             0 
##     SalePrice 
##             0

In the training set, PoolArea has high skewness, but almost all of its values are 0 (1453 of them). The same is true of many other variables.

Therefore, I will pick a variable with fewer 0s and reasonable skewness: BsmtFinSF1. I add 1 to every data point so that the minimum value is strictly above zero.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0   383.5   443.6   712.2  5644.0

Next, find the 5th and 95th percentiles using the cumulative distribution function of the fitted exponential distribution.

## [1]   22.80704 1332.02158

The 5th percentile of the exponential distribution at the optimal lambda is 22.80704, and the 95th percentile is 1332.02158.
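
Inverting the exponential CDF \(F(x) = 1 - e^{-\lambda x}\) gives the percentile \(x_p = -\ln(1-p)/\lambda\). The sketch below applies that formula and compares it with `qexp`. It assumes \(\lambda\) is the reciprocal of the shifted sample mean, taken here as roughly 443.6 + 1 from the rounded summary above, so the results only approximate the printed output.

```r
shifted_mean <- 443.6 + 1         # rounded mean from the summary, plus the shift
lambda <- 1 / shifted_mean        # assumed rate estimate for the exponential fit

p <- c(0.05, 0.95)
by_hand <- -log(1 - p) / lambda   # inverse of F(x) = 1 - exp(-lambda * x)
built_in <- qexp(p, rate = lambda)

rbind(by_hand, built_in)
```

Both rows should be identical, and both should land close to the values in the output above, with small differences caused by the rounded mean.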

Also, generate a 95% confidence interval from the empirical data, assuming normality.

## [1] -449.2961 1338.5756

Assuming the data are normally distributed (which is clearly not the case), the 95% interval for the data is (-449.2961, 1338.5756). The negative lower bound is further evidence that a normal model fits poorly, since this variable cannot take negative values.

Finally, provide the empirical 5th percentile and 95th percentile of the data.

##   5%  95% 
##    1 1275

The empirical 5th and 95th percentiles are 1 and 1275, respectively. The 5th percentile is 1 because we added 1 to every data point so that the minimum value is strictly above zero. On the original scale, subtract 1 from both values, giving 0 and 1274.
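
The contrast between the normal-theory interval and the empirical percentiles can be illustrated on a small made-up vector. The values below are hypothetical, chosen only to be zero-heavy and right-skewed in the way BsmtFinSF1 is; they are not the training data.

```r
# Hypothetical right-skewed sample with several zeros, shifted by 1
x <- c(0, 0, 0, 0, 120, 250, 383, 500, 712, 900, 1275, 5644) + 1

# Range covering the central 95% of a normal distribution that has the
# sample mean and sample standard deviation of x
normal_interval <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x)

# Empirical 5th and 95th percentiles, which stay inside the observed range
empirical <- quantile(x, probs = c(0.05, 0.95))

normal_interval
empirical
```

Because the normal interval is symmetric around the mean, a long right tail pushes its lower end below zero, while the empirical percentiles respect the fact that the data are bounded below.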

Discuss

The variable BsmtFinSF1 contains many 0s in the training set (467 observations). These data points pile up at the low end of the distribution and contribute to its right-skewed shape. The zeros probably arise because some houses have no basement, or no finished basement, to be recorded in the dataset.

3.6 Modeling

Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.

1st model:

I created the 1st model by removing all categorical variables.

# 1st model: dropping categorical variables
train1 <- train %>% dplyr::select(-Id, -MSSubClass, -MSZoning, -Street, -Alley, -LotShape, -LandContour, -Utilities, -LotConfig, -LandSlope, -Neighborhood, -Condition1, -Condition2, -BldgType, -HouseStyle, -RoofStyle, -RoofMatl, -Exterior1st, -Exterior2nd, -MasVnrType, -ExterQual, -ExterCond, -Foundation, -BsmtQual, -BsmtCond, -BsmtExposure, -BsmtFinType1, -BsmtFinType2, -Heating, -HeatingQC, -CentralAir, -Electrical, -KitchenQual, -Functional, -FireplaceQu, -GarageType, -GarageFinish, -GarageQual, -GarageCond, -PavedDrive, -PoolQC, -Fence, -MiscFeature, -SaleType, -SaleCondition, -GarageYrBlt)

test1 <- test %>% dplyr::select(-Id, -MSSubClass, -MSZoning, -Street, -Alley, -LotShape, -LandContour, -Utilities, -LotConfig, -LandSlope, -Neighborhood, -Condition1, -Condition2, -BldgType, -HouseStyle, -RoofStyle, -RoofMatl, -Exterior1st, -Exterior2nd, -MasVnrType, -ExterQual, -ExterCond, -Foundation, -BsmtQual, -BsmtCond, -BsmtExposure, -BsmtFinType1, -BsmtFinType2, -Heating, -HeatingQC, -CentralAir, -Electrical, -KitchenQual, -Functional, -FireplaceQu, -GarageType, -GarageFinish, -GarageQual, -GarageCond, -PavedDrive, -PoolQC, -Fence, -MiscFeature, -SaleType, -SaleCondition, -GarageYrBlt)
test1$BsmtFinSF1[is.na(test1$BsmtFinSF1)] <- 0
test1$BsmtFinSF2[is.na(test1$BsmtFinSF2)] <- 0
test1$BsmtUnfSF[is.na(test1$BsmtUnfSF)] <- 0
test1$TotalBsmtSF[is.na(test1$TotalBsmtSF)] <- 0
test1$BsmtFullBath[is.na(test1$BsmtFullBath)] <- 0
test1$BsmtHalfBath[is.na(test1$BsmtHalfBath)] <- 0
test1$GarageCars[is.na(test1$GarageCars)] <- 0
test1$GarageArea[is.na(test1$GarageArea)] <- 0
## 
## Call:
## lm(formula = SalePrice ~ ., data = train1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -498568  -16541   -2102   13641  308685 
## 
## Coefficients: (2 not defined because of singularities)
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.617e+05  1.431e+06   0.113 0.910027    
## LotFrontage    5.319e+01  2.858e+01   1.861 0.062979 .  
## LotArea        4.523e-01  1.018e-01   4.444 9.53e-06 ***
## OverallQual    1.650e+04  1.199e+03  13.756  < 2e-16 ***
## OverallCond    4.792e+03  1.037e+03   4.621 4.16e-06 ***
## YearBuilt      3.150e+02  6.172e+01   5.104 3.78e-07 ***
## YearRemodAdd   1.616e+02  6.696e+01   2.414 0.015901 *  
## MasVnrArea     2.928e+01  6.009e+00   4.873 1.22e-06 ***
## BsmtFinSF1     2.016e+01  4.731e+00   4.261 2.17e-05 ***
## BsmtFinSF2     9.052e+00  7.151e+00   1.266 0.205801    
## BsmtUnfSF      1.089e+01  4.251e+00   2.562 0.010523 *  
## TotalBsmtSF           NA         NA      NA       NA    
## X1stFlrSF      4.920e+01  5.846e+00   8.417  < 2e-16 ***
## X2ndFlrSF      4.149e+01  4.919e+00   8.436  < 2e-16 ***
## LowQualFinSF   1.853e+01  1.991e+01   0.930 0.352294    
## GrLivArea             NA         NA      NA       NA    
## BsmtFullBath   7.924e+03  2.637e+03   3.005 0.002706 ** 
## BsmtHalfBath   4.056e+02  4.139e+03   0.098 0.921947    
## FullBath       3.128e+03  2.851e+03   1.097 0.272708    
## HalfBath      -1.417e+03  2.701e+03  -0.525 0.599930    
## BedroomAbvGr  -9.047e+03  1.703e+03  -5.312 1.26e-07 ***
## KitchenAbvGr  -2.435e+04  4.931e+03  -4.939 8.77e-07 ***
## TotRmsAbvGrd   5.838e+03  1.249e+03   4.674 3.23e-06 ***
## Fireplaces     3.665e+03  1.795e+03   2.042 0.041355 *  
## GarageCars     1.045e+04  2.894e+03   3.613 0.000313 ***
## GarageArea     2.527e+00  9.829e+00   0.257 0.797112    
## WoodDeckSF     2.618e+01  8.102e+00   3.232 0.001258 ** 
## OpenPorchSF    1.074e+00  1.537e+01   0.070 0.944302    
## EnclosedPorch  1.334e+01  1.709e+01   0.780 0.435294    
## X3SsnPorch     2.378e+01  3.179e+01   0.748 0.454577    
## ScreenPorch    5.475e+01  1.742e+01   3.142 0.001711 ** 
## PoolArea      -4.099e+01  2.401e+01  -1.707 0.088025 .  
## MiscVal       -1.873e-01  1.885e+00  -0.099 0.920862    
## MoSold         2.402e+01  3.494e+02   0.069 0.945196    
## YrSold        -5.818e+02  7.113e+02  -0.818 0.413563    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35240 on 1427 degrees of freedom
## Multiple R-squared:  0.8075, Adjusted R-squared:  0.8032 
## F-statistic: 187.1 on 32 and 1427 DF,  p-value: < 2.2e-16

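The R-squared of model 1 is 0.8075, and the adjusted R-squared is 0.8032. The coefficients for TotalBsmtSF and GrLivArea are reported as NA because of singularities, i.e. each of these predictors is an exact linear combination of other predictors in the model.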
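
The note "2 not defined because of singularities" in the summary means that two predictors are exact linear combinations of others, so `lm` cannot estimate separate coefficients for them. The toy data below are made up to reproduce that behaviour; `total` plays the role that TotalBsmtSF or GrLivArea appears to play in the real model.

```r
# Hypothetical data: `total` is exactly a + b, so it adds no new information
toy <- data.frame(
  a = c(1, 4, 2, 8, 5, 7, 3, 6),
  b = c(2, 1, 5, 3, 8, 4, 7, 6)
)
toy$total <- toy$a + toy$b
toy$y <- c(10, 14, 19, 29, 36, 31, 27, 33)

fit <- lm(y ~ a + b + total, data = toy)
coef(fit)             # the coefficient for `total` comes back as NA
alias(fit)$Complete   # expresses `total` in terms of the other predictors
```

Dropping the redundant column before fitting leaves the fitted values unchanged and removes the NA rows from the coefficient table.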

The residuals are randomly dispersed around y=0, with some outliers.

The QQ plot also shows some outliers at both ends of the graph.

The histogram of the residuals is unimodal and approximately normal.

2nd model:
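
The trace that follows has the layout printed by stepwise AIC selection in R. The call that produced it is not shown, so the use of base R's `step()` here is an assumption. As a hedged sketch, the example below runs the same kind of search on simulated data and checks the AIC definition used in such traces, \(n \ln(RSS/n) + 2p\), where \(p\) is the number of estimated coefficients.

```r
set.seed(1)
n <- 100
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + rnorm(n)       # simulated: only x1 truly drives y

full <- lm(y ~ x1 + x2 + x3, data = sim)
reduced <- step(full, direction = "both", trace = 0)  # trace = 0 hides the log
formula(reduced)

# AIC as reported per step: n * log(RSS / n) + 2 * (number of coefficients)
rss <- sum(residuals(full)^2)
p <- length(coef(full))
aic_by_hand <- n * log(rss / n) + 2 * p
c(by_hand = aic_by_hand, extractAIC = extractAIC(full)[2])
```

As a rough check against the trace, taking n = 1460 (1427 residual degrees of freedom plus 33 coefficients in the model 1 summary), RSS of about 1.7723e+12, and 33 coefficients gives a value of approximately 30605, in line with the printed starting AIC.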

## Start:  AIC=30604.98
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     GrLivArea + BsmtFullBath + BsmtHalfBath + FullBath + HalfBath + 
##     BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + GarageArea + WoodDeckSF + OpenPorchSF + EnclosedPorch + 
##     X3SsnPorch + ScreenPorch + PoolArea + MiscVal + MoSold + 
##     YrSold
## 
## 
## Step:  AIC=30604.98
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + TotalBsmtSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + 
##     BsmtFullBath + BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr + 
##     KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + 
##     WoodDeckSF + OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold
## 
## 
## Step:  AIC=30604.98
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + MoSold + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - MoSold         1 5.8708e+06 1.7723e+12 30603
## - OpenPorchSF    1 6.0643e+06 1.7723e+12 30603
## - BsmtHalfBath   1 1.1928e+07 1.7723e+12 30603
## - MiscVal        1 1.2263e+07 1.7723e+12 30603
## - GarageArea     1 8.2117e+07 1.7724e+12 30603
## - HalfBath       1 3.4182e+08 1.7726e+12 30603
## - X3SsnPorch     1 6.9491e+08 1.7730e+12 30604
## - EnclosedPorch  1 7.5636e+08 1.7731e+12 30604
## - YrSold         1 8.3079e+08 1.7731e+12 30604
## - LowQualFinSF   1 1.0752e+09 1.7734e+12 30604
## - FullBath       1 1.4953e+09 1.7738e+12 30604
## - BsmtFinSF2     1 1.9899e+09 1.7743e+12 30605
## <none>                        1.7723e+12 30605
## - PoolArea       1 3.6193e+09 1.7759e+12 30606
## - LotFrontage    1 4.3004e+09 1.7766e+12 30607
## - Fireplaces     1 5.1777e+09 1.7775e+12 30607
## - YearRemodAdd   1 7.2378e+09 1.7795e+12 30609
## - BsmtUnfSF      1 8.1492e+09 1.7804e+12 30610
## - BsmtFullBath   1 1.1212e+10 1.7835e+12 30612
## - ScreenPorch    1 1.2263e+10 1.7846e+12 30613
## - WoodDeckSF     1 1.2972e+10 1.7853e+12 30614
## - GarageCars     1 1.6210e+10 1.7885e+12 30616
## - BsmtFinSF1     1 2.2551e+10 1.7948e+12 30621
## - LotArea        1 2.4522e+10 1.7968e+12 30623
## - OverallCond    1 2.6522e+10 1.7988e+12 30625
## - TotRmsAbvGrd   1 2.7132e+10 1.7994e+12 30625
## - MasVnrArea     1 2.9489e+10 1.8018e+12 30627
## - KitchenAbvGr   1 3.0299e+10 1.8026e+12 30628
## - YearBuilt      1 3.2352e+10 1.8047e+12 30629
## - BedroomAbvGr   1 3.5047e+10 1.8073e+12 30632
## - X1stFlrSF      1 8.7983e+10 1.8603e+12 30674
## - X2ndFlrSF      1 8.8390e+10 1.8607e+12 30674
## - OverallQual    1 2.3503e+11 2.0073e+12 30785
## 
## Step:  AIC=30602.98
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     OpenPorchSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + MiscVal + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - OpenPorchSF    1 6.7525e+06 1.7723e+12 30601
## - MiscVal        1 1.2276e+07 1.7723e+12 30601
## - BsmtHalfBath   1 1.2428e+07 1.7723e+12 30601
## - GarageArea     1 8.1743e+07 1.7724e+12 30601
## - HalfBath       1 3.4577e+08 1.7727e+12 30601
## - X3SsnPorch     1 6.9988e+08 1.7730e+12 30602
## - EnclosedPorch  1 7.5278e+08 1.7731e+12 30602
## - YrSold         1 8.6929e+08 1.7732e+12 30602
## - LowQualFinSF   1 1.0716e+09 1.7734e+12 30602
## - FullBath       1 1.4946e+09 1.7738e+12 30602
## - BsmtFinSF2     1 1.9862e+09 1.7743e+12 30603
## <none>                        1.7723e+12 30603
## - PoolArea       1 3.6432e+09 1.7759e+12 30604
## - LotFrontage    1 4.3061e+09 1.7766e+12 30605
## + MoSold         1 5.8708e+06 1.7723e+12 30605
## - Fireplaces     1 5.1911e+09 1.7775e+12 30605
## - YearRemodAdd   1 7.2388e+09 1.7795e+12 30607
## - BsmtUnfSF      1 8.1433e+09 1.7804e+12 30608
## - BsmtFullBath   1 1.1213e+10 1.7835e+12 30610
## - ScreenPorch    1 1.2278e+10 1.7846e+12 30611
## - WoodDeckSF     1 1.2996e+10 1.7853e+12 30612
## - GarageCars     1 1.6217e+10 1.7885e+12 30614
## - BsmtFinSF1     1 2.2550e+10 1.7949e+12 30619
## - LotArea        1 2.4517e+10 1.7968e+12 30621
## - OverallCond    1 2.6517e+10 1.7988e+12 30623
## - TotRmsAbvGrd   1 2.7154e+10 1.7995e+12 30623
## - MasVnrArea     1 2.9496e+10 1.8018e+12 30625
## - KitchenAbvGr   1 3.0327e+10 1.8026e+12 30626
## - YearBuilt      1 3.2351e+10 1.8047e+12 30627
## - BedroomAbvGr   1 3.5081e+10 1.8074e+12 30630
## - X1stFlrSF      1 8.8045e+10 1.8604e+12 30672
## - X2ndFlrSF      1 8.8457e+10 1.8608e+12 30672
## - OverallQual    1 2.3637e+11 2.0087e+12 30784
## 
## Step:  AIC=30600.99
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     BsmtHalfBath + FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + GarageArea + WoodDeckSF + 
##     EnclosedPorch + X3SsnPorch + ScreenPorch + PoolArea + MiscVal + 
##     YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - BsmtHalfBath   1 1.2451e+07 1.7723e+12 30599
## - MiscVal        1 1.2546e+07 1.7723e+12 30599
## - GarageArea     1 8.6407e+07 1.7724e+12 30599
## - HalfBath       1 3.4012e+08 1.7727e+12 30599
## - X3SsnPorch     1 6.9681e+08 1.7730e+12 30600
## - EnclosedPorch  1 7.4630e+08 1.7731e+12 30600
## - YrSold         1 8.8057e+08 1.7732e+12 30600
## - LowQualFinSF   1 1.0741e+09 1.7734e+12 30600
## - FullBath       1 1.5163e+09 1.7738e+12 30600
## - BsmtFinSF2     1 2.0023e+09 1.7743e+12 30601
## <none>                        1.7723e+12 30601
## - PoolArea       1 3.6411e+09 1.7760e+12 30602
## - LotFrontage    1 4.3016e+09 1.7766e+12 30603
## + OpenPorchSF    1 6.7525e+06 1.7723e+12 30603
## + MoSold         1 6.5590e+06 1.7723e+12 30603
## - Fireplaces     1 5.1894e+09 1.7775e+12 30603
## - YearRemodAdd   1 7.2834e+09 1.7796e+12 30605
## - BsmtUnfSF      1 8.2551e+09 1.7806e+12 30606
## - BsmtFullBath   1 1.1258e+10 1.7836e+12 30608
## - ScreenPorch    1 1.2305e+10 1.7846e+12 30609
## - WoodDeckSF     1 1.3017e+10 1.7853e+12 30610
## - GarageCars     1 1.6262e+10 1.7886e+12 30612
## - BsmtFinSF1     1 2.2672e+10 1.7950e+12 30618
## - LotArea        1 2.4519e+10 1.7968e+12 30619
## - OverallCond    1 2.6517e+10 1.7988e+12 30621
## - TotRmsAbvGrd   1 2.7148e+10 1.7995e+12 30621
## - MasVnrArea     1 2.9541e+10 1.8019e+12 30623
## - KitchenAbvGr   1 3.0465e+10 1.8028e+12 30624
## - YearBuilt      1 3.2356e+10 1.8047e+12 30625
## - BedroomAbvGr   1 3.5173e+10 1.8075e+12 30628
## - X1stFlrSF      1 8.8301e+10 1.8606e+12 30670
## - X2ndFlrSF      1 8.9277e+10 1.8616e+12 30671
## - OverallQual    1 2.3682e+11 2.0091e+12 30782
## 
## Step:  AIC=30599
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + EnclosedPorch + 
##     X3SsnPorch + ScreenPorch + PoolArea + MiscVal + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - MiscVal        1 1.3300e+07 1.7723e+12 30597
## - GarageArea     1 8.4752e+07 1.7724e+12 30597
## - HalfBath       1 3.4363e+08 1.7727e+12 30597
## - X3SsnPorch     1 7.0512e+08 1.7730e+12 30598
## - EnclosedPorch  1 7.4911e+08 1.7731e+12 30598
## - YrSold         1 8.9026e+08 1.7732e+12 30598
## - LowQualFinSF   1 1.0749e+09 1.7734e+12 30598
## - FullBath       1 1.5047e+09 1.7738e+12 30598
## - BsmtFinSF2     1 2.0608e+09 1.7744e+12 30599
## <none>                        1.7723e+12 30599
## - PoolArea       1 3.6402e+09 1.7760e+12 30600
## - LotFrontage    1 4.2920e+09 1.7766e+12 30601
## + BsmtHalfBath   1 1.2451e+07 1.7723e+12 30601
## + MoSold         1 7.0879e+06 1.7723e+12 30601
## + OpenPorchSF    1 6.7752e+06 1.7723e+12 30601
## - Fireplaces     1 5.2060e+09 1.7775e+12 30601
## - YearRemodAdd   1 7.3280e+09 1.7797e+12 30603
## - BsmtUnfSF      1 8.2431e+09 1.7806e+12 30604
## - BsmtFullBath   1 1.2070e+10 1.7844e+12 30607
## - ScreenPorch    1 1.2321e+10 1.7846e+12 30607
## - WoodDeckSF     1 1.3086e+10 1.7854e+12 30608
## - GarageCars     1 1.6311e+10 1.7886e+12 30610
## - BsmtFinSF1     1 2.3280e+10 1.7956e+12 30616
## - LotArea        1 2.4599e+10 1.7969e+12 30617
## - OverallCond    1 2.6732e+10 1.7991e+12 30619
## - TotRmsAbvGrd   1 2.7136e+10 1.7995e+12 30619
## - MasVnrArea     1 2.9606e+10 1.8019e+12 30621
## - KitchenAbvGr   1 3.0454e+10 1.8028e+12 30622
## - YearBuilt      1 3.2400e+10 1.8047e+12 30623
## - BedroomAbvGr   1 3.5350e+10 1.8077e+12 30626
## - X1stFlrSF      1 8.8296e+10 1.8606e+12 30668
## - X2ndFlrSF      1 8.9270e+10 1.8616e+12 30669
## - OverallQual    1 2.3684e+11 2.0092e+12 30780
## 
## Step:  AIC=30597.01
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + GarageArea + WoodDeckSF + EnclosedPorch + 
##     X3SsnPorch + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - GarageArea     1 8.3027e+07 1.7724e+12 30595
## - HalfBath       1 3.4274e+08 1.7727e+12 30595
## - X3SsnPorch     1 7.0411e+08 1.7730e+12 30596
## - EnclosedPorch  1 7.4450e+08 1.7731e+12 30596
## - YrSold         1 8.9086e+08 1.7732e+12 30596
## - LowQualFinSF   1 1.0775e+09 1.7734e+12 30596
## - FullBath       1 1.5088e+09 1.7738e+12 30596
## - BsmtFinSF2     1 2.0559e+09 1.7744e+12 30597
## <none>                        1.7723e+12 30597
## - PoolArea       1 3.6578e+09 1.7760e+12 30598
## - LotFrontage    1 4.3400e+09 1.7767e+12 30599
## + MiscVal        1 1.3300e+07 1.7723e+12 30599
## + BsmtHalfBath   1 1.3205e+07 1.7723e+12 30599
## + MoSold         1 7.1344e+06 1.7723e+12 30599
## + OpenPorchSF    1 7.0546e+06 1.7723e+12 30599
## - Fireplaces     1 5.2028e+09 1.7775e+12 30599
## - YearRemodAdd   1 7.3278e+09 1.7797e+12 30601
## - BsmtUnfSF      1 8.2335e+09 1.7806e+12 30602
## - BsmtFullBath   1 1.2127e+10 1.7845e+12 30605
## - ScreenPorch    1 1.2307e+10 1.7846e+12 30605
## - WoodDeckSF     1 1.3091e+10 1.7854e+12 30606
## - GarageCars     1 1.6373e+10 1.7887e+12 30608
## - BsmtFinSF1     1 2.3271e+10 1.7956e+12 30614
## - LotArea        1 2.4602e+10 1.7969e+12 30615
## - OverallCond    1 2.6761e+10 1.7991e+12 30617
## - TotRmsAbvGrd   1 2.7147e+10 1.7995e+12 30617
## - MasVnrArea     1 2.9644e+10 1.8020e+12 30619
## - KitchenAbvGr   1 3.0663e+10 1.8030e+12 30620
## - YearBuilt      1 3.2393e+10 1.8047e+12 30622
## - BedroomAbvGr   1 3.5340e+10 1.8077e+12 30624
## - X1stFlrSF      1 8.8501e+10 1.8608e+12 30666
## - X2ndFlrSF      1 8.9308e+10 1.8616e+12 30667
## - OverallQual    1 2.3691e+11 2.0092e+12 30778
## 
## Step:  AIC=30595.08
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + HalfBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + 
##     Fireplaces + GarageCars + WoodDeckSF + EnclosedPorch + X3SsnPorch + 
##     ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - HalfBath       1 3.6575e+08 1.7728e+12 30593
## - X3SsnPorch     1 7.0155e+08 1.7731e+12 30594
## - EnclosedPorch  1 7.5684e+08 1.7732e+12 30594
## - YrSold         1 8.8200e+08 1.7733e+12 30594
## - LowQualFinSF   1 1.1015e+09 1.7735e+12 30594
## - FullBath       1 1.4616e+09 1.7739e+12 30594
## - BsmtFinSF2     1 2.0774e+09 1.7745e+12 30595
## <none>                        1.7724e+12 30595
## - PoolArea       1 3.6306e+09 1.7761e+12 30596
## - LotFrontage    1 4.4372e+09 1.7769e+12 30597
## + GarageArea     1 8.3027e+07 1.7723e+12 30597
## + OpenPorchSF    1 1.1678e+07 1.7724e+12 30597
## + MiscVal        1 1.1575e+07 1.7724e+12 30597
## + BsmtHalfBath   1 1.1466e+07 1.7724e+12 30597
## + MoSold         1 6.8991e+06 1.7724e+12 30597
## - Fireplaces     1 5.1203e+09 1.7775e+12 30597
## - YearRemodAdd   1 7.2728e+09 1.7797e+12 30599
## - BsmtUnfSF      1 8.3063e+09 1.7807e+12 30600
## - BsmtFullBath   1 1.2149e+10 1.7846e+12 30603
## - ScreenPorch    1 1.2304e+10 1.7847e+12 30603
## - WoodDeckSF     1 1.3092e+10 1.7855e+12 30604
## - BsmtFinSF1     1 2.3623e+10 1.7960e+12 30612
## - LotArea        1 2.4683e+10 1.7971e+12 30613
## - OverallCond    1 2.7012e+10 1.7994e+12 30615
## - TotRmsAbvGrd   1 2.7094e+10 1.7995e+12 30615
## - MasVnrArea     1 2.9856e+10 1.8023e+12 30618
## - KitchenAbvGr   1 3.0843e+10 1.8033e+12 30618
## - YearBuilt      1 3.2650e+10 1.8051e+12 30620
## - BedroomAbvGr   1 3.5599e+10 1.8080e+12 30622
## - GarageCars     1 5.1346e+10 1.8238e+12 30635
## - X1stFlrSF      1 9.0928e+10 1.8633e+12 30666
## - X2ndFlrSF      1 9.1217e+10 1.8636e+12 30666
## - OverallQual    1 2.3683e+11 2.0092e+12 30776
## 
## Step:  AIC=30593.38
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + WoodDeckSF + EnclosedPorch + X3SsnPorch + ScreenPorch + 
##     PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - X3SsnPorch     1 6.8111e+08 1.7735e+12 30592
## - EnclosedPorch  1 7.8997e+08 1.7736e+12 30592
## - YrSold         1 8.9849e+08 1.7737e+12 30592
## - LowQualFinSF   1 1.0980e+09 1.7739e+12 30592
## - BsmtFinSF2     1 2.0078e+09 1.7748e+12 30593
## - FullBath       1 2.3840e+09 1.7752e+12 30593
## <none>                        1.7728e+12 30593
## - PoolArea       1 3.5344e+09 1.7763e+12 30594
## + HalfBath       1 3.6575e+08 1.7724e+12 30595
## - LotFrontage    1 4.5395e+09 1.7773e+12 30595
## + GarageArea     1 1.0604e+08 1.7727e+12 30595
## + BsmtHalfBath   1 1.4680e+07 1.7728e+12 30595
## + MoSold         1 1.0763e+07 1.7728e+12 30595
## + MiscVal        1 1.0508e+07 1.7728e+12 30595
## + OpenPorchSF    1 3.6677e+06 1.7728e+12 30595
## - Fireplaces     1 4.9296e+09 1.7777e+12 30595
## - YearRemodAdd   1 7.2378e+09 1.7800e+12 30597
## - BsmtUnfSF      1 8.3074e+09 1.7811e+12 30598
## - ScreenPorch    1 1.2114e+10 1.7849e+12 30601
## - BsmtFullBath   1 1.2391e+10 1.7852e+12 30602
## - WoodDeckSF     1 1.3084e+10 1.7859e+12 30602
## - BsmtFinSF1     1 2.3521e+10 1.7963e+12 30611
## - LotArea        1 2.4822e+10 1.7976e+12 30612
## - TotRmsAbvGrd   1 2.6941e+10 1.7997e+12 30613
## - OverallCond    1 2.7073e+10 1.7999e+12 30614
## - MasVnrArea     1 2.9725e+10 1.8025e+12 30616
## - KitchenAbvGr   1 3.0965e+10 1.8038e+12 30617
## - YearBuilt      1 3.3491e+10 1.8063e+12 30619
## - BedroomAbvGr   1 3.5505e+10 1.8083e+12 30620
## - GarageCars     1 5.1180e+10 1.8240e+12 30633
## - X1stFlrSF      1 9.0697e+10 1.8635e+12 30664
## - X2ndFlrSF      1 1.1235e+11 1.8851e+12 30681
## - OverallQual    1 2.3743e+11 2.0102e+12 30775
## 
## Step:  AIC=30591.94
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + WoodDeckSF + EnclosedPorch + ScreenPorch + PoolArea + 
##     YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - EnclosedPorch  1 7.4067e+08 1.7742e+12 30591
## - YrSold         1 8.6736e+08 1.7743e+12 30591
## - LowQualFinSF   1 1.1106e+09 1.7746e+12 30591
## - BsmtFinSF2     1 1.9338e+09 1.7754e+12 30592
## <none>                        1.7735e+12 30592
## - FullBath       1 2.4463e+09 1.7759e+12 30592
## - PoolArea       1 3.5630e+09 1.7770e+12 30593
## + X3SsnPorch     1 6.8111e+08 1.7728e+12 30593
## + HalfBath       1 3.4531e+08 1.7731e+12 30594
## - LotFrontage    1 4.5872e+09 1.7781e+12 30594
## + GarageArea     1 1.0250e+08 1.7734e+12 30594
## + BsmtHalfBath   1 2.3307e+07 1.7734e+12 30594
## + MoSold         1 1.6912e+07 1.7735e+12 30594
## + MiscVal        1 9.6788e+06 1.7735e+12 30594
## + OpenPorchSF    1 1.6283e+06 1.7735e+12 30594
## - Fireplaces     1 4.9102e+09 1.7784e+12 30594
## - YearRemodAdd   1 7.2740e+09 1.7807e+12 30596
## - BsmtUnfSF      1 8.2272e+09 1.7817e+12 30597
## - ScreenPorch    1 1.1889e+10 1.7854e+12 30600
## - BsmtFullBath   1 1.2352e+10 1.7858e+12 30600
## - WoodDeckSF     1 1.2776e+10 1.7862e+12 30600
## - BsmtFinSF1     1 2.3418e+10 1.7969e+12 30609
## - LotArea        1 2.4934e+10 1.7984e+12 30610
## - TotRmsAbvGrd   1 2.6738e+10 1.8002e+12 30612
## - OverallCond    1 2.7424e+10 1.8009e+12 30612
## - MasVnrArea     1 2.9731e+10 1.8032e+12 30614
## - KitchenAbvGr   1 3.1236e+10 1.8047e+12 30615
## - YearBuilt      1 3.3503e+10 1.8070e+12 30617
## - BedroomAbvGr   1 3.5666e+10 1.8091e+12 30619
## - GarageCars     1 5.1331e+10 1.8248e+12 30632
## - X1stFlrSF      1 9.1812e+10 1.8653e+12 30664
## - X2ndFlrSF      1 1.1272e+11 1.8862e+12 30680
## - OverallQual    1 2.3711e+11 2.0106e+12 30773
## 
## Step:  AIC=30590.55
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + WoodDeckSF + ScreenPorch + PoolArea + YrSold
## 
##                 Df  Sum of Sq        RSS   AIC
## - YrSold         1 8.7772e+08 1.7751e+12 30589
## - LowQualFinSF   1 1.0525e+09 1.7753e+12 30589
## - BsmtFinSF2     1 2.0345e+09 1.7762e+12 30590
## <none>                        1.7742e+12 30591
## - FullBath       1 2.5220e+09 1.7767e+12 30591
## - PoolArea       1 3.3994e+09 1.7776e+12 30591
## + EnclosedPorch  1 7.4067e+08 1.7735e+12 30592
## + X3SsnPorch     1 6.3182e+08 1.7736e+12 30592
## + HalfBath       1 3.7723e+08 1.7738e+12 30592
## - LotFrontage    1 4.6138e+09 1.7788e+12 30592
## + GarageArea     1 1.1724e+08 1.7741e+12 30593
## + BsmtHalfBath   1 2.6501e+07 1.7742e+12 30593
## + MoSold         1 9.5962e+06 1.7742e+12 30593
## + MiscVal        1 5.7320e+06 1.7742e+12 30593
## + OpenPorchSF    1 5.9028e+05 1.7742e+12 30593
## - Fireplaces     1 4.9216e+09 1.7791e+12 30593
## - YearRemodAdd   1 7.4372e+09 1.7816e+12 30595
## - BsmtUnfSF      1 8.3471e+09 1.7826e+12 30595
## - ScreenPorch    1 1.1297e+10 1.7855e+12 30598
## - WoodDeckSF     1 1.2373e+10 1.7866e+12 30599
## - BsmtFullBath   1 1.2706e+10 1.7869e+12 30599
## - BsmtFinSF1     1 2.3388e+10 1.7976e+12 30608
## - LotArea        1 2.4656e+10 1.7989e+12 30609
## - TotRmsAbvGrd   1 2.6364e+10 1.8006e+12 30610
## - OverallCond    1 2.6761e+10 1.8010e+12 30610
## - MasVnrArea     1 2.9539e+10 1.8037e+12 30613
## - KitchenAbvGr   1 3.1682e+10 1.8059e+12 30614
## - YearBuilt      1 3.4276e+10 1.8085e+12 30617
## - BedroomAbvGr   1 3.5592e+10 1.8098e+12 30618
## - GarageCars     1 5.1675e+10 1.8259e+12 30631
## - X1stFlrSF      1 9.2145e+10 1.8664e+12 30663
## - X2ndFlrSF      1 1.1396e+11 1.8882e+12 30679
## - OverallQual    1 2.4202e+11 2.0162e+12 30775
## 
## Step:  AIC=30589.27
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + LowQualFinSF + BsmtFullBath + 
##     FullBath + BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + WoodDeckSF + ScreenPorch + PoolArea
## 
##                 Df  Sum of Sq        RSS   AIC
## - LowQualFinSF   1 1.1002e+09 1.7762e+12 30588
## - BsmtFinSF2     1 1.9971e+09 1.7771e+12 30589
## <none>                        1.7751e+12 30589
## - FullBath       1 2.4879e+09 1.7776e+12 30589
## - PoolArea       1 3.2131e+09 1.7783e+12 30590
## + YrSold         1 8.7772e+08 1.7742e+12 30591
## + EnclosedPorch  1 7.5103e+08 1.7743e+12 30591
## + X3SsnPorch     1 6.0137e+08 1.7745e+12 30591
## + HalfBath       1 3.9449e+08 1.7747e+12 30591
## - LotFrontage    1 4.5772e+09 1.7797e+12 30591
## + GarageArea     1 1.0745e+08 1.7750e+12 30591
## + MoSold         1 5.5582e+07 1.7750e+12 30591
## + BsmtHalfBath   1 3.9915e+07 1.7750e+12 30591
## + MiscVal        1 6.1631e+06 1.7751e+12 30591
## + OpenPorchSF    1 6.3444e+05 1.7751e+12 30591
## - Fireplaces     1 4.9470e+09 1.7800e+12 30591
## - YearRemodAdd   1 7.2099e+09 1.7823e+12 30593
## - BsmtUnfSF      1 8.3748e+09 1.7835e+12 30594
## - ScreenPorch    1 1.1162e+10 1.7862e+12 30596
## - WoodDeckSF     1 1.2196e+10 1.7873e+12 30597
## - BsmtFullBath   1 1.2351e+10 1.7874e+12 30597
## - BsmtFinSF1     1 2.3519e+10 1.7986e+12 30607
## - LotArea        1 2.4770e+10 1.7999e+12 30608
## - TotRmsAbvGrd   1 2.6471e+10 1.8016e+12 30609
## - OverallCond    1 2.6563e+10 1.8016e+12 30609
## - MasVnrArea     1 2.9383e+10 1.8045e+12 30611
## - KitchenAbvGr   1 3.2131e+10 1.8072e+12 30614
## - YearBuilt      1 3.4467e+10 1.8096e+12 30615
## - BedroomAbvGr   1 3.5476e+10 1.8106e+12 30616
## - GarageCars     1 5.2248e+10 1.8273e+12 30630
## - X1stFlrSF      1 9.2097e+10 1.8672e+12 30661
## - X2ndFlrSF      1 1.1394e+11 1.8890e+12 30678
## - OverallQual    1 2.4236e+11 2.0174e+12 30774
## 
## Step:  AIC=30588.17
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtFinSF2 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + 
##     BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces + 
##     GarageCars + WoodDeckSF + ScreenPorch + PoolArea
## 
##                 Df  Sum of Sq        RSS   AIC
## - BsmtFinSF2     1 2.0445e+09 1.7782e+12 30588
## <none>                        1.7762e+12 30588
## - FullBath       1 2.6423e+09 1.7788e+12 30588
## - PoolArea       1 2.9956e+09 1.7792e+12 30589
## + LowQualFinSF   1 1.1002e+09 1.7751e+12 30589
## + GrLivArea      1 1.1002e+09 1.7751e+12 30589
## + YrSold         1 9.2539e+08 1.7753e+12 30589
## + EnclosedPorch  1 6.9156e+08 1.7755e+12 30590
## + X3SsnPorch     1 6.1423e+08 1.7756e+12 30590
## + HalfBath       1 3.8980e+08 1.7758e+12 30590
## - LotFrontage    1 4.6481e+09 1.7808e+12 30590
## + GarageArea     1 1.3326e+08 1.7761e+12 30590
## - Fireplaces     1 4.7892e+09 1.7810e+12 30590
## + MoSold         1 4.5034e+07 1.7761e+12 30590
## + BsmtHalfBath   1 4.1396e+07 1.7761e+12 30590
## + MiscVal        1 8.0154e+06 1.7762e+12 30590
## + OpenPorchSF    1 2.1766e+06 1.7762e+12 30590
## - YearRemodAdd   1 7.4504e+09 1.7836e+12 30592
## - BsmtUnfSF      1 8.4443e+09 1.7846e+12 30593
## - ScreenPorch    1 1.1308e+10 1.7875e+12 30595
## - WoodDeckSF     1 1.2252e+10 1.7884e+12 30596
## - BsmtFullBath   1 1.2430e+10 1.7886e+12 30596
## - BsmtFinSF1     1 2.3612e+10 1.7998e+12 30606
## - LotArea        1 2.4725e+10 1.8009e+12 30606
## - OverallCond    1 2.5897e+10 1.8021e+12 30607
## - TotRmsAbvGrd   1 2.8727e+10 1.8049e+12 30610
## - MasVnrArea     1 2.9054e+10 1.8052e+12 30610
## - YearBuilt      1 3.3369e+10 1.8096e+12 30613
## - KitchenAbvGr   1 3.3406e+10 1.8096e+12 30613
## - BedroomAbvGr   1 3.5675e+10 1.8119e+12 30615
## - GarageCars     1 5.1606e+10 1.8278e+12 30628
## - X1stFlrSF      1 9.1165e+10 1.8674e+12 30659
## - X2ndFlrSF      1 1.1285e+11 1.8890e+12 30676
## - OverallQual    1 2.4470e+11 2.0209e+12 30775
## 
## Step:  AIC=30587.85
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + FullBath + BedroomAbvGr + 
##     KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + 
##     ScreenPorch + PoolArea
## 
##                 Df  Sum of Sq        RSS   AIC
## - FullBath       1 2.4245e+09 1.7807e+12 30588
## <none>                        1.7782e+12 30588
## + BsmtFinSF2     1 2.0445e+09 1.7762e+12 30588
## + TotalBsmtSF    1 2.0445e+09 1.7762e+12 30588
## - PoolArea       1 2.8365e+09 1.7811e+12 30588
## + LowQualFinSF   1 1.1477e+09 1.7771e+12 30589
## + GrLivArea      1 1.1477e+09 1.7771e+12 30589
## + YrSold         1 8.8753e+08 1.7773e+12 30589
## + EnclosedPorch  1 7.8775e+08 1.7774e+12 30589
## + X3SsnPorch     1 5.4005e+08 1.7777e+12 30589
## + HalfBath       1 3.1990e+08 1.7779e+12 30590
## - Fireplaces     1 4.5743e+09 1.7828e+12 30590
## + GarageArea     1 1.5919e+08 1.7781e+12 30590
## + BsmtHalfBath   1 1.2425e+08 1.7781e+12 30590
## - LotFrontage    1 4.8147e+09 1.7830e+12 30590
## + MoSold         1 3.2650e+07 1.7782e+12 30590
## + OpenPorchSF    1 1.4392e+07 1.7782e+12 30590
## + MiscVal        1 4.1196e+06 1.7782e+12 30590
## - BsmtUnfSF      1 6.3998e+09 1.7846e+12 30591
## - YearRemodAdd   1 7.0167e+09 1.7852e+12 30592
## - ScreenPorch    1 1.2223e+10 1.7905e+12 30596
## - WoodDeckSF     1 1.2789e+10 1.7910e+12 30596
## - BsmtFullBath   1 1.5199e+10 1.7934e+12 30598
## - BsmtFinSF1     1 2.3214e+10 1.8014e+12 30605
## - LotArea        1 2.6093e+10 1.8043e+12 30607
## - OverallCond    1 2.6208e+10 1.8044e+12 30607
## - TotRmsAbvGrd   1 2.8059e+10 1.8063e+12 30609
## - MasVnrArea     1 2.8822e+10 1.8071e+12 30609
## - YearBuilt      1 3.4833e+10 1.8131e+12 30614
## - KitchenAbvGr   1 3.5145e+10 1.8134e+12 30614
## - BedroomAbvGr   1 3.5214e+10 1.8134e+12 30615
## - GarageCars     1 5.0929e+10 1.8292e+12 30627
## - X2ndFlrSF      1 1.1296e+11 1.8912e+12 30676
## - X1stFlrSF      1 1.1764e+11 1.8959e+12 30679
## - OverallQual    1 2.4878e+11 2.0270e+12 30777
## 
## Step:  AIC=30587.84
## SalePrice ~ LotFrontage + LotArea + OverallQual + OverallCond + 
##     YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + BsmtUnfSF + 
##     X1stFlrSF + X2ndFlrSF + BsmtFullBath + BedroomAbvGr + KitchenAbvGr + 
##     TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + ScreenPorch + 
##     PoolArea
## 
##                 Df  Sum of Sq        RSS   AIC
## <none>                        1.7807e+12 30588
## + FullBath       1 2.4245e+09 1.7782e+12 30588
## - PoolArea       1 2.9855e+09 1.7836e+12 30588
## + BsmtFinSF2     1 1.8267e+09 1.7788e+12 30588
## + TotalBsmtSF    1 1.8267e+09 1.7788e+12 30588
## + LowQualFinSF   1 1.2958e+09 1.7794e+12 30589
## + GrLivArea      1 1.2958e+09 1.7794e+12 30589
## + HalfBath       1 1.2236e+09 1.7794e+12 30589
## + YrSold         1 8.5867e+08 1.7798e+12 30589
## + EnclosedPorch  1 8.5439e+08 1.7798e+12 30589
## + X3SsnPorch     1 5.9863e+08 1.7801e+12 30589
## - Fireplaces     1 4.3912e+09 1.7850e+12 30589
## - LotFrontage    1 4.5169e+09 1.7852e+12 30590
## + GarageArea     1 9.2015e+07 1.7806e+12 30590
## + BsmtHalfBath   1 6.3457e+07 1.7806e+12 30590
## + MoSold         1 3.8328e+07 1.7806e+12 30590
## + OpenPorchSF    1 2.9157e+07 1.7806e+12 30590
## + MiscVal        1 7.3356e+06 1.7806e+12 30590
## - BsmtUnfSF      1 6.2109e+09 1.7869e+12 30591
## - YearRemodAdd   1 8.4608e+09 1.7891e+12 30593
## - ScreenPorch    1 1.1864e+10 1.7925e+12 30596
## - WoodDeckSF     1 1.2769e+10 1.7934e+12 30596
## - BsmtFullBath   1 1.3851e+10 1.7945e+12 30597
## - BsmtFinSF1     1 2.2575e+10 1.8032e+12 30604
## - OverallCond    1 2.5229e+10 1.8059e+12 30606
## - LotArea        1 2.6602e+10 1.8073e+12 30608
## - MasVnrArea     1 2.7902e+10 1.8086e+12 30609
## - TotRmsAbvGrd   1 2.8036e+10 1.8087e+12 30609
## - KitchenAbvGr   1 3.2925e+10 1.8136e+12 30613
## - BedroomAbvGr   1 3.3424e+10 1.8141e+12 30613
## - YearBuilt      1 4.1970e+10 1.8226e+12 30620
## - GarageCars     1 5.1817e+10 1.8325e+12 30628
## - X2ndFlrSF      1 1.3184e+11 1.9125e+12 30690
## - X1stFlrSF      1 1.3286e+11 1.9135e+12 30691
## - OverallQual    1 2.5618e+11 2.0368e+12 30782
## 
## Call:
## lm(formula = SalePrice ~ LotFrontage + LotArea + OverallQual + 
##     OverallCond + YearBuilt + YearRemodAdd + MasVnrArea + BsmtFinSF1 + 
##     BsmtUnfSF + X1stFlrSF + X2ndFlrSF + BsmtFullBath + BedroomAbvGr + 
##     KitchenAbvGr + TotRmsAbvGrd + Fireplaces + GarageCars + WoodDeckSF + 
##     ScreenPorch + PoolArea, data = train1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -500134  -16262   -1921   13800  305988 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.007e+06  1.172e+05  -8.590  < 2e-16 ***
## LotFrontage   5.412e+01  2.833e+01   1.911 0.056259 .  
## LotArea       4.676e-01  1.009e-01   4.637 3.86e-06 ***
## OverallQual   1.693e+04  1.176e+03  14.389  < 2e-16 ***
## OverallCond   4.586e+03  1.016e+03   4.515 6.84e-06 ***
## YearBuilt     3.058e+02  5.251e+01   5.824 7.08e-09 ***
## YearRemodAdd  1.715e+02  6.559e+01   2.615 0.009020 ** 
## MasVnrArea    2.828e+01  5.955e+00   4.748 2.25e-06 ***
## BsmtFinSF1    1.694e+01  3.965e+00   4.271 2.07e-05 ***
## BsmtUnfSF     8.228e+00  3.673e+00   2.240 0.025219 *  
## X1stFlrSF     5.378e+01  5.190e+00  10.362  < 2e-16 ***
## X2ndFlrSF     4.188e+01  4.057e+00  10.322  < 2e-16 ***
## BsmtFullBath  8.118e+03  2.426e+03   3.346 0.000842 ***
## BedroomAbvGr -8.686e+03  1.671e+03  -5.197 2.31e-07 ***
## KitchenAbvGr -2.456e+04  4.761e+03  -5.158 2.84e-07 ***
## TotRmsAbvGrd  5.832e+03  1.225e+03   4.760 2.13e-06 ***
## Fireplaces    3.333e+03  1.769e+03   1.884 0.059795 .  
## GarageCars    1.107e+04  1.710e+03   6.471 1.33e-10 ***
## WoodDeckSF    2.571e+01  8.002e+00   3.212 0.001346 ** 
## ScreenPorch   5.289e+01  1.708e+01   3.096 0.001997 ** 
## PoolArea     -3.688e+01  2.375e+01  -1.553 0.120577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35180 on 1439 degrees of freedom
## Multiple R-squared:  0.8066, Adjusted R-squared:  0.8039 
## F-statistic: 300.1 on 20 and 1439 DF,  p-value: < 2.2e-16

The backward stepwise function starts with my model1. At each step, it drops the variable whose removal lowers the AIC the most (rows marked "-" in the trace); rows marked "+" show variables that could be added back. The procedure stops when no single change lowers the AIC any further, here at AIC = 30587.84 with 20 predictors.
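The selection procedure described above can be sketched on a small built-in dataset. This is a minimal illustration, not the code used for this report: `mtcars`, `full_model`, and `step_model` are stand-ins chosen for the example, and `direction = "both"` is an assumption based on the "+" rows in the trace.

```r
# Illustrative only: mtcars stands in for the housing training data.
full_model <- lm(mpg ~ ., data = mtcars)

# direction = "both" lets step() drop (-) and re-add (+) terms at each step.
# trace = 0 suppresses the long per-step tables.
step_model <- step(full_model, direction = "both", trace = 0)

# The model that survives selection.
formula(step_model)

# step() ranks models with extractAIC(), which for lm is
# n * log(RSS / n) + 2 * (number of estimated coefficients).
n   <- nrow(mtcars)
rss <- sum(residuals(step_model)^2)
edf <- length(coef(step_model))
aic_by_hand <- n * log(rss / n) + 2 * edf

all.equal(aic_by_hand, extractAIC(step_model)[2])
```

Plugging the rounded values from the final step of the trace (RSS = 1.7807e+12, n = 1460, 21 coefficients including the intercept) into the same formula gives about 30588, in line with the reported AIC.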

The final model has a multiple R-squared of 0.8066 and an adjusted R-squared of 0.8039.
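As a quick consistency check, the adjusted R-squared can be recomputed from the values printed in the summary. The variable names below are hypothetical; the inputs are the rounded figures from the output (R-squared 0.8066, 20 predictors, 1439 residual degrees of freedom).

```r
# Values copied from the model summary above (rounded as printed).
r2       <- 0.8066
p        <- 20      # number of predictors (F-statistic numerator DF)
df_resid <- 1439    # residual degrees of freedom

# Number of observations implied by the summary: residual DF + predictors + intercept.
n <- df_resid + p + 1

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
round(adj_r2, 4)
```

The small gap between the two values suggests that few of the 20 retained predictors are pure noise.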

The residuals are randomly dispersed around y=0 with some outliers.

The QQ plot also shows some outliers at both ends of the graph.

The histogram is unimodal and fairly normal.
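The three diagnostic plots discussed above (residuals against fitted values, normal QQ plot, histogram of residuals) can be produced with base R along these lines. This is a sketch on `mtcars` with a hypothetical model `fit`, not the plotting code behind this report.

```r
# Illustrative model; in the report the fitted object would be the stepwise model.
fit <- lm(mpg ~ wt + hp, data = mtcars)
res <- residuals(fit)

# Three panels side by side.
old_par <- par(mfrow = c(1, 3))

# Residuals vs fitted: look for random scatter around the dashed zero line.
plot(fitted(fit), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# Normal QQ plot: points far from the line at either end indicate heavy tails or outliers.
qqnorm(res)
qqline(res)

# Histogram: a rough check for a unimodal, roughly symmetric shape.
hist(res, main = "Histogram of residuals", xlab = "Residuals")

# Restore the previous plotting layout.
par(old_par)
```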

Predict the SalePrice on the test dataset.

## Warning in predict.lm(model1, test1): prediction from a rank-deficient fit
## may be misleading
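This warning appears when at least one predictor in the fitted model is an exact linear combination of others, so its coefficient cannot be estimated. The stepwise trace is consistent with that: "+ LowQualFinSF" and "+ GrLivArea" have identical sums of squares, as do "+ BsmtFinSF2" and "+ TotalBsmtSF". The toy example below, with made-up columns `a`, `b`, and `total`, shows one way to spot such variables.

```r
set.seed(1)

# Toy data: 'total' is an exact sum of 'a' and 'b', so it carries no new information.
d <- data.frame(a = rnorm(50), b = rnorm(50))
d$total <- d$a + d$b
d$y <- 1 + 2 * d$a - d$b + rnorm(50)

fit <- lm(y ~ a + b + total, data = d)

# Coefficients that could not be estimated are reported as NA.
aliased <- names(which(is.na(coef(fit))))
aliased

# The rank of the fit is smaller than the number of coefficients requested.
fit$rank < length(coef(fit))

# Dropping the redundant column gives a full-rank model.
fit_clean <- lm(y ~ a + b, data = d)
anyNA(coef(fit_clean))
```

Predicting from the reduced model avoids the warning; the fitted values on the training data are unchanged because the dropped column added no information.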

Pred1 scored 0.44206 on Kaggle.

Pred2 scored 0.44345 on Kaggle.
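To interpret these scores, it helps to estimate the metric locally before submitting. The sketch below assumes the score is the root mean squared error between the logarithms of predicted and actual prices; the function name `rmsle` and the sample numbers are made up for illustration.

```r
# Root mean squared error on the log scale (assumed to match the leaderboard metric).
# Both vectors must be strictly positive; a linear model can produce negative
# predictions, which would yield NaN here.
rmsle <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((log(predicted) - log(actual))^2))
}

# Made-up prices for illustration.
actual    <- c(200000, 150000, 320000)
predicted <- c(210000, 140000, 300000)

rmsle(actual, predicted)
```

Because the error is measured on the log scale, a prediction that is off by a fixed percentage costs the same for a cheap house as for an expensive one.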

Kaggle

Kaggle Username: sinyingwong

Kaggle Screenshot

Future improvement

Improve the model by using kNN, decision trees, bootstrapping, bagging, or combinations of these algorithms.
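As one possible starting point, bagging can be sketched in base R: fit the same model on many bootstrap resamples of the training data and average the predictions. Everything here is illustrative; `mtcars`, the 24/8 split, and the linear base learner are assumptions chosen to keep the example dependency-free, and bagging usually pays off more with unstable learners such as decision trees.

```r
set.seed(42)

# Hold out 8 of the 32 rows as a stand-in test set.
train_idx <- sample(nrow(mtcars), 24)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

n_boot <- 200

# Each column of 'preds' holds the test-set predictions from one bootstrap fit.
preds <- replicate(n_boot, {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  fit  <- lm(mpg ~ wt + hp, data = boot)
  predict(fit, newdata = test)
})

# Bagged prediction: average over the bootstrap fits.
bagged <- rowMeans(preds)

# Root mean squared error of the bagged prediction on the held-out rows.
sqrt(mean((bagged - test$mpg)^2))
```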