HOUSE PRICES: ADVANCED REGRESSION TECHNIQUES

House Price Predictors

Source: Kaggle.com, House Prices: Advanced Regression Techniques competition. This playground competition’s dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.

The data set contains 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, and the challenge is to predict the final price of each home.

Load Required Libraries

library(ggplot2)
library(plyr)
library(tidyverse)
library(reactable)
library(scales)
library(summarytools)
library(plotly)
library(psych)
library(car)
library(corrr)
library(corrplot)
library(correlation)
library(Matrix)
library(moments)
library(MASS)
library(ggrepel)
library(caret)
library(Hmisc)
library(matlib)
library(graphics)
library(ggpubr)
library(leaps)

The data sets, train.csv and test.csv, were transferred to a personal GitHub repository, loaded with the read_csv function, and converted to data frames.
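
A minimal sketch of the loading step. The repository URL is not given here, so the sketch writes a two-row stand-in file (illustrative values only) to a temporary path and reads it back. Base read.csv() is swapped in for readr's read_csv() so the example runs without extra packages; df_demo and csv_path are hypothetical names.

```r
# Stand-in for train.csv: two illustrative rows with three of the column names.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Id,1stFlrSF,SalePrice",
             "1,856,208500",
             "2,1262,181500"), csv_path)

# read.csv() returns a data frame directly. With the default check.names = TRUE
# it rewrites "1stFlrSF" to the syntactic name "X1stFlrSF".
df_demo <- read.csv(csv_path, stringsAsFactors = FALSE)

class(df_demo)
colnames(df_demo)
```

The X prefix on names such as X1stFlrSF and X3SsnPorch in the column listing further down is the kind of renaming R applies when a column name starts with a digit.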

Descriptive and Inferential Statistics

1. UNIVARIATE DESCRIPTIVE Statistics and appropriate plots for the training data set.

The univariate descriptive analysis examines the distribution, central tendency, and variability of the house prices training data set, based on the following functions:

1. summary(): base (generic) function; provides the minimum, 1st quartile, median, mean, 3rd quartile, and maximum values.
2. descr(): summarytools package; calculates mean, sd, min, Q1, median, Q3, max, MAD, IQR, CV, skewness, SE.skewness, and kurtosis on numerical vectors. Some of these statistics are not available when using sampling weights.
3. freq(): summarytools package; displays weighted or unweighted frequencies, including NA counts and proportions.
4. describeBy(): psych package; reports basic summary statistics by a grouping variable.

Understanding the data

## Train has 1460 rows and 81 columns.
## Test has 1459 rows and 80  columns.
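
Counts like these can be produced with nrow() and ncol(). The sketch below is one way to do it; report_dim is a hypothetical helper, not a function from the original analysis, and toy is a made-up data frame.

```r
# Hypothetical helper: format the dimensions of a data frame as a sentence.
report_dim <- function(label, df) {
  sprintf("%s has %d rows and %d columns.", label, nrow(df), ncol(df))
}

toy <- data.frame(Id = 1:3, SalePrice = c(208500, 181500, 223500))
cat(report_dim("Toy", toy), "\n")
```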

Preview of the dataset and its structure:

library(reactable)
reactable(head(df_train), striped = TRUE, bordered = TRUE, wrap = FALSE) #first 6 observations

The dataset contains 1460 observations and 81 variables describing the dwelling involved in each house sale. It has two kinds of variables: numeric (many of them discrete) and character. The character variables, several of them ordinal, take a limited number of unique strings and can therefore be converted to factors.
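
A minimal sketch of the character-to-factor conversion on a toy data frame (made-up rows). The level order used for ExterQual (Po < Fa < TA < Gd < Ex) follows the rating codes in the competition's data description; treating it as an ordered factor is an assumption of this sketch, not a step taken in the analysis.

```r
toy <- data.frame(
  ExterQual = c("Gd", "TA", "Ex", "TA", "Fa"),
  LotArea   = c(8450, 9600, 11250, 9550, 14260),
  stringsAsFactors = FALSE
)

# Convert every character column to an (unordered) factor.
is_chr <- vapply(toy, is.character, logical(1))
toy[is_chr] <- lapply(toy[is_chr], factor)

# For an ordinal rating, state the level order explicitly.
toy$ExterQual <- factor(as.character(toy$ExterQual),
                        levels = c("Po", "Fa", "TA", "Gd", "Ex"),
                        ordered = TRUE)

levels(toy$ExterQual)
toy$ExterQual[1] > toy$ExterQual[2]  # compares Gd with TA
```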

Retrieve the column names of the data set:

colnames(df_train)
##  [1] "Id"            "MSSubClass"    "MSZoning"      "LotFrontage"  
##  [5] "LotArea"       "Street"        "Alley"         "LotShape"     
##  [9] "LandContour"   "Utilities"     "LotConfig"     "LandSlope"    
## [13] "Neighborhood"  "Condition1"    "Condition2"    "BldgType"     
## [17] "HouseStyle"    "OverallQual"   "OverallCond"   "YearBuilt"    
## [21] "YearRemodAdd"  "RoofStyle"     "RoofMatl"      "Exterior1st"  
## [25] "Exterior2nd"   "MasVnrType"    "MasVnrArea"    "ExterQual"    
## [29] "ExterCond"     "Foundation"    "BsmtQual"      "BsmtCond"     
## [33] "BsmtExposure"  "BsmtFinType1"  "BsmtFinSF1"    "BsmtFinType2" 
## [37] "BsmtFinSF2"    "BsmtUnfSF"     "TotalBsmtSF"   "Heating"      
## [41] "HeatingQC"     "CentralAir"    "Electrical"    "X1stFlrSF"    
## [45] "X2ndFlrSF"     "LowQualFinSF"  "GrLivArea"     "BsmtFullBath" 
## [49] "BsmtHalfBath"  "FullBath"      "HalfBath"      "BedroomAbvGr" 
## [53] "KitchenAbvGr"  "KitchenQual"   "TotRmsAbvGrd"  "Functional"   
## [57] "Fireplaces"    "FireplaceQu"   "GarageType"    "GarageYrBlt"  
## [61] "GarageFinish"  "GarageCars"    "GarageArea"    "GarageQual"   
## [65] "GarageCond"    "PavedDrive"    "WoodDeckSF"    "OpenPorchSF"  
## [69] "EnclosedPorch" "X3SsnPorch"    "ScreenPorch"   "PoolArea"     
## [73] "PoolQC"        "Fence"         "MiscFeature"   "MiscVal"      
## [77] "MoSold"        "YrSold"        "SaleType"      "SaleCondition"
## [81] "SalePrice"

Check data frame for NULL and NA values:

is.null(df_train)
## [1] FALSE
any(is.na(df_train))
## [1] TRUE
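
These two checks are coarse: is.null() only tests whether the object itself is NULL, and any(is.na()) only says that at least one value is missing. A per-column count is usually more informative. The sketch below shows the idea on a toy data frame with made-up values.

```r
toy <- data.frame(
  LotFrontage = c(65, NA, 68, NA),
  Alley       = c(NA, NA, "Grvl", NA),
  LotArea     = c(8450, 9600, 11250, 9550),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_count <- colSums(is.na(toy))

# Keep only the columns that actually have missing values.
na_count[na_count > 0]

# Same information as a percentage of rows.
round(100 * na_count / nrow(toy), 1)
```

Applied to df_train, the same colSums(is.na(...)) call would list the columns whose NA counts appear in the summary() output further down.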

The summaries that follow describe the training dataset as downloaded from Kaggle, without converting character variables to factors or removing NA values from the data frame.

The SalePrice variable is continuous and right-skewed; a log transform brings its distribution closer to normal for the statistical analysis.

#log-transform the monetary variable; this reduces the impact of outliers and the skewness of SalePrice.
log_train <- df_train %>% mutate(log_SalePrice = log(SalePrice))

Histogram of SalePrice (raw and log-transformed): the raw plot shows a right skew, with the median sale price (163,000) below the mean (180,921).

par(mfrow= c(2,1))
hist(log_train$SalePrice, col = "darkmagenta", main = "Histogram of Sale Price") 
hist(log_train$log_SalePrice, col = "goldenrod", main = "Histogram of Log Transform Sale Price")
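
The effect of the transform can also be checked numerically. The sketch below defines a simple moment-based skewness (skew is a hypothetical helper; packages such as moments or summarytools may use a slightly different estimator) and applies it to a small illustrative vector, not to the real data.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
skew <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Illustrative prices with one large value in the right tail.
prices <- c(34900, 129975, 163000, 180921, 214000, 755000)

skew(prices)       # clearly positive: long right tail
skew(log(prices))  # much closer to zero after the log transform
```

For the full training set, the descr() output later in this section reports a skewness of 1.88 for SalePrice and 0.12 for log_SalePrice.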

The summaries of SalePrice before and after the transform show the rescaling: on the log scale the mean (12.02) and median (12.00) nearly coincide, as expected for a roughly symmetric distribution.

#actual observations of Sale Price
summary(log_train$SalePrice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   34900  129975  163000  180921  214000  755000
#log transformation of SalePrice
summary(log_train$log_SalePrice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.46   11.78   12.00   12.02   12.27   13.53
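
Values on the log scale are mapped back to dollars with exp(). The sketch below uses three illustrative prices and makes one point worth keeping in mind: exponentiating a mean of logs gives the geometric mean, which is smaller than the arithmetic mean. This is consistent with exp(12.02) being roughly 166,000, closer to the median than to the mean of 180,921.

```r
prices    <- c(129975, 163000, 214000)  # illustrative values
log_price <- log(prices)

# exp() undoes log(): the original prices are recovered.
exp(log_price)

# exp(mean(log(x))) is the geometric mean, not the arithmetic mean.
exp(mean(log_price))
mean(prices)
```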

Numeric summary of the independent variables and the dependent variable:

summary(log_train)
##        Id           MSSubClass      MSZoning          LotFrontage    
##  Min.   :   1.0   Min.   : 20.0   Length:1460        Min.   : 21.00  
##  1st Qu.: 365.8   1st Qu.: 20.0   Class :character   1st Qu.: 59.00  
##  Median : 730.5   Median : 50.0   Mode  :character   Median : 69.00  
##  Mean   : 730.5   Mean   : 56.9                      Mean   : 70.05  
##  3rd Qu.:1095.2   3rd Qu.: 70.0                      3rd Qu.: 80.00  
##  Max.   :1460.0   Max.   :190.0                      Max.   :313.00  
##                                                      NA's   :259     
##     LotArea          Street             Alley             LotShape        
##  Min.   :  1300   Length:1460        Length:1460        Length:1460       
##  1st Qu.:  7554   Class :character   Class :character   Class :character  
##  Median :  9478   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 10517                                                           
##  3rd Qu.: 11602                                                           
##  Max.   :215245                                                           
##                                                                           
##  LandContour         Utilities          LotConfig          LandSlope        
##  Length:1460        Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Neighborhood        Condition1         Condition2          BldgType        
##  Length:1460        Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   HouseStyle         OverallQual      OverallCond      YearBuilt   
##  Length:1460        Min.   : 1.000   Min.   :1.000   Min.   :1872  
##  Class :character   1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954  
##  Mode  :character   Median : 6.000   Median :5.000   Median :1973  
##                     Mean   : 6.099   Mean   :5.575   Mean   :1971  
##                     3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2000  
##                     Max.   :10.000   Max.   :9.000   Max.   :2010  
##                                                                    
##   YearRemodAdd   RoofStyle           RoofMatl         Exterior1st       
##  Min.   :1950   Length:1460        Length:1460        Length:1460       
##  1st Qu.:1967   Class :character   Class :character   Class :character  
##  Median :1994   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :1985                                                           
##  3rd Qu.:2004                                                           
##  Max.   :2010                                                           
##                                                                         
##  Exterior2nd         MasVnrType          MasVnrArea      ExterQual        
##  Length:1460        Length:1460        Min.   :   0.0   Length:1460       
##  Class :character   Class :character   1st Qu.:   0.0   Class :character  
##  Mode  :character   Mode  :character   Median :   0.0   Mode  :character  
##                                        Mean   : 103.7                     
##                                        3rd Qu.: 166.0                     
##                                        Max.   :1600.0                     
##                                        NA's   :8                          
##   ExterCond          Foundation          BsmtQual           BsmtCond        
##  Length:1460        Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  BsmtExposure       BsmtFinType1         BsmtFinSF1     BsmtFinType2      
##  Length:1460        Length:1460        Min.   :   0.0   Length:1460       
##  Class :character   Class :character   1st Qu.:   0.0   Class :character  
##  Mode  :character   Mode  :character   Median : 383.5   Mode  :character  
##                                        Mean   : 443.6                     
##                                        3rd Qu.: 712.2                     
##                                        Max.   :5644.0                     
##                                                                           
##    BsmtFinSF2        BsmtUnfSF       TotalBsmtSF       Heating         
##  Min.   :   0.00   Min.   :   0.0   Min.   :   0.0   Length:1460       
##  1st Qu.:   0.00   1st Qu.: 223.0   1st Qu.: 795.8   Class :character  
##  Median :   0.00   Median : 477.5   Median : 991.5   Mode  :character  
##  Mean   :  46.55   Mean   : 567.2   Mean   :1057.4                     
##  3rd Qu.:   0.00   3rd Qu.: 808.0   3rd Qu.:1298.2                     
##  Max.   :1474.00   Max.   :2336.0   Max.   :6110.0                     
##                                                                        
##   HeatingQC          CentralAir         Electrical          X1stFlrSF   
##  Length:1460        Length:1460        Length:1460        Min.   : 334  
##  Class :character   Class :character   Class :character   1st Qu.: 882  
##  Mode  :character   Mode  :character   Mode  :character   Median :1087  
##                                                           Mean   :1163  
##                                                           3rd Qu.:1391  
##                                                           Max.   :4692  
##                                                                         
##    X2ndFlrSF     LowQualFinSF       GrLivArea     BsmtFullBath   
##  Min.   :   0   Min.   :  0.000   Min.   : 334   Min.   :0.0000  
##  1st Qu.:   0   1st Qu.:  0.000   1st Qu.:1130   1st Qu.:0.0000  
##  Median :   0   Median :  0.000   Median :1464   Median :0.0000  
##  Mean   : 347   Mean   :  5.845   Mean   :1515   Mean   :0.4253  
##  3rd Qu.: 728   3rd Qu.:  0.000   3rd Qu.:1777   3rd Qu.:1.0000  
##  Max.   :2065   Max.   :572.000   Max.   :5642   Max.   :3.0000  
##                                                                  
##   BsmtHalfBath        FullBath        HalfBath       BedroomAbvGr  
##  Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000  
##  Median :0.00000   Median :2.000   Median :0.0000   Median :3.000  
##  Mean   :0.05753   Mean   :1.565   Mean   :0.3829   Mean   :2.866  
##  3rd Qu.:0.00000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.000  
##  Max.   :2.00000   Max.   :3.000   Max.   :2.0000   Max.   :8.000  
##                                                                    
##   KitchenAbvGr   KitchenQual         TotRmsAbvGrd     Functional       
##  Min.   :0.000   Length:1460        Min.   : 2.000   Length:1460       
##  1st Qu.:1.000   Class :character   1st Qu.: 5.000   Class :character  
##  Median :1.000   Mode  :character   Median : 6.000   Mode  :character  
##  Mean   :1.047                      Mean   : 6.518                     
##  3rd Qu.:1.000                      3rd Qu.: 7.000                     
##  Max.   :3.000                      Max.   :14.000                     
##                                                                        
##    Fireplaces    FireplaceQu         GarageType         GarageYrBlt  
##  Min.   :0.000   Length:1460        Length:1460        Min.   :1900  
##  1st Qu.:0.000   Class :character   Class :character   1st Qu.:1961  
##  Median :1.000   Mode  :character   Mode  :character   Median :1980  
##  Mean   :0.613                                         Mean   :1979  
##  3rd Qu.:1.000                                         3rd Qu.:2002  
##  Max.   :3.000                                         Max.   :2010  
##                                                        NA's   :81    
##  GarageFinish         GarageCars      GarageArea      GarageQual       
##  Length:1460        Min.   :0.000   Min.   :   0.0   Length:1460       
##  Class :character   1st Qu.:1.000   1st Qu.: 334.5   Class :character  
##  Mode  :character   Median :2.000   Median : 480.0   Mode  :character  
##                     Mean   :1.767   Mean   : 473.0                     
##                     3rd Qu.:2.000   3rd Qu.: 576.0                     
##                     Max.   :4.000   Max.   :1418.0                     
##                                                                        
##   GarageCond         PavedDrive          WoodDeckSF      OpenPorchSF    
##  Length:1460        Length:1460        Min.   :  0.00   Min.   :  0.00  
##  Class :character   Class :character   1st Qu.:  0.00   1st Qu.:  0.00  
##  Mode  :character   Mode  :character   Median :  0.00   Median : 25.00  
##                                        Mean   : 94.24   Mean   : 46.66  
##                                        3rd Qu.:168.00   3rd Qu.: 68.00  
##                                        Max.   :857.00   Max.   :547.00  
##                                                                         
##  EnclosedPorch      X3SsnPorch      ScreenPorch        PoolArea      
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000  
##  Median :  0.00   Median :  0.00   Median :  0.00   Median :  0.000  
##  Mean   : 21.95   Mean   :  3.41   Mean   : 15.06   Mean   :  2.759  
##  3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000  
##  Max.   :552.00   Max.   :508.00   Max.   :480.00   Max.   :738.000  
##                                                                      
##     PoolQC             Fence           MiscFeature           MiscVal        
##  Length:1460        Length:1460        Length:1460        Min.   :    0.00  
##  Class :character   Class :character   Class :character   1st Qu.:    0.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :    0.00  
##                                                           Mean   :   43.49  
##                                                           3rd Qu.:    0.00  
##                                                           Max.   :15500.00  
##                                                                             
##      MoSold           YrSold       SaleType         SaleCondition     
##  Min.   : 1.000   Min.   :2006   Length:1460        Length:1460       
##  1st Qu.: 5.000   1st Qu.:2007   Class :character   Class :character  
##  Median : 6.000   Median :2008   Mode  :character   Mode  :character  
##  Mean   : 6.322   Mean   :2008                                        
##  3rd Qu.: 8.000   3rd Qu.:2009                                        
##  Max.   :12.000   Max.   :2010                                        
##                                                                       
##    SalePrice      log_SalePrice  
##  Min.   : 34900   Min.   :10.46  
##  1st Qu.:129975   1st Qu.:11.78  
##  Median :163000   Median :12.00  
##  Mean   :180921   Mean   :12.02  
##  3rd Qu.:214000   3rd Qu.:12.27  
##  Max.   :755000   Max.   :13.53  
## 

The descr() function computes univariate descriptive statistics for numerical vectors.

#summarytools package
descr(select_if(log_train, is.numeric), style = "rmarkdown")
## ### Descriptive Statistics  
## 
## |          &nbsp; | BedroomAbvGr | BsmtFinSF1 | BsmtFinSF2 | BsmtFullBath | BsmtHalfBath |
## |----------------:|-------------:|-----------:|-----------:|-------------:|-------------:|
## |        **Mean** |         2.87 |     443.64 |      46.55 |         0.43 |         0.06 |
## |     **Std.Dev** |         0.82 |     456.10 |     161.32 |         0.52 |         0.24 |
## |         **Min** |         0.00 |       0.00 |       0.00 |         0.00 |         0.00 |
## |          **Q1** |         2.00 |       0.00 |       0.00 |         0.00 |         0.00 |
## |      **Median** |         3.00 |     383.50 |       0.00 |         0.00 |         0.00 |
## |          **Q3** |         3.00 |     712.50 |       0.00 |         1.00 |         0.00 |
## |         **Max** |         8.00 |    5644.00 |    1474.00 |         3.00 |         2.00 |
## |         **MAD** |         0.00 |     568.58 |       0.00 |         0.00 |         0.00 |
## |         **IQR** |         1.00 |     712.25 |       0.00 |         1.00 |         0.00 |
## |          **CV** |         0.28 |       1.03 |       3.47 |         1.22 |         4.15 |
## |    **Skewness** |         0.21 |       1.68 |       4.25 |         0.59 |         4.09 |
## | **SE.Skewness** |         0.06 |       0.06 |       0.06 |         0.06 |         0.06 |
## |    **Kurtosis** |         2.21 |      11.06 |      20.01 |        -0.84 |        16.31 |
## |     **N.Valid** |      1460.00 |    1460.00 |    1460.00 |      1460.00 |      1460.00 |
## |   **Pct.Valid** |       100.00 |     100.00 |     100.00 |       100.00 |       100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | BsmtUnfSF | EnclosedPorch | Fireplaces | FullBath | GarageArea |
## |----------------:|----------:|--------------:|-----------:|---------:|-----------:|
## |        **Mean** |    567.24 |         21.95 |       0.61 |     1.57 |     472.98 |
## |     **Std.Dev** |    441.87 |         61.12 |       0.64 |     0.55 |     213.80 |
## |         **Min** |      0.00 |          0.00 |       0.00 |     0.00 |       0.00 |
## |          **Q1** |    223.00 |          0.00 |       0.00 |     1.00 |     333.00 |
## |      **Median** |    477.50 |          0.00 |       1.00 |     2.00 |     480.00 |
## |          **Q3** |    808.00 |          0.00 |       1.00 |     2.00 |     576.00 |
## |         **Max** |   2336.00 |        552.00 |       3.00 |     3.00 |    1418.00 |
## |         **MAD** |    426.99 |          0.00 |       1.48 |     0.00 |     177.91 |
## |         **IQR** |    585.00 |          0.00 |       1.00 |     1.00 |     241.50 |
## |          **CV** |      0.78 |          2.78 |       1.05 |     0.35 |       0.45 |
## |    **Skewness** |      0.92 |          3.08 |       0.65 |     0.04 |       0.18 |
## | **SE.Skewness** |      0.06 |          0.06 |       0.06 |     0.06 |       0.06 |
## |    **Kurtosis** |      0.46 |         10.37 |      -0.22 |    -0.86 |       0.90 |
## |     **N.Valid** |   1460.00 |       1460.00 |    1460.00 |  1460.00 |    1460.00 |
## |   **Pct.Valid** |    100.00 |        100.00 |     100.00 |   100.00 |     100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | GarageCars | GarageYrBlt | GrLivArea | HalfBath |      Id | KitchenAbvGr |
## |----------------:|-----------:|------------:|----------:|---------:|--------:|-------------:|
## |        **Mean** |       1.77 |     1978.51 |   1515.46 |     0.38 |  730.50 |         1.05 |
## |     **Std.Dev** |       0.75 |       24.69 |    525.48 |     0.50 |  421.61 |         0.22 |
## |         **Min** |       0.00 |     1900.00 |    334.00 |     0.00 |    1.00 |         0.00 |
## |          **Q1** |       1.00 |     1961.00 |   1129.00 |     0.00 |  365.50 |         1.00 |
## |      **Median** |       2.00 |     1980.00 |   1464.00 |     0.00 |  730.50 |         1.00 |
## |          **Q3** |       2.00 |     2002.00 |   1777.50 |     1.00 | 1095.50 |         1.00 |
## |         **Max** |       4.00 |     2010.00 |   5642.00 |     2.00 | 1460.00 |         3.00 |
## |         **MAD** |       0.00 |       31.13 |    483.33 |     0.00 |  541.15 |         0.00 |
## |         **IQR** |       1.00 |       41.00 |    647.25 |     1.00 |  729.50 |         0.00 |
## |          **CV** |       0.42 |        0.01 |      0.35 |     1.31 |    0.58 |         0.21 |
## |    **Skewness** |      -0.34 |       -0.65 |      1.36 |     0.67 |    0.00 |         4.48 |
## | **SE.Skewness** |       0.06 |        0.07 |      0.06 |     0.06 |    0.06 |         0.06 |
## |    **Kurtosis** |       0.21 |       -0.42 |      4.86 |    -1.08 |   -1.20 |        21.42 |
## |     **N.Valid** |    1460.00 |     1379.00 |   1460.00 |  1460.00 | 1460.00 |      1460.00 |
## |   **Pct.Valid** |     100.00 |       94.45 |    100.00 |   100.00 |  100.00 |       100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | log_SalePrice |   LotArea | LotFrontage | LowQualFinSF | MasVnrArea |
## |----------------:|--------------:|----------:|------------:|-------------:|-----------:|
## |        **Mean** |         12.02 |  10516.83 |       70.05 |         5.84 |     103.69 |
## |     **Std.Dev** |          0.40 |   9981.26 |       24.28 |        48.62 |     181.07 |
## |         **Min** |         10.46 |   1300.00 |       21.00 |         0.00 |       0.00 |
## |          **Q1** |         11.77 |   7549.00 |       59.00 |         0.00 |       0.00 |
## |      **Median** |         12.00 |   9478.50 |       69.00 |         0.00 |       0.00 |
## |          **Q3** |         12.27 |  11603.00 |       80.00 |         0.00 |     166.00 |
## |         **Max** |         13.53 | 215245.00 |      313.00 |       572.00 |    1600.00 |
## |         **MAD** |          0.36 |   2962.23 |       16.31 |         0.00 |       0.00 |
## |         **IQR** |          0.50 |   4048.00 |       21.00 |         0.00 |     166.00 |
## |          **CV** |          0.03 |      0.95 |        0.35 |         8.32 |       1.75 |
## |    **Skewness** |          0.12 |     12.18 |        2.16 |         8.99 |       2.66 |
## | **SE.Skewness** |          0.06 |      0.06 |        0.07 |         0.06 |       0.06 |
## |    **Kurtosis** |          0.80 |    202.26 |       17.34 |        82.83 |      10.03 |
## |     **N.Valid** |       1460.00 |   1460.00 |     1201.00 |      1460.00 |    1452.00 |
## |   **Pct.Valid** |        100.00 |    100.00 |       82.26 |       100.00 |      99.45 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; |  MiscVal |  MoSold | MSSubClass | OpenPorchSF | OverallCond | OverallQual |
## |----------------:|---------:|--------:|-----------:|------------:|------------:|------------:|
## |        **Mean** |    43.49 |    6.32 |      56.90 |       46.66 |        5.58 |        6.10 |
## |     **Std.Dev** |   496.12 |    2.70 |      42.30 |       66.26 |        1.11 |        1.38 |
## |         **Min** |     0.00 |    1.00 |      20.00 |        0.00 |        1.00 |        1.00 |
## |          **Q1** |     0.00 |    5.00 |      20.00 |        0.00 |        5.00 |        5.00 |
## |      **Median** |     0.00 |    6.00 |      50.00 |       25.00 |        5.00 |        6.00 |
## |          **Q3** |     0.00 |    8.00 |      70.00 |       68.00 |        6.00 |        7.00 |
## |         **Max** | 15500.00 |   12.00 |     190.00 |      547.00 |        9.00 |       10.00 |
## |         **MAD** |     0.00 |    2.97 |      44.48 |       37.06 |        0.00 |        1.48 |
## |         **IQR** |     0.00 |    3.00 |      50.00 |       68.00 |        1.00 |        2.00 |
## |          **CV** |    11.41 |    0.43 |       0.74 |        1.42 |        0.20 |        0.23 |
## |    **Skewness** |    24.43 |    0.21 |       1.40 |        2.36 |        0.69 |        0.22 |
## | **SE.Skewness** |     0.06 |    0.06 |       0.06 |        0.06 |        0.06 |        0.06 |
## |    **Kurtosis** |   697.64 |   -0.41 |       1.56 |        8.44 |        1.09 |        0.09 |
## |     **N.Valid** |  1460.00 | 1460.00 |    1460.00 |     1460.00 |     1460.00 |     1460.00 |
## |   **Pct.Valid** |   100.00 |  100.00 |     100.00 |      100.00 |      100.00 |      100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | PoolArea | SalePrice | ScreenPorch | TotalBsmtSF | TotRmsAbvGrd |
## |----------------:|---------:|----------:|------------:|------------:|-------------:|
## |        **Mean** |     2.76 | 180921.20 |       15.06 |     1057.43 |         6.52 |
## |     **Std.Dev** |    40.18 |  79442.50 |       55.76 |      438.71 |         1.63 |
## |         **Min** |     0.00 |  34900.00 |        0.00 |        0.00 |         2.00 |
## |          **Q1** |     0.00 | 129950.00 |        0.00 |      795.50 |         5.00 |
## |      **Median** |     0.00 | 163000.00 |        0.00 |      991.50 |         6.00 |
## |          **Q3** |     0.00 | 214000.00 |        0.00 |     1298.50 |         7.00 |
## |         **Max** |   738.00 | 755000.00 |      480.00 |     6110.00 |        14.00 |
## |         **MAD** |     0.00 |  56338.80 |        0.00 |      347.67 |         1.48 |
## |         **IQR** |     0.00 |  84025.00 |        0.00 |      502.50 |         2.00 |
## |          **CV** |    14.56 |      0.44 |        3.70 |        0.41 |         0.25 |
## |    **Skewness** |    14.80 |      1.88 |        4.11 |        1.52 |         0.67 |
## | **SE.Skewness** |     0.06 |      0.06 |        0.06 |        0.06 |         0.06 |
## |    **Kurtosis** |   222.19 |      6.50 |       18.34 |       13.18 |         0.87 |
## |     **N.Valid** |  1460.00 |   1460.00 |     1460.00 |     1460.00 |      1460.00 |
## |   **Pct.Valid** |   100.00 |    100.00 |      100.00 |      100.00 |       100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | WoodDeckSF | X1stFlrSF | X2ndFlrSF | X3SsnPorch | YearBuilt |
## |----------------:|-----------:|----------:|----------:|-----------:|----------:|
## |        **Mean** |      94.24 |   1162.63 |    346.99 |       3.41 |   1971.27 |
## |     **Std.Dev** |     125.34 |    386.59 |    436.53 |      29.32 |     30.20 |
## |         **Min** |       0.00 |    334.00 |      0.00 |       0.00 |   1872.00 |
## |          **Q1** |       0.00 |    882.00 |      0.00 |       0.00 |   1954.00 |
## |      **Median** |       0.00 |   1087.00 |      0.00 |       0.00 |   1973.00 |
## |          **Q3** |     168.00 |   1391.50 |    728.00 |       0.00 |   2000.00 |
## |         **Max** |     857.00 |   4692.00 |   2065.00 |     508.00 |   2010.00 |
## |         **MAD** |       0.00 |    347.67 |      0.00 |       0.00 |     37.06 |
## |         **IQR** |     168.00 |    509.25 |    728.00 |       0.00 |     46.00 |
## |          **CV** |       1.33 |      0.33 |      1.26 |       8.60 |      0.02 |
## |    **Skewness** |       1.54 |      1.37 |      0.81 |      10.28 |     -0.61 |
## | **SE.Skewness** |       0.06 |      0.06 |      0.06 |       0.06 |      0.06 |
## |    **Kurtosis** |       2.97 |      5.71 |     -0.56 |     123.06 |     -0.45 |
## |     **N.Valid** |    1460.00 |   1460.00 |   1460.00 |    1460.00 |   1460.00 |
## |   **Pct.Valid** |     100.00 |    100.00 |    100.00 |     100.00 |    100.00 |
## 
## Table: Table continues below
## 
##  
## 
## |          &nbsp; | YearRemodAdd |  YrSold |
## |----------------:|-------------:|--------:|
## |        **Mean** |      1984.87 | 2007.82 |
## |     **Std.Dev** |        20.65 |    1.33 |
## |         **Min** |      1950.00 | 2006.00 |
## |          **Q1** |      1967.00 | 2007.00 |
## |      **Median** |      1994.00 | 2008.00 |
## |          **Q3** |      2004.00 | 2009.00 |
## |         **Max** |      2010.00 | 2010.00 |
## |         **MAD** |        19.27 |    1.48 |
## |         **IQR** |        37.00 |    2.00 |
## |          **CV** |         0.01 |    0.00 |
## |    **Skewness** |        -0.50 |    0.10 |
## | **SE.Skewness** |         0.06 |    0.06 |
## |    **Kurtosis** |        -1.27 |   -1.19 |
## |     **N.Valid** |      1460.00 | 1460.00 |
## |   **Pct.Valid** |       100.00 |  100.00 |
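
Several rows of these tables can be reproduced with base R, which helps when interpreting them. The sketch below uses a toy vector. Note that base mad() scales the median absolute deviation by 1.4826 by default; the value 1.48 that recurs in the MAD rows above is consistent with that scaling.

```r
x <- c(2, 3, 3, 3, 4, 8)  # toy vector

c(
  Mean    = mean(x),
  Std.Dev = sd(x),
  MAD     = mad(x),          # median absolute deviation times 1.4826
  IQR     = IQR(x),          # Q3 - Q1 with R's default quantile type
  CV      = sd(x) / mean(x)  # coefficient of variation
)
```

A CV above 1, as for MiscVal or PoolArea, means the standard deviation exceeds the mean, which is typical of variables that are zero for most houses.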

The freq() function creates a frequency table for the Heating variable, showing frequencies, proportions, and missing-data information.

Heating: Type of heating
  Floor   Floor Furnace
  GasA    Gas forced warm air furnace
  GasW    Gas hot water or steam heat
  Grav    Gravity furnace
  OthW    Hot water or steam heat other than gas
  Wall    Wall furnace

#summarytools package - freq() function
#report.nas = TRUE keeps the <NA> row; report.nas = FALSE would drop the missing-value information
house.heat <- freq(log_train$Heating, report.nas = TRUE, style = "rmarkdown") 
house.heat
## ### Frequencies  
## #### log_train$Heating  
## **Type:** Character  
## 
## |     &nbsp; | Freq | % Valid | % Valid Cum. | % Total | % Total Cum. |
## |-----------:|-----:|--------:|-------------:|--------:|-------------:|
## |  **Floor** |    1 |   0.068 |        0.068 |   0.068 |        0.068 |
## |   **GasA** | 1428 |  97.808 |       97.877 |  97.808 |       97.877 |
## |   **GasW** |   18 |   1.233 |       99.110 |   1.233 |       99.110 |
## |   **Grav** |    7 |   0.479 |       99.589 |   0.479 |       99.589 |
## |   **OthW** |    2 |   0.137 |       99.726 |   0.137 |       99.726 |
## |   **Wall** |    4 |   0.274 |      100.000 |   0.274 |      100.000 |
## | **\<NA\>** |    0 |         |              |   0.000 |      100.000 |
## |  **Total** | 1460 | 100.000 |      100.000 | 100.000 |      100.000 |
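
The columns of the table above are counts, shares, and running shares. A base-R sketch of the same idea on toy values (freq_tab and its column names are hypothetical, chosen for this example):

```r
heating <- c("GasA", "GasA", "GasW", "GasA", "GasA", "GasW", "GasA", NA)

# Count each category, keeping NA as its own row when present.
counts <- table(heating, useNA = "ifany")
prop   <- as.vector(counts) / sum(counts)

freq_tab <- data.frame(
  Freq      = as.vector(counts),
  Pct.Total = 100 * prop,
  Pct.Cum   = 100 * cumsum(prop),
  row.names = ifelse(is.na(names(counts)), "<NA>", names(counts))
)
freq_tab
```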

The describeBy() function reports several summary statistics (number of valid cases, mean, standard deviation, median, trimmed mean, and others) by a grouping variable, here the Foundation column of the log_train data frame.
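
What "by a grouping variable" means can be sketched in base R: split one column by group, then summarise each piece. The data frame below is a toy example with made-up prices, and the choice of statistics is an assumption of this sketch, a small subset of what describeBy() reports.

```r
toy <- data.frame(
  Foundation = c("PConc", "CBlock", "PConc", "BrkTil", "CBlock", "PConc", "BrkTil"),
  SalePrice  = c(208500, 181500, 223500, 140000, 143000, 250000, 120000)
)

# One column of statistics per foundation type.
stats <- sapply(
  split(toy$SalePrice, toy$Foundation),
  function(v) c(n = length(v), mean = mean(v), sd = sd(v), median = median(v))
)

t(stats)  # transpose so that each row is a group
```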

Foundation: Type of foundation
  BrkTil  Brick & Tile
  CBlock  Cinder Block
  PConc   Poured Concrete
  Slab    Slab
  Stone   Stone
  Wood    Wood

#psych library

describeBy(
  log_train,
  log_train$Foundation # grouping variable
)
## 
##  Descriptive statistics by group 
## group: BrkTil
##                vars   n      mean       sd    median   trimmed      mad
## Id                1 146    735.23   439.63    680.00    736.48   548.56
## MSSubClass        2 146     58.46    37.06     50.00     51.10    29.65
## MSZoning*         3 146      3.46     0.63      4.00      3.53     0.00
## LotFrontage       4 138     60.96    15.62     60.00     58.91    11.86
## LotArea           5 146   9159.17  4773.15   8510.00   8428.81  3395.15
## Street*           6 146      1.00     0.00      1.00      1.00     0.00
## Alley*            7  35      1.26     0.44      1.00      1.21     0.00
## LotShape*         8 146      3.50     1.10      4.00      3.74     0.00
## LandContour*      9 146      3.58     1.02      4.00      3.84     0.00
## Utilities*       10 146      1.00     0.00      1.00      1.00     0.00
## LotConfig*       11 146      3.29     1.26      4.00      3.47     0.00
## LandSlope*       12 146      1.05     0.28      1.00      1.00     0.00
## Neighborhood*    13 146      5.01     2.84      5.00      4.90     2.97
## Condition1*      14 146      2.86     0.81      3.00      2.92     0.00
## Condition2*      15 146      2.98     0.30      3.00      3.00     0.00
## BldgType*        16 146      1.06     0.24      1.00      1.00     0.00
## HouseStyle*      17 146      2.97     1.94      3.00      2.85     2.97
## OverallQual      18 146      5.45     1.25      5.00      5.45     1.48
## OverallCond      19 146      6.20     1.57      6.00      6.29     1.48
## YearBuilt        20 146   1921.02    13.93   1922.00   1922.47     8.90
## YearRemodAdd     21 146   1971.62    24.31   1950.00   1970.07     0.00
## RoofStyle*       22 146      1.18     0.56      1.00      1.01     0.00
## RoofMatl*        23 146      1.01     0.08      1.00      1.00     0.00
## Exterior1st*     24 146      6.32     2.09      7.00      6.65     1.48
## Exterior2nd*     25 146      7.88     2.73      9.00      8.18     1.48
## MasVnrType*      26 146      2.97     0.20      3.00      3.00     0.00
## MasVnrArea       27 146      7.00    49.67      0.00      0.00     0.00
## ExterQual*       28 146      3.88     0.42      4.00      4.00     0.00
## ExterCond*       29 146      3.64     0.67      4.00      3.80     0.00
## Foundation*      30 146      1.00     0.00      1.00      1.00     0.00
## BsmtQual*        31 145      3.62     0.72      4.00      3.78     0.00
## BsmtCond*        32 145      3.52     1.05      4.00      3.77     0.00
## BsmtExposure*    33 145      3.86     0.42      4.00      3.97     0.00
## BsmtFinType1*    34 145      4.85     1.73      6.00      5.15     0.00
## BsmtFinSF1       35 146    165.84   257.10      0.00    113.31     0.00
## BsmtFinType2*    36 145      4.90     0.56      5.00      5.00     0.00
## BsmtFinSF2       37 146     19.73   101.34      0.00      0.00     0.00
## BsmtUnfSF        38 146    629.05   318.62    673.00    637.44   323.95
## TotalBsmtSF      39 146    814.62   232.18    793.00    809.52   181.62
## Heating*         40 146      1.19     0.57      1.00      1.03     0.00
## HeatingQC*       41 146      2.50     1.23      3.00      2.50     1.48
## CentralAir*      42 146      1.72     0.45      2.00      1.77     0.00
## Electrical*      43 146      4.13     1.59      5.00      4.40     0.00
## X1stFlrSF        44 146    975.08   241.23    941.50    950.92   215.72
## X2ndFlrSF        45 146    455.01   404.55    513.00    418.34   474.43
## LowQualFinSF     46 146     21.99    94.04      0.00      0.00     0.00
## GrLivArea        47 146   1452.08   564.36   1364.50   1386.21   510.01
## BsmtFullBath     48 146      0.20     0.40      0.00      0.13     0.00
## BsmtHalfBath     49 146      0.03     0.18      0.00      0.00     0.00
## FullBath         50 146      1.33     0.54      1.00      1.26     0.00
## HalfBath         51 146      0.21     0.41      0.00      0.14     0.00
## BedroomAbvGr     52 146      2.92     0.90      3.00      2.86     1.48
## KitchenAbvGr     53 146      1.08     0.29      1.00      1.00     0.00
## KitchenQual*     54 146      3.55     0.79      4.00      3.72     0.00
## TotRmsAbvGrd     55 146      6.55     1.68      6.00      6.43     1.48
## Functional*      56 146      5.64     1.07      6.00      5.96     0.00
## Fireplaces       57 146      0.47     0.62      0.00      0.37     0.00
## FireplaceQu*     58  58      3.40     0.90      3.00      3.33     0.00
## GarageType*      59 129      4.64     0.96      5.00      4.91     0.00
## GarageYrBlt      60 129   1947.57    29.10   1937.00   1945.18    25.20
## GarageFinish*    61 129      2.89     0.42      3.00      3.00     0.00
## GarageCars       62 146      1.31     0.74      1.00      1.32     0.00
## GarageArea       63 146    344.66   209.00    308.00    337.04   161.60
## GarageQual*      64 129      4.21     1.31      5.00      4.38     0.00
## GarageCond*      65 129      3.56     1.03      4.00      3.80     0.00
## PavedDrive*      66 146      2.41     0.87      3.00      2.51     0.00
## WoodDeckSF       67 146     50.23   108.67      0.00     22.25     0.00
## OpenPorchSF      68 146     33.48    79.72      0.00     15.13     0.00
## EnclosedPorch    69 146     72.67    90.90      0.00     58.45     0.00
## X3SsnPorch       70 146      0.99    11.92      0.00      0.00     0.00
## ScreenPorch      71 146     17.82    65.98      0.00      0.00     0.00
## PoolArea         72 146      0.00     0.00      0.00      0.00     0.00
## PoolQC*          73   0       NaN       NA        NA       NaN       NA
## Fence*           74  34      2.53     0.90      3.00      2.57     0.00
## MiscFeature*     75   6      1.00     0.00      1.00      1.00     0.00
## MiscVal          76 146     27.53   142.85      0.00      0.00     0.00
## MoSold           77 146      6.45     2.50      6.00      6.39     1.48
## YrSold           78 146   2007.73     1.23   2008.00   2007.70     1.48
## SaleType*        79 146      6.71     1.16      7.00      7.00     0.00
## SaleCondition*   80 146      4.57     1.21      5.00      4.93     0.00
## SalePrice        81 146 132291.08 54592.39 125250.00 126199.13 35211.75
## log_SalePrice    82 146     11.72     0.37     11.74     11.72     0.29
##                     min       max     range  skew kurtosis      se
## Id                 4.00   1444.00   1440.00  0.06    -1.25   36.38
## MSSubClass        30.00    190.00    160.00  2.64     6.89    3.07
## MSZoning*          1.00      4.00      3.00 -1.06     1.46    0.05
## LotFrontage       30.00    130.00    100.00  1.61     3.62    1.33
## LotArea         3636.00  45600.00  41964.00  3.79    22.80  395.03
## Street*            1.00      1.00      0.00   NaN      NaN    0.00
## Alley*             1.00      2.00      1.00  1.06    -0.89    0.07
## LotShape*          1.00      4.00      3.00 -1.76     1.17    0.09
## LandContour*       1.00      4.00      3.00 -2.05     2.30    0.08
## Utilities*         1.00      1.00      0.00   NaN      NaN    0.00
## LotConfig*         1.00      4.00      3.00 -1.22    -0.49    0.10
## LandSlope*         1.00      3.00      2.00  5.50    31.29    0.02
## Neighborhood*      1.00     10.00      9.00 -0.03    -1.12    0.23
## Condition1*        1.00      6.00      5.00 -0.04     3.21    0.07
## Condition2*        1.00      5.00      4.00 -0.57    28.65    0.02
## BldgType*          1.00      2.00      1.00  3.61    11.09    0.02
## HouseStyle*        1.00      6.00      5.00  0.49    -1.25    0.16
## OverallQual        1.00     10.00      9.00  0.13     2.23    0.10
## OverallCond        1.00      9.00      8.00 -0.55     0.38    0.13
## YearBuilt       1872.00   1954.00     82.00 -1.10     1.82    1.15
## YearRemodAdd    1950.00   2008.00     58.00  0.33    -1.78    2.01
## RoofStyle*         1.00      4.00      3.00  3.09     8.54    0.05
## RoofMatl*          1.00      2.00      1.00 11.84   139.04    0.01
## Exterior1st*       1.00      9.00      8.00 -1.02     0.28    0.17
## Exterior2nd*       1.00     11.00     10.00 -0.86    -0.37    0.23
## MasVnrType*        1.00      3.00      2.00 -7.96    67.27    0.02
## MasVnrArea         0.00    435.00    435.00  7.17    51.82    4.11
## ExterQual*         1.00      4.00      3.00 -4.21    20.10    0.03
## ExterCond*         1.00      4.00      3.00 -1.88     2.94    0.06
## Foundation*        1.00      1.00      0.00   NaN      NaN    0.00
## BsmtQual*          1.00      4.00      3.00 -1.66     1.41    0.06
## BsmtCond*          1.00      4.00      3.00 -1.83     1.48    0.09
## BsmtExposure*      1.00      4.00      3.00 -3.68    16.63    0.03
## BsmtFinType1*      1.00      6.00      5.00 -1.23    -0.04    0.14
## BsmtFinSF1         0.00   1128.00   1128.00  1.65     2.28   21.28
## BsmtFinType2*      1.00      5.00      4.00 -5.86    34.78    0.05
## BsmtFinSF2         0.00    692.00    692.00  5.22    26.75    8.39
## BsmtUnfSF          0.00   1470.00   1470.00 -0.20    -0.48   26.37
## TotalBsmtSF        0.00   1559.00   1559.00  0.16     1.64   19.21
## Heating*           1.00      4.00      3.00  3.16     9.78    0.05
## HeatingQC*         1.00      4.00      3.00 -0.08    -1.60    0.10
## CentralAir*        1.00      2.00      1.00 -0.97    -1.08    0.04
## Electrical*        1.00      5.00      4.00 -1.32    -0.18    0.13
## X1stFlrSF        520.00   1687.00   1167.00  0.91     0.67   19.96
## X2ndFlrSF          0.00   1818.00   1818.00  0.52    -0.07   33.48
## LowQualFinSF       0.00    572.00    572.00  4.41    18.75    7.78
## GrLivArea        520.00   3608.00   3088.00  1.19     1.77   46.71
## BsmtFullBath       0.00      1.00      1.00  1.50     0.24    0.03
## BsmtHalfBath       0.00      1.00      1.00  5.07    23.86    0.02
## FullBath           0.00      3.00      3.00  1.10     0.54    0.04
## HalfBath           0.00      1.00      1.00  1.39    -0.06    0.03
## BedroomAbvGr       1.00      5.00      4.00  0.39    -0.41    0.07
## KitchenAbvGr       1.00      3.00      2.00  4.00    16.74    0.02
## KitchenQual*       1.00      4.00      3.00 -1.71     2.07    0.07
## TotRmsAbvGrd       4.00     12.00      8.00  0.73     0.48    0.14
## Functional*        1.00      6.00      5.00 -3.04     8.39    0.09
## Fireplaces         0.00      2.00      2.00  0.98    -0.12    0.05
## FireplaceQu*       1.00      5.00      4.00  0.74     0.12    0.12
## GarageType*        1.00      5.00      4.00 -2.42     4.15    0.08
## GarageYrBlt     1900.00   2007.00    107.00  0.69    -0.90    2.56
## GarageFinish*      1.00      3.00      2.00 -3.85    13.72    0.04
## GarageCars         0.00      3.00      3.00  0.16    -0.27    0.06
## GarageArea         0.00    880.00    880.00  0.40    -0.12   17.30
## GarageQual*        1.00      5.00      4.00 -1.09    -0.72    0.12
## GarageCond*        1.00      4.00      3.00 -1.98     2.08    0.09
## PavedDrive*        1.00      3.00      2.00 -0.89    -1.09    0.07
## WoodDeckSF         0.00    509.00    509.00  2.44     5.46    8.99
## OpenPorchSF        0.00    547.00    547.00  4.02    19.48    6.60
## EnclosedPorch      0.00    330.00    330.00  1.00    -0.18    7.52
## X3SsnPorch         0.00    144.00    144.00 11.84   139.04    0.99
## ScreenPorch        0.00    480.00    480.00  4.66    24.56    5.46
## PoolArea           0.00      0.00      0.00   NaN      NaN    0.00
## PoolQC*             Inf      -Inf      -Inf    NA       NA      NA
## Fence*             1.00      4.00      3.00 -0.70    -0.76    0.15
## MiscFeature*       1.00      1.00      0.00   NaN      NaN    0.00
## MiscVal            0.00   1150.00   1150.00  5.63    33.61   11.82
## MoSold             1.00     12.00     11.00  0.22    -0.33    0.21
## YrSold          2006.00   2010.00      4.00  0.05    -1.10    0.10
## SaleType*          1.00      7.00      6.00 -4.11    15.98    0.10
## SaleCondition*     1.00      6.00      5.00 -2.47     4.34    0.10
## SalePrice      37900.00 475000.00 437100.00  2.31    10.37 4518.10
## log_SalePrice     10.54     13.07      2.53  0.02     1.42    0.03
## ------------------------------------------------------------ 
## group: CBlock
##                vars   n      mean       sd    median   trimmed      mad
## Id                1 634    728.26   421.05    729.50    728.01   549.30
## MSSubClass        2 634     52.68    44.35     30.00     43.11    14.83
## MSZoning*         3 634      3.10     0.42      3.00      3.04     0.00
## LotFrontage       4 494     70.80    24.56     70.00     70.38    14.83
## LotArea           5 634  11272.36 13814.10   9600.00   9727.21  2816.94
## Street*           6 634      1.99     0.10      2.00      2.00     0.00
## Alley*            7  19      1.16     0.37      1.00      1.12     0.00
## LotShape*         8 634      3.01     1.40      4.00      3.14     0.00
## LandContour*      9 634      3.79     0.69      4.00      4.00     0.00
## Utilities*       10 634      1.00     0.04      1.00      1.00     0.00
## LotConfig*       11 634      3.28     1.21      4.00      3.47     0.00
## LandSlope*       12 634      1.08     0.32      1.00      1.00     0.00
## Neighborhood*    13 634     12.22     4.94     12.00     12.37     5.93
## Condition1*      14 634      2.95     0.79      3.00      2.96     0.00
## Condition2*      15 634      3.00     0.17      3.00      3.00     0.00
## BldgType*        16 634      1.41     1.02      1.00      1.12     0.00
## HouseStyle*      17 634      3.90     1.98      3.00      3.79     0.00
## OverallQual      18 634      5.42     0.96      5.00      5.40     1.48
## OverallCond      19 634      5.83     1.19      6.00      5.80     1.48
## YearBuilt        20 634   1961.25    16.72   1963.50   1962.92    14.08
## YearRemodAdd     21 634   1975.22    17.68   1972.00   1974.56    19.27
## RoofStyle*       22 634      2.47     0.90      2.00      2.35     0.00
## RoofMatl*        23 634      1.12     0.73      1.00      1.00     0.00
## Exterior1st*     24 634      8.02     2.67      7.00      7.99     1.48
## Exterior2nd*     25 634      8.72     3.00      8.00      8.74     2.97
## MasVnrType*      26 634      2.66     0.57      3.00      2.68     0.00
## MasVnrArea       27 634     89.56   148.31      0.00     58.35     0.00
## ExterQual*       28 634      3.86     0.43      4.00      3.99     0.00
## ExterCond*       29 634      4.69     0.77      5.00      4.88     0.00
## Foundation*      30 634      1.00     0.00      1.00      1.00     0.00
## BsmtQual*        31 625      3.72     0.51      4.00      3.80     0.00
## BsmtCond*        32 625      2.90     0.39      3.00      3.00     0.00
## BsmtExposure*    33 625      3.38     1.08      4.00      3.60     0.00
## BsmtFinType1*    34 625      3.24     1.87      3.00      3.18     2.97
## BsmtFinSF1       35 634    477.12   361.70    466.50    449.41   372.13
## BsmtFinType2*    36 625      5.49     1.18      6.00      5.81     0.00
## BsmtFinSF2       37 634     80.89   202.21      0.00     24.47     0.00
## BsmtUnfSF        38 634    443.47   360.11    388.00    401.89   343.22
## TotalBsmtSF      39 634   1001.49   335.76    954.50    984.78   266.13
## Heating*         40 634      2.01     0.13      2.00      2.00     0.00
## HeatingQC*       41 634      3.47     1.70      5.00      3.58     0.00
## CentralAir*      42 634      1.95     0.23      2.00      2.00     0.00
## Electrical*      43 634      2.80     0.59      3.00      2.99     0.00
## X1stFlrSF        44 634   1121.46   345.06   1056.00   1092.94   286.88
## X2ndFlrSF        45 634    228.71   363.28      0.00    161.02     0.00
## LowQualFinSF     46 634      5.32    47.28      0.00      0.00     0.00
## GrLivArea        47 634   1355.50   460.64   1263.00   1306.17   433.66
## BsmtFullBath     48 634      0.45     0.54      0.00      0.41     0.00
## BsmtHalfBath     49 634      0.09     0.29      0.00      0.00     0.00
## FullBath         50 634      1.32     0.52      1.00      1.28     0.00
## HalfBath         51 634      0.33     0.50      0.00      0.27     0.00
## BedroomAbvGr     52 634      2.87     0.83      3.00      2.84     0.00
## KitchenAbvGr     53 634      1.06     0.24      1.00      1.00     0.00
## KitchenQual*     54 634      3.70     0.60      4.00      3.83     0.00
## TotRmsAbvGrd     55 634      6.13     1.53      6.00      6.02     1.48
## Functional*      56 634      6.66     1.10      7.00      7.00     0.00
## Fireplaces       57 634      0.59     0.69      0.00      0.48     0.00
## FireplaceQu*     58 299      3.92     1.12      4.00      4.02     1.48
## GarageType*      59 588      3.34     1.85      2.00      3.19     0.00
## GarageYrBlt      60 588   1966.81    14.97   1967.00   1967.00    14.83
## GarageFinish*    61 588      2.46     0.72      3.00      2.57     0.00
## GarageCars       62 634      1.50     0.68      2.00      1.55     0.00
## GarageArea       63 634    410.85   192.21    440.00    413.15   195.70
## GarageQual*      64 588      3.93     0.36      4.00      4.00     0.00
## GarageCond*      65 588      4.89     0.55      5.00      5.00     0.00
## PavedDrive*      66 634      2.88     0.45      3.00      3.00     0.00
## WoodDeckSF       67 634     82.46   131.25      0.00     55.75     0.00
## OpenPorchSF      68 634     33.79    63.71      0.00     18.60     0.00
## EnclosedPorch    69 634     21.32    59.30      0.00      3.33     0.00
## X3SsnPorch       70 634      3.34    28.18      0.00      0.00     0.00
## ScreenPorch      71 634     18.96    60.77      0.00      0.24     0.00
## PoolArea         72 634      3.91    49.58      0.00      0.00     0.00
## PoolQC*          73   4      1.50     0.58      1.50      1.50     0.74
## Fence*           74 190      2.49     0.83      3.00      2.56     0.00
## MiscFeature*     75  35      2.89     0.53      3.00      3.00     0.00
## MiscVal          76 634     70.97   719.89      0.00      0.00     0.00
## MoSold           77 634      6.26     2.72      6.00      6.19     2.97
## YrSold           78 634   2007.87     1.35   2008.00   2007.83     1.48
## SaleType*        79 634      7.51     1.75      8.00      8.00     0.00
## SaleCondition*   80 634      4.60     1.17      5.00      4.97     0.00
## SalePrice        81 634 149805.71 48295.04 141500.00 144623.83 33358.50
## log_SalePrice    82 634     11.87     0.31     11.86     11.87     0.24
##                     min       max     range   skew kurtosis      se
## Id                 2.00   1460.00   1458.00  -0.01    -1.22   16.72
## MSSubClass        20.00    190.00    170.00   1.57     1.87    1.76
## MSZoning*          1.00      4.00      3.00  -0.29     7.50    0.02
## LotFrontage       21.00    313.00    292.00   2.07    19.17    1.10
## LotArea         1300.00 215245.00 213945.00  10.14   122.71  548.63
## Street*            1.00      2.00      1.00 -10.11   100.35    0.00
## Alley*             1.00      2.00      1.00   1.73     1.06    0.09
## LotShape*          1.00      4.00      3.00  -0.72    -1.47    0.06
## LandContour*       1.00      4.00      3.00  -3.33     9.93    0.03
## Utilities*         1.00      2.00      1.00  25.06   627.01    0.00
## LotConfig*         1.00      4.00      3.00  -1.18    -0.47    0.05
## LandSlope*         1.00      3.00      2.00   4.09    17.32    0.01
## Neighborhood*      1.00     23.00     22.00  -0.17    -0.58    0.20
## Condition1*        1.00      8.00      7.00   1.79    11.08    0.03
## Condition2*        1.00      6.00      5.00   8.12   197.74    0.01
## BldgType*          1.00      5.00      4.00   2.44     4.75    0.04
## HouseStyle*        1.00      8.00      7.00   0.65    -0.61    0.08
## OverallQual        2.00     10.00      8.00   0.24     1.28    0.04
## OverallCond        2.00      9.00      7.00   0.20     0.16    0.05
## YearBuilt       1875.00   2009.00    134.00  -1.18     2.39    0.66
## YearRemodAdd    1950.00   2010.00     60.00   0.31    -1.04    0.70
## RoofStyle*         1.00      6.00      5.00   1.22    -0.03    0.04
## RoofMatl*          1.00      7.00      6.00   6.28    39.15    0.03
## Exterior1st*       1.00     13.00     12.00   0.28    -0.79    0.11
## Exterior2nd*       1.00     14.00     13.00   0.03    -0.72    0.12
## MasVnrType*        1.00      4.00      3.00  -0.45     0.00    0.02
## MasVnrArea         0.00   1115.00   1115.00   2.27     7.42    5.89
## ExterQual*         1.00      4.00      3.00  -3.71    15.85    0.02
## ExterCond*         1.00      5.00      4.00  -2.17     3.25    0.03
## Foundation*        1.00      1.00      0.00    NaN      NaN    0.00
## BsmtQual*          1.00      4.00      3.00  -1.74     3.04    0.02
## BsmtCond*          1.00      3.00      2.00  -4.17    16.38    0.02
## BsmtExposure*      1.00      4.00      3.00  -1.39     0.27    0.04
## BsmtFinType1*      1.00      6.00      5.00   0.21    -1.47    0.07
## BsmtFinSF1         0.00   1880.00   1880.00   0.53     0.01   14.36
## BsmtFinType2*      1.00      6.00      5.00  -2.44     5.01    0.05
## BsmtFinSF2         0.00   1474.00   1474.00   3.05    10.18    8.03
## BsmtUnfSF          0.00   1907.00   1907.00   1.13     1.40   14.30
## TotalBsmtSF        0.00   2223.00   2223.00   0.43     1.37   13.33
## Heating*           1.00      4.00      3.00   8.79   115.84    0.01
## HeatingQC*         1.00      5.00      4.00  -0.44    -1.50    0.07
## CentralAir*        1.00      2.00      1.00  -3.95    13.65    0.01
## Electrical*        1.00      3.00      2.00  -2.64     5.13    0.02
## X1stFlrSF        438.00   2898.00   2460.00   0.94     1.37   13.70
## X2ndFlrSF          0.00   1540.00   1540.00   1.27     0.36   14.43
## LowQualFinSF       0.00    528.00    528.00   9.43    91.14    1.88
## GrLivArea        438.00   3447.00   3009.00   1.11     1.62   18.29
## BsmtFullBath       0.00      3.00      3.00   0.66    -0.38    0.02
## BsmtHalfBath       0.00      2.00      2.00   3.05     8.12    0.01
## FullBath           0.00      3.00      3.00   0.84    -0.03    0.02
## HalfBath           0.00      2.00      2.00   1.09    -0.01    0.02
## BedroomAbvGr       0.00      8.00      8.00   0.50     3.90    0.03
## KitchenAbvGr       0.00      2.00      2.00   3.47    11.93    0.01
## KitchenQual*       1.00      4.00      3.00  -2.35     6.08    0.02
## TotRmsAbvGrd       3.00     14.00     11.00   1.10     2.46    0.06
## Functional*        1.00      7.00      6.00  -3.19     9.38    0.04
## Fireplaces         0.00      3.00      3.00   0.85    -0.18    0.03
## FireplaceQu*       1.00      5.00      4.00  -0.31    -1.38    0.06
## GarageType*        1.00      6.00      5.00   0.68    -1.47    0.08
## GarageYrBlt     1906.00   2009.00    103.00  -0.27     0.74    0.62
## GarageFinish*      1.00      3.00      2.00  -0.94    -0.50    0.03
## GarageCars         0.00      4.00      4.00  -0.27     0.29    0.03
## GarageArea         0.00   1356.00   1356.00   0.22     1.67    7.63
## GarageQual*        1.00      4.00      3.00  -5.56    31.09    0.01
## GarageCond*        1.00      5.00      4.00  -5.14    25.63    0.02
## PavedDrive*        1.00      3.00      2.00  -3.78    12.70    0.02
## WoodDeckSF         0.00    857.00    857.00   1.90     4.41    5.21
## OpenPorchSF        0.00    523.00    523.00   2.88    10.94    2.53
## EnclosedPorch      0.00    318.00    318.00   2.78     6.72    2.35
## X3SsnPorch         0.00    407.00    407.00   9.55   102.06    1.12
## ScreenPorch        0.00    440.00    440.00   3.43    12.08    2.41
## PoolArea           0.00    738.00    738.00  12.77   164.07    1.97
## PoolQC*            1.00      2.00      1.00   0.00    -2.44    0.29
## Fence*             1.00      4.00      3.00  -0.62    -0.58    0.06
## MiscFeature*       1.00      4.00      3.00  -2.44     6.96    0.09
## MiscVal            0.00  15500.00  15500.00  18.00   357.03   28.59
## MoSold             1.00     12.00     11.00   0.21    -0.42    0.11
## YrSold          2006.00   2010.00      4.00   0.07    -1.22    0.05
## SaleType*          1.00      8.00      7.00  -3.33     9.25    0.07
## SaleCondition*     1.00      6.00      5.00  -2.65     5.23    0.05
## SalePrice      34900.00 402861.00 367961.00   1.52     4.27 1918.04
## log_SalePrice     10.46     12.91      2.45  -0.12     2.05    0.01
## ------------------------------------------------------------ 
## group: PConc
##                vars   n      mean       sd    median   trimmed      mad
## Id                1 647    727.94   418.74    732.00    727.53   529.29
## MSSubClass        2 647     60.26    41.01     60.00     54.51    59.30
## MSZoning*         3 647      2.88     0.69      3.00      2.99     0.00
## LotFrontage       4 542     71.70    25.58     70.00     70.38    19.27
## LotArea           5 647  10139.60  5585.26   9591.00   9668.80  3100.12
## Street*           6 647      1.00     0.00      1.00      1.00     0.00
## Alley*            7  34      1.79     0.41      2.00      1.86     0.00
## LotShape*         8 647      2.73     1.44      4.00      2.78     0.00
## LandContour*      9 647      3.82     0.62      4.00      4.00     0.00
## Utilities*       10 647      1.00     0.00      1.00      1.00     0.00
## LotConfig*       11 647      4.05     1.58      5.00      4.30     0.00
## LandSlope*       12 647      1.04     0.22      1.00      1.00     0.00
## Neighborhood*    13 647     10.84     5.97     12.00     10.70     8.90
## Condition1*      14 647      3.13     0.86      3.00      3.00     0.00
## Condition2*      15 647      2.00     0.07      2.00      2.00     0.00
## BldgType*        16 647      1.66     1.45      1.00      1.34     0.00
## HouseStyle*      17 647      4.44     1.71      3.00      4.44     2.97
## OverallQual      18 647      6.98     1.24      7.00      7.00     1.48
## OverallCond      19 647      5.20     0.68      5.00      5.04     0.00
## YearBuilt        20 647   1993.31    23.09   2002.00   1999.26     5.93
## YearRemodAdd     21 647   1998.05    13.59   2003.00   2001.29     5.93
## RoofStyle*       22 647      2.41     0.81      2.00      2.26     0.00
## RoofMatl*        23 647      2.01     0.20      2.00      2.00     0.00
## Exterior1st*     24 647      9.56     2.69     11.00     10.02     0.00
## Exterior2nd*     25 647     10.32     3.06     12.00     10.80     0.00
## MasVnrType*      26 639      2.81     0.70      3.00      2.77     1.48
## MasVnrArea       27 639    143.21   217.66     30.00     97.11    44.48
## ExterQual*       28 647      3.14     0.74      3.00      3.23     0.00
## ExterCond*       29 647      2.94     0.26      3.00      3.00     0.00
## Foundation*      30 647      1.00     0.00      1.00      1.00     0.00
## BsmtQual*        31 644      2.73     0.88      3.00      2.79     0.00
## BsmtCond*        32 644      2.92     0.29      3.00      3.00     0.00
## BsmtExposure*    33 643      3.02     1.25      4.00      3.15     0.00
## BsmtFinType1*    34 644      3.94     1.64      3.00      4.02     0.00
## BsmtFinSF1       35 647    492.05   544.04    405.00    425.76   600.45
## BsmtFinType2*    36 643      5.88     0.63      6.00      6.00     0.00
## BsmtFinSF2       37 647     21.32   119.74      0.00      0.00     0.00
## BsmtUnfSF        38 647    695.33   493.69    600.00    652.56   480.36
## TotalBsmtSF      39 647   1208.70   478.74   1151.00   1177.59   459.61
## Heating*         40 647      1.00     0.00      1.00      1.00     0.00
## HeatingQC*       41 647      1.46     0.97      1.00      1.23     0.00
## CentralAir*      42 647      1.99     0.11      2.00      2.00     0.00
## Electrical*      43 646      3.96     0.34      4.00      4.00     0.00
## X1stFlrSF        44 647   1248.05   428.38   1199.00   1215.09   446.26
## X2ndFlrSF        45 647    436.88   477.50      0.00    387.45     0.00
## LowQualFinSF     46 647      2.93    33.03      0.00      0.00     0.00
## GrLivArea        47 647   1687.86   520.92   1626.00   1641.11   410.68
## BsmtFullBath     48 647      0.47     0.51      0.00      0.46     0.00
## BsmtHalfBath     49 647      0.03     0.19      0.00      0.00     0.00
## FullBath         50 647      1.85     0.44      2.00      1.90     0.00
## HalfBath         51 647      0.49     0.51      0.00      0.48     0.00
## BedroomAbvGr     52 647      2.84     0.77      3.00      2.85     0.00
## KitchenAbvGr     53 647      1.01     0.12      1.00      1.00     0.00
## KitchenQual*     54 647      2.92     0.85      3.00      3.03     0.00
## TotRmsAbvGrd     55 647      6.87     1.59      7.00      6.80     1.48
## Functional*      56 647      4.94     0.43      5.00      5.00     0.00
## Fireplaces       57 647      0.69     0.59      1.00      0.66     0.00
## FireplaceQu*     58 404      3.65     1.15      3.00      3.71     0.00
## GarageType*      59 633      1.61     1.10      1.00      1.38     0.00
## GarageYrBlt      60 633   1996.24    16.97   2002.00   2000.22     5.93
## GarageFinish*    61 633      1.76     0.74      2.00      1.70     1.48
## GarageCars       62 647      2.15     0.62      2.00      2.19     0.00
## GarageArea       63 647    566.15   196.96    539.00    562.04   161.60
## GarageQual*      64 633      2.98     0.18      3.00      3.00     0.00
## GarageCond*      65 633      3.98     0.24      4.00      4.00     0.00
## PavedDrive*      66 647      2.95     0.31      3.00      3.00     0.00
## WoodDeckSF       67 647    118.43   119.68    120.00    102.27   177.91
## OpenPorchSF      68 647     63.61    61.88     48.00     54.41    47.44
## EnclosedPorch    69 647     10.24    44.99      0.00      0.00     0.00
## X3SsnPorch       70 647      3.70    31.21      0.00      0.00     0.00
## ScreenPorch      71 647     11.39    48.83      0.00      0.00     0.00
## PoolArea         72 647      2.39    35.12      0.00      0.00     0.00
## PoolQC*          73   3      1.33     0.58      1.00      1.33     0.00
## Fence*           74  51      2.18     0.91      3.00      2.22     0.00
## MiscFeature*     75   8      1.00     0.00      1.00      1.00     0.00
## MiscVal          76 647      9.40   102.85      0.00      0.00     0.00
## MoSold           77 647      6.38     2.73      6.00      6.30     2.97
## YrSold           78 647   2007.77     1.31   2008.00   2007.71     1.48
## SaleType*        79 647      8.53     1.06      9.00      8.74     0.00
## SaleCondition*   80 647      4.99     0.97      5.00      5.11     0.00
## SalePrice        81 647 225230.44 86865.98 205000.00 214800.87 62417.46
## log_SalePrice    82 647     12.26     0.35     12.23     12.25     0.31
##                     min       max     range   skew kurtosis      se
## Id                 1.00   1456.00   1455.00   0.01    -1.19   16.46
## MSSubClass        20.00    190.00    170.00   1.08     0.55    1.61
## MSZoning*          1.00      4.00      3.00  -1.65     3.05    0.03
## LotFrontage       24.00    313.00    289.00   2.11    15.31    1.10
## LotArea         2117.00  63887.00  61770.00   3.92    27.87  219.58
## Street*            1.00      1.00      0.00    NaN      NaN    0.00
## Alley*             1.00      2.00      1.00  -1.39    -0.06    0.07
## LotShape*          1.00      4.00      3.00  -0.29    -1.87    0.06
## LandContour*       1.00      4.00      3.00  -3.37    10.25    0.02
## Utilities*         1.00      1.00      0.00    NaN      NaN    0.00
## LotConfig*         1.00      5.00      4.00  -1.16    -0.47    0.06
## LandSlope*         1.00      3.00      2.00   5.78    36.25    0.01
## Neighborhood*      1.00     22.00     21.00   0.11    -1.36    0.23
## Condition1*        1.00      9.00      8.00   4.22    21.10    0.03
## Condition2*        1.00      3.00      2.00   4.82   211.78    0.00
## BldgType*          1.00      5.00      4.00   1.76     1.19    0.06
## HouseStyle*        1.00      8.00      7.00   0.04    -1.29    0.07
## OverallQual        3.00     10.00      7.00  -0.15     0.12    0.05
## OverallCond        2.00      9.00      7.00   2.35     8.78    0.03
## YearBuilt       1885.00   2010.00    125.00  -2.47     5.33    0.91
## YearRemodAdd    1950.00   2010.00     60.00  -2.34     4.99    0.53
## RoofStyle*         1.00      5.00      4.00   1.45     0.20    0.03
## RoofMatl*          1.00      5.00      4.00  12.67   182.05    0.01
## Exterior1st*       1.00     13.00     12.00  -1.38     0.55    0.11
## Exterior2nd*       1.00     14.00     13.00  -1.25     0.09    0.12
## MasVnrType*        1.00      4.00      3.00   0.26    -0.93    0.03
## MasVnrArea         0.00   1600.00   1600.00   2.32     7.19    8.61
## ExterQual*         1.00      4.00      3.00  -1.27     2.44    0.03
## ExterCond*         1.00      3.00      2.00  -4.25    18.90    0.01
## Foundation*        1.00      1.00      0.00    NaN      NaN    0.00
## BsmtQual*          1.00      4.00      3.00  -1.05     0.19    0.03
## BsmtCond*          1.00      3.00      2.00  -3.94    16.16    0.01
## BsmtExposure*      1.00      4.00      3.00  -0.70    -1.26    0.05
## BsmtFinType1*      1.00      6.00      5.00   0.15    -1.28    0.06
## BsmtFinSF1         0.00   5644.00   5644.00   1.82    11.25   21.39
## BsmtFinType2*      1.00      6.00      5.00  -5.98    37.13    0.02
## BsmtFinSF2         0.00   1127.00   1127.00   6.66    47.60    4.71
## BsmtUnfSF          0.00   2336.00   2336.00   0.70    -0.29   19.41
## TotalBsmtSF        0.00   6110.00   6110.00   2.21    17.29   18.82
## Heating*           1.00      1.00      0.00    NaN      NaN    0.00
## HeatingQC*         1.00      4.00      3.00   1.76     1.41    0.04
## CentralAir*        1.00      2.00      1.00  -8.80    75.64    0.00
## Electrical*        1.00      4.00      3.00  -8.01    63.75    0.01
## X1stFlrSF        520.00   4692.00   4172.00   1.46     6.77   16.84
## X2ndFlrSF          0.00   2065.00   2065.00   0.48    -1.08   18.77
## LowQualFinSF       0.00    481.00    481.00  12.34   155.17    1.30
## GrLivArea        672.00   5642.00   4970.00   1.86     8.46   20.48
## BsmtFullBath       0.00      2.00      2.00   0.28    -1.52    0.02
## BsmtHalfBath       0.00      2.00      2.00   5.79    35.60    0.01
## FullBath           0.00      3.00      3.00  -0.93     1.81    0.02
## HalfBath           0.00      2.00      2.00   0.12    -1.81    0.02
## BedroomAbvGr       0.00      6.00      6.00  -0.26     1.10    0.03
## KitchenAbvGr       1.00      3.00      2.00  11.10   137.25    0.00
## KitchenQual*       1.00      4.00      3.00  -1.13     0.94    0.03
## TotRmsAbvGrd       3.00     12.00      9.00   0.39     0.39    0.06
## Functional*        1.00      5.00      4.00  -8.10    66.90    0.02
## Fireplaces         0.00      3.00      3.00   0.30    -0.14    0.02
## FireplaceQu*       1.00      5.00      4.00  -0.14    -0.75    0.06
## GarageType*        1.00      4.00      3.00   1.38     0.14    0.04
## GarageYrBlt     1910.00   2010.00    100.00  -2.70     7.67    0.67
## GarageFinish*      1.00      3.00      2.00   0.41    -1.07    0.03
## GarageCars         0.00      4.00      4.00  -0.61     1.85    0.02
## GarageArea         0.00   1418.00   1418.00   0.23     1.46    7.74
## GarageQual*        1.00      3.00      2.00  -9.50    93.90    0.01
## GarageCond*        1.00      4.00      3.00 -11.26   129.48    0.01
## PavedDrive*        1.00      3.00      2.00  -5.89    33.68    0.01
## WoodDeckSF         0.00    668.00    668.00   1.08     1.59    4.71
## OpenPorchSF        0.00    406.00    406.00   1.50     2.97    2.43
## EnclosedPorch      0.00    552.00    552.00   5.71    43.05    1.77
## X3SsnPorch         0.00    508.00    508.00  10.57   132.73    1.23
## ScreenPorch        0.00    396.00    396.00   4.54    21.31    1.92
## PoolArea           0.00    555.00    555.00  14.63   213.08    1.38
## PoolQC*            1.00      2.00      1.00   0.38    -2.33    0.33
## Fence*             1.00      3.00      2.00  -0.34    -1.73    0.13
## MiscFeature*       1.00      1.00      0.00    NaN      NaN    0.00
## MiscVal            0.00   2000.00   2000.00  14.64   247.23    4.04
## MoSold             1.00     12.00     11.00   0.21    -0.45    0.11
## YrSold          2006.00   2010.00      4.00   0.15    -1.16    0.05
## SaleType*          1.00      9.00      8.00  -3.12    13.83    0.04
## SaleCondition*     1.00      6.00      5.00  -2.95    10.06    0.04
## SalePrice      78000.00 755000.00 677000.00   1.79     5.73 3415.05
## log_SalePrice     11.26     13.53      2.27   0.28     0.58    0.01
## ------------------------------------------------------------ 
## group: Slab
##                vars  n      mean       sd    median   trimmed      mad      min
## Id                1 24    781.67   400.82    882.00    800.25   469.24    18.00
## MSSubClass        2 24     63.12    42.16     72.50     59.75    29.65    20.00
## MSZoning*         3 24      2.08     0.41      2.00      2.05     0.00     1.00
## LotFrontage       4 19     65.21    11.32     64.00     64.06     5.93    50.00
## LotArea           5 24   9117.62  3554.16   8369.50   8585.55  2003.73  5000.00
## Street*           6 24      1.00     0.00      1.00      1.00     0.00     1.00
## Alley*            7  0       NaN       NA        NA       NaN       NA      Inf
## LotShape*         8 24      2.71     0.69      3.00      2.85     0.00     1.00
## LandContour*      9 24      2.79     0.59      3.00      2.95     0.00     1.00
## Utilities*       10 24      1.00     0.00      1.00      1.00     0.00     1.00
## LotConfig*       11 24      3.17     1.27      4.00      3.30     0.00     1.00
## LandSlope*       12 24      1.04     0.20      1.00      1.00     0.00     1.00
## Neighborhood*    13 24      4.75     2.47      6.00      4.80     2.97     1.00
## Condition1*      14 24      1.92     0.41      2.00      1.95     0.00     1.00
## Condition2*      15 24      1.00     0.00      1.00      1.00     0.00     1.00
## BldgType*        16 24      1.88     0.99      1.00      1.85     0.00     1.00
## HouseStyle*      17 24      2.08     0.65      2.00      2.05     0.00     1.00
## OverallQual      18 24      4.29     1.20      4.00      4.30     1.48     1.00
## OverallCond      19 24      4.75     1.07      5.00      4.70     0.00     3.00
## YearBuilt        20 24   1959.58    15.92   1955.00   1958.65    10.38  1930.00
## YearRemodAdd     21 24   1965.17    19.02   1956.00   1962.55     8.90  1950.00
## RoofStyle*       22 24      2.08     0.41      2.00      2.05     0.00     1.00
## RoofMatl*        23 24      1.04     0.20      1.00      1.00     0.00     1.00
## Exterior1st*     24 24      5.50     2.47      5.50      5.50     3.71     1.00
## Exterior2nd*     25 24      6.67     2.97      6.00      6.75     3.71     1.00
## MasVnrType*      26 24      1.83     0.38      2.00      1.90     0.00     1.00
## MasVnrArea       27 24     51.46   133.53      0.00     19.75     0.00     0.00
## ExterQual*       28 24      2.79     0.59      3.00      2.95     0.00     1.00
## ExterCond*       29 24      2.62     0.77      3.00      2.75     0.00     1.00
## Foundation*      30 24      1.00     0.00      1.00      1.00     0.00     1.00
## BsmtQual*        31  0       NaN       NA        NA       NaN       NA      Inf
## BsmtCond*        32  0       NaN       NA        NA       NaN       NA      Inf
## BsmtExposure*    33  0       NaN       NA        NA       NaN       NA      Inf
## BsmtFinType1*    34  0       NaN       NA        NA       NaN       NA      Inf
## BsmtFinSF1       35 24      0.00     0.00      0.00      0.00     0.00     0.00
## BsmtFinType2*    36  0       NaN       NA        NA       NaN       NA      Inf
## BsmtFinSF2       37 24      0.00     0.00      0.00      0.00     0.00     0.00
## BsmtUnfSF        38 24      0.00     0.00      0.00      0.00     0.00     0.00
## TotalBsmtSF      39 24      0.00     0.00      0.00      0.00     0.00     0.00
## Heating*         40 24      1.38     0.77      1.00      1.25     0.00     1.00
## HeatingQC*       41 24      2.88     1.19      3.00      2.95     1.48     1.00
## CentralAir*      42 24      1.62     0.49      2.00      1.65     0.00     1.00
## Electrical*      43 24      2.50     0.78      3.00      2.60     0.00     1.00
## X1stFlrSF        44 24   1118.42   430.45   1064.00   1117.25   347.67   334.00
## X2ndFlrSF        45 24    218.83   404.93      0.00    135.25     0.00     0.00
## LowQualFinSF     46 24      2.21    10.82      0.00      0.00     0.00     0.00
## GrLivArea        47 24   1339.46   506.17   1174.00   1321.30   501.12   334.00
## BsmtFullBath     48 24      0.00     0.00      0.00      0.00     0.00     0.00
## BsmtHalfBath     49 24      0.00     0.00      0.00      0.00     0.00     0.00
## FullBath         50 24      1.67     0.56      2.00      1.65     0.00     1.00
## HalfBath         51 24      0.00     0.00      0.00      0.00     0.00     0.00
## BedroomAbvGr     52 24      2.92     1.14      3.00      2.85     1.48     1.00
## KitchenAbvGr     53 24      1.46     0.51      1.00      1.45     0.00     1.00
## KitchenQual*     54 24      2.71     0.69      3.00      2.85     0.00     1.00
## TotRmsAbvGrd     55 24      6.50     2.28      6.00      6.45     2.22     2.00
## Functional*      56 24      3.46     0.98      4.00      3.65     0.00     1.00
## Fireplaces       57 24      0.33     0.56      0.00      0.25     0.00     0.00
## FireplaceQu*     58  7      2.86     1.21      3.00      2.86     1.48     1.00
## GarageType*      59 20      2.75     1.41      3.50      2.81     0.74     1.00
## GarageYrBlt      60 20   1967.20    14.89   1964.50   1966.25    19.27  1945.00
## GarageFinish*    61 20      1.90     0.31      2.00      2.00     0.00     1.00
## GarageCars       62 24      1.50     0.78      2.00      1.60     0.00     0.00
## GarageArea       63 24    375.04   203.45    405.00    382.85   167.53     0.00
## GarageQual*      64 20      1.00     0.00      1.00      1.00     0.00     1.00
## GarageCond*      65 20      1.95     0.22      2.00      2.00     0.00     1.00
## PavedDrive*      66 24      2.46     0.88      3.00      2.55     0.00     1.00
## WoodDeckSF       67 24     23.00    54.57      0.00     10.60     0.00     0.00
## OpenPorchSF      68 24      8.71    30.25      0.00      1.45     0.00     0.00
## EnclosedPorch    69 24     23.00    62.19      0.00      8.85     0.00     0.00
## X3SsnPorch       70 24      0.00     0.00      0.00      0.00     0.00     0.00
## ScreenPorch      71 24      0.00     0.00      0.00      0.00     0.00     0.00
## PoolArea         72 24      0.00     0.00      0.00      0.00     0.00     0.00
## PoolQC*          73  0       NaN       NA        NA       NaN       NA      Inf
## Fence*           74  2      1.50     0.71      1.50      1.50     0.74     1.00
## MiscFeature*     75  3      1.67     0.58      2.00      1.67     0.00     1.00
## MiscVal          76 24    216.67   746.39      0.00     25.00     0.00     0.00
## MoSold           77 24      5.83     2.46      6.00      5.85     1.48     1.00
## YrSold           78 24   2008.04     1.49   2009.00   2008.05     1.48  2006.00
## SaleType*        79 24      1.96     0.20      2.00      2.00     0.00     1.00
## SaleCondition*   80 24      1.88     0.34      2.00      1.95     0.00     1.00
## SalePrice        81 24 107365.62 34213.98 104150.00 105748.75 21884.66 39300.00
## log_SalePrice    82 24     11.53     0.34     11.55     11.55     0.22    10.58
##                     max     range  skew kurtosis      se
## Id               1413.0   1395.00 -0.43    -1.06   81.82
## MSSubClass        190.0    170.00  0.85     0.90    8.61
## MSZoning*           3.0      2.00  0.63     2.24    0.08
## LotFrontage       100.0     50.00  1.27     2.25    2.60
## LotArea         21750.0  16750.00  1.96     4.20  725.49
## Street*             1.0      0.00   NaN      NaN    0.00
## Alley*             -Inf      -Inf    NA       NA      NA
## LotShape*           3.0      2.00 -1.88     1.76    0.14
## LandContour*        3.0      2.00 -2.42     4.32    0.12
## Utilities*          1.0      0.00   NaN      NaN    0.00
## LotConfig*          4.0      3.00 -0.90    -1.08    0.26
## LandSlope*          2.0      1.00  4.30    17.24    0.04
## Neighborhood*       8.0      7.00 -0.17    -1.59    0.50
## Condition1*         3.0      2.00 -0.63     2.24    0.08
## Condition2*         1.0      0.00   NaN      NaN    0.00
## BldgType*           3.0      2.00  0.24    -1.98    0.20
## HouseStyle*         4.0      3.00  0.82     1.50    0.13
## OverallQual         7.0      6.00 -0.40     0.92    0.24
## OverallCond         7.0      4.00  0.08    -0.11    0.22
## YearBuilt        2003.0     73.00  0.78     0.32    3.25
## YearRemodAdd     2007.0     57.00  1.03    -0.26    3.88
## RoofStyle*          3.0      2.00  0.63     2.24    0.08
## RoofMatl*           2.0      1.00  4.30    17.24    0.04
## Exterior1st*       10.0      9.00  0.09    -1.19    0.50
## Exterior2nd*       11.0     10.00 -0.11    -1.30    0.61
## MasVnrType*         2.0      1.00 -1.68     0.86    0.08
## MasVnrArea        500.0    500.00  2.29     3.91   27.26
## ExterQual*          3.0      2.00 -2.42     4.32    0.12
## ExterCond*          3.0      2.00 -1.50     0.37    0.16
## Foundation*         1.0      0.00   NaN      NaN    0.00
## BsmtQual*          -Inf      -Inf    NA       NA      NA
## BsmtCond*          -Inf      -Inf    NA       NA      NA
## BsmtExposure*      -Inf      -Inf    NA       NA      NA
## BsmtFinType1*      -Inf      -Inf    NA       NA      NA
## BsmtFinSF1          0.0      0.00   NaN      NaN    0.00
## BsmtFinType2*      -Inf      -Inf    NA       NA      NA
## BsmtFinSF2          0.0      0.00   NaN      NaN    0.00
## BsmtUnfSF           0.0      0.00   NaN      NaN    0.00
## TotalBsmtSF         0.0      0.00   NaN      NaN    0.00
## Heating*            3.0      2.00  1.50     0.37    0.16
## HeatingQC*          4.0      3.00 -0.36    -1.54    0.24
## CentralAir*         2.0      1.00 -0.48    -1.84    0.10
## Electrical*         3.0      2.00 -1.05    -0.58    0.16
## X1stFlrSF        2020.0   1686.00  0.10    -0.60   87.87
## X2ndFlrSF        1427.0   1427.00  1.65     1.61   82.66
## LowQualFinSF       53.0     53.00  4.30    17.24    2.21
## GrLivArea        2320.0   1986.00  0.28    -0.86  103.32
## BsmtFullBath        0.0      0.00   NaN      NaN    0.00
## BsmtHalfBath        0.0      0.00   NaN      NaN    0.00
## FullBath            3.0      2.00  0.05    -0.91    0.12
## HalfBath            0.0      0.00   NaN      NaN    0.00
## BedroomAbvGr        6.0      5.00  0.66    -0.01    0.23
## KitchenAbvGr        2.0      1.00  0.16    -2.06    0.10
## KitchenQual*        3.0      2.00 -1.88     1.76    0.14
## TotRmsAbvGrd       12.0     10.00  0.26    -0.21    0.47
## Functional*         4.0      3.00 -1.50     0.83    0.20
## Fireplaces          2.0      2.00  1.34     0.73    0.12
## FireplaceQu*        4.0      3.00 -0.25    -1.81    0.46
## GarageType*         4.0      3.00 -0.33    -1.86    0.32
## GarageYrBlt      2003.0     58.00  0.51    -0.55    3.33
## GarageFinish*       2.0      1.00 -2.47     4.32    0.07
## GarageCars          2.0      2.00 -1.05    -0.58    0.16
## GarageArea        672.0    672.00 -0.64    -0.53   41.53
## GarageQual*         1.0      0.00   NaN      NaN    0.00
## GarageCond*         2.0      1.00 -3.82    13.29    0.05
## PavedDrive*         3.0      2.00 -0.97    -1.04    0.18
## WoodDeckSF        186.0    186.00  1.94     2.25   11.14
## OpenPorchSF       144.0    144.00  3.75    13.70    6.18
## EnclosedPorch     190.0    190.00  2.13     2.67   12.69
## X3SsnPorch          0.0      0.00   NaN      NaN    0.00
## ScreenPorch         0.0      0.00   NaN      NaN    0.00
## PoolArea            0.0      0.00   NaN      NaN    0.00
## PoolQC*            -Inf      -Inf    NA       NA      NA
## Fence*              2.0      1.00  0.00    -2.75    0.50
## MiscFeature*        2.0      1.00 -0.38    -2.33    0.33
## MiscVal          3500.0   3500.00  3.62    12.73  152.36
## MoSold             11.0     10.00  0.05    -0.13    0.50
## YrSold           2010.0      4.00 -0.30    -1.62    0.30
## SaleType*           2.0      1.00 -4.30    17.24    0.04
## SaleCondition*      2.0      1.00 -2.13     2.64    0.07
## SalePrice      198500.0 159200.00  0.61     0.61 6983.90
## log_SalePrice      12.2      1.62 -0.65     1.13    0.07
## ------------------------------------------------------------ 
## group: Stone
##                vars n      mean       sd    median   trimmed      mad       min
## Id                1 6    888.50   436.03    810.50    888.50   430.70    247.00
## MSSubClass        2 6     78.33    58.11     70.00     78.33    14.83     20.00
## MSZoning*         3 6      2.33     0.82      2.50      2.33     0.74      1.00
## LotFrontage       4 6     66.67     4.63     66.00     66.67     2.97     60.00
## LotArea           5 6   9014.67  1622.67   8967.00   9014.67   318.76   6600.00
## Street*           6 6      1.00     0.00      1.00      1.00     0.00      1.00
## Alley*            7 3      1.67     0.58      2.00      1.67     0.00      1.00
## LotShape*         8 6      1.83     0.41      2.00      1.83     0.00      1.00
## LandContour*      9 6      1.83     0.41      2.00      1.83     0.00      1.00
## Utilities*       10 6      1.00     0.00      1.00      1.00     0.00      1.00
## LotConfig*       11 6      1.50     0.55      1.50      1.50     0.74      1.00
## LandSlope*       12 6      1.17     0.41      1.00      1.17     0.00      1.00
## Neighborhood*    13 6      3.00     1.26      3.50      3.00     0.74      1.00
## Condition1*      14 6      1.00     0.00      1.00      1.00     0.00      1.00
## Condition2*      15 6      1.00     0.00      1.00      1.00     0.00      1.00
## BldgType*        16 6      1.17     0.41      1.00      1.17     0.00      1.00
## HouseStyle*      17 6      2.50     0.84      3.00      2.50     0.00      1.00
## OverallQual      18 6      5.67     1.21      5.50      5.67     1.48      4.00
## OverallCond      19 6      7.00     1.67      7.00      7.00     0.74      4.00
## YearBuilt        20 6   1912.67    28.61   1905.00   1912.67    28.17   1880.00
## YearRemodAdd     21 6   1978.33    26.34   1980.50   1978.33    35.58   1950.00
## RoofStyle*       22 6      1.17     0.41      1.00      1.17     0.00      1.00
## RoofMatl*        23 6      1.00     0.00      1.00      1.00     0.00      1.00
## Exterior1st*     24 6      3.50     1.87      3.50      3.50     2.22      1.00
## Exterior2nd*     25 6      3.50     1.87      3.50      3.50     2.22      1.00
## MasVnrType*      26 6      1.00     0.00      1.00      1.00     0.00      1.00
## MasVnrArea       27 6      0.00     0.00      0.00      0.00     0.00      0.00
## ExterQual*       28 6      2.33     0.82      2.50      2.33     0.74      1.00
## ExterCond*       29 6      2.50     0.84      3.00      2.50     0.00      1.00
## Foundation*      30 6      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtQual*        31 6      1.83     0.41      2.00      1.83     0.00      1.00
## BsmtCond*        32 6      2.50     0.84      3.00      2.50     0.00      1.00
## BsmtExposure*    33 6      2.50     0.84      3.00      2.50     0.00      1.00
## BsmtFinType1*    34 6      1.83     0.41      2.00      1.83     0.00      1.00
## BsmtFinSF1       35 6     45.83   112.27      0.00     45.83     0.00      0.00
## BsmtFinType2*    36 6      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtFinSF2       37 6      0.00     0.00      0.00      0.00     0.00      0.00
## BsmtUnfSF        38 6    849.17   389.25    935.50    849.17   119.35    105.00
## TotalBsmtSF      39 6    895.00   408.88   1007.00    895.00   217.20    105.00
## Heating*         40 6      1.17     0.41      1.00      1.17     0.00      1.00
## HeatingQC*       41 6      2.17     0.75      2.00      2.17     0.74      1.00
## CentralAir*      42 6      1.50     0.55      1.50      1.50     0.74      1.00
## Electrical*      43 6      1.83     0.41      2.00      1.83     0.00      1.00
## X1stFlrSF        44 6   1093.83   229.89   1049.00   1093.83   245.37    859.00
## X2ndFlrSF        45 6    800.83   519.94   1007.00    800.83   339.52      0.00
## LowQualFinSF     46 6      0.00     0.00      0.00      0.00     0.00      0.00
## GrLivArea        47 6   1894.67   702.28   2134.00   1894.67   551.53    910.00
## BsmtFullBath     48 6      0.00     0.00      0.00      0.00     0.00      0.00
## BsmtHalfBath     49 6      0.00     0.00      0.00      0.00     0.00      0.00
## FullBath         50 6      1.50     0.55      1.50      1.50     0.74      1.00
## HalfBath         51 6      0.17     0.41      0.00      0.17     0.00      0.00
## BedroomAbvGr     52 6      3.50     0.84      4.00      3.50     0.00      2.00
## KitchenAbvGr     53 6      1.33     0.52      1.00      1.33     0.00      1.00
## KitchenQual*     54 6      2.17     0.75      2.00      2.17     0.74      1.00
## TotRmsAbvGrd     55 6      8.17     2.04      8.50      8.17     1.48      5.00
## Functional*      56 6      1.83     0.41      2.00      1.83     0.00      1.00
## Fireplaces       57 6      0.50     0.84      0.00      0.50     0.00      0.00
## FireplaceQu*     58 2      1.00     0.00      1.00      1.00     0.00      1.00
## GarageType*      59 6      1.50     0.55      1.50      1.50     0.74      1.00
## GarageYrBlt      60 6   1950.50    24.94   1951.50   1950.50    17.05   1910.00
## GarageFinish*    61 6      1.50     0.55      1.50      1.50     0.74      1.00
## GarageCars       62 6      1.67     1.21      1.00      1.67     0.00      1.00
## GarageArea       63 6    464.33   207.58    423.00    464.33    41.51    252.00
## GarageQual*      64 6      1.83     0.41      2.00      1.83     0.00      1.00
## GarageCond*      65 6      1.83     0.41      2.00      1.83     0.00      1.00
## PavedDrive*      66 6      1.67     0.52      2.00      1.67     0.00      1.00
## WoodDeckSF       67 6     74.17    92.52     34.00     74.17    50.41      0.00
## OpenPorchSF      68 6     67.83   111.32     30.00     67.83    44.48      0.00
## EnclosedPorch    69 6    124.33   142.05    105.00    124.33   111.19      0.00
## X3SsnPorch       70 6      0.00     0.00      0.00      0.00     0.00      0.00
## ScreenPorch      71 6      0.00     0.00      0.00      0.00     0.00      0.00
## PoolArea         72 6      0.00     0.00      0.00      0.00     0.00      0.00
## PoolQC*          73 0       NaN       NA        NA       NaN       NA       Inf
## Fence*           74 2      1.50     0.71      1.50      1.50     0.74      1.00
## MiscFeature*     75 1      1.00       NA      1.00      1.00     0.00      1.00
## MiscVal          76 6    416.67  1020.62      0.00    416.67     0.00      0.00
## MoSold           77 6      6.17     4.07      5.00      6.17     3.71      1.00
## YrSold           78 6   2008.67     1.51   2009.00   2008.67     1.48   2006.00
## SaleType*        79 6      1.00     0.00      1.00      1.00     0.00      1.00
## SaleCondition*   80 6      1.83     0.41      2.00      1.83     0.00      1.00
## SalePrice        81 6 165959.17 78557.70 126500.00 165959.17 31671.30 102776.00
## log_SalePrice    82 6     11.93     0.44     11.74     11.93     0.27     11.54
##                      max     range  skew kurtosis       se
## Id               1458.00   1211.00 -0.04    -1.60   178.01
## MSSubClass        190.00    170.00  0.99    -0.55    23.72
## MSZoning*           3.00      2.00 -0.48    -1.58     0.33
## LotFrontage        74.00     14.00  0.18    -1.23     1.89
## LotArea         11700.00   5100.00  0.21    -0.93   662.45
## Street*             1.00      0.00   NaN      NaN     0.00
## Alley*              2.00      1.00 -0.38    -2.33     0.33
## LotShape*           2.00      1.00 -1.36    -0.08     0.17
## LandContour*        2.00      1.00 -1.36    -0.08     0.17
## Utilities*          1.00      0.00   NaN      NaN     0.00
## LotConfig*          2.00      1.00  0.00    -2.31     0.22
## LandSlope*          2.00      1.00  1.36    -0.08     0.17
## Neighborhood*       4.00      3.00 -0.49    -1.70     0.52
## Condition1*         1.00      0.00   NaN      NaN     0.00
## Condition2*         1.00      0.00   NaN      NaN     0.00
## BldgType*           2.00      1.00  1.36    -0.08     0.17
## HouseStyle*         3.00      2.00 -0.85    -1.17     0.34
## OverallQual         7.00      3.00 -0.04    -1.88     0.49
## OverallCond         9.00      5.00 -0.64    -0.92     0.68
## YearBuilt        1953.00     73.00  0.30    -1.85    11.68
## YearRemodAdd     2006.00     56.00 -0.06    -2.18    10.75
## RoofStyle*          2.00      1.00  1.36    -0.08     0.17
## RoofMatl*           1.00      0.00   NaN      NaN     0.00
## Exterior1st*        6.00      5.00  0.00    -1.80     0.76
## Exterior2nd*        6.00      5.00  0.00    -1.80     0.76
## MasVnrType*         1.00      0.00   NaN      NaN     0.00
## MasVnrArea          0.00      0.00   NaN      NaN     0.00
## ExterQual*          3.00      2.00 -0.48    -1.58     0.33
## ExterCond*          3.00      2.00 -0.85    -1.17     0.34
## Foundation*         1.00      0.00   NaN      NaN     0.00
## BsmtQual*           2.00      1.00 -1.36    -0.08     0.17
## BsmtCond*           3.00      2.00 -0.85    -1.17     0.34
## BsmtExposure*       3.00      2.00 -0.85    -1.17     0.34
## BsmtFinType1*       2.00      1.00 -1.36    -0.08     0.17
## BsmtFinSF1        275.00    275.00  1.36    -0.08    45.83
## BsmtFinType2*       1.00      0.00   NaN      NaN     0.00
## BsmtFinSF2          0.00      0.00   NaN      NaN     0.00
## BsmtUnfSF        1240.00   1135.00 -0.97    -0.59   158.91
## TotalBsmtSF      1240.00   1135.00 -1.05    -0.56   166.92
## Heating*            2.00      1.00  1.36    -0.08     0.17
## HeatingQC*          3.00      2.00 -0.17    -1.54     0.31
## CentralAir*         2.00      1.00  0.00    -2.31     0.22
## Electrical*         2.00      1.00 -1.36    -0.08     0.17
## X1stFlrSF        1378.00    519.00  0.13    -2.13    93.85
## X2ndFlrSF        1320.00   1320.00 -0.50    -1.73   212.27
## LowQualFinSF        0.00      0.00   NaN      NaN     0.00
## GrLivArea        2640.00   1730.00 -0.34    -1.90   286.70
## BsmtFullBath        0.00      0.00   NaN      NaN     0.00
## BsmtHalfBath        0.00      0.00   NaN      NaN     0.00
## FullBath            2.00      1.00  0.00    -2.31     0.22
## HalfBath            1.00      1.00  1.36    -0.08     0.17
## BedroomAbvGr        4.00      2.00 -0.85    -1.17     0.34
## KitchenAbvGr        2.00      1.00  0.54    -1.96     0.21
## KitchenQual*        3.00      2.00 -0.17    -1.54     0.31
## TotRmsAbvGrd       11.00      6.00 -0.19    -1.39     0.83
## Functional*         2.00      1.00 -1.36    -0.08     0.17
## Fireplaces          2.00      2.00  0.85    -1.17     0.34
## FireplaceQu*        1.00      0.00   NaN      NaN     0.00
## GarageType*         2.00      1.00  0.00    -2.31     0.22
## GarageYrBlt      1985.00     75.00 -0.26    -1.21    10.18
## GarageFinish*       2.00      1.00  0.00    -2.31     0.22
## GarageCars          4.00      3.00  1.08    -0.64     0.49
## GarageArea        864.00    612.00  1.00    -0.52    84.74
## GarageQual*         2.00      1.00 -1.36    -0.08     0.17
## GarageCond*         2.00      1.00 -1.36    -0.08     0.17
## PavedDrive*         2.00      1.00 -0.54    -1.96     0.21
## WoodDeckSF        196.00    196.00  0.38    -2.00    37.77
## OpenPorchSF       287.00    287.00  1.16    -0.43    45.45
## EnclosedPorch     386.00    386.00  0.82    -0.88    57.99
## X3SsnPorch          0.00      0.00   NaN      NaN     0.00
## ScreenPorch         0.00      0.00   NaN      NaN     0.00
## PoolArea            0.00      0.00   NaN      NaN     0.00
## PoolQC*             -Inf      -Inf    NA       NA       NA
## Fence*              2.00      1.00  0.00    -2.75     0.50
## MiscFeature*        1.00      0.00    NA       NA       NA
## MiscVal          2500.00   2500.00  1.36    -0.08   416.67
## MoSold             12.00     11.00  0.26    -1.72     1.66
## YrSold           2010.00      4.00 -0.71    -1.15     0.61
## SaleType*           1.00      0.00   NaN      NaN     0.00
## SaleCondition*      2.00      1.00 -1.36    -0.08     0.17
## SalePrice      266500.00 163724.00  0.49    -1.96 32071.05
## log_SalePrice      12.49      0.95  0.43    -1.97     0.18
## ------------------------------------------------------------ 
## group: Wood
##                vars n      mean       sd    median   trimmed      mad       min
## Id                1 3    799.67   687.51   1181.00    799.67    45.96      6.00
## MSSubClass        2 3     53.33     5.77     50.00     53.33     0.00     50.00
## MSZoning*         3 3      1.00     0.00      1.00      1.00     0.00      1.00
## LotFrontage       4 2    118.50    47.38    118.50    118.50    49.67     85.00
## LotArea           5 3  12473.00  1501.48  12134.00  12473.00  1429.23  11170.00
## Street*           6 3      1.00     0.00      1.00      1.00     0.00      1.00
## Alley*            7 0       NaN       NA        NA       NaN       NA       Inf
## LotShape*         8 3      1.33     0.58      1.00      1.33     0.00      1.00
## LandContour*      9 3      1.67     0.58      2.00      1.67     0.00      1.00
## Utilities*       10 3      1.00     0.00      1.00      1.00     0.00      1.00
## LotConfig*       11 3      1.67     0.58      2.00      1.67     0.00      1.00
## LandSlope*       12 3      1.33     0.58      1.00      1.33     0.00      1.00
## Neighborhood*    13 3      2.00     1.00      2.00      2.00     1.48      1.00
## Condition1*      14 3      1.00     0.00      1.00      1.00     0.00      1.00
## Condition2*      15 3      1.00     0.00      1.00      1.00     0.00      1.00
## BldgType*        16 3      1.00     0.00      1.00      1.00     0.00      1.00
## HouseStyle*      17 3      1.33     0.58      1.00      1.33     0.00      1.00
## OverallQual      18 3      6.67     1.53      7.00      6.67     1.48      5.00
## OverallCond      19 3      5.67     1.15      5.00      5.67     0.00      5.00
## YearBuilt        20 3   1990.33     2.52   1990.00   1990.33     2.97   1988.00
## YearRemodAdd     21 3   1997.00     7.21   1995.00   1997.00     5.93   1991.00
## RoofStyle*       22 3      1.00     0.00      1.00      1.00     0.00      1.00
## RoofMatl*        23 3      1.00     0.00      1.00      1.00     0.00      1.00
## Exterior1st*     24 3      2.00     1.00      2.00      2.00     1.48      1.00
## Exterior2nd*     25 3      2.00     1.00      2.00      2.00     1.48      1.00
## MasVnrType*      26 3      1.00     0.00      1.00      1.00     0.00      1.00
## MasVnrArea       27 3      0.00     0.00      0.00      0.00     0.00      0.00
## ExterQual*       28 3      1.67     0.58      2.00      1.67     0.00      1.00
## ExterCond*       29 3      1.00     0.00      1.00      1.00     0.00      1.00
## Foundation*      30 3      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtQual*        31 3      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtCond*        32 3      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtExposure*    33 3      1.67     0.58      2.00      1.67     0.00      1.00
## BsmtFinType1*    34 3      1.33     0.58      1.00      1.33     0.00      1.00
## BsmtFinSF1       35 3    791.67   397.87    732.00    791.67   452.19    427.00
## BsmtFinType2*    36 3      1.00     0.00      1.00      1.00     0.00      1.00
## BsmtFinSF2       37 3      0.00     0.00      0.00      0.00     0.00      0.00
## BsmtUnfSF        38 3     65.33    66.01     64.00     65.33    94.89      0.00
## TotalBsmtSF      39 3    857.00   332.72    796.00    857.00   351.38    559.00
## Heating*         40 3      1.00     0.00      1.00      1.00     0.00      1.00
## HeatingQC*       41 3      1.33     0.58      1.00      1.33     0.00      1.00
## CentralAir*      42 3      1.00     0.00      1.00      1.00     0.00      1.00
## Electrical*      43 3      1.00     0.00      1.00      1.00     0.00      1.00
## X1stFlrSF        44 3   1058.00   251.72   1080.00   1058.00   323.21    796.00
## X2ndFlrSF        45 3    818.00   348.73    672.00    818.00   157.16    566.00
## LowQualFinSF     46 3      0.00     0.00      0.00      0.00     0.00      0.00
## GrLivArea        47 3   1876.00   585.92   1752.00   1876.00   578.21   1362.00
## BsmtFullBath     48 3      0.33     0.58      0.00      0.33     0.00      0.00
## BsmtHalfBath     49 3      0.00     0.00      0.00      0.00     0.00      0.00
## FullBath         50 3      1.67     0.58      2.00      1.67     0.00      1.00
## HalfBath         51 3      0.67     0.58      1.00      0.67     0.00      0.00
## BedroomAbvGr     52 3      3.00     1.73      4.00      3.00     0.00      1.00
## KitchenAbvGr     53 3      1.00     0.00      1.00      1.00     0.00      1.00
## KitchenQual*     54 3      1.00     0.00      1.00      1.00     0.00      1.00
## TotRmsAbvGrd     55 3      7.00     1.73      8.00      7.00     0.00      5.00
## Functional*      56 3      1.00     0.00      1.00      1.00     0.00      1.00
## Fireplaces       57 3      0.00     0.00      0.00      0.00     0.00      0.00
## FireplaceQu*     58 0       NaN       NA        NA       NaN       NA       Inf
## GarageType*      59 3      1.33     0.58      1.00      1.33     0.00      1.00
## GarageYrBlt      60 3   1990.33     2.52   1990.00   1990.33     2.97   1988.00
## GarageFinish*    61 3      2.00     1.00      2.00      2.00     1.48      1.00
## GarageCars       62 3      2.00     0.00      2.00      2.00     0.00      2.00
## GarageArea       63 3    555.00   119.66    492.00    555.00    17.79    480.00
## GarageQual*      64 3      1.00     0.00      1.00      1.00     0.00      1.00
## GarageCond*      65 3      1.00     0.00      1.00      1.00     0.00      1.00
## PavedDrive*      66 3      1.00     0.00      1.00      1.00     0.00      1.00
## WoodDeckSF       67 3    121.67   177.22     40.00    121.67    59.30      0.00
## OpenPorchSF      68 3     14.00    15.10     12.00     14.00    17.79      0.00
## EnclosedPorch    69 3      0.00     0.00      0.00      0.00     0.00      0.00
## X3SsnPorch       70 3    106.67   184.75      0.00    106.67     0.00      0.00
## ScreenPorch      71 3      0.00     0.00      0.00      0.00     0.00      0.00
## PoolArea         72 3      0.00     0.00      0.00      0.00     0.00      0.00
## PoolQC*          73 0       NaN       NA        NA       NaN       NA       Inf
## Fence*           74 2      1.50     0.71      1.50      1.50     0.74      1.00
## MiscFeature*     75 1      1.00       NA      1.00      1.00     0.00      1.00
## MiscVal          76 3    233.33   404.15      0.00    233.33     0.00      0.00
## MoSold           77 3      6.67     3.06      6.00      6.67     2.97      4.00
## YrSold           78 3   2008.33     2.08   2009.00   2008.33     1.48   2006.00
## SaleType*        79 3      1.00     0.00      1.00      1.00     0.00      1.00
## SaleCondition*   80 3      1.00     0.00      1.00      1.00     0.00      1.00
## SalePrice        81 3 185666.67 56695.09 164000.00 185666.67 31134.60 143000.00
## log_SalePrice    82 3     12.10     0.29     12.01     12.10     0.20     11.87
##                      max     range  skew kurtosis       se
## Id               1212.00   1206.00 -0.38    -2.33   396.93
## MSSubClass         60.00     10.00  0.38    -2.33     3.33
## MSZoning*           1.00      0.00   NaN      NaN     0.00
## LotFrontage       152.00     67.00  0.00    -2.75    33.50
## LotArea         14115.00   2945.00  0.21    -2.33   866.88
## Street*             1.00      0.00   NaN      NaN     0.00
## Alley*              -Inf      -Inf    NA       NA       NA
## LotShape*           2.00      1.00  0.38    -2.33     0.33
## LandContour*        2.00      1.00 -0.38    -2.33     0.33
## Utilities*          1.00      0.00   NaN      NaN     0.00
## LotConfig*          2.00      1.00 -0.38    -2.33     0.33
## LandSlope*          2.00      1.00  0.38    -2.33     0.33
## Neighborhood*       3.00      2.00  0.00    -2.33     0.58
## Condition1*         1.00      0.00   NaN      NaN     0.00
## Condition2*         1.00      0.00   NaN      NaN     0.00
## BldgType*           1.00      0.00   NaN      NaN     0.00
## HouseStyle*         2.00      1.00  0.38    -2.33     0.33
## OverallQual         8.00      3.00 -0.21    -2.33     0.88
## OverallCond         7.00      2.00  0.38    -2.33     0.67
## YearBuilt        1993.00      5.00  0.13    -2.33     1.45
## YearRemodAdd     2005.00     14.00  0.26    -2.33     4.16
## RoofStyle*          1.00      0.00   NaN      NaN     0.00
## RoofMatl*           1.00      0.00   NaN      NaN     0.00
## Exterior1st*        3.00      2.00  0.00    -2.33     0.58
## Exterior2nd*        3.00      2.00  0.00    -2.33     0.58
## MasVnrType*         1.00      0.00   NaN      NaN     0.00
## MasVnrArea          0.00      0.00   NaN      NaN     0.00
## ExterQual*          2.00      1.00 -0.38    -2.33     0.33
## ExterCond*          1.00      0.00   NaN      NaN     0.00
## Foundation*         1.00      0.00   NaN      NaN     0.00
## BsmtQual*           1.00      0.00   NaN      NaN     0.00
## BsmtCond*           1.00      0.00   NaN      NaN     0.00
## BsmtExposure*       2.00      1.00 -0.38    -2.33     0.33
## BsmtFinType1*       2.00      1.00  0.38    -2.33     0.33
## BsmtFinSF1       1216.00    789.00  0.15    -2.33   229.71
## BsmtFinType2*       1.00      0.00   NaN      NaN     0.00
## BsmtFinSF2          0.00      0.00   NaN      NaN     0.00
## BsmtUnfSF         132.00    132.00  0.02    -2.33    38.11
## TotalBsmtSF      1216.00    657.00  0.18    -2.33   192.10
## Heating*            1.00      0.00   NaN      NaN     0.00
## HeatingQC*          2.00      1.00  0.38    -2.33     0.33
## CentralAir*         1.00      0.00   NaN      NaN     0.00
## Electrical*         1.00      0.00   NaN      NaN     0.00
## X1stFlrSF        1298.00    502.00 -0.09    -2.33   145.33
## X2ndFlrSF        1216.00    650.00  0.35    -2.33   201.34
## LowQualFinSF        0.00      0.00   NaN      NaN     0.00
## GrLivArea        2514.00   1152.00  0.20    -2.33   338.28
## BsmtFullBath        1.00      1.00  0.38    -2.33     0.33
## BsmtHalfBath        0.00      0.00   NaN      NaN     0.00
## FullBath            2.00      1.00 -0.38    -2.33     0.33
## HalfBath            1.00      1.00 -0.38    -2.33     0.33
## BedroomAbvGr        4.00      3.00 -0.38    -2.33     1.00
## KitchenAbvGr        1.00      0.00   NaN      NaN     0.00
## KitchenQual*        1.00      0.00   NaN      NaN     0.00
## TotRmsAbvGrd        8.00      3.00 -0.38    -2.33     1.00
## Functional*         1.00      0.00   NaN      NaN     0.00
## Fireplaces          0.00      0.00   NaN      NaN     0.00
## FireplaceQu*        -Inf      -Inf    NA       NA       NA
## GarageType*         2.00      1.00  0.38    -2.33     0.33
## GarageYrBlt      1993.00      5.00  0.13    -2.33     1.45
## GarageFinish*       3.00      2.00  0.00    -2.33     0.58
## GarageCars          2.00      0.00   NaN      NaN     0.00
## GarageArea        693.00    213.00  0.38    -2.33    69.09
## GarageQual*         1.00      0.00   NaN      NaN     0.00
## GarageCond*         1.00      0.00   NaN      NaN     0.00
## PavedDrive*         1.00      0.00   NaN      NaN     0.00
## WoodDeckSF        325.00    325.00  0.36    -2.33   102.32
## OpenPorchSF        30.00     30.00  0.13    -2.33     8.72
## EnclosedPorch       0.00      0.00   NaN      NaN     0.00
## X3SsnPorch        320.00    320.00  0.38    -2.33   106.67
## ScreenPorch         0.00      0.00   NaN      NaN     0.00
## PoolArea            0.00      0.00   NaN      NaN     0.00
## PoolQC*             -Inf      -Inf    NA       NA       NA
## Fence*              2.00      1.00  0.00    -2.75     0.50
## MiscFeature*        1.00      0.00    NA       NA       NA
## MiscVal           700.00    700.00  0.38    -2.33   233.33
## MoSold             10.00      6.00  0.21    -2.33     1.76
## YrSold           2010.00      4.00 -0.29    -2.33     1.20
## SaleType*           1.00      0.00   NaN      NaN     0.00
## SaleCondition*      1.00      0.00   NaN      NaN     0.00
## SalePrice      250000.00 107000.00  0.33    -2.33 32732.93
## log_SalePrice      12.43      0.56  0.29    -2.33     0.17

PLOT Univariate Descriptive Statistics

Ten popular variables for describing houses for sale were selected: (1) location of the house (Neighborhood), (2) number of full bathrooms (FullBath), (3) primary proximity condition (Condition1), (4) secondary proximity condition (Condition2), (5) kitchen quality (KitchenQual), (6) ground living area in square feet (GrLivArea), (7) month sold (MoSold), (8) heating type (Heating), (9) type of sale (SaleType), and (10) the dependent variable, sale price (SalePrice).

The boxplot, drawn with plotly, shows the distribution of log_SalePrice for homes grouped by FullBath. Homes with 3 FullBath have the highest median sale price and no variation. Homes with 1 FullBath have the lowest median sale price and a lot of variation. Homes with no FullBath appear to be the most symmetrically distributed around their median value.

Variable Description
FullBath Full bathrooms above grade

The dots above and below the FullBath groups indicate those data points are outliers (i.e., extremely high or low), as seen for the homes with one FullBath and two FullBath.

#library(ggplot2)
#library(plotly)

fig <- plot_ly(data = log_train, y = ~log_SalePrice, x = ~FullBath, color = ~FullBath, type = "box", showlegend = FALSE)

fig

The notched boxplot adds a confidence interval (approximately 95% by default) around the median of each group. The width of the notch reflects the uncertainty in the median estimate.

In the plot, GrLivArea is grouped by Heating type: the OthW group has the highest median value and no outliers, the GasA group contains a significant number of outliers, and the GasW group appears to be symmetrically distributed around its median value.

Variable Description
GrLivArea Above grade (ground) living area square feet
fig2 <- plot_ly(data = log_train, x = ~Heating, y = ~GrLivArea, type = "box", color = ~Heating, notched = TRUE, showlegend = FALSE)

fig2
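As a rough illustration of what a notch represents, the sketch below rebuilds the notch limits that base R's boxplot.stats() reports, using the rule median ± 1.58 × IQR / sqrt(n). The data are simulated stand-ins (not the Ames values), and plotly may compute its notches with slightly different details, so treat this as an approximation of the idea rather than a description of fig2.

```r
# Notch interval as computed by base R's boxplot.stats():
# median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
set.seed(1)
area <- rlnorm(200, meanlog = 7.3, sdlog = 0.3)  # hypothetical living-area values

stats <- boxplot.stats(area)
hinge_spread <- stats$stats[4] - stats$stats[2]   # distance between the hinges
manual_notch <- stats$stats[3] + c(-1.58, 1.58) * hinge_spread / sqrt(stats$n)

print(stats$conf)    # notch limits reported by boxplot.stats()
print(manual_notch)  # the same limits rebuilt from the formula
```

When the notches of two groups do not overlap, their medians are usually taken to differ at roughly the 5% level.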

Plots on Qualitative variables

The barplot shows the frequency (count) of each KitchenQual rating in the house prices data set. Four of the possible ratings appear in the data (TA, Gd, Ex, and Fa), arranged here in descending order of frequency.

KitchenQual Kitchen quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
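A frequency table sorted in descending order, like the one behind the barplot, can be sketched in base R as follows. The vector kitchen_qual is a small hypothetical sample, not the real KitchenQual column.

```r
# Hypothetical kitchen-quality ratings standing in for log_train$KitchenQual
kitchen_qual <- c("TA", "Gd", "TA", "Ex", "Gd", "TA", "Fa", "TA", "Gd", "TA")

# Count each rating and sort from most to least frequent
qual_counts <- sort(table(kitchen_qual), decreasing = TRUE)
print(qual_counts)

# barplot(qual_counts) would draw the bars in this same order
```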

Boxplot shows the House SalePrice versus KitchenQual rating.

#library(scales)
ggplot(data=log_train[!is.na(log_train$KitchenQual),], aes(x=factor(KitchenQual), y=SalePrice))+
        geom_boxplot(col='blue', fill = "peru") + labs(x='Kitchen Quality') +
        scale_y_continuous(breaks= seq(0, 800000, by=100000), labels = comma)

The plot_ly chart of the categorical variable KitchenQual indicates that an Excellent (Ex) rating is an important criterion for the sale price.

The bar chart indicates that the North Ames (NAmes) neighborhood has the most home sales under Normal (Norm) proximity conditions for both Condition1 and Condition2.

Variable Description
Condition1 Proximity to various conditions
Condition2 Proximity to various conditions (if more than one is present)
Artery Adjacent to arterial street
Feedr Adjacent to feeder street
Norm Normal
RRNn Within 200’ of North-South Railroad
RRAn Adjacent to North-South Railroad
PosN Near positive off-site feature (park, greenbelt, etc.)
PosA Adjacent to positive off-site feature
RRNe Within 200’ of East-West Railroad
RRAe Adjacent to East-West Railroad

library(plotly)

plot_bar2 <- plot_ly(log_train, x = ~ Neighborhood, y = ~ Condition1, type = 'bar', name = 'Condition 1')
plot_bar2 <- plot_bar2 %>% add_trace(y = ~ Condition2, name = 'Condition2')
plot_bar2 <- plot_bar2 %>% layout(yaxis = list(title = 'Count'), barmode = 'group')
plot_bar2

Plot (Lollipop chart) with Categorical and Numerical Variable

Visualize the relationship between a categorical variable (SaleType) and a numerical variable (MoSold). SaleType was converted to a factor, and a table was created to show the group frequencies. In the plot of SaleType against MoSold, the New and WD groups appear to have sales in every month, while the Con group has the fewest sales across the MoSold range.

house_saletype <- as.factor(log_train$SaleType)
table(house_saletype)
## house_saletype
##   COD   Con ConLD ConLI ConLw   CWD   New   Oth    WD 
##    43     2     9     5     5     4   122     3  1267
# install.packages("ggplot2")
#library(ggplot2)

ggplot(log_train, aes(x = SaleType, y = MoSold)) +
  geom_segment(aes(x = SaleType, xend = SaleType, y = 0, yend = MoSold),
               color = "tomato", lwd = 1) +
  geom_point(size = 4, pch = 21, bg = 4, col = 1) +
  geom_text(aes(label = SaleType), color = "grey0", size = 3) +
  scale_x_discrete(labels = paste("Group", 1:10)) +
  theme(axis.text.x = element_text(angle = 90,
                                   vjust = 0.5, hjust = 1)) 

2. Scatterplot matrix

Scatterplot matrix on two independent variables and dependent variable.

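The scatterplots indicate whether there is a potential relationship between two quantitative variables. In this matrix, the independent variables (BedroomAbvGr and GarageCars) show no obvious pattern against the dependent variable (log_SalePrice). Because both predictors are small integer counts, the points stack in vertical bands, which can hide an association: the correlation matrix in the next section reports a correlation of 0.64 between GarageCars and SalePrice.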

# car package
scatterplotMatrix(~log_SalePrice + BedroomAbvGr + GarageCars, data = log_train,
            diagonal = FALSE,                     # Remove kernel density estimates
            regLine = list(col = "green",         # Linear regression line color
                           lwd = 3),              # Linear regression line width 
            smooth = list(col.smooth = "red",     # Non-parametric mean color
                          col.spread = "blue"))   # Non-parametric variance color

3. Correlation matrix

Correlation matrix for any three quantitative variables in the dataset.

Variable Identification / Restructure data set

Compute the correlation matrix of the numeric variables in df_train using the Pearson method (the default for cor()).

Indexing Numeric Variables

numericVars <- which(sapply(df_train, is.numeric)) #index vector numeric variables
numericVarNames <- names(numericVars) #saving names vector for use later on
cat('There are', length(numericVars), 'numeric variables')
## There are 38 numeric variables
#library(corrr)
house_numVar <- df_train[, numericVars]
cor_numVar <- cor(house_numVar, use="pairwise.complete.obs") #correlations of all numeric variables

#sort on decreasing correlations with SalePrice
cor_sorted <- as.matrix(sort(cor_numVar[,'SalePrice'], decreasing = TRUE))
#select only high correlations (absolute value above 0.5)
CorHigh <- names(which(apply(cor_sorted, 1, function(x) abs(x)>0.5)))
cor_numVar <- cor_numVar[CorHigh, CorHigh]


corrplot.mixed(cor_numVar, tl.col="black", tl.pos = "lt", tl.srt=45)

There are 10 numeric (independent) variables with a correlation greater than 0.5 with the dependent variable, SalePrice. The three independent variables most strongly correlated with SalePrice are OverallQual, GrLivArea, and GarageCars.

#set correlation coefficient for the top three variables
(corr_top3 <- data.frame(round(as.matrix(cor_numVar[1:4, 1:4]),2)))
##             SalePrice OverallQual GrLivArea GarageCars
## SalePrice        1.00        0.79      0.71       0.64
## OverallQual      0.79        1.00      0.59       0.60
## GrLivArea        0.71        0.59      1.00       0.47
## GarageCars       0.64        0.60      0.47       1.00
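#note: cor() applied to corr_top3 correlates the columns of the table above with each other;
#it does not reproduce the original pairwise correlations (as.matrix(corr_top3) would keep them)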
corr3_matrix <- cor(corr_top3)
#graph of correlation matrix - top 3 correlated variables
corrplot(corr3_matrix, method="pie", tl.srt = 45)

Test the hypothesis that the correlation between each pairwise set of variables is 0 and provide an 80% confidence interval.

- Null hypothesis, \(H_0\): the correlation between each pairwise set of variables is zero.
- Alternative hypothesis, \(H_a\): the correlation between each pairwise set of variables is not equal to zero.
- Significance level: \(\alpha = 0.05\)

The Pearson method measures the linear dependence between two variables (x and y). For each pair, cor.test() reports the correlation coefficient, the hypothesis test, and the 80% confidence interval.

#correlation test between SalePrice ~ OverallQual
cor.test(df_train$SalePrice, df_train$OverallQual, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  df_train$SalePrice and df_train$OverallQual
## t = 49.364, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.7780752 0.8032204
## sample estimates:
##       cor 
## 0.7909816
#correlation test between SalePrice ~ GrLivArea
cor.test(df_train$SalePrice, df_train$GrLivArea, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  df_train$SalePrice and df_train$GrLivArea
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.6915087 0.7249450
## sample estimates:
##       cor 
## 0.7086245
#correlation test between SalePrice ~ GarageCars
cor.test(df_train$SalePrice, df_train$GarageCars, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  df_train$SalePrice and df_train$GarageCars
## t = 31.839, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.6201771 0.6597899
## sample estimates:
##       cor 
## 0.6404092
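To make the cor.test() output easier to read, the sketch below rebuilds the test statistic and the 80% interval for the first pair from the reported correlation alone. It uses the standard formulas t = r·sqrt(n − 2)/sqrt(1 − r²) and Fisher's z transformation for the interval; the variable names are illustrative.

```r
r  <- 0.7909816   # correlation of SalePrice and OverallQual reported above
df <- 1458        # degrees of freedom reported above (n - 2)
n  <- df + 2

# t statistic for H0: true correlation equals zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 80% confidence interval via Fisher's z transformation
z      <- atanh(r)
margin <- qnorm(1 - 0.20 / 2) / sqrt(n - 3)
ci_80  <- tanh(z + c(-1, 1) * margin)

print(round(t_stat, 3))
print(round(ci_80, 4))
```

The results should land close to the t value and interval printed by cor.test() for this pair; the same steps apply to GrLivArea and GarageCars.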

Discuss the meaning of your analysis.

The Pearson pairwise correlation tests show that each correlation with SalePrice is significantly different from zero: every p-value is below 2.2e-16, and none of the 80% confidence intervals contains zero. We therefore reject the null hypothesis for OverallQual, GrLivArea, and GarageCars in favor of the alternative that the pairwise correlation is not equal to zero.

Would you be worried about familywise error? Why or why not?

Familywise error is a concern whenever multiple hypothesis tests are conducted at once, because the chance of at least one false positive grows with the number of tests. Three pairwise correlation tests were run above, each at an alpha level of \(\alpha = .05\). Assuming the tests are independent, the family-wise error rate is \(1 - (1 - \alpha)^3\), calculated as:

fw_error <- 1 - (1-.05)^3
paste0("Family-wise error rate of ", round(fw_error, 3), " will increase the probability of an error on at least one of the hypothesis tests.")
## [1] "Family-wise error rate of 0.143 will increase the probability of an error on at least one of the hypothesis tests."
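One common way to keep the family-wise error rate at or below the nominal level is the Bonferroni correction, which divides alpha by the number of tests. The sketch below is illustrative only: it assumes independent tests, and the p-values are hypothetical placeholders rather than results from this analysis.

```r
alpha <- 0.05
m     <- 3                      # number of correlation tests run above

fwer_uncorrected <- 1 - (1 - alpha)^m
alpha_bonferroni <- alpha / m   # per-test threshold under Bonferroni
fwer_bonferroni  <- 1 - (1 - alpha_bonferroni)^m

# Hypothetical p-values, for illustration only
p_values   <- c(0.001, 0.020, 0.040)
p_adjusted <- p.adjust(p_values, method = "bonferroni")

print(round(c(fwer_uncorrected, fwer_bonferroni), 4))
print(p_adjusted)
```

Since the three p-values reported by cor.test() are all below 2.2e-16, they remain far below either threshold, so the conclusions do not change after correction.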

Linear Algebra and Correlation.

Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.)

#correlation matrix of the top 3 correlated variables with SalePrice
(corr3_matrix)
##              SalePrice OverallQual  GrLivArea GarageCars
## SalePrice    1.0000000   0.4823030  0.1171696 -0.3736374
## OverallQual  0.4823030   1.0000000 -0.3092674 -0.2739513
## GrLivArea    0.1171696  -0.3092674  1.0000000 -0.8293202
## GarageCars  -0.3736374  -0.2739513 -0.8293202  1.0000000
#inverse correlation matrix
(invcorr <- matrix_inverse(corr3_matrix))
##                SalePrice OverallQual    GrLivArea GarageCars
## SalePrice    1.418920414  -0.5642499  0.004725651  0.3961108
## OverallQual -0.564249916   0.9477893 -0.214494076 -0.3810742
## GrLivArea    0.004725651  -0.2144941  0.329767644 -0.2062645
## GarageCars   0.396110785  -0.3810742 -0.206264507  0.4591154

Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix.

#multiply correlation matrix by precision matrix
(corr_inv <- corr3_matrix %*% invcorr)
##              SalePrice OverallQual  GrLivArea  GarageCars
## SalePrice   0.99933292  0.01012303  0.0169814  0.01660699
## OverallQual 0.01012303  0.84638169 -0.2576948 -0.25201313
## GrLivArea   0.01698140 -0.25769483  0.5677167 -0.42275224
## GarageCars  0.01660699 -0.25201313 -0.4227522  0.58656868
#multiply precision matrix by correlation matrix
(inv_corr <- invcorr %*% corr3_matrix)
##              SalePrice OverallQual  GrLivArea  GarageCars
## SalePrice   0.99933292  0.01012303  0.0169814  0.01660699
## OverallQual 0.01012303  0.84638169 -0.2576948 -0.25201313
## GrLivArea   0.01698140 -0.25769483  0.5677167 -0.42275224
## GarageCars  0.01660699 -0.25201313 -0.4227522  0.58656868
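For an invertible matrix, both products should equal the identity matrix. The products printed above do not, and a likely reason is that corr3_matrix was built with cor(corr_top3): the LU factorization of that same matrix reports a final pivot of about 3.3e-16, which means it is numerically singular, so matrix_inverse() (assumed here to be the generalized-inverse helper from the correlation package) can only return a pseudo-inverse. As a hedged comparison, the sketch below inverts the rounded pairwise correlations from the corr_top3 table directly with solve().

```r
vars <- c("SalePrice", "OverallQual", "GrLivArea", "GarageCars")

# Rounded pairwise correlations copied from the corr_top3 table
R <- matrix(c(1.00, 0.79, 0.71, 0.64,
              0.79, 1.00, 0.59, 0.60,
              0.71, 0.59, 1.00, 0.47,
              0.64, 0.60, 0.47, 1.00),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

precision <- solve(R)             # exact inverse; stops with an error if R is singular

print(round(R %*% precision, 6))  # identity matrix, up to rounding
print(round(precision %*% R, 6))  # identity matrix, up to rounding
print(round(diag(precision), 2))  # diagonal: variance inflation factors
```

Each diagonal entry equals 1 / (1 − R²) for the regression of that variable on the other three, which is why it can be read as a variance inflation factor.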
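
LU decomposition

The lu() function from the Matrix package factors the matrix into a permutation matrix P, a unit lower-triangular matrix L, and an upper-triangular matrix U.

library(Matrix)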

matrix_exp <- expand(lu(corr3_matrix))

for( i in 1:nrow(corr3_matrix) ){
  for( j in 1:ncol(corr3_matrix) ){
    # This doesn't do anything, but here you can think about how to check
    # where in the matrix you are by checking the relative values of i and j
    corr3_matrix[i,j] = corr3_matrix[i,j]
  }
}

Lower decomposition

lu_lower <- matrix_exp$L
matrix_exp$L
## 4 x 4 Matrix of class "dtrMatrix" (unitriangular)
##      [,1]       [,2]       [,3]       [,4]      
## [1,]  1.0000000          .          .          .
## [2,]  0.4823030  1.0000000          .          .
## [3,] -0.3736374 -0.1221617  1.0000000          .
## [4,]  0.1171696 -0.4766567 -0.9779518  1.0000000

Upper decomposition

lu_upper <- matrix_exp$U
matrix_exp$U
## 4 x 4 Matrix of class "dtrMatrix"
##      [,1]          [,2]          [,3]          [,4]         
## [1,]  1.000000e+00  4.823030e-01  1.171696e-01 -3.736374e-01
## [2,]             .  7.673839e-01 -3.657786e-01 -9.374488e-02
## [3,]             .             . -8.302254e-01  8.489431e-01
## [4,]             .             .             .  3.330669e-16
print(matrix_exp$P)
## 4 x 4 sparse Matrix of class "pMatrix"
##             
## [1,] | . . .
## [2,] . | . .
## [3,] . . . |
## [4,] . . | .
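A quick way to check an LU factorization is to multiply the factors back together. The sketch below does this for the rounded pairwise correlations from the corr_top3 table (entered by hand, so the matrix R here is an assumption and not the corr3_matrix object used above). It relies on expand(lu(.)) from the Matrix package, the same calls used in the report.

```r
library(Matrix)

# Rounded pairwise correlations copied from the corr_top3 table
R <- matrix(c(1.00, 0.79, 0.71, 0.64,
              0.79, 1.00, 0.59, 0.60,
              0.71, 0.59, 1.00, 0.47,
              0.64, 0.60, 0.47, 1.00),
            nrow = 4, byrow = TRUE)

parts   <- expand(lu(R))                                # list with L, U, and P
rebuilt <- as.matrix(parts$P %*% parts$L %*% parts$U)   # should reproduce R

print(max(abs(rebuilt - R)))          # reconstruction error, near machine precision
print(diag(as.matrix(parts$U)))       # pivots; a value near zero signals singularity
```

In the output above, the last pivot of U is about 3.3e-16, which indicates that corr3_matrix is numerically singular; the pivots for R in this sketch stay well away from zero.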

Calculus-Based Probability & Statistics

Fit a closed-form distribution to the data. Select a variable in the Kaggle.com training dataset that is skewed to the right, and shift it, if necessary, so that its minimum value is strictly above zero.

Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/Rdevel/library/MASS/html/fitdistr.html).

Compute the skewness of train.num, the data frame containing only the numerical variables from the training set. In the table, MiscVal has the highest value (positive “right” skew) and YearBuilt has the lowest value (negative “left” skew). LotFrontage, MasVnrArea, and GarageYrBlt return NA because they contain missing values.

Separating numerical and character columns for future statistical testing:

#subset numeric columns with dplyr
train.num <- data.frame(select_if(log_train, is.numeric))
train.num[] <- lapply(train.num, function(x) as.numeric(as.character(x)))
reactable(train.num, wrap = FALSE)
#subset character columns with dplyr
train.char <- log_train[,!names(log_train) %in% colnames(train.num)]
reactable(train.char, wrap = FALSE)
#library(moments)
train_skew <- data.frame(skewness(train.num)) #calculate skewness 
train_skew <- cbind(Variable = rownames(train_skew), train_skew)
rownames(train_skew) <- 1:nrow(train_skew)
train_skew[order(train_skew$skewness.train.num.), ]
##         Variable skewness.train.num.
## 7      YearBuilt         -0.61283072
## 8   YearRemodAdd         -0.50304450
## 27    GarageCars         -0.34219690
## 1             Id          0.00000000
## 20      FullBath          0.03652398
## 37        YrSold          0.09616958
## 39 log_SalePrice          0.12121037
## 28    GarageArea          0.17979594
## 22  BedroomAbvGr          0.21157244
## 36        MoSold          0.21183506
## 5    OverallQual          0.21672098
## 18  BsmtFullBath          0.59545404
## 25    Fireplaces          0.64889763
## 21      HalfBath          0.67520283
## 24  TotRmsAbvGrd          0.67564577
## 6    OverallCond          0.69235521
## 15     X2ndFlrSF          0.81219427
## 12     BsmtUnfSF          0.91932270
## 17     GrLivArea          1.36515595
## 14     X1stFlrSF          1.37534174
## 2     MSSubClass          1.40621011
## 13   TotalBsmtSF          1.52268809
## 29    WoodDeckSF          1.53979170
## 10    BsmtFinSF1          1.68377090
## 38     SalePrice          1.88094075
## 30   OpenPorchSF          2.36191193
## 31 EnclosedPorch          3.08669647
## 19  BsmtHalfBath          4.09918567
## 33   ScreenPorch          4.11797738
## 11    BsmtFinSF2          4.25088802
## 23  KitchenAbvGr          4.48378409
## 16  LowQualFinSF          9.00208042
## 32    X3SsnPorch         10.29375236
## 4        LotArea         12.19514213
## 34      PoolArea         14.81313466
## 35       MiscVal         24.45163962
## 3    LotFrontage                  NA
## 9     MasVnrArea                  NA
## 26   GarageYrBlt                  NA
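The NA entries at the bottom of the table come from missing values propagating through the calculation. The sketch below shows the moment-based skewness formula, m3 / m2^(3/2), with optional removal of missing values; sample_skewness and the data vector are hypothetical names and values used only for illustration. The skewness() function in the moments package accepts na.rm = TRUE for the same purpose.

```r
# Moment-based sample skewness: m3 / m2^(3/2), with optional NA removal
sample_skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^(3 / 2)
}

# Hypothetical lot-frontage values with one missing entry
lot_frontage <- c(65, 80, 68, 60, 84, 85, NA, 75, 51, 50)

print(sample_skewness(lot_frontage))                # NA: the missing value propagates
print(sample_skewness(lot_frontage, na.rm = TRUE))  # computed on the 9 observed values
```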

The histogram of the skewness values is itself right skewed: most variables have a skewness close to zero, on the left side of the graph, while a long tail extends to the right. The mode is the highest point of the histogram, whereas the median and mean fall to the right of it.

hs <- hist(train_skew$skewness.train.num., col=rainbow(10), 
           main = "Skewness of Training Variables", xlab = "Skewness")

Plot the density distribution of selected variable and compare the observed distribution to what we would expect if it were perfectly normal (dashed red line).

Right Skewed

#library(ggpubr)

# Distribution of MiscVal variable (right skewed)
ggdensity(train.num, x = "MiscVal", fill = "blue", title="MiscVal") + 
  scale_x_continuous(limits = c(1000, 1600)) +
  stat_overlay_normal_density(color = "red", linetype = "dashed", lwd = 2)
## Warning: Removed 1455 rows containing non-finite values (stat_density).
## Warning: Removed 1455 rows containing non-finite values
## (stat_overlay_normal_density).
## Warning: Removed 13 row(s) containing missing values (geom_path).

The summary function shows the data distribution for the variable.

# Select a variable in the Kaggle.com training dataset that is skewed to the right (MiscVal)

# Distribution of MiscVal variable (right skewed)
skew_MiscVal <- train.num$MiscVal
summary(skew_MiscVal)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     0.00     0.00     0.00    43.49     0.00 15500.00

Left Skewed

# Distribution of YearBuilt variable (left skewed)
ggdensity(train.num, x = "YearBuilt", fill = "blue", title = "YearBuilt") +
  scale_x_continuous(limits = c(1800, 2100)) +
  stat_overlay_normal_density(color = "red", linetype = "dashed", lwd=2)

The summary function shows the data distribution for the variable.

# Distribution of YearBuilt variable (left skewed)
skew_YearBuilt <- train.num$YearBuilt
summary(skew_YearBuilt)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1872    1954    1973    1971    2000    2010

The summary function shows the distribution for the variable MiscVal shifted above zero.

# Distribution of MiscVal variable (right skewed) with the min value shifted above zero
skew_MiscVal2 <- skew_MiscVal+1
summary(skew_MiscVal2)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##     1.00     1.00     1.00    44.49     1.00 15501.00

Find the optimal value of λ for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, λ)). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality.

Run the fitdistr function from the MASS package to fit an exponential probability density function (PDF).

# Then load the MASS package and run fitdistr to fit an exponential probability density function (PDF)
#library(MASS)
set.seed(1234)
exp.pdf <- fitdistr(skew_MiscVal2, densfun="exponential")

The optimal value of lambda for the distribution is the estimate attribute from the fitdistr response. The value is output below.

# Find the optimal value of lambda for this distribution
exp.pdf$estimate
##       rate 
## 0.02247745
lambda <- exp.pdf$estimate
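For the exponential distribution, the maximum-likelihood estimate of the rate has a closed form: λ = 1 / mean(x). With the shifted mean of 44.49 reported above, 1 / 44.49 ≈ 0.0225, which agrees with the fitdistr estimate. The sketch below checks the closed form against a numerical maximization of the log-likelihood on simulated data (a hypothetical stand-in, not the MiscVal column).

```r
set.seed(1234)
x <- rexp(500, rate = 0.02)   # hypothetical positive, right-skewed data

# Closed-form maximum-likelihood estimate of the exponential rate
lambda_closed <- 1 / mean(x)

# Numerical check: maximize the log-likelihood n*log(lambda) - lambda*sum(x)
loglik <- function(lambda) length(x) * log(lambda) - lambda * sum(x)
lambda_numeric <- optimize(loglik, interval = c(1e-6, 1), maximum = TRUE)$maximum

print(c(closed_form = lambda_closed, numeric = lambda_numeric))
```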

Generate 1000 samples using the lambda value.

# then take 1000 samples from this exponential distribution using this value
samples <- rexp(1000, lambda)
(samples[1:5])
## [1] 111.3008414  10.9780661   0.2928249  77.5331024  17.2253819

The histogram below, based on the 1000 samples, covers a far narrower range of values along the x-axis than the original variable (whose maximum is 15501), and its counts are less concentrated in a single bucket. The new histogram is still right skewed, but not to the same degree. By visual inspection, the second bucket is much closer to half the height of the first bucket than in the initial histogram. Overall, the simulated data are more evenly spread, though they are neither uniform nor normal.

# Plot a histogram and compare it with a histogram of your original variable.
hist(samples)

Next, find the 5th and 95th percentiles of the fitted exponential distribution.

# Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function
per5 <- qexp(.05, rate=lambda, lower.tail=T)
per95 <- qexp(.95, rate=lambda, lower.tail=T)

Given the lambda of the exponential PDF, the 5th percentile is 2.2819895 and the 95th percentile is 133.2772562.
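The task statement also asks for the empirical 5th and 95th percentiles and a 95% confidence interval assuming normality. A minimal sketch of those two steps follows; it runs on simulated data standing in for skew_MiscVal2, so the printed numbers are not results for the Ames data, and the interval is interpreted here as an interval for the mean.

```r
set.seed(1234)
x <- rexp(1460, rate = 0.0225) + 1   # hypothetical stand-in for skew_MiscVal2

# Empirical 5th and 95th percentiles
empirical_pct <- quantile(x, probs = c(0.05, 0.95))

# 95% confidence interval for the mean, assuming normality
se      <- sd(x) / sqrt(length(x))
ci_mean <- mean(x) + c(-1, 1) * qnorm(0.975) * se

print(empirical_pct)
print(ci_mean)
```

Replacing x with skew_MiscVal2 would give the empirical values to compare against the exponential percentiles above.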

MODELING

Required Libraries

library(ggplot2)
library(scales)
library(ggrepel)

Combine data

#create new dataframe
df.train <- df_train
df.test <- df_test
#Getting rid of the IDs but keeping the test IDs in a vector. These are needed to compose the submission file
test.labels <- df.test$Id
df.test$Id <- NULL
df.train$Id <- NULL

Since the test dataset has no SalePrice variable, we create it (filled with NA) and then combine the two data sets.

df.test$SalePrice <- rep(NA, nrow(df.test))
house <- rbind(df.train, df.test)
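The combining step can be sketched on two toy data frames, shown below. The frames train and test are hypothetical miniatures with only a few columns; the point is that rbind() needs matching column names, and that the NA placeholder later identifies which rows belong to the test set.

```r
# Hypothetical miniature train and test sets
train <- data.frame(Id = 1:3, LotArea = c(8450, 9600, 11250),
                    SalePrice = c(208500, 181500, 223500))
test  <- data.frame(Id = 4:5, LotArea = c(11622, 14267))

test_ids <- test$Id           # keep the IDs for the submission file
train$Id <- NULL
test$Id  <- NULL

test$SalePrice <- NA_real_    # placeholder target, sized automatically

# rbind() requires the same columns in both frames
stopifnot(identical(names(train), names(test)))
combined <- rbind(train, test)

is_test <- is.na(combined$SalePrice)   # rows that came from the test set
print(combined)
```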

Preview the combined data before reviewing the numeric summaries (minimum, 1st quartile, median, mean, 3rd quartile, and maximum) of the independent variables and the dependent variable:

#data exploration
head(house)
##   MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour
## 1         60       RL          65    8450   Pave  <NA>      Reg         Lvl
## 2         20       RL          80    9600   Pave  <NA>      Reg         Lvl
## 3         60       RL          68   11250   Pave  <NA>      IR1         Lvl
## 4         70       RL          60    9550   Pave  <NA>      IR1         Lvl
## 5         60       RL          84   14260   Pave  <NA>      IR1         Lvl
## 6         50       RL          85   14115   Pave  <NA>      IR1         Lvl
##   Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType
## 1    AllPub    Inside       Gtl      CollgCr       Norm       Norm     1Fam
## 2    AllPub       FR2       Gtl      Veenker      Feedr       Norm     1Fam
## 3    AllPub    Inside       Gtl      CollgCr       Norm       Norm     1Fam
## 4    AllPub    Corner       Gtl      Crawfor       Norm       Norm     1Fam
## 5    AllPub       FR2       Gtl      NoRidge       Norm       Norm     1Fam
## 6    AllPub    Inside       Gtl      Mitchel       Norm       Norm     1Fam
##   HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl
## 1     2Story           7           5      2003         2003     Gable  CompShg
## 2     1Story           6           8      1976         1976     Gable  CompShg
## 3     2Story           7           5      2001         2002     Gable  CompShg
## 4     2Story           7           5      1915         1970     Gable  CompShg
## 5     2Story           8           5      2000         2000     Gable  CompShg
## 6     1.5Fin           5           5      1993         1995     Gable  CompShg
##   Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation
## 1     VinylSd     VinylSd    BrkFace        196        Gd        TA      PConc
## 2     MetalSd     MetalSd       None          0        TA        TA     CBlock
## 3     VinylSd     VinylSd    BrkFace        162        Gd        TA      PConc
## 4     Wd Sdng     Wd Shng       None          0        TA        TA     BrkTil
## 5     VinylSd     VinylSd    BrkFace        350        Gd        TA      PConc
## 6     VinylSd     VinylSd       None          0        TA        TA       Wood
##   BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
## 1       Gd       TA           No          GLQ        706          Unf
## 2       Gd       TA           Gd          ALQ        978          Unf
## 3       Gd       TA           Mn          GLQ        486          Unf
## 4       TA       Gd           No          ALQ        216          Unf
## 5       Gd       TA           Av          GLQ        655          Unf
## 6       Gd       TA           No          GLQ        732          Unf
##   BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical
## 1          0       150         856    GasA        Ex          Y      SBrkr
## 2          0       284        1262    GasA        Ex          Y      SBrkr
## 3          0       434         920    GasA        Ex          Y      SBrkr
## 4          0       540         756    GasA        Gd          Y      SBrkr
## 5          0       490        1145    GasA        Ex          Y      SBrkr
## 6          0        64         796    GasA        Ex          Y      SBrkr
##   X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath
## 1       856       854            0      1710            1            0        2
## 2      1262         0            0      1262            0            1        2
## 3       920       866            0      1786            1            0        2
## 4       961       756            0      1717            1            0        1
## 5      1145      1053            0      2198            1            0        2
## 6       796       566            0      1362            1            0        1
##   HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional
## 1        1            3            1          Gd            8        Typ
## 2        0            3            1          TA            6        Typ
## 3        1            3            1          Gd            6        Typ
## 4        0            3            1          Gd            7        Typ
## 5        1            4            1          Gd            9        Typ
## 6        1            1            1          TA            5        Typ
##   Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars
## 1          0        <NA>     Attchd        2003          RFn          2
## 2          1          TA     Attchd        1976          RFn          2
## 3          1          TA     Attchd        2001          RFn          2
## 4          1          Gd     Detchd        1998          Unf          3
## 5          1          TA     Attchd        2000          RFn          3
## 6          0        <NA>     Attchd        1993          Unf          2
##   GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF
## 1        548         TA         TA          Y          0          61
## 2        460         TA         TA          Y        298           0
## 3        608         TA         TA          Y          0          42
## 4        642         TA         TA          Y          0          35
## 5        836         TA         TA          Y        192          84
## 6        480         TA         TA          Y         40          30
##   EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature
## 1             0          0           0        0   <NA>  <NA>        <NA>
## 2             0          0           0        0   <NA>  <NA>        <NA>
## 3             0          0           0        0   <NA>  <NA>        <NA>
## 4           272          0           0        0   <NA>  <NA>        <NA>
## 5             0          0           0        0   <NA>  <NA>        <NA>
## 6             0        320           0        0   <NA> MnPrv        Shed
##   MiscVal MoSold YrSold SaleType SaleCondition SalePrice
## 1       0      2   2008       WD        Normal    208500
## 2       0      5   2007       WD        Normal    181500
## 3       0      9   2008       WD        Normal    223500
## 4       0      2   2006       WD       Abnorml    140000
## 5       0     12   2008       WD        Normal    250000
## 6     700     10   2009       WD        Normal    143000
dim(house)
## [1] 2919   80

Check the numeric summary (minimum, median, mean, and maximum) of the independent variables and the dependent variable.

summary() reports each column according to its class. For numeric columns it gives the minimum, 1st quartile (25th percentile), median, mean, 3rd quartile (75th percentile), maximum, and the number of NA's; for character columns it gives only the length, class, and mode.

summary(house)
##    MSSubClass       MSZoning          LotFrontage        LotArea      
##  Min.   : 20.00   Length:2919        Min.   : 21.00   Min.   :  1300  
##  1st Qu.: 20.00   Class :character   1st Qu.: 59.00   1st Qu.:  7478  
##  Median : 50.00   Mode  :character   Median : 68.00   Median :  9453  
##  Mean   : 57.14                      Mean   : 69.31   Mean   : 10168  
##  3rd Qu.: 70.00                      3rd Qu.: 80.00   3rd Qu.: 11570  
##  Max.   :190.00                      Max.   :313.00   Max.   :215245  
##                                      NA's   :486                      
##     Street             Alley             LotShape         LandContour       
##  Length:2919        Length:2919        Length:2919        Length:2919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Utilities          LotConfig          LandSlope         Neighborhood      
##  Length:2919        Length:2919        Length:2919        Length:2919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Condition1         Condition2          BldgType          HouseStyle       
##  Length:2919        Length:2919        Length:2919        Length:2919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   OverallQual      OverallCond      YearBuilt     YearRemodAdd 
##  Min.   : 1.000   Min.   :1.000   Min.   :1872   Min.   :1950  
##  1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954   1st Qu.:1965  
##  Median : 6.000   Median :5.000   Median :1973   Median :1993  
##  Mean   : 6.089   Mean   :5.565   Mean   :1971   Mean   :1984  
##  3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2001   3rd Qu.:2004  
##  Max.   :10.000   Max.   :9.000   Max.   :2010   Max.   :2010  
##                                                                
##   RoofStyle           RoofMatl         Exterior1st        Exterior2nd       
##  Length:2919        Length:2919        Length:2919        Length:2919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   MasVnrType          MasVnrArea      ExterQual          ExterCond        
##  Length:2919        Min.   :   0.0   Length:2919        Length:2919       
##  Class :character   1st Qu.:   0.0   Class :character   Class :character  
##  Mode  :character   Median :   0.0   Mode  :character   Mode  :character  
##                     Mean   : 102.2                                        
##                     3rd Qu.: 164.0                                        
##                     Max.   :1600.0                                        
##                     NA's   :23                                            
##   Foundation          BsmtQual           BsmtCond         BsmtExposure      
##  Length:2919        Length:2919        Length:2919        Length:2919       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  BsmtFinType1         BsmtFinSF1     BsmtFinType2         BsmtFinSF2     
##  Length:2919        Min.   :   0.0   Length:2919        Min.   :   0.00  
##  Class :character   1st Qu.:   0.0   Class :character   1st Qu.:   0.00  
##  Mode  :character   Median : 368.5   Mode  :character   Median :   0.00  
##                     Mean   : 441.4                      Mean   :  49.58  
##                     3rd Qu.: 733.0                      3rd Qu.:   0.00  
##                     Max.   :5644.0                      Max.   :1526.00  
##                     NA's   :1                           NA's   :1        
##    BsmtUnfSF       TotalBsmtSF       Heating           HeatingQC        
##  Min.   :   0.0   Min.   :   0.0   Length:2919        Length:2919       
##  1st Qu.: 220.0   1st Qu.: 793.0   Class :character   Class :character  
##  Median : 467.0   Median : 989.5   Mode  :character   Mode  :character  
##  Mean   : 560.8   Mean   :1051.8                                        
##  3rd Qu.: 805.5   3rd Qu.:1302.0                                        
##  Max.   :2336.0   Max.   :6110.0                                        
##  NA's   :1        NA's   :1                                             
##   CentralAir         Electrical          X1stFlrSF      X2ndFlrSF     
##  Length:2919        Length:2919        Min.   : 334   Min.   :   0.0  
##  Class :character   Class :character   1st Qu.: 876   1st Qu.:   0.0  
##  Mode  :character   Mode  :character   Median :1082   Median :   0.0  
##                                        Mean   :1160   Mean   : 336.5  
##                                        3rd Qu.:1388   3rd Qu.: 704.0  
##                                        Max.   :5095   Max.   :2065.0  
##                                                                       
##   LowQualFinSF        GrLivArea     BsmtFullBath     BsmtHalfBath    
##  Min.   :   0.000   Min.   : 334   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:   0.000   1st Qu.:1126   1st Qu.:0.0000   1st Qu.:0.00000  
##  Median :   0.000   Median :1444   Median :0.0000   Median :0.00000  
##  Mean   :   4.694   Mean   :1501   Mean   :0.4299   Mean   :0.06136  
##  3rd Qu.:   0.000   3rd Qu.:1744   3rd Qu.:1.0000   3rd Qu.:0.00000  
##  Max.   :1064.000   Max.   :5642   Max.   :3.0000   Max.   :2.00000  
##                                    NA's   :2        NA's   :2        
##     FullBath        HalfBath       BedroomAbvGr   KitchenAbvGr  
##  Min.   :0.000   Min.   :0.0000   Min.   :0.00   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.00   1st Qu.:1.000  
##  Median :2.000   Median :0.0000   Median :3.00   Median :1.000  
##  Mean   :1.568   Mean   :0.3803   Mean   :2.86   Mean   :1.045  
##  3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.00   3rd Qu.:1.000  
##  Max.   :4.000   Max.   :2.0000   Max.   :8.00   Max.   :3.000  
##                                                                 
##  KitchenQual         TotRmsAbvGrd     Functional          Fireplaces    
##  Length:2919        Min.   : 2.000   Length:2919        Min.   :0.0000  
##  Class :character   1st Qu.: 5.000   Class :character   1st Qu.:0.0000  
##  Mode  :character   Median : 6.000   Mode  :character   Median :1.0000  
##                     Mean   : 6.452                      Mean   :0.5971  
##                     3rd Qu.: 7.000                      3rd Qu.:1.0000  
##                     Max.   :15.000                      Max.   :4.0000  
##                                                                         
##  FireplaceQu         GarageType         GarageYrBlt   GarageFinish      
##  Length:2919        Length:2919        Min.   :1895   Length:2919       
##  Class :character   Class :character   1st Qu.:1960   Class :character  
##  Mode  :character   Mode  :character   Median :1979   Mode  :character  
##                                        Mean   :1978                     
##                                        3rd Qu.:2002                     
##                                        Max.   :2207                     
##                                        NA's   :159                      
##    GarageCars      GarageArea      GarageQual         GarageCond       
##  Min.   :0.000   Min.   :   0.0   Length:2919        Length:2919       
##  1st Qu.:1.000   1st Qu.: 320.0   Class :character   Class :character  
##  Median :2.000   Median : 480.0   Mode  :character   Mode  :character  
##  Mean   :1.767   Mean   : 472.9                                        
##  3rd Qu.:2.000   3rd Qu.: 576.0                                        
##  Max.   :5.000   Max.   :1488.0                                        
##  NA's   :1       NA's   :1                                             
##   PavedDrive          WoodDeckSF       OpenPorchSF     EnclosedPorch   
##  Length:2919        Min.   :   0.00   Min.   :  0.00   Min.   :   0.0  
##  Class :character   1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.:   0.0  
##  Mode  :character   Median :   0.00   Median : 26.00   Median :   0.0  
##                     Mean   :  93.71   Mean   : 47.49   Mean   :  23.1  
##                     3rd Qu.: 168.00   3rd Qu.: 70.00   3rd Qu.:   0.0  
##                     Max.   :1424.00   Max.   :742.00   Max.   :1012.0  
##                                                                        
##    X3SsnPorch       ScreenPorch        PoolArea          PoolQC         
##  Min.   :  0.000   Min.   :  0.00   Min.   :  0.000   Length:2919       
##  1st Qu.:  0.000   1st Qu.:  0.00   1st Qu.:  0.000   Class :character  
##  Median :  0.000   Median :  0.00   Median :  0.000   Mode  :character  
##  Mean   :  2.602   Mean   : 16.06   Mean   :  2.252                     
##  3rd Qu.:  0.000   3rd Qu.:  0.00   3rd Qu.:  0.000                     
##  Max.   :508.000   Max.   :576.00   Max.   :800.000                     
##                                                                         
##     Fence           MiscFeature           MiscVal             MoSold      
##  Length:2919        Length:2919        Min.   :    0.00   Min.   : 1.000  
##  Class :character   Class :character   1st Qu.:    0.00   1st Qu.: 4.000  
##  Mode  :character   Mode  :character   Median :    0.00   Median : 6.000  
##                                        Mean   :   50.83   Mean   : 6.213  
##                                        3rd Qu.:    0.00   3rd Qu.: 8.000  
##                                        Max.   :17000.00   Max.   :12.000  
##                                                                           
##      YrSold       SaleType         SaleCondition        SalePrice     
##  Min.   :2006   Length:2919        Length:2919        Min.   : 34900  
##  1st Qu.:2007   Class :character   Class :character   1st Qu.:129975  
##  Median :2008   Mode  :character   Mode  :character   Median :163000  
##  Mean   :2008                                         Mean   :180921  
##  3rd Qu.:2009                                         3rd Qu.:214000  
##  Max.   :2010                                         Max.   :755000  
##                                                       NA's   :1459

Dependent (Response) Variable

summary(house$SalePrice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   34900  129975  163000  180921  214000  755000    1459

Predictors (Numeric)

Correlations with SalePrice

numericHouse <- which(sapply(house, is.numeric)) #index vector numeric variables
numericHouseNames <- names(numericHouse) #saving names vector for use later on
cat('There are', length(numericHouse), 'numeric variables')
## There are 37 numeric variables
#library(scales)
ggplot(data=house[!is.na(house$SalePrice),], aes(x=factor(OverallQual), y=SalePrice))+
        geom_boxplot(col='blue', fill = "pink") + labs(x='Overall Quality') +
        scale_y_continuous(breaks= seq(0, 800000, by=100000), labels = comma)
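The boxplot above covers a single predictor. To rank all numeric predictors by their correlation with SalePrice, something like the following could be used. This is only a minimal sketch in base R: the data frame `toy` and its values are hypothetical stand-ins, not the Ames data.

```r
# Hypothetical toy data standing in for the numeric columns of `house`
toy <- data.frame(
  SalePrice   = c(100, 150, 200, 250, 300, NA),
  GrLivArea   = c(800, 1100, 1500, 1900, 2400, 1000),
  OverallQual = c(4, 5, 6, 8, 9, 5),
  MoSold      = c(6, 2, 6, 2, 6, 2)
)

# Keep only rows with a known SalePrice (the training rows)
train <- toy[!is.na(toy$SalePrice), ]

# Pearson correlation of every numeric column with SalePrice
corMat <- cor(train[sapply(train, is.numeric)], use = "pairwise.complete.obs")
corWithPrice <- sort(corMat[, "SalePrice"], decreasing = TRUE)

# Drop the trivial self-correlation and print the ranking
corWithPrice <- corWithPrice[names(corWithPrice) != "SalePrice"]
print(round(corWithPrice, 3))
```

Filtering out rows with a missing SalePrice first keeps the test rows, which have no recorded price, from influencing the ranking.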

Above Grade (Ground) Living Area (square feet)

library(ggrepel)
ggplot(data=house[!is.na(house$SalePrice),], aes(x=GrLivArea, y=SalePrice))+
        geom_point(col='blue') + geom_smooth(method = "lm", se=FALSE, color="black", aes(group=1)) +
        scale_y_continuous(breaks= seq(0, 800000, by=100000), labels = comma) +
        geom_text_repel(aes(label = ifelse(house$GrLivArea[!is.na(house$SalePrice)]>4500, rownames(house), '')))
## `geom_smooth()` using formula 'y ~ x'

#outliers
house[c(524, 1299), c('SalePrice', 'GrLivArea', 'OverallQual')]
##      SalePrice GrLivArea OverallQual
## 524     184750      4676          10
## 1299    160000      5642          10
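Rather than typing the row numbers by hand, the outlier rows could be located programmatically. The sketch below assumes a cut-off of 4500 square feet (the same threshold used for the plot labels) and runs on a hypothetical six-row data frame `toy`, not on the real data.

```r
# Hypothetical toy data; rows 4 and 6 mimic very large houses sold cheaply
toy <- data.frame(
  SalePrice = c(208500, 181500, 223500, 184750, 250000, 160000),
  GrLivArea = c(1710, 1262, 1786, 4676, 2198, 5642)
)

# Flag rows above an area cut-off chosen by eye from the scatterplot
areaCutoff  <- 4500
outlierRows <- which(toy$GrLivArea > areaCutoff)
print(toy[outlierRows, ])

# Drop the flagged rows; setdiff() stays safe even if nothing is flagged
keepRows <- setdiff(seq_len(nrow(toy)), outlierRows)
toyClean <- toy[keepRows, ]
```

Using setdiff() instead of a negative index avoids the pitfall that `toy[-integer(0), ]` returns zero rows when no outliers are found.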

Missing data, label encoding, and factorizing variables

Check NULL and NA values in data frame columns

#check whether the data frame object itself is NULL (is.null() does not inspect individual cells)
is.null(house)
## [1] FALSE
#check dataset NA values
NAcol <- which(colSums(is.na(house)) > 0)
sort(colSums(sapply(house[NAcol], is.na)), decreasing = TRUE)
##       PoolQC  MiscFeature        Alley        Fence    SalePrice  FireplaceQu 
##         2909         2814         2721         2348         1459         1420 
##  LotFrontage  GarageYrBlt GarageFinish   GarageQual   GarageCond   GarageType 
##          486          159          159          159          159          157 
##     BsmtCond BsmtExposure     BsmtQual BsmtFinType2 BsmtFinType1   MasVnrType 
##           82           82           81           80           79           24 
##   MasVnrArea     MSZoning    Utilities BsmtFullBath BsmtHalfBath   Functional 
##           23            4            2            2            2            2 
##  Exterior1st  Exterior2nd   BsmtFinSF1   BsmtFinSF2    BsmtUnfSF  TotalBsmtSF 
##            1            1            1            1            1            1 
##   Electrical  KitchenQual   GarageCars   GarageArea     SaleType 
##            1            1            1            1            1
#replace na value with zeros in r dataframe
nuhouse <- house
nuhouse[is.na(nuhouse)] = 0
any(is.na(nuhouse))
## [1] FALSE
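
Replacing every NA with 0 also writes 0 into the text columns and into the missing SalePrice values of the test rows. One possible alternative, shown here only as a sketch, is to fill by column type and leave the response untouched. The data frame `toy` and the placeholder label "None" are assumptions made for illustration.

```r
# Hypothetical toy data with missing values in each kind of column
toy <- data.frame(
  PoolQC      = c(NA, "Gd", NA),
  LotFrontage = c(65, NA, 80),
  SalePrice   = c(208500, 181500, NA),
  stringsAsFactors = FALSE
)

response <- "SalePrice"   # never impute the response
filled   <- toy

for (col in setdiff(names(filled), response)) {
  missing <- is.na(filled[[col]])
  if (is.character(filled[[col]])) {
    filled[[col]][missing] <- "None"   # assumed label for "feature absent"
  } else if (is.numeric(filled[[col]])) {
    filled[[col]][missing] <- 0
  }
}

print(filled)
print(colSums(is.na(filled)))
```

With this approach the only remaining NA values are the unknown sale prices, so the training rows can still be selected with `!is.na(SalePrice)` later on.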

Multiple Analysis: Restructure dataset

Perform dummy (treatment) coding of the categorical variables for use in regression or ANOVA. This coding consists of creating dichotomous variables in which each level of the categorical variable is contrasted with a specified reference level.

# creating the factor variable
#all_dataftr <- all_data %>% mutate_if(is.integer, as.numeric)
houseFactor <- nuhouse %>% mutate_if(is.character, as.factor)
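
To make the treatment coding concrete, the following sketch shows what R generates for a factor when it enters a model formula. The factor `zoning` and its values are hypothetical; model.matrix() and relevel() are base R functions.

```r
# Hypothetical three-level factor; "RL" is listed first, so it is the reference
zoning <- factor(c("RL", "RM", "FV", "RL"), levels = c("RL", "RM", "FV"))

# Treatment (dummy) coding: one 0/1 column per non-reference level
dummies <- model.matrix(~ zoning)
print(dummies)

# A different reference level can be chosen with relevel()
zoningRM  <- relevel(zoning, ref = "RM")
dummiesRM <- model.matrix(~ zoningRM)
print(colnames(dummiesRM))
```

lm() applies the same coding automatically to factor predictors, so the explicit matrix is mainly useful for inspection or for functions that require a numeric design matrix.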

Changing some numeric variables into factors

NA values have been replaced with zero, and all character variables have been converted into factors. Some variables are recorded as numbers but are really categories, so they are converted to factors as well.

  • Building Class (MSSubClass)

#MSSubClass (integer)
str(houseFactor$MSSubClass)
##  int [1:2919] 60 20 60 70 60 50 20 60 50 190 ...
#MSSubClass (factor)
houseFactor$MSSubClass <- as.factor(houseFactor$MSSubClass)

#library(plyr)
#revalue for better readability (plyr package)
houseFactor$MSSubClass<-revalue(houseFactor$MSSubClass, c('20'='1 story 1946+', '30'='1 story 1945-', '40'='1 story unf attic', '45'='1,5 story unf', '50'='1,5 story fin', '60'='2 story 1946+', '70'='2 story 1945-', '75'='2,5 story all ages', '80'='split/multi level', '85'='split foyer', '90'='duplex all style/age', '120'='1 story PUD 1946+', '150'='1,5 story PUD all', '160'='2 story PUD 1946+', '180'='PUD multilevel', '190'='2 family conversion'))

str(houseFactor$MSSubClass)
##  Factor w/ 16 levels "1 story 1946+",..: 6 1 6 7 6 5 1 6 5 16 ...
  • Year Sold (YrSold)
#YrSold (integer)
str(houseFactor$YrSold)
##  int [1:2919] 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
#YrSold (factor)
houseFactor$YrSold <- as.factor(houseFactor$YrSold)
str(houseFactor$YrSold)
##  Factor w/ 5 levels "2006","2007",..: 3 2 3 1 3 4 2 4 3 3 ...
  • Month Sold (MoSold)
#MoSold (integer)
str(houseFactor$MoSold)
##  int [1:2919] 2 5 9 2 12 10 8 11 4 1 ...
#MoSold (factor)
houseFactor$MoSold <- as.factor(houseFactor$MoSold)
str(houseFactor$MoSold)
##  Factor w/ 12 levels "1","2","3","4",..: 2 5 9 2 12 10 8 11 4 1 ...
  • Year Built (YearBuilt)
#YearBuilt (integer)
str(houseFactor$YearBuilt)
##  int [1:2919] 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
#YearBuilt (factor)
houseFactor$YearBuilt <- as.factor(houseFactor$YearBuilt)
str(houseFactor$YearBuilt)
##  Factor w/ 118 levels "1872","1875",..: 111 84 109 26 108 101 112 81 42 49 ...
  • Remodel Date (YearRemodAdd)
#YearRemodAdd (integer)
str(houseFactor$YearRemodAdd)
##  int [1:2919] 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
#YearRemodAdd (factor)
houseFactor$YearRemodAdd <- as.factor(houseFactor$YearRemodAdd)
str(houseFactor$YearRemodAdd)
##  Factor w/ 61 levels "1950","1951",..: 54 27 53 21 51 46 56 24 1 1 ...
  • Overall Material and Finish of the House (OverallQual)
#OverallQual (integer)
str(houseFactor$OverallQual)
##  int [1:2919] 7 6 7 7 8 5 8 7 7 5 ...
#OverallQual (factor)
houseFactor$OverallQual <- as.factor(houseFactor$OverallQual)
str(houseFactor$OverallQual)
##  Factor w/ 10 levels "1","2","3","4",..: 7 6 7 7 8 5 8 7 7 5 ...
  • Overall Condition of the house (OverallCond)
#OverallCond (integer)
str(houseFactor$OverallCond)
##  int [1:2919] 5 8 5 5 5 5 5 6 5 6 ...
#OverallCond (factor)
houseFactor$OverallCond <- as.factor(houseFactor$OverallCond)
str(houseFactor$OverallCond)
##  Factor w/ 9 levels "1","2","3","4",..: 5 8 5 5 5 5 5 6 5 6 ...

Summary View

head(houseFactor)
##      MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour
## 1 2 story 1946+       RL          65    8450   Pave     0      Reg         Lvl
## 2 1 story 1946+       RL          80    9600   Pave     0      Reg         Lvl
## 3 2 story 1946+       RL          68   11250   Pave     0      IR1         Lvl
## 4 2 story 1945-       RL          60    9550   Pave     0      IR1         Lvl
## 5 2 story 1946+       RL          84   14260   Pave     0      IR1         Lvl
## 6 1,5 story fin       RL          85   14115   Pave     0      IR1         Lvl
##   Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType
## 1    AllPub    Inside       Gtl      CollgCr       Norm       Norm     1Fam
## 2    AllPub       FR2       Gtl      Veenker      Feedr       Norm     1Fam
## 3    AllPub    Inside       Gtl      CollgCr       Norm       Norm     1Fam
## 4    AllPub    Corner       Gtl      Crawfor       Norm       Norm     1Fam
## 5    AllPub       FR2       Gtl      NoRidge       Norm       Norm     1Fam
## 6    AllPub    Inside       Gtl      Mitchel       Norm       Norm     1Fam
##   HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl
## 1     2Story           7           5      2003         2003     Gable  CompShg
## 2     1Story           6           8      1976         1976     Gable  CompShg
## 3     2Story           7           5      2001         2002     Gable  CompShg
## 4     2Story           7           5      1915         1970     Gable  CompShg
## 5     2Story           8           5      2000         2000     Gable  CompShg
## 6     1.5Fin           5           5      1993         1995     Gable  CompShg
##   Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation
## 1     VinylSd     VinylSd    BrkFace        196        Gd        TA      PConc
## 2     MetalSd     MetalSd       None          0        TA        TA     CBlock
## 3     VinylSd     VinylSd    BrkFace        162        Gd        TA      PConc
## 4     Wd Sdng     Wd Shng       None          0        TA        TA     BrkTil
## 5     VinylSd     VinylSd    BrkFace        350        Gd        TA      PConc
## 6     VinylSd     VinylSd       None          0        TA        TA       Wood
##   BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
## 1       Gd       TA           No          GLQ        706          Unf
## 2       Gd       TA           Gd          ALQ        978          Unf
## 3       Gd       TA           Mn          GLQ        486          Unf
## 4       TA       Gd           No          ALQ        216          Unf
## 5       Gd       TA           Av          GLQ        655          Unf
## 6       Gd       TA           No          GLQ        732          Unf
##   BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical
## 1          0       150         856    GasA        Ex          Y      SBrkr
## 2          0       284        1262    GasA        Ex          Y      SBrkr
## 3          0       434         920    GasA        Ex          Y      SBrkr
## 4          0       540         756    GasA        Gd          Y      SBrkr
## 5          0       490        1145    GasA        Ex          Y      SBrkr
## 6          0        64         796    GasA        Ex          Y      SBrkr
##   X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath
## 1       856       854            0      1710            1            0        2
## 2      1262         0            0      1262            0            1        2
## 3       920       866            0      1786            1            0        2
## 4       961       756            0      1717            1            0        1
## 5      1145      1053            0      2198            1            0        2
## 6       796       566            0      1362            1            0        1
##   HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional
## 1        1            3            1          Gd            8        Typ
## 2        0            3            1          TA            6        Typ
## 3        1            3            1          Gd            6        Typ
## 4        0            3            1          Gd            7        Typ
## 5        1            4            1          Gd            9        Typ
## 6        1            1            1          TA            5        Typ
##   Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars
## 1          0           0     Attchd        2003          RFn          2
## 2          1          TA     Attchd        1976          RFn          2
## 3          1          TA     Attchd        2001          RFn          2
## 4          1          Gd     Detchd        1998          Unf          3
## 5          1          TA     Attchd        2000          RFn          3
## 6          0           0     Attchd        1993          Unf          2
##   GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF
## 1        548         TA         TA          Y          0          61
## 2        460         TA         TA          Y        298           0
## 3        608         TA         TA          Y          0          42
## 4        642         TA         TA          Y          0          35
## 5        836         TA         TA          Y        192          84
## 6        480         TA         TA          Y         40          30
##   EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature
## 1             0          0           0        0      0     0           0
## 2             0          0           0        0      0     0           0
## 3             0          0           0        0      0     0           0
## 4           272          0           0        0      0     0           0
## 5             0          0           0        0      0     0           0
## 6             0        320           0        0      0 MnPrv        Shed
##   MiscVal MoSold YrSold SaleType SaleCondition SalePrice
## 1       0      2   2008       WD        Normal    208500
## 2       0      5   2007       WD        Normal    181500
## 3       0      9   2008       WD        Normal    223500
## 4       0      2   2006       WD       Abnorml    140000
## 5       0     12   2008       WD        Normal    250000
## 6     700     10   2009       WD        Normal    143000

Assumptions for Multiple Regression

  1. Independence of observations (aka no autocorrelation)

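The correlation table shows the pairwise Pearson correlations between the numeric variables. Most of the coefficients displayed are small in magnitude; the largest off-diagonal values shown are 0.537 (TotalBsmtSF and BsmtFinSF1) and -0.477 (BsmtUnfSF and BsmtFinSF1). Note that this table measures correlation among variables (multicollinearity) rather than autocorrelation between observations.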

#library(corrr)
#set correlation coefficient (or covariance) to "pearson (default)"
houseCor <- correlate(select_if(houseFactor, is.numeric), diagonal = 1)
houseCor
## # A tibble: 30 x 31
##    term         LotFrontage  LotArea MasVnrArea BsmtFinSF1 BsmtFinSF2  BsmtUnfSF
##    <chr>              <dbl>    <dbl>      <dbl>      <dbl>      <dbl>      <dbl>
##  1 LotFrontage      1       0.135        0.109      0.0692   -0.00468  0.137    
##  2 LotArea          0.135   1            0.125      0.194     0.0841   0.0216   
##  3 MasVnrArea       0.109   0.125        1          0.302    -0.0146   0.0882   
##  4 BsmtFinSF1       0.0692  0.194        0.302      1        -0.0549  -0.477    
##  5 BsmtFinSF2      -0.00468 0.0841      -0.0146    -0.0549    1       -0.238    
##  6 BsmtUnfSF        0.137   0.0216       0.0882    -0.477    -0.238    1        
##  7 TotalBsmtSF      0.206   0.254        0.394      0.537     0.0896   0.413    
##  8 X1stFlrSF        0.242   0.332        0.392      0.458     0.0844   0.297    
##  9 X2ndFlrSF       -0.00466 0.0315       0.119     -0.162    -0.0977  -0.0000324
## 10 LowQualFinSF     0.0190  0.000554    -0.0574    -0.0660   -0.00491  0.0469   
## # ... with 20 more rows, and 24 more variables: TotalBsmtSF <dbl>,
## #   X1stFlrSF <dbl>, X2ndFlrSF <dbl>, LowQualFinSF <dbl>, GrLivArea <dbl>,
## #   BsmtFullBath <dbl>, BsmtHalfBath <dbl>, FullBath <dbl>, HalfBath <dbl>,
## #   BedroomAbvGr <dbl>, KitchenAbvGr <dbl>, TotRmsAbvGrd <dbl>,
## #   Fireplaces <dbl>, GarageYrBlt <dbl>, GarageCars <dbl>, GarageArea <dbl>,
## #   WoodDeckSF <dbl>, OpenPorchSF <dbl>, EnclosedPorch <dbl>, X3SsnPorch <dbl>,
## #   ScreenPorch <dbl>, PoolArea <dbl>, MiscVal <dbl>, SalePrice <dbl>
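
Because the printed tibble is truncated, strongly correlated pairs are easy to miss. A sketch of one way to list every pair above a chosen threshold follows; it uses base R only, and the data frame `toy` and the 0.8 cut-off are assumptions for illustration.

```r
# Hypothetical toy data: x2 is nearly a copy of x1, x3 is unrelated
x1  <- 1:10
toy <- data.frame(
  x1 = x1,
  x2 = x1 + 0.1 * c(1, 1, -1, -1, 1, 1, -1, -1, 1, 1),
  x3 = rep(c(1, -1), 5)
)

corMat <- cor(toy)

# Blank out the diagonal and lower triangle so each pair appears once
corMat[lower.tri(corMat, diag = TRUE)] <- NA

# Positions whose absolute correlation exceeds the threshold
high <- which(abs(corMat) > 0.8, arr.ind = TRUE)

highPairs <- data.frame(
  var1 = rownames(corMat)[high[, "row"]],
  var2 = colnames(corMat)[high[, "col"]],
  r    = corMat[high]
)
print(highPairs)
```

Pairs reported here are candidates for dropping or combining one of the two variables before fitting the regression.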
  2. Normality

The histogram of log(SalePrice) shows an almost symmetrical distribution: the mean and median are roughly the same and lie approximately at the center of the data.

hist(log(houseFactor$SalePrice), col = "blue", border = "yellow", 
     main = "Natural Log Distribution of Sale Price", xlab = "log(Sale Price)")
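
The visual impression can be backed by numbers. The sketch below uses simulated right-skewed prices (not the Ames data; the helper gap() is hypothetical) to compare the gap between mean and median, and the Shapiro-Wilk test, before and after the log transform:

```r
set.seed(42)
price <- rlnorm(500, meanlog = 12, sdlog = 0.4)  # simulated right-skewed prices

# Gap between mean and median, in units of the standard deviation
gap <- function(x) abs(mean(x) - median(x)) / sd(x)
gap_raw <- gap(price)
gap_log <- gap(log(price))

# Shapiro-Wilk test: small p-values are evidence against normality
p_raw <- shapiro.test(price)$p.value
p_log <- shapiro.test(log(price))$p.value

cat("mean-median gap (raw, log):", round(gap_raw, 3), round(gap_log, 3), "\n")
cat("Shapiro-Wilk p (raw, log):", signif(p_raw, 3), signif(p_log, 3), "\n")
```

Note that shapiro.test() accepts between 3 and 5000 observations, and with large samples it tends to flag even small departures from normality.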

  1. Linearity: Visualizing the Relationship in the Data

The pairs() function produces a matrix of scatterplots; here it is limited to eight numeric variables from the houseFactor data frame.

The pairwise plot matrix shows:

  • the names of the numeric variables along the diagonal
  • a scatterplot of each pair of variables in the remaining cells (only the lower triangle is drawn, since upper.panel = NULL)
  • for example, the left panel in the second row plots LotArea against SalePrice, and so on

#scatterplot matrix of SalePrice and seven numeric predictors, selected by formula

pairs(~ SalePrice + LotArea + PoolArea + X1stFlrSF + LotFrontage + Fireplaces + KitchenAbvGr + LowQualFinSF, data = houseFactor, gap = 0.5, main = "Pairs matrix", pch = 21, 
      bg = c("red", "green3", "blue", "yellow"), upper.panel = NULL) 

  1. Homoscedasticity

This assumption is checked after the model is fitted.

Multi-Factor Linear Regression Model

Preprocessing predictor variables

#subset numeric columns with dplyr (SalePrice is numeric, so it is included here)
houseNums <- data.frame(select_if(houseFactor, is.numeric))
houseNums[] <- lapply(houseNums, function(x) as.numeric(as.character(x)))


#subset the remaining (factor) columns with base R indexing
houseChars <- houseFactor[,!names(houseFactor) %in% colnames(houseNums)]
houseChars <- houseChars[, names(houseChars) != 'SalePrice']


cat('There are', length(houseNums), 'numeric variables, and', length(houseChars), 'factor variables')
## There are 30 numeric variables, and 50 factor variables

Skewness and normalization of the numeric predictors

#library(psych)

#apply log(x + 1) to every numeric column whose absolute skewness exceeds 0.8
for(i in seq_len(ncol(houseNums))){
        if (abs(skew(houseNums[,i]))>0.8){
                houseNums[,i] <- log(houseNums[,i] +1)
        }
}
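
The thresholding logic of the loop above can be sketched without the psych package. This is an illustration on simulated data; the moment-based skewness() helper is an assumption and may differ slightly from the variant psych::skew() computes:

```r
# Moment-based sample skewness (hypothetical helper, not psych::skew itself)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(7)
toy <- data.frame(
  symmetric = rnorm(300),
  skewed    = rexp(300, rate = 0.01)   # strongly right-skewed
)

# Columns whose absolute skewness exceeds the 0.8 threshold
skews <- vapply(toy, skewness, numeric(1))
to_log <- names(skews)[abs(skews) > 0.8]

# log1p(x) computes log(x + 1)
toy[to_log] <- lapply(toy[to_log], log1p)
print(to_log)
```

Selecting the columns first and transforming them in one step keeps the list of transformed variables available, which is useful when predictions later have to be mapped back to the original scale.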

Normalizing the data

library(caret)
PreNum <- preProcess(houseNums, method=c("center", "scale"))
(PreNum)
## Created from 2919 samples and 30 variables
## 
## Pre-processing:
##   - centered (30)
##   - ignored (0)
##   - scaled (30)
DFnorm <- predict(PreNum, houseNums)
dim(DFnorm)
## [1] 2919   30
  • One-hot encoding the categorical variables

The one-hot encoding is done with the model.matrix() function; the formula ~.-1 removes the intercept column.

DFdummies <- as.data.frame(model.matrix(~.-1, houseChars))
dim(DFdummies)
## [1] 2919  457
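
To see what model.matrix() produces with this formula, here is a small sketch on a toy data frame (the columns style and heat are hypothetical). Without an intercept, the first factor keeps a column for every level, while each later factor drops its first level as the reference:

```r
toy <- data.frame(
  style = factor(c("1Story", "2Story", "1Story", "SLvl")),
  heat  = factor(c("GasA", "GasW", "GasA", "GasA"))
)

# One 0/1 indicator column per factor level (minus reference levels)
dummies <- as.data.frame(model.matrix(~ . - 1, data = toy))
print(colnames(dummies))
print(dim(dummies))
```

Column names are the variable name pasted to the level, which matches names such as "UtilitiesNoSeWa" in the output further down.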
  • Removing levels with few or no observations in train or test

#check if some values are absent in the test set
ZerocolTest <- which(colSums(DFdummies[(nrow(houseFactor[!is.na(houseFactor$SalePrice),])+1):nrow(houseFactor),])==0)
colnames(DFdummies[ZerocolTest])
## character(0)

Variables with fewer than 10 ‘ones’ in the train set are also removed.

fewOnes <- which(colSums(DFdummies[1:nrow(houseFactor[!is.na(houseFactor$SalePrice),]),])<10)
colnames(DFdummies[fewOnes])
##  [1] "MSSubClass1 story unf attic" "MSSubClass1,5 story PUD all"
##  [3] "UtilitiesNoSeWa"             "Condition1RRNe"             
##  [5] "Condition1RRNn"              "Condition2PosA"             
##  [7] "Condition2PosN"              "Condition2RRAe"             
##  [9] "Condition2RRAn"              "Condition2RRNn"             
## [11] "HouseStyle2.5Fin"            "YearBuilt1875"              
## [13] "YearBuilt1879"               "YearBuilt1880"              
## [15] "YearBuilt1882"               "YearBuilt1885"              
## [17] "YearBuilt1890"               "YearBuilt1892"              
## [19] "YearBuilt1893"               "YearBuilt1895"              
## [21] "YearBuilt1896"               "YearBuilt1898"              
## [23] "YearBuilt1901"               "YearBuilt1902"              
## [25] "YearBuilt1904"               "YearBuilt1905"              
## [27] "YearBuilt1906"               "YearBuilt1907"              
## [29] "YearBuilt1908"               "YearBuilt1911"              
## [31] "YearBuilt1912"               "YearBuilt1913"              
## [33] "YearBuilt1914"               "YearBuilt1917"              
## [35] "YearBuilt1919"               "YearBuilt1927"              
## [37] "YearBuilt1928"               "YearBuilt1929"              
## [39] "YearBuilt1931"               "YearBuilt1932"              
## [41] "YearBuilt1934"               "YearBuilt1937"              
## [43] "YearBuilt1942"               "YearBuilt1981"              
## [45] "YearBuilt1982"               "YearBuilt1983"              
## [47] "YearBuilt1985"               "YearBuilt1987"              
## [49] "YearBuilt1989"               "YearBuilt2010"              
## [51] "YearRemodAdd1982"            "RoofStyleShed"              
## [53] "RoofMatlMembran"             "RoofMatlMetal"              
## [55] "RoofMatlRoll"                "RoofMatlWdShake"            
## [57] "RoofMatlWdShngl"             "Exterior1stAsphShn"         
## [59] "Exterior1stBrkComm"          "Exterior1stCBlock"          
## [61] "Exterior1stImStucc"          "Exterior1stStone"           
## [63] "Exterior2ndAsphShn"          "Exterior2ndCBlock"          
## [65] "Exterior2ndOther"            "Exterior2ndStone"           
## [67] "ExterCondPo"                 "FoundationWood"             
## [69] "BsmtCondPo"                  "HeatingGrav"                
## [71] "HeatingOthW"                 "HeatingWall"                
## [73] "HeatingQCPo"                 "ElectricalFuseP"            
## [75] "ElectricalMix"               "FunctionalMaj2"             
## [77] "FunctionalSev"               "GarageQualEx"               
## [79] "GarageQualPo"                "GarageCondEx"               
## [81] "PoolQCEx"                    "PoolQCFa"                   
## [83] "PoolQCGd"                    "MiscFeatureGar2"            
## [85] "MiscFeatureOthr"             "MiscFeatureTenC"            
## [87] "SaleTypeCon"                 "SaleTypeConLI"              
## [89] "SaleTypeConLw"               "SaleTypeOth"
DFdummies <- DFdummies[,-fewOnes] #removing predictors
dim(DFdummies)
## [1] 2919  367
comb_house <- cbind(DFnorm, DFdummies) #combining all (now numeric) predictors into one dataframe 
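
The two filters above depend on which rows count as train and which as test. A minimal sketch on toy data, assuming the training rows come first and their number is known (n_train and the column names are hypothetical), makes the row sets explicit:

```r
set.seed(3)
n_train <- 60   # hypothetical number of training rows
dummies <- data.frame(
  common         = rbinom(100, 1, 0.5),
  rare           = c(rep(1, 3), rep(0, 97)),
  absent_in_test = c(rep(1, 20), rep(0, 80))
)

train_rows <- seq_len(n_train)
test_rows  <- setdiff(seq_len(nrow(dummies)), train_rows)

# Fewer than 10 ones among the training rows
few_in_train <- colSums(dummies[train_rows, , drop = FALSE]) < 10
# No ones at all among the test rows
zero_in_test <- colSums(dummies[test_rows, , drop = FALSE]) == 0

filtered <- dummies[, !(few_in_train | zero_in_test), drop = FALSE]
print(colnames(filtered))
```

Using drop = FALSE keeps the result a data frame even when only one column survives.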

Composing train and test sets

houseFactor$SalePrice <- log((houseFactor$SalePrice)+1) #default is the natural logarithm, "+1" is not necessary as there are no 0's
train.house <- comb_house[!is.na(houseFactor$SalePrice),]
test.house <- comb_house[is.na(houseFactor$SalePrice),]
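
Since the model summary below reports 2919 samples, it is worth checking the row counts of both subsets after the split. The following sketch on toy data assumes the combined frame stacks the training rows on top of the test rows and that the response is kept apart from the predictors (n_train, combined and sale_price are hypothetical names):

```r
n_train <- 4
combined <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                       x2 = c(0, 1, 0, 1, 1, 0))
sale_price <- c(200000, 150000, 180000, 250000)  # known for training rows only

# Split by row position and attach the log-transformed response to train
train <- combined[seq_len(n_train), , drop = FALSE]
train$SalePrice <- log(sale_price)
test <- combined[-seq_len(n_train), , drop = FALSE]

# Sanity checks on the split
cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```

Splitting by position avoids relying on missing values in the response, which only works while the test rows still carry NA for SalePrice.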

Caret Package Model - LeapBackward Method

library(leaps)
library(caret)

# Set seed for reproducibility
set.seed(123)
# Set up 10-fold cross-validation
train.control <- trainControl(method = "cv", number = 10)
# Train the model
step.modelhouse <- train(SalePrice ~., data = train.house,
                    method = "leapBackward", 
                    tuneGrid = data.frame(nvmax = 1:6),
                    trControl = train.control
                    )
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
## Reordering variables and trying again:
#step.modelhouse$results
step.modelhouse$bestTune
##   nvmax
## 4     4
step.modelhouse
## Linear Regression with Backwards Selection 
## 
## 2919 samples
##  396 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 2626, 2627, 2627, 2627, 2627, 2627, ... 
## Resampling results across tuning parameters:
## 
##   nvmax  RMSE      Rsquared     MAE      
##   1      1.000388  0.003982680  0.9987917
##   2      1.001328  0.001858748  0.9991230
##   3      1.001086  0.002615625  0.9987535
##   4      1.000104  0.003540293  0.9975237
##   5      1.000943  0.002447034  0.9982924
##   6      1.001125  0.002327971  0.9979521
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was nvmax = 4.
plot(step.modelhouse)

Homoscedasticity

Residuals vs fitted values plot

# produce a residuals vs fitted plot to visualize heteroscedasticity
plotres <- resid(step.modelhouse)
plot(fitted(step.modelhouse), plotres,
     pch = 21, col="brown")
abline(0,0, lwd = 3, col = "blue")
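
A numeric check can complement the plot. The sketch below is a Breusch-Pagan-style check in base R (the Koenker form, regressing squared residuals on the fitted values); it runs on simulated data with plain lm() fits, and the helper bp_check() is hypothetical rather than part of any package used here:

```r
set.seed(11)
x <- runif(500, 1, 10)
y_const <- 3 + 2 * x + rnorm(500, sd = 2)   # constant error variance
y_fan   <- 3 + 2 * x + rnorm(500, sd = x)   # error variance grows with x

# n * R^2 of the auxiliary regression is compared to a chi-squared(1)
bp_check <- function(fit) {
  e2 <- resid(fit)^2
  f  <- fitted(fit)
  aux <- lm(e2 ~ f)
  stat <- length(e2) * summary(aux)$r.squared
  c(statistic = stat, p.value = pchisq(stat, df = 1, lower.tail = FALSE))
}

res_const <- bp_check(lm(y_const ~ x))
res_fan   <- bp_check(lm(y_fan ~ x))
print(rbind(constant = res_const, fan = res_fan))
```

A small p-value suggests that the residual variance changes with the fitted values, which is the fan shape one looks for in the residual plot.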

Q-Q plot: the residuals follow a roughly normal distribution with a heavy lower tail. Most of the points fall on the reference line, which suggests that the residuals are approximately normally distributed apart from that tail.

#create Q-Q plot for residuals
qqnorm(plotres, col="blue")

#add a straight diagonal line to the plot
qqline(plotres, lwd = 3, col = "red") 

The density plot shows a nearly symmetric, roughly bell-shaped curve, which is consistent with approximately normally distributed residuals.

#Create density plot of residuals
plot(density(plotres), lwd = 4, col = "purple")

Prediction

predHouse <- predict(step.modelhouse, test.house)
predHouse2 <- exp(predHouse)*100000
#predHouse2
#submit <- data.frame(Id = test.labels, SalePrice = predHouse2)
#write.csv(submit, file="C:/Users/andre/OneDrive/Documents/GitHub/DATA605/Final Exam/Kaggle_Submission.csv", quote=FALSE, row.names=FALSE)
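
Predictions made on a transformed scale are mapped back by undoing each transformation in reverse order. The sketch below uses simulated prices and assumes the response went through log(x + 1) followed by centering and scaling; the names mu and sigma are hypothetical, and with caret's preProcess() the stored center and scale are, as far as I know, kept in the mean and std elements of the returned object:

```r
set.seed(5)
price <- rlnorm(100, meanlog = 12, sdlog = 0.4)   # simulated sale prices

# Forward: log(x + 1), then center and scale
log_price <- log1p(price)
mu    <- mean(log_price)
sigma <- sd(log_price)
z <- (log_price - mu) / sigma

# Inverse: undo the scaling and centering first, then the log
recovered <- expm1(z * sigma + mu)

# The round trip reproduces the original prices
print(all.equal(recovered, price))
```

If the response is only log-transformed and not standardized, exp() or expm1() alone is the inverse and no further rescaling is needed.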

Additional Final Reports

RPubs - Final (1/3): Playing With PageRank

RPubs - Final (2/3): Digit Recognizer