Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of \(\frac{N+1}{2}\).
5 points. Probability. Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.
## [1] "N: 100 , Median X: 50.69 1st Quartile Y16.48, X<x: 0.5, X > x: 0.5, Y > y: 0.75, Y<y: 0.25, X > y: 0.839 , X<y: 0.161"
## [1] "P(X>x | X>y) = 0.84"
## [1] "P(X>x, Y>y) = 0.38"
## [1] "P(X<x | X>y)\t = 0.84"
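The three probabilities can be estimated directly from the simulated vectors. The following is a minimal sketch, not the original code: the seed, the helper name `cond_prob`, and the choice `sd = (N + 1) / 2` for Y are assumptions (the standard deviation is not stated in this text, although that value is consistent with the first quartile printed above). By the definition P(A | B) = P(A and B) / P(B), the marginals printed above imply roughly 0.5 / 0.839 ≈ 0.60 for P(X>x | X>y) and 0.339 / 0.839 ≈ 0.40 for P(X<x | X>y), so the 0.84 figures are worth re-checking against a calculation like this one.

```r
set.seed(123)                       # assumed seed, for reproducibility only
N <- 100
n <- 10000
X <- runif(n, min = 1, max = N)
Y <- rnorm(n, mean = (N + 1) / 2, sd = (N + 1) / 2)  # sd is an assumption

x <- median(X)                      # "x" is the median of X
y <- unname(quantile(Y, 0.25))      # "y" is the 1st quartile of Y

# Hypothetical helper: empirical P(A | B) = P(A and B) / P(B)
cond_prob <- function(a, b) mean(a & b) / mean(b)

p_a <- cond_prob(X > x, X > y)      # a. P(X > x | X > y)
p_b <- mean(X > x & Y > y)          # b. P(X > x, Y > y)
p_c <- cond_prob(X < x, X > y)      # c. P(X < x | X > y)

print(round(c(a = p_a, b = p_b, c = p_c), 3))
```

Probability a reads as "given that X exceeds the first quartile of Y, how often does it also exceed its own median"; a and c are complements under the same condition, so they should sum to 1.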
5 points. Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.
## [1] "Joint Probability Matrix"
| Values | Y> 1st Quartile | Y< 1st Quartile | X Totals |
|---|---|---|---|
| X > Median | 0.375 | 0.125 | 0.5 |
| X < Median | 0.375 | 0.125 | 0.5 |
| Y Totals | 0.75 | 0.25 | 1 |
## [1] "Joint Prob, P(X>x and Y>y) 0.375"
## [1] "Product of Marginal Probs, P(X>x)P(Y>y) 0.375"
## [1] "Based on the above table, the joint probability is equal to the product of the marginal probabilities"
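A table like the one above can also be built from observed counts, which lets the joint cell be compared with the product of its margins rather than assumed equal. This is a sketch under the same assumptions as before (seed and `sd = (N + 1) / 2` are not from the original); the empirical joint value will differ slightly from 0.375 because of sampling noise.

```r
set.seed(123)                       # assumed seed
N <- 100
n <- 10000
X <- runif(n, 1, N)
Y <- rnorm(n, (N + 1) / 2, (N + 1) / 2)   # sd is an assumption
x <- median(X)
y <- quantile(Y, 0.25)

# Cross-tabulate the two events and convert counts to proportions
counts <- table(X = ifelse(X > x, "X > x", "X < x"),
                Y = ifelse(Y > y, "Y > y", "Y < y"))
joint <- prop.table(counts)
print(round(addmargins(joint), 3))  # joint cells plus row and column totals

p_joint   <- joint["X > x", "Y > y"]
p_product <- sum(joint["X > x", ]) * sum(joint[, "Y > y"])
print(c(joint = p_joint, product = p_product))
```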
5 points. Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
Chi Square
## [,1] [,2]
## [1,] 0.375 0.125
## [2,] 0.375 0.125
##
## Pearson's Chi-squared test
##
## data: chi.table
## X-squared = 0, df = 1, p-value = 1
Fisher Test
##
## Fisher's Exact Test for Count Data
##
## data: chi.table
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0 Inf
## sample estimates:
## odds ratio
## 0
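Both tests expect a contingency table of counts rather than proportions. Fisher's Exact Test computes an exact p-value from the hypergeometric distribution and is usually preferred when cell counts are small; the Chi Square Test relies on a large-sample approximation, which is reasonable with 10,000 observations. The sketch below runs both on counts. The seed and `sd` are assumptions as before. A table of proportions such as the one printed above would be rounded to whole numbers by `fisher.test`, which may explain the odds ratio of 0 and the interval from 0 to Inf.

```r
set.seed(123)                       # assumed seed
N <- 100
n <- 10000
X <- runif(n, 1, N)
Y <- rnorm(n, (N + 1) / 2, (N + 1) / 2)   # sd is an assumption
x <- median(X)
y <- quantile(Y, 0.25)

# 2 x 2 table of observed counts (not proportions)
counts <- table(X > x, Y > y)

chi <- chisq.test(counts)           # large-sample approximation
fis <- fisher.test(counts)          # exact test, conditional on the margins

print(c(chi_square_p = chi$p.value, fisher_p = fis$p.value))
```

A large p-value from either test means the data give no evidence against independence of the two events.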
You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques . I want you to do the following.
5 points. Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
## [1] "Dimensions of the raw train.data c(1460, 81)"
| Variable | Class |
|---|---|
| Id | integer |
| MSSubClass | integer |
| MSZoning | character |
| LotFrontage | integer |
| LotArea | integer |
| Street | character |
| Alley | character |
| LotShape | character |
| LandContour | character |
| Utilities | character |
| LotConfig | character |
| LandSlope | character |
| Neighborhood | character |
| Condition1 | character |
| Condition2 | character |
| BldgType | character |
| HouseStyle | character |
| OverallQual | integer |
| OverallCond | integer |
| YearBuilt | integer |
| YearRemodAdd | integer |
| RoofStyle | character |
| RoofMatl | character |
| Exterior1st | character |
| Exterior2nd | character |
| MasVnrType | character |
| MasVnrArea | integer |
| ExterQual | character |
| ExterCond | character |
| Foundation | character |
| BsmtQual | character |
| BsmtCond | character |
| BsmtExposure | character |
| BsmtFinType1 | character |
| BsmtFinSF1 | integer |
| BsmtFinType2 | character |
| BsmtFinSF2 | integer |
| BsmtUnfSF | integer |
| TotalBsmtSF | integer |
| Heating | character |
| HeatingQC | character |
| CentralAir | character |
| Electrical | character |
| X1stFlrSF | integer |
| X2ndFlrSF | integer |
| LowQualFinSF | integer |
| GrLivArea | integer |
| BsmtFullBath | integer |
| BsmtHalfBath | integer |
| FullBath | integer |
| HalfBath | integer |
| BedroomAbvGr | integer |
| KitchenAbvGr | integer |
| KitchenQual | character |
| TotRmsAbvGrd | integer |
| Functional | character |
| Fireplaces | integer |
| FireplaceQu | character |
| GarageType | character |
| GarageYrBlt | integer |
| GarageFinish | character |
| GarageCars | integer |
| GarageArea | integer |
| GarageQual | character |
| GarageCond | character |
| PavedDrive | character |
| WoodDeckSF | integer |
| OpenPorchSF | integer |
| EnclosedPorch | integer |
| X3SsnPorch | integer |
| ScreenPorch | integer |
| PoolArea | integer |
| PoolQC | character |
| Fence | character |
| MiscFeature | character |
| MiscVal | integer |
| MoSold | integer |
| YrSold | integer |
| SaleType | character |
| SaleCondition | character |
| SalePrice | integer |
| Variable.Name | 1st Qu. | 3rd Qu. | Max. | Mean | Min. | Unique.Values | NA.COUNTS |
|---|---|---|---|---|---|---|---|
| Id | 365.8 | 1095.2 | 1460.0 | 730.5 | 1.0 | 1460 | 0 |
| LotArea | 7554 | 11602 | 215245 | 10517 | 1300 | 1073 | 0 |
| GrLivArea | 1130 | 1777 | 5642 | 1515 | 334 | 861 | 0 |
| BsmtUnfSF | 223.0 | 808.0 | 2336.0 | 567.2 | 0.0 | 780 | 0 |
| X1stFlrSF | 882 | 1391 | 4692 | 1163 | 334 | 753 | 0 |
| TotalBsmtSF | 795.8 | 1298.2 | 6110.0 | 1057.4 | 0.0 | 721 | 0 |
| SalePrice | 129975 | 214000 | 755000 | 180921 | 34900 | 663 | 0 |
| BsmtFinSF1 | 0.0 | 712.2 | 5644.0 | 443.6 | 0.0 | 637 | 0 |
| GarageArea | 334.5 | 576.0 | 1418.0 | 473.0 | 0.0 | 441 | 0 |
| X2ndFlrSF | 0 | 728 | 2065 | 347 | 0 | 417 | 0 |
| MasVnrArea | 0.0 | 166.0 | 1600.0 | 103.7 | 0.0 | 328 | 8 |
| WoodDeckSF | 0.00 | 168.00 | 857.00 | 94.24 | 0.00 | 274 | 0 |
| OpenPorchSF | 0.00 | 68.00 | 547.00 | 46.66 | 0.00 | 202 | 0 |
| BsmtFinSF2 | 0.00 | 0.00 | 1474.00 | 46.55 | 0.00 | 144 | 0 |
| EnclosedPorch | 0.00 | 0.00 | 552.00 | 21.95 | 0.00 | 120 | 0 |
| YearBuilt | 1954 | 2000 | 2010 | 1971 | 1872 | 112 | 0 |
| LotFrontage | 59.00 | 80.00 | 313.00 | 70.05 | 21.00 | 111 | 259 |
| GarageYrBlt | 1961 | 2002 | 2010 | 1979 | 1900 | 98 | 81 |
| ScreenPorch | 0.00 | 0.00 | 480.00 | 15.06 | 0.00 | 76 | 0 |
| YearRemodAdd | 1967 | 2004 | 2010 | 1985 | 1950 | 61 | 0 |
| LowQualFinSF | 0.000 | 0.000 | 572.000 | 5.845 | 0.000 | 24 | 0 |
| MiscVal | 0.00 | 0.00 | 15500.00 | 43.49 | 0.00 | 21 | 0 |
| X3SsnPorch | 0.00 | 0.00 | 508.00 | 3.41 | 0.00 | 20 | 0 |
| MSSubClass | 20.0 | 70.0 | 190.0 | 56.9 | 20.0 | 15 | 0 |
| MoSold | 5.000 | 8.000 | 12.000 | 6.322 | 1.000 | 12 | 0 |
| TotRmsAbvGrd | 5.000 | 7.000 | 14.000 | 6.518 | 2.000 | 12 | 0 |
| OverallQual | 5.000 | 7.000 | 10.000 | 6.099 | 1.000 | 10 | 0 |
| OverallCond | 5.000 | 6.000 | 9.000 | 5.575 | 1.000 | 9 | 0 |
| BedroomAbvGr | 2.000 | 3.000 | 8.000 | 2.866 | 0.000 | 8 | 0 |
| PoolArea | 0.000 | 0.000 | 738.000 | 2.759 | 0.000 | 8 | 0 |
| GarageCars | 1.000 | 2.000 | 4.000 | 1.767 | 0.000 | 5 | 0 |
| YrSold | 2007 | 2009 | 2010 | 2008 | 2006 | 5 | 0 |
| BsmtFullBath | 0.0000 | 1.0000 | 3.0000 | 0.4253 | 0.0000 | 4 | 0 |
| Fireplaces | 0.000 | 1.000 | 3.000 | 0.613 | 0.000 | 4 | 0 |
| FullBath | 1.000 | 2.000 | 3.000 | 1.565 | 0.000 | 4 | 0 |
| KitchenAbvGr | 1.000 | 1.000 | 3.000 | 1.047 | 0.000 | 4 | 0 |
| BsmtHalfBath | 0.00000 | 0.00000 | 2.00000 | 0.05753 | 0.00000 | 3 | 0 |
| HalfBath | 0.0000 | 1.0000 | 2.0000 | 0.3829 | 0.0000 | 3 | 0 |
##### Variable Density Graphs
The p-value is below 2.2e-16 for all three pairs, so in each case we reject the null hypothesis that the correlation is 0.
In the train data, we are 80% confident that the correlation coefficient between GrLivArea and LotArea is between 0.2315997 and 0.2940809.
In the train data, we are 80% confident that the correlation coefficient between GrLivArea and LotFrontage is between 0.3713236 and 0.4333466.
In the train data, we are 80% confident that the correlation coefficient between LotArea and LotFrontage is between 0.3953198 and 0.4559147.
##
## Pearson's product-moment correlation
##
## data: arg_1 and arg_2
## t = 10.414, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.2315997 0.2940809
## sample estimates:
## cor
## 0.2631162
##
## Pearson's product-moment correlation
##
## data: arg_1 and arg_2
## t = 15.238, df = 1199, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.3713236 0.4333466
## sample estimates:
## cor
## 0.4027974
##
## Pearson's product-moment correlation
##
## data: arg_1 and arg_2
## t = 16.309, df = 1199, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.3953198 0.4559147
## sample estimates:
## cor
## 0.426095
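The pairwise tests above can be produced in one pass with `cor.test(..., conf.level = 0.80)`, which drops incomplete pairs on its own (hence the smaller degrees of freedom wherever LotFrontage is involved). The sketch below uses a small synthetic data frame as a stand-in for the training data, so its numbers are illustrative only; the column values, seed, and sample size are assumptions. It also touches on the familywise question: with three 80% intervals, the chance that at least one misses could be as high as 1 - 0.8^3 ≈ 0.49 if the tests were independent, although p-values as small as those above survive a Bonferroni correction easily.

```r
set.seed(1)                               # assumed seed
n <- 500
# Synthetic stand-in for three numeric columns (hypothetical values)
LotFrontage <- rnorm(n, 70, 20)
LotArea     <- 100 * LotFrontage + rnorm(n, 0, 4000)
GrLivArea   <- 10 * LotFrontage + rnorm(n, 800, 450)
dat <- data.frame(GrLivArea, LotArea, LotFrontage)

# One cor.test per pair of variables, each with an 80% confidence interval
pairs_idx <- combn(names(dat), 2, simplify = FALSE)
results <- do.call(rbind, lapply(pairs_idx, function(p) {
  ct <- cor.test(dat[[p[1]]], dat[[p[2]]], conf.level = 0.80)
  data.frame(var1 = p[1], var2 = p[2],
             cor = unname(ct$estimate),
             lower = ct$conf.int[1], upper = ct$conf.int[2],
             p_value = ct$p.value)
}))

alpha <- 0.20
m <- nrow(results)
fwer <- 1 - (1 - alpha)^m                 # familywise error if tests were independent
results$p_bonferroni <- p.adjust(results$p_value, method = "bonferroni")

print(results)
print(fwer)
```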
5 points. Linear Algebra and Correlation. Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.
Correlation matrix

| | GrLivArea | LotArea | LotFrontage |
|---|---|---|---|
| GrLivArea | 1.0000000 | 0.2631162 | 0.4027974 |
| LotArea | 0.2631162 | 1.0000000 | 0.4260950 |
| LotFrontage | 0.4027974 | 0.4260950 | 1.0000000 |

Precision matrix (inverse of the correlation matrix)

| | GrLivArea | LotArea | LotFrontage |
|---|---|---|---|
| GrLivArea | 1.2084186 | -0.1350780 | -0.4291918 |
| LotArea | -0.1350780 | 1.2369313 | -0.4726412 |
| LotFrontage | -0.4291918 | -0.4726412 | 1.3742674 |

Correlation matrix multiplied by the precision matrix

| | GrLivArea | LotArea | LotFrontage |
|---|---|---|---|
| GrLivArea | 1 | 0 | 0 |
| LotArea | 0 | 1 | 0 |
| LotFrontage | 0 | 0 | 1 |

Precision matrix multiplied by the correlation matrix

| | GrLivArea | LotArea | LotFrontage |
|---|---|---|---|
| GrLivArea | 1 | 0 | 0 |
| LotArea | 0 | 1 | 0 |
| LotFrontage | 0 | 0 | 1 |
## [1] "LU decomposition of the 3 variable sample matrix"
## $L
## [,1] [,2] [,3]
## [1,] 1.0000000 0.0000000 0
## [2,] 0.2631162 1.0000000 0
## [3,] 0.4027974 0.3439223 1
##
## $U
## [,1] [,2] [,3]
## [1,] 1 0.2631162 0.4027974
## [2,] 0 0.9307699 0.3201125
## [3,] 0 0.0000000 0.7276604
## [1] "LU decomposition of the 3 variable inverted sample matrix"
## $L
## [,1] [,2] [,3]
## [1,] 1.0000000 0.000000 0
## [2,] -0.1117808 1.000000 0
## [3,] -0.3551682 -0.426095 1
##
## $U
## [,1] [,2] [,3]
## [1,] 1.208419e+00 -0.135078 -0.4291918
## [2,] 0.000000e+00 1.221832 -0.5206166
## [3,] 5.551115e-17 0.000000 1.0000000
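The inversion, the two products, and the LU factors can be reproduced in base R from the correlation values shown above. The sketch below is one way to do it: `lu_doolittle` is a hypothetical hand-written helper (Doolittle elimination without pivoting), not necessarily the function used to produce the output above. Skipping pivoting is safe here because a correlation matrix of this kind is positive definite, so no zero pivot appears.

```r
vars <- c("GrLivArea", "LotArea", "LotFrontage")
R <- matrix(c(1.0000000, 0.2631162, 0.4027974,
              0.2631162, 1.0000000, 0.4260950,
              0.4027974, 0.4260950, 1.0000000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

P <- solve(R)                  # precision matrix; diag(P) holds the VIFs
print(round(R %*% P, 6))       # identity, up to floating-point error
print(round(P %*% R, 6))

# Hypothetical helper: Doolittle LU decomposition without pivoting.
# L is unit lower triangular, U is upper triangular, and L %*% U == A.
lu_doolittle <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (k in seq_len(n - 1)) {
    for (i in (k + 1):n) {
      L[i, k] <- U[i, k] / U[k, k]           # elimination multiplier
      U[i, ] <- U[i, ] - L[i, k] * U[k, ]    # zero out the entry below the pivot
    }
  }
  list(L = L, U = U)
}

lu_R <- lu_doolittle(R)
lu_P <- lu_doolittle(P)
print(lu_R)
```

Multiplying `lu_R$L %*% lu_R$U` should return the correlation matrix, which is a quick way to confirm the factors.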
5 points. Calculus-Based Probability & Statistics. Many times, it makes sense to fit a closed form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of \(\lambda\) for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, \(\lambda\))). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.
## [1] "5th Percentile empirical data() 77.73"
## [1] "95th Percentile empirical data() 4539.92"
## [1] "5th Percentile normal data() 848"
## [1] "95th Percentile normal data() 2466.1"
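The steps in this question can be sketched as follows. The data vector `v` is a synthetic right-skewed stand-in (its values, the seed, and the gamma parameters are assumptions), and the confidence interval is computed for the mean, which is only one reading of "assuming normality". For any exponential distribution the ratio of the 95th to the 5th percentile is ln(0.05) / ln(0.95) ≈ 58.4, which gives a quick check on which pair of numbers comes from the fitted model: the first pair printed above (77.73 and 4539.92) has this ratio, which suggests it may be the exponential-model pair rather than the empirical one.

```r
library(MASS)

set.seed(42)                                   # assumed seed
# Synthetic right-skewed stand-in for the chosen column (hypothetical values)
v <- rgamma(1460, shape = 8, rate = 8 / 1500)

fit <- fitdistr(v, "exponential")
lambda <- unname(fit$estimate["rate"])         # the MLE equals 1 / mean(v)

sim <- rexp(1000, rate = lambda)               # 1000 draws from the fitted model
# hist(v); hist(sim)                           # uncomment to compare the two shapes

q_exp <- qexp(c(0.05, 0.95), rate = lambda)    # model percentiles from the CDF
q_emp <- quantile(v, c(0.05, 0.95))            # empirical percentiles

# 95% confidence interval for the mean under a normal approximation
ci_mean <- mean(v) + c(-1, 1) * qnorm(0.975) * sd(v) / sqrt(length(v))

print(list(lambda = lambda, exponential = q_exp, empirical = q_emp, ci_mean = ci_mean))
```

When the exponential percentiles sit far outside the empirical ones, as they tend to for a variable whose mode is well above zero, the exponential model is a poor description of the data.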
10 points. Modeling. Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.
## [1] "All Starter Columns Dimensions for Reference 82"
## [1] "Number Columns Post nearZeroVar 61"
Immediately remove the columns with high NA counts from the combined dataset: ‘PoolQC’ (2,909 nulls), ‘MiscFeature’ (2,814 nulls), ‘Alley’ (2,721 nulls), ‘Fence’ (2,348 nulls), and ‘FireplaceQu’ (1,420 nulls), as nearly half or more of the values in each of these columns are null.
| | NA.COUNTS | Variable.Name |
|---|---|---|
| 6 | 486 | LotFrontage |
| 7 | 159 | GarageYrBlt |
| 8 | 159 | GarageFinish |
| 9 | 157 | GarageType |
| 10 | 82 | BsmtExposure |
| 11 | 81 | BsmtQual |
| 12 | 79 | BsmtFinType1 |
| 13 | 24 | MasVnrType |
| 14 | 23 | MasVnrArea |
| 15 | 4 | MSZoning |
| 16 | 2 | BsmtFullBath |
| 17 | 2 | BsmtHalfBath |
| 18 | 1 | Exterior1st |
| 19 | 1 | Exterior2nd |
| 20 | 1 | BsmtFinSF1 |
| 21 | 1 | BsmtUnfSF |
| 22 | 1 | TotalBsmtSF |
| 23 | 1 | Electrical |
| 24 | 1 | KitchenQual |
| 25 | 1 | GarageCars |
| 26 | 1 | GarageArea |
| 27 | 1 | SaleType |
## [1] "Columns Post Extreme Null Removal 56"
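Counting nulls per column and dropping the worst offenders takes only a few lines of base R. The sketch below uses a tiny synthetic data frame in place of the combined train and test data; the values, the seed, and the 0.45 cut-off are assumptions chosen for illustration, not settings taken from the original analysis.

```r
set.seed(7)                                   # assumed seed
n <- 200
# Synthetic stand-in for the combined dataset (hypothetical values)
combined <- data.frame(
  LotArea     = rpois(n, 10000),
  PoolQC      = ifelse(runif(n) < 0.99, NA, "Gd"),   # almost entirely missing
  LotFrontage = ifelse(runif(n) < 0.17, NA, 70),     # moderately missing
  stringsAsFactors = FALSE
)

# NA count and NA share per column, largest first
na_counts <- sort(colSums(is.na(combined)), decreasing = TRUE)
na_share  <- na_counts / nrow(combined)
print(na_counts)

threshold <- 0.45                             # assumed cut-off
drop_cols <- names(na_share)[na_share > threshold]
reduced   <- combined[, setdiff(names(combined), drop_cols), drop = FALSE]
print(names(reduced))
```

Columns below the cut-off, such as LotFrontage here, are kept and would still need imputation before modeling.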
##
## Call:
## lm(formula = SalePrice ~ ., data = clean.train %>% dplyr::select(-c(Exterior2nd,
## BsmtFinType1, GarageFinish)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -325440 -11730 -107 11314 234260
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.852e+05 1.236e+06 0.635 0.525386
## Id -1.204e+00 1.863e+00 -0.646 0.518114
## MSSubClass -9.681e+01 9.643e+01 -1.004 0.315576
## MSZoningFV 3.068e+04 1.435e+04 2.138 0.032693 *
## MSZoningRH 2.717e+04 1.416e+04 1.919 0.055191 .
## MSZoningRL 2.574e+04 1.202e+04 2.142 0.032399 *
## MSZoningRM 2.273e+04 1.126e+04 2.019 0.043688 *
## LotFrontage -1.286e+02 4.936e+01 -2.605 0.009301 **
## LotArea 3.725e-01 9.816e-02 3.794 0.000155 ***
## LotShapeIR2 9.824e+03 5.001e+03 1.965 0.049679 *
## LotShapeIR3 -2.484e+04 1.039e+04 -2.390 0.016979 *
## LotShapeReg 1.668e+03 1.910e+03 0.873 0.382718
## LotConfigCulDSac 7.635e+03 3.937e+03 1.939 0.052657 .
## LotConfigFR2 -1.001e+04 4.855e+03 -2.062 0.039428 *
## LotConfigFR3 -1.781e+04 1.515e+04 -1.176 0.239717
## LotConfigInside -1.245e+03 2.123e+03 -0.586 0.557740
## NeighborhoodBlueste -2.535e+03 2.285e+04 -0.111 0.911654
## NeighborhoodBrDale 1.020e+04 1.315e+04 0.775 0.438212
## NeighborhoodBrkSide -2.873e+03 1.106e+04 -0.260 0.795196
## NeighborhoodClearCr -9.427e+03 1.077e+04 -0.875 0.381653
## NeighborhoodCollgCr -4.367e+03 8.664e+03 -0.504 0.614311
## NeighborhoodCrawfor 1.457e+04 1.010e+04 1.442 0.149599
## NeighborhoodEdwards -2.283e+04 9.552e+03 -2.390 0.016967 *
## NeighborhoodGilbert -7.715e+03 9.293e+03 -0.830 0.406545
## NeighborhoodIDOTRR -9.908e+03 1.271e+04 -0.780 0.435712
## NeighborhoodMeadowV 1.549e+03 1.342e+04 0.115 0.908143
## NeighborhoodMitchel -1.472e+04 9.739e+03 -1.511 0.130907
## NeighborhoodNAmes -1.313e+04 9.279e+03 -1.416 0.157129
## NeighborhoodNoRidge 4.906e+04 1.008e+04 4.867 1.27e-06 ***
## NeighborhoodNPkVill 1.138e+04 1.326e+04 0.858 0.390826
## NeighborhoodNridgHt 3.210e+04 8.913e+03 3.601 0.000329 ***
## NeighborhoodNWAmes -1.144e+04 9.545e+03 -1.199 0.230891
## NeighborhoodOldTown -1.707e+04 1.139e+04 -1.499 0.134135
## NeighborhoodSawyer -8.574e+03 9.716e+03 -0.882 0.377692
## NeighborhoodSawyerW 1.724e+03 9.327e+03 0.185 0.853428
## NeighborhoodSomerst 1.003e+04 1.080e+04 0.929 0.353095
## NeighborhoodStoneBr 4.883e+04 9.879e+03 4.943 8.69e-07 ***
## NeighborhoodSWISU -1.689e+04 1.151e+04 -1.468 0.142415
## NeighborhoodTimber -7.122e+03 9.637e+03 -0.739 0.460015
## NeighborhoodVeenker 6.905e+03 1.257e+04 0.549 0.582906
## Condition1Feedr -5.308e+03 5.751e+03 -0.923 0.356208
## Condition1Norm 7.120e+03 4.763e+03 1.495 0.135187
## Condition1PosA 5.132e+03 1.171e+04 0.438 0.661363
## Condition1PosN -8.834e+03 8.340e+03 -1.059 0.289691
## Condition1RRAe -2.042e+04 1.096e+04 -1.863 0.062708 .
## Condition1RRAn 5.812e+03 7.786e+03 0.746 0.455505
## Condition1RRNe -3.154e+03 2.130e+04 -0.148 0.882332
## Condition1RRNn 1.283e+02 1.474e+04 0.009 0.993057
## BldgType2fmCon -4.154e+02 1.422e+04 -0.029 0.976699
## BldgTypeDuplex -1.270e+04 7.111e+03 -1.786 0.074392 .
## BldgTypeTwnhs -2.587e+04 1.170e+04 -2.210 0.027244 *
## BldgTypeTwnhsE -2.092e+04 1.050e+04 -1.991 0.046668 *
## HouseStyle1.5Unf 1.442e+04 9.024e+03 1.597 0.110422
## HouseStyle1Story 1.972e+04 4.978e+03 3.963 7.82e-05 ***
## HouseStyle2.5Fin -1.784e+04 1.405e+04 -1.269 0.204528
## HouseStyle2.5Unf -1.020e+04 1.025e+04 -0.996 0.319677
## HouseStyle2Story -1.048e+04 4.025e+03 -2.603 0.009338 **
## HouseStyleSFoyer 1.201e+04 7.398e+03 1.624 0.104651
## HouseStyleSLvl 1.081e+04 6.390e+03 1.691 0.091024 .
## OverallQual 9.020e+03 1.167e+03 7.732 2.10e-14 ***
## OverallCond 5.487e+03 9.967e+02 5.506 4.42e-08 ***
## YearBuilt 1.296e+02 8.620e+01 1.504 0.132888
## YearRemodAdd 3.619e+01 6.515e+01 0.555 0.578680
## RoofStyleGable 2.835e+03 9.566e+03 0.296 0.767008
## RoofStyleGambrel 8.482e+03 1.340e+04 0.633 0.526909
## RoofStyleHip 5.286e+03 9.767e+03 0.541 0.588439
## RoofStyleMansard 1.072e+04 1.491e+04 0.719 0.472413
## RoofStyleShed 8.407e+03 2.287e+04 0.368 0.713172
## Exterior1stAsphShn -4.446e+03 3.071e+04 -0.145 0.884924
## Exterior1stBrkComm -9.432e+03 2.281e+04 -0.414 0.679304
## Exterior1stBrkFace 1.651e+04 8.425e+03 1.959 0.050304 .
## Exterior1stCBlock 1.107e+04 3.211e+04 0.345 0.730342
## Exterior1stCemntBd -4.659e+02 8.792e+03 -0.053 0.957750
## Exterior1stHdBoard -5.398e+03 7.689e+03 -0.702 0.482797
## Exterior1stImStucc -2.435e+04 2.993e+04 -0.814 0.416075
## Exterior1stMetalSd 2.257e+02 7.434e+03 0.030 0.975781
## Exterior1stPlywood -3.322e+03 8.086e+03 -0.411 0.681302
## Exterior1stStone 3.570e+03 2.252e+04 0.159 0.874074
## Exterior1stStucco -1.377e+04 9.442e+03 -1.458 0.145071
## Exterior1stVinylSd -2.060e+02 7.494e+03 -0.027 0.978069
## Exterior1stWd Sdng -2.715e+03 7.372e+03 -0.368 0.712705
## Exterior1stWdShing -4.008e+03 9.235e+03 -0.434 0.664403
## MasVnrTypeBrkFace 1.129e+04 7.972e+03 1.417 0.156839
## MasVnrTypeNone 1.326e+04 8.025e+03 1.652 0.098785 .
## MasVnrTypeStone 1.341e+04 8.484e+03 1.581 0.114054
## MasVnrArea 1.236e+01 6.854e+00 1.804 0.071494 .
## ExterQualFa -1.700e+04 1.202e+04 -1.415 0.157440
## ExterQualGd -1.212e+04 5.721e+03 -2.119 0.034268 *
## ExterQualTA -1.431e+04 6.314e+03 -2.266 0.023630 *
## ExterCondFa -1.781e+04 1.883e+04 -0.946 0.344555
## ExterCondGd -2.151e+04 1.758e+04 -1.224 0.221289
## ExterCondPo -3.301e+04 3.505e+04 -0.942 0.346519
## ExterCondTA -2.030e+04 1.756e+04 -1.156 0.247892
## FoundationCBlock 6.522e+03 3.718e+03 1.754 0.079607 .
## FoundationPConc 6.411e+03 4.050e+03 1.583 0.113699
## FoundationSlab -3.416e+03 8.600e+03 -0.397 0.691236
## FoundationStone 7.674e+03 1.290e+04 0.595 0.552052
## FoundationWood -8.403e+03 1.764e+04 -0.476 0.633919
## BsmtQualFa -2.507e+04 7.412e+03 -3.383 0.000738 ***
## BsmtQualGd -2.471e+04 3.983e+03 -6.204 7.36e-10 ***
## BsmtQualTA -2.382e+04 4.896e+03 -4.865 1.29e-06 ***
## BsmtExposureGd 1.915e+04 3.515e+03 5.447 6.11e-08 ***
## BsmtExposureMn -2.623e+03 3.621e+03 -0.724 0.468965
## BsmtExposureNo -8.524e+03 2.610e+03 -3.266 0.001120 **
## BsmtFinSF1 6.354e+00 5.264e+00 1.207 0.227614
## BsmtUnfSF 4.160e-01 5.485e+00 0.076 0.939548
## TotalBsmtSF 1.913e+00 6.484e+00 0.295 0.768011
## HeatingQCFa -1.008e+03 5.078e+03 -0.198 0.842717
## HeatingQCGd -3.934e+03 2.478e+03 -1.587 0.112646
## HeatingQCPo -6.254e+03 3.237e+04 -0.193 0.846832
## HeatingQCTA -3.220e+03 2.440e+03 -1.319 0.187237
## CentralAirY 9.548e+02 4.329e+03 0.221 0.825488
## ElectricalFuseF 1.475e+03 6.825e+03 0.216 0.828910
## ElectricalFuseP -1.339e+03 2.027e+04 -0.066 0.947346
## ElectricalMix -9.648e+03 3.090e+04 -0.312 0.754918
## ElectricalSBrkr -1.201e+03 3.452e+03 -0.348 0.727933
## X1stFlrSF -1.632e+01 2.145e+01 -0.761 0.446804
## X2ndFlrSF 1.312e+01 2.059e+01 0.637 0.523934
## GrLivArea 5.313e+01 2.099e+01 2.531 0.011488 *
## BsmtFullBath 8.490e+03 2.246e+03 3.781 0.000164 ***
## BsmtHalfBath 3.777e+03 3.527e+03 1.071 0.284372
## FullBath 7.689e+03 2.578e+03 2.982 0.002916 **
## HalfBath 4.275e+03 2.445e+03 1.749 0.080602 .
## BedroomAbvGr -2.105e+03 1.594e+03 -1.321 0.186762
## KitchenQualFa -2.744e+04 7.278e+03 -3.770 0.000170 ***
## KitchenQualGd -2.875e+04 4.144e+03 -6.937 6.29e-12 ***
## KitchenQualTA -2.954e+04 4.668e+03 -6.328 3.41e-10 ***
## TotRmsAbvGrd 1.884e+03 1.088e+03 1.731 0.083601 .
## Fireplaces 4.615e+03 1.590e+03 2.902 0.003768 **
## GarageTypeAttchd 2.390e+04 1.279e+04 1.869 0.061829 .
## GarageTypeBasment 2.708e+04 1.469e+04 1.843 0.065515 .
## GarageTypeBuiltIn 1.846e+04 1.332e+04 1.386 0.166031
## GarageTypeCarPort 1.996e+04 1.639e+04 1.218 0.223551
## GarageTypeDetchd 2.291e+04 1.266e+04 1.810 0.070585 .
## GarageYrBlt 5.123e+01 6.271e+01 0.817 0.414053
## GarageCars 1.185e+04 2.574e+03 4.603 4.56e-06 ***
## GarageArea -7.653e+00 9.097e+00 -0.841 0.400350
## PavedDriveP -9.404e+02 6.436e+03 -0.146 0.883842
## PavedDriveY 2.286e+03 3.938e+03 0.580 0.561719
## WoodDeckSF 1.360e+01 6.803e+00 1.998 0.045889 *
## MoSold -4.212e+02 2.928e+02 -1.439 0.150530
## YrSold -6.068e+02 6.074e+02 -0.999 0.317915
## SaleTypeCon 2.461e+04 2.156e+04 1.141 0.253931
## SaleTypeConLD 1.572e+04 1.160e+04 1.356 0.175415
## SaleTypeConLI 1.116e+04 1.404e+04 0.794 0.427078
## SaleTypeConLw -3.278e+03 1.434e+04 -0.229 0.819258
## SaleTypeCWD 1.595e+04 1.559e+04 1.023 0.306689
## SaleTypeNew 3.341e+04 1.837e+04 1.819 0.069178 .
## SaleTypeOth 1.403e+04 1.766e+04 0.794 0.427169
## SaleTypeWD -3.837e+02 5.046e+03 -0.076 0.939395
## SaleConditionAdjLand 1.984e+04 1.681e+04 1.180 0.238160
## SaleConditionAlloca 1.841e+03 9.968e+03 0.185 0.853526
## SaleConditionFamily 1.154e+03 7.362e+03 0.157 0.875505
## SaleConditionNormal 6.541e+03 3.412e+03 1.917 0.055449 .
## SaleConditionPartial -1.804e+04 1.768e+04 -1.021 0.307600
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28160 on 1305 degrees of freedom
## Multiple R-squared: 0.8876, Adjusted R-squared: 0.8743
## F-statistic: 66.9 on 154 and 1305 DF, p-value: < 2.2e-16
## GVIF Df GVIF^(1/(2*Df))
## Id 1.134866 1 1.065301
## MSSubClass 30.599778 1 5.531707
## MSZoning 47.073460 4 1.618442
## LotFrontage 2.176681 1 1.475358
## LotArea 1.765511 1 1.328725
## LotShape 2.403450 3 1.157371
## LotConfig 2.221529 4 1.104922
## Neighborhood 98310.333590 24 1.270611
## Condition1 4.602658 8 1.100115
## BldgType 139.806280 4 1.854346
## HouseStyle 280.975519 7 1.495908
## OverallQual 4.787143 1 2.187954
## OverallCond 2.262404 1 1.504129
## YearBuilt 12.467703 1 3.530963
## YearRemodAdd 3.327361 1 1.824106
## RoofStyle 3.528657 5 1.134386
## Exterior1st 56.816311 14 1.155207
## MasVnrType 4.995470 3 1.307463
## MasVnrArea 2.822300 1 1.679970
## ExterQual 12.967330 3 1.532763
## ExterCond 2.511153 4 1.121977
## Foundation 19.183677 5 1.343672
## BsmtQual 10.687809 3 1.484162
## BsmtExposure 2.761571 3 1.184475
## BsmtFinSF1 10.602217 1 3.256105
## BsmtUnfSF 10.802945 1 3.286783
## TotalBsmtSF 14.882510 1 3.857786
## HeatingQC 4.366549 4 1.202312
## CentralAir 2.098587 1 1.448650
## Electrical 3.303583 4 1.161109
## X1stFlrSF 126.440297 1 11.244567
## X2ndFlrSF 148.555002 1 12.188314
## GrLivArea 223.790092 1 14.959615
## BsmtFullBath 2.497740 1 1.580424
## BsmtHalfBath 1.304203 1 1.142017
## FullBath 3.710723 1 1.926324
## HalfBath 2.780867 1 1.667593
## BedroomAbvGr 3.108065 1 1.762970
## KitchenQual 8.168256 3 1.419128
## TotRmsAbvGrd 5.753167 1 2.398576
## Fireplaces 1.933016 1 1.390329
## GarageType 6.962789 5 1.214167
## GarageYrBlt 4.163934 1 2.040572
## GarageCars 6.804504 1 2.608544
## GarageArea 6.957391 1 2.637687
## PavedDrive 1.950208 2 1.181735
## WoodDeckSF 1.337436 1 1.156476
## MoSold 1.152343 1 1.073472
## YrSold 1.196749 1 1.093960
## SaleType 117.611721 8 1.347110
## SaleCondition 118.967277 5 1.612660
##
## Call:
## lm(formula = SalePrice ~ ., data = clean.train %>% dplyr::select(MSZoning,
## LotFrontage, LotArea, LotShape, LotConfig, Neighborhood,
## BldgType, HouseStyle, OverallQual, OverallCond, ExterQual,
## BsmtQual, BsmtExposure, Exterior2nd, BsmtFullBath, FullBath,
## HalfBath, KitchenQual, Fireplaces, GarageType, GarageCars,
## WoodDeckSF, SaleCondition, SalePrice))
##
## Residuals:
## Min 1Q Median 3Q Max
## -248212 -14995 -606 13312 267801
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.310e+04 2.555e+04 1.686 0.091929 .
## MSZoningFV 2.330e+04 1.483e+04 1.571 0.116365
## MSZoningRH 2.521e+04 1.490e+04 1.692 0.090904 .
## MSZoningRL 1.835e+04 1.249e+04 1.470 0.141836
## MSZoningRM 1.627e+04 1.161e+04 1.401 0.161456
## LotFrontage -5.784e+00 5.074e+01 -0.114 0.909256
## LotArea 4.781e-01 1.043e-01 4.582 5.02e-06 ***
## LotShapeIR2 8.629e+03 5.344e+03 1.615 0.106620
## LotShapeIR3 -2.102e+04 1.070e+04 -1.965 0.049624 *
## LotShapeReg 7.717e+02 2.047e+03 0.377 0.706227
## LotConfigCulDSac 9.516e+03 4.165e+03 2.285 0.022471 *
## LotConfigFR2 -1.619e+04 5.197e+03 -3.115 0.001879 **
## LotConfigFR3 -2.453e+04 1.603e+04 -1.531 0.126124
## LotConfigInside -1.641e+03 2.259e+03 -0.726 0.467845
## NeighborhoodBlueste -1.578e+04 2.428e+04 -0.650 0.515747
## NeighborhoodBrDale -1.051e+03 1.384e+04 -0.076 0.939491
## NeighborhoodBrkSide -6.446e+03 1.126e+04 -0.572 0.567232
## NeighborhoodClearCr -6.812e+03 1.116e+04 -0.610 0.541809
## NeighborhoodCollgCr -7.046e+03 9.064e+03 -0.777 0.437075
## NeighborhoodCrawfor 2.248e+04 1.044e+04 2.153 0.031509 *
## NeighborhoodEdwards -1.923e+04 9.933e+03 -1.936 0.053106 .
## NeighborhoodGilbert -2.156e+04 9.671e+03 -2.229 0.025984 *
## NeighborhoodIDOTRR -1.525e+04 1.293e+04 -1.179 0.238461
## NeighborhoodMeadowV -3.178e+02 1.389e+04 -0.023 0.981752
## NeighborhoodMitchel -1.550e+04 1.022e+04 -1.517 0.129610
## NeighborhoodNAmes -8.859e+03 9.639e+03 -0.919 0.358209
## NeighborhoodNoRidge 6.564e+04 1.032e+04 6.362 2.71e-10 ***
## NeighborhoodNPkVill -3.902e+03 1.592e+04 -0.245 0.806480
## NeighborhoodNridgHt 2.630e+04 9.210e+03 2.855 0.004363 **
## NeighborhoodNWAmes -1.318e+04 9.950e+03 -1.324 0.185564
## NeighborhoodOldTown -1.751e+04 1.152e+04 -1.520 0.128749
## NeighborhoodSawyer -1.066e+04 1.017e+04 -1.049 0.294496
## NeighborhoodSawyerW -4.199e+03 9.762e+03 -0.430 0.667183
## NeighborhoodSomerst 1.641e+03 1.101e+04 0.149 0.881550
## NeighborhoodStoneBr 4.782e+04 1.043e+04 4.584 4.99e-06 ***
## NeighborhoodSWISU -1.793e+04 1.184e+04 -1.515 0.130072
## NeighborhoodTimber -1.251e+04 1.023e+04 -1.223 0.221597
## NeighborhoodVeenker 8.756e+03 1.319e+04 0.664 0.506866
## BldgType2fmCon -9.621e+03 6.171e+03 -1.559 0.119182
## BldgTypeDuplex -1.347e+04 5.258e+03 -2.561 0.010540 *
## BldgTypeTwnhs -3.453e+04 7.012e+03 -4.925 9.46e-07 ***
## BldgTypeTwnhsE -3.222e+04 4.633e+03 -6.956 5.43e-12 ***
## HouseStyle1.5Unf -1.620e+04 8.919e+03 -1.817 0.069483 .
## HouseStyle1Story 7.243e+01 3.433e+03 0.021 0.983170
## HouseStyle2.5Fin 2.979e+04 1.188e+04 2.507 0.012276 *
## HouseStyle2.5Unf -3.748e+03 1.018e+04 -0.368 0.712885
## HouseStyle2Story -3.963e+03 3.518e+03 -1.126 0.260251
## HouseStyleSFoyer -1.633e+04 6.970e+03 -2.343 0.019288 *
## HouseStyleSLvl -1.182e+04 5.409e+03 -2.186 0.028965 *
## OverallQual 1.318e+04 1.173e+03 11.232 < 2e-16 ***
## OverallCond 4.562e+03 8.885e+02 5.135 3.24e-07 ***
## ExterQualFa -2.018e+04 1.221e+04 -1.654 0.098452 .
## ExterQualGd -1.889e+04 6.017e+03 -3.139 0.001730 **
## ExterQualTA -2.333e+04 6.636e+03 -3.515 0.000454 ***
## BsmtQualFa -4.011e+04 7.510e+03 -5.340 1.09e-07 ***
## BsmtQualGd -3.194e+04 4.203e+03 -7.598 5.56e-14 ***
## BsmtQualTA -3.222e+04 5.016e+03 -6.423 1.84e-10 ***
## BsmtExposureGd 1.963e+04 3.726e+03 5.268 1.60e-07 ***
## BsmtExposureMn -3.217e+03 3.871e+03 -0.831 0.406110
## BsmtExposureNo -9.152e+03 2.789e+03 -3.282 0.001057 **
## Exterior2ndAsphShn 3.176e+03 1.958e+04 0.162 0.871184
## Exterior2ndBrk Cmn -1.180e+04 1.707e+04 -0.691 0.489370
## Exterior2ndBrkFace 9.101e+03 9.814e+03 0.927 0.353913
## Exterior2ndCBlock 8.213e+03 3.376e+04 0.243 0.807857
## Exterior2ndCmentBd -3.168e+03 9.098e+03 -0.348 0.727737
## Exterior2ndHdBoard -8.118e+03 7.744e+03 -1.048 0.294721
## Exterior2ndImStucc 1.720e+04 1.260e+04 1.365 0.172347
## Exterior2ndMetalSd -4.801e+03 7.538e+03 -0.637 0.524293
## Exterior2ndOther -1.592e+04 3.281e+04 -0.485 0.627453
## Exterior2ndPlywood -8.120e+03 7.937e+03 -1.023 0.306451
## Exterior2ndStone -2.758e+04 1.617e+04 -1.706 0.088269 .
## Exterior2ndStucco -1.327e+04 9.625e+03 -1.379 0.168063
## Exterior2ndVinylSd -4.169e+03 7.588e+03 -0.549 0.582869
## Exterior2ndWd Sdng -7.870e+03 7.556e+03 -1.041 0.297850
## Exterior2ndWd Shng -2.069e+04 8.809e+03 -2.348 0.019000 *
## BsmtFullBath 1.208e+04 1.833e+03 6.591 6.23e-11 ***
## FullBath 2.636e+04 2.353e+03 11.206 < 2e-16 ***
## HalfBath 1.362e+04 2.464e+03 5.526 3.92e-08 ***
## KitchenQualFa -3.654e+04 7.507e+03 -4.867 1.27e-06 ***
## KitchenQualGd -3.290e+04 4.428e+03 -7.431 1.89e-13 ***
## KitchenQualTA -3.501e+04 4.935e+03 -7.094 2.08e-12 ***
## Fireplaces 9.711e+03 1.603e+03 6.059 1.77e-09 ***
## GarageTypeAttchd 2.587e+04 1.349e+04 1.918 0.055367 .
## GarageTypeBasment 2.871e+04 1.551e+04 1.851 0.064360 .
## GarageTypeBuiltIn 3.408e+04 1.396e+04 2.441 0.014785 *
## GarageTypeCarPort 8.277e+03 1.703e+04 0.486 0.627087
## GarageTypeDetchd 2.263e+04 1.337e+04 1.692 0.090819 .
## GarageCars 1.362e+04 1.704e+03 7.990 2.85e-15 ***
## WoodDeckSF 2.172e+01 7.277e+00 2.985 0.002887 **
## SaleConditionAdjLand 8.243e+03 1.652e+04 0.499 0.618002
## SaleConditionAlloca 8.191e+03 1.032e+04 0.794 0.427569
## SaleConditionFamily -3.102e+02 7.779e+03 -0.040 0.968201
## SaleConditionNormal 3.894e+03 3.380e+03 1.152 0.249403
## SaleConditionPartial 1.295e+04 4.779e+03 2.709 0.006826 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30820 on 1366 degrees of freedom
## Multiple R-squared: 0.8591, Adjusted R-squared: 0.8495
## F-statistic: 89.55 on 93 and 1366 DF, p-value: < 2.2e-16
## GVIF Df GVIF^(1/(2*Df))
## MSZoning 32.569809 4 1.545617
## LotFrontage 1.920814 1 1.385934
## LotArea 1.665909 1 1.290701
## LotShape 1.945047 3 1.117262
## LotConfig 1.746427 4 1.072183
## Neighborhood 10660.912419 24 1.213144
## BldgType 7.163336 4 1.279056
## HouseStyle 9.279987 7 1.172494
## OverallQual 4.042981 1 2.010717
## OverallCond 1.501641 1 1.225415
## ExterQual 9.121737 3 1.445483
## BsmtQual 7.227337 3 1.390475
## BsmtExposure 2.273881 3 1.146730
## Exterior2nd 26.239573 15 1.115061
## BsmtFullBath 1.389493 1 1.178768
## FullBath 2.580369 1 1.606353
## HalfBath 2.358272 1 1.535667
## KitchenQual 5.836024 3 1.341795
## Fireplaces 1.639886 1 1.280580
## GarageType 3.766809 5 1.141819
## GarageCars 2.491990 1 1.578604
## WoodDeckSF 1.277969 1 1.130473
## SaleCondition 2.386563 5 1.090881
Of the two models, Model.2 has the larger F-statistic (89.55 versus 66.9), no apparent multicollinearity issues, and an adjusted R-squared value (0.8495) that is only slightly smaller than that of Model.1 (0.8743). Model.2 also appears to give the tighter fit in the plots of predicted values (please view the above box plots).
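The comparison above rests on three quantities: adjusted R-squared, the F-statistic, and variance inflation. The sketch below shows how they can be pulled from two `lm` fits and how a VIF can be computed by hand for numeric predictors as 1 / (1 - R^2). Everything in it is illustrative: the data are synthetic, and `fit_stats` and `vif_one` are hypothetical helpers. The generalized VIF reported for factor columns above is not reproduced here.

```r
set.seed(2024)                                # assumed seed
n <- 300
# Synthetic stand-in for the cleaned training data (hypothetical values)
d <- data.frame(
  OverallQual = sample(1:10, n, replace = TRUE),
  GrLivArea   = rnorm(n, 1500, 500)
)
d$X1stFlrSF <- 0.7 * d$GrLivArea + rnorm(n, 0, 60)   # deliberately collinear
d$SalePrice <- 20000 * d$OverallQual + 60 * d$GrLivArea + rnorm(n, 0, 25000)

model_1 <- lm(SalePrice ~ ., data = d)                        # all predictors
model_2 <- lm(SalePrice ~ OverallQual + GrLivArea, data = d)  # reduced set

# Hypothetical helper: the fit statistics discussed above
fit_stats <- function(m) {
  s <- summary(m)
  c(adj_r2 = s$adj.r.squared,
    f_stat = unname(s$fstatistic["value"]),
    sigma  = s$sigma)
}
comparison <- rbind(model_1 = fit_stats(model_1), model_2 = fit_stats(model_2))
print(comparison)

# Hypothetical helper: VIF of one numeric predictor, 1 / (1 - R^2),
# from regressing it on the other predictors
vif_one <- function(target, data) {
  others <- setdiff(names(data), c(target, "SalePrice"))
  r2 <- summary(lm(reformulate(others, response = target), data = data))$r.squared
  1 / (1 - r2)
}
vif_full <- sapply(c("OverallQual", "GrLivArea", "X1stFlrSF"), vif_one, data = d)
print(round(vif_full, 2))
```

Dropping a collinear predictor usually costs little adjusted R-squared while raising the F-statistic, which is the same trade-off described for Model.2.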
Kaggle Submission