Problem 1
I chose to use X4 and Y4
1.1
1.1.a
\[P(X > x \mid Y > y) = \frac{P(X > x, Y > y)}{P(Y > y)}\] Here x is the 3rd quartile of the X values and y is the 1st quartile of the Y values. The expression is the probability that X is greater than the 3rd quartile of X, given that Y is greater than the 1st quartile of Y.
## [1] 0.2
This means that when Y is greater than its 1st quartile, there is a 20% chance that X is greater than its 3rd quartile.
1.1.b
\(P(X > x, Y > y)\)
The above reads as the joint probability that X is greater than the 3rd quartile of X and Y is greater than the 1st quartile of Y.
## [1] 0.15
The probability of both of these events occurring together is 15%.
1.1.c
\[P(X < x \mid Y > y) = \frac{P(X < x, Y > y)}{P(Y > y)}\] The above reads as the probability that X is less than the 3rd quartile of X, given that Y is greater than the 1st quartile of Y.
## [1] 0.8
This means that when Y is greater than its 1st quartile, there is an 80% probability that X is less than its 3rd quartile.
I am adding the result below in case the probability was supposed to read less than or equal to: \(P(X \leq x \mid Y > y)\)
## [1] 0.8
There is also an 80% chance of this occurring.
1.2
##     X/Y X<=x  X>x Total
## 1  Y<=y 0.15 0.10  0.25
## 2   Y>y 0.60 0.15  0.75
## 3 Total 0.75 0.25  1.00
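The three probabilities in 1.1 can be read off this table. The sketch below rebuilds the joint probabilities from the printed values and recomputes each answer. The object names (`joint`, `p_a`, `p_b`, `p_c`) are illustrative assumptions, not the original code.

```r
# Joint probability table from 1.2 (rows: Y, columns: X), totals excluded.
joint <- matrix(c(0.15, 0.10,
                  0.60, 0.15),
                nrow = 2, byrow = TRUE,
                dimnames = list(Y = c("Y<=y", "Y>y"),
                                X = c("X<=x", "X>x")))

# Marginal probability of the conditioning event Y > y.
p_y_gt <- sum(joint["Y>y", ])

# 1.1.a: P(X > x | Y > y) = P(X > x, Y > y) / P(Y > y)
p_a <- joint["Y>y", "X>x"] / p_y_gt

# 1.1.b: P(X > x, Y > y) is a single cell of the table.
p_b <- joint["Y>y", "X>x"]

# 1.1.c: P(X <= x | Y > y) = P(X <= x, Y > y) / P(Y > y)
p_c <- joint["Y>y", "X<=x"] / p_y_gt

print(c(a = p_a, b = p_b, c = p_c))
```

The answers to 1.1.a and 1.1.c share the same conditioning event, so they add up to 1.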
1.3
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: test
## X-squared = 0.8, df = 1, p-value = 0.3711
The chi-square test's p-value of .3711 is above the .05 significance level, so we fail to reject the null hypothesis that X & Y are independent. The test provides no evidence of an association between the two variables.
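The decision rule can be checked from the reported statistic alone. This sketch recomputes the p-value from X-squared = 0.8 with 1 degree of freedom, and compares one joint probability from the table in 1.2 with the product of its marginals. The variable names are illustrative, and the sketch does not reproduce the original `chisq.test` call.

```r
# Reported test statistic and degrees of freedom from the output above.
x_squared <- 0.8
df <- 1

# Upper-tail probability of the chi-square distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Significance level used in the text.
alpha <- 0.05

# TRUE would mean rejecting the null hypothesis of independence.
reject_independence <- p_value < alpha

# Under independence, P(X > x, Y > y) would equal P(X > x) * P(Y > y).
observed_joint <- 0.15
expected_joint <- 0.25 * 0.75

print(round(p_value, 4))
print(reject_independence)
print(c(observed = observed_joint, expected = expected_joint))
```

The observed and expected joint probabilities differ, but with this p-value the difference is not large enough to rule out independence.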
Problem 2
## Parsed with column specification:
## cols(
## .default = col_character(),
## Id = col_integer(),
## MSSubClass = col_integer(),
## LotFrontage = col_integer(),
## LotArea = col_integer(),
## OverallQual = col_integer(),
## OverallCond = col_integer(),
## YearBuilt = col_integer(),
## YearRemodAdd = col_integer(),
## MasVnrArea = col_integer(),
## BsmtFinSF1 = col_integer(),
## BsmtFinSF2 = col_integer(),
## BsmtUnfSF = col_integer(),
## TotalBsmtSF = col_integer(),
## `1stFlrSF` = col_integer(),
## `2ndFlrSF` = col_integer(),
## LowQualFinSF = col_integer(),
## GrLivArea = col_integer(),
## BsmtFullBath = col_integer(),
## BsmtHalfBath = col_integer(),
## FullBath = col_integer()
## # ... with 18 more columns
## )
## See spec(...) for full column specifications.
## Parsed with column specification:
## cols(
## .default = col_character(),
## Id = col_integer(),
## MSSubClass = col_integer(),
## LotFrontage = col_integer(),
## LotArea = col_integer(),
## OverallQual = col_integer(),
## OverallCond = col_integer(),
## YearBuilt = col_integer(),
## YearRemodAdd = col_integer(),
## MasVnrArea = col_integer(),
## BsmtFinSF1 = col_integer(),
## BsmtFinSF2 = col_integer(),
## BsmtUnfSF = col_integer(),
## TotalBsmtSF = col_integer(),
## `1stFlrSF` = col_integer(),
## `2ndFlrSF` = col_integer(),
## LowQualFinSF = col_integer(),
## GrLivArea = col_integer(),
## BsmtFullBath = col_integer(),
## BsmtHalfBath = col_integer(),
## FullBath = col_integer()
## # ... with 17 more columns
## )
## See spec(...) for full column specifications.
2a
EDA
Correlation
Most correlated variables: OverallQual, GrLivArea, and GarageCars are the variables most highly correlated with SalePrice
Correlation Matrix
##             SalePrice OverallQual GrLivArea
## SalePrice   1.0000000   0.7909816 0.7086245
## OverallQual 0.7909816   1.0000000 0.5930074
## GrLivArea   0.7086245   0.5930074 1.0000000
Scatterplot Matrix
Hypothesis Test
GrLivArea & SalePrice
##
## Pearson's product-moment correlation
##
## data: train$GrLivArea and train$SalePrice
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.6915087 0.7249450
## sample estimates:
## cor
## 0.7086245
- The estimated correlation between GrLivArea and SalePrice is ~.7086. The p-value is very small (< 2.2e-16), so we reject the null hypothesis that the true correlation is zero
- The 80% confidence interval for the correlation between GrLivArea and SalePrice is ~.6915 - .7249
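The t statistic and the 80 percent interval in the output above can be recovered from the sample correlation and the degrees of freedom. The sketch below uses the Fisher z-transformation, which is how R's `cor.test` builds its Pearson interval. The variable names are illustrative, and the inputs are the rounded values printed above.

```r
# Reported sample correlation and the sample size implied by df = n - 2.
r <- 0.7086245
n <- 1458 + 2

# t statistic for the null hypothesis that the true correlation is 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# 80 percent confidence interval via the Fisher z-transformation.
conf_level <- 0.80
z <- atanh(r)
se <- 1 / sqrt(n - 3)
crit <- qnorm(1 - (1 - conf_level) / 2)
ci <- tanh(z + c(-1, 1) * crit * se)

print(round(t_stat, 3))
print(round(ci, 4))
```

The same steps apply to the GarageCars and OverallQual tests by swapping in their reported correlations.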
GarageCars & SalePrice
##
## Pearson's product-moment correlation
##
## data: train$GarageCars and train$SalePrice
## t = 31.839, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.6201771 0.6597899
## sample estimates:
## cor
## 0.6404092
- The estimated correlation between GarageCars and SalePrice is ~.6404. The p-value is very small (< 2.2e-16), so we reject the null hypothesis that the true correlation is zero
- The 80% confidence interval for the correlation between GarageCars and SalePrice is ~.6202 - .6598
OverallQual & SalePrice
##
## Pearson's product-moment correlation
##
## data: train$OverallQual and train$SalePrice
## t = 49.364, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.7780752 0.8032204
## sample estimates:
## cor
## 0.7909816
- The estimated correlation between OverallQual and SalePrice is ~.7910. The p-value is very small (< 2.2e-16), so we reject the null hypothesis that the true correlation is zero
- The 80% confidence interval for the correlation between OverallQual and SalePrice is ~.7781 - .8032
Family-Wise Error
## [1] 0.488
I am worried about family-wise error: the FWER is .488, so there is close to a 50% chance of at least one false rejection across the three tests.
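The value .488 is consistent with three independent tests at a per-test level of .20, which is the complement of the 80 percent confidence level used above. That reading is an assumption, since the original chunk is not shown. The sketch also includes a Bonferroni adjustment, a common remedy that is not part of the original analysis.

```r
# Assumed per-test significance level (1 minus the 80 percent confidence level).
alpha <- 0.20
# Number of correlation tests run above.
m <- 3

# Probability of at least one false rejection if all m tests are independent
# and every null hypothesis is true.
fwer <- 1 - (1 - alpha)^m

# Bonferroni-adjusted per-test level that keeps the FWER at or below alpha.
alpha_bonferroni <- alpha / m

print(fwer)
print(alpha_bonferroni)
```

All three reported p-values are below 2.2e-16, so each test would still reject at the adjusted level.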
2b
Invert Correlation Matrix
##             SalePrice OverallQual GrLivArea
## SalePrice   1.0000000   0.7909816 0.7086245
## OverallQual 0.7909816   1.0000000 0.5930074
## GrLivArea   0.7086245   0.5930074 1.0000000
Multiply by Precision Matrix
##             SalePrice OverallQual GrLivArea
## SalePrice    2.127801    2.002183  1.886307
## OverallQual  2.002183    1.977310  1.746524
## GrLivArea    1.886307    1.746524  1.853806
##             SalePrice OverallQual GrLivArea
## SalePrice    2.127801    2.002183  1.886307
## OverallQual  2.002183    1.977310  1.746524
## GrLivArea    1.886307    1.746524  1.853806
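A correlation matrix multiplied by its inverse (the precision matrix) should give the identity matrix in either order. The sketch below shows that check with base R's `solve`, using the correlation values printed in 2a. The object names are illustrative. The first entry printed above, 2.127801, equals 1 + 0.7909816^2 + 0.7086245^2, which is the first entry of the correlation matrix multiplied by itself, so the original chunk may not have used the inverse. That reading is an inference from the printed numbers only.

```r
# Correlation matrix as printed in 2a.
vars <- c("SalePrice", "OverallQual", "GrLivArea")
corr <- matrix(c(1.0000000, 0.7909816, 0.7086245,
                 0.7909816, 1.0000000, 0.5930074,
                 0.7086245, 0.5930074, 1.0000000),
               nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# The precision matrix is the inverse of the correlation matrix.
precision <- solve(corr)

# Multiplying in either order should give the identity (up to rounding).
left <- corr %*% precision
right <- precision %*% corr

print(round(left, 6))
print(round(right, 6))
```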
LU Decomposition
## $L
##             SalePrice OverallQual GrLivArea
## SalePrice   1.0000000   0.0000000         0
## OverallQual 0.9409636   1.0000000         0
## GrLivArea   0.8865055  -0.3045399         1
##
## $U
##             SalePrice OverallQual  GrLivArea
## SalePrice    2.127801  2.00218278  1.8863069
## OverallQual  0.000000  0.09332866 -0.0284223
## GrLivArea    0.000000  0.00000000  0.1729292
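The factors above can be checked by confirming that L times U returns the matrix that was decomposed. The function below is a minimal Doolittle elimination without pivoting, written in base R for illustration. It is an assumption about the method, not the function the original analysis called, and `lu_doolittle` is a hypothetical name.

```r
# Doolittle LU decomposition without pivoting: A = L %*% U,
# with L unit lower triangular and U upper triangular.
lu_doolittle <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (k in seq_len(n - 1)) {
    for (i in (k + 1):n) {
      # Multiplier that eliminates entry (i, k) of U.
      L[i, k] <- U[i, k] / U[k, k]
      U[i, ] <- U[i, ] - L[i, k] * U[k, ]
    }
  }
  list(L = L, U = U)
}

# Matrix printed under "Multiply by Precision Matrix" above.
vars <- c("SalePrice", "OverallQual", "GrLivArea")
A <- matrix(c(2.127801, 2.002183, 1.886307,
              2.002183, 1.977310, 1.746524,
              1.886307, 1.746524, 1.853806),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

lu <- lu_doolittle(A)

print(round(lu$L, 4))
print(round(lu$U, 4))

# Reconstruction check: the product of the factors should equal A.
print(all.equal(lu$L %*% lu$U, A, check.attributes = FALSE))
```

Skipping pivoting is safe here because every pivot on the diagonal of U is nonzero. A general-purpose routine would reorder rows first.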
2c
## [1] 1300
## [1] 1e-09
Fitdistr
## rate
## 0.0001084972
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Exponential CDF 5th, 95th Percentiles
## rate
## 472.7615
## rate
## 27611.15
Empirical 5th, 95th Percentiles
## 5%
## 2011.7
## 95%
## 16101.15
Confidence Interval
## [1] 8704.418 9729.238
Summary Table
The exponential distribution does not do a good job of estimating the actual data. The fitted 5th and 95th percentiles (472.76 and 27611.15) are far from the empirical ones (2011.7 and 16101.15). The bootstrapped confidence interval is a better estimator, although still far from perfect.
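The fitted percentiles follow directly from the reported rate, because the exponential quantile function is -log(1 - p) / rate. The sketch below recomputes them with `qexp` and sets them beside the empirical values reported above. The variable names are illustrative, and the rate is the rounded value from the output.

```r
# Rate reported by the exponential fit above.
rate <- 0.0001084972

# For an exponential distribution the maximum likelihood rate is 1 / mean,
# so the implied sample mean is:
implied_mean <- 1 / rate

# 5th and 95th percentiles from the fitted distribution.
fitted_q <- qexp(c(0.05, 0.95), rate = rate)

# Empirical percentiles reported above.
empirical_q <- c(2011.7, 16101.15)

print(round(implied_mean, 1))
print(rbind(fitted = fitted_q, empirical = empirical_q))
```

The fitted distribution puts too much mass near zero and in the far right tail, which is why its 5th percentile is too low and its 95th percentile is too high.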
2d
Clean Dataset
## missForest iteration 1 in progress...done!
## missForest iteration 2 in progress...done!
## missForest iteration 3 in progress...done!
## missForest iteration 1 in progress...done!
## missForest iteration 2 in progress...done!
## missForest iteration 3 in progress...done!
## missForest iteration 4 in progress...done!
Regression Subsetting
## Reordering variables and trying again:
## Subset selection object
## Call: regsubsets.formula(SalePrice ~ ., data = newDfs[[1]], method = "exhaustive",
## nvmax = NULL, nbest = 1, really.big = T)
## 36 Variables (and intercept)
## Forced in Forced out
## MSSubClass FALSE FALSE
## LotFrontage FALSE FALSE
## LotArea FALSE FALSE
## OverallQual FALSE FALSE
## OverallCond FALSE FALSE
## YearBuilt FALSE FALSE
## YearRemodAdd FALSE FALSE
## MasVnrArea FALSE FALSE
## BsmtFinSF1 FALSE FALSE
## BsmtFinSF2 FALSE FALSE
## BsmtUnfSF FALSE FALSE
## `1stFlrSF` FALSE FALSE
## `2ndFlrSF` FALSE FALSE
## LowQualFinSF FALSE FALSE
## BsmtFullBath FALSE FALSE
## BsmtHalfBath FALSE FALSE
## FullBath FALSE FALSE
## HalfBath FALSE FALSE
## BedroomAbvGr FALSE FALSE
## KitchenAbvGr FALSE FALSE
## TotRmsAbvGrd FALSE FALSE
## Fireplaces FALSE FALSE
## GarageYrBlt FALSE FALSE
## GarageCars FALSE FALSE
## GarageArea FALSE FALSE
## WoodDeckSF FALSE FALSE
## OpenPorchSF FALSE FALSE
## EnclosedPorch FALSE FALSE
## `3SsnPorch` FALSE FALSE
## ScreenPorch FALSE FALSE
## PoolArea FALSE FALSE
## MiscVal FALSE FALSE
## MoSold FALSE FALSE
## YrSold FALSE FALSE
## TotalBsmtSF FALSE FALSE
## GrLivArea FALSE FALSE
## 1 subsets of each size up to 34
## Selection Algorithm: exhaustive
## MSSubClass LotFrontage LotArea OverallQual OverallCond YearBuilt
## 1 ( 1 ) " " " " " " "*" " " " "
## 2 ( 1 ) " " " " " " "*" " " " "
## 3 ( 1 ) " " " " " " "*" " " " "
## 4 ( 1 ) " " " " " " "*" " " " "
## 5 ( 1 ) "*" " " " " "*" " " " "
## 6 ( 1 ) "*" " " " " "*" " " "*"
## 7 ( 1 ) "*" " " " " "*" " " " "
## 8 ( 1 ) "*" " " " " "*" "*" "*"
## 9 ( 1 ) "*" " " " " "*" "*" "*"
## 10 ( 1 ) "*" " " "*" "*" "*" "*"
## 11 ( 1 ) "*" " " "*" "*" "*" "*"
## 12 ( 1 ) "*" " " "*" "*" "*" "*"
## 13 ( 1 ) "*" " " "*" "*" "*" "*"
## 14 ( 1 ) "*" " " "*" "*" "*" "*"
## 15 ( 1 ) "*" " " "*" "*" "*" "*"
## 16 ( 1 ) "*" " " "*" "*" "*" "*"
## 17 ( 1 ) "*" " " "*" "*" "*" "*"
## 18 ( 1 ) "*" " " "*" "*" "*" "*"
## 19 ( 1 ) "*" " " "*" "*" "*" "*"
## 20 ( 1 ) "*" "*" "*" "*" "*" "*"
## 21 ( 1 ) "*" "*" "*" "*" "*" "*"
## 22 ( 1 ) "*" "*" "*" "*" "*" "*"
## 23 ( 1 ) "*" "*" "*" "*" "*" "*"
## 24 ( 1 ) "*" "*" "*" "*" "*" "*"
## 25 ( 1 ) "*" "*" "*" "*" "*" "*"
## 26 ( 1 ) "*" "*" "*" "*" "*" "*"
## 27 ( 1 ) "*" "*" "*" "*" "*" "*"
## 28 ( 1 ) "*" "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" "*" "*" "*" "*" "*"
## 31 ( 1 ) "*" "*" "*" "*" "*" "*"
## 32 ( 1 ) "*" "*" "*" "*" "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*" "*"
## YearRemodAdd MasVnrArea BsmtFinSF1 BsmtFinSF2 BsmtUnfSF
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " "*" " " " "
## 4 ( 1 ) " " " " "*" " " " "
## 5 ( 1 ) " " " " "*" " " " "
## 6 ( 1 ) " " " " "*" " " " "
## 7 ( 1 ) "*" "*" "*" " " " "
## 8 ( 1 ) " " " " "*" " " " "
## 9 ( 1 ) " " "*" " " " " " "
## 10 ( 1 ) " " "*" " " " " " "
## 11 ( 1 ) " " "*" " " " " " "
## 12 ( 1 ) " " "*" "*" " " " "
## 13 ( 1 ) " " "*" "*" " " " "
## 14 ( 1 ) " " "*" "*" " " " "
## 15 ( 1 ) " " "*" "*" " " " "
## 16 ( 1 ) "*" "*" "*" " " " "
## 17 ( 1 ) "*" "*" "*" " " " "
## 18 ( 1 ) "*" "*" "*" " " " "
## 19 ( 1 ) "*" "*" "*" " " " "
## 20 ( 1 ) "*" "*" "*" " " " "
## 21 ( 1 ) "*" "*" "*" " " " "
## 22 ( 1 ) "*" "*" "*" " " " "
## 23 ( 1 ) "*" "*" "*" " " " "
## 24 ( 1 ) "*" "*" "*" " " " "
## 25 ( 1 ) "*" "*" "*" " " " "
## 26 ( 1 ) "*" "*" "*" " " " "
## 27 ( 1 ) "*" "*" "*" " " " "
## 28 ( 1 ) "*" "*" "*" " " " "
## 29 ( 1 ) "*" "*" "*" " " " "
## 30 ( 1 ) "*" "*" " " "*" "*"
## 31 ( 1 ) "*" "*" "*" "*" "*"
## 32 ( 1 ) "*" "*" "*" "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*"
## TotalBsmtSF `1stFlrSF` `2ndFlrSF` LowQualFinSF GrLivArea
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " "*"
## 3 ( 1 ) " " " " " " " " "*"
## 4 ( 1 ) " " " " " " " " "*"
## 5 ( 1 ) " " " " " " " " "*"
## 6 ( 1 ) " " " " " " " " "*"
## 7 ( 1 ) " " " " " " " " "*"
## 8 ( 1 ) " " " " " " " " "*"
## 9 ( 1 ) " " " " " " " " "*"
## 10 ( 1 ) " " " " " " " " "*"
## 11 ( 1 ) "*" " " " " " " "*"
## 12 ( 1 ) " " " " " " " " "*"
## 13 ( 1 ) " " " " " " " " "*"
## 14 ( 1 ) " " " " " " " " "*"
## 15 ( 1 ) "*" " " " " " " "*"
## 16 ( 1 ) "*" " " " " " " "*"
## 17 ( 1 ) "*" " " " " " " "*"
## 18 ( 1 ) "*" " " " " " " "*"
## 19 ( 1 ) "*" " " " " " " "*"
## 20 ( 1 ) "*" " " " " " " "*"
## 21 ( 1 ) "*" " " " " " " "*"
## 22 ( 1 ) "*" " " " " " " "*"
## 23 ( 1 ) "*" " " " " "*" "*"
## 24 ( 1 ) "*" " " " " "*" "*"
## 25 ( 1 ) "*" " " " " "*" "*"
## 26 ( 1 ) "*" " " " " "*" "*"
## 27 ( 1 ) "*" " " " " "*" "*"
## 28 ( 1 ) "*" " " " " "*" "*"
## 29 ( 1 ) "*" " " " " "*" "*"
## 30 ( 1 ) "*" " " " " "*" "*"
## 31 ( 1 ) " " " " " " "*" "*"
## 32 ( 1 ) " " " " " " "*" "*"
## 33 ( 1 ) " " " " " " "*" "*"
## 34 ( 1 ) " " "*" "*" "*" " "
## BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " "*"
## 9 ( 1 ) "*" " " " " " " "*"
## 10 ( 1 ) "*" " " " " " " "*"
## 11 ( 1 ) "*" " " " " " " "*"
## 12 ( 1 ) "*" " " " " " " "*"
## 13 ( 1 ) "*" " " " " " " "*"
## 14 ( 1 ) "*" " " " " " " "*"
## 15 ( 1 ) "*" " " " " " " "*"
## 16 ( 1 ) "*" " " " " " " "*"
## 17 ( 1 ) "*" " " " " " " "*"
## 18 ( 1 ) "*" " " " " " " "*"
## 19 ( 1 ) "*" " " "*" " " "*"
## 20 ( 1 ) "*" " " "*" " " "*"
## 21 ( 1 ) "*" " " "*" " " "*"
## 22 ( 1 ) "*" " " "*" " " "*"
## 23 ( 1 ) "*" " " "*" " " "*"
## 24 ( 1 ) "*" " " "*" "*" "*"
## 25 ( 1 ) "*" " " "*" "*" "*"
## 26 ( 1 ) "*" " " "*" "*" "*"
## 27 ( 1 ) "*" " " "*" "*" "*"
## 28 ( 1 ) "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" "*" "*" "*" "*"
## 31 ( 1 ) "*" "*" "*" "*" "*"
## 32 ( 1 ) "*" "*" "*" "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*"
## KitchenAbvGr TotRmsAbvGrd Fireplaces GarageYrBlt GarageCars
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " "*"
## 5 ( 1 ) " " " " " " " " "*"
## 6 ( 1 ) " " " " " " " " "*"
## 7 ( 1 ) " " " " " " " " "*"
## 8 ( 1 ) " " " " " " " " "*"
## 9 ( 1 ) " " " " " " " " "*"
## 10 ( 1 ) " " " " " " " " "*"
## 11 ( 1 ) " " " " " " " " "*"
## 12 ( 1 ) " " "*" " " " " "*"
## 13 ( 1 ) " " "*" " " " " "*"
## 14 ( 1 ) " " "*" " " " " "*"
## 15 ( 1 ) " " "*" " " " " "*"
## 16 ( 1 ) " " "*" " " " " "*"
## 17 ( 1 ) "*" "*" " " " " "*"
## 18 ( 1 ) "*" "*" "*" " " "*"
## 19 ( 1 ) "*" "*" "*" " " "*"
## 20 ( 1 ) "*" "*" "*" " " "*"
## 21 ( 1 ) "*" "*" "*" " " "*"
## 22 ( 1 ) "*" "*" "*" " " "*"
## 23 ( 1 ) "*" "*" "*" " " "*"
## 24 ( 1 ) "*" "*" "*" " " "*"
## 25 ( 1 ) "*" "*" "*" " " "*"
## 26 ( 1 ) "*" "*" "*" " " "*"
## 27 ( 1 ) "*" "*" "*" "*" "*"
## 28 ( 1 ) "*" "*" "*" "*" "*"
## 29 ( 1 ) "*" "*" "*" "*" "*"
## 30 ( 1 ) "*" "*" "*" "*" "*"
## 31 ( 1 ) "*" "*" "*" "*" "*"
## 32 ( 1 ) "*" "*" "*" "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*"
## GarageArea WoodDeckSF OpenPorchSF EnclosedPorch `3SsnPorch`
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## 12 ( 1 ) " " " " " " " " " "
## 13 ( 1 ) " " "*" " " " " " "
## 14 ( 1 ) " " "*" " " " " " "
## 15 ( 1 ) " " "*" " " " " " "
## 16 ( 1 ) " " "*" " " " " " "
## 17 ( 1 ) " " "*" " " " " " "
## 18 ( 1 ) " " "*" " " " " " "
## 19 ( 1 ) " " "*" " " " " " "
## 20 ( 1 ) " " "*" " " " " " "
## 21 ( 1 ) " " "*" " " " " " "
## 22 ( 1 ) " " "*" " " " " " "
## 23 ( 1 ) " " "*" " " " " " "
## 24 ( 1 ) " " "*" " " " " " "
## 25 ( 1 ) " " "*" " " "*" " "
## 26 ( 1 ) " " "*" " " "*" "*"
## 27 ( 1 ) " " "*" " " "*" "*"
## 28 ( 1 ) " " "*" " " "*" "*"
## 29 ( 1 ) " " "*" " " "*" "*"
## 30 ( 1 ) " " "*" " " "*" "*"
## 31 ( 1 ) " " "*" " " "*" "*"
## 32 ( 1 ) "*" "*" " " "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*"
## ScreenPorch PoolArea MiscVal MoSold YrSold
## 1 ( 1 ) " " " " " " " " " "
## 2 ( 1 ) " " " " " " " " " "
## 3 ( 1 ) " " " " " " " " " "
## 4 ( 1 ) " " " " " " " " " "
## 5 ( 1 ) " " " " " " " " " "
## 6 ( 1 ) " " " " " " " " " "
## 7 ( 1 ) " " " " " " " " " "
## 8 ( 1 ) " " " " " " " " " "
## 9 ( 1 ) " " " " " " " " " "
## 10 ( 1 ) " " " " " " " " " "
## 11 ( 1 ) " " " " " " " " " "
## 12 ( 1 ) " " " " " " " " " "
## 13 ( 1 ) " " " " " " " " " "
## 14 ( 1 ) "*" " " " " " " " "
## 15 ( 1 ) "*" " " " " " " " "
## 16 ( 1 ) "*" " " " " " " " "
## 17 ( 1 ) "*" " " " " " " " "
## 18 ( 1 ) "*" " " " " " " " "
## 19 ( 1 ) "*" " " " " " " " "
## 20 ( 1 ) "*" " " " " " " " "
## 21 ( 1 ) "*" "*" " " " " " "
## 22 ( 1 ) "*" "*" " " " " "*"
## 23 ( 1 ) "*" "*" " " " " "*"
## 24 ( 1 ) "*" "*" " " " " "*"
## 25 ( 1 ) "*" "*" " " " " "*"
## 26 ( 1 ) "*" "*" " " " " "*"
## 27 ( 1 ) "*" "*" " " " " "*"
## 28 ( 1 ) "*" "*" " " " " "*"
## 29 ( 1 ) "*" "*" "*" " " "*"
## 30 ( 1 ) "*" "*" "*" " " "*"
## 31 ( 1 ) "*" "*" "*" "*" "*"
## 32 ( 1 ) "*" "*" "*" "*" "*"
## 33 ( 1 ) "*" "*" "*" "*" "*"
## 34 ( 1 ) "*" "*" "*" "*" "*"
##   (Intercept)    MSSubClass   LotFrontage       LotArea   OverallQual
##          TRUE          TRUE          TRUE          TRUE          TRUE
##   OverallCond     YearBuilt  YearRemodAdd    MasVnrArea    BsmtFinSF1
##          TRUE          TRUE          TRUE          TRUE          TRUE
##    BsmtFinSF2     BsmtUnfSF   TotalBsmtSF    `1stFlrSF`    `2ndFlrSF`
##         FALSE         FALSE          TRUE         FALSE         FALSE
##  LowQualFinSF     GrLivArea  BsmtFullBath  BsmtHalfBath      FullBath
##         FALSE          TRUE          TRUE         FALSE          TRUE
##      HalfBath  BedroomAbvGr  KitchenAbvGr  TotRmsAbvGrd    Fireplaces
##         FALSE          TRUE          TRUE          TRUE          TRUE
##   GarageYrBlt    GarageCars    GarageArea    WoodDeckSF   OpenPorchSF
##         FALSE          TRUE         FALSE          TRUE         FALSE
## EnclosedPorch   `3SsnPorch`   ScreenPorch      PoolArea       MiscVal
##         FALSE         FALSE          TRUE         FALSE         FALSE
##        MoSold        YrSold
##         FALSE         FALSE
Model
##
## Call:
## lm(formula = SalePrice ~ MSSubClass + LotFrontage + LotArea +
## OverallQual + OverallCond + YearBuilt + YearRemodAdd + MasVnrArea +
## BsmtFinSF1 + TotalBsmtSF + GrLivArea + BsmtFullBath + FullBath +
## BedroomAbvGr + KitchenAbvGr + TotRmsAbvGrd + Fireplaces +
## GarageCars + WoodDeckSF + ScreenPorch, data = dfs[[1]])
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -469771  -17778   -2466   14317  294973
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -9.801e+05  1.412e+05  -6.941 6.42e-12 ***
## MSSubClass   -1.947e+02  3.110e+01  -6.261 5.35e-10 ***
## LotFrontage  -1.054e+02  5.682e+01  -1.855 0.063885 .
## LotArea       5.623e-01  1.537e-01   3.657 0.000266 ***
## OverallQual   1.760e+04  1.367e+03  12.873  < 2e-16 ***
## OverallCond   4.304e+03  1.209e+03   3.561 0.000384 ***
## YearBuilt     2.912e+02  6.114e+01   4.763 2.15e-06 ***
## YearRemodAdd  1.768e+02  7.528e+01   2.349 0.019012 *
## MasVnrArea    3.704e+01  6.782e+00   5.461 5.78e-08 ***
## BsmtFinSF1    1.046e+01  3.559e+00   2.939 0.003359 **
## TotalBsmtSF   7.376e+00  3.650e+00   2.021 0.043506 *
## GrLivArea     4.441e+01  4.861e+00   9.135  < 2e-16 ***
## BsmtFullBath  9.861e+03  2.785e+03   3.540 0.000416 ***
## FullBath      7.485e+03  2.988e+03   2.505 0.012371 *
## BedroomAbvGr -1.013e+04  1.926e+03  -5.258 1.73e-07 ***
## KitchenAbvGr -1.227e+04  5.538e+03  -2.215 0.026931 *
## TotRmsAbvGrd  5.263e+03  1.407e+03   3.740 0.000193 ***
## Fireplaces    4.683e+03  2.061e+03   2.273 0.023236 *
## GarageCars    1.139e+04  1.898e+03   6.003 2.58e-09 ***
## WoodDeckSF    2.405e+01  9.581e+00   2.510 0.012212 *
## ScreenPorch   5.403e+01  1.959e+01   2.758 0.005907 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36480 on 1174 degrees of freedom
## (265 observations deleted due to missingness)
## Multiple R-squared: 0.8109, Adjusted R-squared: 0.8077
## F-statistic: 251.7 on 20 and 1174 DF, p-value: < 2.2e-16
Evaluation
The diagnostic plots show that a linear regression was appropriate. The adjusted R-squared is .8077, so the model accounts for ~81% of the variation in SalePrice.
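The adjusted R-squared and the F statistic in the summary can both be recovered from the multiple R-squared, the number of predictors, and the residual degrees of freedom. The sketch below does that arithmetic with the rounded values printed in the model output. The variable names are illustrative.

```r
# Values reported in the model summary above.
r_squared <- 0.8109
resid_df <- 1174
p <- 20                      # number of predictors in the formula
n <- resid_df + p + 1        # observations actually used in the fit

# Adjusted R-squared penalises the fit for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / resid_df

# Overall F statistic for the null hypothesis that all slopes are zero.
f_stat <- (r_squared / p) / ((1 - r_squared) / resid_df)

print(n)
print(round(adj_r_squared, 4))
print(round(f_stat, 1))
```

The gap between the multiple and adjusted R-squared is small because 20 predictors is few relative to the number of observations used.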
Predict
Overall, the model does a good job of predicting the SalePrice with the exception of a few outliers
Kaggle
Username: https://www.kaggle.com/baroncurtin2
Score: .24747