————————————————————————————————————————–
This data set explores the extent to which 79 housing attributes explain housing prices in Ames, Iowa. There are 1460 rows of data, each representing a distinct house recently sold in the Ames, Iowa metropolitan area. Each attribute can potentially serve as a predictor variable for the single response variable (“SalePrice”), which is the price the house actually sold for in that market. Using a training data set drawn from this superset of housing data, this exercise fits a model and then predicts sale prices for test data from the same superset. The predictions will ultimately be compared against actual sales data to determine their accuracy.
————————————————————————————————————————–
Metadata was provided with the data sets for this exercise, including a data dictionary, “data_description.txt”. To model on complete cases, missing values were imputed for the attributes listed below. The primary consideration in imputing data was the conversion of “NA” and “NaN” values to “None” and 0 respectively. Doing so keeps every case consistent with the original data and avoids the skew and bias that missing data would otherwise introduce into the model.
Note that substituting “None” for “NA” simply indicates the absence of that attribute, e.g. a garage measurement when no garage exists, or a masonry veneer type when the facade is not masonry. Similarly, 0 is substituted for “NaN” where a numeric attribute has no value, e.g. Lot Frontage in feet when there is no lot frontage.
Numeric attributes set to 0, listed as attribute(column id) followed by the number of missing values: LotFrontage(4) 259, GarageYrBlt(60) 81, MasVnrArea(27) 8
Categorical attributes set to “None”, in the same format: Electrical(43) 1, MasVnrType(26) 8, BsmtQual(31) 57, BsmtCond(32) 57, BsmtFinType1(34) 57, BsmtExposure(33) 58, BsmtFinType2(36) 58, GarageType(59) 81, GarageFinish(61) 81, GarageQual(64) 81, GarageCond(65) 81, FireplaceQu(58) 690, Fence(74) 1179, Alley(7) 1369, MiscFeature(75) 1406, PoolQC(73) 1453
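The substitution rule described above can be sketched in base R. This is a minimal illustration rather than the code used for this report: the data frame holds made-up values, and `impute_absent` is a hypothetical helper name.

```r
# Toy data frame standing in for two columns of the housing data (values are made up).
houses <- data.frame(
  LotFrontage = c(65, NA, 80),
  GarageType  = c("Attchd", NA, "Detchd"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: fill numeric NA/NaN with 0 and character NA with "None".
impute_absent <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- 0        # is.na() is TRUE for NaN as well
    } else if (is.character(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- "None"
    }
  }
  df
}

imputed <- impute_absent(houses)
colSums(is.na(imputed))   # remaining missing values per column
```

Running the substitution on character columns before converting them to factors avoids having to add “None” as a new factor level.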
Significant skew and kurtosis exist for the attributes LotArea, LowQualFinSF, X3SsnPorch, ScreenPorch, PoolArea and MiscVal. Each of these attributes reflects an uncommon occurrence in the data set. The model will determine whether these values truly represent “outliers” or are uncommon but valuable to include.
vars | id | n | mean | sd | median | min | max | range | skew | kurtosis | se |
---|---|---|---|---|---|---|---|---|---|---|---|
MSSubClass | 2 | 1460 | 57 | 42 | 50 | 20 | 190 | 170 | 1 | 2 | 1 |
LotFrontage | 4 | 1201 | 70 | 24 | 69 | 21 | 313 | 292 | 2 | 17 | 1 |
LotArea | 5 | 1460 | 10517 | 9981 | 9478 | 1300 | 215245 | 213945 | 12 | 202 | 261 |
OverallQual | 18 | 1460 | 6 | 1 | 6 | 1 | 10 | 9 | 0 | 0 | 0 |
OverallCond | 19 | 1460 | 6 | 1 | 5 | 1 | 9 | 8 | 1 | 1 | 0 |
YearBuilt | 20 | 1460 | 1971 | 30 | 1973 | 1872 | 2010 | 138 | -1 | 0 | 1 |
YearRemodAdd | 21 | 1460 | 1985 | 21 | 1994 | 1950 | 2010 | 60 | -1 | -1 | 1 |
MasVnrArea | 27 | 1452 | 104 | 181 | 0 | 0 | 1600 | 1600 | 3 | 10 | 5 |
BsmtFinSF1 | 35 | 1460 | 444 | 456 | 384 | 0 | 5644 | 5644 | 2 | 11 | 12 |
BsmtFinSF2 | 37 | 1460 | 47 | 161 | 0 | 0 | 1474 | 1474 | 4 | 20 | 4 |
BsmtUnfSF | 38 | 1460 | 567 | 442 | 478 | 0 | 2336 | 2336 | 1 | 0 | 12 |
TotalBsmtSF | 39 | 1460 | 1057 | 439 | 992 | 0 | 6110 | 6110 | 2 | 13 | 11 |
X1stFlrSF | 44 | 1460 | 1163 | 387 | 1087 | 334 | 4692 | 4358 | 1 | 6 | 10 |
X2ndFlrSF | 45 | 1460 | 347 | 437 | 0 | 0 | 2065 | 2065 | 1 | -1 | 11 |
LowQualFinSF | 46 | 1460 | 6 | 49 | 0 | 0 | 572 | 572 | 9 | 83 | 1 |
GrLivArea | 47 | 1460 | 1515 | 525 | 1464 | 334 | 5642 | 5308 | 1 | 5 | 14 |
BsmtFullBath | 48 | 1460 | 0 | 1 | 0 | 0 | 3 | 3 | 1 | -1 | 0 |
BsmtHalfBath | 49 | 1460 | 0 | 0 | 0 | 0 | 2 | 2 | 4 | 16 | 0 |
FullBath | 50 | 1460 | 2 | 1 | 2 | 0 | 3 | 3 | 0 | -1 | 0 |
HalfBath | 51 | 1460 | 0 | 1 | 0 | 0 | 2 | 2 | 1 | -1 | 0 |
BedroomAbvGr | 52 | 1460 | 3 | 1 | 3 | 0 | 8 | 8 | 0 | 2 | 0 |
KitchenAbvGr | 53 | 1460 | 1 | 0 | 1 | 0 | 3 | 3 | 4 | 21 | 0 |
TotRmsAbvGrd | 55 | 1460 | 7 | 2 | 6 | 2 | 14 | 12 | 1 | 1 | 0 |
Fireplaces | 57 | 1460 | 1 | 1 | 1 | 0 | 3 | 3 | 1 | 0 | 0 |
GarageYrBlt | 60 | 1379 | 1979 | 25 | 1980 | 1900 | 2010 | 110 | -1 | 0 | 1 |
GarageCars | 62 | 1460 | 2 | 1 | 2 | 0 | 4 | 4 | 0 | 0 | 0 |
GarageArea | 63 | 1460 | 473 | 214 | 480 | 0 | 1418 | 1418 | 0 | 1 | 6 |
WoodDeckSF | 67 | 1460 | 94 | 125 | 0 | 0 | 857 | 857 | 2 | 3 | 3 |
OpenPorchSF | 68 | 1460 | 47 | 66 | 25 | 0 | 547 | 547 | 2 | 8 | 2 |
EnclosedPorch | 69 | 1460 | 22 | 61 | 0 | 0 | 552 | 552 | 3 | 10 | 2 |
X3SsnPorch | 70 | 1460 | 3 | 29 | 0 | 0 | 508 | 508 | 10 | 123 | 1 |
ScreenPorch | 71 | 1460 | 15 | 56 | 0 | 0 | 480 | 480 | 4 | 18 | 1 |
PoolArea | 72 | 1460 | 3 | 40 | 0 | 0 | 738 | 738 | 15 | 222 | 1 |
MiscVal | 76 | 1460 | 43 | 496 | 0 | 0 | 15500 | 15500 | 24 | 698 | 13 |
MoSold | 77 | 1460 | 6 | 3 | 6 | 1 | 12 | 11 | 0 | 0 | 0 |
YrSold | 78 | 1460 | 2008 | 1 | 2008 | 2006 | 2010 | 4 | 0 | -1 | 0 |
SalePrice | 81 | 1460 | 180921 | 79443 | 163000 | 34900 | 755000 | 720100 | 2 | 6 | 2079 |
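To show why attributes such as PoolArea and MiscVal score so high on skew and kurtosis, the sketch below computes both statistics on a made-up vector that is almost entirely zeros. It assumes the simple moment-based definitions (skewness m3 / m2^1.5, excess kurtosis m4 / m2^2 - 3); the summary table above may have been produced with a slightly different estimator, so exact values need not match.

```r
# Moment-based sample skewness: third central moment over variance^(3/2).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over variance^2, minus 3.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Made-up attribute: 99 houses without the feature, 1 house with it.
rare <- c(rep(0, 99), 500)

skewness(rare)          # large and positive: a long right tail
excess_kurtosis(rare)   # large: nearly all mass sits at a single value
```

A single non-zero case out of a hundred is enough to push both statistics far from 0, which is the pattern seen in the rarely occurring attributes.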
Categorical data is converted to factors for inclusion in the model and is represented below as proportional bar plots that indicate each category's relation to housing price.
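As a rough sketch of this step, the base R code below converts one categorical column to a factor, computes the share of houses in each level, and summarises price per level. The values are made up, and the use of `MSZoning` as the zoning column and of the median as the price summary are assumptions for illustration.

```r
# Made-up sample; column names are assumed to mirror the housing data.
houses <- data.frame(
  MSZoning  = c("RL", "RM", "RL", "FV", "RL", "RM"),
  SalePrice = c(200000, 120000, 180000, 250000, 210000, 130000),
  stringsAsFactors = FALSE
)

# Convert the categorical column to a factor so lm() treats it as categorical.
houses$MSZoning <- factor(houses$MSZoning)

# Share of houses in each zoning level (the heights of a proportional bar plot).
zoning_share <- prop.table(table(houses$MSZoning))

# Median sale price per level, to relate each category to price.
median_price <- tapply(houses$SalePrice, houses$MSZoning, median)

zoning_share
median_price
# barplot(zoning_share, main = "Zoning Proportions")   # optional base-graphics plot
```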
Ordinal attributes are shown as histograms. Most of these attributes contain enough cases that, even when significant skew occurs, they are still useful to an ordinary least squares model. The significance of each attribute will be tested by p-value, AIC and VIF using the lm model in step 3.
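The three checks named above can be sketched on synthetic data. This is not the model from step 3: the data below is randomly generated, the choice of predictors is arbitrary, and VIF is computed by hand in base R as 1 / (1 - R²) rather than with a package function.

```r
set.seed(1)   # reproducible synthetic data; these are NOT the Ames values
n <- 200
GrLivArea   <- rnorm(n, mean = 1500, sd = 500)
TotalBsmtSF <- 0.6 * GrLivArea + rnorm(n, sd = 200)   # deliberately correlated
OverallQual <- sample(1:10, n, replace = TRUE)
SalePrice   <- 50000 + 60 * GrLivArea + 20 * TotalBsmtSF +
  10000 * OverallQual + rnorm(n, sd = 20000)
toy <- data.frame(SalePrice, GrLivArea, TotalBsmtSF, OverallQual)

fit <- lm(SalePrice ~ GrLivArea + TotalBsmtSF + OverallQual, data = toy)

# p-value of each coefficient (intercept included).
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]

# AIC of the fitted model; lower is better when comparing models on the same data.
model_aic <- AIC(fit)

# VIF of predictor j is 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on the remaining predictors.
predictors <- c("GrLivArea", "TotalBsmtSF", "OverallQual")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = toy))$r.squared
  1 / (1 - r2)
})

p_values
model_aic
vif
```

In this toy setup the two correlated area variables show a higher VIF than the independent quality score, which is the kind of signal VIF is meant to surface.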
————————————————————————————————————————–
[Figure: Zoning Proportions]
[Figure: Exterior1st Proportions]
————————————————————————————————————————–