#Problem Statement
Determine what else influences housing prices in the 98371 zip code.
#Recode LOCATION
aggregate(PRICE~LOCATION, sold98371.main, mean)
#Edgewood has the highest mean price, so it is listed first and becomes the reference group
NEIGHBORHOOD <- factor(LOCATION, levels = c('Edgewood','Puyallup','Puyallup Valley','Downtown','Nw Puyallup', 'North Puyallup', 'Fife', 'South Hill', 'Maplewood','Summit'))
#Neighborhood Reference Group Discussion
Edgewood is the reference group for the Neighborhood categorical variable. It was selected because it has the highest mean price of all the neighborhoods.
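The level order can also be derived from the group means instead of being typed by hand. The following is a minimal sketch of that idea; the data frame `toy` and its prices are invented for illustration and stand in for `sold98371.main`.

```r
# Toy data standing in for sold98371.main; the prices are invented.
toy <- data.frame(
  LOCATION = c("Puyallup", "Edgewood", "Fife", "Edgewood", "Puyallup", "Fife"),
  PRICE    = c(300000, 450000, 320000, 470000, 310000, 330000)
)

# Mean price per location, then level names sorted from highest to lowest mean.
means <- aggregate(PRICE ~ LOCATION, toy, mean)
ordered_levels <- as.character(means$LOCATION[order(-means$PRICE)])

# The first level of a factor is the reference group in lm().
toy$NEIGHBORHOOD <- factor(toy$LOCATION, levels = ordered_levels)
levels(toy$NEIGHBORHOOD)[1]
```

With the toy prices, the location with the highest mean ends up as the first level, which is the level `lm()` treats as the baseline.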
#Recode PROPERTY.TYPE
PROP.TYPE <- factor(PROPERTY.TYPE, levels = c('Single Family Residential','Condo/Co-op'))
#Property Type Reference Group Discussion
Single Family Residential is the reference group for the Property Type categorical variable, based on mean price per property type. Only 5 observations are Condo/Co-op.
#Method of Categorical Variable Coding
Dummy coding was used for the categorical variables in this model. Dummy coding creates a new “dummy” variable for each category other than the reference group and codes it ‘1’ or ‘0’ to indicate whether an observation belongs to that category. Each coefficient is then read directly as a difference from the reference group, which gives a straightforward way of understanding the regression results. Effects coding would not be useful in this case because we are not targeting a specific characteristic. Contrast coding is not useful because we are not particularly interested in the interplay among the categories.
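To see what dummy coding produces, `model.matrix()` can be used to print the 0/1 columns that `lm()` builds from a factor. This is a small sketch with a made-up three-level factor `nb`, not the report's data.

```r
# A three-level factor; "Edgewood" is listed first, so it is the reference level.
nb <- factor(c("Edgewood", "Puyallup", "Fife", "Puyallup"),
             levels = c("Edgewood", "Puyallup", "Fife"))

# Under R's default treatment (dummy) contrasts, model.matrix() creates
# one 0/1 column per non-reference level, plus the intercept.
X <- model.matrix(~ nb)
print(X)
```

The reference level gets no column of its own: an Edgewood row has zeros in every dummy column, so its effect is absorbed into the intercept.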
#Regression Model
##
## Call:
## lm(formula = PRICE ~ BEDS + BATHS + SQFT + YEARBUILT + NEIGHBORHOOD +
## PROP.TYPE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -286829 -25623 -7628 21454 222522
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.129e+05 2.127e+05 -1.001 0.317629
## BEDS -1.400e+04 4.749e+03 -2.948 0.003432 **
## BATHS 1.033e+04 8.224e+03 1.257 0.209787
## SQFT 9.782e+01 6.559e+00 14.913 < 2e-16 ***
## YEARBUILT 2.300e+02 1.104e+02 2.083 0.038095 *
## NEIGHBORHOODPuyallup -3.336e+04 8.426e+03 -3.959 9.29e-05 ***
## NEIGHBORHOODPuyallup Valley -3.739e+04 1.105e+04 -3.383 0.000807 ***
## NEIGHBORHOODDowntown -2.853e+04 1.167e+04 -2.445 0.015021 *
## NEIGHBORHOODNw Puyallup -5.534e+04 1.839e+04 -3.009 0.002831 **
## NEIGHBORHOODNorth Puyallup -8.086e+03 3.216e+04 -0.251 0.801634
## NEIGHBORHOODFife -2.233e+04 3.896e+04 -0.573 0.566979
## NEIGHBORHOODSouth Hill -1.971e+04 5.500e+04 -0.358 0.720262
## NEIGHBORHOODMaplewood -2.790e+04 3.973e+04 -0.702 0.483069
## NEIGHBORHOODSummit -1.127e+04 5.474e+04 -0.206 0.837056
## PROP.TYPECondo/Co-op -1.080e+04 2.628e+04 -0.411 0.681384
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54150 on 316 degrees of freedom
## Multiple R-squared: 0.7384, Adjusted R-squared: 0.7268
## F-statistic: 63.72 on 14 and 316 DF, p-value: < 2.2e-16
#Interpretation of Coefficients & Statistical Significance
All effects below hold the other predictors constant. BEDS is significant at the 0.01 level; PRICE decreases by about $14k for each additional bedroom. BATHS is not significant; its estimate implies an increase of about $10k per additional bathroom. SQFT is highly significant (p < 0.001); PRICE increases by about $98 per square foot. YEARBUILT is significant at the 0.05 level; PRICE increases by about $230 for each year newer.
Neighborhood coefficients are differences from Edgewood. Puyallup is about $33k cheaper and Puyallup Valley about $37k cheaper; both are significant at the 0.001 level. Downtown is about $29k cheaper (significant at the 0.05 level) and Nw Puyallup is about $55k cheaper (significant at the 0.01 level). The remaining neighborhoods are not significant: North Puyallup (about $8k cheaper), Fife ($22k), South Hill ($20k), Maplewood ($28k), and Summit ($11k).
PROPERTY TYPE is not significant; the estimate puts a Condo/Co-op about $11k below a Single Family Residential property.
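As a worked illustration of how these coefficients combine, the sketch below plugs the rounded Model 1 estimates into the regression equation for a hypothetical house. The house (3 beds, 2 baths, 2,000 square feet, built in 1990) and the helper `predict_price` are assumptions made up for this example.

```r
# Coefficients copied (as printed, rounded) from the Model 1 summary.
b <- c(intercept = -212900, BEDS = -14000, BATHS = 10330,
       SQFT = 97.82, YEARBUILT = 230, Puyallup = -33360)

# Hypothetical helper: predicted price for a Single Family Residential house
# (the reference property type) in Edgewood or, optionally, Puyallup.
predict_price <- function(beds, baths, sqft, year, in_puyallup = FALSE) {
  unname(b["intercept"] + b["BEDS"] * beds + b["BATHS"] * baths +
           b["SQFT"] * sqft + b["YEARBUILT"] * year +
           b["Puyallup"] * as.numeric(in_puyallup))
}

edgewood <- predict_price(3, 2, 2000, 1990)
puyallup <- predict_price(3, 2, 2000, 1990, in_puyallup = TRUE)

# The gap between the two predictions is exactly the Puyallup coefficient.
puyallup - edgewood
```

Because the two houses are identical apart from the neighborhood, every other term cancels in the difference. That is what "holding the other variables constant" means for a dummy coefficient.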
#VIF Regression & Interpretation
## GVIF Df GVIF^(1/(2*Df))
## BEDS 2.259915 1 1.503301
## BATHS 4.145709 1 2.036101
## SQFT 3.401079 1 1.844202
## YEARBUILT 1.882488 1 1.372038
## NEIGHBORHOOD 1.532033 9 1.023983
## PROP.TYPE 1.159582 1 1.076839
There is no evidence of problematic collinearity: every GVIF is below 5, with BATHS the highest at 4.15. The insignificant variables, BATHS and PROPERTY TYPE, are candidates for removal from the model.
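The third column of the output, GVIF^(1/(2*Df)), rescales each term by its degrees of freedom so that a 9-Df factor such as NEIGHBORHOOD can be compared with single-coefficient terms. The sketch below simply recomputes that column from the first two. The vector names `gvif` and `dof` are chosen for this example; the numbers are copied from the `vif()` output.

```r
# GVIF and Df columns copied from the vif() output above.
gvif <- c(BEDS = 2.259915, BATHS = 4.145709, SQFT = 3.401079,
          YEARBUILT = 1.882488, NEIGHBORHOOD = 1.532033, PROP.TYPE = 1.159582)
dof  <- c(BEDS = 1, BATHS = 1, SQFT = 1,
          YEARBUILT = 1, NEIGHBORHOOD = 9, PROP.TYPE = 1)

# Adjusted value: GVIF^(1/(2*Df)). For Df = 1 this is just sqrt(GVIF).
adj <- gvif^(1 / (2 * dof))
round(adj, 6)

# Squaring the adjusted value puts it back on the ordinary VIF scale.
round(adj^2, 3)
```

For the single-Df terms the squared value is the GVIF itself. For NEIGHBORHOOD it is much closer to 1, which suggests the factor as a whole adds little collinearity.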
#Partial F-test 1 & ANOVA (Model 1, Model 2)
##
## Call:
## lm(formula = PRICE ~ BEDS + SQFT + YEARBUILT + NEIGHBORHOOD +
## PROP.TYPE)
##
## Residuals:
## Min 1Q Median 3Q Max
## -282424 -26372 -8074 22859 226702
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.038e+05 2.002e+05 -1.518 0.130096
## BEDS -1.236e+04 4.571e+03 -2.705 0.007197 **
## SQFT 1.025e+02 5.377e+00 19.069 < 2e-16 ***
## YEARBUILT 2.801e+02 1.031e+02 2.717 0.006948 **
## NEIGHBORHOODPuyallup -3.504e+04 8.327e+03 -4.208 3.37e-05 ***
## NEIGHBORHOODPuyallup Valley -3.947e+04 1.094e+04 -3.609 0.000358 ***
## NEIGHBORHOODDowntown -3.000e+04 1.162e+04 -2.581 0.010303 *
## NEIGHBORHOODNw Puyallup -5.726e+04 1.834e+04 -3.121 0.001966 **
## NEIGHBORHOODNorth Puyallup -5.297e+03 3.211e+04 -0.165 0.869077
## NEIGHBORHOODFife -2.059e+04 3.897e+04 -0.528 0.597664
## NEIGHBORHOODSouth Hill -2.006e+04 5.505e+04 -0.364 0.715792
## NEIGHBORHOODMaplewood -3.558e+04 3.930e+04 -0.905 0.365905
## NEIGHBORHOODSummit -7.969e+03 5.473e+04 -0.146 0.884317
## PROP.TYPECondo/Co-op -1.232e+04 2.627e+04 -0.469 0.639503
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54200 on 317 degrees of freedom
## Multiple R-squared: 0.7371, Adjusted R-squared: 0.7263
## F-statistic: 68.38 on 13 and 317 DF, p-value: < 2.2e-16
When BATHS is removed, the adjusted R-squared declines slightly (0.7268 to 0.7263) and the significance of YEARBUILT improves (p = 0.038 to p = 0.007). The partial F-test p-value is greater than .05, so we fail to reject the null hypothesis that the BATHS coefficient is zero. BATHS can be removed from the model.
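When only one coefficient is dropped, the partial F statistic equals the square of that coefficient's t statistic in the full model, so the test can be checked by hand. The sketch below uses the t value for BATHS as printed in the Model 1 summary; because that value is rounded, the result is approximate.

```r
# t value for BATHS and residual degrees of freedom, from the Model 1 summary.
t_baths  <- 1.257
df_resid <- 316

# Partial F statistic for dropping a single coefficient: the squared t statistic.
F_stat  <- t_baths^2
p_value <- pf(F_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

The p-value comes out at about 0.21, in line with the Pr(>|t|) reported for BATHS, and well above .05.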
#Partial F-test 2 & ANOVA (Model 1, Model 3)
##
## Call:
## lm(formula = PRICE ~ BEDS + BATHS + SQFT + YEARBUILT + NEIGHBORHOOD)
##
## Residuals:
## Min 1Q Median 3Q Max
## -286747 -25768 -7594 21559 222273
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -194090.82 207445.91 -0.936 0.350180
## BEDS -13683.63 4679.33 -2.924 0.003701 **
## BATHS 10490.42 8204.31 1.279 0.201957
## SQFT 97.84 6.55 14.938 < 2e-16 ***
## YEARBUILT 219.66 107.39 2.045 0.041647 *
## NEIGHBORHOODPuyallup -33243.28 8409.98 -3.953 9.53e-05 ***
## NEIGHBORHOODPuyallup Valley -37107.19 11016.44 -3.368 0.000849 ***
## NEIGHBORHOODDowntown -29265.07 11518.06 -2.541 0.011536 *
## NEIGHBORHOODNw Puyallup -55271.86 18365.98 -3.009 0.002827 **
## NEIGHBORHOODNorth Puyallup -7736.43 32106.13 -0.241 0.809739
## NEIGHBORHOODFife -22072.53 38900.94 -0.567 0.570842
## NEIGHBORHOODSouth Hill -19974.21 54925.87 -0.364 0.716356
## NEIGHBORHOODMaplewood -27894.30 39682.29 -0.703 0.482609
## NEIGHBORHOODSummit -10988.02 54666.43 -0.201 0.840826
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54080 on 317 degrees of freedom
## Multiple R-squared: 0.7383, Adjusted R-squared: 0.7276
## F-statistic: 68.79 on 13 and 317 DF, p-value: < 2.2e-16
When PROPERTY TYPE is removed, we observe a slight improvement in the adjusted R-squared (0.7268 to 0.7276).
#Partial F-test 3 & ANOVA (Model 1, Model 4)
##
## Call:
## lm(formula = PRICE ~ BEDS + SQFT + YEARBUILT + NEIGHBORHOOD)
##
## Residuals:
## Min 1Q Median 3Q Max
## -282254 -26505 -7206 22689 226490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.839e+05 1.954e+05 -1.453 0.14722
## BEDS -1.197e+04 4.489e+03 -2.668 0.00803 **
## SQFT 1.027e+02 5.365e+00 19.133 < 2e-16 ***
## YEARBUILT 2.691e+02 1.003e+02 2.684 0.00766 **
## NEIGHBORHOODPuyallup -3.493e+04 8.314e+03 -4.201 3.45e-05 ***
## NEIGHBORHOODPuyallup Valley -3.918e+04 1.091e+04 -3.592 0.00038 ***
## NEIGHBORHOODDowntown -3.086e+04 1.146e+04 -2.692 0.00748 **
## NEIGHBORHOODNw Puyallup -5.721e+04 1.832e+04 -3.123 0.00196 **
## NEIGHBORHOODNorth Puyallup -4.849e+03 3.206e+04 -0.151 0.87986
## NEIGHBORHOODFife -2.027e+04 3.891e+04 -0.521 0.60287
## NEIGHBORHOODSouth Hill -2.036e+04 5.498e+04 -0.370 0.71133
## NEIGHBORHOODMaplewood -3.571e+04 3.925e+04 -0.910 0.36364
## NEIGHBORHOODSummit -7.593e+03 5.466e+04 -0.139 0.88960
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 54140 on 318 degrees of freedom
## Multiple R-squared: 0.7369, Adjusted R-squared: 0.727
## F-statistic: 74.24 on 12 and 318 DF, p-value: < 2.2e-16
When both BATHS and PROPERTY TYPE are removed, the adjusted R-squared is essentially unchanged from the original model (0.7268 versus 0.727). The partial F-test indicates that we cannot reject the null hypothesis that both coefficients are zero, so both variables can be dropped.
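For nested models, the partial F statistic can also be written in terms of the two R-squared values: the drop in R-squared per removed term, divided by the unexplained share per residual degree of freedom in the full model. The sketch below applies that formula to the printed Model 1 and Model 4 summaries. Because R-squared is shown to only four digits, the result is a rough check rather than a substitute for `anova()`.

```r
# Values copied from the Model 1 and Model 4 summaries (rounded as printed).
r2_full    <- 0.7384   # Model 1: all predictors
r2_reduced <- 0.7369   # Model 4: BATHS and PROP.TYPE removed
df_full    <- 316      # residual degrees of freedom of Model 1
q          <- 2        # number of coefficients dropped

# Partial F: (R-squared lost per dropped term) / (unexplained share per residual df).
F_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_value <- pf(F_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

The F statistic is below 1 and the p-value is far above .05, which agrees with the conclusion that the two variables add no detectable explanatory power.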
#The Most Parsimonious Model
Based on the results of the iterative regressions and partial F-tests, the most parsimonious model is:
alias(OLS.model.exc.baths.prop.type)
## Model :
## PRICE ~ BEDS + SQFT + YEARBUILT + NEIGHBORHOOD