library("ggplot2")
IH <- read.csv("IowaHousing.csv")
summary(lm(SalePrice ~ Year.Built, data = IH))
##
## Call:
## lm(formula = SalePrice ~ Year.Built, data = IH)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -147394  -41227  -14502   23093  540805
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.727e+06  7.983e+04  -34.16   <2e-16 ***
## Year.Built   1.475e+03  4.049e+01   36.43   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 66280 on 2928 degrees of freedom
## Multiple R-squared: 0.3118, Adjusted R-squared: 0.3116
## F-statistic: 1327 on 1 and 2928 DF, p-value: < 2.2e-16
ggplot(IH, aes(Year.Built, SalePrice)) + geom_point() + geom_smooth(method = "lm") + ggtitle("Year Built effect on Sale Price")
summary(lm(SalePrice ~ Year.Built + Gr.Liv.Area, data = IH))
##
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area, data = IH)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -458172  -26758   -2236   18514  306986
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.106e+06  5.734e+04  -36.74   <2e-16 ***
## Year.Built   1.087e+03  2.938e+01   37.01   <2e-16 ***
## Gr.Liv.Area  9.597e+01  1.758e+00   54.60   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46660 on 2927 degrees of freedom
## Multiple R-squared: 0.6591, Adjusted R-squared: 0.6588
## F-statistic: 2829 on 2 and 2927 DF, p-value: < 2.2e-16
ggplot(IH, aes(Gr.Liv.Area, SalePrice)) + geom_point() + geom_smooth(method = "lm") + ggtitle("Above-Grade Living Area effect on Sale Price")
summary(lm(SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF, data = IH))
##
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF,
## data = IH)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -635809  -20961   -3176   17214  265742
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -1.678e+06  5.577e+04  -30.09   <2e-16 ***
## Year.Built     8.554e+02  2.875e+01   29.75   <2e-16 ***
## Gr.Liv.Area    7.996e+01  1.754e+00   45.59   <2e-16 ***
## Total.Bsmt.SF  4.990e+01  2.138e+00   23.34   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42860 on 2925 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7124, Adjusted R-squared: 0.7121
## F-statistic: 2416 on 3 and 2925 DF, p-value: < 2.2e-16
ggplot(IH, aes(Total.Bsmt.SF, SalePrice)) + geom_jitter() + geom_smooth(method = "lm") + ggtitle("Total Basement Size effect on Sale Price")
FinalModel <- lm(SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF + Overall.Qual, data = IH)
summary(FinalModel)
##
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF +
## Overall.Qual, data = IH)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -533948  -20467   -2016   16238  269659
##
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -8.944e+05  5.569e+04  -16.06   <2e-16 ***
## Year.Built     4.145e+02  2.933e+01   14.13   <2e-16 ***
## Gr.Liv.Area    5.641e+01  1.734e+00   32.53   <2e-16 ***
## Total.Bsmt.SF  3.492e+01  1.945e+00   17.95   <2e-16 ***
## Overall.Qual   2.245e+04  7.604e+02   29.52   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37620 on 2924 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.7785, Adjusted R-squared: 0.7782
## F-statistic: 2569 on 4 and 2924 DF, p-value: < 2.2e-16
ggplot(IH, aes(Overall.Qual, SalePrice)) + geom_jitter() + geom_smooth(method = "lm") + ggtitle("Overall Quality effect on Sale Price")
From the final model we get the fitted equation: \(\widehat{SalePrice} = -894418.37 + 414.48 \cdot Year.Built + 56.41 \cdot Gr.Liv.Area + 34.92 \cdot Total.Bsmt.SF + 22448.19 \cdot Overall.Qual\), where each coefficient is taken from the Estimate column of the model summary.
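To make the fitted equation concrete, here is a minimal sketch that applies the coefficient estimates printed by `summary(FinalModel)$coefficients` to a single house. The house itself is hypothetical: its values are chosen only for illustration and are not a row from IowaHousing.csv.

```r
# Coefficient estimates copied from the summary(FinalModel) output
coefs <- c(
  "(Intercept)"   = -894418.37036,
  "Year.Built"    = 414.47544,
  "Gr.Liv.Area"   = 56.41303,
  "Total.Bsmt.SF" = 34.91585,
  "Overall.Qual"  = 22448.19091
)

# Hypothetical house (illustrative values only, not taken from the data);
# the intercept term is multiplied by 1
house <- c(
  "(Intercept)"   = 1,
  "Year.Built"    = 2000,
  "Gr.Liv.Area"   = 1500,
  "Total.Bsmt.SF" = 1000,
  "Overall.Qual"  = 6
)

# Predicted sale price: intercept plus each estimate times its predictor value
predicted_price <- sum(coefs * house[names(coefs)])
predicted_price
```

With the fitted model object available, `predict(FinalModel, newdata = data.frame(Year.Built = 2000, Gr.Liv.Area = 1500, Total.Bsmt.SF = 1000, Overall.Qual = 6))` should give the same value up to rounding of the copied estimates.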
The adjusted R^2 for FinalModel is 0.7781747, or about 77.82%. In other words, the four predictors shown above together account for roughly 77.82% of the variability in the SalePrice of a house, after adjusting for the number of predictors in the model.
summary(FinalModel)$adj.r.squared
## [1] 0.7781747
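The value returned by `$adj.r.squared` is the adjusted R^2, which is slightly lower than the Multiple R-squared of 0.7785 printed in the summary because it penalises the number of predictors. As a rough check, the sketch below recomputes it from the printed figures. The sample size is inferred from the summary (2924 residual degrees of freedom plus 5 estimated coefficients), and the input R^2 is the rounded printed value, so the result only matches to about three decimal places.

```r
r2 <- 0.7785    # Multiple R-squared as printed in the summary (rounded)
n  <- 2924 + 5  # observations used: residual df + number of coefficients
p  <- 4         # number of predictors in FinalModel

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2
```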
The coefficient for each predictor estimates the monetary value that a one-unit increase in that predictor adds to the house’s overall market value, holding the other predictors fixed.
summary(FinalModel)$coefficients
##                    Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)   -894418.37036 55690.653691 -16.06048  1.087151e-55
## Year.Built        414.47544    29.327820  14.13250  6.466203e-44
## Gr.Liv.Area        56.41303     1.734171  32.53026 2.150059e-198
## Total.Bsmt.SF      34.91585     1.944592  17.95536  1.816381e-68
## Overall.Qual    22448.19091   760.350142  29.52349 6.802852e-168
The p-value for every predictor is significant at the conventional threshold of p < 0.05. In the model summary, all of them are reported as p < 2e-16.
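The summary prints `<2e-16` because p-values below machine precision (about 2.2e-16) are not displayed exactly; the coefficient matrix holds the actual values. As a sketch of where those values come from, the two-sided p-values can be recomputed from the t-statistics and the 2924 residual degrees of freedom reported above. The numbers are copied from the printed output, so the results agree only up to rounding.

```r
# t-statistics copied from summary(FinalModel)$coefficients
t_values <- c(
  "(Intercept)"   = -16.06048,
  "Year.Built"    = 14.13250,
  "Gr.Liv.Area"   = 32.53026,
  "Total.Bsmt.SF" = 17.95536,
  "Overall.Qual"  = 29.52349
)
df_resid <- 2924  # residual degrees of freedom from the summary

# Two-sided p-value: twice the upper-tail probability of |t|
p_values <- 2 * pt(abs(t_values), df = df_resid, lower.tail = FALSE)

# Compare each p-value with the 0.05 threshold
p_values < 0.05
```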
The t-statistic for each coefficient can be extracted from the third column of the linear model summary:
summary(FinalModel)$coefficients[,3]
##   (Intercept)    Year.Built   Gr.Liv.Area Total.Bsmt.SF  Overall.Qual
##     -16.06048      14.13250      32.53026      17.95536      29.52349
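Each t-statistic is the coefficient estimate divided by its standard error. The sketch below reproduces the values above from the Estimate and Std. Error columns as printed; small differences in the last digits are possible because the inputs are rounded.

```r
# Estimates and standard errors copied from summary(FinalModel)$coefficients
estimates <- c(
  "(Intercept)"   = -894418.37036,
  "Year.Built"    = 414.47544,
  "Gr.Liv.Area"   = 56.41303,
  "Total.Bsmt.SF" = 34.91585,
  "Overall.Qual"  = 22448.19091
)
std_errors <- c(
  "(Intercept)"   = 55690.653691,
  "Year.Built"    = 29.327820,
  "Gr.Liv.Area"   = 1.734171,
  "Total.Bsmt.SF" = 1.944592,
  "Overall.Qual"  = 760.350142
)

# t value = Estimate / Std. Error, element by element
t_values <- estimates / std_errors
round(t_values, 3)
```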