library("ggplot2")
IH <- read.csv("IowaHousing.csv")

I started with the year in which the house was built (Year.Built). Newer houses tend to be more up to date with today’s standards, which makes them more marketable.

summary(lm(SalePrice ~ Year.Built, data = IH))
## 
## Call:
## lm(formula = SalePrice ~ Year.Built, data = IH)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -147394  -41227  -14502   23093  540805 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.727e+06  7.983e+04  -34.16   <2e-16 ***
## Year.Built   1.475e+03  4.049e+01   36.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 66280 on 2928 degrees of freedom
## Multiple R-squared:  0.3118, Adjusted R-squared:  0.3116 
## F-statistic:  1327 on 1 and 2928 DF,  p-value: < 2.2e-16
ggplot(IH, aes(Year.Built, SalePrice)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Year Built effect on Sale Price")
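The R sketch below makes the slope concrete by plugging two construction years into the fitted line. It uses the rounded coefficients printed in the summary above. The years 1950 and 2000 are arbitrary values chosen for illustration and are not taken from the data.

```r
# Rounded coefficients from summary(lm(SalePrice ~ Year.Built, data = IH))
b0 <- -2.727e6  # intercept: fitted price extrapolated to year 0, not meaningful on its own
b1 <- 1.475e3   # dollars added per one-year-later construction date

# Fitted sale price for a given construction year
fitted_price <- function(year) b0 + b1 * year

newer <- fitted_price(2000)  # illustrative year (assumption)
older <- fitted_price(1950)  # illustrative year (assumption)

# The gap is simply the slope times the 50-year difference
newer - older
```

Under this single-predictor model, a house built 50 years later is predicted to sell for about $73,750 more (50 × $1,475).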

Next, I added the above-grade (ground) living area, Gr.Liv.Area, measured in square feet. After all, more space means more room for “living.”

summary(lm(SalePrice ~ Year.Built + Gr.Liv.Area, data = IH))
## 
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area, data = IH)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -458172  -26758   -2236   18514  306986 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.106e+06  5.734e+04  -36.74   <2e-16 ***
## Year.Built   1.087e+03  2.938e+01   37.01   <2e-16 ***
## Gr.Liv.Area  9.597e+01  1.758e+00   54.60   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 46660 on 2927 degrees of freedom
## Multiple R-squared:  0.6591, Adjusted R-squared:  0.6588 
## F-statistic:  2829 on 2 and 2927 DF,  p-value: < 2.2e-16
ggplot(IH, aes(Gr.Liv.Area, SalePrice)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("Above-Grade Living Area effect on Sale Price")

When choosing where to live, buyers also consider the size of the basement (Total.Bsmt.SF). They may picture the extra space as guest quarters or a recreation room, and that makes a larger basement more attractive.

summary(lm(SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF, data = IH))
## 
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF, 
##     data = IH)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -635809  -20961   -3176   17214  265742 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.678e+06  5.577e+04  -30.09   <2e-16 ***
## Year.Built     8.554e+02  2.875e+01   29.75   <2e-16 ***
## Gr.Liv.Area    7.996e+01  1.754e+00   45.59   <2e-16 ***
## Total.Bsmt.SF  4.990e+01  2.138e+00   23.34   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42860 on 2925 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7124, Adjusted R-squared:  0.7121 
## F-statistic:  2416 on 3 and 2925 DF,  p-value: < 2.2e-16
ggplot(IH, aes(Total.Bsmt.SF, SalePrice)) +
  geom_jitter() +
  geom_smooth(method = "lm") +
  ggtitle("Total Basement Size effect on Sale Price")

Final Model

I added Overall Quality (Overall.Qual) to the model to see whether it had a substantial impact on Sale Price. Even if a house has everything you need, it helps to know whether its quality is on par with your standard of living.

FinalModel <- lm(SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF + Overall.Qual, data = IH)
summary(FinalModel)
## 
## Call:
## lm(formula = SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF + 
##     Overall.Qual, data = IH)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -533948  -20467   -2016   16238  269659 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -8.944e+05  5.569e+04  -16.06   <2e-16 ***
## Year.Built     4.145e+02  2.933e+01   14.13   <2e-16 ***
## Gr.Liv.Area    5.641e+01  1.734e+00   32.53   <2e-16 ***
## Total.Bsmt.SF  3.492e+01  1.945e+00   17.95   <2e-16 ***
## Overall.Qual   2.245e+04  7.604e+02   29.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37620 on 2924 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7785, Adjusted R-squared:  0.7782 
## F-statistic:  2569 on 4 and 2924 DF,  p-value: < 2.2e-16
ggplot(IH, aes(Overall.Qual, SalePrice)) +
  geom_jitter() +
  geom_smooth(method = "lm") +
  ggtitle("Overall Quality effect on Sale Price")
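One way to judge whether Overall.Qual had a "substantial impact" is to track the adjusted R-squared as each predictor entered the model. The R sketch below copies the four adjusted R-squared values from the summaries above and computes the gain at each step.

```r
# Adjusted R-squared values copied from the four model summaries above
adj_r2 <- c(
  "Year.Built"      = 0.3116,
  "+ Gr.Liv.Area"   = 0.6588,
  "+ Total.Bsmt.SF" = 0.7121,
  "+ Overall.Qual"  = 0.7782
)

# Gain in adjusted R-squared contributed by each newly added predictor
gain <- diff(adj_r2)
print(gain)
```

Adding Overall.Qual raised adjusted R-squared by about 0.066, slightly more than adding the basement size did. If the fitted models are still in memory, `anova(lm(SalePrice ~ Year.Built + Gr.Liv.Area + Total.Bsmt.SF, data = IH), FinalModel)` should give a formal F-test for that last step. Both fits drop the same row with the missing Total.Bsmt.SF value, so they use the same observations.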

The adjusted R-squared

Using the coefficient estimates from the final model, the fitted equation is: \(\widehat{\text{SalePrice}} = -894418.37 + 414.48 \cdot \text{Year.Built} + 56.41 \cdot \text{Gr.Liv.Area} + 34.92 \cdot \text{Total.Bsmt.SF} + 22448.19 \cdot \text{Overall.Qual}\)
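To see the final model's equation in action, the minimal R sketch below plugs a house into the coefficient estimates reported by `summary(FinalModel)$coefficients`. The house's feature values are hypothetical and chosen only for illustration. The coefficients are hard-coded so that the sketch runs without the CSV file.

```r
# Final-model coefficients, copied from summary(FinalModel)$coefficients
coefs <- c(
  "(Intercept)" = -894418.37036,
  Year.Built    = 414.47544,
  Gr.Liv.Area   = 56.41303,
  Total.Bsmt.SF = 34.91585,
  Overall.Qual  = 22448.19091
)

# A hypothetical house (illustrative values, not from the data set)
house <- c(Year.Built = 2000, Gr.Liv.Area = 1500,
           Total.Bsmt.SF = 1000, Overall.Qual = 6)

# Predicted price = intercept + sum of (coefficient * predictor value)
pred <- coefs[["(Intercept)"]] + sum(coefs[names(house)] * house)
pred
```

With `FinalModel` in memory, `predict(FinalModel, newdata = data.frame(Year.Built = 2000, Gr.Liv.Area = 1500, Total.Bsmt.SF = 1000, Overall.Qual = 6))` should return essentially the same value (about $188,757). Any difference comes only from the rounding of the printed coefficients.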

The adjusted R² for FinalModel is 0.7781747, or about 77.82%. The multiple R² of 0.7785 means the four predictors together account for about 77.85% of the variability in SalePrice. The adjusted value is slightly lower because it penalises the model for each additional predictor.

summary(FinalModel)$adj.r.squared
## [1] 0.7781747
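The R sketch below recomputes the adjusted R² from the multiple R² to show where the penalty comes from. It uses the standard formula \(1 - (1 - R^2)\frac{n-1}{n-p-1}\). The inputs are read from the FinalModel summary. The number of observations is inferred as residual df + predictors + 1.

```r
# Values read from summary(FinalModel)
r2       <- 0.7785          # Multiple R-squared (rounded in the printout)
p        <- 4               # number of predictors
df_resid <- 2924            # residual degrees of freedom
n        <- df_resid + p + 1  # observations used: 2929 (one row dropped for a missing value)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)
```

Because the printed R² is rounded to four digits, this sketch reproduces 0.7782 rather than the full 0.7781747.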

Coefficients for each predictor

The coefficient for each predictor estimates how many dollars a one-unit increase in that predictor adds to a house’s sale price, while the other predictors are held fixed.

summary(FinalModel)$coefficients
##                    Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept)   -894418.37036 55690.653691 -16.06048  1.087151e-55
## Year.Built        414.47544    29.327820  14.13250  6.466203e-44
## Gr.Liv.Area        56.41303     1.734171  32.53026 2.150059e-198
## Total.Bsmt.SF      34.91585     1.944592  17.95536  1.816381e-68
## Overall.Qual    22448.19091   760.350142  29.52349 6.802852e-168
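Per-unit slopes can be hard to compare because the predictors use different units. The R sketch below scales each slope by a more readable step size. The step sizes are assumptions chosen for illustration: 10 years newer, 100 square feet more living area, 100 square feet more basement, and one quality point.

```r
# Slopes from summary(FinalModel)$coefficients (dollars per one-unit change)
slopes <- c(Year.Built = 414.47544, Gr.Liv.Area = 56.41303,
            Total.Bsmt.SF = 34.91585, Overall.Qual = 22448.19091)

# Illustrative step sizes (assumptions for readability, not from the source)
steps <- c(Year.Built = 10, Gr.Liv.Area = 100,
           Total.Bsmt.SF = 100, Overall.Qual = 1)

# Expected change in SalePrice for each step, other predictors held fixed
effect <- slopes * steps
effect
```

On this scale, one extra Overall.Qual point (about $22,448) is worth roughly four times as much as 100 extra square feet of above-grade living area (about $5,641).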

p-Value

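Every predictor’s p-value is far below the conventional significance threshold of p < 0.05. The model summary prints each one as <2e-16 because it displays any p-value below machine precision (about 2.2e-16) that way. The exact values in the coefficient table range from about 6.5e-44 (Year.Built) down to 2.2e-198 (Gr.Liv.Area).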
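The R sketch below shows where these p-values come from. Each one is a two-sided tail probability of the t distribution with the model's residual degrees of freedom (2924), evaluated at the coefficient's t value. The t values are copied from `summary(FinalModel)$coefficients`.

```r
# t values from summary(FinalModel)$coefficients
t_vals <- c("(Intercept)" = -16.06048, Year.Built = 14.13250,
            Gr.Liv.Area = 32.53026, Total.Bsmt.SF = 17.95536,
            Overall.Qual = 29.52349)
df_resid <- 2924  # residual degrees of freedom of FinalModel

# Two-sided p-value: chance of a |t| at least this large if the true coefficient were 0
p_vals <- 2 * pt(abs(t_vals), df = df_resid, lower.tail = FALSE)

signif(p_vals, 3)
all(p_vals < 0.05)   # significant at the conventional 0.05 level
all(p_vals < 2e-16)  # why summary() prints "<2e-16" on every row
```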

t-Statistic

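The t-statistic for each coefficient is its estimate divided by its standard error. It can be extracted from the third column of the model summary's coefficient table: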

summary(FinalModel)$coefficients[,3]
##   (Intercept)    Year.Built   Gr.Liv.Area Total.Bsmt.SF  Overall.Qual 
##     -16.06048      14.13250      32.53026      17.95536      29.52349
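As a check, the R sketch below recomputes the t-statistics by dividing each estimate by its standard error. Both columns are copied from `summary(FinalModel)$coefficients`.

```r
# Estimates and standard errors from summary(FinalModel)$coefficients
est <- c(-894418.37036, 414.47544, 56.41303, 34.91585, 22448.19091)
se  <- c(55690.653691, 29.327820, 1.734171, 1.944592, 760.350142)
terms <- c("(Intercept)", "Year.Built", "Gr.Liv.Area",
           "Total.Bsmt.SF", "Overall.Qual")
names(est) <- terms
names(se)  <- terms

# The t value tests H0: coefficient = 0, so it is Estimate / Std. Error
t_vals <- est / se
round(t_vals, 5)
```

The results match the third column printed above to the displayed precision.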