Living Area: This variable had the strongest positive correlation with sale price, which is to be expected.
Lot Area: The pairs plot doesn't reveal it as clearly, but there is a positive relationship between lot area and sale price.
Age of the house: There is a strong negative correlation between the age of the house and the sale price. This relationship was more linear and had a tighter confidence interval than the one for remodel age.
Central Air (AC): All homes above a certain price had central air, which suggests that it matters for pricing.
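All of these variables had p-values of less than 2.2E-16. In the combined model below, this holds for living area and age; lot area has a p-value of about 3.2E-05, and central air is only marginally significant (p = 0.0836).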
While not included in the final model, the number of bathrooms (full + half) had a strong positive correlation with sale price. Lot area was ultimately chosen over the number of bathrooms because it is a continuous variable and produced a higher R^2 value.
##
## Call:
## lm(formula = sale_price ~ gr_liv_area + lot_area + age + ac,
## data = t_test)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -110698  -24315   -2101   18887  153105
##
## Coefficients:
##                 Estimate   Std. Error t value             Pr(>|t|)
## (Intercept)   52622.8109   10118.8883   5.200          0.000000277 ***
## gr_liv_area      93.9162       3.4857  26.943 < 0.0000000000000002 ***
## lot_area          0.7566       0.1805   4.192          0.000032034 ***
## age            -934.4950      57.4332 -16.271 < 0.0000000000000002 ***
## ac            13380.4335    7719.8676   1.733               0.0836 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38240 on 575 degrees of freedom
## Multiple R-squared: 0.7097, Adjusted R-squared: 0.7077
## F-statistic: 351.5 on 4 and 575 DF, p-value: < 0.00000000000000022
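
Written out, the coefficients above give the fitted equation below, followed by one worked prediction. The example house (living area 1500, lot area 10000, age 20, with central air) is hypothetical and not taken from the data, and the calculation assumes `ac` is coded as 1 for homes with central air and 0 otherwise, which the output does not confirm.

```latex
\begin{align*}
\widehat{\text{sale\_price}}
  &= 52622.8109
   + 93.9162 \cdot \text{gr\_liv\_area}
   + 0.7566 \cdot \text{lot\_area}
   - 934.4950 \cdot \text{age}
   + 13380.4335 \cdot \text{ac} \\[6pt]
% Hypothetical house: gr_liv_area = 1500, lot_area = 10000, age = 20, ac = 1 (assumed coding)
\widehat{\text{sale\_price}}
  &= 52622.8109 + 93.9162(1500) + 0.7566(10000) - 934.4950(20) + 13380.4335(1) \\
  &= 52622.8109 + 140874.3 + 7566 - 18689.9 + 13380.4335 \\
  &\approx 195{,}754
\end{align*}
```

Read this way, each additional unit of living area adds about 94 to the predicted price and each additional year of age subtracts about 934, holding the other variables fixed. With a residual standard error of 38240, any single prediction like this one carries considerable uncertainty.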