Living Area: This variable had the strongest positive correlation with sale price, which is to be expected.
Lot Area: The pairs plot doesn't reveal it as clearly, but there is a positive relationship between lot area and sale price.
Age of the house: There is a strong negative correlation between the age of the house and the sale price. This relationship was more linear and had a tighter confidence interval than remodel age.
Central Air (AC): All homes above a certain sale price had central air, which suggests it plays a role in pricing.
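All of these variables had p-values of < 2.2e-16, the smallest value R prints. In the combined model below, however, lot area is weaker (p = 0.000032) and central air is only marginally significant (p = 0.0836).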
While not included in the final model, the number of bathrooms (full + half) had a strong positive correlation with sale price. Lot area was ultimately chosen over the number of bathrooms because it is a continuous variable and produced a higher R^2 value.
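The choice between lot area and bathroom count can be sketched as a side-by-side comparison of adjusted R^2. This is only an illustration, not the exact code used for the report: the helper `adj_r2` and the column name `bathrooms` are assumptions, while `sale_price`, `gr_liv_area`, `age`, `ac`, `lot_area` and `t_test` come from the model call shown in the output.

```r
# Hypothetical helper: fit sale_price on the three fixed predictors plus one
# candidate predictor, and return the adjusted R-squared of that fit.
adj_r2 <- function(candidate, data) {
  f <- reformulate(c("gr_liv_area", "age", "ac", candidate),
                   response = "sale_price")
  summary(lm(f, data = data))$adj.r.squared
}

# Usage, assuming t_test contains a column named `bathrooms`
# (that column name is an assumption, not taken from the report):
# sapply(c("lot_area", "bathrooms"), adj_r2, data = t_test)
```

Adjusted R^2 is used here rather than plain R^2 because it is the figure reported alongside the model summary; with the same number of predictors in both candidate models, either measure ranks them the same way.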
## 
## Call:
## lm(formula = sale_price ~ gr_liv_area + lot_area + age + ac, 
##     data = t_test)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -110698  -24315   -2101   18887  153105 
## 
## Coefficients:
##               Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 52622.8109 10118.8883   5.200          0.000000277 ***
## gr_liv_area    93.9162     3.4857  26.943 < 0.0000000000000002 ***
## lot_area        0.7566     0.1805   4.192          0.000032034 ***
## age          -934.4950    57.4332 -16.271 < 0.0000000000000002 ***
## ac          13380.4335  7719.8676   1.733               0.0836 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38240 on 575 degrees of freedom
## Multiple R-squared:  0.7097, Adjusted R-squared:  0.7077 
## F-statistic: 351.5 on 4 and 575 DF,  p-value: < 0.00000000000000022
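To make the coefficients concrete, the fitted equation can be applied by hand to a single house. The sketch below copies the estimates from the output above; the house itself is hypothetical, and coding `ac` as 1 for "has central air" is an assumption about how the variable was encoded.

```r
# Coefficient estimates copied from the model summary above.
coefs <- c("(Intercept)" = 52622.8109,
           gr_liv_area   = 93.9162,
           lot_area      = 0.7566,
           age           = -934.4950,
           ac            = 13380.4335)

# Hypothetical house (values invented for illustration, in the data's units).
# ac = 1 assumes the variable is coded 1 = central air, 0 = none.
house <- c("(Intercept)" = 1,
           gr_liv_area   = 1500,
           lot_area      = 10000,
           age           = 30,
           ac            = 1)

# Predicted sale price: sum of each coefficient times its predictor value.
predicted <- sum(coefs * house[names(coefs)])
round(predicted)  # 186409
```

With a residual standard error of 38240, a prediction like this should be read as a rough central estimate rather than a precise price.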