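This report analyzes housing prices and home characteristics using the HomesForSale dataset. It addresses five questions about how certain characteristics relate to a home’s price: for California homes, how (1) size, (2) number of bedrooms, and (3) number of bathrooms each relate to price, and (4) how the three jointly relate to price; and (5) whether prices differ significantly across the states in the dataset.
Each question is analyzed in turn below using R.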
home = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
CA = subset(home, State == "CA")
model1 = lm(Price ~ Size, data = CA)
summary(model1)
##
## Call:
## lm(formula = Price ~ Size, data = CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
plot(CA$Size, CA$Price,
main = "Home Size vs Price (California)",
xlab = "Size (sq ft)",
ylab = "Price ($1,000s)",
pch = 19)
abline(model1, col = "red", lwd = 2)
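To illustrate how the fitted line is read, the coefficients printed above can be plugged in by hand. The sketch below hard-codes the rounded estimates from the summary output rather than refitting the model. It assumes Size is recorded in square feet and Price in $1,000s; the helper `predict_price` and the 2,000 sq ft home are hypothetical, chosen only for illustration.

```r
# Rounded estimates copied from the summary(model1) output above
intercept <- -56.81675
slope     <- 0.33919   # change in Price ($1,000s) per additional sq ft

# Hypothetical helper: predicted price ($1,000s) for a given size (sq ft)
predict_price <- function(size_sqft) intercept + slope * size_sqft

predict_price(2000)                        # about 621.6, i.e. roughly $622,000
predict_price(2100) - predict_price(2000)  # 100 extra sq ft: about 33.9
```

With the fitted object available, `predict(model1, newdata = data.frame(Size = 2000))` should give the same point prediction up to rounding.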
model2 = lm(Price ~ Beds, data = CA)
summary(model2)
##
## Call:
## lm(formula = Price ~ Beds, data = CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
boxplot(Price ~ Beds, data = CA,
main = "Bedrooms vs Price (California)",
xlab = "Number of Bedrooms",
ylab = "Price ($1,000s)")
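One way to see why the bedroom slope is not significant is to build an approximate 95% confidence interval from the printed estimate and standard error. This is a minimal sketch using the rounded values from the output above, not a refit of the model; the variable names are illustrative.

```r
# Rounded values copied from the summary(model2) output above
est <- 84.77   # estimated slope for Beds ($1,000s per bedroom)
se  <- 72.91   # standard error of the slope
df  <- 28      # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se
ci   # roughly -65 to 234; the interval includes zero
```

Because the interval includes zero, the data are compatible with no linear association between bedrooms and price. With the fitted object available, `confint(model2)` reports the same kind of interval directly.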
model3 = lm(Price ~ Baths, data = CA)
summary(model3)
##
## Call:
## lm(formula = Price ~ Baths, data = CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
boxplot(Price ~ Baths, data = CA,
main = "Bathrooms vs Price (California)",
xlab = "Number of Bathrooms",
ylab = "Price ($1,000s)")
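In a regression with a single predictor, the F-statistic equals the square of the slope's t value, and R-squared can be recovered from that t value and the residual degrees of freedom. The sketch below checks the printed bathroom results against those identities, using the rounded values from the output above; the variable names are illustrative.

```r
# Rounded values copied from the summary(model3) output above
t_baths  <- 3.127
df_resid <- 28

# With one predictor, F = t^2
f_stat <- t_baths^2                          # about 9.78

# R-squared = t^2 / (t^2 + residual df)
r_squared <- f_stat / (f_stat + df_resid)    # about 0.259

# Two-sided p-value for the slope
p_value <- 2 * pt(abs(t_baths), df_resid, lower.tail = FALSE)   # about 0.004
```

These match the reported F-statistic (9.779), Multiple R-squared (0.2588), and p-value (0.00409) up to rounding.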
model4 = lm(Price ~ Size + Beds + Baths, data = CA)
summary(model4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
plot(model4$fitted.values, model4$residuals,
main = "Residual Plot for Multiple Regression (California)",
xlab = "Fitted Values",
ylab = "Residuals",
pch = 19)
abline(h = 0, col = "red", lwd = 2)
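To judge whether bedrooms and bathrooms add anything beyond size, a partial F-test can compare the size-only model with the three-predictor model. The sketch below works from the rounded R-squared values printed above, so its results are approximate; the variable names are illustrative.

```r
# Rounded values copied from the summary outputs above
r2_size <- 0.3594   # Price ~ Size
r2_full <- 0.3912   # Price ~ Size + Beds + Baths
n       <- 30       # California homes (28 residual df + 2 in the simple model)
p_full  <- 3        # predictors in the full model
q       <- 2        # predictors added beyond Size (Beds, Baths)

df_resid <- n - p_full - 1   # 26

# Adjusted R-squared of the full model, penalizing the extra predictors
adj_r2_full <- 1 - (1 - r2_full) * (n - 1) / df_resid   # about 0.321

# Partial F-test for adding Beds and Baths to the size-only model
f_partial <- ((r2_full - r2_size) / q) / ((1 - r2_full) / df_resid)
p_partial <- pf(f_partial, q, df_resid, lower.tail = FALSE)
c(F = f_partial, p = p_partial)   # F below 1, p well above 0.05
```

With both fitted objects available, `anova(model1, model4)` performs the same comparison without rounding error.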
model5 = aov(Price ~ State, data = home)
summary(model5)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
boxplot(Price ~ State, data = home,
main = "Home Prices by State",
xlab = "State",
ylab = "Price ($1,000s)")
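The ANOVA table can be unpacked by hand to show where the F value comes from and how much of the price variation is associated with state. The sketch below uses the sums of squares printed above; eta-squared is added here as an illustrative effect-size measure and is not part of the original output.

```r
# Values copied from the summary(model5) output above
ss_state <- 1198169
ss_resid <- 6299266
df_state <- 3
df_resid <- 116

# F = mean square between states / mean square within states
f_stat  <- (ss_state / df_state) / (ss_resid / df_resid)       # about 7.35
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)  # about 0.00015

# Eta-squared: share of total variation in Price associated with State
eta_sq <- ss_state / (ss_state + ss_resid)                     # about 0.16
```

A common follow-up is a pairwise comparison such as `TukeyHSD()` on the fitted `aov` object, which may require State to be stored as a factor first.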
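Larger homes tend to be significantly more expensive in California: each additional square foot is associated with roughly $339 more in price (slope = 0.339 in $1,000s, p = 0.0005), and size alone explains about 36% of the variation in price.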
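The number of bedrooms alone does not meaningfully predict home prices in California: the slope is not statistically significant (p = 0.255), and bedrooms explain less than 5% of the variation in price (R-squared = 0.046).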
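Homes with more bathrooms tend to be more expensive in California: each additional bathroom is associated with about $195,000 more in price (slope = 194.74 in $1,000s, p = 0.004), and bathrooms alone explain about 26% of the variation in price.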
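When all three variables are considered together, size is the only significant predictor of California home prices at the 5% level (p = 0.026). Bedrooms (p = 0.624) and bathrooms (p = 0.284) add little once size is in the model, and the adjusted R-squared (0.321) is slightly lower than that of the size-only model (0.337).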
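Mean home prices differ significantly across the four states in the dataset (one-way ANOVA, F = 7.355 on 3 and 116 degrees of freedom, p = 0.0001). The ANOVA alone does not identify which states differ from one another.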