In this report, we take a look at houses for sale in California, New York, New Jersey, and Pennsylvania. We compare house prices while taking into consideration home size, number of bedrooms, number of bathrooms, and state. Below are the five questions we are going to explore:
home = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
homeCA = subset(home, State == "CA")
plot(Price ~ Size, data = homeCA,
main = "Price vs Size (only CA)",
xlab = "Size (sq ft)", ylab = "Price ($1000s)")
abline(lm(Price ~ Size, data = homeCA))
reg1 = lm(Price ~ Size, data = homeCA)
summary(reg1)
##
## Call:
## lm(formula = Price ~ Size, data = homeCA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
The regression model for California homes shows that the slope for Size is 0.33919. Since Price is recorded in $1000s, each additional 1000 square feet is associated with an average price increase of about $339,190. The p-value of 0.0004634 is well below 0.05, indicating a statistically significant linear relationship between home price and home size in California. The R-squared value of 0.3594 shows that 35.94% of the variation in California home prices is explained by home size.
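As a quick check on the interpretation above, the following sketch redoes the arithmetic with values copied by hand from the Price ~ Size summary output. The variable names are illustrative and not part of the original analysis, and the dollar conversion assumes that one unit of Size is one square foot, which is the reading the interpretation above relies on.

```r
# Values copied from the Price ~ Size summary output (not recomputed from the data)
slope = 0.33919      # change in Price ($1000s) per one unit of Size
slope_se = 0.08558   # standard error of the slope
r_squared = 0.3594   # Multiple R-squared

# Assumption: one unit of Size is one square foot.
# Price change for 1000 additional square feet, converted from $1000s to dollars
increase_dollars = slope * 1000 * 1000

# t statistic for the slope, and percent of price variation explained by Size
t_value = slope / slope_se
pct_explained = r_squared * 100

print(increase_dollars)
print(round(t_value, 3))
print(pct_explained)
```

The t statistic computed this way should agree with the `t value` column of the output up to rounding of the copied inputs.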
reg2 = lm(Price ~ Beds, data = homeCA)
summary(reg2)
##
## Call:
## lm(formula = Price ~ Beds, data = homeCA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The regression model for California homes shows that the slope for Beds is 84.77, meaning that each additional bedroom is associated with an estimated increase of approximately $84,770 in home price. However, the p-value for the slope is 0.255, which is well above the 0.05 significance level. We therefore fail to reject the null hypothesis: the sample does not provide evidence of a linear relationship between the number of bedrooms and home price in California. Although the estimated slope is positive, the large p-value means this association is not statistically significant in this sample, and the R-squared of 0.046 shows that bedrooms explain under 5% of the price variation.
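One way to see why this slope is not significant is to rebuild the p-value and a 95% confidence interval from the reported estimate and standard error. This is a minimal sketch using numbers copied from the Price ~ Beds output; the variable names are illustrative.

```r
# Values copied from the Price ~ Beds summary output
estimate = 84.77    # slope for Beds, in $1000s per bedroom
std_error = 72.91   # standard error of the slope
df_resid = 28       # residual degrees of freedom

# Two-sided p-value from the t distribution
t_value = estimate / std_error
p_value = 2 * pt(-abs(t_value), df_resid)

# 95% confidence interval for the slope
t_crit = qt(0.975, df_resid)
ci = c(lower = estimate - t_crit * std_error,
       upper = estimate + t_crit * std_error)

print(round(p_value, 3))
print(round(ci, 2))
```

The interval runs from a negative lower bound to a positive upper bound, so it contains zero, which is consistent with failing to reject the null hypothesis.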
reg3 = lm(Price ~ Baths, data = homeCA)
summary(reg3)
##
## Call:
## lm(formula = Price ~ Baths, data = homeCA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression model for CA homes shows that the slope for bathrooms is 194.74, meaning that each additional bathroom is associated with an estimated increase of about $194,740 in home price. The p-value for this slope is 0.004, which is below the 0.05 significance level. This indicates that the number of bathrooms has a statistically significant linear relationship with home price in California.
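To compare the three single-predictor models side by side, the sketch below collects the R-squared values and slope p-values reported in the outputs above into one table. The numbers are copied by hand from those outputs, and the data frame and column names are illustrative.

```r
# R-squared and slope p-values copied from the three simple regression outputs
simple_models = data.frame(
  predictor = c("Size", "Beds", "Baths"),
  r_squared = c(0.3594, 0.04605, 0.2588),
  p_value   = c(0.0004634, 0.2548, 0.004092)
)

# Flag slopes that are significant at the 0.05 level
simple_models$significant = simple_models$p_value < 0.05

# Sort so the predictor explaining the most price variation comes first
simple_models = simple_models[order(-simple_models$r_squared), ]
print(simple_models)

best = as.character(simple_models$predictor[1])
print(best)
```

Taken one at a time, Size explains the most variation in price, Baths comes second, and Beds explains very little.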
reg4 = lm(Price ~ Size + Beds + Baths, data = homeCA)
summary(reg4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = homeCA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
The p-value for size is 0.026, which is below 0.05, indicating that size remains a statistically significant predictor of home price after accounting for the number of bedrooms and bathrooms. The p-value for bedrooms is 0.624, which is far above 0.05, showing that the number of bedrooms is not a significant predictor of home price once size and bathrooms are included in the model. Lastly, the p-value for bathrooms is 0.284, which is also above 0.05, indicating that the number of bathrooms is not a significant predictor of home price in California after accounting for size and bedrooms. Overall, size is the only predictor with a statistically significant linear relationship with home price in this model: as size increases, price also tends to increase.
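The adjusted R-squared values give another view of the same conclusion. The sketch below applies the standard adjusted R-squared formula to the reported Multiple R-squared values. The helper name `adjusted_r2` is hypothetical, and n = 30 is inferred from the 28 residual degrees of freedom in the one-predictor models.

```r
# Adjusted R-squared: penalizes R-squared for the number of predictors p
adjusted_r2 = function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n = 30  # inferred: 28 residual df + 2 estimated coefficients (intercept, slope)

adj_size_only = adjusted_r2(0.3594, n, p = 1)  # Price ~ Size
adj_full      = adjusted_r2(0.3912, n, p = 3)  # Price ~ Size + Beds + Baths

print(round(adj_size_only, 4))
print(round(adj_full, 4))
print(adj_full < adj_size_only)
```

Adding Beds and Baths raises the Multiple R-squared slightly but lowers the adjusted R-squared, which suggests they add little explanatory power beyond Size.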
anova5 = aov(Price ~ State, data = home)
boxplot(Price ~ State, data = home,
main = "Home Prices by State",
xlab = "State", ylab = "Price ($1000s)")
summary(anova5)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA results show that average home price differs significantly across states. The p-value of 0.000148 is far below the 0.05 significance level, so we reject the null hypothesis that all four states have the same mean home price. The ANOVA alone does not say which states differ, so we also include a boxplot to compare prices visually across states. It suggests that California homes tend to be priced highest among the four states, while Pennsylvania homes tend to be priced lowest.
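The F statistic and p-value in the ANOVA table can be reconstructed from the sums of squares and degrees of freedom. This is a minimal sketch using values copied from the table; the variable names are illustrative.

```r
# Values copied from the summary(anova5) table
ss_state = 1198169   # between-state sum of squares
df_state = 3         # number of states minus 1
ss_resid = 6299266   # within-state (residual) sum of squares
df_resid = 116       # residual degrees of freedom

# Mean squares are sums of squares divided by their degrees of freedom
ms_state = ss_state / df_state
ms_resid = ss_resid / df_resid

# F is the ratio of between-state to within-state variability
f_value = ms_state / ms_resid
p_value = pf(f_value, df_state, df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_value, 3))
```

A large F value means the state means vary much more than would be expected from the price variation within states alone.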