We use the data from the HomesForSale dataset to examine factors that affect home prices in the states of California, New York, New Jersey, and Pennsylvania.
Q1. Use only California data. How much does size influence price?
Q2. Use only California data. How does the number of bedrooms influence price?
Q3. Use only California data. How does the number of bathrooms influence price?
Q4. Use only California data. How do size, bedrooms, and bathrooms jointly influence price?
Q5. Are there significant differences in home prices among states (CA, NJ, NY, PA)?
We will explore the questions in detail.
CA <- subset(home, State == "CA")
model1 <- lm(Price ~ Size, data = CA)
summary(model1)
##
## Call:
## lm(formula = Price ~ Size, data = CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
The regression model for California homes is: Price = −56.82 + 0.339 * Size
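The slope estimate is 0.339. Assuming Price is recorded in thousands of dollars, as the magnitudes of the estimates suggest, each additional square foot is associated with an increase of about $339 in expected home price. The slope is statistically significant (p = 0.000463), and size explains about 36% of the variation in California home prices (R-squared = 0.3594). This shows that size has a positive influence on home price.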
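As a rough check on what this slope implies, the sketch below plugs the estimates reported by summary(model1) into the fitted line and builds a 95% confidence interval for the slope. The 2,000-square-foot home is a hypothetical example, and Price is assumed to be in thousands of dollars.

```r
# Estimates copied from the summary(model1) output above
b0 <- -56.81675    # intercept
b1 <- 0.33919      # slope for Size
se_b1 <- 0.08558   # standard error of the slope
df_resid <- 28     # residual degrees of freedom

# Predicted price for a hypothetical 2,000 sq ft home (assumed units: $1,000s)
pred_2000 <- b0 + b1 * 2000

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit <- qt(0.975, df_resid)
ci_slope <- b1 + c(-1, 1) * t_crit * se_b1

print(round(pred_2000, 2))
print(round(ci_slope, 3))
```

The predicted price comes out to about 621.6, and the interval for the slope stays above zero, which agrees with the small p-value in the output. With the data loaded, confint(model1) should give the same interval without the manual arithmetic.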
model2 <- lm(Price ~ Beds, data = CA)
summary(model2)
##
## Call:
## lm(formula = Price ~ Beds, data = CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The regression model using only California data is: Price = 269.76 + 84.77 * Beds
The p-value for the slope is 0.255, which is greater than 0.05, so there is no statistically significant evidence that the number of bedrooms is related to home price in California. Bedrooms also explain less than 5% of the variation in price (R-squared = 0.046). Based on this dataset, bedrooms do not significantly influence price.
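To show where the p-value for Beds comes from, the sketch below recomputes the t statistic and its two-sided p-value from the estimate and standard error printed by summary(model2). It only reuses the reported numbers and does not refit the model.

```r
# Values copied from the Beds row of the summary(model2) output above
est <- 84.77       # slope estimate for Beds
se <- 72.91        # standard error of the slope
df_resid <- 28     # residual degrees of freedom

# t statistic for H0: slope = 0
t_stat <- est / se

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

print(round(t_stat, 3))
print(round(p_value, 3))
```

The recomputed values should match the t value and Pr(>|t|) columns in the output up to rounding.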
model3 <- lm(Price ~ Baths, data = CA)
summary(model3)
##
## Call:
## lm(formula = Price ~ Baths, data = CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression model using only California data is: Price = 90.71 + 194.74 * Baths
The p-value for the slope is 0.00409, which is less than 0.05. This means there is strong statistically significant evidence that the number of bathrooms is related to home price in California. Based on this dataset, bathrooms do significantly influence price: homes with more bathrooms tend to be more expensive.
model4 <- lm(Price ~ Size + Beds + Baths, data = CA)
summary(model4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
The multiple regression model using only California data is: Price = -41.56 + 0.2811 * Size - 33.70 * Beds + 83.98 * Baths
The p-value for Size is 0.0259, which is less than 0.05, so size is a significant predictor of home price when controlling for bedrooms and bathrooms.
The p-value for Beds is 0.6239, which is greater than 0.05, so the number of bedrooms is not a significant predictor in the multiple regression model.
The p-value for Baths is 0.2839, also greater than 0.05, meaning bathrooms are not a significant predictor after accounting for size and bedrooms.
Size is therefore the only variable that significantly influences price in this model.
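One way to ask whether Beds and Baths together add anything beyond Size is a partial F-test comparing model1 with model4. The sketch below approximates that test from the residual standard errors printed above, so the result is only as precise as those rounded values; it is an illustration, not part of the original analysis.

```r
# Residual standard errors and degrees of freedom copied from the outputs above
rse_small <- 219.3; df_small <- 28   # model1: Price ~ Size
rse_full  <- 221.8; df_full  <- 26   # model4: Price ~ Size + Beds + Baths

# Residual sum of squares = (residual standard error)^2 * residual df
rss_small <- rse_small^2 * df_small
rss_full  <- rse_full^2 * df_full

# Partial F statistic for the two extra predictors (Beds and Baths)
extra_terms <- df_small - df_full
f_partial <- ((rss_small - rss_full) / extra_terms) / (rss_full / df_full)
p_partial <- pf(f_partial, extra_terms, df_full, lower.tail = FALSE)

print(round(f_partial, 2))
print(round(p_partial, 2))
```

The F statistic is below 1 and the p-value is far above 0.05, which is consistent with Beds and Baths not improving on the size-only model. With the data loaded, anova(model1, model4) should run the same comparison exactly.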
model5 <- lm(Price ~ State, data = home)
anova(model5)
## Analysis of Variance Table
##
## Response: Price
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## State       3 1198169  399390  7.3547 0.0001482 ***
## Residuals 116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model5)
##
## Call:
## lm(formula = Price ~ State, data = home)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -390.37 -166.77  -47.05   89.48  884.67
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   535.37      42.55  12.583  < 2e-16 ***
## StateNJ      -206.83      60.17  -3.438 0.000816 ***
## StateNY      -170.03      60.17  -2.826 0.005553 **
## StatePA      -269.80      60.17  -4.484 1.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 233 on 116 degrees of freedom
## Multiple R-squared: 0.1598, Adjusted R-squared: 0.1381
## F-statistic: 7.355 on 3 and 116 DF, p-value: 0.0001482
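An ANOVA model was used to test whether mean home prices differ among the four states. The ANOVA p-value is 0.0001482, which is far below 0.05. This indicates strong statistically significant evidence that average home prices are not the same across the states, so state location has a significant effect on home price. In the coefficient table, California is the baseline state, and the estimates for NJ, NY, and PA are all negative and significant, meaning that mean prices in each of those states are significantly lower than in California.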
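To make the state comparison concrete, the sketch below recovers each state's mean price from the summary(model5) coefficients (California is the baseline, so the other coefficients are differences from it) and recomputes the F statistic from the ANOVA table. It only reuses the reported numbers.

```r
# Coefficients copied from the summary(model5) output above
intercept <- 535.37                                   # mean price for CA (baseline)
offsets <- c(NJ = -206.83, NY = -170.03, PA = -269.80)

# Each state's mean = baseline mean + that state's coefficient
state_means <- c(CA = intercept, intercept + offsets)

# F statistic = mean square for State / residual mean square (from anova(model5))
ms_state <- 399390
ms_resid <- 54304
f_stat <- ms_state / ms_resid
p_check <- pf(f_stat, 3, 116, lower.tail = FALSE)

print(state_means)
print(round(f_stat, 4))
print(signif(p_check, 4))
```

The means put California highest and Pennsylvania lowest. The ANOVA itself only says that at least one state differs; if pairwise comparisons among all four states are wanted and the data are loaded, TukeyHSD(aov(Price ~ State, data = home)) is one standard follow-up.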
Overall, these results show that home size is the strongest and most consistent factor affecting price in California. In both the simple and multiple regression models, larger homes tended to be more expensive, and size was the only variable that stayed significant once everything was included together.
Bedrooms were not a significant predictor of price in any of the models, and bathrooms were significant only when considered by themselves. Once size was added to the model, the bathroom effect was no longer significant, suggesting that bathroom count overlaps with home size in explaining price.
The ANOVA test comparing all four states showed that average home prices are not the same across states. This means that location plays a clear role in pricing as well.
Based on this dataset, bigger homes cost more, and prices vary a lot by state, while bedroom and bathroom counts matter less than size.
# Q1 code: CA <- subset(home, State == "CA")
# Q1 code: model1 <- lm(Price ~ Size, data = CA)
# Q1 code: summary(model1)
# Q2 code: model2 <- lm(Price ~ Beds, data = CA)
# Q2 code: summary(model2)
# Q3 code: model3 <- lm(Price ~ Baths, data = CA)
# Q3 code: summary(model3)
# Q4 code: model4 <- lm(Price ~ Size + Beds + Baths, data = CA)
# Q4 code: summary(model4)
# Q5 code: model5 <- lm(Price ~ State, data = home)
# Q5 code: anova(model5)
# Q5 code: summary(model5)