Introduction

We use the data from the HomesForSale dataset to examine factors that affect home prices in the states of California, New York, New Jersey, and Pennsylvania.

Analysis

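We address five questions in turn, using simple and multiple linear regression for California homes and a one-way ANOVA across all four states.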

Q1: Use only California data. How much does size influence price?

CA <- subset(home, State == "CA")

model1 <- lm(Price ~ Size, data = CA)
summary(model1)
## 
## Call:
## lm(formula = Price ~ Size, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

The regression model for California homes is: Price = −56.82 + 0.339 * Size

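The slope estimate is 0.339 with a p-value of 0.000463, so size is a statistically significant predictor of price. Assuming Price is recorded in thousands of dollars and Size in square feet, each additional square foot is associated with an increase of about 0.339 thousand dollars, or about $339, in expected price. Size explains about 36% of the variation in California home prices (R-squared = 0.3594).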
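To make the fitted line concrete, the sketch below plugs a few sizes into the equation using the coefficients reported in the summary output. The three sizes are hypothetical values chosen only for illustration, and the price units are assumed to be thousands of dollars.

```r
# Coefficients copied from the summary(model1) output
intercept <- -56.81675
slope <- 0.33919

# Hypothetical home sizes in square feet (illustration only, not from the data)
sizes <- c(1500, 2000, 2500)

# Predicted price from the fitted line: Price = intercept + slope * Size
predicted <- intercept + slope * sizes
print(data.frame(Size = sizes, PredictedPrice = round(predicted, 2)))

# Expected price difference between two homes that are 500 sq ft apart
diff_500 <- slope * 500
print(diff_500)
```

With the model object available, `predict(model1, newdata = data.frame(Size = sizes))` should give the same point predictions. Predictions for sizes outside the range observed in the California sample should be treated with caution.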

Q2: Use only California data. How does the number of bedrooms influence price?

model2 <- lm(Price ~ Beds, data = CA)
summary(model2)
## 
## Call:
## lm(formula = Price ~ Beds, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The regression model using only California data is: Price = 269.76 + 84.77 * Beds

The p-value for the slope is 0.255, which is greater than 0.05. There is therefore no statistically significant evidence that the number of bedrooms is related to home price in California. Based on this dataset, bedrooms alone do not significantly influence price, and they explain less than 5% of the variation in price (R-squared = 0.04605).
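One way to see why the bedroom slope is not significant is to build a 95% confidence interval from the reported estimate and standard error. This is a supplementary sketch that uses only the numbers printed in the output, not the raw data.

```r
# Values copied from the summary(model2) output
estimate  <- 84.77   # slope for Beds
std_error <- 72.91   # standard error of the slope
df_resid  <- 28      # residual degrees of freedom

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate +/- t_crit * standard error
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 2))

# An interval that contains zero is consistent with a non-significant slope
contains_zero <- ci[1] < 0 && ci[2] > 0
print(contains_zero)
```

With the fitted model in memory, `confint(model2)` should reproduce this interval up to rounding of the printed estimates.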

Q3: Use only California data. How does the number of bathrooms influence price?

model3 <- lm(Price ~ Baths, data = CA)
summary(model3)
## 
## Call:
## lm(formula = Price ~ Baths, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The regression model using only California data is: Price = 90.71 + 194.74 * Baths

The p-value for the slope is 0.00409, which is less than 0.05, so there is statistically significant evidence that the number of bathrooms is related to home price in California. Each additional bathroom is associated with an increase of about 194.74 in expected price, in the dataset's price units, and bathrooms explain about 26% of the variation in price (R-squared = 0.2588). Based on this dataset, homes with more bathrooms tend to be more expensive.

Q4: Use only California data. How do size, bedrooms, and bathrooms jointly influence price?

model4 <- lm(Price ~ Size + Beds + Baths, data = CA)
summary(model4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

The multiple regression model using only California data is: Price = -41.56 + 0.2811 * Size - 33.70 * Beds + 83.98 * Baths

The p-value for Size is 0.0259, which is less than 0.05, so size is a significant predictor of home price when controlling for bedrooms and bathrooms.

The p-value for Beds is 0.6239, which is greater than 0.05, so the number of bedrooms is not a significant predictor in the multiple regression model.

The p-value for Baths is 0.2839, also greater than 0.05, meaning bathrooms are not a significant predictor after accounting for size and bedrooms.

Size is therefore the only variable that remains a significant predictor of price in this model.
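As a supplementary check that is not part of the original analysis, a partial F-test can ask whether bedrooms and bathrooms together add explanatory power beyond size. The sketch below computes it from the R-squared values reported for the size-only model and the three-predictor model, so the result is approximate because those values are rounded.

```r
# R-squared values copied from the two summary outputs
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

q       <- 2    # number of predictors added (Beds and Baths)
df_full <- 26   # residual degrees of freedom of the full model

# Partial F statistic: gain in R-squared per added term,
# relative to the unexplained variation per residual degree of freedom
f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)

# Upper-tail probability under the F distribution with (q, df_full) df
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
```

With both fitted models available, `anova(model1, model4)` performs the same comparison directly. A large p-value here would agree with the individual t-tests: once size is in the model, the other two variables contribute little.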

Q5: Are there significant differences in home prices among states (CA, NJ, NY, PA)?

model5 <- lm(Price ~ State, data = home)
anova(model5)
## Analysis of Variance Table
## 
## Response: Price
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## State       3 1198169  399390  7.3547 0.0001482 ***
## Residuals 116 6299266   54304                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model5)
## 
## Call:
## lm(formula = Price ~ State, data = home)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -390.37 -166.77  -47.05   89.48  884.67 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   535.37      42.55  12.583  < 2e-16 ***
## StateNJ      -206.83      60.17  -3.438 0.000816 ***
## StateNY      -170.03      60.17  -2.826 0.005553 ** 
## StatePA      -269.80      60.17  -4.484 1.73e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 233 on 116 degrees of freedom
## Multiple R-squared:  0.1598, Adjusted R-squared:  0.1381 
## F-statistic: 7.355 on 3 and 116 DF,  p-value: 0.0001482

A one-way ANOVA was used to test whether mean home prices differ among the four states. The ANOVA p-value is 0.0001482, which is far below 0.05. This is strong evidence that average home prices are not the same across the states, so state location has a significant effect on home price. In the coefficient table, California is the baseline, and the estimates for New Jersey, New York, and Pennsylvania are all negative, so each of those states has a lower mean price than California.
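The coefficient table can be turned back into state means, which makes the differences easier to read. The sketch below uses only the printed estimates and relies on R's default treatment coding, under which the intercept is the mean of the baseline level (CA here, since no StateCA term appears). It also recomputes the F statistic from the mean squares in the ANOVA table.

```r
# Estimates copied from the summary(model5) output
baseline_mean <- 535.37                      # intercept = mean price for CA
offsets <- c(NJ = -206.83, NY = -170.03, PA = -269.80)

# Each state's mean is the baseline mean plus that state's coefficient
state_means <- c(CA = baseline_mean, baseline_mean + offsets)
print(round(state_means, 2))

# F statistic = mean square for State / mean square for residuals
ms_state    <- 399390
ms_residual <- 54304
f_stat <- ms_state / ms_residual
print(round(f_stat, 4))
```

The ANOVA shows only that at least one state mean differs. Identifying which pairs differ would need a follow-up such as `TukeyHSD(aov(Price ~ State, data = home))`, which is not part of the original analysis.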

Summary

Overall, these results show that home size is the strongest and most consistent factor affecting price in California. In both the simple and multiple regression models, larger homes tended to be more expensive, and size was the only variable that stayed significant once everything was included together.

Bedrooms were not a statistically significant predictor of price in any of the models, and bathrooms were significant only when considered by themselves. Once size and bedrooms were included in the model, bathrooms were no longer significant (p = 0.2839), suggesting that bathroom count overlaps with home size in explaining price.

The ANOVA test comparing all four states showed that average home prices are not the same across states. This means that location plays a clear role in pricing as well.

Based on this dataset, bigger homes cost more and mean prices differ by state, with California having the highest mean price, while bedroom and bathroom counts matter less than size.

Appendix

# Q1 code: CA <- subset(home, State == "CA")
# Q1 code: model1 <- lm(Price ~ Size, data = CA)
# Q1 code: summary(model1)

# Q2 code: model2 <- lm(Price ~ Beds, data = CA)
# Q2 code: summary(model2)

# Q3 code: model3 <- lm(Price ~ Baths, data = CA)
# Q3 code: summary(model3)

# Q4 code: model4 <- lm(Price ~ Size + Beds + Baths, data = CA)
# Q4 code: summary(model4)

# Q5 code: model5 <- lm(Price ~ State, data = home)
# Q5 code: anova(model5)
# Q5 code: summary(model5)