Introduction

The HomesForSale dataset contains information on homes listed for sale in four U.S. states: California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA). Key variables include the home's asking price (in thousands of dollars), size (in square feet), and the number of bedrooms and bathrooms. Using this dataset, we apply simple linear regression, multiple regression, and ANOVA to evaluate how home characteristics relate to price and whether average prices differ significantly across states. The analysis first uses the California subset to examine how size, bedrooms, and bathrooms individually and jointly predict asking price, and then uses all four states to assess whether location has a significant effect on price.

Data:

data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(data)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Analysis:

  1. Use the data only for California. How much does the size of a home influence its price?
CA_data <- subset(data, State == "CA")
model1 <- lm(Price ~ Size, data = CA_data)
summary(model1)
## 
## Call:
## lm(formula = Price ~ Size, data = CA_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
plot(CA_data$Size, CA_data$Price, xlab = "Size (sq. ft.)", ylab = "Price ($1,000s)", main = "CA: Price vs. Size")
abline(model1, col="blue")

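The estimated slope for Size is 0.339. Because Price is recorded in $1,000s and Size in square feet, each additional square foot is associated with an average increase of about $339 in asking price, or roughly $339,000 per additional 1,000 sq. ft., in the California sample. The slope is highly statistically significant (p ≈ 0.000463), and R² ≈ 0.359 means size alone explains about 36% of the price variation in the California subset. The intercept (-56.8) is not significantly different from zero (p ≈ 0.716) and has no practical interpretation, since no home has zero square feet.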
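A point estimate alone does not show how precisely the slope is pinned down. The following is a minimal sketch that computes a 95% confidence interval for the Size slope by hand, using only the estimate, standard error, and residual degrees of freedom copied from the printed summary; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the summary(model1) output
est <- 0.33919   # slope for Size, in $1,000s per square foot
se  <- 0.08558   # standard error of the slope
df  <- 28        # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

# Convert from $1,000s per sq. ft. to dollars per sq. ft.
ci_dollars <- ci * 1000
print(round(ci_dollars))
```

The interval runs from roughly $164 to $514 per additional square foot, so the association is clearly positive but its size is uncertain with only 30 homes. When the fitted object is available, `confint(model1)` should give the same interval without retyping the numbers.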

  2. Use the data only for California. How does the number of bedrooms of a home influence its price?
model2 <- lm(Price ~ Beds, data = CA_data)
summary(model2)
## 
## Call:
## lm(formula = Price ~ Beds, data = CA_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The estimated slope for Beds is about 84.77 (i.e., roughly $84,770 per additional bedroom), but it is not statistically significant (p ≈ 0.255). We therefore lack sufficient evidence that asking price is linearly related to the number of bedrooms in the California sample, and bedrooms alone explain less than 5% of the price variation (R² ≈ 0.046).

  3. Use the data only for California. How does the number of bathrooms of a home influence its price?
model3 <- lm(Price ~ Baths, data = CA_data)
summary(model3)
## 
## Call:
## lm(formula = Price ~ Baths, data = CA_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The estimated slope for Baths is about 194.74 (i.e., ~$194,740 per additional bathroom), and this effect is statistically significant (p ≈ 0.0041). Thus, in the California sample, more bathrooms are associated with higher asking prices (strong evidence of a positive relationship); Baths alone explains about 26% of price variation (R² ≈ 0.259).

  4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
model4 <- lm(Price ~ Size + Beds + Baths, data = CA_data)
summary(model4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

When Size, Beds, and Baths are included together, Size is the only predictor with a statistically significant slope (estimate ≈ 0.281, p ≈ 0.0259); Beds (p ≈ 0.624) and Baths (p ≈ 0.284) are not significant in the multiple regression. Thus, after controlling for size, the numbers of bedrooms and bathrooms do not add significant linear predictive power for price in this California sample. The multiple regression explains about 39% of the price variation (R² ≈ 0.391), and the overall model is significant (p ≈ 0.00435). However, the adjusted R² (0.321) is slightly lower than that of the size-only model (0.337), which indicates that the two extra predictors do not improve the fit enough to justify the added complexity.
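The individual t-tests look at Beds and Baths one at a time. Whether the two together add anything beyond Size can be checked with a partial F-test for nested models. The sketch below approximates that test from the rounded R² values printed in the two summaries, so the result is approximate; the variable names are illustrative.

```r
# R-squared values copied from the printed summaries (rounded)
r2_reduced <- 0.3594  # Price ~ Size
r2_full    <- 0.3912  # Price ~ Size + Beds + Baths

q       <- 2   # number of predictors added (Beds, Baths)
df_full <- 26  # residual degrees of freedom of the full model

# Partial F statistic for comparing nested linear models
f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)

# Upper-tail p-value from the F distribution
p_val <- pf(f_stat, q, df_full, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_val), 3))
```

The F statistic is well below 1 and the p-value is around 0.5, which is consistent with the conclusion that Beds and Baths jointly add little once Size is in the model. With the fitted objects in memory, `anova(model1, model4)` performs the same comparison on the unrounded sums of squares.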

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.
data$State <- as.factor(data$State)
anova_model <- aov(Price ~ State, data = data)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA test for State gives F ≈ 7.355 on 3 and 116 degrees of freedom with p ≈ 0.000148, so there is very strong evidence that mean asking prices are not all equal across the four states (CA, NY, NJ, PA). In other words, asking price is significantly associated with the state in which a home is located: at least one state mean differs from the others. The ANOVA by itself does not identify which states differ.
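A significant F-test does not say how much of the price variation is attributable to State. One common effect-size measure is eta-squared, the between-group sum of squares divided by the total sum of squares. The sketch below computes it from the sums of squares in the printed ANOVA table and re-derives the F statistic and p-value as a consistency check; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_state <- 1198169
ss_resid <- 6299266
df_state <- 3
df_resid <- 116

# Eta-squared: share of total price variation associated with State
eta_sq <- ss_state / (ss_state + ss_resid)

# Recompute F and its p-value from the mean squares
f_stat <- (ss_state / df_state) / (ss_resid / df_resid)
p_val  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

print(round(c(eta_sq = eta_sq, F = f_stat), 3))
```

Eta-squared comes out to about 0.16, so State accounts for roughly 16% of the variation in asking price across all 120 homes. To see which pairs of states differ, a standard follow-up is `TukeyHSD(anova_model)` from base R, which was not run in the original analysis.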

Summary:

Our analysis showed that home size is the strongest and most consistent predictor of asking price in California, with larger homes listed at substantially higher prices. The number of bathrooms also showed a significant positive relationship with price in a simple regression, but it was no longer significant once size was controlled for in the multiple regression. The number of bedrooms was not a significant predictor in either the simple or the multiple regression. When all states were included, the ANOVA revealed significant differences in average asking price among the four states, indicating that location is meaningfully associated with home values. Overall, the findings suggest that, in this sample, size and geographic location are the factors most strongly associated with price variation, while bedroom and bathroom counts play a more limited role once size is taken into account.