Introduction

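This project uses the HomesForSale dataset and linear regression to explore how home characteristics (size, number of bedrooms, and number of bathrooms) relate to price among California homes. We also use a one-way ANOVA to test whether mean home price differs significantly across four states (CA, NY, NJ, and PA).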

Load the Dataset

# Load dataset
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")

# View column names
names(home)
## [1] "State" "Price" "Size"  "Beds"  "Baths"
# Preview
head(home)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Question 1 – Influence of Size on Price (California)

# Filter for California homes
ca_homes <- subset(home, State == "CA")

# Linear regression: Price vs Size
model1 <- lm(Price ~ Size, data = ca_homes)
summary(model1)
## 
## Call:
## lm(formula = Price ~ Size, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

Interpretation:
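The Size slope (0.339) is the estimated change in price per additional square foot. Price is recorded in thousands of dollars, so this is roughly $339 per square foot. The p-value for Size (0.00046) is well below 0.05, so size is a statistically significant predictor of price for California homes. Size alone explains about 36% of the variation in price (R² = 0.359).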
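
The sketch below makes the slope concrete. It plugs the printed Question 1 coefficients into the fitted line for a hypothetical 2,000 sq ft California home, then builds a 95% confidence interval for the slope by hand. The coefficient values are copied from the summary output above, and `size_sqft` is a hypothetical input chosen for illustration. On the fitted object, `predict(model1, data.frame(Size = 2000))` and `confint(model1)` should give the same numbers up to rounding.

```r
# Coefficients copied from summary(model1) above
intercept <- -56.81675
slope     <- 0.33919
slope_se  <- 0.08558
df_resid  <- 28

# Hypothetical example: a 2,000 sq ft California home
size_sqft  <- 2000
pred_price <- intercept + slope * size_sqft  # in thousands of dollars

# 95% confidence interval for the slope: estimate +/- t* x SE
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

cat("Predicted price (thousands of $):", round(pred_price, 1), "\n")
cat("95% CI for slope:", round(slope_ci, 3), "\n")
```

The interval lies entirely above zero, which matches the small p-value for Size.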


Question 2 – Influence of Number of Bedrooms on Price (California)

# Linear regression: Price vs Beds
model2 <- lm(Price ~ Beds, data = ca_homes)
summary(model2)
## 
## Call:
## lm(formula = Price ~ Beds, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

Interpretation:
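The estimated slope for Beds is 84.77 (thousands of dollars per additional bedroom). However, its p-value (0.255) is well above 0.05, so the number of bedrooms is not a statistically significant predictor of price for California homes in this sample. Bedrooms explain less than 5% of the variation in price (R² = 0.046).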
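
The next sketch shows where the Beds p-value comes from. It divides the estimate by its standard error to get the t statistic, then looks that value up in a t distribution with 28 residual degrees of freedom. The inputs are the rounded values copied from the summary above, so the results match the printed output only approximately. With a single predictor, the overall F statistic is the square of this t statistic.

```r
# Values copied from summary(model2) above
beds_est <- 84.77
beds_se  <- 72.91
df_resid <- 28

# t statistic: estimate divided by its standard error
t_beds <- beds_est / beds_se

# Two-sided p-value from the t distribution with 28 df
p_beds <- 2 * pt(-abs(t_beds), df = df_resid)

# With a single predictor, the overall F statistic equals t^2
f_beds <- t_beds^2

cat("t =", round(t_beds, 3), " p =", round(p_beds, 3), " F =", round(f_beds, 3), "\n")
```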


Question 3 – Influence of Number of Bathrooms on Price (California)

# Linear regression: Price vs Baths
model3 <- lm(Price ~ Baths, data = ca_homes)
summary(model3)
## 
## Call:
## lm(formula = Price ~ Baths, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

Interpretation:
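The Baths slope (194.74) estimates that each additional bathroom is associated with a price about 195 thousand dollars higher. The p-value (0.0041) is below 0.05, so the number of bathrooms is a statistically significant predictor of price on its own. Bathrooms explain about 26% of the variation in price (R² = 0.259).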
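
The sketch below gauges how precisely the Baths slope is estimated. It computes a 95% confidence interval from the printed estimate and standard error. It also recovers the correlation between Baths and Price from R², because in simple regression R² is the squared correlation. Values are copied (rounded) from the summary above. `confint(model3)` and `cor(ca_homes$Baths, ca_homes$Price)` on the actual data should agree up to rounding.

```r
# Values copied from summary(model3) above
baths_est <- 194.74
baths_se  <- 62.28
df_resid  <- 28

# 95% CI for the Baths slope (thousands of dollars per bathroom)
t_crit   <- qt(0.975, df = df_resid)
baths_ci <- baths_est + c(-1, 1) * t_crit * baths_se

# In simple regression, R^2 is the squared correlation;
# the correlation takes the sign of the slope (positive here)
r_baths <- sqrt(0.2588)

cat("95% CI:", round(baths_ci, 1), " r =", round(r_baths, 3), "\n")
```

The interval is wide (roughly 67 to 322 thousand dollars per bathroom), but it excludes zero.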


Question 4 – Joint Influence of Size, Bedrooms, and Bathrooms (California)

# Multiple regression: Price vs Size, Beds, and Baths together
model4 <- lm(Price ~ Size + Beds + Baths, data = ca_homes)
summary(model4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_homes)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

Interpretation:
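Each p-value tests whether that variable helps predict price after adjusting for the other two. Only Size remains significant (p = 0.026). Beds (p = 0.62) and Baths (p = 0.28) do not. Baths was significant on its own in Question 3, so losing significance here suggests that it carries much of the same information as Size. The model as a whole is significant (F = 5.57, p = 0.0044). However, its adjusted R² (0.321) is slightly lower than that of the Size-only model (0.337), so adding Beds and Baths does not improve the fit. A small p-value indicates evidence of an association, not the strength of the influence.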
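
A partial F-test gives a direct check of whether Beds and Baths add anything beyond Size. It compares the Size-only model from Question 1 with the three-predictor model. The sketch below computes the test from the rounded R² values printed above, so it is approximate. On the fitted objects, `anova(model1, model4)` performs the exact nested-model comparison.

```r
# R^2 values copied from summary(model1) and summary(model4) above
r2_reduced <- 0.3594  # Price ~ Size
r2_full    <- 0.3912  # Price ~ Size + Beds + Baths
n_extra    <- 2       # Beds and Baths added to the reduced model
df_full    <- 26      # residual df of the full model

# Partial F statistic for adding Beds and Baths to a model that already has Size
f_partial <- ((r2_full - r2_reduced) / n_extra) / ((1 - r2_full) / df_full)
p_partial <- pf(f_partial, n_extra, df_full, lower.tail = FALSE)

cat("Partial F =", round(f_partial, 3), " p =", round(p_partial, 3), "\n")
```

An F statistic below 1 with a large p-value indicates no evidence that Beds and Baths improve on Size alone.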


Question 5 – Do Home Prices Differ Across CA, NY, NJ, and PA?

# Filter data for the 4 states
homes_4states <- subset(home, State %in% c("CA", "NY", "NJ", "PA"))

# One-way ANOVA: does mean Price differ by State?
anova_model <- aov(Price ~ State, data = homes_4states)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:
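The ANOVA tests whether mean home price is the same in all four states. The p-value (0.000148) is far below 0.05, so we reject that hypothesis: mean price differs for at least one state. ANOVA does not identify which states differ. Because this is an observational comparison, it shows an association between state and price rather than a causal effect.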
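
The sketch below rebuilds the ANOVA table arithmetic from the printed sums of squares and degrees of freedom. Each mean square is a sum of squares divided by its degrees of freedom. F is the ratio of the between-state mean square to the within-state mean square. The values are copied from the summary above.

```r
# Sums of squares and df copied from summary(anova_model) above
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F compares between-state variation to within-state variation
f_stat <- ms_state / ms_resid
p_val  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", signif(p_val, 3), "\n")
```

To see which pairs of states differ, `TukeyHSD(anova_model)` can be run on the fitted `aov` object. It reports pairwise differences in means with multiplicity-adjusted p-values.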


Conclusion

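In California, home size is a significant predictor of price on its own (p < 0.001). It remains the only significant predictor once bedrooms and bathrooms are included. The number of bathrooms is significant alone but not after adjusting for size, and the number of bedrooms is not significant in either model. A one-way ANOVA shows that mean home price differs significantly across CA, NY, NJ, and PA (p < 0.001).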