This project uses the HomesForSale dataset to examine how home characteristics (size, number of bedrooms, and number of bathrooms) relate to price for homes in California. It also tests whether mean home prices differ significantly among four states: CA, NY, NJ, and PA.
# Load dataset
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
# View column names
names(home)
## [1] "State" "Price" "Size" "Beds" "Baths"
# Preview
head(home)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
# Filter for California homes
ca_homes <- subset(home, State == "CA")
# Linear regression: Price vs Size
model1 <- lm(Price ~ Size, data = ca_homes)
summary(model1)
##
## Call:
## lm(formula = Price ~ Size, data = ca_homes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
Interpretation:
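The slope for Size estimates how much price changes per additional
square foot: about 0.339 in the units of Price. The p-value (0.000463)
is below 0.05, so size is a statistically significant predictor of
price for California homes. Size alone explains about 36% of the
variation in price (R-squared = 0.3594).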
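As a rough sketch of how the slope could be examined further, the code below pulls out the slope, a 95% confidence interval for it, and a predicted price for one home size. The `toy` data frame and the 1,800 square-foot example are made up for illustration and are not taken from HomesForSale; the same calls should work on `model1`.

```r
# Toy data for illustration only; these values are NOT from HomesForSale.
toy <- data.frame(
  Size  = c(1000, 1500, 2000, 2500, 3000),
  Price = c(300, 420, 610, 700, 880)
)

# Simple linear regression, same form as model1
fit <- lm(Price ~ Size, data = toy)

# Point estimate of the slope: change in Price per one-unit change in Size
slope <- unname(coef(fit)["Size"])

# 95% confidence interval for the slope (1 x 2 matrix: lower, upper)
slope_ci <- confint(fit, "Size", level = 0.95)

# Predicted Price for a hypothetical 1,800 square-foot home
pred <- predict(fit, newdata = data.frame(Size = 1800))

print(slope)
print(slope_ci)
print(pred)
```

With the real data, `confint(model1, "Size")` would give the interval for the California slope reported above.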
# Linear regression: Price vs Beds
model2 <- lm(Price ~ Beds, data = ca_homes)
summary(model2)
##
## Call:
## lm(formula = Price ~ Beds, data = ca_homes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
Interpretation:
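The p-value for Beds tests whether the slope for number of bedrooms
differs from zero. Here it is 0.255, above 0.05, so this model gives no
evidence of a linear relationship between bedrooms and price for
California homes. Beds alone explains under 5% of the variation in
price (R-squared = 0.04605).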
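The p-value can also be read programmatically from the coefficient table rather than from the printed summary. This is a minimal sketch on made-up numbers (the `toy` data frame is hypothetical, not HomesForSale), with 0.05 assumed as the significance level.

```r
# Toy data for illustration only; these values are NOT from HomesForSale.
toy <- data.frame(
  Beds  = c(2, 3, 3, 4, 4, 5),
  Price = c(400, 350, 600, 380, 700, 520)
)

fit <- lm(Price ~ Beds, data = toy)

# Coefficient table as a matrix with columns:
# "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coefs <- coef(summary(fit))

# p-value for the Beds slope
p_beds <- coefs["Beds", "Pr(>|t|)"]

# Compare against the assumed significance level
alpha <- 0.05
significant <- p_beds < alpha

print(p_beds)
print(significant)
```

Replacing `fit` with `model2` should return the 0.255 shown in the summary above.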
# Linear regression: Price vs Baths
model3 <- lm(Price ~ Baths, data = ca_homes)
summary(model3)
##
## Call:
## lm(formula = Price ~ Baths, data = ca_homes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
Interpretation:
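The slope estimates that each additional bathroom is associated with a
price about 194.74 units higher. The p-value (0.00409) is below 0.05,
so the relationship is statistically significant. Baths alone explains
about 26% of the variation in price (R-squared = 0.2588).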
# Multiple regression: Price vs Size, Beds, and Baths
model4 <- lm(Price ~ Size + Beds + Baths, data = ca_homes)
summary(model4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_homes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
Interpretation:
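Each variable's p-value shows whether it contributes to price
prediction after adjusting for the other two. Only Size remains
significant (p = 0.0259); Beds (p = 0.6239) and Baths (p = 0.2839) do
not. A small p-value signals evidence of a relationship, not the
strength of that relationship. The overall model is significant
(F-test p = 0.004353), but its adjusted R-squared (0.3209) is slightly
lower than that of the Size-only model (0.3365).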
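One way to check whether Beds and Baths add anything jointly beyond Size is a nested-model F-test with `anova()`. The sketch below uses a made-up `toy` data frame (not HomesForSale) and hypothetical names `reduced` and `full`; it is an illustration of the comparison, not a result from this analysis.

```r
# Toy data for illustration only; these values are NOT from HomesForSale.
toy <- data.frame(
  Size  = c(1200, 1500, 1800, 2000, 2200, 2600, 3000, 3400),
  Beds  = c(2, 3, 3, 3, 4, 4, 5, 5),
  Baths = c(1, 2, 2, 2.5, 2, 3, 3, 3.5),
  Price = c(310, 420, 455, 560, 540, 700, 790, 905)
)

# Reduced model: Size only. Full model: Size plus Beds and Baths.
reduced <- lm(Price ~ Size, data = toy)
full    <- lm(Price ~ Size + Beds + Baths, data = toy)

# F-test of the null hypothesis that the Beds and Baths coefficients are both zero
cmp <- anova(reduced, full)

# p-value of the comparison (second row of the table)
p_joint <- cmp[2, "Pr(>F)"]

print(cmp)
print(p_joint)
```

On the real models this would be `anova(model1, model4)`, which is valid because both are fitted to the same `ca_homes` rows.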
# Filter data for the 4 states
homes_4states <- subset(home, State %in% c("CA", "NY", "NJ", "PA"))
# ANOVA test
anova_model <- aov(Price ~ State, data = homes_4states)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation:
The ANOVA p-value tests whether mean price is the same in all four
states. Here p = 0.000148, well below 0.05, so at least one state's
mean price differs from the others. The test does not say which states
differ.
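A common follow-up to a significant ANOVA is a pairwise comparison such as Tukey's HSD, which was not run in this analysis. The sketch below shows the shape of that step on made-up data with three hypothetical groups labeled A, B, and C; none of the values come from HomesForSale.

```r
# Toy data for illustration only; groups and prices are NOT from HomesForSale.
toy <- data.frame(
  State = factor(rep(c("A", "B", "C"), each = 4)),
  Price = c(500, 550, 620, 480,
            300, 340, 310, 360,
            280, 330, 300, 350)
)

# One-way ANOVA, same form as anova_model
fit <- aov(Price ~ State, data = toy)

# Tukey's honest significant differences for all pairs of groups
tukey <- TukeyHSD(fit, "State", conf.level = 0.95)

# Matrix with one row per pair and columns: diff, lwr, upr, p adj
pairs <- tukey$State

print(pairs)
```

With the real data, `TukeyHSD(anova_model, "State")` would list the six pairwise differences among CA, NY, NJ, and PA.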
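In California, size is the strongest single predictor of price among the three features (R-squared = 0.3594) and the only one that remains significant when all three are modeled together. Bathrooms are significant alone but not after adjusting for size and bedrooms, and bedrooms are not significant in either model. The ANOVA shows that mean price differs significantly among CA, NY, NJ, and PA (p = 0.000148).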