---
title: "HomesForSale Regression Analysis"
author: "K.M Fazle Rabbi"
date: "2025-12-09"
---
This report uses the HomesForSale dataset to study how home prices are related to home size (Size), number of bedrooms (Beds), and number of bathrooms (Baths), and to test whether prices differ across the four states (CA, NY, NJ, PA).
Questions 1–4 use only California homes. Question 5 uses all homes.
## # A tibble: 4 × 4
##   State     n mean_price sd_price
##   <chr> <int>      <dbl>    <dbl>
## 1 CA       30       535.     269.
## 2 NJ       30       328.     158
## 3 NY       30       365.     318.
## 4 PA       30       266.     137.
We first model Price as a function of Size for California homes.
mod1 <- lm(Price ~ Size, data = home_CA)
summary(mod1)
##
## Call:
## lm(formula = Price ~ Size, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
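Key findings: Size is a statistically significant predictor of price for California homes (t = 3.963, p = 0.000463). Each additional unit of Size is associated with an estimated increase of 0.339 in Price, and Size alone explains about 36% of the variation in price (R-squared = 0.3594). The intercept is not significantly different from zero (p = 0.716).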
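As a consistency check on the Size model, the test statistics can be recomputed from the printed estimates. The following is a minimal sketch that uses only numbers copied from the `summary(mod1)` output; the variable names are illustrative and not part of the original analysis. In a simple regression the overall F statistic equals the squared slope t statistic, and both should match the printed values up to rounding.

```r
# Values copied from the summary(mod1) output (rounded as printed)
est_size <- 0.33919   # slope estimate for Size
se_size  <- 0.08558   # standard error of the slope
r2       <- 0.3594    # multiple R-squared
df_resid <- 28        # residual degrees of freedom

# t statistic and two-sided p-value for the slope
t_size <- est_size / se_size
p_size <- 2 * pt(-abs(t_size), df = df_resid)

# Overall F statistic recovered from R-squared (1 numerator df)
f_from_r2 <- (r2 / 1) / ((1 - r2) / df_resid)

# 95% confidence interval for the Size slope
ci_size <- est_size + c(-1, 1) * qt(0.975, df_resid) * se_size

print(round(c(t = t_size, t_squared = t_size^2, F = f_from_r2), 3))
print(signif(p_size, 3))
print(round(ci_size, 3))
```

Because the interval for the slope lies entirely above zero, it agrees with the significant t test.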
Now we regress Price on Beds (number of bedrooms) for California homes.
mod2 <- lm(Price ~ Beds, data = home_CA)
summary(mod2)
##
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
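Key findings: Beds is not a statistically significant predictor of price for California homes (t = 1.163, p = 0.255). The estimated slope of 84.77 per bedroom has a large standard error (72.91), and the model explains less than 5% of the variation in price (R-squared = 0.046).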
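A confidence interval makes the uncertainty in the Beds slope concrete. This is a small sketch built only from the printed estimate and standard error in `summary(mod2)`; the variable names are illustrative.

```r
# Values copied from the summary(mod2) output (rounded as printed)
est_beds <- 84.77   # slope estimate for Beds
se_beds  <- 72.91   # standard error of the slope
df_resid <- 28      # residual degrees of freedom

# 95% confidence interval for the Beds slope
ci_beds <- est_beds + c(-1, 1) * qt(0.975, df_resid) * se_beds

# Does the interval include zero (no linear association)?
contains_zero <- ci_beds[1] < 0 && ci_beds[2] > 0

print(round(ci_beds, 2))
print(contains_zero)
```

An interval that spans zero is the same conclusion as the non-significant t test, expressed on the scale of the slope.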
Next we regress Price on Baths (number of bathrooms).
mod3 <- lm(Price ~ Baths, data = home_CA)
summary(mod3)
##
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
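Key findings: Baths is a statistically significant predictor of price for California homes (t = 3.127, p = 0.00409). Each additional bathroom is associated with an estimated increase of 194.74 in Price, and the model explains about 26% of the variation in price (R-squared = 0.2588).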
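The three one-predictor models can be compared side by side. The sketch below collects the R-squared, residual standard error, and overall p-value printed for each model into a small data frame (the data frame and its column names are illustrative). Since every slope estimate is positive, the correlation between Price and each predictor is the positive square root of R-squared.

```r
# Fit statistics copied from the three summary() outputs
fits <- data.frame(
  predictor = c("Size", "Beds", "Baths"),
  r_squared = c(0.3594, 0.04605, 0.2588),
  resid_se  = c(219.3, 267.6, 235.8),
  p_value   = c(0.0004634, 0.2548, 0.004092),
  stringsAsFactors = FALSE
)

# Correlation with Price implied by R-squared (all slopes are positive)
fits$correlation <- sqrt(fits$r_squared)

# Rank the models from best to worst fit
ranked <- fits[order(-fits$r_squared), ]
best   <- ranked$predictor[1]

print(ranked)
print(best)
```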
We now fit a multiple regression with all three predictors:
\[ \text{Price} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{Beds} + \beta_3\,\text{Baths} + \varepsilon. \]
mod4 <- lm(Price ~ Size + Beds + Baths, data = home_CA)
summary(mod4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) -41.5608472 210.3809020 -0.1975505 0.84493313
## Size          0.2811026   0.1189322  2.3635534 0.02585657
## Beds        -33.7035665  67.9255137 -0.4961842 0.62393327
## Baths        83.9844096  76.7529751  1.0942170 0.28389382
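Key findings: The overall model is statistically significant (F = 5.568 on 3 and 26 DF, p = 0.004353) and explains about 39% of the variation in price (R-squared = 0.3912), only slightly more than Size alone (0.3594). The adjusted R-squared (0.3209) is lower than for the Size-only model (0.3365). With all three predictors in the model, only Size is significant at the 5% level (p = 0.0259); Beds (p = 0.624) and Baths (p = 0.284) are not.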
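One way to ask whether Beds and Baths add anything beyond Size is a partial (nested-model) F test. The sketch below approximates that test from the rounded R-squared values printed for the Size-only and three-predictor models, so the result is approximate; the variable names are illustrative. It also recomputes the adjusted R-squared of the full model.

```r
# Values copied from the summary() outputs (rounded as printed)
r2_full <- 0.3912   # Price ~ Size + Beds + Baths
r2_size <- 0.3594   # Price ~ Size
n       <- 30       # California homes
k_full  <- 3        # predictors in the full model
k_size  <- 1        # predictors in the reduced model

# Adjusted R-squared for the full model
adj_r2_full <- 1 - (1 - r2_full) * (n - 1) / (n - k_full - 1)

# Partial F test: do Beds and Baths improve on Size alone?
df_num    <- k_full - k_size
df_den    <- n - k_full - 1
f_partial <- ((r2_full - r2_size) / df_num) / ((1 - r2_full) / df_den)
p_partial <- pf(f_partial, df_num, df_den, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2_full, F = f_partial, p = p_partial), 4))
```

With the fitted model objects available, `anova(mod1, mod4)` performs the same comparison without the rounding error.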
For Question 5 we use all homes and compare mean prices across states (CA, NY, NJ, PA) with a one-way ANOVA.
anova_mod <- aov(Price ~ State, data = home)
summary(anova_mod)
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##       CA       NJ       NY       PA
## 535.3667 328.5333 365.3333 265.5667
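Key findings: Mean price differs significantly across the four states (F = 7.355 on 3 and 116 DF, p = 0.000148). California has the highest mean price (535.4) and Pennsylvania the lowest (265.6). The ANOVA shows only that at least one state mean differs from the others; it does not identify which pairs of states differ.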
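The ANOVA table can be rebuilt from its own sums of squares and from the group means. The following sketch uses only printed values, with 30 homes per state as shown in the summary table; the variable names are illustrative. It recomputes the F statistic and p-value, the between-state sum of squares from the four means, and the share of price variation associated with State (eta-squared).

```r
# Values copied from the ANOVA table and the group means
ss_state <- 1198169   # between-state sum of squares
ss_resid <- 6299266   # residual sum of squares
df_state <- 3
df_resid <- 116
means <- c(CA = 535.3667, NJ = 328.5333, NY = 365.3333, PA = 265.5667)
n_per_state <- 30     # homes per state

# F statistic and p-value from the mean squares
f_stat <- (ss_state / df_state) / (ss_resid / df_resid)
p_val  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Between-state sum of squares rebuilt from the group means
# (equal group sizes, so the grand mean is the mean of the means)
ss_from_means <- n_per_state * sum((means - mean(means))^2)

# Proportion of total variation associated with State
eta_sq <- ss_state / (ss_state + ss_resid)

# Pooled within-state standard deviation
pooled_sd <- sqrt(ss_resid / df_resid)

print(round(c(F = f_stat, eta_sq = eta_sq, pooled_sd = pooled_sd), 3))
print(signif(p_val, 3))
print(round(ss_from_means))
```

To see which pairs of states differ, a common follow-up is `TukeyHSD(anova_mod)` on the fitted `aov` object; that step is not part of the original analysis.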
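Using regression and ANOVA on the HomesForSale dataset, we find that Size is the strongest single predictor of price among California homes, that Baths is significant on its own but not after adjusting for Size, that Beds is not significant in either model, and that mean prices differ significantly across the four states.
This analysis highlights that both home characteristics (especially size) and location (state) are associated with home prices.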