title: "HomesForSale Regression Analysis"
author: "K.M Fazle Rabbi"
date: "2025-12-09"

1. Overview

This report uses the HomesForSale dataset to study how home prices are related to size (Size), number of bedrooms (Beds), and number of bathrooms (Baths), and to test whether prices differ across the four states (CA, NY, NJ, PA).

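Questions 1–4 use only California homes. Question 5 uses all homes. The table below gives the number of homes and the mean and standard deviation of Price in each state; every state contributes 30 homes.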

## # A tibble: 4 × 4
##   State     n mean_price sd_price
##   <chr> <int>      <dbl>    <dbl>
## 1 CA       30       535.     269.
## 2 NJ       30       328.     158 
## 3 NY       30       365.     318.
## 4 PA       30       266.     137.
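
The chunk that produced this table is not shown. A minimal sketch of how such a summary, and the California subset `home_CA` used in Questions 1–4, might be built is given below. The tiny data frame is made up purely for illustration and is not the HomesForSale data; the column names `State` and `Price` follow the model formulas used in this report. The tibble output suggests dplyr was used originally; base R is used here only to keep the sketch free of dependencies.

```r
# Made-up illustration data (NOT the real HomesForSale data).
home <- data.frame(
  State = c("CA", "CA", "NJ", "NJ", "NY", "NY", "PA", "PA"),
  Price = c(500, 600, 300, 340, 350, 390, 250, 290)
)

# Per-state count, mean, and standard deviation of Price.
# table() and tapply() both order the groups alphabetically by State.
state_summary <- data.frame(
  State      = sort(unique(home$State)),
  n          = as.vector(table(home$State)),
  mean_price = as.vector(tapply(home$Price, home$State, mean)),
  sd_price   = as.vector(tapply(home$Price, home$State, sd))
)
print(state_summary)

# California-only subset, as assumed for Questions 1-4.
home_CA <- subset(home, State == "CA")
```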

2. Question 1 – Effect of Size on Price (California)

We first model Price as a function of Size for California homes.

mod1 <- lm(Price ~ Size, data = home_CA)
summary(mod1)
## 
## Call:
## lm(formula = Price ~ Size, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
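
To make the fitted line concrete, the sketch below plugs a Size value into the estimated equation using the coefficients printed above. The value 2000 is a hypothetical input chosen for illustration, and the calculation assumes it lies within the range of sizes observed in the California data.

```r
# Coefficients copied from the summary(mod1) output above.
b0 <- -56.81675   # intercept
b1 <-   0.33919   # slope for Size

# Hypothetical Size value, chosen only for illustration.
size_example <- 2000

# Point prediction from the fitted line: Price-hat = b0 + b1 * Size.
predicted_price <- b0 + b1 * size_example
print(predicted_price)
```

With the fitted model object available, `predict(mod1, newdata = data.frame(Size = 2000))` should give the same point prediction, up to rounding of the printed coefficients.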

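Key findings

Size is a statistically significant predictor of Price for California homes (t = 3.963, p = 0.000463). The estimated slope is 0.339, so each additional unit of Size is associated with an average increase of about 0.34 units in Price. Size alone explains about 36% of the variation in Price (R-squared = 0.3594). The intercept (-56.8) is not significantly different from zero (p = 0.716).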

3. Question 2 – Effect of Bedrooms on Price (California)

Now we regress Price on Beds (number of bedrooms) for California homes.

mod2 <- lm(Price ~ Beds, data = home_CA)
summary(mod2)
## 
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
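
In a regression with a single predictor, the F-statistic, the slope's t-statistic, and R-squared are tied together, which gives a quick consistency check on the output above. Using the printed (rounded) values and the 28 residual degrees of freedom:

\[
F = t^2 = 1.163^2 \approx 1.353,
\qquad
R^2 = \frac{F}{F + \mathrm{df}_{\text{resid}}} = \frac{1.352}{1.352 + 28} \approx 0.046 .
\]

Both agree with the reported F-statistic (1.352) and multiple R-squared (0.04605) up to rounding.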

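Key findings

The number of bedrooms is not a statistically significant predictor of Price for California homes (slope = 84.77, t = 1.163, p = 0.255). Beds explains less than 5% of the variation in Price (R-squared = 0.046), so this sample provides no convincing evidence of a linear relationship between bedrooms and price.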

4. Question 3 – Effect of Bathrooms on Price (California)

Next we regress Price on Baths (number of bathrooms).

mod3 <- lm(Price ~ Baths, data = home_CA)
summary(mod3)
## 
## Call:
## lm(formula = Price ~ Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
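
A confidence interval conveys the uncertainty in the Baths slope more directly than the p-value alone. The sketch below builds a 95% interval from the printed estimate and standard error; the 95% level is an assumption made here, not something stated in the report.

```r
# Values copied from the summary(mod3) output above.
estimate  <- 194.74   # slope for Baths
std_error <- 62.28    # its standard error
df_resid  <- 28       # residual degrees of freedom

# Two-sided 95% critical value from the t distribution.
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate +/- t_crit * standard error.
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 1))
```

The interval lies entirely above zero, which is consistent with the reported p-value of 0.00409. With the model object available, `confint(mod3)` should give the same interval up to rounding.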

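Key findings

The number of bathrooms is a statistically significant predictor of Price for California homes at the 1% level (t = 3.127, p = 0.00409). Each additional bathroom is associated with an average increase of about 194.7 units in Price. Baths explains about 26% of the variation in Price (R-squared = 0.2588), less than Size does on its own.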

5. Question 4 – Joint Effect of Size, Bedrooms, and Bathrooms (California)

We now fit a multiple regression with all three predictors:

\[ \text{Price} = \beta_0 + \beta_1\,\text{Size} + \beta_2\,\text{Beds} + \beta_3\,\text{Baths} + \varepsilon. \]

mod4 <- lm(Price ~ Size + Beds + Baths, data = home_CA)
summary(mod4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) -41.5608472 210.3809020 -0.1975505 0.84493313
## Size          0.2811026   0.1189322  2.3635534 0.02585657
## Beds        -33.7035665  67.9255137 -0.4961842 0.62393327
## Baths        83.9844096  76.7529751  1.0942170 0.28389382
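
One way to ask whether Beds and Baths add anything beyond Size is a partial F-test comparing the Size-only model from Question 1 with the three-predictor model. The sketch below approximates that test from the two printed R-squared values; because those values are rounded, the result is approximate and is offered only as an illustration.

```r
# R-squared values copied from the printed summaries (rounded).
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

df_extra <- 2    # predictors added: Beds and Baths
df_resid <- 26   # residual degrees of freedom of the full model

# Partial F-statistic for the two added predictors.
F_partial <- ((r2_full - r2_reduced) / df_extra) / ((1 - r2_full) / df_resid)

# Upper-tail p-value from the F distribution.
p_partial <- pf(F_partial, df_extra, df_resid, lower.tail = FALSE)
print(c(F = F_partial, p = p_partial))
```

With both model objects available, `anova(mod1, mod4)` carries out the same comparison from the unrounded residual sums of squares.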

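Key findings

The three-predictor model is significant overall (F = 5.568 on 3 and 26 DF, p = 0.004353) and explains about 39% of the variation in Price (R-squared = 0.3912). After adjusting for the other predictors, only Size remains significant at the 5% level (p = 0.0259); Beds (p = 0.624) and Baths (p = 0.284) do not. The Baths coefficient drops from 194.74 in the single-predictor model to 83.98 here, which is consistent with the predictors carrying overlapping information. The adjusted R-squared (0.3209) is slightly lower than that of the Size-only model (0.3365), so adding Beds and Baths does not improve the fit once model size is taken into account.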

6. Question 5 – Do Home Prices Differ Among the Four States?

For this question we use all homes and compare prices across states (CA, NY, NJ, PA) with a one-way ANOVA.

anova_mod <- aov(Price ~ State, data = home)
summary(anova_mod)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##       CA       NJ       NY       PA 
## 535.3667 328.5333 365.3333 265.5667
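
A significant ANOVA F-test does not say which states differ from one another. A common follow-up is Tukey's HSD, sketched below. The report does not show such a comparison, so this is only an illustration: the data are simulated from normal distributions using the rounded group means and standard deviations in the Overview table, and the results will not match those from the real data.

```r
set.seed(1)  # reproducibility of the simulated data only

# Simulated prices (NOT the real data): 30 homes per state, drawn from
# normal distributions with the rounded means and SDs from the Overview table.
states <- c("CA", "NJ", "NY", "PA")
means  <- c(535, 328, 365, 266)
sds    <- c(269, 158, 318, 137)
sim <- data.frame(
  State = factor(rep(states, each = 30)),
  Price = unlist(Map(function(m, s) rnorm(30, m, s), means, sds))
)

# One-way ANOVA followed by all pairwise comparisons
# at a family-wise 95% confidence level (the TukeyHSD default).
sim_fit <- aov(Price ~ State, data = sim)
tukey   <- TukeyHSD(sim_fit)
print(tukey)
```

On the real data the equivalent call would be `TukeyHSD(anova_mod)`, provided `State` is stored as a factor.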

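Key findings

Mean home prices differ significantly among the four states (F = 7.355 on 3 and 116 DF, p = 0.000148), so we reject the null hypothesis that all four state means are equal. California has the highest mean price (535.4) and Pennsylvania the lowest (265.6), with New York (365.3) and New Jersey (328.5) in between. The F-test alone does not identify which pairs of states differ.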

7. Conclusion

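Using regression and ANOVA on the HomesForSale dataset, we find the following. Among California homes, Size is a significant predictor of Price and explains about 36% of its variation. Beds alone is not a significant predictor. Baths alone is significant, but it loses significance once Size is in the model. In the three-predictor model, only Size remains significant at the 5% level. Across all homes, mean prices differ significantly among CA, NY, NJ, and PA, with California the most expensive on average.

Taken together, these results suggest that both home characteristics (especially size) and location (state) are associated with home prices.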