1. Introduction

This project examines housing prices using the HomesForSale data, focusing on how different home characteristics influence price and whether prices vary across states. The analysis explores the impact of size, number of bedrooms, and number of bathrooms on home prices in California through simple and multiple regression models.

We use the HomesForSale dataset hosted at lock5stat.com. The data were collected from www.zillow.com in 2019.

I propose the following five questions based on my own understanding of the data.

  1. Use the data only for California. How much does the size of a home influence its price?

  2. Use the data only for California. How does the number of bedrooms of a home influence its price?

  3. Use the data only for California. How does the number of bathrooms of a home influence its price?

  4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This determines whether the state in which a home is located has a significant impact on its price. All of the data are used.

Analysis

We will explore the questions in detail, starting by loading the data:

homes = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(homes)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1: Use the data only for California. How much does the size of a home influence its price?
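The report shows the model output but not the code that produced it. The following is a minimal sketch of one way to restrict the data to a single state and fit a linear model, assuming base R subsetting rather than whatever the original analysis used. The names `fit_state_model` and `demo` are hypothetical, and `demo` contains only the six rows printed by `head(homes)`, so its coefficients will not match the reported fit.

```r
# Hypothetical helper, not taken from the original report: keep one state's
# rows and fit a linear model on them with base R.
fit_state_model <- function(data, formula, state = "CA") {
  state_rows <- data[data$State == state, ]  # rows for the chosen state only
  lm(formula, data = state_rows)
}

# The six rows printed by head(homes), used here as a tiny demo input.
demo <- data.frame(
  State = rep("CA", 6),
  Price = c(533, 610, 899, 929, 210, 268),
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131),
  Beds  = c(3, 3, 5, 3, 2, 3),
  Baths = c(2.5, 2.0, 3.0, 3.0, 2.0, 2.0)
)

demo_fit <- fit_state_model(demo, Price ~ Size)
print(coef(demo_fit))  # six rows only, so these differ from the reported fit
```

With the full data frame loaded, the analogous call would be `summary(fit_state_model(homes, Price ~ Size))`.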

## 
## Call:
## lm(formula = Price ~ Size, data = homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

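The fitted model is Price = -56.82 + 0.33919·Size. The slope is positive and its p-value (0.000463) is well below 0.05, so the size of a home has a positive and statistically significant effect on price: each additional square foot is associated with an increase of about 0.339 in Price. Size alone explains about 36% of the variation in price (R-squared = 0.3594).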
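To make the fitted line concrete, the sketch below plugs the reported intercept and slope into the equation. The helper name `predict_price` and the example sizes are assumptions chosen for illustration; only the two coefficients come from the output above.

```r
# Coefficients copied from the Price ~ Size output above.
intercept <- -56.81675
slope     <- 0.33919

# Hypothetical helper: fitted Price for a given Size in square feet.
predict_price <- function(size) intercept + slope * size

size_example <- 2000                       # illustrative size, an assumption
fitted_price <- predict_price(size_example)
print(fitted_price)

# Difference in fitted Price between two homes 500 square feet apart.
price_gap <- predict_price(2500) - predict_price(2000)
print(price_gap)
```

The gap between two fitted prices depends only on the slope and the difference in size, which is why the intercept (not significant here) matters little for comparing homes.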

Q2: Use the data only for California. How does the number of bedrooms of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Beds, data = homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The p-value for the slope is 0.255, which is larger than 0.05. This means there is not sufficient evidence to conclude that the number of bedrooms has a statistically significant effect on home price in California. Homes with more bedrooms do not reliably differ in price from homes with fewer bedrooms.
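Another way to see the lack of evidence is a 95% confidence interval for the Beds slope, built from the reported estimate and standard error. This is a sketch using the standard t-based interval; the variable names are illustrative, and the inputs are the rounded values printed above.

```r
# Values copied from the Price ~ Beds output above.
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28

# Two-sided 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)
print(round(ci, 2))
```

The interval contains zero, which agrees with the p-value of 0.255 being above 0.05.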

Q3: Use the data only for California. How does the number of bathrooms of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Baths, data = homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The p-value for the slope is 0.00409, which is less than 0.05. We reject the null hypothesis that the slope is zero and conclude that the number of bathrooms has a statistically significant effect on home price in California. The slope is positive (194.74), so homes with more bathrooms tend to have higher prices.
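The reported p-value can be recovered from the t statistic and the residual degrees of freedom. The sketch below uses the rounded t value from the output, so the result agrees with the printed p-value only up to rounding; the variable names are illustrative.

```r
# Values copied from the Price ~ Baths output above.
t_value  <- 3.127
df_resid <- 28

# Two-sided p-value: probability of a |t| at least this large under the null.
p_value <- 2 * pt(abs(t_value), df_resid, lower.tail = FALSE)
print(signif(p_value, 3))
```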

Q4: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

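Larger homes tend to have higher prices: holding bedrooms and bathrooms fixed, the estimated slope for Size (0.2811) means that each additional 1,000 square feet is associated with about a $281,100 increase in price, and its p-value (0.0259) is below 0.05. The coefficients for Beds (p = 0.6239) and Baths (p = 0.2839) are not significant once size is in the model. Jointly, the three predictors are significant (F = 5.568, p = 0.004353) and explain about 39% of the variation in price.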
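The overall F statistic and the dollar interpretation of the Size slope can both be checked by hand from the output. The sketch below assumes Price is recorded in thousands of dollars, which is what the $281,100 figure implies, and takes the sample size of 30 from the residual degrees of freedom (26) plus the four estimated coefficients. All variable names are illustrative.

```r
# Values copied from the Price ~ Size + Beds + Baths output above.
r_squared <- 0.3912
n_obs     <- 30   # 26 residual df + 4 estimated coefficients
n_pred    <- 3    # Size, Beds, Baths
df_resid  <- n_obs - n_pred - 1

# Overall F statistic written in terms of R-squared.
f_stat <- (r_squared / n_pred) / ((1 - r_squared) / df_resid)
p_overall <- pf(f_stat, n_pred, df_resid, lower.tail = FALSE)
print(round(f_stat, 3))
print(signif(p_overall, 3))

# Size slope converted to dollars per 1,000 square feet,
# assuming Price is in thousands of dollars.
dollars_per_1000_sqft <- 0.2811 * 1000 * 1000
print(dollars_per_1000_sqft)
```

Because R-squared is rounded in the output, the recomputed F matches the printed 5.568 only to about two decimal places.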

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results show a significant difference in average home prices among the four states. The p-value for the effect of state is 0.000148, which is below the 0.05 significance level, so we reject the null hypothesis that all four states have the same mean home price. The ANOVA shows only that at least one state's average home price differs from the others; it does not identify which states differ.
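The entries of the ANOVA table are linked by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The sketch below reproduces the table's F value and p-value from the printed sums of squares; the variable names are illustrative.

```r
# Values copied from the ANOVA table above.
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom.
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F is the between-state mean square over the residual mean square.
f_stat  <- ms_state / ms_resid
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)
print(round(f_stat, 3))
print(signif(p_value, 3))
```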

Summary

Using only the California data, home size has a strong and statistically significant effect on price: larger homes tend to cost more. The number of bedrooms does not significantly predict home price, meaning price does not reliably increase with additional bedrooms. The number of bathrooms shows a significant effect on price, indicating that homes with more bathrooms tend to be more expensive. When size, bedrooms, and bathrooms are included together in a multiple regression model, size remains the only significant predictor, while bedrooms and bathrooms no longer show significant effects once size is accounted for. Finally, when comparing all four states, the ANOVA results reveal a highly significant difference in mean home prices among states, meaning at least one state's average home price differs from the others.