1. Introduction

The price of a home is influenced by a combination of physical features and geographic factors. Understanding how these characteristics relate to housing prices provides valuable insight for buyers, sellers, and policymakers. In this project, I analyze the HomesForSale dataset, which contains information on 120 homes listed for sale in 2019 across four U.S. states: California (CA), New Jersey (NJ), New York (NY), and Pennsylvania (PA). The data were originally collected from Zillow and include the asking price (in thousands of dollars), the size of the home (recorded in square feet in this file, with values such as 1589), and the number of bedrooms and bathrooms.

2. Data

Reading the data

Here I load the data before exploring the questions in detail.

house = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(house)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Reading the California data

california_data = subset(house, State == "CA")
head(california_data)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

3. Analysis

Q1. How much does the size of a home influence its price, using only the California data?

qn1 = lm(Price ~ Size, data = california_data)
summary(qn1)
## 
## Call:
## lm(formula = Price ~ Size, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
plot(california_data$Size, california_data$Price, main = "Price vs Size", xlab = "Size (sq ft)", ylab = "Price ($1,000s)")
abline(qn1, col = "red", lwd = 2)

A simple linear regression was fitted to examine how the size of a home relates to its asking price in California. The estimated slope for Size is 0.33919 with a p-value of 0.000463, which is well below 0.05, so the positive relationship between size and price is statistically significant. Because Price is recorded in thousands of dollars and the Size values in the data are in square feet, each additional square foot is associated with an increase of about $339 in asking price, or approximately $339,190 per additional 1,000 square feet. Size alone explains about 36% of the variation in price (R-squared = 0.3594).
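The interpretation of the slope can be checked by hand from the reported coefficients. The following is a minimal sketch, assuming Size is in square feet (as the values in the data listing suggest) and Price is in thousands of dollars. The coefficients are copied from the regression output above, and `predict_price` is a hypothetical helper name used only for illustration.

```r
# Coefficients copied from the summary(qn1) output for Price ~ Size
intercept = -56.81675   # in $1,000s
slope     = 0.33919     # in $1,000s per square foot (assumed unit of Size)
slope_se  = 0.08558     # reported standard error of the slope

# Hypothetical helper: fitted asking price (in $1,000s) for a given size
predict_price = function(size_sqft) intercept + slope * size_sqft

price_2000 = predict_price(2000)   # fitted price for a 2,000 sq ft home
per_1000   = slope * 1000          # price change per extra 1,000 sq ft
t_stat     = slope / slope_se      # should match the reported t value

print(c(price_2000 = price_2000, per_1000 = per_1000, t_stat = t_stat))
```

Under these assumptions the fitted asking price for a 2,000 square foot home is about 621.6 thousand dollars, and the ratio of the slope to its standard error reproduces the reported t value of 3.963 up to rounding.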

Q2. How does the number of bedrooms of a home influence its price, using only the California data?

qn1 = lm(Price ~ Beds, data = california_data)
summary(qn1)
## 
## Call:
## lm(formula = Price ~ Beds, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
plot(california_data$Beds, california_data$Price, main = "Price vs Number of Bedrooms", xlab = "Number of bedrooms", ylab = "Price ($1,000s)")
abline(qn1, col = "green", lwd = 2)

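The estimated slope for Beds is 84.77, but its p-value of 0.255 is well above 0.05, and the model explains less than 5% of the variation in price (R-squared = 0.04605). Based on this regression model, there is no statistically significant evidence that the number of bedrooms alone is related to home prices in California. Bedrooms by themselves are not a strong predictor of price in this dataset.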
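One way to see why the bedroom effect is not significant is to build a 95% confidence interval for the slope from the reported estimate and standard error. This is a sketch based only on the numbers printed in the output above (estimate 84.77, standard error 72.91, 28 residual degrees of freedom). It is not a refit of the model.

```r
# Values copied from the summary output for Price ~ Beds
est      = 84.77    # estimated slope, in $1,000s per bedroom
se       = 72.91    # standard error of the slope
df_resid = 28       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit = qt(0.975, df_resid)
ci = c(lower = est - t_crit * se, upper = est + t_crit * se)

print(round(ci, 1))
```

The interval runs from roughly -65 to 234 (in thousands of dollars per bedroom). Because it contains zero, the data are compatible with no linear relationship between bedrooms and price, which agrees with the p-value of 0.255.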

Q3. How does the number of bathrooms of a home influence its price, using only the California data?

qn1 = lm(Price ~ Baths, data = california_data)
summary(qn1)
## 
## Call:
## lm(formula = Price ~ Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
plot(california_data$Baths, california_data$Price, main = "Price vs Number of Bathrooms", xlab = "Number of bathrooms", ylab = "Price ($1,000s)")
abline(qn1, col = "pink", lwd = 2)

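There is statistically significant evidence that the number of bathrooms is related to home prices in California. The estimated slope for Baths is 194.74 with a p-value of 0.00409, so each additional bathroom is associated with an asking price that is about $194,740 higher on average. The model explains about 26% of the variation in price (R-squared = 0.2588), compared with under 5% for bedrooms, so bathrooms appear to be a stronger predictor of price than bedrooms in this dataset.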
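The three simple regressions can be compared on a common scale by converting each reported R-squared into a correlation. In a simple linear regression the correlation between the predictor and the response equals the square root of R-squared, with the sign of the slope. The sketch below uses only the R-squared values printed in the outputs above, and all three slopes are positive.

```r
# Multiple R-squared values copied from the three simple regressions
r_squared = c(Size = 0.3594, Beds = 0.04605, Baths = 0.2588)

# All three estimated slopes are positive, so each correlation is positive
r = sqrt(r_squared)

# Rank the predictors from strongest to weakest linear association with Price
ranking = sort(r, decreasing = TRUE)
print(round(ranking, 3))
```

The ordering is Size, then Baths, then Beds, which matches the pattern of p-values across the three models.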

Q4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price, using only the California data?

qn1 = lm(Price ~ Size + Beds + Baths, data = california_data)
summary(qn1)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

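The overall model is statistically significant (F = 5.568, p-value = 0.004353) and explains about 39% of the variation in price (R-squared = 0.3912). When the three predictors are considered together, home size is the only characteristic that significantly predicts price in California (p-value = 0.0259). Bedrooms (p-value = 0.6239) and bathrooms (p-value = 0.2839) do not have significant independent effects once the size of the home is taken into account. The adjusted R-squared of 0.3209 is slightly lower than the 0.3365 of the model with Size alone, which suggests that adding bedrooms and bathrooms contributes little beyond size.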
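Whether Beds and Baths add anything beyond Size can be checked with a partial F-test for nested models. The sketch below computes it from the two reported R-squared values (0.3594 for Price ~ Size and 0.3912 for the full model) rather than from the raw data, so it is only as precise as the rounded output. With both fitted models available, `anova()` on the two `lm` objects would give the same comparison directly.

```r
# R-squared values copied from the two nested models fitted to the CA data
r2_reduced = 0.3594   # Price ~ Size
r2_full    = 0.3912   # Price ~ Size + Beds + Baths

q        = 2    # number of extra predictors in the full model (Beds, Baths)
df_resid = 26   # residual degrees of freedom of the full model

# Partial F statistic: gain in R-squared per added predictor,
# relative to the unexplained variation per residual degree of freedom
f_stat  = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
p_value = pf(f_stat, q, df_resid, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_value), 3))
```

The F statistic is about 0.68 with a p-value of roughly 0.52, so there is no evidence that bedrooms and bathrooms jointly improve on the size-only model.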

Q5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

anova_model = aov(Price ~ State, data = house)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The one-way ANOVA gives F = 7.355 on 3 and 116 degrees of freedom, with a p-value of 0.000148. Since the p-value is far below the 0.05 significance level, we reject the null hypothesis that all four states have the same mean home price. At least one state has a mean home price that differs from the others, so the state in which a home is located is significantly related to its price. The ANOVA by itself does not show which states differ; a post-hoc comparison such as Tukey's HSD would be needed for that.
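The ANOVA table can be reproduced from its sums of squares, which also gives a measure of effect size. The sketch below uses only the numbers printed in the table above. Eta-squared, the share of total variation in price attributed to State, is an addition that does not appear in the original output.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_state = 1198169; df_state = 3
ss_resid = 6299266; df_resid = 116

# Mean squares and F statistic
ms_state = ss_state / df_state
ms_resid = ss_resid / df_resid
f_stat   = ms_state / ms_resid
p_value  = pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Effect size: proportion of total variation explained by State
eta_sq = ss_state / (ss_state + ss_resid)

print(round(c(F = f_stat, eta_sq = eta_sq), 3))
print(signif(p_value, 3))
```

The recomputed F statistic matches the reported 7.355, and eta-squared is about 0.16. State therefore accounts for roughly 16% of the variation in asking price, a significant but not dominant share.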

4. Summary

This analysis found that, in California, home size is the strongest predictor of asking price and the only significant one when all three variables are considered together. Bedrooms showed no significant effect, and bathrooms were significant only in a simple regression. The multiple regression model explained about 39% of the variation in prices. Across all four states, ANOVA showed that average asking prices differ significantly among California, New York, New Jersey, and Pennsylvania, which suggests that location plays an important role in home prices.

5. Reference

Lock, R., Lock, P., Lock, E., & Lock, K. (2019). HomesForSale data set. Lock5 Data. Retrieved from https://www.lock5stat.com/datapage3e.html