Introduction

This report presents an analysis of the “HomesForSale” dataset obtained from Lock5 Datasets, Third Edition. This dataset provides information on homes listed for sale in several states, including the state the home is located in, the home’s price, its size (in square feet), number of beds, and number of baths.

The objective of this analysis is to investigate how various factors relate to home prices. Specifically, we examine four questions about California homes and then one question comparing home prices across California, New York, New Jersey, and Pennsylvania.

For California homes:

  1. How does the size of a home (in square feet) influence its price in California?
  2. How does the number of beds influence the price of a home in California?
  3. How does the number of baths influence the price of a home in California?
  4. How do the size, number of beds, and number of baths jointly influence a home’s price in California?

For California, New York, New Jersey, and Pennsylvania homes:

  1. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help determine if the state in which a home is located has a significant impact on its price.

By addressing these questions, we aim to determine how important home size, number of bedrooms, number of bathrooms, and state are in explaining variation in home prices.

Data

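According to the Lock5 Data Guide, Third Edition, the “HomesForSale” dataset was collected from www.zillow.com in 2019. It contains 120 observations, 30 from each of the four states, with the following variables:

  • State: the state in which the home is located (CA, NJ, NY, or PA)
  • Price: asking price, in thousands of dollars
  • Size: area of the home, in square feet
  • Beds: number of bedrooms
  • Baths: number of bathrooms

These five variables will be analyzed for patterns.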

## 'data.frame':    120 obs. of  5 variables:
##  $ State: chr  "CA" "CA" "CA" "CA" ...
##  $ Price: int  533 610 899 929 210 268 1095 699 729 700 ...
##  $ Size : int  1589 2008 2380 1868 1360 2131 2436 1375 2013 1371 ...
##  $ Beds : int  3 3 5 3 2 3 3 2 3 3 ...
##  $ Baths: num  2.5 2 3 3 2 2 2 1 4 2 ...
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Methodology

All analyses were conducted in R using Posit Cloud. We use simple and multiple linear regression to relate price to home characteristics in California, and one-way ANOVA to compare mean prices across the four states.

Statistical significance was assessed at the 0.05 level: p-values at or below 0.05 were considered statistically significant, and p-values above 0.05 were not.
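As a rough sketch of this workflow, the snippet below fits one regression and applies the 0.05 threshold. It is illustrative only: `homes_head` is a hypothetical name holding just the six rows printed in the Data section, so its estimates will not match the results reported in the Analysis section. The name `california_data` follows the model calls shown in the output; the subsetting step is an assumption about how that data frame was created.

```r
# Illustrative only: the six rows shown by head() in the Data section,
# NOT the full 120-row dataset, so estimates will differ from the report.
homes_head <- data.frame(
  State = rep("CA", 6),
  Price = c(533, 610, 899, 929, 210, 268),   # thousands of dollars
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131),
  Beds  = c(3, 3, 5, 3, 2, 3),
  Baths = c(2.5, 2, 3, 3, 2, 2)
)

# Assumed subsetting step: keep only California homes.
california_data <- subset(homes_head, State == "CA")

# Simple linear regression of Price on Size.
fit <- lm(Price ~ Size, data = california_data)

# Apply the significance threshold to the slope's p-value.
alpha <- 0.05
p_value <- summary(fit)$coefficients["Size", "Pr(>|t|)"]
is_significant <- p_value <= alpha
```

With the full dataset, the same pattern would apply to each of the regression questions by changing the model formula.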

Analysis

Question 1: Influence of Size on Price of Homes in California

Research Question: How much does the size of a home influence its price for homes located in California?

Approach: Fit a simple linear regression with Price as the response and Size as the predictor, using only California homes.

## 
## Call:
## lm(formula = Price ~ Size, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

The estimated slope is 0.33919, and its p-value of 0.000463 is below our threshold of 0.05, so the positive relationship between size and price is statistically significant. Size explains about 36% of the variation in price (R-squared = 0.3594).

Because Price is recorded in thousands of dollars, the model predicts that price increases by about 0.339 thousand dollars (roughly $339) for each additional square foot.
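To make the units concrete, the following sketch plugs the reported coefficients into the fitted line. The helper `predict_price` and the 2,000 square foot example home are hypothetical choices made for illustration; only the intercept and slope come from the output above.

```r
# Coefficients copied from the regression output (Price in $1,000s).
intercept <- -56.81675
slope     <- 0.33919

# Hypothetical helper: predicted price (in $1,000s) for a given size.
predict_price <- function(size_sqft) {
  intercept + slope * size_sqft
}

# Predicted price for an example 2,000 sq ft home, in $1,000s.
price_2000 <- predict_price(2000)

# Slope expressed in dollars per additional square foot.
dollars_per_sqft <- slope * 1000
```

Under this model, a 2,000 square foot California home has a predicted price of about 621.6 thousand dollars, and each additional square foot adds about $339.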

Question 2: Influence of Bedrooms on Price of Homes in California

Research Question: How does the number of beds influence the price of a home in California?

Approach: Fit a simple linear regression model:

  • Response Variable: Price
  • Predictor: Beds
## 
## Call:
## lm(formula = Price ~ Beds, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The slope estimate for Beds is approximately 84.77, suggesting that each additional bedroom is associated with an average price increase of about 84.77 thousand dollars. However, the p-value of 0.255 is above our significance threshold of 0.05, so this result is not statistically significant.

Although the estimated association between the number of bedrooms and price is positive, the data do not provide sufficient evidence of a linear relationship between them for homes in California; the model explains under 5% of the variation in price (R-squared = 0.046).
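One way to see why this slope is not significant is to compute a 95% confidence interval from the reported estimate and standard error. This is a sketch added for illustration, not part of the original analysis; it uses the rounded values printed in the output.

```r
# Values copied from the Beds regression output.
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28      # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

# If the interval contains zero, the slope is not significant at 0.05.
contains_zero <- ci[1] < 0 && ci[2] > 0
```

The interval runs from roughly -65 to 234 thousand dollars per bedroom. Because it includes zero, the data are consistent with no linear relationship between bedrooms and price.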

Question 3: Influence of Bathrooms on Price of Homes in California

Research Question: How does the number of baths influence the price of a home in California?

Approach: Fit a simple linear regression model:

  • Response Variable: Price
  • Predictor: Baths
## 
## Call:
## lm(formula = Price ~ Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The slope for Baths is approximately 194.74, showing that for each additional bathroom, on average, the price of a home in California is expected to increase by about 194.74 thousand dollars. The p-value of 0.00409 is well below the threshold of 0.05, indicating that this is statistically significant.

This regression model provides evidence that homes with more baths tend to have higher prices in California.
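The t value and p-value in the coefficient table can be reproduced from the estimate and its standard error. The sketch below does this for Baths using the rounded values printed above, so the results agree with the output only up to rounding.

```r
# Values copied from the Baths regression output.
estimate  <- 194.74
std_error <- 62.28
df_resid  <- 28      # residual degrees of freedom

# Test statistic for H0: slope = 0.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)
```

This gives a t value of about 3.127 and a p-value of about 0.0041, matching the table.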

Question 4: Multiple Regression of Size, Beds, and Baths of Homes in California

Research Question: How do size, number of beds, and number of baths jointly influence home prices in California?

Approach: Fit a multiple linear regression model:

  • Response Variable: Price
  • Predictors: Size, Beds, Baths
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

In this multiple regression model, we consider the combined effects of Size, Beds, and Baths on Price.

The p-value for the overall model (F-statistic) is about 0.004353, which is below the threshold of 0.05. This shows that collectively, Size, Beds, and Baths have a statistically significant relationship with Price.

Size has a slope estimate of about 0.2811 with a p-value of 0.0259, which is below the threshold of 0.05. This suggests that even when controlling for Beds and Baths, Size remains a statistically significant predictor of Price. Holding Beds and Baths fixed, each additional square foot is associated with an average price increase of about 0.2811 thousand dollars (roughly $281).

Beds and Baths are not statistically significant in this model (p-values of 0.6239 and 0.2839). This could mean that their relationship with Price is largely accounted for by Size, since larger homes tend to have more bedrooms and bathrooms, or that the dataset does not provide strong evidence of their separate effects once Size is taken into account.
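The summary statistics of the two models can be compared directly. The sketch below recomputes adjusted R-squared for the Size-only model and the three-predictor model, and rebuilds the overall F-statistic from R-squared. The helper `adj_r2` is a hypothetical name; the inputs are the rounded values from the outputs, with n = 30 inferred from the residual degrees of freedom.

```r
# Hypothetical helper: adjusted R-squared for n observations, k predictors.
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 30  # California homes (28 residual df + 2 coefficients in Question 1)

adj_size_only <- adj_r2(0.3594, n, k = 1)  # Price ~ Size
adj_full      <- adj_r2(0.3912, n, k = 3)  # Price ~ Size + Beds + Baths

# Overall F-statistic of the full model, rebuilt from R-squared.
f_value <- (0.3912 / 3) / ((1 - 0.3912) / (n - 3 - 1))
p_value <- pf(f_value, df1 = 3, df2 = n - 3 - 1, lower.tail = FALSE)
```

Adjusted R-squared falls from about 0.337 to about 0.321 when Beds and Baths are added, which is consistent with those two predictors contributing little beyond Size.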

Question 5: Differences in Home Prices Among States

Research Question: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

Approach: Use ANOVA to compare mean prices across states.

The null hypothesis is that all states have the same mean price. The alternative hypothesis is that at least one state’s mean price differs from the others.

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results have a p-value of 0.000148, which is far below the threshold of 0.05. We therefore reject the null hypothesis and conclude that not all four states have the same mean home price.

This supports the alternative hypothesis that mean home prices differ among California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA). The ANOVA alone does not identify which states differ from one another.
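The entries of the ANOVA table are linked by simple arithmetic, which the sketch below retraces from the printed sums of squares and degrees of freedom. A table of this form is typically produced by something like `summary(aov(Price ~ State, data = ...))`, but the exact call is an assumption because it is not shown in the output.

```r
# Values copied from the ANOVA table.
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares: sum of squares divided by degrees of freedom.
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F-statistic: between-state variation relative to within-state variation.
f_value <- ms_state / ms_resid

# Upper-tail p-value from the F distribution.
p_value <- pf(f_value, df1 = df_state, df2 = df_resid, lower.tail = FALSE)
```

The ratio of mean squares is about 7.355, and the corresponding upper-tail probability is about 0.000148, in agreement with the table.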

Summary

In this report, we conducted statistical analysis to address five research questions concerning the relationship between home characteristics and prices in California, as well as differences in home prices among four states.

  1. Size and Price (CA):
    The regression analysis of California homes found a statistically significant positive relationship between home size and price. On average, each additional square foot was associated with a price increase of about 0.339 thousand dollars (roughly $339).

  2. Beds and Price (CA):
    Although the regression estimated that each additional bedroom is associated with a price increase of about 84.77 thousand dollars, the result was not statistically significant. Thus, we cannot conclude that the number of bedrooms is related to home price in California.

  3. Baths and Price (CA):
    The analysis showed a statistically significant positive relationship between the number of bathrooms and price. Each additional bathroom was associated with an increase of about 194.74 thousand dollars in home price.

  4. Multiple Regression (CA):
    When jointly considering size, beds, and baths, the overall model was statistically significant. However, only size remained statistically significant when controlling for the other variables, indicating that the size of the home is the primary driver of price differences among these three predictors.

  5. State Differences in Price (CA, NY, NJ, PA):
    The ANOVA revealed significant differences in mean home prices among the four states. This suggests that the state in which a home is located is significantly associated with its price.

These results provide insights into how factors such as home size, number of bedrooms, number of bathrooms, and geographical location can affect home prices.

Conclusion

This analysis used regression models and ANOVA to explore how size, bedrooms, bathrooms, and state relate to home prices. Among California homes, size was the most consistent predictor of price, and mean prices differed significantly across the four states. These patterns can inform buyers, sellers, and market analysts.

Works Cited

Lock, Robin H., et al. “HomesForSale Dataset.” Lock5 Datasets, Third Edition, https://www.lock5stat.com/datapage3e.html. Accessed 8 Dec. 2024.

Lock, Robin H., et al. Lock5 Data Guide, Third Edition. Lock5Stat, https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf. Accessed 8 Dec. 2024.