Introduction

This report analyzes the factors influencing home prices in California and compares home prices across four states: California, New York, New Jersey, and Pennsylvania. The data comes from the “HomesForSale” dataset, sourced from Lock5Stat (https://www.lock5stat.com/datapage3e.html). This dataset contains 120 observations on 5 variables.

Questions

The following questions, taken from the directions for Project #3, are analyzed in this report:

  1. Use the data only for California. How much does the size of a home influence its price?

  2. Use the data only for California. How does the number of bedrooms of a home influence its price?

  3. Use the data only for California. How does the number of bathrooms of a home influence its price?

  4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

By sequentially analyzing these questions, this report aims to reveal the factors that significantly influence home prices in California and identify price differences among states, highlighting both significant and non-significant relationships through analysis.

Data

Lock5’s Dataset Documentation for the third edition of “Statistics: Unlocking the Power of Data, 3rd Edition” gives the following information for the dataset:

“A data frame with 120 observations on the following 5 variables.

State - Location of the home (CA, NJ, NY, or PA)

Price - Asking price (in $1,000’s)

Size - Area of all rooms (in sq. ft.)

Beds - Number of bedrooms

Baths - Number of bathrooms”

We are also given this information: “Data for samples of homes for sale in each state, selected from zillow.com. Source: Data collected from www.zillow.com in 2019.”

Analysis

#1: Use the data only for California. How much does the size of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Size, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

The regression model reveals a statistically significant relationship between the size of a home and its price in California (p = 0.0004634). Because Price is recorded in $1,000s, the slope coefficient (β = 0.33919) suggests that each additional square foot of home size is associated with an increase of approximately $339.19 in asking price. The residual standard error (219.3) indicates that observed prices typically deviate from the values predicted by the regression line by about $219,300. The model explains 35.94% of the variance in home prices (R-squared = 0.3594).
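The figures quoted above can be cross-checked against one another with a few lines of arithmetic. The sketch below is only an illustration: it copies the rounded values from the output, and the variable names are my own. It relies on two standard identities for a regression with a single predictor: the overall F statistic equals the square of the slope's t statistic, and R-squared equals F / (F + residual degrees of freedom).

```python
# Values copied (rounded) from the Price ~ Size output above.
slope = 0.33919   # change in Price ($1,000s) per additional square foot
t_value = 3.963   # t statistic for the Size coefficient
f_stat = 15.71    # overall F statistic
df_resid = 28     # residual degrees of freedom

# Price is recorded in $1,000s, so multiply by 1,000 to express the slope in dollars.
dollars_per_sqft = slope * 1000

# With a single predictor, the overall F statistic is the squared t statistic of the slope.
f_from_t = t_value ** 2

# With a single predictor, R-squared can be recovered from F and the residual df.
r_squared = f_stat / (f_stat + df_resid)

print(f"Dollars per square foot: {dollars_per_sqft:.2f}")
print(f"t squared: {f_from_t:.2f} (reported F: {f_stat})")
print(f"R-squared from F: {r_squared:.4f} (reported: 0.3594)")
```

Small discrepancies in the last digit are expected because the inputs are already rounded in the output.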

#2: Use the data only for California. How does the number of bedrooms of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Beds, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The regression model shows no statistically significant relationship between the number of bedrooms and home prices in California (p = 0.2548). The slope coefficient (β = 84.77) corresponds to an estimated increase of approximately $84,770 per additional bedroom, but because the estimate is not statistically significant, it cannot be distinguished from zero. The residual standard error (267.6) indicates that observed prices typically deviate from the predicted values by about $267,600. The model explains only 4.61% of the variance in home prices (R-squared = 0.0461).
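One way to see why the bedroom slope is not significant is to build an approximate 95% confidence interval from the estimate and its standard error. This is a minimal sketch, not part of the original analysis: the critical value 2.048 is an assumption taken from a standard t table for 28 degrees of freedom, and the variable names are illustrative.

```python
# Values copied from the Price ~ Beds output above.
estimate = 84.77   # Beds coefficient, in $1,000s per bedroom
std_error = 72.91  # standard error of the Beds coefficient

# Assumed value: 0.975 quantile of the t distribution with 28 df (standard t table).
t_crit = 2.048

# 95% confidence interval: estimate plus or minus t_crit * standard error.
margin = t_crit * std_error
ci_low = estimate - margin
ci_high = estimate + margin

# If the interval contains zero, the slope is not significant at the 5% level.
contains_zero = ci_low < 0 < ci_high

print(f"95% CI for the Beds slope: ({ci_low:.2f}, {ci_high:.2f}) in $1,000s")
print(f"Interval contains zero: {contains_zero}")
```

The interval runs from a negative value to a positive one, which agrees with the p-value of 0.2548: the data are compatible with bedrooms having no linear association with price.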

#3: Use the data only for California. How does the number of bathrooms of a home influence its price?

## 
## Call:
## lm(formula = Price ~ Baths, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The regression model reveals a statistically significant relationship between the number of bathrooms and home prices in California (p = 0.004092). The slope coefficient (β = 194.74) suggests that each additional bathroom is associated with a $194,740 increase in asking price. The residual standard error (235.8) indicates that observed prices typically deviate from the predicted values by about $235,800. The model explains 25.88% of the variance in home prices (R-squared = 0.2588).
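To make the fitted line concrete, the sketch below evaluates it for a few bathroom counts. The function name `predicted_price` and the chosen counts are illustrative assumptions. The output above does not report the observed range of bathrooms, so predictions should only be trusted for counts that actually occur in the California sample.

```python
# Coefficients copied from the Price ~ Baths output above (Price is in $1,000s).
intercept = 90.71
slope = 194.74


def predicted_price(baths):
    """Return the fitted asking price in $1,000s for a given number of bathrooms."""
    return intercept + slope * baths


# Illustrative bathroom counts; adjacent predictions differ by exactly the slope.
for baths in (1, 2, 3):
    print(f"{baths} bathroom(s): ${predicted_price(baths) * 1000:,.0f}")
```

Each step of one bathroom moves the prediction by the slope, $194,740, which is the interpretation given in the paragraph above.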

#4: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

The multiple regression model is statistically significant overall (F = 5.568, p = 0.004353). Within it, home size significantly influences home prices in California (p = 0.0259), with an estimated increase of $281.10 per additional square foot when the numbers of bedrooms and bathrooms are held constant. However, the number of bedrooms (p = 0.6239) and the number of bathrooms (p = 0.2839) do not have statistically significant effects once size is accounted for. The residual standard error (221.8) indicates that observed prices typically deviate from the predicted values by about $221,800. The model explains 39.12% of the variance in home prices (R-squared = 0.3912).
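Adjusted R-squared is the fairer measure for comparing this model with the size-only model, because it penalizes extra predictors. The sketch below recomputes it from the reported R-squared values with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). It assumes n = 30 California homes, inferred from the residual degrees of freedom in the output, and the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R-squared for a model with p predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Assumed sample size: 26 residual df + 3 predictors + 1 intercept = 30 homes.
n = 30

# R-squared values copied from the outputs for the two models.
adj_size_only = adjusted_r_squared(0.3594, n, p=1)  # Price ~ Size
adj_full = adjusted_r_squared(0.3912, n, p=3)       # Price ~ Size + Beds + Baths

print(f"Adjusted R-squared, size only: {adj_size_only:.4f}")
print(f"Adjusted R-squared, full model: {adj_full:.4f}")
print(f"Full model improves on size only: {adj_full > adj_size_only}")
```

The recomputed values match the reported 0.3365 and 0.3209 up to rounding. The full model's adjusted R-squared is the lower of the two, which is consistent with bedrooms and bathrooms adding little once size is in the model.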

#5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Price ~ State, data = home)
## 
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951

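The ANOVA model reveals statistically significant differences in mean home prices among the four states (F = 7.355, p = 0.000148). Each Tukey difference is the mean price of the first state minus that of the second, in $1,000s, so the negative values in the first three rows mean that prices are lower than in California. Tukey’s HSD post-hoc test shows that homes in California are significantly more expensive than those in New Jersey (NJ-CA diff = -206.83, p = 0.004), New York (NY-CA diff = -170.03, p = 0.028), and Pennsylvania (PA-CA diff = -269.80, p = 0.0001), corresponding to average asking prices roughly $206,830, $170,030, and $269,800 below California's. However, New Jersey, New York, and Pennsylvania show no significant differences when compared with each other (all adjusted p > 0.35).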
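The ANOVA table and the Tukey intervals can be read mechanically, as the following sketch illustrates. It recomputes the F statistic from the sums of squares, adds the share of total variation attributable to State (eta-squared, a quantity not shown in the output), and flags a pair as significant when its 95% interval excludes zero. All numbers are copied from the output above; the variable names are my own.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_state, df_state = 1198169, 3
ss_resid, df_resid = 6299266, 116

# Each mean square is a sum of squares divided by its df; F is their ratio.
ms_state = ss_state / df_state
ms_resid = ss_resid / df_resid
f_stat = ms_state / ms_resid

# Eta-squared: share of the total variation in Price attributable to State.
eta_squared = ss_state / (ss_state + ss_resid)

# Tukey 95% family-wise intervals (lower, upper) in $1,000s, copied from the output.
tukey_intervals = {
    "NJ-CA": (-363.6729, -49.99379),
    "NY-CA": (-326.8729, -13.19379),
    "PA-CA": (-426.6395, -112.96045),
    "NY-NJ": (-120.0395, 193.63955),
    "PA-NJ": (-219.8062, 93.87288),
    "PA-NY": (-256.6062, 57.07288),
}

# A pair differs significantly when its interval lies entirely on one side of zero.
significant = [pair for pair, (lwr, upr) in tukey_intervals.items() if lwr > 0 or upr < 0]

print(f"F = {f_stat:.3f}, eta-squared = {eta_squared:.3f}")
print(f"Significant pairs: {significant}")
```

Only the three comparisons involving California are flagged, matching the adjusted p-values in the Tukey table.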

Summary

This report analyzes the influence that various factors have on home prices in California. These factors are size, number of bedrooms, and number of bathrooms. When each factor is considered on its own, both size and number of bathrooms have a statistically significant relationship with home price. However, the number of bedrooms does not have a statistically significant relationship with home price.

Based on the regression line for each of the simple models, the estimated increase in price is $339.19 per additional square foot (significant), $194,740 per additional bathroom (significant), and $84,770 per additional bedroom (not significant).

When these factors are combined in a multiple regression model, home size remains a significant predictor of home prices in California, with an estimated increase of $281.10 per additional square foot. However, the number of bedrooms and the number of bathrooms do not have a significant influence on home prices in this model.

The report also explores price differences among homes in California, New York, New Jersey, and Pennsylvania. The ANOVA model reveals a statistically significant difference in home prices among the four states. Taking this further, I used Tukey’s HSD post-hoc test, which led to the finding that homes in California are significantly more expensive than those in New York, New Jersey, and Pennsylvania. When comparing New York, New Jersey, and Pennsylvania with one another, no significant difference can be found.

Conclusion

From the analyses, home size and number of bathrooms each have a statistically significant relationship with home price in California, while the number of bedrooms does not. When the three factors are combined in one model, only home size has a statistically significant relationship with home price. When comparing California home prices with those in New York, New Jersey, and Pennsylvania, houses in California are significantly more expensive, while the other three states have no significant differences among one another.

References

The data used for this project comes from Lock5’s Statistics: Unlocking the Power of Data, 3rd Edition. The spreadsheet can be found at https://www.lock5stat.com/datapage3e.html under the dataset name “HomesForSale”. Information about this spreadsheet can be found on pg. 43 of Lock5’s Dataset Documentation for the third edition of “Statistics: Unlocking the Power of Data, 3rd Edition” (https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf).

The Dataset Documentation PDF sources the following for the “HomesForSale” dataset: “Data collected from www.zillow.com in 2019.”