This report analyzes the factors influencing home prices in California and compares home prices across four states: California, New York, New Jersey, and Pennsylvania. The data comes from the “HomesForSale” dataset, sourced from Lock5Stat (https://www.lock5stat.com/datapage3e.html). This dataset consists of 120 observations across 5 variables.
This report addresses the following questions, which are taken from the directions for Project #3:
Use the data only for California. How much does the size of a home influence its price?
Use the data only for California. How does the number of bedrooms of a home influence its price?
Use the data only for California. How does the number of bathrooms of a home influence its price?
Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.
By working through these questions in order, this report identifies which factors significantly influence home prices in California and whether prices differ among the four states, noting both significant and non-significant relationships.
Lock5’s Dataset Documentation for “Statistics: Unlocking the Power of Data, 3rd Edition” gives the following information for the dataset:
“A data frame with 120 observations on the following 5 variables.
State - Location of the home (CA, NJ, NY, or PA)
Price - Asking price (in $1,000’s)
Size - Area of all rooms (in sq. ft.)
Beds - Number of bedrooms
Baths - Number of bathrooms”
We are also given this information: “Data for samples of homes for sale in each state, selected from zillow.com. Source: Data collected from www.zillow.com in 2019.”
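The model output in this report refers to two data frames, `home` (all four states) and `ca_data` (California only). The sketch below shows one way these could be built. The helper names `load_homes` and `california_only` are hypothetical, and the CSV address in the usage comment is an assumption about where the file is hosted, not something stated in the documentation.

```r
# Hypothetical helper: read the dataset from a CSV file or URL.
# State is converted to a factor so it can be used as a grouping variable.
load_homes <- function(path) {
  home <- read.csv(path, stringsAsFactors = FALSE)
  home$State <- factor(home$State)
  home
}

# Hypothetical helper: keep only the California rows,
# as required by the California-only questions.
california_only <- function(home) {
  subset(home, State == "CA")
}

# Example usage (the URL is an assumption and needs an internet connection):
# home    <- load_homes("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
# ca_data <- california_only(home)
# str(home)   # the documentation describes 120 observations of 5 variables
```

Subsetting once and reusing `ca_data` keeps the four California models on exactly the same rows.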
##
## Call:
## lm(formula = Price ~ Size, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
The regression model reveals a statistically significant relationship between the size of a home and its price in California (p = 0.0004634). Because Price is recorded in $1,000s, the slope coefficient (β = 0.33919) suggests that each additional square foot of home size is associated with an increase of approximately $339.19 in asking price. The residual standard error (219.3) indicates that observed prices typically deviate from the values predicted by the regression line by about $219,300. The model explains 35.94% of the variance in home prices (R-squared = 0.3594).
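The arithmetic behind this interpretation can be checked directly from the printed coefficients. The following is a minimal sketch that uses only the rounded values shown in the output, so results agree with the output only up to rounding. The 2,000 sq. ft. home is a hypothetical input chosen for illustration.

```r
# Values copied from the Price ~ Size output (Price is in $1,000s).
intercept <- -56.81675
slope     <- 0.33919     # $1,000s per square foot
slope_se  <- 0.08558
r_squared <- 0.3594
df_resid  <- 28

# Convert the slope from $1,000s to dollars per square foot.
dollars_per_sqft <- slope * 1000

# The t value is the estimate divided by its standard error.
t_value <- slope / slope_se

# With a single predictor, the F-statistic can be recovered from R-squared.
f_from_r2 <- r_squared / (1 - r_squared) * df_resid

# Predicted asking price for a hypothetical 2,000 sq. ft. home.
predicted_dollars <- (intercept + slope * 2000) * 1000

print(c(dollars_per_sqft = dollars_per_sqft,
        t_value = t_value,
        f_from_r2 = f_from_r2,
        predicted_dollars = predicted_dollars))
```

The recovered t value and F-statistic should land close to the reported 3.963 and 15.71, which confirms that the slope is being read in the right units.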
##
## Call:
## lm(formula = Price ~ Beds, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The regression model shows no statistically significant relationship between the number of bedrooms and home prices in California (p = 0.2548). The slope coefficient (β = 84.77) corresponds to an estimated increase of approximately $84,770 per additional bedroom, but because the relationship is not statistically significant, this estimate cannot be distinguished from zero. The residual standard error (267.6) indicates that observed prices typically deviate from the predicted values by about $267,600. The model explains only 4.61% of the variance in home prices (R-squared = 0.0461).
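One way to see why the bedroom slope is not significant is to build a 95% confidence interval around it. The sketch below is not part of the original analysis. It uses the rounded estimate and standard error from the output and the usual t-based interval, so the endpoints are approximate.

```r
# Values copied from the Price ~ Beds output (Price is in $1,000s).
estimate <- 84.77
std_err  <- 72.91
df_resid <- 28

# Two-sided 95% critical value from the t distribution.
t_crit <- qt(0.975, df_resid)

# Confidence interval for the slope, in $1,000s and in dollars.
ci_thousands <- estimate + c(-1, 1) * t_crit * std_err
ci_dollars   <- ci_thousands * 1000

# An interval that contains zero is consistent with a non-significant slope.
includes_zero <- ci_thousands[1] < 0 && ci_thousands[2] > 0

print(round(ci_thousands, 2))
print(includes_zero)
```

The interval runs from a negative value to a positive one, so the data are compatible with no bedroom effect at all.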
##
## Call:
## lm(formula = Price ~ Baths, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression model reveals a statistically significant relationship between the number of bathrooms and home prices in California (p = 0.004092). The slope coefficient (β = 194.74) suggests that each additional bathroom is associated with an increase of approximately $194,740 in price. The residual standard error (235.8) indicates that observed prices typically deviate from the predicted values by about $235,800. The model explains 25.88% of the variance in home prices (R-squared = 0.2588).
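The reported p-value follows from the t value and the residual degrees of freedom. A minimal sketch, using the rounded estimate and standard error from the output, reproduces it approximately:

```r
# Values copied from the Price ~ Baths output (Price is in $1,000s).
estimate <- 194.74
std_err  <- 62.28
df_resid <- 28

# Test statistic for H0: slope = 0.
t_value <- estimate / std_err

# Two-sided p-value from the t distribution with 28 degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df_resid)

print(c(t_value = t_value, p_value = p_value))
```

Because the p-value is below 0.01, the output marks the Baths coefficient with two asterisks.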
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
The multiple regression model reveals that home size significantly influences home prices in California (p = 0.0259), with an estimated increase of $281.10 per additional square foot when the numbers of bedrooms and bathrooms are held constant. However, once size is accounted for, the number of bedrooms (p = 0.6239) and the number of bathrooms (p = 0.2839) do not have statistically significant effects. The residual standard error (221.8) indicates that observed prices typically deviate from the predicted values by about $221,800. The model as a whole is significant (F = 5.568, p = 0.004353) and explains 39.12% of the variance in home prices (R-squared = 0.3912).
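The adjusted R-squared and the overall F-statistic in this output can both be derived from the multiple R-squared. The sketch below assumes 30 California homes, which is inferred from the 26 residual degrees of freedom plus 4 estimated coefficients, and it works from rounded values.

```r
# Values copied from the Price ~ Size + Beds + Baths output.
r_squared <- 0.3912
n <- 30   # inferred: 26 residual df + 4 coefficients
p <- 3    # number of predictors

# Adjusted R-squared penalizes the fit for each added predictor.
adj_r2 <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Overall F-statistic and its p-value on (p, n - p - 1) degrees of freedom.
f_stat    <- (r_squared / p) / ((1 - r_squared) / (n - p - 1))
p_overall <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

print(c(adj_r2 = adj_r2, f_stat = f_stat, p_overall = p_overall))
```

The adjusted R-squared here (about 0.321) is lower than the 0.3365 reported for the size-only model, which is consistent with bedrooms and bathrooms adding little once size is in the model.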
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Price ~ State, data = home)
##
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951
The ANOVA model reveals statistically significant differences in home prices among the four states (F = 7.355, p = 0.000148). Tukey’s HSD post-hoc test reports each difference in $1,000s as the second state minus California, so a negative value means California is more expensive. Homes in California are significantly more expensive than those in New Jersey (diff = -206.83, p = 0.004), New York (diff = -170.03, p = 0.028), and Pennsylvania (diff = -269.80, p = 0.0001). However, none of the pairwise comparisons among New Jersey, New York, and Pennsylvania is significant (all adjusted p-values are above 0.35).
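The F-statistic and the Tukey interval widths can be reconstructed from the ANOVA table. The sketch below assumes 30 homes per state. That matches 120 observations over four states and the identical interval widths in the Tukey output, but it is not stated explicitly in the output.

```r
# Values copied from the ANOVA table (Price is in $1,000s).
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares and the F-statistic with its p-value.
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid
f_stat   <- ms_state / ms_resid
p_value  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Tukey HSD half-width for equal group sizes:
# studentized range quantile times sqrt(MSE / n).
n_per_state <- 30   # assumption: equal samples of 30 homes per state
q_crit      <- qtukey(0.95, nmeans = 4, df = df_resid)
half_width  <- q_crit * sqrt(ms_resid / n_per_state)

# Rebuild the NJ-CA interval from the reported difference.
nj_ca_interval <- -206.83333 + c(-1, 1) * half_width

print(c(f_stat = f_stat, p_value = p_value, half_width = half_width))
print(round(nj_ca_interval, 2))
```

Any pairwise difference larger in magnitude than the half-width (roughly $157,000) is significant at the 5% family-wise level, which is why only the three comparisons involving California qualify.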
This report analyzes the influence of three factors on home prices in California: size, number of bedrooms, and number of bathrooms. When each factor is examined on its own, both size and number of bathrooms have a statistically significant relationship with home price, whereas the number of bedrooms does not.
In the simple regression models, each additional square foot is associated with a significant increase of about $339.19 in price and each additional bathroom with a significant increase of about $194,740, while the estimated increase of about $84,770 per additional bedroom is not statistically significant.
When the three factors are combined in a multiple regression model, home size remains significant, with an estimated increase of $281.10 per additional square foot when the numbers of bedrooms and bathrooms are held constant. The number of bedrooms and the number of bathrooms, however, do not have a significant influence on home prices in that model.
The comparison of home prices in California, New York, New Jersey, and Pennsylvania yields two findings. The ANOVA model reveals a statistically significant difference in home prices among the four states. Taking this further, I used Tukey’s HSD post-hoc test, which led to the finding that homes in California are significantly more expensive than those in New York, New Jersey, and Pennsylvania. When New York, New Jersey, and Pennsylvania are compared with one another, no significant difference is found.
From the analyses, home size and number of bathrooms each have a statistically significant relationship with home price, while the number of bedrooms does not. When the three factors are considered together, only home size retains a statistically significant relationship with home price. Compared with New York, New Jersey, and Pennsylvania, houses in California are significantly more expensive, while the other three states show no significant differences among one another.
The data used for this project comes from Lock5’s Statistics: Unlocking the Power of Data, 3rd Edition. The spreadsheet can be found at https://www.lock5stat.com/datapage3e.html under the dataset name “HomesForSale”. Information about this spreadsheet can be found on pg. 43 of Lock5’s Dataset Documentation for “Statistics: Unlocking the Power of Data, 3rd Edition” (https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf).
The Dataset Documentation PDF cites the following source for the “HomesForSale” dataset: “Data collected from www.zillow.com in 2019.”