This project examines housing prices using the HomesForSale data, focusing on how different home characteristics influence price and whether prices vary across states. The analysis explores the impact of size, number of bedrooms, and number of bathrooms on home prices in California through simple and multiple regression models.
We use the HomesForSale data set distributed at www.lock5stat.com; the data were collected from www.zillow.com in 2019.
I propose the following five questions based on my own understanding of the data.
Use the data only for California. How much does the size of a home influence its price?
Use the data only for California. How does the number of bedrooms of a home influence its price?
Use the data only for California. How does the number of bathrooms of a home influence its price?
Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This question uses all of the data and tests whether the state in which a home is located is significantly associated with its price.
We explore each question in detail below.
homes = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(homes)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
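The model summaries that follow are fitted to a data frame named homes_CA, whose construction is not shown in the text. A minimal sketch of one way to build it, assuming it is simply the California rows of homes (base R `subset` is used here; the original may have used a tidyverse equivalent such as `dplyr::filter`):

```r
# Assumption: homes_CA is the subset of homes with State == "CA".
homes <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
homes_CA <- subset(homes, State == "CA")

# Quick checks on the subset
nrow(homes_CA)        # number of California listings
table(homes$State)    # listings per state in the full data
```

The residual degrees of freedom reported in the simple regressions (28, with two estimated coefficients) imply that the California subset should contain 30 homes.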
##
## Call:
## lm(formula = Price ~ Size, data = homes_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
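The fitted model is Price = -56.82 + 0.33919·Size. The slope is positive and its p-value (0.000463) is well below 0.05, so the size of a home has a positive and statistically significant association with price. Because Price is recorded in thousands of dollars, each additional square foot is associated with an increase of about $339 in price. Size alone explains about 36% of the variation in California home prices (R-squared = 0.3594).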
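To make the fitted line concrete, the sketch below plugs a size into the reported equation. The coefficients are copied from the output above; the helper name `predict_price_from_size` and the example size of 2,000 square feet are illustrative choices, not part of the original analysis.

```r
# Coefficients copied from the summary of lm(Price ~ Size) shown above.
b0 <- -56.81675   # intercept, in thousands of dollars
b1 <- 0.33919     # slope, in thousands of dollars per square foot

# Hypothetical helper: predicted price (in $1000s) for a size in square feet.
predict_price_from_size <- function(size_sqft) {
  b0 + b1 * size_sqft
}

# Predicted price of a 2,000 sq ft California home, in $1000s
price_2000 <- predict_price_from_size(2000)
price_2000

# Predicted difference between two homes that are 100 sq ft apart, in $1000s
extra_100 <- predict_price_from_size(2100) - price_2000
extra_100
```

With the model object itself available, `predict()` on the fitted `lm` would give the same point prediction and can also return interval estimates.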
##
## Call:
## lm(formula = Price ~ Beds, data = homes_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The p-value for the slope is 0.255, which is larger than 0.05. This means there is not sufficient evidence to conclude that the number of bedrooms has a statistically significant effect on home price in California. Homes with more bedrooms do not reliably differ in price from homes with fewer bedrooms.
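As a check on how that p-value arises, the sketch below recomputes the t statistic and the two-sided p-value from the reported estimate and standard error. It uses the rounded values printed in the output, so small differences in the last digit are expected.

```r
# Values copied from the Beds row of the coefficient table above.
est_beds <- 84.77    # estimated slope
se_beds  <- 72.91    # standard error of the slope
df_resid <- 28       # residual degrees of freedom

# t statistic: estimate divided by its standard error
t_beds <- est_beds / se_beds

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_beds <- 2 * pt(-abs(t_beds), df = df_resid)

round(c(t = t_beds, p = p_beds), 3)
```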
##
## Call:
## lm(formula = Price ~ Baths, data = homes_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The p-value for the slope is 0.00409, which is less than 0.05. We reject the null hypothesis that the slope is zero and conclude that the number of bathrooms has a statistically significant association with home price in California. The slope is positive (194.74), so each additional bathroom is associated with a price that is about $194,740 higher on average.
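A confidence interval gives a sense of how precisely the bathroom slope is estimated. The sketch below builds a 95% interval from the reported estimate and standard error; the interval is an addition for illustration and was not part of the original output.

```r
# Values copied from the Baths row of the coefficient table above.
est_baths <- 194.74   # estimated slope, in $1000s per bathroom
se_baths  <- 62.28    # standard error of the slope
df_resid  <- 28       # residual degrees of freedom

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus critical value times standard error
ci_baths <- est_baths + c(-1, 1) * t_crit * se_baths
round(ci_baths, 2)
```

The interval excludes zero, which agrees with the significant t test, but it is wide, so the size of the bathroom effect is not pinned down tightly by 30 homes. With the fitted model object available, `confint()` would return the same kind of interval directly.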
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = homes_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
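In the joint model, size is the only predictor with a p-value below 0.05 (p = 0.0259). Holding the numbers of bedrooms and bathrooms fixed, the estimated slope (0.2811) means that each additional 1,000 square feet is associated with about a $281,100 increase in price. Neither bedrooms (p = 0.6239) nor bathrooms (p = 0.2839) is significant once size is in the model, although the model as a whole is significant (F = 5.568, p = 0.004353).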
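One way to see how little bedrooms and bathrooms add beyond size is to compare adjusted R-squared between the size-only model and the three-predictor model. The sketch below recomputes both from the reported R-squared values, assuming n = 30 California homes (implied by the 28 residual degrees of freedom in the simple regressions). The helper name `adj_r2` is illustrative.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 30  # assumption: 30 California homes (28 residual df + 2 coefficients)

adj_size <- adj_r2(0.3594, n, p = 1)  # Price ~ Size
adj_full <- adj_r2(0.3912, n, p = 3)  # Price ~ Size + Beds + Baths

round(c(size_only = adj_size, full_model = adj_full), 4)
```

The adjusted value for the full model is slightly lower than for the size-only model, which is consistent with bedrooms and bathrooms contributing little once size is included.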
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results show a significant difference in average home prices among the four states. The p-value for the effect of state is 0.000148, which is below the 0.05 significance level, so we reject the null hypothesis that all four states have the same mean home price. The ANOVA shows only that at least one state's average home price differs from the others; it does not identify which states differ.
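The F statistic in the ANOVA table is the ratio of the between-state mean square to the residual mean square. The sketch below recomputes it, and the corresponding p-value, from the sums of squares and degrees of freedom reported above.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares: sum of squares divided by degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value
f_stat <- ms_state / ms_resid
p_val  <- pf(f_stat, df1 = df_state, df2 = df_resid, lower.tail = FALSE)

c(F = round(f_stat, 3), p = signif(p_val, 3))
```

A pairwise follow-up, for example `TukeyHSD()` applied to the fitted `aov` object, is one common way to see which states differ; that step was not part of the original analysis.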
Using only the California data, home size has a strong and statistically significant association with price: larger homes tend to cost more. The number of bedrooms does not significantly predict home price, meaning price does not reliably increase with additional bedrooms. The number of bathrooms shows a significant association with price on its own, indicating that homes with more bathrooms tend to be more expensive. When size, bedrooms, and bathrooms are included together in a multiple regression model, size remains the only significant predictor, while bedrooms and bathrooms show no significant effect once size is accounted for. Finally, when comparing all four states, the ANOVA results reveal a highly significant difference in mean home prices, meaning at least one state's average home price differs from the others.