STAT 353 Project #3: Factors Influencing California Home Sales
04.29.2025
Mike Jensen
Introduction
This report explores how different home features influence price.
The following five questions are used to examine the relationships between home features and house prices.
Questions
(1). How much does the size of a home influence its price?
(2). How does the number of bedrooms of a home influence its price?
(3). How does the number of bathrooms of a home influence its price?
(4). How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
(5). Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price.
Data
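The California data set contains 30 homes (the simple regressions below report 28 residual degrees of freedom, which is n - 2) and four variables used in the analysis: Price, Size, Beds, and Baths. Price is in thousands of dollars and Size is in square feet, as labeled in the plot code in the appendix. Question 5 uses a second data set of 120 homes across CA, NY, NJ, and PA.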
Analysis
(1). How much does the size of a home influence its price?
##
## Call:
## lm(formula = Size ~ Price, data = House)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -549.31 -346.31   14.74  258.57  796.39
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1178.6075   159.6556   7.382 4.87e-08 ***
## Price          1.0596     0.2673   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 387.5 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
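The estimated slope is 1.0596. Because the model was fitted with Size as the response (Size ~ Price), the slope means that each additional $1,000 in price is associated with an average increase of about 1.06 square feet in size; it is not a cost per square foot. The relationship is positive and statistically significant (p = 0.000463), and the model accounts for about 36% of the variation (R-squared = 0.3594). Whether the cost per square foot rises or falls with size cannot be read from this slope alone. Economies of scale would predict that it falls as size increases, and housing tends to follow that rule except in artificially constrained markets (rent control) or in highly desirable areas that can no longer expand and develop.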
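Since the fitted model has Size as the response, a rough cross-check on the price-per-size relationship can be made from the printed output alone. In simple linear regression the slope of y on x and the slope of x on y multiply to R-squared, so the slope of the reverse model (Price on Size) can be approximated as sketched here. The inputs are the rounded values from the summary, and the variable names are illustrative rather than part of the original analysis.

```r
# Values copied from the summary output above (rounded as printed)
slope_size_on_price <- 1.0596   # sq ft per $1,000 of price
r_squared <- 0.3594

# In simple regression: slope(y ~ x) * slope(x ~ y) = R-squared
slope_price_on_size <- r_squared / slope_size_on_price

# Approximate price change (in $1,000s) per additional sq ft
print(round(slope_price_on_size, 3))
```

With these rounded inputs the implied slope is about 0.339, or roughly $339 per additional square foot. Refitting with lm(Price ~ Size, data = House) would give the exact value.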
(2). How does the number of bedrooms of a home influence its price?
##
## Call:
## lm(formula = Beds ~ Price, data = House)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.22223 -0.23459 -0.13639  0.02536  1.95759
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.8424861  0.2790631  10.186  6.4e-11 ***
## Price       0.0005433  0.0004673   1.163    0.255
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6774 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The p-value of 0.2548 is well above 0.05, so there is no statistically significant linear relationship between the number of bedrooms and price in this sample. The low R-squared (0.04605) supports this: the model explains under 5% of the variation, so predictability is poor. We therefore fail to reject the null hypothesis that the slope is zero.
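The t value and p-value for the Price slope can be reproduced from the printed estimate and standard error, which makes explicit what is being tested: a null hypothesis of zero slope, with 28 residual degrees of freedom. A minimal sketch, using the rounded values from the output and illustrative variable names:

```r
# Values copied from the coefficient table above (rounded as printed)
estimate <- 0.0005433
std_error <- 0.0004673
df_resid <- 28

# Test statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(round(p_value, 3))
```

Up to rounding, these should agree with the printed t value of 1.163 and p-value of 0.255.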
(3). How does the number of bathrooms of a home influence its price?
##
## Call:
## lm(formula = Baths ~ Price, data = House)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.50083 -0.36559  0.08877  0.22996  1.45930
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.571745   0.253843   6.192 1.09e-06 ***
## Price       0.001329   0.000425   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6161 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The p-value of 0.004092 is below 0.05, so the relationship between the number of bathrooms and price is statistically significant, unlike the bedroom relationship in question 2. We reject the null hypothesis that the slope is zero. The model explains about 26% of the variation (R-squared = 0.2588).
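A 95% confidence interval for the slope gives a complementary view of the same test. The sketch below builds it from the printed estimate and standard error, assuming the usual t-based interval with 28 residual degrees of freedom; the variable names are illustrative.

```r
# Values copied from the coefficient table above (rounded as printed)
estimate <- 0.001329
std_error <- 0.000425
df_resid <- 28

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Lower and upper bounds: estimate -/+ t_crit * SE
ci <- estimate + c(-1, 1) * t_crit * std_error
print(signif(ci, 3))
```

The interval lies entirely above zero, which is consistent with the significant p-value. Calling confint() on the fitted model object would give the same interval without the rounding.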
(4). How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = House)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
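The overall F-test p-value of 0.004353 indicates that size, bedrooms, and bathrooms jointly explain a statistically significant share of the variation in price (R-squared = 0.3912). Within the joint model, only Size is individually significant (p = 0.0259): each additional square foot is associated with about 0.2811 thousand dollars (roughly $281) of price, holding bedrooms and bathrooms fixed. Beds (p = 0.6239) and Baths (p = 0.2839) are not significant once size is accounted for.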
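To see how the joint model combines the three features, the fitted equation can be applied to a single home. The sketch below uses the rounded coefficients from the output and a hypothetical home (2,000 sq ft, 3 bedrooms, 2 bathrooms) that is not taken from the data set; it assumes Size is in square feet and Price in thousands of dollars.

```r
# Coefficients copied from the summary output above (rounded as printed)
coefs <- c(intercept = -41.5608, size = 0.2811, beds = -33.7036, baths = 83.9844)

# Hypothetical home (illustrative values, not from the data set)
home <- c(size = 2000, beds = 3, baths = 2)

# Fitted equation: intercept + sum of coefficient * feature value
predicted_price <- coefs[["intercept"]] + sum(coefs[names(home)] * home)
print(round(predicted_price, 1))  # in $1,000s
```

This gives about 587.5, or roughly $587,500. With the model object available, predict(mlr_model, newdata = data.frame(Size = 2000, Beds = 3, Baths = 2)) would do the same with unrounded coefficients. Because Beds and Baths are not individually significant, the prediction should be read as illustrative only.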
(5). Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price.
##
## Call:
## lm(formula = Price ~ State, data = House_all)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -390.37 -166.77  -47.05   89.48  884.67
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   535.37      42.55  12.583  < 2e-16 ***
## StateNJ      -206.83      60.17  -3.438 0.000816 ***
## StateNY      -170.03      60.17  -2.826 0.005553 **
## StatePA      -269.80      60.17  -4.484 1.73e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 233 on 116 degrees of freedom
## Multiple R-squared: 0.1598, Adjusted R-squared: 0.1381
## F-statistic: 7.355 on 3 and 116 DF, p-value: 0.0001482
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = state_model)
##
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951
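California is the baseline level, so each State coefficient is the difference in mean price relative to California. Mean prices in New Jersey, New York, and Pennsylvania are lower by about 207, 170, and 270 thousand dollars respectively, roughly $200k on average. The overall F-test p-value is very near zero at 0.0001482, so the differences among the states are statistically significant. The Tukey comparisons show that only the pairs involving California are significant at the 5% level (adjusted p-values 0.0045, 0.0280, and 0.0001); New York, New Jersey, and Pennsylvania do not differ significantly from one another. The residual summary (median -47.05, maximum 884.67) suggests a right-skewed distribution, which the histogram of residuals can be used to check.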
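The state coefficients and the Tukey intervals can both be unpacked from the printed output. The sketch below recovers the mean price per state implied by the baseline coding, then approximates the half-width of the Tukey intervals, assuming equal group sizes (consistent with the identical standard errors in the table). The variable names are illustrative.

```r
# Coefficients copied from the summary output above (Price in $1,000s)
baseline_ca <- 535.37
diffs <- c(NJ = -206.83, NY = -170.03, PA = -269.80)

# Mean price per state implied by the dummy-variable coding
state_means <- c(CA = baseline_ca, baseline_ca + diffs)
print(state_means)

# Tukey half-width for equal group sizes: q / sqrt(2) * SE of a difference
se_diff <- 60.17
half_width <- qtukey(0.95, nmeans = 4, df = 116) / sqrt(2) * se_diff
print(round(half_width, 1))
```

The implied means are about 535, 329, 365, and 266 thousand dollars for CA, NJ, NY, and PA. The half-width should come out near 157, matching the distance from each printed difference to its lwr and upr bounds.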
Summary
Based on the five questions explored, size is the feature most clearly associated with price among the California homes, while the number of bedrooms alone shows no significant relationship. The results for question 5 show that mean prices in New York, New Jersey, and Pennsylvania are significantly lower than in California. All of these states have dense housing markets where outlying areas can no longer be developed economically, which drives prices up; even so, California has outpaced the other three states by approximately $200k (taken from the State coefficients of about -170 to -270). This suggests that other factors not accounted for in this analysis, possibly natural or artificial, are at play.
Appendix: Code Listing
House <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSaleCA.csv")
head(House)
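# Size vs Price
# Note: Size is the response here, so the slope is sq ft per $1,000 of price.
# To model price as the response, fit lm(Price ~ Size, data = House) instead.
lm_model <- lm(Size ~ Price, data = House)
summary(lm_model)
plot(House$Price, House$Size,
     main = "Home Size vs Price",
     xlab = "Price (thousands)", ylab = "Size (sq ft)",
     pch = 19, col = "steelblue")
abline(lm_model, col = "red", lwd = 2)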
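# Bedrooms vs Price (Beds is the response; use Price ~ Beds to model price)
lm_model <- lm(Beds ~ Price, data = House)
summary(lm_model)
# Bathrooms vs Price (Baths is the response; use Price ~ Baths to model price)
lm_model <- lm(Baths ~ Price, data = House)
summary(lm_model)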
# Multiple regression
mlr_model <- lm(Price ~ Size + Beds + Baths, data = House)
summary(mlr_model)
# State analysis
House_all <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(House_all)
str(House_all)
state_model <- lm(Price ~ State, data = House_all)
summary(state_model)
anova(state_model)
# Tukey HSD test
posthoc_tukey <- TukeyHSD(aov(state_model))
print(posthoc_tukey)
# Residual analysis
state_resid <- resid(state_model)  # named to avoid masking base residuals()
plot(fitted(state_model), state_resid,
     main = "Residuals vs. Fitted",
     xlab = "Fitted Values", ylab = "Residuals")
abline(h = 0, col = "red")
hist(state_resid, main = "Histogram of Residuals", xlab = "Residuals")