The price of a home is influenced by a combination of physical features and geographic factors. Understanding how these characteristics relate to housing prices provides valuable insight for buyers, sellers, and policymakers. In this project, I analyze the HomesForSale dataset, which contains information on 120 homes listed for sale in 2019 across four U.S. states: California (CA), New Jersey (NJ), New York (NY), and Pennsylvania (PA). The data were originally collected from Zillow. Key variables include the asking price (in thousands of dollars), the size of the home (in square feet), and the number of bedrooms and bathrooms.
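Specifically, I examine two questions. First, how are size, number of bedrooms, and number of bathrooms related to asking price for homes in California, both individually and together? Second, do mean asking prices differ among the four states?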
house <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")  # Price in $1,000s, Size in sq ft
head(house)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
california_data <- subset(house, State == "CA")  # keep only the California listings
head(california_data)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
qn1 <- lm(Price ~ Size, data = california_data)  # simple regression: price on size
summary(qn1)
##
## Call:
## lm(formula = Price ~ Size, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
plot(california_data$Size, california_data$Price, main = "Price vs. Size (CA)", xlab = "Size (sq ft)", ylab = "Price ($1,000s)")
abline(qn1, col = "red", lwd = 2)  # add the fitted least-squares line
I fitted a simple linear regression to examine how the size of a home relates to its price in California. The estimated slope for Size is 0.33919, with a p-value of 0.000463. Price is measured in thousands of dollars and Size in square feet. On average, then, each additional 1,000 square feet is associated with an asking price about $339,190 higher, or roughly $339 per square foot. Because the p-value is well below 0.05, this positive relationship is statistically significant. Size alone explains about 36% of the variation in California asking prices (R² = 0.3594).
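The unit conversion and the uncertainty around the slope can be checked by hand. The sketch below hardcodes the coefficients printed by `summary(qn1)` for the size model, so it runs without downloading the data. It does four things:

- converts the slope to dollars per 1,000 square feet;
- computes the fitted price for a hypothetical 2,000 sq ft home (that home is an illustrative assumption, not a listing from the data);
- builds a 95% t-based confidence interval for the slope;
- prints the results.

While `qn1` still holds the size model, `confint(qn1)` should return essentially the same interval from the fitted object.

```r
# Values copied from the summary(qn1) output for Price ~ Size (CA only).
# Units: Price in $1,000s, Size in square feet.
b0       <- -56.81675  # intercept
b1       <- 0.33919    # slope: $1,000s per square foot
se_b1    <- 0.08558    # standard error of the slope
df_resid <- 28         # residual degrees of freedom (30 homes - 2 parameters)

# Slope expressed in dollars per additional 1,000 square feet
per_1000_sqft_dollars <- b1 * 1000 * 1000

# Fitted asking price (in $1,000s) for a hypothetical 2,000 sq ft home
pred_2000 <- b0 + b1 * 2000

# 95% confidence interval for the slope: estimate +/- t* x SE
t_crit <- qt(0.975, df_resid)
ci_b1  <- b1 + c(-1, 1) * t_crit * se_b1

cat("Change per 1,000 sq ft: $", round(per_1000_sqft_dollars), "\n")
cat("Fitted price, 2,000 sq ft:", round(pred_2000, 1), "thousand dollars\n")
cat("95% CI for slope:", round(ci_b1, 4), "\n")
```

The interval runs from roughly 0.164 to 0.514, or about $164,000 to $514,000 per additional 1,000 sq ft. It excludes zero, which agrees with the significant t-test.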
qn1 <- lm(Price ~ Beds, data = california_data)  # simple regression: price on bedrooms
summary(qn1)
##
## Call:
## lm(formula = Price ~ Beds, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
plot(california_data$Beds, california_data$Price, main = "Price vs. Bedrooms (CA)", xlab = "Bedrooms", ylab = "Price ($1,000s)")
abline(qn1, col = "green", lwd = 2)  # add the fitted least-squares line
The estimated slope for Beds is 84.77, with a p-value of 0.255. There is therefore no statistically significant evidence of a linear relationship between the number of bedrooms and home prices in California. Bedrooms explain only about 4.6% of the variation in price (R² = 0.046), so by themselves they are a weak predictor of price in this dataset.
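The t value and p-value in the Beds row follow directly from the estimate and its standard error. A short check using the printed (rounded) numbers:

```r
# Values copied from the summary output for Price ~ Beds (CA only)
est_beds <- 84.77  # slope: $1,000s per additional bedroom
se_beds  <- 72.91  # standard error of the slope
df_resid <- 28     # residual degrees of freedom

# t statistic for H0: slope = 0
t_beds <- est_beds / se_beds

# Two-sided p-value from the t distribution with 28 df
p_beds <- 2 * pt(-abs(t_beds), df_resid)

# With a single predictor, the overall F statistic equals t squared
f_beds <- t_beds^2

cat("t =", round(t_beds, 3), " p =", round(p_beds, 3), " F =", round(f_beds, 3), "\n")
```

In a simple regression, the overall F-test and the slope's t-test are the same test. That is why the reported F (1.352) is the square of t and both share the p-value of about 0.255.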
qn1 <- lm(Price ~ Baths, data = california_data)  # simple regression: price on bathrooms
summary(qn1)
##
## Call:
## lm(formula = Price ~ Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
plot(california_data$Baths, california_data$Price, main = "Price vs. Bathrooms (CA)", xlab = "Bathrooms", ylab = "Price ($1,000s)")
abline(qn1, col = "pink", lwd = 2)  # add the fitted least-squares line
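There is statistically significant evidence that the number of bathrooms is associated with home prices in California (slope = 194.74, p = 0.0041). On average, each additional bathroom corresponds to an asking price about $194,740 higher. Bathrooms explain about 26% of the variation in price (R² = 0.2588). That makes them a much stronger predictor than bedrooms (R² = 0.046) in this dataset, though still weaker than size (R² = 0.3594).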
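To make the bathroom slope concrete, the sketch below plugs a few bathroom counts from the data (including a half bath) into the fitted line. The intercept and slope are hardcoded from the output above. The results are fitted averages from the simple regression, not predictions for any particular listing.

```r
# Coefficients copied from the summary output for Price ~ Baths (CA only)
b0_baths <- 90.71   # intercept, $1,000s
b1_baths <- 194.74  # slope, $1,000s per bathroom

# Bathroom counts of the kind seen in the data, including a half bath
baths <- c(2, 2.5, 3)

# Fitted asking prices (in $1,000s) from the simple regression line
fitted_price <- b0_baths + b1_baths * baths
names(fitted_price) <- paste(baths, "baths")
print(fitted_price)
```

Each half bathroom moves the fitted price by 97.37 thousand dollars, half the per-bathroom slope.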
qn1 <- lm(Price ~ Size + Beds + Baths, data = california_data)  # multiple regression with all three predictors
summary(qn1)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
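When all three variables are considered together, home size is the only characteristic that significantly predicts price in California (Size p = 0.026; Beds p = 0.624; Baths p = 0.284). Bedrooms and bathrooms do not have significant independent effects once the size of the home is taken into account. This is likely because larger homes also tend to have more bedrooms and bathrooms, so much of the information those variables carry overlaps with size.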
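The same conclusion can be framed as one joint test: do Beds and Baths together improve on the size-only model? The sketch below computes the partial (nested-model) F-test from the two R² values printed above, so the result is approximate because those values are rounded.

The report reuses the name `qn1`, so both models are never stored at once. Suppose instead they were kept under separate names, such as the hypothetical `fit_size` and `fit_all`. Then `anova(fit_size, fit_all)` would run the same test from the residual sums of squares.

```r
# R-squared values copied from the output above (CA only, n = 30)
r2_reduced <- 0.3594  # Price ~ Size
r2_full    <- 0.3912  # Price ~ Size + Beds + Baths
q          <- 2       # number of extra predictors tested (Beds, Baths)
df_full    <- 26      # residual df of the full model: 30 - 4 parameters

# Partial F statistic: gain in R^2 per added predictor, relative to the
# unexplained variation per residual degree of freedom in the full model
f_partial <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_partial <- pf(f_partial, q, df_full, lower.tail = FALSE)

cat("Partial F =", round(f_partial, 3), " p =", round(p_partial, 3), "\n")
```

The result is an F of roughly 0.68 with a p-value around 0.5. Bedrooms and bathrooms together add little explanatory power once size is in the model. This is consistent with the drop in adjusted R² from 0.3365 (size only) to 0.3209 (all three).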
anova_model <- aov(Price ~ State, data = house)  # one-way ANOVA of price across the four states
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (0.000148) is far below the 0.05 significance level, so we reject the null hypothesis that all four states have the same mean asking price. At least one state has a mean home price that differs from the others. In other words, the state in which a home is located is significantly associated with its price. The ANOVA by itself does not identify which states differ.
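The F value in the ANOVA table is the ratio of between-state to within-state mean squares. The sketch below reproduces it from the printed sums of squares. It also computes eta-squared, the share of total price variation attributable to State.

```r
# Sums of squares and degrees of freedom copied from summary(anova_model)
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F compares between-state variation to within-state variation
f_state <- ms_state / ms_resid
p_state <- pf(f_state, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: proportion of total variation in price explained by State
eta_sq <- ss_state / (ss_state + ss_resid)

cat("F =", round(f_state, 3), " p =", signif(p_state, 3),
    " eta^2 =", round(eta_sq, 3), "\n")
```

An eta-squared of about 0.16 suggests that State accounts for roughly 16% of the variation in asking prices across all 120 homes. To see which pairs of states differ, `TukeyHSD(anova_model)` in base R gives pairwise comparisons with adjusted p-values.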
This analysis found that, in California, home size is the strongest predictor of price. It is also the only significant predictor when all three variables are considered together. Bedrooms showed no significant effect, and bathrooms were significant only in a simple regression. The multiple regression model explained about 39% of the variation in prices (adjusted R² = 0.32). Across all four states, ANOVA showed that mean home prices differ significantly among California, New York, New Jersey, and Pennsylvania, indicating that location plays an important role in home value.
Lock, R., Lock, P., Lock, E., & Lock, K. (2019). HomesForSale data set. Lock5 Data. Retrieved from https://www.lock5stat.com/datapage3e.html