This report presents an analysis of the “HomesForSale” dataset obtained from Lock5 Datasets, Third Edition. This dataset provides information on homes listed for sale in several states, including the state the home is located in, the home’s price, its size (in square feet), number of beds, and number of baths.
The objective of this analysis is to investigate how various factors relate to home prices. Specifically, we will examine four questions related to California homes, and then one question comparing differences between California, New York, New Jersey, and Pennsylvania homes.
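For California homes:
1. How much does the size of a home influence its price?
2. How does the number of beds influence the price of a home?
3. How does the number of baths influence the price of a home?
4. How do size, number of beds, and number of baths jointly influence home prices?
For California, New York, New Jersey, and Pennsylvania homes:
5. Are there significant differences in home prices among the four states?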
By addressing these questions, we aim to determine the importance of home size, number of bedrooms, number of bathrooms, and the state in which the home is located to explain variations in home prices.
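According to the Lock5 Data Guide, Third Edition, the “HomesForSale” dataset contains 120 observations (30 homes from each of four states) collected from www.zillow.com in 2019, with the following variables:
State: the state in which the home is located (CA, NJ, NY, or PA)
Price: the home’s price, in thousands of dollars
Size: the home’s size, in square feet
Beds: the number of bedrooms
Baths: the number of bathrooms
These five variables will be analyzed for patterns.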
## 'data.frame': 120 obs. of 5 variables:
## $ State: chr "CA" "CA" "CA" "CA" ...
## $ Price: int 533 610 899 929 210 268 1095 699 729 700 ...
## $ Size : int 1589 2008 2380 1868 1360 2131 2436 1375 2013 1371 ...
## $ Beds : int 3 3 5 3 2 3 3 2 3 3 ...
## $ Baths: num 2.5 2 3 3 2 2 2 1 4 2 ...
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
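As a small check, the preview above can be rebuilt by hand. The sketch below uses only the six rows printed by head(), not the full dataset. The object name homes_preview is hypothetical; california_data matches the name that appears in the model calls later in this report. On the full dataset, the same subset() call would keep only the California rows.

```r
# Six rows copied from the head() output above (not the full dataset).
homes_preview <- data.frame(
  State = rep("CA", 6),
  Price = c(533L, 610L, 899L, 929L, 210L, 268L),        # thousands of dollars
  Size  = c(1589L, 2008L, 2380L, 1868L, 1360L, 2131L),  # square feet
  Beds  = c(3L, 3L, 5L, 3L, 2L, 3L),
  Baths = c(2.5, 2, 3, 3, 2, 2),
  stringsAsFactors = FALSE
)

# Keep only California homes; on the full data this would drop the
# New York, New Jersey, and Pennsylvania rows.
california_data <- subset(homes_preview, State == "CA")

str(california_data)
```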
All analyses were conducted in R using Posit Cloud. We will use regression and ANOVA to do our analysis.
Regression Models:
We will use lm() to fit linear models. For single predictor
models, we will interpret the slope and the corresponding p-value to
determine the significance of the relationship. For multiple regression,
we will examine the p-values of each predictor’s slope in the presence
of others.
ANOVA:
To determine whether state location influences home price, we will use
an ANOVA test with aov().
The threshold for statistical significance was set at 0.05: p-values at or below 0.05 were considered statistically significant, and p-values above it were not.
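The workflow described above (fit with lm(), read the slope and its p-value, compare against the threshold) might look like the following minimal sketch. It uses only the six preview rows shown earlier, so its numbers will not match the full 30-home California results reported below. The names ca_preview, alpha, and is_significant are illustrative assumptions.

```r
# Six California rows from the data preview (illustration only).
ca_preview <- data.frame(
  Price = c(533, 610, 899, 929, 210, 268),       # thousands of dollars
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131)  # square feet
)

alpha <- 0.05  # significance threshold used throughout the report

# Simple linear regression: Price as response, Size as predictor.
fit <- lm(Price ~ Size, data = ca_preview)

# The coefficient table holds the estimate, standard error, t value, and p-value.
coef_table <- summary(fit)$coefficients
slope   <- coef_table["Size", "Estimate"]
p_value <- coef_table["Size", "Pr(>|t|)"]

# Apply the decision rule: significant when the p-value is at or below alpha.
is_significant <- p_value <= alpha

cat("Slope:", slope, " p-value:", p_value, " significant:", is_significant, "\n")
```

The same pattern extends to multiple regression by adding predictors to the formula; the coefficient table then has one row per predictor.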
Research Question: How much does the size of a home influence its price for homes located in California?
Approach: Fit a simple linear regression with
Price as the response and Size as the
predictor, using only California homes.
##
## Call:
## lm(formula = Price ~ Size, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
## `geom_smooth()` using formula = 'y ~ x'
The resulting slope from the regression model is 0.33919, indicating an average increase in price as the size of the home increases. The p-value of 0.0004634 is below our threshold of 0.05, showing statistical significance.
This regression model shows that the price of a California home is expected to increase by about 0.339 thousand dollars (roughly $339) per additional square foot.
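The figures in the summary output can be checked by hand. The sketch below recomputes the t statistic and p-value for the Size slope from the reported estimate and standard error, adds a 95% confidence interval, and predicts the price of a hypothetical 2,000-square-foot home. The confidence interval and the 2,000-square-foot example are not part of the original analysis; they are included only to illustrate how the coefficients are read.

```r
# Values copied from the regression summary above.
intercept <- -56.81675
estimate  <- 0.33919   # slope for Size (thousand dollars per square foot)
std_error <- 0.08558
df_resid  <- 28        # residual degrees of freedom

# t statistic and two-sided p-value for the slope.
t_stat  <- estimate / std_error
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# 95% confidence interval for the slope.
ci_95 <- estimate + c(-1, 1) * qt(0.975, df = df_resid) * std_error

# Predicted price (thousand dollars) for a hypothetical 2,000 sq ft home.
predicted_2000 <- intercept + estimate * 2000

cat("t =", round(t_stat, 3), " p =", signif(p_value, 3), "\n")
cat("95% CI for slope:", round(ci_95, 4), "\n")
cat("Predicted price at 2000 sq ft:", round(predicted_2000, 1), "thousand dollars\n")
```

The interval lies entirely above zero, which agrees with the significant p-value.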
Research Question: How does the number of beds influence the price of a home in California?
Approach: Fit a simple linear regression model:
Price ~ Beds
##
## Call:
## lm(formula = Price ~ Beds, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
## `geom_smooth()` using formula = 'y ~ x'
The slope estimate for Beds is approximately 84.77,
suggesting that each additional bed is associated with an average
price increase of about 84.77 thousand dollars. However, the
p-value of 0.255 is above our significance threshold of 0.05,
so this result is not statistically significant.
This regression model shows a positive association between the number of beds and price of homes in California, but the results are not statistically significant.
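One way to see why the Beds result is inconclusive is to build a 95% confidence interval for the slope from the reported estimate and standard error. This interval is not part of the original analysis; it is a sketch added for illustration, and the variable names are assumptions.

```r
# Values copied from the Price ~ Beds summary above.
estimate  <- 84.77   # slope for Beds (thousand dollars per bed)
std_error <- 72.91
df_resid  <- 28

# 95% confidence interval: estimate plus or minus t-quantile times standard error.
margin <- qt(0.975, df = df_resid) * std_error
ci_95  <- c(lower = estimate - margin, upper = estimate + margin)

# If the interval contains zero, a slope of zero cannot be ruled out at the 5% level.
contains_zero <- ci_95[["lower"]] < 0 && ci_95[["upper"]] > 0

print(round(ci_95, 2))
cat("Interval contains zero:", contains_zero, "\n")
```

Because the interval spans both negative and positive values, the data are compatible with no linear relationship between beds and price.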
Research Question: How does the number of baths influence the price of a home in California?
Approach: Fit a simple linear regression model:
Price ~ Baths
##
## Call:
## lm(formula = Price ~ Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
## `geom_smooth()` using formula = 'y ~ x'
The slope for Baths is approximately 194.74, showing
that for each additional bathroom, on average, the price of a home in
California is expected to increase by about 194.74 thousand dollars. The
p-value of 0.00409 is well below the threshold of 0.05, indicating that
this is statistically significant.
This regression model provides evidence that homes with more baths tend to have higher prices in California.
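To compare the three single-predictor models side by side, the reported slopes, p-values, and R-squared values can be collected in one table. The sketch below only rearranges numbers already printed above; the data frame name simple_models is an assumption.

```r
# Figures copied from the three simple regression summaries above.
simple_models <- data.frame(
  predictor = c("Size", "Beds", "Baths"),
  slope     = c(0.33919, 84.77, 194.74),   # thousand dollars per unit of predictor
  p_value   = c(0.000463, 0.255, 0.00409),
  r_squared = c(0.3594, 0.04605, 0.2588),
  stringsAsFactors = FALSE
)

# Flag models whose slope is significant at the 0.05 threshold.
simple_models$significant <- simple_models$p_value <= 0.05

# Order by share of price variation explained, largest first.
simple_models <- simple_models[order(-simple_models$r_squared), ]

print(simple_models, row.names = FALSE)
```

The slopes are not directly comparable because the predictors are measured in different units (square feet versus counts of rooms); R-squared gives a unit-free comparison of how much price variation each model explains.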
Research Question: How do size, number of beds, and number of baths jointly influence home prices in California?
Approach: Fit a multiple linear regression model:
Price ~ Size + Beds + Baths
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = california_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
In this multiple regression model, we consider the combined effects
of Size, Beds, and Baths on
Price.
The p-value for the overall model (F-statistic) is about 0.004353,
which is below the threshold of 0.05. This shows that collectively,
Size, Beds, and Baths have a
statistically significant relationship with Price.
Size has a slope estimate of about 0.2811 with a p-value
of 0.0259, which is below the threshold of 0.05. This suggests that even
when controlling for Beds and Baths,
Size remains a statistically significant predictor of
Price. On average, holding Beds and Baths fixed, each additional square foot is associated with
an increase of about 0.2811 thousand dollars (roughly $281) in home price.
Beds and Baths are not statistically
significant in this model. This could mean that their relationship with
Price is explained by Size or that the dataset
does not provide strong evidence of their independent effects once
Size is taken into account.
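The summary figures for the multiple regression can be reproduced from the reported R-squared values and degrees of freedom. The sketch below recomputes the adjusted R-squared and the overall F-test p-value, then runs a partial F-test of whether Beds and Baths add explanatory power beyond Size alone. The partial F-test is not part of the original analysis; it is an illustration, and because it is computed from rounded R-squared values its result is approximate.

```r
n       <- 30      # California homes (26 residual df + 4 coefficients)
k_full  <- 3       # predictors in the full model: Size, Beds, Baths
r2_full <- 0.3912  # Multiple R-squared, full model
r2_size <- 0.3594  # Multiple R-squared, Size-only model

df_resid_full <- n - k_full - 1

# Adjusted R-squared penalizes the number of predictors.
adj_r2_full <- 1 - (1 - r2_full) * (n - 1) / df_resid_full

# Overall F-test p-value from the reported F-statistic.
overall_p <- pf(5.568, df1 = k_full, df2 = df_resid_full, lower.tail = FALSE)

# Partial F-test: do the q = 2 extra predictors (Beds, Baths) improve on Size alone?
q <- 2
partial_f <- ((r2_full - r2_size) / q) / ((1 - r2_full) / df_resid_full)
partial_p <- pf(partial_f, df1 = q, df2 = df_resid_full, lower.tail = FALSE)

cat("Adjusted R-squared:", round(adj_r2_full, 4), "\n")
cat("Overall F-test p-value:", signif(overall_p, 4), "\n")
cat("Partial F:", round(partial_f, 3), " p-value:", round(partial_p, 3), "\n")
```

The adjusted R-squared of the full model (0.3209) is slightly lower than that of the Size-only model (0.3365), and the partial F-test p-value is well above 0.05. Both are consistent with the observation that Beds and Baths add little once Size is in the model.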
Research Question: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?
Approach: Use ANOVA to compare mean prices across states.
The null hypothesis is that all states have the same mean price. The alternative hypothesis is that at least one state’s mean price differs from the others.
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA results have a p-value of 0.000148, which is far below the threshold of 0.05, showing that we have statistically significant evidence to reject the null hypothesis and conclude that not all states have the same mean home price.
This model contributes evidence that there are significant differences in the average home prices between California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA), supporting the alternative hypothesis.
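The F value and p-value in the ANOVA table follow directly from the sums of squares and degrees of freedom. The sketch below redoes that arithmetic using only the figures printed in the table; the variable names are assumptions.

```r
# Figures copied from the ANOVA table above.
ss_state <- 1198169; df_state <- 3     # between-state variation
ss_resid <- 6299266; df_resid <- 116   # within-state (residual) variation

# Mean squares: sum of squares divided by degrees of freedom.
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F statistic: ratio of between-group to within-group mean squares.
f_value <- ms_state / ms_resid

# Upper-tail probability under the F distribution gives the p-value.
p_value <- pf(f_value, df1 = df_state, df2 = df_resid, lower.tail = FALSE)

cat("F =", round(f_value, 3), " p =", signif(p_value, 3), "\n")
```

An F value well above 1 means that prices vary more between states than would be expected from the variation within states. Note that the test does not indicate which particular states differ from one another.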
In this report, we conducted statistical analysis to address five research questions concerning the relationship between home characteristics and prices in California, as well as differences in home prices among four states.
Size and Price (CA):
The regression analysis of California homes found a statistically
significant positive relationship between home size and price. On
average, each additional square foot increased the price by about 0.339
thousand dollars.
Beds and Price (CA):
Although the regression suggested that each additional bedroom might
increase price by about 84.77 thousand dollars, the result was not
statistically significant. Thus, we cannot conclude that the number of
beds independently influences home price in California.
Baths and Price (CA):
The analysis showed a statistically significant positive relationship
between the number of bathrooms and price. Each additional bathroom was
associated with an increase of about 194.74 thousand dollars in home
price.
Multiple Regression (CA):
When jointly considering size, beds, and baths, the overall model was
statistically significant. However, only size remained statistically
significant when controlling for the other variables, indicating that
the size of the home is the primary driver of price differences among
these three predictors.
State Differences in Price (CA, NY, NJ, PA):
The ANOVA revealed significant differences in mean home prices among the
four states. This suggests that the state in which a home is located has
a significant impact on its price.
These results provide insights into how factors such as home size, number of bedrooms, number of bathrooms, and geographical location can affect home prices.
This analysis utilized regression models and ANOVA to explore how size, bedrooms, bathrooms, and state location relate to home prices. The results provide understanding of patterns in the housing market that influence prices, which can inform buyers, sellers, and market analysts.
Lock, Robin H., et al. “HomesForSale Dataset.” Lock5 Datasets, Third Edition, https://www.lock5stat.com/datapage3e.html. Accessed 8 Dec. 2024.
Lock, Robin H., et al. Lock5 Data Guide, Third Edition. Lock5Stat, https://www.lock5stat.com/datasets3e/Lock5DataGuide3e.pdf. Accessed 8 Dec. 2024.