This analysis seeks to understand how home characteristics (size, bedrooms, bathrooms) affect pricing in California and to evaluate price differences across four states.
This study has two objectives: to quantify the individual and joint contributions of home size, number of bedrooms, and number of bathrooms to price in California, and to test whether home prices differ across four states (CA, NY, NJ, PA).
The analysis and visualizations are carried out in R. The findings are intended to put the influence of home attributes and location on pricing in context for potential buyers, sellers, and policymakers.
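The following research questions guide this report:
- How does home size relate to price in California?
- How does the number of bedrooms relate to price in California?
- How does the number of bathrooms relate to price in California?
- How do size, bedrooms, and bathrooms jointly relate to price in California?
- Do home prices differ across the four states (CA, NY, NJ, PA)?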
This section addresses each question in turn using the “HomesForSale” dataset, in which Price is recorded in thousands of dollars and Size in square feet. We examine how home size, the number of bedrooms, and the number of bathrooms relate to price in California, and then compare prices across states.
Simple and multiple linear regression quantify the relationships between home characteristics and price, one-way ANOVA tests for price differences across states, and a plot accompanies each analysis except the multiple regression.
home <- read.csv("HomesForSale.csv")
head(home)
## State Price Size Beds Baths
## 1 CA 533 1589 3 2.5
## 2 CA 610 2008 3 2.0
## 3 CA 899 2380 5 3.0
## 4 CA 929 1868 3 3.0
## 5 CA 210 1360 2 2.0
## 6 CA 268 2131 3 2.0
# Subset for California homes only
home_CA <- subset(home, State == "CA")
# Fit a linear regression model
model_size <- lm(Price ~ Size, data = home_CA)
# Summary of the model
summary(model_size)
##
## Call:
## lm(formula = Price ~ Size, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -462.55 -139.69 39.24 147.65 352.21
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675 154.68102 -0.367 0.716145
## Size 0.33919 0.08558 3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
# Scatterplot with regression line
plot(home_CA$Size, home_CA$Price,
     main = "Relationship Between Home Size and Price in California",
     xlab = "Home Size (sq. ft.)",
     ylab = "Price ($1,000)",
     pch = 16,
     col = "blue")
# Add regression line
abline(model_size, col = "red", lwd = 2)
The scatterplot above illustrates the relationship between home size (in square feet) and price (in thousands of dollars) for homes in California, with a red line representing the fitted regression model.
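Insights:
- The slope estimate (0.33919) indicates that each additional square foot is associated with an increase of approximately $339 in price, since Price is recorded in $1,000s.
- The residual standard error (219.3) indicates that observed prices typically deviate from the fitted line by roughly $219,300.
- The R-squared value (0.3594) indicates that 35.94% of the variation in home prices is explained by home size.
- The p-value (0.000463) for the slope coefficient indicates that the relationship between home size and price is statistically significant at the 0.001 level.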
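To make the units of the fitted line concrete, the sketch below converts the reported slope into dollars and computes a predicted price for one home. The coefficients are copied from the summary output above; the helper `predict_price_dollars` and the 2,000 sq. ft. example home are illustrative assumptions, not part of the original analysis.

```r
# Coefficients copied from the summary(model_size) output above.
intercept <- -56.81675   # in $1,000s
slope     <- 0.33919     # $1,000s per square foot

# Hypothetical helper (not in the original analysis):
# predicted price in dollars for a home of a given size.
predict_price_dollars <- function(size_sqft) {
  (intercept + slope * size_sqft) * 1000
}

slope_dollars <- slope * 1000               # dollars per additional square foot
price_2000    <- predict_price_dollars(2000) # illustrative 2,000 sq. ft. home

print(slope_dollars)
print(price_2000)
```

With the fitted object available, `predict(model_size, newdata = data.frame(Size = 2000))` should give the same estimate in $1,000s. Predictions are only reasonable within the range of sizes observed in the sample.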
# Fit the regression model for Price vs. Number of Bedrooms
model_bedrooms <- lm(Price ~ Beds, data = home_CA)
# Summary of the model
summary(model_bedrooms)
##
## Call:
## lm(formula = Price ~ Beds, data = home_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413.83 -236.62 29.94 197.69 570.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 269.76 233.62 1.155 0.258
## Beds 84.77 72.91 1.163 0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
# Plotting the relationship between Bedrooms and Price
plot(home_CA$Beds, home_CA$Price, main="Home Price vs. Number of Bedrooms",
xlab="Number of Bedrooms", ylab="Price ($1,000)", pch=19, col="blue")
abline(model_bedrooms, col="red")
The graph above illustrates the relationship between the number of bedrooms and home prices, showing the observed data points along with the fitted regression line.
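Insights:
- The slope estimate (84.77) corresponds to approximately $84,770 per additional bedroom, but its standard error (72.91) is nearly as large as the estimate itself.
- The residual standard error (267.6) indicates that observed prices typically deviate from the fitted line by roughly $267,600.
- The R-squared value (0.04605) indicates that only 4.61% of the variation in home prices is explained by the number of bedrooms.
- The p-value (0.255) for the slope coefficient indicates that the relationship between the number of bedrooms and price is not statistically significant at the 0.05 level.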
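The t value and p-value in the Beds row follow directly from the estimate and its standard error. The sketch below recomputes them from the rounded values printed in the summary, so small rounding differences from the reported figures are expected; the variable names are illustrative.

```r
# Values copied from the Beds row of summary(model_bedrooms) above.
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28   # residual degrees of freedom reported in the summary

# Test statistic for H0: slope = 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

print(round(t_value, 3))
print(round(p_value, 3))
```

Because the p-value is well above 0.05, the data do not rule out a slope of zero for bedrooms.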
# Subset the data for California homes
home_CA_baths <- home[home$State == "CA", ]
# Fit the regression model
model_baths <- lm(Price ~ Baths, data = home_CA_baths)
# View the summary of the model
summary(model_baths)
##
## Call:
## lm(formula = Price ~ Baths, data = home_CA_baths)
##
## Residuals:
## Min 1Q Median 3Q Max
## -374.93 -181.56 -2.74 152.31 614.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 90.71 148.57 0.611 0.54641
## Baths 194.74 62.28 3.127 0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
# Plot bathrooms against price (blue points) and add the fitted regression line (red)
plot(home_CA_baths$Baths, home_CA_baths$Price,
xlab = "Number of Bathrooms", ylab = "Price (in $1,000)",
main = "Number of Bathrooms vs. Home Price", col = "blue", pch = 16)
abline(model_baths, col = "red")
The scatter plot shows the relationship between the number of bathrooms
and home price for homes in California, with a red regression line
indicating the trend. The plot provides a visual representation of how
the number of bathrooms relates to home prices in the dataset.
Insights:
- The slope estimate (194.74) indicates that each additional bathroom is associated with an increase of approximately $194,740 in price, since Price is recorded in $1,000s.
- The residual standard error (235.8) indicates that observed prices typically deviate from the fitted line by roughly $235,800, a sizeable prediction error for homes priced in the hundreds of thousands of dollars.
- The R-squared value (0.2588) indicates that 25.88% of the variation in home prices is explained by the number of bathrooms, a stronger relationship than for bedrooms (4.61%) but a weaker one than for home size (35.94%).
- The p-value (0.00409) for the slope coefficient indicates that the relationship between the number of bathrooms and home price is statistically significant at the 0.01 level.
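A confidence interval gives a sense of how precisely the bathroom slope is estimated. The sketch below builds an approximate 95% interval from the rounded estimate and standard error printed in the summary; it is an illustration under those rounded inputs, and the variable names are assumptions.

```r
# Values copied from the Baths row of summary(model_baths) above.
estimate  <- 194.74   # $1,000s per additional bathroom
std_error <- 62.28
df_resid  <- 28       # residual degrees of freedom reported in the summary

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval in $1,000s, then converted to dollars
ci         <- estimate + c(-1, 1) * t_crit * std_error
ci_dollars <- ci * 1000

print(round(ci, 1))
```

With the fitted object available, `confint(model_baths)` should return the same interval up to rounding. The interval excludes zero, consistent with the significant p-value, but it is wide, so the per-bathroom figure should be read as a rough estimate.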
# Subset the data for California homes
home_CA_multiple <- home[home$State == "CA", ]
# Fit the multiple regression model
model_multiple <- lm(Price ~ Size + Beds + Baths, data = home_CA_multiple)
# View the summary of the model
summary(model_multiple)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = home_CA_multiple)
##
## Residuals:
## Min 1Q Median 3Q Max
## -415.47 -130.32 19.64 154.79 384.94
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608 210.3809 -0.198 0.8449
## Size 0.2811 0.1189 2.364 0.0259 *
## Beds -33.7036 67.9255 -0.496 0.6239
## Baths 83.9844 76.7530 1.094 0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
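One way to check whether Beds and Baths add anything beyond Size is a partial F-test comparing the size-only model with the three-predictor model. The sketch below computes that test from the rounded R-squared values reported in the two summaries; it is an approximation under those rounded inputs, and the variable names are illustrative.

```r
# R-squared values copied from the two model summaries above.
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths
q          <- 2        # predictors added to the reduced model (Beds, Baths)
df_resid   <- 26       # residual degrees of freedom of the full model

# Partial F statistic for H0: the Beds and Baths coefficients are both zero
f_stat  <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
p_value <- pf(f_stat, q, df_resid, lower.tail = FALSE)

print(round(f_stat, 3))
print(round(p_value, 3))
```

With the fitted objects available, `anova(model_size, model_multiple)` performs the same comparison. The large p-value suggests that, once size is in the model, bedrooms and bathrooms together do not significantly improve the fit, which is consistent with the drop in adjusted R-squared from 0.3365 to 0.3209.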
# Subset the data for the four states
home_states <- home[home$State %in% c("CA", "NY", "NJ", "PA"), ]
# Perform ANOVA to compare home prices across states
anova_model <- aov(Price ~ State, data = home_states)
# View the summary of the ANOVA model
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## State 3 1198169 399390 7.355 0.000148 ***
## Residuals 116 6299266 54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Boxplot to visualize the differences in home prices across states
boxplot(Price ~ State, data = home_states,
main = "Home Prices by State",
xlab = "State", ylab = "Price (in $1,000)",
col = c("lightblue", "lightgreen", "lightpink", "lightyellow"))
The boxplot illustrates the distribution of home prices across the
states of California, New York, New Jersey, and Pennsylvania, showing
the median, interquartile range, and potential outliers for each
state.
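The F statistic in the ANOVA table can be reproduced from its sums of squares, and the same numbers give a simple effect size (eta-squared, the share of price variation attributable to state). The sketch below uses the values printed in the table; eta-squared is an added illustration and was not part of the original analysis.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares and F statistic
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid
f_stat   <- ms_state / ms_resid
p_value  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: proportion of total variation in price explained by state
eta_sq <- ss_state / (ss_state + ss_resid)

print(round(f_stat, 3))
print(round(eta_sq, 3))
```

A significant F test indicates only that at least one state's mean price differs from the others. Identifying which pairs differ would need a follow-up such as `TukeyHSD(anova_model)`, assuming State is stored as a factor when the model is fitted.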
This analysis explored factors influencing home prices in California and compared home prices across four states: California (CA), New York (NY), New Jersey (NJ), and Pennsylvania (PA). Linear regression was used to examine how size, number of bedrooms, and number of bathrooms relate to price in California, and ANOVA was used to compare prices across states. The findings are summarized as follows:
Home Size and Price in California: A significant positive relationship was found between home size and price in California. Because Price is recorded in thousands of dollars, the slope of 0.339 means that each additional square foot is associated with an increase of approximately $339 in price. The relationship was statistically significant (p-value = 0.0005), and the R-squared value of 0.3594 indicates that home size explains about 36% of the variation in home prices.
Number of Bedrooms and Price in California: The relationship between the number of bedrooms and price was weak and not statistically significant. The estimated slope corresponds to approximately $84,770 per additional bedroom, but the low R-squared value (4.61%) and the high p-value (0.255) mean that this estimate cannot be distinguished from zero and that the number of bedrooms explains very little of the variation in home prices.
Number of Bathrooms and Price in California: A stronger relationship was found between the number of bathrooms and home price. For every additional bathroom, the price increases by approximately $194,740. This relationship was statistically significant (p-value = 0.004), with an R-squared value of 25.88%, indicating that the number of bathrooms is a meaningful predictor of home prices.
Joint Impact of Size, Bedrooms, and Bathrooms on Price: When home size, number of bedrooms, and number of bathrooms were modeled together, home size was the only statistically significant predictor (p-value = 0.026), with a coefficient of 0.2811, or approximately $281 per additional square foot ($281,100 per 1,000 square feet), holding bedrooms and bathrooms constant. Neither the number of bedrooms (p-value = 0.624) nor the number of bathrooms (p-value = 0.284) was statistically significant once size was in the model, and the adjusted R-squared (0.3209) is slightly below that of the size-only model (0.3365), indicating that these two variables add little explanatory power beyond home size.
Price Differences Across States: ANOVA testing revealed significant differences in home prices across the four states of CA, NY, NJ, and PA (F = 7.355 on 3 and 116 degrees of freedom, p-value = 0.000148). The boxplots illustrate the variation in home prices among the states. The ANOVA indicates only that at least one state's mean price differs from the others, not which states differ.
Conclusion: Home size and the number of bathrooms are each significantly associated with home prices in California when considered individually, although only size remains significant when all three characteristics are modeled together. The number of bedrooms showed no significant relationship with price in either the simple or the multiple regression. Home prices also differ significantly among the four states, suggesting that location plays an important role in determining home prices, although the ANOVA does not identify which states differ. These insights can assist home buyers, sellers, and real estate professionals in making informed decisions based on home attributes and location.
Lock5Stat. (n.d.). HomesForSale dataset. Retrieved from https://www.lock5stat.com/datapage3e.html
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Retrieved from https://www.r-project.org.