Understanding what drives home prices matters to buyers, sellers, and policymakers in California, which has one of the most active and varied housing markets in the country. This chapter examines a sample of homes listed for sale in California in 2019 and focuses on three features: size, number of bedrooms, and number of bathrooms. The goal is to determine how these features, individually and jointly, relate to listing price. A comparison of home prices across four states (CA, NJ, NY, PA) places California’s market in context.
In this chapter, we will look at these questions:
1. How much does the size of a home influence its price in California?
2. How does the number of bedrooms in a home influence its price in California?
3. How does the number of bathrooms in a home influence its price in California?
4. How do the size, number of bedrooms, and number of bathrooms jointly influence the price of a home in California?
5. Are there significant differences in home prices among the four states (CA, NJ, NY, PA)?
Questions 1 to 3 are addressed with simple linear regression, question 4 with multiple linear regression, and question 5 with one-way ANOVA followed by Tukey’s HSD test. Prices are recorded in thousands of dollars.
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0
# Filter data for California
ca_data <- subset(home, State == "CA")
# Fit regression model: Price ~ Size
model_size <- lm(Price ~ Size, data = ca_data)
# Display model summary
summary(model_size)
##
## Call:
## lm(formula = Price ~ Size, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
# Scatter plot with regression line
plot(ca_data$Size, ca_data$Price,
main = "Relationship Between Home Size and Price in California",
xlab = "Size (sq. ft.)",
ylab = "Price ($1,000)",
pch = 16, col = "blue")
# Add the regression line
abline(model_size, col = "red", lwd = 2)
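To make the fitted line concrete, the following minimal sketch turns the reported coefficients into a predicted price. The 2,000 sq. ft. home is a hypothetical value chosen only for illustration, and the calculation assumes Size is measured in square feet, as the raw values suggest.

```r
# Coefficients copied from the summary(model_size) output above
intercept <- -56.81675
slope     <- 0.33919      # change in price ($1,000) per additional square foot

# Hypothetical home size, used only for illustration
size_sqft <- 2000

# Point prediction from the fitted line: intercept + slope * size
predicted_price <- intercept + slope * size_sqft
predicted_price           # in $1,000
```

With the fitted model object available, `predict(model_size, newdata = data.frame(Size = 2000))` should give the same point prediction up to rounding of the coefficients.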
# Fit regression model: Price ~ Beds
model_beds <- lm(Price ~ Beds, data = ca_data)
# Display model summary
summary(model_beds)
##
## Call:
## lm(formula = Price ~ Beds, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
# Scatter plot with jitter and regression line
plot(jitter(ca_data$Beds), ca_data$Price,
main = "Relationship Between Bedrooms and Price in California",
xlab = "Number of Bedrooms",
ylab = "Price ($1,000)",
pch = 16, col = "blue")
# Add the regression line
abline(model_beds, col = "red", lwd = 2)
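One way to see why the bedroom slope is not significant is to build its 95% confidence interval from the reported estimate and standard error. This is a sketch based on the rounded values printed above, so the endpoints are approximate.

```r
# Values copied from the summary(model_beds) output above
estimate  <- 84.77    # slope for Beds, in $1,000 per bedroom
std_error <- 72.91
df_resid  <- 28       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
ci                    # lower and upper bounds, in $1,000
```

The interval contains zero, which matches the non-significant p-value. With the model object available, `confint(model_beds)` reports the same kind of interval directly.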
# Fit regression model: Price ~ Baths
model_baths <- lm(Price ~ Baths, data = ca_data)
# Display model summary
summary(model_baths)
##
## Call:
## lm(formula = Price ~ Baths, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
# Scatter plot with jitter and regression line
plot(jitter(ca_data$Baths), ca_data$Price,
main = "Relationship Between Bathrooms and Price in California",
xlab = "Number of Bathrooms",
ylab = "Price ($1,000)",
pch = 16, col = "blue")
# Add the regression line
abline(model_baths, col = "red", lwd = 2)
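The t value and p-value in the coefficient table can be reproduced by hand, which shows where the significance of the bathroom slope comes from. The sketch below uses the rounded estimate and standard error from the output above, so the results agree with the table only up to rounding.

```r
# Values copied from the summary(model_baths) output above
estimate  <- 194.74   # slope for Baths, in $1,000 per bathroom
std_error <- 62.28
df_resid  <- 28       # residual degrees of freedom

# t statistic for H0: slope = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df_resid, lower.tail = FALSE)

t_stat
p_value
```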
# Load the car package
library(car)
## Loading required package: carData
# Fit multiple regression model: Price ~ Size + Beds + Baths
model_joint <- lm(Price ~ Size + Beds + Baths, data = ca_data)
# Display model summary
summary(model_joint)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_data)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
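A natural follow-up is whether Beds and Baths together add anything beyond Size. The sketch below computes a partial F-test from the two reported R-squared values. It is an approximation based on rounded output, not a step taken in the original analysis.

```r
# R-squared values copied from the model summaries above
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

q_added  <- 2          # predictors added (Beds and Baths)
df_resid <- 26         # residual degrees of freedom of the full model

# Partial F statistic for H0: the Beds and Baths coefficients are both zero
f_stat <- ((r2_full - r2_reduced) / q_added) / ((1 - r2_full) / df_resid)
p_value <- pf(f_stat, q_added, df_resid, lower.tail = FALSE)

f_stat
p_value
```

The F statistic is about 0.68, well below 1, so there is no evidence that bedrooms and bathrooms improve the fit once size is in the model. With both model objects available, `anova(model_size, model_joint)` performs the same comparison without rounding error.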
# Residual plot for the multiple regression model
plot(model_joint, which = 1, main = "Residuals vs Fitted Values")
# Partial regression plots
avPlots(model_joint, main = "Partial Regression Plots")
# Arrange plots in one row
par(mfrow = c(1, 3))
# Scatter plot for Size vs Price
plot(ca_data$Size, ca_data$Price,
main = "Size vs Price",
xlab = "Size (sq. ft.)",
ylab = "Price ($1,000)",
pch = 16, col = "blue")
abline(lm(Price ~ Size, data = ca_data), col = "red", lwd = 2)
# Scatter plot for Beds vs Price
plot(ca_data$Beds, ca_data$Price,
main = "Beds vs Price",
xlab = "Number of Bedrooms",
ylab = "Price ($1,000)",
pch = 16, col = "green")
abline(lm(Price ~ Beds, data = ca_data), col = "red", lwd = 2)
# Scatter plot for Baths vs Price
plot(ca_data$Baths, ca_data$Price,
main = "Baths vs Price",
xlab = "Number of Bathrooms",
ylab = "Price ($1,000)",
pch = 16, col = "purple")
abline(lm(Price ~ Baths, data = ca_data), col = "red", lwd = 2)
# Restore the single-panel layout for later plots
par(mfrow = c(1, 1))
# Fit an ANOVA model
model_anova <- aov(Price ~ State, data = home)
# Display ANOVA summary
summary(model_anova)
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
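As a check on the table above, the F statistic can be recomputed from the sums of squares and degrees of freedom. This is a small arithmetic sketch using only the printed values.

```r
# Values copied from the ANOVA table above
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F is the ratio of between-group to within-group mean squares
f_stat  <- ms_state / ms_resid
p_value <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

f_stat
p_value
```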
Because the p-value for the ANOVA test (0.000148) is less than 0.05, we reject the null hypothesis that mean home prices are equal across the four states (CA, NJ, NY, and PA). At least one state’s mean price differs from the others, so we perform Tukey’s HSD post hoc test to identify which pairs differ.
# Tukey's Honest Significant Difference Test
TukeyHSD(model_anova)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Price ~ State, data = home)
##
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951
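Every interval in the Tukey output has the same half-width, which follows from the residual mean square and equal group sizes. The sketch below reconstructs it with the studentized range distribution. It assumes 30 homes per state, which is inferred from the 120 total listings and the identical interval widths and is not stated in the output.

```r
# Values copied from the ANOVA and Tukey output above
mse      <- 54304     # residual mean square
df_resid <- 116
n_groups <- 4

# Assumption: 30 homes per state (120 listings split evenly)
n_per_group <- 30

# Tukey half-width: q / sqrt(2) * sqrt(MSE * (1/n_i + 1/n_j))
q_crit     <- qtukey(0.95, nmeans = n_groups, df = df_resid)
half_width <- q_crit / sqrt(2) * sqrt(mse * (2 / n_per_group))
half_width

# Example: interval for PA-CA from its reported difference
diff_pa_ca <- -269.8
diff_pa_ca + c(-1, 1) * half_width
```

A pair of states differs significantly at the 5% family-wise level when the absolute difference in means exceeds this half-width, which is the case only for the three comparisons involving California.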
# Boxplot for home prices by state
boxplot(Price ~ State, data = home,
main = "Home Prices by State",
xlab = "State",
ylab = "Price ($1,000)",
col = c("lightblue", "lightgreen", "lightpink", "lightyellow"),
border = "darkblue")
# Optional: Add a horizontal line at the overall mean
abline(h = mean(home$Price), col = "red", lwd = 2, lty = 2)
This study examined which features are associated with home prices in California and compared California prices with those in New Jersey, New York, and Pennsylvania, using simple and multiple regression and one-way ANOVA.
Home Size Affects Price: In California, larger homes tend to be listed at higher prices. In the simple regression, each additional square foot is associated with an increase of about $339 in price (slope 0.339 in $1,000, p < 0.001), and size alone explains about 36% of the variation in price.
Bedrooms and Price: The number of bedrooms is not a significant predictor of price in California (p = 0.255) and explains less than 5% of the variation in price. More bedrooms might be expected to raise a home’s value, but this sample gives no evidence of that.
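Bathrooms Affect Home Prices: Considered on its own, the number of bathrooms is significantly associated with price in California (p = 0.004). Each additional bathroom corresponds to about $195,000 more in price, and bathrooms alone explain about 26% of the variation. This is an association in the sample, not evidence that adding a bathroom causes the increase.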
When size, bedrooms, and bathrooms are considered together, size is the only significant predictor (p = 0.026). Bedrooms (p = 0.624) and bathrooms (p = 0.284) add little once size is accounted for. The joint model explains about 39% of the variation in price, only slightly more than size alone (36%), and its adjusted R-squared (0.32) is lower than that of the size-only model (0.34).
Home prices differ significantly across the four states (ANOVA p < 0.001). Tukey’s HSD test shows that mean prices in California are significantly higher than in New Jersey (by about $207,000), New York (about $170,000), and Pennsylvania (about $270,000). The differences among New Jersey, New York, and Pennsylvania are not statistically significant.
In summary, size is the strongest predictor of home price in California. Bathrooms matter when considered alone but not after size is taken into account, and bedrooms show no significant relationship with price. Among the four states compared, California is the most expensive market. These results come from a sample of 120 listings (30 in California), so they describe associations in this sample and should not be read as causal effects.
Data source: https://www.lock5stat.com/datasets3e/HomesForSale.csv (data collected from www.zillow.com in 2019).