INTRODUCTION

Understanding what drives home prices matters to buyers, sellers, and decision-makers in California, which has one of the most active and varied housing markets in the country. This chapter examines a sample of homes for sale in California in 2019 and focuses on three features: size, number of bedrooms, and number of bathrooms. The goal is to determine how these features, individually and jointly, relate to price. We also compare home prices across four states to put California’s housing market in context.

In this chapter, we will look at these questions:

1. How much does the size of a home influence its price in California?

2. How does the number of bedrooms in a home influence its price in California?

3. How does the number of bathrooms in a home influence its price in California?

4. How do the size, number of bedrooms, and number of bathrooms jointly influence the price of a home in California?

5. Are there significant differences in home prices among the four states (CA, NJ, NY, PA)?

ANALYSIS

This analysis uses simple and multiple linear regression to relate price to size, number of bedrooms, and number of bathrooms for the California homes. It then uses one-way ANOVA, followed by Tukey’s HSD test, to compare home prices across the four states. Price is recorded in thousands of dollars, and the Size values in the data (for example, 1589) are in square feet.

# Load the HomesForSale data set and preview the first rows
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1. How much does the size of a home influence its price in California?

# Filter data for California
ca_data <- subset(home, State == "CA")

# Fit regression model: Price ~ Size
model_size <- lm(Price ~ Size, data = ca_data)

# Display model summary
summary(model_size)
## 
## Call:
## lm(formula = Price ~ Size, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634
# Scatter plot with regression line
plot(ca_data$Size, ca_data$Price,
     main = "Relationship Between Home Size and Price in California",
     xlab = "Size (sq. ft.)",
     ylab = "Price ($1,000)",
     pch = 16, col = "blue")

# Add the regression line
abline(model_size, col = "red", lwd = 2)
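
To put the Size slope in practical terms, the sketch below rebuilds a 95% confidence interval from the estimate and standard error printed in the summary above, then converts it to dollars per square foot. It assumes Price is in $1,000s and Size is in square feet, as the data preview suggests. The 2,000 sq. ft. home is a hypothetical example, not a row from the data.

```r
# Values copied from summary(model_size) above
intercept <- -56.81675
slope     <- 0.33919   # change in Price ($1,000s) per additional square foot
se_slope  <- 0.08558
df_res    <- 28        # residual degrees of freedom

# 95% confidence interval for the slope: estimate +/- t critical value * SE
t_crit   <- qt(0.975, df = df_res)
ci_slope <- slope + c(-1, 1) * t_crit * se_slope

# Express the slope and its interval in dollars per square foot
round(1000 * c(estimate = slope, lower = ci_slope[1], upper = ci_slope[2]))

# Fitted price (in $1,000s) for a hypothetical 2,000 sq. ft. home
fitted_2000 <- intercept + slope * 2000
fitted_2000
```

With the fitted model object available, confint(model_size) should give the same interval without copying numbers by hand.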

Q2. How does the number of bedrooms in a home influence its price in California?

# Fit regression model: Price ~ Beds
model_beds <- lm(Price ~ Beds, data = ca_data)

# Display model summary
summary(model_beds)
## 
## Call:
## lm(formula = Price ~ Beds, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548
# Scatter plot with jitter and regression line
plot(jitter(ca_data$Beds), ca_data$Price,
     main = "Relationship Between Bedrooms and Price in California",
     xlab = "Number of Bedrooms",
     ylab = "Price ($1,000)",
     pch = 16, col = "blue")

# Add the regression line
abline(model_beds, col = "red", lwd = 2)
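
The Beds summary reports a t value, an F-statistic, and an R-squared that all describe the same weak relationship. As a rough consistency check, the sketch below recovers the F-statistic, R-squared, and p-value from the reported t value alone. It relies on two identities that hold for a regression with a single predictor, and it uses rounded figures, so small discrepancies from the printed output are expected.

```r
# Values copied from summary(model_beds) above
t_beds <- 1.163   # t value for the Beds slope
df_res <- 28      # residual degrees of freedom

# With one predictor, the overall F statistic equals the squared slope t value
f_beds <- t_beds^2

# R-squared can be recovered from F and the residual degrees of freedom
r2_beds <- f_beds / (f_beds + df_res)

# Two-sided p-value for the slope
p_beds <- 2 * pt(abs(t_beds), df = df_res, lower.tail = FALSE)

round(c(F = f_beds, R2 = r2_beds, p = p_beds), 4)
```

All three values point the same way: the number of bedrooms alone explains under 5% of the variation in price, and the slope is not distinguishable from zero at the 0.05 level.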

Q3. How does the number of bathrooms in a home influence its price in California?

# Fit regression model: Price ~ Baths
model_baths <- lm(Price ~ Baths, data = ca_data)

# Display model summary
summary(model_baths)
## 
## Call:
## lm(formula = Price ~ Baths, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092
# Scatter plot with jitter and regression line
plot(jitter(ca_data$Baths), ca_data$Price,
     main = "Relationship Between Bathrooms and Price in California",
     xlab = "Number of Bathrooms",
     ylab = "Price ($1,000)",
     pch = 16, col = "blue")

# Add the regression line
abline(model_baths, col = "red", lwd = 2)
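
The three simple regressions are easier to compare side by side. The sketch below collects the slope, R-squared, and p-value from each summary above into one table and ranks the predictors by R-squared. The data frame name and the 0.05 cutoff are illustrative choices, not part of the original analysis.

```r
# Results copied from the three simple-regression summaries above
simple_models <- data.frame(
  predictor = c("Size", "Beds", "Baths"),
  slope     = c(0.33919, 84.77, 194.74),   # change in Price ($1,000s) per unit
  r_squared = c(0.3594, 0.04605, 0.2588),
  p_value   = c(0.000463, 0.255, 0.00409),
  stringsAsFactors = FALSE
)

# Rank predictors by the share of price variation each explains on its own
simple_models <- simple_models[order(-simple_models$r_squared), ]

# Flag slopes that are significant at the 0.05 level
simple_models$significant <- simple_models$p_value < 0.05

print(simple_models, row.names = FALSE)
```

The slopes are not directly comparable because the predictors use different units (square feet versus rooms), so R-squared is the more useful column for ranking.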

Q4. How do the size, number of bedrooms, and number of bathrooms jointly influence the price of a home in California?

# Load the car package (provides avPlots() for partial regression plots)
library(car)
## Loading required package: carData
# Fit multiple regression model: Price ~ Size + Beds + Baths
model_joint <- lm(Price ~ Size + Beds + Baths, data = ca_data)

# Display model summary
summary(model_joint)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = ca_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353
# Residual plot for the multiple regression model
plot(model_joint, which = 1, main = "Residuals vs Fitted Values")

# Partial regression plots
avPlots(model_joint, main = "Partial Regression Plots")

# Arrange plots in one row
par(mfrow = c(1, 3))

# Scatter plot for Size vs Price
plot(ca_data$Size, ca_data$Price,
     main = "Size vs Price",
     xlab = "Size (sq. ft.)",
     ylab = "Price ($1,000)",
     pch = 16, col = "blue")
abline(lm(Price ~ Size, data = ca_data), col = "red", lwd = 2)

# Scatter plot for Beds vs Price
plot(ca_data$Beds, ca_data$Price,
     main = "Beds vs Price",
     xlab = "Number of Bedrooms",
     ylab = "Price ($1,000)",
     pch = 16, col = "green")
abline(lm(Price ~ Beds, data = ca_data), col = "red", lwd = 2)

# Scatter plot for Baths vs Price
plot(ca_data$Baths, ca_data$Price,
     main = "Baths vs Price",
     xlab = "Number of Bathrooms",
     ylab = "Price ($1,000)",
     pch = 16, col = "purple")
abline(lm(Price ~ Baths, data = ca_data), col = "red", lwd = 2)

# Restore the default single-panel layout for later plots
par(mfrow = c(1, 1))
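
One way to ask whether Beds and Baths add anything once Size is in the model is a partial F-test that compares the Size-only model with the three-predictor model. The sketch below approximates that test from the two reported R-squared values. Because those values are rounded, the result is approximate.

```r
# R-squared values copied from the model summaries above
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths

q      <- 2    # number of predictors added (Beds and Baths)
df_res <- 26   # residual degrees of freedom of the full model

# Partial F statistic for the added predictors
f_partial <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_res)

# Upper-tail p-value from the F distribution
p_partial <- pf(f_partial, df1 = q, df2 = df_res, lower.tail = FALSE)

round(c(F = f_partial, p = p_partial), 3)
```

A large p-value here means that Beds and Baths together do not significantly improve on the Size-only model. With both fitted models available, anova(model_size, model_joint) should run the same comparison exactly.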

Q5. Are there significant differences in home prices among the four states (CA, NJ, NY, PA)?

# Fit an ANOVA model
model_anova <- aov(Price ~ State, data = home)

# Display ANOVA summary
summary(model_anova)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value for the ANOVA test (0.000148) is less than 0.05, we reject the null hypothesis that mean home prices are equal across the four states (CA, NJ, NY, and PA). At least one state’s mean price differs from the others, so we perform Tukey’s HSD post hoc test to identify which pairs of states differ.

# Tukey's Honest Significant Difference Test
TukeyHSD(model_anova)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Price ~ State, data = home)
## 
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951
# Boxplot for home prices by state
boxplot(Price ~ State, data = home,
        main = "Home Prices by State",
        xlab = "State",
        ylab = "Price ($1,000)",
        col = c("lightblue", "lightgreen", "lightpink", "lightyellow"),
        border = "darkblue")

# Optional: Add a horizontal line at the overall mean
abline(h = mean(home$Price), col = "red", lwd = 2, lty = 2)
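
The ANOVA and Tukey results can be cross-checked from the printed tables. The sketch below recomputes the F-statistic and its p-value from the mean squares, then computes the margin used by Tukey’s HSD test. It assumes 30 homes per state, which is inferred from the 120 total observations and the equal-width intervals in the Tukey output rather than stated in the source.

```r
# Values copied from the ANOVA table above
ms_state <- 399390   # mean square between states
ms_resid <- 54304    # mean square within states (residual)
df_state <- 3
df_resid <- 116

# F statistic and its upper-tail p-value
f_stat <- ms_state / ms_resid
p_val  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Tukey HSD margin for a balanced design
n_per_state <- 30   # assumption: 120 homes split evenly across 4 states
k_states    <- 4
hsd_margin  <- qtukey(0.95, nmeans = k_states, df = df_resid) *
  sqrt(ms_resid / n_per_state)

round(c(F = f_stat, p = p_val, margin = hsd_margin), 4)
```

Any pair of states whose mean prices differ by more than this margin is significant at the 5% family-wise level. The three differences involving California (about 207, 170, and 270) exceed it, while the differences among NJ, NY, and PA (about 37, 63, and 100) do not.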

SUMMARY

This study examined how size, number of bedrooms, and number of bathrooms relate to home prices in California, and compared California prices with those in New Jersey, New York, and Pennsylvania. Linear regression was used for the California questions and one-way ANOVA with Tukey’s HSD test for the comparison across states.

Home Size Affects Price: In California, larger homes tend to cost more. In the simple regression, each additional square foot is associated with an increase of about $339 in price (p < 0.001), and size alone explains about 36% of the variation in price.

Bedrooms and Price: The number of bedrooms is not a statistically significant predictor of price in California (p = 0.255) and explains less than 5% of the variation in price. More bedrooms might be expected to make a home worth more, but this sample provides no evidence of that.

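Bathrooms Affect Home Prices: The number of bathrooms is significantly associated with price in California. In the simple regression, each additional bathroom is associated with a price about $195,000 higher (p = 0.004), and bathrooms alone explain about 26% of the variation in price. This is an association in the sample and does not by itself show that adding a bathroom raises a home’s value.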

Size, Bedrooms, and Bathrooms Together: When the three features are considered jointly, size is the only statistically significant predictor (p = 0.026). The number of bedrooms and the number of bathrooms add little once size is accounted for. The joint model explains about 39% of the variation in price, only slightly more than size alone.

Home Prices Across States: Mean home prices differ significantly across the four states (ANOVA p < 0.001). Tukey’s HSD test shows that California’s mean price is significantly higher than those of New Jersey, New York, and Pennsylvania, by roughly $170,000 to $270,000. The differences among the other three states are not statistically significant. Among these four states, California stands out as the high-priced market.

CONCLUSION

In conclusion, the data show that size and, to a lesser extent, number of bathrooms are important determinants of home prices in California, and the comparison across the four states confirms California as the most expensive market. The number of bathrooms is significant on its own but not after accounting for size, and the number of bedrooms shows no significant relationship with price. These findings suggest giving the most weight to floor area, followed by bathrooms, when pricing homes and evaluating the market.

REFERENCE

HomesForSale data set: https://www.lock5stat.com/datasets3e/HomesForSale.csv (data collected from www.zillow.com in 2019).