Introduction

The “HomesForSale” dataset from lock5stat records several characteristics of homes listed for sale, including price, physical features, and location (state). In this project, we use the data to examine homes in California, New Jersey, New York, and Pennsylvania. For California homes, we determine how size, number of bedrooms, and number of bathrooms relate to price; for all four states, we test whether mean prices differ.

The five questions are:

1. Use the data only for California. How much does the size of a home influence its price?
2. Use the data only for California. How does the number of bedrooms of a home influence its price?
3. Use the data only for California. How does the number of bathrooms of a home influence its price?
4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

Data

Source: “HomesForSale” data from lock5stat

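Variables:
- Price: Asking price of the home, in $1,000s
- State: Location of the home (CA, NJ, NY, or PA)
- Size: Floor area of the home in square feet (the values printed by head(), such as 1589, are in square feet rather than thousands of square feet)
- Beds: Number of bedrooms
- Baths: Number of bathrooms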

home_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home_data)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Analysis

We explore the questions in detail.

Q1: Use the data only for California. How much does the size of a home influence its price?

Homes_CA <- subset(home_data, State == "CA")
Homes_CA <- na.omit(Homes_CA)
model_size <- lm(Price ~ Size, data = Homes_CA)
summary(model_size)
## 
## Call:
## lm(formula = Price ~ Size, data = Homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

Regression results:
- The slope estimate for Size is 0.33919. Because Price is in $1,000s and Size is in square feet, each additional square foot is associated with an increase of about $339 in price, or approximately $339,190 for every additional 1,000 square feet.
- The p-value for Size is very small (p = 0.0004634), which indicates a statistically significant relationship between home size and price.
- The R-squared value of 0.3594 means that approximately 36% of the variability in home prices is explained by size alone.
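The unit conversion behind the slope interpretation can be checked directly. The sketch below hard-codes the intercept and slope from the summary output above instead of refitting the model, and it assumes Size is recorded in square feet, as the head() output suggests. The 2,000-square-foot home is a hypothetical example, not a row from the data.

```r
# Coefficients copied from summary(model_size) above
intercept <- -56.81675   # in $1,000s
slope     <- 0.33919     # $1,000s per square foot (assumes Size is in sq. ft.)

# Convert the slope from $1,000s to dollars
dollars_per_sqft      <- slope * 1000             # per additional square foot
dollars_per_1000_sqft <- dollars_per_sqft * 1000  # per additional 1,000 sq. ft.

# Predicted price (in $1,000s) for a hypothetical 2,000 sq. ft. home
pred_thousands <- intercept + slope * 2000

cat("Dollars per additional sq. ft.:      ", dollars_per_sqft, "\n")
cat("Dollars per additional 1,000 sq. ft.:", dollars_per_1000_sqft, "\n")
cat("Predicted price in $1,000s at 2,000 sq. ft.:", pred_thousands, "\n")
```

With the fitted model in the session, predict(model_size, newdata = data.frame(Size = 2000)) should agree with the hand calculation up to rounding of the coefficients.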

Q2: Use the data only for California. How does the number of bedrooms of a home influence its price?

model_beds <- lm(Price ~ Beds, data = Homes_CA)
summary(model_beds)
## 
## Call:
## lm(formula = Price ~ Beds, data = Homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

Regression results:
- The slope estimate of 84.77 means each additional bedroom is associated with an average increase of about $84,770 in price.
- The p-value (0.255) is greater than 0.05, so the relationship between number of bedrooms and price is not statistically significant.
- The R-squared value of 0.04605 indicates that only about 4.6% of the variability in prices is explained by the number of bedrooms.
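One way to see why the bedroom effect is not significant is to build a 95% confidence interval for the slope. The sketch below is an illustration that uses the rounded estimate, standard error, and residual degrees of freedom printed in the summary output above; it does not refit the model.

```r
# Values copied from summary(model_beds) above
estimate  <- 84.77   # slope for Beds, in $1,000s per bedroom
std_error <- 72.91
df_resid  <- 28

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error

cat("95% CI for the Beds slope (in $1,000s):", round(ci, 2), "\n")
cat("Interval contains zero:", ci[1] < 0 && ci[2] > 0, "\n")
```

Because the interval spans zero, the data are consistent with no linear relationship between bedrooms and price. With the fitted model available, confint(model_beds) computes the same interval from the unrounded estimates.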

Q3: Use the data only for California. How does the number of bathrooms of a home influence its price?

model_baths <- lm(Price ~ Baths, data = Homes_CA)
summary(model_baths)
## 
## Call:
## lm(formula = Price ~ Baths, data = Homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

Regression results:
- The slope estimate for Baths means each additional bathroom is associated with an increase of about $194,740 in price.
- The p-value is 0.004092, which is statistically significant at the 0.05 level.
- The R-squared value indicates that approximately 26% of the variability in price is explained by the number of bathrooms.
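The t value and p-value in the coefficient table follow from the estimate and its standard error. As a minimal sketch, the numbers below are copied (rounded) from the summary output above, so the result should match the table only up to rounding.

```r
# Values copied from summary(model_baths) above
estimate  <- 194.74   # slope for Baths, in $1,000s per bathroom
std_error <- 62.28
df_resid  <- 28

# t statistic for H0: slope = 0, and its two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

cat("t value:", round(t_value, 3), "\n")
cat("two-sided p-value:", signif(p_value, 3), "\n")
```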

Q4: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

model_full <- lm(Price ~ Size + Beds + Baths, data = Homes_CA)
summary(model_full)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = Homes_CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

Regression results:
- The model’s R-squared value indicates that about 39% of the variability in price is explained by size, bedrooms, and bathrooms combined.
- Size is the only significant predictor (p=0.0259). Its slope estimate indicates that, holding bedrooms and bathrooms fixed, every additional 1,000 square feet is associated with a price increase of approximately $281,100.
- Bedrooms (p=0.6239) and bathrooms (p=0.2839) are not statistically significant in this model.
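To judge whether bedrooms and bathrooms add anything beyond size, the joint model can be compared with the size-only model from Q1. The sketch below is an illustration built only from the R-squared values and degrees of freedom printed in the two summary outputs; the helper adj_r2 is a hypothetical name introduced here. It reproduces the adjusted R-squared values and runs a partial F-test for adding Beds and Baths.

```r
# Values copied from summary(model_size) and summary(model_full) above
n       <- 30       # 28 residual df + 2 coefficients in the size-only model
r2_size <- 0.3594   # Price ~ Size
r2_full <- 0.3912   # Price ~ Size + Beds + Baths

# Adjusted R-squared penalizes each added predictor (p = number of predictors)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_size <- adj_r2(r2_size, n, 1)
adj_full <- adj_r2(r2_full, n, 3)

# Partial F-test: do the q = 2 extra predictors improve the fit?
q       <- 2
df_full <- 26       # residual df of the joint model
f_stat  <- ((r2_full - r2_size) / q) / ((1 - r2_full) / df_full)
p_val   <- pf(f_stat, q, df_full, lower.tail = FALSE)

cat("Adjusted R-squared, size only:", round(adj_size, 4), "\n")
cat("Adjusted R-squared, joint:    ", round(adj_full, 4), "\n")
cat("Partial F statistic:", round(f_stat, 3), " p-value:", round(p_val, 3), "\n")
```

The adjusted R-squared falls when Beds and Baths are added, which is consistent with their non-significant coefficients. With both fitted models in the session, anova(model_size, model_full) runs the same comparison on the unrounded fits.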

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

anova_model <- aov(Price ~ State, data = home_data)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = Price ~ State, data = home_data)
## 
## $State
##             diff       lwr        upr     p adj
## NJ-CA -206.83333 -363.6729  -49.99379 0.0044754
## NY-CA -170.03333 -326.8729  -13.19379 0.0280402
## PA-CA -269.80000 -426.6395 -112.96045 0.0001011
## NY-NJ   36.80000 -120.0395  193.63955 0.9282064
## PA-NJ  -62.96667 -219.8062   93.87288 0.7224830
## PA-NY  -99.76667 -256.6062   57.07288 0.3505951

ANOVA results:
- There is a significant difference in mean home prices among the four states (p=0.000148).
- The Tukey post-hoc test shows the following:
  - California homes are significantly more expensive than homes in New Jersey, New York, and Pennsylvania.
  - The differences in price among homes in Pennsylvania, New York, and New Jersey are not statistically significant.
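The F value in the ANOVA table can be reconstructed from the sums of squares, which also gives a rough sense of how much of the price variation is attributable to state. The sketch below copies the sums of squares and degrees of freedom from the output above. The eta-squared measure is an additional descriptive statistic that is not part of the original analysis.

```r
# Values copied from summary(anova_model) above
ss_state <- 1198169
ss_resid <- 6299266
df_state <- 3
df_resid <- 116

# Mean squares and the F statistic
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid
f_value  <- ms_state / ms_resid
p_value  <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: share of total variation in Price accounted for by State
eta_sq <- ss_state / (ss_state + ss_resid)

cat("F value:", round(f_value, 3), "\n")
cat("p-value:", signif(p_value, 3), "\n")
cat("Eta-squared:", round(eta_sq, 3), "\n")
```

Under this calculation, state accounts for roughly 16% of the variation in price, so location matters but leaves most of the variation to other factors.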

Summary and Conclusion

- Size is the most influential factor in home price in California: it explains the most variability on its own (about 36%) and is the only significant predictor in the joint model.
- Bedrooms alone show a positive relationship with price, but the relationship is not statistically significant.
- Bathrooms alone are significantly associated with price, with more bathrooms corresponding to higher prices.
- The joint model confirms that size is the primary influence on price; bedrooms and bathrooms are not statistically significant once size is accounted for.
- State differences are statistically significant, with California being the most expensive. Location, as well as physical characteristics, is associated with price.

In all, home prices are related to both physical attributes and location. Larger homes tend to sell for more, and homes with more bathrooms do as well when bathrooms are considered alone. Additionally, homes in California are more expensive on average than homes in New Jersey, New York, and Pennsylvania.

Appendix of All Code Chunks:

# Q1:
# Homes_CA <- subset(home_data, State == "CA")
# Homes_CA <- na.omit(Homes_CA)
# model_size <- lm(Price ~ Size, data = Homes_CA)
# summary(model_size)

# Q2:
# model_beds <- lm(Price ~ Beds, data = Homes_CA)
# summary(model_beds)

# Q3:
# model_baths <- lm(Price ~ Baths, data = Homes_CA)
# summary(model_baths)

# Q4:
# model_full <- lm(Price ~ Size + Beds + Baths, data = Homes_CA)
# summary(model_full)

# Q5:
# anova_model <- aov(Price ~ State, data = home_data)
# summary(anova_model)
# TukeyHSD(anova_model)