1. Introduction

In this report, we take a look at houses for sale in California, New York, New Jersey, and Pennsylvania. We compare house prices while taking into consideration home size, number of bedrooms, number of bathrooms, etc. Questions 1 to 4 are analyzed using the California homes only, while question 5 uses all four states. Below are the five questions we are going to explore:

  1. How much does the size of a home influence its price?
  2. How does the number of bedrooms of a home influence its price?
  3. How does the number of bathrooms of a home influence its price?
  4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help us determine whether the state in which a home is located has a significant impact on its price.

2. Analysis

home = read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
homeCA = subset(home, State == "CA")

Q1. How much does the size of a home influence its price?

plot(Price ~ Size, data = homeCA,
     main = "Price vs Size (only CA)",
     xlab = "Size (sq ft)", ylab = "Price ($1000s)")
abline(lm(Price ~ Size, data = homeCA))

reg1 = lm(Price ~ Size, data = homeCA)
summary(reg1)
## 
## Call:
## lm(formula = Price ~ Size, data = homeCA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

The regression model for CA homes shows that the slope for Size is 0.33919. With Price recorded in $1000s and Size in square feet, this means that each additional square foot is associated with an average price increase of about $339, or about $339,190 per additional 1,000 square feet. The p-value is 0.0004634, which is well below 0.05, indicating a statistically significant linear relationship between home price and home size in California. The R² value of 0.3594 shows that 35.94% of the variation in California home prices is explained by home size.
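
The unit conversion and the uncertainty around the Size slope can be checked with a few lines of arithmetic. The sketch below uses only the numbers printed in the summary output above, so it runs without downloading the data. It assumes that Size is measured in square feet and Price in $1000s; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the summary(reg1) output above
slope_size <- 0.33919   # change in Price ($1000s) per unit of Size
se_size    <- 0.08558   # standard error of the slope
df_resid   <- 28        # residual degrees of freedom

# Assumption: Size is in square feet, so the slope is in $1000s per sq ft
dollars_per_sqft     <- slope_size * 1000        # dollars per extra sq ft
dollars_per_1000sqft <- dollars_per_sqft * 1000  # dollars per extra 1,000 sq ft

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit   <- qt(0.975, df_resid)
ci_slope <- slope_size + c(-1, 1) * t_crit * se_size

print(dollars_per_sqft)
print(dollars_per_1000sqft)
print(round(ci_slope, 3))
```

The interval runs from roughly 0.16 to 0.51 (in $1000s per square foot) and excludes zero, which agrees with the small p-value. With the fitted model available, `confint(reg1)` should give the same interval without the rounding in the copied values.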

Q2. How does the number of bedrooms of a home influence its price?

reg2 = lm(Price ~ Beds, data = homeCA)
summary(reg2)
## 
## Call:
## lm(formula = Price ~ Beds, data = homeCA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The regression model for California homes shows that the slope for Beds is 84.77, meaning that each additional bedroom is associated with an estimated increase of approximately $84,770 in home price. However, the p-value for the slope is 0.255, which is well above the 0.05 significance level. We therefore fail to reject the null hypothesis: the sample does not provide evidence of a linear relationship between the number of bedrooms and home price in California. Although the estimated slope is positive, its standard error (72.91) is almost as large as the estimate itself, and the R² of 0.04605 shows that bedrooms explain only about 4.6% of the variation in price.
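
To make clear where the p-value of 0.255 comes from, the t statistic and its two-sided p-value can be recomputed from the estimate and standard error in the coefficient table. This is a minimal sketch using the printed (rounded) values, with illustrative variable names.

```r
# Values copied from the Beds row of the coefficient table above
est_beds <- 84.77   # estimated slope ($1000s per bedroom)
se_beds  <- 72.91   # standard error of the slope
df_resid <- 28      # residual degrees of freedom (30 homes - 2 coefficients)

# t statistic: how many standard errors the estimate lies from zero
t_beds <- est_beds / se_beds

# Two-sided p-value from the t distribution
p_beds <- 2 * pt(-abs(t_beds), df_resid)

print(round(t_beds, 3))
print(round(p_beds, 3))
```

The estimate is only about 1.16 standard errors away from zero, which is why the test cannot distinguish the slope from zero at the 0.05 level.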

Q3. How does the number of bathrooms of a home influence its price?

reg3 = lm(Price ~ Baths, data = homeCA)
summary(reg3)
## 
## Call:
## lm(formula = Price ~ Baths, data = homeCA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

The regression model for CA homes shows that the slope for bathrooms is 194.74, meaning that each additional bathroom is associated with an estimated increase of about $194,740 in home price. The p-value for this slope is 0.004, which is below the 0.05 significance level. This indicates that the number of bathrooms has a statistically significant linear relationship with home price in California.
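
With all three simple regressions done, it can help to put their key numbers side by side. The sketch below builds a small comparison table from the values printed in the three summaries above; the data frame and its column names are illustrative and not part of the original analysis.

```r
# Slope, p-value, and R-squared copied from the three summary outputs above
simple_fits <- data.frame(
  predictor = c("Size", "Beds", "Baths"),
  slope     = c(0.33919, 84.77, 194.74),   # units differ, so slopes are not directly comparable
  p_value   = c(0.000463, 0.255, 0.00409),
  r_squared = c(0.3594, 0.04605, 0.2588)
)

# Flag predictors that are significant at the 0.05 level
simple_fits$significant <- simple_fits$p_value < 0.05

# Sort by share of price variation explained, largest first
simple_fits <- simple_fits[order(-simple_fits$r_squared), ]
print(simple_fits, row.names = FALSE)
```

Taken one at a time, Size explains the largest share of the variation in California prices, followed by Baths, while Beds explains very little.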

Q4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

reg4 = lm(Price ~ Size + Beds + Baths, data = homeCA)
summary(reg4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = homeCA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

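Taken together, the three predictors are jointly significant: the overall F-test gives a p-value of 0.004353, and the model explains 39.12% of the variation in price (R² = 0.3912). The p-value for size is 0.026, which is below 0.05, indicating that size remains a statistically significant predictor of home price after accounting for the number of bedrooms and bathrooms. The p-value for bedrooms is 0.624, which is far above 0.05, showing that the number of bedrooms is not a significant predictor of home price once size and bathrooms are included in the model. Lastly, the p-value for bathrooms is 0.284, which is also above 0.05, indicating that the number of bathrooms is not a significant predictor of home price in California after accounting for size and bedrooms, even though it was significant on its own in Q3. The adjusted R² of 0.3209 is slightly lower than the 0.3365 of the size-only model, so adding bedrooms and bathrooms does not improve the fit. Overall, size is the only predictor with a statistically significant linear relationship with home price in the joint model: as size increases, price also tends to increase.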
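
Whether bedrooms and bathrooms add anything beyond size can be tested with a partial F-test that compares the size-only model with the three-predictor model. The sketch below approximates that test from the two R² values printed above, so the result is subject to rounding; the variable names are illustrative.

```r
# R-squared values copied from the summary outputs above
r2_reduced <- 0.3594   # Price ~ Size
r2_full    <- 0.3912   # Price ~ Size + Beds + Baths
n_added    <- 2        # predictors added to the reduced model (Beds, Baths)
df_full    <- 26       # residual degrees of freedom of the full model

# Partial F statistic: gain in R-squared per added predictor,
# relative to the unexplained variation per residual degree of freedom
f_stat <- ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_full)
p_val  <- pf(f_stat, n_added, df_full, lower.tail = FALSE)

print(round(f_stat, 2))
print(round(p_val, 2))
```

The F statistic is below 1 and the p-value is around 0.5, so there is no evidence that bedrooms and bathrooms improve on the size-only model. With the fitted objects available, `anova(reg1, reg4)` performs the same comparison directly.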

Q5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help us determine whether the state in which a home is located has a significant impact on its price.

anova5 = aov(Price ~ State, data = home)
boxplot(Price ~ State, data = home,
        main = "Home Prices by State",
        xlab = "State", ylab = "Price ($1000s)")

summary(anova5)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA results show that mean home prices differ significantly across the four states. The p-value of 0.000148 is far below the 0.05 significance level, so we reject the null hypothesis that all four states have the same average home price. We also included a boxplot to visually compare the price distributions across states. It suggests that California has the highest home prices among the four states, while Pennsylvania has the lowest. Note that a boxplot marks medians rather than means, and that the ANOVA by itself does not tell us which pairs of states differ.
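
The ANOVA table can also be used to gauge how much of the price variation is associated with state. The sketch below recomputes the F statistic from the printed sums of squares and adds eta-squared, a common effect-size measure that is not part of the original analysis. Variable names are illustrative.

```r
# Values copied from the summary(anova5) table above
ss_state <- 1198169; df_state <- 3     # between-state sum of squares
ss_resid <- 6299266; df_resid <- 116   # within-state (residual) sum of squares

# F statistic: between-state mean square over within-state mean square
f_stat <- (ss_state / df_state) / (ss_resid / df_resid)
p_val  <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: share of total price variation associated with state
eta_sq <- ss_state / (ss_state + ss_resid)

print(round(f_stat, 3))
print(signif(p_val, 3))
print(round(eta_sq, 3))
```

State accounts for about 16% of the total variation in price, so the difference is statistically significant but most of the variation lies within states. To see which pairs of states differ, one option is to run `TukeyHSD(anova5)` on the fitted object, and `tapply(home$Price, home$State, mean)` would give the group means that the boxplot does not show.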