Introduction

I used the “HomesForSale” data set from https://www.lock5stat.com/datapage3e.html to explore the five questions below.

Analysis

We will explore the five questions in detail. Below is a portion of the data set.

##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1: Using only the data for California, how much does the size of a home influence its price?

## 
## Call:
## lm(formula = CA_Price ~ CA_Size, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## CA_Size       0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

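Using the regression model summarized above, and reading Price in thousands of dollars and Size in square feet (the same price units used in the bedroom and bathroom interpretations), the estimated slope of 0.33919 means that price increases by about $339 for each additional square foot, or roughly $339,190 per 1,000 square feet. The relationship is statistically significant, with a p-value of 0.000463, and size explains about 36% of the variation in price (R-squared of 0.3594).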
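The unit conversion behind this interpretation can be checked by hand. The sketch below copies the slope and its standard error from the summary output and assumes that Price is recorded in thousands of dollars and Size in square feet; the variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the Q1 summary output.
slope     <- 0.33919   # change in Price per one-unit change in Size
slope_se  <- 0.08558   # standard error of the slope
resid_df  <- 28        # residual degrees of freedom

# Assumption: Price is in thousands of dollars, Size is in square feet.
dollars_per_sqft      <- slope * 1000
dollars_per_1000_sqft <- dollars_per_sqft * 1000

# 95% confidence interval for the slope, expressed in dollars per square foot.
t_crit <- qt(0.975, df = resid_df)
ci_dollars_per_sqft <- (slope + c(-1, 1) * t_crit * slope_se) * 1000

print(dollars_per_sqft)
print(dollars_per_1000_sqft)
print(round(ci_dollars_per_sqft, 2))
```

The interval is wide because the model is fit to only 30 homes, so the per-square-foot figure is best read as a rough estimate.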

Q2: Using only the data for California, how does the number of bedrooms of a home influence its price?

## 
## Call:
## lm(formula = CA_Price ~ CA_Beds, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## CA_Beds        84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The regression model summarized above shows that each additional bedroom is associated with an increase of approximately $84,770 in home price, but the relationship is not statistically significant: the p-value of 0.2548 is well above the textbook's significance level of 0.01. The number of bedrooms alone explains less than 5% of the variation in price (R-squared of 0.04605).
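To see where the p-value comes from, the t statistic and its two-sided p-value can be recomputed from the estimate and standard error in the output. This is a minimal sketch with the numbers copied from the summary above; the variable names are illustrative.

```r
# Values copied from the Q2 summary output.
estimate  <- 84.77   # slope for CA_Beds (thousands of dollars per bedroom)
std_error <- 72.91   # standard error of the slope
resid_df  <- 28      # residual degrees of freedom

# t statistic: how many standard errors the estimate is from zero.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom.
p_value <- 2 * pt(abs(t_value), df = resid_df, lower.tail = FALSE)

print(round(t_value, 3))
print(round(p_value, 3))
```

Because the estimate is only a little more than one standard error from zero, the data are consistent with no linear relationship between bedrooms and price.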

Q3: Using only the data for California, how does the number of bathrooms of a home influence its price?

## 
## Call:
## lm(formula = CA_Price ~ CA_Baths, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## CA_Baths      194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

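The regression model summarized above shows that each additional bathroom is associated with an increase of approximately $194,740 in home price, and the relationship is statistically significant, with a p-value of 0.004092. The number of bathrooms explains about 26% of the variation in price (R-squared of 0.2588).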
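The fitted line can also be used to compare predicted prices for homes with different numbers of bathrooms. The sketch below uses the coefficients from the output above and assumes Price is in thousands of dollars; `predict_price` is a hypothetical helper written for this illustration.

```r
# Coefficients copied from the Q3 summary output.
intercept <- 90.71
slope     <- 194.74

# Hypothetical helper: predicted price (in thousands of dollars)
# for a California home with the given number of bathrooms.
predict_price <- function(baths) {
  intercept + slope * baths
}

price_2_baths <- predict_price(2)
price_3_baths <- predict_price(3)

print(price_2_baths)
print(price_3_baths)

# The gap between the two predictions equals the slope.
print(price_3_baths - price_2_baths)
```

The difference between the two predictions is the slope itself, which is why the slope is read as the price change per additional bathroom.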

Q4: Using only the data for California, how do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

## 
## Call:
## lm(formula = CA_Price ~ CA_Baths + CA_Beds + CA_Size, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## CA_Baths     83.9844    76.7530   1.094   0.2839  
## CA_Beds     -33.7036    67.9255  -0.496   0.6239  
## CA_Size       0.2811     0.1189   2.364   0.0259 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

Looking at the multiple regression output, home size remains a statistically significant predictor of price, with a p-value of 0.0259. With Price in thousands of dollars and Size in square feet, each additional square foot is associated with a price increase of about $281, or roughly $281,100 per 1,000 square feet, holding bedrooms and bathrooms fixed. Each additional bathroom is associated with an $83,984 increase in price, but this is not statistically significant (p-value of 0.2839). Oddly, the model suggests a negative relationship between number of bedrooms and price, about -$33,704 per bedroom, but this is also not statistically significant (p-value of 0.6239). The model as a whole is statistically significant, with an F-statistic p-value of 0.004353. Overall, this regression suggests that square footage is the most important factor among these three variables.
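The overall F test and the adjusted R-squared can both be reconstructed from the reported R-squared, which helps show how much the two extra predictors contribute. The sketch below uses values copied from the Q1 and Q4 outputs; the sample size of 30 is inferred from the 26 residual degrees of freedom plus four estimated coefficients, and the variable names are illustrative.

```r
# Values copied from the Q4 summary output.
r_squared <- 0.3912
k         <- 3                 # number of predictors
resid_df  <- 26                # residual degrees of freedom
n         <- resid_df + k + 1  # inferred sample size (30 homes)

# Adjusted R-squared penalizes the model for its extra predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / resid_df

# Overall F statistic and its p-value, rebuilt from R-squared.
f_stat  <- (r_squared / k) / ((1 - r_squared) / resid_df)
p_value <- pf(f_stat, k, resid_df, lower.tail = FALSE)

# Adjusted R-squared of the size-only model, copied from the Q1 output.
adj_r_squared_size_only <- 0.3365

print(round(adj_r_squared, 4))
print(round(f_stat, 3))
print(signif(p_value, 3))
print(adj_r_squared < adj_r_squared_size_only)
```

The adjusted R-squared of the three-predictor model is lower than that of the size-only model in Q1 (0.3209 versus 0.3365), which is consistent with bedrooms and bathrooms adding little once size is accounted for.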

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This helps determine whether the state in which a home is located has a significant impact on its price. All of the data are used.

##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## state         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Based on the ANOVA results above, there are significant differences in mean home prices among the four states, CA, NY, NJ, and PA. The p-value of 0.000148 is very small, indicating strong evidence that at least one state's mean price differs from the others. The F value of 7.355 is the ratio of the between-state mean square (399390) to the within-state mean square (54304), so the variation between state means is about 7.4 times the variation within states. The ANOVA by itself does not identify which states differ from one another.
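The F value and p-value can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. The sketch below copies those four numbers from the output above; it also computes the share of total variation attributable to state (often called eta-squared), which is an added summary and not part of the original output.

```r
# Values copied from the ANOVA table.
ss_between <- 1198169; df_between <- 3    # state
ss_within  <- 6299266; df_within  <- 116  # residuals

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F statistic and its upper-tail p-value.
f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Share of total variation in price explained by state.
eta_squared <- ss_between / (ss_between + ss_within)

print(round(f_value, 3))
print(signif(p_value, 3))
print(round(eta_squared, 3))
```

Under these numbers, state accounts for roughly 16% of the total variation in price, so most of the variation is still among homes within the same state.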

Appendix

# Q1 Code:

  #library(dplyr)  # provides %>% and filter()
  #Only_CA <- home %>%
  #  filter(State == "CA")
  #CA_Size <- Only_CA$Size
  #CA_Price <- Only_CA$Price
  #df = data.frame(CA_Size, CA_Price)
  #lm_model <- lm(CA_Price ~ CA_Size, data = df)
  #summary(lm_model)

# Q2 Code: 

  #CA_Beds <- Only_CA$Beds
  #df = data.frame(CA_Beds, CA_Price)
  #lm_model <- lm(CA_Price ~ CA_Beds, data = df)
  #summary(lm_model)

# Q3 Code:

  #CA_Baths <- Only_CA$Baths
  #df = data.frame(CA_Baths, CA_Price)
  #lm_model <- lm(CA_Price ~ CA_Baths, data = df)
  #summary(lm_model)

# Q4 Code:

  #df = data.frame(CA_Baths, CA_Beds, CA_Size, CA_Price)
  #lm_model <- lm(CA_Price ~ CA_Baths + CA_Beds + CA_Size, data = df)
  #summary(lm_model)

# Q5 Code:

  #State <- home$State
  #State = as.factor(State)
  #All_Price <- home$Price
  #myData <- data.frame(state = State, all_price = All_Price)
  #model <- aov(all_price ~ state, data = myData)
  #summary(model)
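# Compact alternative (sketch):

The Q1 to Q4 code above copies each column into its own vector before building a data frame. A more compact pattern is to pass the filtered data frame straight to `lm()`. The sketch below illustrates this using only the six rows shown in the data preview, so its fitted coefficients will not match the reported results; with the full data set, `home` would be the complete HomesForSale data frame. The object names `only_ca`, `size_model`, and `joint_model` are illustrative.

```r
# Six rows copied from the data preview in the Analysis section.
# Assumption: the full data set would replace this small data frame.
home <- data.frame(
  State = rep("CA", 6),
  Price = c(533, 610, 899, 929, 210, 268),
  Size  = c(1589, 2008, 2380, 1868, 1360, 2131),
  Beds  = c(3, 3, 5, 3, 2, 3),
  Baths = c(2.5, 2.0, 3.0, 3.0, 2.0, 2.0)
)

# Keep only the California homes (base R, no extra packages needed).
only_ca <- subset(home, State == "CA")

# Fit the models directly on the filtered data frame.
size_model  <- lm(Price ~ Size, data = only_ca)
joint_model <- lm(Price ~ Baths + Beds + Size, data = only_ca)

print(coef(size_model))
print(coef(joint_model))
```

Using the original column names in the formula keeps the coefficient labels short (`Size` instead of `CA_Size`) and avoids overwriting `df` between questions.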