## Introduction

This project analyzes the provided “HomesForSale” dataset to answer five research questions, using the regression and analysis of variance concepts learned in class.

The five questions to be analyzed are as follows:

1) Use the data only for California. How much does the size of a home influence its price?
2) Use the data only for California. How does the number of bedrooms of a home influence its price?
3) Use the data only for California. How does the number of bathrooms of a home influence its price?
4) Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
5) Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

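## Analysis

Here, we will explore all five of the above questions in detail. The first six rows of the dataset are shown below. Consistent with the interpretations that follow, Price is treated as being in thousands of dollars and Size as being in square feet.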

##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1 Use the data only for California. How much does the size of a home influence its price?

CA_Data <- HomesData[HomesData$State == "CA", ]
model <- lm(Price ~ Size, data = CA_Data)
summary(model)
## 
## Call:
## lm(formula = Price ~ Size, data = CA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -462.55 -139.69   39.24  147.65  352.21 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -56.81675  154.68102  -0.367 0.716145    
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared:  0.3594, Adjusted R-squared:  0.3365 
## F-statistic: 15.71 on 1 and 28 DF,  p-value: 0.0004634

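The estimated slope for Size is 0.33919, and its p-value (0.000463) is well below 0.05, so the relationship between size and price is statistically significant. Because Price is in thousands of dollars and Size is in square feet, each additional square foot is associated with a price increase of about $339, or about $339,190 per additional 1,000 square feet. The R-squared of 0.3594 indicates that size alone explains roughly 36% of the variation in California home prices.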
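As a rough illustration of what the fitted line implies, the sketch below plugs the coefficients printed above into the regression equation. The 2,000-square-foot home is a hypothetical example rather than a row from the dataset, and the variable names are chosen here for illustration only.

```r
# Coefficients copied from the summary(model) output above
intercept  <- -56.81675   # thousands of dollars
slope_size <- 0.33919     # thousands of dollars per square foot

# Hypothetical home size (an assumption for illustration)
size_sqft <- 2000

# Predicted price in thousands of dollars: intercept + slope * size
predicted_price <- intercept + slope_size * size_sqft
predicted_price   # about 621.56, i.e. roughly $621,600
```

With the fitted object in memory, `predict(model, newdata = data.frame(Size = 2000))` should give essentially the same value, differing only by the rounding of the printed coefficients.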

Q2 Use the data only for California. How does the number of bedrooms of a home influence its price?

CA_Data <- HomesData[HomesData$State == "CA", ]
model_Q2 <- lm(Price ~ Beds, data = CA_Data)
summary(model_Q2)
## 
## Call:
## lm(formula = Price ~ Beds, data = CA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

The estimated slope for Beds is 84.77, but its p-value of 0.2548 is well above 0.05, so there is no statistically significant evidence that the number of bedrooms is linearly related to price in this sample. Taken at face value, the slope would suggest an increase of about $84,770 per additional bedroom, but with an R-squared of only 0.046, the number of bedrooms is not a reliable predictor of price.
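To make the link between the slope, its standard error, and the p-value concrete, the following sketch recomputes the t statistic and two-sided p-value from the rounded numbers in the Beds row of the output. The variable names are illustrative, and small differences from the printed values are expected because the inputs are rounded.

```r
# Values copied from the Beds row of the summary(model_Q2) output
estimate  <- 84.77
std_error <- 72.91
df_resid  <- 28   # residual degrees of freedom reported by summary()

# t statistic: the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

round(t_value, 3)   # 1.163, matching the printed t value
p_value             # close to the reported 0.255
```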

Q3 Use the data only for California. How does the number of bathrooms of a home influence its price?

CA_Data <- HomesData[HomesData$State == "CA", ]
model_Q3 <- lm(Price ~ Baths, data = CA_Data)
summary(model_Q3)
## 
## Call:
## lm(formula = Price ~ Baths, data = CA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

Because the p-value (0.00409) is below 0.05, the number of bathrooms has a statistically significant relationship with price. The slope of 194.74 indicates that price rises by about $194,740 for each additional bathroom, and the R-squared of 0.2588 indicates that the number of bathrooms alone explains about 26% of the variation in price.
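The slope is only an estimate, so it can help to look at its uncertainty. The sketch below builds an approximate 95% confidence interval for the Baths slope from the printed estimate and standard error. This interval is a supplementary calculation, not part of the original output, and the variable names are illustrative.

```r
# Values copied from the Baths row of the summary(model_Q3) output
estimate  <- 194.74
std_error <- 62.28
df_resid  <- 28

# Critical t value for a two-sided 95% interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus t_crit standard errors
ci_lower <- estimate - t_crit * std_error
ci_upper <- estimate + t_crit * std_error
c(lower = ci_lower, upper = ci_upper)   # roughly 67 to 322 (thousands of dollars)
```

With the fitted object available, `confint(model_Q3, "Baths")` performs the same calculation on the unrounded values.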

Q4 Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

CA_Data <- HomesData[HomesData$State == "CA", ]
model_Q4 <- lm(Price ~ Size + Beds + Baths, data = CA_Data)
summary(model_Q4)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

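The p-values for Size, Beds, and Baths are 0.0259, 0.6239, and 0.2839, respectively. Size is the only predictor with a p-value below 0.05, so it is the only significant predictor of price once the other two variables are held constant. The model as a whole is significant (F-statistic p-value of 0.004353), but its adjusted R-squared (0.3209) is slightly lower than that of the size-only model in Q1 (0.3365), so adding bedrooms and bathrooms does not improve on size alone. Bathrooms were significant on their own in Q3 but not here, which suggests that much of their apparent effect overlaps with that of size.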
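One way to check whether Beds and Baths add anything beyond Size is a partial F-test comparing the Q1 and Q4 models. The sketch below is a supplementary check that was not run in the original analysis. It approximates the test from the rounded R-squared values printed above, so the result is only indicative, and the variable names are illustrative.

```r
# R-squared values copied from the printed summaries
r2_reduced <- 0.3594   # Q1 model: Price ~ Size
r2_full    <- 0.3912   # Q4 model: Price ~ Size + Beds + Baths

extra_terms <- 2       # Beds and Baths
df_resid    <- 26      # residual degrees of freedom of the full model

# Partial F statistic expressed in terms of R-squared
f_partial <- ((r2_full - r2_reduced) / extra_terms) /
  ((1 - r2_full) / df_resid)

# Upper-tail p-value from the F distribution
p_partial <- pf(f_partial, extra_terms, df_resid, lower.tail = FALSE)

f_partial   # about 0.68
p_partial   # well above 0.05
```

With both fitted objects in memory, `anova(model, model_Q4)` carries out the same comparison on the unrounded fits.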

Q5 Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

anova_model <- aov(Price ~ State, data = HomesData)
summary(anova_model)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of 0.000148 is far below 0.05, so we reject the null hypothesis that the mean home price is the same in all four states. The F-value of 7.355 means that the variation in mean price between states is about seven times the variation in price within states. Note that ANOVA only shows that at least one state's mean price differs from the others; it does not identify which states differ.
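The entries of the ANOVA table are related by simple arithmetic, which the sketch below reproduces from the printed sums of squares. It also computes eta-squared, the share of total price variation attributable to state. Eta-squared is a supplementary measure that is not part of the original output, and the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_state <- 1198169
ss_resid <- 6299266
df_state <- 3
df_resid <- 116

# Mean squares are sums of squares divided by their degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F statistic and its upper-tail p-value
f_value <- ms_state / ms_resid
p_value <- pf(f_value, df_state, df_resid, lower.tail = FALSE)

# Eta-squared: proportion of total variation explained by State
eta_sq <- ss_state / (ss_state + ss_resid)

f_value   # about 7.355, matching the table
eta_sq    # about 0.16
```

So although the state effect is clearly significant, state accounts for only about 16% of the variation in price. To see which pairs of states differ, a follow-up such as `TukeyHSD(anova_model)` could be run on the fitted object.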

## Summary

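In conclusion, the regression models showed that, for California homes, size and the number of bathrooms are each significantly related to price when considered alone, while the number of bedrooms is not, and that size is the only significant predictor when all three are considered together. The ANOVA showed that mean home prices differ significantly among the four states. The methods learned in class proved very useful for making these comparisons. This project helped me practice those concepts, apply problem-solving skills, and become more familiar with Posit.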

## Appendix

knitr::opts_chunk$set(echo = TRUE)
# Load the HomesForSale data and preview the first rows
HomesData <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(HomesData)
# Subset to California homes (used for Q1 through Q4)
CA_Data <- HomesData[HomesData$State == "CA", ]
# Q1: price as a function of size
model <- lm(Price ~ Size, data = CA_Data)
summary(model)
# Q2: price as a function of number of bedrooms
model_Q2 <- lm(Price ~ Beds, data = CA_Data)
summary(model_Q2)
# Q3: price as a function of number of bathrooms
model_Q3 <- lm(Price ~ Baths, data = CA_Data)
summary(model_Q3)
# Q4: price as a function of size, bedrooms, and bathrooms together
model_Q4 <- lm(Price ~ Size + Beds + Baths, data = CA_Data)
summary(model_Q4)
# Q5: one-way ANOVA of price by state, using all four states
anova_model <- aov(Price ~ State, data = HomesData)
summary(anova_model)