1. Introduction

We used the data from https://www.lock5stat.com/datapage3e.html

These questions are to be addressed using the data provided above:

  1. Use the data only for California. How much does the size of a home influence its price?

  2. Use the data only for California. How does the number of bedrooms of a home influence its price?

  3. Use the data only for California. How does the number of bathrooms of a home influence its price?

  4. Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

2. Analysis

We use simple linear regression, multiple linear regression, and one-way ANOVA to analyze the data. First, we load the data:

home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0

Q1: Use the data only for California. How much does the size of a home influence its price?

CAdata = subset(home, State == "CA")
lm(Price ~ Size, data = CAdata)
## 
## Call:
## lm(formula = Price ~ Size, data = CAdata)
## 
## Coefficients:
## (Intercept)         Size  
##    -56.8167       0.3392

Based on the regression model, each additional square foot of size is associated with an estimated increase of about 0.34 in Price. Because Price is recorded in thousands of dollars, that is roughly 340 dollars per square foot.
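To make the slope concrete, the fitted line can be evaluated by hand. The sketch below only copies the two coefficients from the output above; the helper name `predict_price` and the 2000 square foot example are illustrative assumptions, and Price is taken to be in thousands of dollars.

```r
# Coefficients copied from the lm(Price ~ Size) output above
intercept <- -56.8167
slope_size <- 0.3392

# Hypothetical helper: predicted Price (in $1000s) for a size in square feet
predict_price <- function(size) intercept + slope_size * size

# Predicted price of an illustrative 2000 sq ft home
pred_2000 <- predict_price(2000)

# Change in predicted price for one extra square foot, converted to dollars
per_sqft_dollars <- (predict_price(2001) - predict_price(2000)) * 1000

print(pred_2000)
print(per_sqft_dollars)
```

The negative intercept has no practical meaning here, since no home has a size of zero; the line is only a reasonable description within the range of sizes observed in the data.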

Q2: Use the data only for California. How does the number of bedrooms of a home influence its price?

CA = subset(home, State == "CA")
lm(Price ~ Beds, data = CA)
## 
## Call:
## lm(formula = Price ~ Beds, data = CA)
## 
## Coefficients:
## (Intercept)         Beds  
##      269.76        84.77
model1 <- lm(Price ~ Beds, data = CA)
summary(model1)
## 
## Call:
## lm(formula = Price ~ Beds, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413.83 -236.62   29.94  197.69  570.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
## 
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared:  0.04605,    Adjusted R-squared:  0.01198 
## F-statistic: 1.352 on 1 and 28 DF,  p-value: 0.2548

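The number of bedrooms does not have a statistically significant effect on price in the California data: the p-value for the Beds slope (0.255) is larger than 0.05, and the model explains less than 5% of the variation in price (R-squared = 0.046).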
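As a check on where that p-value comes from, the t statistic and two-sided p-value can be recomputed from the reported estimate and standard error. This is a minimal sketch using the rounded values printed above, so the results may differ from the summary output in the last digit; the variable names are illustrative.

```r
# Values copied from the Beds row of summary(model1) above
estimate <- 84.77
std_error <- 72.91
df_resid <- 28  # residual degrees of freedom reported in the summary

# t statistic: the estimate divided by its standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 28 degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(round(p_value, 3))
```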

Q3: Use the data only for California. How does the number of bathrooms of a home influence its price?

CA = subset(home, State == "CA")
lm(Price ~ Baths, data = CA)
## 
## Call:
## lm(formula = Price ~ Baths, data = CA)
## 
## Coefficients:
## (Intercept)        Baths  
##       90.71       194.74
model2 <- lm(Price ~ Baths, data = CA)
summary(model2)
## 
## Call:
## lm(formula = Price ~ Baths, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -374.93 -181.56   -2.74  152.31  614.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    90.71     148.57   0.611  0.54641   
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared:  0.2588, Adjusted R-squared:  0.2324 
## F-statistic: 9.779 on 1 and 28 DF,  p-value: 0.004092

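The number of bathrooms has a statistically significant association with the price of a house in California: the p-value for the Baths slope (0.00409) is well below 0.05. Each additional bathroom is associated with an estimated increase of about 195 in Price, and the model explains about 26% of the variation in price (R-squared = 0.2588).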
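A confidence interval gives a sense of how precisely the Baths slope is estimated. The sketch below builds an approximate 95% interval from the rounded estimate and standard error printed above; with the fitted object available, `confint(model2)` should give the same interval up to rounding.

```r
# Values copied from the Baths row of summary(model2) above
estimate <- 194.74
std_error <- 62.28
df_resid <- 28

# Critical value of the t distribution for a 95% two-sided interval
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus critical value times standard error
ci_lower <- estimate - t_crit * std_error
ci_upper <- estimate + t_crit * std_error

print(round(c(lower = ci_lower, upper = ci_upper), 2))
```

The interval excludes zero, which agrees with the p-value being below 0.05, but it is wide: with only 30 homes, the size of the effect is not estimated very precisely.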

Q4: Use the data only for California. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

CA = subset(home, State == "CA")
lm(Price ~ Size + Beds + Baths, data = CA)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA)
## 
## Coefficients:
## (Intercept)         Size         Beds        Baths  
##    -41.5608       0.2811     -33.7036      83.9844
model3 <- lm(Price ~ Size + Beds + Baths, data = CA)
summary(model3)
## 
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -415.47 -130.32   19.64  154.79  384.94 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -41.5608   210.3809  -0.198   0.8449  
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239  
## Baths        83.9844    76.7530   1.094   0.2839  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared:  0.3912, Adjusted R-squared:  0.3209 
## F-statistic: 5.568 on 3 and 26 DF,  p-value: 0.004353

Because the p-value of the overall F-test (0.004353) is well below 0.05, the three factors jointly have a statistically significant effect on the price of a home, and together they explain about 39% of the variation in price (R-squared = 0.3912). Individually, however, Beds (p = 0.6239) and Baths (p = 0.2839) are not statistically significant once the other predictors are in the model. Only Size (p = 0.0259) remains significant at the 0.05 level: holding the number of bedrooms and bathrooms fixed, each additional square foot is associated with an estimated increase of about 0.28 in Price.
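The overall F statistic is tied directly to R-squared, which can be verified from the printed summary. The following sketch assumes 3 predictors and 26 residual degrees of freedom, as reported above, and uses rounded values, so small differences from the output are expected.

```r
# Values copied from summary(model3) above
r_squared <- 0.3912
k <- 3          # number of predictors: Size, Beds, Baths
df_resid <- 26  # residual degrees of freedom

# F statistic expressed through R-squared:
# (explained variation per predictor) / (unexplained variation per residual df)
f_from_r2 <- (r_squared / k) / ((1 - r_squared) / df_resid)

# p-value of the overall F-test, using the F statistic reported in the summary
p_overall <- pf(5.568, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(round(f_from_r2, 3))
print(signif(p_overall, 4))
```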

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)? This will help you determine if the state in which a home is located has a significant impact on its price. All data should be used.

aov(Price ~ State, data = home)
## Call:
##    aov(formula = Price ~ State, data = home)
## 
## Terms:
##                   State Residuals
## Sum of Squares  1198169   6299266
## Deg. of Freedom       3       116
## 
## Residual standard error: 233.0322
## Estimated effects may be unbalanced
model4 <- aov(Price ~ State, data = home)
summary(model4)
##              Df  Sum Sq Mean Sq F value   Pr(>F)    
## State         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value is 0.000148, far below 0.05, the ANOVA test provides strong evidence that mean home prices differ among the four states. The test by itself does not show which states differ from one another.
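The entries of the ANOVA table can be reproduced from the sums of squares and degrees of freedom alone. This is a small sketch that copies those four numbers from the output above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the aov output above
ss_state <- 1198169
df_state <- 3
ss_resid <- 6299266
df_resid <- 116

# Mean squares: each sum of squares divided by its degrees of freedom
ms_state <- ss_state / df_state
ms_resid <- ss_resid / df_resid

# F statistic: between-state mean square over residual mean square
f_value <- ms_state / ms_resid

# Upper-tail p-value from the F distribution
p_value <- pf(f_value, df1 = df_state, df2 = df_resid, lower.tail = FALSE)

print(round(f_value, 3))
print(signif(p_value, 3))
```

The square root of the residual mean square is the residual standard error that `aov` prints (about 233).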

3. Summary

In conclusion, we answered all of the questions posed, used p-values to assess whether each factor has a statistically significant effect on home prices, and learned more about Posit.