I used the “HomesForSale” data set from https://www.lock5stat.com/datapage3e.html to explore five questions about home prices.
Each question is examined in detail below. The first six rows of the data set are shown here.
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0
##
## Call:
## lm(formula = CA_Price ~ CA_Size, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## CA_Size       0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
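Using the regression model summarized above, the estimated slope is 0.33919. Because Price is recorded in thousands of dollars and Size in square feet, price increases by about $339.19 for each additional square foot, or roughly $339,190 for an additional 1000 square feet. The relationship is statistically significant, with a p-value of 0.000463.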
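As a unit check on the slope, the sketch below converts the fitted coefficients into dollars. It assumes Price is in thousands of dollars and Size is in square feet, which is how the first rows of the data read. The 2000 square foot home is a hypothetical example, not a row from the data.

```r
# Coefficients copied from the CA_Price ~ CA_Size summary above
intercept <- -56.81675
slope <- 0.33919  # change in Price (assumed to be in $1000s) per square foot

# Convert the slope to dollars
dollars_per_sqft <- slope * 1000
dollars_per_1000_sqft <- dollars_per_sqft * 1000

# Predicted price, in dollars, for a hypothetical 2000 square foot home
predicted_dollars <- (intercept + slope * 2000) * 1000

cat(dollars_per_sqft, dollars_per_1000_sqft, predicted_dollars, "\n")
```

Under these assumptions the slope works out to about $339 per square foot, and the hypothetical home is predicted at a little over $620,000.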
##
## Call:
## lm(formula = CA_Price ~ CA_Beds, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## CA_Beds        84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
The regression model summarized above shows that each additional bedroom is associated with an increase of approximately $84,770 in home price, but the relationship is not statistically significant: the p-value of 0.2548 is well above the textbook's significance level of 0.01.
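To show where that p-value comes from, here is a minimal sketch that recomputes it from the rounded t value and residual degrees of freedom in the output above. Because the inputs are rounded, the result should match the reported 0.2548 only approximately.

```r
# Values copied from the CA_Price ~ CA_Beds summary above
t_value <- 1.163
df_resid <- 28

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# With a single predictor, the F-statistic is the square of the slope's t value
f_value <- t_value^2

# Compare against the significance level quoted in the text
alpha <- 0.01
cat(p_value, f_value, p_value < alpha, "\n")
```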
##
## Call:
## lm(formula = CA_Price ~ CA_Baths, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## CA_Baths      194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
The regression model summarized above shows that each additional bathroom is associated with an increase of approximately $194,740 in home price, and the relationship is statistically significant, with a p-value of 0.004092.
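The point estimate alone hides how uncertain the bathroom slope is. The sketch below builds an approximate 95% confidence interval from the estimate and standard error printed above. The 95% level is my choice for illustration, not something the original analysis specifies.

```r
# Values copied from the CA_Price ~ CA_Baths summary above
estimate <- 194.74   # in $1000s per bathroom
std_error <- 62.28
df_resid <- 28

# Critical value for a two-sided 95% interval (level chosen for illustration)
t_crit <- qt(0.975, df = df_resid)

# Interval: estimate plus or minus critical value times standard error
ci <- estimate + c(-1, 1) * t_crit * std_error
cat(ci, "\n")
```

The interval runs from roughly 67 to 322 (in $1000s) and excludes zero, which is consistent with the small p-value. With the fitted model object available, `confint(lm_model)` should report the same interval up to rounding.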
##
## Call:
## lm(formula = CA_Price ~ CA_Baths + CA_Beds + CA_Size, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## CA_Baths     83.9844    76.7530   1.094   0.2839
## CA_Beds     -33.7036    67.9255  -0.496   0.6239
## CA_Size       0.2811     0.1189   2.364   0.0259 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
Looking at the multiple regression output, home size remains a statistically significant predictor of price, with a p-value of 0.0259: each additional square foot is associated with a price increase of about $281.10 (about $281,100 per additional 1000 square feet), holding bedrooms and bathrooms fixed. Each additional bathroom is associated with an $83,984.40 increase in price, but this is not statistically significant (p-value of 0.2839). Oddly, the model suggests a negative relationship between number of bedrooms and price, -$33,703.60 per bedroom, but this is also not statistically significant (p-value of 0.6239). The model as a whole is statistically significant, with an F-statistic p-value of 0.004353. Overall, this regression suggests that square footage is the most important factor among these three variables.
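One way to check the claim that size carries most of the explanatory power is to compare adjusted R-squared between the size-only model and the three-predictor model. The sketch below recomputes both from the reported R-squared values. The helper `adj_r2` is a name introduced here for illustration, and n = 30 is inferred from the 28 residual degrees of freedom in the simple models.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30  # inferred: 28 residual df + 2 estimated coefficients

# R-squared values copied from the summaries above
adj_size_only <- adj_r2(0.3594, n, 1)  # CA_Price ~ CA_Size
adj_full <- adj_r2(0.3912, n, 3)       # CA_Price ~ CA_Baths + CA_Beds + CA_Size

cat(adj_size_only, adj_full, "\n")
```

Both values reproduce the adjusted R-squared figures in the output (about 0.3365 and 0.3209). The size-only model scores slightly higher, so adding bedrooms and bathrooms does not improve the fit once the extra parameters are penalized.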
##              Df  Sum Sq Mean Sq F value   Pr(>F)
## state         3 1198169  399390   7.355 0.000148 ***
## Residuals   116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the ANOVA results above, there are significant differences in mean home prices among the four states, CA, NY, NJ, and PA. The p-value of 0.000148 is very small, so we reject the hypothesis that all four states share the same mean price; at least one state's mean differs. The F value of 7.355 shows that the variation between states is substantially greater than the variation within states. The ANOVA by itself does not identify which states differ.
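The F value and p-value can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table, which makes the "between versus within" comparison concrete. This is a minimal sketch using only the rounded numbers printed above.

```r
# Values copied from the ANOVA table above
ss_between <- 1198169
df_between <- 3
ss_within <- 6299266
df_within <- 116

# Mean squares: sum of squares divided by degrees of freedom
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within

# F is the ratio of between-state to within-state mean squares
f_value <- ms_between / ms_within

# Upper-tail probability from the F distribution
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

cat(ms_between, ms_within, f_value, p_value, "\n")
```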
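# Q1 Code:
# Assumes the HomesForSale data has already been read into `home`.
library(dplyr)  # provides %>% and filter()
# Keep only the California listings
Only_CA <- home %>%
  filter(State == "CA")
CA_Size <- Only_CA$Size
CA_Price <- Only_CA$Price
df <- data.frame(CA_Size, CA_Price)
# Simple linear regression of price on size
lm_model <- lm(CA_Price ~ CA_Size, data = df)
summary(lm_model)
# Q2 Code:
CA_Beds <- Only_CA$Beds
df <- data.frame(CA_Beds, CA_Price)
# Simple linear regression of price on number of bedrooms
lm_model <- lm(CA_Price ~ CA_Beds, data = df)
summary(lm_model)
# Q3 Code:
CA_Baths <- Only_CA$Baths
df <- data.frame(CA_Baths, CA_Price)
# Simple linear regression of price on number of bathrooms
lm_model <- lm(CA_Price ~ CA_Baths, data = df)
summary(lm_model)
# Q4 Code:
df <- data.frame(CA_Baths, CA_Beds, CA_Size, CA_Price)
# Multiple regression of price on bathrooms, bedrooms, and size
lm_model <- lm(CA_Price ~ CA_Baths + CA_Beds + CA_Size, data = df)
summary(lm_model)
# Q5 Code:
# Treat State as a factor so aov() compares group means across all states
State <- as.factor(home$State)
All_Price <- home$Price
myData <- data.frame(state = State, all_price = All_Price)
# One-way ANOVA of price by state
model <- aov(all_price ~ state, data = myData)
summary(model)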