This document analyzes the data provided at https://www.lock5stat.com/datasets3e/HomesForSale.csv. The data describe homes for sale and record the state, price, size, and number of bedrooms and bathrooms for each home. The goal of this project is to explore how different aspects of a house relate to its listing price.
I propose the following 5 questions based on my understanding of the data.
We will explore the questions in detail and summarize the results.
# Load the homes-for-sale data directly from the Lock5 website
home <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
head(home)
##   State Price Size Beds Baths
## 1    CA   533 1589    3   2.5
## 2    CA   610 2008    3   2.0
## 3    CA   899 2380    5   3.0
## 4    CA   929 1868    3   3.0
## 5    CA   210 1360    2   2.0
## 6    CA   268 2131    3   2.0
There is statistically significant evidence that home size is positively associated with price in California. The p-value of 0.000463 is below the 0.001 threshold for a highly significant result, so we can conclude that size is related to the price of homes in California. The R-squared of 0.3594 shows that roughly one third of the variation in price is explained by the size of the house, and the positive slope means that larger homes tend to have higher prices. Because Price is recorded in thousands of dollars, the slope of 0.33919 corresponds to an estimated increase of about 339 dollars for each additional square foot. The residual standard error of 219.3 means that actual prices typically differ from the fitted line by about $219,300.
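As a rough check on these numbers, the slope can be converted to dollars and given a 95% confidence interval using only the estimate, standard error, and residual degrees of freedom printed for the Price ~ Size model. This is a minimal sketch: the variable names are illustrative, the values are copied by hand from the printed summary, and it assumes Price is in thousands of dollars, as the interpretation above does.

```r
# Values copied from the printed summary of lm(Price ~ Size) for California
est <- 0.33919   # slope, in thousands of dollars per square foot (assumed units)
se  <- 0.08558   # standard error of the slope
df  <- 28        # residual degrees of freedom

# Slope expressed in dollars per square foot
slope_dollars <- est * 1000

# t statistic and two-sided p-value for the slope
t_stat <- est / se
p_val  <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the slope, in dollars per square foot
ci_dollars <- (est + c(-1, 1) * qt(0.975, df) * se) * 1000

round(c(slope = slope_dollars, lower = ci_dollars[1], upper = ci_dollars[2]), 1)
```

The interval lies entirely above zero, which agrees with the small p-value. With the fitted model object available, `confint()` applied to it reports the same interval in the model's original units.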
The estimated change in price for one additional bedroom is 84,770 dollars. Although this is a large increase, it is not statistically significant: the p-value of 0.255 is well above the standard 0.05 threshold. The R-squared of 0.04605 is also very low, so the number of bedrooms explains less than 5% of the variation in price. We cannot conclude that there is any relationship between the number of bedrooms and home price in California.
The estimated change in price for one additional bathroom is about 195,000 dollars (a slope of 194.74, in thousands of dollars). The p-value of 0.00409 is below the 0.05 threshold, so there is significant evidence of a positive linear relationship between the number of bathrooms and home price. The R-squared of 0.2588 says that roughly one quarter of the variation in price is explained by the number of bathrooms.
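In a regression with a single predictor, the "Multiple R-squared" reported by `summary()` is the square of the correlation between the predictor and the response, so the correlation itself can be recovered from it. The sketch below does this for the three single-predictor models, using R-squared values copied by hand from the printed output; the vector names are illustrative.

```r
# R-squared values copied from the three single-predictor summaries
r_squared <- c(Size = 0.3594, Beds = 0.04605, Baths = 0.2588)

# All three estimated slopes are positive, so the correlation is the
# positive square root of R-squared
r <- sqrt(r_squared)

# Side-by-side comparison of correlation and proportion of variance explained
round(rbind(r = r, r_squared = r_squared), 3)
```

The proportion of variation explained is R-squared, not r. For Size, for example, the correlation is about 0.60 while the share of variation explained is about 0.36.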
The overall model is significant, with a p-value of 0.004353, which is again below the 0.05 threshold. The R-squared of 0.3912 means that about 39% of the variance in price is explained by home size, number of bedrooms, and number of bathrooms together. Within the model, size is the most significant predictor because it has the lowest p-value (0.0259), and it is the only one below 0.05. Number of bedrooms is the least significant because it has the highest p-value (0.6239). Number of bathrooms has some association with price on its own, but once home size is accounted for it is no longer significant (p = 0.2839). This suggests that, of the three features, square footage is the one most strongly tied to home value in California.
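One way to see how little bedrooms and bathrooms add once size is in the model is to compare adjusted R-squared, which penalizes R-squared for each extra predictor. The sketch below recomputes it from the printed R-squared values; `adj_r2` is a hypothetical helper written for this illustration, and n = 30 is inferred from the 28 residual degrees of freedom in the Price ~ Size model.

```r
# Adjusted R-squared for a model with p predictors fitted to n observations
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 30  # California homes: 28 residual df + 2 coefficients in Price ~ Size

# R-squared values copied from the printed summaries
adj_size_only <- adj_r2(0.3594, n, p = 1)  # Price ~ Size
adj_full      <- adj_r2(0.3912, n, p = 3)  # Price ~ Size + Beds + Baths

round(c(size_only = adj_size_only, full = adj_full), 3)
```

Up to rounding of the inputs, these reproduce the adjusted values in the output (0.3365 and 0.3209). The three-predictor model has the lower adjusted R-squared, which is consistent with bedrooms and bathrooms contributing little beyond size.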
The comparison of home prices across the states in the dataset shows a highly significant relationship between home price and the state in which the home is located. The p-value of 0.0001482 is well below the 0.001 threshold for a highly significant result. Dividing the Sum Sq for State by the total of the State and Residuals Sum Sq gives an R-squared of about 0.16, so roughly 16% of the variation in price is explained by the state the home lies in.
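The R-squared calculation described above, along with the F statistic and its p-value, can be reproduced from the sums of squares and degrees of freedom in the ANOVA table. This is a minimal sketch with the values copied by hand from the printed table; the variable names are illustrative.

```r
# Values copied from the printed ANOVA table for lm(Price ~ State)
ss_state <- 1198169; df_state <- 3
ss_resid <- 6299266; df_resid <- 116

# Proportion of price variation explained by State
r2 <- ss_state / (ss_state + ss_resid)

# F statistic: ratio of the two mean squares
f_stat <- (ss_state / df_state) / (ss_resid / df_resid)

# Upper-tail p-value from the F distribution
p_val <- pf(f_stat, df_state, df_resid, lower.tail = FALSE)

c(r_squared = round(r2, 4), F = round(f_stat, 4), p = signif(p_val, 4))
```

With the fitted model object available, `summary()` reports the same R-squared directly in its `r.squared` component.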
# Keep only the California listings for the first four models
CA_home <- subset(home, State == "CA")
mod1 <- lm(Price ~ Size, data = CA_home)
summary(mod1)
##
## Call:
## lm(formula = Price ~ Size, data = CA_home)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -462.55 -139.69   39.24  147.65  352.21
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) -56.81675  154.68102  -0.367 0.716145
## Size          0.33919    0.08558   3.963 0.000463 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 219.3 on 28 degrees of freedom
## Multiple R-squared: 0.3594, Adjusted R-squared: 0.3365
## F-statistic: 15.71 on 1 and 28 DF, p-value: 0.0004634
mod2 <- lm(Price ~ Beds, data = CA_home)
summary(mod2)
##
## Call:
## lm(formula = Price ~ Beds, data = CA_home)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -413.83 -236.62   29.94  197.69  570.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   269.76     233.62   1.155    0.258
## Beds           84.77      72.91   1.163    0.255
##
## Residual standard error: 267.6 on 28 degrees of freedom
## Multiple R-squared: 0.04605, Adjusted R-squared: 0.01198
## F-statistic: 1.352 on 1 and 28 DF, p-value: 0.2548
mod3 <- lm(Price ~ Baths, data = CA_home)
summary(mod3)
##
## Call:
## lm(formula = Price ~ Baths, data = CA_home)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -374.93 -181.56   -2.74  152.31  614.81
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    90.71     148.57   0.611  0.54641
## Baths         194.74      62.28   3.127  0.00409 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 235.8 on 28 degrees of freedom
## Multiple R-squared: 0.2588, Adjusted R-squared: 0.2324
## F-statistic: 9.779 on 1 and 28 DF, p-value: 0.004092
mod4 <- lm(Price ~ Size + Beds + Baths, data = CA_home)
summary(mod4)
##
## Call:
## lm(formula = Price ~ Size + Beds + Baths, data = CA_home)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -415.47 -130.32   19.64  154.79  384.94
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -41.5608   210.3809  -0.198   0.8449
## Size          0.2811     0.1189   2.364   0.0259 *
## Beds        -33.7036    67.9255  -0.496   0.6239
## Baths        83.9844    76.7530   1.094   0.2839
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 221.8 on 26 degrees of freedom
## Multiple R-squared: 0.3912, Adjusted R-squared: 0.3209
## F-statistic: 5.568 on 3 and 26 DF, p-value: 0.004353
mod5 <- lm(Price ~ State, data = home)
anova(mod5)
## Analysis of Variance Table
##
## Response: Price
##            Df  Sum Sq Mean Sq F value    Pr(>F)
## State       3 1198169  399390  7.3547 0.0001482 ***
## Residuals 116 6299266   54304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1