This brief analysis creates regression models to show the relationship between house price and housing characteristics. It uses data from HousePrices.csv, which was obtained from https://www.biz.uiowa.edu/faculty/jledolter/DataMining/dataexercises.html. In addition, the analysis draws on topics and concepts covered in chapter 3.
library(readr)
HousePrices <- read_csv("HousePrices.csv")
## Parsed with column specification:
## cols(
## HomeID = col_double(),
## Price = col_double(),
## SqFt = col_double(),
## Bedrooms = col_double(),
## Bathrooms = col_double(),
## Offers = col_double(),
## Brick = col_character(),
## Neighborhood = col_character()
## )
HousePrice <- HousePrices[-c(1, 7, 8)]  # drop HomeID, Brick, and Neighborhood
head(HousePrice)
## # A tibble: 6 x 5
## Price SqFt Bedrooms Bathrooms Offers
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 114300 1790 2 2 2
## 2 114200 2030 4 2 3
## 3 114800 1740 3 2 1
## 4 94700 1980 3 2 3
## 5 119800 2130 3 3 3
## 6 114600 1780 3 2 2
# Scatterplots of Price against each numeric predictor
plot(Price ~ SqFt, data = HousePrices)
plot(Price ~ Bedrooms, data = HousePrices)
plot(Price ~ Bathrooms, data = HousePrices)
plot(Price ~ Offers, data = HousePrices)
m1 <- lm(Price ~ ., data = HousePrice)  # full model: Price on all four predictors
summary(m1)
##
## Call:
## lm(formula = Price ~ ., data = HousePrice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33608 -9889 -2968 9398 43243
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17347.377 12724.896 -1.363 0.175
## SqFt 61.840 8.264 7.483 1.20e-11 ***
## Bedrooms 9319.753 2148.754 4.337 2.97e-05 ***
## Bathrooms 12646.347 3109.662 4.067 8.45e-05 ***
## Offers -13601.011 1324.819 -10.266 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15000 on 123 degrees of freedom
## Multiple R-squared: 0.6982, Adjusted R-squared: 0.6884
## F-statistic: 71.13 on 4 and 123 DF, p-value: < 2.2e-16
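As a worked check on the fitted equation, the sketch below plugs the first house shown by `head(HousePrice)` (1790 sq ft, 2 bedrooms, 2 bathrooms, 2 offers, price $114,300) into the coefficients printed above, and recomputes the adjusted R-squared from the multiple R-squared. It assumes the coefficients as rounded in the summary output, so the fitted value is approximate, and it infers the sample size (128 houses) from the 123 residual degrees of freedom plus the five estimated parameters.

```r
# Coefficients copied from summary(m1) above (rounded as printed)
b <- c(Intercept = -17347.377, SqFt = 61.840, Bedrooms = 9319.753,
       Bathrooms = 12646.347, Offers = -13601.011)

# First house in head(HousePrice): 1790 sq ft, 2 bedrooms, 2 bathrooms, 2 offers
house <- c(SqFt = 1790, Bedrooms = 2, Bathrooms = 2, Offers = 2)
actual_price <- 114300

# Fitted price = intercept + sum of (coefficient * predictor value)
predicted <- unname(b["Intercept"] + sum(b[names(house)] * house))
residual  <- actual_price - predicted
predicted   # about 110,076
residual    # about 4,224

# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
p  <- 4                 # number of predictors
n  <- 123 + p + 1       # residual df + predictors + intercept = 128 houses
r2 <- 0.6982            # multiple R-squared printed above
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)        # 0.6884, matching the printed adjusted R-squared
```

With the data loaded, `predict(m1, newdata = data.frame(SqFt = 1790, Bedrooms = 2, Bathrooms = 2, Offers = 2))` and `summary(m1)$adj.r.squared` give the same quantities at full precision.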
cor(HousePrice, method = "pearson", use = "complete.obs")
## Price SqFt Bedrooms Bathrooms Offers
## Price 1.0000000 0.5529822 0.5259261 0.5232578 -0.3136359
## SqFt 0.5529822 1.0000000 0.4838071 0.5227453 0.3369234
## Bedrooms 0.5259261 0.4838071 1.0000000 0.4145560 0.1142706
## Bathrooms 0.5232578 0.5227453 0.4145560 1.0000000 0.1437934
## Offers -0.3136359 0.3369234 0.1142706 0.1437934 1.0000000
library(moments)
skewness(HousePrice,na.rm=TRUE)
## Price SqFt Bedrooms Bathrooms Offers
## 0.46737973 0.07755647 0.21266288 0.39390645 0.28060403
kurtosis(HousePrice,na.rm=TRUE)
## Price SqFt Bedrooms Bathrooms Offers
## 2.943963 3.095895 2.566366 1.572100 2.829153
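The skewness and kurtosis values are judged against 0 and 3, the values for a normal distribution. A minimal base-R sketch of the moment formulas, assuming the definitions used by the `moments` package (population central moments, and ordinary rather than excess kurtosis), is shown below. The input vector is a hypothetical toy example, not the house data.

```r
# k-th central moment using the population (divide-by-n) convention
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by the second moment to the power 3/2
skew_stat <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)

# Kurtosis (not excess): fourth central moment over the squared second moment
kurt_stat <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# Hypothetical toy data: symmetric and evenly spread
x <- c(1, 2, 3, 4, 5)
skew_stat(x)  # 0: perfectly symmetric
kurt_stat(x)  # 1.7: flatter, lighter-tailed than a normal distribution (3)
```

An evenly spread variable with few distinct values gives kurtosis well below 3, which is one way to read the low value for Bathrooms.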
m2=lm(Price~SqFt, data = HousePrice)
summary(m2)
##
## Call:
## lm(formula = Price ~ SqFt, data = HousePrice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -46593 -16644 -1610 15124 54829
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10091.130 18966.104 -0.532 0.596
## SqFt 70.226 9.426 7.450 1.3e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22480 on 126 degrees of freedom
## Multiple R-squared: 0.3058, Adjusted R-squared: 0.3003
## F-statistic: 55.5 on 1 and 126 DF, p-value: 1.302e-11
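The SqFt slope is an estimate, so the dollars-per-square-foot figure carries uncertainty. The sketch below uses only the rounded estimate, standard error, and 126 residual degrees of freedom printed above to form an approximate 95% confidence interval for the slope, and computes the fitted price for a hypothetical 2,000 sq ft house.

```r
# Values copied from summary(m2) above
intercept <- -10091.130
slope     <- 70.226   # dollars per square foot
se_slope  <- 9.426
df_resid  <- 126

# Approximate 95% confidence interval for the slope: estimate +/- t * SE
t_crit   <- qt(0.975, df = df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * se_slope
round(slope_ci, 2)    # roughly 51.6 to 88.9 dollars per square foot

# Fitted price for a hypothetical 2,000 sq ft house
fitted_2000 <- intercept + slope * 2000
fitted_2000           # 130360.87
```

With the data loaded, `confint(m2)` returns the same interval without rounding.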
m3=lm(Price~Bedrooms, data = HousePrice)
summary(m3)
##
## Call:
## lm(formula = Price ~ Bedrooms, data = HousePrice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -48671 -14496 462 13178 61763
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71575 8718 8.210 2.24e-13 ***
## Bedrooms 19466 2804 6.941 1.83e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22940 on 126 degrees of freedom
## Multiple R-squared: 0.2766, Adjusted R-squared: 0.2709
## F-statistic: 48.18 on 1 and 126 DF, p-value: 1.83e-10
m4=lm(Price~Bathrooms, data = HousePrice)
summary(m4)
##
## Call:
## lm(formula = Price ~ Bathrooms, data = HousePrice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -61985 -15583 -2272 15722 65615
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 63605 9906 6.421 2.51e-09 ***
## Bathrooms 27327 3965 6.892 2.35e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22990 on 126 degrees of freedom
## Multiple R-squared: 0.2738, Adjusted R-squared: 0.268
## F-statistic: 47.51 on 1 and 126 DF, p-value: 2.345e-10
m5=lm(Price~Offers, data = HousePrice)
summary(m5)
##
## Call:
## lm(formula = Price ~ Offers, data = HousePrice)
##
## Residuals:
## Min 1Q Median 3Q Max
## -58003 -19213 -5612 18278 84097
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 150745 5929 25.424 < 2e-16 ***
## Offers -7881 2126 -3.708 0.000312 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25610 on 126 degrees of freedom
## Multiple R-squared: 0.09837, Adjusted R-squared: 0.09121
## F-statistic: 13.75 on 1 and 126 DF, p-value: 0.0003122
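The single-predictor slopes differ from the corresponding coefficients in the full model m1, because m1 holds the other characteristics constant. The sketch below simply tabulates the printed estimates. The Offers slope is about twice as negative once the other variables are included, which is consistent with Offers being positively correlated with SqFt, Bedrooms, and Bathrooms in the matrix above: in the Offers-only model, part of their positive association with price is absorbed into the Offers slope.

```r
# Slopes copied from the single-predictor models (m2 to m5) and the full model (m1)
slopes <- data.frame(
  predictor = c("SqFt", "Bedrooms", "Bathrooms", "Offers"),
  simple    = c(70.226, 19466, 27327, -7881),
  multiple  = c(61.840, 9319.753, 12646.347, -13601.011)
)

# Ratio above 1: the single-predictor slope is larger in magnitude than in m1
slopes$ratio <- round(slopes$simple / slopes$multiple, 2)
slopes
```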
It can be concluded that house price is associated with square footage and with the numbers of bedrooms and bathrooms. The HomeID, Brick, and Neighborhood columns were removed because they were not needed for this analysis of the numeric housing characteristics: HomeID is only an identifier, and Brick and Neighborhood are categorical.

Scatterplots were made to show the relationships of square footage, bedrooms, bathrooms, and offers with Price. The first plot, Price vs. SqFt, shows that the price of a house tends to increase as its square footage increases. The second plot, Price vs. Bedrooms, shows that the number of bedrooms ranges from 2 to 5 and that the number of bedrooms is not the only reason a house may cost more. The third plot, Price vs. Bathrooms, shows that the number of bathrooms ranges from 2 to 4 and, likewise, that the number of bathrooms is not the only reason a house may cost more. The fourth plot, Price vs. Offers, shows that the number of offers ranges from 1 to 5 and that prices vary widely at each level, so the number of offers is not the sole factor behind the overall price.

The model with all four variables shows that SqFt, Bedrooms, Bathrooms, and Offers together explain about 70% of the variability in house price (multiple R-squared 0.6982, adjusted R-squared 0.6884). The correlation matrix shows that Price has its strongest correlations with SqFt (0.55), Bedrooms (0.53), and Bathrooms (0.52). These correlations are only moderate, but they are the highest of the four predictors; Offers is negatively correlated with Price (-0.31).

For skewness, SqFt is the variable closest to zero (0.078), and its kurtosis (3.10) is close to 3, the value for a normal distribution. Price, Bedrooms, and Offers also have kurtosis reasonably close to 3 but somewhat higher skewness, while Bathrooms has the lowest kurtosis (1.57).

The model for SqFt shows that square footage alone explains about 30.6% of the variability in price, and that each additional square foot is associated with a price about $70.23 higher. The model for Bedrooms shows that bedrooms alone explain about 27.7% of the variability in price, and that each additional bedroom is associated with a price about $19,466 higher. The model for Bathrooms shows that bathrooms alone explain about 27.4% of the variability in price, and that each additional bathroom is associated with a price about $27,327 higher. The model for Offers shows that offers alone explain about 9.8% of the variability in price, and that each additional offer is associated with a price about $7,881 lower.
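The correlation matrix and the single-predictor models are two views of the same information: in a linear model with one predictor, the multiple R-squared equals the squared Pearson correlation between that predictor and Price. The sketch below checks this using only the correlations and R-squared values printed above. Squaring drops the sign, so the negative direction of the Offers relationship shows up only in its slope and correlation.

```r
# Pearson correlations with Price, copied from the correlation matrix above
r_with_price <- c(SqFt = 0.5529822, Bedrooms = 0.5259261,
                  Bathrooms = 0.5232578, Offers = -0.3136359)

# Multiple R-squared printed by summary(m2) through summary(m5)
r2_reported <- c(SqFt = 0.3058, Bedrooms = 0.2766,
                 Bathrooms = 0.2738, Offers = 0.09837)

# One-predictor linear model: R-squared equals the squared correlation
r2_from_cor <- r_with_price^2
round(r2_from_cor, 4)   # agrees with r2_reported to about four decimals
```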