Introduction

This brief analysis fits regression models to show the relationship between house price and housing characteristics. It uses data from HousePrices.csv, which was obtained from https://www.biz.uiowa.edu/faculty/jledolter/DataMining/dataexercises.html. It also draws on topics and concepts covered in chapter 3.

library(readr)
HousePrices <- read_csv("HousePrices.csv")
## Parsed with column specification:
## cols(
##   HomeID = col_double(),
##   Price = col_double(),
##   SqFt = col_double(),
##   Bedrooms = col_double(),
##   Bathrooms = col_double(),
##   Offers = col_double(),
##   Brick = col_character(),
##   Neighborhood = col_character()
## )
# Drop HomeID (column 1), Brick (7), and Neighborhood (8); keep the numeric columns
HousePrice <- HousePrices[-c(1, 7, 8)]
head(HousePrice)
## # A tibble: 6 x 5
##    Price  SqFt Bedrooms Bathrooms Offers
##    <dbl> <dbl>    <dbl>     <dbl>  <dbl>
## 1 114300  1790        2         2      2
## 2 114200  2030        4         2      3
## 3 114800  1740        3         2      1
## 4  94700  1980        3         2      3
## 5 119800  2130        3         3      3
## 6 114600  1780        3         2      2
plot(Price~SqFt,data=HousePrices)

plot(Price~Bedrooms,data=HousePrices)

plot(Price~Bathrooms,data=HousePrices)

plot(Price~Offers,data=HousePrices)

# Full model: Price regressed on all remaining columns
m1 <- lm(Price ~ ., data = HousePrice)
summary(m1)
## 
## Call:
## lm(formula = Price ~ ., data = HousePrice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -33608  -9889  -2968   9398  43243 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -17347.377  12724.896  -1.363    0.175    
## SqFt            61.840      8.264   7.483 1.20e-11 ***
## Bedrooms      9319.753   2148.754   4.337 2.97e-05 ***
## Bathrooms    12646.347   3109.662   4.067 8.45e-05 ***
## Offers      -13601.011   1324.819 -10.266  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15000 on 123 degrees of freedom
## Multiple R-squared:  0.6982, Adjusted R-squared:  0.6884 
## F-statistic: 71.13 on 4 and 123 DF,  p-value: < 2.2e-16
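
To make the coefficient table concrete, the sketch below plugs the first row shown by head(HousePrice) into the fitted equation by hand. The coefficients and the house's values are copied from the output above; the variable names (coefs, house, predicted_price) are illustrative only, and in practice predict(m1) would do the same job on the fitted model object.

```r
# Coefficients copied from the summary(m1) output
coefs <- c(Intercept = -17347.377, SqFt = 61.840, Bedrooms = 9319.753,
           Bathrooms = 12646.347, Offers = -13601.011)

# First row of head(HousePrice): 1790 sq ft, 2 bedrooms, 2 bathrooms, 2 offers
house <- c(Intercept = 1, SqFt = 1790, Bedrooms = 2, Bathrooms = 2, Offers = 2)
observed_price <- 114300

# The fitted value is the sum of coefficient * predictor value
predicted_price <- sum(coefs * house)
residual <- observed_price - predicted_price

print(round(predicted_price))  # about 110076
print(round(residual))         # about 4224
```

The residual of roughly $4,200 is well inside the residual standard error of 15,000 reported for this model.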
cor(HousePrice, method = "pearson", use = "complete.obs")
##                Price      SqFt  Bedrooms Bathrooms     Offers
## Price      1.0000000 0.5529822 0.5259261 0.5232578 -0.3136359
## SqFt       0.5529822 1.0000000 0.4838071 0.5227453  0.3369234
## Bedrooms   0.5259261 0.4838071 1.0000000 0.4145560  0.1142706
## Bathrooms  0.5232578 0.5227453 0.4145560 1.0000000  0.1437934
## Offers    -0.3136359 0.3369234 0.1142706 0.1437934  1.0000000
library(moments)
skewness(HousePrice,na.rm=TRUE)
##      Price       SqFt   Bedrooms  Bathrooms     Offers 
## 0.46737973 0.07755647 0.21266288 0.39390645 0.28060403
kurtosis(HousePrice,na.rm=TRUE)
##     Price      SqFt  Bedrooms Bathrooms    Offers 
##  2.943963  3.095895  2.566366  1.572100  2.829153
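
The kurtosis values are read against 3 because the moments package reports Pearson's (non-excess) kurtosis, for which a normal distribution scores 3. The sketch below writes out the moment-based formulas in base R so the two statistics are easier to interpret. The helper names (central_moment, skew, kurt) and the sample vector are made up for illustration and are not part of the original analysis.

```r
# k-th central moment, dividing by n (population form)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Moment-based skewness and Pearson kurtosis
skew <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
kurt <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

x <- c(1, 2, 3, 4, 5)  # made-up symmetric sample, not from HousePrices.csv

print(skew(x))  # 0, because the sample is symmetric
print(kurt(x))  # 1.7, flatter than the normal benchmark of 3
```

On this reading, the Bathrooms kurtosis of 1.57 stands out as well below 3, which is plausible for a variable that takes only a few distinct values.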
# Simple regression: Price on SqFt only
m2 <- lm(Price ~ SqFt, data = HousePrice)
summary(m2)
## 
## Call:
## lm(formula = Price ~ SqFt, data = HousePrice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -46593 -16644  -1610  15124  54829 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -10091.130  18966.104  -0.532    0.596    
## SqFt            70.226      9.426   7.450  1.3e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22480 on 126 degrees of freedom
## Multiple R-squared:  0.3058, Adjusted R-squared:  0.3003 
## F-statistic:  55.5 on 1 and 126 DF,  p-value: 1.302e-11
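
Several numbers in this summary can be rebuilt from the others, which is a useful check when reading lm output. The sketch below recomputes the t value, the F-statistic, and the adjusted R-squared for the SqFt model from figures printed above. The sample size of 128 is inferred from the 126 residual degrees of freedom plus two estimated coefficients; the variable names are illustrative.

```r
n  <- 128     # 126 residual degrees of freedom + 2 estimated coefficients
p  <- 1       # one predictor (SqFt)
r2 <- 0.3058  # Multiple R-squared from the output

# t value = estimate / standard error
t_sqft <- 70.226 / 9.426

# With a single predictor, the F-statistic equals the squared t value
f_stat <- t_sqft^2

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(t_sqft, 2))  # 7.45
print(round(f_stat, 1))  # 55.5
print(round(adj_r2, 4))  # 0.3003
```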
# Simple regression: Price on Bedrooms only
m3 <- lm(Price ~ Bedrooms, data = HousePrice)
summary(m3)
## 
## Call:
## lm(formula = Price ~ Bedrooms, data = HousePrice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -48671 -14496    462  13178  61763 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    71575       8718   8.210 2.24e-13 ***
## Bedrooms       19466       2804   6.941 1.83e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22940 on 126 degrees of freedom
## Multiple R-squared:  0.2766, Adjusted R-squared:  0.2709 
## F-statistic: 48.18 on 1 and 126 DF,  p-value: 1.83e-10
# Simple regression: Price on Bathrooms only
m4 <- lm(Price ~ Bathrooms, data = HousePrice)
summary(m4)
## 
## Call:
## lm(formula = Price ~ Bathrooms, data = HousePrice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -61985 -15583  -2272  15722  65615 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    63605       9906   6.421 2.51e-09 ***
## Bathrooms      27327       3965   6.892 2.35e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22990 on 126 degrees of freedom
## Multiple R-squared:  0.2738, Adjusted R-squared:  0.268 
## F-statistic: 47.51 on 1 and 126 DF,  p-value: 2.345e-10
# Simple regression: Price on Offers only
m5 <- lm(Price ~ Offers, data = HousePrice)
summary(m5)
## 
## Call:
## lm(formula = Price ~ Offers, data = HousePrice)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -58003 -19213  -5612  18278  84097 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   150745       5929  25.424  < 2e-16 ***
## Offers         -7881       2126  -3.708 0.000312 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25610 on 126 degrees of freedom
## Multiple R-squared:  0.09837,    Adjusted R-squared:  0.09121 
## F-statistic: 13.75 on 1 and 126 DF,  p-value: 0.0003122
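
The four single-predictor fits tie back to the correlation matrix: in a simple linear regression, R-squared is the square of the Pearson correlation between the response and the predictor. The sketch below checks this with values copied from the outputs above; the variable names are illustrative.

```r
# Correlations with Price, copied from the correlation matrix
r <- c(SqFt = 0.5529822, Bedrooms = 0.5259261,
       Bathrooms = 0.5232578, Offers = -0.3136359)

# Multiple R-squared values copied from summary(m2) through summary(m5)
reported_r2 <- c(SqFt = 0.3058, Bedrooms = 0.2766,
                 Bathrooms = 0.2738, Offers = 0.09837)

comparison <- data.frame(r = r, r_squared = r^2, reported_r2 = reported_r2)
print(round(comparison, 4))

# Largest disagreement; it should be no more than rounding error
max_gap <- max(abs(comparison$r_squared - comparison$reported_r2))
print(max_gap < 5e-5)  # TRUE
```

The sign of the correlation is lost when squaring, which is why Offers needs its negative slope, not its R-squared, to show the direction of the relationship.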

Conclusion

It can be concluded that house price is related to square footage and to the number of bedrooms and bathrooms. The HomeID, Brick, and Neighborhood columns were removed because they were not relevant to this analysis.

Four scatterplots were drawn before the models were fitted. The first plot, Price vs. SqFt, showed that the price of a house increased as its square footage increased. The second plot, Price vs. Bedrooms, showed that the number of bedrooms ranged from 2 to 5 and that the number of bedrooms was not necessarily the only reason a house cost more. The third plot, Price vs. Bathrooms, showed that the number of bathrooms ranged from 2 to 4 and that the number of bathrooms was not necessarily the only reason a house cost more. The fourth plot, Price vs. Offers, showed that the number of offers ranged from 1 to 5 but that prices varied, so the number of offers was not the sole factor affecting price.

The regression on all four variables showed that SqFt, Bedrooms, Bathrooms, and Offers together explain about 70% of the variability in house price (multiple R-squared = 0.698). The correlation matrix showed that Price is most strongly correlated with SqFt (0.55), Bedrooms (0.53), and Bathrooms (0.52). These correlations are only moderate, but they are the largest in magnitude; the correlation with Offers is -0.31. SqFt had the skewness closest to zero (0.08) and a kurtosis close to 3 (3.10). Price and Offers also had kurtosis close to 3 with somewhat higher skewness, while Bedrooms (2.57) and especially Bathrooms (1.57) had lower kurtosis.

In the single-predictor models, SqFt explained 30.6% of the variability in price, and each additional square foot was associated with about $70.23 more in price. Bedrooms explained 27.7%, and each additional bedroom was associated with about $19,466 more. Bathrooms explained 27.4%, and each additional bathroom was associated with about $27,327 more. Offers explained 9.8%, and each additional offer was associated with a price about $7,881 lower.
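
The per-unit dollar figures quoted here come from the single-predictor models, and they change once the other variables are held fixed. As a rough illustration, the sketch below sets the simple-regression slopes beside the slopes from the four-variable model, using values copied from the outputs above. The data frame and column names are illustrative.

```r
# Slopes copied from summary(m2)..summary(m5) and from summary(m1)
slopes <- data.frame(
  predictor = c("SqFt", "Bedrooms", "Bathrooms", "Offers"),
  simple    = c(70.226, 19466, 27327, -7881),
  multiple  = c(61.840, 9319.753, 12646.347, -13601.011)
)

# Ratio of the multiple-regression slope to the simple-regression slope
slopes$ratio <- round(slopes$multiple / slopes$simple, 2)
print(slopes)
```

The Bedrooms and Bathrooms slopes roughly halve in the full model, which is consistent with the predictors being correlated with one another, while the Offers slope becomes larger in magnitude.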