Data was collected on 100 houses recently sold in a city. It consisted of the sales price (in $), house size (in square feet), the number of bedrooms, and the annual real estate tax (in $).
library(readxl)
## Warning: package 'readxl' was built under R version 3.3.3
housing <- read_excel("C:/Users/Vinay/MyRWork/MultipleRegressionAnalysisAssignment.xlsx")
housing
## Price Size Tax Bedroom
## 1 382374 3077 1835 4
## 2 243944 2480 2401 3
## 3 271384 2732 3957 2
## 4 231283 2889 3646 4
## 5 696108 1416 4824 3
## 6 573958 3317 2271 2
## 7 126827 1609 2700 3
## 8 666507 2510 1220 5
## 9 669773 2321 4443 4
## 10 676600 3040 3773 3
## 11 443487 2065 1133 1
## 12 366304 1314 3113 5
## 13 395384 2514 3410 2
## 14 288365 1862 2261 4
## 15 392874 2643 4511 4
## 16 642262 2377 2166 4
## 17 470750 2739 4610 3
## 18 500450 1294 4035 3
## 19 309438 3877 3199 5
## 20 695833 2813 4182 1
## 21 403600 2888 3295 5
## 22 107888 2629 1673 3
## 23 132039 3278 1163 4
## 24 544609 2850 4565 4
## 25 693154 3893 3988 3
## 26 148383 1534 3932 2
## 27 691992 3704 4144 2
## 28 159389 2671 4556 4
## 29 291433 1676 2456 4
## 30 343977 2621 3149 4
## 31 102473 3882 4992 4
## 32 299896 3233 2843 4
## 33 106572 3854 3207 1
## 34 447384 1080 3991 3
## 35 564943 2697 3625 3
## 36 114274 2367 3319 2
## 37 351245 3070 4543 4
## 38 537115 3703 1095 2
## 39 247149 1001 883 3
## 40 391773 1999 1048 1
## 41 110816 1786 4976 2
## 42 287789 1523 1465 1
## 43 518058 3520 3689 2
## 44 156343 2294 3629 4
## 45 516500 1583 2630 2
## 46 373974 2402 1515 4
## 47 463759 2246 4664 5
## 48 315379 2533 1825 4
## 49 451385 1441 3918 2
## 50 607390 2780 4790 3
## 51 438956 1886 3477 3
## 52 306041 3502 3856 1
## 53 480141 2389 1715 1
## 54 641800 2715 3219 5
## 55 633090 3405 4272 5
## 56 353046 3198 1662 3
## 57 643094 2103 1288 2
## 58 425530 2824 3949 3
## 59 516514 2199 2366 1
## 60 654003 1471 4023 5
## 61 687017 2863 4004 2
## 62 581898 2343 4008 3
## 63 657913 1912 4680 1
## 64 250101 3773 4976 3
## 65 124229 1949 1708 4
## 66 593670 3525 1293 2
## 67 499275 1898 2177 5
## 68 305039 3622 2987 5
## 69 183217 3573 2692 1
## 70 677795 1161 3037 4
## 71 413243 2716 2054 2
## 72 477415 3437 2750 3
## 73 408751 3118 1870 3
## 74 226606 3334 877 2
## 75 632040 3306 1496 1
## 76 472470 3807 4483 4
## 77 350582 1774 1549 2
## 78 130775 1767 2326 2
## 79 511378 3771 3512 4
## 80 527075 2278 4286 5
## 81 340017 2000 2405 5
## 82 208139 2603 1920 5
## 83 175386 1230 2798 5
## 84 375749 2022 2953 2
## 85 118583 1480 1049 2
## 86 419249 3666 2773 1
## 87 161009 2970 2147 1
## 88 227432 3476 914 2
## 89 410853 2104 2606 3
## 90 418980 1921 3790 1
## 91 494029 1422 4236 3
## 92 387532 2745 4319 2
## 93 611014 3036 1197 4
## 94 353905 1876 4318 3
## 95 508767 2458 3024 1
## 96 139347 1389 3454 4
## 97 551511 3332 1068 5
## 98 451381 2886 4453 1
## 99 100925 3525 4633 3
## 100 601664 1818 3002 1
Please investigate how the variables are related to one another. You can do this graphically by constructing scatter plots of all pair-wise combinations.
# Scatter-plot matrix of all pairwise combinations of the four variables
myvars <- c("Price", "Size", "Tax", "Bedroom")
housing1 <- housing[myvars]
plot(housing1)
Based on the scatter plots, there is no apparent linear relationship between any pair of the variables Price, Size, Tax, and Bedroom.
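A visual impression can be backed up with a correlation matrix. The sketch below is illustrative only: `housing_head` is a hypothetical name for a data frame holding the first ten rows printed above, so that the snippet runs without the Excel file. With the full data, the same calls could be applied to `housing` instead.

```r
# First ten rows of the printed data, used here only as a small stand-in
# for the full 100-row table (housing_head is an illustrative name).
housing_head <- data.frame(
  Price   = c(382374, 243944, 271384, 231283, 696108,
              573958, 126827, 666507, 669773, 676600),
  Size    = c(3077, 2480, 2732, 2889, 1416, 3317, 1609, 2510, 2321, 3040),
  Tax     = c(1835, 2401, 3957, 3646, 4824, 2271, 2700, 1220, 4443, 3773),
  Bedroom = c(4, 3, 2, 4, 3, 2, 3, 5, 4, 3)
)

# Pearson correlations for every pair of variables
cor_matrix <- cor(housing_head)
print(round(cor_matrix, 2))

# Formal test of a single pair (H0: the true correlation is zero)
price_size <- cor.test(housing_head$Price, housing_head$Size)
print(price_size$p.value)
```

Correlations near zero, together with large p-values from `cor.test`, would be consistent with what the scatter plots suggest. Ten rows are far too few to draw conclusions from, so the subset only demonstrates the calls.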
Please fit a linear model to predict the impact of the most important variables on sales price.
results <- lm(Price ~ Size + Tax, data = housing)
results
##
## Call:
## lm(formula = Price ~ Size + Tax, data = housing)
##
## Coefficients:
## (Intercept) Size Tax
## 3.179e+05 9.045e+00 2.075e+01
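Written out, the printed coefficients give the fitted equation below. As a worked example it is evaluated for the first house in the data (Size = 3077, Tax = 1835). The coefficients are rounded as printed, so the result is approximate.

```latex
\widehat{\text{Price}} = 317900 + 9.045\,\text{Size} + 20.75\,\text{Tax}

\widehat{\text{Price}}_{1} \approx 317900 + 9.045(3077) + 20.75(1835)
                           \approx 317900 + 27831 + 38076
                           \approx 383800
```

The observed price for that house is 382374. Agreement this close for a single row says nothing about how well the model fits overall.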
summary(results)
##
## Call:
## lm(formula = Price ~ Size + Tax, data = housing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -354111 -127562 7866 134148 300595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.179e+05 7.515e+04 4.230 5.29e-05 ***
## Size 9.045e+00 2.307e+01 0.392 0.696
## Tax 2.075e+01 1.493e+01 1.389 0.168
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 180000 on 97 degrees of freedom
## Multiple R-squared: 0.02144, Adjusted R-squared: 0.001268
## F-statistic: 1.063 on 2 and 97 DF, p-value: 0.3495
The output shows that F = 1.063 on 2 and 97 degrees of freedom (p = 0.3495), so at the 5% level we cannot reject the null hypothesis that the variables Size and Tax collectively have no effect on Price. Individually, Size is not significant controlling for Tax (p = 0.696), nor is Tax controlling for Size (p = 0.168). The output also shows R-squared = 0.02144 and Adjusted R-squared = 0.001268, so the model explains only about 2% of the variation in Price.
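The overall F-statistic can be cross-checked against the reported R-squared. The sketch below assumes only the values printed by `summary(results)` (R-squared = 0.02144, n = 100 observations, k = 2 predictors). The variable names are illustrative.

```r
# Values taken from the summary() output above
r2 <- 0.02144   # Multiple R-squared
n  <- 100       # number of houses
k  <- 2         # predictors: Size and Tax

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail probability of the F distribution with (k, n - k - 1) df
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

Up to rounding of the printed R-squared, this should reproduce the F-statistic and p-value in the last line of the summary output.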
Please then check whether the selected variables have a significant impact on sales price (5% significance level).
reduced <- lm(Price ~ Size + Tax, data = housing)
full <- lm(Price ~ Size + Tax + Bedroom, data = housing)
anova(reduced, full)
## Analysis of Variance Table
##
## Model 1: Price ~ Size + Tax
## Model 2: Price ~ Size + Tax + Bedroom
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 97 3.1427e+12
## 2 96 3.1418e+12 1 931656954 0.0285 0.8664
We include the variables Bedroom, Size, and Tax in the full model and test whether the number of bedrooms is significant after taking Size and Tax into consideration.
The output shows the results of the partial F-test. Since F = 0.0285 (p-value = 0.8664), we cannot reject the null hypothesis at the 5% level of significance. It appears that the variable Bedroom does not contribute significant information about the sales price once the variables Size and Tax have been taken into consideration.
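For reference, the partial F-statistic can be reproduced by hand from the ANOVA table. The subscripts "reduced" and "full" below are just labels for Model 1 and Model 2. The numbers are the rounded values printed above, so the result is approximate.

```latex
F = \frac{(\mathrm{RSS}_{\text{reduced}} - \mathrm{RSS}_{\text{full}}) / (df_{\text{reduced}} - df_{\text{full}})}
         {\mathrm{RSS}_{\text{full}} / df_{\text{full}}}
  = \frac{931656954 / (97 - 96)}{3.1418 \times 10^{12} / 96}
  \approx \frac{9.317 \times 10^{8}}{3.273 \times 10^{10}}
  \approx 0.0285
```

Adding Bedroom reduces the residual sum of squares by a tiny amount relative to the residual variance of the full model, which is why the statistic is so small.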