Robert W. Walker
November 30, 2016
model. (mod’l) A systematic description of an object or phenomenon that shares important characteristics with the object or phenomenon. Scientific models can be material, visual, mathematical, or computational and are often used in the construction of scientific theories. See also hypothesis, theory.
Models predict:
* the response variable, also known as the dependent variable (in experimental settings), outcome variable, output, or endogenous variable.
Models predict the response variable as a function of:
* explanatory variables, also known as predictor variables, independent variables, manipulated variables, control variables, inputs, or exogenous variables. Moderating and mediating variables are special cases.
* Moderating variables produce interaction effects: the effect of a predictor on an outcome depends on some other factor. Consider the effect of penicillin on sick people; it depends on what they are sick with.
* Mediating or intervening variables provide pathways through which an explanatory variable affects a response variable. For example, parents transmit social status to children through access to education.
What determines real estate prices?
## Value LotSize Bedrooms Bathrooms Rooms Age Taxes Garage
## 1 342.0 6.9 4 2.0 8 38 6750.0 One-car garage
## 2 387.0 6.0 2 2.0 7 30 5140.8 One-car garage
## 3 288.0 6.0 3 2.0 6 35 5832.0 None
## 4 351.0 6.0 5 2.0 8 35 7200.0 One-car garage
## 5 293.4 7.0 3 1.0 6 39 4860.0 One-car garage
## 6 325.8 7.0 4 1.5 7 32 7126.2 One-car garage
## Location Style Heating.Fuel Heating.System Pool EIK
## 1 C Colonial Oil Hot water None Present
## 2 A Ranch Oil Hot water None Present
## 3 C Cape Oil Hot water None Present
## 4 A Ranch Oil Hot air None Present
## 5 C Cape Oil Hot air None Present
## 6 B Ranch Oil Hot water Above ground Present
## C.A.C Fireplace Sewer Basement Modern.Kitchen Modern.Bathrooms
## 1 Absent Absent Present Absent Absent Absent
## 2 Present Absent Present Present Absent Absent
## 3 Absent Present Present Absent Present Present
## 4 Absent Absent Present Absent Present Present
## 5 Absent Absent Present Absent Absent Absent
## 6 Absent Absent Present Present Absent Absent
## Value LotSize Bedrooms Bathrooms
## Min. :180.0 Min. : 3.550 Min. :1.000 Min. :1.00
## 1st Qu.:279.0 1st Qu.: 6.000 1st Qu.:3.000 1st Qu.:1.00
## Median :315.0 Median : 7.500 Median :3.000 Median :1.50
## Mean :320.1 Mean : 9.327 Mean :3.575 Mean :1.61
## 3rd Qu.:342.0 3rd Qu.:10.000 3rd Qu.:4.000 3rd Qu.:2.00
## Max. :558.0 Max. :37.500 Max. :7.000 Max. :3.50
## Rooms Age Taxes Garage
## Min. : 4.000 Min. : 1.00 Min. : 1800 None : 94
## 1st Qu.: 6.000 1st Qu.:28.00 1st Qu.: 5040 One-car garage:222
## Median : 7.000 Median :33.50 Median : 5940 Two-car garage: 46
## Mean : 7.052 Mean :33.43 Mean : 6104
## 3rd Qu.: 8.000 3rd Qu.:37.00 3rd Qu.: 7148
## Max. :12.000 Max. :95.00 Max. :11646
## Location Style Heating.Fuel Heating.System
## A:74 Cape :112 Gas: 44 Hot air : 57
## B:60 Colonial : 47 Oil:318 Hot water:293
## C:99 Expanded ranch: 38 Other : 12
## D:84 Ranch :119
## E:45 Split level : 46
##
## Pool EIK C.A.C Fireplace
## Above ground: 38 Absent : 22 Absent :333 Absent :240
## In ground : 16 Present:340 Present: 29 Present:122
## None :308
##
##
##
## Sewer Basement Modern.Kitchen Modern.Bathrooms
## Absent : 96 Absent :161 Absent :184 Absent :193
## Present:266 Present:201 Present:178 Present:169
##
##
##
##
## [1] "51" "144" "63" "103" "31" "35" "142" "233" "92" "112" "140"
## [12] "182" "310" "311" "312" "313" "314" "315" "316" "317" "362"
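A categorical predictor: how much does location alone explain?
Because Location is categorical, I cannot simply correlate it with Value; instead I fit a linear model, which represents Location with indicator (dummy) variables. Location alone explains almost 40% of the variance (R-squared = 0.394).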
##
## Call:
## lm(formula = Value ~ Location, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -164.40 -28.28 -6.29 16.42 204.60
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 387.409 5.963 64.973 < 2e-16 ***
## LocationB -43.009 8.911 -4.827 0.00000206 ***
## LocationC -77.194 7.882 -9.794 < 2e-16 ***
## LocationD -108.697 8.178 -13.292 < 2e-16 ***
## LocationE -111.119 9.696 -11.460 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51.29 on 357 degrees of freedom
## Multiple R-squared: 0.394, Adjusted R-squared: 0.3872
## F-statistic: 58.02 on 4 and 357 DF, p-value: < 2.2e-16
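The level that does not appear among the coefficients, A, is the baseline: the intercept (387.409 thousand dollars) is the average value in Location A.
Every other coefficient is a difference from Location A, and all of them are negative: on average, houses in every other location are cheaper than in A.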
By how much?
* -43.009 thousand for B with a standard error of 8.911 thousand.
* -77.194 thousand for C with a standard error of 7.882 thousand.
* -108.697 thousand for D with a standard error of 8.178 thousand.
* -111.119 thousand for E with a standard error of 9.696 thousand.
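To see why the intercept is the mean of A and each coefficient is a difference in means, here is a minimal sketch on made-up data. The `toy` data frame and its location effects are hypothetical, not the RealEstate data; with a single factor, `lm()` reproduces the group means exactly.

```r
# Hypothetical toy data standing in for RealEstate: 40 houses in each of five locations
set.seed(1)
toy <- data.frame(Location = factor(rep(LETTERS[1:5], each = 40)))
shift <- c(A = 80, B = 40, C = 10, D = -20, E = -25)   # made-up location effects
toy$Value <- 300 + shift[as.character(toy$Location)] + rnorm(nrow(toy), sd = 50)

fit <- lm(Value ~ Location, data = toy)
group_means <- tapply(toy$Value, toy$Location, mean)

# Intercept = mean of the baseline level A
intercept <- as.numeric(coef(fit)["(Intercept)"])
mean_A <- as.numeric(group_means["A"])

# Each LocationX coefficient = mean(X) - mean(A)
diffs <- as.numeric(coef(fit)[paste0("Location", LETTERS[2:5])])
mean_diffs <- as.numeric(group_means[LETTERS[2:5]]) - mean_A

print(rbind(lm_coefficients  = c(intercept, diffs),
            from_group_means = c(mean_A, mean_diffs)))
```

The two rows agree, which is why the output above can be read as "average in A" followed by "how much cheaper than A".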
A regression with both Location and Rooms
##
## Call:
## lm(formula = Value ~ Location + Rooms, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -216.067 -23.091 -4.095 14.757 182.503
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 249.425 13.552 18.405 < 2e-16 ***
## LocationB -47.111 7.716 -6.105 0.00000000268 ***
## LocationC -74.055 6.823 -10.853 < 2e-16 ***
## LocationD -104.034 7.086 -14.682 < 2e-16 ***
## LocationE -110.484 8.387 -13.174 < 2e-16 ***
## Rooms 19.375 1.760 11.010 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 44.36 on 356 degrees of freedom
## Multiple R-squared: 0.5479, Adjusted R-squared: 0.5416
## F-statistic: 86.29 on 5 and 356 DF, p-value: < 2.2e-16
The intercept, 249.425 thousand dollars (standard error 13.552 thousand), is the predicted value of a house in Location A with zero rooms; this is an extrapolation, since no house in the data has fewer than 4 rooms. Relative to Location A:
* Location B is -47.111 thousand with a standard error of 7.716 thousand.
* Location C is -74.055 thousand with a standard error of 6.823 thousand.
* Location D is -104.034 thousand with a standard error of 7.086 thousand.
* Location E is -110.484 thousand with a standard error of 8.387 thousand.
Each room is estimated to be worth 19.375 thousand dollars with a standard error of 1.76 thousand. Together, Location and Rooms explain 54.79% of the variance, up from 39.4% for Location alone. Rooms clearly increase prices. Do they do so uniformly across locations?
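In this additive model a room adds the same 19.375 thousand dollars in every location, so the location lines are parallel. As a worked check, here is the predicted value for a hypothetical 7-room house in Location B, computed from the rounded coefficients printed above. The helper `predict_additive()` is illustrative, not part of any package.

```r
# Coefficients copied from the Value ~ Location + Rooms output (thousands of dollars)
b <- c("(Intercept)" = 249.425, LocationB = -47.111, LocationC = -74.055,
       LocationD = -104.034, LocationE = -110.484, Rooms = 19.375)

# Hypothetical helper: intercept + location shift + slope * rooms
predict_additive <- function(location, rooms) {
  shift <- if (location == "A") 0 else b[[paste0("Location", location)]]
  b[["(Intercept)"]] + shift + b[["Rooms"]] * rooms
}

pred_B7 <- predict_additive("B", 7)   # hypothetical 7-room house in B
pred_A7 <- predict_additive("A", 7)
pred_B8 <- predict_additive("B", 8)
c(B7 = pred_B7, gap_A_minus_B = pred_A7 - pred_B7, extra_room = pred_B8 - pred_B7)
```

The gap between A and B (47.111) is the same at any number of rooms, and one more room adds 19.375 in any location; that is exactly the uniformity the next question puts in doubt.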
I could estimate a separate regression for each location. As it happens, the easiest way to do this is to create a pivot table in Excel with locations on the rows. Then I can click in the pivot table and it will show me all of the data for that location. I can copy that data to the clipboard and import it into R, which gives me a data set for each location. The R Commander can also subset the active data set, under Data > Active data set > Subset active data set. Select all variables, give the new data set a name (I call mine LocA), and set the subset expression to Location == "A" for Location A, or Location == "B" for Location B with the name LocB, and so on.
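The same subsetting can be scripted. A minimal sketch on hypothetical toy data (the `toy` data frame and its made-up slopes are assumptions, not the RealEstate data): `subset()` mirrors the R Commander dialog for one location, and `split()` plus `lapply()` fits Value ~ Rooms in every location at once.

```r
set.seed(2)
# Hypothetical toy data: 30 houses per location, made-up slopes on Rooms
toy <- data.frame(Location = factor(rep(LETTERS[1:5], each = 30)),
                  Rooms = sample(4:12, 150, replace = TRUE))
made_up_slope <- c(A = 30, B = 15, C = 15, D = 20, E = 10)
toy$Value <- 120 + made_up_slope[as.character(toy$Location)] * toy$Rooms +
  rnorm(150, sd = 40)

# One location, as in the R Commander subset dialog
LocA <- subset(toy, Location == "A")

# All locations: one data frame per level, one regression per data frame
by_location <- split(toy, toy$Location)
fits <- lapply(by_location, function(d) lm(Value ~ Rooms, data = d))
slopes <- sapply(fits, function(f) coef(f)[["Rooms"]])
round(slopes, 2)
```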
I can also combine predictors to estimate separate slopes. I need a factor and a quantity, and I multiply them together in the formula (Location * Rooms). This adds Location-by-Rooms interaction terms, which make the slope on Rooms conditional on Location, with a unique slope per location. To see the basic picture, construct a scatterplot with Rooms on the x axis and Value on the y axis. Plot the data by groups (Location), plot the lines by group, and include the least-squares line. If the lines are parallel, the slopes do not depend on location.
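Multiplying the factor by the quantity gives exactly the same slopes as fitting each location separately; each interaction term is that location's slope minus the slope in A. A minimal check on hypothetical toy data (made up, not the RealEstate data):

```r
set.seed(2)
# Hypothetical toy data: 30 houses per location, made-up slopes on Rooms
toy <- data.frame(Location = factor(rep(LETTERS[1:5], each = 30)),
                  Rooms = sample(4:12, 150, replace = TRUE))
made_up_slope <- c(A = 30, B = 15, C = 15, D = 20, E = 10)
toy$Value <- 120 + made_up_slope[as.character(toy$Location)] * toy$Rooms +
  rnorm(150, sd = 40)

# Factor times quantity: main effects plus Location:Rooms interactions
fit_int <- lm(Value ~ Location * Rooms, data = toy)
b <- coef(fit_int)

# Slope in A is the Rooms coefficient; elsewhere add the interaction term
slopes_int <- c(A = b[["Rooms"]],
                sapply(LETTERS[2:5],
                       function(l) b[["Rooms"]] + b[[paste0("Location", l, ":Rooms")]]))

# Separate regressions, one per location
slopes_sep <- sapply(split(toy, toy$Location),
                     function(d) coef(lm(Value ~ Rooms, data = d))[["Rooms"]])
rbind(interaction_model = slopes_int, separate_fits = slopes_sep)
```

The interaction model is the convenient route because it also provides standard errors and tests for the differences between slopes.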
##
## Call:
## lm(formula = Value ~ Location * Rooms, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -200.705 -22.553 -4.951 13.347 186.344
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 134.642 28.382 4.744 0.00000305 ***
## LocationB 109.918 43.788 2.510 0.012512 *
## LocationC 64.841 40.456 1.603 0.109887
## LocationD 23.432 36.549 0.641 0.521861
## LocationE 42.354 39.400 1.075 0.283126
## Rooms 35.493 3.922 9.049 < 2e-16 ***
## LocationB:Rooms -21.878 5.956 -3.673 0.000277 ***
## LocationC:Rooms -19.582 5.670 -3.454 0.000621 ***
## LocationD:Rooms -17.961 5.110 -3.515 0.000497 ***
## LocationE:Rooms -21.486 5.424 -3.962 0.00009016 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 43.3 on 352 degrees of freedom
## Multiple R-squared: 0.5741, Adjusted R-squared: 0.5632
## F-statistic: 52.72 on 9 and 352 DF, p-value: < 2.2e-16
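Reading the lines off this output: the slope in Location A is the Rooms coefficient, and each other location's slope is that plus its interaction term; intercepts work the same way with the Location terms. Using the printed, rounded coefficients:

```r
# Coefficients copied from the Value ~ Location * Rooms output (thousands of dollars)
intercept_A    <- 134.642
location_shift <- c(B = 109.918, C = 64.841, D = 23.432, E = 42.354)
rooms_A        <- 35.493
interaction    <- c(B = -21.878, C = -19.582, D = -17.961, E = -21.486)

intercept_by_location <- c(A = intercept_A, intercept_A + location_shift)
slope_by_location     <- c(A = rooms_A, rooms_A + interaction)
rbind(intercept = intercept_by_location, value_per_room = slope_by_location)
```

So an extra room is worth about 35.5 thousand dollars in A but only about 13.6 to 17.5 thousand in the other locations.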
How does that compare with the additive model? An F test of the two nested models:
## Analysis of Variance Table
##
## Model 1: Value ~ Location + Rooms
## Model 2: Value ~ Location * Rooms
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 356 700666
## 2 352 660113 4 40553 5.4062 0.0003101 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
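The two models are not equivalent: the interaction model fits significantly better (F = 5.41 on 4 and 352 degrees of freedom, p = 0.0003). The value of an additional room differs by location.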
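The F statistic compares the drop in residual sum of squares per added coefficient with the residual mean square of the larger model. Recomputing it from the rounded RSS values in the table (so the last digit can differ slightly); on the fitted models themselves, `anova()` does this directly:

```r
# Values copied from the analysis of variance table above
rss_add <- 700666; df_add <- 356   # Value ~ Location + Rooms
rss_int <- 660113; df_int <- 352   # Value ~ Location * Rooms

extra_terms <- df_add - df_int     # the 4 interaction coefficients
F_stat  <- ((rss_add - rss_int) / extra_terms) / (rss_int / df_int)
p_value <- pf(F_stat, extra_terms, df_int, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```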
Let’s try the effects plot. Select everything including higher order effects.
We can also predict values from this model, using the R Commander directly, or even Excel.
Rooms ranges from 4 to 12, and Location can be A, B, C, D, or E.
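A sketch of that prediction grid in R, computed from the printed interaction coefficients rather than the data, so the values carry the output's rounding. With the fitted model object in hand, `predict()` with `newdata = grid` would give the same predictions at full precision; the names `b` and `grid` are illustrative.

```r
# Coefficients copied from the Value ~ Location * Rooms output (thousands of dollars)
b <- c("(Intercept)" = 134.642, LocationB = 109.918, LocationC = 64.841,
       LocationD = 23.432, LocationE = 42.354, Rooms = 35.493,
       "LocationB:Rooms" = -21.878, "LocationC:Rooms" = -19.582,
       "LocationD:Rooms" = -17.961, "LocationE:Rooms" = -21.486)

# Every combination of Location (A-E) and Rooms (4-12)
grid <- expand.grid(Location = factor(LETTERS[1:5]), Rooms = 4:12)

# Design matrix with the same columns as the model; match coefficients by name
X <- model.matrix(~ Location * Rooms, data = grid)
grid$Predicted <- drop(X %*% b[colnames(X)])

# Wide view: one row per Rooms value, one column per Location
xtabs(Predicted ~ Rooms + Location, data = grid)
```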
Next, interact Location with Bathrooms instead of Rooms.
##
## Call:
## lm(formula = Value ~ Location * Bathrooms, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -130.206 -23.403 -5.493 18.149 182.584
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 244.752 15.449 15.843 < 2e-16 ***
## LocationB 9.244 24.283 0.381 0.70368
## LocationC 16.334 20.941 0.780 0.43592
## LocationD -29.521 20.840 -1.417 0.15749
## LocationE -33.151 25.280 -1.311 0.19059
## Bathrooms 79.076 8.114 9.745 < 2e-16 ***
## LocationB:Bathrooms -22.866 13.780 -1.659 0.09795 .
## LocationC:Bathrooms -46.758 12.018 -3.891 0.00012 ***
## LocationD:Bathrooms -38.370 11.724 -3.273 0.00117 **
## LocationE:Bathrooms -38.362 14.441 -2.656 0.00826 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 42.48 on 352 degrees of freedom
## Multiple R-squared: 0.5902, Adjusted R-squared: 0.5797
## F-statistic: 56.32 on 9 and 352 DF, p-value: < 2.2e-16
## [1] 3767.396
## [1] 3753.473
The two numbers above are the AICs of the Rooms interaction model (3767.396) and the Bathrooms interaction model (3753.473). Lower is better, so Bathrooms is preferred. Here is another way to compare them. Compute the residuals from each model. If an observation's absolute residual is smaller under one model than under the other, that observation is better explained by the model with the smaller residual. How do these compare?
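As we saw, Bathrooms explains more variance (R-squared 0.5902 versus 0.5741). What is interesting is that much of this improvement comes from a few poor fits. In the table below, TRUE means that the Bathrooms model has the smaller residual in absolute value. Bathrooms wins for only 169 of the 362 houses, so its overall advantage comes from fitting a relatively small number of badly missed houses much better.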
##
## FALSE TRUE
## 193 169
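The residual comparison can be scripted. A minimal sketch on hypothetical toy data (two competing one-predictor models, `m_rooms` and `m_baths`, fit to made-up values): `AIC()` for each model, then a table of which model has the smaller absolute residual for each observation.

```r
set.seed(3)
n <- 200
# Hypothetical toy data, not the RealEstate data
toy <- data.frame(Rooms = sample(4:12, n, replace = TRUE),
                  Bathrooms = sample(c(1, 1.5, 2, 2.5, 3), n, replace = TRUE))
toy$Value <- 150 + 12 * toy$Rooms + 40 * toy$Bathrooms + rnorm(n, sd = 35)

m_rooms <- lm(Value ~ Rooms, data = toy)
m_baths <- lm(Value ~ Bathrooms, data = toy)

c(AIC_rooms = AIC(m_rooms), AIC_baths = AIC(m_baths))   # lower is preferred

# TRUE where the Bathrooms model misses by less
baths_closer <- abs(residuals(m_baths)) < abs(residuals(m_rooms))
table(baths_closer)
```

AIC rewards the total squared error, while the table counts wins observation by observation; the two can disagree when one model's gains are concentrated in a few houses.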
How about the “garbage can” model, with every available predictor?
##
## Call:
## lm(formula = Value ~ Age + Basement + Bathrooms + Bedrooms +
## C.A.C + EIK + Fireplace + Garage + Heating.Fuel + Heating.System +
## Location + LotSize + Modern.Bathrooms + Modern.Kitchen +
## Pool + Rooms + Sewer + Style, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -129.443 -18.941 -2.124 15.197 144.938
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 280.9887 21.4727 13.086 < 2e-16 ***
## Age -0.6506 0.1852 -3.513 0.000504 ***
## BasementPresent 8.1272 5.3220 1.527 0.127683
## Bathrooms 25.4590 4.2547 5.984 0.00000000561888 ***
## Bedrooms -8.7729 3.3622 -2.609 0.009482 **
## C.A.CPresent 28.5657 7.6700 3.724 0.000230 ***
## EIKPresent 10.9302 8.2130 1.331 0.184148
## FireplacePresent 16.9429 4.4362 3.819 0.000160 ***
## GarageOne-car garage 4.5713 4.5270 1.010 0.313329
## GarageTwo-car garage 22.1096 6.9286 3.191 0.001552 **
## Heating.FuelOil -10.1317 6.9004 -1.468 0.142971
## Heating.SystemHot water 15.9273 5.8566 2.720 0.006879 **
## Heating.SystemOther 8.0988 11.7981 0.686 0.492907
## LocationB -40.7861 6.5694 -6.208 0.00000000158686 ***
## LocationC -52.0083 7.3114 -7.113 0.00000000000694 ***
## LocationD -106.2347 7.1206 -14.919 < 2e-16 ***
## LocationE -111.9624 8.7067 -12.859 < 2e-16 ***
## LotSize 0.8130 0.4554 1.785 0.075149 .
## Modern.BathroomsPresent -17.6002 7.6113 -2.312 0.021365 *
## Modern.KitchenPresent 14.6608 7.1824 2.041 0.042015 *
## PoolIn ground 4.6209 10.8155 0.427 0.669476
## PoolNone -3.3363 6.5933 -0.506 0.613179
## Rooms 8.9899 2.1204 4.240 0.00002900087035 ***
## SewerPresent 5.1261 5.0331 1.018 0.309193
## StyleColonial 31.1265 6.9453 4.482 0.00001018956356 ***
## StyleExpanded ranch 13.0737 7.9362 1.647 0.100426
## StyleRanch -4.5048 5.4746 -0.823 0.411182
## StyleSplit level 24.4713 7.3651 3.323 0.000991 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.31 on 334 degrees of freedom
## Multiple R-squared: 0.7314, Adjusted R-squared: 0.7097
## F-statistic: 33.68 on 27 and 334 DF, p-value: < 2.2e-16
##
## Direction: backward/forward
## Criterion: BIC
##
## Start: AIC=2716.18
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Heating.Fuel + Heating.System + Location +
## LotSize + Modern.Bathrooms + Modern.Kitchen + Pool + Rooms +
## Sewer + Style
##
## Df Sum of Sq RSS AIC
## - Pool 2 1124 417443 2705.4
## - Sewer 1 1293 417612 2711.4
## - EIK 1 2208 418527 2712.2
## - Heating.System 2 9445 425764 2712.5
## - Heating.Fuel 1 2687 419006 2712.6
## - Basement 1 2907 419226 2712.8
## - LotSize 1 3972 420291 2713.7
## - Modern.Kitchen 1 5193 421513 2714.8
## - Garage 2 13191 429510 2715.7
## - Modern.Bathrooms 1 6665 422984 2716.0
## <none> 416319 2716.2
## - Bedrooms 1 8486 424805 2717.6
## - Age 1 15381 431701 2723.4
## - C.A.C 1 17289 433609 2725.0
## - Fireplace 1 18182 434501 2725.8
## - Rooms 1 22406 438725 2729.3
## - Style 4 46411 462730 2730.9
## - Bathrooms 1 44630 460949 2747.2
## - Location 4 305534 721853 2891.8
##
## Step: AIC=2705.38
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Heating.Fuel + Heating.System + Location +
## LotSize + Modern.Bathrooms + Modern.Kitchen + Rooms + Sewer +
## Style
##
## Df Sum of Sq RSS AIC
## - Sewer 1 1303 418746 2700.6
## - Heating.System 2 8956 426400 2701.3
## - Heating.Fuel 1 2449 419892 2701.6
## - EIK 1 2478 419922 2701.6
## - Basement 1 3077 420521 2702.1
## - LotSize 1 3825 421268 2702.8
## - Modern.Kitchen 1 5243 422687 2704.0
## - Garage 2 12814 430257 2704.5
## - Modern.Bathrooms 1 6492 423936 2705.1
## <none> 417443 2705.4
## - Bedrooms 1 8442 425885 2706.7
## - Age 1 15252 432695 2712.5
## - C.A.C 1 17641 435085 2714.5
## - Fireplace 1 18605 436049 2715.3
## + Pool 2 1124 416319 2716.2
## - Style 4 46450 463894 2720.0
## - Rooms 1 24560 442003 2720.2
## - Bathrooms 1 45485 462929 2736.9
## - Location 4 306589 724033 2881.2
##
## Step: AIC=2700.61
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Heating.Fuel + Heating.System + Location +
## LotSize + Modern.Bathrooms + Modern.Kitchen + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Heating.System 2 9116 427862 2696.6
## - Heating.Fuel 1 2306 421052 2696.7
## - EIK 1 2608 421354 2697.0
## - LotSize 1 3293 422039 2697.6
## - Basement 1 3328 422074 2697.6
## - Modern.Kitchen 1 5280 424026 2699.3
## - Garage 2 12547 431293 2699.5
## - Modern.Bathrooms 1 6346 425092 2700.2
## <none> 418746 2700.6
## - Bedrooms 1 8139 426885 2701.7
## + Sewer 1 1303 417443 2705.4
## - Age 1 14583 433330 2707.1
## - C.A.C 1 17519 436265 2709.6
## - Fireplace 1 18997 437744 2710.8
## + Pool 2 1134 417612 2711.4
## - Rooms 1 24682 443429 2715.4
## - Style 4 46937 465683 2715.5
## - Bathrooms 1 46442 465188 2732.8
## - Location 4 318354 737100 2881.7
##
## Step: AIC=2696.63
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Heating.Fuel + Location + LotSize +
## Modern.Bathrooms + Modern.Kitchen + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Heating.Fuel 1 161 428023 2690.9
## - LotSize 1 3151 431013 2693.4
## - EIK 1 3308 431170 2693.5
## - Basement 1 3739 431601 2693.9
## - Modern.Kitchen 1 4537 432399 2694.6
## - Modern.Bathrooms 1 5282 433144 2695.2
## - Garage 2 12538 440400 2695.3
## <none> 427862 2696.6
## - Bedrooms 1 9675 437537 2698.8
## + Heating.System 2 9116 418746 2700.6
## + Sewer 1 1462 426400 2701.3
## - Age 1 15114 442976 2703.3
## - C.A.C 1 18055 445917 2705.7
## - Fireplace 1 18736 446598 2706.2
## + Pool 2 629 427232 2707.9
## - Style 4 48039 475900 2711.6
## - Rooms 1 27164 455026 2713.0
## - Bathrooms 1 47915 475776 2729.2
## - Location 4 315815 743677 2873.2
##
## Step: AIC=2690.87
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Location + LotSize + Modern.Bathrooms +
## Modern.Kitchen + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - LotSize 1 2993 431016 2687.5
## - EIK 1 3212 431235 2687.7
## - Basement 1 3713 431736 2688.1
## - Modern.Kitchen 1 4495 432517 2688.8
## - Modern.Bathrooms 1 5263 433286 2689.4
## - Garage 2 12397 440420 2689.4
## <none> 428023 2690.9
## - Bedrooms 1 9636 437659 2693.0
## + Sewer 1 1406 426617 2695.6
## + Heating.Fuel 1 161 427862 2696.6
## + Heating.System 2 6971 421052 2696.7
## - Age 1 15081 443104 2697.5
## - Fireplace 1 18590 446613 2700.4
## - C.A.C 1 18619 446642 2700.4
## + Pool 2 586 427436 2702.2
## - Style 4 48836 476859 2706.4
## - Rooms 1 27172 455195 2707.3
## - Bathrooms 1 47927 475949 2723.4
## - Location 4 330771 758794 2874.6
##
## Step: AIC=2687.5
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK +
## Fireplace + Garage + Location + Modern.Bathrooms + Modern.Kitchen +
## Rooms + Style
##
## Df Sum of Sq RSS AIC
## - EIK 1 3292 434308 2684.4
## - Basement 1 3768 434784 2684.8
## - Modern.Kitchen 1 4095 435111 2685.0
## - Modern.Bathrooms 1 4860 435876 2685.7
## - Garage 2 12475 443491 2686.1
## <none> 431016 2687.5
## - Bedrooms 1 8592 439608 2688.8
## + LotSize 1 2993 428023 2690.9
## + Sewer 1 898 430118 2692.6
## + Heating.System 2 7719 423297 2692.7
## + Heating.Fuel 1 3 431013 2693.4
## - Age 1 16480 447496 2695.2
## - Fireplace 1 19118 450134 2697.3
## - C.A.C 1 19768 450784 2697.8
## + Pool 2 563 430453 2698.8
## - Style 4 49009 480024 2702.9
## - Rooms 1 27493 458509 2704.0
## - Bathrooms 1 47511 478527 2719.5
## - Location 4 352179 783195 2880.1
##
## Step: AIC=2684.36
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + Fireplace +
## Garage + Location + Modern.Bathrooms + Modern.Kitchen + Rooms +
## Style
##
## Df Sum of Sq RSS AIC
## - Basement 1 3085 437392 2681.0
## - Modern.Kitchen 1 5115 439423 2682.7
## - Garage 2 12671 446979 2683.0
## - Modern.Bathrooms 1 5574 439882 2683.1
## <none> 434308 2684.4
## - Bedrooms 1 8736 443044 2685.7
## + EIK 1 3292 431016 2687.5
## + LotSize 1 3073 431235 2687.7
## + Heating.System 2 8610 425698 2688.9
## + Sewer 1 1034 433273 2689.4
## + Heating.Fuel 1 40 434268 2690.2
## - Age 1 16135 450443 2691.7
## - C.A.C 1 20161 454469 2694.9
## - Fireplace 1 20317 454625 2695.0
## + Pool 2 728 433580 2695.5
## - Rooms 1 26830 461138 2700.2
## - Style 4 51778 486086 2701.6
## - Bathrooms 1 50933 485240 2718.6
## - Location 4 357488 791796 2878.2
##
## Step: AIC=2681.03
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage +
## Location + Modern.Bathrooms + Modern.Kitchen + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Modern.Kitchen 1 5239 442632 2679.4
## - Modern.Bathrooms 1 5891 443284 2680.0
## - Garage 2 14070 451462 2680.7
## <none> 437392 2681.0
## - Bedrooms 1 8126 445519 2681.8
## + LotSize 1 3115 434277 2684.3
## + Basement 1 3085 434308 2684.4
## + EIK 1 2609 434784 2684.8
## + Heating.System 2 8936 428457 2685.3
## + Sewer 1 1236 436157 2685.9
## + Heating.Fuel 1 47 437346 2686.9
## - Age 1 15612 453004 2687.8
## - Fireplace 1 20249 457641 2691.5
## + Pool 2 845 436547 2692.1
## - C.A.C 1 23540 460932 2694.1
## - Rooms 1 26909 464301 2696.8
## - Style 4 52465 489857 2698.5
## - Bathrooms 1 50884 488276 2715.0
## - Location 4 387661 825054 2887.2
##
## Step: AIC=2679.45
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage +
## Location + Modern.Bathrooms + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Modern.Bathrooms 1 987 443619 2674.4
## - Garage 2 12094 454726 2677.4
## <none> 442632 2679.4
## - Bedrooms 1 7420 450051 2679.6
## + Modern.Kitchen 1 5239 437392 2681.0
## + EIK 1 3525 439107 2682.4
## + Basement 1 3208 439423 2682.7
## + LotSize 1 2667 439965 2683.2
## + Sewer 1 1341 441290 2684.2
## + Heating.System 2 8443 434188 2684.3
## + Heating.Fuel 1 71 442561 2685.3
## - Age 1 16242 458874 2686.6
## - Fireplace 1 19567 462199 2689.2
## + Pool 2 970 441662 2690.4
## - Rooms 1 27503 470134 2695.4
## - Style 4 51273 493905 2695.6
## - C.A.C 1 27766 470397 2695.6
## - Bathrooms 1 49234 491866 2711.7
## - Location 4 392436 835067 2885.7
##
## Step: AIC=2674.37
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage +
## Location + Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Garage 2 12078 455697 2672.3
## - Bedrooms 1 7119 450738 2674.2
## <none> 443619 2674.4
## + EIK 1 3407 440212 2677.5
## + Basement 1 3350 440269 2677.5
## + LotSize 1 2626 440992 2678.1
## + Sewer 1 1222 442397 2679.3
## + Modern.Bathrooms 1 987 442632 2679.4
## + Heating.System 2 8094 435525 2679.5
## + Modern.Kitchen 1 335 443284 2680.0
## + Heating.Fuel 1 59 443560 2680.2
## - Age 1 16085 459704 2681.4
## - Fireplace 1 19709 463328 2684.2
## + Pool 2 851 442768 2685.5
## - Rooms 1 27010 470629 2689.9
## - Style 4 51384 495003 2690.5
## - C.A.C 1 27847 471466 2690.5
## - Bathrooms 1 48708 492327 2706.2
## - Location 4 428186 871805 2895.4
##
## Step: AIC=2672.31
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Location +
## Rooms + Style
##
## Df Sum of Sq RSS AIC
## - Bedrooms 1 7439 463135 2672.3
## <none> 455697 2672.3
## + Garage 2 12078 443619 2674.4
## + Basement 1 4668 451029 2674.5
## + EIK 1 3271 452425 2675.6
## + LotSize 1 2758 452939 2676.0
## + Heating.System 2 8900 446796 2676.9
## + Sewer 1 1029 454667 2677.4
## + Modern.Bathrooms 1 971 454726 2677.4
## + Heating.Fuel 1 245 455452 2678.0
## + Modern.Kitchen 1 91 455605 2678.1
## - Age 1 16005 471702 2678.9
## + Pool 2 660 455037 2683.6
## - Fireplace 1 24254 479951 2685.2
## - Rooms 1 27336 483032 2687.5
## - C.A.C 1 28486 484182 2688.4
## - Style 4 57248 512945 2691.6
## - Bathrooms 1 57639 513336 2709.5
## - Location 4 428361 884058 2888.6
##
## Step: AIC=2672.28
## Value ~ Age + Bathrooms + C.A.C + Fireplace + Location + Rooms +
## Style
##
## Df Sum of Sq RSS AIC
## <none> 463135 2672.3
## + Bedrooms 1 7439 455697 2672.3
## + Garage 2 12398 450738 2674.2
## + Basement 1 3933 459202 2675.1
## + EIK 1 3442 459694 2675.5
## + Heating.System 2 10204 452931 2676.0
## - Age 1 12553 475689 2676.1
## + LotSize 1 1800 461336 2676.8
## + Sewer 1 873 462262 2677.5
## + Modern.Bathrooms 1 694 462442 2677.6
## + Heating.Fuel 1 215 462921 2678.0
## + Modern.Kitchen 1 106 463029 2678.1
## - Rooms 1 19906 483041 2681.6
## - Fireplace 1 22339 485474 2683.4
## + Pool 2 560 462576 2683.6
## - Style 4 58358 521493 2691.7
## - C.A.C 1 37163 500299 2694.3
## - Bathrooms 1 53646 516781 2706.1
## - Location 4 423435 886571 2883.8
##
## Call:
## lm(formula = Value ~ Age + Bathrooms + C.A.C + Fireplace + Location +
## Rooms + Style, data = RealEstate)
##
## Coefficients:
## (Intercept) Age Bathrooms
## 288.3237 -0.5675 26.8216
## C.A.CPresent FireplacePresent LocationB
## 39.3708 18.3318 -38.8414
## LocationC LocationD LocationE
## -62.0523 -100.1258 -104.8913
## Rooms StyleColonial StyleExpanded ranch
## 7.1053 29.7749 13.3786
## StyleRanch StyleSplit level
## -4.6369 32.0658
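The stepwise search above uses BIC even though R labels the column AIC: the criterion is n log(RSS/n) + k p with k = log(n) instead of 2, where p counts the coefficients. The starting value checks out from the full model's RSS of 416319 with 28 coefficients and n = 362. In base R, `step()` with `k = log(n)` runs the same kind of backward/forward search (the R Commander dialog appears to wrap a similar function). The sketch below uses hypothetical toy data, where `y ~ .` is the garbage-can formula (every other column as a predictor).

```r
# Check: BIC of the full garbage-can model from its printed RSS
n_houses <- 362; rss_full <- 416319; p_full <- 28
bic_start <- n_houses * log(rss_full / n_houses) + log(n_houses) * p_full
bic_start   # matches the "Start: AIC=2716.18" line above

# Hypothetical toy data: x3 and x4 are pure noise
set.seed(4)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 2 + 3 * toy$x1 - 2 * toy$x2 + rnorm(n)

full   <- lm(y ~ ., data = toy)   # garbage can: every column except y
chosen <- step(full, direction = "both", k = log(n), trace = 0)   # BIC penalty
formula(chosen)
```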
##
## Call:
## lm(formula = Value ~ Age + Bathrooms + C.A.C + Fireplace + Location +
## Rooms + Style, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -150.803 -20.347 -1.835 15.846 144.611
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 288.3237 16.0159 18.002 < 2e-16 ***
## Age -0.5675 0.1848 -3.071 0.002300 **
## Bathrooms 26.8216 4.2246 6.349 0.000000000677 ***
## C.A.CPresent 39.3708 7.4504 5.284 0.000000223047 ***
## FireplacePresent 18.3318 4.4745 4.097 0.000052110679 ***
## LocationB -38.8414 6.5213 -5.956 0.000000006323 ***
## LocationC -62.0523 6.1227 -10.135 < 2e-16 ***
## LocationD -100.1258 6.3258 -15.828 < 2e-16 ***
## LocationE -104.8913 7.3406 -14.289 < 2e-16 ***
## Rooms 7.1053 1.8372 3.867 0.000131 ***
## StyleColonial 29.7749 6.9678 4.273 0.000024902879 ***
## StyleExpanded ranch 13.3786 7.7904 1.717 0.086810 .
## StyleRanch -4.6369 5.4885 -0.845 0.398772
## StyleSplit level 32.0658 7.0583 4.543 0.000007658643 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.48 on 348 degrees of freedom
## Multiple R-squared: 0.7012, Adjusted R-squared: 0.69
## F-statistic: 62.81 on 13 and 348 DF, p-value: < 2.2e-16
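It fits pretty well: the adjusted R-squared is 0.69, close to the 0.71 of the garbage-can model, with 13 coefficients besides the intercept instead of 27.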
##
## Shapiro-Wilk normality test
##
## data: RealEstate$residuals.LinearModel.3
## W = 0.94821, p-value = 0.0000000005921
They’re not normal (W = 0.948, p < 0.001). Residuals often are not. Think about the data.
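Both checks are available in base R: `shapiro.test()` on the residuals, and `qqnorm()`/`qqline()` for the plot (car's `qqPlot()` replaces the deprecated `qq.plot()`). A sketch on hypothetical toy data whose errors are deliberately right-skewed:

```r
set.seed(5)
n <- 200
# Hypothetical toy data with made-up, right-skewed errors (exponential, centered at 0)
toy <- data.frame(x = runif(n, 4, 12))
toy$y <- 100 + 20 * toy$x + (rexp(n, rate = 1 / 30) - 30)
fit <- lm(y ~ x, data = toy)

sw <- shapiro.test(residuals(fit))
sw   # a small p-value rejects normality of the residuals

# Normal QQ plot: skewness shows up as a curve away from the line
qqnorm(residuals(fit))
qqline(residuals(fit))
```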
Now inference of a basic sort. If the residuals are normal, then the slopes have t distributions, sums of squares can be compared using F, and so on. If the residuals are not normal, we no longer know that the sums of squares follow chi-square distributions, and those t and F reference distributions are no longer exact. There is a way to calculate the standard errors for the slopes and intercept, known as the sandwich estimator, that remains valid under much more general conditions (in large samples, and when the error variance is not constant), so the t ratios are approximately correct. This allows us at least to say, with a stated probability, what is related to what. Under Models and then Summarize model, there is a tick box for the sandwich estimator. There are multiple sandwich estimators and a large literature on their performance; HC3 and HC4 seem generally preferred. Summarizing the model with the HC3 sandwich yields the following.
##
## Call:
## lm(formula = Value ~ Age + Bathrooms + C.A.C + Fireplace + Location +
## Rooms + Style, data = RealEstate)
##
## Residuals:
## Min 1Q Median 3Q Max
## -150.803 -20.347 -1.835 15.846 144.611
##
## Coefficients:
## Estimate Std.Err(hc3) t value Pr(>|t|)
## (Intercept) 288.3237 19.7849 14.573 < 2e-16 ***
## Age -0.5675 0.2292 -2.476 0.013766 *
## Bathrooms 26.8216 4.8610 5.518 0.0000000672 ***
## C.A.CPresent 39.3708 12.0817 3.259 0.001230 **
## FireplacePresent 18.3318 4.8090 3.812 0.000163 ***
## LocationB -38.8414 8.3278 -4.664 0.0000044275 ***
## LocationC -62.0523 6.1478 -10.093 < 2e-16 ***
## LocationD -100.1258 7.5668 -13.232 < 2e-16 ***
## LocationE -104.8913 7.0800 -14.815 < 2e-16 ***
## Rooms 7.1053 2.5466 2.790 0.005559 **
## StyleColonial 29.7749 8.5102 3.499 0.000528 ***
## StyleExpanded ranch 13.3786 8.8334 1.515 0.130794
## StyleRanch -4.6369 5.4037 -0.858 0.391424
## StyleSplit level 32.0658 8.3826 3.825 0.000155 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.48 on 348 degrees of freedom
## Multiple R-squared: 0.7012, Adjusted R-squared: 0.69
## F-statistic: 47.79 on 13 and 348 DF, p-value: < 2.2e-16
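The HC3 standard errors can be computed by hand, which shows what the sandwich is: a "bread" of (X'X)^-1 on both sides of a "meat" that weights each observation's squared residual by 1/(1 - h_i)^2, where h_i is its leverage. A sketch on hypothetical toy data whose error spread grows with x; with the sandwich and lmtest packages installed, `lmtest::coeftest(fit, vcov. = sandwich::vcovHC(fit, type = "HC3"))` should give the same standard errors.

```r
set.seed(6)
n <- 300
# Hypothetical toy data: error standard deviation grows with x (heteroskedastic)
x <- runif(n, 1, 10)
y <- 5 + 2 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
h <- hatvalues(fit)                            # leverages h_i

bread  <- solve(crossprod(X))                  # (X'X)^-1
meat   <- crossprod(X * (e / (1 - h)))         # X' diag(e_i^2 / (1 - h_i)^2) X
vc_hc3 <- bread %*% meat %*% bread             # HC3 covariance matrix

se_classical <- sqrt(diag(vcov(fit)))
se_hc3 <- sqrt(diag(vc_hc3))
t_hc3  <- coef(fit) / se_hc3
p_hc3  <- 2 * pt(abs(t_hc3), df = df.residual(fit), lower.tail = FALSE)

cbind(Estimate = coef(fit), SE = se_classical, SE_hc3 = se_hc3, t = t_hc3, p = p_hc3)
```

The estimates are unchanged; only the standard errors, and therefore the t ratios and p-values, move, just as in the output above.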