Modelling and the Language of Models with a Real Estate Example

Robert W. Walker

November 26, 2018

Models

model. (mod’l) A systematic description of an object or phenomenon that shares important characteristics with the object or phenomenon. Scientific models can be material, visual, mathematical, or computational and are often used in the construction of scientific theories. See also hypothesis, theory.

The Language of Models

Models predict:
* the response variable also sometimes known as dependent variable [in experimental settings], outcome variable, output, or endogenous variable.

Models predict the response variable as a function of:
* explanatory variables also sometimes known as predictor variables, independent variables, manipulated variables, control variables, inputs, and exogenous variables. Moderating or mediating variables are special cases.
* Moderating variables produce interaction effects: the effect of a predictor on an outcome depends on some other factor. Consider the effect of penicillin on sick people, which depends on whether the illness is a bacterial infection.
* Mediating or intervening variables present pathways for an explanatory variable to impact a response variable. Parents transmit social status to children through access to education.

Example: Real Estate Prices

What determines real estate prices?

The data

What do I have?

##   Value LotSize Bedrooms Bathrooms Rooms Age  Taxes         Garage
## 1 342.0     6.9        4       2.0     8  38 6750.0 One-car garage
## 2 387.0     6.0        2       2.0     7  30 5140.8 One-car garage
## 3 288.0     6.0        3       2.0     6  35 5832.0           None
## 4 351.0     6.0        5       2.0     8  35 7200.0 One-car garage
## 5 293.4     7.0        3       1.0     6  39 4860.0 One-car garage
## 6 325.8     7.0        4       1.5     7  32 7126.2 One-car garage
##   Location    Style Heating.Fuel Heating.System         Pool     EIK
## 1        C Colonial          Oil      Hot water         None Present
## 2        A    Ranch          Oil      Hot water         None Present
## 3        C     Cape          Oil      Hot water         None Present
## 4        A    Ranch          Oil        Hot air         None Present
## 5        C     Cape          Oil        Hot air         None Present
## 6        B    Ranch          Oil      Hot water Above ground Present
##     C.A.C Fireplace   Sewer Basement Modern.Kitchen Modern.Bathrooms
## 1  Absent    Absent Present   Absent         Absent           Absent
## 2 Present    Absent Present  Present         Absent           Absent
## 3  Absent   Present Present   Absent        Present          Present
## 4  Absent    Absent Present   Absent        Present          Present
## 5  Absent    Absent Present   Absent         Absent           Absent
## 6  Absent    Absent Present  Present         Absent           Absent
##      Value          LotSize          Bedrooms       Bathrooms   
##  Min.   :180.0   Min.   : 3.550   Min.   :1.000   Min.   :1.00  
##  1st Qu.:279.0   1st Qu.: 6.000   1st Qu.:3.000   1st Qu.:1.00  
##  Median :315.0   Median : 7.500   Median :3.000   Median :1.50  
##  Mean   :320.1   Mean   : 9.327   Mean   :3.575   Mean   :1.61  
##  3rd Qu.:342.0   3rd Qu.:10.000   3rd Qu.:4.000   3rd Qu.:2.00  
##  Max.   :558.0   Max.   :37.500   Max.   :7.000   Max.   :3.50  
##      Rooms             Age            Taxes                  Garage   
##  Min.   : 4.000   Min.   : 1.00   Min.   : 1800   None          : 94  
##  1st Qu.: 6.000   1st Qu.:28.00   1st Qu.: 5040   One-car garage:222  
##  Median : 7.000   Median :33.50   Median : 5940   Two-car garage: 46  
##  Mean   : 7.052   Mean   :33.43   Mean   : 6104                       
##  3rd Qu.: 8.000   3rd Qu.:37.00   3rd Qu.: 7148                       
##  Max.   :12.000   Max.   :95.00   Max.   :11646                       
##  Location            Style     Heating.Fuel   Heating.System
##  A:74     Cape          :112   Gas: 44      Hot air  : 57   
##  B:60     Colonial      : 47   Oil:318      Hot water:293   
##  C:99     Expanded ranch: 38                Other    : 12   
##  D:84     Ranch         :119                                
##  E:45     Split level   : 46                                
##                                                             
##            Pool          EIK          C.A.C       Fireplace  
##  Above ground: 38   Absent : 22   Absent :333   Absent :240  
##  In ground   : 16   Present:340   Present: 29   Present:122  
##  None        :308                                            
##                                                              
##                                                              
##                                                              
##      Sewer        Basement   Modern.Kitchen Modern.Bathrooms
##  Absent : 96   Absent :161   Absent :184    Absent :193     
##  Present:266   Present:201   Present:178    Present:169     
##                                                             
##                                                             
##                                                             
## 

The outcome: Value

##  [1] "51"  "144" "63"  "103" "31"  "35"  "142" "233" "92"  "112" "140"
## [12] "182" "310" "311" "312" "313" "314" "315" "316" "317" "362"

Location, Location, Location

A categorical predictor. How much does location alone explain?
Because it is categorical, I need a linear model. Location alone explains almost 40% of the variance.

The Regression

## 
## Call:
## lm(formula = Value ~ Location, data = RealEstate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -164.40  -28.28   -6.29   16.42  204.60 
## 
## Coefficients:
##             Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)  387.409      5.963  64.973    < 2e-16 ***
## LocationB    -43.009      8.911  -4.827 0.00000206 ***
## LocationC    -77.194      7.882  -9.794    < 2e-16 ***
## LocationD   -108.697      8.178 -13.292    < 2e-16 ***
## LocationE   -111.119      9.696 -11.460    < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 51.29 on 357 degrees of freedom
## Multiple R-squared:  0.394,  Adjusted R-squared:  0.3872 
## F-statistic: 58.02 on 4 and 357 DF,  p-value: < 2.2e-16

Interpretation

The category with no label in the output, A, is the baseline. Its average is the intercept.
Every other coefficient is a difference from category A. All four locations are cheaper than A.
By how much?
* -43.009 thousand for B with a standard error of 8.911 thousand.
* -77.194 thousand for C with a standard error of 7.882 thousand.
* -108.697 thousand for D with a standard error of 8.178 thousand.
* -111.119 thousand for E with a standard error of 9.696 thousand.
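
To see why the intercept is the baseline average and the other coefficients are differences, here is a minimal sketch on a small made-up data frame. The name `toy` and every number in it are hypothetical and are not drawn from the RealEstate data.

```r
# Toy data (hypothetical values, not the RealEstate file)
toy <- data.frame(
  Value    = c(400, 380, 390, 340, 350, 300, 310, 320),
  Location = factor(c("A", "A", "A", "B", "B", "C", "C", "C"))
)

fit <- lm(Value ~ Location, data = toy)
group_means <- tapply(toy$Value, toy$Location, mean)

# The intercept is the mean of the baseline (first) level, A
coef(fit)["(Intercept)"]
group_means["A"]

# Each remaining coefficient is that group's mean minus A's mean
coef(fit)[c("LocationB", "LocationC")]
group_means[c("B", "C")] - group_means["A"]
```

With a single categorical predictor, least squares reproduces the group means exactly, which is why the two pairs of printed values agree.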


Suppose it is just locations and size [by rooms]

A regression

## 
## Call:
## lm(formula = Value ~ Location + Rooms, data = RealEstate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -216.067  -23.091   -4.095   14.757  182.503 
## 
## Coefficients:
##             Estimate Std. Error t value      Pr(>|t|)    
## (Intercept)  249.425     13.552  18.405       < 2e-16 ***
## LocationB    -47.111      7.716  -6.105 0.00000000268 ***
## LocationC    -74.055      6.823 -10.853       < 2e-16 ***
## LocationD   -104.034      7.086 -14.682       < 2e-16 ***
## LocationE   -110.484      8.387 -13.174       < 2e-16 ***
## Rooms         19.375      1.760  11.010       < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.36 on 356 degrees of freedom
## Multiple R-squared:  0.5479, Adjusted R-squared:  0.5416 
## F-statistic: 86.29 on 5 and 356 DF,  p-value: < 2.2e-16

A house in Location A with no rooms is 249.425 thousand dollars with a standard error of 13.552 thousand dollars. That is an extrapolation, because the houses in the data have between 4 and 12 rooms. The difference between Location B and A is -47.111 thousand with a standard error of 7.716 thousand dollars. The difference between Location C and A is -74.055 thousand with a standard error of 6.823 thousand dollars. The difference between Location D and A is -104.034 thousand with a standard error of 7.086 thousand dollars. The difference between Location E and A is -110.484 thousand with a standard error of 8.387 thousand dollars. Each room is estimated to be worth 19.375 thousand dollars with a standard error of 1.76 thousand dollars. Together, location and rooms explain 54.79% of the variance. Rooms clearly increase prices. Do they do so uniformly?
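
As a worked example of reading these coefficients, the sketch below plugs them into the fitted equation for one hypothetical house. The coefficient values are copied from the output above; the choice of a seven-room house in Location C is an assumption made only for illustration.

```r
# Coefficients copied from the Value ~ Location + Rooms output above
b <- c(Intercept = 249.425, LocationB = -47.111, LocationC = -74.055,
       LocationD = -104.034, LocationE = -110.484, Rooms = 19.375)

# Hypothetical house: Location C with 7 rooms
pred_C7 <- b[["Intercept"]] + b[["LocationC"]] + b[["Rooms"]] * 7
pred_C7  # in thousands of dollars

# The same house in Location A (the baseline has no location offset)
pred_A7 <- b[["Intercept"]] + b[["Rooms"]] * 7
pred_A7 - pred_C7  # the gap is the LocationC coefficient with the sign flipped
```

Because the model is additive, the gap between locations is the same at every room count. That is the restriction the next section relaxes.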

An Interaction

I could estimate a regression for each location. As it happens, the low tech and easiest way to do this is to create a pivot table in Excel with locations on the rows. Then I can click in the pivot table and it will show me all of the data corresponding to that location. I can copy that data to the clipboard and import it into R. Then I have a dataset for each location.

I can also use R’s tidyverse filter or base R subset.

RealEstate %>% filter(Location == "A") or subset(RealEstate, Location == "A")
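
A minimal sketch of the subsetting route, staying in base R so that no packages are needed. The data frame `toy` is a hypothetical stand-in for RealEstate, and the `split()` plus `lapply()` idiom is one possible way to fit a regression per location, not necessarily the one used here.

```r
# Toy stand-in for RealEstate (hypothetical values)
toy <- data.frame(
  Value    = c(300, 340, 380, 250, 265, 280),
  Rooms    = c(5, 6, 7, 5, 6, 7),
  Location = factor(rep(c("A", "B"), each = 3))
)

# Base R: keep only the Location A rows
toy_A <- subset(toy, Location == "A")

# One regression per location, with no copying and pasting
fits <- lapply(split(toy, toy$Location),
               function(d) lm(Value ~ Rooms, data = d))
sapply(fits, coef)
```

Each column of the result holds the intercept and slope for one location.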

I can also combine predictors to measure separate slopes. I need a factor and a quantity and I multiply them together. This makes the slope conditional on the Location, with a unique slope per location. To show the basic picture of this, construct a scatterplot with Rooms on the x axis and Value on the y axis. Plot the data by group (the location) and add a least squares line for each group. If the lines are parallel, the slopes do not depend on the location.
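
The sketch below shows, on simulated data, that multiplying a factor by a quantity recovers the same slopes as separate regressions within each group. The data frame `toy` and the true slopes of 35 and 15 are assumptions chosen for illustration.

```r
set.seed(1)
# Simulated stand-in data: the true slope is 35 in A and 15 in B (assumed values)
toy <- data.frame(
  Rooms    = rep(4:9, times = 2),
  Location = factor(rep(c("A", "B"), each = 6))
)
toy$Value <- ifelse(toy$Location == "A", 130 + 35 * toy$Rooms,
                    240 + 15 * toy$Rooms) + rnorm(nrow(toy), sd = 5)

# Location * Rooms expands to Location + Rooms + Location:Rooms
inter <- lm(Value ~ Location * Rooms, data = toy)

# Slope in A is the Rooms coefficient; slope in B adds the interaction term
slope_A <- coef(inter)[["Rooms"]]
slope_B <- coef(inter)[["Rooms"]] + coef(inter)[["LocationB:Rooms"]]

# These match a regression run separately within the location
sep_B <- lm(Value ~ Rooms, data = subset(toy, Location == "B"))
all.equal(slope_B, coef(sep_B)[["Rooms"]])
```

The single interaction model has the advantage that the slopes can be tested against one another, which separate regressions do not allow directly.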

Picture

## 
## Call:
## lm(formula = Value ~ Location * Rooms, data = RealEstate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -200.705  -22.553   -4.951   13.347  186.344 
## 
## Coefficients:
##                 Estimate Std. Error t value   Pr(>|t|)    
## (Intercept)      134.642     28.382   4.744 0.00000305 ***
## LocationB        109.918     43.788   2.510   0.012512 *  
## LocationC         64.841     40.456   1.603   0.109887    
## LocationD         23.432     36.549   0.641   0.521861    
## LocationE         42.354     39.400   1.075   0.283126    
## Rooms             35.493      3.922   9.049    < 2e-16 ***
## LocationB:Rooms  -21.878      5.956  -3.673   0.000277 ***
## LocationC:Rooms  -19.582      5.670  -3.454   0.000621 ***
## LocationD:Rooms  -17.961      5.110  -3.515   0.000497 ***
## LocationE:Rooms  -21.486      5.424  -3.962 0.00009016 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 43.3 on 352 degrees of freedom
## Multiple R-squared:  0.5741, Adjusted R-squared:  0.5632 
## F-statistic: 52.72 on 9 and 352 DF,  p-value: < 2.2e-16

How does that compare?

## Analysis of Variance Table
## 
## Model 1: Value ~ Location + Rooms
## Model 2: Value ~ Location * Rooms
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    356 700666                                  
## 2    352 660113  4     40553 5.4062 0.0003101 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

They are not equivalent. Rooms have a different price by location.
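
The F statistic in that table can be rebuilt by hand from the two residual sums of squares. This is a sketch of the arithmetic only, with every number copied from the analysis of variance table above.

```r
# Numbers copied from the analysis of variance table above
rss_additive    <- 700666; df_additive    <- 356
rss_interaction <- 660113; df_interaction <- 352

extra_terms <- df_additive - df_interaction   # 4 added slopes

# Drop in RSS per added term, relative to the larger model's error variance
f_stat <- ((rss_additive - rss_interaction) / extra_terms) /
  (rss_interaction / df_interaction)
p_value <- pf(f_stat, extra_terms, df_interaction, lower.tail = FALSE)

round(f_stat, 3)
signif(p_value, 3)
```

The four extra slopes reduce the residual sum of squares by far more than four arbitrary terms would be expected to, which is what the small p-value expresses.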

Understanding the Effect.

Let’s try the effects plot. Select everything including higher order effects.

We can also predict this.

Rooms can go from 4 to 12. Locations can be A, B, C, D, E.
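
One way to lay out those predictions is a grid with every combination of rooms and location. The sketch below fills the grid by hand from the coefficients printed in the interaction output. The object names are hypothetical. With the fitted model object available, `predict()` with a `newdata` argument would do the same job.

```r
# Coefficients copied from the Value ~ Location * Rooms output above
intercept_A <- 134.642
slope_A     <- 35.493
intercept_shift <- c(A = 0, B = 109.918, C = 64.841, D = 23.432, E = 42.354)
slope_shift     <- c(A = 0, B = -21.878, C = -19.582, D = -17.961, E = -21.486)

# Implied price of one more room, by location (thousands of dollars)
slope_A + slope_shift

# Every combination of location and room count in the observed range
grid <- expand.grid(Rooms = 4:12, Location = names(intercept_shift),
                    stringsAsFactors = FALSE)
grid$Predicted <- intercept_A + unname(intercept_shift[grid$Location]) +
  (slope_A + unname(slope_shift[grid$Location])) * grid$Rooms
head(grid)
```

The first printed vector makes the interaction concrete: an extra room is worth about 35 thousand dollars in Location A and roughly 14 to 18 thousand in the other four.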

***

Are Rooms or Bathrooms Better?

## 
## Call:
## lm(formula = Value ~ Location * Bathrooms, data = RealEstate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -130.206  -23.403   -5.493   18.149  182.584 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          244.752     15.449  15.843  < 2e-16 ***
## LocationB              9.244     24.283   0.381  0.70368    
## LocationC             16.334     20.941   0.780  0.43592    
## LocationD            -29.521     20.840  -1.417  0.15749    
## LocationE            -33.151     25.280  -1.311  0.19059    
## Bathrooms             79.076      8.114   9.745  < 2e-16 ***
## LocationB:Bathrooms  -22.866     13.780  -1.659  0.09795 .  
## LocationC:Bathrooms  -46.758     12.018  -3.891  0.00012 ***
## LocationD:Bathrooms  -38.370     11.724  -3.273  0.00117 ** 
## LocationE:Bathrooms  -38.362     14.441  -2.656  0.00826 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.48 on 352 degrees of freedom
## Multiple R-squared:  0.5902, Adjusted R-squared:  0.5797 
## F-statistic: 56.32 on 9 and 352 DF,  p-value: < 2.2e-16
## [1] 3767.396
## [1] 3753.473

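The bottom two results are AIC values: 3767.396 for the Rooms model and 3753.473 for the Bathrooms model. Smaller is better, so bathrooms are preferred.

\[\mathrm{AIC} = 2k + n \ln(\mathrm{RSS})\] for model comparisons, dropping constants that depend only on n. Note that if all the models have the same k, then selecting the model with minimum AIC is equivalent to selecting the model with minimum RSS – the usual objective of model selection based on least squares.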

It is worth noting that Mallows’s Cp is equivalent to AIC in the case of (Gaussian) linear regression.
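
The AIC values printed by R include constants that the comparison formula leaves out. The sketch below rebuilds R's number for a Gaussian linear model from the residual sum of squares. The helper `aic_from_rss` is a hypothetical name, the RSS of 660113 comes from the analysis of variance table for the Location * Rooms model, and the second half checks the helper against `AIC()` on simulated data.

```r
# R's AIC() for a Gaussian linear model, rebuilt from the residual sum of squares.
# n = observations, p = regression coefficients; the +1 counts the error variance.
aic_from_rss <- function(rss, n, p) {
  n * log(rss / n) + n * (1 + log(2 * pi)) + 2 * (p + 1)
}

# Location * Rooms model: 362 observations, 10 coefficients
aic_rooms <- aic_from_rss(rss = 660113, n = 362, p = 10)
aic_rooms

# Check against AIC() on simulated data (not the RealEstate file)
set.seed(2)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
all.equal(AIC(fit), aic_from_rss(sum(resid(fit)^2), n = 50, p = 2))
```

Because both models here have the same n and the same number of coefficients, the ranking by AIC is the ranking by RSS.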

Here is another way to compare them. Suppose that I measure the residuals from each model. If the absolute value of the residual is smaller from one model than the other, then that observation is better explained by the model with the smaller residual. How do these compare?

As we saw, the Bathrooms model explains more variance. What is interesting is that much of this improvement comes from a few poor fits. In the table below, TRUE means that the Bathrooms model has the smaller residual in absolute value. That is the case for only 169 of the 362 houses.

## 
## FALSE  TRUE 
##   193   169
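
A table like this one can be built by comparing absolute residuals observation by observation. The sketch below uses simulated data with two competing predictors; the variable names and the data-generating numbers are assumptions and have nothing to do with the RealEstate file.

```r
set.seed(3)
# Simulated stand-in data with two competing predictors (assumed, not RealEstate)
n  <- 100
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
y  <- 2 * x1 + x2 + rnorm(n)

m1 <- lm(y ~ x1)
m2 <- lm(y ~ x2)

# TRUE where the second model misses by less than the first
closer <- abs(resid(m2)) < abs(resid(m1))
table(closer)

# The count can disagree with the sum-of-squares ranking because a few
# large residuals dominate the RSS
c(rss_m1 = sum(resid(m1)^2), rss_m2 = sum(resid(m2)^2))
```

The count treats every observation equally, while the residual sum of squares weights large misses heavily, so the two summaries need not point the same way.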

But that ignores lots of data.

How about the "garbage can"?

## 
## Call:
## lm(formula = Value ~ Age + Basement + Bathrooms + Bedrooms + 
##     C.A.C + EIK + Fireplace + Garage + Heating.Fuel + Heating.System + 
##     Location + LotSize + Modern.Bathrooms + Modern.Kitchen + 
##     Pool + Rooms + Sewer + Style, data = RealEstate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -129.443  -18.941   -2.124   15.197  144.938 
## 
## Coefficients:
##                          Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)              280.9887    21.4727  13.086          < 2e-16 ***
## Age                       -0.6506     0.1852  -3.513         0.000504 ***
## BasementPresent            8.1272     5.3220   1.527         0.127683    
## Bathrooms                 25.4590     4.2547   5.984 0.00000000561888 ***
## Bedrooms                  -8.7729     3.3622  -2.609         0.009482 ** 
## C.A.CPresent              28.5657     7.6700   3.724         0.000230 ***
## EIKPresent                10.9302     8.2130   1.331         0.184148    
## FireplacePresent          16.9429     4.4362   3.819         0.000160 ***
## GarageOne-car garage       4.5713     4.5270   1.010         0.313329    
## GarageTwo-car garage      22.1096     6.9286   3.191         0.001552 ** 
## Heating.FuelOil          -10.1317     6.9004  -1.468         0.142971    
## Heating.SystemHot water   15.9273     5.8566   2.720         0.006879 ** 
## Heating.SystemOther        8.0988    11.7981   0.686         0.492907    
## LocationB                -40.7861     6.5694  -6.208 0.00000000158686 ***
## LocationC                -52.0083     7.3114  -7.113 0.00000000000694 ***
## LocationD               -106.2347     7.1206 -14.919          < 2e-16 ***
## LocationE               -111.9624     8.7067 -12.859          < 2e-16 ***
## LotSize                    0.8130     0.4554   1.785         0.075149 .  
## Modern.BathroomsPresent  -17.6002     7.6113  -2.312         0.021365 *  
## Modern.KitchenPresent     14.6608     7.1824   2.041         0.042015 *  
## PoolIn ground              4.6209    10.8155   0.427         0.669476    
## PoolNone                  -3.3363     6.5933  -0.506         0.613179    
## Rooms                      8.9899     2.1204   4.240 0.00002900087035 ***
## SewerPresent               5.1261     5.0331   1.018         0.309193    
## StyleColonial             31.1265     6.9453   4.482 0.00001018956356 ***
## StyleExpanded ranch       13.0737     7.9362   1.647         0.100426    
## StyleRanch                -4.5048     5.4746  -0.823         0.411182    
## StyleSplit level          24.4713     7.3651   3.323         0.000991 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.31 on 334 degrees of freedom
## Multiple R-squared:  0.7314, Adjusted R-squared:  0.7097 
## F-statistic: 33.68 on 27 and 334 DF,  p-value: < 2.2e-16

Automated Fitting

## 
## Direction:  backward/forward
## Criterion:  BIC 
## 
## Start:  AIC=2716.18
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Heating.Fuel + Heating.System + Location + 
##     LotSize + Modern.Bathrooms + Modern.Kitchen + Pool + Rooms + 
##     Sewer + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Pool              2      1124 417443 2705.4
## - Sewer             1      1293 417612 2711.4
## - EIK               1      2208 418527 2712.2
## - Heating.System    2      9445 425764 2712.5
## - Heating.Fuel      1      2687 419006 2712.6
## - Basement          1      2907 419226 2712.8
## - LotSize           1      3972 420291 2713.7
## - Modern.Kitchen    1      5193 421513 2714.8
## - Garage            2     13191 429510 2715.7
## - Modern.Bathrooms  1      6665 422984 2716.0
## <none>                          416319 2716.2
## - Bedrooms          1      8486 424805 2717.6
## - Age               1     15381 431701 2723.4
## - C.A.C             1     17289 433609 2725.0
## - Fireplace         1     18182 434501 2725.8
## - Rooms             1     22406 438725 2729.3
## - Style             4     46411 462730 2730.9
## - Bathrooms         1     44630 460949 2747.2
## - Location          4    305534 721853 2891.8
## 
## Step:  AIC=2705.38
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Heating.Fuel + Heating.System + Location + 
##     LotSize + Modern.Bathrooms + Modern.Kitchen + Rooms + Sewer + 
##     Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Sewer             1      1303 418746 2700.6
## - Heating.System    2      8956 426400 2701.3
## - Heating.Fuel      1      2449 419892 2701.6
## - EIK               1      2478 419922 2701.6
## - Basement          1      3077 420521 2702.1
## - LotSize           1      3825 421268 2702.8
## - Modern.Kitchen    1      5243 422687 2704.0
## - Garage            2     12814 430257 2704.5
## - Modern.Bathrooms  1      6492 423936 2705.1
## <none>                          417443 2705.4
## - Bedrooms          1      8442 425885 2706.7
## - Age               1     15252 432695 2712.5
## - C.A.C             1     17641 435085 2714.5
## - Fireplace         1     18605 436049 2715.3
## + Pool              2      1124 416319 2716.2
## - Style             4     46450 463894 2720.0
## - Rooms             1     24560 442003 2720.2
## - Bathrooms         1     45485 462929 2736.9
## - Location          4    306589 724033 2881.2
## 
## Step:  AIC=2700.61
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Heating.Fuel + Heating.System + Location + 
##     LotSize + Modern.Bathrooms + Modern.Kitchen + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Heating.System    2      9116 427862 2696.6
## - Heating.Fuel      1      2306 421052 2696.7
## - EIK               1      2608 421354 2697.0
## - LotSize           1      3293 422039 2697.6
## - Basement          1      3328 422074 2697.6
## - Modern.Kitchen    1      5280 424026 2699.3
## - Garage            2     12547 431293 2699.5
## - Modern.Bathrooms  1      6346 425092 2700.2
## <none>                          418746 2700.6
## - Bedrooms          1      8139 426885 2701.7
## + Sewer             1      1303 417443 2705.4
## - Age               1     14583 433330 2707.1
## - C.A.C             1     17519 436265 2709.6
## - Fireplace         1     18997 437744 2710.8
## + Pool              2      1134 417612 2711.4
## - Rooms             1     24682 443429 2715.4
## - Style             4     46937 465683 2715.5
## - Bathrooms         1     46442 465188 2732.8
## - Location          4    318354 737100 2881.7
## 
## Step:  AIC=2696.63
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Heating.Fuel + Location + LotSize + 
##     Modern.Bathrooms + Modern.Kitchen + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Heating.Fuel      1       161 428023 2690.9
## - LotSize           1      3151 431013 2693.4
## - EIK               1      3308 431170 2693.5
## - Basement          1      3739 431601 2693.9
## - Modern.Kitchen    1      4537 432399 2694.6
## - Modern.Bathrooms  1      5282 433144 2695.2
## - Garage            2     12538 440400 2695.3
## <none>                          427862 2696.6
## - Bedrooms          1      9675 437537 2698.8
## + Heating.System    2      9116 418746 2700.6
## + Sewer             1      1462 426400 2701.3
## - Age               1     15114 442976 2703.3
## - C.A.C             1     18055 445917 2705.7
## - Fireplace         1     18736 446598 2706.2
## + Pool              2       629 427232 2707.9
## - Style             4     48039 475900 2711.6
## - Rooms             1     27164 455026 2713.0
## - Bathrooms         1     47915 475776 2729.2
## - Location          4    315815 743677 2873.2
## 
## Step:  AIC=2690.87
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Location + LotSize + Modern.Bathrooms + 
##     Modern.Kitchen + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - LotSize           1      2993 431016 2687.5
## - EIK               1      3212 431235 2687.7
## - Basement          1      3713 431736 2688.1
## - Modern.Kitchen    1      4495 432517 2688.8
## - Modern.Bathrooms  1      5263 433286 2689.4
## - Garage            2     12397 440420 2689.4
## <none>                          428023 2690.9
## - Bedrooms          1      9636 437659 2693.0
## + Sewer             1      1406 426617 2695.6
## + Heating.Fuel      1       161 427862 2696.6
## + Heating.System    2      6971 421052 2696.7
## - Age               1     15081 443104 2697.5
## - Fireplace         1     18590 446613 2700.4
## - C.A.C             1     18619 446642 2700.4
## + Pool              2       586 427436 2702.2
## - Style             4     48836 476859 2706.4
## - Rooms             1     27172 455195 2707.3
## - Bathrooms         1     47927 475949 2723.4
## - Location          4    330771 758794 2874.6
## 
## Step:  AIC=2687.5
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + EIK + 
##     Fireplace + Garage + Location + Modern.Bathrooms + Modern.Kitchen + 
##     Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - EIK               1      3292 434308 2684.4
## - Basement          1      3768 434784 2684.8
## - Modern.Kitchen    1      4095 435111 2685.0
## - Modern.Bathrooms  1      4860 435876 2685.7
## - Garage            2     12475 443491 2686.1
## <none>                          431016 2687.5
## - Bedrooms          1      8592 439608 2688.8
## + LotSize           1      2993 428023 2690.9
## + Sewer             1       898 430118 2692.6
## + Heating.System    2      7719 423297 2692.7
## + Heating.Fuel      1         3 431013 2693.4
## - Age               1     16480 447496 2695.2
## - Fireplace         1     19118 450134 2697.3
## - C.A.C             1     19768 450784 2697.8
## + Pool              2       563 430453 2698.8
## - Style             4     49009 480024 2702.9
## - Rooms             1     27493 458509 2704.0
## - Bathrooms         1     47511 478527 2719.5
## - Location          4    352179 783195 2880.1
## 
## Step:  AIC=2684.36
## Value ~ Age + Basement + Bathrooms + Bedrooms + C.A.C + Fireplace + 
##     Garage + Location + Modern.Bathrooms + Modern.Kitchen + Rooms + 
##     Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Basement          1      3085 437392 2681.0
## - Modern.Kitchen    1      5115 439423 2682.7
## - Garage            2     12671 446979 2683.0
## - Modern.Bathrooms  1      5574 439882 2683.1
## <none>                          434308 2684.4
## - Bedrooms          1      8736 443044 2685.7
## + EIK               1      3292 431016 2687.5
## + LotSize           1      3073 431235 2687.7
## + Heating.System    2      8610 425698 2688.9
## + Sewer             1      1034 433273 2689.4
## + Heating.Fuel      1        40 434268 2690.2
## - Age               1     16135 450443 2691.7
## - C.A.C             1     20161 454469 2694.9
## - Fireplace         1     20317 454625 2695.0
## + Pool              2       728 433580 2695.5
## - Rooms             1     26830 461138 2700.2
## - Style             4     51778 486086 2701.6
## - Bathrooms         1     50933 485240 2718.6
## - Location          4    357488 791796 2878.2
## 
## Step:  AIC=2681.03
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage + 
##     Location + Modern.Bathrooms + Modern.Kitchen + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Modern.Kitchen    1      5239 442632 2679.4
## - Modern.Bathrooms  1      5891 443284 2680.0
## - Garage            2     14070 451462 2680.7
## <none>                          437392 2681.0
## - Bedrooms          1      8126 445519 2681.8
## + LotSize           1      3115 434277 2684.3
## + Basement          1      3085 434308 2684.4
## + EIK               1      2609 434784 2684.8
## + Heating.System    2      8936 428457 2685.3
## + Sewer             1      1236 436157 2685.9
## + Heating.Fuel      1        47 437346 2686.9
## - Age               1     15612 453004 2687.8
## - Fireplace         1     20249 457641 2691.5
## + Pool              2       845 436547 2692.1
## - C.A.C             1     23540 460932 2694.1
## - Rooms             1     26909 464301 2696.8
## - Style             4     52465 489857 2698.5
## - Bathrooms         1     50884 488276 2715.0
## - Location          4    387661 825054 2887.2
## 
## Step:  AIC=2679.45
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage + 
##     Location + Modern.Bathrooms + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Modern.Bathrooms  1       987 443619 2674.4
## - Garage            2     12094 454726 2677.4
## <none>                          442632 2679.4
## - Bedrooms          1      7420 450051 2679.6
## + Modern.Kitchen    1      5239 437392 2681.0
## + EIK               1      3525 439107 2682.4
## + Basement          1      3208 439423 2682.7
## + LotSize           1      2667 439965 2683.2
## + Sewer             1      1341 441290 2684.2
## + Heating.System    2      8443 434188 2684.3
## + Heating.Fuel      1        71 442561 2685.3
## - Age               1     16242 458874 2686.6
## - Fireplace         1     19567 462199 2689.2
## + Pool              2       970 441662 2690.4
## - Rooms             1     27503 470134 2695.4
## - Style             4     51273 493905 2695.6
## - C.A.C             1     27766 470397 2695.6
## - Bathrooms         1     49234 491866 2711.7
## - Location          4    392436 835067 2885.7
## 
## Step:  AIC=2674.37
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Garage + 
##     Location + Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Garage            2     12078 455697 2672.3
## - Bedrooms          1      7119 450738 2674.2
## <none>                          443619 2674.4
## + EIK               1      3407 440212 2677.5
## + Basement          1      3350 440269 2677.5
## + LotSize           1      2626 440992 2678.1
## + Sewer             1      1222 442397 2679.3
## + Modern.Bathrooms  1       987 442632 2679.4
## + Heating.System    2      8094 435525 2679.5
## + Modern.Kitchen    1       335 443284 2680.0
## + Heating.Fuel      1        59 443560 2680.2
## - Age               1     16085 459704 2681.4
## - Fireplace         1     19709 463328 2684.2
## + Pool              2       851 442768 2685.5
## - Rooms             1     27010 470629 2689.9
## - Style             4     51384 495003 2690.5
## - C.A.C             1     27847 471466 2690.5
## - Bathrooms         1     48708 492327 2706.2
## - Location          4    428186 871805 2895.4
## 
## Step:  AIC=2672.31
## Value ~ Age + Bathrooms + Bedrooms + C.A.C + Fireplace + Location + 
##     Rooms + Style
## 
##                    Df Sum of Sq    RSS    AIC
## - Bedrooms          1      7439 463135 2672.3
## <none>                          455697 2672.3
## + Garage            2     12078 443619 2674.4
## + Basement          1      4668 451029 2674.5
## + EIK               1      3271 452425 2675.6
## + LotSize           1      2758 452939 2676.0
## + Heating.System    2      8900 446796 2676.9
## + Sewer             1      1029 454667 2677.4
## + Modern.Bathrooms  1       971 454726 2677.4
## + Heating.Fuel      1       245 455452 2678.0
## + Modern.Kitchen    1        91 455605 2678.1
## - Age               1     16005 471702 2678.9
## + Pool              2       660 455037 2683.6
## - Fireplace         1     24254 479951 2685.2
## - Rooms             1     27336 483032 2687.5
## - C.A.C             1     28486 484182 2688.4
## - Style             4     57248 512945 2691.6
## - Bathrooms         1     57639 513336 2709.5
## - Location          4    428361 884058 2888.6
## 
## Step:  AIC=2672.28
## Value ~ Age + Bathrooms + C.A.C + Fireplace + Location + Rooms + 
##     Style
## 
##                    Df Sum of Sq    RSS    AIC
## <none>                          463135 2672.3
## + Bedrooms          1      7439 455697 2672.3
## + Garage            2     12398 450738 2674.2
## + Basement          1      3933 459202 2675.1
## + EIK               1      3442 459694 2675.5
## + Heating.System    2     10204 452931 2676.0
## - Age               1     12553 475689 2676.1
## + LotSize           1      1800 461336 2676.8
## + Sewer             1       873 462262 2677.5
## + Modern.Bathrooms  1       694 462442 2677.6
## + Heating.Fuel      1       215 462921 2678.0
## + Modern.Kitchen    1       106 463029 2678.1
## - Rooms             1     19906 483041 2681.6
## - Fireplace         1     22339 485474 2683.4
## + Pool              2       560 462576 2683.6
## - Style             4     58358 521493 2691.7
## - C.A.C             1     37163 500299 2694.3
## - Bathrooms         1     53646 516781 2706.1
## - Location          4    423435 886571 2883.8
## 
## Call:
## lm(formula = Value ~ Age + Bathrooms + C.A.C + Fireplace + Location + 
##     Rooms + Style, data = RealEstate)
## 
## Coefficients:
##         (Intercept)                  Age            Bathrooms  
##            288.3237              -0.5675              26.8216  
##        C.A.CPresent     FireplacePresent            LocationB  
##             39.3708              18.3318             -38.8414  
##           LocationC            LocationD            LocationE  
##            -62.0523            -100.1258            -104.8913  
##               Rooms        StyleColonial  StyleExpanded ranch  
##              7.1053              29.7749              13.3786  
##          StyleRanch     StyleSplit level  
##             -4.6369              32.0658
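
A search of this kind can be sketched with base R's `step()`, which may not be the function that produced the trace above. The key assumption is the penalty: setting `k = log(n)` gives the BIC, although the printed column is still headed AIC. The data below are simulated, and the last line reproduces the starting value of 2716.18 from the trace using the reported RSS of 416319 and the 28 coefficients of the full model.

```r
set.seed(4)
# Simulated stand-in data: x1 matters, x2 and x3 are pure noise (assumed)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 5 * d$x1 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = d)

# k = log(n) swaps the AIC penalty of 2 per coefficient for the BIC penalty
chosen <- step(full, direction = "both", k = log(n), trace = 0)
formula(chosen)

# The starting value in the trace above follows the same rule:
# 362 observations, RSS 416319, 28 coefficients
start_bic <- 362 * log(416319 / 362) + 28 * log(362)
start_bic
```

The heavier penalty explains why terms such as Garage and Bedrooms are dropped even though they looked significant in the full model.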

How did we do?

## 
## Call:
## lm(formula = Value ~ Age + Bathrooms + C.A.C + Fireplace + Location + 
##     Rooms + Style, data = RealEstate)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -150.803  -20.347   -1.835   15.846  144.611 
## 
## Coefficients:
##                      Estimate Std. Error t value       Pr(>|t|)    
## (Intercept)          288.3237    16.0159  18.002        < 2e-16 ***
## Age                   -0.5675     0.1848  -3.071       0.002300 ** 
## Bathrooms             26.8216     4.2246   6.349 0.000000000677 ***
## C.A.CPresent          39.3708     7.4504   5.284 0.000000223047 ***
## FireplacePresent      18.3318     4.4745   4.097 0.000052110679 ***
## LocationB            -38.8414     6.5213  -5.956 0.000000006323 ***
## LocationC            -62.0523     6.1227 -10.135        < 2e-16 ***
## LocationD           -100.1258     6.3258 -15.828        < 2e-16 ***
## LocationE           -104.8913     7.3406 -14.289        < 2e-16 ***
## Rooms                  7.1053     1.8372   3.867       0.000131 ***
## StyleColonial         29.7749     6.9678   4.273 0.000024902879 ***
## StyleExpanded ranch   13.3786     7.7904   1.717       0.086810 .  
## StyleRanch            -4.6369     5.4885  -0.845       0.398772    
## StyleSplit level      32.0658     7.0583   4.543 0.000007658643 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36.48 on 348 degrees of freedom
## Multiple R-squared:  0.7012, Adjusted R-squared:   0.69 
## F-statistic: 62.81 on 13 and 348 DF,  p-value: < 2.2e-16

It fits pretty well.

Residuals?

## 
##  Shapiro-Wilk normality test
## 
## data:  RealEstate$residuals.LinearModel.3
## W = 0.94821, p-value = 0.0000000005921

## [1] 103  35

They’re not normal. They often are not. Think about the data.
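
For reference, a minimal sketch of the test itself using `shapiro.test()` from base R. The data are simulated with right-skewed errors, an assumption chosen so that the test has something to detect.

```r
set.seed(5)
# Simulated stand-in data with right-skewed errors (assumed, not RealEstate)
x <- runif(200, 4, 12)
y <- 250 + 20 * x + (rexp(200, rate = 1 / 40) - 40)
fit <- lm(y ~ x)

# Shapiro-Wilk test: a small p-value is evidence against normal residuals
sw <- shapiro.test(resid(fit))
sw$statistic
sw$p.value

# A normal quantile plot shows where the departure occurs
# qqnorm(resid(fit)); qqline(resid(fit))
```

A long right tail is the sort of departure that skewed quantities such as prices can produce.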

What can we learn without Normal residuals?

Inference of a basic sort. If the residuals are normal, then the slopes follow t distributions, sums of squares can be compared using F, and so on. If the residuals are not normal, then we do not know that the sums of squares are chi-square distributed, and a host of problems arise. There is a way to calculate the standard errors for the slopes and intercept, known as the sandwich estimator, that gives the slopes t distributions under general conditions. This will allow us to at least say, with a particular probability, what is related to what. Under Models and then summarize models, there is a tick box for the sandwich estimator. There are multiple sandwich estimators and there is a large literature on their performance. HC3 and HC4 seem generally preferred. Summarizing the model with a sandwich estimator yields the following.

## 
## t test of coefficients:
## 
##                       Estimate Std. Error  t value      Pr(>|t|)    
## (Intercept)          288.32370   19.78493  14.5729     < 2.2e-16 ***
## Age                   -0.56749    0.22921  -2.4759     0.0137659 *  
## Bathrooms             26.82161    4.86104   5.5177 0.00000006719 ***
## C.A.CPresent          39.37077   12.08173   3.2587     0.0012296 ** 
## FireplacePresent      18.33184    4.80895   3.8120     0.0001630 ***
## LocationB            -38.84136    8.32785  -4.6640 0.00000442751 ***
## LocationC            -62.05232    6.14778 -10.0934     < 2.2e-16 ***
## LocationD           -100.12582    7.56679 -13.2323     < 2.2e-16 ***
## LocationE           -104.89134    7.08002 -14.8151     < 2.2e-16 ***
## Rooms                  7.10532    2.54661   2.7901     0.0055592 ** 
## StyleColonial         29.77487    8.51024   3.4987     0.0005282 ***
## StyleExpanded ranch   13.37856    8.83335   1.5146     0.1307938    
## StyleRanch            -4.63695    5.40369  -0.8581     0.3914235    
## StyleSplit level      32.06577    8.38255   3.8253     0.0001548 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
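
To show what a sandwich estimator computes, here is a sketch of the HC3 version built by hand in base R on simulated heteroskedastic data. The data and object names are assumptions. The final comment gives the packaged route through `sandwich` and `lmtest`, which is a guess at how the table above was produced rather than something the output confirms.

```r
set.seed(6)
# Simulated stand-in data whose error spread grows with x (assumed)
x <- runif(150, 1, 10)
y <- 2 + 3 * x + rnorm(150, sd = x)
fit <- lm(y ~ x)

# HC3 sandwich: (X'X)^-1 X' diag(e^2 / (1 - h)^2) X (X'X)^-1
X     <- model.matrix(fit)
e     <- resid(fit)
h     <- hatvalues(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * (e / (1 - h)))   # scales row i of X by e_i / (1 - h_i)
vcov_hc3 <- bread %*% meat %*% bread

# Classical and robust standard errors side by side
cbind(classical = sqrt(diag(vcov(fit))), HC3 = sqrt(diag(vcov_hc3)))

# With the sandwich and lmtest packages installed, the equivalent call is
# lmtest::coeftest(fit, vcov. = sandwich::vcovHC(fit, type = "HC3"))
```

The coefficient estimates are untouched; only the standard errors, and therefore the t values and p-values, change.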

The Diagnostics