Plot of the Bay regions

The Bay Area is made up of five regions: East Bay, North Bay, Peninsula, San Francisco, and South Bay.

OLS

Basic multiple linear regression

Let’s regress log housing value (lmedhval) on log total population (ltotp), log median household income (lmedinc), median age of housing (medage), median number of rooms (medrooms), the median number of years current residents have lived in their homes (meddur), the number of parks within a 10-minute walk (parks), and the percent of 4th graders at the nearest school who scored proficient or above on California’s English Language Arts standardized test (edppl3).
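A minimal sketch of how a model like the one below can be fit in R with lm(). The real bayarea data are not included here, so the snippet builds a simulated stand-in with the same column names and arbitrary values; the estimates it prints are illustrative only and are not the Bay Area results shown below. The object names (fit.ols, pct_change) are hypothetical.

```r
# Simulated stand-in for bayarea: same column names, arbitrary made-up values
set.seed(42)
n <- 500
bayarea <- data.frame(
  ltotp    = rnorm(n, 8, 0.5),
  lmedinc  = rnorm(n, 11, 0.4),
  medage   = runif(n, 5, 75),
  medrooms = runif(n, 3, 7),
  meddur   = runif(n, 5, 25),
  parks    = rpois(n, 5),
  edppl3   = runif(n, 0.2, 0.9)
)
bayarea$lmedhval <- with(bayarea,
  4 + 0.75 * lmedinc + 0.006 * medage + 0.9 * edppl3 + rnorm(n, sd = 0.3))

# Global OLS: a single coefficient per predictor, shared by every observation
fit.ols <- lm(lmedhval ~ ltotp + lmedinc + medage + medrooms +
                meddur + parks + edppl3, data = bayarea)
summary(fit.ols)

# With a logged outcome, 100 * (exp(b) - 1) is the % change in housing
# value for a one-unit increase in a predictor, other variables held fixed
pct_change <- 100 * (exp(coef(fit.ols)[-1]) - 1)
round(pct_change, 2)
```

Applied to the reported medage estimate of 0.0062, this works out to roughly a 0.62% higher housing value per additional year of median housing age.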


Call:
lm(formula = lmedhval ~ ltotp + lmedinc + medage + medrooms + 
    meddur + parks + edppl3, data = bayarea)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.84833 -0.18460 -0.01965  0.16931  1.44107 

Coefficients:
              Estimate Std. Error t value
(Intercept)  4.1986398  0.3034919  13.834
ltotp       -0.0382342  0.0176646  -2.164
lmedinc      0.7675167  0.0263889  29.085
medage       0.0061966  0.0005512  11.241
medrooms    -0.0856691  0.0095095  -9.009
meddur       0.0117370  0.0023333   5.030
parks        0.0102368  0.0012140   8.432
edppl3       0.8972933  0.0647460  13.859
                        Pr(>|t|)    
(Intercept) < 0.0000000000000002 ***
ltotp                     0.0306 *  
lmedinc     < 0.0000000000000002 ***
medage      < 0.0000000000000002 ***
medrooms    < 0.0000000000000002 ***
meddur               0.000000546 ***
parks       < 0.0000000000000002 ***
edppl3      < 0.0000000000000002 ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3129 on 1568 degrees of freedom
Multiple R-squared:  0.6422,    Adjusted R-squared:  0.6406 
F-statistic:   402 on 7 and 1568 DF,  p-value: < 0.00000000000000022

The OLS model is a global model: it estimates a single, average effect for each variable and assumes that this effect applies everywhere in the study area. The models below relax that assumption to allow for spatial heterogeneity in the regression coefficients.

Interaction model

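Let’s interact median age of housing (medage) with region so that the effect of medage can differ across regions. East Bay is the reference level: the medage coefficient is the East Bay slope, and each medage:region term is the difference between that region’s slope and the East Bay slope.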


Call:
lm(formula = lmedhval ~ ltotp + lmedinc + medage * region + medrooms + 
    meddur + parks + edppl3, data = bayarea)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.83400 -0.16412 -0.00931  0.15247  1.27117 

Coefficients:
                             Estimate Std. Error t value
(Intercept)                 6.1506997  0.3125779  19.677
ltotp                      -0.0614002  0.0165147  -3.718
lmedinc                     0.5957876  0.0267336  22.286
medage                      0.0040160  0.0007772   5.167
regionNorth Bay            -0.0676059  0.0688656  -0.982
regionPeninsula            -0.2307712  0.1273507  -1.812
regionSan Francisco         0.5371517  0.1104769   4.862
regionSouth Bay            -0.0315589  0.0690407  -0.457
medrooms                   -0.0379107  0.0095196  -3.982
meddur                      0.0064221  0.0022256   2.885
parks                       0.0058858  0.0013373   4.401
edppl3                      1.0073740  0.0604405  16.667
medage:regionNorth Bay      0.0007567  0.0014485   0.522
medage:regionPeninsula      0.0095904  0.0023297   4.117
medage:regionSan Francisco -0.0027678  0.0015792  -1.753
medage:regionSouth Bay      0.0056334  0.0014471   3.893
                                       Pr(>|t|)    
(Intercept)                < 0.0000000000000002 ***
ltotp                                  0.000208 ***
lmedinc                    < 0.0000000000000002 ***
medage                              0.000000269 ***
regionNorth Bay                        0.326396    
regionPeninsula                        0.070164 .  
regionSan Francisco                 0.000001278 ***
regionSouth Bay                        0.647659    
medrooms                            0.000071368 ***
meddur                                 0.003962 ** 
parks                               0.000011492 ***
edppl3                     < 0.0000000000000002 ***
medage:regionNorth Bay                 0.601484    
medage:regionPeninsula              0.000040476 ***
medage:regionSan Francisco             0.079847 .  
medage:regionSouth Bay                 0.000103 ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2878 on 1560 degrees of freedom
Multiple R-squared:  0.6988,    Adjusted R-squared:  0.6959 
F-statistic: 241.3 on 15 and 1560 DF,  p-value: < 0.00000000000000022
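Because East Bay is the reference level of region, each region’s medage slope is the medage coefficient plus that region’s medage:region term. A small R sketch of that arithmetic, using the estimates copied from the output above (it does not refit the model; b_medage, b_inter and slopes are illustrative names):

```r
# Estimates copied from the interaction-model output above
b_medage <- 0.0040160   # medage slope for the reference region, East Bay
b_inter <- c(           # medage:region interaction terms
  "North Bay"     =  0.0007567,
  "Peninsula"     =  0.0095904,
  "San Francisco" = -0.0027678,
  "South Bay"     =  0.0056334
)

# Region-specific slope = reference slope + that region's interaction term
slopes <- c("East Bay" = b_medage, b_medage + b_inter)
round(slopes, 7)
```

The Peninsula slope, for instance, works out to about 0.0136, more than three times the East Bay slope, and that interaction is significant (p < 0.001); the North Bay and San Francisco differences are not significant at the 0.05 level.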

Stratified model

This approach stratifies the data by region and fits a separate regression to each of the 5 Bay Area regions.

For example, here is the model fit to San Francisco only:
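The San Francisco model below is one of five such fits. A minimal sketch that fits all five at once with split() and lapply() and collects the coefficients into one table; it runs on a simulated stand-in for bayarea (real column names, arbitrary values), so its numbers are not the Bay Area estimates, and fits and coefs are hypothetical names.

```r
# Simulated stand-in for bayarea (arbitrary values, real column names)
set.seed(1)
n <- 1000
regions <- c("East Bay", "North Bay", "Peninsula", "San Francisco", "South Bay")
bayarea <- data.frame(
  region   = factor(sample(regions, n, replace = TRUE), levels = regions),
  ltotp    = rnorm(n, 8, 0.5),
  lmedinc  = rnorm(n, 11, 0.4),
  medage   = runif(n, 5, 75),
  medrooms = runif(n, 3, 7),
  meddur   = runif(n, 5, 25),
  parks    = rpois(n, 5),
  edppl3   = runif(n, 0.2, 0.9)
)
bayarea$lmedhval <- with(bayarea,
  4 + 0.75 * lmedinc + 0.006 * medage + 0.9 * edppl3 + rnorm(n, sd = 0.3))

f <- lmedhval ~ ltotp + lmedinc + medage + medrooms + meddur + parks + edppl3

# One separate regression per region
fits <- lapply(split(bayarea, bayarea$region), function(d) lm(f, data = d))

# Rows = coefficients, columns = regions
coefs <- sapply(fits, coef)
round(coefs, 4)
```

Each column matches what lm(f, data = bayarea, subset = region == "San Francisco") and its counterparts return for a single region.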

Call:
lm(formula = lmedhval ~ ltotp + lmedinc + medage + medrooms + 
    meddur + parks + edppl3, data = bayarea, subset = region == 
    "San Francisco")

Residuals:
     Min       1Q   Median       3Q      Max 
-0.77154 -0.14630 -0.02581  0.13285  0.58939 

Coefficients:
             Estimate Std. Error t value
(Intercept)  9.737834   0.464642  20.958
ltotp       -0.025386   0.028981  -0.876
lmedinc      0.318145   0.037911   8.392
medage       0.004164   0.001093   3.812
medrooms    -0.015036   0.020216  -0.744
meddur      -0.006940   0.004157  -1.670
parks        0.004527   0.002011   2.251
edppl3       0.625570   0.121673   5.141
                        Pr(>|t|)    
(Intercept) < 0.0000000000000002 ***
ltotp                   0.382194    
lmedinc       0.0000000000000118 ***
medage                  0.000188 ***
medrooms                0.457958    
meddur                  0.096696 .  
parks                   0.025548 *  
edppl3        0.0000006871664101 ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2184 on 186 degrees of freedom
Multiple R-squared:  0.5344,    Adjusted R-squared:  0.5169 
F-statistic: 30.49 on 7 and 186 DF,  p-value: < 0.00000000000000022

Spatial Regime Model

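The spatial regime model (SRM) tests whether the regression coefficients vary across geographic space, in our case across the 5 regions. The formula 0 + region/(ltotp + ... + edppl3) drops the global intercept and nests every predictor within region, so each region gets its own intercept and its own slopes. The point estimates therefore match the stratified fits (compare the San Francisco rows with the model above, e.g. intercept 9.7378 and lmedinc 0.3181), while the standard errors differ because the SRM pools a single residual variance across all regions. Because the intercept is suppressed, summary() computes R-squared around zero rather than around the mean of lmedhval, so the 0.9996 reported here is not comparable with the R-squared of the other models.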
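A minimal sketch, again on a simulated stand-in for bayarea (real column names, arbitrary values), that shows two properties of the nested formula: it reproduces the estimates of a separate per-region fit, and summary() reports an uncentered R-squared when the intercept is suppressed. The names srm, sf_srm, sf_sep, r2_reported and r2_centered are hypothetical.

```r
# Simulated stand-in for bayarea (arbitrary values, real column names)
set.seed(1)
n <- 1000
regions <- c("East Bay", "North Bay", "Peninsula", "San Francisco", "South Bay")
bayarea <- data.frame(
  region   = factor(sample(regions, n, replace = TRUE), levels = regions),
  ltotp    = rnorm(n, 8, 0.5),
  lmedinc  = rnorm(n, 11, 0.4),
  medage   = runif(n, 5, 75),
  medrooms = runif(n, 3, 7),
  meddur   = runif(n, 5, 25),
  parks    = rpois(n, 5),
  edppl3   = runif(n, 0.2, 0.9)
)
bayarea$lmedhval <- with(bayarea,
  4 + 0.75 * lmedinc + 0.006 * medage + 0.9 * edppl3 + rnorm(n, sd = 0.3))
f <- lmedhval ~ ltotp + lmedinc + medage + medrooms + meddur + parks + edppl3

# Spatial regime model: region-specific intercepts and slopes, no global intercept
srm <- lm(lmedhval ~ 0 + region/(ltotp + lmedinc + medage + medrooms +
            meddur + parks + edppl3), data = bayarea)

# San Francisco coefficients from the SRM vs. a separate San Francisco fit
sf_srm <- coef(srm)[grepl("^regionSan Francisco", names(coef(srm)))]
sf_sep <- coef(lm(f, data = bayarea, subset = region == "San Francisco"))
all.equal(unname(sf_srm), unname(sf_sep))   # TRUE: same point estimates

# summary() uses an uncentered R^2 for a no-intercept model ...
r2_reported <- summary(srm)$r.squared
# ... while the usual R^2 is measured around the mean of the outcome
r2_centered <- 1 - sum(resid(srm)^2) /
  sum((bayarea$lmedhval - mean(bayarea$lmedhval))^2)
c(reported = r2_reported, centered = r2_centered)
```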


Call:
lm(formula = lmedhval ~ 0 + region/(ltotp + lmedinc + medage + 
    medrooms + meddur + parks + edppl3), data = bayarea)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.84101 -0.14322 -0.00983  0.14132  1.22961 

Coefficients:
                               Estimate Std. Error t value
regionEast Bay                7.0667271  0.5448765  12.969
regionNorth Bay              -1.3558964  0.8475253  -1.600
regionPeninsula               0.2200716  1.5807389   0.139
regionSan Francisco           9.7378339  0.5678225  17.149
regionSouth Bay               6.0805756  0.6050788  10.049
regionEast Bay:ltotp         -0.0824204  0.0252857  -3.260
regionNorth Bay:ltotp        -0.1156100  0.0402536  -2.872
regionPeninsula:ltotp         0.2260217  0.0628154   3.598
regionSan Francisco:ltotp    -0.0253860  0.0354172  -0.717
regionSouth Bay:ltotp        -0.0704088  0.0358652  -1.963
regionEast Bay:lmedinc        0.5114252  0.0463247  11.040
regionNorth Bay:lmedinc       1.4296971  0.0846586  16.888
regionPeninsula:lmedinc       0.9609715  0.1296948   7.409
regionSan Francisco:lmedinc   0.3181446  0.0463297   6.867
regionSouth Bay:lmedinc       0.5916428  0.0551912  10.720
regionEast Bay:medage         0.0038393  0.0008124   4.726
regionNorth Bay:medage        0.0006410  0.0014241   0.450
regionPeninsula:medage        0.0096107  0.0023089   4.162
regionSan Francisco:medage    0.0041644  0.0013352   3.119
regionSouth Bay:medage        0.0085567  0.0013515   6.331
regionEast Bay:medrooms      -0.0149528  0.0148162  -1.009
regionNorth Bay:medrooms     -0.2943818  0.0334664  -8.796
regionPeninsula:medrooms     -0.0497104  0.0369444  -1.346
regionSan Francisco:medrooms -0.0150357  0.0247049  -0.609
regionSouth Bay:medrooms     -0.0555579  0.0168478  -3.298
regionEast Bay:meddur         0.0054152  0.0033194   1.631
regionNorth Bay:meddur        0.0251022  0.0051989   4.828
regionPeninsula:meddur       -0.0019460  0.0068275  -0.285
regionSan Francisco:meddur   -0.0069402  0.0050801  -1.366
regionSouth Bay:meddur        0.0115730  0.0053658   2.157
regionEast Bay:parks          0.0129149  0.0028795   4.485
regionNorth Bay:parks         0.0014171  0.0028955   0.489
regionPeninsula:parks        -0.0177200  0.0052184  -3.396
regionSan Francisco:parks     0.0045266  0.0024573   1.842
regionSouth Bay:parks         0.0037323  0.0026900   1.387
regionEast Bay:edppl3         1.1097366  0.0928388  11.953
regionNorth Bay:edppl3        0.7243787  0.1557113   4.652
regionPeninsula:edppl3        0.5133467  0.1976856   2.597
regionSan Francisco:edppl3    0.6255696  0.1486926   4.207
regionSouth Bay:edppl3        1.3448835  0.1364490   9.856
                                         Pr(>|t|)    
regionEast Bay               < 0.0000000000000002 ***
regionNorth Bay                          0.109842    
regionPeninsula                          0.889294    
regionSan Francisco          < 0.0000000000000002 ***
regionSouth Bay              < 0.0000000000000002 ***
regionEast Bay:ltotp                     0.001140 ** 
regionNorth Bay:ltotp                    0.004134 ** 
regionPeninsula:ltotp                    0.000331 ***
regionSan Francisco:ltotp                0.473625    
regionSouth Bay:ltotp                    0.049809 *  
regionEast Bay:lmedinc       < 0.0000000000000002 ***
regionNorth Bay:lmedinc      < 0.0000000000000002 ***
regionPeninsula:lmedinc         0.000000000000208 ***
regionSan Francisco:lmedinc     0.000000000009491 ***
regionSouth Bay:lmedinc      < 0.0000000000000002 ***
regionEast Bay:medage           0.000002498677425 ***
regionNorth Bay:medage                   0.652691    
regionPeninsula:medage          0.000033235702884 ***
regionSan Francisco:medage               0.001848 ** 
regionSouth Bay:medage          0.000000000318854 ***
regionEast Bay:medrooms                  0.313027    
regionNorth Bay:medrooms     < 0.0000000000000002 ***
regionPeninsula:medrooms                 0.178648    
regionSan Francisco:medrooms             0.542871    
regionSouth Bay:medrooms                 0.000997 ***
regionEast Bay:meddur                    0.103019    
regionNorth Bay:meddur          0.000001513803860 ***
regionPeninsula:meddur                   0.775665    
regionSan Francisco:meddur               0.172091    
regionSouth Bay:meddur                   0.031175 *  
regionEast Bay:parks            0.000007829080748 ***
regionNorth Bay:parks                    0.624608    
regionPeninsula:parks                    0.000702 ***
regionSan Francisco:parks                0.065657 .  
regionSouth Bay:parks                    0.165493    
regionEast Bay:edppl3        < 0.0000000000000002 ***
regionNorth Bay:edppl3          0.000003568587313 ***
regionPeninsula:edppl3                   0.009500 ** 
regionSan Francisco:edppl3      0.000027358794393 ***
regionSouth Bay:edppl3       < 0.0000000000000002 ***
---
Signif. codes:  
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.267 on 1536 degrees of freedom
Multiple R-squared:  0.9996,    Adjusted R-squared:  0.9996 
F-statistic: 9.942e+04 on 40 and 1536 DF,  p-value: < 0.00000000000000022

Tests

Is the spatial regime model better than the pooled OLS model? To answer this question, run a Chow test. It compares the restricted model (OLS, one set of coefficients for all regions) with the unrestricted model (SRM, one set of coefficients per region) to determine whether there is evidence that the relationships between the independent variables and housing values differ across regions.

[[1]]
[1] 78.49631

[[2]]
[1] 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004832431

[[3]]
[1] 8

[[4]]
[1] 1560

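The list holds the F statistic (78.5), its p-value, and the numerator and denominator degrees of freedom (8 and 1560). The p-value is effectively zero, far below the 0.05 cutoff, so we reject the null hypothesis of the restricted model (the pooled, non-spatial-regime OLS) in favor of the spatial regime model.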
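The function that produced the list above is not shown. Its degrees of freedom, 8 and 1560 = 1576 - 2*8, match the textbook Chow statistic for two regimes. With m regimes and k coefficients per regime, the general form uses (m - 1)k and n - mk degrees of freedom, which for five regions and k = 8 would be 32 and 1536. The sketch below computes that general version on a simulated stand-in for bayarea (real column names, arbitrary values) and checks it against R's built-in nested-model F test, anova(). Names such as chow_F and chow_p are hypothetical.

```r
# Simulated stand-in for bayarea (arbitrary values, real column names)
set.seed(1)
n <- 1000
regions <- c("East Bay", "North Bay", "Peninsula", "San Francisco", "South Bay")
bayarea <- data.frame(
  region   = factor(sample(regions, n, replace = TRUE), levels = regions),
  ltotp = rnorm(n, 8, 0.5), lmedinc = rnorm(n, 11, 0.4),
  medage = runif(n, 5, 75), medrooms = runif(n, 3, 7),
  meddur = runif(n, 5, 25), parks = rpois(n, 5), edppl3 = runif(n, 0.2, 0.9)
)
bayarea$lmedhval <- with(bayarea,
  4 + 0.75 * lmedinc + 0.006 * medage + 0.9 * edppl3 + rnorm(n, sd = 0.3))

# Restricted model: one set of coefficients for every region
fit.ols <- lm(lmedhval ~ ltotp + lmedinc + medage + medrooms +
                meddur + parks + edppl3, data = bayarea)
# Unrestricted model: a separate set of coefficients per region
fit.srm <- lm(lmedhval ~ 0 + region/(ltotp + lmedinc + medage + medrooms +
                meddur + parks + edppl3), data = bayarea)

rss_r <- sum(resid(fit.ols)^2)
rss_u <- sum(resid(fit.srm)^2)
k <- length(coef(fit.ols))        # coefficients per regime (8)
m <- nlevels(bayarea$region)      # number of regimes (5)
df1 <- (m - 1) * k                # number of equality restrictions
df2 <- n - m * k                  # residual df of the unrestricted model
chow_F <- ((rss_r - rss_u) / df1) / (rss_u / df2)
chow_p <- pf(chow_F, df1, df2, lower.tail = FALSE)
c(F = chow_F, p = chow_p, df1 = df1, df2 = df2)

# The OLS model is nested in the SRM, so anova() reports the same F test
anova(fit.ols, fit.srm)
```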

GWR

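Geographically Weighted Regression (GWR) treats the study area as a continuous surface rather than as a set of discrete regimes. It fits a separate weighted regression at every data point, giving nearby observations more weight through a distance-decay kernel (here Gaussian), and chooses the kernel bandwidth by cross-validation. The search below converges on a fixed bandwidth of about 7,040 map units, and the summary compares the distribution of local coefficients with the global OLS estimates.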
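The bandwidth search and the summary below appear to come from the spgwr package (gwr.sel() and gwr() with the gwr.Gauss kernel). To show the idea behind them, here is a simplified from-scratch sketch in base R on a small simulated data set (not the Bay Area data). It assumes a Gaussian kernel written as exp(-0.5 (d/b)^2), whose exact scaling may differ from gwr.Gauss, scores a candidate bandwidth b by leave-one-out cross-validation, searches for the best b with optimize(), and then runs one weighted least-squares fit per data point. It is an illustration, not spgwr's implementation, and all function names are hypothetical.

```r
# Gaussian distance-decay kernel: weight 1 at distance 0, falling off with d / b
gauss_w <- function(d, b) exp(-0.5 * (d / b)^2)

# Leave-one-out CV score for bandwidth b: sum of squared prediction errors,
# predicting each point from a local fit that excludes it
gwr_cv <- function(b, X, y, D) {
  err <- vapply(seq_len(nrow(X)), function(i) {
    w <- gauss_w(D[i, ], b)
    w[i] <- 0                                  # drop point i from its own fit
    beta <- lm.wfit(X, y, w)$coefficients
    y[i] - sum(X[i, ] * beta)
  }, numeric(1))
  sum(err^2)
}

# Local coefficients: one weighted least-squares fit centered on each point
gwr_fit <- function(b, X, y, D) {
  t(vapply(seq_len(nrow(X)), function(i) {
    lm.wfit(X, y, gauss_w(D[i, ], b))$coefficients
  }, numeric(ncol(X))))
}

# Simulated data: the slope on x1 drifts from 1 (west) to 3 (east)
set.seed(7)
n <- 200
coords <- cbind(east = runif(n), north = runif(n))
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 0.5 + (1 + 2 * coords[, "east"]) * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)
X <- cbind("(Intercept)" = 1, x1 = x1, x2 = x2)
D <- as.matrix(dist(coords))                   # pairwise Euclidean distances

# Fixed bandwidth chosen by minimizing the CV score
bw <- optimize(gwr_cv, interval = c(0.1, 2), X = X, y = y, D = D)$minimum
local_beta <- gwr_fit(bw, X, y, D)
bw
summary(local_beta)       # distribution of local coefficients ...
coef(lm(y ~ x1 + x2))     # ... next to the global OLS estimates
```

The spgwr output below has the same shape: one set of local coefficients per data point (1576 of them), summarized by quartiles next to the global estimate.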

Bandwidth: 99167.94 CV score: 152.5657 
Bandwidth: 160296.9 CV score: 154.6469 
Bandwidth: 61388.17 CV score: 147.5673 
Bandwidth: 38038.98 CV score: 137.5391 
Bandwidth: 23608.39 CV score: 124.6536 
Bandwidth: 14689.8 CV score: 112.6202 
Bandwidth: 9177.801 CV score: 102.8401 
Bandwidth: 5771.201 CV score: 102.4129 
Bandwidth: 7135.327 CV score: 98.98384 
Bandwidth: 7425.969 CV score: 99.20257 
Bandwidth: 7090.012 CV score: 98.97257 
Bandwidth: 7029.76 CV score: 98.96845 
Bandwidth: 7039.899 CV score: 98.96825 
Bandwidth: 7040.456 CV score: 98.96825 
Bandwidth: 7040.377 CV score: 98.96825 
Bandwidth: 7040.378 CV score: 98.96825 
Bandwidth: 7040.378 CV score: 98.96825 
Bandwidth: 7040.378 CV score: 98.96825 
Bandwidth: 7040.378 CV score: 98.96825 
Call:
gwr(formula = lmedhval ~ ltotp + lmedinc + medage + medrooms + 
    meddur + parks + edppl3, data = bayarea.sp, bandwidth = gwr.b1, 
    hatmatrix = TRUE)
Kernel function: gwr.Gauss 
Fixed bandwidth: 7040.378 
Summary of GWR coefficient estimates at data points:
                    Min.     1st Qu.      Median
X.Intercept. -14.8367664   4.7192790   6.4927353
ltotp         -1.3257556  -0.0812652  -0.0249426
lmedinc       -2.6077786   0.3291706   0.5705322
medage        -0.0351063   0.0027374   0.0051062
medrooms      -0.8227821  -0.0525912  -0.0224669
meddur        -0.0664802   0.0012466   0.0056096
parks         -0.0414050   0.0022049   0.0067550
edppl3        -3.0989630   0.6059132   0.7902467
                 3rd Qu.        Max.  Global
X.Intercept.   9.0014005  37.9782702  4.1986
ltotp          0.0083518   0.3033374 -0.0382
lmedinc        0.7039981   2.6226942  0.7675
medage         0.0069553   0.0493151  0.0062
medrooms       0.0188203   1.7695621 -0.0857
meddur         0.0124948   0.2522683  0.0117
parks          0.0108894   0.0717151  0.0102
edppl3         1.0476815   2.6368918  0.8973
Number of data points: 1576 
Effective number of parameters (residual: 2traceS - traceS'S): 284.4856 
Effective degrees of freedom (residual: 2traceS - traceS'S): 1291.514 
Sigma (residual: 2traceS - traceS'S): 0.2262557 
Effective number of parameters (model: traceS): 219.4385 
Effective degrees of freedom (model: traceS): 1356.561 
Sigma (model: traceS): 0.2207646 
Sigma (ML): 0.2048194 
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -12.45141 
AIC (GWR p. 96, eq. 4.22): -305.9628 
Residual sum of squares: 66.11474 
Quasi-global R2: 0.8458944 

Effects Plot

Map of the local GWR coefficients for the number of parks within a 10-minute walk (parks), showing how the estimated effect of parks on log housing value varies across the San Francisco Bay Area.