Set working directory and read file
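library(sf)          # st_read()
library(spdep)       # poly2nb(), nb2listw(), moran.mc(), moran.test()
library(spatialreg)  # SpatialFiltering() in recent releases
setwd("/Users/annapeterson/Desktop/Classes/GEOG6000/Lab10")
boston = st_read("/Users/annapeterson/Desktop/Classes/GEOG6000/lab10/boston.shp",
quiet = TRUE)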
Mapping House Prices
Setting up neighborhood structure and spatial weight matrix
boston$log_CMEDV = log(boston$CMEDV)
boston_nbq = poly2nb(boston)
boston_listw = nb2listw(boston_nbq)
## Warning in st_centroid.sf(boston): st_centroid assumes attributes are constant
## over geometries of x
Plot of House Prices
Linear Regression Model
I chose four predictors of house value: industry (INDUS), crime (CRIM), socioeconomic status (LSTAT), and nitric oxide levels (NOX).
Prep data
boston$l_crime = log(boston$CRIM)
boston$l_LSTAT = log(boston$LSTAT)
boston$l_nox = log(boston$NOX)
Linear model
boston_lm = lm(log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS, data = boston)
summary(boston_lm)
##
## Call:
## lm(formula = log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS, data = boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.89785 -0.13891 -0.00201 0.13437 0.90420
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.401727 0.091451 48.132 < 2e-16 ***
## l_crime -0.033958 0.008308 -4.087 5.08e-05 ***
## l_LSTAT -0.516041 0.021873 -23.593 < 2e-16 ***
## l_nox 0.228824 0.096364 2.375 0.0179 *
## INDUS -0.002745 0.002491 -1.102 0.2711
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2256 on 501 degrees of freedom
## Multiple R-squared: 0.6972, Adjusted R-squared: 0.6948
## F-statistic: 288.4 on 4 and 501 DF, p-value: < 2.2e-16
Spatial Autocorrelation on Residuals
boston_moran = moran.mc(residuals(boston_lm),
listw = boston_listw,
nsim = 999,
alternative = "greater")
boston_moran
##
## Monte-Carlo simulation of Moran I
##
## data: residuals(boston_lm)
## weights: boston_listw
## number of simulations + 1: 1000
##
## statistic = 0.52287, observed rank = 1000, p-value = 0.001
## alternative hypothesis: greater
The Moran's I statistic is large (0.523) and the p-value is significant (p = 0.001), so we can reject the null hypothesis of no spatial autocorrelation. I conclude that there is positive spatial autocorrelation in our residuals.
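To see where p = 0.001 comes from, here is a minimal sketch. It assumes the usual one-sided permutation pseudo p-value, (number of simulated statistics at least as large as the observed one + 1) / (nsim + 1). The numbers are copied from the output above; the variable names are my own.

```r
# Pseudo p-value for a one-sided ("greater") Monte Carlo test.
nsim <- 999            # number of permutations passed to moran.mc()
observed_rank <- 1000  # rank of the observed statistic among nsim + 1 values

# Simulated statistics that were at least as large as the observed one
n_as_large <- (nsim + 1) - observed_rank

p_value <- (n_as_large + 1) / (nsim + 1)
print(p_value)
```

Under this assumption, 0.001 is the smallest p-value the test can report with 999 simulations, so the observed statistic beat every permuted one.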
Spatial Regression Model
I’m using spatial filtering, and I seriously thought I’d left eigenvectors behind in my geophysics degree hahahah. I think I could also use Lagrange multiplier tests to determine whether a spatial error process is responsible for the autocorrelation. Since I did that in the previous lab, I figured it would be good practice to try spatial filtering instead. Maybe I’ve misinterpreted its use, so please correct me if I have these mixed up.
boston_sf = SpatialFiltering(boston_lm,
nb = boston_nbq,
style = "C",
alpha = 0.25,
ExactEV = FALSE,
data = boston)
boston_sf
## Step SelEvec Eval MinMi ZMinMi Pr(ZI) R2
## 0 0 0 0.00000000 0.4896362700 19.326222 3.232368e-83 0.6972148
## 1 1 7 1.01976116 0.4498779811 17.938340 5.919767e-72 0.7183388
## 2 2 4 1.05595298 0.4244771353 17.114182 1.163385e-65 0.7296685
## 3 3 2 1.08892867 0.3990835858 16.293918 1.090166e-59 0.7396196
## 4 4 32 0.78285626 0.3737948378 15.389422 1.927462e-53 0.7557167
## 5 5 5 1.04657625 0.3493298198 14.581540 3.681195e-48 0.7642881
## 6 6 17 0.91138421 0.3312103496 13.985240 1.918282e-44 0.7716496
## 7 7 26 0.82434374 0.3147100720 13.430103 4.028529e-41 0.7790429
## 8 8 24 0.84263531 0.2977183493 12.857417 7.814409e-38 0.7859328
## 9 9 1 1.10200225 0.2815984044 12.381431 3.294177e-35 0.7901390
## 10 10 34 0.75990795 0.2652377927 11.810447 3.447244e-32 0.7970799
## 11 11 27 0.81838257 0.2487065891 11.243033 2.506211e-29 0.8029683
## 12 12 28 0.81072471 0.2324894806 10.684470 1.203347e-26 0.8084942
## 13 13 15 0.93758388 0.2207091225 10.331850 5.057587e-25 0.8116412
## 14 14 39 0.72128688 0.2092026374 9.944024 2.677864e-23 0.8158737
## 15 15 20 0.88217580 0.1973889194 9.574645 1.022079e-21 0.8190501
## 16 16 33 0.76432411 0.1855425786 9.178597 4.367440e-20 0.8227538
## 17 17 21 0.86490095 0.1732409123 8.781989 1.606084e-18 0.8259062
## 18 18 9 0.98964053 0.1628175180 8.487154 2.117598e-17 0.8281009
## 19 19 3 1.06422280 0.1524737629 8.209926 2.213257e-16 0.8300511
## 20 20 8 1.00698727 0.1429157578 7.950968 1.850603e-15 0.8319310
## 21 21 22 0.85973178 0.1334705523 7.664660 1.793059e-14 0.8341168
## 22 22 31 0.78832267 0.1241954557 7.370544 1.699327e-13 0.8364335
## 23 23 11 0.97301135 0.1148022422 7.106080 1.193852e-12 0.8382238
## 24 24 46 0.67437647 0.1059041016 6.805127 1.009599e-11 0.8407560
## 25 25 13 0.95281777 0.0968899706 6.549322 5.779882e-11 0.8424331
## 26 26 35 0.75599917 0.0885398524 6.283693 3.306231e-10 0.8444043
## 27 27 63 0.55909757 0.0805691038 6.001333 1.957037e-09 0.8469960
## 28 28 104 0.34076882 0.0736949595 5.735656 9.713589e-09 0.8509341
## 29 29 10 0.98793676 0.0667899731 5.572382 2.512794e-08 0.8520515
## 30 30 65 0.54648673 0.0604311216 5.356994 8.461776e-08 0.8539871
## 31 31 109 0.31881220 0.0547457718 5.139829 2.749887e-07 0.8571308
## 32 32 82 0.45322663 0.0488762386 4.932798 8.105980e-07 0.8592046
## 33 33 75 0.49287049 0.0430599101 4.733740 2.204205e-06 0.8610252
## 34 34 84 0.44188320 0.0371815737 4.525164 6.034866e-06 0.8630438
## 35 35 95 0.38070448 0.0313324511 4.310157 1.631385e-05 0.8653367
## 36 36 70 0.51697397 0.0256567494 4.120960 3.772966e-05 0.8668924
## 37 37 81 0.45826403 0.0205552467 3.949054 7.846070e-05 0.8684437
## 38 38 16 0.91758361 0.0155572879 3.849040 1.185815e-04 0.8691727
## 39 39 59 0.58615433 0.0105308852 3.697228 2.179663e-04 0.8703151
## 40 40 45 0.68548530 0.0054426677 3.556654 3.756079e-04 0.8712854
## 41 41 101 0.35215298 0.0002866169 3.368322 7.562722e-04 0.8731715
## 42 42 6 1.02898787 -0.0046396103 3.287167 1.012010e-03 0.8737760
## 43 43 25 0.83773459 -0.0093519643 3.184700 1.449039e-03 0.8744781
## 44 44 18 0.90304794 -0.0140554507 3.092393 1.985495e-03 0.8751219
## 45 45 93 0.39144335 -0.0185922067 2.934601 3.339763e-03 0.8765036
## 46 46 38 0.73006172 -0.0228945863 2.833374 4.605944e-03 0.8772092
## 47 47 68 0.52901479 -0.0271270839 2.707103 6.787319e-03 0.8781437
## 48 48 23 0.84794127 -0.0310977371 2.637860 8.343110e-03 0.8786942
## 49 49 123 0.25392710 -0.0351391419 2.485342 1.294271e-02 0.8803901
## 50 50 41 0.71048089 -0.0391669034 2.392848 1.671815e-02 0.8810328
## 51 51 119 0.26690194 -0.0430305171 2.250225 2.443466e-02 0.8825158
## 52 52 43 0.69500963 -0.0467783223 2.168011 3.015784e-02 0.8831094
## 53 53 12 0.96415014 -0.0505515193 2.123456 3.371566e-02 0.8835441
## 54 54 145 0.16665892 -0.0541588617 1.980296 4.767023e-02 0.8854465
## 55 55 171 0.08009749 -0.0579036289 1.821582 6.851837e-02 0.8885550
## 56 56 48 0.66613026 -0.0615362965 1.739901 8.187648e-02 0.8891114
## 57 57 96 0.37509701 -0.0651572681 1.621581 1.048931e-01 0.8900234
## 58 58 42 0.69961188 -0.0688145075 1.542865 1.228634e-01 0.8905468
## 59 59 53 0.62433833 -0.0723141949 1.461439 1.438950e-01 0.8910967
## 60 60 137 0.20528165 -0.0758763613 1.325706 1.849372e-01 0.8924764
## 61 61 122 0.25496869 -0.0792814257 1.203522 2.287743e-01 0.8935718
## 62 62 77 0.47524095 -0.0827020650 1.107039 2.682771e-01 0.8942243
## gamma
## 0 0.0000000
## 1 -1.3334797
## 2 0.9765790
## 3 -0.9152345
## 4 1.1640503
## 5 0.8494244
## 6 0.7871955
## 7 -0.7888885
## 8 0.7615628
## 9 0.5950332
## 10 0.7643742
## 11 0.7040428
## 12 0.6820255
## 13 -0.5146911
## 14 -0.5968864
## 15 -0.5170967
## 16 0.5583571
## 17 -0.5151365
## 18 0.4298215
## 19 0.4051692
## 20 0.3978012
## 21 0.4289446
## 22 0.4416039
## 23 -0.3882003
## 24 0.4616900
## 25 0.3757276
## 26 -0.4073471
## 27 -0.4670815
## 28 -0.5757622
## 29 -0.3066933
## 30 -0.4036452
## 31 0.5144176
## 32 -0.4178211
## 33 0.3914732
## 34 -0.4122170
## 35 0.4393294
## 36 0.3618691
## 37 0.3613733
## 38 -0.2477084
## 39 -0.3101036
## 40 0.2857970
## 41 0.3984573
## 42 -0.2255702
## 43 0.2431220
## 44 0.2327870
## 45 -0.3410384
## 46 0.2437221
## 47 -0.2804701
## 48 0.2152529
## 49 0.3778388
## 50 0.2325867
## 51 -0.3533256
## 52 0.2235303
## 53 -0.1912817
## 54 0.4001803
## 55 0.5115327
## 56 0.2164089
## 57 0.2770783
## 58 -0.2099055
## 59 -0.2151389
## 60 -0.3408008
## 61 0.3036525
## 62 -0.2343610
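As a rough illustration of where filter eigenvectors come from, the sketch below uses a made-up toy layout of four areas in a row with binary weights, not the Boston data. It takes the eigenvectors of the doubly centred weights matrix M C M, where M = I - 11'/n. This is the intercept-only version of the idea; as far as I understand, SpatialFiltering also accounts for the model covariates, which this sketch does not attempt.

```r
# Toy example (hypothetical): 4 areas in a row (1-2-3-4), binary contiguity.
C <- matrix(0, 4, 4)
C[cbind(1:3, 2:4)] <- 1  # links 1-2, 2-3, 3-4
C <- C + t(C)            # make the weights symmetric

n <- nrow(C)
M <- diag(n) - matrix(1 / n, n, n)  # centering projector I - 11'/n

# Eigen-decomposition of the doubly centred weights matrix M C M
eig <- eigen(M %*% C %*% M, symmetric = TRUE)

# Moran's I of each eigenvector is (n / sum of weights) * eigenvalue
moran_of_vec <- (n / sum(C)) * eig$values
print(round(moran_of_vec, 3))
```

Eigenvectors with large positive eigenvalues describe smooth, strongly autocorrelated map patterns, which is why adding them as regressors can soak up positive autocorrelation in the residuals.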
Plotting the eigenvectors
Apply filter to the linear model
E_sel = fitted(boston_sf)
lm_sf = lm(log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS + E_sel,
data = boston)
summary(lm_sf)
##
## Call:
## lm(formula = log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS +
## E_sel, data = boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.54047 -0.08004 0.00351 0.08245 0.56076
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.401727 0.057743 76.230 < 2e-16 ***
## l_crime -0.033958 0.005246 -6.473 2.57e-10 ***
## l_LSTAT -0.516041 0.013811 -37.365 < 2e-16 ***
## l_nox 0.228824 0.060845 3.761 0.000192 ***
## INDUS -0.002745 0.001573 -1.745 0.081713 .
## E_selvec7 -1.333480 0.142416 -9.363 < 2e-16 ***
## E_selvec4 0.976579 0.142416 6.857 2.39e-11 ***
## E_selvec2 -0.915235 0.142416 -6.426 3.41e-10 ***
## E_selvec32 1.164050 0.142416 8.174 3.24e-15 ***
## E_selvec5 0.849424 0.142416 5.964 5.06e-09 ***
## E_selvec17 0.787196 0.142416 5.527 5.58e-08 ***
## E_selvec26 -0.788888 0.142416 -5.539 5.24e-08 ***
## E_selvec24 0.761563 0.142416 5.347 1.44e-07 ***
## E_selvec1 0.595033 0.142416 4.178 3.55e-05 ***
## E_selvec34 0.764374 0.142416 5.367 1.30e-07 ***
## E_selvec27 0.704043 0.142416 4.944 1.09e-06 ***
## E_selvec28 0.682026 0.142416 4.789 2.30e-06 ***
## E_selvec15 -0.514691 0.142416 -3.614 0.000336 ***
## E_selvec39 -0.596886 0.142416 -4.191 3.36e-05 ***
## E_selvec20 -0.517097 0.142416 -3.631 0.000316 ***
## E_selvec33 0.558357 0.142416 3.921 0.000102 ***
## E_selvec21 -0.515136 0.142416 -3.617 0.000333 ***
## E_selvec9 0.429821 0.142416 3.018 0.002692 **
## E_selvec3 0.405169 0.142416 2.845 0.004649 **
## E_selvec8 0.397801 0.142416 2.793 0.005447 **
## E_selvec22 0.428945 0.142416 3.012 0.002746 **
## E_selvec31 0.441604 0.142416 3.101 0.002054 **
## E_selvec11 -0.388200 0.142416 -2.726 0.006671 **
## E_selvec46 0.461690 0.142416 3.242 0.001278 **
## E_selvec13 0.375728 0.142416 2.638 0.008630 **
## E_selvec35 -0.407347 0.142416 -2.860 0.004435 **
## E_selvec63 -0.467081 0.142416 -3.280 0.001122 **
## E_selvec104 -0.575762 0.142416 -4.043 6.24e-05 ***
## E_selvec10 -0.306693 0.142416 -2.154 0.031823 *
## E_selvec65 -0.403645 0.142416 -2.834 0.004805 **
## E_selvec109 0.514418 0.142416 3.612 0.000339 ***
## E_selvec82 -0.417821 0.142416 -2.934 0.003524 **
## E_selvec75 0.391473 0.142416 2.749 0.006228 **
## E_selvec84 -0.412217 0.142416 -2.894 0.003987 **
## E_selvec95 0.439329 0.142416 3.085 0.002165 **
## E_selvec70 0.361869 0.142416 2.541 0.011399 *
## E_selvec81 0.361373 0.142416 2.537 0.011511 *
## E_selvec16 -0.247708 0.142416 -1.739 0.082678 .
## E_selvec59 -0.310104 0.142416 -2.177 0.029979 *
## E_selvec45 0.285797 0.142416 2.007 0.045386 *
## E_selvec101 0.398457 0.142416 2.798 0.005371 **
## E_selvec6 -0.225570 0.142416 -1.584 0.113941
## E_selvec25 0.243122 0.142416 1.707 0.088506 .
## E_selvec18 0.232787 0.142416 1.635 0.102859
## E_selvec93 -0.341038 0.142416 -2.395 0.017054 *
## E_selvec38 0.243722 0.142416 1.711 0.087725 .
## E_selvec68 -0.280470 0.142416 -1.969 0.049539 *
## E_selvec23 0.215253 0.142416 1.511 0.131397
## E_selvec123 0.377839 0.142416 2.653 0.008266 **
## E_selvec41 0.232587 0.142416 1.633 0.103154
## E_selvec119 -0.353326 0.142416 -2.481 0.013477 *
## E_selvec43 0.223530 0.142416 1.570 0.117238
## E_selvec12 -0.191282 0.142416 -1.343 0.179927
## E_selvec145 0.400180 0.142416 2.810 0.005177 **
## E_selvec171 0.511533 0.142416 3.592 0.000365 ***
## E_selvec48 0.216409 0.142416 1.520 0.129343
## E_selvec96 0.277078 0.142416 1.946 0.052346 .
## E_selvec42 -0.209906 0.142416 -1.474 0.141228
## E_selvec53 -0.215139 0.142416 -1.511 0.131601
## E_selvec137 -0.340801 0.142416 -2.393 0.017131 *
## E_selvec122 0.303653 0.142416 2.132 0.033549 *
## E_selvec77 -0.234361 0.142416 -1.646 0.100560
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1424 on 439 degrees of freedom
## Multiple R-squared: 0.8942, Adjusted R-squared: 0.8783
## F-statistic: 56.23 on 66 and 439 DF, p-value: < 2.2e-16
Re-run the Moran test, then use ANOVA to compare the two models
moran.test(residuals(lm_sf), boston_listw)
##
## Moran I test under randomisation
##
## data: residuals(lm_sf)
## weights: boston_listw
##
## Moran I statistic standard deviate = -3.2049, p-value = 0.9993
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## -0.0882748056 -0.0019801980 0.0007249807
anova(boston_lm, lm_sf)
## Analysis of Variance Table
##
## Model 1: log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS
## Model 2: log_CMEDV ~ l_crime + l_LSTAT + l_nox + INDUS + E_sel
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 501 25.4877
## 2 439 8.9039 62 16.584 13.188 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
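The Moran test on the filtered residuals no longer shows positive autocorrelation (I = -0.088, p = 0.999). The statistic is now slightly negative, which may mean the filter removed a bit more than it needed to. The ANOVA p-value is incredibly small (< 2.2e-16), which suggests that spatial filtering has significantly improved the fit of our model.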
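As a check on the ANOVA table, the F statistic can be rebuilt from the two residual sums of squares. This is a small sketch using the standard nested-model F test; the values are copied from the table above and the variable names are my own.

```r
# Values copied from the ANOVA table above
rss_lm <- 25.4877  # residual sum of squares, original model
rss_sf <- 8.9039   # residual sum of squares, filtered model
df_extra <- 62     # number of eigenvectors added
df_resid <- 439    # residual degrees of freedom, filtered model

# Drop in RSS per added term, relative to the remaining error variance
f_stat <- ((rss_lm - rss_sf) / df_extra) / (rss_sf / df_resid)
p_val <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)
print(round(f_stat, 3))
```

The result should agree with the F column of the table up to rounding of the inputs.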
Goodness-of-fit
summary(lm_sf)
As I stated above, the ANOVA p-value is incredibly small, so the filtered model fits significantly better than our original linear model. Also, our adjusted R-squared is 0.878 (up from 0.695 for the original model), which means the predictors plus the spatial filter explain about 88% of the variance in log house value.
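The adjusted R-squared values can be recovered from the multiple R-squared values with the standard formula. A short sketch, with n = 506 observations inferred from the degrees of freedom in the summaries; the function name adj_r2 is my own.

```r
adj_r2 <- function(r2, n, p) {
  # n = number of observations, p = predictors excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_lm <- adj_r2(0.6972, n = 506, p = 4)   # original linear model
adj_sf <- adj_r2(0.8942, n = 506, p = 66)  # 4 predictors + 62 eigenvectors
print(round(c(original = adj_lm, filtered = adj_sf), 4))
```

The penalty matters here because the filtered model spends 62 extra degrees of freedom on eigenvectors, so the adjusted value is the fairer one to compare.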
Variables influencing house prices
I originally chose these variables because I know I wouldn’t want to live in a high-crime area with a lot of industry. We would also expect an industrial area to have high levels of nitric oxide from semi-truck traffic. Looking at the actual statistical data, we can say:
Crime: Has a negative relationship with house value and a much smaller p-value (2.57e-10) than the industry and pollution variables, so the evidence for this relationship is stronger.
Industry: Negative relationship, but the p-value is about 0.08, so it is not significant at the 0.05 level and we cannot say that industry has an impact on house value.
Nitric Oxide: Positive relationship and a small p-value (0.0002), so it appears to have an impact on house value. We could interpret high NOX as a sign of being close to a lot of traffic like highways, so not only are you getting pollution, but also road noise. Road noise would be an interesting variable to add.
Lower Status Population: Negative relationship and a very small p-value (< 2e-16), which means it has an impact on house value.
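Because the response is log(CMEDV), the coefficients read as percentage effects rather than dollar amounts. The sketch below converts two of them, using estimates copied from the model summary; the choice of a 10% change in LSTAT is just for illustration.

```r
b_lstat <- -0.516041  # coefficient on log(LSTAT)
b_indus <- -0.002745  # coefficient on INDUS (not logged)

# log-log: a 10% increase in LSTAT multiplies CMEDV by 1.10^b
pct_lstat <- (1.10^b_lstat - 1) * 100

# log-level: a one-unit increase in INDUS multiplies CMEDV by exp(b)
pct_indus <- (exp(b_indus) - 1) * 100

print(round(c(LSTAT_10pct = pct_lstat, INDUS_1unit = pct_indus), 2))
```

Both results are percentage changes in median house value, holding the other terms in the model fixed.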
Conclusion
I don’t think this model is adequate on its own, because there are many more factors involved in determining house values, and I think we could get a better-fitting model with more variables like road noise, carbon dioxide, school ratings, and house sizes. But it is adequate enough to show that there is a relationship between house values and these variables.