The dataset source: https://www.openintro.org/data/?data=county_complete. The data has been collected for 3142 counties in the United States. It is a dataframe consisting of 3142 observations and 188 variables. We will be estimating models for bachelors_2017 dependent variable in the State of California.
setwd('C:\\Users\\HP\\OneDrive\\Pulpit\\DATA SCIENCE\\YEAR II\\Spatial Econometrics in R')
# reading maps with rgdal::
map <- readOGR(dsn="CA_Counties_TIGER2016.shp")
plot(map)
#Reading data from csv file
counties <- read.csv('county_complete.csv')
head(counties)
## fips state name pop2000 pop2010 pop2011 pop2012 pop2013 pop2014
## 1 1001 Alabama Autauga County 43671 54571 55199 54927 54695 54864
## 2 1003 Alabama Baldwin County 140415 182265 186534 190048 194736 199064
## 3 1005 Alabama Barbour County 29038 27457 27351 27175 26947 26749
## 4 1007 Alabama Bibb County 20826 22915 22745 22658 22503 22533
## 5 1009 Alabama Blount County 51024 57322 57562 57595 57623 57546
## 6 1011 Alabama Bullock County 11714 10914 10675 10612 10549 10673
## pop2015 pop2016 pop2017 age_under_5_2010 age_under_5_2017 age_under_18_2010
## 1 54838 55278 55504 6.6 5.7 26.8
## 2 202863 207509 212628 6.1 5.7 23.0
## 3 26264 25774 25270 6.2 5.5 21.9
## 4 22561 22633 22668 6.0 5.7 22.7
## 5 57590 57562 58013 6.3 6.1 24.6
## 6 10419 10441 10309 6.8 5.8 22.3
## age_over_65_2010 age_over_65_2017 median_age_2017 female_2010 white_2010
## 1 12.0 14.3 37.8 51.3 78.5
## 2 16.8 19.0 42.6 51.1 85.7
## 3 14.2 17.4 39.7 46.9 48.0
## 4 12.7 15.1 39.8 46.3 75.8
## 5 14.7 17.4 40.9 50.5 92.6
## 6 13.5 15.2 40.8 45.8 23.0
## black_2010 black_2017 native_2010 native_2017 asian_2010 asian_2017
## 1 17.7 9.55 0.4 0.15 0.9 0.47
## 2 9.4 4.77 0.7 0.41 0.7 0.35
## 3 46.9 24.02 0.4 0.10 0.4 0.31
## 4 22.0 11.03 0.3 0.18 0.1 0.00
## 5 1.3 0.79 0.5 0.18 0.2 0.07
## 6 70.2 37.82 0.2 0.52 0.2 0.35
## pac_isl_2010 pac_isl_2017 other_single_race_2017 two_plus_races_2010
## 1 NA 0.04 0.65 1.6
## 2 NA 0.00 0.39 1.5
## 3 NA 0.00 1.87 0.9
## 4 NA 0.00 0.02 0.9
## 5 NA 0.00 0.37 1.2
## 6 NA 0.00 0.01 0.8
## two_plus_races_2017 hispanic_2010 hispanic_2017 white_not_hispanic_2010
## 1 0.84 2.4 2.67 77.2
## 2 0.82 4.4 4.44 83.5
## 3 0.41 5.1 4.21 46.8
## 4 0.42 1.8 2.35 75.0
## 5 0.85 8.1 9.01 88.9
## 6 0.33 7.1 0.33 21.9
## white_not_hispanic_2017 speak_english_only_2017 no_move_in_one_plus_year_2010
## 1 75.42 96.2 86.3
## 2 83.08 94.5 83.0
## 3 45.74 94.3 83.0
## 4 74.62 97.8 90.5
## 5 87.37 92.3 87.2
## 6 21.61 97.2 88.5
## foreign_born_2010 foreign_spoken_at_home_2010 women_16_to_50_birth_rate_2017
## 1 2.0 3.7 7.4
## 2 3.6 5.5 5.1
## 3 2.8 4.7 7.2
## 4 0.7 1.5 7.6
## 5 4.7 7.2 5.6
## 6 1.1 3.8 3.5
## hs_grad_2010 hs_grad_2016 hs_grad_2017 some_college_2016 some_college_2017
## 1 85.3 87.6 87.7 28.7 29.1
## 2 87.6 90.0 90.2 31.8 31.6
## 3 71.9 73.8 73.1 26.0 25.5
## 4 74.5 80.7 82.1 26.9 25.0
## 5 74.7 80.0 79.8 34.0 34.4
## 6 74.7 66.6 71.4 22.2 21.3
## bachelors_2010 bachelors_2016 bachelors_2017 veterans_2010 veterans_2017
## 1 21.7 24.6 25.0 5817 12.6
## 2 26.8 29.5 30.7 20396 11.9
## 3 13.5 12.9 12.0 2327 8.0
## 4 10.0 12.0 13.2 1883 7.4
## 5 12.5 13.1 13.1 4072 9.6
## 6 12.0 10.3 13.4 943 4.5
## mean_work_travel_2010 mean_work_travel_2017 broadband_2017 computer_2017
## 1 25.1 25.8 76.6 86.2
## 2 25.8 27.0 74.5 86.9
## 3 23.8 23.4 57.2 73.4
## 4 28.3 30.0 62.0 74.8
## 5 33.2 35.0 65.8 78.2
## 6 28.1 29.8 49.4 64.2
## housing_units_2010 homeownership_2010 housing_multi_unit_2010
## 1 22135 77.5 7.2
## 2 104061 76.7 22.6
## 3 11829 68.0 11.1
## 4 8981 82.9 6.6
## 5 23887 82.0 3.7
## 6 4493 76.9 9.9
## median_val_owner_occupied_2010 households_2010 households_2017
## 1 133900 19718 21054
## 2 177200 69476 76133
## 3 88200 9795 9191
## 4 81200 7441 6916
## 5 113700 20605 20690
## 6 66300 3732 3670
## persons_per_household_2010 persons_per_household_2017 per_capita_income_2010
## 1 2.70 2.59 24568
## 2 2.50 2.63 26469
## 3 2.52 2.54 15875
## 4 3.02 2.97 19918
## 5 2.73 2.76 21070
## 6 2.85 2.74 20289
## per_capita_income_2017 metro_2013 median_household_income_2010
## 1 27841.70 1 53255
## 2 27779.85 1 50147
## 3 17891.73 0 33219
## 4 20572.05 1 41770
## 5 21367.39 1 45549
## 6 15444.16 0 31602
## median_household_income_2016 median_household_income_2017
## 1 54487 55317
## 2 56460 52562
## 3 32884 33368
## 4 43079 43404
## 5 47213 47412
## 6 34278 29655
## private_nonfarm_establishments_2009 private_nonfarm_employment_2009
## 1 877 10628
## 2 4812 52233
## 3 522 7990
## 4 318 2927
## 5 749 6968
## 6 120 1919
## percent_change_private_nonfarm_employment_2009
## 1 16.6
## 2 17.4
## 3 -27.0
## 4 -14.0
## 5 -11.4
## 6 -18.5
## nonemployment_establishments_2009 firms_2007 black_owned_firms_2007
## 1 2971 4067 15.2
## 2 14175 19035 2.7
## 3 1527 1667 NA
## 4 1192 1385 14.9
## 5 3501 4458 NA
## 6 390 417 NA
## native_owned_firms_2007 asian_owned_firms_2007 pac_isl_owned_firms_2007
## 1 NA 1.3 NA
## 2 0.4 1.0 NA
## 3 NA NA NA
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## hispanic_owned_firms_2007 women_owned_firms_2007 manufacturer_shipments_2007
## 1 0.7 31.7 NA
## 2 1.3 27.3 1410273
## 3 NA 27.0 NA
## 4 NA NA 0
## 5 NA 23.2 341544
## 6 NA 38.8 NA
## mercent_whole_sales_2007 sales_2007 sales_per_capita_2007
## 1 NA 598175 12003
## 2 NA 2966489 17166
## 3 NA 188337 6334
## 4 NA 124707 5804
## 5 NA 319700 5622
## 6 NA 43810 3995
## accommodation_food_service_2007 building_permits_2010 fed_spending_2009
## 1 88157 191 331142
## 2 436955 696 1119082
## 3 NA 10 240308
## 4 10757 8 163201
## 5 20941 18 294114
## 6 3670 1 108846
## area_2010 density_2010 smoking_ban_2010 poverty_2010 poverty_2016
## 1 594.44 91.8 none 10.6 13.5
## 2 1589.78 114.6 none 12.2 11.7
## 3 884.88 31.0 partial 25.0 29.9
## 4 622.58 36.8 none 12.6 20.1
## 5 644.78 88.9 none 13.4 14.1
## 6 622.81 17.5 none 25.3 32.6
## poverty_2017 poverty_age_under_5_2017 poverty_age_under_18_2017
## 1 13.7 17.2 20.0
## 2 11.8 19.4 15.9
## 3 27.2 56.8 44.9
## 4 15.2 21.6 25.9
## 5 15.6 29.5 25.3
## 6 28.5 59.7 50.2
## civilian_labor_force_2007 employed_2007 unemployed_2007
## 1 24383 23577 806
## 2 82659 80099 2560
## 3 10334 9684 650
## 4 8791 8432 359
## 5 26629 25780 849
## 6 3653 3308 345
## unemployment_rate_2007 civilian_labor_force_2008 employed_2008
## 1 3.31 24687 23420
## 2 3.10 83223 79372
## 3 6.29 10161 9267
## 4 4.08 8749 8241
## 5 3.19 26698 25453
## 6 9.44 3634 3251
## unemployed_2008 unemployment_rate_2008 civilian_labor_force_2009
## 1 1267 5.13 24703
## 2 3851 4.63 82451
## 3 894 8.80 10003
## 4 508 5.81 8742
## 5 1245 4.66 26480
## 6 383 10.54 3739
## employed_2009 unemployed_2009 unemployment_rate_2009
## 1 22301 2402 9.72
## 2 74403 8048 9.76
## 3 8572 1431 14.31
## 4 7581 1161 13.28
## 5 23832 2648 10.00
## 6 3156 583 15.59
## civilian_labor_force_2010 employed_2010 unemployed_2010
## 1 25713 23431 2282
## 2 83459 75120 8339
## 3 10221 8959 1262
## 4 8934 7914 1020
## 5 24906 22460 2446
## 6 4941 4356 585
## unemployment_rate_2010 civilian_labor_force_2011 employed_2011
## 1 8.87 25836 23677
## 2 9.99 85045 77418
## 3 12.35 9849 8712
## 4 11.42 8933 7996
## 5 9.82 25123 22939
## 6 11.84 4833 4272
## unemployed_2011 unemployment_rate_2011 civilian_labor_force_2012
## 1 2159 8.36 25740
## 2 7627 8.97 84414
## 3 1137 11.54 9362
## 4 937 10.49 8798
## 5 2184 8.69 24960
## 6 561 11.61 4736
## employed_2012 unemployed_2012 unemployment_rate_2012
## 1 23961 1779 6.91
## 2 78065 6349 7.52
## 3 8283 1079 11.53
## 4 8047 751 8.54
## 5 23244 1716 6.88
## 6 4245 491 10.37
## civilian_labor_force_2013 employed_2013 unemployed_2013
## 1 25810 24205 1605
## 2 85280 79626 5654
## 3 9099 8168 931
## 4 8705 8016 689
## 5 24887 23325 1562
## 6 4778 4331 447
## unemployment_rate_2013 civilian_labor_force_2014 employed_2014
## 1 6.22 25602 24107
## 2 6.63 86415 81115
## 3 10.23 8848 7916
## 4 7.91 8562 7945
## 5 6.28 24535 23032
## 6 9.36 4733 4315
## unemployed_2014 unemployment_rate_2014 civilian_labor_force_2015
## 1 1495 5.84 25602
## 2 5300 6.13 87705
## 3 932 10.53 8609
## 4 617 7.21 8572
## 5 1503 6.13 24473
## 6 418 8.83 4777
## employed_2015 unemployed_2015 unemployment_rate_2015
## 1 24272 1330 5.19
## 2 82843 4862 5.54
## 3 7844 765 8.89
## 4 8005 567 6.61
## 5 23152 1321 5.40
## 6 4398 379 7.93
## civilian_labor_force_2016 employed_2016 unemployed_2016
## 1 25918 24593 1325
## 2 90500 85656 4844
## 3 8402 7700 702
## 4 8607 8050 557
## 5 24576 23248 1328
## 6 4824 4493 331
## unemployment_rate_2016 uninsured_2017 uninsured_age_under_6_2017
## 1 5.11 8.8 1.1
## 2 5.35 10.8 2.4
## 3 8.36 12.3 4.1
## 4 6.47 8.1 2.7
## 5 5.40 11.0 3.9
## 6 6.86 14.1 0
## uninsured_age_under_19_2017 uninsured_age_over_74_2017
## 1 3.2 0.0
## 2 3.5 0.4
## 3 4.6 0.0
## 4 1.8 0.0
## 5 5.7 0.2
## 6 1.2 0.0
## civilian_labor_force_2017 employed_2017 unemployed_2017
## 1 25909 24908 1001
## 2 91567 87915 3652
## 3 8236 7750 486
## 4 8506 8133 373
## 5 24494 23509 985
## 6 4812 4575 237
## unemployment_rate_2017 age_over_18_2019 age_over_65_2019 age_over_85_2019
## 1 3.86 76.2 15.0 1.6
## 2 3.99 78.3 20.0 1.9
## 3 5.90 79.1 18.6 1.6
## 4 4.39 79.4 15.9 2.0
## 5 4.02 76.8 17.9 1.8
## 6 4.93 79.2 16.0 1.7
## age_under_5_2019 asian_2019 avg_family_size_2019 bachelors_2019 black_2019
## 1 5.8 1.0 3.09 26.6 19.0
## 2 5.5 0.9 3.24 31.9 9.3
## 3 5.3 0.5 3.01 11.6 47.6
## 4 5.8 0.1 3.74 10.4 22.3
## 5 5.9 0.4 3.33 13.1 1.6
## 6 5.3 0.5 3.30 12.1 74.8
## hispanic_2019 household_has_broadband_2019 household_has_computer_2019
## 1 2.8 80.6 73.0
## 2 4.6 81.8 76.3
## 3 4.4 60.5 51.9
## 4 2.6 69.2 54.7
## 5 9.3 73.0 63.5
## 6 2.6 60.1 52.5
## household_has_smartphone_2019 households_2019
## 1 78.4 21397
## 2 81.7 80930
## 3 64.2 9345
## 4 66.6 6891
## 5 70.1 20847
## 6 70.6 3521
## households_speak_asian_or_pac_isl_2019 households_speak_limited_english_2019
## 1 1.8 0.7
## 2 0.6 1.2
## 3 0.6 1.6
## 4 0.0 0.6
## 5 0.1 1.8
## 6 0.4 1.4
## households_speak_other_2019 households_speak_other_indo_euro_lang_2019
## 1 0.2 0.3
## 2 0.0 1.8
## 3 0.0 1.1
## 4 0.0 0.5
## 5 0.2 0.9
## 6 0.0 1.6
## households_speak_spanish_2019 housing_mobile_homes_2019
## 1 2.9 26.7
## 2 4.6 24.8
## 3 5.2 39.1
## 4 1.9 25.6
## 5 6.6 21.2
## 6 3.6 28.9
## housing_one_unit_structures_2019 housing_two_unit_structures_2019
## 1 17.3 73.3
## 2 11.5 75.2
## 3 26.1 60.9
## 4 29.7 74.4
## 5 24.0 78.8
## 6 39.3 71.1
## hs_grad_2019 mean_household_income_2019 mean_work_travel_2019 median_age_2019
## 1 88.5 75326 24.4 38.2
## 2 90.8 80986 NA 43.0
## 3 73.2 47068 NA 40.4
## 4 79.1 60182 NA 40.9
## 5 80.5 65639 NA 40.7
## 6 74.7 48571 27.2 40.2
## median_household_income_2019 median_individual_income_2019
## 1 58731 29725
## 2 58320 29802
## 3 32525 17963
## 4 47542 21958
## 5 49358 26976
## 6 37785 21571
## median_individual_income_age_25plus_2019 native_2019 other_single_race_2019
## 1 40778 0.3 0.7
## 2 37897 0.8 1.1
## 3 27434 0.3 3.6
## 4 28789 0.1 0.0
## 5 39004 0.1 0.9
## 6 29516 0.0 2.0
## pac_isl_2019 per_capita_income_2019 persons_per_household_2019 pop_2019
## 1 0 29819 2.56 55380
## 2 0 32626 2.59 212830
## 3 0 18473 2.41 25361
## 4 0 20778 2.99 22493
## 5 0 24747 2.74 57681
## 6 0 20877 2.79 10248
## poverty_2019 poverty_65_and_over_2019 poverty_under_18_2019
## 1 15.2 8.7 23.2
## 2 10.4 7.4 13.4
## 3 30.7 16.8 50.1
## 4 NA NA NA
## 5 13.6 10.9 18.4
## 6 NA NA NA
## two_plus_races_2019 unemployment_rate_2019 uninsured_2019
## 1 2.2 3.5 7.1
## 2 1.7 4.0 8.9
## 3 1.2 9.4 11.3
## 4 0.6 7.0 10.7
## 5 1.6 3.1 10.8
## 6 0.8 4.1 11.4
## uninsured_65_and_older_2019 uninsured_under_19_2019 uninsured_under_6_2019
## 1 0.0 1.7 1.7
## 2 0.3 3.8 2.2
## 3 0.3 3.3 3.4
## 4 0.0 2.0 4.5
## 5 0.2 5.9 6.1
## 6 0.0 1.0 2.1
## veterans_2019 white_2019 white_not_hispanic_2019
## 1 12.6 76.8 74.6
## 2 11.8 86.2 83.1
## 3 6.6 46.8 45.8
## 4 8.0 76.8 74.5
## 5 7.7 95.5 86.9
## 6 3.3 21.9 21.4
#Variable selection, we will be using 17 variables out of 188. Below we also filter data only for California.
# Filter DF for desired columns only
counties <- counties[, c('state', 'name', 'fips',
'pop2017', 'median_age_2017', 'bachelors_2017',
'poverty_2017', 'poverty_age_under_18_2017',
'uninsured_2017', 'unemployment_rate_2017',
'per_capita_income_2017', 'median_household_income_2017',
'women_16_to_50_birth_rate_2017', 'hs_grad_2017',
'mean_work_travel_2017', 'broadband_2017',
'persons_per_household_2017')]
counties_CA <- counties[counties$state == 'California',]
nrow(counties_CA)
## [1] 58
# Bachelor's degree
variable <- counties_CA$bachelors_2017
summary(variable)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.40 16.90 23.30 26.42 33.67 57.50
intervals<-5
colors<-brewer.pal(intervals, "BuPu") # choice of colors
classes<-classIntervals(variable, intervals, style="fixed",
fixedBreaks=c(seq(10, 60, 10)))
color.table<-findColours(classes, colors)
plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Percent of population that earned a bachelor's degree")
# Median household income
summary(counties_CA$median_household_income_2017)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36563 47496 56819 61047 71474 106761
intervals<-5
colors<-brewer.pal(intervals, "OrRd") # choice of colors
classes<-classIntervals(counties_CA$median_household_income_2017, intervals, style="fixed",
fixedBreaks=c(seq(35000, 110000, 15000)))
color.table<-findColours(classes, colors)
plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Median household income")
# Percent of population under age 18 below poverty level
summary(counties_CA$poverty_age_under_18_2017)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.80 12.85 19.25 19.59 24.43 36.20
intervals<-7
colors<-brewer.pal(intervals, "OrRd") # choice of colors
classes<-classIntervals(counties_CA$poverty_age_under_18_2017, intervals, style="fixed",
fixedBreaks=c(seq(5, 40, 5)))
color.table<-findColours(classes, colors)
plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Percent of population under age 18 below poverty level")
We compute the spatial weights matrix – contiguity matrix and present the spatial relationships in our dataset by plotting a neighbourhood.
cont.nb<-poly2nb(as(map, "SpatialPolygons"))
cont.listw<-nb2listw(cont.nb, style="W")
cont.listw
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 58
## Number of nonzero links: 288
## Percentage nonzero weights: 8.561237
## Average number of links: 4.965517
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 58 3364 58 24.65282 236.1432
crds<-coordinates(map)
colnames(crds)<-c("cx", "cy")
plot(map) # contour map
plot(cont.nb, crds, add=TRUE)
We measure the Moran’s I to measure a global spatial autocorrelation to get more information about the spatial patterns that appear in the data.
The scatterplot illustrates the relationship between values at each location and the average value for the neighbours. We can see that the value of bachelors vaiable in 2017 in a county is associated with the average valus of it’s neighbouring counties.
#### Morans Scatterplot ####
# preparing the data
x<-counties_CA$bachelors_2017 # selecting variable
zx<-as.data.frame(scale(x)) #standardization
mp <- moran.plot(zx$V1, cont.listw, pch=19, labels=as.character(counties_CA$county))
if (require(ggplot2, quietly=TRUE)) {
xname <- attr(mp, "xname")
ggplot(mp, aes(x=x, y=wx)) + geom_point(shape=1) +
geom_smooth(formula=y ~ x, method="lm") +
geom_hline(yintercept=mean(mp$wx), lty=2) +
geom_vline(xintercept=mean(mp$x), lty=2) + theme_minimal() +
geom_point(data=mp[mp$is_inf,], aes(x=x, y=wx), shape=9) +
geom_text(data=mp[mp$is_inf,], aes(x=x, y=wx, label=labels, vjust=1.5)) +
xlab(xname) + ylab(paste0("Spatially lagged ", xname))
}
# checking the average and standard deviation
round(mean(zx$V1),0)
## [1] 0
sd(zx$V1)
## [1] 1
wzx<-lag.listw(cont.listw, zx$V1) # spatial lag of x
To gain more insight about the relationship between the value in a county and value at neighbouring counties, we used the mapping of quadrants.
# Mapping of quadrants
cond1<-ifelse(zx>=0 & wzx>=0, 1,0) # I quarter
cond2<-ifelse(zx>=0 & wzx<0, 2,0) # II quarter
cond3<-ifelse(zx<0 & wzx<0, 3,0) # III quarter
cond4<-ifelse(zx<0 & wzx>=0, 4,0) # IV quarter
cond.all<-cond1+cond2+cond3+cond4
cond<-as.data.frame(cond.all)
is.data.frame(cond)
## [1] TRUE
# map
brks<-c(1,2,3,4)
cols<-brewer.pal(intervals, "Paired") # choice of colors
plot(map, col=cols[findInterval(cond$V1, brks)])
legend("topright", legend=c("I Q - HH – high surrounded by high", "II Q - LH -
low surrounded by high", "III Q - LL - low surrounded by low", "IV Q - HL - high
surrounded by low"), fill=cols, bty="n", cex=1)
title(main="Mapping of Moranscatterplot results poverty")
Estimation of OLS:
colnames(counties)
## [1] "state" "name"
## [3] "fips" "pop2017"
## [5] "median_age_2017" "bachelors_2017"
## [7] "poverty_2017" "poverty_age_under_18_2017"
## [9] "uninsured_2017" "unemployment_rate_2017"
## [11] "per_capita_income_2017" "median_household_income_2017"
## [13] "women_16_to_50_birth_rate_2017" "hs_grad_2017"
## [15] "mean_work_travel_2017" "broadband_2017"
## [17] "persons_per_household_2017"
eq = bachelors_2017 ~ median_household_income_2017 + hs_grad_2017 + poverty_2017
model <- lm(eq, data = counties_CA)
summary(model)
##
## Call:
## lm(formula = eq, data = counties_CA)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9162 -2.9486 0.5072 2.9215 9.7108
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.003e+01 1.287e+01 -6.219 7.59e-08 ***
## median_household_income_2017 6.302e-04 5.608e-05 11.237 9.40e-16 ***
## hs_grad_2017 6.715e-01 1.043e-01 6.435 3.39e-08 ***
## poverty_2017 7.354e-01 2.227e-01 3.303 0.0017 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.643 on 54 degrees of freedom
## Multiple R-squared: 0.8412, Adjusted R-squared: 0.8324
## F-statistic: 95.37 on 3 and 54 DF, p-value: < 2.2e-16
# Moran test – important – justifies using spatial methods
lm.morantest(model, cont.listw) # are residuals spatially random?
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = eq, data = counties_CA)
## weights: cont.listw
##
## Moran I statistic standard deviate = 2.222, p-value = 0.01314
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.164932549 -0.018455127 0.006811478
moran.test(model$residuals, cont.listw)
##
## Moran I test under randomisation
##
## data: model$residuals
## weights: cont.listw
##
## Moran I statistic standard deviate = 2.2283, p-value = 0.01293
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.164932549 -0.017543860 0.006706333
We perform the Moran test to check if we can use the spatial models. The p-value < 5% significance level, so we reject the null hypothesis stating that there is no spatial autocorrelation. This means that the use of spatial models is justified.
Firstly, we estimate the Manski model, however it is rarely the case that this model will be selected at the end, we will rather go for a different one.
# Manski model (full specification)
GNS <- sacsarlm(eq, data = counties_CA, listw = cont.listw, type = 'sacmixed')
summary(GNS)
##
## Call:sacsarlm(formula = eq, data = counties_CA, listw = cont.listw,
## type = "sacmixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.11193 -2.64286 -0.12379 3.17117 8.54747
##
## Type: sacmixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.0038e+02 5.7053e+01 -1.7593 0.07852
## median_household_income_2017 5.8238e-04 5.3732e-05 10.8388 < 2.2e-16
## hs_grad_2017 6.2234e-01 9.7145e-02 6.4063 1.491e-10
## poverty_2017 5.4225e-01 2.2667e-01 2.3922 0.01675
## lag.median_household_income_2017 -5.5358e-05 3.4326e-04 -0.1613 0.87188
## lag.hs_grad_2017 1.1886e-01 4.0181e-01 0.2958 0.76737
## lag.poverty_2017 7.8877e-01 6.9826e-01 1.1296 0.25864
##
## Rho: 0.42776
## Asymptotic standard error: 0.40988
## z-value: 1.0436, p-value: 0.29666
## Lambda: -0.26011
## Asymptotic standard error: 0.57915
## z-value: -0.44912, p-value: 0.65335
##
## LR test value: 10.95, p-value: 0.052381
##
## Log likelihood: -163.803 for sacmixed model
## ML residual variance (sigma squared): 15.702, (sigma: 3.9626)
## Number of observations: 58
## Number of parameters estimated: 10
## AIC: 347.61, (AIC for lm: 348.56)
In the Manski model, we see that the rho is significant and lambda is not significant (p-value > 5% significance level), therefore we go for the model with two components (lambda = 0) - SDM.
# SDM
SDM <-lagsarlm(eq, data=counties_CA, listw=cont.listw, type = 'mixed') # no spatial lags of X
summary(SDM)
##
## Call:lagsarlm(formula = eq, data = counties_CA, listw = cont.listw,
## type = "mixed")
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.3721 -2.6717 -0.1549 3.2571 8.4184
##
## Type: mixed
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1606e+02 3.4220e+01 -3.3915 0.0006952
## median_household_income_2017 5.9417e-04 5.0670e-05 11.7264 < 2.2e-16
## hs_grad_2017 6.4007e-01 9.4160e-02 6.7977 1.063e-11
## poverty_2017 6.0205e-01 2.0278e-01 2.9690 0.0029879
## lag.median_household_income_2017 5.5337e-05 1.8537e-04 0.2985 0.7653026
## lag.hs_grad_2017 2.1206e-01 2.6520e-01 0.7996 0.4239233
## lag.poverty_2017 9.3951e-01 4.7201e-01 1.9904 0.0465416
##
## Rho: 0.2604, LR test value: 1.6538, p-value: 0.19844
## Asymptotic standard error: 0.17326
## z-value: 1.5029, p-value: 0.13286
## Wald statistic: 2.2588, p-value: 0.13286
##
## Log likelihood: -164.0098 for mixed model
## ML residual variance (sigma squared): 16.488, (sigma: 4.0605)
## Number of observations: 58
## Number of parameters estimated: 9
## AIC: 346.02, (AIC for lm: 345.67)
## LM test for residual autocorrelation
## test value: 1.6372, p-value: 0.20071
According to SDM Model, the rho is not statistically significant (p-value > 5% significance level)
# SAR
SAR <- lagsarlm(eq, data=counties_CA, listw=cont.listw) # no spatial lags of X
summary(SAR)
##
## Call:lagsarlm(formula = eq, data = counties_CA, listw = cont.listw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.080012 -2.907337 -0.048908 2.619618 9.401488
##
## Type: lag
## Coefficients: (asymptotic standard errors)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.9176e+01 1.1938e+01 -6.6324 3.304e-11
## median_household_income_2017 6.0631e-04 5.2249e-05 11.6041 < 2.2e-16
## hs_grad_2017 6.2398e-01 9.7505e-02 6.3995 1.559e-10
## poverty_2017 6.1388e-01 2.0763e-01 2.9567 0.00311
##
## Rho: 0.24711, LR test value: 4.2181, p-value: 0.039996
## Asymptotic standard error: 0.11677
## z-value: 2.1162, p-value: 0.034329
## Wald statistic: 4.4782, p-value: 0.034329
##
## Log likelihood: -167.169 for lag model
## ML residual variance (sigma squared): 18.414, (sigma: 4.2912)
## Number of observations: 58
## Number of parameters estimated: 6
## AIC: 346.34, (AIC for lm: 348.56)
## LM test for residual autocorrelation
## test value: 0.34544, p-value: 0.5567
SLX - OLS with thetas only
# an ‘lm’ model augmented with the spatially lagged RHS variables
# RHS variables – right-hand side variables
SLX<-lmSLX(eq, data=counties_CA, listw=cont.listw)
summary(SLX)
##
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))),
## data = as.data.frame(x), weights = weights)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.8670 -2.9426 0.0569 3.4205 8.8876
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.502e+02 2.997e+01 -5.010 6.91e-06 ***
## median_household_income_2017 6.077e-04 5.446e-05 11.159 2.69e-15 ***
## hs_grad_2017 6.627e-01 1.016e-01 6.520 3.11e-08 ***
## poverty_2017 6.703e-01 2.153e-01 3.113 0.00303 **
## lag.median_household_income_2017 2.641e-04 1.466e-04 1.802 0.07748 .
## lag.hs_grad_2017 4.335e-01 2.481e-01 1.747 0.08663 .
## lag.poverty_2017 1.310e+00 4.573e-01 2.865 0.00604 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.425 on 51 degrees of freedom
## Multiple R-squared: 0.8638, Adjusted R-squared: 0.8477
## F-statistic: 53.89 on 6 and 51 DF, p-value: < 2.2e-16
screenreg(list(model, GNS, SDM, SLX, SAR), custom.model.names=c("OLS", "GNS", "SDM", "SLX", "SAR"))
##
## ================================================================================================
## OLS GNS SDM SLX SAR
## ------------------------------------------------------------------------------------------------
## (Intercept) -80.03 *** -100.38 -116.06 *** -150.16 *** -79.18 ***
## (12.87) (57.05) (34.22) (29.97) (11.94)
## median_household_income_2017 0.00 *** 0.00 *** 0.00 *** 0.00 *** 0.00 ***
## (0.00) (0.00) (0.00) (0.00) (0.00)
## hs_grad_2017 0.67 *** 0.62 *** 0.64 *** 0.66 *** 0.62 ***
## (0.10) (0.10) (0.09) (0.10) (0.10)
## poverty_2017 0.74 ** 0.54 * 0.60 ** 0.67 ** 0.61 **
## (0.22) (0.23) (0.20) (0.22) (0.21)
## lag.median_household_income_2017 -0.00 0.00 0.00
## (0.00) (0.00) (0.00)
## lag.hs_grad_2017 0.12 0.21 0.43
## (0.40) (0.27) (0.25)
## lag.poverty_2017 0.79 0.94 * 1.31 **
## (0.70) (0.47) (0.46)
## rho 0.43 0.26 0.25 *
## (0.41) (0.17) (0.12)
## lambda -0.26
## (0.58)
## ------------------------------------------------------------------------------------------------
## R^2 0.84 0.86
## Adj. R^2 0.83 0.85
## Num. obs. 58 58 58 58
## Parameters 10 9 6
## Log Likelihood -163.80 -164.01 -164.84 -167.17
## AIC (Linear model) 348.56 345.67 348.56
## AIC (Spatial model) 347.61 346.02 346.34
## LR test: statistic 10.95 1.65 4.22
## LR test: p-value 0.05 0.20 0.04
## Sigma 4.43
## Statistic 53.89
## P Value 0.00
## DF 6.00
## AIC 345.67
## BIC 362.16
## Deviance 998.84
## DF Resid. 51
## nobs 58
## ================================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05
According to AIC and Log Likelihood, the SAR models appears to be the best one. In all the models besides OLS, the variables hs_grad_2017 and median_household_income_2017 are significant. From the spatial lags, only lag.poverty_2017 is significant for SDM and SLX. In terms of the significance of spatial components we see that only rho component for SAR model is significant.
For SDM:
W.c<-as(as_dgRMatrix_listw(cont.listw), "CsparseMatrix")
trMat<-trW(W.c, type="mult")
SDM_imp<-impacts(SDM, tr=trMat, R=2000)
summary(SDM_imp, zstats=TRUE, short=TRUE)
## Impact measures (mixed, trace):
## Direct Indirect Total
## median_household_income_2017 0.000606837 0.0002713587 0.0008781957
## hs_grad_2017 0.662897417 0.4892570048 1.1521544217
## poverty_2017 0.668140799 1.4161886849 2.0843294843
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
## Direct Indirect Total
## median_household_income_2017 5.189702e-05 0.0002006386 0.0002205986
## hs_grad_2017 9.559194e-02 0.3311415860 0.3755938657
## poverty_2017 2.025549e-01 0.6343398490 0.7059595618
##
## Simulated z-values:
## Direct Indirect Total
## median_household_income_2017 11.653902 1.348657 3.968272
## hs_grad_2017 6.947262 1.482676 3.075337
## poverty_2017 3.281640 2.241049 2.955267
##
## Simulated p-values:
## Direct Indirect Total
## median_household_income_2017 < 2.22e-16 0.177447 7.2396e-05
## hs_grad_2017 3.7244e-12 0.138161 0.0021026
## poverty_2017 0.0010321 0.025023 0.0031240
To get the direct and spillover effect for SDM we need to compute the coefficients:
# extracting direct & total impacts
a<-SDM_imp$res$direct
b<-SDM_imp$res$total
a/b
## median_household_income_2017 hs_grad_2017
## 0.6910043 0.5753547
## poverty_2017
## 0.3205543
We can see that there aren’t any indirect effects, because for all variables the p-values are above the significance level.
For SLX:
W.c<-as(as_dgRMatrix_listw(cont.listw), "CsparseMatrix")
trMat<-trW(W.c, type="mult")
SLX_imp<-impacts(SLX, tr=trMat, R=2000)
summary(SLX_imp, zstats=TRUE, short=TRUE)
## Impact measures (SlX, estimable, n-k):
## Direct Indirect Total
## median_household_income_2017 0.0006076923 0.0002640818 0.0008717741
## hs_grad_2017 0.6627366864 0.4335430283 1.0962797147
## poverty_2017 0.6702821460 1.3102431635 1.9805253095
## ========================================================
## Standard errors:
## Direct Indirect Total
## median_household_income_2017 5.445968e-05 0.0001465605 0.0001538927
## hs_grad_2017 1.016390e-01 0.2481441345 0.2720093427
## poverty_2017 2.152945e-01 0.4572845481 0.4871987085
## ========================================================
## Z-values:
## Direct Indirect Total
## median_household_income_2017 11.158574 1.801862 5.664818
## hs_grad_2017 6.520494 1.747142 4.030302
## poverty_2017 3.113327 2.865269 4.065128
##
## p-values:
## Direct Indirect Total
## median_household_income_2017 < 2.22e-16 0.0715672 1.4718e-08
## hs_grad_2017 7.0076e-11 0.0806127 5.5705e-05
## poverty_2017 0.0018499 0.0041666 4.8006e-05
We can only interpret the indirect effects for poverty_2017 variable as it statistically significant. The interpretation can be done without the recomputing of coefficients.