The Datset information

The dataset source: https://www.openintro.org/data/?data=county_complete. The data has been collected for 3142 counties in the United States. It is a dataframe consisting of 3142 observations and 188 variables. We will be estimating models for bachelors_2017 dependent variable in the State of California.

setwd('C:\\Users\\HP\\OneDrive\\Pulpit\\DATA SCIENCE\\YEAR II\\Spatial Econometrics in R')

Shapefile

# reading maps with rgdal::
map <- readOGR(dsn="CA_Counties_TIGER2016.shp")
plot(map)

Loading non-spatial data

#Reading data from csv file
counties <- read.csv('county_complete.csv')
head(counties)

##   fips   state           name pop2000 pop2010 pop2011 pop2012 pop2013 pop2014
## 1 1001 Alabama Autauga County   43671   54571   55199   54927   54695   54864
## 2 1003 Alabama Baldwin County  140415  182265  186534  190048  194736  199064
## 3 1005 Alabama Barbour County   29038   27457   27351   27175   26947   26749
## 4 1007 Alabama    Bibb County   20826   22915   22745   22658   22503   22533
## 5 1009 Alabama  Blount County   51024   57322   57562   57595   57623   57546
## 6 1011 Alabama Bullock County   11714   10914   10675   10612   10549   10673
##   pop2015 pop2016 pop2017 age_under_5_2010 age_under_5_2017 age_under_18_2010
## 1   54838   55278   55504              6.6              5.7              26.8
## 2  202863  207509  212628              6.1              5.7              23.0
## 3   26264   25774   25270              6.2              5.5              21.9
## 4   22561   22633   22668              6.0              5.7              22.7
## 5   57590   57562   58013              6.3              6.1              24.6
## 6   10419   10441   10309              6.8              5.8              22.3
##   age_over_65_2010 age_over_65_2017 median_age_2017 female_2010 white_2010
## 1             12.0             14.3            37.8        51.3       78.5
## 2             16.8             19.0            42.6        51.1       85.7
## 3             14.2             17.4            39.7        46.9       48.0
## 4             12.7             15.1            39.8        46.3       75.8
## 5             14.7             17.4            40.9        50.5       92.6
## 6             13.5             15.2            40.8        45.8       23.0
##   black_2010 black_2017 native_2010 native_2017 asian_2010 asian_2017
## 1       17.7       9.55         0.4        0.15        0.9       0.47
## 2        9.4       4.77         0.7        0.41        0.7       0.35
## 3       46.9      24.02         0.4        0.10        0.4       0.31
## 4       22.0      11.03         0.3        0.18        0.1       0.00
## 5        1.3       0.79         0.5        0.18        0.2       0.07
## 6       70.2      37.82         0.2        0.52        0.2       0.35
##   pac_isl_2010 pac_isl_2017 other_single_race_2017 two_plus_races_2010
## 1           NA         0.04                   0.65                 1.6
## 2           NA         0.00                   0.39                 1.5
## 3           NA         0.00                   1.87                 0.9
## 4           NA         0.00                   0.02                 0.9
## 5           NA         0.00                   0.37                 1.2
## 6           NA         0.00                   0.01                 0.8
##   two_plus_races_2017 hispanic_2010 hispanic_2017 white_not_hispanic_2010
## 1                0.84           2.4          2.67                    77.2
## 2                0.82           4.4          4.44                    83.5
## 3                0.41           5.1          4.21                    46.8
## 4                0.42           1.8          2.35                    75.0
## 5                0.85           8.1          9.01                    88.9
## 6                0.33           7.1          0.33                    21.9
##   white_not_hispanic_2017 speak_english_only_2017 no_move_in_one_plus_year_2010
## 1                   75.42                    96.2                          86.3
## 2                   83.08                    94.5                          83.0
## 3                   45.74                    94.3                          83.0
## 4                   74.62                    97.8                          90.5
## 5                   87.37                    92.3                          87.2
## 6                   21.61                    97.2                          88.5
##   foreign_born_2010 foreign_spoken_at_home_2010 women_16_to_50_birth_rate_2017
## 1               2.0                         3.7                            7.4
## 2               3.6                         5.5                            5.1
## 3               2.8                         4.7                            7.2
## 4               0.7                         1.5                            7.6
## 5               4.7                         7.2                            5.6
## 6               1.1                         3.8                            3.5
##   hs_grad_2010 hs_grad_2016 hs_grad_2017 some_college_2016 some_college_2017
## 1         85.3         87.6         87.7              28.7              29.1
## 2         87.6         90.0         90.2              31.8              31.6
## 3         71.9         73.8         73.1              26.0              25.5
## 4         74.5         80.7         82.1              26.9              25.0
## 5         74.7         80.0         79.8              34.0              34.4
## 6         74.7         66.6         71.4              22.2              21.3
##   bachelors_2010 bachelors_2016 bachelors_2017 veterans_2010 veterans_2017
## 1           21.7           24.6           25.0          5817          12.6
## 2           26.8           29.5           30.7         20396          11.9
## 3           13.5           12.9           12.0          2327           8.0
## 4           10.0           12.0           13.2          1883           7.4
## 5           12.5           13.1           13.1          4072           9.6
## 6           12.0           10.3           13.4           943           4.5
##   mean_work_travel_2010 mean_work_travel_2017 broadband_2017 computer_2017
## 1                  25.1                  25.8           76.6          86.2
## 2                  25.8                  27.0           74.5          86.9
## 3                  23.8                  23.4           57.2          73.4
## 4                  28.3                  30.0           62.0          74.8
## 5                  33.2                  35.0           65.8          78.2
## 6                  28.1                  29.8           49.4          64.2
##   housing_units_2010 homeownership_2010 housing_multi_unit_2010
## 1              22135               77.5                     7.2
## 2             104061               76.7                    22.6
## 3              11829               68.0                    11.1
## 4               8981               82.9                     6.6
## 5              23887               82.0                     3.7
## 6               4493               76.9                     9.9
##   median_val_owner_occupied_2010 households_2010 households_2017
## 1                         133900           19718           21054
## 2                         177200           69476           76133
## 3                          88200            9795            9191
## 4                          81200            7441            6916
## 5                         113700           20605           20690
## 6                          66300            3732            3670
##   persons_per_household_2010 persons_per_household_2017 per_capita_income_2010
## 1                       2.70                       2.59                  24568
## 2                       2.50                       2.63                  26469
## 3                       2.52                       2.54                  15875
## 4                       3.02                       2.97                  19918
## 5                       2.73                       2.76                  21070
## 6                       2.85                       2.74                  20289
##   per_capita_income_2017 metro_2013 median_household_income_2010
## 1               27841.70          1                        53255
## 2               27779.85          1                        50147
## 3               17891.73          0                        33219
## 4               20572.05          1                        41770
## 5               21367.39          1                        45549
## 6               15444.16          0                        31602
##   median_household_income_2016 median_household_income_2017
## 1                        54487                        55317
## 2                        56460                        52562
## 3                        32884                        33368
## 4                        43079                        43404
## 5                        47213                        47412
## 6                        34278                        29655
##   private_nonfarm_establishments_2009 private_nonfarm_employment_2009
## 1                                 877                           10628
## 2                                4812                           52233
## 3                                 522                            7990
## 4                                 318                            2927
## 5                                 749                            6968
## 6                                 120                            1919
##   percent_change_private_nonfarm_employment_2009
## 1                                           16.6
## 2                                           17.4
## 3                                          -27.0
## 4                                          -14.0
## 5                                          -11.4
## 6                                          -18.5
##   nonemployment_establishments_2009 firms_2007 black_owned_firms_2007
## 1                              2971       4067                   15.2
## 2                             14175      19035                    2.7
## 3                              1527       1667                     NA
## 4                              1192       1385                   14.9
## 5                              3501       4458                     NA
## 6                               390        417                     NA
##   native_owned_firms_2007 asian_owned_firms_2007 pac_isl_owned_firms_2007
## 1                      NA                    1.3                       NA
## 2                     0.4                    1.0                       NA
## 3                      NA                     NA                       NA
## 4                      NA                     NA                       NA
## 5                      NA                     NA                       NA
## 6                      NA                     NA                       NA
##   hispanic_owned_firms_2007 women_owned_firms_2007 manufacturer_shipments_2007
## 1                       0.7                   31.7                          NA
## 2                       1.3                   27.3                     1410273
## 3                        NA                   27.0                          NA
## 4                        NA                     NA                           0
## 5                        NA                   23.2                      341544
## 6                        NA                   38.8                          NA
##   mercent_whole_sales_2007 sales_2007 sales_per_capita_2007
## 1                       NA     598175                 12003
## 2                       NA    2966489                 17166
## 3                       NA     188337                  6334
## 4                       NA     124707                  5804
## 5                       NA     319700                  5622
## 6                       NA      43810                  3995
##   accommodation_food_service_2007 building_permits_2010 fed_spending_2009
## 1                           88157                   191            331142
## 2                          436955                   696           1119082
## 3                              NA                    10            240308
## 4                           10757                     8            163201
## 5                           20941                    18            294114
## 6                            3670                     1            108846
##   area_2010 density_2010 smoking_ban_2010 poverty_2010 poverty_2016
## 1    594.44         91.8             none         10.6         13.5
## 2   1589.78        114.6             none         12.2         11.7
## 3    884.88         31.0          partial         25.0         29.9
## 4    622.58         36.8             none         12.6         20.1
## 5    644.78         88.9             none         13.4         14.1
## 6    622.81         17.5             none         25.3         32.6
##   poverty_2017 poverty_age_under_5_2017 poverty_age_under_18_2017
## 1         13.7                     17.2                      20.0
## 2         11.8                     19.4                      15.9
## 3         27.2                     56.8                      44.9
## 4         15.2                     21.6                      25.9
## 5         15.6                     29.5                      25.3
## 6         28.5                     59.7                      50.2
##   civilian_labor_force_2007 employed_2007 unemployed_2007
## 1                     24383         23577             806
## 2                     82659         80099            2560
## 3                     10334          9684             650
## 4                      8791          8432             359
## 5                     26629         25780             849
## 6                      3653          3308             345
##   unemployment_rate_2007 civilian_labor_force_2008 employed_2008
## 1                   3.31                     24687         23420
## 2                   3.10                     83223         79372
## 3                   6.29                     10161          9267
## 4                   4.08                      8749          8241
## 5                   3.19                     26698         25453
## 6                   9.44                      3634          3251
##   unemployed_2008 unemployment_rate_2008 civilian_labor_force_2009
## 1            1267                   5.13                     24703
## 2            3851                   4.63                     82451
## 3             894                   8.80                     10003
## 4             508                   5.81                      8742
## 5            1245                   4.66                     26480
## 6             383                  10.54                      3739
##   employed_2009 unemployed_2009 unemployment_rate_2009
## 1         22301            2402                   9.72
## 2         74403            8048                   9.76
## 3          8572            1431                  14.31
## 4          7581            1161                  13.28
## 5         23832            2648                  10.00
## 6          3156             583                  15.59
##   civilian_labor_force_2010 employed_2010 unemployed_2010
## 1                     25713         23431            2282
## 2                     83459         75120            8339
## 3                     10221          8959            1262
## 4                      8934          7914            1020
## 5                     24906         22460            2446
## 6                      4941          4356             585
##   unemployment_rate_2010 civilian_labor_force_2011 employed_2011
## 1                   8.87                     25836         23677
## 2                   9.99                     85045         77418
## 3                  12.35                      9849          8712
## 4                  11.42                      8933          7996
## 5                   9.82                     25123         22939
## 6                  11.84                      4833          4272
##   unemployed_2011 unemployment_rate_2011 civilian_labor_force_2012
## 1            2159                   8.36                     25740
## 2            7627                   8.97                     84414
## 3            1137                  11.54                      9362
## 4             937                  10.49                      8798
## 5            2184                   8.69                     24960
## 6             561                  11.61                      4736
##   employed_2012 unemployed_2012 unemployment_rate_2012
## 1         23961            1779                   6.91
## 2         78065            6349                   7.52
## 3          8283            1079                  11.53
## 4          8047             751                   8.54
## 5         23244            1716                   6.88
## 6          4245             491                  10.37
##   civilian_labor_force_2013 employed_2013 unemployed_2013
## 1                     25810         24205            1605
## 2                     85280         79626            5654
## 3                      9099          8168             931
## 4                      8705          8016             689
## 5                     24887         23325            1562
## 6                      4778          4331             447
##   unemployment_rate_2013 civilian_labor_force_2014 employed_2014
## 1                   6.22                     25602         24107
## 2                   6.63                     86415         81115
## 3                  10.23                      8848          7916
## 4                   7.91                      8562          7945
## 5                   6.28                     24535         23032
## 6                   9.36                      4733          4315
##   unemployed_2014 unemployment_rate_2014 civilian_labor_force_2015
## 1            1495                   5.84                     25602
## 2            5300                   6.13                     87705
## 3             932                  10.53                      8609
## 4             617                   7.21                      8572
## 5            1503                   6.13                     24473
## 6             418                   8.83                      4777
##   employed_2015 unemployed_2015 unemployment_rate_2015
## 1         24272            1330                   5.19
## 2         82843            4862                   5.54
## 3          7844             765                   8.89
## 4          8005             567                   6.61
## 5         23152            1321                   5.40
## 6          4398             379                   7.93
##   civilian_labor_force_2016 employed_2016 unemployed_2016
## 1                     25918         24593            1325
## 2                     90500         85656            4844
## 3                      8402          7700             702
## 4                      8607          8050             557
## 5                     24576         23248            1328
## 6                      4824          4493             331
##   unemployment_rate_2016 uninsured_2017 uninsured_age_under_6_2017
## 1                   5.11            8.8                        1.1
## 2                   5.35           10.8                        2.4
## 3                   8.36           12.3                        4.1
## 4                   6.47            8.1                        2.7
## 5                   5.40           11.0                        3.9
## 6                   6.86           14.1                          0
##   uninsured_age_under_19_2017 uninsured_age_over_74_2017
## 1                         3.2                        0.0
## 2                         3.5                        0.4
## 3                         4.6                        0.0
## 4                         1.8                        0.0
## 5                         5.7                        0.2
## 6                         1.2                        0.0
##   civilian_labor_force_2017 employed_2017 unemployed_2017
## 1                     25909         24908            1001
## 2                     91567         87915            3652
## 3                      8236          7750             486
## 4                      8506          8133             373
## 5                     24494         23509             985
## 6                      4812          4575             237
##   unemployment_rate_2017 age_over_18_2019 age_over_65_2019 age_over_85_2019
## 1                   3.86             76.2             15.0              1.6
## 2                   3.99             78.3             20.0              1.9
## 3                   5.90             79.1             18.6              1.6
## 4                   4.39             79.4             15.9              2.0
## 5                   4.02             76.8             17.9              1.8
## 6                   4.93             79.2             16.0              1.7
##   age_under_5_2019 asian_2019 avg_family_size_2019 bachelors_2019 black_2019
## 1              5.8        1.0                 3.09           26.6       19.0
## 2              5.5        0.9                 3.24           31.9        9.3
## 3              5.3        0.5                 3.01           11.6       47.6
## 4              5.8        0.1                 3.74           10.4       22.3
## 5              5.9        0.4                 3.33           13.1        1.6
## 6              5.3        0.5                 3.30           12.1       74.8
##   hispanic_2019 household_has_broadband_2019 household_has_computer_2019
## 1           2.8                         80.6                        73.0
## 2           4.6                         81.8                        76.3
## 3           4.4                         60.5                        51.9
## 4           2.6                         69.2                        54.7
## 5           9.3                         73.0                        63.5
## 6           2.6                         60.1                        52.5
##   household_has_smartphone_2019 households_2019
## 1                          78.4           21397
## 2                          81.7           80930
## 3                          64.2            9345
## 4                          66.6            6891
## 5                          70.1           20847
## 6                          70.6            3521
##   households_speak_asian_or_pac_isl_2019 households_speak_limited_english_2019
## 1                                    1.8                                   0.7
## 2                                    0.6                                   1.2
## 3                                    0.6                                   1.6
## 4                                    0.0                                   0.6
## 5                                    0.1                                   1.8
## 6                                    0.4                                   1.4
##   households_speak_other_2019 households_speak_other_indo_euro_lang_2019
## 1                         0.2                                        0.3
## 2                         0.0                                        1.8
## 3                         0.0                                        1.1
## 4                         0.0                                        0.5
## 5                         0.2                                        0.9
## 6                         0.0                                        1.6
##   households_speak_spanish_2019 housing_mobile_homes_2019
## 1                           2.9                      26.7
## 2                           4.6                      24.8
## 3                           5.2                      39.1
## 4                           1.9                      25.6
## 5                           6.6                      21.2
## 6                           3.6                      28.9
##   housing_one_unit_structures_2019 housing_two_unit_structures_2019
## 1                             17.3                             73.3
## 2                             11.5                             75.2
## 3                             26.1                             60.9
## 4                             29.7                             74.4
## 5                             24.0                             78.8
## 6                             39.3                             71.1
##   hs_grad_2019 mean_household_income_2019 mean_work_travel_2019 median_age_2019
## 1         88.5                      75326                  24.4            38.2
## 2         90.8                      80986                    NA            43.0
## 3         73.2                      47068                    NA            40.4
## 4         79.1                      60182                    NA            40.9
## 5         80.5                      65639                    NA            40.7
## 6         74.7                      48571                  27.2            40.2
##   median_household_income_2019 median_individual_income_2019
## 1                        58731                         29725
## 2                        58320                         29802
## 3                        32525                         17963
## 4                        47542                         21958
## 5                        49358                         26976
## 6                        37785                         21571
##   median_individual_income_age_25plus_2019 native_2019 other_single_race_2019
## 1                                    40778         0.3                    0.7
## 2                                    37897         0.8                    1.1
## 3                                    27434         0.3                    3.6
## 4                                    28789         0.1                    0.0
## 5                                    39004         0.1                    0.9
## 6                                    29516         0.0                    2.0
##   pac_isl_2019 per_capita_income_2019 persons_per_household_2019 pop_2019
## 1            0                  29819                       2.56    55380
## 2            0                  32626                       2.59   212830
## 3            0                  18473                       2.41    25361
## 4            0                  20778                       2.99    22493
## 5            0                  24747                       2.74    57681
## 6            0                  20877                       2.79    10248
##   poverty_2019 poverty_65_and_over_2019 poverty_under_18_2019
## 1         15.2                      8.7                  23.2
## 2         10.4                      7.4                  13.4
## 3         30.7                     16.8                  50.1
## 4           NA                       NA                    NA
## 5         13.6                     10.9                  18.4
## 6           NA                       NA                    NA
##   two_plus_races_2019 unemployment_rate_2019 uninsured_2019
## 1                 2.2                    3.5            7.1
## 2                 1.7                    4.0            8.9
## 3                 1.2                    9.4           11.3
## 4                 0.6                    7.0           10.7
## 5                 1.6                    3.1           10.8
## 6                 0.8                    4.1           11.4
##   uninsured_65_and_older_2019 uninsured_under_19_2019 uninsured_under_6_2019
## 1                         0.0                     1.7                    1.7
## 2                         0.3                     3.8                    2.2
## 3                         0.3                     3.3                    3.4
## 4                         0.0                     2.0                    4.5
## 5                         0.2                     5.9                    6.1
## 6                         0.0                     1.0                    2.1
##   veterans_2019 white_2019 white_not_hispanic_2019
## 1          12.6       76.8                    74.6
## 2          11.8       86.2                    83.1
## 3           6.6       46.8                    45.8
## 4           8.0       76.8                    74.5
## 5           7.7       95.5                    86.9
## 6           3.3       21.9                    21.4

#Variable selection, we will be using 17 variables out of 188. Below we also filter data only for California.

# Filter DF for desired columns only
counties <- counties[, c('state', 'name', 'fips',
                         'pop2017', 'median_age_2017', 'bachelors_2017',
                         'poverty_2017', 'poverty_age_under_18_2017',
                         'uninsured_2017', 'unemployment_rate_2017',
                         'per_capita_income_2017', 'median_household_income_2017',
                         'women_16_to_50_birth_rate_2017', 'hs_grad_2017',
                         'mean_work_travel_2017', 'broadband_2017', 
                         'persons_per_household_2017')]

counties_CA <- counties[counties$state == 'California',]
nrow(counties_CA)

## [1] 58

Visualization

# Bachelor's degree
variable <- counties_CA$bachelors_2017
summary(variable)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.40   16.90   23.30   26.42   33.67   57.50

intervals<-5
colors<-brewer.pal(intervals, "BuPu") # choice of colors
classes<-classIntervals(variable, intervals, style="fixed", 
                        fixedBreaks=c(seq(10, 60, 10)))
color.table<-findColours(classes, colors) 
plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
       fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Percent of population that earned a bachelor's degree")

# Median household income
summary(counties_CA$median_household_income_2017)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   36563   47496   56819   61047   71474  106761

intervals<-5
colors<-brewer.pal(intervals, "OrRd") # choice of colors
classes<-classIntervals(counties_CA$median_household_income_2017, intervals, style="fixed", 
                        fixedBreaks=c(seq(35000, 110000, 15000)))
color.table<-findColours(classes, colors) 

plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
       fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Median household income")

# Percent of population under age 18 below poverty level
summary(counties_CA$poverty_age_under_18_2017)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.80   12.85   19.25   19.59   24.43   36.20

intervals<-7
colors<-brewer.pal(intervals, "OrRd") # choice of colors
classes<-classIntervals(counties_CA$poverty_age_under_18_2017, intervals, style="fixed", 
                        fixedBreaks=c(seq(5, 40, 5)))
color.table<-findColours(classes, colors) 

plot(map, col = color.table)
legend('bottomleft', legend=names(attr(color.table, "table")),
       fill=attr(color.table, "palette"), cex=0.6, bty="n")
title(main="Percent of population under age 18 below poverty level")

Spatial weights matrix

We compute the spatial weights matrix – contiguity matrix and present the spatial relationships in our dataset by plotting a neighbourhood.

cont.nb<-poly2nb(as(map, "SpatialPolygons"))
cont.listw<-nb2listw(cont.nb, style="W")
cont.listw

## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 58 
## Number of nonzero links: 288 
## Percentage nonzero weights: 8.561237 
## Average number of links: 4.965517 
## 
## Weights style: W 
## Weights constants summary:
##    n   nn S0       S1       S2
## W 58 3364 58 24.65282 236.1432

crds<-coordinates(map)
colnames(crds)<-c("cx", "cy")

plot(map) # contour map
plot(cont.nb, crds, add=TRUE)

Moran’s scatterplot

We measure the Moran’s I to measure a global spatial autocorrelation to get more information about the spatial patterns that appear in the data.

The scatterplot illustrates the relationship between values at each location and the average value for the neighbours. We can see that the value of bachelors vaiable in 2017 in a county is associated with the average valus of it’s neighbouring counties.

#### Morans Scatterplot ####
# preparing the data
x<-counties_CA$bachelors_2017 # selecting variable
zx<-as.data.frame(scale(x)) #standardization

mp <- moran.plot(zx$V1, cont.listw, pch=19, labels=as.character(counties_CA$county))

if (require(ggplot2, quietly=TRUE)) {
  xname <- attr(mp, "xname")
  ggplot(mp, aes(x=x, y=wx)) + geom_point(shape=1) + 
    geom_smooth(formula=y ~ x, method="lm") + 
    geom_hline(yintercept=mean(mp$wx), lty=2) + 
    geom_vline(xintercept=mean(mp$x), lty=2) + theme_minimal() + 
    geom_point(data=mp[mp$is_inf,], aes(x=x, y=wx), shape=9) +
    geom_text(data=mp[mp$is_inf,], aes(x=x, y=wx, label=labels, vjust=1.5)) +
    xlab(xname) + ylab(paste0("Spatially lagged ", xname))
}

# checking the average and standard deviation
round(mean(zx$V1),0)

## [1] 0

sd(zx$V1)

## [1] 1

wzx<-lag.listw(cont.listw, zx$V1) # spatial lag of x

To gain more insight about the relationship between the value in a county and value at neighbouring counties, we used the mapping of quadrants.

# Mapping of quadrants
cond1<-ifelse(zx>=0 & wzx>=0, 1,0) # I quarter
cond2<-ifelse(zx>=0 & wzx<0, 2,0) # II quarter
cond3<-ifelse(zx<0 & wzx<0, 3,0) # III quarter
cond4<-ifelse(zx<0 & wzx>=0, 4,0) # IV quarter

cond.all<-cond1+cond2+cond3+cond4 
cond<-as.data.frame(cond.all)
is.data.frame(cond)

## [1] TRUE

# map
brks<-c(1,2,3,4)
cols<-brewer.pal(intervals, "Paired") # choice of colors
plot(map, col=cols[findInterval(cond$V1, brks)])

legend("topright", legend=c("I Q - HH – high surrounded by high", "II Q - LH -
low surrounded by high", "III Q - LL - low surrounded by low", "IV Q - HL - high
surrounded by low"), fill=cols, bty="n", cex=1)
title(main="Mapping of Moranscatterplot results poverty")

Modelling

Estimation of OLS:

colnames(counties)

##  [1] "state"                          "name"                          
##  [3] "fips"                           "pop2017"                       
##  [5] "median_age_2017"                "bachelors_2017"                
##  [7] "poverty_2017"                   "poverty_age_under_18_2017"     
##  [9] "uninsured_2017"                 "unemployment_rate_2017"        
## [11] "per_capita_income_2017"         "median_household_income_2017"  
## [13] "women_16_to_50_birth_rate_2017" "hs_grad_2017"                  
## [15] "mean_work_travel_2017"          "broadband_2017"                
## [17] "persons_per_household_2017"

eq = bachelors_2017 ~ median_household_income_2017 + hs_grad_2017 + poverty_2017

model <- lm(eq, data = counties_CA)

summary(model)

## 
## Call:
## lm(formula = eq, data = counties_CA)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9162  -2.9486   0.5072   2.9215   9.7108 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -8.003e+01  1.287e+01  -6.219 7.59e-08 ***
## median_household_income_2017  6.302e-04  5.608e-05  11.237 9.40e-16 ***
## hs_grad_2017                  6.715e-01  1.043e-01   6.435 3.39e-08 ***
## poverty_2017                  7.354e-01  2.227e-01   3.303   0.0017 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.643 on 54 degrees of freedom
## Multiple R-squared:  0.8412, Adjusted R-squared:  0.8324 
## F-statistic: 95.37 on 3 and 54 DF,  p-value: < 2.2e-16

# Moran test – important – justifies using spatial methods
lm.morantest(model, cont.listw) # are residuals spatially random?

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = eq, data = counties_CA)
## weights: cont.listw
## 
## Moran I statistic standard deviate = 2.222, p-value = 0.01314
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##      0.164932549     -0.018455127      0.006811478

moran.test(model$residuals, cont.listw)

## 
##  Moran I test under randomisation
## 
## data:  model$residuals  
## weights: cont.listw    
## 
## Moran I statistic standard deviate = 2.2283, p-value = 0.01293
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.164932549      -0.017543860       0.006706333

We perform the Moran test to check if we can use the spatial models. The p-value < 5% significance level, so we reject the null hypothesis stating that there is no spatial autocorrelation. This means that the use of spatial models is justified.

Firstly, we estimate the Manski model, however it is rarely the case that this model will be selected at the end, we will rather go for a different one.

# Manski model (full specification)
GNS <- sacsarlm(eq, data = counties_CA, listw = cont.listw, type = 'sacmixed')
summary(GNS)

## 
## Call:sacsarlm(formula = eq, data = counties_CA, listw = cont.listw, 
##     type = "sacmixed")
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -10.11193  -2.64286  -0.12379   3.17117   8.54747 
## 
## Type: sacmixed 
## Coefficients: (asymptotic standard errors) 
##                                     Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)                      -1.0038e+02  5.7053e+01 -1.7593   0.07852
## median_household_income_2017      5.8238e-04  5.3732e-05 10.8388 < 2.2e-16
## hs_grad_2017                      6.2234e-01  9.7145e-02  6.4063 1.491e-10
## poverty_2017                      5.4225e-01  2.2667e-01  2.3922   0.01675
## lag.median_household_income_2017 -5.5358e-05  3.4326e-04 -0.1613   0.87188
## lag.hs_grad_2017                  1.1886e-01  4.0181e-01  0.2958   0.76737
## lag.poverty_2017                  7.8877e-01  6.9826e-01  1.1296   0.25864
## 
## Rho: 0.42776
## Asymptotic standard error: 0.40988
##     z-value: 1.0436, p-value: 0.29666
## Lambda: -0.26011
## Asymptotic standard error: 0.57915
##     z-value: -0.44912, p-value: 0.65335
## 
## LR test value: 10.95, p-value: 0.052381
## 
## Log likelihood: -163.803 for sacmixed model
## ML residual variance (sigma squared): 15.702, (sigma: 3.9626)
## Number of observations: 58 
## Number of parameters estimated: 10 
## AIC: 347.61, (AIC for lm: 348.56)

In the Manski model, we see that the rho is significant and lambda is not significant (p-value > 5% significance level), therefore we go for the model with two components (lambda = 0) - SDM.

# SDM
SDM <-lagsarlm(eq, data=counties_CA, listw=cont.listw, type = 'mixed') # no spatial lags of X
summary(SDM)

## 
## Call:lagsarlm(formula = eq, data = counties_CA, listw = cont.listw, 
##     type = "mixed")
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3721  -2.6717  -0.1549   3.2571   8.4184 
## 
## Type: mixed 
## Coefficients: (asymptotic standard errors) 
##                                     Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)                      -1.1606e+02  3.4220e+01 -3.3915 0.0006952
## median_household_income_2017      5.9417e-04  5.0670e-05 11.7264 < 2.2e-16
## hs_grad_2017                      6.4007e-01  9.4160e-02  6.7977 1.063e-11
## poverty_2017                      6.0205e-01  2.0278e-01  2.9690 0.0029879
## lag.median_household_income_2017  5.5337e-05  1.8537e-04  0.2985 0.7653026
## lag.hs_grad_2017                  2.1206e-01  2.6520e-01  0.7996 0.4239233
## lag.poverty_2017                  9.3951e-01  4.7201e-01  1.9904 0.0465416
## 
## Rho: 0.2604, LR test value: 1.6538, p-value: 0.19844
## Asymptotic standard error: 0.17326
##     z-value: 1.5029, p-value: 0.13286
## Wald statistic: 2.2588, p-value: 0.13286
## 
## Log likelihood: -164.0098 for mixed model
## ML residual variance (sigma squared): 16.488, (sigma: 4.0605)
## Number of observations: 58 
## Number of parameters estimated: 9 
## AIC: 346.02, (AIC for lm: 345.67)
## LM test for residual autocorrelation
## test value: 1.6372, p-value: 0.20071

According to SDM Model, the rho is not statistically significant (p-value > 5% significance level)

# SAR
SAR <- lagsarlm(eq, data=counties_CA, listw=cont.listw) # no spatial lags of X
summary(SAR)

## 
## Call:lagsarlm(formula = eq, data = counties_CA, listw = cont.listw)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -13.080012  -2.907337  -0.048908   2.619618   9.401488 
## 
## Type: lag 
## Coefficients: (asymptotic standard errors) 
##                                 Estimate  Std. Error z value  Pr(>|z|)
## (Intercept)                  -7.9176e+01  1.1938e+01 -6.6324 3.304e-11
## median_household_income_2017  6.0631e-04  5.2249e-05 11.6041 < 2.2e-16
## hs_grad_2017                  6.2398e-01  9.7505e-02  6.3995 1.559e-10
## poverty_2017                  6.1388e-01  2.0763e-01  2.9567   0.00311
## 
## Rho: 0.24711, LR test value: 4.2181, p-value: 0.039996
## Asymptotic standard error: 0.11677
##     z-value: 2.1162, p-value: 0.034329
## Wald statistic: 4.4782, p-value: 0.034329
## 
## Log likelihood: -167.169 for lag model
## ML residual variance (sigma squared): 18.414, (sigma: 4.2912)
## Number of observations: 58 
## Number of parameters estimated: 6 
## AIC: 346.34, (AIC for lm: 348.56)
## LM test for residual autocorrelation
## test value: 0.34544, p-value: 0.5567

SLX - OLS with thetas only

# an ‘lm’ model augmented with the spatially lagged RHS variables
# RHS variables – right-hand side variables
SLX<-lmSLX(eq, data=counties_CA, listw=cont.listw)
summary(SLX)

## 
## Call:
## lm(formula = formula(paste("y ~ ", paste(colnames(x)[-1], collapse = "+"))), 
##     data = as.data.frame(x), weights = weights)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8670  -2.9426   0.0569   3.4205   8.8876 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      -1.502e+02  2.997e+01  -5.010 6.91e-06 ***
## median_household_income_2017      6.077e-04  5.446e-05  11.159 2.69e-15 ***
## hs_grad_2017                      6.627e-01  1.016e-01   6.520 3.11e-08 ***
## poverty_2017                      6.703e-01  2.153e-01   3.113  0.00303 ** 
## lag.median_household_income_2017  2.641e-04  1.466e-04   1.802  0.07748 .  
## lag.hs_grad_2017                  4.335e-01  2.481e-01   1.747  0.08663 .  
## lag.poverty_2017                  1.310e+00  4.573e-01   2.865  0.00604 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.425 on 51 degrees of freedom
## Multiple R-squared:  0.8638, Adjusted R-squared:  0.8477 
## F-statistic: 53.89 on 6 and 51 DF,  p-value: < 2.2e-16

screenreg(list(model, GNS, SDM, SLX, SAR), custom.model.names=c("OLS", "GNS", "SDM", "SLX", "SAR"))

## 
## ================================================================================================
##                                   OLS         GNS          SDM          SLX          SAR        
## ------------------------------------------------------------------------------------------------
## (Intercept)                       -80.03 ***  -100.38      -116.06 ***  -150.16 ***   -79.18 ***
##                                   (12.87)      (57.05)      (34.22)      (29.97)      (11.94)   
## median_household_income_2017        0.00 ***     0.00 ***     0.00 ***     0.00 ***     0.00 ***
##                                    (0.00)       (0.00)       (0.00)       (0.00)       (0.00)   
## hs_grad_2017                        0.67 ***     0.62 ***     0.64 ***     0.66 ***     0.62 ***
##                                    (0.10)       (0.10)       (0.09)       (0.10)       (0.10)   
## poverty_2017                        0.74 **      0.54 *       0.60 **      0.67 **      0.61 ** 
##                                    (0.22)       (0.23)       (0.20)       (0.22)       (0.21)   
## lag.median_household_income_2017                -0.00         0.00         0.00                 
##                                                 (0.00)       (0.00)       (0.00)                
## lag.hs_grad_2017                                 0.12         0.21         0.43                 
##                                                 (0.40)       (0.27)       (0.25)                
## lag.poverty_2017                                 0.79         0.94 *       1.31 **              
##                                                 (0.70)       (0.47)       (0.46)                
## rho                                              0.43         0.26                      0.25 *  
##                                                 (0.41)       (0.17)                    (0.12)   
## lambda                                          -0.26                                           
##                                                 (0.58)                                          
## ------------------------------------------------------------------------------------------------
## R^2                                 0.84                                   0.86                 
## Adj. R^2                            0.83                                   0.85                 
## Num. obs.                          58           58           58                        58       
## Parameters                                      10            9                         6       
## Log Likelihood                                -163.80      -164.01      -164.84      -167.17    
## AIC (Linear model)                             348.56       345.67                    348.56    
## AIC (Spatial model)                            347.61       346.02                    346.34    
## LR test: statistic                              10.95         1.65                      4.22    
## LR test: p-value                                 0.05         0.20                      0.04    
## Sigma                                                                      4.43                 
## Statistic                                                                 53.89                 
## P Value                                                                    0.00                 
## DF                                                                         6.00                 
## AIC                                                                      345.67                 
## BIC                                                                      362.16                 
## Deviance                                                                 998.84                 
## DF Resid.                                                                 51                    
## nobs                                                                      58                    
## ================================================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05

According to AIC and Log Likelihood, the SAR models appears to be the best one. In all the models besides OLS, the variables hs_grad_2017 and median_household_income_2017 are significant. From the spatial lags, only lag.poverty_2017 is significant for SDM and SLX. In terms of the significance of spatial components we see that only rho component for SAR model is significant.

Estimating direct and indirect impacts

For SDM:

W.c<-as(as_dgRMatrix_listw(cont.listw), "CsparseMatrix") 
trMat<-trW(W.c, type="mult") 

SDM_imp<-impacts(SDM, tr=trMat, R=2000)
summary(SDM_imp, zstats=TRUE, short=TRUE)

## Impact measures (mixed, trace):
##                                   Direct     Indirect        Total
## median_household_income_2017 0.000606837 0.0002713587 0.0008781957
## hs_grad_2017                 0.662897417 0.4892570048 1.1521544217
## poverty_2017                 0.668140799 1.4161886849 2.0843294843
## ========================================================
## Simulation results ( variance matrix):
## ========================================================
## Simulated standard errors
##                                    Direct     Indirect        Total
## median_household_income_2017 5.189702e-05 0.0002006386 0.0002205986
## hs_grad_2017                 9.559194e-02 0.3311415860 0.3755938657
## poverty_2017                 2.025549e-01 0.6343398490 0.7059595618
## 
## Simulated z-values:
##                                 Direct Indirect    Total
## median_household_income_2017 11.653902 1.348657 3.968272
## hs_grad_2017                  6.947262 1.482676 3.075337
## poverty_2017                  3.281640 2.241049 2.955267
## 
## Simulated p-values:
##                              Direct     Indirect Total     
## median_household_income_2017 < 2.22e-16 0.177447 7.2396e-05
## hs_grad_2017                 3.7244e-12 0.138161 0.0021026 
## poverty_2017                 0.0010321  0.025023 0.0031240

To get the direct and spillover effect for SDM we need to compute the coefficients:

# extracting direct & total impacts
a<-SDM_imp$res$direct
b<-SDM_imp$res$total
a/b

## median_household_income_2017                 hs_grad_2017 
##                    0.6910043                    0.5753547 
##                 poverty_2017 
##                    0.3205543

We can see that there aren’t any indirect effects, because for all variables the p-values are above the significance level.

For SLX:

W.c<-as(as_dgRMatrix_listw(cont.listw), "CsparseMatrix") 
trMat<-trW(W.c, type="mult") 

SLX_imp<-impacts(SLX, tr=trMat, R=2000)
summary(SLX_imp, zstats=TRUE, short=TRUE)

## Impact measures (SlX, estimable, n-k):
##                                    Direct     Indirect        Total
## median_household_income_2017 0.0006076923 0.0002640818 0.0008717741
## hs_grad_2017                 0.6627366864 0.4335430283 1.0962797147
## poverty_2017                 0.6702821460 1.3102431635 1.9805253095
## ========================================================
## Standard errors:
##                                    Direct     Indirect        Total
## median_household_income_2017 5.445968e-05 0.0001465605 0.0001538927
## hs_grad_2017                 1.016390e-01 0.2481441345 0.2720093427
## poverty_2017                 2.152945e-01 0.4572845481 0.4871987085
## ========================================================
## Z-values:
##                                 Direct Indirect    Total
## median_household_income_2017 11.158574 1.801862 5.664818
## hs_grad_2017                  6.520494 1.747142 4.030302
## poverty_2017                  3.113327 2.865269 4.065128
## 
## p-values:
##                              Direct     Indirect  Total     
## median_household_income_2017 < 2.22e-16 0.0715672 1.4718e-08
## hs_grad_2017                 7.0076e-11 0.0806127 5.5705e-05
## poverty_2017                 0.0018499  0.0041666 4.8006e-05

We can only interpret the indirect effects for poverty_2017 variable as it statistically significant. The interpretation can be done without the recomputing of coefficients.

Spatial Analysis of Bachelors in California in 2017

Anna Sikora, Aleksander Partyga

5 07 2022