Brazil is the world’s fifth-largest country by area and the sixth most populous. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy.It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure below. The GDP per capita of the poorest municipality is R3190.6. On the other hand, the GDP per capita of the richest municipality is R314638. Half of the municipalities with GDP per capita less than R16000 and the top 25% municipalities earn R26155 and above.
In this take-home exercise, we will determine factors affecting the unequal development of Brazil at the municipality level by using the data provided. The 5 Tasks are as follows: 1. Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level. 2. Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using multiple linear regression method. 3 Prepare a choropleth map showing the distribution of the residual of the GDP per capita. 4. Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using geographically weighted regression method.
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse','geobr','heatmaply' )
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
First, we load the factors that we like to analyze in Brazil
brazil = read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")
brazil
## # A tibble: 5,573 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abad~ GO 0 6876 6876 0
## 2 Abad~ MG 0 6704 6704 0
## 3 Abad~ GO 0 15757 15609 148
## 4 Abae~ MG 0 22690 22690 0
## 5 Abae~ PA 0 141100 141040 60
## 6 Abai~ CE 0 10496 10496 0
## 7 Abaí~ BA 0 8316 8316 0
## 8 Abaré BA 0 17064 17064 0
## 9 Abat~ PR 0 7764 7764 0
## 10 Abdo~ SC 0 2653 2653 0
## # ... with 5,563 more rows, and 75 more variables: IBGE_DU <dbl>,
## # IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>,
## # IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>,
## # `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## # `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## # IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## # LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## # AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## # Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## # MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>
nrow(brazil)
## [1] 5573
Next we import the meta data to better understand the factors.
data_dict = read_delim("data/aspatial/Data_Dictionary.csv", delim = ";")
data_dict
## # A tibble: 96 x 6
## FIELD DESCRIPTION REFERENCE UNIT SOURCE X6
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 CITY Name of the City <NA> <NA> - <NA>
## 2 STATE Name of the State <NA> <NA> - <NA>
## 3 CAPITAL 1 if Capital of State <NA> <NA> - <NA>
## 4 IBGE_RES_~ "Resident Population " 2010 - https://sidra.i~ <NA>
## 5 IBGE_RES_~ Resident Population B~ 2010 - https://sidra.i~ <NA>
## 6 IBGE_RES_~ Redident Population F~ 2010 - https://sidra.i~ <NA>
## 7 IBGE_DU "Domestic Units Total~ 2010 - https://sidra.i~ <NA>
## 8 IBGE_DU_U~ "Domestic Units Urban~ 2010 - https://sidra.i~ <NA>
## 9 IBGE_DU_R~ Domestic Units Rural 2010 - https://sidra.i~ <NA>
## 10 IBGE_POP Resident Population R~ 2010 - https://sidra.i~ <NA>
## # ... with 86 more rows
datasets <- list_geobr()
datasets
## # A tibble: 22 x 4
## `function` geography years source
## <chr> <chr> <chr> <chr>
## 1 `read_country` Country 1872, 1900, 1911, 1920, 19~ IBGE
## 2 `read_region` Region 2000, 2001, 2010, 2013, 20~ IBGE
## 3 `read_state` States 1872, 1900, 1911, 1920, 19~ IBGE
## 4 `read_meso_regi~ Meso region 2000, 2001, 2010, 2013, 20~ IBGE
## 5 `read_micro_reg~ Micro region 2000, 2001, 2010, 2013, 20~ IBGE
## 6 `read_intermedi~ Intermediate region 2017 IBGE
## 7 `read_immediate~ Immediate region 2017 IBGE
## 8 `read_municipal~ Municipality 1872, 1900, 1911, 1920, 19~ IBGE
## 9 `read_weighting~ Census weighting ar~ 2010 IBGE
## 10 `read_census_tr~ Census tract (setor~ 2000, 2010 IBGE
## # ... with 12 more rows
# Municipality of Brazil
muni <- read_municipality(year=2016)
##
|
| | 0%
|
|== | 4%
|
|===== | 7%
|
|======= | 11%
|
|========== | 15%
|
|============ | 19%
|
|============== | 22%
|
|================= | 26%
|
|=================== | 30%
|
|====================== | 33%
|
|======================== | 37%
|
|========================== | 41%
|
|============================= | 44%
|
|=============================== | 48%
|
|================================== | 52%
|
|==================================== | 56%
|
|======================================= | 59%
|
|========================================= | 63%
|
|=========================================== | 67%
|
|============================================== | 70%
|
|================================================ | 74%
|
|=================================================== | 78%
|
|===================================================== | 81%
|
|======================================================= | 85%
|
|========================================================== | 89%
|
|============================================================ | 93%
|
|=============================================================== | 96%
|
|=================================================================| 100%
head(muni)
## Simple feature collection with 6 features and 4 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
## code_muni name_muni code_state abbrev_state
## 1 1100015 Alta Floresta D'oeste 11 RO
## 2 1100023 Ariquemes 11 RO
## 3 1100031 Cabixi 11 RO
## 4 1100049 Cacoal 11 RO
## 5 1100056 Cerejeiras 11 RO
## 6 1100064 Colorado Do Oeste 11 RO
## geom
## 1 POLYGON ((-62.19465 -11.827...
## 2 POLYGON ((-62.53648 -9.7322...
## 3 POLYGON ((-60.37075 -13.363...
## 4 POLYGON ((-61.0008 -11.2973...
## 5 POLYGON ((-61.49976 -13.005...
## 6 POLYGON ((-60.50475 -12.966...
nrow(muni)
## [1] 5572
Check st_crs of muni
st_crs(muni)
## Coordinate Reference System:
## User input: SIRGAS 2000
## wkt:
## GEOGCRS["SIRGAS 2000",
## DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["Latin America - SIRGAS 2000 by country"],
## BBOX[-59.87,-122.19,32.72,-25.28]],
## ID["EPSG",4674]]
As seen above, the coordinate system used is SIRGAS 2000. Which the EPSG code is 4674
First, we need to check the number of NA rows for long and lat before converting it to sf.
brazil_na <- brazil %>% filter(is.na(LONG) | is.na(LAT))
brazil_na
## # A tibble: 9 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Baln~ SC 0 NA NA NA
## 2 Lago~ RS 0 NA NA NA
## 3 Moju~ PA 0 NA NA NA
## 4 Para~ MS 0 NA NA NA
## 5 Pesc~ SC 0 NA NA NA
## 6 Pinh~ RS 0 2130 2130 0
## 7 Pint~ RS 0 NA NA NA
## 8 Sant~ BA 0 9648 9648 0
## 9 São ~ PE 0 NA NA NA
## # ... with 75 more variables: IBGE_DU <dbl>, IBGE_DU_URBAN <dbl>,
## # IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>,
## # `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>,
## # `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## # `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## # IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## # LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## # AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## # Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## # MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>
As there are 9 rows with LONG/LAT value that are NA, we will need to replace and input their LONG LAT values. We will search and replace by STATE and CITY (some states may have same city names)
brazil$LONG[brazil$CITY == "Balneário Rincão" & brazil$STATE=="SC"] <- -49.2351
brazil$LAT[brazil$CITY == "Balneário Rincão" & brazil$STATE=="SC"] <- -28.8314
brazil$LONG[brazil$CITY == "Lagoa Dos Patos" & brazil$STATE=="RS"] <- -51.2500
brazil$LAT[brazil$CITY == "Lagoa Dos Patos" & brazil$STATE=="RS"] <- -31.1000
brazil$LONG[brazil$CITY == "Mojuí Dos Campos" & brazil$STATE=="PA"] <- -54.6425
brazil$LAT[brazil$CITY == "Mojuí Dos Campos" & brazil$STATE=="PA"] <- -2.6822
brazil$LONG[brazil$CITY == "Paraíso Das Águas" & brazil$STATE=="MS"] <- -53.0116
brazil$LAT[brazil$CITY == "Paraíso Das Águas" & brazil$STATE=="MS"] <- -19.0216
brazil$LONG[brazil$CITY == "Pescaria Brava" & brazil$STATE=="SC"] <- -48.8864
brazil$LAT[brazil$CITY == "Pescaria Brava" & brazil$STATE=="SC"] <- -228.3866
brazil$LONG[brazil$CITY == "Pinhal Da Serra" & brazil$STATE=="RS"] <- -51.1707
brazil$LAT[brazil$CITY == "Pinhal Da Serra" & brazil$STATE=="RS"] <- -27.8743
brazil$LONG[brazil$CITY == "Pinto Bandeira" & brazil$STATE=="RS"] <- -51.4503
brazil$LAT[brazil$CITY == "Pinto Bandeira" & brazil$STATE=="RS"] <- -29.0975
brazil$LONG[brazil$CITY == "Santa Terezinha" & brazil$STATE=="BA"] <- -39.6105
brazil$LAT[brazil$CITY == "Santa Terezinha" & brazil$STATE=="BA"] <- -12.7700
brazil$LONG[brazil$CITY == "São Caetano" & brazil$STATE=="PE"] <- -36.1361
brazil$LAT[brazil$CITY == "São Caetano" & brazil$STATE=="PE"] <- -8.3318
We then convert the data into a sf object and tranform the crs to 4674 to match it with muni.
brazil_3414 <- st_as_sf(brazil, coords = c("LONG","LAT"),crs=4326)
brazil_4674 <- st_transform(brazil_3414,4674)
head(brazil_4674)
## Simple feature collection with 6 features and 79 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -49.44055 ymin: -19.15585 xmax: -39.04755 ymax: -1.72347
## geographic CRS: SIRGAS 2000
## # A tibble: 6 x 80
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abad~ GO 0 6876 6876 0
## 2 Abad~ MG 0 6704 6704 0
## 3 Abad~ GO 0 15757 15609 148
## 4 Abae~ MG 0 22690 22690 0
## 5 Abae~ PA 0 141100 141040 60
## 6 Abai~ CE 0 10496 10496 0
## # ... with 74 more variables: IBGE_DU <dbl>, IBGE_DU_URBAN <dbl>,
## # IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>,
## # `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>,
## # `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## # `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## # IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## # ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>, AREA <dbl>,
## # REGIAO_TUR <chr>, CATEGORIA_TUR <chr>, ESTIMATED_POP <dbl>,
## # RURAL_URBAN <chr>, GVA_AGROPEC <dbl>, GVA_INDUSTRY <dbl>,
## # GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL ` <dbl>,
## # TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## # Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## # MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>, geometry <POINT [°]>
brazil_muni <- st_join(muni,brazil_4674,join = st_contains)
After joning, we check the row count again to ensure proper joining.
nrow(brazil_muni)
## [1] 5574
As seen above, we have 5574, which is more than what we should have (5572) as muni has 5572 rows. Hence, there are at least 2 records duplicated. We filter them out to identify the codes.
brazil_muni %>% group_by(code_muni) %>% tally() %>% filter(n>1)
## Simple feature collection with 2 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -39.67875 ymin: -12.93082 xmax: -36.06152 ymax: -8.187331
## geographic CRS: SIRGAS 2000
## # A tibble: 2 x 3
## code_muni n geom
## * <dbl> <int> <GEOMETRY [°]>
## 1 2613107 2 POLYGON ((-36.11492 -8.195208, -36.08247 -8.22815, -36.0~
## 2 2928505 2 MULTIPOLYGON (((-39.56748 -12.55098, -39.56109 -12.55148~
We then print out the data to better understand why they are duplicated.
brazil_muni %>% filter(code_muni==2613107|code_muni==2928505)
## Simple feature collection with 4 features and 83 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -39.67875 ymin: -12.93082 xmax: -36.06152 ymax: -8.187331
## geographic CRS: SIRGAS 2000
## code_muni name_muni code_state abbrev_state CITY STATE
## 1 2613107 São Caitano 26 PE São Caetano PE
## 2 2613107 São Caitano 26 PE São Caitano PE
## 3 2928505 Santa Terezinha 29 BA Santa Teresinha BA
## 4 2928505 Santa Terezinha 29 BA Santa Terezinha BA
## CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU
## 1 0 NA NA NA NA
## 2 0 35274 35274 0 10718
## 3 0 NA NA NA NA
## 4 0 9648 9648 0 2891
## IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## 1 NA NA NA NA NA NA NA
## 2 8363 2355 27047 438 1791 2401 2784
## 3 NA NA NA NA NA NA NA
## 4 734 2157 2332 40 126 191 217
## IBGE_15-59 IBGE_60+ IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$
## 1 NA NA NA NA
## 2 16586 3047 783 468
## 3 NA NA NA NA
## 4 1419 339 358 496
## IDHM Ranking 2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao ALT
## 1 NA NA NA NA NA NA
## 2 4380 0.591 0.583 0.756 0.469 555.25
## 3 4493 0.590 0.549 0.804 0.459 222.51
## 4 NA NA NA NA NA NA
## PAY_TV FIXED_PHONES AREA REGIAO_TUR CATEGORIA_TUR
## 1 NA NA NA <NA> <NA>
## 2 175 673 382.47 <NA> <NA>
## 3 92 125 NA Caminhos Do Jiquiriçá D
## 4 NA NA 719.26 <NA> <NA>
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 1 NA <NA> NA NA
## 2 37119 Intermediário Adjacente 17.34 34337.46
## 3 10345 <NA> NA NA
## 4 NA Rural Adjacente 13235.20 5398.61
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP
## 1 NA NA NA NA NA NA
## 2 113666.63 133.73 299079.63 23230.19 322309.83 36895
## 3 NA NA NA NA NA NA
## 4 17754.37 32630.97 69019.14 3149.33 72168.48 10619
## GDP_CAPITA
## 1 NA
## 2 8735.87
## 3 NA
## 4 6796.16
## GVA_MAIN
## 1 <NA>
## 2 Administração, defesa, educação e saúde públicas e seguridade social
## 3 <NA>
## 4 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 53609740 NA NA NA NA NA NA NA NA
## 2 NA 244 2 2 43 0 1 11 119
## 3 22225531 NA NA NA NA NA NA NA NA
## 4 NA 74 2 1 4 0 0 3 37
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q
## 1 NA NA NA NA NA NA NA NA NA NA
## 2 10 15 2 0 3 4 4 4 12 1
## 3 NA NA NA NA NA NA NA NA NA NA
## 4 0 3 1 0 0 1 2 2 12 2
## COMP_R COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank
## 1 NA NA NA NA NA NA NA NA NA
## 2 1 10 0 0 NA NA 0 1 0
## 3 NA NA NA NA NA NA NA NA NA
## 4 0 4 0 0 NA NA NA NA NA
## Pu_Bank Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC
## 1 NA NA NA NA NA NA NA NA
## 2 1 0 37171342 3619 5478 0 NA NA
## 3 NA NA NA 810 453 0 NA NA
## 4 NA NA NA NA NA NA NA NA
## WAL-MART POST_OFFICES geom
## 1 NA NA POLYGON ((-36.11492 -8.1952...
## 2 NA 3 POLYGON ((-36.11492 -8.1952...
## 3 NA NA MULTIPOLYGON (((-39.56748 -...
## 4 NA 1 MULTIPOLYGON (((-39.56748 -...
As seen above, its due to the fact that the original brazil dataset consists of duplicated data whereby the 2 of the cities are repeated twice. The spelling differs by one letter (São Caetano vs São Caitano) and (Santa Teresinha vs Santa Terezinha). I googled and they each represent the same place. Hence, we removed 1 of the 2 for both. We will remove the ones with GDP_CAPITA is NA. In this case, since we also need to remove those with GDP_CAPITA == NA for all datarows, we will simply remove all datasets with GDP_CAPITA == NA as they are irrelevant for the analysis.
brazil_muni_o <- brazil_muni %>% filter(!is.na(GDP_CAPITA))
nrow(brazil_muni_o)
## [1] 5569
summary(brazil_muni_o)
## code_muni name_muni code_state abbrev_state
## Min. :1100015 Bom Jesus : 5 31 : 853 MG : 853
## 1st Qu.:2512101 São Domingos: 5 35 : 645 SP : 645
## Median :3146255 Bonito : 4 43 : 497 RS : 497
## Mean :3253419 Santa Helena: 4 29 : 417 BA : 417
## 3rd Qu.:4119152 Santa Inês : 4 41 : 399 PR : 399
## Max. :5300108 Santa Luzia : 4 42 : 294 SC : 294
## (Other) :5543 (Other):2464 (Other):2464
## CITY STATE CAPITAL
## Length:5569 Length:5569 Min. :0.000000
## Class :character Class :character 1st Qu.:0.000000
## Mode :character Mode :character Median :0.000000
## Mean :0.004848
## 3rd Qu.:0.000000
## Max. :1.000000
##
## IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## Min. : 805 Min. : 805 Min. : 0.0
## 1st Qu.: 5235 1st Qu.: 5230 1st Qu.: 0.0
## Median : 10934 Median : 10926 Median : 0.0
## Mean : 34278 Mean : 34200 Mean : 77.5
## 3rd Qu.: 23424 3rd Qu.: 23390 3rd Qu.: 10.0
## Max. :11253503 Max. :11133776 Max. :119727.0
## NA's :4 NA's :4 NA's :4
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP
## Min. : 239 Min. : 60 Min. : 3 Min. : 174
## 1st Qu.: 1572 1st Qu.: 874 1st Qu.: 487 1st Qu.: 2801
## Median : 3174 Median : 1846 Median : 931 Median : 6170
## Mean : 10303 Mean : 8859 Mean : 1463 Mean : 27595
## 3rd Qu.: 6726 3rd Qu.: 4624 3rd Qu.: 1832 3rd Qu.: 15302
## Max. :3576148 Max. :3548433 Max. :33809 Max. :10463636
## NA's :6 NA's :6 NA's :77 NA's :4
## IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## Min. : 0.0 Min. : 5 Min. : 7 Min. : 12
## 1st Qu.: 38.0 1st Qu.: 158 1st Qu.: 220 1st Qu.: 259
## Median : 92.0 Median : 376 Median : 516 Median : 588
## Mean : 383.3 Mean : 1544 Mean : 2069 Mean : 2381
## 3rd Qu.: 232.0 3rd Qu.: 951 3rd Qu.: 1300 3rd Qu.: 1478
## Max. :129464.0 Max. :514794 Max. :684443 Max. :783702
## NA's :4 NA's :4 NA's :4 NA's :4
## IBGE_15-59 IBGE_60+ IBGE_PLANTED_AREA
## Min. : 94 Min. : 29 Min. : 0
## 1st Qu.: 1734 1st Qu.: 341 1st Qu.: 911
## Median : 3841 Median : 722 Median : 3473
## Mean : 18212 Mean : 3004 Mean : 14182
## 3rd Qu.: 9628 3rd Qu.: 1724 3rd Qu.: 11201
## Max. :7058221 Max. :1293012 Max. :1205669
## NA's :4 NA's :4
## IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 2327 1st Qu.:1392 1st Qu.:0.5990
## Median : 13846 Median :2782 Median :0.6650
## Mean : 57394 Mean :2783 Mean :0.6592
## 3rd Qu.: 55623 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :3274885 Max. :5565 Max. :0.8620
## NA's :5 NA's :5
## IDHM_Renda IDHM_Longevidade IDHM_Educacao ALT
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. : 0.0
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.: 169.7
## Median :0.6540 Median :0.8080 Median :0.5600 Median : 406.5
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean : 894.0
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.: 629.0
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :874579.0
## NA's :5 NA's :5 NA's :5 NA's :6
## PAY_TV FIXED_PHONES AREA
## Min. : 1.0 Min. : 3 Min. : 3.57
## 1st Qu.: 88.0 1st Qu.: 119 1st Qu.: 204.45
## Median : 247.0 Median : 328 Median : 416.59
## Mean : 3095.3 Mean : 6569 Mean : 1516.14
## 3rd Qu.: 815.2 3rd Qu.: 1151 3rd Qu.: 1026.44
## Max. :2047668.0 Max. :5543127 Max. :159533.33
## NA's :1 NA's :1 NA's :1
## REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## Length:5569 Length:5569 Min. : 786
## Class :character Class :character 1st Qu.: 5453
## Mode :character Mode :character Median : 11591
## Mean : 37442
## 3rd Qu.: 25299
## Max. :12176866
## NA's :1
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Length:5569 Min. : 0 Min. : 1
## Class :character 1st Qu.: 4193 1st Qu.: 1725
## Mode :character Median : 20430 Median : 7425
## Mean : 47279 Mean : 175959
## 3rd Qu.: 51238 3rd Qu.: 41026
## Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL
## Min. : 2 Min. : 7 Min. : 17
## 1st Qu.: 10113 1st Qu.: 17260 1st Qu.: 42254
## Median : 31212 Median : 35865 Median : 119504
## Mean : 489539 Mean : 123784 Mean : 833136
## 3rd Qu.: 115448 3rd Qu.: 89256 3rd Qu.: 313988
## Max. :464656988 Max. :41902893 Max. :569910503
##
## TAXES GDP POP_GDP
## Min. : -14159 Min. : 15 Min. : 815
## 1st Qu.: 1305 1st Qu.: 43707 1st Qu.: 5481
## Median : 5107 Median : 125196 Median : 11584
## Mean : 118885 Mean : 954740 Mean : 37003
## 3rd Qu.: 22208 3rd Qu.: 329717 3rd Qu.: 25088
## Max. :117125387 Max. :687035890 Max. :12038175
##
## GDP_CAPITA GVA_MAIN MUN_EXPENDIT
## Min. : 3191 Length:5569 Min. :1.421e+06
## 1st Qu.: 9062 Class :character 1st Qu.:1.574e+07
## Median : 15873 Mode :character Median :2.747e+07
## Mean : 21128 Mean :1.044e+08
## 3rd Qu.: 26155 3rd Qu.:5.675e+07
## Max. :314638 Max. :4.577e+10
## NA's :1491
## COMP_TOT COMP_A COMP_B COMP_C
## Min. : 6.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 162.0 Median : 2.00 Median : 0.000 Median : 11.00
## Mean : 906.9 Mean : 18.26 Mean : 1.852 Mean : 73.45
## 3rd Qu.: 448.0 3rd Qu.: 8.00 3rd Qu.: 2.000 3rd Qu.: 39.00
## Max. :530446.0 Max. :1948.00 Max. :274.000 Max. :31566.00
##
## COMP_D COMP_E COMP_F COMP_G
## Min. : 0.0000 Min. : 0.000 Min. : 0.00 Min. : 1
## 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 32
## Median : 0.0000 Median : 0.000 Median : 4.00 Median : 75
## Mean : 0.4263 Mean : 2.029 Mean : 43.27 Mean : 348
## 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00 3rd Qu.: 199
## Max. :332.0000 Max. :657.000 Max. :25222.00 Max. :150633
##
## COMP_H COMP_I COMP_J COMP_K
## Min. : 0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 1 1st Qu.: 2.00 1st Qu.: 0.00 1st Qu.: 0.00
## Median : 7 Median : 7.00 Median : 1.00 Median : 0.00
## Mean : 41 Mean : 55.89 Mean : 24.75 Mean : 15.55
## 3rd Qu.: 25 3rd Qu.: 24.00 3rd Qu.: 5.00 3rd Qu.: 2.00
## Max. :19515 Max. :29290.00 Max. :38720.00 Max. :23738.00
##
## COMP_L COMP_M COMP_N COMP_O
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 1.0 1st Qu.: 1.00 1st Qu.: 2.00
## Median : 0.00 Median : 4.0 Median : 4.00 Median : 2.00
## Mean : 15.14 Mean : 51.3 Mean : 83.71 Mean : 3.27
## 3rd Qu.: 3.00 3rd Qu.: 13.0 3rd Qu.: 14.00 3rd Qu.: 3.00
## Max. :14003.00 Max. :49181.0 Max. :76757.00 Max. :204.00
##
## COMP_P COMP_Q COMP_R
## Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00
## Median : 6.00 Median : 3.00 Median : 2.00
## Mean : 30.97 Mean : 34.16 Mean : 12.18
## 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00
## Max. :16030.00 Max. :22248.00 Max. :6687.00
##
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.61 Mean :0 Mean : 0.05028 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :4682
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 257.5 Mean : 3.383 Mean : 2.829 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :4682 NA's :2227 NA's :2227 NA's :2227
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602
## Median :2.00 Median :3.231e+07 Median :1.339e+08 Median : 1438
## Mean :1.58 Mean :9.180e+09 Mean :6.005e+09 Mean : 9862
## 3rd Qu.:2.00 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4087
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2227 NA's :2227 NA's :2227 NA's :9
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 Min. :1 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 1st Qu.:1 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median :1 Median : 2.000
## Mean : 4880 Mean : 5.756 Mean :1 Mean : 4.277
## 3rd Qu.: 3296 3rd Qu.: 1.000 3rd Qu.:1 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :1 Max. :130.000
## NA's :9 NA's :9 NA's :5444 NA's :5403
## WAL-MART POST_OFFICES geom
## Min. : 1.000 Min. : 1.00 MULTIPOLYGON :2772
## 1st Qu.: 1.000 1st Qu.: 1.00 POLYGON :2797
## Median : 1.000 Median : 1.00 epsg:4674 : 0
## Mean : 2.059 Mean : 2.08 +proj=long...: 0
## 3rd Qu.: 1.750 3rd Qu.: 2.00
## Max. :26.000 Max. :225.00
## NA's :5467 NA's :117
head(brazil_muni_o)
## Simple feature collection with 6 features and 83 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
## code_muni name_muni code_state abbrev_state
## 1 1100015 Alta Floresta D'oeste 11 RO
## 2 1100023 Ariquemes 11 RO
## 3 1100031 Cabixi 11 RO
## 4 1100049 Cacoal 11 RO
## 5 1100056 Cerejeiras 11 RO
## 6 1100064 Colorado Do Oeste 11 RO
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 1 Alta Floresta D'Oeste RO 0 24392 24392
## 2 Ariquemes RO 0 90353 90240
## 3 Cabixi RO 0 6313 6310
## 4 Cacoal RO 0 78574 78536
## 5 Cerejeiras RO 0 17029 17011
## 6 Colorado Do Oeste RO 0 18591 18562
## IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 1 0 7276 4306 2970 13804 203
## 2 113 27236 22971 4265 69068 1122
## 3 3 1979 887 1092 2689 43
## 4 38 24045 19411 4634 61699 923
## 5 18 5346 4561 785 13755 200
## 6 29 5962 4441 1521 13404 189
## IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IBGE_PLANTED_AREA
## 1 869 1159 1281 9023 1269 18288
## 2 4383 6101 6870 46000 4592 7115
## 3 191 206 272 1742 235 41086
## 4 3599 4790 5722 42027 4638 18180
## 5 827 1129 1333 8976 1290 52326
## 6 822 1002 1196 8801 1394 6268
## IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM IDHM_Renda
## 1 101470 3283 0.640 0.657
## 2 33365 1847 0.700 0.716
## 3 107975 3117 0.650 0.650
## 4 174665 1377 0.718 0.727
## 5 150755 2141 0.690 0.688
## 6 21765 2326 0.685 0.676
## IDHM_Longevidade IDHM_Educacao ALT PAY_TV FIXED_PHONES AREA
## 1 0.763 0.526 337.74 240 687 7067.03
## 2 0.806 0.600 138.69 2267 9191 4426.57
## 3 0.757 0.559 236.06 50 188 1314.35
## 4 0.821 0.620 177.45 1806 6491 3792.89
## 5 0.799 0.602 262.81 307 1215 2783.30
## 6 0.814 0.584 419.09 235 1107 1451.06
## REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1 Vale Do Guaporé D 23167
## 2 Vale Do Jamari C 106168
## 3 Vale Do Guaporé D 5438
## 4 Br 364 - Caminhos De Rondon C 84813
## 5 <NA> <NA> 16444
## 6 <NA> <NA> 16227
## RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 1 Intermediário Adjacente 166143.38 31.27 114455.32 142727.55
## 2 Urbano 145068.79 353163.07 879.97 589088.78
## 3 Rural Adjacente 59081.09 4623.27 24091.22 40041.59
## 4 Urbano 188004.72 230109.20 846490.22 485242.29
## 5 Urbano 52891.37 22445.48 175059.55 99495.64
## 6 Intermediário Adjacente 72684.77 28.37 85558.72 100799.66
## GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 1 454596.58 23186.17 477782.7 25506 18732.17
## 2 1967287.27 216095.93 2183383.2 105896 20618.18
## 3 127837.15 5.51 133345.4 6289 21202.96
## 4 1749846.43 194940.22 1944786.6 87877 22130.78
## 5 349.89 58.16 408047.8 17959 22721.08
## 6 287409.88 18466.33 305876.2 18639 16410.55
## GVA_MAIN
## 1 Administração, defesa, educação e saúde públicas e seguridade social
## 2 Demais serviços
## 3 Administração, defesa, educação e saúde públicas e seguridade social
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 50218466 508 3 1 38 7 4 26 270
## 2 201979608 2221 24 28 223 1 12 79 1054
## 3 17904387 60 0 0 7 0 0 1 32
## 4 168366053 1846 13 9 180 1 7 80 839
## 5 42641592 379 6 0 40 0 1 7 185
## 6 31611296 264 3 2 29 3 0 11 147
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q
## 1 23 20 8 1 3 12 10 4 21 13
## 2 68 136 27 32 12 91 116 5 68 74
## 3 2 4 2 0 0 1 2 2 2 1
## 4 76 102 34 19 31 83 93 4 52 104
## 5 19 20 3 1 3 15 8 3 10 12
## 6 11 13 3 3 0 8 9 4 5 4
## COMP_R COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank
## 1 6 38 0 0 NA NA 1 2 1
## 2 28 143 0 0 1 78 2 4 2
## 3 0 4 0 0 1 27 NA NA NA
## 4 18 101 0 0 1 40 2 3 2
## 5 2 44 0 0 1 40 1 3 1
## 6 1 8 0 0 NA NA 1 2 1
## Pu_Bank Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC
## 1 2 38958866 245288714 2464 9268 1 NA NA
## 2 3 156637075 2487942240 17513 42664 1 NA NA
## 3 NA NA NA 925 1838 0 NA NA
## 4 3 145453086 2339395610 17561 38073 3 NA NA
## 5 3 57939053 392087789 2948 6439 0 NA NA
## 6 2 77276523 237735184 3158 6874 0 NA NA
## WAL-MART POST_OFFICES geom
## 1 NA 1 POLYGON ((-62.19465 -11.827...
## 2 NA 2 POLYGON ((-62.53648 -9.7322...
## 3 NA 1 POLYGON ((-60.37075 -13.363...
## 4 NA 2 POLYGON ((-61.0008 -11.2973...
## 5 NA 1 POLYGON ((-61.49976 -13.005...
## 6 NA 1 POLYGON ((-60.50475 -12.966...
Lets review the data dictionary again.
data_dict
## # A tibble: 96 x 6
## FIELD DESCRIPTION REFERENCE UNIT SOURCE X6
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 CITY Name of the City <NA> <NA> - <NA>
## 2 STATE Name of the State <NA> <NA> - <NA>
## 3 CAPITAL 1 if Capital of State <NA> <NA> - <NA>
## 4 IBGE_RES_~ "Resident Population " 2010 - https://sidra.i~ <NA>
## 5 IBGE_RES_~ Resident Population B~ 2010 - https://sidra.i~ <NA>
## 6 IBGE_RES_~ Redident Population F~ 2010 - https://sidra.i~ <NA>
## 7 IBGE_DU "Domestic Units Total~ 2010 - https://sidra.i~ <NA>
## 8 IBGE_DU_U~ "Domestic Units Urban~ 2010 - https://sidra.i~ <NA>
## 9 IBGE_DU_R~ Domestic Units Rural 2010 - https://sidra.i~ <NA>
## 10 IBGE_POP Resident Population R~ 2010 - https://sidra.i~ <NA>
## # ... with 86 more rows
First, we will decide which variables to use for our analysis. We will be choosing the following for our independent variables: IBGE_15-59 - Resident Population Regular Urban Planning - from 15 to 59 y.o IDHM_Renda - HDI GNI Index IDHM_Longevidade - HDI Life Expectancy index IDHM_Educacao - HDI Education index TAXES - Taxes GVA_AGROPEC - Gross Added Value - Agropecuary GVA_INDUSTRY - Gross Added Value - Industry GVA_SERVICES - Gross Added Value - Services GVA_PUBLIC - Gross Added Value - Public Services
We decide too omit other variables as they are irrelevant, and some of the variables re the overall variables for the variables that i have chosen. I chose IBGE_15-59 as it represents the economy active group. GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES and GVA_PUBLIC are part of the economy drivers. i included HDI Life Expectancy index as according to a study from https://ideas.repec.org/a/adm/journl/v8y2019i6p74-79.html, the increase in life expectancy is accompanied with the increase in Gross Domestic Product (GDP) per capita income in a state. We have also included the population growth rate as another important factor contributing towards GDP. Taxes are included as Tax cuts boost demand by increasing disposable income and by encouraging businesses to hire and invest more. Hence this affects the GDP economically.
I included HDI Education index as it directly affects economic growth. An increase in workers’ educational level improves their human capital, increasing the productivity of these workers and the economy’s output.
brazil_muni_var <- subset(brazil_muni_o,select=c(`code_muni`,
`name_muni`,
`STATE`,
`GDP_CAPITA`,
`IBGE_15-59`,
`IDHM_Renda`,
`IDHM_Longevidade`,
`IDHM_Educacao`,
`TAXES`,
`GVA_AGROPEC`,
`GVA_INDUSTRY`,
`GVA_SERVICES`,
`GVA_PUBLIC`
))
head(brazil_muni_var)
## Simple feature collection with 6 features and 13 fields
## geometry type: POLYGON
## dimension: XY
## bbox: xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
## code_muni name_muni STATE GDP_CAPITA IBGE_15-59 IDHM_Renda
## 1 1100015 Alta Floresta D'oeste RO 18732.17 9023 0.657
## 2 1100023 Ariquemes RO 20618.18 46000 0.716
## 3 1100031 Cabixi RO 21202.96 1742 0.650
## 4 1100049 Cacoal RO 22130.78 42027 0.727
## 5 1100056 Cerejeiras RO 22721.08 8976 0.688
## 6 1100064 Colorado Do Oeste RO 16410.55 8801 0.676
## IDHM_Longevidade IDHM_Educacao TAXES GVA_AGROPEC GVA_INDUSTRY
## 1 0.763 0.526 23186.17 166143.38 31.27
## 2 0.806 0.600 216095.93 145068.79 353163.07
## 3 0.757 0.559 5.51 59081.09 4623.27
## 4 0.821 0.620 194940.22 188004.72 230109.20
## 5 0.799 0.602 58.16 52891.37 22445.48
## 6 0.814 0.584 18466.33 72684.77 28.37
## GVA_SERVICES GVA_PUBLIC geom
## 1 114455.32 142727.55 POLYGON ((-62.19465 -11.827...
## 2 879.97 589088.78 POLYGON ((-62.53648 -9.7322...
## 3 24091.22 40041.59 POLYGON ((-60.37075 -13.363...
## 4 846490.22 485242.29 POLYGON ((-61.0008 -11.2973...
## 5 175059.55 99495.64 POLYGON ((-61.49976 -13.005...
## 6 85558.72 100799.66 POLYGON ((-60.50475 -12.966...
summary(brazil_muni_var)
## code_muni name_muni STATE GDP_CAPITA
## Min. :1100015 Bom Jesus : 5 Length:5569 Min. : 3191
## 1st Qu.:2512101 São Domingos: 5 Class :character 1st Qu.: 9062
## Median :3146255 Bonito : 4 Mode :character Median : 15873
## Mean :3253419 Santa Helena: 4 Mean : 21128
## 3rd Qu.:4119152 Santa Inês : 4 3rd Qu.: 26155
## Max. :5300108 Santa Luzia : 4 Max. :314638
## (Other) :5543
## IBGE_15-59 IDHM_Renda IDHM_Longevidade IDHM_Educacao
## Min. : 94 Min. :0.4000 Min. :0.6720 Min. :0.2070
## 1st Qu.: 1734 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900
## Median : 3841 Median :0.6540 Median :0.8080 Median :0.5600
## Mean : 18212 Mean :0.6429 Mean :0.8016 Mean :0.5591
## 3rd Qu.: 9628 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310
## Max. :7058221 Max. :0.8910 Max. :0.8940 Max. :0.8250
## NA's :4 NA's :5 NA's :5 NA's :5
## TAXES GVA_AGROPEC GVA_INDUSTRY
## Min. : -14159 Min. : 0 Min. : 1
## 1st Qu.: 1305 1st Qu.: 4193 1st Qu.: 1725
## Median : 5107 Median : 20430 Median : 7425
## Mean : 118885 Mean : 47279 Mean : 175959
## 3rd Qu.: 22208 3rd Qu.: 51238 3rd Qu.: 41026
## Max. :117125387 Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC geom
## Min. : 2 Min. : 7 MULTIPOLYGON :2772
## 1st Qu.: 10113 1st Qu.: 17260 POLYGON :2797
## Median : 31212 Median : 35865 epsg:4674 : 0
## Mean : 489539 Mean : 123784 +proj=long...: 0
## 3rd Qu.: 115448 3rd Qu.: 89256
## Max. :464656988 Max. :41902893
##
As there are some NA data, we replace those with 0.
brazil_muni_var[is.na(brazil_muni_var)] = 0
The code chunk below is used to plot a scatterplot matrix of the relationship between the independent variables in brazil_muni_var data.frame. To avoid multticollinearity, we will plot the scatterplot to better understand the bivariate relationship.
independent <- brazil_muni_var %>% select(-c(`GDP_CAPITA`, `code_muni`, `name_muni`, `STATE`)) %>%st_set_geometry(NULL)
independent.cor = cor(independent)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(independent.cor, method="color", col=col(200),
type="upper", order="hclust",
addCoef.col = "black", # Add coefficient of correlation
tl.col="black", tl.srt=45, #Text label color and rotation
# Combine with significance
sig.level = 0.01, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag=FALSE
)
As we can see from the above, the following pairs have a high correlation: 1. IBGE_15-59 & GVA_SERVICES: 0.94 2. AXES & GVVA_SERVICES:0.92 3. GVA_INDUSTRY & IBGE_15-59: 0.89 4. GVA_PUBLIC & IBGE_15-59: 0.87 5. GVA_INDUSTRY & GVA_SERVICES: 0.85 6. TAXES & GVA_PUBLIC: 0.85 7. GVA_PUBLIC & GVA_SERVICES: 0.85
Number 1 and 2 has extremely high correlation. Between both, we can see that the common variable is GVA_SERVICES. Also, we can see that GVA_SERVICES is highly correlated with the most variables. Hence, we remove it.
independent <- brazil_muni_var %>% select(-c(`GDP_CAPITA`, `code_muni`, `name_muni`, `STATE`, GVA_SERVICES, `IBGE_15-59`, `TAXES`)) %>%st_set_geometry(NULL)
independent.cor = cor(independent)
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(independent.cor, method="color", col=col(200),
type="upper", order="hclust",
addCoef.col = "black", # Add coefficient of correlation
tl.col="black", tl.srt=45, #Text label color and rotation
# Combine with significance
sig.level = 0.01, insig = "blank",
# hide correlation coefficient on the principal diagonal
diag=FALSE
)
brazil_muni_var_o <- brazil_muni_var %>% select(-c(GVA_SERVICES))
tm_shape(brazil_muni_var_o)+
tm_borders(alpha = 0.5) +
tm_fill(col = "GDP_CAPITA", alpha = 0.9, id = "name_muni")
The plot shows that the GDP_CAPITAL in most of the municipalities ae in the lower range of 0 to 50,000. We can see that ome muncipalities in the Central to the South of Brazil have a higher range above 50,000.
summary(brazil_muni_var_o)
## code_muni name_muni STATE GDP_CAPITA
## Min. :1100015 Bom Jesus : 5 Length:5569 Min. : 3191
## 1st Qu.:2512101 São Domingos: 5 Class :character 1st Qu.: 9062
## Median :3146255 Bonito : 4 Mode :character Median : 15873
## Mean :3253419 Santa Helena: 4 Mean : 21128
## 3rd Qu.:4119152 Santa Inês : 4 3rd Qu.: 26155
## Max. :5300108 Santa Luzia : 4 Max. :314638
## (Other) :5543
## IBGE_15-59 IDHM_Renda IDHM_Longevidade IDHM_Educacao
## Min. : 0 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 1731 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900
## Median : 3838 Median :0.6540 Median :0.8080 Median :0.5600
## Mean : 18199 Mean :0.6423 Mean :0.8008 Mean :0.5586
## 3rd Qu.: 9591 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310
## Max. :7058221 Max. :0.8910 Max. :0.8940 Max. :0.8250
##
## TAXES GVA_AGROPEC GVA_INDUSTRY
## Min. : -14159 Min. : 0 Min. : 1
## 1st Qu.: 1305 1st Qu.: 4193 1st Qu.: 1725
## Median : 5107 Median : 20430 Median : 7425
## Mean : 118885 Mean : 47279 Mean : 175959
## 3rd Qu.: 22208 3rd Qu.: 51238 3rd Qu.: 41026
## Max. :117125387 Max. :1402282 Max. :63306755
##
## GVA_PUBLIC geom
## Min. : 7 MULTIPOLYGON :2772
## 1st Qu.: 17260 POLYGON :2797
## Median : 35865 epsg:4674 : 0
## Mean : 123784 +proj=long...: 0
## 3rd Qu.: 89256
## Max. :41902893
##
GDP_CAPITA_plot <- ggplot(data=brazil_muni_o, aes(x= `GDP_CAPITA`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ECONOMY_ACTIVE_plot <- ggplot(data=brazil_muni_o, aes(x= `IBGE_15-59`)) +
geom_histogram(bins=20, color="black", fill="light blue")
IDHM_Renda_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Renda`)) +
geom_histogram(bins=20, color="black", fill="light blue")
IDHM_Longevidade_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Longevidade`)) +
geom_histogram(bins=20, color="black", fill="light blue")
IDHM_Educacao_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Educacao`)) +
geom_histogram(bins=20, color="black", fill="light blue")
TAXES_plot <- ggplot(data=brazil_muni_o, aes(x= `TAXES`)) +
geom_histogram(bins=20, color="black", fill="light blue")
GVA_AGROPEC_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_AGROPEC`)) +
geom_histogram(bins=20, color="black", fill="light blue")
GVA_INDUSTRY_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_INDUSTRY`)) +
geom_histogram(bins=20, color="black", fill="light blue")
GVA_PUBLIC_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_PUBLIC`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(GDP_CAPITA_plot,ECONOMY_ACTIVE_plot,IDHM_Renda_plot,IDHM_Longevidade_plot,IDHM_Educacao_plot,TAXES_plot,GVA_AGROPEC_plot, GVA_INDUSTRY_plot, GVA_PUBLIC_plot, ncol = 3, nrow = 3)
summary(brazil_muni_var_o)
## code_muni name_muni STATE GDP_CAPITA
## Min. :1100015 Bom Jesus : 5 Length:5569 Min. : 3191
## 1st Qu.:2512101 São Domingos: 5 Class :character 1st Qu.: 9062
## Median :3146255 Bonito : 4 Mode :character Median : 15873
## Mean :3253419 Santa Helena: 4 Mean : 21128
## 3rd Qu.:4119152 Santa Inês : 4 3rd Qu.: 26155
## Max. :5300108 Santa Luzia : 4 Max. :314638
## (Other) :5543
## IBGE_15-59 IDHM_Renda IDHM_Longevidade IDHM_Educacao
## Min. : 0 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.: 1731 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900
## Median : 3838 Median :0.6540 Median :0.8080 Median :0.5600
## Mean : 18199 Mean :0.6423 Mean :0.8008 Mean :0.5586
## 3rd Qu.: 9591 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310
## Max. :7058221 Max. :0.8910 Max. :0.8940 Max. :0.8250
##
## TAXES GVA_AGROPEC GVA_INDUSTRY
## Min. : -14159 Min. : 0 Min. : 1
## 1st Qu.: 1305 1st Qu.: 4193 1st Qu.: 1725
## Median : 5107 Median : 20430 Median : 7425
## Mean : 118885 Mean : 47279 Mean : 175959
## 3rd Qu.: 22208 3rd Qu.: 51238 3rd Qu.: 41026
## Max. :117125387 Max. :1402282 Max. :63306755
##
## GVA_PUBLIC geom
## Min. : 7 MULTIPOLYGON :2772
## 1st Qu.: 17260 POLYGON :2797
## Median : 35865 epsg:4674 : 0
## Mean : 123784 +proj=long...: 0
## 3rd Qu.: 89256
## Max. :41902893
##
Next, we can see that the data range is very big for GDP_CAPITA, GVA_AGROPEC, GVA_INDUSTRY, GVA_PUBLIC, TAXES and IBGE_15-59.
brazil_muni_var_o$GDP_CAPITA.std <- normalize(brazil_muni_var_o$GDP_CAPITA)
brazil_muni_var_o$GVA_AGROPEC.std <- normalize(brazil_muni_var_o$GVA_AGROPEC)
brazil_muni_var_o$GVA_INDUSTRY.std <- normalize(brazil_muni_var_o$GVA_INDUSTRY)
brazil_muni_var_o$GVA_PUBLIC.std <- normalize(brazil_muni_var_o$GVA_PUBLIC)
brazil_muni_var_o$TAXES.std <- normalize(brazil_muni_var_o$TAXES)
brazil_muni_var_o$IBGE_ECONOMY_ACTIVE.std <- normalize(brazil_muni_var_o$`IBGE_15-59`)
As the 0 values that are being standardised may return Inf values, we will change those to 0.
brazil_muni_var_o[is.na(brazil_muni_var_o)] = 0
We remove the columns that we no longer need
brazil_muni_std <- subset(brazil_muni_var_o, select = -c(`GDP_CAPITA`,`GVA_AGROPEC`, `GVA_INDUSTRY`, `GVA_PUBLIC`,`TAXES`,`IBGE_15-59`))
For the subsequent analysis, we will only use code muni to identify each row and hence, we will no longer need state and name_muni. We remove these 2.
brazil_muni_code <- subset(brazil_muni_std, select = -c(`STATE`, `name_muni`))
We need to change the rows by code_muni instead of row number and remove the geometry by using the code chunk below
row.names(brazil_muni_code) <- brazil_muni_code$"code_muni"
brazil_muni_code_var <- select(brazil_muni_code, c(2:ncol(brazil_muni_code))) %>% st_set_geometry(NULL)
IDHM_Renda IDHM_Longevidade IDHM_Educacao GVA_AGROPEC GVA_INDUSTRY GVA_PUBLIC
brazil_muni.mlr <- lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + GVA_PUBLIC.std + TAXES.std +IBGE_ECONOMY_ACTIVE.std, data=brazil_muni_code_var)
summary(brazil_muni.mlr)
##
## Call:
## lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade +
## IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + GVA_PUBLIC.std +
## TAXES.std + IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49455 -0.01995 -0.00834 0.00534 0.82985
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10703 0.01262 -8.484 < 2e-16 ***
## IDHM_Renda 0.37800 0.01911 19.783 < 2e-16 ***
## IDHM_Longevidade -0.13779 0.02396 -5.750 9.39e-09 ***
## IDHM_Educacao 0.04642 0.01291 3.595 0.000327 ***
## GVA_AGROPEC.std 0.18801 0.01119 16.809 < 2e-16 ***
## GVA_INDUSTRY.std 2.09088 0.07596 27.527 < 2e-16 ***
## GVA_PUBLIC.std 0.02381 0.09381 0.254 0.799677
## TAXES.std 0.74150 0.09776 7.585 3.88e-14 ***
## IBGE_ECONOMY_ACTIVE.std -2.64081 0.11534 -22.895 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05099 on 5560 degrees of freedom
## Multiple R-squared: 0.3909, Adjusted R-squared: 0.39
## F-statistic: 446 on 8 and 5560 DF, p-value: < 2.2e-16
With reference to the report above, GVA_PUBLIC.std is statistically insignificant and thus we remove it.
brazil_muni.mlr2 <- lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std, data=brazil_muni_code_var)
summary(brazil_muni.mlr2)
##
## Call:
## lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade +
## IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std +
## IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49709 -0.01995 -0.00835 0.00535 0.82990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10699 0.01261 -8.482 < 2e-16 ***
## IDHM_Renda 0.37821 0.01909 19.815 < 2e-16 ***
## IDHM_Longevidade -0.13803 0.02394 -5.765 8.61e-09 ***
## IDHM_Educacao 0.04649 0.01291 3.602 0.000318 ***
## GVA_AGROPEC.std 0.18803 0.01118 16.812 < 2e-16 ***
## GVA_INDUSTRY.std 2.08843 0.07533 27.722 < 2e-16 ***
## TAXES.std 0.75257 0.08748 8.603 < 2e-16 ***
## IBGE_ECONOMY_ACTIVE.std -2.62689 0.10145 -25.894 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.05099 on 5561 degrees of freedom
## Multiple R-squared: 0.3909, Adjusted R-squared: 0.3901
## F-statistic: 509.8 on 7 and 5561 DF, p-value: < 2.2e-16
As seen above, it is clear that all the indepent variables are statistically significant as their p value are all less than the alpha value of 0.05.
ols_regress(brazil_muni.mlr2)
## Model Summary
## --------------------------------------------------------------
## R 0.625 RMSE 0.051
## R-Squared 0.391 Coef. Var 88.525
## Adj. R-Squared 0.390 MSE 0.003
## Pred R-Squared 0.320 MAE 0.025
## --------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ----------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ----------------------------------------------------------------------
## Regression 9.277 7 1.325 509.815 0.0000
## Residual 14.456 5561 0.003
## Total 23.734 5568
## ----------------------------------------------------------------------
##
## Parameter Estimates
## -----------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -----------------------------------------------------------------------------------------------------
## (Intercept) -0.107 0.013 -8.482 0.000 -0.132 -0.082
## IDHM_Renda 0.378 0.019 0.480 19.815 0.000 0.341 0.416
## IDHM_Longevidade -0.138 0.024 -0.107 -5.765 0.000 -0.185 -0.091
## IDHM_Educacao 0.046 0.013 0.067 3.602 0.000 0.021 0.072
## GVA_AGROPEC.std 0.188 0.011 0.182 16.812 0.000 0.166 0.210
## GVA_INDUSTRY.std 2.088 0.075 0.638 27.722 0.000 1.941 2.236
## TAXES.std 0.753 0.087 0.168 8.603 0.000 0.581 0.924
## IBGE_ECONOMY_ACTIVE.std -2.627 0.101 -0.713 -25.894 0.000 -2.826 -2.428
## -----------------------------------------------------------------------------------------------------
As seen from the model summary, the model explains 39.1% of the variation in GDP_CAPITAL. IDHM_Longevidade and IBGE_ECONOMY_ACTIVE.std have a negative relationship with GDP_CAPITAL while the rest have a positive relationship with GDP_CAPITAL.
ols_vif_tol(brazil_muni.mlr2)
## Variables Tolerance VIF
## 1 IDHM_Renda 0.1865197 5.361363
## 2 IDHM_Longevidade 0.3167394 3.157170
## 3 IDHM_Educacao 0.3120479 3.204637
## 4 GVA_AGROPEC.std 0.9337639 1.070935
## 5 GVA_INDUSTRY.std 0.2065560 4.841302
## 6 TAXES.std 0.2867399 3.487481
## 7 IBGE_ECONOMY_ACTIVE.std 0.1442663 6.931627
As seeen above, all VIF are below 10 and thus, there are no signs of multicollinearity.
ols_plot_resid_fit(brazil_muni.mlr2)
The figure above reveals that most of the data points are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent variable and independent variables are linear.
Lastly, the code chunk below uses ols_plot_resid_hist() of olsrr package to perform normality assumption test.
ols_plot_resid_hist(brazil_muni.mlr2)
The figure reveals that the residual of the multiple linear regression model resemble normal distribution.
Ho = The residuals in brazil are randomly distributed.
H1= The residuals in brazil are not randomly distributed.
Confidence interval: 95%
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05.
mlr.output <- as.data.frame(brazil_muni.mlr2$residuals)
head(mlr.output)
## brazil_muni.mlr2$residuals
## 1100015 -0.02989087
## 1100023 -0.03995615
## 1100031 -0.01002947
## 1100049 -0.04115962
## 1100056 -0.01279513
## 1100064 -0.02770915
brazil_muni.res.sf <- cbind(brazil_muni_std,
brazil_muni.mlr2$residuals) %>%
rename(`MLR_RES` = `brazil_muni.mlr2.residuals`)
brazil_muni.sp <- as_Spatial(brazil_muni.res.sf)
brazil_muni.sp
## class : SpatialPolygonsDataFrame
## features : 5569
## extent : -73.99045, -28.83594, -33.75118, 5.271841 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## variables : 13
## names : code_muni, name_muni, STATE, IDHM_Renda, IDHM_Longevidade, IDHM_Educacao, GDP_CAPITA.std, GVA_AGROPEC.std, GVA_INDUSTRY.std, GVA_PUBLIC.std, TAXES.std, IBGE_ECONOMY_ACTIVE.std, MLR_RES
## min values : 1100015, Abadia De Goiás, AC, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.497089956814411
## max values : 5300108, Zortéa, TO, 0.891, 0.894, 0.825, 1, 1, 1, 1, 1, 1, 0.829896826754127
Task 3: Prepare a choropleth map showing the distribution of the residual of the GDP per capita.
tm_shape(brazil_muni.res.sf) +
tm_fill(col = "MLR_RES",
alpha = 0.6, style="quantile")
The plot above revealed that there is a sign of spatial autocorrelation. Hence, we will perform Moran’s I test to analyse if there is any spatial autocorrelation.
nb <- poly2nb(brazil_muni.sp, queen=TRUE)
summary(nb)
## Neighbour list object:
## Number of regions: 5569
## Number of nonzero links: 31796
## Percentage nonzero weights: 0.1025222
## Average number of links: 5.709463
## 2 regions with no links:
## 1526 3500
## Link number distribution:
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
## 2 14 102 387 1015 1304 1137 728 432 227 117 48 30 16 6
## 16 22
## 3 1
## 14 least connected regions:
## 279 606 1173 3180 3274 3713 4350 4408 4628 4642 4712 5000 5150 5513 with 1 link
## 1 most connected region:
## 3830 with 22 links
As we can see, Queens method is not appropriate as there are 2 with 0 connected regions.
coords <- coordinates(brazil_muni.sp)
k1 <- knn2nb(knearneigh(coords,k=1))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.386 11.747 15.880 21.372 23.968 369.663
The summary report shows that the largest first nearest neighbour distance is 369.663, so using this as the upper threshold gives certainty that all units will have at least 1 neighbour.
nb<- dnearneigh(coords, 0, 369.663 , longlat = TRUE)
nb
## Neighbour list object:
## Number of regions: 5569
## Number of nonzero links: 3033960
## Percentage nonzero weights: 9.782625
## Average number of links: 544.7944
nb_lw <- nb2listw(nb, zero.policy=TRUE, style = 'W')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5569
## Number of nonzero links: 3033960
## Percentage nonzero weights: 9.782625
## Average number of links: 544.7944
## Link number distribution:
##
## 1 4 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## 1 1 1 2 4 5 2 5 13 6 8 5 7 9 5 4 2 4
## 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42
## 8 9 6 2 9 6 4 4 1 4 2 1 2 1 2 5 4 4
## 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 2 2 5 3 7 5 5 5 7 3 7 9 8 8 6 12 6 4
## 61 62 63 64 65 66 67 69 70 71 72 73 74 75 76 77 78 80
## 1 2 3 1 4 2 4 2 2 5 3 4 3 2 7 2 3 5
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 96 97 98 100
## 3 3 1 3 2 2 1 5 2 2 3 1 1 4 1 3 1 2
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 118 119
## 4 2 1 2 1 1 4 3 2 1 4 5 2 4 4 6 1 4
## 120 121 123 124 125 126 128 129 130 131 132 133 134 135 136 137 138 139
## 2 3 2 1 5 2 2 1 2 6 3 1 1 6 4 3 1 4
## 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157
## 2 3 6 5 6 11 5 5 7 5 7 7 6 8 2 9 5 7
## 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
## 2 5 2 7 10 8 5 3 4 9 4 4 4 7 2 6 4 2
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194
## 3 1 1 5 2 3 1 1 3 2 1 1 3 7 6 4 5 1
## 195 196 197 198 200 201 202 203 204 205 206 207 208 209 210 211 212 213
## 1 2 1 2 4 2 4 2 1 1 1 2 5 5 2 3 7 3
## 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231
## 6 6 3 1 2 6 4 4 8 1 3 6 5 4 5 2 6 2
## 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249
## 4 4 1 9 3 4 5 3 8 2 5 5 7 3 10 6 8 3
## 250 251 252 253 254 255 256 257 258 259 260 262 263 264 265 266 267 268
## 4 6 5 4 6 5 10 5 3 1 8 7 6 10 4 2 6 10
## 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286
## 4 4 5 8 8 6 11 9 3 6 7 8 10 3 3 7 5 5
## 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304
## 5 7 9 6 2 6 6 3 5 2 4 3 5 3 8 6 5 4
## 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322
## 4 9 3 5 5 3 7 5 4 6 4 6 7 5 3 7 11 7
## 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## 3 7 5 5 4 4 5 9 5 6 5 5 6 5 7 6 6 5
## 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358
## 5 3 4 4 6 5 7 4 5 6 11 2 2 3 4 3 6 3
## 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376
## 4 10 6 5 4 9 8 4 7 5 9 2 5 6 5 5 8 7
## 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394
## 10 7 8 9 4 9 7 9 13 11 15 11 6 7 8 4 9 8
## 395 396 397 398 399 400 401 402 403 404 405 406 407 408 410 411 412 413
## 5 6 8 3 5 10 7 5 10 7 3 3 5 4 7 8 3 1
## 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431
## 9 4 5 2 4 6 6 6 7 7 5 6 1 1 4 9 4 4
## 432 433 434 435 436 437 438 439 440 441 442 443 444 445 447 448 449 450
## 3 3 3 3 3 4 7 1 2 4 7 4 6 2 4 2 4 1
## 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468
## 3 4 3 9 5 4 4 3 4 3 8 2 3 5 6 4 2 3
## 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 485 486 487
## 2 3 2 2 2 2 5 2 4 5 3 3 2 5 3 6 5 3
## 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505
## 8 5 4 7 4 1 6 1 1 8 7 4 6 2 3 3 3 1
## 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523
## 6 2 4 4 3 5 6 5 6 2 8 4 3 7 4 8 4 3
## 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541
## 5 6 3 3 3 1 5 2 5 10 6 5 6 8 2 6 5 3
## 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559
## 5 6 2 1 7 4 7 4 7 5 5 8 6 8 7 10 2 9
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577
## 5 7 4 5 13 6 13 6 3 7 5 10 8 7 8 7 10 5
## 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595
## 9 7 9 7 8 5 8 8 8 6 8 6 9 9 5 4 7 8
## 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613
## 5 8 10 7 8 8 9 7 7 8 9 8 8 8 7 8 5 5
## 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631
## 4 5 4 5 4 5 6 14 7 7 8 10 13 5 5 11 9 7
## 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649
## 10 6 6 10 7 11 11 8 10 8 8 10 10 10 7 9 3 7
## 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667
## 9 7 14 9 11 10 6 12 12 8 12 3 10 10 15 6 6 11
## 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685
## 9 6 8 6 7 11 8 6 6 14 9 16 10 11 11 8 11 7
## 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703
## 9 8 9 12 12 9 12 6 13 6 12 9 10 7 4 13 8 16
## 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721
## 17 7 10 8 8 6 11 10 10 6 12 7 8 15 10 13 7 9
## 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739
## 9 11 11 11 4 11 6 10 11 5 10 12 11 14 9 9 11 13
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757
## 7 5 9 6 7 12 15 9 6 9 13 12 6 8 7 12 10 13
## 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775
## 4 9 9 6 4 11 9 7 11 10 10 8 11 12 5 10 6 8
## 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793
## 6 16 8 7 5 10 9 9 7 5 6 11 11 9 6 7 12 4
## 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811
## 11 6 3 7 6 5 12 4 9 8 14 9 5 9 6 11 5 7
## 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829
## 15 5 11 7 6 7 8 7 9 7 10 10 8 9 18 8 8 9
## 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847
## 6 7 8 7 9 8 6 9 1 8 6 7 9 10 4 11 8 6
## 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865
## 7 4 6 9 13 12 10 8 10 10 7 4 6 7 4 13 7 9
## 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883
## 9 9 12 6 6 11 8 4 11 5 10 8 10 8 5 5 3 10
## 884 885 886 887 888 889 890 891 892 893 894 895 897 898 899 900 901 902
## 6 13 2 11 7 8 4 3 3 10 9 6 10 7 9 3 4 9
## 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920
## 6 5 6 10 7 5 7 5 9 8 4 6 6 6 4 7 7 5
## 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938
## 5 4 4 6 7 6 10 2 7 9 3 4 5 1 6 5 4 5
## 939 940 941 942 945 947 948 949 950 951 953 954 956 957 958 962 964 972
## 3 2 3 2 1 3 1 1 1 1 1 2 1 1 1 1 1 1
## 973 974
## 1 1
## 1 least connected region:
## 1526 with 1 link
## 1 most connected region:
## 4159 with 974 links
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 5569 31013761 5569 45.62359 22403.91
As we can see from the above, the average number of links is 544, which is way too many. Hence, we will try the k method instead by determining number of neighbours.
We use the number 8 as for for values that are not normally distributed,it is important to evaluate each feature within the cnotext of at least 8 neighbours as a rule of thumb.
brazil_muni_8 <- knn2nb(knearneigh(coords, k=8))
brazil_muni_8
## Neighbour list object:
## Number of regions: 5569
## Number of nonzero links: 44552
## Percentage nonzero weights: 0.1436524
## Average number of links: 8
## Non-symmetric neighbours list
Next, we need to assign weights to each neighboring polygon. In our case, each neighboring polygon will be assigned equal weight (style=“W”)
nb_lw_8<- nb2listw(brazil_muni_8, style="W", zero.policy = TRUE)
nb_lw_8
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5569
## Number of nonzero links: 44552
## Percentage nonzero weights: 0.1436524
## Average number of links: 8
## Non-symmetric neighbours list
##
## Weights style: W
## Weights constants summary:
## n nn S0 S1 S2
## W 5569 31013761 5569 1279.25 22580.69
lm.morantest(brazil_muni.mlr2, nb_lw_8)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade
## + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std +
## IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
## weights: nb_lw_8
##
## Moran I statistic standard deviate = 10.607, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 6.746011e-02 -5.030635e-04 4.105778e-05
The Global Moran’s I test for residual spatial autocorrelation shows that it’s p-value is less than 0.00000000000000022 which is less than the alpha value of 0.05. Hence, we will reject the null hypothesis that the residuals are randomly distributed.
Since the Observed Global Moran I = 0.06746011 which is greater than 0, we can infer than the residuals resemble cluster distribution.
bw.fixed <- bw.gwr(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std, data=brazil_muni.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=TRUE)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 34.70094 CV score: 100.0252
## Fixed bandwidth: 21.45065 CV score: 562.666
## Fixed bandwidth: 42.89007 CV score: 44.72794
## Fixed bandwidth: 47.95123 CV score: 38.81675
## Fixed bandwidth: 51.07919 CV score: 36.50962
## Fixed bandwidth: 53.01239 CV score: 35.44099
## Fixed bandwidth: 54.20716 CV score: 34.86737
## Fixed bandwidth: 54.94558 CV score: 34.54285
## Fixed bandwidth: 55.40194 CV score: 34.33989
## Fixed bandwidth: 55.68399 CV score: 34.2242
## Fixed bandwidth: 55.85831 CV score: 34.14794
## Fixed bandwidth: 55.96604 CV score: 34.10478
## Fixed bandwidth: 56.03262 CV score: 34.07745
## Fixed bandwidth: 56.07377 CV score: 34.05968
## Fixed bandwidth: 56.0992 CV score: 34.05091
## Fixed bandwidth: 56.11492 CV score: 34.04379
## Fixed bandwidth: 56.12464 CV score: 34.04201
## Fixed bandwidth: 56.13064 CV score: 34.03789
## Fixed bandwidth: 56.13435 CV score: 34.03687
## Fixed bandwidth: 56.13664 CV score: 34.03573
## Fixed bandwidth: 56.13806 CV score: 34.03487
## Fixed bandwidth: 56.13894 CV score: 34.03424
## Fixed bandwidth: 56.13948 CV score: 34.03322
## Fixed bandwidth: 56.13981 CV score: 34.03355
## Fixed bandwidth: 56.13927 CV score: 34.03504
## Fixed bandwidth: 56.13961 CV score: 34.03415
## Fixed bandwidth: 56.1394 CV score: 34.0339
## Fixed bandwidth: 56.13953 CV score: 34.03343
## Fixed bandwidth: 56.13945 CV score: 34.03438
gwr.fixed <- gwr.basic(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std , data=brazil_muni.sp, bw=bw.fixed, kernel = 'gaussian', longlat = TRUE)
The output is saved in a list of class “gwrm”. The code below can be used to display the model output.
gwr.fixed
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 13:48:18
## Call:
## gwr.basic(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade +
## IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std +
## IBGE_ECONOMY_ACTIVE.std, data = brazil_muni.sp, bw = bw.fixed,
## kernel = "gaussian", longlat = TRUE)
##
## Dependent (y) variable: GDP_CAPITA.std
## Independent variables: IDHM_Renda IDHM_Longevidade IDHM_Educacao GVA_AGROPEC.std GVA_INDUSTRY.std TAXES.std IBGE_ECONOMY_ACTIVE.std
## Number of data points: 5569
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.49709 -0.01995 -0.00835 0.00535 0.82990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.10699 0.01261 -8.482 < 2e-16 ***
## IDHM_Renda 0.37821 0.01909 19.815 < 2e-16 ***
## IDHM_Longevidade -0.13803 0.02394 -5.765 8.61e-09 ***
## IDHM_Educacao 0.04649 0.01291 3.602 0.000318 ***
## GVA_AGROPEC.std 0.18803 0.01118 16.812 < 2e-16 ***
## GVA_INDUSTRY.std 2.08843 0.07533 27.722 < 2e-16 ***
## TAXES.std 0.75257 0.08748 8.603 < 2e-16 ***
## IBGE_ECONOMY_ACTIVE.std -2.62689 0.10145 -25.894 < 2e-16 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.05099 on 5561 degrees of freedom
## Multiple R-squared: 0.3909
## Adjusted R-squared: 0.3901
## F-statistic: 509.8 on 7 and 5561 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 14.45645
## Sigma(hat): 0.05095893
## AIC: -17334.74
## AICc: -17334.71
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 56.13948
## Regression points: the same locations as observations are used.
## Distance metric: Great Circle distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu.
## Intercept -4.5602e+00 -1.6295e-01 -6.0632e-02 -6.4799e-03
## IDHM_Renda -2.7387e+00 5.3042e-02 1.1149e-01 2.1838e-01
## IDHM_Longevidade -2.2598e+00 -8.0280e-02 1.4249e-03 8.8746e-02
## IDHM_Educacao -2.0456e+00 2.5275e-03 3.5712e-02 1.0883e-01
## GVA_AGROPEC.std -9.7822e-01 7.6581e-02 1.5762e-01 2.7593e-01
## GVA_INDUSTRY.std -6.0174e+02 2.3876e+00 4.3738e+00 7.7333e+00
## TAXES.std -8.9748e+02 6.4132e-01 5.5406e+00 1.9686e+01
## IBGE_ECONOMY_ACTIVE.std -2.1930e+02 -1.1906e+01 -4.8706e+00 -2.1403e+00
## Max.
## Intercept 1.6545
## IDHM_Renda 5.2401
## IDHM_Longevidade 3.7370
## IDHM_Educacao 2.0845
## GVA_AGROPEC.std 2.3158
## GVA_INDUSTRY.std 228.6122
## TAXES.std 1382.9059
## IBGE_ECONOMY_ACTIVE.std 71.2297
## ************************Diagnostic information*************************
## Number of data points: 5569
## Effective number of parameters (2trace(S) - trace(S'S)): 1667.471
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 3901.529
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): -18253.76
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -20385.58
## Residual sum of squares: 6.622341
## R-square value: 0.7209733
## Adjusted R-square value: 0.6016897
##
## ***********************************************************************
## Program stops at: 2020-05-31 13:49:03
As seen from the model summary, the model explains 39.1% of the variation in GDP_CAPITAL (Similar to the MLR model). IDHM_Longevidade and IBGE_ECONOMY_ACTIVE.std have a negative relationship with GDP_CAPITAL while the rest have a positive relationship with GDP_CAPITAL.
With the model, we will prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.
brazil_muni.sf.fixed <- st_as_sf(gwr.fixed$SDF) %>%
st_transform(crs=4674)
brazil_muni.sf.fixed.4674 <- st_transform(brazil_muni.sf.fixed, 4674)
We join back to brazil_muni_std and not brazil_muni_code as we want to plot with the state and name of the municipality.
gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)
brazil_muni_code.fixed <- cbind(brazil_muni_std, as.matrix(gwr.fixed.output))
brazil_muni_code.fixed
## Simple feature collection with 5569 features and 42 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 5.271841
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni STATE IDHM_Renda IDHM_Longevidade
## 1 1100015 Alta Floresta D'oeste RO 0.657 0.763
## 2 1100023 Ariquemes RO 0.716 0.806
## 3 1100031 Cabixi RO 0.650 0.757
## 4 1100049 Cacoal RO 0.727 0.821
## 5 1100056 Cerejeiras RO 0.688 0.799
## 6 1100064 Colorado Do Oeste RO 0.676 0.814
## 7 1100072 Corumbiara RO 0.630 0.774
## 8 1100080 Costa Marques RO 0.616 0.751
## 9 1100098 Espigão D'oeste RO 0.691 0.819
## 10 1100106 Guajará-Mirim RO 0.663 0.823
## IDHM_Educacao GDP_CAPITA.std GVA_AGROPEC.std GVA_INDUSTRY.std
## 1 0.526 0.04990125 0.11848071 4.855722e-07
## 2 0.600 0.05595688 0.10345193 5.578592e-03
## 3 0.559 0.05783450 0.04213210 7.302128e-05
## 4 0.620 0.06081357 0.13407054 3.634820e-03
## 5 0.602 0.06270891 0.03771807 3.545427e-04
## 6 0.584 0.04244695 0.05183320 4.397635e-07
## 7 0.473 0.07657810 0.09425845 1.584682e-04
## 8 0.493 0.02861561 0.04461417 1.045577e-04
## 9 0.536 0.04321665 0.08232194 8.978652e-04
## 10 0.519 0.04026828 0.03560563 4.618831e-04
## GVA_PUBLIC.std TAXES.std IBGE_ECONOMY_ACTIVE.std Intercept
## 1 3.405978e-03 0.0003188051 0.0012783675 -0.45132447
## 2 1.405826e-02 0.0019656423 0.0065172230 -0.09781247
## 3 9.554075e-04 0.0001209158 0.0002468044 -1.32464677
## 4 1.157999e-02 0.0017850396 0.0059543333 -0.15336026
## 5 2.374261e-03 0.0001213652 0.0012717086 -0.58840318
## 6 2.405381e-03 0.0002785126 0.0012469148 -1.08752797
## 7 1.322334e-03 0.0001209434 0.0002387287 -0.38461707
## 8 1.998908e-06 0.0001209290 0.0006174360 -0.08165977
## 9 4.074391e-03 0.0004806875 0.0018401237 -0.11657488
## 10 5.811294e-06 0.0001216977 0.0030613946 -0.08789906
## IDHM_Renda.1 IDHM_Longevidade.1 IDHM_Educacao.1 GVA_AGROPEC.std.1
## 1 0.2697896 0.388869466 0.01648041 0.31956120
## 2 -0.1652060 0.197892832 0.18895694 0.13452448
## 3 4.1258596 -0.713026931 -1.33793449 -0.12661030
## 4 0.1635282 0.091674590 0.03470668 0.01455588
## 5 1.3440149 0.002231064 -0.37911191 -0.04818984
## 6 3.1671235 -0.522040551 -0.90262790 -0.02113617
## 7 0.8665679 -0.049839668 -0.13349381 0.16077721
## 8 0.1918322 0.037720715 -0.07902682 -0.08223659
## 9 0.1591303 0.048467196 0.04089547 -0.07682944
## 10 0.0317533 0.168862889 -0.05667693 -0.03970428
## GVA_INDUSTRY.std.1 TAXES.std.1 IBGE_ECONOMY_ACTIVE.std.1 y
## 1 8.7130004 35.727890 -25.1872324 0.04990125
## 2 0.4165859 15.768801 -7.5918675 0.05595688
## 3 -7.3105722 95.438920 -64.0063542 0.05783450
## 4 4.2042029 15.228903 -8.1907680 0.06081357
## 5 5.9505082 32.844711 -42.5604358 0.06270891
## 6 7.2829857 59.658946 -61.4852516 0.04244695
## 7 9.6290605 25.599855 -35.9288499 0.07657810
## 8 5.6726196 50.507181 -0.3816103 0.02861561
## 9 2.9296014 7.898972 -3.7560315 0.04321665
## 10 11.5886168 9.857580 -2.4764401 0.04026828
## yhat residual CV_Score Stud_residual Intercept_SE
## 1 0.04836112 1.540126e-03 0 0.1606131686 0.5698745
## 2 0.05453451 1.422371e-03 0 0.1468756953 0.4491582
## 3 0.05937002 -1.535518e-03 0 -0.2487859180 0.5235482
## 4 0.05795436 2.859202e-03 0 0.2157335183 0.5618656
## 5 0.05999013 2.718783e-03 0 0.4542802315 0.6126895
## 6 0.04022839 2.218560e-03 0 0.4111803357 0.5163537
## 7 0.07080164 5.776464e-03 0 0.4514660347 0.5709999
## 8 0.02867325 -5.763875e-05 0 -0.1631089713 0.8033834
## 9 0.04818981 -4.973160e-03 0 -0.3715237955 0.8380354
## 10 0.04026939 -1.106600e-06 0 -0.0009077684 0.6283788
## IDHM_Renda_SE IDHM_Longevidade_SE IDHM_Educacao_SE GVA_AGROPEC.std_SE
## 1 0.6120484 0.6879137 0.3582075 0.5807456
## 2 1.0466128 0.8612362 0.4565096 0.4220339
## 3 1.3952096 0.7100464 1.0150291 0.3467717
## 4 0.5114617 0.7044159 0.3960576 0.4998172
## 5 1.0223463 0.7102789 0.6395730 0.6956278
## 6 1.0475005 0.7152289 0.8488420 0.2415083
## 7 0.7372813 0.7240281 0.4743384 0.5379428
## 8 1.1525610 1.4384518 0.5179586 0.7375356
## 9 0.6296449 1.0320657 0.4907693 0.5220070
## 10 1.0129753 1.0641209 0.5205934 0.5142687
## GVA_INDUSTRY.std_SE TAXES.std_SE IBGE_ECONOMY_ACTIVE.std_SE
## 1 28.31909 96.67039 42.20039
## 2 11.92998 54.02517 26.98894
## 3 63.88874 173.47374 44.44462
## 4 20.33337 51.19842 25.81117
## 5 45.30980 159.79695 47.57823
## 6 57.87462 159.17505 39.41955
## 7 40.41984 127.23353 40.31494
## 8 54.41224 234.07310 32.99651
## 9 16.16985 52.80571 25.98817
## 10 33.57885 44.80166 23.50994
## Intercept_TV IDHM_Renda_TV IDHM_Longevidade_TV IDHM_Educacao_TV
## 1 -0.7919717 0.44079772 0.56528817 0.04600800
## 2 -0.2177684 -0.15784827 0.22977765 0.41391671
## 3 -2.5301331 2.95716106 -1.00419769 -1.31812428
## 4 -0.2729483 0.31972707 0.13014271 0.08763038
## 5 -0.9603611 1.31463763 0.00314111 -0.59275782
## 6 -2.1061685 3.02350556 -0.72989296 -1.06336381
## 7 -0.6735852 1.17535585 -0.06883665 -0.28143160
## 8 -0.1016448 0.16643993 0.02622313 -0.15257364
## 9 -0.1391049 0.25273030 0.04696135 0.08332931
## 10 -0.1398823 0.03134657 0.15868769 -0.10886986
## GVA_AGROPEC.std_TV GVA_INDUSTRY.std_TV TAXES.std_TV
## 1 0.55026023 0.30767228 0.3695846
## 2 0.31875281 0.03491925 0.2918788
## 3 -0.36511144 -0.11442661 0.5501635
## 4 0.02912241 0.20676374 0.2974487
## 5 -0.06927532 0.13132938 0.2055403
## 6 -0.08751733 0.12584076 0.3748009
## 7 0.29887417 0.23822608 0.2012037
## 8 -0.11150185 0.10425265 0.2157752
## 9 -0.14718084 0.18117680 0.1495856
## 10 -0.07720532 0.34511654 0.2200271
## IBGE_ECONOMY_ACTIVE.std_TV Local_R2 geom
## 1 -0.59684838 0.9682039 POLYGON ((-62.19465 -11.827...
## 2 -0.28129549 0.9746579 POLYGON ((-62.53648 -9.7322...
## 3 -1.44013741 0.9935490 POLYGON ((-60.37075 -13.363...
## 4 -0.31733423 0.9506142 POLYGON ((-61.0008 -11.2973...
## 5 -0.89453595 0.9584837 POLYGON ((-61.49976 -13.005...
## 6 -1.55976551 0.9940479 POLYGON ((-60.50475 -12.966...
## 7 -0.89120430 0.9569631 POLYGON ((-61.34273 -12.666...
## 8 -0.01156517 0.9994902 POLYGON ((-63.71199 -11.650...
## 9 -0.14452850 0.9360049 POLYGON ((-60.94827 -10.988...
## 10 -0.10533589 0.9987951 POLYGON ((-65.37724 -10.431...
tm_shape(brazil_muni_code.fixed) +
tm_fill(col = "Local_R2",
#size = 0.15,
border.col = "gray60",
border.lwd = 1, id="name_muni")
The local R2 values determines have better prediction using the model. By plotting the Local R2 values by municipality, we can identify which municipalities’ GDP_CAPITAL is predicted well and not predicted well with GWRmodel. From the plot above, the highest R2 range is 0.8 to 1.0, Majority of the maps are within this range including the following regions: the North, East, West and Central of Brazil. This shows that the prediction of GDP_CAPITA with this model is more accurate in these regions. Even though the model is accurate for most parts of Brazil, we can see the colour fades out towards the South of Brazil. Hence, this model does not predict as well for the municipalities in those areas.
tm_shape(brazil_muni_code.fixed) +
tm_fill(col = "residual",
border.col = "gray60",
border.lwd = 1)
These residual values is the error that isnt explained by the model. From the above plot, the residual values range from -0.4 to 0.8. Across the municipalities in Brazil, most areas have a residual value ranging from -0.2 to 0.2. The high residuals can be seen in a small area in the North of Brazil, with a residual value of between range of 0.6 to 0.8. Hence, this shows that there is minimal error that isnt eplained by the model in most municipalities, with the exception of a few in the South.
tm_shape(brazil_muni_code.fixed) +
tm_fill(col = "Intercept_SE",
border.col = "gray60",
border.lwd=1)
Intercept_SE represents the standard error of the intercept. It determines if the estimated intercept is statistically significant from a hypothesized value, which in our case is 0.05. As seen from the map, there are some municipalities in the North-West of Brazil with a high intercept SE. Hence, this indicates that the estimated intercept in these municipalities are statisticaly insignificant and if we were to use the prediction model on these municipalities, we may need to re-estimate the model without the intercept term being present.