1 Overview

Brazil is the world’s fifth-largest country by area and the sixth most populous. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy.It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure below. The GDP per capita of the poorest municipality is R3190.6. On the other hand, the GDP per capita of the richest municipality is R314638. Half of the municipalities with GDP per capita less than R16000 and the top 25% municipalities earn R26155 and above.

In this take-home exercise, we will determine factors affecting the unequal development of Brazil at the municipality level by using the data provided. The 5 Tasks are as follows: 1. Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level. 2. Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using multiple linear regression method. 3 Prepare a choropleth map showing the distribution of the residual of the GDP per capita. 4. Calibrate an explanatory model to explain factors affecting the GDP per capita at the municipality level by using geographically weighted regression method.

2 Installing Packages

packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse','geobr','heatmaply' )
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

3 Data Wrangling

3.1 Importing data

3.1.1 Load Brazil factor

First, we load the factors that we like to analyze in Brazil

brazil = read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";") 
brazil
## # A tibble: 5,573 x 81
##    CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
##    <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>
##  1 Abad~ GO          0         6876             6876                0
##  2 Abad~ MG          0         6704             6704                0
##  3 Abad~ GO          0        15757            15609              148
##  4 Abae~ MG          0        22690            22690                0
##  5 Abae~ PA          0       141100           141040               60
##  6 Abai~ CE          0        10496            10496                0
##  7 Abaí~ BA          0         8316             8316                0
##  8 Abaré BA          0        17064            17064                0
##  9 Abat~ PR          0         7764             7764                0
## 10 Abdo~ SC          0         2653             2653                0
## # ... with 5,563 more rows, and 75 more variables: IBGE_DU <dbl>,
## #   IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>,
## #   IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>,
## #   `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## #   `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## #   IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## #   LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## #   AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## #   Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## #   MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>
nrow(brazil)
## [1] 5573

3.1.2 Load Brazil factor meta data

Next we import the meta data to better understand the factors.

data_dict = read_delim("data/aspatial/Data_Dictionary.csv", delim = ";") 
data_dict
## # A tibble: 96 x 6
##    FIELD      DESCRIPTION            REFERENCE UNIT  SOURCE           X6   
##    <chr>      <chr>                  <chr>     <chr> <chr>            <chr>
##  1 CITY       Name of the City       <NA>      <NA>  -                <NA> 
##  2 STATE      Name of the State      <NA>      <NA>  -                <NA> 
##  3 CAPITAL    1 if Capital of State  <NA>      <NA>  -                <NA> 
##  4 IBGE_RES_~ "Resident Population " 2010      -     https://sidra.i~ <NA> 
##  5 IBGE_RES_~ Resident Population B~ 2010      -     https://sidra.i~ <NA> 
##  6 IBGE_RES_~ Redident Population F~ 2010      -     https://sidra.i~ <NA> 
##  7 IBGE_DU    "Domestic Units Total~ 2010      -     https://sidra.i~ <NA> 
##  8 IBGE_DU_U~ "Domestic Units Urban~ 2010      -     https://sidra.i~ <NA> 
##  9 IBGE_DU_R~ Domestic Units Rural   2010      -     https://sidra.i~ <NA> 
## 10 IBGE_POP   Resident Population R~ 2010      -     https://sidra.i~ <NA> 
## # ... with 86 more rows
datasets <- list_geobr()
datasets
## # A tibble: 22 x 4
##    `function`       geography            years                       source
##    <chr>            <chr>                <chr>                       <chr> 
##  1 `read_country`   Country              1872, 1900, 1911, 1920, 19~ IBGE  
##  2 `read_region`    Region               2000, 2001, 2010, 2013, 20~ IBGE  
##  3 `read_state`     States               1872, 1900, 1911, 1920, 19~ IBGE  
##  4 `read_meso_regi~ Meso region          2000, 2001, 2010, 2013, 20~ IBGE  
##  5 `read_micro_reg~ Micro region         2000, 2001, 2010, 2013, 20~ IBGE  
##  6 `read_intermedi~ Intermediate region  2017                        IBGE  
##  7 `read_immediate~ Immediate region     2017                        IBGE  
##  8 `read_municipal~ Municipality         1872, 1900, 1911, 1920, 19~ IBGE  
##  9 `read_weighting~ Census weighting ar~ 2010                        IBGE  
## 10 `read_census_tr~ Census tract (setor~ 2000, 2010                  IBGE  
## # ... with 12 more rows

3.1.3 Extract 2016 municipality boundary file

# Municipality of Brazil
muni <- read_municipality(year=2016)
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |=================================================================| 100%
head(muni)
## Simple feature collection with 6 features and 4 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
##   code_muni             name_muni code_state abbrev_state
## 1   1100015 Alta Floresta D'oeste         11           RO
## 2   1100023             Ariquemes         11           RO
## 3   1100031                Cabixi         11           RO
## 4   1100049                Cacoal         11           RO
## 5   1100056            Cerejeiras         11           RO
## 6   1100064     Colorado Do Oeste         11           RO
##                             geom
## 1 POLYGON ((-62.19465 -11.827...
## 2 POLYGON ((-62.53648 -9.7322...
## 3 POLYGON ((-60.37075 -13.363...
## 4 POLYGON ((-61.0008 -11.2973...
## 5 POLYGON ((-61.49976 -13.005...
## 6 POLYGON ((-60.50475 -12.966...
nrow(muni)
## [1] 5572

Check st_crs of muni

st_crs(muni)
## Coordinate Reference System:
##   User input: SIRGAS 2000 
##   wkt:
## GEOGCRS["SIRGAS 2000",
##     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Latin America - SIRGAS 2000 by country"],
##         BBOX[-59.87,-122.19,32.72,-25.28]],
##     ID["EPSG",4674]]

As seen above, the coordinate system used is SIRGAS 2000. Which the EPSG code is 4674

3.2 Convert aspastial data to spatial data

First, we need to check the number of NA rows for long and lat before converting it to sf.

brazil_na <- brazil %>% filter(is.na(LONG) | is.na(LAT))
brazil_na
## # A tibble: 9 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>
## 1 Baln~ SC          0           NA               NA               NA
## 2 Lago~ RS          0           NA               NA               NA
## 3 Moju~ PA          0           NA               NA               NA
## 4 Para~ MS          0           NA               NA               NA
## 5 Pesc~ SC          0           NA               NA               NA
## 6 Pinh~ RS          0         2130             2130                0
## 7 Pint~ RS          0           NA               NA               NA
## 8 Sant~ BA          0         9648             9648                0
## 9 São ~ PE          0           NA               NA               NA
## # ... with 75 more variables: IBGE_DU <dbl>, IBGE_DU_URBAN <dbl>,
## #   IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>,
## #   `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>,
## #   `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## #   `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## #   IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## #   LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## #   AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## #   Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## #   MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>

3.2.1 Relace Long Lat values

As there are 9 rows with LONG/LAT value that are NA, we will need to replace and input their LONG LAT values. We will search and replace by STATE and CITY (some states may have same city names)

brazil$LONG[brazil$CITY == "Balneário Rincão" & brazil$STATE=="SC"] <- -49.2351
brazil$LAT[brazil$CITY == "Balneário Rincão" & brazil$STATE=="SC"] <- -28.8314

brazil$LONG[brazil$CITY == "Lagoa Dos Patos" & brazil$STATE=="RS"] <- -51.2500
brazil$LAT[brazil$CITY == "Lagoa Dos Patos" & brazil$STATE=="RS"] <- -31.1000

brazil$LONG[brazil$CITY == "Mojuí Dos Campos" & brazil$STATE=="PA"] <- -54.6425
brazil$LAT[brazil$CITY == "Mojuí Dos Campos" & brazil$STATE=="PA"] <- -2.6822

brazil$LONG[brazil$CITY == "Paraíso Das Águas" & brazil$STATE=="MS"] <- -53.0116
brazil$LAT[brazil$CITY == "Paraíso Das Águas" & brazil$STATE=="MS"] <- -19.0216
 
brazil$LONG[brazil$CITY == "Pescaria Brava" & brazil$STATE=="SC"] <- -48.8864
brazil$LAT[brazil$CITY == "Pescaria Brava" & brazil$STATE=="SC"] <- -228.3866

brazil$LONG[brazil$CITY == "Pinhal Da Serra" & brazil$STATE=="RS"] <- -51.1707
brazil$LAT[brazil$CITY == "Pinhal Da Serra" & brazil$STATE=="RS"] <- -27.8743

brazil$LONG[brazil$CITY == "Pinto Bandeira" & brazil$STATE=="RS"] <- -51.4503
brazil$LAT[brazil$CITY == "Pinto Bandeira" & brazil$STATE=="RS"] <- -29.0975

brazil$LONG[brazil$CITY == "Santa Terezinha" & brazil$STATE=="BA"] <- -39.6105
brazil$LAT[brazil$CITY == "Santa Terezinha" & brazil$STATE=="BA"] <- -12.7700

brazil$LONG[brazil$CITY == "São Caetano" & brazil$STATE=="PE"] <- -36.1361
brazil$LAT[brazil$CITY == "São Caetano" & brazil$STATE=="PE"] <- -8.3318

We then convert the data into a sf object and tranform the crs to 4674 to match it with muni.

brazil_3414 <- st_as_sf(brazil, coords = c("LONG","LAT"),crs=4326)
brazil_4674 <- st_transform(brazil_3414,4674)
head(brazil_4674)
## Simple feature collection with 6 features and 79 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -49.44055 ymin: -19.15585 xmax: -39.04755 ymax: -1.72347
## geographic CRS: SIRGAS 2000
## # A tibble: 6 x 80
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>
## 1 Abad~ GO          0         6876             6876                0
## 2 Abad~ MG          0         6704             6704                0
## 3 Abad~ GO          0        15757            15609              148
## 4 Abae~ MG          0        22690            22690                0
## 5 Abae~ PA          0       141100           141040               60
## 6 Abai~ CE          0        10496            10496                0
## # ... with 74 more variables: IBGE_DU <dbl>, IBGE_DU_URBAN <dbl>,
## #   IBGE_DU_RURAL <dbl>, IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>,
## #   `IBGE_5-9` <dbl>, `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>,
## #   `IBGE_60+` <dbl>, IBGE_PLANTED_AREA <dbl>,
## #   `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking 2010` <dbl>, IDHM <dbl>,
## #   IDHM_Renda <dbl>, IDHM_Longevidade <dbl>, IDHM_Educacao <dbl>,
## #   ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>, AREA <dbl>,
## #   REGIAO_TUR <chr>, CATEGORIA_TUR <chr>, ESTIMATED_POP <dbl>,
## #   RURAL_URBAN <chr>, GVA_AGROPEC <dbl>, GVA_INDUSTRY <dbl>,
## #   GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL ` <dbl>,
## #   TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>,
## #   Cars <dbl>, Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>,
## #   MAC <dbl>, `WAL-MART` <dbl>, POST_OFFICES <dbl>, geometry <POINT [°]>

3.3 Join Brazil to municipality boundary file

brazil_muni <- st_join(muni,brazil_4674,join = st_contains)

After joning, we check the row count again to ensure proper joining.

nrow(brazil_muni)
## [1] 5574

As seen above, we have 5574, which is more than what we should have (5572) as muni has 5572 rows. Hence, there are at least 2 records duplicated. We filter them out to identify the codes.

brazil_muni %>% group_by(code_muni) %>% tally() %>% filter(n>1)
## Simple feature collection with 2 features and 2 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -39.67875 ymin: -12.93082 xmax: -36.06152 ymax: -8.187331
## geographic CRS: SIRGAS 2000
## # A tibble: 2 x 3
##   code_muni     n                                                      geom
## *     <dbl> <int>                                            <GEOMETRY [°]>
## 1   2613107     2 POLYGON ((-36.11492 -8.195208, -36.08247 -8.22815, -36.0~
## 2   2928505     2 MULTIPOLYGON (((-39.56748 -12.55098, -39.56109 -12.55148~

We then print out the data to better understand why they are duplicated.

brazil_muni %>% filter(code_muni==2613107|code_muni==2928505)
## Simple feature collection with 4 features and 83 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -39.67875 ymin: -12.93082 xmax: -36.06152 ymax: -8.187331
## geographic CRS: SIRGAS 2000
##   code_muni       name_muni code_state abbrev_state            CITY STATE
## 1   2613107     São Caitano         26           PE     São Caetano    PE
## 2   2613107     São Caitano         26           PE     São Caitano    PE
## 3   2928505 Santa Terezinha         29           BA Santa Teresinha    BA
## 4   2928505 Santa Terezinha         29           BA Santa Terezinha    BA
##   CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU
## 1       0           NA                NA                NA      NA
## 2       0        35274             35274                 0   10718
## 3       0           NA                NA                NA      NA
## 4       0         9648              9648                 0    2891
##   IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4 IBGE_5-9 IBGE_10-14
## 1            NA            NA       NA     NA       NA       NA         NA
## 2          8363          2355    27047    438     1791     2401       2784
## 3            NA            NA       NA     NA       NA       NA         NA
## 4           734          2157     2332     40      126      191        217
##   IBGE_15-59 IBGE_60+ IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$
## 1         NA       NA                NA                     NA
## 2      16586     3047               783                    468
## 3         NA       NA                NA                     NA
## 4       1419      339               358                    496
##   IDHM Ranking 2010  IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao    ALT
## 1                NA    NA         NA               NA            NA     NA
## 2              4380 0.591      0.583            0.756         0.469 555.25
## 3              4493 0.590      0.549            0.804         0.459 222.51
## 4                NA    NA         NA               NA            NA     NA
##   PAY_TV FIXED_PHONES   AREA            REGIAO_TUR CATEGORIA_TUR
## 1     NA           NA     NA                  <NA>          <NA>
## 2    175          673 382.47                  <NA>          <NA>
## 3     92          125     NA Caminhos Do Jiquiriçá             D
## 4     NA           NA 719.26                  <NA>          <NA>
##   ESTIMATED_POP             RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 1            NA                    <NA>          NA           NA
## 2         37119 Intermediário Adjacente       17.34     34337.46
## 3         10345                    <NA>          NA           NA
## 4            NA         Rural Adjacente    13235.20      5398.61
##   GVA_SERVICES GVA_PUBLIC  GVA_TOTAL     TAXES       GDP POP_GDP
## 1           NA         NA          NA       NA        NA      NA
## 2    113666.63     133.73   299079.63 23230.19 322309.83   36895
## 3           NA         NA          NA       NA        NA      NA
## 4     17754.37   32630.97    69019.14  3149.33  72168.48   10619
##   GDP_CAPITA
## 1         NA
## 2    8735.87
## 3         NA
## 4    6796.16
##                                                               GVA_MAIN
## 1                                                                 <NA>
## 2 Administração, defesa, educação e saúde públicas e seguridade social
## 3                                                                 <NA>
## 4 Administração, defesa, educação e saúde públicas e seguridade social
##   MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1     53609740       NA     NA     NA     NA     NA     NA     NA     NA
## 2           NA      244      2      2     43      0      1     11    119
## 3     22225531       NA     NA     NA     NA     NA     NA     NA     NA
## 4           NA       74      2      1      4      0      0      3     37
##   COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q
## 1     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 2     10     15      2      0      3      4      4      4     12      1
## 3     NA     NA     NA     NA     NA     NA     NA     NA     NA     NA
## 4      0      3      1      0      0      1      2      2     12      2
##   COMP_R COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank
## 1     NA     NA     NA     NA     NA   NA          NA          NA      NA
## 2      1     10      0      0     NA   NA           0           1       0
## 3     NA     NA     NA     NA     NA   NA          NA          NA      NA
## 4      0      4      0      0     NA   NA          NA          NA      NA
##   Pu_Bank Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC
## 1      NA        NA        NA   NA          NA              NA   NA  NA
## 2       1         0  37171342 3619        5478               0   NA  NA
## 3      NA        NA        NA  810         453               0   NA  NA
## 4      NA        NA        NA   NA          NA              NA   NA  NA
##   WAL-MART POST_OFFICES                           geom
## 1       NA           NA POLYGON ((-36.11492 -8.1952...
## 2       NA            3 POLYGON ((-36.11492 -8.1952...
## 3       NA           NA MULTIPOLYGON (((-39.56748 -...
## 4       NA            1 MULTIPOLYGON (((-39.56748 -...

As seen above, its due to the fact that the original brazil dataset consists of duplicated data whereby the 2 of the cities are repeated twice. The spelling differs by one letter (São Caetano vs São Caitano) and (Santa Teresinha vs Santa Terezinha). I googled and they each represent the same place. Hence, we removed 1 of the 2 for both. We will remove the ones with GDP_CAPITA is NA. In this case, since we also need to remove those with GDP_CAPITA == NA for all datarows, we will simply remove all datasets with GDP_CAPITA == NA as they are irrelevant for the analysis.

brazil_muni_o <- brazil_muni %>% filter(!is.na(GDP_CAPITA))
nrow(brazil_muni_o)
## [1] 5569
summary(brazil_muni_o)
##    code_muni              name_muni      code_state    abbrev_state 
##  Min.   :1100015   Bom Jesus   :   5   31     : 853   MG     : 853  
##  1st Qu.:2512101   São Domingos:   5   35     : 645   SP     : 645  
##  Median :3146255   Bonito      :   4   43     : 497   RS     : 497  
##  Mean   :3253419   Santa Helena:   4   29     : 417   BA     : 417  
##  3rd Qu.:4119152   Santa Inês  :   4   41     : 399   PR     : 399  
##  Max.   :5300108   Santa Luzia :   4   42     : 294   SC     : 294  
##                    (Other)     :5543   (Other):2464   (Other):2464  
##      CITY              STATE              CAPITAL        
##  Length:5569        Length:5569        Min.   :0.000000  
##  Class :character   Class :character   1st Qu.:0.000000  
##  Mode  :character   Mode  :character   Median :0.000000  
##                                        Mean   :0.004848  
##                                        3rd Qu.:0.000000  
##                                        Max.   :1.000000  
##                                                          
##   IBGE_RES_POP      IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR 
##  Min.   :     805   Min.   :     805   Min.   :     0.0  
##  1st Qu.:    5235   1st Qu.:    5230   1st Qu.:     0.0  
##  Median :   10934   Median :   10926   Median :     0.0  
##  Mean   :   34278   Mean   :   34200   Mean   :    77.5  
##  3rd Qu.:   23424   3rd Qu.:   23390   3rd Qu.:    10.0  
##  Max.   :11253503   Max.   :11133776   Max.   :119727.0  
##  NA's   :4          NA's   :4          NA's   :4         
##     IBGE_DU        IBGE_DU_URBAN     IBGE_DU_RURAL      IBGE_POP       
##  Min.   :    239   Min.   :     60   Min.   :    3   Min.   :     174  
##  1st Qu.:   1572   1st Qu.:    874   1st Qu.:  487   1st Qu.:    2801  
##  Median :   3174   Median :   1846   Median :  931   Median :    6170  
##  Mean   :  10303   Mean   :   8859   Mean   : 1463   Mean   :   27595  
##  3rd Qu.:   6726   3rd Qu.:   4624   3rd Qu.: 1832   3rd Qu.:   15302  
##  Max.   :3576148   Max.   :3548433   Max.   :33809   Max.   :10463636  
##  NA's   :6         NA's   :6         NA's   :77      NA's   :4         
##      IBGE_1            IBGE_1-4         IBGE_5-9        IBGE_10-14    
##  Min.   :     0.0   Min.   :     5   Min.   :     7   Min.   :    12  
##  1st Qu.:    38.0   1st Qu.:   158   1st Qu.:   220   1st Qu.:   259  
##  Median :    92.0   Median :   376   Median :   516   Median :   588  
##  Mean   :   383.3   Mean   :  1544   Mean   :  2069   Mean   :  2381  
##  3rd Qu.:   232.0   3rd Qu.:   951   3rd Qu.:  1300   3rd Qu.:  1478  
##  Max.   :129464.0   Max.   :514794   Max.   :684443   Max.   :783702  
##  NA's   :4          NA's   :4        NA's   :4        NA's   :4       
##    IBGE_15-59         IBGE_60+       IBGE_PLANTED_AREA
##  Min.   :     94   Min.   :     29   Min.   :      0  
##  1st Qu.:   1734   1st Qu.:    341   1st Qu.:    911  
##  Median :   3841   Median :    722   Median :   3473  
##  Mean   :  18212   Mean   :   3004   Mean   :  14182  
##  3rd Qu.:   9628   3rd Qu.:   1724   3rd Qu.:  11201  
##  Max.   :7058221   Max.   :1293012   Max.   :1205669  
##  NA's   :4         NA's   :4                          
##  IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010      IDHM       
##  Min.   :      0        Min.   :   1      Min.   :0.4180  
##  1st Qu.:   2327        1st Qu.:1392      1st Qu.:0.5990  
##  Median :  13846        Median :2782      Median :0.6650  
##  Mean   :  57394        Mean   :2783      Mean   :0.6592  
##  3rd Qu.:  55623        3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :3274885        Max.   :5565      Max.   :0.8620  
##                         NA's   :5         NA's   :5       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         ALT          
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :     0.0  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:   169.7  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :   406.5  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :   894.0  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:   629.0  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :874579.0  
##  NA's   :5        NA's   :5        NA's   :5        NA's   :6         
##      PAY_TV           FIXED_PHONES          AREA          
##  Min.   :      1.0   Min.   :      3   Min.   :     3.57  
##  1st Qu.:     88.0   1st Qu.:    119   1st Qu.:   204.45  
##  Median :    247.0   Median :    328   Median :   416.59  
##  Mean   :   3095.3   Mean   :   6569   Mean   :  1516.14  
##  3rd Qu.:    815.2   3rd Qu.:   1151   3rd Qu.:  1026.44  
##  Max.   :2047668.0   Max.   :5543127   Max.   :159533.33  
##  NA's   :1           NA's   :1         NA's   :1          
##   REGIAO_TUR        CATEGORIA_TUR      ESTIMATED_POP     
##  Length:5569        Length:5569        Min.   :     786  
##  Class :character   Class :character   1st Qu.:    5453  
##  Mode  :character   Mode  :character   Median :   11591  
##                                        Mean   :   37442  
##                                        3rd Qu.:   25299  
##                                        Max.   :12176866  
##                                        NA's   :1         
##  RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Length:5569        Min.   :      0   Min.   :       1  
##  Class :character   1st Qu.:   4193   1st Qu.:    1725  
##  Mode  :character   Median :  20430   Median :    7425  
##                     Mean   :  47279   Mean   :  175959  
##                     3rd Qu.:  51238   3rd Qu.:   41026  
##                     Max.   :1402282   Max.   :63306755  
##                                                         
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL        
##  Min.   :        2   Min.   :       7   Min.   :       17  
##  1st Qu.:    10113   1st Qu.:   17260   1st Qu.:    42254  
##  Median :    31212   Median :   35865   Median :   119504  
##  Mean   :   489539   Mean   :  123784   Mean   :   833136  
##  3rd Qu.:   115448   3rd Qu.:   89256   3rd Qu.:   313988  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503  
##                                                            
##      TAXES                GDP               POP_GDP        
##  Min.   :   -14159   Min.   :       15   Min.   :     815  
##  1st Qu.:     1305   1st Qu.:    43707   1st Qu.:    5481  
##  Median :     5107   Median :   125196   Median :   11584  
##  Mean   :   118885   Mean   :   954740   Mean   :   37003  
##  3rd Qu.:    22208   3rd Qu.:   329717   3rd Qu.:   25088  
##  Max.   :117125387   Max.   :687035890   Max.   :12038175  
##                                                            
##    GDP_CAPITA       GVA_MAIN          MUN_EXPENDIT      
##  Min.   :  3191   Length:5569        Min.   :1.421e+06  
##  1st Qu.:  9062   Class :character   1st Qu.:1.574e+07  
##  Median : 15873   Mode  :character   Median :2.747e+07  
##  Mean   : 21128                      Mean   :1.044e+08  
##  3rd Qu.: 26155                      3rd Qu.:5.675e+07  
##  Max.   :314638                      Max.   :4.577e+10  
##                                      NA's   :1491       
##     COMP_TOT            COMP_A            COMP_B            COMP_C        
##  Min.   :     6.0   Min.   :   0.00   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000   1st Qu.:    3.00  
##  Median :   162.0   Median :   2.00   Median :  0.000   Median :   11.00  
##  Mean   :   906.9   Mean   :  18.26   Mean   :  1.852   Mean   :   73.45  
##  3rd Qu.:   448.0   3rd Qu.:   8.00   3rd Qu.:  2.000   3rd Qu.:   39.00  
##  Max.   :530446.0   Max.   :1948.00   Max.   :274.000   Max.   :31566.00  
##                                                                           
##      COMP_D             COMP_E            COMP_F             COMP_G      
##  Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00   Min.   :     1  
##  1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00   1st Qu.:    32  
##  Median :  0.0000   Median :  0.000   Median :    4.00   Median :    75  
##  Mean   :  0.4263   Mean   :  2.029   Mean   :   43.27   Mean   :   348  
##  3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00   3rd Qu.:   199  
##  Max.   :332.0000   Max.   :657.000   Max.   :25222.00   Max.   :150633  
##                                                                          
##      COMP_H          COMP_I             COMP_J             COMP_K        
##  Min.   :    0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    1   1st Qu.:    2.00   1st Qu.:    0.00   1st Qu.:    0.00  
##  Median :    7   Median :    7.00   Median :    1.00   Median :    0.00  
##  Mean   :   41   Mean   :   55.89   Mean   :   24.75   Mean   :   15.55  
##  3rd Qu.:   25   3rd Qu.:   24.00   3rd Qu.:    5.00   3rd Qu.:    2.00  
##  Max.   :19515   Max.   :29290.00   Max.   :38720.00   Max.   :23738.00  
##                                                                          
##      COMP_L             COMP_M            COMP_N             COMP_O      
##  Min.   :    0.00   Min.   :    0.0   Min.   :    0.00   Min.   :  0.00  
##  1st Qu.:    0.00   1st Qu.:    1.0   1st Qu.:    1.00   1st Qu.:  2.00  
##  Median :    0.00   Median :    4.0   Median :    4.00   Median :  2.00  
##  Mean   :   15.14   Mean   :   51.3   Mean   :   83.71   Mean   :  3.27  
##  3rd Qu.:    3.00   3rd Qu.:   13.0   3rd Qu.:   14.00   3rd Qu.:  3.00  
##  Max.   :14003.00   Max.   :49181.0   Max.   :76757.00   Max.   :204.00  
##                                                                          
##      COMP_P             COMP_Q             COMP_R       
##  Min.   :    0.00   Min.   :    0.00   Min.   :   0.00  
##  1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00  
##  Median :    6.00   Median :    3.00   Median :   2.00  
##  Mean   :   30.97   Mean   :   34.16   Mean   :  12.18  
##  3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00  
##  Max.   :16030.00   Max.   :22248.00   Max.   :6687.00  
##                                                         
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.61   Mean   :0   Mean   :  0.05028   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##                                                     NA's   :4682    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.383   Mean   :  2.829   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :4682      NA's   :2227       NA's   :2227      NA's   :2227    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602  
##  Median :2.00   Median :3.231e+07   Median :1.339e+08   Median :   1438  
##  Mean   :1.58   Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9862  
##  3rd Qu.:2.00   3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4087  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2227   NA's   :2227        NA's   :2227        NA's   :9        
##   Motorcycles      Wheeled_tractor         UBER           MAC         
##  Min.   :      4   Min.   :   0.000   Min.   :1      Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   1st Qu.:1      1st Qu.:  1.000  
##  Median :   1285   Median :   0.000   Median :1      Median :  2.000  
##  Mean   :   4880   Mean   :   5.756   Mean   :1      Mean   :  4.277  
##  3rd Qu.:   3296   3rd Qu.:   1.000   3rd Qu.:1      3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000   Max.   :1      Max.   :130.000  
##  NA's   :9         NA's   :9          NA's   :5444   NA's   :5403     
##     WAL-MART       POST_OFFICES               geom     
##  Min.   : 1.000   Min.   :  1.00   MULTIPOLYGON :2772  
##  1st Qu.: 1.000   1st Qu.:  1.00   POLYGON      :2797  
##  Median : 1.000   Median :  1.00   epsg:4674    :   0  
##  Mean   : 2.059   Mean   :  2.08   +proj=long...:   0  
##  3rd Qu.: 1.750   3rd Qu.:  2.00                       
##  Max.   :26.000   Max.   :225.00                       
##  NA's   :5467     NA's   :117
head(brazil_muni_o)
## Simple feature collection with 6 features and 83 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
##   code_muni             name_muni code_state abbrev_state
## 1   1100015 Alta Floresta D'oeste         11           RO
## 2   1100023             Ariquemes         11           RO
## 3   1100031                Cabixi         11           RO
## 4   1100049                Cacoal         11           RO
## 5   1100056            Cerejeiras         11           RO
## 6   1100064     Colorado Do Oeste         11           RO
##                    CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS
## 1 Alta Floresta D'Oeste    RO       0        24392             24392
## 2             Ariquemes    RO       0        90353             90240
## 3                Cabixi    RO       0         6313              6310
## 4                Cacoal    RO       0        78574             78536
## 5            Cerejeiras    RO       0        17029             17011
## 6     Colorado Do Oeste    RO       0        18591             18562
##   IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1
## 1                 0    7276          4306          2970    13804    203
## 2               113   27236         22971          4265    69068   1122
## 3                 3    1979           887          1092     2689     43
## 4                38   24045         19411          4634    61699    923
## 5                18    5346          4561           785    13755    200
## 6                29    5962          4441          1521    13404    189
##   IBGE_1-4 IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+ IBGE_PLANTED_AREA
## 1      869     1159       1281       9023     1269             18288
## 2     4383     6101       6870      46000     4592              7115
## 3      191      206        272       1742      235             41086
## 4     3599     4790       5722      42027     4638             18180
## 5      827     1129       1333       8976     1290             52326
## 6      822     1002       1196       8801     1394              6268
##   IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010  IDHM IDHM_Renda
## 1                 101470              3283 0.640      0.657
## 2                  33365              1847 0.700      0.716
## 3                 107975              3117 0.650      0.650
## 4                 174665              1377 0.718      0.727
## 5                 150755              2141 0.690      0.688
## 6                  21765              2326 0.685      0.676
##   IDHM_Longevidade IDHM_Educacao    ALT PAY_TV FIXED_PHONES    AREA
## 1            0.763         0.526 337.74    240          687 7067.03
## 2            0.806         0.600 138.69   2267         9191 4426.57
## 3            0.757         0.559 236.06     50          188 1314.35
## 4            0.821         0.620 177.45   1806         6491 3792.89
## 5            0.799         0.602 262.81    307         1215 2783.30
## 6            0.814         0.584 419.09    235         1107 1451.06
##                    REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1             Vale Do Guaporé             D         23167
## 2              Vale Do Jamari             C        106168
## 3             Vale Do Guaporé             D          5438
## 4 Br 364 - Caminhos De Rondon             C         84813
## 5                        <NA>          <NA>         16444
## 6                        <NA>          <NA>         16227
##               RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 1 Intermediário Adjacente   166143.38        31.27    114455.32  142727.55
## 2                  Urbano   145068.79    353163.07       879.97  589088.78
## 3         Rural Adjacente    59081.09      4623.27     24091.22   40041.59
## 4                  Urbano   188004.72    230109.20    846490.22  485242.29
## 5                  Urbano    52891.37     22445.48    175059.55   99495.64
## 6 Intermediário Adjacente    72684.77        28.37     85558.72  100799.66
##    GVA_TOTAL      TAXES       GDP POP_GDP GDP_CAPITA
## 1   454596.58  23186.17  477782.7   25506   18732.17
## 2  1967287.27 216095.93 2183383.2  105896   20618.18
## 3   127837.15      5.51  133345.4    6289   21202.96
## 4  1749846.43 194940.22 1944786.6   87877   22130.78
## 5      349.89     58.16  408047.8   17959   22721.08
## 6   287409.88  18466.33  305876.2   18639   16410.55
##                                                               GVA_MAIN
## 1 Administração, defesa, educação e saúde públicas e seguridade social
## 2                                                      Demais serviços
## 3 Administração, defesa, educação e saúde públicas e seguridade social
## 4                                                      Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Administração, defesa, educação e saúde públicas e seguridade social
##   MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1     50218466      508      3      1     38      7      4     26    270
## 2    201979608     2221     24     28    223      1     12     79   1054
## 3     17904387       60      0      0      7      0      0      1     32
## 4    168366053     1846     13      9    180      1      7     80    839
## 5     42641592      379      6      0     40      0      1      7    185
## 6     31611296      264      3      2     29      3      0     11    147
##   COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q
## 1     23     20      8      1      3     12     10      4     21     13
## 2     68    136     27     32     12     91    116      5     68     74
## 3      2      4      2      0      0      1      2      2      2      1
## 4     76    102     34     19     31     83     93      4     52    104
## 5     19     20      3      1      3     15      8      3     10     12
## 6     11     13      3      3      0      8      9      4      5      4
##   COMP_R COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank
## 1      6     38      0      0     NA   NA           1           2       1
## 2     28    143      0      0      1   78           2           4       2
## 3      0      4      0      0      1   27          NA          NA      NA
## 4     18    101      0      0      1   40           2           3       2
## 5      2     44      0      0      1   40           1           3       1
## 6      1      8      0      0     NA   NA           1           2       1
##   Pu_Bank Pr_Assets  Pu_Assets  Cars Motorcycles Wheeled_tractor UBER MAC
## 1       2  38958866  245288714  2464        9268               1   NA  NA
## 2       3 156637075 2487942240 17513       42664               1   NA  NA
## 3      NA        NA         NA   925        1838               0   NA  NA
## 4       3 145453086 2339395610 17561       38073               3   NA  NA
## 5       3  57939053  392087789  2948        6439               0   NA  NA
## 6       2  77276523  237735184  3158        6874               0   NA  NA
##   WAL-MART POST_OFFICES                           geom
## 1       NA            1 POLYGON ((-62.19465 -11.827...
## 2       NA            2 POLYGON ((-62.53648 -9.7322...
## 3       NA            1 POLYGON ((-60.37075 -13.363...
## 4       NA            2 POLYGON ((-61.0008 -11.2973...
## 5       NA            1 POLYGON ((-61.49976 -13.005...
## 6       NA            1 POLYGON ((-60.50475 -12.966...

3.4 Select variables for model

Lets review the data dictionary again.

data_dict
## # A tibble: 96 x 6
##    FIELD      DESCRIPTION            REFERENCE UNIT  SOURCE           X6   
##    <chr>      <chr>                  <chr>     <chr> <chr>            <chr>
##  1 CITY       Name of the City       <NA>      <NA>  -                <NA> 
##  2 STATE      Name of the State      <NA>      <NA>  -                <NA> 
##  3 CAPITAL    1 if Capital of State  <NA>      <NA>  -                <NA> 
##  4 IBGE_RES_~ "Resident Population " 2010      -     https://sidra.i~ <NA> 
##  5 IBGE_RES_~ Resident Population B~ 2010      -     https://sidra.i~ <NA> 
##  6 IBGE_RES_~ Redident Population F~ 2010      -     https://sidra.i~ <NA> 
##  7 IBGE_DU    "Domestic Units Total~ 2010      -     https://sidra.i~ <NA> 
##  8 IBGE_DU_U~ "Domestic Units Urban~ 2010      -     https://sidra.i~ <NA> 
##  9 IBGE_DU_R~ Domestic Units Rural   2010      -     https://sidra.i~ <NA> 
## 10 IBGE_POP   Resident Population R~ 2010      -     https://sidra.i~ <NA> 
## # ... with 86 more rows

First, we will decide which variables to use for our analysis. We will be choosing the following for our independent variables: IBGE_15-59 - Resident Population Regular Urban Planning - from 15 to 59 y.o IDHM_Renda - HDI GNI Index IDHM_Longevidade - HDI Life Expectancy index IDHM_Educacao - HDI Education index TAXES - Taxes GVA_AGROPEC - Gross Added Value - Agropecuary GVA_INDUSTRY - Gross Added Value - Industry GVA_SERVICES - Gross Added Value - Services GVA_PUBLIC - Gross Added Value - Public Services

We decide too omit other variables as they are irrelevant, and some of the variables re the overall variables for the variables that i have chosen. I chose IBGE_15-59 as it represents the economy active group. GVA_AGROPEC, GVA_INDUSTRY, GVA_SERVICES and GVA_PUBLIC are part of the economy drivers. i included HDI Life Expectancy index as according to a study from https://ideas.repec.org/a/adm/journl/v8y2019i6p74-79.html, the increase in life expectancy is accompanied with the increase in Gross Domestic Product (GDP) per capita income in a state. We have also included the population growth rate as another important factor contributing towards GDP. Taxes are included as Tax cuts boost demand by increasing disposable income and by encouraging businesses to hire and invest more. Hence this affects the GDP economically.

I included HDI Education index as it directly affects economic growth. An increase in workers’ educational level improves their human capital, increasing the productivity of these workers and the economy’s output.

brazil_muni_var <- subset(brazil_muni_o,select=c(`code_muni`,
                                                 `name_muni`,
                                                 `STATE`,
                                                 `GDP_CAPITA`, 
                                                 `IBGE_15-59`,
                                                 `IDHM_Renda`,
                                                 `IDHM_Longevidade`,
                                                 `IDHM_Educacao`,
                                                 `TAXES`,
                                                 `GVA_AGROPEC`,
                                                 `GVA_INDUSTRY`,
                                                 `GVA_SERVICES`,
                                                 `GVA_PUBLIC`
                                                 )) 

head(brazil_muni_var)
## Simple feature collection with 6 features and 13 fields
## geometry type:  POLYGON
## dimension:      XY
## bbox:           xmin: -63.61801 ymin: -13.6937 xmax: -60.33317 ymax: -9.66916
## geographic CRS: SIRGAS 2000
##   code_muni             name_muni STATE GDP_CAPITA IBGE_15-59 IDHM_Renda
## 1   1100015 Alta Floresta D'oeste    RO   18732.17       9023      0.657
## 2   1100023             Ariquemes    RO   20618.18      46000      0.716
## 3   1100031                Cabixi    RO   21202.96       1742      0.650
## 4   1100049                Cacoal    RO   22130.78      42027      0.727
## 5   1100056            Cerejeiras    RO   22721.08       8976      0.688
## 6   1100064     Colorado Do Oeste    RO   16410.55       8801      0.676
##   IDHM_Longevidade IDHM_Educacao     TAXES GVA_AGROPEC GVA_INDUSTRY
## 1            0.763         0.526  23186.17   166143.38        31.27
## 2            0.806         0.600 216095.93   145068.79    353163.07
## 3            0.757         0.559      5.51    59081.09      4623.27
## 4            0.821         0.620 194940.22   188004.72    230109.20
## 5            0.799         0.602     58.16    52891.37     22445.48
## 6            0.814         0.584  18466.33    72684.77        28.37
##   GVA_SERVICES GVA_PUBLIC                           geom
## 1    114455.32  142727.55 POLYGON ((-62.19465 -11.827...
## 2       879.97  589088.78 POLYGON ((-62.53648 -9.7322...
## 3     24091.22   40041.59 POLYGON ((-60.37075 -13.363...
## 4    846490.22  485242.29 POLYGON ((-61.0008 -11.2973...
## 5    175059.55   99495.64 POLYGON ((-61.49976 -13.005...
## 6     85558.72  100799.66 POLYGON ((-60.50475 -12.966...
summary(brazil_muni_var)
##    code_muni              name_muni       STATE             GDP_CAPITA    
##  Min.   :1100015   Bom Jesus   :   5   Length:5569        Min.   :  3191  
##  1st Qu.:2512101   São Domingos:   5   Class :character   1st Qu.:  9062  
##  Median :3146255   Bonito      :   4   Mode  :character   Median : 15873  
##  Mean   :3253419   Santa Helena:   4                      Mean   : 21128  
##  3rd Qu.:4119152   Santa Inês  :   4                      3rd Qu.: 26155  
##  Max.   :5300108   Santa Luzia :   4                      Max.   :314638  
##                    (Other)     :5543                                      
##    IBGE_15-59        IDHM_Renda     IDHM_Longevidade IDHM_Educacao   
##  Min.   :     94   Min.   :0.4000   Min.   :0.6720   Min.   :0.2070  
##  1st Qu.:   1734   1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900  
##  Median :   3841   Median :0.6540   Median :0.8080   Median :0.5600  
##  Mean   :  18212   Mean   :0.6429   Mean   :0.8016   Mean   :0.5591  
##  3rd Qu.:   9628   3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310  
##  Max.   :7058221   Max.   :0.8910   Max.   :0.8940   Max.   :0.8250  
##  NA's   :4         NA's   :5        NA's   :5        NA's   :5       
##      TAXES            GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :   -14159   Min.   :      0   Min.   :       1  
##  1st Qu.:     1305   1st Qu.:   4193   1st Qu.:    1725  
##  Median :     5107   Median :  20430   Median :    7425  
##  Mean   :   118885   Mean   :  47279   Mean   :  175959  
##  3rd Qu.:    22208   3rd Qu.:  51238   3rd Qu.:   41026  
##  Max.   :117125387   Max.   :1402282   Max.   :63306755  
##                                                          
##   GVA_SERVICES         GVA_PUBLIC                  geom     
##  Min.   :        2   Min.   :       7   MULTIPOLYGON :2772  
##  1st Qu.:    10113   1st Qu.:   17260   POLYGON      :2797  
##  Median :    31212   Median :   35865   epsg:4674    :   0  
##  Mean   :   489539   Mean   :  123784   +proj=long...:   0  
##  3rd Qu.:   115448   3rd Qu.:   89256                       
##  Max.   :464656988   Max.   :41902893                       
## 

As there are some NA data, we replace those with 0.

brazil_muni_var[is.na(brazil_muni_var)] = 0

The code chunk below is used to plot a scatterplot matrix of the relationship between the independent variables in brazil_muni_var data.frame. To avoid multticollinearity, we will plot the scatterplot to better understand the bivariate relationship.

independent <- brazil_muni_var %>% select(-c(`GDP_CAPITA`, `code_muni`, `name_muni`, `STATE`)) %>%st_set_geometry(NULL)
independent.cor = cor(independent)

col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(independent.cor, method="color", col=col(200),  
         type="upper", order="hclust", 
         addCoef.col = "black", # Add coefficient of correlation
         tl.col="black", tl.srt=45, #Text label color and rotation
         # Combine with significance
         sig.level = 0.01, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag=FALSE 
         )

As we can see from the above, the following pairs have a high correlation: 1. IBGE_15-59 & GVA_SERVICES: 0.94 2. AXES & GVVA_SERVICES:0.92 3. GVA_INDUSTRY & IBGE_15-59: 0.89 4. GVA_PUBLIC & IBGE_15-59: 0.87 5. GVA_INDUSTRY & GVA_SERVICES: 0.85 6. TAXES & GVA_PUBLIC: 0.85 7. GVA_PUBLIC & GVA_SERVICES: 0.85

Number 1 and 2 has extremely high correlation. Between both, we can see that the common variable is GVA_SERVICES. Also, we can see that GVA_SERVICES is highly correlated with the most variables. Hence, we remove it.

independent <- brazil_muni_var %>% select(-c(`GDP_CAPITA`, `code_muni`, `name_muni`, `STATE`, GVA_SERVICES, `IBGE_15-59`, `TAXES`)) %>%st_set_geometry(NULL)
independent.cor = cor(independent)

col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(independent.cor, method="color", col=col(200),  
         type="upper", order="hclust", 
         addCoef.col = "black", # Add coefficient of correlation
         tl.col="black", tl.srt=45, #Text label color and rotation
         # Combine with significance
         sig.level = 0.01, insig = "blank", 
         # hide correlation coefficient on the principal diagonal
         diag=FALSE 
         )

brazil_muni_var_o <- brazil_muni_var %>% select(-c(GVA_SERVICES))

4 Task 1 - Prepare a choropleth map showing the distribution of GDP per capita, 2016 at municipality level.

  tm_shape(brazil_muni_var_o)+
  tm_borders(alpha = 0.5) +
  tm_fill(col = "GDP_CAPITA", alpha = 0.9, id = "name_muni")

The plot shows that the GDP_CAPITAL in most of the municipalities ae in the lower range of 0 to 50,000. We can see that ome muncipalities in the Central to the South of Brazil have a higher range above 50,000.

5 Exploratory Data Analysis

5.1 Multiple Histogram Plots distribution of variables

summary(brazil_muni_var_o)
##    code_muni              name_muni       STATE             GDP_CAPITA    
##  Min.   :1100015   Bom Jesus   :   5   Length:5569        Min.   :  3191  
##  1st Qu.:2512101   São Domingos:   5   Class :character   1st Qu.:  9062  
##  Median :3146255   Bonito      :   4   Mode  :character   Median : 15873  
##  Mean   :3253419   Santa Helena:   4                      Mean   : 21128  
##  3rd Qu.:4119152   Santa Inês  :   4                      3rd Qu.: 26155  
##  Max.   :5300108   Santa Luzia :   4                      Max.   :314638  
##                    (Other)     :5543                                      
##    IBGE_15-59        IDHM_Renda     IDHM_Longevidade IDHM_Educacao   
##  Min.   :      0   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:   1731   1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900  
##  Median :   3838   Median :0.6540   Median :0.8080   Median :0.5600  
##  Mean   :  18199   Mean   :0.6423   Mean   :0.8008   Mean   :0.5586  
##  3rd Qu.:   9591   3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310  
##  Max.   :7058221   Max.   :0.8910   Max.   :0.8940   Max.   :0.8250  
##                                                                      
##      TAXES            GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :   -14159   Min.   :      0   Min.   :       1  
##  1st Qu.:     1305   1st Qu.:   4193   1st Qu.:    1725  
##  Median :     5107   Median :  20430   Median :    7425  
##  Mean   :   118885   Mean   :  47279   Mean   :  175959  
##  3rd Qu.:    22208   3rd Qu.:  51238   3rd Qu.:   41026  
##  Max.   :117125387   Max.   :1402282   Max.   :63306755  
##                                                          
##    GVA_PUBLIC                  geom     
##  Min.   :       7   MULTIPOLYGON :2772  
##  1st Qu.:   17260   POLYGON      :2797  
##  Median :   35865   epsg:4674    :   0  
##  Mean   :  123784   +proj=long...:   0  
##  3rd Qu.:   89256                       
##  Max.   :41902893                       
## 
GDP_CAPITA_plot <- ggplot(data=brazil_muni_o, aes(x= `GDP_CAPITA`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ECONOMY_ACTIVE_plot <- ggplot(data=brazil_muni_o, aes(x= `IBGE_15-59`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

IDHM_Renda_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Renda`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

IDHM_Longevidade_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Longevidade`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

IDHM_Educacao_plot <- ggplot(data=brazil_muni_o, aes(x= `IDHM_Educacao`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

TAXES_plot <- ggplot(data=brazil_muni_o, aes(x= `TAXES`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

GVA_AGROPEC_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_AGROPEC`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

GVA_INDUSTRY_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_INDUSTRY`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

GVA_PUBLIC_plot <- ggplot(data=brazil_muni_o, aes(x= `GVA_PUBLIC`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

ggarrange(GDP_CAPITA_plot,ECONOMY_ACTIVE_plot,IDHM_Renda_plot,IDHM_Longevidade_plot,IDHM_Educacao_plot,TAXES_plot,GVA_AGROPEC_plot, GVA_INDUSTRY_plot, GVA_PUBLIC_plot, ncol = 3, nrow = 3)

5.2 Data Standardisation

summary(brazil_muni_var_o)
##    code_muni              name_muni       STATE             GDP_CAPITA    
##  Min.   :1100015   Bom Jesus   :   5   Length:5569        Min.   :  3191  
##  1st Qu.:2512101   São Domingos:   5   Class :character   1st Qu.:  9062  
##  Median :3146255   Bonito      :   4   Mode  :character   Median : 15873  
##  Mean   :3253419   Santa Helena:   4                      Mean   : 21128  
##  3rd Qu.:4119152   Santa Inês  :   4                      3rd Qu.: 26155  
##  Max.   :5300108   Santa Luzia :   4                      Max.   :314638  
##                    (Other)     :5543                                      
##    IBGE_15-59        IDHM_Renda     IDHM_Longevidade IDHM_Educacao   
##  Min.   :      0   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:   1731   1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900  
##  Median :   3838   Median :0.6540   Median :0.8080   Median :0.5600  
##  Mean   :  18199   Mean   :0.6423   Mean   :0.8008   Mean   :0.5586  
##  3rd Qu.:   9591   3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310  
##  Max.   :7058221   Max.   :0.8910   Max.   :0.8940   Max.   :0.8250  
##                                                                      
##      TAXES            GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :   -14159   Min.   :      0   Min.   :       1  
##  1st Qu.:     1305   1st Qu.:   4193   1st Qu.:    1725  
##  Median :     5107   Median :  20430   Median :    7425  
##  Mean   :   118885   Mean   :  47279   Mean   :  175959  
##  3rd Qu.:    22208   3rd Qu.:  51238   3rd Qu.:   41026  
##  Max.   :117125387   Max.   :1402282   Max.   :63306755  
##                                                          
##    GVA_PUBLIC                  geom     
##  Min.   :       7   MULTIPOLYGON :2772  
##  1st Qu.:   17260   POLYGON      :2797  
##  Median :   35865   epsg:4674    :   0  
##  Mean   :  123784   +proj=long...:   0  
##  3rd Qu.:   89256                       
##  Max.   :41902893                       
## 

Next, we can see that the data range is very big for GDP_CAPITA, GVA_AGROPEC, GVA_INDUSTRY, GVA_PUBLIC, TAXES and IBGE_15-59.

brazil_muni_var_o$GDP_CAPITA.std <- normalize(brazil_muni_var_o$GDP_CAPITA)
brazil_muni_var_o$GVA_AGROPEC.std <- normalize(brazil_muni_var_o$GVA_AGROPEC)
brazil_muni_var_o$GVA_INDUSTRY.std <- normalize(brazil_muni_var_o$GVA_INDUSTRY)
brazil_muni_var_o$GVA_PUBLIC.std <- normalize(brazil_muni_var_o$GVA_PUBLIC)
brazil_muni_var_o$TAXES.std <- normalize(brazil_muni_var_o$TAXES)
brazil_muni_var_o$IBGE_ECONOMY_ACTIVE.std <- normalize(brazil_muni_var_o$`IBGE_15-59`)

As the 0 values that are being standardised may return Inf values, we will change those to 0.

brazil_muni_var_o[is.na(brazil_muni_var_o)] = 0

We remove the columns that we no longer need

brazil_muni_std <- subset(brazil_muni_var_o, select = -c(`GDP_CAPITA`,`GVA_AGROPEC`, `GVA_INDUSTRY`, `GVA_PUBLIC`,`TAXES`,`IBGE_15-59`)) 

6 Task 2 and 3 - Multiple linear regression method

For the subsequent analysis, we will only use code muni to identify each row and hence, we will no longer need state and name_muni. We remove these 2.

brazil_muni_code <- subset(brazil_muni_std, select = -c(`STATE`, `name_muni`)) 

We need to change the rows by code_muni instead of row number and remove the geometry by using the code chunk below

row.names(brazil_muni_code) <- brazil_muni_code$"code_muni"
brazil_muni_code_var <- select(brazil_muni_code, c(2:ncol(brazil_muni_code))) %>% st_set_geometry(NULL)

6.1 Building a model using multiple linear regression method

IDHM_Renda IDHM_Longevidade IDHM_Educacao GVA_AGROPEC GVA_INDUSTRY GVA_PUBLIC

brazil_muni.mlr <- lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std  + GVA_INDUSTRY.std + GVA_PUBLIC.std + TAXES.std +IBGE_ECONOMY_ACTIVE.std, data=brazil_muni_code_var)
summary(brazil_muni.mlr)
## 
## Call:
## lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + 
##     IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + GVA_PUBLIC.std + 
##     TAXES.std + IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49455 -0.01995 -0.00834  0.00534  0.82985 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -0.10703    0.01262  -8.484  < 2e-16 ***
## IDHM_Renda               0.37800    0.01911  19.783  < 2e-16 ***
## IDHM_Longevidade        -0.13779    0.02396  -5.750 9.39e-09 ***
## IDHM_Educacao            0.04642    0.01291   3.595 0.000327 ***
## GVA_AGROPEC.std          0.18801    0.01119  16.809  < 2e-16 ***
## GVA_INDUSTRY.std         2.09088    0.07596  27.527  < 2e-16 ***
## GVA_PUBLIC.std           0.02381    0.09381   0.254 0.799677    
## TAXES.std                0.74150    0.09776   7.585 3.88e-14 ***
## IBGE_ECONOMY_ACTIVE.std -2.64081    0.11534 -22.895  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05099 on 5560 degrees of freedom
## Multiple R-squared:  0.3909, Adjusted R-squared:   0.39 
## F-statistic:   446 on 8 and 5560 DF,  p-value: < 2.2e-16

With reference to the report above, GVA_PUBLIC.std is statistically insignificant and thus we remove it.

brazil_muni.mlr2 <- lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std  + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std, data=brazil_muni_code_var)
summary(brazil_muni.mlr2)
## 
## Call:
## lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + 
##     IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std + 
##     IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49709 -0.01995 -0.00835  0.00535  0.82990 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -0.10699    0.01261  -8.482  < 2e-16 ***
## IDHM_Renda               0.37821    0.01909  19.815  < 2e-16 ***
## IDHM_Longevidade        -0.13803    0.02394  -5.765 8.61e-09 ***
## IDHM_Educacao            0.04649    0.01291   3.602 0.000318 ***
## GVA_AGROPEC.std          0.18803    0.01118  16.812  < 2e-16 ***
## GVA_INDUSTRY.std         2.08843    0.07533  27.722  < 2e-16 ***
## TAXES.std                0.75257    0.08748   8.603  < 2e-16 ***
## IBGE_ECONOMY_ACTIVE.std -2.62689    0.10145 -25.894  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05099 on 5561 degrees of freedom
## Multiple R-squared:  0.3909, Adjusted R-squared:  0.3901 
## F-statistic: 509.8 on 7 and 5561 DF,  p-value: < 2.2e-16

As seen above, it is clear that all the indepent variables are statistically significant as their p value are all less than the alpha value of 0.05.

ols_regress(brazil_muni.mlr2)
##                         Model Summary                          
## --------------------------------------------------------------
## R                       0.625       RMSE                0.051 
## R-Squared               0.391       Coef. Var          88.525 
## Adj. R-Squared          0.390       MSE                 0.003 
## Pred R-Squared          0.320       MAE                 0.025 
## --------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                 ANOVA                                  
## ----------------------------------------------------------------------
##                Sum of                                                 
##               Squares          DF    Mean Square       F         Sig. 
## ----------------------------------------------------------------------
## Regression      9.277           7          1.325    509.815    0.0000 
## Residual       14.456        5561          0.003                      
## Total          23.734        5568                                     
## ----------------------------------------------------------------------
## 
##                                          Parameter Estimates                                          
## -----------------------------------------------------------------------------------------------------
##                   model      Beta    Std. Error    Std. Beta       t        Sig      lower     upper 
## -----------------------------------------------------------------------------------------------------
##             (Intercept)    -0.107         0.013                  -8.482    0.000    -0.132    -0.082 
##              IDHM_Renda     0.378         0.019        0.480     19.815    0.000     0.341     0.416 
##        IDHM_Longevidade    -0.138         0.024       -0.107     -5.765    0.000    -0.185    -0.091 
##           IDHM_Educacao     0.046         0.013        0.067      3.602    0.000     0.021     0.072 
##         GVA_AGROPEC.std     0.188         0.011        0.182     16.812    0.000     0.166     0.210 
##        GVA_INDUSTRY.std     2.088         0.075        0.638     27.722    0.000     1.941     2.236 
##               TAXES.std     0.753         0.087        0.168      8.603    0.000     0.581     0.924 
## IBGE_ECONOMY_ACTIVE.std    -2.627         0.101       -0.713    -25.894    0.000    -2.826    -2.428 
## -----------------------------------------------------------------------------------------------------

As seen from the model summary, the model explains 39.1% of the variation in GDP_CAPITAL. IDHM_Longevidade and IBGE_ECONOMY_ACTIVE.std have a negative relationship with GDP_CAPITAL while the rest have a positive relationship with GDP_CAPITAL.

  • For every unit increase in IDHM_Renda, GDP_CAPITAL.std increases by 0.378
  • For every unit increase in IDHM_Longevidade, GDP_CAPITAL.std decreases by 0.138
  • For every unit increase in IDHM_Educacao, GDP_CAPITAL.std increases by 0.046
  • For every unit increase in GVA_AGROPEC.std, GDP_CAPITAL.std increases by 0.188
  • For every unit increase in GVA_INDUSTRY.std, GDP_CAPITAL.std increases by 2.088
  • For every unit increase in TAXES.std, GDP_CAPITAL.std increases by 0.753
  • For every unit increase in IBGE_ECONOMY_ACTIVE.std, GDP_CAPITAL.std decreases by 2.627

6.2 Checking for multicolinearity

ols_vif_tol(brazil_muni.mlr2)
##                 Variables Tolerance      VIF
## 1              IDHM_Renda 0.1865197 5.361363
## 2        IDHM_Longevidade 0.3167394 3.157170
## 3           IDHM_Educacao 0.3120479 3.204637
## 4         GVA_AGROPEC.std 0.9337639 1.070935
## 5        GVA_INDUSTRY.std 0.2065560 4.841302
## 6               TAXES.std 0.2867399 3.487481
## 7 IBGE_ECONOMY_ACTIVE.std 0.1442663 6.931627

As seeen above, all VIF are below 10 and thus, there are no signs of multicollinearity.

6.3 Test for Non-Linearity

ols_plot_resid_fit(brazil_muni.mlr2)

The figure above reveals that most of the data points are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent variable and independent variables are linear.

6.4 Test for Normality Assumption

Lastly, the code chunk below uses ols_plot_resid_hist() of olsrr package to perform normality assumption test.

ols_plot_resid_hist(brazil_muni.mlr2)

The figure reveals that the residual of the multiple linear regression model resemble normal distribution.

6.5 Testing for Spatial Autocorrelation

Ho = The residuals in brazil are randomly distributed.
H1= The residuals in brazil are not randomly distributed.
Confidence interval: 95%

The null hypothesis will be rejected if p-value is smaller than alpha value of 0.05.

mlr.output <- as.data.frame(brazil_muni.mlr2$residuals)
head(mlr.output)
##         brazil_muni.mlr2$residuals
## 1100015                -0.02989087
## 1100023                -0.03995615
## 1100031                -0.01002947
## 1100049                -0.04115962
## 1100056                -0.01279513
## 1100064                -0.02770915
brazil_muni.res.sf <- cbind(brazil_muni_std, 
                        brazil_muni.mlr2$residuals) %>%
  
rename(`MLR_RES` = `brazil_muni.mlr2.residuals`)
brazil_muni.sp <- as_Spatial(brazil_muni.res.sf)
brazil_muni.sp
## class       : SpatialPolygonsDataFrame 
## features    : 5569 
## extent      : -73.99045, -28.83594, -33.75118, 5.271841  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## variables   : 13
## names       : code_muni,       name_muni, STATE, IDHM_Renda, IDHM_Longevidade, IDHM_Educacao, GDP_CAPITA.std, GVA_AGROPEC.std, GVA_INDUSTRY.std, GVA_PUBLIC.std, TAXES.std, IBGE_ECONOMY_ACTIVE.std,            MLR_RES 
## min values  :   1100015, Abadia De Goiás,    AC,          0,                0,             0,              0,               0,                0,              0,         0,                       0, -0.497089956814411 
## max values  :   5300108,          Zortéa,    TO,      0.891,            0.894,         0.825,              1,               1,                1,              1,         1,                       1,  0.829896826754127

Task 3: Prepare a choropleth map showing the distribution of the residual of the GDP per capita.

tm_shape(brazil_muni.res.sf) +  
  tm_fill(col = "MLR_RES",
          alpha = 0.6, style="quantile")



The plot above revealed that there is a sign of spatial autocorrelation. Hence, we will perform Moran’s I test to analyse if there is any spatial autocorrelation.

6.5.1 Modelling Spatial Neighbours using Queens method

nb <- poly2nb(brazil_muni.sp, queen=TRUE)
summary(nb)
## Neighbour list object:
## Number of regions: 5569 
## Number of nonzero links: 31796 
## Percentage nonzero weights: 0.1025222 
## Average number of links: 5.709463 
## 2 regions with no links:
## 1526 3500
## Link number distribution:
## 
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##    2   14  102  387 1015 1304 1137  728  432  227  117   48   30   16    6 
##   16   22 
##    3    1 
## 14 least connected regions:
## 279 606 1173 3180 3274 3713 4350 4408 4628 4642 4712 5000 5150 5513 with 1 link
## 1 most connected region:
## 3830 with 22 links

As we can see, Queens method is not appropriate as there are 2 with 0 connected regions.

6.5.2 Modelling Spatial Neighbours using Fixed distance

6.5.2.1 Determine the cut off distance

coords <- coordinates(brazil_muni.sp)
k1 <- knn2nb(knearneigh(coords,k=1))
k1dists <- unlist(nbdists(k1, coords, longlat = TRUE))
summary(k1dists)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.386  11.747  15.880  21.372  23.968 369.663

The summary report shows that the largest first nearest neighbour distance is 369.663, so using this as the upper threshold gives certainty that all units will have at least 1 neighbour.

6.5.2.2 Computing fixed distance weight matrix

nb<- dnearneigh(coords, 0, 369.663 , longlat = TRUE)
nb
## Neighbour list object:
## Number of regions: 5569 
## Number of nonzero links: 3033960 
## Percentage nonzero weights: 9.782625 
## Average number of links: 544.7944
nb_lw <- nb2listw(nb, zero.policy=TRUE, style = 'W')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5569 
## Number of nonzero links: 3033960 
## Percentage nonzero weights: 9.782625 
## Average number of links: 544.7944 
## Link number distribution:
## 
##   1   4   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23 
##   1   1   1   2   4   5   2   5  13   6   8   5   7   9   5   4   2   4 
##  24  25  26  27  28  29  30  31  32  33  34  36  37  38  39  40  41  42 
##   8   9   6   2   9   6   4   4   1   4   2   1   2   1   2   5   4   4 
##  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
##   2   2   5   3   7   5   5   5   7   3   7   9   8   8   6  12   6   4 
##  61  62  63  64  65  66  67  69  70  71  72  73  74  75  76  77  78  80 
##   1   2   3   1   4   2   4   2   2   5   3   4   3   2   7   2   3   5 
##  81  82  83  84  85  86  87  88  89  90  91  92  93  94  96  97  98 100 
##   3   3   1   3   2   2   1   5   2   2   3   1   1   4   1   3   1   2 
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 118 119 
##   4   2   1   2   1   1   4   3   2   1   4   5   2   4   4   6   1   4 
## 120 121 123 124 125 126 128 129 130 131 132 133 134 135 136 137 138 139 
##   2   3   2   1   5   2   2   1   2   6   3   1   1   6   4   3   1   4 
## 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 
##   2   3   6   5   6  11   5   5   7   5   7   7   6   8   2   9   5   7 
## 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 
##   2   5   2   7  10   8   5   3   4   9   4   4   4   7   2   6   4   2 
## 176 177 178 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 
##   3   1   1   5   2   3   1   1   3   2   1   1   3   7   6   4   5   1 
## 195 196 197 198 200 201 202 203 204 205 206 207 208 209 210 211 212 213 
##   1   2   1   2   4   2   4   2   1   1   1   2   5   5   2   3   7   3 
## 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 
##   6   6   3   1   2   6   4   4   8   1   3   6   5   4   5   2   6   2 
## 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 
##   4   4   1   9   3   4   5   3   8   2   5   5   7   3  10   6   8   3 
## 250 251 252 253 254 255 256 257 258 259 260 262 263 264 265 266 267 268 
##   4   6   5   4   6   5  10   5   3   1   8   7   6  10   4   2   6  10 
## 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 
##   4   4   5   8   8   6  11   9   3   6   7   8  10   3   3   7   5   5 
## 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 
##   5   7   9   6   2   6   6   3   5   2   4   3   5   3   8   6   5   4 
## 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 
##   4   9   3   5   5   3   7   5   4   6   4   6   7   5   3   7  11   7 
## 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 
##   3   7   5   5   4   4   5   9   5   6   5   5   6   5   7   6   6   5 
## 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 
##   5   3   4   4   6   5   7   4   5   6  11   2   2   3   4   3   6   3 
## 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 
##   4  10   6   5   4   9   8   4   7   5   9   2   5   6   5   5   8   7 
## 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 
##  10   7   8   9   4   9   7   9  13  11  15  11   6   7   8   4   9   8 
## 395 396 397 398 399 400 401 402 403 404 405 406 407 408 410 411 412 413 
##   5   6   8   3   5  10   7   5  10   7   3   3   5   4   7   8   3   1 
## 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 
##   9   4   5   2   4   6   6   6   7   7   5   6   1   1   4   9   4   4 
## 432 433 434 435 436 437 438 439 440 441 442 443 444 445 447 448 449 450 
##   3   3   3   3   3   4   7   1   2   4   7   4   6   2   4   2   4   1 
## 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 
##   3   4   3   9   5   4   4   3   4   3   8   2   3   5   6   4   2   3 
## 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 485 486 487 
##   2   3   2   2   2   2   5   2   4   5   3   3   2   5   3   6   5   3 
## 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 
##   8   5   4   7   4   1   6   1   1   8   7   4   6   2   3   3   3   1 
## 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 
##   6   2   4   4   3   5   6   5   6   2   8   4   3   7   4   8   4   3 
## 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 
##   5   6   3   3   3   1   5   2   5  10   6   5   6   8   2   6   5   3 
## 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 
##   5   6   2   1   7   4   7   4   7   5   5   8   6   8   7  10   2   9 
## 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 
##   5   7   4   5  13   6  13   6   3   7   5  10   8   7   8   7  10   5 
## 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 
##   9   7   9   7   8   5   8   8   8   6   8   6   9   9   5   4   7   8 
## 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 
##   5   8  10   7   8   8   9   7   7   8   9   8   8   8   7   8   5   5 
## 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 
##   4   5   4   5   4   5   6  14   7   7   8  10  13   5   5  11   9   7 
## 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 
##  10   6   6  10   7  11  11   8  10   8   8  10  10  10   7   9   3   7 
## 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 
##   9   7  14   9  11  10   6  12  12   8  12   3  10  10  15   6   6  11 
## 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 
##   9   6   8   6   7  11   8   6   6  14   9  16  10  11  11   8  11   7 
## 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 
##   9   8   9  12  12   9  12   6  13   6  12   9  10   7   4  13   8  16 
## 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 
##  17   7  10   8   8   6  11  10  10   6  12   7   8  15  10  13   7   9 
## 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 
##   9  11  11  11   4  11   6  10  11   5  10  12  11  14   9   9  11  13 
## 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 
##   7   5   9   6   7  12  15   9   6   9  13  12   6   8   7  12  10  13 
## 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 
##   4   9   9   6   4  11   9   7  11  10  10   8  11  12   5  10   6   8 
## 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 
##   6  16   8   7   5  10   9   9   7   5   6  11  11   9   6   7  12   4 
## 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 
##  11   6   3   7   6   5  12   4   9   8  14   9   5   9   6  11   5   7 
## 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 
##  15   5  11   7   6   7   8   7   9   7  10  10   8   9  18   8   8   9 
## 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 
##   6   7   8   7   9   8   6   9   1   8   6   7   9  10   4  11   8   6 
## 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 
##   7   4   6   9  13  12  10   8  10  10   7   4   6   7   4  13   7   9 
## 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 
##   9   9  12   6   6  11   8   4  11   5  10   8  10   8   5   5   3  10 
## 884 885 886 887 888 889 890 891 892 893 894 895 897 898 899 900 901 902 
##   6  13   2  11   7   8   4   3   3  10   9   6  10   7   9   3   4   9 
## 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 
##   6   5   6  10   7   5   7   5   9   8   4   6   6   6   4   7   7   5 
## 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 
##   5   4   4   6   7   6  10   2   7   9   3   4   5   1   6   5   4   5 
## 939 940 941 942 945 947 948 949 950 951 953 954 956 957 958 962 964 972 
##   3   2   3   2   1   3   1   1   1   1   1   2   1   1   1   1   1   1 
## 973 974 
##   1   1 
## 1 least connected region:
## 1526 with 1 link
## 1 most connected region:
## 4159 with 974 links
## 
## Weights style: W 
## Weights constants summary:
##      n       nn   S0       S1       S2
## W 5569 31013761 5569 45.62359 22403.91

As we can see from the above, the average number of links is 544, which is way too many. Hence, we will try the k method instead by determining number of neighbours.

6.5.3 Modelling Spatial Neighbours using k neigbours method

6.5.3.1 Assign 8 neighbours to each city

We use the number 8 as for for values that are not normally distributed,it is important to evaluate each feature within the cnotext of at least 8 neighbours as a rule of thumb.

brazil_muni_8 <- knn2nb(knearneigh(coords, k=8))
brazil_muni_8
## Neighbour list object:
## Number of regions: 5569 
## Number of nonzero links: 44552 
## Percentage nonzero weights: 0.1436524 
## Average number of links: 8 
## Non-symmetric neighbours list

6.5.3.2 Row-standardised weights matrix

Next, we need to assign weights to each neighboring polygon. In our case, each neighboring polygon will be assigned equal weight (style=“W”)

nb_lw_8<- nb2listw(brazil_muni_8, style="W", zero.policy = TRUE)
nb_lw_8
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 5569 
## Number of nonzero links: 44552 
## Percentage nonzero weights: 0.1436524 
## Average number of links: 8 
## Non-symmetric neighbours list
## 
## Weights style: W 
## Weights constants summary:
##      n       nn   S0      S1       S2
## W 5569 31013761 5569 1279.25 22580.69

6.5.4 Global Moran’s I test

lm.morantest(brazil_muni.mlr2, nb_lw_8)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade
## + IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std +
## IBGE_ECONOMY_ACTIVE.std, data = brazil_muni_code_var)
## weights: nb_lw_8
## 
## Moran I statistic standard deviate = 10.607, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     6.746011e-02    -5.030635e-04     4.105778e-05

The Global Moran’s I test for residual spatial autocorrelation shows that it’s p-value is less than 0.00000000000000022 which is less than the alpha value of 0.05. Hence, we will reject the null hypothesis that the residuals are randomly distributed.

Since the Observed Global Moran I = 0.06746011 which is greater than 0, we can infer than the residuals resemble cluster distribution.

7 Task 4 - Building GDP_CAPITAL Models using Fixed Bandwidth GWR Model

7.1 Computing fixed bandwith

bw.fixed <- bw.gwr(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std  + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std, data=brazil_muni.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=TRUE)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 34.70094 CV score: 100.0252 
## Fixed bandwidth: 21.45065 CV score: 562.666 
## Fixed bandwidth: 42.89007 CV score: 44.72794 
## Fixed bandwidth: 47.95123 CV score: 38.81675 
## Fixed bandwidth: 51.07919 CV score: 36.50962 
## Fixed bandwidth: 53.01239 CV score: 35.44099 
## Fixed bandwidth: 54.20716 CV score: 34.86737 
## Fixed bandwidth: 54.94558 CV score: 34.54285 
## Fixed bandwidth: 55.40194 CV score: 34.33989 
## Fixed bandwidth: 55.68399 CV score: 34.2242 
## Fixed bandwidth: 55.85831 CV score: 34.14794 
## Fixed bandwidth: 55.96604 CV score: 34.10478 
## Fixed bandwidth: 56.03262 CV score: 34.07745 
## Fixed bandwidth: 56.07377 CV score: 34.05968 
## Fixed bandwidth: 56.0992 CV score: 34.05091 
## Fixed bandwidth: 56.11492 CV score: 34.04379 
## Fixed bandwidth: 56.12464 CV score: 34.04201 
## Fixed bandwidth: 56.13064 CV score: 34.03789 
## Fixed bandwidth: 56.13435 CV score: 34.03687 
## Fixed bandwidth: 56.13664 CV score: 34.03573 
## Fixed bandwidth: 56.13806 CV score: 34.03487 
## Fixed bandwidth: 56.13894 CV score: 34.03424 
## Fixed bandwidth: 56.13948 CV score: 34.03322 
## Fixed bandwidth: 56.13981 CV score: 34.03355 
## Fixed bandwidth: 56.13927 CV score: 34.03504 
## Fixed bandwidth: 56.13961 CV score: 34.03415 
## Fixed bandwidth: 56.1394 CV score: 34.0339 
## Fixed bandwidth: 56.13953 CV score: 34.03343 
## Fixed bandwidth: 56.13945 CV score: 34.03438

7.2 GWModel method - Fixed bandwith

gwr.fixed <- gwr.basic(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + IDHM_Educacao + GVA_AGROPEC.std  + GVA_INDUSTRY.std + TAXES.std + IBGE_ECONOMY_ACTIVE.std  , data=brazil_muni.sp, bw=bw.fixed, kernel = 'gaussian', longlat = TRUE)

The output is saved in a list of class “gwrm”. The code below can be used to display the model output.

gwr.fixed
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 13:48:18 
##    Call:
##    gwr.basic(formula = GDP_CAPITA.std ~ IDHM_Renda + IDHM_Longevidade + 
##     IDHM_Educacao + GVA_AGROPEC.std + GVA_INDUSTRY.std + TAXES.std + 
##     IBGE_ECONOMY_ACTIVE.std, data = brazil_muni.sp, bw = bw.fixed, 
##     kernel = "gaussian", longlat = TRUE)
## 
##    Dependent (y) variable:  GDP_CAPITA.std
##    Independent variables:  IDHM_Renda IDHM_Longevidade IDHM_Educacao GVA_AGROPEC.std GVA_INDUSTRY.std TAXES.std IBGE_ECONOMY_ACTIVE.std
##    Number of data points: 5569
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.49709 -0.01995 -0.00835  0.00535  0.82990 
## 
##    Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)             -0.10699    0.01261  -8.482  < 2e-16 ***
##    IDHM_Renda               0.37821    0.01909  19.815  < 2e-16 ***
##    IDHM_Longevidade        -0.13803    0.02394  -5.765 8.61e-09 ***
##    IDHM_Educacao            0.04649    0.01291   3.602 0.000318 ***
##    GVA_AGROPEC.std          0.18803    0.01118  16.812  < 2e-16 ***
##    GVA_INDUSTRY.std         2.08843    0.07533  27.722  < 2e-16 ***
##    TAXES.std                0.75257    0.08748   8.603  < 2e-16 ***
##    IBGE_ECONOMY_ACTIVE.std -2.62689    0.10145 -25.894  < 2e-16 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.05099 on 5561 degrees of freedom
##    Multiple R-squared: 0.3909
##    Adjusted R-squared: 0.3901 
##    F-statistic: 509.8 on 7 and 5561 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 14.45645
##    Sigma(hat): 0.05095893
##    AIC:  -17334.74
##    AICc:  -17334.71
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 56.13948 
##    Regression points: the same locations as observations are used.
##    Distance metric: Great Circle distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                   Min.     1st Qu.      Median     3rd Qu.
##    Intercept               -4.5602e+00 -1.6295e-01 -6.0632e-02 -6.4799e-03
##    IDHM_Renda              -2.7387e+00  5.3042e-02  1.1149e-01  2.1838e-01
##    IDHM_Longevidade        -2.2598e+00 -8.0280e-02  1.4249e-03  8.8746e-02
##    IDHM_Educacao           -2.0456e+00  2.5275e-03  3.5712e-02  1.0883e-01
##    GVA_AGROPEC.std         -9.7822e-01  7.6581e-02  1.5762e-01  2.7593e-01
##    GVA_INDUSTRY.std        -6.0174e+02  2.3876e+00  4.3738e+00  7.7333e+00
##    TAXES.std               -8.9748e+02  6.4132e-01  5.5406e+00  1.9686e+01
##    IBGE_ECONOMY_ACTIVE.std -2.1930e+02 -1.1906e+01 -4.8706e+00 -2.1403e+00
##                                 Max.
##    Intercept                  1.6545
##    IDHM_Renda                 5.2401
##    IDHM_Longevidade           3.7370
##    IDHM_Educacao              2.0845
##    GVA_AGROPEC.std            2.3158
##    GVA_INDUSTRY.std         228.6122
##    TAXES.std               1382.9059
##    IBGE_ECONOMY_ACTIVE.std   71.2297
##    ************************Diagnostic information*************************
##    Number of data points: 5569 
##    Effective number of parameters (2trace(S) - trace(S'S)): 1667.471 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 3901.529 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): -18253.76 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -20385.58 
##    Residual sum of squares: 6.622341 
##    R-square value:  0.7209733 
##    Adjusted R-square value:  0.6016897 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 13:49:03

As seen from the model summary, the model explains 39.1% of the variation in GDP_CAPITAL (Similar to the MLR model). IDHM_Longevidade and IBGE_ECONOMY_ACTIVE.std have a negative relationship with GDP_CAPITAL while the rest have a positive relationship with GDP_CAPITAL.

  • For every unit increase in IDHM_Renda, GDP_CAPITAL.std increases by 0.37821
  • For every unit increase in IDHM_Longevidade, GDP_CAPITAL.std decreases by 0.13803
  • For every unit increase in IDHM_Educacao, GDP_CAPITAL.std increases by 0.04649
  • For every unit increase in GVA_AGROPEC.std, GDP_CAPITAL.std increases by 0.18803
  • For every unit increase in GVA_INDUSTRY.std, GDP_CAPITAL.std increases by 2.08843
  • For every unit increase in TAXES.std, GDP_CAPITAL.std increases by 0.75257
  • For every unit increase in IBGE_ECONOMY_ACTIVE.std, GDP_CAPITAL.std decreases by 2.62689

8 Task 5- Visualising GWR Output

With the model, we will prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.

8.1 Converting SDF into sf data.frame

brazil_muni.sf.fixed <- st_as_sf(gwr.fixed$SDF) %>%
  st_transform(crs=4674)
brazil_muni.sf.fixed.4674 <- st_transform(brazil_muni.sf.fixed, 4674)

We join back to brazil_muni_std and not brazil_muni_code as we want to plot with the state and name of the municipality.

gwr.fixed.output <- as.data.frame(gwr.fixed$SDF)
brazil_muni_code.fixed <- cbind(brazil_muni_std, as.matrix(gwr.fixed.output))
brazil_muni_code.fixed
## Simple feature collection with 5569 features and 42 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 5.271841
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni             name_muni STATE IDHM_Renda IDHM_Longevidade
## 1    1100015 Alta Floresta D'oeste    RO      0.657            0.763
## 2    1100023             Ariquemes    RO      0.716            0.806
## 3    1100031                Cabixi    RO      0.650            0.757
## 4    1100049                Cacoal    RO      0.727            0.821
## 5    1100056            Cerejeiras    RO      0.688            0.799
## 6    1100064     Colorado Do Oeste    RO      0.676            0.814
## 7    1100072            Corumbiara    RO      0.630            0.774
## 8    1100080         Costa Marques    RO      0.616            0.751
## 9    1100098       Espigão D'oeste    RO      0.691            0.819
## 10   1100106         Guajará-Mirim    RO      0.663            0.823
##    IDHM_Educacao GDP_CAPITA.std GVA_AGROPEC.std GVA_INDUSTRY.std
## 1          0.526     0.04990125      0.11848071     4.855722e-07
## 2          0.600     0.05595688      0.10345193     5.578592e-03
## 3          0.559     0.05783450      0.04213210     7.302128e-05
## 4          0.620     0.06081357      0.13407054     3.634820e-03
## 5          0.602     0.06270891      0.03771807     3.545427e-04
## 6          0.584     0.04244695      0.05183320     4.397635e-07
## 7          0.473     0.07657810      0.09425845     1.584682e-04
## 8          0.493     0.02861561      0.04461417     1.045577e-04
## 9          0.536     0.04321665      0.08232194     8.978652e-04
## 10         0.519     0.04026828      0.03560563     4.618831e-04
##    GVA_PUBLIC.std    TAXES.std IBGE_ECONOMY_ACTIVE.std   Intercept
## 1    3.405978e-03 0.0003188051            0.0012783675 -0.45132447
## 2    1.405826e-02 0.0019656423            0.0065172230 -0.09781247
## 3    9.554075e-04 0.0001209158            0.0002468044 -1.32464677
## 4    1.157999e-02 0.0017850396            0.0059543333 -0.15336026
## 5    2.374261e-03 0.0001213652            0.0012717086 -0.58840318
## 6    2.405381e-03 0.0002785126            0.0012469148 -1.08752797
## 7    1.322334e-03 0.0001209434            0.0002387287 -0.38461707
## 8    1.998908e-06 0.0001209290            0.0006174360 -0.08165977
## 9    4.074391e-03 0.0004806875            0.0018401237 -0.11657488
## 10   5.811294e-06 0.0001216977            0.0030613946 -0.08789906
##    IDHM_Renda.1 IDHM_Longevidade.1 IDHM_Educacao.1 GVA_AGROPEC.std.1
## 1     0.2697896        0.388869466      0.01648041        0.31956120
## 2    -0.1652060        0.197892832      0.18895694        0.13452448
## 3     4.1258596       -0.713026931     -1.33793449       -0.12661030
## 4     0.1635282        0.091674590      0.03470668        0.01455588
## 5     1.3440149        0.002231064     -0.37911191       -0.04818984
## 6     3.1671235       -0.522040551     -0.90262790       -0.02113617
## 7     0.8665679       -0.049839668     -0.13349381        0.16077721
## 8     0.1918322        0.037720715     -0.07902682       -0.08223659
## 9     0.1591303        0.048467196      0.04089547       -0.07682944
## 10    0.0317533        0.168862889     -0.05667693       -0.03970428
##    GVA_INDUSTRY.std.1 TAXES.std.1 IBGE_ECONOMY_ACTIVE.std.1          y
## 1           8.7130004   35.727890               -25.1872324 0.04990125
## 2           0.4165859   15.768801                -7.5918675 0.05595688
## 3          -7.3105722   95.438920               -64.0063542 0.05783450
## 4           4.2042029   15.228903                -8.1907680 0.06081357
## 5           5.9505082   32.844711               -42.5604358 0.06270891
## 6           7.2829857   59.658946               -61.4852516 0.04244695
## 7           9.6290605   25.599855               -35.9288499 0.07657810
## 8           5.6726196   50.507181                -0.3816103 0.02861561
## 9           2.9296014    7.898972                -3.7560315 0.04321665
## 10         11.5886168    9.857580                -2.4764401 0.04026828
##          yhat      residual CV_Score Stud_residual Intercept_SE
## 1  0.04836112  1.540126e-03        0  0.1606131686    0.5698745
## 2  0.05453451  1.422371e-03        0  0.1468756953    0.4491582
## 3  0.05937002 -1.535518e-03        0 -0.2487859180    0.5235482
## 4  0.05795436  2.859202e-03        0  0.2157335183    0.5618656
## 5  0.05999013  2.718783e-03        0  0.4542802315    0.6126895
## 6  0.04022839  2.218560e-03        0  0.4111803357    0.5163537
## 7  0.07080164  5.776464e-03        0  0.4514660347    0.5709999
## 8  0.02867325 -5.763875e-05        0 -0.1631089713    0.8033834
## 9  0.04818981 -4.973160e-03        0 -0.3715237955    0.8380354
## 10 0.04026939 -1.106600e-06        0 -0.0009077684    0.6283788
##    IDHM_Renda_SE IDHM_Longevidade_SE IDHM_Educacao_SE GVA_AGROPEC.std_SE
## 1      0.6120484           0.6879137        0.3582075          0.5807456
## 2      1.0466128           0.8612362        0.4565096          0.4220339
## 3      1.3952096           0.7100464        1.0150291          0.3467717
## 4      0.5114617           0.7044159        0.3960576          0.4998172
## 5      1.0223463           0.7102789        0.6395730          0.6956278
## 6      1.0475005           0.7152289        0.8488420          0.2415083
## 7      0.7372813           0.7240281        0.4743384          0.5379428
## 8      1.1525610           1.4384518        0.5179586          0.7375356
## 9      0.6296449           1.0320657        0.4907693          0.5220070
## 10     1.0129753           1.0641209        0.5205934          0.5142687
##    GVA_INDUSTRY.std_SE TAXES.std_SE IBGE_ECONOMY_ACTIVE.std_SE
## 1             28.31909     96.67039                   42.20039
## 2             11.92998     54.02517                   26.98894
## 3             63.88874    173.47374                   44.44462
## 4             20.33337     51.19842                   25.81117
## 5             45.30980    159.79695                   47.57823
## 6             57.87462    159.17505                   39.41955
## 7             40.41984    127.23353                   40.31494
## 8             54.41224    234.07310                   32.99651
## 9             16.16985     52.80571                   25.98817
## 10            33.57885     44.80166                   23.50994
##    Intercept_TV IDHM_Renda_TV IDHM_Longevidade_TV IDHM_Educacao_TV
## 1    -0.7919717    0.44079772          0.56528817       0.04600800
## 2    -0.2177684   -0.15784827          0.22977765       0.41391671
## 3    -2.5301331    2.95716106         -1.00419769      -1.31812428
## 4    -0.2729483    0.31972707          0.13014271       0.08763038
## 5    -0.9603611    1.31463763          0.00314111      -0.59275782
## 6    -2.1061685    3.02350556         -0.72989296      -1.06336381
## 7    -0.6735852    1.17535585         -0.06883665      -0.28143160
## 8    -0.1016448    0.16643993          0.02622313      -0.15257364
## 9    -0.1391049    0.25273030          0.04696135       0.08332931
## 10   -0.1398823    0.03134657          0.15868769      -0.10886986
##    GVA_AGROPEC.std_TV GVA_INDUSTRY.std_TV TAXES.std_TV
## 1          0.55026023          0.30767228    0.3695846
## 2          0.31875281          0.03491925    0.2918788
## 3         -0.36511144         -0.11442661    0.5501635
## 4          0.02912241          0.20676374    0.2974487
## 5         -0.06927532          0.13132938    0.2055403
## 6         -0.08751733          0.12584076    0.3748009
## 7          0.29887417          0.23822608    0.2012037
## 8         -0.11150185          0.10425265    0.2157752
## 9         -0.14718084          0.18117680    0.1495856
## 10        -0.07720532          0.34511654    0.2200271
##    IBGE_ECONOMY_ACTIVE.std_TV  Local_R2                           geom
## 1                 -0.59684838 0.9682039 POLYGON ((-62.19465 -11.827...
## 2                 -0.28129549 0.9746579 POLYGON ((-62.53648 -9.7322...
## 3                 -1.44013741 0.9935490 POLYGON ((-60.37075 -13.363...
## 4                 -0.31733423 0.9506142 POLYGON ((-61.0008 -11.2973...
## 5                 -0.89453595 0.9584837 POLYGON ((-61.49976 -13.005...
## 6                 -1.55976551 0.9940479 POLYGON ((-60.50475 -12.966...
## 7                 -0.89120430 0.9569631 POLYGON ((-61.34273 -12.666...
## 8                 -0.01156517 0.9994902 POLYGON ((-63.71199 -11.650...
## 9                 -0.14452850 0.9360049 POLYGON ((-60.94827 -10.988...
## 10                -0.10533589 0.9987951 POLYGON ((-65.37724 -10.431...

8.2 Visualising local R2

tm_shape(brazil_muni_code.fixed) + 
  tm_fill(col = "Local_R2",
           #size = 0.15,
           border.col = "gray60",
           border.lwd = 1, id="name_muni")



The local R2 values determines have better prediction using the model. By plotting the Local R2 values by municipality, we can identify which municipalities’ GDP_CAPITAL is predicted well and not predicted well with GWRmodel. From the plot above, the highest R2 range is 0.8 to 1.0, Majority of the maps are within this range including the following regions: the North, East, West and Central of Brazil. This shows that the prediction of GDP_CAPITA with this model is more accurate in these regions. Even though the model is accurate for most parts of Brazil, we can see the colour fades out towards the South of Brazil. Hence, this model does not predict as well for the municipalities in those areas.

8.2 Visualising Residuals

tm_shape(brazil_muni_code.fixed) +  
  tm_fill(col = "residual",
           border.col = "gray60",
           border.lwd = 1) 



These residual values is the error that isnt explained by the model. From the above plot, the residual values range from -0.4 to 0.8. Across the municipalities in Brazil, most areas have a residual value ranging from -0.2 to 0.2. The high residuals can be seen in a small area in the North of Brazil, with a residual value of between range of 0.6 to 0.8. Hence, this shows that there is minimal error that isnt eplained by the model in most municipalities, with the exception of a few in the South.

8.3 Visualising Intercept_SE

tm_shape(brazil_muni_code.fixed) +  
  tm_fill(col = "Intercept_SE",
           border.col = "gray60",
           border.lwd=1) 



Intercept_SE represents the standard error of the intercept. It determines if the estimated intercept is statistically significant from a hypothesized value, which in our case is 0.05. As seen from the map, there are some municipalities in the North-West of Brazil with a high intercept SE. Hence, this indicates that the estimated intercept in these municipalities are statisticaly insignificant and if we were to use the prediction model on these municipalities, we may need to re-estimate the model without the intercept term being present.