Overview

Brazil is the world’s fifth-largest country by area and the sixth most populous. Brazil is classified as an upper-middle income economy by the World Bank and a developing country, with the largest share of global wealth in Latin America. It is considered an advanced emerging economy. It has the ninth largest GDP in the world by nominal, and eighth by PPP measures. Behind all this impressive figures, the spatial development of Brazil is highly unequal as shown in the figure below. The GDP per capita of the poorest municipality is R$3190.6. On the other hand, the GDP per capita of the richest municipality is R$314638. Half of the municipalities with GDP per capita less than R$16000 and the top 25% municipalities earn R$26155 and above. This assignment determine factors affecting the unequal development of Brazil at the municipality level by using the data provided.

Data

BRAZIL_CITIES.csv. This data file consists of 81 columns and 5573 rows. Each row representing one municipality.
Data_Dictionary.csv. This file provides meta data of each columns in BRAZIL_CITIES.csv.
2016 municipality boundary file from geobr

Installing packages

packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse','geobr','cartography','spatstat')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

## Loading required package: olsrr

## 
## Attaching package: 'olsrr'

## The following object is masked from 'package:datasets':
## 
##     rivers

## Loading required package: corrplot

## corrplot 0.84 loaded

## Loading required package: ggpubr

## Loading required package: ggplot2

## Loading required package: sf

## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

## Loading required package: spdep

## Loading required package: sp

## Loading required package: spData

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

## Loading required package: GWmodel

## Loading required package: maptools

## Checking rgeos availability: TRUE

## Loading required package: robustbase

## Loading required package: Rcpp

## Loading required package: spatialreg

## Loading required package: Matrix

## Registered S3 methods overwritten by 'spatialreg':
##   method                   from 
##   residuals.stsls          spdep
##   deviance.stsls           spdep
##   coef.stsls               spdep
##   print.stsls              spdep
##   summary.stsls            spdep
##   print.summary.stsls      spdep
##   residuals.gmsar          spdep
##   deviance.gmsar           spdep
##   coef.gmsar               spdep
##   fitted.gmsar             spdep
##   print.gmsar              spdep
##   summary.gmsar            spdep
##   print.summary.gmsar      spdep
##   print.lagmess            spdep
##   summary.lagmess          spdep
##   print.summary.lagmess    spdep
##   residuals.lagmess        spdep
##   deviance.lagmess         spdep
##   coef.lagmess             spdep
##   fitted.lagmess           spdep
##   logLik.lagmess           spdep
##   fitted.SFResult          spdep
##   print.SFResult           spdep
##   fitted.ME_res            spdep
##   print.ME_res             spdep
##   print.lagImpact          spdep
##   plot.lagImpact           spdep
##   summary.lagImpact        spdep
##   HPDinterval.lagImpact    spdep
##   print.summary.lagImpact  spdep
##   print.sarlm              spdep
##   summary.sarlm            spdep
##   residuals.sarlm          spdep
##   deviance.sarlm           spdep
##   coef.sarlm               spdep
##   vcov.sarlm               spdep
##   fitted.sarlm             spdep
##   logLik.sarlm             spdep
##   anova.sarlm              spdep
##   predict.sarlm            spdep
##   print.summary.sarlm      spdep
##   print.sarlm.pred         spdep
##   as.data.frame.sarlm.pred spdep
##   residuals.spautolm       spdep
##   deviance.spautolm        spdep
##   coef.spautolm            spdep
##   fitted.spautolm          spdep
##   print.spautolm           spdep
##   summary.spautolm         spdep
##   logLik.spautolm          spdep
##   print.summary.spautolm   spdep
##   print.WXImpact           spdep
##   summary.WXImpact         spdep
##   print.summary.WXImpact   spdep
##   predict.SLX              spdep

## 
## Attaching package: 'spatialreg'

## The following objects are masked from 'package:spdep':
## 
##     anova.sarlm, as.spam.listw, as_dgRMatrix_listw, as_dsCMatrix_I,
##     as_dsCMatrix_IrW, as_dsTMatrix_listw, bptest.sarlm, can.be.simmed,
##     cheb_setup, coef.gmsar, coef.sarlm, coef.spautolm, coef.stsls,
##     create_WX, deviance.gmsar, deviance.sarlm, deviance.spautolm,
##     deviance.stsls, do_ldet, eigen_pre_setup, eigen_setup, eigenw,
##     errorsarlm, fitted.gmsar, fitted.ME_res, fitted.sarlm,
##     fitted.SFResult, fitted.spautolm, get.ClusterOption,
##     get.coresOption, get.mcOption, get.VerboseOption,
##     get.ZeroPolicyOption, GMargminImage, GMerrorsar, griffith_sone,
##     gstsls, Hausman.test, HPDinterval.lagImpact, impacts, intImpacts,
##     Jacobian_W, jacobianSetup, l_max, lagmess, lagsarlm, lextrB,
##     lextrS, lextrW, lmSLX, logLik.sarlm, logLik.spautolm, LR.sarlm,
##     LR1.sarlm, LR1.spautolm, LU_prepermutate_setup, LU_setup,
##     Matrix_J_setup, Matrix_setup, mcdet_setup, MCMCsamp, ME, mom_calc,
##     mom_calc_int2, moments_setup, powerWeights, predict.sarlm,
##     predict.SLX, print.gmsar, print.ME_res, print.sarlm,
##     print.sarlm.pred, print.SFResult, print.spautolm, print.stsls,
##     print.summary.gmsar, print.summary.sarlm, print.summary.spautolm,
##     print.summary.stsls, residuals.gmsar, residuals.sarlm,
##     residuals.spautolm, residuals.stsls, sacsarlm, SE_classic_setup,
##     SE_interp_setup, SE_whichMin_setup, set.ClusterOption,
##     set.coresOption, set.mcOption, set.VerboseOption,
##     set.ZeroPolicyOption, similar.listw, spam_setup, spam_update_setup,
##     SpatialFiltering, spautolm, spBreg_err, spBreg_lag, spBreg_sac,
##     stsls, subgraph_eigenw, summary.gmsar, summary.sarlm,
##     summary.spautolm, summary.stsls, trW, vcov.sarlm, Wald1.sarlm

## Welcome to GWmodel version 2.1-4.
## The new version of GWmodel 2.1-4 now is readyLoading required package: tmap
## Loading required package: tidyverse
## -- Attaching packages --------- tidyverse 1.3.0 --
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.0.3     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## v purrr   0.3.4     
## -- Conflicts ------------ tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x tidyr::pack()   masks Matrix::pack()
## x tidyr::unpack() masks Matrix::unpack()
## Loading required package: geobr
## Loading required package: cartography
## Loading required package: spatstat
## Loading required package: spatstat.data
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## 
## The following object is masked from 'package:dplyr':
## 
##     collapse
## 
## Loading required package: rpart
## Registered S3 method overwritten by 'spatstat':
##   method     from
##   print.boxx cli 
## 
## spatstat 1.64-1       (nickname: 'Help you I can, yes!') 
## For an introduction to spatstat, type 'beginner' 
## 
## 
## Attaching package: 'spatstat'
## 
## The following objects are masked from 'package:ggpubr':
## 
##     border, rotate

Imporing aspatial data

dictionary <- read_csv2("data/aspatial/Data_Dictionary.csv")

## Using ',' as decimal and '.' as grouping mark. Use read_delim() for more control.

## Warning: Missing column names filled in: 'X6' [6]

## Parsed with column specification:
## cols(
##   FIELD = col_character(),
##   DESCRIPTION = col_character(),
##   REFERENCE = col_character(),
##   UNIT = col_character(),
##   SOURCE = col_character(),
##   X6 = col_character()
## )

NA Check

sum(is.na(dictionary))

## [1] 176

We will not omit theese values as the dictionary is used for reference . Despite this , we will use this dataframe with caution.

brazil <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   CITY = col_character(),
##   STATE = col_character(),
##   AREA = col_number(),
##   REGIAO_TUR = col_character(),
##   CATEGORIA_TUR = col_character(),
##   RURAL_URBAN = col_character(),
##   GVA_MAIN = col_character()
## )

## See spec(...) for full column specifications.

glimpse(brazil)

## Rows: 5,573
## Columns: 81
## $ CITY                     <chr> "Abadia De Goiás", "Abadia Dos Dourados", ...
## $ STATE                    <chr> "GO", "MG", "GO", "MG", "PA", "CE", "BA", ...
## $ CAPITAL                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IBGE_RES_POP             <dbl> 6876, 6704, 15757, 22690, 141100, 10496, 8...
## $ IBGE_RES_POP_BRAS        <dbl> 6876, 6704, 15609, 22690, 141040, 10496, 8...
## $ IBGE_RES_POP_ESTR        <dbl> 0, 0, 148, 0, 60, 0, 0, 0, 0, 0, 0, 16, 17...
## $ IBGE_DU                  <dbl> 2137, 2328, 4655, 7694, 31061, 2791, 2572,...
## $ IBGE_DU_URBAN            <dbl> 1546, 1481, 3233, 6667, 19057, 1251, 1193,...
## $ IBGE_DU_RURAL            <dbl> 591, 847, 1422, 1027, 12004, 1540, 1379, 1...
## $ IBGE_POP                 <dbl> 5300, 4154, 10656, 18464, 82956, 4538, 372...
## $ IBGE_1                   <dbl> 69, 38, 139, 176, 1354, 98, 37, 167, 69, 1...
## $ `IBGE_1-4`               <dbl> 318, 207, 650, 856, 5567, 323, 156, 733, 3...
## $ `IBGE_5-9`               <dbl> 438, 260, 894, 1233, 7618, 421, 263, 978, ...
## $ `IBGE_10-14`             <dbl> 517, 351, 1087, 1539, 8905, 483, 277, 927,...
## $ `IBGE_15-59`             <dbl> 3542, 2709, 6896, 11979, 53516, 2631, 2319...
## $ `IBGE_60+`               <dbl> 416, 589, 990, 2681, 5996, 582, 673, 803, ...
## $ IBGE_PLANTED_AREA        <dbl> 319, 4479, 10307, 1862, 25200, 2598, 895, ...
## $ `IBGE_CROP_PRODUCTION_$` <dbl> 1843, 18017, 33085, 7502, 700872, 5234, 39...
## $ `IDHM Ranking 2010`      <dbl> 1689, 2207, 2202, 1994, 3530, 3522, 4086, ...
## $ IDHM                     <dbl> 0.708, 0.690, 0.690, 0.698, 0.628, 0.628, ...
## $ IDHM_Renda               <dbl> 0.687, 0.693, 0.671, 0.720, 0.579, 0.540, ...
## $ IDHM_Longevidade         <dbl> 0.830, 0.839, 0.841, 0.848, 0.798, 0.748, ...
## $ IDHM_Educacao            <dbl> 0.622, 0.563, 0.579, 0.556, 0.537, 0.612, ...
## $ LONG                     <dbl> -49.44055, -47.39683, -48.71881, -45.44619...
## $ LAT                      <dbl> -16.758812, -18.487565, -16.182672, -19.15...
## $ ALT                      <dbl> 893.60, 753.12, 1017.55, 644.74, 10.12, 40...
## $ PAY_TV                   <dbl> 360, 77, 227, 1230, 3389, 29, 952, 51, 55,...
## $ FIXED_PHONES             <dbl> 842, 296, 720, 1716, 1218, 34, 335, 222, 3...
## $ AREA                     <dbl> 147.26, 881.06, 1045.13, 1817.07, 1610.65,...
## $ REGIAO_TUR               <chr> NA, "Caminhos Do Cerrado", "Região Turísti...
## $ CATEGORIA_TUR            <chr> NA, "D", "C", "D", "D", NA, "D", NA, NA, "...
## $ ESTIMATED_POP            <dbl> 8583, 6972, 19614, 23223, 156292, 11663, 8...
## $ RURAL_URBAN              <chr> "Urbano", "Rural Adjacente", "Rural Adjace...
## $ GVA_AGROPEC              <dbl> 6.20, 50524.57, 42.84, 113824.60, 140463.7...
## $ GVA_INDUSTRY             <dbl> 27991.25, 25917.70, 16728.30, 31002.62, 58...
## $ GVA_SERVICES             <dbl> 74750.32, 62689.23, 138198.58, 172.33, 468...
## $ GVA_PUBLIC               <dbl> 36915.04, 28083.79, 63396.20, 86081.41, 48...
## $ ` GVA_TOTAL `            <dbl> 145857.60, 167215.28, 261161.91, 403241.27...
## $ TAXES                    <dbl> 20554.20, 12873.50, 26822.58, 26994.09, 95...
## $ GDP                      <dbl> 166.41, 180.09, 287984.49, 430235.36, 1249...
## $ POP_GDP                  <dbl> 8053, 7037, 18427, 23574, 151934, 11483, 9...
## $ GDP_CAPITA               <dbl> 20664.57, 25591.70, 15628.40, 18250.42, 82...
## $ GVA_MAIN                 <chr> "Demais serviços", "Demais serviços", "Dem...
## $ MUN_EXPENDIT             <dbl> 28227691, 17909274, 37513019, NA, NA, NA, ...
## $ COMP_TOT                 <dbl> 284, 476, 288, 621, 931, 86, 191, 87, 285,...
## $ COMP_A                   <dbl> 5, 6, 5, 18, 4, 1, 6, 2, 5, 2, 0, 8, 3, 1,...
## $ COMP_B                   <dbl> 1, 6, 9, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, ...
## $ COMP_C                   <dbl> 56, 30, 26, 40, 43, 4, 8, 3, 20, 4, 9, 40,...
## $ COMP_D                   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ COMP_E                   <dbl> 2, 2, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 2, ...
## $ COMP_F                   <dbl> 29, 34, 7, 20, 27, 6, 4, 0, 10, 2, 0, 25, ...
## $ COMP_G                   <dbl> 110, 190, 117, 303, 500, 48, 97, 71, 133, ...
## $ COMP_H                   <dbl> 26, 70, 12, 62, 16, 2, 5, 0, 18, 8, 1, 67,...
## $ COMP_I                   <dbl> 4, 28, 57, 30, 31, 10, 5, 1, 14, 3, 0, 25,...
## $ COMP_J                   <dbl> 5, 11, 2, 9, 6, 2, 3, 1, 8, 1, 1, 9, 5, 14...
## $ COMP_K                   <dbl> 0, 0, 1, 6, 1, 0, 1, 0, 0, 1, 0, 4, 3, 3, ...
## $ COMP_L                   <dbl> 2, 4, 0, 4, 1, 0, 0, 0, 4, 0, 0, 7, 4, 4, ...
## $ COMP_M                   <dbl> 10, 15, 7, 28, 22, 2, 5, 0, 11, 4, 2, 26, ...
## $ COMP_N                   <dbl> 12, 29, 15, 27, 16, 3, 5, 1, 26, 0, 1, 16,...
## $ COMP_O                   <dbl> 4, 2, 3, 2, 2, 2, 2, 2, 2, 2, 6, 2, 4, 2, ...
## $ COMP_P                   <dbl> 6, 9, 11, 15, 155, 0, 8, 0, 8, 1, 6, 14, 1...
## $ COMP_Q                   <dbl> 6, 14, 5, 19, 33, 2, 1, 2, 9, 3, 0, 13, 22...
## $ COMP_R                   <dbl> 1, 6, 1, 9, 15, 0, 2, 0, 4, 0, 0, 4, 6, 6,...
## $ COMP_S                   <dbl> 5, 19, 8, 27, 56, 4, 38, 4, 12, 3, 4, 23, ...
## $ COMP_T                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COMP_U                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ HOTELS                   <dbl> NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, ...
## $ BEDS                     <dbl> NA, NA, 34, NA, NA, NA, 24, NA, NA, NA, NA...
## $ Pr_Agencies              <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Agencies              <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Bank                  <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Bank                  <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Assets                <dbl> NA, NA, 33724584, 44974716, 76181384, NA, ...
## $ Pu_Assets                <dbl> NA, NA, 67091904, 371922572, 800078483, NA...
## $ Cars                     <dbl> 2158, 2227, 2838, 6928, 5277, 553, 896, 61...
## $ Motorcycles              <dbl> 1246, 1142, 1426, 2953, 25661, 1674, 696, ...
## $ Wheeled_tractor          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, ...
## $ UBER                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MAC                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ `WAL-MART`               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ POST_OFFICES             <dbl> 1, 1, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...

Rows that have NA GDP_Capita values

brazil[!complete.cases(brazil$GDP_CAPITA), ]

## # A tibble: 3 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Lago~ RS          0           NA               NA               NA      NA
## 2 Sant~ BA          0           NA               NA               NA      NA
## 3 São ~ PE          0           NA               NA               NA      NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

We will not be omiting this rows yet as GDP_CAPITA is the dependent variable of our analysis and more investigation will be done later on.

Rows that do not have LAT LONG data

brazil[!complete.cases(brazil$LAT), ]

## # A tibble: 9 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Baln~ SC          0           NA               NA               NA      NA
## 2 Lago~ RS          0           NA               NA               NA      NA
## 3 Moju~ PA          0           NA               NA               NA      NA
## 4 Para~ MS          0           NA               NA               NA      NA
## 5 Pesc~ SC          0           NA               NA               NA      NA
## 6 Pinh~ RS          0         2130             2130                0     745
## 7 Pint~ RS          0           NA               NA               NA      NA
## 8 Sant~ BA          0         9648             9648                0    2891
## 9 São ~ PE          0           NA               NA               NA      NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

For these rows, I will be inputting the data manually later on in the analysis, I will filter the valid cities with LAT LONG data and invalid cities without

valid <- brazil %>% filter(!is.na(LAT))
invalid <- brazil %>% filter(is.na(LAT))

convert to sf data frame

valid <- st_as_sf(valid,
              coords = c("LONG", "LAT"),
              crs=4674)

Importing geospatial data

Brazil 2016 municipality boundary

mun <- read_municipality(year=2016)

## Using year 2016

## Loading data for the whole country. This might take a few minutes.

## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |=============                                                         |  19%
  |                                                                            
  |================                                                      |  22%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |=============================                                         |  41%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |======================================================                |  78%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |======================================================================| 100%

st_crs(mun)

## Coordinate Reference System:
##   User input: SIRGAS 2000 
##   wkt:
## GEOGCRS["SIRGAS 2000",
##     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Latin America - SIRGAS 2000 by country"],
##         BBOX[-59.87,-122.19,32.72,-25.28]],
##     ID["EPSG",4674]]

Validity Checks

all(st_is_valid(mun))

## [1] FALSE

muni_invalid <- mun %>% filter(!st_is_valid(mun))
st_is_valid(muni_invalid, reason = TRUE)

## [1] "Nested shells[-46.999541435 -24.3495746469999]"

mun <- st_make_valid(mun)

checking for invalid polygons again

check <- function(target_st) {
  validity <- st_is_valid(target_st)
  NA_rows <- target_st[rowSums(is.na(target_st))!=0,]
  Invalid_rows <- which(validity==FALSE)
  print(paste("For:", deparse(substitute(target_st))))
  print(paste("Number of Invalid polygons:", length(Invalid_rows)))
  
}
check(mun)

## [1] "For: mun"
## [1] "Number of Invalid polygons: 0"

Checking the municipality unique identifier

length(unique(mun$name_muni))

## [1] 5299

length(unique(mun$code_muni))

## [1] 5572

From the results , we can see that there are some municipality with the same names. We should use code_muni as the unique identifier.

using st_area function to obtain area of municipility

mun <- mun %>% 
  mutate(area = st_area(mun))

Joining aspatial and geospatial data

merged brazil cities to municipal polygon

brazil_mun <- st_join(valid,mun,join=st_within)

## although coordinates are longitude/latitude, st_within assumes that they are planar

By joining the data, the missing GDP_CAPITA is addressed

merged municipal polygon to brazil cities

mun_brazil <- st_join(mun,valid)

## although coordinates are longitude/latitude, st_intersects assumes that they are planar

For rows that did not have a successful join , we wil filter them out and anaylse them . We will also be verifying the successful joins

success <- mun_brazil %>% filter(!is.na(CITY))
unsucess <- mun_brazil %>% filter(is.na(CITY))  %>% select(code_muni, name_muni, code_state, abbrev_state, geom)

For verifying the successfull joins , the CITY variable from our brazil dataframe is supposed to tally with name_muni from the municipility data from geobr and there should be no rows with dfferent city names

different <- success %>% filter(name_muni != CITY) %>% select(name_muni, CITY, abbrev_state, STATE)
print(different)

## Simple feature collection with 48 features and 4 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -63.19642 ymin: -31.33209 xmax: -35.54424 ymax: -3.917303
## geographic CRS: SIRGAS 2000
## First 10 features:
##                   name_muni                     CITY abbrev_state STATE
## 1     Alta Floresta D'oeste    Alta Floresta D'Oeste           RO    RO
## 2           Espigão D'oeste          Espigão D'Oeste           RO    RO
## 3        Machadinho D'oeste       Machadinho D'Oeste           RO    RO
## 4  Nova Brasilândia D'oeste Nova Brasilândia D'Oeste           RO    RO
## 5       Santa Luzia D'oeste      Santa Luzia D'Oeste           RO    RO
## 6          Alvorada D'oeste         Alvorada D'Oeste           RO    RO
## 7        São Felipe D'oeste       São Felipe D'Oeste           RO    RO
## 8                Pau D'arco               Pau D'Arco           PA    PA
## 9                Pau D'arco               Pau D'Arco           TO    TO
## 10   Olho D'água Das Cunhãs   Olho D'Água Das Cunhãs           MA    MA
##                              geom
## 1  POLYGON ((-62.19465 -11.827...
## 2  POLYGON ((-60.94827 -10.988...
## 3  POLYGON ((-62.22055 -8.5908...
## 4  POLYGON ((-61.9569 -11.2959...
## 5  POLYGON ((-61.53069 -11.856...
## 6  POLYGON ((-62.18776 -11.068...
## 7  POLYGON ((-61.53458 -11.743...
## 8  MULTIPOLYGON (((-50.3382 -7...
## 9  POLYGON ((-49.23783 -7.4883...
## 10 MULTIPOLYGON (((-45.07365 -...

From the output , the cities have the same name and state .Hence , the join was successful.

Next , we wll view that unsuccessful joins and the invalid brazilian cities data

unsucess

## Simple feature collection with 8 features and 4 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -54.83431 ymin: -33.61653 xmax: -48.81431 ymax: -2.593611
## geographic CRS: SIRGAS 2000
##   code_muni         name_muni code_state abbrev_state
## 1   1504752  Mojuí Dos Campos         15           PA
## 2   4212650    Pescaria Brava         42           SC
## 3   4220000  Balneário Rincão         42           SC
## 4   4300001       Lagoa Mirim         43           RS
## 5   4300002   Lagoa Dos Patos         43           RS
## 6   4314464   Pinhal Da Serra         43           RS
## 7   4314548    Pinto Bandeira         43           RS
## 8   5006275 Paraíso Das Águas         50           MS
##                             geom
## 1 MULTIPOLYGON (((-54.58777 -...
## 2 MULTIPOLYGON (((-48.92608 -...
## 3 MULTIPOLYGON (((-49.20933 -...
## 4 POLYGON ((-52.62241 -32.146...
## 5 POLYGON ((-51.2719 -30.0389...
## 6 POLYGON ((-51.29481 -27.725...
## 7 POLYGON ((-51.45725 -29.029...
## 8 POLYGON ((-52.90027 -18.782...

invalid

## # A tibble: 9 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Baln~ SC          0           NA               NA               NA      NA
## 2 Lago~ RS          0           NA               NA               NA      NA
## 3 Moju~ PA          0           NA               NA               NA      NA
## 4 Para~ MS          0           NA               NA               NA      NA
## 5 Pesc~ SC          0           NA               NA               NA      NA
## 6 Pinh~ RS          0         2130             2130                0     745
## 7 Pint~ RS          0           NA               NA               NA      NA
## 8 Sant~ BA          0         9648             9648                0    2891
## 9 São ~ PE          0           NA               NA               NA      NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

When comparing the 2 outputs , there are 3 findings to take note.

São Caetano in state PE is in the BRAZIL_CITIES.csv file but not in the municipality data downloaded through geobr.

sao <- invalid%>%
  filter(CITY == "São Caetano")
 sao

## # A tibble: 1 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 São ~ PE          0           NA               NA               NA      NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

The GDP_CAPITA is also NA , we will remove this city from our study.

Santa Terezinha in state BA is in the BRAZIL_CITIES.csv file but not in the municipality data downloaded through geobr.

santa <- invalid%>%
  filter(CITY == "Santa Terezinha")
 santa

## # A tibble: 1 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Sant~ BA          0         9648             9648                0    2891
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

The GDP_CAPITA is also NA , we will remove this city from our study.

The city Lagoa Mirim in state RS is in municipality data downloaded through geobr, but not in the BRAZIL_CITIES.csv file. Since it is a lagoon (Source: https://en.wikipedia.org/wiki/Lagoon_Mirim) . We will remove this from our study

invalid_fixed <- invalid %>%
  filter(CITY != "Santa Terezinha") %>% filter(CITY != "São Caetano")

From this the cities that do not have LONG LAT are : Balneário Rincão Lagoa Dos Patos Mojuí Dos Campos Paraíso Das Águas Pescaria Brava Pinhal Da Serra Pinto Bandeira Santa Terezinha São Caetano .

I will manually input the data with reference to google maps.

invalid_fixed$LONG[invalid_fixed$CITY == "Balneário Rincão"] <- -49.2361
invalid_fixed$LAT[invalid_fixed$CITY == "Balneário Rincão"] <- -28.8344

invalid_fixed$LONG[invalid_fixed$CITY == "Lagoa Dos Patos"] <- --51.4725
invalid_fixed$LAT[invalid_fixed$CITY == "Lagoa Dos Patos"] <- -31.0697

invalid_fixed$LONG[invalid_fixed$CITY == "Mojuí Dos Campos"] <- -54.6431
invalid_fixed$LAT[invalid_fixed$CITY == "Mojuí Dos Campos"] <- -2.68472

invalid_fixed$LONG[invalid_fixed$CITY == "Paraíso Das Águas"] <- -53.0102
invalid_fixed$LAT[invalid_fixed$CITY == "Paraíso Das Águas"] <- -19.0257

invalid_fixed$LONG[invalid_fixed$CITY == "Pescaria Brava"] <- -48.8956
invalid_fixed$LAT[invalid_fixed$CITY == "Pescaria Brava"] <- -28.4247

invalid_fixed$LONG[invalid_fixed$CITY == "Pinhal Da Serra"] <- -51.1733
invalid_fixed$LAT[invalid_fixed$CITY == "Pinhal Da Serra"] <- -27.8747

invalid_fixed$LONG[invalid_fixed$CITY == "Pinto Bandeira"] <- -51.4503
invalid_fixed$LAT[invalid_fixed$CITY == "Pinto Bandeira"] <- -29.0978

invalid_fixed$LONG[invalid_fixed$CITY == "Santa Terezinha"] <- -39.5184
invalid_fixed$LAT[invalid_fixed$CITY == "Santa Terezinha"] <- -12.7498

invalid_fixed$LONG[invalid_fixed$CITY == "São Caetano"] <- -36.1459
invalid_fixed$LAT[invalid_fixed$CITY == "São Caetano"] <- -8.33

invalid_fixed_sf <- st_as_sf(invalid_fixed,
              coords = c("LONG", "LAT"),
              crs=4674)

Plot fixed data

centroid <- st_centroid(unsucess)

## Warning in st_centroid.sf(unsucess): st_centroid assumes attributes are constant
## over geometries of x

## Warning in st_centroid.sfc(st_geometry(x), of_largest_polygon =
## of_largest_polygon): st_centroid does not give correct centroids for longitude/
## latitude data

getmap <- function(osm) {
  tm_shape(osm)+ 
    tm_layout(legend.outside = TRUE)+
    tm_rgb()+
  tm_shape(unsucess)+
    tm_borders(alpha=0.5, col = "darkblue", lwd = 2, lty="longdash")+
  tm_shape(centroid)+
    tm_bubbles(col="yellow", size=2, alpha=0.5)+
  tm_shape(invalid_fixed_sf)+
    tm_bubbles(col="green", size=2, alpha=0.5)
}

The green circle represents the center of the municipilty from Google Maps while the yellow circle represents the center of the centriod.

Mojuí Dos Campos

mojui <- unsucess%>%
  filter(code_muni == "1504752")
 mojui<-getTiles(x = mojui, type = "osm")
getmap(mojui)

## Warning: Raster values found that are outside the range [0, 255]

Pinhal Da Serra

pinhal <- unsucess%>%
  filter(code_muni == "4314464")
pinhal<-getTiles(x = pinhal, type = "osm")
getmap(pinhal)

## Warning: Raster values found that are outside the range [0, 255]

Pinto Bandeira

pinto <- unsucess%>%
  filter(code_muni == "4314548")
pinto<-getTiles(x = pinto, type = "osm")
getmap(pinto)

## Warning: Raster values found that are outside the range [0, 255]

Pescaria Brava

pes <- unsucess%>%
  filter(code_muni == "4212650")
pes<-getTiles(x = pes, type = "osm")
getmap(pes)

## Warning: Raster values found that are outside the range [0, 255]

Balneário Rincão

bal <- unsucess%>%
  filter(code_muni == "4220000")
bal<-getTiles(x = bal, type = "osm")
getmap(bal)

## Warning: Raster values found that are outside the range [0, 255]

Paraíso Das Águas

par <- unsucess%>%
  filter(code_muni == "5006275")
par<-getTiles(x = par, type = "osm")
getmap(par)

## Warning: Raster values found that are outside the range [0, 255]

From these plots , we can see that using online data shows the center of economic activity which is beneficial for analysis.

Joining fixed data

merged_fixed <- st_join(invalid_fixed_sf, mun,join=st_within)

## although coordinates are longitude/latitude, st_within assumes that they are planar

merged_fixed_sf <- rbind(brazil_mun, merged_fixed)

NA check

merged_fixed_sf[!complete.cases(merged_fixed_sf$GDP_CAPITA), ]

## Simple feature collection with 2 features and 84 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -39.52114 ymin: -31.0697 xmax: 51.4725 ymax: -12.77285
## geographic CRS: SIRGAS 2000
## # A tibble: 2 x 85
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <chr>   <dbl>        <dbl>            <dbl>            <dbl>   <dbl>
## 1 Sant~ BA          0           NA               NA               NA      NA
## 2 Lago~ RS          0           NA               NA               NA      NA
## # ... with 78 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, ALT <dbl>, PAY_TV <dbl>, FIXED_PHONES <dbl>,
## #   AREA <dbl>, REGIAO_TUR <chr>, CATEGORIA_TUR <chr>, ESTIMATED_POP <dbl>,
## #   RURAL_URBAN <chr>, GVA_AGROPEC <dbl>, GVA_INDUSTRY <dbl>,
## #   GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL ` <dbl>, TAXES <dbl>,
## #   GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>, GVA_MAIN <chr>,
## #   MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>, COMP_B <dbl>,
## #   COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>, COMP_G <dbl>,
## #   COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>, COMP_L <dbl>,
## #   COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>, COMP_Q <dbl>,
## #   COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>, HOTELS <dbl>,
## #   BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>, Pr_Bank <dbl>,
## #   Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <dbl>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>, geometry <POINT [°]>,
## #   code_muni <dbl>, name_muni <fct>, code_state <fct>, abbrev_state <fct>,
## #   area [m^2]

Since Santa Terezinha in state BA is in the BRAZIL_CITIES.csv file but not in the municipality data downloaded through geobr and lagoa dos patos is a lagoon , we will be removing these 2 data.

merged_fixed_sf <- merged_fixed_sf %>% filter(!is.na(GDP_CAPITA))

Next , we will join the data into our municipility data

merged_fixed <- as.data.frame(merged_fixed_sf)


muni_merged <- left_join(mun, merged_fixed, "code_muni") %>% filter(!is.na(CITY))

NA check

muni_merged[!complete.cases(muni_merged$GDP_CAPITA), ]

## Simple feature collection with 0 features and 88 fields
## Active geometry column: geom
## bbox:           xmin: NA ymin: NA xmax: NA ymax: NA
## geographic CRS: SIRGAS 2000
##  [1] code_muni              name_muni.x            code_state.x          
##  [4] abbrev_state.x         area.x                 CITY                  
##  [7] STATE                  CAPITAL                IBGE_RES_POP          
## [10] IBGE_RES_POP_BRAS      IBGE_RES_POP_ESTR      IBGE_DU               
## [13] IBGE_DU_URBAN          IBGE_DU_RURAL          IBGE_POP              
## [16] IBGE_1                 IBGE_1-4               IBGE_5-9              
## [19] IBGE_10-14             IBGE_15-59             IBGE_60+              
## [22] IBGE_PLANTED_AREA      IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010     
## [25] IDHM                   IDHM_Renda             IDHM_Longevidade      
## [28] IDHM_Educacao          ALT                    PAY_TV                
## [31] FIXED_PHONES           AREA                   REGIAO_TUR            
## [34] CATEGORIA_TUR          ESTIMATED_POP          RURAL_URBAN           
## [37] GVA_AGROPEC            GVA_INDUSTRY           GVA_SERVICES          
## [40] GVA_PUBLIC              GVA_TOTAL             TAXES                 
## [43] GDP                    POP_GDP                GDP_CAPITA            
## [46] GVA_MAIN               MUN_EXPENDIT           COMP_TOT              
## [49] COMP_A                 COMP_B                 COMP_C                
## [52] COMP_D                 COMP_E                 COMP_F                
## [55] COMP_G                 COMP_H                 COMP_I                
## [58] COMP_J                 COMP_K                 COMP_L                
## [61] COMP_M                 COMP_N                 COMP_O                
## [64] COMP_P                 COMP_Q                 COMP_R                
## [67] COMP_S                 COMP_T                 COMP_U                
## [70] HOTELS                 BEDS                   Pr_Agencies           
## [73] Pu_Agencies            Pr_Bank                Pu_Bank               
## [76] Pr_Assets              Pu_Assets              Cars                  
## [79] Motorcycles            Wheeled_tractor        UBER                  
## [82] MAC                    WAL-MART               POST_OFFICES          
## [85] name_muni.y            code_state.y           abbrev_state.y        
## [88] area.y                 geom                   geometry              
## <0 rows> (or 0-length row.names)

EDA using statistical graphs and chloropleth mapping

tm_shape(muni_merged)+
  tm_fill("GDP_CAPITA",
          n = 5,
          style = "fisher") +
  tm_layout(scale=0.6)

From the plot , we can see that the highest GDP per capita are around the satelight cities around Sao Paulo rather than the main city itself. We can also see concentrations of higher GDP per capita in the inland areas like Campos De Júlio. This could be due to a lower population while the region is still generating a large amount of production.

ggplot(data=muni_merged, aes(x=`GDP_CAPITA`)) +
  geom_histogram(bins=20, color="black", fill="light blue")

The plot reveals that GDP_CAPITA is right skewed distribution. Therefore, there are many brazil municipality earns much lower GDP_CAPITA which indicates the vast difference from the richest to the poorest municipality.

Data cleaning (analysing variables with NA)

Human Development Index

dictionary %>% filter(FIELD %in% c("IDHM Ranking 2010", "IDHM", "IDHM_Renda", "IDHM_Longevidade", "IDHM_Educacao"))

## # A tibble: 4 x 6
##   FIELD     DESCRIPTION      REFERENCE UNIT  SOURCE                        X6   
##   <chr>     <chr>            <chr>     <chr> <chr>                         <chr>
## 1 IDHM      HDI Human Devel~ 2010      -     http://www.br.undp.org/conte~ <NA> 
## 2 IDHM_Ren~ HDI GNI Index    2010      -     http://www.br.undp.org/conte~ <NA> 
## 3 IDHM_Lon~ HDI Life Expect~ 2010      -     http://www.br.undp.org/conte~ <NA> 
## 4 IDHM_Edu~ HDI Education i~ 2010      -     http://www.br.undp.org/conte~ <NA>

Replacing the NA values with 0 will make those rows increase in “value”. Instead , we will use the last position for IDHM Ranking 2010 and the minimum value for the rest of the index variables.

merged_fixed_sf$"IDHM Ranking 2010" <- merged_fixed_sf$"IDHM Ranking 2010" %>% replace_na(5569)
merged_fixed_sf$"IDHM" <- merged_fixed_sf$"IDHM" %>% replace_na(0.4180)
merged_fixed_sf$"IDHM_Renda" <- merged_fixed_sf$"IDHM_Renda" %>% replace_na(0.4000)
merged_fixed_sf$"IDHM_Longevidade" <- merged_fixed_sf$"IDHM_Longevidade" %>% replace_na(0.6720)
merged_fixed_sf$"IDHM_Educacao" <- merged_fixed_sf$"IDHM_Longevidade" %>% replace_na(0.2070)

dictionary %>% filter(FIELD %in% c("AREA"))

## # A tibble: 1 x 6
##   FIELD DESCRIPTION      REFERENCE UNIT              SOURCE                X6   
##   <chr> <chr>            <chr>     <chr>             <chr>                 <chr>
## 1 AREA  City area (squa~ 2018      1 squared Kilome~ https://www.ibge.gov~ <NA>

From the output, there is 1 row with NA for area. We will use the area computed with st_area instead

Lastly , we will replace will numerical variables to 0 and all categorical variables to .

merged_fixed_sf <- merged_fixed_sf %>% 
  mutate_if(is.numeric , replace_na, replace = 0)
merged_fixed_sf <- merged_fixed_sf %>% replace(is.na(.), "NA")
any(is.na(merged_fixed_sf))

## [1] FALSE

Derive new variables

merged_derived <- merged_fixed_sf %>%
  mutate(area = area / 100000) %>%
 
mutate(`Tax_To_GDP` = `TAXES`/`GDP`) %>%
  mutate(`Active`=`IBGE_15-59`/`IBGE_POP`)%>%
  mutate(`Cars_p` = `Cars` /`ESTIMATED_POP` ) %>%
  mutate(`Motorcycles_p` = `Motorcycles` /`ESTIMATED_POP`) %>%
  mutate(`Wheeled_tractor_p` = `Wheeled_tractor` /`ESTIMATED_POP`)%>%
  mutate(`GVA_AGROPEC_p` = `GVA_AGROPEC`/`POP_GDP`)%>%
  mutate(`GVA_SERVICES_p` = `GVA_SERVICES`/`POP_GDP`)%>%
   mutate(`GVA_PUBLIC_p` = `GVA_PUBLIC`/`POP_GDP`)%>%
  mutate(`GVA_INDUSTRY_p` = `GVA_INDUSTRY`/`POP_GDP`) %>%
  mutate(`Pop_density` = `IBGE_POP`/`area`)%>%
 mutate(Brazilian_Foreigners_Ratio = `IBGE_RES_POP_BRAS`/`IBGE_RES_POP_ESTR`) %>%
  mutate(DU_density = IBGE_DU/IBGE_POP) %>% 
   mutate(DU_Urban_Rural_Ratio = IBGE_DU_URBAN/IBGE_DU_RURAL) %>%
   mutate(planted_ratio = IBGE_PLANTED_AREA/area) %>%
   mutate(planted_value_hectare = `IBGE_CROP_PRODUCTION_$`/IBGE_PLANTED_AREA) %>%
   mutate(PAY_TV = PAY_TV / POP_GDP) %>%
  mutate(FIXED_PHONES = FIXED_PHONES / POP_GDP) %>%
  mutate(expenditures_capita = MUN_EXPENDIT/POP_GDP) %>%
   mutate(COMP_A = COMP_A/POP_GDP) %>%
  mutate(COMP_B = COMP_B/POP_GDP) %>%
  mutate(COMP_C = COMP_C/POP_GDP) %>%
  mutate(COMP_D = COMP_D/POP_GDP) %>%
  mutate(COMP_E = COMP_E/POP_GDP) %>%
  mutate(COMP_F = COMP_F/POP_GDP) %>%
  mutate(COMP_G = COMP_G/POP_GDP) %>%
  mutate(COMP_H = COMP_H/POP_GDP) %>%
  mutate(COMP_I = COMP_I/POP_GDP) %>%
  mutate(COMP_J = COMP_J/POP_GDP) %>%
  mutate(COMP_K = COMP_K/POP_GDP) %>%
  mutate(COMP_L = COMP_L/POP_GDP) %>%
  mutate(COMP_M = COMP_M/POP_GDP) %>%
  mutate(COMP_N = COMP_N/POP_GDP) %>%
  mutate(COMP_O = COMP_O/POP_GDP) %>%
  mutate(COMP_P = COMP_P/POP_GDP) %>%
  mutate(COMP_Q = COMP_Q/POP_GDP) %>%
  mutate(COMP_R = COMP_R/POP_GDP) %>%
  mutate(COMP_S = COMP_S/POP_GDP) %>%
  mutate(COMP_T = COMP_T/POP_GDP) %>%
  mutate(COMP_U = COMP_U/POP_GDP) %>%
  mutate(BEDS = BEDS/HOTELS) %>%
  mutate(HOTELS = HOTELS * 100000/ESTIMATED_POP) %>%
   mutate(Pr_Assets = Pr_Assets/Pr_Bank) %>%
  mutate(Pu_Assets = Pu_Assets/Pu_Bank) %>%
   mutate(Pr_Bank = Pr_Bank* 100000/ESTIMATED_POP) %>%
  mutate(Pu_Bank = Pu_Bank* 100000/ESTIMATED_POP) %>%
   mutate(PAY_TV = PAY_TV / ESTIMATED_POP) %>%
  mutate(FIXED_PHONES = FIXED_PHONES / ESTIMATED_POP) %>%

    mutate(MAC     = MAC  * 100000/ESTIMATED_POP) %>%
  mutate(POST_OFFICES = POST_OFFICES* 100000/ESTIMATED_POP) %>%
  
mutate(walmart = `WAL-MART`* 100000/ESTIMATED_POP)

The population data used are different as we will use the ones closest to the date of the source. POP_GDP:2016, ESTIMATED_POP:2018 , IBGE_POP:2010

Extracting indicators

brazil_var <- merged_derived%>% select(code_muni, CITY, STATE, GDP_CAPITA, CAPITAL, Pop_density, Brazilian_Foreigners_Ratio, Active, DU_density, DU_Urban_Rural_Ratio,planted_ratio, planted_value_hectare,  IDHM,IDHM_Renda, IDHM_Longevidade, IDHM_Educacao, PAY_TV, FIXED_PHONES, expenditures_capita, COMP_A, COMP_B, COMP_C, COMP_D, COMP_E, COMP_F, COMP_G, COMP_H, COMP_I, COMP_J, COMP_K, COMP_L, COMP_M, COMP_N, COMP_O, COMP_P, COMP_Q, COMP_R, COMP_S, COMP_T, COMP_U, BEDS, HOTELS, Pr_Assets, Pu_Assets, Pr_Bank,Pu_Bank,  Cars_p,Motorcycles_p, Wheeled_tractor_p, MAC,  walmart, REGIAO_TUR, CATEGORIA_TUR, RURAL_URBAN, GVA_AGROPEC_p,GVA_INDUSTRY_p,GVA_SERVICES_p,GVA_PUBLIC_p)

Choosing categorical variables for regression

We first need to check the spread of the variables

length(unique(brazil_var$CATEGORIA_TUR))

## [1] 6

length(unique(brazil_var$RURAL_URBAN))

## [1] 6

length(unique(brazil_var$REGIAO_TUR))

## [1] 323

eliminate REGIA_TUR as it has too many unique types. Now , we will look at the spread of the remaining 2 variables

table(brazil_var$RURAL_URBAN)

## 
## Intermediário Adjacente    Intermediário Remoto         Rural Adjacente 
##                     686                      60                    3039 
##            Rural Remoto       Sem classificação                  Urbano 
##                     323                       5                    1456

table(brazil_var$CATEGORIA_TUR)

## 
##    A    B    C    D    E   NA 
##   51  168  521 1891  653 2285

we will use RURAL_URBAN as CATEGORIA_TUR has many NA values which is not ideal

Besides REGIAO_TUR and CATEGORIA_TUR, , the following rows will be removed as most of their values are 0 which will not be significant in our analysis.

brazil_var <- brazil_var%>% 
  select(!REGIAO_TUR) %>%
  select(!CATEGORIA_TUR) %>%
  select(!COMP_T) %>%
  select(!COMP_U) %>%
  select(!MAC) %>%
  select(!walmart) %>%
  mutate(value = 1) %>%
  spread(RURAL_URBAN, value, fill=0)

Removing NA

muni_merged.sf <- brazil_var %>%
  replace(is.na(.), 0) %>% 
  mutate_if(is.numeric, function(x) ifelse(is.infinite(x), 0, x)) 


muni_merged_new <- st_set_geometry(brazil_var, NULL)
muni_merged_new <- brazil_var %>% replace(is.na(.), 0) %>%
  mutate_if(is.numeric, function(x) ifelse(is.infinite(x), 0, x))

muni_merged_new<- as.data.frame(muni_merged_new)

Plotting derived indicators

We will be plotting some of the factors to see the general skew of the variables.

GVA ratios

as.data.frame(brazil_var)[48:51] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Company Type ratios

as.data.frame(brazil_var)[20:38] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Vehicles

as.data.frame(brazil_var)[45:47] %>%
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Overall, the variables are right skewed

Correlation analysis

To get a better analysis to identify the relationships between the various factors , I will know conduct a correlation analysis to better understand the relationships between the respective factors.The code chunk below is used to plot a scatterplot matrix of the relationship between the independent variables in muni_merged data.frame. To avoid multticollinearity, we will plot the scatterplot to better understand the bivariate relationship.

corrplot(cor(as.data.frame(muni_merged_new[,4:51])), diag = FALSE, order = "alphabet",
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper",number.cex = 0.2)

We will be setting a threshold of >0.75 as it may lead to issues when running multi-regression analysis. Further discussion can be found here https://www.researchgate.net/post/How_can_I_avoid_multicollinearity

Hence , the following variables will be removed due to their high correlation

muni_merged_new <- muni_merged_new %>%
  select(!IDHM) %>%
  select(!IDHM_Educacao) %>%
  select(!IDHM_Renda) %>%
  select(!IDHM_Longevidade)%>%
  select(!COMP_F)%>%
  select(!COMP_G)%>%
  select(!COMP_K)%>%
  select(!GVA_SERVICES_p)%>%
  select(!GVA_AGROPEC_p)%>%
  select(!GVA_INDUSTRY_p)%>%
  select(!Cars_p)%>%
  select(!COMP_M)%>%
  select(!COMP_H)%>%
  select(!COMP_I)%>%
  select(!COMP_L)%>%
  select(!COMP_P)%>%
  select(!COMP_R)%>%
  select(!COMP_O)%>%
  select(!PAY_TV)

corrplot(cor(as.data.frame(muni_merged_new[,4:32])), diag = FALSE, order = "alphabet",
         tl.pos = "td", tl.cex = 0.5, method = "number", type = "upper",number.cex = 0.2)

Now, all our values fall below the 0.75 correlation coefficient.

Hedonic pricing model using multiple linear regression method

We will be using the following hypothesis.

H0 : The GDP per capita are randomly distributed H1 : The GDP per capita are not randomly distributed

Our confidence level will be set at 95% and our alpha value is 0.05.

muni_merged_new <- muni_merged_new %>%
  rename(RURAL_URBAN_1 = `Intermediário Adjacente`) %>%
  rename(RURAL_URBAN_2 = `Intermediário Remoto`) %>%
  rename(RURAL_URBAN_3 = `Rural Adjacente`) %>%
  rename(RURAL_URBAN_4 = `Rural Remoto`) %>%
  rename(RURAL_URBAN_5 = `Sem classificação`)

muni.mlr <- lm(formula = GDP_CAPITA ~ CAPITAL + Pop_density + Brazilian_Foreigners_Ratio + Active + DU_density + DU_Urban_Rural_Ratio +  planted_ratio + planted_value_hectare +  FIXED_PHONES + expenditures_capita + COMP_A + COMP_B + COMP_C + COMP_D + COMP_E +   COMP_J +    COMP_N   + COMP_Q +  COMP_S + BEDS + HOTELS + Pr_Assets + Pu_Assets + Pr_Bank + Pu_Bank +  Motorcycles_p + Wheeled_tractor_p +  RURAL_URBAN_1 + RURAL_URBAN_2 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_5 + Urbano  +GVA_PUBLIC_p, data=muni_merged_new)
summary(muni.mlr)

## 
## Call:
## lm(formula = GDP_CAPITA ~ CAPITAL + Pop_density + Brazilian_Foreigners_Ratio + 
##     Active + DU_density + DU_Urban_Rural_Ratio + planted_ratio + 
##     planted_value_hectare + FIXED_PHONES + expenditures_capita + 
##     COMP_A + COMP_B + COMP_C + COMP_D + COMP_E + COMP_J + COMP_N + 
##     COMP_Q + COMP_S + BEDS + HOTELS + Pr_Assets + Pu_Assets + 
##     Pr_Bank + Pu_Bank + Motorcycles_p + Wheeled_tractor_p + RURAL_URBAN_1 + 
##     RURAL_URBAN_2 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_5 + 
##     Urbano + GVA_PUBLIC_p, data = muni_merged_new)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -40531  -6984  -2150   3406 272088 
## 
## Coefficients: (1 not defined because of singularities)
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                -9.699e+04  5.934e+03 -16.345  < 2e-16 ***
## CAPITAL                    -6.843e+03  3.497e+03  -1.957 0.050415 .  
## Pop_density                 3.786e+00  4.653e+00   0.814 0.415896    
## Brazilian_Foreigners_Ratio  7.384e-02  1.517e-01   0.487 0.626568    
## Active                      1.674e+05  9.650e+03  17.343  < 2e-16 ***
## DU_density                 -1.722e+03  7.965e+02  -2.162 0.030653 *  
## DU_Urban_Rural_Ratio        5.342e-01  1.690e+00   0.316 0.751922    
## planted_ratio               1.234e+03  9.024e+01  13.676  < 2e-16 ***
## planted_value_hectare       8.645e+01  3.674e+01   2.353 0.018643 *  
## FIXED_PHONES               -2.136e+07  2.323e+07  -0.919 0.357909    
## expenditures_capita         2.648e+00  1.498e-01  17.674  < 2e-16 ***
## COMP_A                      1.495e+05  3.355e+04   4.457 8.47e-06 ***
## COMP_B                     -7.400e+05  6.404e+05  -1.155 0.247974    
## COMP_C                     -1.463e+05  1.162e+05  -1.259 0.208097    
## COMP_D                      1.657e+07  2.175e+06   7.619 2.98e-14 ***
## COMP_E                      8.334e+06  2.201e+06   3.787 0.000154 ***
## COMP_J                      1.345e+05  4.416e+05   0.305 0.760693    
## COMP_N                      1.268e+06  2.682e+05   4.728 2.32e-06 ***
## COMP_Q                      2.005e+06  5.383e+05   3.725 0.000197 ***
## COMP_S                     -5.134e+05  1.662e+05  -3.089 0.002022 ** 
## BEDS                        1.536e+01  6.764e+00   2.270 0.023226 *  
## HOTELS                      1.108e+01  1.746e+01   0.635 0.525548    
## Pr_Assets                   1.010e-07  2.374e-08   4.255 2.13e-05 ***
## Pu_Assets                   2.274e-08  8.058e-09   2.822 0.004782 ** 
## Pr_Bank                     1.215e+02  4.394e+01   2.766 0.005702 ** 
## Pu_Bank                     1.958e+02  3.127e+01   6.261 4.10e-10 ***
## Motorcycles_p              -2.060e+04  3.473e+03  -5.930 3.21e-09 ***
## Wheeled_tractor_p           1.386e+06  5.149e+05   2.692 0.007122 ** 
## RURAL_URBAN_1              -2.630e+03  8.064e+02  -3.262 0.001113 ** 
## RURAL_URBAN_2               5.729e+03  2.169e+03   2.642 0.008265 ** 
## RURAL_URBAN_3              -1.611e+03  6.842e+02  -2.354 0.018608 *  
## RURAL_URBAN_4               2.410e+03  1.117e+03   2.158 0.030972 *  
## RURAL_URBAN_5               1.067e+05  9.496e+03  11.238  < 2e-16 ***
## Urbano                             NA         NA      NA       NA    
## GVA_PUBLIC_p                1.465e+03  1.161e+02  12.618  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16150 on 5535 degrees of freedom
## Multiple R-squared:  0.3729, Adjusted R-squared:  0.3691 
## F-statistic: 99.72 on 33 and 5535 DF,  p-value: < 2.2e-16

From the summary , we will be removing variables which are statistically insignificant.

muni.mlr <- lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A +  COMP_D  + COMP_E + COMP_N + COMP_Q +   COMP_S+BEDS+ Pu_Assets   + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p  +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=muni_merged_new)
summary(muni.mlr)

## 
## Call:
## lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density + 
##     planted_value_hectare + expenditures_capita + COMP_A + COMP_D + 
##     COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + 
##     Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + 
##     RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + 
##     RURAL_URBAN_5 + GVA_PUBLIC_p, data = muni_merged_new)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -39927  -6996  -2195   3343 272207 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -9.356e+04  5.595e+03 -16.721  < 2e-16 ***
## Active                 1.622e+05  9.069e+03  17.886  < 2e-16 ***
## planted_ratio          1.236e+03  8.927e+01  13.847  < 2e-16 ***
## DU_density            -1.666e+03  7.934e+02  -2.100 0.035773 *  
## planted_value_hectare  8.305e+01  3.615e+01   2.297 0.021638 *  
## expenditures_capita    2.598e+00  1.454e-01  17.869  < 2e-16 ***
## COMP_A                 1.458e+05  3.296e+04   4.422 9.94e-06 ***
## COMP_D                 1.628e+07  2.170e+06   7.499 7.43e-14 ***
## COMP_E                 7.816e+06  2.164e+06   3.612 0.000307 ***
## COMP_N                 1.257e+06  2.520e+05   4.986 6.37e-07 ***
## COMP_Q                 1.921e+06  5.261e+05   3.651 0.000264 ***
## COMP_S                -5.653e+05  1.638e+05  -3.452 0.000561 ***
## BEDS                   1.556e+01  6.586e+00   2.363 0.018148 *  
## Pu_Assets              1.855e-08  7.710e-09   2.407 0.016136 *  
## Pr_Assets              1.049e-07  2.272e-08   4.618 3.95e-06 ***
## Pu_Bank                1.904e+02  3.090e+01   6.160 7.80e-10 ***
## Pr_Bank                1.158e+02  4.248e+01   2.725 0.006454 ** 
## Motorcycles_p         -2.057e+04  3.434e+03  -5.989 2.24e-09 ***
## Wheeled_tractor_p      1.210e+06  5.042e+05   2.400 0.016409 *  
## RURAL_URBAN_1         -2.817e+03  7.933e+02  -3.552 0.000386 ***
## RURAL_URBAN_3         -1.939e+03  6.544e+02  -2.963 0.003057 ** 
## RURAL_URBAN_4          2.171e+03  1.098e+03   1.977 0.048128 *  
## RURAL_URBAN_2          5.597e+03  2.165e+03   2.585 0.009751 ** 
## RURAL_URBAN_5          1.033e+05  9.287e+03  11.123  < 2e-16 ***
## GVA_PUBLIC_p           1.453e+03  1.150e+02  12.634  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared:  0.3718, Adjusted R-squared:  0.3691 
## F-statistic: 136.7 on 24 and 5544 DF,  p-value: < 2.2e-16

Since p value is less than alpha value of 0.05, we reject the null hypothesis.The final model has an Adjusted R-Squared of 0.3691 . We will investigate the reason the low Adjusted R-Squared.

Checking for multicolinearity

ols_vif_tol(muni.mlr)

##                Variables Tolerance      VIF
## 1                 Active 0.4110149 2.433002
## 2          planted_ratio 0.7873648 1.270059
## 3             DU_density 0.7533492 1.327406
## 4  planted_value_hectare 0.9173524 1.090094
## 5    expenditures_capita 0.7977351 1.253549
## 6                 COMP_A 0.9450595 1.058134
## 7                 COMP_D 0.9885257 1.011608
## 8                 COMP_E 0.8789610 1.137707
## 9                 COMP_N 0.6025756 1.659543
## 10                COMP_Q 0.5672256 1.762967
## 11                COMP_S 0.7216954 1.385626
## 12                  BEDS 0.8018460 1.247122
## 13             Pu_Assets 0.9695880 1.031366
## 14             Pr_Assets 0.9839621 1.016299
## 15               Pu_Bank 0.8327285 1.200872
## 16               Pr_Bank 0.8571219 1.166695
## 17         Motorcycles_p 0.8795494 1.136946
## 18     Wheeled_tractor_p 0.8255682 1.211287
## 19         RURAL_URBAN_1 0.6890810 1.451208
## 20         RURAL_URBAN_3 0.4411678 2.266711
## 21         RURAL_URBAN_4 0.7106230 1.407216
## 22         RURAL_URBAN_2 0.9378754 1.066240
## 23         RURAL_URBAN_5 0.6054314 1.651715
## 24          GVA_PUBLIC_p 0.9243386 1.081855

summary(muni.mlr)

## 
## Call:
## lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density + 
##     planted_value_hectare + expenditures_capita + COMP_A + COMP_D + 
##     COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + 
##     Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + 
##     RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + 
##     RURAL_URBAN_5 + GVA_PUBLIC_p, data = muni_merged_new)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -39927  -6996  -2195   3343 272207 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -9.356e+04  5.595e+03 -16.721  < 2e-16 ***
## Active                 1.622e+05  9.069e+03  17.886  < 2e-16 ***
## planted_ratio          1.236e+03  8.927e+01  13.847  < 2e-16 ***
## DU_density            -1.666e+03  7.934e+02  -2.100 0.035773 *  
## planted_value_hectare  8.305e+01  3.615e+01   2.297 0.021638 *  
## expenditures_capita    2.598e+00  1.454e-01  17.869  < 2e-16 ***
## COMP_A                 1.458e+05  3.296e+04   4.422 9.94e-06 ***
## COMP_D                 1.628e+07  2.170e+06   7.499 7.43e-14 ***
## COMP_E                 7.816e+06  2.164e+06   3.612 0.000307 ***
## COMP_N                 1.257e+06  2.520e+05   4.986 6.37e-07 ***
## COMP_Q                 1.921e+06  5.261e+05   3.651 0.000264 ***
## COMP_S                -5.653e+05  1.638e+05  -3.452 0.000561 ***
## BEDS                   1.556e+01  6.586e+00   2.363 0.018148 *  
## Pu_Assets              1.855e-08  7.710e-09   2.407 0.016136 *  
## Pr_Assets              1.049e-07  2.272e-08   4.618 3.95e-06 ***
## Pu_Bank                1.904e+02  3.090e+01   6.160 7.80e-10 ***
## Pr_Bank                1.158e+02  4.248e+01   2.725 0.006454 ** 
## Motorcycles_p         -2.057e+04  3.434e+03  -5.989 2.24e-09 ***
## Wheeled_tractor_p      1.210e+06  5.042e+05   2.400 0.016409 *  
## RURAL_URBAN_1         -2.817e+03  7.933e+02  -3.552 0.000386 ***
## RURAL_URBAN_3         -1.939e+03  6.544e+02  -2.963 0.003057 ** 
## RURAL_URBAN_4          2.171e+03  1.098e+03   1.977 0.048128 *  
## RURAL_URBAN_2          5.597e+03  2.165e+03   2.585 0.009751 ** 
## RURAL_URBAN_5          1.033e+05  9.287e+03  11.123  < 2e-16 ***
## GVA_PUBLIC_p           1.453e+03  1.150e+02  12.634  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16150 on 5544 degrees of freedom
## Multiple R-squared:  0.3718, Adjusted R-squared:  0.3691 
## F-statistic: 136.7 on 24 and 5544 DF,  p-value: < 2.2e-16

Non-Linearity Test

Test for non-linearity using ols_plot_resid_fit() by olsrr package

ols_plot_resid_fit(muni.mlr)

The figure above reveals that most of the data poitns are scattered around the 0 line, hence we can safely conclude that the relationships between the dependent variable and independent variables are linear.Additionally, there does not seem to be any obvious signs of heteroscadicity in the plot above.

Test for Normality Assumption

Test for normality using ols_plot_resid_hist() by oslrr package

ols_plot_resid_hist(muni.mlr)

The figure above reveals that the residuals of muni_cities.mlr resemble normal distribution. We would normally use ols_test_normality() to further test this assumption. But the function is limtied to sample sizes between 3 to 5000 and we have 5569 observations. We will skip this step as we have sufficient evidence from the plot that it passes normality test.

Test for Spatial Autocorrelation

H0 : The residuals of the multiple linear regression model are randomly distributed. H1 : The residuals of the multiple linear regression model are not randomly distributed.

We will use a confidence level of 95% with an alpha value of 0.05

First, export the residual of the hedonic model as a dataframe

mlr.output <- as.data.frame(muni.mlr$residuals)

Join the dataframe with the simple feature object - into a Simple feature

muni_merged<- cbind(muni_merged, 
                        muni.mlr$residuals) %>%
rename(`MLR_RES` = `muni.mlr.residuals`)

muni_merged.res.sf <- cbind(muni_merged.sf, 
                        muni.mlr$residuals) %>%
rename(`MLR_RES` = `muni.mlr.residuals`) %>%
  rename(RURAL_URBAN_1 = `Intermediário.Adjacente`) %>%
  rename(RURAL_URBAN_2 = `Intermediário.Remoto`) %>%
  rename(RURAL_URBAN_3 = `Rural.Adjacente`) %>%
  rename(RURAL_URBAN_4 = `Rural.Remoto`) %>%
  rename(RURAL_URBAN_5 = `Sem.classificação`)

Convert the simple feature into a SpatialPointsDataFrame

mc_res.sp <- as_Spatial(muni_merged.res.sf)
mc_res.sp

## class       : SpatialPointsDataFrame 
## features    : 5569 
## extent      : -72.9165, -32.43519, -33.68757, 4.58544  (xmin, xmax, ymin, ymax)
## crs         : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## variables   : 58
## names       : code_muni,            CITY, STATE, GDP_CAPITA, CAPITAL,      Pop_density, Brazilian_Foreigners_Ratio,            Active,       DU_density, DU_Urban_Rural_Ratio,    planted_ratio, planted_value_hectare,  IDHM, IDHM_Renda, IDHM_Longevidade, ... 
## min values  :   1100015, Abadia De Goiás,    AC,    3190.57,       0,                0,                          0,                 0,                0,                    0,                0,                     0, 0.418,        0.4,            0.672, ... 
## max values  :   5300108,          Zortéa,    TO,  314637.69,       1, 1263.80737731171,           51708.6666666667, 0.744760130414532, 5.73393574297189,     7727.63636363636, 18.5998217696347,      106.960227272727, 0.862,      0.891,            0.894, ...

Visualize the distribution of residuals

tmap_mode("plot")

## tmap mode set to plotting

tm_shape(muni_merged)+
  tm_fill("MLR_RES",
          n = 6,
          style = "quantile",
          palette = "RdYlBu" ) +
  tm_borders(alpha = 0.5)

## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

There seem to be spatial autocorrelation above. There isn’t a clear sign on whether it is clustered in any way or if theres a geospatial pattern in distribution.Further test to be done.

Moran’s I test

Build distance-based weight matrix

Find the distance summary

coords <- coordinates(mc_res.sp)
k <- knn2nb(knearneigh(coords))
k_dist <- unlist(nbdists(k, coords, longlat = TRUE))
summary(k_dist)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.6029   9.1050  13.1260  17.0100  19.7337 363.0083

Compute Distance-based matrix

nb <- knn2nb(knearneigh(coordinates(mc_res.sp), k=363, longlat = FALSE))

Convert into spatial weights

nb_lw <- nb2listw(nb, style = 'W')

lm.morantest(muni.mlr, nb_lw)

## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +
## planted_value_hectare + expenditures_capita + COMP_A + COMP_D + COMP_E
## + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + Pu_Assets +
## Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + RURAL_URBAN_1 +
## RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + RURAL_URBAN_5 +
## GVA_PUBLIC_p, data = muni_merged_new)
## weights: nb_lw
## 
## Moran I statistic standard deviate = 21.121, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     1.625916e-02    -5.721239e-04     6.350277e-07

The Global Moran’s I test for residual spatial autocorrelation shows that it’s p-value is < 2.2e-16 which is less than the alpha value of 0.05. Hence, we will reject the null hypothesis that the residuals of the multiple linear regression model are randomly distributed.Since the Observed Global Moran I = 1.625916e-02 which is greater than 0, we can infer than the residuals resemble cluster distribution.

Building explanatory model using GWmodel

There are two possible approaches can be used to determine the stopping rule: + Cross-validation (CV) score + Maximises the model’s predictive power + The optimum bandwidth is derived from estimated with all its observations + AIC corrected (AICc) + Favours a compromise between the predictive power of the model and its complexity + Generally favours larger bandwidths than the CV score + Lower AICc value provides a better fit to the observed data

Geographically weighted regression method

We will be using the adaptive bandwidth approach as each municipal have different sizes, and are spread at differing distances. Using fixed bandwidth will result in some municipal having too many neighbours to conduct any useful analysis.

coords <- coordinates(mc_res.sp)
dMat <- gw.dist(coords, focus=0, p=2, theta=0, longlat=F)

Adaptive argument is set to FALSE indicates that we are interested to compute the fixed bandwidth. CV score approach was defined as the stopping rule. Gaussian kernel Kernel is selected as gaussian to determine the optimal fixed bandwidth to use in explanatory model.

bw.adaptive <- bw.gwr(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A +  COMP_D  + COMP_E + COMP_N + COMP_Q +   COMP_S+BEDS+ Pu_Assets   + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p  +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp,   approach="CV", kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat = dMat)

## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 3449 CV score: 1.470825e+12 
## Adaptive bandwidth: 2140 CV score: 1.459161e+12 
## Adaptive bandwidth: 1329 CV score: 1.445566e+12 
## Adaptive bandwidth: 830 CV score: 1.446466e+12 
## Adaptive bandwidth: 1640 CV score: 1.450185e+12 
## Adaptive bandwidth: 1139 CV score: 1.445868e+12 
## Adaptive bandwidth: 1448 CV score: 1.447149e+12 
## Adaptive bandwidth: 1256 CV score: 1.445101e+12 
## Adaptive bandwidth: 1210 CV score: 1.444943e+12 
## Adaptive bandwidth: 1182 CV score: 1.445261e+12 
## Adaptive bandwidth: 1227 CV score: 1.445041e+12 
## Adaptive bandwidth: 1199 CV score: 1.44508e+12 
## Adaptive bandwidth: 1216 CV score: 1.445033e+12 
## Adaptive bandwidth: 1205 CV score: 1.444918e+12 
## Adaptive bandwidth: 1203 CV score: 1.444927e+12 
## Adaptive bandwidth: 1207 CV score: 1.444911e+12 
## Adaptive bandwidth: 1208 CV score: 1.444914e+12 
## Adaptive bandwidth: 1206 CV score: 1.444902e+12 
## Adaptive bandwidth: 1206 CV score: 1.444902e+12

The results show that 1206 is the recommended data points

Constructing adaptive bandwidth gwr model using CV approach

We will be using the following hypothesis:

H0:The GDP per capita are randomly distributed H1: he GDP per capita are not randomly distributed

The confidence level is 95% and the alpha value is 0.05

gwr.adaptive <- gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A +  COMP_D  + COMP_E + COMP_N + COMP_Q +   COMP_S+BEDS+ Pu_Assets   + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p  +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p  , data=mc_res.sp, bw=bw.adaptive, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)

gwr.adaptive

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-06-01 19:35:31 
##    Call:
##    gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density + 
##     planted_value_hectare + expenditures_capita + COMP_A + COMP_D + 
##     COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + 
##     Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + 
##     RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + 
##     RURAL_URBAN_5 + GVA_PUBLIC_p, data = mc_res.sp, bw = bw.adaptive, 
##     kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  Active planted_ratio DU_density planted_value_hectare expenditures_capita COMP_A COMP_D COMP_E COMP_N COMP_Q COMP_S BEDS Pu_Assets Pr_Assets Pu_Bank Pr_Bank Motorcycles_p Wheeled_tractor_p RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5 GVA_PUBLIC_p
##    Number of data points: 5569
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -39927  -6996  -2195   3343 272207 
## 
##    Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)           -9.356e+04  5.595e+03 -16.721  < 2e-16 ***
##    Active                 1.622e+05  9.069e+03  17.886  < 2e-16 ***
##    planted_ratio          1.236e+03  8.927e+01  13.847  < 2e-16 ***
##    DU_density            -1.666e+03  7.934e+02  -2.100 0.035773 *  
##    planted_value_hectare  8.305e+01  3.615e+01   2.297 0.021638 *  
##    expenditures_capita    2.598e+00  1.454e-01  17.869  < 2e-16 ***
##    COMP_A                 1.458e+05  3.296e+04   4.422 9.94e-06 ***
##    COMP_D                 1.628e+07  2.170e+06   7.499 7.43e-14 ***
##    COMP_E                 7.816e+06  2.164e+06   3.612 0.000307 ***
##    COMP_N                 1.257e+06  2.520e+05   4.986 6.37e-07 ***
##    COMP_Q                 1.921e+06  5.261e+05   3.651 0.000264 ***
##    COMP_S                -5.653e+05  1.638e+05  -3.452 0.000561 ***
##    BEDS                   1.556e+01  6.586e+00   2.363 0.018148 *  
##    Pu_Assets              1.855e-08  7.710e-09   2.407 0.016136 *  
##    Pr_Assets              1.049e-07  2.272e-08   4.618 3.95e-06 ***
##    Pu_Bank                1.904e+02  3.090e+01   6.160 7.80e-10 ***
##    Pr_Bank                1.158e+02  4.248e+01   2.725 0.006454 ** 
##    Motorcycles_p         -2.057e+04  3.434e+03  -5.989 2.24e-09 ***
##    Wheeled_tractor_p      1.210e+06  5.042e+05   2.400 0.016409 *  
##    RURAL_URBAN_1         -2.817e+03  7.933e+02  -3.552 0.000386 ***
##    RURAL_URBAN_3         -1.939e+03  6.544e+02  -2.963 0.003057 ** 
##    RURAL_URBAN_4          2.171e+03  1.098e+03   1.977 0.048128 *  
##    RURAL_URBAN_2          5.597e+03  2.165e+03   2.585 0.009751 ** 
##    RURAL_URBAN_5          1.033e+05  9.287e+03  11.123  < 2e-16 ***
##    GVA_PUBLIC_p           1.453e+03  1.150e+02  12.634  < 2e-16 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 16150 on 5544 degrees of freedom
##    Multiple R-squared: 0.3718
##    Adjusted R-squared: 0.3691 
##    F-statistic: 136.7 on 24 and 5544 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 1.446143e+12
##    Sigma(hat): 16117.4
##    AIC:  123755.2
##    AICc:  123755.5
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Adaptive bandwidth: 1206 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                 Min.     1st Qu.      Median     3rd Qu.
##    Intercept             -1.7274e+05 -1.3546e+05 -9.9400e+04 -6.1219e+04
##    Active                 5.1946e+04  1.0827e+05  1.7583e+05  2.2981e+05
##    planted_ratio          4.8197e+02  7.8775e+02  8.9275e+02  1.1929e+03
##    DU_density            -3.6234e+03 -2.3387e+03 -1.5817e+03 -1.3497e+03
##    planted_value_hectare -5.4421e+01  8.7191e-01  2.4184e+01  1.6681e+02
##    expenditures_capita    1.6338e+00  2.0054e+00  2.4594e+00  3.0437e+00
##    COMP_A                 9.4176e+04  1.1634e+05  1.4494e+05  2.0179e+05
##    COMP_D                 1.0954e+07  1.1862e+07  1.7204e+07  2.0414e+07
##    COMP_E                 5.3961e+06  6.0060e+06  6.6301e+06  8.9392e+06
##    COMP_N                 4.0213e+05  9.3414e+05  1.5075e+06  2.6039e+06
##    COMP_Q                 4.6028e+05  1.0050e+06  1.5126e+06  1.8844e+06
##    COMP_S                -9.5130e+05 -7.4147e+05 -6.3458e+05 -5.3186e+05
##    BEDS                   2.9484e+00  1.1232e+01  1.5878e+01  1.9906e+01
##    Pu_Assets              5.9829e-10  1.1810e-08  1.7233e-08  2.0776e-08
##    Pr_Assets              9.2609e-08  9.8002e-08  1.0136e-07  1.1011e-07
##    Pu_Bank                1.1826e+02  1.4059e+02  1.7664e+02  2.2127e+02
##    Pr_Bank                6.7282e+01  9.0944e+01  1.0634e+02  1.5669e+02
##    Motorcycles_p         -3.6365e+04 -3.1853e+04 -2.5444e+04 -1.1417e+04
##    Wheeled_tractor_p     -3.5129e+05  7.0059e+05  1.2385e+06  1.5875e+06
##    RURAL_URBAN_1         -4.0419e+03 -2.7381e+03 -2.4534e+03 -2.1270e+03
##    RURAL_URBAN_3         -4.3041e+03 -2.1951e+03 -1.6596e+03 -8.0206e+02
##    RURAL_URBAN_4         -2.1975e+03 -9.4049e+02  6.8743e+01  2.3049e+03
##    RURAL_URBAN_2          1.5528e+03  4.9652e+03  8.1162e+03  1.2372e+04
##    RURAL_URBAN_5          3.6761e+04  8.5266e+04  1.1122e+05  1.5370e+05
##    GVA_PUBLIC_p           6.4299e+02  1.1354e+03  1.4953e+03  1.7311e+03
##                                 Max.
##    Intercept             -2.5451e+04
##    Active                 2.8647e+05
##    planted_ratio          1.5685e+03
##    DU_density            -1.0460e+03
##    planted_value_hectare  2.2937e+02
##    expenditures_capita    3.5690e+00
##    COMP_A                 8.0693e+05
##    COMP_D                 3.0476e+07
##    COMP_E                 1.0995e+07
##    COMP_N                 3.2467e+06
##    COMP_Q                 2.4112e+06
##    COMP_S                -3.4383e+05
##    BEDS                   2.6006e+01
##    Pu_Assets              0.0000e+00
##    Pr_Assets              0.0000e+00
##    Pu_Bank                2.7269e+02
##    Pr_Bank                2.0541e+02
##    Motorcycles_p         -6.6058e+03
##    Wheeled_tractor_p      2.2485e+06
##    RURAL_URBAN_1         -1.7136e+03
##    RURAL_URBAN_3          6.0635e+02
##    RURAL_URBAN_4          5.2639e+03
##    RURAL_URBAN_2          2.1293e+04
##    RURAL_URBAN_5          2.0087e+05
##    GVA_PUBLIC_p           1.8908e+03
##    ************************Diagnostic information*************************
##    Number of data points: 5569 
##    Effective number of parameters (2trace(S) - trace(S'S)): 84.38916 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 5484.611 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 123496.9 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 123428.9 
##    Residual sum of squares: 1.360803e+12 
##    R-square value:  0.4088895 
##    Adjusted R-square value:  0.3997927 
## 
##    ***********************************************************************
##    Program stops at: 2020-06-01 19:36:27

Adjusted R-square value: 0.3997927 which i better than the Multiple Linear Regression model which has an Adjusted R-Squared of 0.3691 and P-VALUE < 2.2e-16 , we reject the null hypothesis

Constructing adaptive bandwidth gwr model using AIC approach

bw.gwr.aic <- bw.gwr(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A +  COMP_D  + COMP_E + COMP_N + COMP_Q +   COMP_S+BEDS+ Pu_Assets   + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p  +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, approach="AIC", kernel = 'gaussian', adaptive=TRUE, longlat = FALSE, dMat = dMat)

## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 3449 AICc value: 123671.7 
## Adaptive bandwidth (number of nearest neighbours): 2140 AICc value: 123602.1 
## Adaptive bandwidth (number of nearest neighbours): 1329 AICc value: 123513.7 
## Adaptive bandwidth (number of nearest neighbours): 830 AICc value: 123430.6 
## Adaptive bandwidth (number of nearest neighbours): 519 AICc value: 123298.8 
## Adaptive bandwidth (number of nearest neighbours): 329 AICc value: 123113.5 
## Adaptive bandwidth (number of nearest neighbours): 209 AICc value: 122919.4 
## Adaptive bandwidth (number of nearest neighbours): 137 AICc value: 122753 
## Adaptive bandwidth (number of nearest neighbours): 90 AICc value: 122705.1 
## Error in gw_reg(X, Y, W.i, TRUE, i) : inv(): matrix seems singular
## Adaptive bandwidth (number of nearest neighbours): 63 AICc value: Inf 
## Adaptive bandwidth (number of nearest neighbours): 108 AICc value: 122707.1 
## Adaptive bandwidth (number of nearest neighbours): 80 AICc value: 122702.9 
## Adaptive bandwidth (number of nearest neighbours): 72 AICc value: Inf 
## Adaptive bandwidth (number of nearest neighbours): 83 AICc value: 122701.6 
## Adaptive bandwidth (number of nearest neighbours): 87 AICc value: 122700.6 
## Adaptive bandwidth (number of nearest neighbours): 87 AICc value: 122700.6

The results show that 87 is the recommended data points

Constructing adaptive bandwidth gwr model using AIC approach

gwr.adaptive.aic <- gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density +planted_value_hectare + expenditures_capita + COMP_A +  COMP_D  + COMP_E + COMP_N + COMP_Q +   COMP_S+BEDS+ Pu_Assets   + Pr_Assets +Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p  +RURAL_URBAN_1+RURAL_URBAN_3+RURAL_URBAN_4 +RURAL_URBAN_2+RURAL_URBAN_5+GVA_PUBLIC_p , data=mc_res.sp, bw=bw.gwr.aic, kernel = 'gaussian', adaptive=TRUE, longlat = FALSE)

gwr.adaptive.aic

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-06-01 19:47:53 
##    Call:
##    gwr.basic(formula = GDP_CAPITA ~ Active + planted_ratio + DU_density + 
##     planted_value_hectare + expenditures_capita + COMP_A + COMP_D + 
##     COMP_E + COMP_N + COMP_Q + COMP_S + BEDS + Pu_Assets + Pr_Assets + 
##     Pu_Assets + Pu_Bank + Pr_Bank + Motorcycles_p + Wheeled_tractor_p + 
##     RURAL_URBAN_1 + RURAL_URBAN_3 + RURAL_URBAN_4 + RURAL_URBAN_2 + 
##     RURAL_URBAN_5 + GVA_PUBLIC_p, data = mc_res.sp, bw = bw.gwr.aic, 
##     kernel = "gaussian", adaptive = TRUE, longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  Active planted_ratio DU_density planted_value_hectare expenditures_capita COMP_A COMP_D COMP_E COMP_N COMP_Q COMP_S BEDS Pu_Assets Pr_Assets Pu_Bank Pr_Bank Motorcycles_p Wheeled_tractor_p RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5 GVA_PUBLIC_p
##    Number of data points: 5569
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##    Min     1Q Median     3Q    Max 
## -39927  -6996  -2195   3343 272207 
## 
##    Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)           -9.356e+04  5.595e+03 -16.721  < 2e-16 ***
##    Active                 1.622e+05  9.069e+03  17.886  < 2e-16 ***
##    planted_ratio          1.236e+03  8.927e+01  13.847  < 2e-16 ***
##    DU_density            -1.666e+03  7.934e+02  -2.100 0.035773 *  
##    planted_value_hectare  8.305e+01  3.615e+01   2.297 0.021638 *  
##    expenditures_capita    2.598e+00  1.454e-01  17.869  < 2e-16 ***
##    COMP_A                 1.458e+05  3.296e+04   4.422 9.94e-06 ***
##    COMP_D                 1.628e+07  2.170e+06   7.499 7.43e-14 ***
##    COMP_E                 7.816e+06  2.164e+06   3.612 0.000307 ***
##    COMP_N                 1.257e+06  2.520e+05   4.986 6.37e-07 ***
##    COMP_Q                 1.921e+06  5.261e+05   3.651 0.000264 ***
##    COMP_S                -5.653e+05  1.638e+05  -3.452 0.000561 ***
##    BEDS                   1.556e+01  6.586e+00   2.363 0.018148 *  
##    Pu_Assets              1.855e-08  7.710e-09   2.407 0.016136 *  
##    Pr_Assets              1.049e-07  2.272e-08   4.618 3.95e-06 ***
##    Pu_Bank                1.904e+02  3.090e+01   6.160 7.80e-10 ***
##    Pr_Bank                1.158e+02  4.248e+01   2.725 0.006454 ** 
##    Motorcycles_p         -2.057e+04  3.434e+03  -5.989 2.24e-09 ***
##    Wheeled_tractor_p      1.210e+06  5.042e+05   2.400 0.016409 *  
##    RURAL_URBAN_1         -2.817e+03  7.933e+02  -3.552 0.000386 ***
##    RURAL_URBAN_3         -1.939e+03  6.544e+02  -2.963 0.003057 ** 
##    RURAL_URBAN_4          2.171e+03  1.098e+03   1.977 0.048128 *  
##    RURAL_URBAN_2          5.597e+03  2.165e+03   2.585 0.009751 ** 
##    RURAL_URBAN_5          1.033e+05  9.287e+03  11.123  < 2e-16 ***
##    GVA_PUBLIC_p           1.453e+03  1.150e+02  12.634  < 2e-16 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 16150 on 5544 degrees of freedom
##    Multiple R-squared: 0.3718
##    Adjusted R-squared: 0.3691 
##    F-statistic: 136.7 on 24 and 5544 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 1.446143e+12
##    Sigma(hat): 16117.4
##    AIC:  123755.2
##    AICc:  123755.5
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Adaptive bandwidth: 87 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                 Min.     1st Qu.      Median     3rd Qu.
##    Intercept             -5.4629e+05 -1.2777e+05 -6.7949e+04 -2.6143e+04
##    Active                -5.4537e+03  5.2546e+04  1.2506e+05  2.1605e+05
##    planted_ratio         -1.6683e+03  1.8837e+02  6.4545e+02  1.3601e+03
##    DU_density            -3.5703e+04 -2.7308e+03 -7.3839e+02  1.1503e+03
##    planted_value_hectare -1.7476e+03 -1.0581e+02  5.3066e+01  2.1085e+02
##    expenditures_capita   -5.8789e-01  7.7567e-01  1.4612e+00  3.0214e+00
##    COMP_A                -1.0705e+07  1.1739e+05  3.7802e+05  2.0203e+06
##    COMP_D                -2.5846e+07  3.1670e+06  1.2676e+07  5.0454e+07
##    COMP_E                -1.1534e+07  9.1163e+04  4.8487e+06  9.2929e+06
##    COMP_N                -4.6191e+06  4.0592e+05  1.5839e+06  2.9969e+06
##    COMP_Q                -8.0985e+06 -1.1377e+05  1.4255e+06  2.7713e+06
##    COMP_S                -2.6794e+06 -1.0350e+06 -3.7615e+05 -2.9591e+04
##    BEDS                  -5.4589e+01 -5.4787e+00  6.4660e+00  1.9853e+01
##    Pu_Assets             -2.6278e-06 -2.8519e-07 -3.8908e-08  2.0332e-08
##    Pr_Assets             -1.6078e-05  8.9026e-08  4.5819e-07  4.2317e-06
##    Pu_Bank               -2.7718e+02  4.9562e+01  1.3283e+02  2.5057e+02
##    Pr_Bank               -4.1138e+02  7.9902e+00  8.9937e+01  2.6333e+02
##    Motorcycles_p         -8.8703e+04 -3.0836e+04 -1.4021e+04 -1.6920e+03
##    Wheeled_tractor_p     -2.2267e+07 -1.0084e+06  1.0705e+06  4.5573e+06
##    RURAL_URBAN_1         -9.4069e+03 -2.7748e+03 -1.4794e+03 -3.4852e+02
##    RURAL_URBAN_3         -1.3727e+04 -4.0365e+03 -1.0761e+03  7.4963e+02
##    RURAL_URBAN_4         -6.4934e+04 -5.2464e+03 -1.3588e+03  2.4970e+03
##    RURAL_URBAN_2         -5.2839e+04 -2.6828e+03  1.2654e+03  7.4415e+03
##    RURAL_URBAN_5         -7.0391e+04  3.1256e+04  9.0883e+04  1.6457e+05
##    GVA_PUBLIC_p          -4.5569e+02  4.4710e+02  1.0128e+03  1.6687e+03
##                                Max.
##    Intercept             3.4126e+03
##    Active                8.1968e+05
##    planted_ratio         5.1318e+03
##    DU_density            3.4376e+04
##    planted_value_hectare 6.5101e+02
##    expenditures_capita   1.3516e+01
##    COMP_A                8.9676e+06
##    COMP_D                9.3676e+08
##    COMP_E                9.8327e+07
##    COMP_N                7.1392e+06
##    COMP_Q                1.2614e+07
##    COMP_S                3.2963e+06
##    BEDS                  1.5669e+02
##    Pu_Assets             0.0000e+00
##    Pr_Assets             1.0000e-04
##    Pu_Bank               1.1505e+03
##    Pr_Bank               3.2803e+03
##    Motorcycles_p         2.4970e+04
##    Wheeled_tractor_p     3.6013e+07
##    RURAL_URBAN_1         5.1539e+03
##    RURAL_URBAN_3         9.0297e+03
##    RURAL_URBAN_4         2.8712e+04
##    RURAL_URBAN_2         7.2807e+04
##    RURAL_URBAN_5         5.1475e+05
##    GVA_PUBLIC_p          6.4268e+03
##    ************************Diagnostic information*************************
##    Number of data points: 5569 
##    Effective number of parameters (2trace(S) - trace(S'S)): 842.3825 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 4726.617 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 122700.6 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 121936.6 
##    Residual sum of squares: 943655332749 
##    R-square value:  0.5900915 
##    Adjusted R-square value:  0.5170217 
## 
##    ***********************************************************************
##    Program stops at: 2020-06-01 19:49:03

Adjusted R-square value: 0.5170217 which i better than the Multiple Linear Regression model which has an Adjusted R-Squared of 0.3691 . P-VALUE < 2.2e-16 , we reject the null hypothesis

Overall, the AIC approach is the most suitable scheme for the explanatory model. As the adjusted r-squared value for both methods are better than the multiple linear regression model, location affects GDP per capita.

Visualising GWR Output

merge data

gwr.adaptive.output <- as.data.frame(gwr.adaptive.aic$SDF)

muni_merged_adaptive <- cbind(muni_merged, as.matrix(gwr.adaptive.output))

muni_merged_adaptive

## Simple feature collection with 5569 features and 172 fields
## Active geometry column: geom
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 5.271841
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni           name_muni.x code_state.x abbrev_state.x
## 1    1100015 Alta Floresta D'oeste           11             RO
## 2    1100023             Ariquemes           11             RO
## 3    1100031                Cabixi           11             RO
## 4    1100049                Cacoal           11             RO
## 5    1100056            Cerejeiras           11             RO
## 6    1100064     Colorado Do Oeste           11             RO
## 7    1100072            Corumbiara           11             RO
## 8    1100080         Costa Marques           11             RO
## 9    1100098       Espigão D'oeste           11             RO
## 10   1100106         Guajará-Mirim           11             RO
##               area.x                  CITY STATE CAPITAL IBGE_RES_POP
## 1   7067196247 [m^2] Alta Floresta D'Oeste    RO       0        24392
## 2   4427857503 [m^2]             Ariquemes    RO       0        90353
## 3   1314136221 [m^2]                Cabixi    RO       0         6313
## 4   3792029006 [m^2]                Cacoal    RO       0        78574
## 5   2783149399 [m^2]            Cerejeiras    RO       0        17029
## 6   1450315158 [m^2]     Colorado Do Oeste    RO       0        18591
## 7   3059753489 [m^2]            Corumbiara    RO       0         8783
## 8   4987285502 [m^2]         Costa Marques    RO       0        13678
## 9   4518626296 [m^2]       Espigão D'Oeste    RO       0        28729
## 10 24854844824 [m^2]         Guajará-Mirim    RO       0        41656
##    IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1              24392                 0    7276          4306          2970
## 2              90240               113   27236         22971          4265
## 3               6310                 3    1979           887          1092
## 4              78536                38   24045         19411          4634
## 5              17011                18    5346          4561           785
## 6              18562                29    5962          4441          1521
## 7               8778                 5    2655           834          1821
## 8              13462               216    3718          1980          1738
## 9              28720                 9    8407          6342          2065
## 10             40709               947   10274          9550           724
##    IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 IBGE_15.59 IBGE_60.
## 1     13804    203      869     1159       1281       9023     1269
## 2     69068   1122     4383     6101       6870      46000     4592
## 3      2689     43      191      206        272       1742      235
## 4     61699    923     3599     4790       5722      42027     4638
## 5     13755    200      827     1129       1333       8976     1290
## 6     13404    189      822     1002       1196       8801     1394
## 7      2559     47      159      238        240       1685      190
## 8      7366    137      644      843        902       4358      482
## 9     19649    324     1335     1652       1906      12988     1444
## 10    34686    643     2521     3546       3809      21608     2559
##    IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_. IDHM.Ranking.2010  IDHM IDHM_Renda
## 1              18288                 101470              3283 0.640      0.657
## 2               7115                  33365              1847 0.700      0.716
## 3              41086                 107975              3117 0.650      0.650
## 4              18180                 174665              1377 0.718      0.727
## 5              52326                 150755              2141 0.690      0.688
## 6               6268                  21765              2326 0.685      0.676
## 7              71742                 182267              3865 0.613      0.630
## 8               1395                   5232              3901 0.611      0.616
## 9               5588                  35076              2623 0.672      0.691
## 10               787                   6764              2982 0.657      0.663
##    IDHM_Longevidade IDHM_Educacao    ALT PAY_TV FIXED_PHONES     AREA
## 1             0.763         0.526 337.74    240          687  7067.03
## 2             0.806         0.600 138.69   2267         9191  4426.57
## 3             0.757         0.559 236.06     50          188  1314.35
## 4             0.821         0.620 177.45   1806         6491  3792.89
## 5             0.799         0.602 262.81    307         1215  2783.30
## 6             0.814         0.584 419.09    235         1107  1451.06
## 7             0.774         0.473 269.17     66          301  3060.32
## 8             0.751         0.493 145.21    192          346  4987.18
## 9             0.819         0.536 262.53    283         1172  4518.03
## 10            0.823         0.519 133.41   1116         3291 24855.72
##                     REGIAO_TUR CATEGORIA_TUR ESTIMATED_POP
## 1              Vale Do Guaporé             D         23167
## 2               Vale Do Jamari             C        106168
## 3              Vale Do Guaporé             D          5438
## 4  Br 364 - Caminhos De Rondon             C         84813
## 5                         <NA>          <NA>         16444
## 6                         <NA>          <NA>         16227
## 7                         <NA>          <NA>          7567
## 8              Vale Do Guaporé             D         17855
## 9  Br 364 - Caminhos De Rondon             D         32047
## 10          Polo Guajará-Mirim             C         45783
##                RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES GVA_PUBLIC
## 1  Intermediário Adjacente   166143.38        31.27    114455.32  142727.55
## 2                   Urbano   145068.79    353163.07       879.97  589088.78
## 3          Rural Adjacente    59081.09      4623.27     24091.22   40041.59
## 4                   Urbano   188004.72    230109.20    846490.22  485242.29
## 5                   Urbano    52891.37     22445.48    175059.55   99495.64
## 6  Intermediário Adjacente    72684.77        28.37     85558.72  100799.66
## 7             Rural Remoto   132176.94     10032.64     30210.50   55416.89
## 8             Rural Remoto    62561.65      6619.74     38862.21      91.02
## 9  Intermediário Adjacente   115438.59     56841.46    159498.64  170735.98
## 10                  Urbano    49929.14     29240.85       313.12     250.77
##    X.GVA_TOTAL.     TAXES        GDP POP_GDP GDP_CAPITA
## 1     454596.58  23186.17  477782.74   25506   18732.17
## 2    1967287.27 216095.93 2183383.20  105896   20618.18
## 3     127837.15      5.51  133345.39    6289   21202.96
## 4    1749846.43 194940.22 1944786.64   87877   22130.78
## 5        349.89     58.16  408047.84   17959   22721.08
## 6     287409.88  18466.33  305876.20   18639   16410.55
## 7     227836.98      8.74     236.58    8749   27040.60
## 8     199063.72      7.06  206123.18   17031   12102.82
## 9     502514.66  42149.00  544663.66   32712   16650.27
## 10    643057.92     97.10     740.16   47048   15732.01
##                                                                GVA_MAIN
## 1  Administração, defesa, educação e saúde públicas e seguridade social
## 2                                                       Demais serviços
## 3  Administração, defesa, educação e saúde públicas e seguridade social
## 4                                                       Demais serviços
## 5  Administração, defesa, educação e saúde públicas e seguridade social
## 6  Administração, defesa, educação e saúde públicas e seguridade social
## 7                                  Pecuária, inclusive apoio à pecuária
## 8  Administração, defesa, educação e saúde públicas e seguridade social
## 9  Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
##    MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1      50218466      508      3      1     38      7      4     26    270
## 2     201979608     2221     24     28    223      1     12     79   1054
## 3      17904387       60      0      0      7      0      0      1     32
## 4     168366053     1846     13      9    180      1      7     80    839
## 5      42641592      379      6      0     40      0      1      7    185
## 6      31611296      264      3      2     29      3      0     11    147
## 7      25501329       94      2      0      9      0      0      2     54
## 8      25561919      322      1      0     37      0      0     10    158
## 9      59162828      612      1      2     96      2      0     31    306
## 10           NA      502      2      1     27      0      2     10    322
##    COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1      23     20      8      1      3     12     10      4     21     13      6
## 2      68    136     27     32     12     91    116      5     68     74     28
## 3       2      4      2      0      0      1      2      2      2      1      0
## 4      76    102     34     19     31     83     93      4     52    104     18
## 5      19     20      3      1      3     15      8      3     10     12      2
## 6      11     13      3      3      0      8      9      4      5      4      1
## 7       1      3      0      0      0      3      4      4      2      4      0
## 8      10     14      2      1      0      4      6      4     10      6      3
## 9      31     15      5      3      5     17     15      4     16     20      5
## 10     18     32      5      2      0     10     12      5     12      9      3
##    COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1      38      0      0     NA   NA           1           2       1       2
## 2     143      0      0      1   78           2           4       2       3
## 3       4      0      0      1   27          NA          NA      NA      NA
## 4     101      0      0      1   40           2           3       2       3
## 5      44      0      0      1   40           1           3       1       3
## 6       8      0      0     NA   NA           1           2       1       2
## 7       6      0      0     NA   NA          NA          NA      NA      NA
## 8      56      0      0      2  142           0           1       0       1
## 9      38      0      0     NA   NA           1           2       1       2
## 10     30      0      0      2   67           1           3       1       3
##    Pr_Assets  Pu_Assets  Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1   38958866  245288714  2464        9268               1   NA  NA       NA
## 2  156637075 2487942240 17513       42664               1   NA  NA       NA
## 3         NA         NA   925        1838               0   NA  NA       NA
## 4  145453086 2339395610 17561       38073               3   NA  NA       NA
## 5   57939053  392087789  2948        6439               0   NA  NA       NA
## 6   77276523  237735184  3158        6874               0   NA  NA       NA
## 7         NA         NA   831        2380               0   NA  NA       NA
## 8          0   94261213   823        2920               0   NA  NA       NA
## 9   48113784  193339551  3473       11579               4   NA  NA       NA
## 10  38194278  356229103  3906        9812               1   NA  NA       NA
##    POST_OFFICES           name_muni.y code_state.y abbrev_state.y
## 1             1 Alta Floresta D'oeste           11             RO
## 2             2             Ariquemes           11             RO
## 3             1                Cabixi           11             RO
## 4             2                Cacoal           11             RO
## 5             1            Cerejeiras           11             RO
## 6             1     Colorado Do Oeste           11             RO
## 7             1            Corumbiara           11             RO
## 8             2         Costa Marques           11             RO
## 9             2       Espigão D'oeste           11             RO
## 10            1         Guajará-Mirim           11             RO
##               area.y    MLR_RES    Intercept     Active planted_ratio
## 1   7067196247 [m^2] -12020.522  -81790.8172 136735.490    3221.53389
## 2   4427857503 [m^2]  -5894.806 -228132.7968 377201.121    1309.86510
## 3   1314136221 [m^2]  -7676.922  -94825.5544 159255.044    3386.81708
## 4   3792029006 [m^2]  -1308.227 -168590.2009 285763.371    1374.62953
## 5   2783149399 [m^2]  -8799.358     434.3238   2365.147     291.62911
## 6   1450315158 [m^2]   4818.894   -3609.0853  21849.293     -64.12743
## 7   3059753489 [m^2]  -4769.690  -37676.2210  74759.926     738.61435
## 8   4987285502 [m^2]   1074.968  -15045.5275  39295.656     493.51277
## 9   4518626296 [m^2]  -1104.994  -82727.9944 152067.067     358.71984
## 10 24854844824 [m^2] -24918.626  -62564.9170 115109.151    1427.33585
##    DU_density planted_value_hectare expenditures_capita   COMP_A.1  COMP_D.1
## 1   9124.8622              2.348508           1.2263023  183901.01  14062447
## 2   7749.5459           -154.470303           1.7475411  162745.11  77450580
## 3   6894.8037             19.149274           0.9367677  214693.29   6426898
## 4   6948.0735           -164.083760           1.3807173   38978.57  -2457925
## 5   1256.5443            283.923775           0.9415918 7787190.64 208600834
## 6   -998.0603             37.884183           0.2135672 1985352.21   9717070
## 7   -734.3313            352.675520           2.5368855 2572954.65  -1575933
## 8   -634.9888             58.461733           0.8470693 1148944.00  10107818
## 9   3196.8903           -261.543382           1.2287037  104670.02  44522571
## 10 -2417.6358            -79.316931           3.9260895 2460044.74  16098601
##    COMP_E.1  COMP_N.1    COMP_Q.1    COMP_S.1     BEDS.1   Pu_Assets.1
## 1   8161865 -419438.4  3577178.40  -837335.61 67.6056848  2.573309e-08
## 2  -1778884 1030386.4   714175.61 -1870685.13 37.4029197  1.544545e-08
## 3  12064146 -547595.3  2352147.50  -937381.75 64.4677275  1.169330e-08
## 4  -4938728 3372418.9   -93895.07  -725282.66  1.3935288 -2.216843e-08
## 5   8043484 2168688.8 12110889.49 -1569835.95 43.4670988  1.915530e-07
## 6  -1420066  303539.4   388625.86    42491.29 15.0422851  1.411929e-07
## 7  14401477 2531312.1 -3646786.18   -31366.07 -5.3226287 -4.083299e-08
## 8   1398041 2414449.9 -1049949.14  -261448.71  9.4565118  5.842581e-08
## 9   3674043 3186917.0 -1251370.06  -372445.30 -0.2259307 -9.971455e-08
## 10  7892352  527726.4  1185028.97  -507749.26  5.0460608 -1.562359e-07
##      Pr_Assets.1  Pu_Bank.1  Pr_Bank.1 Motorcycles_p Wheeled_tractor_p
## 1  -8.507633e-07   3.526389 120.961962    -33188.866        26176835.9
## 2   9.225396e-08 165.298763  74.695710    -47427.037         -378925.4
## 3   5.177459e-07 -16.830953  97.314352    -31304.321        29005199.0
## 4   1.011897e-07 370.547858 -36.640401    -55878.324         9312133.0
## 5  -5.498973e-06 114.264661  56.151427      6781.724        14080003.0
## 6   3.886574e-06  40.783697   3.356608      2726.624         4910068.0
## 7   1.642224e-06 244.707723  58.867459    -12350.688         1392647.9
## 8   2.161800e-06 122.763183 148.626622     -3130.386          479136.6
## 9   1.088409e-07 106.714240 277.678978    -10658.267         6511210.1
## 10  3.004948e-06  77.254939 396.002707    -28681.370          139593.5
##    RURAL_URBAN_1 RURAL_URBAN_3 RURAL_URBAN_4 RURAL_URBAN_2 RURAL_URBAN_5
## 1       83.40903     2466.6462    6818.12239    24992.3133    146031.755
## 2    -1525.78445     5432.4798    4661.02294    24631.9052    286731.638
## 3     1146.27797     3818.2589    2925.47049     8448.5118    162763.847
## 4     -481.88436     3488.6070   -1002.43025    13820.4407    245977.182
## 5    -1484.27973     -261.5996      42.41828     1371.4774      2106.466
## 6    -2047.45224    -2949.2889   -3987.45126    -1323.1565     11355.143
## 7    -6149.86364    -6893.5634   -8075.41308    -4244.6785     95080.800
## 8    -2175.75462    -3452.9636   -4328.71432     -648.5095     22727.295
## 9    -2999.68152    -1790.8810    4948.95546    71997.0265    133385.613
## 10   -5109.45700    -4261.5036   -2738.34268    -6538.4753     56567.743
##    GVA_PUBLIC_p        y      yhat    residual CV_Score Stud_residual
## 1     1518.7074 20664.57 24081.976  -3417.4059        0   -0.25443223
## 2     1245.8657 25591.70 40583.081 -14991.3809        0   -1.21004035
## 3     1291.4609 15628.40 26315.966 -10687.5657        0   -0.78420254
## 4      610.9763 18250.42 19957.133  -1706.7125        0   -0.12463040
## 5      821.6804  8222.36 17261.498  -9039.1377        0   -0.70699870
## 6      143.2738  6370.41  6654.224   -283.8140        0   -0.02109761
## 7      920.8330  6982.70  7806.517   -823.8168        0   -0.06620694
## 8      323.6922  6256.80  5196.603   1060.1966        0    0.09049018
## 9      998.9155 21173.60 26177.432  -5003.8325        0   -0.39762594
## 10    1819.3957 24739.02 53037.021 -28298.0012        0   -2.13475403
##    Intercept_SE Active_SE planted_ratio_SE DU_density_SE
## 1      46531.49  70900.81         474.8718      9198.145
## 2      21073.15  32165.23         227.6791      5062.159
## 3      39324.58  60670.06         488.5516      8104.618
## 4      23790.52  36312.98         329.2352      5606.601
## 5      22733.91  38234.90        1786.3611      4425.647
## 6      27223.15  43972.78        1157.6803      3365.626
## 7      24326.12  39831.41         808.0594      3276.072
## 8      16772.02  27095.87         452.7589      1951.257
## 9      37508.02  57999.54         251.7487      4762.059
## 10     17527.16  26357.93         283.9353      1766.524
##    planted_value_hectare_SE expenditures_capita_SE  COMP_A_SE COMP_D_SE
## 1                  466.3661              0.5777266  114666.34  20580667
## 2                   96.7850              0.3426780   37989.49  15137219
## 3                  447.0636              0.5451733  139563.28  24715649
## 4                  109.1037              0.4420232   58450.26  16526757
## 5                  171.5802              0.7116749 5183358.62  53261396
## 6                  187.2376              0.7418658 3845564.93   9260617
## 7                  193.9674              0.6404575 3062217.31  21406442
## 8                  134.4118              0.4500085 2930633.23   4634350
## 9                  246.3038              0.5388786   65237.07  23676871
## 10                 138.8924              0.4777640  889737.62   5131412
##    COMP_E_SE COMP_N_SE COMP_Q_SE COMP_S_SE  BEDS_SE Pu_Assets_SE Pr_Assets_SE
## 1   12568998 1243295.9   2007141 1359702.4 18.65744 4.728138e-08 4.582336e-06
## 2    6131440  760532.8   1235692  520464.9 13.69406 7.135043e-09 2.065692e-08
## 3   11696326 1168501.0   1760101 1017804.0 18.38738 4.738672e-08 4.582679e-06
## 4    8347299  923454.9   1548482  588201.0 19.30464 1.926100e-08 2.454826e-08
## 5   21734517 4562985.5   6920993 1591692.8 45.12674 8.790578e-07 1.587749e-05
## 6   16063748 4210504.7   4439774  751348.0 41.59532 9.032875e-07 2.865467e-05
## 7   18757150 2856030.8   3119070  745900.5 28.01771 7.730981e-08 4.510292e-06
## 8   11058068 2328663.6   2970073  575643.8 21.35235 4.178892e-07 4.767374e-06
## 9    7136855 1072360.9   1971822  749787.3 25.82491 2.138241e-07 2.734076e-08
## 10   3411436  540969.1   1106473  291876.4 18.66749 1.905825e-07 2.276738e-06
##    Pu_Bank_SE Pr_Bank_SE Motorcycles_p_SE Wheeled_tractor_p_SE RURAL_URBAN_1_SE
## 1   195.07211  159.58865        17893.189             12301219         3467.111
## 2    99.50461   72.48511        10903.035              3177177         1697.271
## 3   185.88750  150.63077        17027.929             11472920         3400.672
## 4   146.24729  118.02740        13791.116              4413250         2380.205
## 5   283.04420  458.41269        15279.479             20148887         3231.886
## 6   322.64122  584.77076        13674.802             18586131         4084.991
## 7   243.33192  238.20835        17654.469             18752139         3102.002
## 8   170.05639  381.63985         9418.921              9555210         2204.494
## 9   128.68059   88.72080        21935.170              3408513         2440.624
## 10   53.01121  251.55979        13063.787               635494         3469.998
##    RURAL_URBAN_3_SE RURAL_URBAN_4_SE RURAL_URBAN_2_SE RURAL_URBAN_5_SE
## 1          3131.592         6779.617         8298.985         49957.80
## 2          1637.794         4762.435         6790.064         25985.61
## 3          2879.105         5100.590        10009.645         43331.53
## 4          1996.715         5040.333        10742.306         28407.03
## 5          3008.234         3693.793         6135.595         27101.29
## 6          4325.620         8219.486        15403.639         30846.39
## 7          2945.812         3968.754         8391.658         31078.64
## 8          2077.206         5166.359        14302.321         21915.44
## 9          2537.558         9054.746        14939.489         39914.79
## 10         2155.517        12241.911        14726.952         20181.12
##    GVA_PUBLIC_p_SE Intercept_TV   Active_TV planted_ratio_TV DU_density_TV
## 1         618.8910  -1.75775197  1.92854618       6.78400745     0.9920329
## 2         329.8425 -10.82575524 11.72698411       5.75312017     1.5308776
## 3         566.9846  -2.41135603  2.62493627       6.93236356     0.8507253
## 4         389.2164  -7.08644348  7.86945571       4.17522054     1.2392667
## 5         450.6956   0.01910467  0.06185833       0.16325316     0.2839233
## 6         502.3422  -0.13257411  0.49688227      -0.05539304    -0.2965452
## 7         563.5005  -1.54879687  1.87690900       0.91405950    -0.2241499
## 8         333.3970  -0.89706113  1.45024508       1.09001224    -0.3254254
## 9         496.2932  -2.20560803  2.62186665       1.42491257     0.6713253
## 10        380.9301  -3.56959818  4.36715436       5.02697602    -1.3685836
##    planted_value_hectare_TV expenditures_capita_TV COMP_A_TV   COMP_D_TV
## 1               0.005035761              2.1226343 1.6037924  0.68328433
## 2              -1.596014975              5.0996601 4.2839508  5.11656590
## 3               0.042833448              1.7182936 1.5383222  0.26003354
## 4              -1.503924218              3.1236310 0.6668673 -0.14872395
## 5               1.654757976              1.3230645 1.5023446  3.91654836
## 6               0.202332172              0.2878784 0.5162706  1.04928967
## 7               1.818220629              3.9610523 0.8402260 -0.07361955
## 8               0.434944881              1.8823405 0.3920463  2.18106476
## 9              -1.061873305              2.2801123 1.6044561  1.88042462
## 10             -0.571067529              8.2176336 2.7649103  3.13726552
##      COMP_E_TV   COMP_N_TV   COMP_Q_TV   COMP_S_TV      BEDS_TV Pu_Assets_TV
## 1   0.64936478 -0.33736004  1.78222615 -0.61582269  3.623524252    0.5442541
## 2  -0.29012493  1.35482178  0.57795591 -3.59425811  2.731323811    2.1647305
## 3   1.03144752 -0.46863062  1.33637085 -0.92098454  3.506086036    0.2467632
## 4  -0.59165579  3.65195831 -0.06063684 -1.23305233  0.072186202   -1.1509494
## 5   0.37007881  0.47527847  1.74987744 -0.98626814  0.963222575    0.2179072
## 6  -0.08840189  0.07209099  0.08753280  0.05655340  0.361634097    0.1563101
## 7   0.76778597  0.88630420 -1.16919027 -0.04205128 -0.189973701   -0.5281735
## 8   0.12642720  1.03683927 -0.35350956 -0.45418484  0.442879272    0.1398117
## 9   0.51479864  2.97186999 -0.63462626 -0.49673464 -0.008748557   -0.4663390
## 10  2.31349859  0.97552041  1.07099641 -1.73960340  0.270312787   -0.8197810
##    Pr_Assets_TV  Pu_Bank_TV  Pr_Bank_TV Motorcycles_p_TV Wheeled_tractor_p_TV
## 1    -0.1856615  0.01807736  0.75796092       -1.8548324           2.12798708
## 2     4.4660080  1.66121712  1.03049729       -4.3498930          -0.11926482
## 3     0.1129789 -0.09054376  0.64604565       -1.8384104           2.52814435
## 4     4.1220743  2.53370743 -0.31043980       -4.0517623           2.11003976
## 5    -0.3463377  0.40369900  0.12249099        0.4438452           0.69879804
## 6     0.1356349  0.12640572  0.00574004        0.1993904           0.26417914
## 7     0.3641059  1.00565404  0.24712592       -0.6995786           0.07426608
## 8     0.4534572  0.72189693  0.38944209       -0.3323508           0.05014401
## 9     3.9809008  0.82929553  3.12980695       -0.4858985           1.91027869
## 10    1.3198479  1.45733213  1.57418920       -2.1954867           0.21966140
##    RURAL_URBAN_1_TV RURAL_URBAN_3_TV RURAL_URBAN_4_TV RURAL_URBAN_2_TV
## 1        0.02405721        0.7876652       1.00567956       3.01149038
## 2       -0.89896360        3.3169488       0.97870586       3.62763938
## 3        0.33707399        1.3261965       0.57355526       0.84403708
## 4       -0.20245496        1.7471733      -0.19888177       1.28654323
## 5       -0.45926118       -0.0869612       0.01148366       0.22352801
## 6       -0.50121346       -0.6818188      -0.48512174      -0.08589896
## 7       -1.98254635       -2.3401237      -2.03474795      -0.50582117
## 8       -0.98696324       -1.6623112      -0.83786561      -0.04534295
## 9       -1.22906316       -0.7057497       0.54655927       4.81924292
## 10      -1.47246693       -1.9770213      -0.22368589      -0.44398020
##    RURAL_URBAN_5_TV GVA_PUBLIC_p_TV  Local_R2 coords.x1  coords.x2
## 1        2.92310195       2.4539173 0.4515786 -49.44055 -16.758812
## 2       11.03424812       3.7771538 0.4558879 -47.39683 -18.487565
## 3        3.75624521       2.2777709 0.4454983 -48.71881 -16.182672
## 4        8.65902478       1.5697601 0.3843157 -45.44619 -19.155848
## 5        0.07772568       1.8231383 0.8293956 -48.88440  -1.723470
## 6        0.36811908       0.2852115 0.9777543 -39.04755  -7.356977
## 7        3.05936168       1.6341300 0.7694517 -41.66161 -13.253532
## 8        1.03704485       0.9708910 0.8836936 -39.11659  -8.723418
## 9        3.34175938       2.0127528 0.4646219 -50.31253 -23.300494
## 10       2.80300260       4.7761926 0.5307132 -51.02527 -27.608987
##                              geom                    geometry
## 1  POLYGON ((-62.19465 -11.827... POINT (-61.99982 -11.93554)
## 2  POLYGON ((-62.53648 -9.7322... POINT (-63.03327 -9.908463)
## 3  POLYGON ((-60.37075 -13.363... POINT (-60.54431 -13.49976)
## 4  POLYGON ((-61.0008 -11.2973... POINT (-61.44294 -11.43387)
## 5  POLYGON ((-61.49976 -13.005... POINT (-60.81843 -13.19503)
## 6  POLYGON ((-60.50475 -12.966... POINT (-60.55507 -13.13056)
## 7  POLYGON ((-61.34273 -12.666...  POINT (-60.9487 -12.99752)
## 8  POLYGON ((-63.71199 -11.650... POINT (-64.23165 -12.43601)
## 9  POLYGON ((-60.94827 -10.988... POINT (-61.02017 -11.52855)
## 10 POLYGON ((-65.37724 -10.431... POINT (-65.32395 -10.77388)

Check if any empty geometry after joining

check(muni_merged_adaptive)

## [1] "For: muni_merged_adaptive"
## [1] "Number of Invalid polygons: 0"

summary(gwr.adaptive.aic$SDF$Local_R2)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3191  0.4925  0.6173  0.6408  0.7669  0.9857

Visualising local R2

tm_shape(muni_merged_adaptive)+
  tm_layout(legend.outside = TRUE)+
  tm_polygons(col="Local_R2", border.alpha = 0.1)

The local R2 values determines have better prediction using the model. By plotting the Local R2 values by municipality, we can identify which municipalities’ GDP_CAPITAL is predicted well and not predicted well with GWRmodel.The values range from 0.3 to 1.0 .Mapping the local R2 values on the brazil municipality spatialpolygons to determine where does GWR predicts better and where it doesn’t could provide insights on which variables affects the regression model. If the values are low it would mean that the local model is performing poorly. It is hard to interpet from the plot as there seems to be little pattern in the distribution except the low r2 values in the central of brazil.

Take_home_ex4

Vanessa Su

5/30/2020

Overview

Data

Installing packages

Imporing aspatial data

Importing geospatial data

Validity Checks

Joining aspatial and geospatial data

Plot fixed data

Joining fixed data

EDA using statistical graphs and chloropleth mapping

Data cleaning (analysing variables with NA)

Derive new variables

Extracting indicators

Choosing categorical variables for regression

Plotting derived indicators

Correlation analysis

Hedonic pricing model using multiple linear regression method

Checking for multicolinearity

Non-Linearity Test

Test for Normality Assumption

Test for Spatial Autocorrelation

Moran’s I test

Building explanatory model using GWmodel

Geographically weighted regression method

Constructing adaptive bandwidth gwr model using CV approach

Constructing adaptive bandwidth gwr model using AIC approach

Constructing adaptive bandwidth gwr model using AIC approach

Visualising GWR Output

Visualising local R2