1. Introduction

1.1 Overview

Brazil is the world’s fifth-largest country by area and the sixth most populous. The World Bank classifies it as an upper-middle-income economy, and it is considered an advanced emerging economy. As a developing country, Brazil has the largest share of global wealth in Latin America. It has the ninth-largest GDP in the world in nominal terms and the eighth-largest by PPP. Behind these impressive figures, however, the spatial development of Brazil is highly unequal. The GDP per capita of the poorest municipality is R$3190.6, whereas that of the richest municipality is R$314638. Half of the municipalities have a GDP per capita below R$16000, and the top 25% of municipalities have a GDP per capita of R$26155 or more.

1.2 Objective

In this take-home exercise, we determine the factors affecting the unequal development of Brazil at the municipality level using the data provided. The specific tasks of the analysis are as follows:
1. Prepare a choropleth map showing the distribution of GDP per capita in 2016 at the municipality level.
2. Calibrate an explanatory model of the factors affecting GDP per capita at the municipality level using the multiple linear regression method.
3. Prepare a choropleth map showing the distribution of the residuals of the GDP per capita model.
4. Calibrate an explanatory model of the factors affecting GDP per capita at the municipality level using the geographically weighted regression method.
5. Prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.
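
As a rough illustration of tasks 2 and 3, the sketch below fits a multiple linear regression with base R's lm() on simulated data and extracts the residuals that a residual map would display. The data frame toy, its columns (gdp_capita, idhm, comp_tot) and the coefficients used to simulate it are assumptions made up for this example; they are not results from BRAZIL_CITIES.csv.

```r
# Simulated stand-in for the municipality table (hypothetical values).
set.seed(42)
n <- 200
toy <- data.frame(
  idhm     = runif(n, 0.4, 0.9),
  comp_tot = rpois(n, 150)
)
toy$gdp_capita <- 5000 + 30000 * toy$idhm + 10 * toy$comp_tot +
  rnorm(n, sd = 2000)

# Task 2: calibrate a multiple linear regression model.
mlr <- lm(gdp_capita ~ idhm + comp_tot, data = toy)

# Task 3: the residuals are the values a residual choropleth would show.
toy$mlr_res <- residuals(mlr)

summary(mlr)$r.squared
head(toy$mlr_res)
```

Storing the residuals as a column keeps them aligned with the rows they belong to, which matters once the table is joined to the municipality geometries.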

2. Getting started

2.1 The data

The first two data sets are provided; the third is retrieved with the geobr package in R.
1. BRAZIL_CITIES.csv. This data file consists of 81 columns and 5573 rows, each row representing one municipality.
2. Data_Dictionary.csv. This file provides metadata for each column in BRAZIL_CITIES.csv.
3. The 2016 municipality boundary file.

2.2 Installing and loading packages

# Packages used in this analysis
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'readr', 'anchors', 'DT', 'fitdistrplus', 'Orcs')
# Install any package that is missing, then attach it
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
## Loading required package: olsrr
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
## 
##     rivers
## Loading required package: corrplot
## corrplot 0.84 loaded
## Loading required package: ggpubr
## Loading required package: ggplot2
## Loading required package: sf
## Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
## Loading required package: spdep
## Loading required package: sp
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: GWmodel
## Loading required package: maptools
## Checking rgeos availability: FALSE
##      Note: when rgeos is not available, polygon geometry     computations in maptools depend on gpclib,
##      which has a restricted licence. It is disabled by default;
##      to enable gpclib, type gpclibPermit()
## Loading required package: robustbase
## Loading required package: Rcpp
## Loading required package: spatialreg
## Loading required package: Matrix
## Registered S3 methods overwritten by 'spatialreg':
##   method                   from 
##   residuals.stsls          spdep
##   deviance.stsls           spdep
##   coef.stsls               spdep
##   print.stsls              spdep
##   summary.stsls            spdep
##   print.summary.stsls      spdep
##   residuals.gmsar          spdep
##   deviance.gmsar           spdep
##   coef.gmsar               spdep
##   fitted.gmsar             spdep
##   print.gmsar              spdep
##   summary.gmsar            spdep
##   print.summary.gmsar      spdep
##   print.lagmess            spdep
##   summary.lagmess          spdep
##   print.summary.lagmess    spdep
##   residuals.lagmess        spdep
##   deviance.lagmess         spdep
##   coef.lagmess             spdep
##   fitted.lagmess           spdep
##   logLik.lagmess           spdep
##   fitted.SFResult          spdep
##   print.SFResult           spdep
##   fitted.ME_res            spdep
##   print.ME_res             spdep
##   print.lagImpact          spdep
##   plot.lagImpact           spdep
##   summary.lagImpact        spdep
##   HPDinterval.lagImpact    spdep
##   print.summary.lagImpact  spdep
##   print.sarlm              spdep
##   summary.sarlm            spdep
##   residuals.sarlm          spdep
##   deviance.sarlm           spdep
##   coef.sarlm               spdep
##   vcov.sarlm               spdep
##   fitted.sarlm             spdep
##   logLik.sarlm             spdep
##   anova.sarlm              spdep
##   predict.sarlm            spdep
##   print.summary.sarlm      spdep
##   print.sarlm.pred         spdep
##   as.data.frame.sarlm.pred spdep
##   residuals.spautolm       spdep
##   deviance.spautolm        spdep
##   coef.spautolm            spdep
##   fitted.spautolm          spdep
##   print.spautolm           spdep
##   summary.spautolm         spdep
##   logLik.spautolm          spdep
##   print.summary.spautolm   spdep
##   print.WXImpact           spdep
##   summary.WXImpact         spdep
##   print.summary.WXImpact   spdep
##   predict.SLX              spdep
## 
## Attaching package: 'spatialreg'
## The following objects are masked from 'package:spdep':
## 
##     anova.sarlm, as.spam.listw, as_dgRMatrix_listw, as_dsCMatrix_I,
##     as_dsCMatrix_IrW, as_dsTMatrix_listw, bptest.sarlm, can.be.simmed,
##     cheb_setup, coef.gmsar, coef.sarlm, coef.spautolm, coef.stsls,
##     create_WX, deviance.gmsar, deviance.sarlm, deviance.spautolm,
##     deviance.stsls, do_ldet, eigen_pre_setup, eigen_setup, eigenw,
##     errorsarlm, fitted.gmsar, fitted.ME_res, fitted.sarlm,
##     fitted.SFResult, fitted.spautolm, get.ClusterOption,
##     get.coresOption, get.mcOption, get.VerboseOption,
##     get.ZeroPolicyOption, GMargminImage, GMerrorsar, griffith_sone,
##     gstsls, Hausman.test, HPDinterval.lagImpact, impacts, intImpacts,
##     Jacobian_W, jacobianSetup, l_max, lagmess, lagsarlm, lextrB,
##     lextrS, lextrW, lmSLX, logLik.sarlm, logLik.spautolm, LR.sarlm,
##     LR1.sarlm, LR1.spautolm, LU_prepermutate_setup, LU_setup,
##     Matrix_J_setup, Matrix_setup, mcdet_setup, MCMCsamp, ME, mom_calc,
##     mom_calc_int2, moments_setup, powerWeights, predict.sarlm,
##     predict.SLX, print.gmsar, print.ME_res, print.sarlm,
##     print.sarlm.pred, print.SFResult, print.spautolm, print.stsls,
##     print.summary.gmsar, print.summary.sarlm, print.summary.spautolm,
##     print.summary.stsls, residuals.gmsar, residuals.sarlm,
##     residuals.spautolm, residuals.stsls, sacsarlm, SE_classic_setup,
##     SE_interp_setup, SE_whichMin_setup, set.ClusterOption,
##     set.coresOption, set.mcOption, set.VerboseOption,
##     set.ZeroPolicyOption, similar.listw, spam_setup, spam_update_setup,
##     SpatialFiltering, spautolm, spBreg_err, spBreg_lag, spBreg_sac,
##     stsls, subgraph_eigenw, summary.gmsar, summary.sarlm,
##     summary.spautolm, summary.stsls, trW, vcov.sarlm, Wald1.sarlm
## Welcome to GWmodel version 2.1-4.
## The new version of GWmodel 2.1-4 now is ready
## Loading required package: tmap
## Loading required package: tidyverse
## -- Attaching packages ---------------------------------------------------------------------- tidyverse 1.3.0 --
## v tibble  3.0.1     v dplyr   0.8.5
## v tidyr   1.1.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0
## v purrr   0.3.4     
## -- Conflicts ------------------------------------------------------------------------- tidyverse_conflicts() --
## x tidyr::expand() masks Matrix::expand()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x tidyr::pack()   masks Matrix::pack()
## x tidyr::unpack() masks Matrix::unpack()
## Loading required package: geobr
## Loading required package: anchors
## Loading required package: rgenoud
## ##  rgenoud (Version 5.8-3.0, Build Date: 2019-01-22)
## ##  See http://sekhon.berkeley.edu/rgenoud for additional documentation.
## ##  Please cite software as:
## ##   Walter Mebane, Jr. and Jasjeet S. Sekhon. 2011.
## ##   ``Genetic Optimization Using Derivatives: The rgenoud package for R.''
## ##   Journal of Statistical Software, 42(11): 1-26. 
## ##
## 
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## The following object is masked from 'package:olsrr':
## 
##     cement
## 
## 
## ##  anchors (Version 3.0-8, Build Date: 2014-02-24)
## ##  See http://wand.stanford.edu/anchors for additional documentation and support.
## 
## 
## Loading required package: DT
## Loading required package: fitdistrplus
## Loading required package: survival
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:robustbase':
## 
##     heart
## 
## Loading required package: Orcs
## Loading required package: raster
## 
## Attaching package: 'raster'
## 
## The following objects are masked from 'package:MASS':
## 
##     area, select
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## The following object is masked from 'package:ggpubr':
## 
##     rotate
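
The loop above installs missing packages and prints every package's start-up messages. One possible variation, shown here only as a sketch, first works out which packages are missing and then attaches everything inside suppressPackageStartupMessages() to keep the log short. The helper name missing_packages is hypothetical, and base packages are used so the example runs without installing anything.

```r
# Hypothetical helper: return the packages that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Base packages are used here so the example runs anywhere;
# in the report this would be the `packages` vector defined earlier.
pkgs <- c("stats", "utils")

to_install <- missing_packages(pkgs)
if (length(to_install) > 0) {
  install.packages(to_install)
}

# Attach everything without the start-up messages.
for (p in pkgs) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Masking notes such as dplyr::select being hidden by MASS and raster are also silenced this way, so it is still worth reading them once before switching the messages off.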

3. Geospatial Data Wrangling

3.1 Importing Geospatial Data

3.1.1 Import mun data using geobr package

Brazil 2016 municipality boundary

mun <- read_municipality(code_muni= "all", year=2016)
## Using year 2016
## Loading data for the whole country. This might take a few minutes.

Plot the boundary of Brazil

# Theme tweak that hides axis titles, labels, and ticks
no_axis <- theme(axis.title=element_blank(),
                 axis.text=element_blank(),
                 axis.ticks=element_blank())

# Draw the municipality boundaries
ggplot() +
    geom_sf(data=mun, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) +
    labs(subtitle="Brazil") +
    theme_minimal() +
    no_axis

Check the CRS of mun

st_crs(mun)
## Coordinate Reference System:
##   User input: SIRGAS 2000 
##   wkt:
## GEOGCRS["SIRGAS 2000",
##     DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
##         ELLIPSOID["GRS 1980",6378137,298.257222101,
##             LENGTHUNIT["metre",1]]],
##     PRIMEM["Greenwich",0,
##         ANGLEUNIT["degree",0.0174532925199433]],
##     CS[ellipsoidal,2],
##         AXIS["geodetic latitude (Lat)",north,
##             ORDER[1],
##             ANGLEUNIT["degree",0.0174532925199433]],
##         AXIS["geodetic longitude (Lon)",east,
##             ORDER[2],
##             ANGLEUNIT["degree",0.0174532925199433]],
##     USAGE[
##         SCOPE["unknown"],
##         AREA["Latin America - SIRGAS 2000 by country"],
##         BBOX[-59.87,-122.19,32.72,-25.28]],
##     ID["EPSG",4674]]


The CRS of mun is SIRGAS 2000 (EPSG:4674), a geographic coordinate system with coordinates in degrees.
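
Because EPSG:4674 stores coordinates in degrees, distance-based steps are often easier to interpret after reprojecting to a CRS with metre units. The sketch below shows st_transform() on a single point. The target EPSG:5641 (SIRGAS 2000 / Brazil Mercator) and the point itself are assumptions chosen for illustration; the original analysis does not state which projected CRS, if any, it uses.

```r
library(sf)

# One point near Brasília in SIRGAS 2000 (EPSG:4674), longitude then latitude.
pt <- st_sfc(st_point(c(-47.93, -15.78)), crs = 4674)
st_is_longlat(pt)       # is this a geographic (degree-based) CRS?

# Assumed target CRS: EPSG:5641, a projected CRS for Brazil in metres.
pt_proj <- st_transform(pt, 5641)
st_is_longlat(pt_proj)
st_coordinates(pt_proj)
```

The same call applied to mun would reproject every municipality polygon at once.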

Next, we check whether all geometries in mun are valid

all(st_is_valid(mun))
## [1] FALSE

Since the result is FALSE, at least one geometry is invalid, so we repair the geometries with st_make_valid()

mun <- st_make_valid(mun)

Now, check again whether all geometries are valid

all(st_is_valid(mun))
## [1] TRUE

Next, we will check for any empty geometries

any(is.na(st_dimension(mun)))
## [1] FALSE
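
To make the validity checks concrete, the sketch below builds a deliberately invalid "bow-tie" polygon, repairs it, and runs the same two checks that were applied to mun. The polygon is a made-up example and has nothing to do with the Brazilian boundaries.

```r
library(sf)

# A "bow-tie" ring that crosses itself, a typical invalid polygon.
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie)                  # the ring self-intersects
st_is_valid(bowtie, reason = TRUE)   # human-readable explanation

# Repair, then re-run the two checks used on `mun`.
fixed <- st_make_valid(bowtie)
all(st_is_valid(fixed))
any(is.na(st_dimension(fixed)))
```

Calling st_is_valid() with reason = TRUE on the original mun object would show why particular municipalities failed the check before the repair.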

4. Aspatial Data Wrangling

4.1 Importing Aspatial Data

brazil <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   CITY = col_character(),
##   STATE = col_character(),
##   AREA = col_number(),
##   REGIAO_TUR = col_character(),
##   CATEGORIA_TUR = col_character(),
##   RURAL_URBAN = col_character(),
##   GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.

After importing the data, we check that it was imported correctly using glimpse().

glimpse(brazil)
## Rows: 5,573
## Columns: 81
## $ CITY                     <chr> "Abadia De Goiás", "Abadia Dos Dourados", ...
## $ STATE                    <chr> "GO", "MG", "GO", "MG", "PA", "CE", "BA", ...
## $ CAPITAL                  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IBGE_RES_POP             <dbl> 6876, 6704, 15757, 22690, 141100, 10496, 8...
## $ IBGE_RES_POP_BRAS        <dbl> 6876, 6704, 15609, 22690, 141040, 10496, 8...
## $ IBGE_RES_POP_ESTR        <dbl> 0, 0, 148, 0, 60, 0, 0, 0, 0, 0, 0, 16, 17...
## $ IBGE_DU                  <dbl> 2137, 2328, 4655, 7694, 31061, 2791, 2572,...
## $ IBGE_DU_URBAN            <dbl> 1546, 1481, 3233, 6667, 19057, 1251, 1193,...
## $ IBGE_DU_RURAL            <dbl> 591, 847, 1422, 1027, 12004, 1540, 1379, 1...
## $ IBGE_POP                 <dbl> 5300, 4154, 10656, 18464, 82956, 4538, 372...
## $ IBGE_1                   <dbl> 69, 38, 139, 176, 1354, 98, 37, 167, 69, 1...
## $ `IBGE_1-4`               <dbl> 318, 207, 650, 856, 5567, 323, 156, 733, 3...
## $ `IBGE_5-9`               <dbl> 438, 260, 894, 1233, 7618, 421, 263, 978, ...
## $ `IBGE_10-14`             <dbl> 517, 351, 1087, 1539, 8905, 483, 277, 927,...
## $ `IBGE_15-59`             <dbl> 3542, 2709, 6896, 11979, 53516, 2631, 2319...
## $ `IBGE_60+`               <dbl> 416, 589, 990, 2681, 5996, 582, 673, 803, ...
## $ IBGE_PLANTED_AREA        <dbl> 319, 4479, 10307, 1862, 25200, 2598, 895, ...
## $ `IBGE_CROP_PRODUCTION_$` <dbl> 1843, 18017, 33085, 7502, 700872, 5234, 39...
## $ `IDHM Ranking 2010`      <dbl> 1689, 2207, 2202, 1994, 3530, 3522, 4086, ...
## $ IDHM                     <dbl> 0.708, 0.690, 0.690, 0.698, 0.628, 0.628, ...
## $ IDHM_Renda               <dbl> 0.687, 0.693, 0.671, 0.720, 0.579, 0.540, ...
## $ IDHM_Longevidade         <dbl> 0.830, 0.839, 0.841, 0.848, 0.798, 0.748, ...
## $ IDHM_Educacao            <dbl> 0.622, 0.563, 0.579, 0.556, 0.537, 0.612, ...
## $ LONG                     <dbl> -49.44055, -47.39683, -48.71881, -45.44619...
## $ LAT                      <dbl> -16.758812, -18.487565, -16.182672, -19.15...
## $ ALT                      <dbl> 893.60, 753.12, 1017.55, 644.74, 10.12, 40...
## $ PAY_TV                   <dbl> 360, 77, 227, 1230, 3389, 29, 952, 51, 55,...
## $ FIXED_PHONES             <dbl> 842, 296, 720, 1716, 1218, 34, 335, 222, 3...
## $ AREA                     <dbl> 147.26, 881.06, 1045.13, 1817.07, 1610.65,...
## $ REGIAO_TUR               <chr> NA, "Caminhos Do Cerrado", "Região Turísti...
## $ CATEGORIA_TUR            <chr> NA, "D", "C", "D", "D", NA, "D", NA, NA, "...
## $ ESTIMATED_POP            <dbl> 8583, 6972, 19614, 23223, 156292, 11663, 8...
## $ RURAL_URBAN              <chr> "Urbano", "Rural Adjacente", "Rural Adjace...
## $ GVA_AGROPEC              <dbl> 6.20, 50524.57, 42.84, 113824.60, 140463.7...
## $ GVA_INDUSTRY             <dbl> 27991.25, 25917.70, 16728.30, 31002.62, 58...
## $ GVA_SERVICES             <dbl> 74750.32, 62689.23, 138198.58, 172.33, 468...
## $ GVA_PUBLIC               <dbl> 36915.04, 28083.79, 63396.20, 86081.41, 48...
## $ ` GVA_TOTAL `            <dbl> 145857.60, 167215.28, 261161.91, 403241.27...
## $ TAXES                    <dbl> 20554.20, 12873.50, 26822.58, 26994.09, 95...
## $ GDP                      <dbl> 166.41, 180.09, 287984.49, 430235.36, 1249...
## $ POP_GDP                  <dbl> 8053, 7037, 18427, 23574, 151934, 11483, 9...
## $ GDP_CAPITA               <dbl> 20664.57, 25591.70, 15628.40, 18250.42, 82...
## $ GVA_MAIN                 <chr> "Demais serviços", "Demais serviços", "Dem...
## $ MUN_EXPENDIT             <dbl> 28227691, 17909274, 37513019, NA, NA, NA, ...
## $ COMP_TOT                 <dbl> 284, 476, 288, 621, 931, 86, 191, 87, 285,...
## $ COMP_A                   <dbl> 5, 6, 5, 18, 4, 1, 6, 2, 5, 2, 0, 8, 3, 1,...
## $ COMP_B                   <dbl> 1, 6, 9, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, ...
## $ COMP_C                   <dbl> 56, 30, 26, 40, 43, 4, 8, 3, 20, 4, 9, 40,...
## $ COMP_D                   <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ COMP_E                   <dbl> 2, 2, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 2, ...
## $ COMP_F                   <dbl> 29, 34, 7, 20, 27, 6, 4, 0, 10, 2, 0, 25, ...
## $ COMP_G                   <dbl> 110, 190, 117, 303, 500, 48, 97, 71, 133, ...
## $ COMP_H                   <dbl> 26, 70, 12, 62, 16, 2, 5, 0, 18, 8, 1, 67,...
## $ COMP_I                   <dbl> 4, 28, 57, 30, 31, 10, 5, 1, 14, 3, 0, 25,...
## $ COMP_J                   <dbl> 5, 11, 2, 9, 6, 2, 3, 1, 8, 1, 1, 9, 5, 14...
## $ COMP_K                   <dbl> 0, 0, 1, 6, 1, 0, 1, 0, 0, 1, 0, 4, 3, 3, ...
## $ COMP_L                   <dbl> 2, 4, 0, 4, 1, 0, 0, 0, 4, 0, 0, 7, 4, 4, ...
## $ COMP_M                   <dbl> 10, 15, 7, 28, 22, 2, 5, 0, 11, 4, 2, 26, ...
## $ COMP_N                   <dbl> 12, 29, 15, 27, 16, 3, 5, 1, 26, 0, 1, 16,...
## $ COMP_O                   <dbl> 4, 2, 3, 2, 2, 2, 2, 2, 2, 2, 6, 2, 4, 2, ...
## $ COMP_P                   <dbl> 6, 9, 11, 15, 155, 0, 8, 0, 8, 1, 6, 14, 1...
## $ COMP_Q                   <dbl> 6, 14, 5, 19, 33, 2, 1, 2, 9, 3, 0, 13, 22...
## $ COMP_R                   <dbl> 1, 6, 1, 9, 15, 0, 2, 0, 4, 0, 0, 4, 6, 6,...
## $ COMP_S                   <dbl> 5, 19, 8, 27, 56, 4, 38, 4, 12, 3, 4, 23, ...
## $ COMP_T                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COMP_U                   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ HOTELS                   <dbl> NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, ...
## $ BEDS                     <dbl> NA, NA, 34, NA, NA, NA, 24, NA, NA, NA, NA...
## $ Pr_Agencies              <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Agencies              <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Bank                  <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Bank                  <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Assets                <dbl> NA, NA, 33724584, 44974716, 76181384, NA, ...
## $ Pu_Assets                <dbl> NA, NA, 67091904, 371922572, 800078483, NA...
## $ Cars                     <dbl> 2158, 2227, 2838, 6928, 5277, 553, 896, 61...
## $ Motorcycles              <dbl> 1246, 1142, 1426, 2953, 25661, 1674, 696, ...
## $ Wheeled_tractor          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, ...
## $ UBER                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MAC                      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ `WAL-MART`               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ POST_OFFICES             <dbl> 1, 1, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...
head(brazil$LONG)
## [1] -49.44055 -47.39683 -48.71881 -45.44619 -48.88440 -39.04755
head(brazil$LAT)
## [1] -16.758812 -18.487565 -16.182672 -19.155848  -1.723470  -7.356977
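
The glimpse() output shows that some headers are awkward to type: ` GVA_TOTAL ` carries leading and trailing spaces, and names such as `IBGE_1-4` need backticks. A minimal sketch of tidying such names with base R follows. The data frame toy is a made-up stand-in, and whether to rename the real columns is a choice the original analysis does not make explicit.

```r
# Toy data frame mimicking three headers from the CSV.
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(toy) <- c(" GVA_TOTAL ", "IBGE_1-4", "GDP_CAPITA")

# Remove leading and trailing whitespace from every column name.
names(toy) <- trimws(names(toy))

# Optionally convert the remaining non-syntactic names to syntactic ones.
clean <- toy
names(clean) <- make.names(names(clean), unique = TRUE)

names(toy)
names(clean)
```

Trimming alone is the lighter option; make.names() also changes characters such as "-", so any later code must use the new names.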

Convert columns to factor types

brazil$STATE <- factor(brazil$STATE)
brazil$CAPITAL <- factor(brazil$CAPITAL)
brazil$REGIAO_TUR <- factor(brazil$REGIAO_TUR)
brazil$CATEGORIA_TUR <- factor(brazil$CATEGORIA_TUR)
brazil$UBER <- factor(brazil$UBER)
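
The five assignments above can also be written as one step over a vector of column names. The sketch below does this on a small made-up data frame, toy, that borrows three of the column names; the values are invented.

```r
# Toy stand-in for `brazil` (values are made up).
toy <- data.frame(
  STATE   = c("GO", "MG", "GO"),
  CAPITAL = c(0, 0, 1),
  UBER    = c(NA, 1, NA),
  stringsAsFactors = FALSE
)

# Convert several columns to factors in one step.
factor_cols <- c("STATE", "CAPITAL", "UBER")
toy[factor_cols] <- lapply(toy[factor_cols], factor)

str(toy)
```

Keeping the column names in one vector makes it easy to add further categorical columns later without repeating the assignment.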

Check the summary of brazil

summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:5573        MG     : 853   0:5546   Min.   :     805  
##  Class :character   SP     : 645   1:  27   1st Qu.:    5235  
##  Mode  :character   RS     : 498            Median :   10934  
##                     BA     : 418            Mean   :   34278  
##                     PR     : 399            3rd Qu.:   23424  
##                     SC     : 295            Max.   :11253503  
##                     (Other):2465            NA's   :8         
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR     IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.0   Min.   :    239   Min.   :     60  
##  1st Qu.:    5230   1st Qu.:     0.0   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10926   Median :     0.0   Median :   3174   Median :   1846  
##  Mean   :   34200   Mean   :    77.5   Mean   :  10303   Mean   :   8859  
##  3rd Qu.:   23390   3rd Qu.:    10.0   3rd Qu.:   6726   3rd Qu.:   4624  
##  Max.   :11133776   Max.   :119727.0   Max.   :3576148   Max.   :3548433  
##  NA's   :8          NA's   :8          NA's   :10        NA's   :10       
##  IBGE_DU_RURAL      IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  487   1st Qu.:    2801   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931   Median :    6170   Median :    92.0   Median :   376  
##  Mean   : 1463   Mean   :   27595   Mean   :   383.3   Mean   :  1544  
##  3rd Qu.: 1832   3rd Qu.:   15302   3rd Qu.:   232.0   3rd Qu.:   951  
##  Max.   :33809   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :81      NA's   :8          NA's   :8          NA's   :8       
##     IBGE_5-9        IBGE_10-14       IBGE_15-59         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   259   1st Qu.:   1734   1st Qu.:    341  
##  Median :   516   Median :   588   Median :   3841   Median :    722  
##  Mean   :  2069   Mean   :  2381   Mean   :  18212   Mean   :   3004  
##  3rd Qu.:  1300   3rd Qu.:  1478   3rd Qu.:   9628   3rd Qu.:   1724  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##  NA's   :8        NA's   :8        NA's   :8         NA's   :8        
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010      IDHM       
##  Min.   :      0.0   Min.   :      0        Min.   :   1      Min.   :0.4180  
##  1st Qu.:    910.2   1st Qu.:   2326        1st Qu.:1392      1st Qu.:0.5990  
##  Median :   3471.5   Median :  13846        Median :2783      Median :0.6650  
##  Mean   :  14179.9   Mean   :  57384        Mean   :2783      Mean   :0.6592  
##  3rd Qu.:  11194.2   3rd Qu.:  55619        3rd Qu.:4174      3rd Qu.:0.7180  
##  Max.   :1205669.0   Max.   :3274885        Max.   :5565      Max.   :0.8620  
##  NA's   :3           NA's   :3              NA's   :8         NA's   :8       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.23  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.40  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   :-32.44  
##  NA's   :8        NA's   :8        NA's   :8        NA's   :9       
##       LAT               ALT               PAY_TV         FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1   Min.   :      3  
##  1st Qu.:-22.838   1st Qu.:   169.8   1st Qu.:     88   1st Qu.:    119  
##  Median :-18.089   Median :   406.5   Median :    247   Median :    327  
##  Mean   :-16.444   Mean   :   893.8   Mean   :   3094   Mean   :   6567  
##  3rd Qu.: -8.489   3rd Qu.:   628.9   3rd Qu.:    815   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668   Max.   :5543127  
##  NA's   :9         NA's   :9          NA's   :3         NA's   :3        
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  59   A   :  51    
##  1st Qu.:   204.44   Vale Do Contestado  :  45   B   : 168    
##  Median :   416.59   Amazônia Atlântica  :  40   C   : 521    
##  Mean   :  1517.44   Araguaia-Tocantins  :  39   D   :1892    
##  3rd Qu.:  1026.57   Cariri              :  37   E   : 653    
##  Max.   :159533.33   (Other)             :3065   NA's:2288    
##  NA's   :3           NA's                :2288                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:5573        Min.   :      0   Min.   :       1  
##  1st Qu.:    5454   Class :character   1st Qu.:   4189   1st Qu.:    1726  
##  Median :   11590   Mode  :character   Median :  20426   Median :    7424  
##  Mean   :   37432                      Mean   :  47271   Mean   :  175928  
##  3rd Qu.:   25296                      3rd Qu.:  51227   3rd Qu.:   41022  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##  NA's   :3                             NA's   :3         NA's   :3         
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       7   Min.   :       17   Min.   :   -14159  
##  1st Qu.:    10112   1st Qu.:   17267   1st Qu.:    42253   1st Qu.:     1305  
##  Median :    31211   Median :   35866   Median :   119492   Median :     5100  
##  Mean   :   489451   Mean   :  123768   Mean   :   832987   Mean   :   118864  
##  3rd Qu.:   115406   3rd Qu.:   89245   3rd Qu.:   313963   3rd Qu.:    22197  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##  NA's   :3           NA's   :3          NA's   :3           NA's   :3          
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       15   Min.   :     815   Min.   :  3191   Length:5573       
##  1st Qu.:    43709   1st Qu.:    5483   1st Qu.:  9058   Class :character  
##  Median :   125153   Median :   11578   Median : 15870   Mode  :character  
##  Mean   :   954584   Mean   :   36998   Mean   : 21126                     
##  3rd Qu.:   329539   3rd Qu.:   25085   3rd Qu.: 26155                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##  NA's   :3           NA's   :3          NA's   :3                          
##   MUN_EXPENDIT          COMP_TOT            COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     6.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.573e+07   1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.746e+07   Median :   162.0   Median :   2.00   Median :  0.000  
##  Mean   :1.043e+08   Mean   :   906.8   Mean   :  18.25   Mean   :  1.852  
##  3rd Qu.:5.666e+07   3rd Qu.:   448.0   3rd Qu.:   8.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :1948.00   Max.   :274.000  
##  NA's   :1492        NA's   :3          NA's   :3         NA's   :3        
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   11.00   Median :  0.0000   Median :  0.000   Median :    4.00  
##  Mean   :   73.44   Mean   :  0.4262   Mean   :  2.029   Mean   :   43.26  
##  3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##  NA's   :3          NA's   :3          NA's   :3         NA's   :3         
##      COMP_G             COMP_H          COMP_I             COMP_J        
##  Min.   :     1.0   Min.   :    0   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    32.0   1st Qu.:    1   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    74.5   Median :    7   Median :    7.00   Median :    1.00  
##  Mean   :   348.0   Mean   :   41   Mean   :   55.88   Mean   :   24.74  
##  3rd Qu.:   199.0   3rd Qu.:   25   3rd Qu.:   24.00   3rd Qu.:    5.00  
##  Max.   :150633.0   Max.   :19515   Max.   :29290.00   Max.   :38720.00  
##  NA's   :3          NA's   :3       NA's   :3          NA's   :3         
##      COMP_K             COMP_L             COMP_M             COMP_N       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.0  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.0  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    4.0  
##  Mean   :   15.55   Mean   :   15.14   Mean   :   51.29   Mean   :   83.7  
##  3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.0  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.0  
##  NA's   :3          NA's   :3          NA's   :3          NA's   :3        
##      COMP_O            COMP_P             COMP_Q             COMP_R       
##  Min.   :  0.000   Min.   :    0.00   Min.   :    0.00   Min.   :   0.00  
##  1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00  
##  Median :  2.000   Median :    6.00   Median :    3.00   Median :   2.00  
##  Mean   :  3.269   Mean   :   30.96   Mean   :   34.15   Mean   :  12.18  
##  3rd Qu.:  3.000   3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00  
##  Max.   :204.000   Max.   :16030.00   Max.   :22248.00   Max.   :6687.00  
##  NA's   :3         NA's   :3          NA's   :3          NA's   :3        
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.61   Mean   :0   Mean   :  0.05027   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##  NA's   :3          NA's   :3   NA's   :3           NA's   :4686    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.383   Mean   :  2.829   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :4686      NA's   :2231       NA's   :2231      NA's   :2231    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602  
##  Median :2.00   Median :3.231e+07   Median :1.339e+08   Median :   1438  
##  Mean   :1.58   Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9859  
##  3rd Qu.:2.00   3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4086  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2231   NA's   :2231        NA's   :2231        NA's   :11       
##   Motorcycles      Wheeled_tractor      UBER           MAC         
##  Min.   :      4   Min.   :   0.000   1   : 125   Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   NA's:5448   1st Qu.:  1.000  
##  Median :   1285   Median :   0.000               Median :  2.000  
##  Mean   :   4879   Mean   :   5.754               Mean   :  4.277  
##  3rd Qu.:   3294   3rd Qu.:   1.000               3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000               Max.   :130.000  
##  NA's   :11        NA's   :11                     NA's   :5407     
##     WAL-MART       POST_OFFICES    
##  Min.   : 1.000   Min.   :  1.000  
##  1st Qu.: 1.000   1st Qu.:  1.000  
##  Median : 1.000   Median :  1.000  
##  Mean   : 2.059   Mean   :  2.081  
##  3rd Qu.: 1.750   3rd Qu.:  2.000  
##  Max.   :26.000   Max.   :225.000  
##  NA's   :5471     NA's   :120


4.2 Check for NAs

From the summary of brazil, we have identified 9 NA values for the LAT and LONG columns. We will now check which rows have NA values for these coordinates.

brazil[!complete.cases(brazil$LAT),]
## # A tibble: 9 x 81
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <fct> <fct>          <dbl>            <dbl>            <dbl>   <dbl>
## 1 Baln~ SC    0                 NA               NA               NA      NA
## 2 Lago~ RS    0                 NA               NA               NA      NA
## 3 Moju~ PA    0                 NA               NA               NA      NA
## 4 Para~ MS    0                 NA               NA               NA      NA
## 5 Pesc~ SC    0                 NA               NA               NA      NA
## 6 Pinh~ RS    0               2130             2130                0     745
## 7 Pint~ RS    0                 NA               NA               NA      NA
## 8 Sant~ BA    0               9648             9648                0    2891
## 9 São ~ PE    0                 NA               NA               NA      NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <fct>, CATEGORIA_TUR <fct>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## #   ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## #   GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## #   COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## #   COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## #   COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## #   COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## #   HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## #   Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <fct>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>

We have identified that the cities that do not have LATLONG are:
1. Balneário Rincão
2. Lagoa Dos Patos
3. Mojuí Dos Campos
4. Paraíso Das Águas
5. Pescaria Brava
6. Pinhal Da Serra
7. Pinto Bandeira
8. Santa Terezinha
9. São Caetano
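
The list above was read off the tibble printout, where city names are truncated. As a minimal sketch, the full names and states can also be pulled out directly. The `cities` data frame below is a toy stand-in for brazil with illustrative values, not the real data.

```r
# Toy stand-in for the brazil tibble (illustrative values only)
cities <- data.frame(
  CITY  = c("Pinhal Da Serra", "Porto Alegre", "Santa Terezinha"),
  STATE = c("RS", "RS", "BA"),
  LONG  = c(NA, -51.23, NA),
  LAT   = c(NA, -30.03, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows where either coordinate is missing,
# and only the columns needed to identify the municipality
missing_coords <- cities[is.na(cities$LONG) | is.na(cities$LAT),
                         c("CITY", "STATE")]
print(missing_coords)
```

Showing STATE next to CITY helps when looking up coordinates, since a municipality name may occur in more than one state.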

We will now fill the LATLONG values for the above cities, using https://pt.db-city.com/ to find the values.

Replace the NA values in LONG and LAT:
brazil$LONG[brazil$CITY == "Balneário Rincão"] <- -49.2361
brazil$LAT[brazil$CITY == "Balneário Rincão"] <- -28.8344

brazil$LONG[brazil$CITY == "Lagoa Dos Patos"] <- -51.4725
brazil$LAT[brazil$CITY == "Lagoa Dos Patos"] <- -31.0697

brazil$LONG[brazil$CITY == "Mojuí Dos Campos"] <- -54.6431
brazil$LAT[brazil$CITY == "Mojuí Dos Campos"] <- -2.68472

brazil$LONG[brazil$CITY == "Paraíso Das Águas"] <- -53.0102
brazil$LAT[brazil$CITY == "Paraíso Das Águas"] <- -19.0257

brazil$LONG[brazil$CITY == "Pescaria Brava"] <- -48.8956
brazil$LAT[brazil$CITY == "Pescaria Brava"] <- -28.4247

brazil$LONG[brazil$CITY == "Pinhal Da Serra"] <- -51.1733
brazil$LAT[brazil$CITY == "Pinhal Da Serra"] <- -27.8747

brazil$LONG[brazil$CITY == "Pinto Bandeira"] <- -51.4503
brazil$LAT[brazil$CITY == "Pinto Bandeira"] <- -29.0978

brazil$LONG[brazil$CITY == "Santa Terezinha"] <- -39.5184
brazil$LAT[brazil$CITY == "Santa Terezinha"] <- -12.7498

brazil$LONG[brazil$CITY == "São Caetano"] <- -36.1459
brazil$LAT[brazil$CITY == "São Caetano"] <- -8.33
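
The pairs of assignments above could also be written as one lookup table plus a single fill step, which makes sign typos easier to spot. This is only a sketch under assumptions: `cities` is a toy stand-in for brazil, `coords` is a hypothetical helper table, and only two of the nine cities are shown.

```r
# Toy stand-in for brazil (illustrative values only)
cities <- data.frame(
  CITY = c("Pescaria Brava", "Pinto Bandeira", "Porto Alegre"),
  LONG = c(NA, NA, -51.23),
  LAT  = c(NA, NA, -30.03),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table holding the coordinates found manually
coords <- data.frame(
  CITY = c("Pescaria Brava", "Pinto Bandeira"),
  LONG = c(-48.8956, -51.4503),
  LAT  = c(-28.4247, -29.0978),
  stringsAsFactors = FALSE
)

# Position of each city in the lookup table (NA if it is not listed)
idx  <- match(cities$CITY, coords$CITY)
# Fill only rows that are listed in the lookup and currently missing
fill <- !is.na(idx) & is.na(cities$LONG)
cities$LONG[fill] <- coords$LONG[idx[fill]]
cities$LAT[fill]  <- coords$LAT[idx[fill]]

# Sanity check: Brazil lies entirely in the western hemisphere,
# so every longitude should be negative
stopifnot(all(cities$LONG < 0))
print(cities)
```

A range check like the last line is a cheap way to catch a stray sign before the coordinates are used for mapping.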

Comparing the cities in brazil and mun, we have identified 2 cities that exist in brazil but not in mun: Santa Terezinha and São Caetano. Hence, we will remove these two cities for consistency.

brazil <- brazil%>%
  filter(CITY!="Santa Terezinha") %>%
  filter(CITY!="São Caetano")
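
One caveat about filtering on CITY alone: if another state has a municipality with the same name, that row is dropped as well. A minimal sketch of a stricter filter that matches on both CITY and STATE is shown below. The `cities` data frame is toy data, and the state code BA is taken from the NA listing earlier in this section.

```r
# Toy data: the same municipality name in two different states
cities <- data.frame(
  CITY  = c("Santa Terezinha", "Santa Terezinha", "Recife"),
  STATE = c("BA", "SC", "PE"),
  stringsAsFactors = FALSE
)

# Drop only the Santa Terezinha that lies in BA; keep any namesake elsewhere
drop_row <- cities$CITY == "Santa Terezinha" & cities$STATE == "BA"
kept <- cities[!drop_row, ]
print(kept)
```

The same condition can be passed to dplyr as `filter(!(CITY == "Santa Terezinha" & STATE == "BA"))`, and São Caetano in PE would be handled the same way. Comparing `nrow()` before and after the filter is a quick way to confirm that exactly the intended rows were removed.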

Check the summary of brazil again

summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:5568        MG     : 853   0:5541   Min.   :     805  
##  Class :character   SP     : 645   1:  27   1st Qu.:    5231  
##  Mode  :character   RS     : 498            Median :   10936  
##                     BA     : 417            Mean   :   34296  
##                     PR     : 399            3rd Qu.:   23513  
##                     SC     : 294            Max.   :11253503  
##                     (Other):2462            NA's   :7         
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR      IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.00   Min.   :    239   Min.   :     60  
##  1st Qu.:    5223   1st Qu.:     0.00   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10934   Median :     0.00   Median :   3178   Median :   1850  
##  Mean   :   34218   Mean   :    77.56   Mean   :  10308   Mean   :   8864  
##  3rd Qu.:   23397   3rd Qu.:    10.00   3rd Qu.:   6727   3rd Qu.:   4628  
##  Max.   :11133776   Max.   :119727.00   Max.   :3576148   Max.   :3548433  
##  NA's   :7          NA's   :7           NA's   :9         NA's   :9        
##  IBGE_DU_RURAL        IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3.0   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  486.8   1st Qu.:    2802   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931.0   Median :    6177   Median :    92.0   Median :   377  
##  Mean   : 1462.6   Mean   :   27612   Mean   :   383.5   Mean   :  1546  
##  3rd Qu.: 1831.2   3rd Qu.:   15306   3rd Qu.:   232.0   3rd Qu.:   952  
##  Max.   :33809.0   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :80        NA's   :7          NA's   :7          NA's   :7       
##     IBGE_5-9        IBGE_10-14       IBGE_15-59         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   260   1st Qu.:   1735   1st Qu.:    341  
##  Median :   516   Median :   589   Median :   3842   Median :    723  
##  Mean   :  2071   Mean   :  2383   Mean   :  18223   Mean   :   3006  
##  3rd Qu.:  1301   3rd Qu.:  1479   3rd Qu.:   9633   3rd Qu.:   1725  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##  NA's   :7        NA's   :7        NA's   :7         NA's   :7        
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010      IDHM       
##  Min.   :      0.0   Min.   :      0        Min.   :   1      Min.   :0.4180  
##  1st Qu.:    910.2   1st Qu.:   2328        1st Qu.:1391      1st Qu.:0.5990  
##  Median :   3471.5   Median :  13846        Median :2782      Median :0.6650  
##  Mean   :  14180.2   Mean   :  57389        Mean   :2783      Mean   :0.6592  
##  3rd Qu.:  11173.2   3rd Qu.:  55594        3rd Qu.:4174      3rd Qu.:0.7180  
##  Max.   :1205669.0   Max.   :3274885        Max.   :5565      Max.   :0.8620  
##  NA's   :2           NA's   :2              NA's   :6         NA's   :6       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.20  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.40  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   : 51.47  
##  NA's   :6        NA's   :6        NA's   :6                        
##       LAT               ALT               PAY_TV           FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1.0   Min.   :      3  
##  1st Qu.:-22.845   1st Qu.:   169.4   1st Qu.:     88.0   1st Qu.:    118  
##  Median :-18.107   Median :   406.4   Median :    247.0   Median :    328  
##  Mean   :-16.457   Mean   :   894.0   Mean   :   3095.8   Mean   :   6570  
##  3rd Qu.: -8.495   3rd Qu.:   628.8   3rd Qu.:    815.5   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668.0   Max.   :5543127  
##                    NA's   :7          NA's   :1           NA's   :1        
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  59   A   :  51    
##  1st Qu.:   204.44   Vale Do Contestado  :  45   B   : 168    
##  Median :   415.86   Amazônia Atlântica  :  40   C   : 521    
##  Mean   :  1517.07   Araguaia-Tocantins  :  39   D   :1890    
##  3rd Qu.:  1026.57   Cariri              :  37   E   : 653    
##  Max.   :159533.33   (Other)             :3063   NA's:2285    
##  NA's   :2           NA's                :2285                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:5568        Min.   :      0   Min.   :       1  
##  1st Qu.:    5452   Class :character   1st Qu.:   4193   1st Qu.:    1725  
##  Median :   11591   Mode  :character   Median :  20432   Median :    7428  
##  Mean   :   37447                      Mean   :  47285   Mean   :  176050  
##  3rd Qu.:   25301                      3rd Qu.:  51227   3rd Qu.:   41240  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##  NA's   :1                             NA's   :2         NA's   :2         
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       7   Min.   :       17   Min.   :   -14159  
##  1st Qu.:    10107   1st Qu.:   17254   1st Qu.:    42223   1st Qu.:     1305  
##  Median :    31214   Median :   35838   Median :   119492   Median :     5108  
##  Mean   :   489787   Mean   :  123829   Mean   :   833504   Mean   :   118947  
##  3rd Qu.:   115503   3rd Qu.:   89301   3rd Qu.:   314139   3rd Qu.:    22208  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##  NA's   :2           NA's   :2          NA's   :2           NA's   :2          
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       15   Min.   :     815   Min.   :  3191   Length:5568       
##  1st Qu.:    43706   1st Qu.:    5480   1st Qu.:  9062   Class :character  
##  Median :   125153   Median :   11584   Median : 15870   Mode  :character  
##  Mean   :   955185   Mean   :   37018   Mean   : 21132                     
##  3rd Qu.:   329764   3rd Qu.:   25098   3rd Qu.: 26156                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##  NA's   :2           NA's   :2          NA's   :2                          
##   MUN_EXPENDIT          COMP_TOT            COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     6.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.573e+07   1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.748e+07   Median :   162.0   Median :   2.00   Median :  0.000  
##  Mean   :1.044e+08   Mean   :   907.3   Mean   :  18.27   Mean   :  1.853  
##  3rd Qu.:5.678e+07   3rd Qu.:   448.8   3rd Qu.:   8.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :1948.00   Max.   :274.000  
##  NA's   :1491        NA's   :2          NA's   :2         NA's   :2        
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   11.00   Median :  0.0000   Median :  0.000   Median :    4.00  
##  Mean   :   73.49   Mean   :  0.4265   Mean   :  2.031   Mean   :   43.29  
##  3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##  NA's   :2          NA's   :2          NA's   :2         NA's   :2         
##      COMP_G             COMP_H             COMP_I             COMP_J        
##  Min.   :     1.0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    75.0   Median :    7.00   Median :    7.00   Median :    1.00  
##  Mean   :   348.2   Mean   :   41.02   Mean   :   55.91   Mean   :   24.76  
##  3rd Qu.:   199.8   3rd Qu.:   25.00   3rd Qu.:   24.00   3rd Qu.:    5.00  
##  Max.   :150633.0   Max.   :19515.00   Max.   :29290.00   Max.   :38720.00  
##  NA's   :2          NA's   :2          NA's   :2          NA's   :2         
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.00  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    4.00  
##  Mean   :   15.56   Mean   :   15.15   Mean   :   51.33   Mean   :   83.76  
##  3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.00  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.00  
##  NA's   :2          NA's   :2          NA's   :2          NA's   :2         
##      COMP_O           COMP_P             COMP_Q             COMP_R       
##  Min.   :  0.00   Min.   :    0.00   Min.   :    0.00   Min.   :   0.00  
##  1st Qu.:  2.00   1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00  
##  Median :  2.00   Median :    6.00   Median :    3.00   Median :   2.00  
##  Mean   :  3.27   Mean   :   30.98   Mean   :   34.17   Mean   :  12.19  
##  3rd Qu.:  3.00   3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00  
##  Max.   :204.00   Max.   :16030.00   Max.   :22248.00   Max.   :6687.00  
##  NA's   :2        NA's   :2          NA's   :2          NA's   :2        
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.64   Mean   :0   Mean   :  0.05031   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##  NA's   :2          NA's   :2   NA's   :2           NA's   :4681    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.383   Mean   :  2.829   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :4681      NA's   :2226       NA's   :2226      NA's   :2226    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602  
##  Median :2.00   Median :3.231e+07   Median :1.339e+08   Median :   1440  
##  Mean   :1.58   Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9864  
##  3rd Qu.:2.00   3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4088  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2226   NA's   :2226        NA's   :2226        NA's   :9        
##   Motorcycles      Wheeled_tractor      UBER           MAC         
##  Min.   :      4   Min.   :   0.000   1   : 125   Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   NA's:5443   1st Qu.:  1.000  
##  Median :   1285   Median :   0.000               Median :  2.000  
##  Mean   :   4881   Mean   :   5.756               Mean   :  4.277  
##  3rd Qu.:   3297   3rd Qu.:   1.000               3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000               Max.   :130.000  
##  NA's   :9         NA's   :9                      NA's   :5402     
##     WAL-MART       POST_OFFICES    
##  Min.   : 1.000   Min.   :  1.000  
##  1st Qu.: 1.000   1st Qu.:  1.000  
##  Median : 1.000   Median :  1.000  
##  Mean   : 2.059   Mean   :  2.081  
##  3rd Qu.: 1.750   3rd Qu.:  2.000  
##  Max.   :26.000   Max.   :225.000  
##  NA's   :5466     NA's   :119


The summary also shows 2 NA values for GDP_CAPITA. Since GDP_CAPITA is the variable we want to explain and it is hard to assign a reliable value to it, we will remove these rows.

brazil <- brazil%>%
  filter(!is.na(`GDP_CAPITA`))
summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:5566        MG     : 853   0:5539   Min.   :     805  
##  Class :character   SP     : 645   1:  27   1st Qu.:    5231  
##  Mode  :character   RS     : 497            Median :   10936  
##                     BA     : 416            Mean   :   34296  
##                     PR     : 399            3rd Qu.:   23513  
##                     SC     : 294            Max.   :11253503  
##                     (Other):2462            NA's   :5         
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR      IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.00   Min.   :    239   Min.   :     60  
##  1st Qu.:    5223   1st Qu.:     0.00   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10934   Median :     0.00   Median :   3178   Median :   1850  
##  Mean   :   34218   Mean   :    77.56   Mean   :  10308   Mean   :   8864  
##  3rd Qu.:   23397   3rd Qu.:    10.00   3rd Qu.:   6727   3rd Qu.:   4628  
##  Max.   :11133776   Max.   :119727.00   Max.   :3576148   Max.   :3548433  
##  NA's   :5          NA's   :5           NA's   :7         NA's   :7        
##  IBGE_DU_RURAL        IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3.0   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  486.8   1st Qu.:    2802   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931.0   Median :    6177   Median :    92.0   Median :   377  
##  Mean   : 1462.6   Mean   :   27612   Mean   :   383.5   Mean   :  1546  
##  3rd Qu.: 1831.2   3rd Qu.:   15306   3rd Qu.:   232.0   3rd Qu.:   952  
##  Max.   :33809.0   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :78        NA's   :5          NA's   :5          NA's   :5       
##     IBGE_5-9        IBGE_10-14       IBGE_15-59         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   260   1st Qu.:   1735   1st Qu.:    341  
##  Median :   516   Median :   589   Median :   3842   Median :    723  
##  Mean   :  2071   Mean   :  2383   Mean   :  18223   Mean   :   3006  
##  3rd Qu.:  1301   3rd Qu.:  1479   3rd Qu.:   9633   3rd Qu.:   1725  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##  NA's   :5        NA's   :5        NA's   :5         NA's   :5        
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010      IDHM       
##  Min.   :      0.0   Min.   :      0        Min.   :   1      Min.   :0.4180  
##  1st Qu.:    910.2   1st Qu.:   2328        1st Qu.:1391      1st Qu.:0.5990  
##  Median :   3471.5   Median :  13846        Median :2782      Median :0.6650  
##  Mean   :  14180.2   Mean   :  57389        Mean   :2782      Mean   :0.6592  
##  3rd Qu.:  11173.2   3rd Qu.:  55594        3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :1205669.0   Max.   :3274885        Max.   :5565      Max.   :0.8620  
##                                             NA's   :5         NA's   :5       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.53  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.22  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   : 51.47  
##  NA's   :5        NA's   :5        NA's   :5                        
##       LAT               ALT               PAY_TV           FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1.0   Min.   :      3  
##  1st Qu.:-22.843   1st Qu.:   169.4   1st Qu.:     88.0   1st Qu.:    118  
##  Median :-18.107   Median :   406.5   Median :    247.0   Median :    328  
##  Mean   :-16.455   Mean   :   894.2   Mean   :   3096.3   Mean   :   6572  
##  3rd Qu.: -8.491   3rd Qu.:   628.9   3rd Qu.:    815.8   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668.0   Max.   :5543127  
##                    NA's   :6                                               
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  59   A   :  51    
##  1st Qu.:   204.43   Vale Do Contestado  :  45   B   : 168    
##  Median :   415.81   Amazônia Atlântica  :  40   C   : 521    
##  Mean   :  1515.52   Araguaia-Tocantins  :  39   D   :1889    
##  3rd Qu.:  1026.38   Cariri              :  37   E   : 653    
##  Max.   :159533.33   (Other)             :3062   NA's:2284    
##  NA's   :1           NA's                :2284                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:5566        Min.   :      0   Min.   :       1  
##  1st Qu.:    5451   Class :character   1st Qu.:   4193   1st Qu.:    1725  
##  Median :   11591   Mode  :character   Median :  20432   Median :    7428  
##  Mean   :   37452                      Mean   :  47285   Mean   :  176050  
##  3rd Qu.:   25303                      3rd Qu.:  51227   3rd Qu.:   41240  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##                                                                            
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       7   Min.   :       17   Min.   :   -14159  
##  1st Qu.:    10107   1st Qu.:   17254   1st Qu.:    42223   1st Qu.:     1305  
##  Median :    31214   Median :   35838   Median :   119492   Median :     5108  
##  Mean   :   489787   Mean   :  123829   Mean   :   833504   Mean   :   118947  
##  3rd Qu.:   115503   3rd Qu.:   89301   3rd Qu.:   314139   3rd Qu.:    22208  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##                                                                                
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       15   Min.   :     815   Min.   :  3191   Length:5566       
##  1st Qu.:    43706   1st Qu.:    5480   1st Qu.:  9062   Class :character  
##  Median :   125153   Median :   11584   Median : 15870   Mode  :character  
##  Mean   :   955185   Mean   :   37018   Mean   : 21132                     
##  3rd Qu.:   329764   3rd Qu.:   25098   3rd Qu.: 26156                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##                                                                            
##   MUN_EXPENDIT          COMP_TOT            COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     6.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.573e+07   1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.749e+07   Median :   162.0   Median :   2.00   Median :  0.000  
##  Mean   :1.044e+08   Mean   :   907.3   Mean   :  18.27   Mean   :  1.853  
##  3rd Qu.:5.679e+07   3rd Qu.:   448.8   3rd Qu.:   8.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :1948.00   Max.   :274.000  
##  NA's   :1490                                                              
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   11.00   Median :  0.0000   Median :  0.000   Median :    4.00  
##  Mean   :   73.49   Mean   :  0.4265   Mean   :  2.031   Mean   :   43.29  
##  3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##                                                                            
##      COMP_G             COMP_H             COMP_I             COMP_J        
##  Min.   :     1.0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    75.0   Median :    7.00   Median :    7.00   Median :    1.00  
##  Mean   :   348.2   Mean   :   41.02   Mean   :   55.91   Mean   :   24.76  
##  3rd Qu.:   199.8   3rd Qu.:   25.00   3rd Qu.:   24.00   3rd Qu.:    5.00  
##  Max.   :150633.0   Max.   :19515.00   Max.   :29290.00   Max.   :38720.00  
##                                                                             
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.00  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    4.00  
##  Mean   :   15.56   Mean   :   15.15   Mean   :   51.33   Mean   :   83.76  
##  3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.00  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.00  
##                                                                             
##      COMP_O           COMP_P             COMP_Q             COMP_R       
##  Min.   :  0.00   Min.   :    0.00   Min.   :    0.00   Min.   :   0.00  
##  1st Qu.:  2.00   1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00  
##  Median :  2.00   Median :    6.00   Median :    3.00   Median :   2.00  
##  Mean   :  3.27   Mean   :   30.98   Mean   :   34.17   Mean   :  12.19  
##  3rd Qu.:  3.00   3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00  
##  Max.   :204.00   Max.   :16030.00   Max.   :22248.00   Max.   :6687.00  
##                                                                          
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.64   Mean   :0   Mean   :  0.05031   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##                                                     NA's   :4679    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.383   Mean   :  2.829   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :4679      NA's   :2224       NA's   :2224      NA's   :2224    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602  
##  Median :2.00   Median :3.231e+07   Median :1.339e+08   Median :   1440  
##  Mean   :1.58   Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9866  
##  3rd Qu.:2.00   3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4089  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2224   NA's   :2224        NA's   :2224        NA's   :8        
##   Motorcycles      Wheeled_tractor      UBER           MAC         
##  Min.   :      4   Min.   :   0.000   1   : 125   Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   NA's:5441   1st Qu.:  1.000  
##  Median :   1285   Median :   0.000               Median :  2.000  
##  Mean   :   4881   Mean   :   5.757               Mean   :  4.277  
##  3rd Qu.:   3298   3rd Qu.:   1.000               3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000               Max.   :130.000  
##  NA's   :8         NA's   :8                      NA's   :5400     
##     WAL-MART       POST_OFFICES    
##  Min.   : 1.000   Min.   :  1.000  
##  1st Qu.: 1.000   1st Qu.:  1.000  
##  Median : 1.000   Median :  1.000  
##  Mean   : 2.059   Mean   :  2.081  
##  3rd Qu.: 1.750   3rd Qu.:  2.000  
##  Max.   :26.000   Max.   :225.000  
##  NA's   :5464     NA's   :117


There are now no more NA values for GDP_CAPITA.

5. Exploratory Data Analysis (EDA)

This section explores the data to help us understand which variables are useful for explaining GDP_CAPITA, and why.

5.1 EDA using statistical graphics

We can plot the distribution of GDP_CAPITA using a histogram.

ggplot(data=brazil, aes(x=`GDP_CAPITA`))+
  geom_histogram(bins=20, color="black", fill="light blue")


The histogram above shows a right-skewed distribution: most municipalities have a relatively low GDP per capita, with a long tail of high values. We will reduce the skew by applying a log transformation.

brazil <- brazil%>%
  mutate(`LOG_GDP_CAPITA`=log(GDP_CAPITA))

Then, we plot the histogram of LOG_GDP_CAPITA.

ggplot(data=brazil, aes(x=`LOG_GDP_CAPITA`))+
  geom_histogram(bins=20, color="black", fill="light blue")


The histogram now is less skewed, and in fact, resembles a normal distribution.
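
The visual impression from the two histograms can also be checked numerically. The sketch below is illustrative only: it uses simulated right-skewed values rather than the real GDP_CAPITA column, and `skewness()` is a hypothetical helper written here in base R, not a function used elsewhere in this report.

```r
# Moment-based sample skewness (hypothetical helper, base R only)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(42)
# Simulated right-skewed values standing in for GDP_CAPITA (assumption)
gdp_sim <- rlnorm(5000, meanlog = 9.7, sdlog = 0.7)

skew_raw <- skewness(gdp_sim)       # clearly positive for right-skewed data
skew_log <- skewness(log(gdp_sim))  # close to 0 after the log transform
print(c(raw = skew_raw, log = skew_log))
```

A skewness near zero after the transformation is consistent with a roughly symmetric histogram, although it does not by itself prove normality.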

5.2 Renaming of variables

Next, we rename some variables whose original names are long or awkward to use in code, for example names containing "$" or "-", or names padded with spaces.

names(brazil)[names(brazil) == "IBGE_CROP_PRODUCTION_$"] <- "IBGE_CROP_PRODUCTION"
names(brazil)[names(brazil) == " GVA_TOTAL "] <- "GVA_TOTAL"
names(brazil)[names(brazil) == "IBGE_15-59"] <- "Active_pop"
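
One caveat with this pattern is that a comparison such as `names(brazil) == "..."` silently does nothing when the old name is misspelt. As a minimal sketch, assuming a toy data frame rather than the real data, the renames can be driven by a lookup vector that fails loudly if a column is missing. The `toy` and `lookup` objects are hypothetical.

```r
# Toy data frame with awkward column names (illustrative only)
toy <- data.frame(1, 2, 3)
names(toy) <- c("IBGE_CROP_PRODUCTION_$", "IBGE_15-59", "CITY")

# Lookup vector: old name = new name
lookup <- c(
  "IBGE_CROP_PRODUCTION_$" = "IBGE_CROP_PRODUCTION",
  "IBGE_15-59"             = "Active_pop"
)

# Stop early if any old name is absent, instead of skipping it silently
missing_cols <- setdiff(names(lookup), names(toy))
if (length(missing_cols) > 0) {
  stop("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Apply all renames in one step
names(toy)[match(names(lookup), names(toy))] <- unname(lookup)
print(names(toy))
```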

5.3 Check the distribution of GDP per CAPITA

GDP_CAPITA by state

We draw a boxplot to illustrate the distribution of GDP_CAPITA across the different states of Brazil.

brazil_states <- brazil%>%
  group_by(STATE)
ggplot(data=brazil_states, mapping=aes(x=STATE, y=GDP_CAPITA)) + geom_boxplot()+
  ggtitle("Distribution of GDP per CAPITA across states in brazil")


A few states (for example BA and MS) have extreme outliers, while others, such as AC and RR, have values that are spread out more evenly.
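
To back up a visual reading of the boxplot, the number of outliers per state can be counted. The sketch below uses a small made-up data frame and base R's `boxplot.stats()`, which flags points beyond 1.5 times the hinge spread. Its hinges can differ slightly from the quartiles ggplot2 uses, so the counts may not match the plot exactly.

```r
# Made-up values for two hypothetical states (illustrative only)
toy <- data.frame(
  STATE      = rep(c("AA", "BB"), each = 6),
  GDP_CAPITA = c(10, 11, 12, 13, 14, 100,
                 10, 11, 12, 13, 14, 15)
)

# Count points flagged as outliers within each state
outlier_counts <- sapply(
  split(toy$GDP_CAPITA, toy$STATE),
  function(v) length(boxplot.stats(v)$out)
)
print(outlier_counts)
```

Here state "AA" has one flagged value (100) and state "BB" has none.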

Let’s take a look at the top 10 municipalities by GDP_CAPITA

We will select the 10 municipalities with the highest GDP_CAPITA and look for variables that might have contributed to their high values.

brazil_gdp_capita <- brazil%>%
  arrange(desc(GDP_CAPITA))%>%
  top_n(n=10, wt=GDP_CAPITA)
head(brazil_gdp_capita)
## # A tibble: 6 x 82
##   CITY  STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
##   <chr> <fct> <fct>          <dbl>            <dbl>            <dbl>   <dbl>
## 1 Paul~ SP    0              82146            81967              179   24311
## 2 Selv~ MS    0               6287             6287                0    2003
## 3 São ~ BA    0              33183            33183                0    9503
## 4 Triu~ RS    0              25793            25787                6    8635
## 5 Brej~ SP    0               2573             2565                8     822
## 6 Seba~ SP    0               3031             3031                0    1055
## # ... with 75 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## #   IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## #   `IBGE_10-14` <dbl>, Active_pop <dbl>, `IBGE_60+` <dbl>,
## #   IBGE_PLANTED_AREA <dbl>, IBGE_CROP_PRODUCTION <dbl>, `IDHM Ranking
## #   2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## #   IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## #   FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <fct>, CATEGORIA_TUR <fct>,
## #   ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## #   GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, GVA_TOTAL <dbl>,
## #   TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>, GVA_MAIN <chr>,
## #   MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>, COMP_B <dbl>,
## #   COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>, COMP_G <dbl>,
## #   COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>, COMP_L <dbl>,
## #   COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>, COMP_Q <dbl>,
## #   COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>, HOTELS <dbl>,
## #   BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>, Pr_Bank <dbl>,
## #   Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## #   Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <fct>, MAC <dbl>,
## #   `WAL-MART` <dbl>, POST_OFFICES <dbl>, LOG_GDP_CAPITA <dbl>


We are able to identify several variables that take high values in the municipalities with high GDP_CAPITA.
For example: IBGE_RES_POP, Active_pop, IBGE_CROP_PRODUCTION, IDHM, PAY_TV, FIXED_PHONES, AREA, GVA_TOTAL, TAXES, GDP, POP_GDP, MUN_EXPENDIT, COMP_TOT, Cars and Motorcycles. This suggests a possible correlation between each of these variables and GDP_CAPITA.
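
Inspecting ten rows by eye gives only a rough hint of correlation. A minimal sketch of a more systematic check is shown below, assuming simulated data: the column names mirror the real ones, but the values and the relationship between TAXES and GDP_CAPITA are invented for illustration. Spearman rank correlation is used because the variables are heavily skewed.

```r
set.seed(1)
n <- 200

# Simulated predictors (assumption: not the real Brazil data)
toy <- data.frame(
  TAXES = rlnorm(n, meanlog = 8, sdlog = 1),
  AREA  = rlnorm(n, meanlog = 6, sdlog = 1)
)
# Invented response that depends on TAXES but not on AREA
toy$GDP_CAPITA <- 0.5 * toy$TAXES + rlnorm(n, meanlog = 7, sdlog = 0.5)

# Rank correlation of each candidate variable with GDP_CAPITA
predictors <- c("TAXES", "AREA")
rho <- sapply(predictors, function(v) {
  cor(toy[[v]], toy$GDP_CAPITA,
      method = "spearman", use = "pairwise.complete.obs")
})
print(round(sort(rho, decreasing = TRUE), 2))
```

Applied to the real data, the same loop could run over the candidate variables listed above, with `use = "pairwise.complete.obs"` handling the NA values.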

5.4 Variables selection

Besides the variables identified in the analysis above, we will also consider the expenditure equation for GDP, which is C + I + G + (X - M). Simply put, it is the sum of Consumption, Investment, Government expenditure, and Net Exports (exports minus imports). We will then identify variables that might have an impact on these components before determining which factors to use.

summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:5566        MG     : 853   0:5539   Min.   :     805  
##  Class :character   SP     : 645   1:  27   1st Qu.:    5231  
##  Mode  :character   RS     : 497            Median :   10936  
##                     BA     : 416            Mean   :   34296  
##                     PR     : 399            3rd Qu.:   23513  
##                     SC     : 294            Max.   :11253503  
##                     (Other):2462            NA's   :5         
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR      IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.00   Min.   :    239   Min.   :     60  
##  1st Qu.:    5223   1st Qu.:     0.00   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10934   Median :     0.00   Median :   3178   Median :   1850  
##  Mean   :   34218   Mean   :    77.56   Mean   :  10308   Mean   :   8864  
##  3rd Qu.:   23397   3rd Qu.:    10.00   3rd Qu.:   6727   3rd Qu.:   4628  
##  Max.   :11133776   Max.   :119727.00   Max.   :3576148   Max.   :3548433  
##  NA's   :5          NA's   :5           NA's   :7         NA's   :7        
##  IBGE_DU_RURAL        IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3.0   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  486.8   1st Qu.:    2802   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931.0   Median :    6177   Median :    92.0   Median :   377  
##  Mean   : 1462.6   Mean   :   27612   Mean   :   383.5   Mean   :  1546  
##  3rd Qu.: 1831.2   3rd Qu.:   15306   3rd Qu.:   232.0   3rd Qu.:   952  
##  Max.   :33809.0   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :78        NA's   :5          NA's   :5          NA's   :5       
##     IBGE_5-9        IBGE_10-14       Active_pop         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   260   1st Qu.:   1735   1st Qu.:    341  
##  Median :   516   Median :   589   Median :   3842   Median :    723  
##  Mean   :  2071   Mean   :  2383   Mean   :  18223   Mean   :   3006  
##  3rd Qu.:  1301   3rd Qu.:  1479   3rd Qu.:   9633   3rd Qu.:   1725  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##  NA's   :5        NA's   :5        NA's   :5         NA's   :5        
##  IBGE_PLANTED_AREA   IBGE_CROP_PRODUCTION IDHM Ranking 2010      IDHM       
##  Min.   :      0.0   Min.   :      0      Min.   :   1      Min.   :0.4180  
##  1st Qu.:    910.2   1st Qu.:   2328      1st Qu.:1391      1st Qu.:0.5990  
##  Median :   3471.5   Median :  13846      Median :2782      Median :0.6650  
##  Mean   :  14180.2   Mean   :  57389      Mean   :2782      Mean   :0.6592  
##  3rd Qu.:  11173.2   3rd Qu.:  55594      3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :1205669.0   Max.   :3274885      Max.   :5565      Max.   :0.8620  
##                                           NA's   :5         NA's   :5       
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.53  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.22  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   : 51.47  
##  NA's   :5        NA's   :5        NA's   :5                        
##       LAT               ALT               PAY_TV           FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1.0   Min.   :      3  
##  1st Qu.:-22.843   1st Qu.:   169.4   1st Qu.:     88.0   1st Qu.:    118  
##  Median :-18.107   Median :   406.5   Median :    247.0   Median :    328  
##  Mean   :-16.455   Mean   :   894.2   Mean   :   3096.3   Mean   :   6572  
##  3rd Qu.: -8.491   3rd Qu.:   628.9   3rd Qu.:    815.8   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668.0   Max.   :5543127  
##                    NA's   :6                                               
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  59   A   :  51    
##  1st Qu.:   204.43   Vale Do Contestado  :  45   B   : 168    
##  Median :   415.81   Amazônia Atlântica  :  40   C   : 521    
##  Mean   :  1515.52   Araguaia-Tocantins  :  39   D   :1889    
##  3rd Qu.:  1026.38   Cariri              :  37   E   : 653    
##  Max.   :159533.33   (Other)             :3062   NA's:2284    
##  NA's   :1           NA's                :2284                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:5566        Min.   :      0   Min.   :       1  
##  1st Qu.:    5451   Class :character   1st Qu.:   4193   1st Qu.:    1725  
##  Median :   11591   Mode  :character   Median :  20432   Median :    7428  
##  Mean   :   37452                      Mean   :  47285   Mean   :  176050  
##  3rd Qu.:   25303                      3rd Qu.:  51227   3rd Qu.:   41240  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##                                                                            
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       7   Min.   :       17   Min.   :   -14159  
##  1st Qu.:    10107   1st Qu.:   17254   1st Qu.:    42223   1st Qu.:     1305  
##  Median :    31214   Median :   35838   Median :   119492   Median :     5108  
##  Mean   :   489787   Mean   :  123829   Mean   :   833504   Mean   :   118947  
##  3rd Qu.:   115503   3rd Qu.:   89301   3rd Qu.:   314139   3rd Qu.:    22208  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##                                                                                
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       15   Min.   :     815   Min.   :  3191   Length:5566       
##  1st Qu.:    43706   1st Qu.:    5480   1st Qu.:  9062   Class :character  
##  Median :   125153   Median :   11584   Median : 15870   Mode  :character  
##  Mean   :   955185   Mean   :   37018   Mean   : 21132                     
##  3rd Qu.:   329764   3rd Qu.:   25098   3rd Qu.: 26156                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##                                                                            
##   MUN_EXPENDIT          COMP_TOT            COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     6.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.573e+07   1st Qu.:    68.0   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.749e+07   Median :   162.0   Median :   2.00   Median :  0.000  
##  Mean   :1.044e+08   Mean   :   907.3   Mean   :  18.27   Mean   :  1.853  
##  3rd Qu.:5.679e+07   3rd Qu.:   448.8   3rd Qu.:   8.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :1948.00   Max.   :274.000  
##  NA's   :1490                                                              
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   11.00   Median :  0.0000   Median :  0.000   Median :    4.00  
##  Mean   :   73.49   Mean   :  0.4265   Mean   :  2.031   Mean   :   43.29  
##  3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##                                                                            
##      COMP_G             COMP_H             COMP_I             COMP_J        
##  Min.   :     1.0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    75.0   Median :    7.00   Median :    7.00   Median :    1.00  
##  Mean   :   348.2   Mean   :   41.02   Mean   :   55.91   Mean   :   24.76  
##  3rd Qu.:   199.8   3rd Qu.:   25.00   3rd Qu.:   24.00   3rd Qu.:    5.00  
##  Max.   :150633.0   Max.   :19515.00   Max.   :29290.00   Max.   :38720.00  
##                                                                             
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.00  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    4.00  
##  Mean   :   15.56   Mean   :   15.15   Mean   :   51.33   Mean   :   83.76  
##  3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.00  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.00  
##                                                                             
##      COMP_O           COMP_P             COMP_Q             COMP_R       
##  Min.   :  0.00   Min.   :    0.00   Min.   :    0.00   Min.   :   0.00  
##  1st Qu.:  2.00   1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.00  
##  Median :  2.00   Median :    6.00   Median :    3.00   Median :   2.00  
##  Mean   :  3.27   Mean   :   30.98   Mean   :   34.17   Mean   :  12.19  
##  3rd Qu.:  3.00   3rd Qu.:   17.00   3rd Qu.:   12.00   3rd Qu.:   6.00  
##  Max.   :204.00   Max.   :16030.00   Max.   :22248.00   Max.   :6687.00  
##                                                                          
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.64   Mean   :0   Mean   :  0.05031   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##                                                     NA's   :4679    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.383   Mean   :  2.829   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :4679      NA's   :2224       NA's   :2224      NA's   :2224    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.047e+07   1st Qu.:    602  
##  Median :2.00   Median :3.231e+07   Median :1.339e+08   Median :   1440  
##  Mean   :1.58   Mean   :9.180e+09   Mean   :6.005e+09   Mean   :   9866  
##  3rd Qu.:2.00   3rd Qu.:1.148e+08   3rd Qu.:4.970e+08   3rd Qu.:   4089  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2224   NA's   :2224        NA's   :2224        NA's   :8        
##   Motorcycles      Wheeled_tractor      UBER           MAC         
##  Min.   :      4   Min.   :   0.000   1   : 125   Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   NA's:5441   1st Qu.:  1.000  
##  Median :   1285   Median :   0.000               Median :  2.000  
##  Mean   :   4881   Mean   :   5.757               Mean   :  4.277  
##  3rd Qu.:   3298   3rd Qu.:   1.000               3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000               Max.   :130.000  
##  NA's   :8         NA's   :8                      NA's   :5400     
##     WAL-MART       POST_OFFICES     LOG_GDP_CAPITA  
##  Min.   : 1.000   Min.   :  1.000   Min.   : 8.068  
##  1st Qu.: 1.000   1st Qu.:  1.000   1st Qu.: 9.112  
##  Median : 1.000   Median :  1.000   Median : 9.672  
##  Mean   : 2.059   Mean   :  2.081   Mean   : 9.697  
##  3rd Qu.: 1.750   3rd Qu.:  2.000   3rd Qu.:10.172  
##  Max.   :26.000   Max.   :225.000   Max.   :12.659  
##  NA's   :5464     NA's   :117

Raw Variables that we will consider using:
IBGE_RES_POP
Active_pop
IBGE_CROP_PRODUCTION
IDHM_Longevidade
IDHM_Educacao
IDHM_Renda
PAY_TV
FIXED_PHONES
AREA
GVA_AGROPEC
GVA_INDUSTRY
GVA_SERVICES
TAXES
GDP
POP_GDP
GDP_CAPITA
MUN_EXPENDIT
COMP_TOT
Cars
Motorcycles
Pr_Assets
Pu_Assets

5.5 DATA CLEANING

By variables

IBGE_RES_POP

We choose IBGE_RES_POP because it is the resident population of each municipality, covering both Brazilian and foreign residents.
IBGE_RES_POP has 5 NA values, which could be due to the particular municipality not having any recorded population. We will remove these rows, as it is hard to assign a value to them and dropping so few rows should have little impact on the analysis of GDP_CAPITA.
After removing these 5 rows, the number of NAs in other variables decreases too. This is because the same municipalities have NA values for several variables.

brazil <- brazil%>%
  filter(!is.na(`IBGE_RES_POP`))
summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:5561        MG     : 853   0:5534   Min.   :     805  
##  Class :character   SP     : 645   1:  27   1st Qu.:    5231  
##  Mode  :character   RS     : 496            Median :   10936  
##                     BA     : 416            Mean   :   34296  
##                     PR     : 399            3rd Qu.:   23513  
##                     SC     : 292            Max.   :11253503  
##                     (Other):2460                              
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR      IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     805   Min.   :     0.00   Min.   :    239   Min.   :     60  
##  1st Qu.:    5223   1st Qu.:     0.00   1st Qu.:   1572   1st Qu.:    874  
##  Median :   10934   Median :     0.00   Median :   3178   Median :   1850  
##  Mean   :   34218   Mean   :    77.56   Mean   :  10308   Mean   :   8864  
##  3rd Qu.:   23397   3rd Qu.:    10.00   3rd Qu.:   6727   3rd Qu.:   4628  
##  Max.   :11133776   Max.   :119727.00   Max.   :3576148   Max.   :3548433  
##                                         NA's   :2         NA's   :2        
##  IBGE_DU_RURAL        IBGE_POP            IBGE_1            IBGE_1-4     
##  Min.   :    3.0   Min.   :     174   Min.   :     0.0   Min.   :     5  
##  1st Qu.:  486.8   1st Qu.:    2802   1st Qu.:    38.0   1st Qu.:   158  
##  Median :  931.0   Median :    6177   Median :    92.0   Median :   377  
##  Mean   : 1462.6   Mean   :   27612   Mean   :   383.5   Mean   :  1546  
##  3rd Qu.: 1831.2   3rd Qu.:   15306   3rd Qu.:   232.0   3rd Qu.:   952  
##  Max.   :33809.0   Max.   :10463636   Max.   :129464.0   Max.   :514794  
##  NA's   :73                                                              
##     IBGE_5-9        IBGE_10-14       Active_pop         IBGE_60+      
##  Min.   :     7   Min.   :    12   Min.   :     94   Min.   :     29  
##  1st Qu.:   220   1st Qu.:   260   1st Qu.:   1735   1st Qu.:    341  
##  Median :   516   Median :   589   Median :   3842   Median :    723  
##  Mean   :  2071   Mean   :  2383   Mean   :  18223   Mean   :   3006  
##  3rd Qu.:  1301   3rd Qu.:  1479   3rd Qu.:   9633   3rd Qu.:   1725  
##  Max.   :684443   Max.   :783702   Max.   :7058221   Max.   :1293012  
##                                                                       
##  IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM Ranking 2010      IDHM       
##  Min.   :      0   Min.   :      0      Min.   :   1      Min.   :0.4180  
##  1st Qu.:    911   1st Qu.:   2333      1st Qu.:1391      1st Qu.:0.5990  
##  Median :   3473   Median :  13845      Median :2782      Median :0.6650  
##  Mean   :  14171   Mean   :  57356      Mean   :2782      Mean   :0.6592  
##  3rd Qu.:  11171   3rd Qu.:  55490      3rd Qu.:4173      3rd Qu.:0.7180  
##  Max.   :1205669   Max.   :3274885      Max.   :5565      Max.   :0.8620  
##                                                                           
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4000   Min.   :0.6720   Min.   :0.2070   Min.   :-72.92  
##  1st Qu.:0.5720   1st Qu.:0.7690   1st Qu.:0.4900   1st Qu.:-50.87  
##  Median :0.6540   Median :0.8080   Median :0.5600   Median :-46.52  
##  Mean   :0.6429   Mean   :0.8016   Mean   :0.5591   Mean   :-46.21  
##  3rd Qu.:0.7070   3rd Qu.:0.8360   3rd Qu.:0.6310   3rd Qu.:-41.41  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   : 51.47  
##                                                                     
##       LAT               ALT               PAY_TV         FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1   Min.   :      3  
##  1st Qu.:-22.841   1st Qu.:   169.4   1st Qu.:     88   1st Qu.:    118  
##  Median :-18.097   Median :   406.5   Median :    247   Median :    328  
##  Mean   :-16.450   Mean   :   894.2   Mean   :   3099   Mean   :   6577  
##  3rd Qu.: -8.490   3rd Qu.:   628.9   3rd Qu.:    816   3rd Qu.:   1151  
##  Max.   :  4.585   Max.   :874579.0   Max.   :2047668   Max.   :5543127  
##                    NA's   :1                                             
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  59   A   :  51    
##  1st Qu.:   204.54   Vale Do Contestado  :  45   B   : 168    
##  Median :   415.86   Amazônia Atlântica  :  40   C   : 521    
##  Mean   :  1515.03   Araguaia-Tocantins  :  39   D   :1889    
##  3rd Qu.:  1025.73   Cariri              :  37   E   : 648    
##  Max.   :159533.33   (Other)             :3057   NA's:2284    
##  NA's   :1           NA's                :2284                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:5561        Min.   :      0   Min.   :       1  
##  1st Qu.:    5450   Class :character   1st Qu.:   4193   1st Qu.:    1724  
##  Median :   11591   Mode  :character   Median :  20434   Median :    7432  
##  Mean   :   37477                      Mean   :  47277   Mean   :  176171  
##  3rd Qu.:   25311                      3rd Qu.:  51238   3rd Qu.:   41026  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##                                                                            
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       7   Min.   :       17   Min.   :   -14159  
##  1st Qu.:    10112   1st Qu.:   17252   1st Qu.:    42253   1st Qu.:     1303  
##  Median :    31216   Median :   35747   Median :   119481   Median :     5108  
##  Mean   :   490191   Mean   :  123904   Mean   :   834110   Mean   :   119046  
##  3rd Qu.:   115644   3rd Qu.:   89363   3rd Qu.:   314190   3rd Qu.:    22251  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##                                                                                
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       15   Min.   :     815   Min.   :  3191   Length:5561       
##  1st Qu.:    43645   1st Qu.:    5481   1st Qu.:  9062   Class :character  
##  Median :   125111   Median :   11584   Median : 15866   Mode  :character  
##  Mean   :   955869   Mean   :   37043   Mean   : 21125                     
##  3rd Qu.:   329780   3rd Qu.:   25114   3rd Qu.: 26157                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##                                                                            
##   MUN_EXPENDIT          COMP_TOT          COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     6   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.573e+07   1st Qu.:    68   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.749e+07   Median :   163   Median :   2.00   Median :  0.000  
##  Mean   :1.045e+08   Mean   :   908   Mean   :  18.28   Mean   :  1.854  
##  3rd Qu.:5.681e+07   3rd Qu.:   450   3rd Qu.:   8.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446   Max.   :1948.00   Max.   :274.000  
##  NA's   :1489                                                            
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    3.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   11.00   Median :  0.0000   Median :  0.000   Median :    4.00  
##  Mean   :   73.55   Mean   :  0.4267   Mean   :  2.032   Mean   :   43.31  
##  3rd Qu.:   39.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   15.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##                                                                            
##      COMP_G             COMP_H             COMP_I             COMP_J        
##  Min.   :     1.0   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    32.0   1st Qu.:    1.00   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    75.0   Median :    7.00   Median :    7.00   Median :    1.00  
##  Mean   :   348.5   Mean   :   41.05   Mean   :   55.96   Mean   :   24.78  
##  3rd Qu.:   200.0   3rd Qu.:   25.00   3rd Qu.:   24.00   3rd Qu.:    5.00  
##  Max.   :150633.0   Max.   :19515.00   Max.   :29290.00   Max.   :38720.00  
##                                                                             
##      COMP_K             COMP_L             COMP_M             COMP_N        
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    1.00  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    4.00  
##  Mean   :   15.58   Mean   :   15.16   Mean   :   51.37   Mean   :   83.82  
##  3rd Qu.:    2.00   3rd Qu.:    3.00   3rd Qu.:   13.00   3rd Qu.:   14.00  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.00  
##                                                                             
##      COMP_O            COMP_P          COMP_Q            COMP_R      
##  Min.   :  1.000   Min.   :    0   Min.   :    0.0   Min.   :   0.0  
##  1st Qu.:  2.000   1st Qu.:    2   1st Qu.:    1.0   1st Qu.:   0.0  
##  Median :  2.000   Median :    6   Median :    3.0   Median :   2.0  
##  Mean   :  3.272   Mean   :   31   Mean   :   34.2   Mean   :  12.2  
##  3rd Qu.:  3.000   3rd Qu.:   17   3rd Qu.:   12.0   3rd Qu.:   6.0  
##  Max.   :204.000   Max.   :16030   Max.   :22248.0   Max.   :6687.0  
##                                                                      
##      COMP_S             COMP_T      COMP_U              HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   :  0.00000   Min.   : 1.000  
##  1st Qu.:    5.00   1st Qu.:0   1st Qu.:  0.00000   1st Qu.: 1.000  
##  Median :   12.00   Median :0   Median :  0.00000   Median : 1.000  
##  Mean   :   51.68   Mean   :0   Mean   :  0.05035   Mean   : 3.131  
##  3rd Qu.:   31.00   3rd Qu.:0   3rd Qu.:  0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :123.00000   Max.   :97.000  
##                                                     NA's   :4674    
##       BEDS          Pr_Agencies        Pu_Agencies        Pr_Bank      
##  Min.   :    2.0   Min.   :   0.000   Min.   :  0.00   Min.   : 0.000  
##  1st Qu.:   40.0   1st Qu.:   0.000   1st Qu.:  1.00   1st Qu.: 0.000  
##  Median :   82.0   Median :   1.000   Median :  2.00   Median : 1.000  
##  Mean   :  257.5   Mean   :   3.384   Mean   :  2.83   Mean   : 1.312  
##  3rd Qu.:  200.0   3rd Qu.:   2.000   3rd Qu.:  2.00   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.00   Max.   :83.000  
##  NA's   :4674      NA's   :2220       NA's   :2220     NA's   :2220    
##     Pu_Bank       Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.00   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      2  
##  1st Qu.:1.00   1st Qu.:0.000e+00   1st Qu.:4.048e+07   1st Qu.:    602  
##  Median :2.00   Median :3.234e+07   Median :1.339e+08   Median :   1440  
##  Mean   :1.58   Mean   :9.183e+09   Mean   :6.007e+09   Mean   :   9873  
##  3rd Qu.:2.00   3rd Qu.:1.149e+08   3rd Qu.:4.976e+08   3rd Qu.:   4095  
##  Max.   :8.00   Max.   :1.947e+13   Max.   :8.016e+12   Max.   :5740995  
##  NA's   :2220   NA's   :2220        NA's   :2220        NA's   :8        
##   Motorcycles      Wheeled_tractor      UBER           MAC         
##  Min.   :      4   Min.   :   0.000   1   : 125   Min.   :  1.000  
##  1st Qu.:    591   1st Qu.:   0.000   NA's:5436   1st Qu.:  1.000  
##  Median :   1285   Median :   0.000               Median :  2.000  
##  Mean   :   4885   Mean   :   5.761               Mean   :  4.277  
##  3rd Qu.:   3299   3rd Qu.:   1.000               3rd Qu.:  3.000  
##  Max.   :1134570   Max.   :3236.000               Max.   :130.000  
##  NA's   :8         NA's   :8                      NA's   :5395     
##     WAL-MART       POST_OFFICES     LOG_GDP_CAPITA  
##  Min.   : 1.000   Min.   :  1.000   Min.   : 8.068  
##  1st Qu.: 1.000   1st Qu.:  1.000   1st Qu.: 9.112  
##  Median : 1.000   Median :  1.000   Median : 9.672  
##  Mean   : 2.059   Mean   :  2.081   Mean   : 9.697  
##  3rd Qu.: 1.750   3rd Qu.:  2.000   3rd Qu.:10.172  
##  Max.   :26.000   Max.   :225.000   Max.   :12.659  
##  NA's   :5459     NA's   :117


IBGE_CROP_PRODUCTION is chosen because it represents the earnings from crop production, which contribute to the economy. It is also reported that agriculture is one of the principal bases of Brazil’s economy.

IDHM_Longevidade

IDHM is the Human Development Index, which is calculated from IDHM_Longevidade, IDHM_Educacao, and IDHM_Renda.
A higher HDI may go together with a stronger economy: longer life expectancy (IDHM_Longevidade) and a higher education level (IDHM_Educacao) mean that more people are able to work and contribute to the economy, which raises income.
We will extract these three variables individually and analyse each of them separately.
There is 1 NA value in IDHM_Longevidade. As it is hard to assign a value to this NA, we will drop this row.

brazil <- brazil%>%
  filter(!is.na(IDHM_Longevidade))
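
Before dropping rows one variable at a time, it can help to count the NA values of all candidate variables in one pass. The following is a minimal sketch on a made-up data frame (`toy` and its values are illustrative assumptions, not the actual `brazil` data); it uses only base R.

```r
# Toy stand-in for the `brazil` data frame (values are made up).
toy <- data.frame(
  IDHM_Longevidade = c(0.81, NA, 0.79, 0.84),
  AREA             = c(120.5, 300.2, NA, 95.0),
  Cars             = c(1500, 230, 80, NA)
)

# Number of NA values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows that are complete across every listed variable.
vars <- c("IDHM_Longevidade", "AREA", "Cars")
toy_complete <- toy[complete.cases(toy[, vars]), ]
print(nrow(toy_complete))
```

Counting first makes it clear how many rows each variable would cost before any filtering is applied.
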
PAY_TV & FIXED_PHONES

PAY_TV and FIXED_PHONES are selected because they are luxury goods that only better-off people can afford. A municipality with a higher number of PAY_TV and/or FIXED_PHONES may therefore have a higher GDP, although this could also be due to other factors. We will include PAY_TV and FIXED_PHONES in our analysis for now.
There are no NA values in PAY_TV and FIXED_PHONES.

AREA

A municipality with a larger area might have more space for economic development, which could affect GDP.
There is 1 NA value for AREA, possibly because the city is not developed enough for its area to have been measured. As it is hard to assign a value to this NA, we will drop this row.

brazil <- brazil%>%
  filter(!is.na(`AREA`))
MUN_EXPENDIT

A municipality with higher MUN_EXPENDIT spends more, which might reflect its residents being better off. Expenditure might in turn affect GDP.
There are 1489 NA values in MUN_EXPENDIT. These municipalities may have no recorded expenditure, but as it is difficult to assign a value to them, we will drop these rows.

brazil <- brazil%>%
  filter(!is.na(MUN_EXPENDIT))
Cars

Owning a vehicle such as a car or a motorcycle is part of private consumption, and it might have an impact on GDP.
There are 8 NA values for Cars. This could be because these cities do not have any cars, but as we are unable to assign a value with confidence, we will drop these rows.

brazil <- brazil%>%
  filter(!is.na(`Cars`))
Motorcycles

The summary above shows 8 NA values for Motorcycles. This could be because these cities do not have any motorcycles, but as we are unable to assign a value with confidence, we will drop these rows.

brazil <- brazil%>%
  filter(!is.na(`Motorcycles`))
summary(brazil)
##      CITY               STATE      CAPITAL   IBGE_RES_POP     
##  Length:4066        MG     : 615   0:4041   Min.   :     815  
##  Class :character   SP     : 542   1:  25   1st Qu.:    5156  
##  Mode  :character   RS     : 468            Median :   11031  
##                     PR     : 332            Mean   :   37587  
##                     BA     : 250            3rd Qu.:   23922  
##                     SC     : 249            Max.   :11253503  
##                     (Other):1610                              
##  IBGE_RES_POP_BRAS  IBGE_RES_POP_ESTR      IBGE_DU        IBGE_DU_URBAN    
##  Min.   :     815   Min.   :     0.00   Min.   :    290   Min.   :     60  
##  1st Qu.:    5150   1st Qu.:     0.00   1st Qu.:   1574   1st Qu.:    865  
##  Median :   11017   Median :     0.00   Median :   3241   Median :   1924  
##  Mean   :   37491   Mean   :    96.74   Mean   :  11420   Mean   :  10018  
##  3rd Qu.:   23904   3rd Qu.:    11.00   3rd Qu.:   6948   3rd Qu.:   5050  
##  Max.   :11133776   Max.   :119727.00   Max.   :3576148   Max.   :3548433  
##                                         NA's   :1         NA's   :1        
##  IBGE_DU_RURAL      IBGE_POP            IBGE_1            IBGE_1-4       
##  Min.   :    3   Min.   :     174   Min.   :     0.0   Min.   :     5.0  
##  1st Qu.:  477   1st Qu.:    2722   1st Qu.:    37.0   1st Qu.:   150.2  
##  Median :  915   Median :    6308   Median :    92.0   Median :   375.0  
##  Mean   : 1421   Mean   :   30805   Mean   :   420.8   Mean   :  1691.2  
##  3rd Qu.: 1748   3rd Qu.:   16381   3rd Qu.:   240.0   3rd Qu.:   979.0  
##  Max.   :33809   Max.   :10463636   Max.   :129464.0   Max.   :514794.0  
##  NA's   :55                                                              
##     IBGE_5-9          IBGE_10-14       Active_pop         IBGE_60+        
##  Min.   :     7.0   Min.   :    12   Min.   :     94   Min.   :     36.0  
##  1st Qu.:   212.0   1st Qu.:   250   1st Qu.:   1702   1st Qu.:    339.2  
##  Median :   514.5   Median :   592   Median :   3941   Median :    754.0  
##  Mean   :  2262.5   Mean   :  2613   Mean   :  20404   Mean   :   3413.6  
##  3rd Qu.:  1334.8   3rd Qu.:  1536   3rd Qu.:  10388   3rd Qu.:   1864.2  
##  Max.   :684443.0   Max.   :783702   Max.   :7058221   Max.   :1293012.0  
##                                                                           
##  IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM Ranking 2010      IDHM       
##  Min.   :      0   Min.   :      0      Min.   :   1      Min.   :0.4400  
##  1st Qu.:   1037   1st Qu.:   2762      1st Qu.:1203      1st Qu.:0.6100  
##  Median :   4038   Median :  16585      Median :2472      Median :0.6800  
##  Mean   :  15485   Mean   :  61425      Mean   :2576      Mean   :0.6686  
##  3rd Qu.:  12625   3rd Qu.:  60599      3rd Qu.:3904      3rd Qu.:0.7230  
##  Max.   :1205669   Max.   :3274885      Max.   :5564      Max.   :0.8620  
##                                                                           
##    IDHM_Renda     IDHM_Longevidade IDHM_Educacao         LONG       
##  Min.   :0.4170   Min.   :0.6720   Min.   :0.2660   Min.   :-72.92  
##  1st Qu.:0.5830   1st Qu.:0.7750   1st Qu.:0.5030   1st Qu.:-51.40  
##  Median :0.6680   Median :0.8130   Median :0.5730   Median :-47.40  
##  Mean   :0.6533   Mean   :0.8062   Mean   :0.5704   Mean   :-46.65  
##  3rd Qu.:0.7150   3rd Qu.:0.8400   3rd Qu.:0.6390   3rd Qu.:-41.78  
##  Max.   :0.8910   Max.   :0.8940   Max.   :0.8250   Max.   : 51.47  
##                                                                     
##       LAT               ALT               PAY_TV           FIXED_PHONES    
##  Min.   :-33.688   Min.   :     0.0   Min.   :      1.0   Min.   :      4  
##  1st Qu.:-23.640   1st Qu.:   195.3   1st Qu.:     90.0   1st Qu.:    137  
##  Median :-19.806   Median :   430.9   Median :    260.0   Median :    375  
##  Mean   :-17.545   Mean   :  1022.2   Mean   :   3641.6   Mean   :   7899  
##  3rd Qu.: -9.681   3rd Qu.:   641.5   3rd Qu.:    880.5   3rd Qu.:   1430  
##  Max.   :  3.350   Max.   :874579.0   Max.   :2047668.0   Max.   :5543127  
##                    NA's   :1                                               
##       AREA                          REGIAO_TUR   CATEGORIA_TUR
##  Min.   :     3.57   Corredores Das Águas:  49   A   :  46    
##  1st Qu.:   196.56   Vale Do Contestado  :  37   B   : 140    
##  Median :   401.38   Cariri              :  30   C   : 414    
##  Mean   :  1320.71   Rota Do Yucumã      :  30   D   :1383    
##  3rd Qu.:   943.82   Trilhas Do Rio Doce :  29   E   : 485    
##  Max.   :122461.09   (Other)             :2293   NA's:1598    
##                      NA's                :1598                
##  ESTIMATED_POP      RURAL_URBAN         GVA_AGROPEC       GVA_INDUSTRY     
##  Min.   :     786   Length:4066        Min.   :      0   Min.   :       1  
##  1st Qu.:    5354   Class :character   1st Qu.:   4835   1st Qu.:    1880  
##  Median :   11661   Mode  :character   Median :  22184   Median :    8559  
##  Mean   :   41050                      Mean   :  49549   Mean   :  204934  
##  3rd Qu.:   25854                      3rd Qu.:  54289   3rd Qu.:   50500  
##  Max.   :12176866                      Max.   :1402282   Max.   :63306755  
##                                                                            
##   GVA_SERVICES         GVA_PUBLIC         GVA_TOTAL             TAXES          
##  Min.   :        2   Min.   :       9   Min.   :       17   Min.   :     -235  
##  1st Qu.:    10570   1st Qu.:   17205   1st Qu.:    45494   1st Qu.:     1468  
##  Median :    34758   Median :   36062   Median :   126260   Median :     6116  
##  Mean   :   581146   Mean   :  135527   Mean   :   952261   Mean   :   141943  
##  3rd Qu.:   133709   3rd Qu.:   92256   3rd Qu.:   342166   3rd Qu.:    26746  
##  Max.   :464656988   Max.   :41902893   Max.   :569910503   Max.   :117125387  
##                                                                                
##       GDP               POP_GDP           GDP_CAPITA       GVA_MAIN        
##  Min.   :       18   Min.   :     815   Min.   :  4586   Length:4066       
##  1st Qu.:    46564   1st Qu.:    5380   1st Qu.:  9706   Class :character  
##  Median :   133937   Median :   11636   Median : 17614   Mode  :character  
##  Mean   :  1107127   Mean   :   40570   Mean   : 22469                     
##  3rd Qu.:   364308   3rd Qu.:   25550   3rd Qu.: 28018                     
##  Max.   :687035890   Max.   :12038175   Max.   :314638                     
##                                                                            
##   MUN_EXPENDIT          COMP_TOT            COMP_A            COMP_B       
##  Min.   :1.421e+06   Min.   :     8.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:1.574e+07   1st Qu.:    76.0   1st Qu.:   1.00   1st Qu.:  0.000  
##  Median :2.749e+07   Median :   184.0   Median :   3.00   Median :  0.000  
##  Mean   :1.046e+08   Mean   :  1063.5   Mean   :  20.35   Mean   :  1.933  
##  3rd Qu.:5.680e+07   3rd Qu.:   519.8   3rd Qu.:  10.00   3rd Qu.:  2.000  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :1948.00   Max.   :274.000  
##                                                                            
##      COMP_C             COMP_D             COMP_E            COMP_F        
##  Min.   :    0.00   Min.   :  0.0000   Min.   :  0.000   Min.   :    0.00  
##  1st Qu.:    4.00   1st Qu.:  0.0000   1st Qu.:  0.000   1st Qu.:    1.00  
##  Median :   14.00   Median :  0.0000   Median :  0.000   Median :    5.00  
##  Mean   :   85.95   Mean   :  0.4921   Mean   :  2.342   Mean   :   51.08  
##  3rd Qu.:   47.00   3rd Qu.:  0.0000   3rd Qu.:  1.000   3rd Qu.:   17.00  
##  Max.   :31566.00   Max.   :332.0000   Max.   :657.000   Max.   :25222.00  
##                                                                            
##      COMP_G           COMP_H             COMP_I             COMP_J        
##  Min.   :     2   Min.   :    0.00   Min.   :    0.00   Min.   :    0.00  
##  1st Qu.:    34   1st Qu.:    2.00   1st Qu.:    2.00   1st Qu.:    0.00  
##  Median :    81   Median :    9.00   Median :    8.00   Median :    1.00  
##  Mean   :   400   Mean   :   48.33   Mean   :   65.35   Mean   :   30.48  
##  3rd Qu.:   231   3rd Qu.:   29.00   3rd Qu.:   28.00   3rd Qu.:    6.00  
##  Max.   :150633   Max.   :19515.00   Max.   :29290.00   Max.   :38720.00  
##                                                                           
##      COMP_K             COMP_L             COMP_M             COMP_N       
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :    0.0  
##  1st Qu.:    0.00   1st Qu.:    0.00   1st Qu.:    1.00   1st Qu.:    2.0  
##  Median :    0.00   Median :    0.00   Median :    4.00   Median :    5.0  
##  Mean   :   19.63   Mean   :   18.61   Mean   :   62.66   Mean   :  102.8  
##  3rd Qu.:    3.00   3rd Qu.:    4.00   3rd Qu.:   15.00   3rd Qu.:   17.0  
##  Max.   :23738.00   Max.   :14003.00   Max.   :49181.00   Max.   :76757.0  
##                                                                            
##      COMP_O            COMP_P             COMP_Q             COMP_R      
##  Min.   :  1.000   Min.   :    0.00   Min.   :    0.00   Min.   :   0.0  
##  1st Qu.:  2.000   1st Qu.:    2.00   1st Qu.:    1.00   1st Qu.:   0.0  
##  Median :  2.000   Median :    6.00   Median :    4.00   Median :   2.0  
##  Mean   :  3.425   Mean   :   35.48   Mean   :   40.57   Mean   :  14.5  
##  3rd Qu.:  4.000   3rd Qu.:   19.00   3rd Qu.:   14.00   3rd Qu.:   7.0  
##  Max.   :153.000   Max.   :16030.00   Max.   :22248.00   Max.   :6687.0  
##                                                                          
##      COMP_S             COMP_T      COMP_U             HOTELS      
##  Min.   :    0.00   Min.   :0   Min.   : 0.00000   Min.   : 1.000  
##  1st Qu.:    6.00   1st Qu.:0   1st Qu.: 0.00000   1st Qu.: 1.000  
##  Median :   14.00   Median :0   Median : 0.00000   Median : 1.000  
##  Mean   :   59.36   Mean   :0   Mean   : 0.03788   Mean   : 3.381  
##  3rd Qu.:   35.00   3rd Qu.:0   3rd Qu.: 0.00000   3rd Qu.: 3.000  
##  Max.   :24832.00   Max.   :0   Max.   :64.00000   Max.   :97.000  
##                                                    NA's   :3425    
##       BEDS          Pr_Agencies        Pu_Agencies         Pr_Bank      
##  Min.   :    5.0   Min.   :   0.000   Min.   :  0.000   Min.   : 0.000  
##  1st Qu.:   42.0   1st Qu.:   0.000   1st Qu.:  1.000   1st Qu.: 0.000  
##  Median :   90.0   Median :   1.000   Median :  2.000   Median : 1.000  
##  Mean   :  290.8   Mean   :   3.919   Mean   :  3.069   Mean   : 1.383  
##  3rd Qu.:  239.0   3rd Qu.:   2.000   3rd Qu.:  2.000   3rd Qu.: 2.000  
##  Max.   :13247.0   Max.   :1693.000   Max.   :626.000   Max.   :83.000  
##  NA's   :3425      NA's   :1544       NA's   :1544      NA's   :1544    
##     Pu_Bank        Pr_Assets           Pu_Assets              Cars        
##  Min.   :0.000   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :      3  
##  1st Qu.:1.000   1st Qu.:0.000e+00   1st Qu.:4.478e+07   1st Qu.:    723  
##  Median :2.000   Median :3.749e+07   Median :1.505e+08   Median :   1657  
##  Mean   :1.615   Mean   :1.198e+10   Mean   :4.336e+09   Mean   :  11497  
##  3rd Qu.:2.000   3rd Qu.:1.296e+08   3rd Qu.:5.602e+08   3rd Qu.:   4727  
##  Max.   :8.000   Max.   :1.947e+13   Max.   :2.893e+12   Max.   :5740995  
##  NA's   :1544    NA's   :1544        NA's   :1544                         
##   Motorcycles        Wheeled_tractor      UBER           MAC         
##  Min.   :     33.0   Min.   :   0.000   1   : 103   Min.   :  1.000  
##  1st Qu.:    607.2   1st Qu.:   0.000   NA's:3963   1st Qu.:  1.000  
##  Median :   1370.5   Median :   0.000               Median :  2.000  
##  Mean   :   5401.2   Mean   :   7.081               Mean   :  4.597  
##  3rd Qu.:   3638.0   3rd Qu.:   2.000               3rd Qu.:  4.000  
##  Max.   :1134570.0   Max.   :3236.000               Max.   :130.000  
##                                                     NA's   :3927     
##     WAL-MART       POST_OFFICES     LOG_GDP_CAPITA  
##  Min.   : 1.000   Min.   :  1.000   Min.   : 8.431  
##  1st Qu.: 1.000   1st Qu.:  1.000   1st Qu.: 9.181  
##  Median : 1.000   Median :  1.000   Median : 9.776  
##  Mean   : 2.151   Mean   :  2.179   Mean   : 9.764  
##  3rd Qu.: 2.000   3rd Qu.:  2.000   3rd Qu.:10.241  
##  Max.   :26.000   Max.   :225.000   Max.   :12.659  
##  NA's   :3973     NA's   :85
Pr_Assets

Pr_Assets and Pu_Assets are selected because assets can earn interest and fund investment, and investment is a factor that affects GDP.
There are 1544 NA values for Pr_Assets. This could be because these cities do not have any private assets. We will drop these rows as it is difficult to assign a value to them.

brazil <- brazil%>%
  filter(!is.na(Pr_Assets))
Pu_Assets

There are 1544 NA values for Pu_Assets. This could be because these cities do not have any public assets. We will drop these rows as it is difficult to assign a value to them.

brazil <- brazil%>%
  filter(!is.na(Pu_Assets))
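
If the missing asset values really do mean "no assets", replacing them with zero is an alternative to dropping the rows. The sketch below compares the two options on a made-up data frame; `toy`, `dropped` and `zero_filled` are hypothetical names, and zero-filling is an assumption shown for comparison, not the approach taken in this report.

```r
# Toy stand-in for the asset columns (values are made up).
toy <- data.frame(
  CITY      = c("A", "B", "C", "D"),
  Pr_Assets = c(3.2e7, NA, 0, 1.1e8),
  Pu_Assets = c(4.0e7, NA, 1.5e8, 5.6e8)
)

# Option 1 (used in this report): drop rows with missing asset values.
dropped <- toy[!is.na(toy$Pr_Assets) & !is.na(toy$Pu_Assets), ]

# Option 2 (hypothetical alternative): treat missing as "no assets".
zero_filled <- toy
zero_filled$Pr_Assets[is.na(zero_filled$Pr_Assets)] <- 0
zero_filled$Pu_Assets[is.na(zero_filled$Pu_Assets)] <- 0

cat("rows kept when dropping:    ", nrow(dropped), "\n")
cat("rows kept when zero-filling:", nrow(zero_filled), "\n")
```

Dropping keeps the analysis free of assumed values at the cost of sample size; zero-filling keeps every municipality but is only defensible if NA truly means zero.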


After removing all the NA values, we can compute a correlation matrix.

Correlation matrix

We will select our variables first

brazil3 <- brazil%>%
  dplyr::select("IBGE_RES_POP", "Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV", "FIXED_PHONES", "AREA", "GVA_AGROPEC", "GVA_INDUSTRY", "GVA_SERVICES", "TAXES", "GDP", "POP_GDP", "GDP_CAPITA", "MUN_EXPENDIT", "COMP_TOT", "Cars", "Motorcycles", "Pr_Assets", "Pu_Assets")


Before plotting, we will double-check that the selected variables contain no NA values.

summary(brazil3)
##   IBGE_RES_POP        Active_pop      IBGE_CROP_PRODUCTION IDHM_Longevidade
##  Min.   :    1641   Min.   :    358   Min.   :      0      Min.   :0.6770  
##  1st Qu.:   10328   1st Qu.:   3826   1st Qu.:   6572      1st Qu.:0.7940  
##  Median :   18948   Median :   7970   Median :  32439      Median :0.8230  
##  Mean   :   56654   Mean   :  31660   Mean   :  85956      Mean   :0.8164  
##  3rd Qu.:   37652   3rd Qu.:  17681   3rd Qu.:  88482      3rd Qu.:0.8450  
##  Max.   :11253503   Max.   :7058221   Max.   :3274885      Max.   :0.8940  
##  IDHM_Educacao      IDHM_Renda         PAY_TV           FIXED_PHONES    
##  Min.   :0.3150   Min.   :0.4380   Min.   :      7.0   Min.   :     30  
##  1st Qu.:0.5320   1st Qu.:0.6240   1st Qu.:    220.2   1st Qu.:    375  
##  Median :0.6020   Median :0.6930   Median :    583.5   Median :    902  
##  Mean   :0.5932   Mean   :0.6766   Mean   :   5779.7   Mean   :  12631  
##  3rd Qu.:0.6590   3rd Qu.:0.7290   3rd Qu.:   1763.8   3rd Qu.:   3096  
##  Max.   :0.8250   Max.   :0.8910   Max.   :2047668.0   Max.   :5543127  
##       AREA            GVA_AGROPEC       GVA_INDUSTRY       GVA_SERVICES      
##  Min.   :     3.61   Min.   :      0   Min.   :       2   Min.   :        9  
##  1st Qu.:   249.04   1st Qu.:  11106   1st Qu.:    5661   1st Qu.:    30409  
##  Median :   495.38   Median :  34661   Median :   23559   Median :    91102  
##  Mean   :  1684.11   Mean   :  67995   Mean   :  324442   Mean   :   925640  
##  3rd Qu.:  1195.48   3rd Qu.:  80643   3rd Qu.:  139071   3rd Qu.:   266119  
##  Max.   :122461.09   Max.   :1402282   Max.   :63306755   Max.   :464656988  
##      TAXES                GDP               POP_GDP           GDP_CAPITA    
##  Min.   :     -235   Min.   :       34   Min.   :    1573   Min.   :  4849  
##  1st Qu.:     4210   1st Qu.:   116175   1st Qu.:   10810   1st Qu.: 12565  
##  Median :    15862   Median :   259097   Median :   20184   Median : 21051  
##  Mean   :   226337   Mean   :  1738249   Mean   :   61249   Mean   : 25624  
##  3rd Qu.:    61560   3rd Qu.:   709298   3rd Qu.:   40625   3rd Qu.: 31663  
##  Max.   :117125387   Max.   :687035890   Max.   :12038175   Max.   :314638  
##   MUN_EXPENDIT          COMP_TOT             Cars          Motorcycles     
##  Min.   :2.823e+06   Min.   :    15.0   Min.   :      7   Min.   :    114  
##  1st Qu.:2.593e+07   1st Qu.:   188.0   1st Qu.:   1671   1st Qu.:   1197  
##  Median :4.494e+07   Median :   377.0   Median :   3442   Median :   2650  
##  Mean   :1.575e+08   Mean   :  1664.0   Mean   :  18065   Mean   :   8205  
##  3rd Qu.:8.971e+07   3rd Qu.:   926.8   3rd Qu.:   9085   3rd Qu.:   6498  
##  Max.   :4.577e+10   Max.   :530446.0   Max.   :5740995   Max.   :1134570  
##    Pr_Assets           Pu_Assets        
##  Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0.000e+00   1st Qu.:4.478e+07  
##  Median :3.749e+07   Median :1.505e+08  
##  Mean   :1.198e+10   Mean   :4.336e+09  
##  3rd Qu.:1.296e+08   3rd Qu.:5.602e+08  
##  Max.   :1.947e+13   Max.   :2.893e+12

After confirming there are no NA values, we can now plot the correlation matrix

corrplot(cor(brazil3), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.4, method="number", type="upper")


As seen in the correlation plot, many variables are highly correlated. To reduce this multicollinearity, we will derive new variables.

In addition, IBGE_RES_POP and Active_pop have a correlation value of 1, so the two variables carry almost the same information. Both measure the population of the municipality, but we choose Active_pop because it is the economically active group, which earns income, spends it, and pays taxes, and is therefore likely to be the most closely related to GDP_CAPITA.
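
Reading highly correlated pairs off a dense plot is error-prone, so they can also be listed programmatically. This is a minimal sketch on simulated data; the 0.8 cut-off and the names `toy`, `cm` and `high_pairs` are assumptions made for illustration.

```r
set.seed(1)
x1 <- rnorm(100)
toy <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(100, sd = 0.05),  # near-duplicate of x1
  x3 = rnorm(100)                   # unrelated variable
)

cm <- cor(toy)
# Keep each pair once by blanking the diagonal and the lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Positions where the absolute correlation exceeds the cut-off (NAs are skipped).
idx <- which(abs(cm) > 0.8, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)
print(high_pairs)
```

Applied to the selected variables, the same steps would list every pair that needs a decision about which member to keep.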

5.6 Deriving new variables

Since GDP_CAPITA is GDP divided by population, counts and totals that grow with population are not directly comparable with it. For consistency, we will divide several variables by population (POP_GDP).

PAY_TV

We will divide PAY_TV by population to obtain the approximate number of PAY_TV per person.

brazil <- brazil %>%
  mutate(PAY_TV_p = PAY_TV/POP_GDP)
FIXED_PHONES

We will divide FIXED_PHONES by population to obtain the approximate number of FIXED_PHONES per person.

brazil <- brazil %>%
  mutate(FIXED_PHONES_p = FIXED_PHONES/POP_GDP)
Cars

We will divide Cars by population to obtain the approximate number of cars per person.

brazil <- brazil %>%
  mutate(Cars_p =Cars/POP_GDP)
Motorcycles

We will divide Motorcycles by population to obtain the approximate number of motorcycles per person.

brazil <- brazil %>%
  mutate(Motorcycles_p = Motorcycles/POP_GDP)
GVA_AGROPEC

GVA (gross value added) tells us how much value is added in a municipality, and it is used in the calculation of GDP. We divide GVA_AGROPEC by population to obtain the approximate gross value added per person.

brazil <- brazil %>%
  mutate(GVA_AGROPEC_p = GVA_AGROPEC/POP_GDP)
GVA_INDUSTRY

We divide GVA_INDUSTRY by population to obtain the approximate gross value added per person.

brazil <- brazil %>%
  mutate(GVA_INDUSTRY_p = GVA_INDUSTRY/POP_GDP)
GVA_SERVICES

We divide GVA_SERVICES by population to obtain the approximate gross value added per person.

brazil <- brazil %>%
  mutate(GVA_SERVICES_p = GVA_SERVICES/POP_GDP)
MUN_EXPENDIT

We divide MUN_EXPENDIT by population to obtain the approximate expenditure per person in each municipality.

brazil <- brazil %>%
  mutate(MUN_EXPENDIT_p = MUN_EXPENDIT/POP_GDP)
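
The per-person variables above all follow the same pattern, so they can also be derived in a single step. The sketch below does this in base R on a made-up data frame; `toy` and `count_vars` are hypothetical names, and the `_p` suffix simply mirrors the naming used above.

```r
# Toy stand-in with a population column and two count columns (values are made up).
toy <- data.frame(
  POP_GDP = c(1000, 2500, 400),
  PAY_TV  = c(50, 300, 8),
  Cars    = c(200, 900, 40)
)

count_vars <- c("PAY_TV", "Cars")

# Divide each listed column by population; new columns get a "_p" suffix.
toy[paste0(count_vars, "_p")] <- lapply(
  toy[count_vars],
  function(x) x / toy$POP_GDP
)
print(toy)
```

Keeping the variable list in one vector makes it harder to forget a column or to divide by the wrong denominator.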

We will also derive pop_density: the higher the population density of a city, the higher the potential spending in that city, which may increase GDP_CAPITA. pop_density is calculated as population divided by area.

pop_density

brazil <- brazil %>%
  mutate(pop_density = POP_GDP/AREA)

We will also derive a tax-to-GDP ratio, calculated as TAXES divided by GDP. It expresses the tax collected relative to the size of the local economy. A higher ratio may suggest a higher ability for sustainable economic growth.

tax_to_gdp

brazil <- brazil %>%
  mutate(tax_to_gdp = TAXES/GDP)

5.7 Correlation matrix

Select variables first

brazil4 <- brazil%>%
  dplyr::select("Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV_p", "FIXED_PHONES_p", "GVA_AGROPEC_p", "GVA_INDUSTRY_p", "GVA_SERVICES_p", "tax_to_gdp", "MUN_EXPENDIT_p", "GDP_CAPITA", "COMP_TOT", "Cars_p", "Motorcycles_p", "Pr_Assets", "Pu_Assets", "pop_density")

We will again confirm that there are no NA values in our data set.

summary(brazil4)
##    Active_pop      IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Educacao   
##  Min.   :    358   Min.   :      0      Min.   :0.6770   Min.   :0.3150  
##  1st Qu.:   3826   1st Qu.:   6572      1st Qu.:0.7940   1st Qu.:0.5320  
##  Median :   7970   Median :  32439      Median :0.8230   Median :0.6020  
##  Mean   :  31660   Mean   :  85956      Mean   :0.8164   Mean   :0.5932  
##  3rd Qu.:  17681   3rd Qu.:  88482      3rd Qu.:0.8450   3rd Qu.:0.6590  
##  Max.   :7058221   Max.   :3274885      Max.   :0.8940   Max.   :0.8250  
##    IDHM_Renda        PAY_TV_p         FIXED_PHONES_p     GVA_AGROPEC_p    
##  Min.   :0.4380   Min.   :0.0006389   Min.   :0.001669   Min.   : 0.0000  
##  1st Qu.:0.6240   1st Qu.:0.0168189   1st Qu.:0.028060   1st Qu.: 0.3333  
##  Median :0.6930   Median :0.0339047   Median :0.068502   Median : 1.5590  
##  Mean   :0.6766   Mean   :0.0470778   Mean   :0.086474   Mean   : 3.7226  
##  3rd Qu.:0.7290   3rd Qu.:0.0614278   3rd Qu.:0.117982   3rd Qu.: 4.8311  
##  Max.   :0.8910   Max.   :0.4805407   Max.   :1.037262   Max.   :75.9531  
##  GVA_INDUSTRY_p      GVA_SERVICES_p        tax_to_gdp       MUN_EXPENDIT_p    
##  Min.   :  0.00017   Min.   :  0.00124   Min.   : -2.0920   Min.   :   90.45  
##  1st Qu.:  0.43654   1st Qu.:  2.49250   1st Qu.:  0.0398   1st Qu.: 1912.71  
##  Median :  1.54206   Median :  6.22514   Median :  0.0691   Median : 2353.07  
##  Mean   :  4.46843   Mean   :  7.78589   Mean   : 10.1533   Mean   : 2582.38  
##  3rd Qu.:  4.67058   3rd Qu.: 10.88331   3rd Qu.:  0.1127   3rd Qu.: 2912.14  
##  Max.   :183.79405   Max.   :108.80127   Max.   :324.6684   Max.   :12680.22  
##    GDP_CAPITA        COMP_TOT            Cars_p          Motorcycles_p     
##  Min.   :  4849   Min.   :    15.0   Min.   :0.0003393   Min.   :0.008056  
##  1st Qu.: 12565   1st Qu.:   188.0   1st Qu.:0.1123884   1st Qu.:0.095797  
##  Median : 21051   Median :   377.0   Median :0.2648455   Median :0.133799  
##  Mean   : 25624   Mean   :  1664.0   Mean   :0.2440236   Mean   :0.146095  
##  3rd Qu.: 31663   3rd Qu.:   926.8   3rd Qu.:0.3552831   3rd Qu.:0.183595  
##  Max.   :314638   Max.   :530446.0   Max.   :0.6409822   Max.   :0.534068  
##    Pr_Assets           Pu_Assets          pop_density       
##  Min.   :0.000e+00   Min.   :0.000e+00   Min.   :    0.225  
##  1st Qu.:0.000e+00   1st Qu.:4.478e+07   1st Qu.:   17.222  
##  Median :3.749e+07   Median :1.505e+08   Median :   33.820  
##  Mean   :1.198e+10   Mean   :4.336e+09   Mean   :  179.058  
##  3rd Qu.:1.296e+08   3rd Qu.:5.602e+08   3rd Qu.:   82.319  
##  Max.   :1.947e+13   Max.   :2.893e+12   Max.   :13533.497
corrplot(cor(brazil4), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="number", type="upper")

corrplot(cor(brazil4), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="square", type="upper")

We chose to order the variables alphabetically for easy identification. From the correlation plot, we see that Active_pop is highly correlated with COMP_TOT (correlation value = 0.97). In view of this, only one of the two should be included in the subsequent model building. We therefore exclude COMP_TOT.

5.8 Correlation matrix without COMP_TOT

corrplot(cor(brazil4[,c(1:13, 15:19)]), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="number", type="upper")

corrplot(cor(brazil4[,c(1:13, 15:19)]), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="square", type="upper")
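
Column positions such as `c(1:13, 15:19)` silently point at the wrong variable if the column order ever changes. A hedged alternative is to exclude the column by name, sketched here on a made-up data frame (`toy` and `keep` are hypothetical names).

```r
# Toy stand-in with three of the variable names used above (values are made up).
toy <- data.frame(
  Active_pop = c(1, 2, 3, 4, 5),
  GDP_CAPITA = c(2, 4, 5, 9, 10),
  COMP_TOT   = c(1, 2, 3, 4, 6)
)

# Keep every column except COMP_TOT, whatever its position.
keep <- setdiff(names(toy), "COMP_TOT")
cm <- cor(toy[, keep])
print(colnames(cm))
```

The resulting matrix can be passed to `corrplot()` in the same way as before.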


Having selected the variables we need, we can now do the modelling.
We will fit a linear model of GDP_CAPITA on all the independent variables that we have identified previously.

6. Multiple Linear Regression

6.1 Join the aspatial and geospatial data

Before joining, we found that the municipality names in name_muni (in mun) and CITY (in brazil) are not written consistently. We will therefore convert both variables to upper case.

mun$name_muni <- toupper(mun$name_muni)
brazil$CITY <- toupper(brazil$CITY)
brazil_mun <- right_join(mun, brazil, by=c("name_muni"="CITY", "abbrev_state"="STATE"))
## Warning: Column `abbrev_state`/`STATE` joining character vector and factor,
## coercing into character vector
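
Upper-casing both name columns does not guarantee that every municipality finds a match, so it is worth listing the keys that fail to join. The sketch below shows one way to do this in base R on made-up tables; `mun_toy`, `brazil_toy` and the city names are illustrative assumptions.

```r
# Toy stand-ins for the boundary table and the attribute table (names are made up).
mun_toy <- data.frame(
  name_muni    = c("ALPHA", "BETA", "GAMMA"),
  abbrev_state = c("SP", "PA", "RN"),
  stringsAsFactors = FALSE
)
brazil_toy <- data.frame(
  CITY  = c("ALPHA", "BETTA", "GAMMA"),   # "BETTA" is a deliberate mismatch
  STATE = factor(c("SP", "PA", "RN"))
)

# Build a combined city-and-state key on each side.
mun_key    <- paste(mun_toy$name_muni, mun_toy$abbrev_state, sep = "|")
brazil_key <- paste(brazil_toy$CITY, as.character(brazil_toy$STATE), sep = "|")

# Keys present in the attribute table but absent from the boundary table.
unmatched <- setdiff(brazil_key, mun_key)
print(unmatched)
```

Converting STATE with `as.character()` before joining would also avoid the factor coercion warning shown above.
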

Check for missing geometries

any(is.na(st_dimension(brazil_mun)))
## [1] FALSE
qtm(brazil_mun, "GDP_CAPITA", border=NULL, scale=0.6) + tm_legend(main.title="GDP_CAPITA in 2016", main.title.position="centre")

6.2 Building Multiple linear regression

Select variables to use first

brazil6 <- brazil%>%
  dplyr::select("Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV_p", "FIXED_PHONES_p", "GVA_AGROPEC_p", "GVA_INDUSTRY_p", "GVA_SERVICES_p", "tax_to_gdp", "MUN_EXPENDIT_p", "GDP_CAPITA", "Cars_p", "Motorcycles_p", "Pr_Assets", "Pu_Assets", "pop_density")
Histogram plots of the selected variables

Before we conduct the linear regression, we will plot histograms of each of the variables, to have an idea of the distribution of the variables.

Active_pop <- ggplot(data=brazil6, aes(x=Active_pop))+
  geom_histogram(bins=25, color="black", fill="light blue")

IBGE_CROP_PRODUCTION <- ggplot(data=brazil6, aes(x=IBGE_CROP_PRODUCTION))+
  geom_histogram(bins=25, color="black", fill="light blue")

IDHM_Longevidade <- ggplot(data=brazil6, aes(x=IDHM_Longevidade))+
  geom_histogram(bins=25, color="black", fill="light blue")

IDHM_Educacao <- ggplot(data=brazil6, aes(x=IDHM_Educacao))+
  geom_histogram(bins=25, color="black", fill="light blue")

IDHM_Renda <- ggplot(data=brazil6, aes(x=IDHM_Renda))+
  geom_histogram(bins=25, color="black", fill="light blue")

PAY_TV_p <- ggplot(data=brazil6, aes(x=PAY_TV_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

FIXED_PHONES_p <- ggplot(data=brazil6, aes(x=FIXED_PHONES_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

GVA_AGROPEC_p <- ggplot(data=brazil6, aes(x=GVA_AGROPEC_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

GVA_INDUSTRY_p <- ggplot(data=brazil6, aes(x=GVA_INDUSTRY_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

GVA_SERVICES_p <- ggplot(data=brazil6, aes(x=GVA_SERVICES_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

tax_to_gdp <- ggplot(data=brazil6, aes(x=tax_to_gdp))+
  geom_histogram(bins=25, color="black", fill="light blue")

MUN_EXPENDIT_p <- ggplot(data=brazil6, aes(x=MUN_EXPENDIT_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

GDP_CAPITA <- ggplot(data=brazil6, aes(x=GDP_CAPITA))+
  geom_histogram(bins=25, color="black", fill="light blue")

Cars_p <- ggplot(data=brazil6, aes(x=Cars_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

Motorcycles_p <- ggplot(data=brazil6, aes(x=Motorcycles_p))+
  geom_histogram(bins=25, color="black", fill="light blue")

Pr_Assets <- ggplot(data=brazil6, aes(x=Pr_Assets))+
  geom_histogram(bins=25, color="black", fill="light blue")

Pu_Assets <- ggplot(data=brazil6, aes(x=Pu_Assets)) +
  geom_histogram(bins=25, color="black", fill="light blue")
  
pop_density <- ggplot(data=brazil6, aes(x=pop_density))+
  geom_histogram(bins=25, color="black", fill="light blue")

ggarrange(Active_pop, IBGE_CROP_PRODUCTION, IDHM_Longevidade, IDHM_Educacao, IDHM_Renda, PAY_TV_p, FIXED_PHONES_p, GVA_AGROPEC_p, GVA_INDUSTRY_p, GVA_SERVICES_p, tax_to_gdp, MUN_EXPENDIT_p, GDP_CAPITA, Cars_p, Motorcycles_p, Pr_Assets, Pu_Assets, pop_density, ncol=3, nrow=6)
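
The eighteen histogram definitions above differ only in the variable name, so they can also be generated in a loop. This is a minimal sketch on simulated data, assuming the ggplot2 package is installed; `toy`, `make_hist` and `plots` are hypothetical names.

```r
library(ggplot2)

set.seed(7)
toy <- data.frame(a = rnorm(50), b = runif(50), c = rexp(50))

# Build one histogram for the column named `v`, styled like the plots above.
make_hist <- function(v, data) {
  ggplot(data, aes(x = .data[[v]])) +
    geom_histogram(bins = 25, color = "black", fill = "light blue") +
    labs(x = v)
}

# One plot per column, stored in a named list.
plots <- lapply(names(toy), make_hist, data = toy)
names(plots) <- names(toy)
```

The list can then be laid out with `ggarrange(plotlist = plots, ncol = 3, nrow = 1)` from ggpubr.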


We can tell from the histograms that most of the variables are skewed, either to the left or to the right. Some, however, resemble a normal distribution, for example IDHM_Longevidade, IDHM_Educacao, IDHM_Renda, and Motorcycles_p (even though it is slightly right-skewed).
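
The visual impression of skewness can be backed by a number. The sketch below computes the moment-based sample skewness in base R; the helper name `skewness` is our own (it is not taken from a package), and the vectors are made up. Values near 0 suggest symmetry, positive values a right skew, and negative values a left skew.

```r
# Moment-based sample skewness: third central moment over the
# second central moment raised to the power 1.5.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

symmetric_x <- c(1, 2, 3)      # symmetric around 2
right_x     <- c(1, 1, 1, 10)  # long right tail

print(skewness(symmetric_x))
print(skewness(right_x))
```

Applying `sapply(brazil6, skewness)` would give one value per selected variable.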

Specification 1: GDP_CAPITA against all Xs
brazil_lm <- lm(GDP_CAPITA~., data=brazil6)
summary(brazil_lm)
## 
## Call:
## lm(formula = GDP_CAPITA ~ ., data = brazil6)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -36833  -3483  -1012   1588 223485 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.314e+04  5.651e+03  -2.326  0.02011 *  
## Active_pop           -1.338e-03  2.810e-03  -0.476  0.63404    
## IBGE_CROP_PRODUCTION  6.195e-03  1.121e-03   5.524 3.65e-08 ***
## IDHM_Longevidade      1.839e+03  8.235e+03   0.223  0.82335    
## IDHM_Educacao        -2.345e+03  3.799e+03  -0.617  0.53706    
## IDHM_Renda            1.903e+04  6.977e+03   2.728  0.00642 ** 
## PAY_TV_p             -1.145e+04  5.345e+03  -2.142  0.03231 *  
## FIXED_PHONES_p        1.249e+04  4.289e+03   2.912  0.00362 ** 
## GVA_AGROPEC_p         7.222e+02  3.940e+01  18.328  < 2e-16 ***
## GVA_INDUSTRY_p        1.091e+03  2.146e+01  50.864  < 2e-16 ***
## GVA_SERVICES_p        8.076e+02  2.991e+01  27.005  < 2e-16 ***
## tax_to_gdp            1.586e+01  5.950e+00   2.665  0.00774 ** 
## MUN_EXPENDIT_p        3.499e+00  2.213e-01  15.813  < 2e-16 ***
## Cars_p                4.868e+03  2.923e+03   1.665  0.09598 .  
## Motorcycles_p         2.096e+03  2.758e+03   0.760  0.44735    
## Pr_Assets             1.751e-09  8.438e-10   2.076  0.03804 *  
## Pu_Assets            -1.197e-08  7.664e-09  -1.562  0.11835    
## pop_density           1.315e+00  3.109e-01   4.230 2.42e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9102 on 2504 degrees of freedom
## Multiple R-squared:  0.8263, Adjusted R-squared:  0.8251 
## F-statistic: 700.6 on 17 and 2504 DF,  p-value: < 2.2e-16
AIC(brazil_lm)
## [1] 53159.52
BIC(brazil_lm)
## [1] 53270.34


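With reference to the report above, not all the independent variables are statistically significant at the 5% level. We revise the model by dropping IDHM_Longevidade, IDHM_Educacao, PAY_TV_p, Cars_p and Motorcycles_p. Note that Active_pop and Pu_Assets are kept in the revised call even though their p-values exceed 0.05, while PAY_TV_p is dropped even though its p-value (0.032) is below 0.05.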

brazil_lm2 <- lm(GDP_CAPITA~Active_pop + IBGE_CROP_PRODUCTION  + IDHM_Renda + Pr_Assets + Pu_Assets + FIXED_PHONES_p + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + pop_density, data=brazil6)
summary(brazil_lm2)
## 
## Call:
## lm(formula = GDP_CAPITA ~ Active_pop + IBGE_CROP_PRODUCTION + 
##     IDHM_Renda + Pr_Assets + Pu_Assets + FIXED_PHONES_p + GVA_AGROPEC_p + 
##     GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + 
##     pop_density, data = brazil6)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -37751  -3439  -1025   1539 223491 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.605e+04  2.363e+03  -6.794 1.36e-11 ***
## Active_pop           -2.318e-03  2.765e-03  -0.838  0.40198    
## IBGE_CROP_PRODUCTION  6.010e-03  1.104e-03   5.446 5.66e-08 ***
## IDHM_Renda            2.578e+04  3.909e+03   6.595 5.18e-11 ***
## Pr_Assets             1.978e-09  8.398e-10   2.355  0.01858 *  
## Pu_Assets            -1.093e-08  7.641e-09  -1.431  0.15258    
## FIXED_PHONES_p        9.577e+03  3.901e+03   2.455  0.01415 *  
## GVA_AGROPEC_p         7.374e+02  3.851e+01  19.150  < 2e-16 ***
## GVA_INDUSTRY_p        1.094e+03  2.131e+01  51.342  < 2e-16 ***
## GVA_SERVICES_p        8.016e+02  2.970e+01  26.993  < 2e-16 ***
## MUN_EXPENDIT_p        3.384e+00  2.148e-01  15.757  < 2e-16 ***
## tax_to_gdp            1.555e+01  5.941e+00   2.617  0.00893 ** 
## pop_density           1.211e+00  3.079e-01   3.932 8.67e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9109 on 2509 degrees of freedom
## Multiple R-squared:  0.8257, Adjusted R-squared:  0.8248 
## F-statistic: 990.2 on 12 and 2509 DF,  p-value: < 2.2e-16
ols_regress(brazil_lm2)
##                            Model Summary                             
## --------------------------------------------------------------------
## R                       0.909       RMSE                   9109.326 
## R-Squared               0.826       Coef. Var                35.549 
## Adj. R-Squared          0.825       MSE                82979821.453 
## Pred R-Squared          0.819       MAE                    4298.952 
## --------------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                        ANOVA                                        
## -----------------------------------------------------------------------------------
##                         Sum of                                                     
##                        Squares          DF        Mean Square       F         Sig. 
## -----------------------------------------------------------------------------------
## Regression    986015686339.237          12    82167973861.603    990.216    0.0000 
## Residual      208196372025.838        2509       82979821.453                      
## Total             1.194212e+12        2521                                         
## -----------------------------------------------------------------------------------
## 
##                                              Parameter Estimates                                              
## -------------------------------------------------------------------------------------------------------------
##                model          Beta    Std. Error    Std. Beta      t        Sig          lower         upper 
## -------------------------------------------------------------------------------------------------------------
##          (Intercept)    -16052.762      2362.934                 -6.794    0.000    -20686.263    -11419.261 
##           Active_pop        -0.002         0.003       -0.019    -0.838    0.402        -0.008         0.003 
## IBGE_CROP_PRODUCTION         0.006         0.001        0.053     5.446    0.000         0.004         0.008 
##           IDHM_Renda     25776.575      3908.704        0.086     6.595    0.000     18111.958     33441.193 
##            Pr_Assets         0.000         0.000        0.038     2.355    0.019         0.000         0.000 
##            Pu_Assets         0.000         0.000       -0.034    -1.431    0.153         0.000         0.000 
##       FIXED_PHONES_p      9576.562      3900.617        0.035     2.455    0.014      1927.802     17225.321 
##        GVA_AGROPEC_p       737.390        38.506        0.201    19.150    0.000       661.883       812.896 
##       GVA_INDUSTRY_p      1094.057        21.309        0.508    51.342    0.000      1052.271      1135.842 
##       GVA_SERVICES_p       801.646        29.699        0.302    26.993    0.000       743.410       859.882 
##       MUN_EXPENDIT_p         3.384         0.215        0.167    15.757    0.000         2.963         3.806 
##           tax_to_gdp        15.546         5.941        0.022     2.617    0.009         3.896        27.195 
##          pop_density         1.211         0.308        0.041     3.932    0.000         0.607         1.814 
## -------------------------------------------------------------------------------------------------------------
AIC(brazil_lm2)
## [1] 53158.51
BIC(brazil_lm2)
## [1] 53240.17


Adjusted R2 = 0.8248
AIC = 53158.51
BIC = 53240.17
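
Both criteria come from the model log-likelihood: AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k counts the regression coefficients plus the residual variance and n is the number of observations. The sketch below reproduces them by hand. It uses the built-in mtcars data as a stand-in, since brazil6 is not reproduced here; the model formula is an assumption chosen only for illustration.

```r
# Stand-in model on a built-in data set (not the Brazil data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(fit)
k  <- attr(ll, "df")  # 3 coefficients + 1 residual variance = 4
n  <- nobs(fit)

aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + k * log(n)

all.equal(aic_by_hand, AIC(fit))  # TRUE
all.equal(bic_by_hand, BIC(fit))  # TRUE
```

Because BIC penalises each parameter by log(n) rather than 2, it favours the smaller model more strongly than AIC once n is large.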

In order to use this model for the rest of our analysis, we need to check the assumptions of multiple linear regression.

6.2.1 Check for multicollinearity

ols_vif_tol(brazil_lm)
##               Variables Tolerance      VIF
## 1            Active_pop 0.1280218 7.811170
## 2  IBGE_CROP_PRODUCTION 0.7140220 1.400517
## 3      IDHM_Longevidade 0.2998088 3.335459
## 4         IDHM_Educacao 0.2956733 3.382112
## 5            IDHM_Renda 0.1267334 7.890583
## 6              PAY_TV_p 0.5532049 1.807648
## 7        FIXED_PHONES_p 0.2860655 3.495703
## 8         GVA_AGROPEC_p 0.5983846 1.671166
## 9        GVA_INDUSTRY_p 0.7002448 1.428072
## 10       GVA_SERVICES_p 0.5456202 1.832777
## 11           tax_to_gdp 0.9697863 1.031155
## 12       MUN_EXPENDIT_p 0.5852129 1.708780
## 13               Cars_p 0.2096788 4.769200
## 14        Motorcycles_p 0.8963326 1.115657
## 15            Pr_Assets 0.2709912 3.690157
## 16            Pu_Assets 0.1250262 7.998324
## 17          pop_density 0.6246143 1.600988


All VIF values are below 10, so there is no sign of severe multicollinearity among the independent variables. Note that the table is computed on the full specification (brazil_lm). The largest values, for Active_pop, IDHM_Renda and Pu_Assets, are close to 8, which indicates moderate correlation among some predictors but is not a significant cause for concern.
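
For reference, the tolerance of a predictor is 1 - R2 from regressing that predictor on all the other predictors, and the VIF is the reciprocal of the tolerance. The sketch below computes this by hand in base R. It is a minimal illustration on the built-in mtcars data with an assumed set of three predictors, not the Brazil model.

```r
# Assumed predictor set for a stand-in model on mtcars.
predictors <- c("wt", "hp", "disp")

vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression: predictor p explained by the remaining predictors.
  aux <- lm(reformulate(others, response = p), data = mtcars)
  r2  <- summary(aux)$r.squared
  1 / (1 - r2)  # tolerance is 1 - r2; VIF is its reciprocal
})

vif_by_hand
```

A VIF of 8 therefore corresponds to an auxiliary R2 of 0.875, meaning the other predictors explain most of the variation in that variable.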

6.2.2 Test for Non-Linearity

This tests the assumption of linearity and additivity of the relationship between the dependent and independent variables.

ols_plot_resid_fit(brazil_lm2)


Looking at the residuals vs fitted values plot, the red line lies approximately at 0 and there is no clear pattern in the residuals, suggesting that we can assume a linear relationship between the predictors and the outcome variable.

6.2.3 Test for Normality Assumption
ols_plot_resid_hist(brazil_lm2)


The figure suggests that the residuals of the multiple linear regression model roughly resemble a normal distribution.
We will run another test to further verify this assumption.

fnorm <- fitdist(residuals(brazil_lm2), distr="norm") 
summary(fnorm)
## Fitting of the distribution ' norm ' by maximum likelihood 
## Parameters : 
##          estimate Std. Error
## mean 1.639439e-13   179.8293
## sd   9.085818e+03   128.1039
## Loglikelihood:  -26565.26   AIC:  53134.51   BIC:  53146.18 
## Correlation matrix:
##      mean sd
## mean    1  0
## sd      0  1
plot(fnorm)


The Q-Q plot and P-P plot give a visual comparison with the normal distribution. For a formal check, the code chunk below compares the KS statistic with the large-sample critical value at the 5% significance level, 1.36/sqrt(n), where n is the number of observations (nrow(brazil6), not length(brazil6), which counts columns). The KS statistic is larger than the critical value (0.1968861 > 0.0270811), so we reject the null hypothesis. There is evidence that the residuals of the multiple linear regression model are not normally distributed, and the infinite Anderson-Darling statistic is consistent with heavy tails.

par(mfrow=c(2,2))
plot(brazil_lm2)

gofstat(fnorm,discrete=FALSE)
## Goodness-of-fit statistics
##                              1-mle-norm
## Kolmogorov-Smirnov statistic  0.1968861
## Cramer-von Mises statistic   39.4846927
## Anderson-Darling statistic          Inf
## 
## Goodness-of-fit criteria
##                                1-mle-norm
## Akaike's Information Criterion   53134.51
## Bayesian Information Criterion   53146.18
KScritvalue <- 1.36/sqrt(nrow(brazil6))  # n = number of observations (rows)
KScritvalue 
## [1] 0.0270811
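
The large-sample 5% critical value for the KS test is 1.36/sqrt(n), where n must be the number of observations. On a data frame, length() returns the number of columns while nrow() returns the number of rows, so the two give very different critical values. The sketch below shows the difference on simulated data; `toy` and `ks_crit_5pct()` are hypothetical names introduced only for this illustration.

```r
set.seed(42)
# Simulated data frame: 500 observations, 3 variables.
toy <- data.frame(a = rnorm(500), b = rnorm(500), c = rnorm(500))

length(toy)  # 3: number of columns
nrow(toy)    # 500: number of observations

# Hypothetical helper: large-sample 5% critical value of the KS statistic.
ks_crit_5pct <- function(n) 1.36 / sqrt(n)

ks_crit_5pct(nrow(toy))    # critical value based on the sample size
ks_crit_5pct(length(toy))  # far too large: based on the column count
```

This critical value assumes a fully specified reference distribution. When the mean and standard deviation are estimated from the same residuals, as here, it is conservative, so a statistic that exceeds it is strong evidence against normality.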


Looking at the scale-location plot, the variance of the residuals increases with the fitted values, suggesting heteroscedasticity.
As some of our assumptions are not met, we run a second specification, log(GDP_CAPITA) against all other Xs, to dampen extreme values and reduce heteroscedasticity.

6.3 Multiple Log-Linear Regression

brazil_lglm <- lm(log(GDP_CAPITA)~., data=brazil6)
summary(brazil_lglm)
## 
## Call:
## lm(formula = log(GDP_CAPITA) ~ ., data = brazil6)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.40751 -0.15373 -0.01677  0.13063  1.99913 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           6.418e+00  1.628e-01  39.413  < 2e-16 ***
## Active_pop            1.626e-07  8.097e-08   2.008  0.04474 *  
## IBGE_CROP_PRODUCTION  2.764e-07  3.232e-08   8.553  < 2e-16 ***
## IDHM_Longevidade      7.303e-01  2.373e-01   3.077  0.00211 ** 
## IDHM_Educacao         1.116e-01  1.095e-01   1.019  0.30822    
## IDHM_Renda            3.231e+00  2.011e-01  16.068  < 2e-16 ***
## PAY_TV_p             -4.996e-01  1.540e-01  -3.243  0.00120 ** 
## FIXED_PHONES_p        7.280e-02  1.236e-01   0.589  0.55592    
## GVA_AGROPEC_p         2.209e-02  1.136e-03  19.451  < 2e-16 ***
## GVA_INDUSTRY_p        1.766e-02  6.184e-04  28.556  < 2e-16 ***
## GVA_SERVICES_p        1.453e-02  8.619e-04  16.858  < 2e-16 ***
## tax_to_gdp            8.545e-04  1.715e-04   4.983 6.67e-07 ***
## MUN_EXPENDIT_p        8.822e-05  6.377e-06  13.834  < 2e-16 ***
## Cars_p                5.273e-01  8.424e-02   6.259 4.53e-10 ***
## Motorcycles_p         5.670e-02  7.949e-02   0.713  0.47572    
## Pr_Assets             2.495e-14  2.432e-14   1.026  0.30494    
## Pu_Assets            -6.188e-13  2.209e-13  -2.801  0.00513 ** 
## pop_density           1.161e-05  8.959e-06   1.296  0.19511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2623 on 2504 degrees of freedom
## Multiple R-squared:  0.8402, Adjusted R-squared:  0.8391 
## F-statistic: 774.6 on 17 and 2504 DF,  p-value: < 2.2e-16
AIC(brazil_lglm)
## [1] 427.1781
BIC(brazil_lglm)
## [1] 538.0015


With reference to the report above, not all the independent variables are statistically significant. We revise the model, keeping the nine predictors shown in the call below. Note that Active_pop, PAY_TV_p and Pu_Assets are also dropped even though their p-values in the full log-linear model are below 0.05.

6.3.1 Selecting statistically significant variables

brazil_lglm2 <- lm(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil6)
summary(brazil_lglm2)
## 
## Call:
## lm(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + 
##     IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + 
##     MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil6)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
## IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
## IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
## GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
## GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
## GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
## MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
## tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
## Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared:  0.8387, Adjusted R-squared:  0.8381 
## F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16
ols_regress(brazil_lglm2)
##                         Model Summary                         
## -------------------------------------------------------------
## R                       0.916       RMSE               0.263 
## R-Squared               0.839       Coef. Var          2.651 
## Adj. R-Squared          0.838       MSE                0.069 
## Pred R-Squared          0.833       MAE                0.189 
## -------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
## 
##                                  ANOVA                                   
## ------------------------------------------------------------------------
##                 Sum of                                                  
##                Squares          DF    Mean Square       F          Sig. 
## ------------------------------------------------------------------------
## Regression     904.463           9        100.496    1451.153    0.0000 
## Residual       173.962        2512          0.069                       
## Total         1078.424        2521                                      
## ------------------------------------------------------------------------
## 
##                                      Parameter Estimates                                       
## ----------------------------------------------------------------------------------------------
##                model     Beta    Std. Error    Std. Beta      t        Sig     lower    upper 
## ----------------------------------------------------------------------------------------------
##          (Intercept)    6.462         0.158                 40.848    0.000    6.152    6.773 
## IBGE_CROP_PRODUCTION    0.000         0.000        0.081     8.616    0.000    0.000    0.000 
##     IDHM_Longevidade    0.663         0.236        0.041     2.813    0.005    0.201    1.126 
##           IDHM_Renda    3.362         0.174        0.375    19.370    0.000    3.022    3.703 
##        GVA_AGROPEC_p    0.022         0.001        0.202    20.869    0.000    0.020    0.024 
##       GVA_INDUSTRY_p    0.018         0.001        0.275    28.963    0.000    0.017    0.019 
##       GVA_SERVICES_p    0.015         0.001        0.183    17.938    0.000    0.013    0.016 
##       MUN_EXPENDIT_p    0.000         0.000        0.134    13.231    0.000    0.000    0.000 
##           tax_to_gdp    0.001         0.000        0.040     4.928    0.000    0.001    0.001 
##               Cars_p    0.521         0.080        0.108     6.492    0.000    0.364    0.678 
## ----------------------------------------------------------------------------------------------
AIC(brazil_lglm2)
## [1] 435.3718
BIC(brazil_lglm2)
## [1] 499.5327


Adjusted R2=0.8381
AIC = 435.3718
BIC = 499.5327

6.3.2 Checking assumptions
ols_plot_resid_hist(brazil_lglm2)

fnorm <- fitdist(residuals(brazil_lglm2), distr="norm") 
summary(fnorm)
## Fitting of the distribution ' norm ' by maximum likelihood 
## Parameters : 
##          estimate  Std. Error
## mean 6.268681e-18 0.005229764
## sd   2.626362e-01 0.003697760
## Loglikelihood:  -206.6859   AIC:  417.3718   BIC:  429.0374 
## Correlation matrix:
##      mean sd
## mean    1  0
## sd      0  1
plot(fnorm)

par(mfrow=c(2,2))
plot(brazil_lglm2)

gofstat(fnorm,discrete=FALSE)
## Goodness-of-fit statistics
##                               1-mle-norm
## Kolmogorov-Smirnov statistic  0.05891883
## Cramer-von Mises statistic    3.24343354
## Anderson-Darling statistic   21.14080686
## 
## Goodness-of-fit criteria
##                                1-mle-norm
## Akaike's Information Criterion   417.3718
## Bayesian Information Criterion   429.0374
KScritvalue <- 1.36/sqrt(nrow(brazil6))  # n = number of observations (rows)
KScritvalue 
## [1] 0.0270811


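The plots are now more favourable, and the adjusted R2 is higher (0.8381 against 0.8248). The KS statistic falls from 0.1968861 to 0.05891883, although it still exceeds the large-sample 5% critical value of 1.36/sqrt(2522), about 0.027, so the residuals are closer to normal without being exactly normal. The AIC and BIC values are much lower, but they are computed on log(GDP_CAPITA) rather than GDP_CAPITA, so they are not directly comparable with those of the first specification. The significant variables have changed slightly too.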

The following equation includes the variables that are significant in the log-linear multiple regression model.
log(GDP_CAPITA) = 6.462 + 2.758e-07 IBGE_CROP_PRODUCTION + 0.6633 IDHM_Longevidade + 3.362 IDHM_Renda + 0.5208 Cars_p + 0.02223 GVA_AGROPEC_p + 0.01779 GVA_INDUSTRY_p + 0.01459 GVA_SERVICES_p + 8.189e-05 MUN_EXPENDIT_p + 8.434e-04 tax_to_gdp
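
Because the response is on the log scale, a coefficient b is read as a relative change: increasing a predictor by delta multiplies GDP_CAPITA by exp(b * delta), all else equal. The sketch below converts two of the coefficients reported in the summary(brazil_lglm2) output into percentage changes. The choice of predictors and of the step sizes in `delta` is an assumption made for illustration.

```r
# Coefficients copied from the summary(brazil_lglm2) output.
b <- c(GVA_INDUSTRY_p = 0.01779, IDHM_Renda = 3.362)

# Illustrative step sizes: one unit for GVA_INDUSTRY_p, and 0.01 for
# IDHM_Renda because it is an index between 0 and 1.
delta <- c(GVA_INDUSTRY_p = 1, IDHM_Renda = 0.01)

# Percentage change in GDP_CAPITA associated with each step.
pct_change <- 100 * (exp(b * delta) - 1)
round(pct_change, 2)
```

For small b * delta the percentage change is close to 100 * b * delta, which is why the GVA_INDUSTRY_p coefficient of 0.01779 corresponds to an increase of roughly 1.8%.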

6.4 Testing for Spatial Autocorrelation

Next, we visualise the residuals of the log-linear regression model fitted above.
In order to perform a spatial autocorrelation test, we first need to convert brazil into spatial (sf) point and polygon objects.
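
One common summary of spatial autocorrelation is Moran's I, which compares each location's deviation from the mean with those of its neighbours. The sketch below computes it by hand in base R on a toy chain of four locations. The values, the binary weights matrix and the helper `morans_i()` are all made up for illustration; a real analysis would build the weights from the municipality geometries.

```r
# Hypothetical helper: Moran's I for values x and spatial weights matrix w.
morans_i <- function(x, w) {
  n  <- length(x)
  d  <- x - mean(x)  # deviations from the mean
  s0 <- sum(w)       # sum of all weights
  (n / s0) * sum(w * outer(d, d)) / sum(d^2)
}

# Toy weights: four locations on a line, each linked to its adjacent ones.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)  # make the neighbour relation symmetric

res_trend     <- c(1, 2, 3, 4)    # similar values sit next to each other
res_alternate <- c(1, -1, 1, -1)  # neighbours always differ

morans_i(res_trend, w)      # positive: clustering of similar values
morans_i(res_alternate, w)  # negative: neighbours are dissimilar
```

Values near zero would suggest that the residuals are spatially random, which is what a well-specified global model should produce.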

6.4.1 SpatialPoint
brazil.point.sf <- st_as_sf(brazil, coords=c("LONG", "LAT"), crs=4674)%>%
  st_transform(crs=4674)
6.4.2 SpatialPolygon
brazil.polygon.sf <- right_join(mun, brazil, by=c("name_muni" = "CITY", "abbrev_state" = "STATE"))
## Warning: Column `abbrev_state`/`STATE` joining character vector and factor,
## coercing into character vector
6.4.3 Convert to sf
brazil.polygon.sf <- st_as_sf(brazil.polygon.sf, crs=4674) %>%
  st_transform(crs=4674)
6.4.4 Residuals

Next, we will export the residual of the model and save it as a separate data frame

mlr.output <- as.data.frame(brazil_lglm2$residuals)

We will then attach the residuals to the brazil.point.sf and brazil.polygon.sf objects as a new column, MLR_RES.

brazil.point.res.sf <- cbind(brazil.point.sf, 
                        brazil_lglm2$residuals) %>%
rename(`MLR_RES` = `brazil_lglm2.residuals`)

brazil.polygon.res.sf <- cbind(brazil.polygon.sf, 
                        brazil_lglm2$residuals) %>%
rename(`MLR_RES` = `brazil_lglm2.residuals`)

We will then inspect the spatial point and spatial polygon objects in turn.

brazil.point.res.sf
## Simple feature collection with 2522 features and 91 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -72.9165 ymin: -33.68757 xmax: -34.82395 ymax: 2.816682
## geographic CRS: SIRGAS 2000
## First 10 features:
##             CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## 1      ABADIÂNIA    GO       0        15757             15609               148
## 2  ABDON BATISTA    SC       0         2653              2653                 0
## 3   ABREU E LIMA    PE       0        94429             94407                22
## 4     AÇAILÂNDIA    MA       0       104047            104018                29
## 5      ACAJUTIBA    BA       0        14653             14643                10
## 6          ACARÁ    PA       0        53569             53516                53
## 7         ACARAÚ    CE       0        57551             57542                 9
## 8         ACEGUÁ    RS       0         4394              4265               129
## 9       ACOPIARA    CE       0        51160             51160                 0
## 10    ACRELÂNDIA    AC       0        12538             12535                 3
##    IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9
## 1     4655          3233          1422    10656    139      650      894
## 2      848           234           614      724     12       32       49
## 3    28182         25944          2238    81482   1050     4405     6255
## 4    27523         20612          6911    78081   1442     5896     7924
## 5     4116          3632           484    12727    216      849     1282
## 6    11833          3014          8819    12590    265     1082     1436
## 7    14680          7410          7270    24117    358     1615     2084
## 8     1398           314          1084     1059     17       46       76
## 9    15041          7885          7156    25159    365     1521     1929
## 10    3473          1679          1794     5902    118      508      671
##    IBGE_10.14 Active_pop IBGE_60. IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION
## 1        1087       6896      990             10307                33085
## 2          63        479       89              5502                26195
## 3        7019      54749     8004               387                 2595
## 4        8368      49197     5254             27137                89420
## 5        1404       7412     1564              4570                11442
## 6        1537       7281      989             41637               342851
## 7        2558      15123     2379             18505                38871
## 8         119        684      117             31149               119866
## 9        2422      15121     3801             17482                 4244
## 10        710       3484      411              5807                31152
##    IDHM.Ranking.2010  IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao     ALT
## 1               2202 0.690      0.671            0.841         0.579 1017.55
## 2               2092 0.690      0.660            0.812         0.625  720.98
## 3               2477 0.679      0.625            0.791         0.632   27.06
## 4               2633 0.672      0.643            0.785         0.602  229.05
## 5               4613 0.580      0.560            0.723         0.487  183.93
## 6               5513 0.506      0.517            0.757         0.332    7.40
## 7               4125 0.601      0.554            0.758         0.517   17.29
## 8               2259 0.687      0.703            0.852         0.541  237.92
## 9               4279 0.595      0.563            0.724         0.517  312.96
## 10              4079 0.604      0.584            0.808         0.466  205.89
##    PAY_TV FIXED_PHONES    AREA                          REGIAO_TUR
## 1     227          720 1045.13 Região Turística Do Ouro E Cristais
## 2     109          260  237.16                  Vale Do Contestado
## 3    1418         4661  126.19        Costa Náutica Coroa Do Avião
## 4    1225         2618 5806.44                                <NA>
## 5     426          297  181.48                                <NA>
## 6     964          181 4343.55                  Araguaia-Tocantins
## 7    3032          479  845.47               Litoral Extremo Oeste
## 8     171          298 1551.34                        Pampa Gaúcho
## 9     426          598 2265.35                                <NA>
## 10    184          369 1807.95                                <NA>
##    CATEGORIA_TUR ESTIMATED_POP     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 1              C         19614 Rural Adjacente       42.84     16728.30
## 2              D          2577 Rural Adjacente    24996.75      3578.87
## 3              D         99622          Urbano        7.80    384262.09
## 4           <NA>        111757          Urbano   159853.84    488799.22
## 5           <NA>         15129 Rural Adjacente    23176.61         7.02
## 6              D         55513 Rural Adjacente   441281.74     43934.34
## 7              C         62557 Rural Adjacente    74352.86     94123.87
## 8              D          4858          Urbano   130383.04     10632.62
## 9           <NA>         53931 Rural Adjacente    35057.66     17008.70
## 10          <NA>         15020 Rural Adjacente    91143.89        14.02
##    GVA_SERVICES GVA_PUBLIC  GVA_TOTAL     TAXES        GDP POP_GDP GDP_CAPITA
## 1     138198.58   63396.20  261161.91  26822.58  287984.49   18427   15628.40
## 2      16011.10   17842.64   62429.36   2312.65   64742.01    2617   24739.02
## 3        526.04  336141.88 1254241.30 170264.52 1424505.83   98990   14390.40
## 4        779.84  364811.85 1793302.18 206244.13 1999546.31  110543   18088.40
## 5      33546.04   48142.87  111888.63   3269.82  115158.46   15764    7305.15
## 6      97080.41  188941.14  771237.62     17.18     788.42   54080   14578.70
## 7        187.46  188055.40  543990.18  36737.25  580727.43   61715    9409.83
## 8      66901.48   30193.24  238110.37  12221.86  250332.23    4731   52913.18
## 9     137661.95  155449.90     345.18     26.52  371701.24   53358    6966.18
## 10     32079.69   85106.34  222349.24   5261.66  227610.90   14120   16119.75
##                                                                GVA_MAIN
## 1                                                       Demais serviços
## 2  Administração, defesa, educação e saúde públicas e seguridade social
## 3                                                       Demais serviços
## 4                                                       Demais serviços
## 5  Administração, defesa, educação e saúde públicas e seguridade social
## 6           Agricultura, inclusive apoio à agricultura e a pós colheita
## 7  Administração, defesa, educação e saúde públicas e seguridade social
## 8           Agricultura, inclusive apoio à agricultura e a pós colheita
## 9  Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
##    MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1      37513019      288      5      9     26      0      2      7    117
## 2      19506956       69      2      0      4      0      0      2     35
## 3     119645700      841      1      0    130      0      2     26    434
## 4     214456331     1334     47      1    113      0      5     75    657
## 5      27275310       96      2      0      4      0      0      1     57
## 6     106368816      162      8      3      6      0      0      4     99
## 7     101483437      638     41      1     38      8      0      6    363
## 8      22028721      168      8      0      8      0      1      2     86
## 9      85042995      365      2      0     26      0      0      6    255
## 10     22507579      107      2      0     15      0      0      0     56
##    COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1      12     57      2      1      0      7     15      3     11      5      1
## 2       8      3      1      1      0      4      0      2      1      3      0
## 3      27     36     14      3      4     18     30      2     47     20      6
## 4      61     80     18      5     21     38     72      3     21     52     12
## 5       2      3      1      2      0      1      2      3      3      4      2
## 6       3      2      1      0      0      1      3      3     12      0      1
## 7       7     28      3      0      6      9     14      2     71     17      6
## 8       8      9      0      1      0      2     13      3      6      3      5
## 9       1     15      5      1      4      7      5      2      5      2      4
## 10      1      5      1      0      0      1      1      3      8      6      0
##    COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1       8      0      0      1   34           1           1       1       1
## 2       3      0      0     NA   NA           0           1       0       1
## 3      41      0      0     NA   NA           2           3       2       3
## 4      53      0      0      2   56           2           3       2       3
## 5       9      0      0     NA   NA           0           1       0       1
## 6      16      0      0     NA   NA           1           1       1       1
## 7      18      0      0     NA   NA           1           3       1       3
## 8      13      0      0     NA   NA           0           1       0       1
## 9      25      0      0      1   22           1           3       1       3
## 10      8      0      0      1   27           0           1       0       1
##    Pr_Assets  Pu_Assets  Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1   33724584   67091904  2838        1426               0 <NA>  NA       NA
## 2          0   42909056   976         345               2 <NA>  NA       NA
## 3  155632735  460626103 14579       10122               0 <NA>  NA       NA
## 4  125525251 1494221307  9935       24208              17 <NA>  NA       NA
## 5          0   50185684   834        1444               0 <NA>  NA       NA
## 6   22821995   37523391   652        3342               0 <NA>  NA       NA
## 7   57802114  529074069  3371       10448               0 <NA>  NA       NA
## 8          0   13450411  2046         591               5 <NA>  NA       NA
## 9   33077714  262320355  3158       11056               0 <NA>  NA       NA
## 10         0   86310524  1223        3343               0 <NA>  NA       NA
##    POST_OFFICES LOG_GDP_CAPITA    PAY_TV_p FIXED_PHONES_p     Cars_p
## 1             3       9.656845 0.012318880    0.039073099 0.15401313
## 2             1      10.116137 0.041650745    0.099350401 0.37294612
## 3             1       9.574317 0.014324679    0.047085564 0.14727750
## 4             1       9.803026 0.011081661    0.023683092 0.08987453
## 5             1       8.896335 0.027023598    0.018840396 0.05290535
## 6             1       9.587317 0.017825444    0.003346893 0.01205621
## 7             1       9.149510 0.049129061    0.007761484 0.05462205
## 8             2      10.876408 0.036144578    0.062988797 0.43246671
## 9            10       8.848822 0.007983807    0.011207317 0.05918513
## 10            1       9.687801 0.013031161    0.026133144 0.08661473
##    Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1     0.07738644  2.324849e-03   0.9078146199    7.499787269       2035.764
## 2     0.13183034  9.551681e+00   1.3675468093    6.118112342       7453.938
## 3     0.10225275  7.879584e-05   3.8818273563    0.005314072       1208.665
## 4     0.21899170  1.446078e+00   4.4218016518    0.007054630       1940.026
## 5     0.09160112  1.470224e+00   0.0004453184    2.128015732       1730.228
## 6     0.06179734  8.159795e+00   0.8123953402    1.795125925       1966.879
## 7     0.16929434  1.204778e+00   1.5251376489    0.003037511       1644.389
## 8     0.12492074  2.755930e+01   2.2474360600   14.141086451       4656.250
## 9     0.20720417  6.570272e-01   0.3187656959    2.579968327       1593.819
## 10    0.23675637  6.454950e+00   0.0009929178    2.271932720       1594.021
##    pop_density   tax_to_gdp      MLR_RES                    geometry
## 1    17.631299 9.313897e-02 -0.001283972 POINT (-48.71881 -16.18267)
## 2    11.034744 3.572101e-02 -0.241843560 POINT (-51.02527 -27.60899)
## 3   784.452017 1.195253e-01  0.240172219 POINT (-34.89913 -7.904449)
## 4    19.037999 1.031455e-01  0.316636847 POINT (-47.50666 -4.951377)
## 5    86.863566 2.839409e-02 -0.164691960 POINT (-38.01829 -11.66261)
## 6    12.450645 2.179042e-02  0.400526860 POINT (-48.20046 -1.963437)
## 7    72.994902 6.326075e-02  0.093778846 POINT (-40.11824 -2.885311)
## 8     3.049622 4.882256e-02 -0.013513261 POINT (-54.16473 -31.86402)
## 9    23.553976 7.134762e-05 -0.207225765 POINT (-39.45571 -6.092762)
## 10    7.809950 2.311691e-02  0.364934669 POINT (-67.05232 -10.07379)
brazil.polygon.res.sf
## Simple feature collection with 2522 features and 95 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 3.605727
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni     name_muni code_state abbrev_state CAPITAL IBGE_RES_POP
## 1    5200100     ABADIÂNIA         52           GO       0        15757
## 2    4200051 ABDON BATISTA         42           SC       0         2653
## 3    2600054  ABREU E LIMA         26           PE       0        94429
## 4    2100055    AÇAILÂNDIA         21           MA       0       104047
## 5    2900306     ACAJUTIBA         29           BA       0        14653
## 6    1500206         ACARÁ         15           PA       0        53569
## 7    2300200        ACARAÚ         23           CE       0        57551
## 8    4300034        ACEGUÁ         43           RS       0         4394
## 9    2300309      ACOPIARA         23           CE       0        51160
## 10   1200013    ACRELÂNDIA         12           AC       0        12538
##    IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1              15609               148    4655          3233          1422
## 2               2653                 0     848           234           614
## 3              94407                22   28182         25944          2238
## 4             104018                29   27523         20612          6911
## 5              14643                10    4116          3632           484
## 6              53516                53   11833          3014          8819
## 7              57542                 9   14680          7410          7270
## 8               4265               129    1398           314          1084
## 9              51160                 0   15041          7885          7156
## 10             12535                 3    3473          1679          1794
##    IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 Active_pop IBGE_60.
## 1     10656    139      650      894       1087       6896      990
## 2       724     12       32       49         63        479       89
## 3     81482   1050     4405     6255       7019      54749     8004
## 4     78081   1442     5896     7924       8368      49197     5254
## 5     12727    216      849     1282       1404       7412     1564
## 6     12590    265     1082     1436       1537       7281      989
## 7     24117    358     1615     2084       2558      15123     2379
## 8      1059     17       46       76        119        684      117
## 9     25159    365     1521     1929       2422      15121     3801
## 10     5902    118      508      671        710       3484      411
##    IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM.Ranking.2010  IDHM IDHM_Renda
## 1              10307                33085              2202 0.690      0.671
## 2               5502                26195              2092 0.690      0.660
## 3                387                 2595              2477 0.679      0.625
## 4              27137                89420              2633 0.672      0.643
## 5               4570                11442              4613 0.580      0.560
## 6              41637               342851              5513 0.506      0.517
## 7              18505                38871              4125 0.601      0.554
## 8              31149               119866              2259 0.687      0.703
## 9              17482                 4244              4279 0.595      0.563
## 10              5807                31152              4079 0.604      0.584
##    IDHM_Longevidade IDHM_Educacao      LONG        LAT     ALT PAY_TV
## 1             0.841         0.579 -48.71881 -16.182672 1017.55    227
## 2             0.812         0.625 -51.02527 -27.608987  720.98    109
## 3             0.791         0.632 -34.89913  -7.904449   27.06   1418
## 4             0.785         0.602 -47.50666  -4.951377  229.05   1225
## 5             0.723         0.487 -38.01829 -11.662613  183.93    426
## 6             0.757         0.332 -48.20046  -1.963437    7.40    964
## 7             0.758         0.517 -40.11824  -2.885311   17.29   3032
## 8             0.852         0.541 -54.16473 -31.864015  237.92    171
## 9             0.724         0.517 -39.45571  -6.092762  312.96    426
## 10            0.808         0.466 -67.05232 -10.073794  205.89    184
##    FIXED_PHONES    AREA                          REGIAO_TUR CATEGORIA_TUR
## 1           720 1045.13 Região Turística Do Ouro E Cristais             C
## 2           260  237.16                  Vale Do Contestado             D
## 3          4661  126.19        Costa Náutica Coroa Do Avião             D
## 4          2618 5806.44                                <NA>          <NA>
## 5           297  181.48                                <NA>          <NA>
## 6           181 4343.55                  Araguaia-Tocantins             D
## 7           479  845.47               Litoral Extremo Oeste             C
## 8           298 1551.34                        Pampa Gaúcho             D
## 9           598 2265.35                                <NA>          <NA>
## 10          369 1807.95                                <NA>          <NA>
##    ESTIMATED_POP     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 1          19614 Rural Adjacente       42.84     16728.30    138198.58
## 2           2577 Rural Adjacente    24996.75      3578.87     16011.10
## 3          99622          Urbano        7.80    384262.09       526.04
## 4         111757          Urbano   159853.84    488799.22       779.84
## 5          15129 Rural Adjacente    23176.61         7.02     33546.04
## 6          55513 Rural Adjacente   441281.74     43934.34     97080.41
## 7          62557 Rural Adjacente    74352.86     94123.87       187.46
## 8           4858          Urbano   130383.04     10632.62     66901.48
## 9          53931 Rural Adjacente    35057.66     17008.70    137661.95
## 10         15020 Rural Adjacente    91143.89        14.02     32079.69
##    GVA_PUBLIC  GVA_TOTAL     TAXES        GDP POP_GDP GDP_CAPITA
## 1    63396.20  261161.91  26822.58  287984.49   18427   15628.40
## 2    17842.64   62429.36   2312.65   64742.01    2617   24739.02
## 3   336141.88 1254241.30 170264.52 1424505.83   98990   14390.40
## 4   364811.85 1793302.18 206244.13 1999546.31  110543   18088.40
## 5    48142.87  111888.63   3269.82  115158.46   15764    7305.15
## 6   188941.14  771237.62     17.18     788.42   54080   14578.70
## 7   188055.40  543990.18  36737.25  580727.43   61715    9409.83
## 8    30193.24  238110.37  12221.86  250332.23    4731   52913.18
## 9   155449.90     345.18     26.52  371701.24   53358    6966.18
## 10   85106.34  222349.24   5261.66  227610.90   14120   16119.75
##                                                                GVA_MAIN
## 1                                                       Demais serviços
## 2  Administração, defesa, educação e saúde públicas e seguridade social
## 3                                                       Demais serviços
## 4                                                       Demais serviços
## 5  Administração, defesa, educação e saúde públicas e seguridade social
## 6           Agricultura, inclusive apoio à agricultura e a pós colheita
## 7  Administração, defesa, educação e saúde públicas e seguridade social
## 8           Agricultura, inclusive apoio à agricultura e a pós colheita
## 9  Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
##    MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1      37513019      288      5      9     26      0      2      7    117
## 2      19506956       69      2      0      4      0      0      2     35
## 3     119645700      841      1      0    130      0      2     26    434
## 4     214456331     1334     47      1    113      0      5     75    657
## 5      27275310       96      2      0      4      0      0      1     57
## 6     106368816      162      8      3      6      0      0      4     99
## 7     101483437      638     41      1     38      8      0      6    363
## 8      22028721      168      8      0      8      0      1      2     86
## 9      85042995      365      2      0     26      0      0      6    255
## 10     22507579      107      2      0     15      0      0      0     56
##    COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1      12     57      2      1      0      7     15      3     11      5      1
## 2       8      3      1      1      0      4      0      2      1      3      0
## 3      27     36     14      3      4     18     30      2     47     20      6
## 4      61     80     18      5     21     38     72      3     21     52     12
## 5       2      3      1      2      0      1      2      3      3      4      2
## 6       3      2      1      0      0      1      3      3     12      0      1
## 7       7     28      3      0      6      9     14      2     71     17      6
## 8       8      9      0      1      0      2     13      3      6      3      5
## 9       1     15      5      1      4      7      5      2      5      2      4
## 10      1      5      1      0      0      1      1      3      8      6      0
##    COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1       8      0      0      1   34           1           1       1       1
## 2       3      0      0     NA   NA           0           1       0       1
## 3      41      0      0     NA   NA           2           3       2       3
## 4      53      0      0      2   56           2           3       2       3
## 5       9      0      0     NA   NA           0           1       0       1
## 6      16      0      0     NA   NA           1           1       1       1
## 7      18      0      0     NA   NA           1           3       1       3
## 8      13      0      0     NA   NA           0           1       0       1
## 9      25      0      0      1   22           1           3       1       3
## 10      8      0      0      1   27           0           1       0       1
##    Pr_Assets  Pu_Assets  Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1   33724584   67091904  2838        1426               0 <NA>  NA       NA
## 2          0   42909056   976         345               2 <NA>  NA       NA
## 3  155632735  460626103 14579       10122               0 <NA>  NA       NA
## 4  125525251 1494221307  9935       24208              17 <NA>  NA       NA
## 5          0   50185684   834        1444               0 <NA>  NA       NA
## 6   22821995   37523391   652        3342               0 <NA>  NA       NA
## 7   57802114  529074069  3371       10448               0 <NA>  NA       NA
## 8          0   13450411  2046         591               5 <NA>  NA       NA
## 9   33077714  262320355  3158       11056               0 <NA>  NA       NA
## 10         0   86310524  1223        3343               0 <NA>  NA       NA
##    POST_OFFICES LOG_GDP_CAPITA    PAY_TV_p FIXED_PHONES_p     Cars_p
## 1             3       9.656845 0.012318880    0.039073099 0.15401313
## 2             1      10.116137 0.041650745    0.099350401 0.37294612
## 3             1       9.574317 0.014324679    0.047085564 0.14727750
## 4             1       9.803026 0.011081661    0.023683092 0.08987453
## 5             1       8.896335 0.027023598    0.018840396 0.05290535
## 6             1       9.587317 0.017825444    0.003346893 0.01205621
## 7             1       9.149510 0.049129061    0.007761484 0.05462205
## 8             2      10.876408 0.036144578    0.062988797 0.43246671
## 9            10       8.848822 0.007983807    0.011207317 0.05918513
## 10            1       9.687801 0.013031161    0.026133144 0.08661473
##    Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1     0.07738644  2.324849e-03   0.9078146199    7.499787269       2035.764
## 2     0.13183034  9.551681e+00   1.3675468093    6.118112342       7453.938
## 3     0.10225275  7.879584e-05   3.8818273563    0.005314072       1208.665
## 4     0.21899170  1.446078e+00   4.4218016518    0.007054630       1940.026
## 5     0.09160112  1.470224e+00   0.0004453184    2.128015732       1730.228
## 6     0.06179734  8.159795e+00   0.8123953402    1.795125925       1966.879
## 7     0.16929434  1.204778e+00   1.5251376489    0.003037511       1644.389
## 8     0.12492074  2.755930e+01   2.2474360600   14.141086451       4656.250
## 9     0.20720417  6.570272e-01   0.3187656959    2.579968327       1593.819
## 10    0.23675637  6.454950e+00   0.0009929178    2.271932720       1594.021
##    pop_density   tax_to_gdp      MLR_RES                           geom
## 1    17.631299 9.313897e-02 -0.001283972 MULTIPOLYGON (((-48.84178 -...
## 2    11.034744 3.572101e-02 -0.241843560 MULTIPOLYGON (((-51.03724 -...
## 3   784.452017 1.195253e-01  0.240172219 POLYGON ((-35.10602 -7.8251...
## 4    19.037999 1.031455e-01  0.316636847 MULTIPOLYGON (((-47.00353 -...
## 5    86.863566 2.839409e-02 -0.164691960 MULTIPOLYGON (((-37.98092 -...
## 6    12.450645 2.179042e-02  0.400526860 MULTIPOLYGON (((-48.30974 -...
## 7    72.994902 6.326075e-02  0.093778846 MULTIPOLYGON (((-40.33112 -...
## 8     3.049622 4.882256e-02 -0.013513261 POLYGON ((-54.1094 -31.4331...
## 9    23.553976 7.134762e-05 -0.207225765 MULTIPOLYGON (((-39.15667 -...
## 10    7.809950 2.311691e-02  0.364934669 POLYGON ((-67.13424 -9.6762...

Next, we will convert the brazil.point.res.sf simple feature object into a SpatialPointsDataFrame by using as_Spatial().

brazil.point.sp <- as_Spatial(brazil.point.res.sf)
brazil.point.sp
## class       : SpatialPointsDataFrame 
## features    : 2522 
## extent      : -72.9165, -34.82395, -33.68757, 2.816682  (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs         : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs 
## variables   : 91
## names       :      CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_POP, IBGE_1, IBGE_1.4, IBGE_5.9, IBGE_10.14, Active_pop, ... 
## min values  : ABADIÂNIA,    AC,       0,         1641,              1641,                 0,     595,           187,             3,      528,      2,        9,       30,         34,        358, ... 
## max values  :   ZÉ DOCA,    TO,       1,     11253503,          11133776,            119727, 3576148,       3548433,         33809, 10463636, 129464,   514794,   684443,     783702,    7058221, ...

6.5 Series of Choropleth maps

6.5.1 GDP_CAPITA
brazil.polygon.res.sf <- st_make_valid(brazil.polygon.res.sf)
qtm(brazil.polygon.res.sf, "GDP_CAPITA", borders=NULL, scale=0.7)+
  tm_legend(main.title="GDP_CAPITA", 
            main.title.position="centre")

6.5.2 Residuals
brazil.polygon.res.sf <- st_make_valid(brazil.polygon.res.sf)
qtm(brazil.polygon.res.sf, "MLR_RES", borders=NULL,scale = 0.7) + tm_legend(
          main.title = "Residuals",
          main.title.position = "centre")
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

6.5.3 Plot point symbol map of residuals
tm_shape(mun)+
  tm_polygons()+
tm_shape(brazil.point.res.sf)+
  tm_dots(col="MLR_RES", alpha=0.6, style="quantile")
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.


The figure above reveals signs of spatial autocorrelation in the residuals.
We will now perform a Moran’s I test to confirm this observation.
The hypotheses are as follows:
H0: The residuals of the regression model are randomly distributed in space.
H1: The residuals of the regression model are not randomly distributed in space.
Confidence level: 0.95 (alpha = 0.05)

6.6 Determine the upper limit for distance band

The code chunk below finds the upper limit for the distance band, which is the largest distance between any unit and its nearest neighbour.

coords <- coordinates(brazil.point.sp)
k <- knn2nb(knearneigh(coords))
kdists <- unlist(nbdists(k, coords, longlat=FALSE))
summary(kdists)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.005484 0.105464 0.161013 0.220505 0.252019 3.093780


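The output above tells us that the largest nearest-neighbour distance is 3.093780. Because the coordinates are longitude and latitude and longlat=FALSE is used, this distance is measured in decimal degrees. Using a value just above it (3.10) as the upper threshold guarantees that every unit has at least one neighbour.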
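Since the distances above are in decimal degrees rather than kilometres, it can help to see how the two measures differ. The base R sketch below is not part of the original analysis: it computes a great-circle distance with the haversine formula for the first two municipalities in the printed output above. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions made for illustration.

```r
# Great-circle distance in km between two lon/lat points (haversine formula).
# Assumes a spherical Earth with mean radius 6371 km (an approximation).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# First two municipalities from the output above (lon, lat in degrees)
p1 <- c(lon = -48.71881, lat = -16.18267)
p2 <- c(lon = -51.02527, lat = -27.60899)

# Great-circle distance in kilometres
d_km <- haversine_km(p1["lon"], p1["lat"], p2["lon"], p2["lat"])

# Euclidean distance in degrees, which is what longlat = FALSE measures
d_deg <- sqrt(sum((p1 - p2)^2))

print(c(km = unname(d_km), degrees = d_deg))
```

The spdep functions used in this section (knearneigh(), nbdists() and dnearneigh()) also accept a longlat argument; setting it to TRUE should make them work with great-circle distances in kilometres instead of degrees.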

Compute the distance-based weight matrix

nb <- dnearneigh(coords, 0, 3.10, longlat=FALSE)
nb_lw <- nb2listw(nb, style='B')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 2522 
## Number of nonzero links: 640004 
## Percentage nonzero weights: 10.06219 
## Average number of links: 253.7684 
## Link number distribution:
## 
##   1   2   3   4   5   6   7   8  10  12  13  14  15  16  17  18  19  20  21  22 
##   1   4   3   6   5   4   2   7   4   2   6  10   3   2   1   2   4   4   3   7 
##  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42 
##  10   7  10   6  10  13   9   8  11   7  11  11   1   3   3   2   3   9   5   4 
##  43  44  45  46  47  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63 
##   1   1   2   3   2   3   8   2   2   2   1   2   1   3   7   3   3   2   2   3 
##  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83 
##   1   5   6   2   6   4   8   5   6   4   4   4   6   6   9   6   3   5   2   2 
##  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 
##   6   6   5   4   3   8   4   5   6   3  13   3  10   6   7   4   2   9   6   9 
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 
##   6   8   4  10   8   5   6   4   8   6   8   5   4   3  10   6   5   4   3   5 
## 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 
##   4   3   1   5   4   7   4   1   8   5   8   5   7   5   4   1   4   7  10   3 
## 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 
##   1   3   9   4   2   9   6  10   7   4   7   8   2   9   3  11   9  11   4   4 
## 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 
##   7   7   3   5   7   7   1   3   9   7   4   1   4   1   4   6   3   8   7   2 
## 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 
##   6   5   9   5   3   5   7   4   5   6  10  11   3   4   9   4   1   6   2   8 
## 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 
##   5   5   2   5   5   2   4   5   7   9   3   7   2   5   7   2   8   6  10   5 
## 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 241 242 243 244 
##   3   8   3   2   3   6   6   6   4   7   4   5   2   4   6   4   4   6   1   5 
## 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 
##   4   2   8   4   4   6   5   4   2   4   4   4   5   5   8   2   4   2   5   1 
## 265 266 267 268 269 270 271 272 273 275 276 277 278 279 280 281 282 283 284 285 
##   2   2   3   4   2   3   5   2   1   7   7   5   4   3   2   2   4   4   2   5 
## 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 
##   6   4   4   5   6   6   7   4   8   3   7   4   6   8   6   2   9   3   6   6 
## 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 
##  11   4   5   3   2   6   5   1   3   6   6   4   5  11   4   8   4  10   6   4 
## 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 
##   4   7   6   3   4   4   6   3   6   4   5   4   4   2   8   6   7   6   5   7 
## 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 
##   2   6   7   8   7   6   6  13   8   7  14   5  12  10   5   6   6   4   5   4 
## 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 
##   5   4   3   7   6   4   7   4  10   3   5   9   2   9  13   7   5   3   5   2 
## 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 
##   1   4   6   5   2   3   8   7  11   5   4   5   8   1   8   8   7   7   8   6 
## 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 
##   3   6   6   5   6   2   7   5   5   5   6   4   6   5   3   2   5   3   6   5 
## 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 
##   1   3   7   6   4   6   4   5   5   6   2   3   4   4   6   3   6   6   3   5 
## 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 
##   1   3   1   5   7   3   9   7   4   3   4   8   2   3   6   3   3   3   2   9 
## 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 
##   8   4   7   2   3   4   3   2   4   1   4   6   4   4   2   4   4   7   6   3 
## 486 487 488 489 490 491 492 493 494 496 497 499 500 501 502 503 504 505 506 507 
##   2   4   2   3   5   3   2   2   2   3   5   2   3   2   2   1   3   2   4   1 
## 508 509 511 512 513 514 515 516 517 518 519 520 521 522 524 525 526 527 528 
##   3   2   2   4   2   2   4   2   5   5   3   4   2   3   3   1   1   1   1 
## 1 least connected region:
## 2119 with 1 link
## 1 most connected region:
## 28 with 528 links
## 
## Weights style: B 
## Weights constants summary:
##      n      nn     S0      S1        S2
## B 2522 6360484 640004 1280008 859921536

Computing Global Moran’s I test

lm.morantest(brazil_lglm2, nb_lw)
## 
##  Global Moran I for regression residuals
## 
## data:  
## model: lm(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil6)
## weights: nb_lw
## 
## Moran I statistic standard deviate = 14.216, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I      Expectation         Variance 
##     2.007820e-02    -8.180248e-04     2.160505e-06


Since the p-value (< 2.2e-16) is less than the alpha value (0.05), we reject the null hypothesis. There is sufficient evidence to conclude that the residuals are not randomly distributed in space. Note that this test addresses spatial randomness only; it does not tell us whether the residuals are normally distributed.
Since the observed Moran’s I = 0.0200782 (2.007820e-02), which is greater than its expectation of -0.000818, we can infer that the residuals show positive spatial autocorrelation, i.e. they are clustered, although the magnitude of the statistic is small.
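The standard deviate reported by lm.morantest() can be checked by hand from the three sample estimates in the output, using the usual standardisation of Moran’s I:

```latex
z = \frac{I - \mathrm{E}[I]}{\sqrt{\mathrm{Var}[I]}}
  = \frac{0.0200782 - (-0.0008180)}{\sqrt{2.160505 \times 10^{-6}}}
  \approx \frac{0.0208962}{0.0014699}
  \approx 14.22
```

This agrees with the reported value of 14.216, and it shows why a small Moran’s I can still be highly significant: the variance of the statistic is very small with 2522 observations and 640004 links.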

7. Building Models using GWmodel

There are two possible approaches to bandwidth selection, CV (cross-validation) and the AIC corrected (AICc) approach, and two possible kernels, Gaussian and bi-square. We will be testing out all 4 combinations.

7.1 CV approach and Gaussian Kernel

7.1.1 Computing Fixed Bandwidth GWR Model

In the code chunk below, we use bw.gwr() of the GWmodel package to determine the optimal fixed bandwidth to use in the model.
We will use gw.dist() to calculate the dMat value.

dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 CV score: 178.8391 
## Fixed bandwidth: 14.55489 CV score: 177.7677 
## Fixed bandwidth: 8.998324 CV score: 176.9538 
## Fixed bandwidth: 5.56418 CV score: 181.3198 
## Fixed bandwidth: 11.12074 CV score: 177.1448 
## Fixed bandwidth: 7.686598 CV score: 177.1898 
## Fixed bandwidth: 9.809016 CV score: 176.979 
## Fixed bandwidth: 8.497289 CV score: 176.9898 
## Fixed bandwidth: 9.307981 CV score: 176.9533 
## Fixed bandwidth: 9.499359 CV score: 176.9597 
## Fixed bandwidth: 9.189703 CV score: 176.9518 
## Fixed bandwidth: 9.116603 CV score: 176.9519


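The result shows that the recommended bandwidth is 9.189703, the candidate with the lowest CV score (176.9518); this is also the value reported in the model output below. Because the coordinates are in decimal degrees and longlat=FALSE is used, the bandwidth is in decimal degrees, not metres.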
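For reference, the Gaussian kernel with a fixed bandwidth b and the cross-validation score that the search minimises are commonly written as follows, where d_ij is the distance between regression point i and observation j, and ŷ_{≠i}(b) is the fitted value at i when observation i is left out of the calibration. This is the standard textbook form (Fotheringham et al. 2002) and is given as a sketch rather than a transcription of GWmodel's internals.

```latex
w_{ij} = \exp\!\left(-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right),
\qquad
\mathrm{CV}(b) = \sum_{i=1}^{n}\bigl(y_i - \hat{y}_{\neq i}(b)\bigr)^{2}
```

Under this form, an observation exactly one bandwidth away from the regression point receives a weight of exp(-0.5) ≈ 0.61, and weights decay smoothly but never reach zero.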
Constructing the fixed bandwidth gwr model

gwr.fixed <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.fixed
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:14:14 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.fixed, kernel = "gaussian", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 9.189703 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.   Max.
##    Intercept             5.2353e+00  6.3630e+00  6.5508e+00  6.8287e+00 7.3867
##    IBGE_CROP_PRODUCTION  5.3055e-08  2.9313e-07  3.1528e-07  3.3611e-07 0.0000
##    IDHM_Longevidade      3.0786e-01  5.9856e-01  6.8837e-01  7.7004e-01 1.5507
##    IDHM_Renda            2.0687e+00  2.6508e+00  3.1007e+00  3.7208e+00 4.3203
##    GVA_AGROPEC_p         2.0202e-02  2.1351e-02  2.2135e-02  2.3136e-02 0.0295
##    GVA_INDUSTRY_p        1.6383e-02  1.7275e-02  1.7884e-02  1.8100e-02 0.0247
##    GVA_SERVICES_p        9.6425e-03  1.4560e-02  1.4730e-02  1.5160e-02 0.0187
##    MUN_EXPENDIT_p        7.2041e-05  7.4460e-05  7.7082e-05  9.0463e-05 0.0001
##    tax_to_gdp            5.8077e-04  7.5617e-04  7.7445e-04  8.9061e-04 0.0012
##    Cars_p               -2.8773e-01  3.3830e-01  6.6848e-01  7.9079e-01 0.9821
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 26.5562 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2495.444 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 320.7882 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 298.2736 
##    Residual sum of squares: 164.8792 
##    R-square value:  0.847111 
##    Adjusted R-square value:  0.8454833 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:14:25


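The adjusted R2 value of the GWR model is 0.8454833, slightly higher than the 0.8381 of the global regression, and the AICc falls from 435.477 to 320.7882. The p-value < 2.2e-16 shown in the output belongs to the F-test of the global regression, not to the GWR model.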

7.1.2 Computing Adaptive Bandwidth GWR Model

We will now set adaptive = TRUE in bw.gwr() since we are calculating the adaptive bandwidth.

dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.adaptive <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach = "CV", kernel="gaussian", adaptive = TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 1566 CV score: 177.7881 
## Adaptive bandwidth: 976 CV score: 177.3087 
## Adaptive bandwidth: 610 CV score: 179.5891 
## Adaptive bandwidth: 1201 CV score: 177.1208 
## Adaptive bandwidth: 1341 CV score: 177.4033 
## Adaptive bandwidth: 1115 CV score: 177.1015 
## Adaptive bandwidth: 1061 CV score: 177.1428 
## Adaptive bandwidth: 1147 CV score: 177.1051 
## Adaptive bandwidth: 1093 CV score: 177.0983 
## Adaptive bandwidth: 1082 CV score: 177.1206 
## Adaptive bandwidth: 1102 CV score: 177.0993 
## Adaptive bandwidth: 1089 CV score: 177.099 
## Adaptive bandwidth: 1097 CV score: 177.1006 
## Adaptive bandwidth: 1092 CV score: 177.0962 
## Adaptive bandwidth: 1090 CV score: 177.0996 
## Adaptive bandwidth: 1092 CV score: 177.0962


The result shows that the recommended adaptive bandwidth is 1092 nearest neighbours.
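To make the meaning of an adaptive bandwidth concrete, here is a minimal base R sketch on made-up coordinates. For each regression point the kernel bandwidth is taken as the distance to its k-th nearest neighbour, so it shrinks where points are dense and grows where they are sparse. The coordinates, k = 5 and all variable names are illustrative assumptions and not part of the analysis. Whether the point itself counts as a neighbour is a convention, and here it does not.

```r
set.seed(42)
# Hypothetical point pattern: a dense cluster plus a sparse scatter.
dense  <- cbind(rnorm(40, mean = 0, sd = 0.5), rnorm(40, mean = 0, sd = 0.5))
sparse <- cbind(runif(10, 5, 15), runif(10, 5, 15))
xy <- rbind(dense, sparse)

d <- as.matrix(dist(xy))   # pairwise Euclidean distances
k <- 5                     # number of nearest neighbours (illustrative)

# Local bandwidth: distance from each point to its k-th nearest neighbour.
# sort(row)[1] is the zero distance to the point itself, hence k + 1.
local_bw <- apply(d, 1, function(row) sort(row)[k + 1])

# Average local bandwidth in the dense cluster and in the sparse scatter.
print(c(dense = mean(local_bw[1:40]), sparse = mean(local_bw[41:50])))
```

A fixed bandwidth would instead use one distance everywhere, which can leave sparse regions with too few observations per local regression.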
Constructing the adaptive bandwidth GWR model

gwr.adaptive <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output

Display the model output

gwr.adaptive
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:16:11 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.adaptive, kernel = "gaussian", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 1092 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                               Min.    1st Qu.     Median    3rd Qu.   Max.
##    Intercept            6.4622e+00 6.4623e+00 6.4624e+00 6.4624e+00 6.4624
##    IBGE_CROP_PRODUCTION 2.7576e-07 2.7579e-07 2.7579e-07 2.7579e-07 0.0000
##    IDHM_Longevidade     6.6324e-01 6.6325e-01 6.6326e-01 6.6326e-01 0.6633
##    IDHM_Renda           3.3624e+00 3.3624e+00 3.3624e+00 3.3625e+00 3.3626
##    GVA_AGROPEC_p        2.2231e-02 2.2232e-02 2.2232e-02 2.2232e-02 0.0222
##    GVA_INDUSTRY_p       1.7787e-02 1.7787e-02 1.7787e-02 1.7787e-02 0.0178
##    GVA_SERVICES_p       1.4594e-02 1.4594e-02 1.4594e-02 1.4594e-02 0.0146
##    MUN_EXPENDIT_p       8.1886e-05 8.1887e-05 8.1887e-05 8.1888e-05 0.0001
##    tax_to_gdp           8.4337e-04 8.4338e-04 8.4339e-04 8.4340e-04 0.0008
##    Cars_p               5.2077e-01 5.2082e-01 5.2085e-01 5.2086e-01 0.5209
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 10.00117 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.999 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 435.4654 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 423.3596 
##    Residual sum of squares: 173.9611 
##    R-square value:  0.8386896 
##    Adjusted R-square value:  0.8380471 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:16:22


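The adjusted R2 is 0.8380471, and the p-value of the global regression is < 2.2e-16. The printout reports "Fixed bandwidth: 1092": adaptive = TRUE was not passed to gwr.basic, so 1092 was applied as a distance, and the local coefficient estimates are almost identical to the global ones.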
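Why a bandwidth of 1092 distance units collapses the local fit to the global one can be seen from the kernel weights alone. The sketch below assumes a Gaussian kernel of the form exp(-0.5 (d/b)^2) and a largest inter-point distance of about 38 coordinate units. The latter is an assumption, suggested by the first trial value of the fixed-bandwidth searches in this section (23.54559 is roughly 0.618 × 38.1). The names gauss_w and d_max are hypothetical.

```r
# Gaussian kernel weight (assumed form): w = exp(-0.5 * (d / bw)^2)
gauss_w <- function(d, bw) exp(-0.5 * (d / bw)^2)

d_max <- 38   # assumed largest distance between two points, in coordinate units

# Smallest weight any observation can receive under each bandwidth.
w_min_1092 <- gauss_w(d_max, bw = 1092)   # 1092 read as a distance
w_min_2    <- gauss_w(d_max, bw = 2)      # a small fixed bandwidth, for contrast

print(c(bw_1092 = w_min_1092, bw_2 = w_min_2))
```

With every weight practically equal to 1, each local regression uses the whole data set with equal weighting, which is what the global regression does.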

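7.2 AIC approach and Gaussian kernel

7.2.1 Computing Fixed Bandwidth GWR Model

We now switch to approach = "AIC", which selects the bandwidth with the lowest AICc, and keep adaptive = FALSE.
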
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed2 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach = "AIC", kernel="gaussian", adaptive = FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 AICc value: 412.4282 
## Fixed bandwidth: 14.55489 AICc value: 381.0282 
## Fixed bandwidth: 8.998324 AICc value: 316.9127 
## Fixed bandwidth: 5.56418 AICc value: 188.2232 
## Fixed bandwidth: 3.441763 AICc value: 69.8468 
## Fixed bandwidth: 2.130036 AICc value: -3.991837 
## Fixed bandwidth: 1.319345 AICc value: -0.5756209 
## Fixed bandwidth: 2.631071 AICc value: 23.68389 
## Fixed bandwidth: 1.82038 AICc value: -16.35188 
## Fixed bandwidth: 1.629001 AICc value: -18.72857 
## Fixed bandwidth: 1.510723 AICc value: -16.39338 
## Fixed bandwidth: 1.702101 AICc value: -18.5296 
## Fixed bandwidth: 1.583823 AICc value: -18.28341 
## Fixed bandwidth: 1.656923 AICc value: -18.77527 
## Fixed bandwidth: 1.67418 AICc value: -18.72591 
## Fixed bandwidth: 1.646258 AICc value: -18.77651 
## Fixed bandwidth: 1.639667 AICc value: -18.76563 
## Fixed bandwidth: 1.650332 AICc value: -18.77875 
## Fixed bandwidth: 1.652849 AICc value: -18.77845 
## Fixed bandwidth: 1.648776 AICc value: -18.77829 
## Fixed bandwidth: 1.651293 AICc value: -18.77878


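The result shows that the recommended bandwidth is 1.651293. Because longlat = FALSE, this is a Euclidean distance in the coordinate units of brazil.point.sp; the magnitude of the trial values suggests decimal degrees rather than metres.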
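The trial bandwidths in the trace above shrink by a factor of about 0.618 from one step to the next (14.55489 / 23.54559 ≈ 0.618), which is consistent with a golden-section search. The following is a generic base R sketch of such a search on a made-up objective. The names golden_section and toy_score are hypothetical, and this is not the GWmodel implementation.

```r
# Generic golden-section search for the minimum of a unimodal function on
# [lower, upper]. Illustrative only.
golden_section <- function(f, lower, upper, tol = 1e-4) {
  r  <- (sqrt(5) - 1) / 2            # about 0.618
  x1 <- upper - r * (upper - lower)  # left interior point
  x2 <- lower + r * (upper - lower)  # right interior point
  f1 <- f(x1)
  f2 <- f(x2)
  while (abs(upper - lower) > tol) {
    if (f1 < f2) {                   # minimum lies in [lower, x2]
      upper <- x2
      x2 <- x1; f2 <- f1             # reuse the old left point
      x1 <- upper - r * (upper - lower)
      f1 <- f(x1)
    } else {                         # minimum lies in [x1, upper]
      lower <- x1
      x1 <- x2; f1 <- f2             # reuse the old right point
      x2 <- lower + r * (upper - lower)
      f2 <- f(x2)
    }
  }
  (lower + upper) / 2
}

# Made-up score with a single minimum at 1.65, standing in for AICc(bandwidth).
toy_score <- function(bw) (bw - 1.65)^2 - 18.8
best_bw <- golden_section(toy_score, lower = 0, upper = 38)
print(round(best_bw, 3))
```

Each iteration reuses one of the two previous evaluations, so only one new model fit is needed per step. That matters when every evaluation is a full GWR calibration.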
Constructing the fixed bandwidth GWR model

gwr.fixed2 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed2, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output

Display the model output

gwr.fixed2
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:18:32 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.fixed2, kernel = "gaussian", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 1.651293 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.    Max.
##    Intercept             3.5215e+00  6.3136e+00  6.8398e+00  7.7959e+00 11.3203
##    IBGE_CROP_PRODUCTION -7.5668e-07  2.8135e-07  3.8394e-07  5.6728e-07  0.0000
##    IDHM_Longevidade     -3.9407e+00  2.5051e-01  7.1351e-01  1.4550e+00  6.9458
##    IDHM_Renda           -1.7380e+00  1.3014e+00  2.1133e+00  3.0724e+00  5.3516
##    GVA_AGROPEC_p        -3.2826e-02  1.7117e-02  2.2537e-02  4.0983e-02  0.1108
##    GVA_INDUSTRY_p       -3.2572e-01  1.5245e-02  1.9853e-02  2.6880e-02  0.1789
##    GVA_SERVICES_p       -6.9099e-02  1.3607e-02  1.5936e-02  1.9998e-02  0.3289
##    MUN_EXPENDIT_p       -2.3390e-04  5.9268e-05  7.3281e-05  9.6047e-05  0.0004
##    tax_to_gdp           -1.4518e-02  2.9002e-04  7.5854e-04  1.0303e-03  0.0040
##    Cars_p               -1.4322e+00  5.3910e-01  9.4238e-01  1.5196e+00  8.6382
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 350.0251 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2171.975 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): -18.77878 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -349.4203 
##    Residual sum of squares: 115.7155 
##    R-square value:  0.8926995 
##    Adjusted R-square value:  0.8753995 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:18:42


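The adjusted R2 is 0.8753995, and the AICc falls to -18.77878 from 435.477 for the global model. The p-value (< 2.2e-16) is that of the global regression. The local coefficients now vary widely and change sign for every predictor, and the effective number of parameters rises to 350.0251.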

7.2.2 Computing Adaptive Bandwidth GWR Model

We repeat the AIC search with adaptive = TRUE, so the bandwidth is a number of nearest neighbours.

dmat <- gw.dist(dp.locat = coordinates(brazil.point.sp))
bw.adaptive2 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="gaussian", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 1566 AICc value: 375.3037 
## Adaptive bandwidth (number of nearest neighbours): 976 AICc value: 334.4534 
## Adaptive bandwidth (number of nearest neighbours): 610 AICc value: 296.5408 
## Adaptive bandwidth (number of nearest neighbours): 385 AICc value: 234.011 
## Adaptive bandwidth (number of nearest neighbours): 244 AICc value: 146.8711 
## Adaptive bandwidth (number of nearest neighbours): 159 AICc value: 52.35883 
## Adaptive bandwidth (number of nearest neighbours): 104 AICc value: -58.53556 
## Adaptive bandwidth (number of nearest neighbours): 72 AICc value: -131.4551 
## Adaptive bandwidth (number of nearest neighbours): 50 AICc value: -200.397 
## Adaptive bandwidth (number of nearest neighbours): 39 AICc value: -237.422 
## Adaptive bandwidth (number of nearest neighbours): 29 AICc value: -265.716 
## Adaptive bandwidth (number of nearest neighbours): 26 AICc value: -266.6002 
## Adaptive bandwidth (number of nearest neighbours): 21 AICc value: -275.1402 
## Adaptive bandwidth (number of nearest neighbours): 21 AICc value: -275.1402


The result shows that the recommended adaptive bandwidth is 21 nearest neighbours.
Constructing the adaptive bandwidth GWR model

gwr.adaptive2 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive2, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.adaptive2
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:20:50 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.adaptive2, kernel = "gaussian", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: gaussian 
##    Fixed bandwidth: 21 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                               Min.    1st Qu.     Median    3rd Qu.   Max.
##    Intercept            6.1794e+00 6.4128e+00 6.4903e+00 6.5236e+00 6.6368
##    IBGE_CROP_PRODUCTION 2.2399e-07 2.7965e-07 2.8483e-07 2.8985e-07 0.0000
##    IDHM_Longevidade     6.0373e-01 6.5560e-01 6.8403e-01 6.9853e-01 0.8447
##    IDHM_Renda           3.0376e+00 3.2130e+00 3.2873e+00 3.4315e+00 3.6493
##    GVA_AGROPEC_p        2.1810e-02 2.2031e-02 2.2160e-02 2.2375e-02 0.0228
##    GVA_INDUSTRY_p       1.7480e-02 1.7705e-02 1.7791e-02 1.7958e-02 0.0183
##    GVA_SERVICES_p       1.3994e-02 1.4559e-02 1.4602e-02 1.4631e-02 0.0147
##    MUN_EXPENDIT_p       7.7340e-05 7.9584e-05 8.0709e-05 8.3413e-05 0.0001
##    tax_to_gdp           7.8632e-04 8.0696e-04 8.2160e-04 8.5048e-04 0.0009
##    Cars_p               3.1531e-01 4.8545e-01 5.6320e-01 5.9127e-01 0.6732
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 13.07303 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2508.927 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 407.0152 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 393.227 
##    Residual sum of squares: 171.7826 
##    R-square value:  0.8407097 
##    Adjusted R-square value:  0.8398794 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:21:01


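The adjusted R2 is 0.8398794, and the p-value of the global regression is < 2.2e-16. The printout reports "Fixed bandwidth: 21": adaptive = TRUE was not passed to gwr.basic, so 21 was applied as a distance rather than as a number of nearest neighbours. This is why the model's AICc (407.0152) differs from the value found during the bandwidth search (-275.1402).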

7.3 CV approach and Bi-square kernel
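The Gaussian kernel used so far and the bi-square kernel introduced here weight neighbouring observations differently. The sketch below uses the commonly cited forms: Gaussian w = exp(-0.5 (d/b)^2), and bi-square w = (1 - (d/b)^2)^2 for d < b and 0 otherwise. The function names, the bandwidth and the example distances are illustrative assumptions.

```r
gaussian_kernel <- function(d, bw) exp(-0.5 * (d / bw)^2)
bisquare_kernel <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

bw <- 18.5                      # illustrative bandwidth
d  <- c(0, 5, 10, 18.5, 30)     # illustrative distances

weights <- data.frame(
  distance = d,
  gaussian = round(gaussian_kernel(d, bw), 4),
  bisquare = round(bisquare_kernel(d, bw), 4)
)
print(weights)
```

The Gaussian weight never reaches zero, so every observation contributes to every local regression. The bi-square weight is exactly zero at and beyond the bandwidth, so the bandwidth acts as a hard cut-off.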

7.3.1 Computing Fixed Bandwidth GWR Model

We return to the CV approach with a fixed bandwidth and change the kernel to kernel = "bisquare".

dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed3 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="bisquare", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 CV score: 177.2407 
## Fixed bandwidth: 14.55489 CV score: 182.7619 
## Fixed bandwidth: 29.10215 CV score: 177.4802 
## Fixed bandwidth: 20.11145 CV score: 176.3287 
## Fixed bandwidth: 17.98903 CV score: 176.0173 
## Fixed bandwidth: 16.6773 CV score: 176.8538 
## Fixed bandwidth: 18.79972 CV score: 175.9784 
## Fixed bandwidth: 19.30076 CV score: 176.0674 
## Fixed bandwidth: 18.49006 CV score: 175.9627 
## Fixed bandwidth: 18.29869 CV score: 175.9709 
## Fixed bandwidth: 18.60834 CV score: 175.9648 
## Fixed bandwidth: 18.41696 CV score: 175.9641 
## Fixed bandwidth: 18.53524 CV score: 175.9629 
## Fixed bandwidth: 18.46214 CV score: 175.963 
## Fixed bandwidth: 18.50732 CV score: 175.9627


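The result shows that the recommended bandwidth is 18.50732, measured in the coordinate units of brazil.point.sp (with longlat = FALSE, distances are Euclidean).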
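The CV score minimised in the trace above is a leave-one-out quantity: each observation is predicted from a local regression that excludes it, and the squared prediction errors are summed. Below is a minimal base R sketch on simulated data, assuming a bi-square kernel and a single predictor. All names and values are made up for illustration, and this is not the GWmodel code.

```r
set.seed(1)
n  <- 60
xy <- cbind(runif(n, 0, 10), runif(n, 0, 10))             # made-up coordinates
x  <- rnorm(n)                                            # single predictor
y  <- 1 + (0.5 + 0.1 * xy[, 1]) * x + rnorm(n, sd = 0.2)  # slope varies with easting
d  <- as.matrix(dist(xy))                                 # pairwise distances

bisquare <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

cv_score <- function(bw) {
  X   <- cbind(1, x)
  res <- numeric(n)
  for (i in seq_len(n)) {
    w    <- bisquare(d[i, ], bw)
    w[i] <- 0                               # leave observation i out
    fit  <- lm.wfit(X, y, w)                # weighted least squares
    res[i] <- y[i] - sum(fit$coefficients * X[i, ])
  }
  sum(res^2)
}

# CV scores for a moderate and a very large bandwidth.
scores <- sapply(c(5, 20), cv_score)
print(scores)
```

A bandwidth search evaluates such a score repeatedly and keeps the bandwidth with the smallest value.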
Constructing the fixed bandwidth GWR model

gwr.fixed3 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed3, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.fixed3
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:21:48 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.fixed3, kernel = "bisquare", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Fixed bandwidth: 18.50732 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.   Max.
##    Intercept             4.8252e+00  6.3883e+00  6.5991e+00  7.0581e+00 7.7033
##    IBGE_CROP_PRODUCTION -3.3349e-07  2.8214e-07  3.2965e-07  3.6047e-07 0.0000
##    IDHM_Longevidade      5.5489e-02  5.3515e-01  6.4530e-01  7.8001e-01 2.2054
##    IDHM_Renda            1.6200e+00  2.4049e+00  2.9986e+00  3.5334e+00 4.3333
##    GVA_AGROPEC_p         1.8017e-02  2.0991e-02  2.2101e-02  2.4050e-02 0.0524
##    GVA_INDUSTRY_p        1.0660e-02  1.7002e-02  1.7903e-02  1.8188e-02 0.0524
##    GVA_SERVICES_p       -4.8527e-04  1.4595e-02  1.4910e-02  1.5538e-02 0.0316
##    MUN_EXPENDIT_p        4.1144e-05  7.3532e-05  7.5580e-05  9.5306e-05 0.0002
##    tax_to_gdp            2.1798e-04  7.5500e-04  7.7579e-04  9.2807e-04 0.0016
##    Cars_p               -5.7055e-01  4.8067e-01  7.4510e-01  8.9274e-01 2.6361
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 34.21613 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2487.784 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 262.811 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 233.6465 
##    Residual sum of squares: 160.3009 
##    R-square value:  0.8513564 
##    Adjusted R-square value:  0.8493112 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:21:57


The adjusted R2 is 0.8493112, and the p-value of the global regression is < 2.2e-16.

7.3.2 Computing Adaptive Bandwidth GWR Model

The same CV search with the bi-square kernel is repeated with adaptive = TRUE.

dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.adaptive3 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="bisquare", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 1566 CV score: 183.7409 
## Adaptive bandwidth: 976 CV score: 204.2585 
## Adaptive bandwidth: 1932 CV score: 176.7985 
## Adaptive bandwidth: 2157 CV score: 176.5077 
## Adaptive bandwidth: 2297 CV score: 176.7875 
## Adaptive bandwidth: 2071 CV score: 176.5366 
## Adaptive bandwidth: 2210 CV score: 176.601 
## Adaptive bandwidth: 2123 CV score: 176.4937 
## Adaptive bandwidth: 2103 CV score: 176.5012 
## Adaptive bandwidth: 2136 CV score: 176.4752 
## Adaptive bandwidth: 2143 CV score: 176.4837 
## Adaptive bandwidth: 2130 CV score: 176.4768 
## Adaptive bandwidth: 2138 CV score: 176.4798 
## Adaptive bandwidth: 2133 CV score: 176.4711 
## Adaptive bandwidth: 2133 CV score: 176.4711


The result shows that the recommended adaptive bandwidth is 2133 nearest neighbours.
Constructing the adaptive bandwidth GWR model

gwr.adaptive3 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive3, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.adaptive3
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:23:22 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.adaptive3, kernel = "bisquare", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Fixed bandwidth: 2133 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                               Min.    1st Qu.     Median    3rd Qu.   Max.
##    Intercept            6.4622e+00 6.4623e+00 6.4624e+00 6.4624e+00 6.4624
##    IBGE_CROP_PRODUCTION 2.7576e-07 2.7579e-07 2.7579e-07 2.7579e-07 0.0000
##    IDHM_Longevidade     6.6324e-01 6.6325e-01 6.6326e-01 6.6327e-01 0.6633
##    IDHM_Renda           3.3623e+00 3.3624e+00 3.3624e+00 3.3625e+00 3.3626
##    GVA_AGROPEC_p        2.2231e-02 2.2232e-02 2.2232e-02 2.2232e-02 0.0222
##    GVA_INDUSTRY_p       1.7787e-02 1.7787e-02 1.7787e-02 1.7787e-02 0.0178
##    GVA_SERVICES_p       1.4594e-02 1.4594e-02 1.4594e-02 1.4594e-02 0.0146
##    MUN_EXPENDIT_p       8.1885e-05 8.1887e-05 8.1887e-05 8.1888e-05 0.0001
##    tax_to_gdp           8.4337e-04 8.4338e-04 8.4339e-04 8.4340e-04 0.0008
##    Cars_p               5.2076e-01 5.2082e-01 5.2086e-01 5.2087e-01 0.5209
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 10.00123 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.999 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 435.4649 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 423.359 
##    Residual sum of squares: 173.9611 
##    R-square value:  0.8386896 
##    Adjusted R-square value:  0.8380471 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:23:31


The adjusted R2 is 0.8380471; the p-value of the global regression is < 2.2e-16.

7.4 AIC method and Bi-square kernel

7.4.1 Computing Fixed Bandwidth GWR Model
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed4 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="bisquare", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 AICc value: 337.9902 
## Fixed bandwidth: 14.55489 AICc value: 194.3918 
## Fixed bandwidth: 8.998324 AICc value: 96.12548 
## Fixed bandwidth: 5.56418 AICc value: 11773.8 
## Fixed bandwidth: 11.12074 AICc value: 126.2947 
## Fixed bandwidth: 7.686598 AICc value: 79.04865 
## Fixed bandwidth: 6.875907 AICc value: 62.14869 
## Fixed bandwidth: 6.374872 AICc value: 2233.271 
## Fixed bandwidth: 7.185563 AICc value: 69.59574 
## Fixed bandwidth: 6.684528 AICc value: 25167.85 
## Fixed bandwidth: 6.994185 AICc value: 65.04558 
## Fixed bandwidth: 6.802807 AICc value: 4120.464 
## Fixed bandwidth: 6.921085 AICc value: 63.21221 
## Fixed bandwidth: 6.847985 AICc value: 61.4571 
## Fixed bandwidth: 6.830728 AICc value: 130.1504 
## Fixed bandwidth: 6.85865 AICc value: 61.72205 
## Fixed bandwidth: 6.841393 AICc value: 61.29374 
## Fixed bandwidth: 6.83732 AICc value: 2014.162 
## Fixed bandwidth: 6.843911 AICc value: 61.3561 
## Fixed bandwidth: 6.839837 AICc value: 61.25522 
## Fixed bandwidth: 6.838876 AICc value: 61.23142 
## Fixed bandwidth: 6.838281 AICc value: 61.21671 
## Fixed bandwidth: 6.837914 AICc value: 61.20763 
## Fixed bandwidth: 6.837687 AICc value: 61.20209 
## Fixed bandwidth: 6.837547 AICc value: 61.19847 
## Fixed bandwidth: 6.83746 AICc value: 63.5259 
## Fixed bandwidth: 6.8376 AICc value: 61.19983 
## Fixed bandwidth: 6.837514 AICc value: 61.20102


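The result shows that the recommended fixed bandwidth is 6.837547 (AICc 61.19847). This distance is expressed in the coordinate units of brazil.point.sp, since Euclidean distances are used (longlat=FALSE).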
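To make the role of the bandwidth concrete, the minimal sketch below computes bi-square kernel weights for a few distances. It assumes the usual bi-square form, w = (1 - (d/b)^2)^2 for d < b and 0 otherwise. The helper name bisquare_weight and the sample distances are illustrative and are not part of the original analysis.

```r
# Bi-square kernel: the weight decays smoothly and reaches zero at the bandwidth b
# (bisquare_weight is a hypothetical helper used for illustration only)
bisquare_weight <- function(d, b) {
  ifelse(d < b, (1 - (d / b)^2)^2, 0)
}

bw <- 6.837547                      # fixed bandwidth reported for gwr.fixed4
d  <- c(0, 1, 3, 6, 6.837547, 10)   # sample distances, same units as bw
w  <- bisquare_weight(d, bw)
print(round(w, 4))
```

Observations at or beyond the bandwidth receive a weight of zero, so they do not influence the local regression at that point.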
Constructing the fixed bandwidth gwr model

gwr.fixed4 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed4, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.fixed4
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:26:03 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = bw.fixed4, kernel = "bisquare", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Fixed bandwidth: 6.837547 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                Min.     1st Qu.      Median     3rd Qu.    Max.
##    Intercept             3.7228e+00  6.3174e+00  6.7982e+00  7.6754e+00 10.5289
##    IBGE_CROP_PRODUCTION -3.0273e-07  2.9595e-07  3.5234e-07  4.6128e-07  0.0000
##    IDHM_Longevidade     -3.4846e+00  2.7581e-01  7.3632e-01  1.2301e+00  4.7401
##    IDHM_Renda           -9.3779e-01  1.5373e+00  2.2102e+00  3.0612e+00  5.0615
##    GVA_AGROPEC_p        -7.6501e-02  1.8035e-02  2.3185e-02  3.4198e-02  0.0932
##    GVA_INDUSTRY_p       -6.2565e-01  1.5045e-02  1.9287e-02  2.0152e-02  0.3404
##    GVA_SERVICES_p       -6.4762e-02  1.4122e-02  1.6024e-02  1.8955e-02  1.0405
##    MUN_EXPENDIT_p       -1.5178e-04  6.3667e-05  7.6561e-05  9.2970e-05  0.0004
##    tax_to_gdp           -7.6089e-03  5.5813e-04  7.9482e-04  1.0216e-03  0.2455
##    Cars_p               -1.1388e+01  5.4224e-01  1.0497e+00  1.3031e+00  6.2200
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 163.8131 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2358.187 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 61.19893 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -84.40912 
##    Residual sum of squares: 135.6657 
##    R-square value:  0.8742001 
##    Adjusted R-square value:  0.8654576 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:26:13


The adjusted R2 is 0.8654576; the p-value of the global regression is < 2.2e-16.

7.4.2 Computing Adaptive Bandwidth GWR Model
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
gw.adaptive4 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="bisquare", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
##           -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 1566 AICc value: 212.9455 
## Adaptive bandwidth (number of nearest neighbours): 976 AICc value: 126.6319 
## Adaptive bandwidth (number of nearest neighbours): 610 AICc value: 25.90335 
## Adaptive bandwidth (number of nearest neighbours): 385 AICc value: -68.74051 
## Adaptive bandwidth (number of nearest neighbours): 244 AICc value: -169.9868 
## Adaptive bandwidth (number of nearest neighbours): 159 AICc value: -198.9041 
## Adaptive bandwidth (number of nearest neighbours): 104 AICc value: -154.3044 
## Adaptive bandwidth (number of nearest neighbours): 190 AICc value: -191.1021 
## Adaptive bandwidth (number of nearest neighbours): 136 AICc value: -198.2876 
## Adaptive bandwidth (number of nearest neighbours): 169 AICc value: -198.4536 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602 
## Adaptive bandwidth (number of nearest neighbours): 154 AICc value: -199.307 
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162 
## Adaptive bandwidth (number of nearest neighbours): 152 AICc value: -199.4733 
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643 
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643 
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602 
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602 
## Adaptive bandwidth (number of nearest neighbours): 145 AICc value: -199.6369 
## Adaptive bandwidth (number of nearest neighbours): 151 AICc value: -200.2273 
## Adaptive bandwidth (number of nearest neighbours): 144 AICc value: -199.8091 
## Adaptive bandwidth (number of nearest neighbours): 144 AICc value: -199.8091 
## Adaptive bandwidth (number of nearest neighbours): 143 AICc value: -200.5664 
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836 
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836 
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162 
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379 
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643 
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643


The result shows that the recommended adaptive bandwidth is 143 nearest neighbours, the value with the lowest AICc (-200.5664) and the one stored in gw.adaptive4.
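With an adaptive kernel the bandwidth is a number of nearest neighbours rather than a distance, so the reach of the kernel varies with local point density. The base R sketch below illustrates this on made-up coordinates. The coordinates, the value of k, and the helper name kth_neighbour_distance are assumptions for illustration, as is the choice to count the point itself as the first neighbour.

```r
# Toy coordinates (hypothetical): a dense cluster plus two isolated points
xy <- rbind(c(0, 0), c(0.1, 0), c(0, 0.1), c(0.1, 0.1), c(5, 5), c(9, 0))
dmat_toy <- as.matrix(dist(xy))    # Euclidean distance matrix

# Distance from each point to its k-th nearest neighbour
# (assumption: the point itself, at distance 0, counts as the first neighbour)
kth_neighbour_distance <- function(dm, k) {
  apply(dm, 1, function(row) sort(row)[k])
}

local_bw <- kth_neighbour_distance(dmat_toy, k = 3)
print(round(local_bw, 3))
```

Points inside the cluster get a small local bandwidth, while the isolated points get a much larger one, because the kernel has to reach further to find the same number of neighbours.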
Constructing the adaptive bandwidth gwr model

gwr.adaptive4 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION  + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=gw.adaptive4, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output


Display the model output

gwr.adaptive4
##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2020-05-31 17:31:03 
##    Call:
##    gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + 
##     IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + 
##     GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp, 
##     bw = gw.adaptive4, kernel = "bisquare", longlat = FALSE)
## 
##    Dependent (y) variable:  GDP_CAPITA
##    Independent variables:  IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
##    Number of data points: 2522
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.41555 -0.15453 -0.02065  0.13162  2.01229 
## 
##    Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)          6.462e+00  1.582e-01  40.848  < 2e-16 ***
##    IBGE_CROP_PRODUCTION 2.758e-07  3.201e-08   8.616  < 2e-16 ***
##    IDHM_Longevidade     6.633e-01  2.357e-01   2.813  0.00494 ** 
##    IDHM_Renda           3.362e+00  1.736e-01  19.370  < 2e-16 ***
##    GVA_AGROPEC_p        2.223e-02  1.065e-03  20.869  < 2e-16 ***
##    GVA_INDUSTRY_p       1.779e-02  6.141e-04  28.963  < 2e-16 ***
##    GVA_SERVICES_p       1.459e-02  8.136e-04  17.938  < 2e-16 ***
##    MUN_EXPENDIT_p       8.189e-05  6.189e-06  13.231  < 2e-16 ***
##    tax_to_gdp           8.434e-04  1.711e-04   4.928 8.84e-07 ***
##    Cars_p               5.208e-01  8.022e-02   6.492 1.01e-10 ***
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.2632 on 2512 degrees of freedom
##    Multiple R-squared: 0.8387
##    Adjusted R-squared: 0.8381 
##    F-statistic:  1451 on 9 and 2512 DF,  p-value: < 2.2e-16 
##    ***Extra Diagnostic information
##    Residual sum of squares: 173.962
##    Sigma(hat): 0.2627404
##    AIC:  435.3718
##    AICc:  435.477
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Fixed bandwidth: 143 
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                               Min.    1st Qu.     Median    3rd Qu.   Max.
##    Intercept            6.4390e+00 6.4571e+00 6.4649e+00 6.4677e+00 6.4766
##    IBGE_CROP_PRODUCTION 2.7127e-07 2.7610e-07 2.7660e-07 2.7706e-07 0.0000
##    IDHM_Longevidade     6.5988e-01 6.6303e-01 6.6512e-01 6.6655e-01 0.6805
##    IDHM_Renda           3.3347e+00 3.3497e+00 3.3556e+00 3.3682e+00 3.3874
##    GVA_AGROPEC_p        2.2190e-02 2.2211e-02 2.2223e-02 2.2243e-02 0.0223
##    GVA_INDUSTRY_p       1.7755e-02 1.7777e-02 1.7785e-02 1.7804e-02 0.0178
##    GVA_SERVICES_p       1.4546e-02 1.4588e-02 1.4592e-02 1.4595e-02 0.0146
##    MUN_EXPENDIT_p       8.1389e-05 8.1666e-05 8.1783e-05 8.2020e-05 0.0001
##    tax_to_gdp           8.3676e-04 8.3967e-04 8.4126e-04 8.4386e-04 0.0008
##    Cars_p               5.0383e-01 5.1791e-01 5.2486e-01 5.2732e-01 0.5350
##    ************************Diagnostic information*************************
##    Number of data points: 2522 
##    Effective number of parameters (2trace(S) - trace(S'S)): 10.27537 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.725 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 432.771 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 420.5244 
##    Residual sum of squares: 173.7561 
##    R-square value:  0.8388797 
##    Adjusted R-square value:  0.8382203 
## 
##    ***********************************************************************
##    Program stops at: 2020-05-31 17:31:12


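The adjusted R2 value is 0.8382203; the p-value of the global regression is < 2.2e-16. Note that the output reports "Fixed bandwidth: 143" because gwr.basic() was called without adaptive=TRUE, so 143 was applied as a fixed distance rather than as a number of nearest neighbours. This is why the local coefficients barely differ from the global regression.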

7.5 Comparing the adjusted r2 values

We will now compare the adjusted R2 values of the 4 approaches (8 models in total, since each approach has a fixed and an adaptive version) before deciding which method is the most suitable.
We compare the models on their adjusted R2 value: the higher the adjusted R2, the better the fit.
The adjusted R2 is used rather than the plain R2 because it penalises the number of parameters in the model, so a model is not rewarded simply for being more flexible.
A higher adjusted R2 means that a larger share of the variation in log(GDP_CAPITA) is explained by the regression model.
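As a check on how the adjusted R2 relates to the other diagnostics, the sketch below recomputes the value reported for gwr.fixed4 from its R-square value and effective degrees of freedom. The formula used, 1 - (1 - R2)(n - 1)/(edf - 1), is an assumption about how the printed figure is derived; with the inputs copied from the output it agrees with the reported 0.8654576 to the printed precision.

```r
# Values copied from the gwr.fixed4 diagnostic output
n   <- 2522        # number of data points
edf <- 2358.187    # effective degrees of freedom
r2  <- 0.8742001   # R-square value

# Assumed adjustment: shrink R2 according to the effective degrees of freedom
adj_r2 <- 1 - (1 - r2) * (n - 1) / (edf - 1)
print(round(adj_r2, 7))
```

The smaller the effective degrees of freedom (that is, the more flexible the local model), the larger the gap between the R2 and the adjusted R2.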

CV approach and Gaussian kernel:
Fixed: 0.8454853
Adaptive: 0.8380471

AIC approach and Gaussian kernel:
Fixed: 0.8753995
Adaptive: 0.8398794

CV approach and bi-square kernel:
Fixed: 0.8493112
Adaptive: 0.8380471

AIC approach and bi-square kernel:
Fixed: 0.8654576
Adaptive: 0.8382203

From the adjusted R2 values above, we can conclude that the AIC approach with the Gaussian kernel and a fixed bandwidth has the highest adjusted R2 value, 0.8753995. Hence, this method should be used.
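The same comparison can be done programmatically. The sketch below places the eight adjusted R2 values listed above in a data frame and ranks them. The fourth pair is taken from section 7.4, which uses the bi-square kernel. The object and column names are illustrative.

```r
# Adjusted R2 values as listed above (two per approach: fixed, adaptive)
results <- data.frame(
  approach  = rep(c("CV", "AIC", "CV", "AIC"), each = 2),
  kernel    = rep(c("gaussian", "gaussian", "bisquare", "bisquare"), each = 2),
  bandwidth = rep(c("fixed", "adaptive"), times = 4),
  adj_r2    = c(0.8454853, 0.8380471,
                0.8753995, 0.8398794,
                0.8493112, 0.8380471,
                0.8654576, 0.8382203)
)

# Sort from best to worst and keep the top model
ranked <- results[order(-results$adj_r2), ]
print(ranked)
best <- ranked[1, ]
```

Ranking the full table also makes it easy to see that both fixed-bandwidth AIC models come out ahead of every other model in this comparison.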

8. Visualising GWR Output

In addition to regression residuals, the output feature class table includes fields for observed and predicted y values, condition number, local R2, residuals, and explanatory variable coefficients and standard errors.
We will now visualise the GWR output of the model identified above: the AIC approach with the Gaussian kernel and a fixed bandwidth.
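To show how these fields relate to one another, the short sketch below checks that the residual field equals the observed value minus the local prediction. The numbers are the first three rows of the y, yhat, and residual columns copied from the printed output in section 8.1. The variable names are illustrative.

```r
# First three rows of y, yhat and residual, copied from the printed output
y        <- c(9.656845, 10.116137, 9.574317)
yhat     <- c(9.671149, 10.481137, 9.205809)
reported <- c(-0.014303751, -0.364999808, 0.368507968)

# The residual field is the observed value minus the local prediction
residual <- y - yhat
print(cbind(y, yhat, residual, reported))
```

Small differences in the last digits come from the rounding of y and yhat in the printed table.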

8.1 Converting SDF into sf data.frame

To visualise the SDF, we first need to convert it into an sf data.frame.

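# Convert the GWR output (SDF) to an sf object in SIRGAS 2000 (EPSG:4674)
brazil.sf.fixed2 <- st_as_sf(gwr.fixed2$SDF) %>%
  st_transform(crs=4674)
# Bind the GWR output columns to the municipality polygons;
# this overwrites the object created in the previous step
gwr.fixed2.output <- as.data.frame(gwr.fixed2$SDF)
brazil.sf.fixed2 <- cbind(brazil.polygon.res.sf, as.matrix(gwr.fixed2.output))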
brazil.sf.fixed2
## Simple feature collection with 2522 features and 133 fields
## geometry type:  GEOMETRY
## dimension:      XY
## bbox:           xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 3.605727
## geographic CRS: SIRGAS 2000
## First 10 features:
##    code_muni     name_muni code_state abbrev_state CAPITAL IBGE_RES_POP
## 1    5200100     ABADIÂNIA         52           GO       0        15757
## 2    4200051 ABDON BATISTA         42           SC       0         2653
## 3    2600054  ABREU E LIMA         26           PE       0        94429
## 4    2100055    AÇAILÂNDIA         21           MA       0       104047
## 5    2900306     ACAJUTIBA         29           BA       0        14653
## 6    1500206         ACARÁ         15           PA       0        53569
## 7    2300200        ACARAÚ         23           CE       0        57551
## 8    4300034        ACEGUÁ         43           RS       0         4394
## 9    2300309      ACOPIARA         23           CE       0        51160
## 10   1200013    ACRELÂNDIA         12           AC       0        12538
##    IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1              15609               148    4655          3233          1422
## 2               2653                 0     848           234           614
## 3              94407                22   28182         25944          2238
## 4             104018                29   27523         20612          6911
## 5              14643                10    4116          3632           484
## 6              53516                53   11833          3014          8819
## 7              57542                 9   14680          7410          7270
## 8               4265               129    1398           314          1084
## 9              51160                 0   15041          7885          7156
## 10             12535                 3    3473          1679          1794
##    IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 Active_pop IBGE_60.
## 1     10656    139      650      894       1087       6896      990
## 2       724     12       32       49         63        479       89
## 3     81482   1050     4405     6255       7019      54749     8004
## 4     78081   1442     5896     7924       8368      49197     5254
## 5     12727    216      849     1282       1404       7412     1564
## 6     12590    265     1082     1436       1537       7281      989
## 7     24117    358     1615     2084       2558      15123     2379
## 8      1059     17       46       76        119        684      117
## 9     25159    365     1521     1929       2422      15121     3801
## 10     5902    118      508      671        710       3484      411
##    IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM.Ranking.2010  IDHM IDHM_Renda
## 1              10307                33085              2202 0.690      0.671
## 2               5502                26195              2092 0.690      0.660
## 3                387                 2595              2477 0.679      0.625
## 4              27137                89420              2633 0.672      0.643
## 5               4570                11442              4613 0.580      0.560
## 6              41637               342851              5513 0.506      0.517
## 7              18505                38871              4125 0.601      0.554
## 8              31149               119866              2259 0.687      0.703
## 9              17482                 4244              4279 0.595      0.563
## 10              5807                31152              4079 0.604      0.584
##    IDHM_Longevidade IDHM_Educacao      LONG        LAT     ALT PAY_TV
## 1             0.841         0.579 -48.71881 -16.182672 1017.55    227
## 2             0.812         0.625 -51.02527 -27.608987  720.98    109
## 3             0.791         0.632 -34.89913  -7.904449   27.06   1418
## 4             0.785         0.602 -47.50666  -4.951377  229.05   1225
## 5             0.723         0.487 -38.01829 -11.662613  183.93    426
## 6             0.757         0.332 -48.20046  -1.963437    7.40    964
## 7             0.758         0.517 -40.11824  -2.885311   17.29   3032
## 8             0.852         0.541 -54.16473 -31.864015  237.92    171
## 9             0.724         0.517 -39.45571  -6.092762  312.96    426
## 10            0.808         0.466 -67.05232 -10.073794  205.89    184
##    FIXED_PHONES    AREA                          REGIAO_TUR CATEGORIA_TUR
## 1           720 1045.13 Região Turística Do Ouro E Cristais             C
## 2           260  237.16                  Vale Do Contestado             D
## 3          4661  126.19        Costa Náutica Coroa Do Avião             D
## 4          2618 5806.44                                <NA>          <NA>
## 5           297  181.48                                <NA>          <NA>
## 6           181 4343.55                  Araguaia-Tocantins             D
## 7           479  845.47               Litoral Extremo Oeste             C
## 8           298 1551.34                        Pampa Gaúcho             D
## 9           598 2265.35                                <NA>          <NA>
## 10          369 1807.95                                <NA>          <NA>
##    ESTIMATED_POP     RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 1          19614 Rural Adjacente       42.84     16728.30    138198.58
## 2           2577 Rural Adjacente    24996.75      3578.87     16011.10
## 3          99622          Urbano        7.80    384262.09       526.04
## 4         111757          Urbano   159853.84    488799.22       779.84
## 5          15129 Rural Adjacente    23176.61         7.02     33546.04
## 6          55513 Rural Adjacente   441281.74     43934.34     97080.41
## 7          62557 Rural Adjacente    74352.86     94123.87       187.46
## 8           4858          Urbano   130383.04     10632.62     66901.48
## 9          53931 Rural Adjacente    35057.66     17008.70    137661.95
## 10         15020 Rural Adjacente    91143.89        14.02     32079.69
##    GVA_PUBLIC  GVA_TOTAL     TAXES        GDP POP_GDP GDP_CAPITA
## 1    63396.20  261161.91  26822.58  287984.49   18427   15628.40
## 2    17842.64   62429.36   2312.65   64742.01    2617   24739.02
## 3   336141.88 1254241.30 170264.52 1424505.83   98990   14390.40
## 4   364811.85 1793302.18 206244.13 1999546.31  110543   18088.40
## 5    48142.87  111888.63   3269.82  115158.46   15764    7305.15
## 6   188941.14  771237.62     17.18     788.42   54080   14578.70
## 7   188055.40  543990.18  36737.25  580727.43   61715    9409.83
## 8    30193.24  238110.37  12221.86  250332.23    4731   52913.18
## 9   155449.90     345.18     26.52  371701.24   53358    6966.18
## 10   85106.34  222349.24   5261.66  227610.90   14120   16119.75
##                                                                GVA_MAIN
## 1                                                       Demais serviços
## 2  Administração, defesa, educação e saúde públicas e seguridade social
## 3                                                       Demais serviços
## 4                                                       Demais serviços
## 5  Administração, defesa, educação e saúde públicas e seguridade social
## 6           Agricultura, inclusive apoio à agricultura e a pós colheita
## 7  Administração, defesa, educação e saúde públicas e seguridade social
## 8           Agricultura, inclusive apoio à agricultura e a pós colheita
## 9  Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
##    MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1      37513019      288      5      9     26      0      2      7    117
## 2      19506956       69      2      0      4      0      0      2     35
## 3     119645700      841      1      0    130      0      2     26    434
## 4     214456331     1334     47      1    113      0      5     75    657
## 5      27275310       96      2      0      4      0      0      1     57
## 6     106368816      162      8      3      6      0      0      4     99
## 7     101483437      638     41      1     38      8      0      6    363
## 8      22028721      168      8      0      8      0      1      2     86
## 9      85042995      365      2      0     26      0      0      6    255
## 10     22507579      107      2      0     15      0      0      0     56
##    COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1      12     57      2      1      0      7     15      3     11      5      1
## 2       8      3      1      1      0      4      0      2      1      3      0
## 3      27     36     14      3      4     18     30      2     47     20      6
## 4      61     80     18      5     21     38     72      3     21     52     12
## 5       2      3      1      2      0      1      2      3      3      4      2
## 6       3      2      1      0      0      1      3      3     12      0      1
## 7       7     28      3      0      6      9     14      2     71     17      6
## 8       8      9      0      1      0      2     13      3      6      3      5
## 9       1     15      5      1      4      7      5      2      5      2      4
## 10      1      5      1      0      0      1      1      3      8      6      0
##    COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1       8      0      0      1   34           1           1       1       1
## 2       3      0      0     NA   NA           0           1       0       1
## 3      41      0      0     NA   NA           2           3       2       3
## 4      53      0      0      2   56           2           3       2       3
## 5       9      0      0     NA   NA           0           1       0       1
## 6      16      0      0     NA   NA           1           1       1       1
## 7      18      0      0     NA   NA           1           3       1       3
## 8      13      0      0     NA   NA           0           1       0       1
## 9      25      0      0      1   22           1           3       1       3
## 10      8      0      0      1   27           0           1       0       1
##    Pr_Assets  Pu_Assets  Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1   33724584   67091904  2838        1426               0 <NA>  NA       NA
## 2          0   42909056   976         345               2 <NA>  NA       NA
## 3  155632735  460626103 14579       10122               0 <NA>  NA       NA
## 4  125525251 1494221307  9935       24208              17 <NA>  NA       NA
## 5          0   50185684   834        1444               0 <NA>  NA       NA
## 6   22821995   37523391   652        3342               0 <NA>  NA       NA
## 7   57802114  529074069  3371       10448               0 <NA>  NA       NA
## 8          0   13450411  2046         591               5 <NA>  NA       NA
## 9   33077714  262320355  3158       11056               0 <NA>  NA       NA
## 10         0   86310524  1223        3343               0 <NA>  NA       NA
##    POST_OFFICES LOG_GDP_CAPITA    PAY_TV_p FIXED_PHONES_p     Cars_p
## 1             3       9.656845 0.012318880    0.039073099 0.15401313
## 2             1      10.116137 0.041650745    0.099350401 0.37294612
## 3             1       9.574317 0.014324679    0.047085564 0.14727750
## 4             1       9.803026 0.011081661    0.023683092 0.08987453
## 5             1       8.896335 0.027023598    0.018840396 0.05290535
## 6             1       9.587317 0.017825444    0.003346893 0.01205621
## 7             1       9.149510 0.049129061    0.007761484 0.05462205
## 8             2      10.876408 0.036144578    0.062988797 0.43246671
## 9            10       8.848822 0.007983807    0.011207317 0.05918513
## 10            1       9.687801 0.013031161    0.026133144 0.08661473
##    Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1     0.07738644  2.324849e-03   0.9078146199    7.499787269       2035.764
## 2     0.13183034  9.551681e+00   1.3675468093    6.118112342       7453.938
## 3     0.10225275  7.879584e-05   3.8818273563    0.005314072       1208.665
## 4     0.21899170  1.446078e+00   4.4218016518    0.007054630       1940.026
## 5     0.09160112  1.470224e+00   0.0004453184    2.128015732       1730.228
## 6     0.06179734  8.159795e+00   0.8123953402    1.795125925       1966.879
## 7     0.16929434  1.204778e+00   1.5251376489    0.003037511       1644.389
## 8     0.12492074  2.755930e+01   2.2474360600   14.141086451       4656.250
## 9     0.20720417  6.570272e-01   0.3187656959    2.579968327       1593.819
## 10    0.23675637  6.454950e+00   0.0009929178    2.271932720       1594.021
##    pop_density   tax_to_gdp      MLR_RES Intercept IBGE_CROP_PRODUCTION.1
## 1    17.631299 9.313897e-02 -0.001283972  6.597287           2.777367e-07
## 2    11.034744 3.572101e-02 -0.241843560  8.189936           5.672827e-07
## 3   784.452017 1.195253e-01  0.240172219  7.690896           1.498522e-06
## 4    19.037999 1.031455e-01  0.316636847  5.621880           3.475785e-07
## 5    86.863566 2.839409e-02 -0.164691960  6.731945           1.842714e-07
## 6    12.450645 2.179042e-02  0.400526860  6.920214           1.615765e-07
## 7    72.994902 6.326075e-02  0.093778846  7.936980           4.870924e-07
## 8     3.049622 4.882256e-02 -0.013513261  7.357133           2.706744e-07
## 9    23.553976 7.134762e-05 -0.207225765  7.075304           7.461556e-08
## 10    7.809950 2.311691e-02  0.364934669  7.660411          -2.618270e-08
##    IDHM_Longevidade.1 IDHM_Renda.1 GVA_AGROPEC_p.1 GVA_INDUSTRY_p.1
## 1           1.4813248    1.8415302      0.02156504      0.015148346
## 2           0.0847707    1.3069467      0.01358269      0.014479816
## 3           0.4576879    1.3033866      0.07639569      0.028447246
## 4           1.5009399    3.3379677      0.04059035      0.027162696
## 5           1.4044373    1.1242071      0.07743475      0.005213864
## 6           1.5240248    0.8009765      0.07633172      0.049969254
## 7          -0.7909009    2.4032606      0.05842963      0.055760725
## 8           0.5034629    2.1229538      0.02265421      0.014055985
## 9          -0.1253044    2.9617273      0.08345629      0.032905428
## 10          0.0753320    2.5994594      0.03769048     -0.035619436
##    GVA_SERVICES_p.1 MUN_EXPENDIT_p.1  tax_to_gdp.1  Cars_p.1         y
## 1       0.026897387     1.327321e-04  9.011492e-05 0.6328407  9.656845
## 2       0.015768268     7.854375e-05  3.379579e-04 1.3767096 10.116137
## 3       0.048717274    -1.457816e-05  1.200115e-03 1.6374547  9.574317
## 4      -0.004672260     1.894414e-04  6.255445e-04 1.4015512  9.803026
## 5       0.028399896     1.337906e-04  1.000011e-04 4.5140828  8.896335
## 6       0.030518674     9.615383e-05  3.918865e-04 4.9286169  9.587317
## 7       0.006838951     3.386892e-05 -1.358367e-03 3.5756980  9.149510
## 8       0.014682190     7.910365e-05  1.008933e-03 0.9234775 10.876408
## 9       0.023164711     3.474733e-05 -8.985955e-06 2.7743166  8.848822
## 10     -0.010880621    -9.682922e-05 -1.104611e-03 4.5244044  9.687801
##         yhat     residual CV_Score Stud_residual Intercept_SE
## 1   9.671149 -0.014303751        0   -0.06485682    1.0504153
## 2  10.481137 -0.364999808        0   -1.73304301    0.4033128
## 3   9.205809  0.368507968        0    1.76656698    0.6033006
## 4   9.649833  0.153192724        0    0.93719792    0.9127804
## 5   9.023612 -0.127276741        0   -0.56613874    0.5272937
## 6   9.510185  0.077131549        0    0.60697619    1.3335524
## 7   9.094195  0.055314680        0    0.25579239    1.1426845
## 8  10.942259 -0.065851575        0   -0.35832177    0.7946915
## 9   8.997019 -0.148196388        0   -0.67582022    0.6705061
## 10  9.694589 -0.006788715        0   -0.17136088    2.9725849
##    IBGE_CROP_PRODUCTION_SE IDHM_Longevidade_SE IDHM_Renda_SE GVA_AGROPEC_p_SE
## 1             8.484024e-08           1.3177188     0.8873868      0.003504811
## 2             1.290444e-07           0.5151458     0.4163829      0.002020650
## 3             7.413602e-07           0.7879453     0.8727743      0.014356430
## 4             4.614686e-07           1.2285379     0.8864023      0.012913531
## 5             1.946111e-07           0.6984230     0.7654763      0.012859999
## 6             4.642615e-07           1.6630944     1.2057719      0.019326965
## 7             1.100752e-06           1.4288003     1.2359445      0.031319249
## 8             1.675201e-07           1.0194275     0.6575403      0.002914116
## 9             1.870839e-07           0.8438609     0.8137637      0.017063834
## 10            1.072791e-06           3.7547695     3.5833143      0.041169811
##    GVA_INDUSTRY_p_SE GVA_SERVICES_p_SE MUN_EXPENDIT_p_SE tax_to_gdp_SE
## 1       0.0022550210       0.004879588      2.750713e-05  0.0008638681
## 2       0.0009756658       0.001779603      1.323539e-05  0.0002918066
## 3       0.0049360525       0.008842004      3.746259e-05  0.0006001835
## 4       0.0071949655       0.018413722      3.756291e-05  0.0011783000
## 5       0.0027032225       0.007730917      2.916379e-05  0.0007335302
## 6       0.0138239499       0.021059938      7.987995e-05  0.0014168033
## 7       0.0100904776       0.012943162      7.238931e-05  0.0012485628
## 8       0.0015549421       0.003430252      2.436199e-05  0.0004721408
## 9       0.0055450193       0.008943831      4.619489e-05  0.0009341655
## 10      0.1126677386       0.039796305      3.679609e-04  0.0031127056
##    Cars_p_SE Intercept_TV IBGE_CROP_PRODUCTION_TV IDHM_Longevidade_TV
## 1  0.4335130     6.280646              3.27364338          1.12415854
## 2  0.2218850    20.306662              4.39602790          0.16455671
## 3  0.9374201    12.748034              2.02131471          0.58086250
## 4  1.4909104     6.159072              0.75320076          1.22172860
## 5  0.9069956    12.766973              0.94686971          2.01086928
## 6  2.2354432     5.189308              0.34802910          0.91637903
## 7  1.4761085     6.945907              0.44250867         -0.55354192
## 8  0.3458147     9.257847              1.61577302          0.49386826
## 9  0.8596388    10.552184              0.39883464         -0.14848937
## 10 3.8876976     2.577020             -0.02440614          0.02006302
##    IDHM_Renda_TV GVA_AGROPEC_p_TV GVA_INDUSTRY_p_TV GVA_SERVICES_p_TV
## 1      2.0752283        6.1529837         6.7176073         5.5122252
## 2      3.1388098        6.7219413        14.8409592         8.8605529
## 3      1.4933832        5.3213568         5.7631572         5.5097551
## 4      3.7657480        3.1432423         3.7752365        -0.2537379
## 5      1.4686374        6.0213648         1.9287586         3.6735482
## 6      0.6642852        3.9494932         3.6146872         1.4491341
## 7      1.9444729        1.8656141         5.5260739         0.5283833
## 8      3.2286294        7.7739552         9.0395554         4.2802074
## 9      3.6395420        4.8908286         5.9342314         2.5900211
## 10     0.7254344        0.9154882        -0.3161458        -0.2734078
##    MUN_EXPENDIT_p_TV tax_to_gdp_TV Cars_p_TV  Local_R2 coords.x1  coords.x2
## 1          4.8253691   0.104315591  1.459796 0.8409292 -48.71881 -16.182672
## 2          5.9343752   1.158157225  6.204609 0.8586014 -51.02527 -27.608987
## 3         -0.3891390   1.999580526  1.746767 0.9438067 -34.89913  -7.904449
## 4          5.0433106   0.530887256  0.940064 0.9505465 -47.50666  -4.951377
## 5          4.5875581   0.136328509  4.976962 0.8851918 -38.01829 -11.662613
## 6          1.2037292   0.276599065  2.204761 0.9745541 -48.20046  -1.963437
## 7          0.4678718  -1.087944044  2.422382 0.9492447 -40.11824  -2.885311
## 8          3.2470118   2.136933421  2.670440 0.8619977 -54.16473 -31.864015
## 9          0.7521900  -0.009619232  3.227305 0.9705163 -39.45571  -6.092762
## 10        -0.2631508  -0.354871591  1.163775 0.9980819 -67.05232 -10.073794
##                              geom
## 1  MULTIPOLYGON (((-48.84178 -...
## 2  MULTIPOLYGON (((-51.03724 -...
## 3  POLYGON ((-35.10602 -7.8251...
## 4  MULTIPOLYGON (((-47.00353 -...
## 5  MULTIPOLYGON (((-37.98092 -...
## 6  MULTIPOLYGON (((-48.30974 -...
## 7  MULTIPOLYGON (((-40.33112 -...
## 8  POLYGON ((-54.1094 -31.4331...
## 9  MULTIPOLYGON (((-39.15667 -...
## 10 POLYGON ((-67.13424 -9.6762...
summary(gwr.fixed2$SDF$yhat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   8.302   9.481   9.989   9.923  10.309  14.606


The maximum predicted value (yhat) is 14.606.
Next, we remove the code_muni column from the brazil.sf.fixed2 data frame.

brazil.sf.fixed2 <- subset(brazil.sf.fixed2, select= -code_muni)

8.2 Local R2

The code chunk below plots a choropleth map to visualise the local R2 values.

qtm(brazil.sf.fixed2, "Local_R2", border=NULL)


The local R2 values range from 0.75 to 1.00, which is relatively high. The choropleth shows large areas of Brazil in darker shades, indicating higher local R2 values. This suggests that the model fits the data well in most municipalities.
The upper (northern) parts of Brazil are shaded darker, which suggests that the independent variables explain more of the local variation in GDP_CAPITA there than elsewhere.
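
As a quick check on the range described above, the ten Local_R2 values printed in the data frame preview can be binned into classes similar to those in a choropleth legend. This is only a sketch on those ten rows, copied by hand. The class breaks are an assumption chosen for illustration, not the ones tmap selected.

```r
# Local_R2 values of the first ten rows printed in the preview above
local_r2 <- c(0.8409292, 0.8586014, 0.9438067, 0.9505465, 0.8851918,
              0.9745541, 0.9492447, 0.8619977, 0.9705163, 0.9980819)

# Hypothetical class breaks (illustrative only, not tmap's own choice)
breaks <- seq(0.75, 1.00, by = 0.05)

# Assign each value to a class and count the members of each class
r2_class <- cut(local_r2, breaks = breaks, include.lowest = TRUE)
r2_counts <- table(r2_class)
print(r2_counts)

# Smallest and largest local R2 among the ten rows
r2_range <- range(local_r2)
print(r2_range)
```

Running the same steps on the full brazil.sf.fixed2$Local_R2 column would show how many municipalities fall in each class.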

8.3 Intercept

The code chunk below plots a choropleth map to visualise the intercept of the regression model.

qtm(brazil.sf.fixed2, "Intercept", border=NULL)


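We can see that the intercept varies across municipalities, from about 2 to 12. The intercept is the value the local model predicts for the dependent variable when every explanatory variable is zero. A positive intercept therefore tells us nothing about the sign of the slope coefficients, which have to be read from the coefficient estimates themselves.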
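
When reading the intercept map, it helps to keep in mind that the sign of the intercept is independent of the sign of the slopes. The sketch below uses made-up data (x and y are hypothetical, not taken from BRAZIL_CITIES.csv) and an ordinary lm() fit rather than GWR, to show a positive intercept alongside a negative slope.

```r
# Made-up data: y falls as x rises, yet y is positive at x = 0
x <- c(0, 1, 2, 3, 4)
y <- 10 - 2 * x

# Ordinary least squares fit (not GWR), for illustration only
fit <- lm(y ~ x)
coefs <- coef(fit)
print(coefs)
```

The same reasoning applies to each local regression in the GWR output: the Intercept column and the slope columns have to be interpreted separately.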

8.4 Residuals


A residual is the difference between the observed GDP_CAPITA and the predicted GDP_CAPITA. The residual is 0 when the observed and predicted values of GDP_CAPITA are the same.

qtm(brazil.sf.fixed2,"residual", border=NULL)
## Variable(s) "residual" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
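
The definition of a residual can be checked against the data frame preview printed earlier: for the first three rows, subtracting yhat from y reproduces the residual column. This sketch copies those printed values by hand, so the agreement is only up to the rounding of the printed y and yhat.

```r
# y, yhat and residual for rows 1-3, copied from the printed preview
y        <- c(9.656845, 10.116137, 9.574317)
yhat     <- c(9.671149, 10.481137, 9.205809)
residual <- c(-0.014303751, -0.364999808, 0.368507968)

# Residual = observed minus predicted
computed <- y - yhat
print(computed)

# Largest absolute gap between computed and printed residuals
max_gap <- max(abs(computed - residual))
print(max_gap)
```

A negative residual means the model over-predicts for that municipality, and a positive residual means it under-predicts.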

8.5 Y hat plot

The code chunk below allows us to visualise the y hat of the regression model.
Y hat is the value of the dependent variable that the fitted regression model predicts for each municipality. Comparing y hat with the observed GDP_CAPITA shows how far the predictions are from the observed data.

qtm(brazil.sf.fixed2, "yhat", border=NULL)


Overall, our regression model performs well in explaining the factors affecting GDP_CAPITA in Brazil.