Brazil is the world’s fifth-largest country by area and the sixth most populous. The World Bank classifies it as an upper-middle-income economy. Brazil is a developing country with the largest share of global wealth in Latin America, and it is considered an advanced emerging economy. It has the ninth-largest GDP in the world in nominal terms and the eighth-largest by purchasing power parity (PPP). Behind these impressive figures, however, the spatial development of Brazil is highly unequal. GDP per capita ranges from R$3190.6 in the poorest municipality to R$314638 in the richest. Half of the municipalities have a GDP per capita below about R$16000, while the top 25% of municipalities record R$26155 or more.
In this take-home exercise, we will determine the factors affecting the unequal development of Brazil at the municipality level using the data provided. The specific tasks of the analysis are as follows:
1. Prepare a choropleth map showing the distribution of GDP per capita in 2016 at the municipality level.
2. Calibrate an explanatory model of the factors affecting GDP per capita at the municipality level using the multiple linear regression method.
3. Prepare a choropleth map showing the distribution of the residuals of GDP per capita.
4. Calibrate an explanatory model of the factors affecting GDP per capita at the municipality level using the geographically weighted regression method.
5. Prepare a series of choropleth maps showing the outputs of the geographically weighted regression model.
We are provided with the first two data sets below; the third is retrieved from the geobr package in R.
1. BRAZIL_CITIES.csv. This data file consists of 81 columns and 5573 rows. Each row represents one municipality.
2. Data_Dictionary.csv. This file provides the metadata for each column in BRAZIL_CITIES.csv.
3. The 2016 municipality boundary file.
# Packages used in this exercise
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'geobr', 'readr', 'anchors', 'DT', 'fitdistrplus', 'Orcs')
# Install any package that is missing, then attach it
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
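Attaching this many packages produces a long stream of startup messages. As a hedged alternative, the sketch below first checks which packages are missing and then attaches everything inside suppressPackageStartupMessages(). The helper name `missing_packages` is hypothetical, and the two-element package vector is only a stand-in for the full list.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

pkgs <- c("stats", "utils")  # illustrative; substitute the full package vector

# Install only what is missing
to_install <- missing_packages(pkgs)
if (length(to_install) > 0) install.packages(to_install)

# Attach quietly so the report is not flooded with startup messages
for (p in pkgs) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

In an R Markdown document, setting the chunk options `message=FALSE` and `warning=FALSE` on the loading chunk is another way to keep this output out of the rendered report.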
Brazil 2016 municipality boundary
mun <- read_municipality(code_muni= "all", year=2016)
## Using year 2016
## Loading data for the whole country. This might take a few minutes.
Plot the boundary of Brazil
no_axis <- theme(axis.title=element_blank(),
                 axis.text=element_blank(),
                 axis.ticks=element_blank())
ggplot() +
  geom_sf(data=mun, fill="#2D3E50", color="#FEBF57", size=.15, show.legend = FALSE) +
  labs(subtitle="Brazil", size=8) +
  theme_minimal() +
  no_axis
Check the coordinate reference system (CRS) of mun
st_crs(mun)
## Coordinate Reference System:
## User input: SIRGAS 2000
## wkt:
## GEOGCRS["SIRGAS 2000",
## DATUM["Sistema de Referencia Geocentrico para las AmericaS 2000",
## ELLIPSOID["GRS 1980",6378137,298.257222101,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["unknown"],
## AREA["Latin America - SIRGAS 2000 by country"],
## BBOX[-59.87,-122.19,32.72,-25.28]],
## ID["EPSG",4674]]
The CRS of mun is SIRGAS 2000 (EPSG:4674), a geographic coordinate system whose coordinates are expressed in degrees.
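Whether a layer uses degree-based coordinates can also be confirmed programmatically instead of reading the WKT. The following is a minimal sketch, assuming sf is installed; it uses a single made-up point rather than the municipality layer, and the coordinates are illustrative only.

```r
library(sf)

# One illustrative point (longitude, latitude) tagged with EPSG:4674
pt <- st_sfc(st_point(c(-47.9, -15.8)), crs = 4674)

st_crs(pt)$epsg      # EPSG code stored with the geometry
st_is_longlat(pt)    # TRUE for geographic (degree-based) coordinates
```

The same two calls can be applied to `mun` directly.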
We will check if the geometry in mun is valid
all(st_is_valid(mun))
## [1] FALSE
Since the result is FALSE, at least one geometry is invalid, so we repair the geometries with st_make_valid()
mun <- st_make_valid(mun)
Now, check again whether all geometries are valid
all(st_is_valid(mun))
## [1] TRUE
Next, we will check for any empty geometries
any(is.na(st_dimension(mun)))
## [1] FALSE
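To make the validity check concrete, here is a small sketch, assuming sf is installed, of the kind of defect st_is_valid() flags and st_make_valid() repairs. The self-intersecting "bow-tie" polygon is a toy example and is not taken from the municipality data.

```r
library(sf)

# A self-intersecting "bow-tie" ring: a common kind of invalid polygon
bowtie <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1, 1), c(1, 0), c(0, 1), c(0, 0)
))))

st_is_valid(bowtie)             # FALSE: the ring crosses itself
fixed <- st_make_valid(bowtie)  # repaired geometry
st_is_valid(fixed)              # TRUE
```

Repairing may change the geometry type (here a single polygon becomes a multipolygon), so it is worth re-checking the layer afterwards, as done above for `mun`.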
brazil <- read_delim("data/aspatial/BRAZIL_CITIES.csv", delim = ";")
## Parsed with column specification:
## cols(
## .default = col_double(),
## CITY = col_character(),
## STATE = col_character(),
## AREA = col_number(),
## REGIAO_TUR = col_character(),
## CATEGORIA_TUR = col_character(),
## RURAL_URBAN = col_character(),
## GVA_MAIN = col_character()
## )
## See spec(...) for full column specifications.
After importing the data, we use glimpse() to check that it was imported correctly.
glimpse(brazil)
## Rows: 5,573
## Columns: 81
## $ CITY <chr> "Abadia De Goiás", "Abadia Dos Dourados", ...
## $ STATE <chr> "GO", "MG", "GO", "MG", "PA", "CE", "BA", ...
## $ CAPITAL <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ IBGE_RES_POP <dbl> 6876, 6704, 15757, 22690, 141100, 10496, 8...
## $ IBGE_RES_POP_BRAS <dbl> 6876, 6704, 15609, 22690, 141040, 10496, 8...
## $ IBGE_RES_POP_ESTR <dbl> 0, 0, 148, 0, 60, 0, 0, 0, 0, 0, 0, 16, 17...
## $ IBGE_DU <dbl> 2137, 2328, 4655, 7694, 31061, 2791, 2572,...
## $ IBGE_DU_URBAN <dbl> 1546, 1481, 3233, 6667, 19057, 1251, 1193,...
## $ IBGE_DU_RURAL <dbl> 591, 847, 1422, 1027, 12004, 1540, 1379, 1...
## $ IBGE_POP <dbl> 5300, 4154, 10656, 18464, 82956, 4538, 372...
## $ IBGE_1 <dbl> 69, 38, 139, 176, 1354, 98, 37, 167, 69, 1...
## $ `IBGE_1-4` <dbl> 318, 207, 650, 856, 5567, 323, 156, 733, 3...
## $ `IBGE_5-9` <dbl> 438, 260, 894, 1233, 7618, 421, 263, 978, ...
## $ `IBGE_10-14` <dbl> 517, 351, 1087, 1539, 8905, 483, 277, 927,...
## $ `IBGE_15-59` <dbl> 3542, 2709, 6896, 11979, 53516, 2631, 2319...
## $ `IBGE_60+` <dbl> 416, 589, 990, 2681, 5996, 582, 673, 803, ...
## $ IBGE_PLANTED_AREA <dbl> 319, 4479, 10307, 1862, 25200, 2598, 895, ...
## $ `IBGE_CROP_PRODUCTION_$` <dbl> 1843, 18017, 33085, 7502, 700872, 5234, 39...
## $ `IDHM Ranking 2010` <dbl> 1689, 2207, 2202, 1994, 3530, 3522, 4086, ...
## $ IDHM <dbl> 0.708, 0.690, 0.690, 0.698, 0.628, 0.628, ...
## $ IDHM_Renda <dbl> 0.687, 0.693, 0.671, 0.720, 0.579, 0.540, ...
## $ IDHM_Longevidade <dbl> 0.830, 0.839, 0.841, 0.848, 0.798, 0.748, ...
## $ IDHM_Educacao <dbl> 0.622, 0.563, 0.579, 0.556, 0.537, 0.612, ...
## $ LONG <dbl> -49.44055, -47.39683, -48.71881, -45.44619...
## $ LAT <dbl> -16.758812, -18.487565, -16.182672, -19.15...
## $ ALT <dbl> 893.60, 753.12, 1017.55, 644.74, 10.12, 40...
## $ PAY_TV <dbl> 360, 77, 227, 1230, 3389, 29, 952, 51, 55,...
## $ FIXED_PHONES <dbl> 842, 296, 720, 1716, 1218, 34, 335, 222, 3...
## $ AREA <dbl> 147.26, 881.06, 1045.13, 1817.07, 1610.65,...
## $ REGIAO_TUR <chr> NA, "Caminhos Do Cerrado", "Região Turísti...
## $ CATEGORIA_TUR <chr> NA, "D", "C", "D", "D", NA, "D", NA, NA, "...
## $ ESTIMATED_POP <dbl> 8583, 6972, 19614, 23223, 156292, 11663, 8...
## $ RURAL_URBAN <chr> "Urbano", "Rural Adjacente", "Rural Adjace...
## $ GVA_AGROPEC <dbl> 6.20, 50524.57, 42.84, 113824.60, 140463.7...
## $ GVA_INDUSTRY <dbl> 27991.25, 25917.70, 16728.30, 31002.62, 58...
## $ GVA_SERVICES <dbl> 74750.32, 62689.23, 138198.58, 172.33, 468...
## $ GVA_PUBLIC <dbl> 36915.04, 28083.79, 63396.20, 86081.41, 48...
## $ ` GVA_TOTAL ` <dbl> 145857.60, 167215.28, 261161.91, 403241.27...
## $ TAXES <dbl> 20554.20, 12873.50, 26822.58, 26994.09, 95...
## $ GDP <dbl> 166.41, 180.09, 287984.49, 430235.36, 1249...
## $ POP_GDP <dbl> 8053, 7037, 18427, 23574, 151934, 11483, 9...
## $ GDP_CAPITA <dbl> 20664.57, 25591.70, 15628.40, 18250.42, 82...
## $ GVA_MAIN <chr> "Demais serviços", "Demais serviços", "Dem...
## $ MUN_EXPENDIT <dbl> 28227691, 17909274, 37513019, NA, NA, NA, ...
## $ COMP_TOT <dbl> 284, 476, 288, 621, 931, 86, 191, 87, 285,...
## $ COMP_A <dbl> 5, 6, 5, 18, 4, 1, 6, 2, 5, 2, 0, 8, 3, 1,...
## $ COMP_B <dbl> 1, 6, 9, 1, 2, 0, 0, 0, 0, 0, 0, 2, 2, 0, ...
## $ COMP_C <dbl> 56, 30, 26, 40, 43, 4, 8, 3, 20, 4, 9, 40,...
## $ COMP_D <dbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, ...
## $ COMP_E <dbl> 2, 2, 2, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 2, ...
## $ COMP_F <dbl> 29, 34, 7, 20, 27, 6, 4, 0, 10, 2, 0, 25, ...
## $ COMP_G <dbl> 110, 190, 117, 303, 500, 48, 97, 71, 133, ...
## $ COMP_H <dbl> 26, 70, 12, 62, 16, 2, 5, 0, 18, 8, 1, 67,...
## $ COMP_I <dbl> 4, 28, 57, 30, 31, 10, 5, 1, 14, 3, 0, 25,...
## $ COMP_J <dbl> 5, 11, 2, 9, 6, 2, 3, 1, 8, 1, 1, 9, 5, 14...
## $ COMP_K <dbl> 0, 0, 1, 6, 1, 0, 1, 0, 0, 1, 0, 4, 3, 3, ...
## $ COMP_L <dbl> 2, 4, 0, 4, 1, 0, 0, 0, 4, 0, 0, 7, 4, 4, ...
## $ COMP_M <dbl> 10, 15, 7, 28, 22, 2, 5, 0, 11, 4, 2, 26, ...
## $ COMP_N <dbl> 12, 29, 15, 27, 16, 3, 5, 1, 26, 0, 1, 16,...
## $ COMP_O <dbl> 4, 2, 3, 2, 2, 2, 2, 2, 2, 2, 6, 2, 4, 2, ...
## $ COMP_P <dbl> 6, 9, 11, 15, 155, 0, 8, 0, 8, 1, 6, 14, 1...
## $ COMP_Q <dbl> 6, 14, 5, 19, 33, 2, 1, 2, 9, 3, 0, 13, 22...
## $ COMP_R <dbl> 1, 6, 1, 9, 15, 0, 2, 0, 4, 0, 0, 4, 6, 6,...
## $ COMP_S <dbl> 5, 19, 8, 27, 56, 4, 38, 4, 12, 3, 4, 23, ...
## $ COMP_T <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ COMP_U <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ HOTELS <dbl> NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, NA, ...
## $ BEDS <dbl> NA, NA, 34, NA, NA, NA, 24, NA, NA, NA, NA...
## $ Pr_Agencies <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Agencies <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Bank <dbl> NA, NA, 1, 2, 2, NA, NA, 1, 0, 0, 0, 1, 0,...
## $ Pu_Bank <dbl> NA, NA, 1, 2, 4, NA, NA, 0, 1, 1, 1, 2, 1,...
## $ Pr_Assets <dbl> NA, NA, 33724584, 44974716, 76181384, NA, ...
## $ Pu_Assets <dbl> NA, NA, 67091904, 371922572, 800078483, NA...
## $ Cars <dbl> 2158, 2227, 2838, 6928, 5277, 553, 896, 61...
## $ Motorcycles <dbl> 1246, 1142, 1426, 2953, 25661, 1674, 696, ...
## $ Wheeled_tractor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, ...
## $ UBER <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ MAC <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ `WAL-MART` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ POST_OFFICES <dbl> 1, 1, 3, 4, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...
head(brazil$LONG)
## [1] -49.44055 -47.39683 -48.71881 -45.44619 -48.88440 -39.04755
head(brazil$LAT)
## [1] -16.758812 -18.487565 -16.182672 -19.155848 -1.723470 -7.356977
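The glimpse() output shows several column names that need backticks, such as ` GVA_TOTAL ` (with surrounding spaces), `IBGE_1-4` and `IDHM Ranking 2010`. One possible way to tidy such names is sketched below on a toy data frame; the cleaning rule is an assumption made for illustration, and the rest of this exercise continues to use the original names.

```r
# Toy data frame mimicking a few of the awkward names seen in glimpse()
toy <- data.frame(a = 1, b = 2, c = 3)
names(toy) <- c(" GVA_TOTAL ", "IBGE_1-4", "IDHM Ranking 2010")

# Strip surrounding whitespace, then replace each remaining run of
# non-alphanumeric characters with a single underscore
clean <- trimws(names(toy))
clean <- gsub("[^A-Za-z0-9]+", "_", clean)
names(toy) <- clean

names(toy)
```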
Convert the categorical columns to factors
brazil$STATE <- factor(brazil$STATE)
brazil$CAPITAL <- factor(brazil$CAPITAL)
brazil$REGIAO_TUR <- factor(brazil$REGIAO_TUR)
brazil$CATEGORIA_TUR <- factor(brazil$CATEGORIA_TUR)
brazil$UBER <- factor(brazil$UBER)
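The same conversion can be written once for a vector of column names. This is a sketch on a toy data frame whose values are illustrative; `factor_cols` is a hypothetical name.

```r
# Toy data with the same kinds of columns as above
toy <- data.frame(
  STATE   = c("GO", "MG", "GO"),
  CAPITAL = c(0, 0, 1),
  UBER    = c(NA, 1, NA),
  stringsAsFactors = FALSE
)

# Convert every listed column to a factor in one step
factor_cols <- c("STATE", "CAPITAL", "UBER")
toy[factor_cols] <- lapply(toy[factor_cols], factor)

sapply(toy, is.factor)  # which columns are now factors
levels(toy$STATE)       # distinct state codes
```

Listing the columns in one place makes it easier to add others later, for example RURAL_URBAN or GVA_MAIN, which remain character columns here.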
Check the summary of brazil
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5573 MG : 853 0:5546 Min. : 805
## Class :character SP : 645 1: 27 1st Qu.: 5235
## Mode :character RS : 498 Median : 10934
## BA : 418 Mean : 34278
## PR : 399 3rd Qu.: 23424
## SC : 295 Max. :11253503
## (Other):2465 NA's :8
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.0 Min. : 239 Min. : 60
## 1st Qu.: 5230 1st Qu.: 0.0 1st Qu.: 1572 1st Qu.: 874
## Median : 10926 Median : 0.0 Median : 3174 Median : 1846
## Mean : 34200 Mean : 77.5 Mean : 10303 Mean : 8859
## 3rd Qu.: 23390 3rd Qu.: 10.0 3rd Qu.: 6726 3rd Qu.: 4624
## Max. :11133776 Max. :119727.0 Max. :3576148 Max. :3548433
## NA's :8 NA's :8 NA's :10 NA's :10
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 487 1st Qu.: 2801 1st Qu.: 38.0 1st Qu.: 158
## Median : 931 Median : 6170 Median : 92.0 Median : 376
## Mean : 1463 Mean : 27595 Mean : 383.3 Mean : 1544
## 3rd Qu.: 1832 3rd Qu.: 15302 3rd Qu.: 232.0 3rd Qu.: 951
## Max. :33809 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :81 NA's :8 NA's :8 NA's :8
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 259 1st Qu.: 1734 1st Qu.: 341
## Median : 516 Median : 588 Median : 3841 Median : 722
## Mean : 2069 Mean : 2381 Mean : 18212 Mean : 3004
## 3rd Qu.: 1300 3rd Qu.: 1478 3rd Qu.: 9628 3rd Qu.: 1724
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :8 NA's :8 NA's :8 NA's :8
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2326 1st Qu.:1392 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2783 Median :0.6650
## Mean : 14179.9 Mean : 57384 Mean :2783 Mean :0.6592
## 3rd Qu.: 11194.2 3rd Qu.: 55619 3rd Qu.:4174 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :3 NA's :3 NA's :8 NA's :8
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.23
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.40
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. :-32.44
## NA's :8 NA's :8 NA's :8 NA's :9
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1 Min. : 3
## 1st Qu.:-22.838 1st Qu.: 169.8 1st Qu.: 88 1st Qu.: 119
## Median :-18.089 Median : 406.5 Median : 247 Median : 327
## Mean :-16.444 Mean : 893.8 Mean : 3094 Mean : 6567
## 3rd Qu.: -8.489 3rd Qu.: 628.9 3rd Qu.: 815 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668 Max. :5543127
## NA's :9 NA's :9 NA's :3 NA's :3
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 59 A : 51
## 1st Qu.: 204.44 Vale Do Contestado : 45 B : 168
## Median : 416.59 Amazônia Atlântica : 40 C : 521
## Mean : 1517.44 Araguaia-Tocantins : 39 D :1892
## 3rd Qu.: 1026.57 Cariri : 37 E : 653
## Max. :159533.33 (Other) :3065 NA's:2288
## NA's :3 NA's :2288
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:5573 Min. : 0 Min. : 1
## 1st Qu.: 5454 Class :character 1st Qu.: 4189 1st Qu.: 1726
## Median : 11590 Mode :character Median : 20426 Median : 7424
## Mean : 37432 Mean : 47271 Mean : 175928
## 3rd Qu.: 25296 3rd Qu.: 51227 3rd Qu.: 41022
## Max. :12176866 Max. :1402282 Max. :63306755
## NA's :3 NA's :3 NA's :3
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 7 Min. : 17 Min. : -14159
## 1st Qu.: 10112 1st Qu.: 17267 1st Qu.: 42253 1st Qu.: 1305
## Median : 31211 Median : 35866 Median : 119492 Median : 5100
## Mean : 489451 Mean : 123768 Mean : 832987 Mean : 118864
## 3rd Qu.: 115406 3rd Qu.: 89245 3rd Qu.: 313963 3rd Qu.: 22197
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
## NA's :3 NA's :3 NA's :3 NA's :3
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 15 Min. : 815 Min. : 3191 Length:5573
## 1st Qu.: 43709 1st Qu.: 5483 1st Qu.: 9058 Class :character
## Median : 125153 Median : 11578 Median : 15870 Mode :character
## Mean : 954584 Mean : 36998 Mean : 21126
## 3rd Qu.: 329539 3rd Qu.: 25085 3rd Qu.: 26155
## Max. :687035890 Max. :12038175 Max. :314638
## NA's :3 NA's :3 NA's :3
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 6.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.573e+07 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.746e+07 Median : 162.0 Median : 2.00 Median : 0.000
## Mean :1.043e+08 Mean : 906.8 Mean : 18.25 Mean : 1.852
## 3rd Qu.:5.666e+07 3rd Qu.: 448.0 3rd Qu.: 8.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00 Max. :274.000
## NA's :1492 NA's :3 NA's :3 NA's :3
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 11.00 Median : 0.0000 Median : 0.000 Median : 4.00
## Mean : 73.44 Mean : 0.4262 Mean : 2.029 Mean : 43.26
## 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_G COMP_H COMP_I COMP_J
## Min. : 1.0 Min. : 0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 1 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 74.5 Median : 7 Median : 7.00 Median : 1.00
## Mean : 348.0 Mean : 41 Mean : 55.88 Mean : 24.74
## 3rd Qu.: 199.0 3rd Qu.: 25 3rd Qu.: 24.00 3rd Qu.: 5.00
## Max. :150633.0 Max. :19515 Max. :29290.00 Max. :38720.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.0
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 4.0
## Mean : 15.55 Mean : 15.14 Mean : 51.29 Mean : 83.7
## 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.0
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.0
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00
## Median : 2.000 Median : 6.00 Median : 3.00 Median : 2.00
## Mean : 3.269 Mean : 30.96 Mean : 34.15 Mean : 12.18
## 3rd Qu.: 3.000 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00
## Max. :204.000 Max. :16030.00 Max. :22248.00 Max. :6687.00
## NA's :3 NA's :3 NA's :3 NA's :3
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.61 Mean :0 Mean : 0.05027 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :3 NA's :3 NA's :3 NA's :4686
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 257.5 Mean : 3.383 Mean : 2.829 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :4686 NA's :2231 NA's :2231 NA's :2231
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602
## Median :2.00 Median :3.231e+07 Median :1.339e+08 Median : 1438
## Mean :1.58 Mean :9.180e+09 Mean :6.005e+09 Mean : 9859
## 3rd Qu.:2.00 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4086
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2231 NA's :2231 NA's :2231 NA's :11
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 1 : 125 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 NA's:5448 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median : 2.000
## Mean : 4879 Mean : 5.754 Mean : 4.277
## 3rd Qu.: 3294 3rd Qu.: 1.000 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :130.000
## NA's :11 NA's :11 NA's :5407
## WAL-MART POST_OFFICES
## Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 1.000 Median : 1.000
## Mean : 2.059 Mean : 2.081
## 3rd Qu.: 1.750 3rd Qu.: 2.000
## Max. :26.000 Max. :225.000
## NA's :5471 NA's :120
### 4.2 Check for NAs
From the summary of brazil, we have identified that there are 9 NAs in each of the LAT and LONG columns.
We will now check which rows have NA values for LAT and LONG.
brazil[!complete.cases(brazil$LAT),]
## # A tibble: 9 x 81
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Baln~ SC 0 NA NA NA NA
## 2 Lago~ RS 0 NA NA NA NA
## 3 Moju~ PA 0 NA NA NA NA
## 4 Para~ MS 0 NA NA NA NA
## 5 Pesc~ SC 0 NA NA NA NA
## 6 Pinh~ RS 0 2130 2130 0 745
## 7 Pint~ RS 0 NA NA NA NA
## 8 Sant~ BA 0 9648 9648 0 2891
## 9 São ~ PE 0 NA NA NA NA
## # ... with 74 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, `IBGE_15-59` <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, `IBGE_CROP_PRODUCTION_$` <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <fct>, CATEGORIA_TUR <fct>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, ` GVA_TOTAL
## # ` <dbl>, TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>,
## # GVA_MAIN <chr>, MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>,
## # COMP_B <dbl>, COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>,
## # COMP_G <dbl>, COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>,
## # COMP_L <dbl>, COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>,
## # COMP_Q <dbl>, COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>,
## # HOTELS <dbl>, BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>,
## # Pr_Bank <dbl>, Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <fct>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>
The cities that do not have LAT and LONG values are:
1. Balneário Rincão
2. Lagoa Dos Patos
3. Mojuí Dos Campos
4. Paraíso Das Águas
5. Pescaria Brava
6. Pinhal Da Serra
7. Pinto Bandeira
8. Santa Terezinha
9. São Caetano
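Because the tibble printout truncates the city names, it can help to extract them directly. A minimal sketch on toy data (the city names and coordinates are placeholders, not values from BRAZIL_CITIES.csv):

```r
# Toy data: two rows are missing at least one coordinate
toy <- data.frame(
  CITY = c("City A", "City B", "City C"),
  LONG = c(-49.4, NA, -48.7),
  LAT  = c(-16.8, NA, NA),
  stringsAsFactors = FALSE
)

# Number of missing values per coordinate column
na_counts <- colSums(is.na(toy[c("LONG", "LAT")]))

# Full names of the rows missing either coordinate
missing_cities <- toy$CITY[is.na(toy$LONG) | is.na(toy$LAT)]

na_counts
missing_cities
```

Checking both columns guards against rows where only one of the two coordinates is missing.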
We will now fill the LATLONG values for the above cities, using https://pt.db-city.com/ to find the values.
brazil$LONG[brazil$CITY == "Balneário Rincão"] <- -49.2361
brazil$LAT[brazil$CITY == "Balneário Rincão"] <- -28.8344
brazil$LONG[brazil$CITY == "Lagoa Dos Patos"] <- -51.4725
brazil$LAT[brazil$CITY == "Lagoa Dos Patos"] <- -31.0697
brazil$LONG[brazil$CITY == "Mojuí Dos Campos"] <- -54.6431
brazil$LAT[brazil$CITY == "Mojuí Dos Campos"] <- -2.68472
brazil$LONG[brazil$CITY == "Paraíso Das Águas"] <- -53.0102
brazil$LAT[brazil$CITY == "Paraíso Das Águas"] <- -19.0257
brazil$LONG[brazil$CITY == "Pescaria Brava"] <- -48.8956
brazil$LAT[brazil$CITY == "Pescaria Brava"] <- -28.4247
brazil$LONG[brazil$CITY == "Pinhal Da Serra"] <- -51.1733
brazil$LAT[brazil$CITY == "Pinhal Da Serra"] <- -27.8747
brazil$LONG[brazil$CITY == "Pinto Bandeira"] <- -51.4503
brazil$LAT[brazil$CITY == "Pinto Bandeira"] <- -29.0978
brazil$LONG[brazil$CITY == "Santa Terezinha"] <- -39.5184
brazil$LAT[brazil$CITY == "Santa Terezinha"] <- -12.7498
brazil$LONG[brazil$CITY == "São Caetano"] <- -36.1459
brazil$LAT[brazil$CITY == "São Caetano"] <- -8.33
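Typing each assignment by hand makes sign or digit slips easy to miss. One alternative, sketched here on toy data, keeps the replacement coordinates in a small lookup table and ends with a range check. The table name `fixes`, the city names and the values are all hypothetical; the check assumes every location lies west of Greenwich, which matches the negative longitudes reported in the first summary.

```r
# Toy data with missing coordinates
toy <- data.frame(
  CITY = c("City A", "City B", "City C"),
  LONG = c(-49.4, NA, NA),
  LAT  = c(-16.8, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup of replacement coordinates, one row per city
fixes <- data.frame(
  CITY = c("City B", "City C"),
  LONG = c(-51.5, -54.6),
  LAT  = c(-31.1, -2.7),
  stringsAsFactors = FALSE
)

# Match each fix to its row and overwrite both coordinates at once
idx <- match(fixes$CITY, toy$CITY)
toy$LONG[idx] <- fixes$LONG
toy$LAT[idx]  <- fixes$LAT

# Sanity check: stop if a coordinate is still missing or a longitude
# has the wrong sign (assumption: all locations are west of Greenwich)
stopifnot(!anyNA(toy$LONG), !anyNA(toy$LAT), all(toy$LONG < 0))
```

Note that match() returns only the first row for each name, so this sketch assumes city names are unique; with repeated names the lookup would need a second key such as the state.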
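Comparing the cities in brazil with those in mun, we have identified two cities that exist in brazil but not in mun: Santa Terezinha and São Caetano. Hence, we will remove these two cities for consistency. Note that the filter below matches on CITY alone, so it drops every municipality sharing either name; this is why the row count falls from 5573 to 5568 rather than by two.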
brazil <- brazil%>%
filter(CITY!="Santa Terezinha") %>%
filter(CITY!="São Caetano")
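If the intent is to remove only the two specific municipalities, one option is to match on both city and state. The sketch below uses a toy data frame; the state codes BA and PE are read off the NA listing earlier in this section, and the remaining rows are illustrative.

```r
# Toy data: the same city name can appear in more than one state
toy <- data.frame(
  CITY  = c("Santa Terezinha", "Santa Terezinha", "São Caetano", "City A"),
  STATE = c("BA", "SC", "PE", "GO"),
  stringsAsFactors = FALSE
)

# Rows to drop, identified by city AND state
to_drop <- (toy$CITY == "Santa Terezinha" & toy$STATE == "BA") |
           (toy$CITY == "São Caetano"     & toy$STATE == "PE")

kept <- toy[!to_drop, ]
kept
```

Here the Santa Terezinha row in SC is retained, whereas a filter on CITY alone would have removed it as well.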
Check the summary of brazil again
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5568 MG : 853 0:5541 Min. : 805
## Class :character SP : 645 1: 27 1st Qu.: 5231
## Mode :character RS : 498 Median : 10936
## BA : 417 Mean : 34296
## PR : 399 3rd Qu.: 23513
## SC : 294 Max. :11253503
## (Other):2462 NA's :7
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.00 Min. : 239 Min. : 60
## 1st Qu.: 5223 1st Qu.: 0.00 1st Qu.: 1572 1st Qu.: 874
## Median : 10934 Median : 0.00 Median : 3178 Median : 1850
## Mean : 34218 Mean : 77.56 Mean : 10308 Mean : 8864
## 3rd Qu.: 23397 3rd Qu.: 10.00 3rd Qu.: 6727 3rd Qu.: 4628
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :7 NA's :7 NA's :9 NA's :9
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3.0 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 486.8 1st Qu.: 2802 1st Qu.: 38.0 1st Qu.: 158
## Median : 931.0 Median : 6177 Median : 92.0 Median : 377
## Mean : 1462.6 Mean : 27612 Mean : 383.5 Mean : 1546
## 3rd Qu.: 1831.2 3rd Qu.: 15306 3rd Qu.: 232.0 3rd Qu.: 952
## Max. :33809.0 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :80 NA's :7 NA's :7 NA's :7
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 260 1st Qu.: 1735 1st Qu.: 341
## Median : 516 Median : 589 Median : 3842 Median : 723
## Mean : 2071 Mean : 2383 Mean : 18223 Mean : 3006
## 3rd Qu.: 1301 3rd Qu.: 1479 3rd Qu.: 9633 3rd Qu.: 1725
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :7 NA's :7 NA's :7 NA's :7
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2328 1st Qu.:1391 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2782 Median :0.6650
## Mean : 14180.2 Mean : 57389 Mean :2783 Mean :0.6592
## 3rd Qu.: 11173.2 3rd Qu.: 55594 3rd Qu.:4174 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :2 NA's :2 NA's :6 NA's :6
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.20
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.40
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. : 51.47
## NA's :6 NA's :6 NA's :6
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1.0 Min. : 3
## 1st Qu.:-22.845 1st Qu.: 169.4 1st Qu.: 88.0 1st Qu.: 118
## Median :-18.107 Median : 406.4 Median : 247.0 Median : 328
## Mean :-16.457 Mean : 894.0 Mean : 3095.8 Mean : 6570
## 3rd Qu.: -8.495 3rd Qu.: 628.8 3rd Qu.: 815.5 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668.0 Max. :5543127
## NA's :7 NA's :1 NA's :1
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 59 A : 51
## 1st Qu.: 204.44 Vale Do Contestado : 45 B : 168
## Median : 415.86 Amazônia Atlântica : 40 C : 521
## Mean : 1517.07 Araguaia-Tocantins : 39 D :1890
## 3rd Qu.: 1026.57 Cariri : 37 E : 653
## Max. :159533.33 (Other) :3063 NA's:2285
## NA's :2 NA's :2285
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:5568 Min. : 0 Min. : 1
## 1st Qu.: 5452 Class :character 1st Qu.: 4193 1st Qu.: 1725
## Median : 11591 Mode :character Median : 20432 Median : 7428
## Mean : 37447 Mean : 47285 Mean : 176050
## 3rd Qu.: 25301 3rd Qu.: 51227 3rd Qu.: 41240
## Max. :12176866 Max. :1402282 Max. :63306755
## NA's :1 NA's :2 NA's :2
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 7 Min. : 17 Min. : -14159
## 1st Qu.: 10107 1st Qu.: 17254 1st Qu.: 42223 1st Qu.: 1305
## Median : 31214 Median : 35838 Median : 119492 Median : 5108
## Mean : 489787 Mean : 123829 Mean : 833504 Mean : 118947
## 3rd Qu.: 115503 3rd Qu.: 89301 3rd Qu.: 314139 3rd Qu.: 22208
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
## NA's :2 NA's :2 NA's :2 NA's :2
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 15 Min. : 815 Min. : 3191 Length:5568
## 1st Qu.: 43706 1st Qu.: 5480 1st Qu.: 9062 Class :character
## Median : 125153 Median : 11584 Median : 15870 Mode :character
## Mean : 955185 Mean : 37018 Mean : 21132
## 3rd Qu.: 329764 3rd Qu.: 25098 3rd Qu.: 26156
## Max. :687035890 Max. :12038175 Max. :314638
## NA's :2 NA's :2 NA's :2
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 6.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.573e+07 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.748e+07 Median : 162.0 Median : 2.00 Median : 0.000
## Mean :1.044e+08 Mean : 907.3 Mean : 18.27 Mean : 1.853
## 3rd Qu.:5.678e+07 3rd Qu.: 448.8 3rd Qu.: 8.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00 Max. :274.000
## NA's :1491 NA's :2 NA's :2 NA's :2
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 11.00 Median : 0.0000 Median : 0.000 Median : 4.00
## Mean : 73.49 Mean : 0.4265 Mean : 2.031 Mean : 43.29
## 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
## NA's :2 NA's :2 NA's :2 NA's :2
## COMP_G COMP_H COMP_I COMP_J
## Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 75.0 Median : 7.00 Median : 7.00 Median : 1.00
## Mean : 348.2 Mean : 41.02 Mean : 55.91 Mean : 24.76
## 3rd Qu.: 199.8 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00
## Max. :150633.0 Max. :19515.00 Max. :29290.00 Max. :38720.00
## NA's :2 NA's :2 NA's :2 NA's :2
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 4.00
## Mean : 15.56 Mean : 15.15 Mean : 51.33 Mean : 83.76
## 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.00
## NA's :2 NA's :2 NA's :2 NA's :2
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00
## Median : 2.00 Median : 6.00 Median : 3.00 Median : 2.00
## Mean : 3.27 Mean : 30.98 Mean : 34.17 Mean : 12.19
## 3rd Qu.: 3.00 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00
## Max. :204.00 Max. :16030.00 Max. :22248.00 Max. :6687.00
## NA's :2 NA's :2 NA's :2 NA's :2
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.64 Mean :0 Mean : 0.05031 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :2 NA's :2 NA's :2 NA's :4681
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 257.5 Mean : 3.383 Mean : 2.829 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :4681 NA's :2226 NA's :2226 NA's :2226
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602
## Median :2.00 Median :3.231e+07 Median :1.339e+08 Median : 1440
## Mean :1.58 Mean :9.180e+09 Mean :6.005e+09 Mean : 9864
## 3rd Qu.:2.00 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4088
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2226 NA's :2226 NA's :2226 NA's :9
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 1 : 125 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 NA's:5443 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median : 2.000
## Mean : 4881 Mean : 5.756 Mean : 4.277
## 3rd Qu.: 3297 3rd Qu.: 1.000 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :130.000
## NA's :9 NA's :9 NA's :5402
## WAL-MART POST_OFFICES
## Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 1.000 Median : 1.000
## Mean : 2.059 Mean : 2.081
## 3rd Qu.: 1.750 3rd Qu.: 2.000
## Max. :26.000 Max. :225.000
## NA's :5466 NA's :119
The summary above shows that GDP_CAPITA has 2 NA values. We will remove these rows, as it is hard to assign a sensible value to them.
brazil <- brazil%>%
filter(!is.na(`GDP_CAPITA`))
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5566 MG : 853 0:5539 Min. : 805
## Class :character SP : 645 1: 27 1st Qu.: 5231
## Mode :character RS : 497 Median : 10936
## BA : 416 Mean : 34296
## PR : 399 3rd Qu.: 23513
## SC : 294 Max. :11253503
## (Other):2462 NA's :5
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.00 Min. : 239 Min. : 60
## 1st Qu.: 5223 1st Qu.: 0.00 1st Qu.: 1572 1st Qu.: 874
## Median : 10934 Median : 0.00 Median : 3178 Median : 1850
## Mean : 34218 Mean : 77.56 Mean : 10308 Mean : 8864
## 3rd Qu.: 23397 3rd Qu.: 10.00 3rd Qu.: 6727 3rd Qu.: 4628
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :5 NA's :5 NA's :7 NA's :7
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3.0 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 486.8 1st Qu.: 2802 1st Qu.: 38.0 1st Qu.: 158
## Median : 931.0 Median : 6177 Median : 92.0 Median : 377
## Mean : 1462.6 Mean : 27612 Mean : 383.5 Mean : 1546
## 3rd Qu.: 1831.2 3rd Qu.: 15306 3rd Qu.: 232.0 3rd Qu.: 952
## Max. :33809.0 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :78 NA's :5 NA's :5 NA's :5
## IBGE_5-9 IBGE_10-14 IBGE_15-59 IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 260 1st Qu.: 1735 1st Qu.: 341
## Median : 516 Median : 589 Median : 3842 Median : 723
## Mean : 2071 Mean : 2383 Mean : 18223 Mean : 3006
## 3rd Qu.: 1301 3rd Qu.: 1479 3rd Qu.: 9633 3rd Qu.: 1725
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :5 NA's :5 NA's :5 NA's :5
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION_$ IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2328 1st Qu.:1391 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2782 Median :0.6650
## Mean : 14180.2 Mean : 57389 Mean :2782 Mean :0.6592
## 3rd Qu.: 11173.2 3rd Qu.: 55594 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :5 NA's :5
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.53
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.22
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. : 51.47
## NA's :5 NA's :5 NA's :5
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1.0 Min. : 3
## 1st Qu.:-22.843 1st Qu.: 169.4 1st Qu.: 88.0 1st Qu.: 118
## Median :-18.107 Median : 406.5 Median : 247.0 Median : 328
## Mean :-16.455 Mean : 894.2 Mean : 3096.3 Mean : 6572
## 3rd Qu.: -8.491 3rd Qu.: 628.9 3rd Qu.: 815.8 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668.0 Max. :5543127
## NA's :6
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 59 A : 51
## 1st Qu.: 204.43 Vale Do Contestado : 45 B : 168
## Median : 415.81 Amazônia Atlântica : 40 C : 521
## Mean : 1515.52 Araguaia-Tocantins : 39 D :1889
## 3rd Qu.: 1026.38 Cariri : 37 E : 653
## Max. :159533.33 (Other) :3062 NA's:2284
## NA's :1 NA's :2284
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:5566 Min. : 0 Min. : 1
## 1st Qu.: 5451 Class :character 1st Qu.: 4193 1st Qu.: 1725
## Median : 11591 Mode :character Median : 20432 Median : 7428
## Mean : 37452 Mean : 47285 Mean : 176050
## 3rd Qu.: 25303 3rd Qu.: 51227 3rd Qu.: 41240
## Max. :12176866 Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 7 Min. : 17 Min. : -14159
## 1st Qu.: 10107 1st Qu.: 17254 1st Qu.: 42223 1st Qu.: 1305
## Median : 31214 Median : 35838 Median : 119492 Median : 5108
## Mean : 489787 Mean : 123829 Mean : 833504 Mean : 118947
## 3rd Qu.: 115503 3rd Qu.: 89301 3rd Qu.: 314139 3rd Qu.: 22208
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
##
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 15 Min. : 815 Min. : 3191 Length:5566
## 1st Qu.: 43706 1st Qu.: 5480 1st Qu.: 9062 Class :character
## Median : 125153 Median : 11584 Median : 15870 Mode :character
## Mean : 955185 Mean : 37018 Mean : 21132
## 3rd Qu.: 329764 3rd Qu.: 25098 3rd Qu.: 26156
## Max. :687035890 Max. :12038175 Max. :314638
##
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 6.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.573e+07 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.749e+07 Median : 162.0 Median : 2.00 Median : 0.000
## Mean :1.044e+08 Mean : 907.3 Mean : 18.27 Mean : 1.853
## 3rd Qu.:5.679e+07 3rd Qu.: 448.8 3rd Qu.: 8.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00 Max. :274.000
## NA's :1490
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 11.00 Median : 0.0000 Median : 0.000 Median : 4.00
## Mean : 73.49 Mean : 0.4265 Mean : 2.031 Mean : 43.29
## 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
##
## COMP_G COMP_H COMP_I COMP_J
## Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 75.0 Median : 7.00 Median : 7.00 Median : 1.00
## Mean : 348.2 Mean : 41.02 Mean : 55.91 Mean : 24.76
## 3rd Qu.: 199.8 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00
## Max. :150633.0 Max. :19515.00 Max. :29290.00 Max. :38720.00
##
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 4.00
## Mean : 15.56 Mean : 15.15 Mean : 51.33 Mean : 83.76
## 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.00
##
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00
## Median : 2.00 Median : 6.00 Median : 3.00 Median : 2.00
## Mean : 3.27 Mean : 30.98 Mean : 34.17 Mean : 12.19
## 3rd Qu.: 3.00 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00
## Max. :204.00 Max. :16030.00 Max. :22248.00 Max. :6687.00
##
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.64 Mean :0 Mean : 0.05031 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :4679
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 257.5 Mean : 3.383 Mean : 2.829 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :4679 NA's :2224 NA's :2224 NA's :2224
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602
## Median :2.00 Median :3.231e+07 Median :1.339e+08 Median : 1440
## Mean :1.58 Mean :9.180e+09 Mean :6.005e+09 Mean : 9866
## 3rd Qu.:2.00 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4089
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2224 NA's :2224 NA's :2224 NA's :8
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 1 : 125 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 NA's:5441 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median : 2.000
## Mean : 4881 Mean : 5.757 Mean : 4.277
## 3rd Qu.: 3298 3rd Qu.: 1.000 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :130.000
## NA's :8 NA's :8 NA's :5400
## WAL-MART POST_OFFICES
## Min. : 1.000 Min. : 1.000
## 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 1.000 Median : 1.000
## Mean : 2.059 Mean : 2.081
## 3rd Qu.: 1.750 3rd Qu.: 2.000
## Max. :26.000 Max. :225.000
## NA's :5464 NA's :117
There are now no more NA values for GDP_CAPITA
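As a quick cross-check that does not rely on scanning the long summary() output, the NA values in each column can be counted directly. The sketch below is only an illustration: `toy` is a small made-up data frame (not the real brazil data), and it uses base R so that it runs without loading any packages.

```r
# Toy stand-in for the brazil data frame; the values are made up for illustration.
toy <- data.frame(
  CITY       = c("A", "B", "C", "D"),
  GDP_CAPITA = c(3190.6, NA, 15870, NA),
  TAXES      = c(10, 20, NA, 40)
)

# Count the NA values in every column.
na_before <- colSums(is.na(toy))

# Drop the rows where GDP_CAPITA is missing
# (base R equivalent of filter(!is.na(GDP_CAPITA))).
toy_clean <- toy[!is.na(toy$GDP_CAPITA), ]

# Recount to confirm that GDP_CAPITA has no NA values left.
na_after <- colSums(is.na(toy_clean))
print(na_before)
print(na_after)
```

Applied to the real data, the same two `colSums(is.na(...))` calls would give the per-column NA counts before and after the filter in a compact form.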
To better understand which variables are needed to explain GDP_CAPITA, and why, we first carry out exploratory data analysis.
### 5.1 EDA using statistical graphics
We can plot the distribution of GDP_CAPITA by using a histogram.
ggplot(data=brazil, aes(x=`GDP_CAPITA`))+
geom_histogram(bins=20, color="black", fill="light blue")
The histogram above shows a right-skewed distribution: most municipalities have a relatively low GDP per capita, while a few have very high values. We will reduce the skew by applying a log transformation.
brazil <- brazil%>%
mutate(`LOG_GDP_CAPITA`=log(GDP_CAPITA))
Then, plot the LOG_GDP_CAPITA histogram
ggplot(data=brazil, aes(x=`LOG_GDP_CAPITA`))+
geom_histogram(bins=20, color="black", fill="light blue")
The histogram is now much less skewed and roughly resembles a normal distribution.
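The effect of the log transformation can also be checked numerically rather than only by eye. The sketch below is an illustration on simulated data, not on the real GDP_CAPITA column: the log-normal parameters are arbitrary assumptions, and `skewness()` is a small helper defined here (a moment-based sample skewness), not a function from any of the loaded packages.

```r
# Simulated right-skewed values standing in for GDP_CAPITA (not the real data).
set.seed(42)
x <- rlnorm(5000, meanlog = 9.7, sdlog = 0.7)

# Moment-based sample skewness: about 0 for a symmetric distribution,
# greater than 0 for a right-skewed one.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^1.5
}

skew_raw <- skewness(x)       # skewness of the raw values
skew_log <- skewness(log(x))  # skewness after the log transformation
print(c(raw = skew_raw, log = skew_log))
```

A skewness close to zero after the transformation supports what the second histogram suggests; on the real data one could call `skewness(brazil$GDP_CAPITA)` and `skewness(brazil$LOG_GDP_CAPITA)` in the same way.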
Next, we will rename some variables whose original names are long or awkward to type (for example, names containing `$`, `-` or surrounding spaces). This keeps the later code simpler.
names(brazil)[names(brazil) == "IBGE_CROP_PRODUCTION_$"] <- "IBGE_CROP_PRODUCTION"
names(brazil)[names(brazil) == " GVA_TOTAL "] <- "GVA_TOTAL"
names(brazil)[names(brazil) == "IBGE_15-59"] <- "Active_pop"
We draw a boxplot to illustrate the distribution of GDP_CAPITA across the different states in Brazil.
brazil_states <- brazil%>%
group_by(STATE)
ggplot(data=brazil_states, mapping=aes(x=STATE, y=GDP_CAPITA)) + geom_boxplot()+
ggtitle("Distribution of GDP per capita across states in Brazil")
There are a few extreme outliers in some states (BA, MS), while in other states, such as AC and RR, the values are well spread out.
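The outliers seen in a boxplot can also be listed per state. The sketch below is a hedged illustration on made-up data (`S1` and `S2` are hypothetical states). It uses base R's `boxplot.stats()`, which applies the usual 1.5 × IQR rule; ggplot2's `geom_boxplot()` uses the same rule but computes the quartiles slightly differently, so borderline points may not match exactly.

```r
# Toy data: two hypothetical states with made-up GDP per capita values.
toy <- data.frame(
  STATE      = rep(c("S1", "S2"), each = 8),
  GDP_CAPITA = c(10, 11, 12, 12, 13, 14, 15, 90,   # S1 has one extreme value
                 10, 12, 14, 16, 18, 20, 22, 24)   # S2 is evenly spread
)

# boxplot.stats()$out returns the points lying more than
# 1.5 * IQR beyond the hinges, i.e. the candidate outliers.
outliers_by_state <- tapply(
  toy$GDP_CAPITA, toy$STATE,
  function(v) boxplot.stats(v)$out
)
print(outliers_by_state)
```

On the real data, replacing `toy` with `brazil` would give the outlying GDP_CAPITA values for each state.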
We will extract the 10 municipalities with the highest GDP_CAPITA and look for variables that might explain their high values.
brazil_gdp_capita <- brazil%>%
arrange(desc(GDP_CAPITA))%>%
top_n(n=10, wt=GDP_CAPITA)
head(brazil_gdp_capita)
## # A tibble: 6 x 82
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BR~ IBGE_RES_POP_ES~ IBGE_DU
## <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 Paul~ SP 0 82146 81967 179 24311
## 2 Selv~ MS 0 6287 6287 0 2003
## 3 São ~ BA 0 33183 33183 0 9503
## 4 Triu~ RS 0 25793 25787 6 8635
## 5 Brej~ SP 0 2573 2565 8 822
## 6 Seba~ SP 0 3031 3031 0 1055
## # ... with 75 more variables: IBGE_DU_URBAN <dbl>, IBGE_DU_RURAL <dbl>,
## # IBGE_POP <dbl>, IBGE_1 <dbl>, `IBGE_1-4` <dbl>, `IBGE_5-9` <dbl>,
## # `IBGE_10-14` <dbl>, Active_pop <dbl>, `IBGE_60+` <dbl>,
## # IBGE_PLANTED_AREA <dbl>, IBGE_CROP_PRODUCTION <dbl>, `IDHM Ranking
## # 2010` <dbl>, IDHM <dbl>, IDHM_Renda <dbl>, IDHM_Longevidade <dbl>,
## # IDHM_Educacao <dbl>, LONG <dbl>, LAT <dbl>, ALT <dbl>, PAY_TV <dbl>,
## # FIXED_PHONES <dbl>, AREA <dbl>, REGIAO_TUR <fct>, CATEGORIA_TUR <fct>,
## # ESTIMATED_POP <dbl>, RURAL_URBAN <chr>, GVA_AGROPEC <dbl>,
## # GVA_INDUSTRY <dbl>, GVA_SERVICES <dbl>, GVA_PUBLIC <dbl>, GVA_TOTAL <dbl>,
## # TAXES <dbl>, GDP <dbl>, POP_GDP <dbl>, GDP_CAPITA <dbl>, GVA_MAIN <chr>,
## # MUN_EXPENDIT <dbl>, COMP_TOT <dbl>, COMP_A <dbl>, COMP_B <dbl>,
## # COMP_C <dbl>, COMP_D <dbl>, COMP_E <dbl>, COMP_F <dbl>, COMP_G <dbl>,
## # COMP_H <dbl>, COMP_I <dbl>, COMP_J <dbl>, COMP_K <dbl>, COMP_L <dbl>,
## # COMP_M <dbl>, COMP_N <dbl>, COMP_O <dbl>, COMP_P <dbl>, COMP_Q <dbl>,
## # COMP_R <dbl>, COMP_S <dbl>, COMP_T <dbl>, COMP_U <dbl>, HOTELS <dbl>,
## # BEDS <dbl>, Pr_Agencies <dbl>, Pu_Agencies <dbl>, Pr_Bank <dbl>,
## # Pu_Bank <dbl>, Pr_Assets <dbl>, Pu_Assets <dbl>, Cars <dbl>,
## # Motorcycles <dbl>, Wheeled_tractor <dbl>, UBER <fct>, MAC <dbl>,
## # `WAL-MART` <dbl>, POST_OFFICES <dbl>, LOG_GDP_CAPITA <dbl>
We are able to identify several variables that take high values in these municipalities with high GDP_CAPITA.
For example, IBGE_RES_POP, Active_pop, IBGE_CROP_PRODUCTION, IDHM, PAY_TV, FIXED_PHONES, AREA, GVA_TOTAL, TAXES, GDP, POP_GDP, MUN_EXPENDIT, COMP_TOT, Cars and Motorcycles are all high, which suggests that these variables may be correlated with GDP_CAPITA.
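Inspecting a handful of top rows only hints at a relationship; a correlation coefficient computed over all municipalities is a more direct check. The sketch below is an illustration under stated assumptions: the data are randomly generated (only the column names mirror the document), and `candidates` is a hypothetical shortlist.

```r
# Simulated stand-in for the brazil data; the column names mirror the document,
# but the values are randomly generated for illustration only.
set.seed(1)
n <- 200
toy <- data.frame(TAXES = rlnorm(n, 8, 1), AREA = rlnorm(n, 6, 1))
toy$GDP_CAPITA <- 5000 + 2 * toy$TAXES + rnorm(n, sd = 1000)
toy$TAXES[1:3] <- NA  # mimic the missing values in the real data

# Correlation of each candidate variable with GDP_CAPITA,
# using every pair of observations where both values are present.
candidates <- c("TAXES", "AREA")
cors <- sapply(candidates, function(v) {
  cor(toy[[v]], toy$GDP_CAPITA, use = "pairwise.complete.obs")
})
print(round(cors, 2))
```

In this simulated example TAXES is built to drive GDP_CAPITA while AREA is unrelated, so the two coefficients differ sharply. On the real data the same call would show which of the listed variables actually move with GDP_CAPITA.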
### 5.4 Variables selection
Besides the variables identified in the analysis above, we will also consider the expenditure approach to GDP: GDP = C + I + G + (X - M). Simply put, GDP is the sum of Consumption, Investment, Government expenditure and Net Exports (exports minus imports). We will identify variables that might affect each of these components before deciding which factors to use.
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5566 MG : 853 0:5539 Min. : 805
## Class :character SP : 645 1: 27 1st Qu.: 5231
## Mode :character RS : 497 Median : 10936
## BA : 416 Mean : 34296
## PR : 399 3rd Qu.: 23513
## SC : 294 Max. :11253503
## (Other):2462 NA's :5
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.00 Min. : 239 Min. : 60
## 1st Qu.: 5223 1st Qu.: 0.00 1st Qu.: 1572 1st Qu.: 874
## Median : 10934 Median : 0.00 Median : 3178 Median : 1850
## Mean : 34218 Mean : 77.56 Mean : 10308 Mean : 8864
## 3rd Qu.: 23397 3rd Qu.: 10.00 3rd Qu.: 6727 3rd Qu.: 4628
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :5 NA's :5 NA's :7 NA's :7
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3.0 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 486.8 1st Qu.: 2802 1st Qu.: 38.0 1st Qu.: 158
## Median : 931.0 Median : 6177 Median : 92.0 Median : 377
## Mean : 1462.6 Mean : 27612 Mean : 383.5 Mean : 1546
## 3rd Qu.: 1831.2 3rd Qu.: 15306 3rd Qu.: 232.0 3rd Qu.: 952
## Max. :33809.0 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :78 NA's :5 NA's :5 NA's :5
## IBGE_5-9 IBGE_10-14 Active_pop IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 260 1st Qu.: 1735 1st Qu.: 341
## Median : 516 Median : 589 Median : 3842 Median : 723
## Mean : 2071 Mean : 2383 Mean : 18223 Mean : 3006
## 3rd Qu.: 1301 3rd Qu.: 1479 3rd Qu.: 9633 3rd Qu.: 1725
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
## NA's :5 NA's :5 NA's :5 NA's :5
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM Ranking 2010 IDHM
## Min. : 0.0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 910.2 1st Qu.: 2328 1st Qu.:1391 1st Qu.:0.5990
## Median : 3471.5 Median : 13846 Median :2782 Median :0.6650
## Mean : 14180.2 Mean : 57389 Mean :2782 Mean :0.6592
## 3rd Qu.: 11173.2 3rd Qu.: 55594 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :1205669.0 Max. :3274885 Max. :5565 Max. :0.8620
## NA's :5 NA's :5
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.53
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.22
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. : 51.47
## NA's :5 NA's :5 NA's :5
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1.0 Min. : 3
## 1st Qu.:-22.843 1st Qu.: 169.4 1st Qu.: 88.0 1st Qu.: 118
## Median :-18.107 Median : 406.5 Median : 247.0 Median : 328
## Mean :-16.455 Mean : 894.2 Mean : 3096.3 Mean : 6572
## 3rd Qu.: -8.491 3rd Qu.: 628.9 3rd Qu.: 815.8 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668.0 Max. :5543127
## NA's :6
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 59 A : 51
## 1st Qu.: 204.43 Vale Do Contestado : 45 B : 168
## Median : 415.81 Amazônia Atlântica : 40 C : 521
## Mean : 1515.52 Araguaia-Tocantins : 39 D :1889
## 3rd Qu.: 1026.38 Cariri : 37 E : 653
## Max. :159533.33 (Other) :3062 NA's:2284
## NA's :1 NA's :2284
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:5566 Min. : 0 Min. : 1
## 1st Qu.: 5451 Class :character 1st Qu.: 4193 1st Qu.: 1725
## Median : 11591 Mode :character Median : 20432 Median : 7428
## Mean : 37452 Mean : 47285 Mean : 176050
## 3rd Qu.: 25303 3rd Qu.: 51227 3rd Qu.: 41240
## Max. :12176866 Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 7 Min. : 17 Min. : -14159
## 1st Qu.: 10107 1st Qu.: 17254 1st Qu.: 42223 1st Qu.: 1305
## Median : 31214 Median : 35838 Median : 119492 Median : 5108
## Mean : 489787 Mean : 123829 Mean : 833504 Mean : 118947
## 3rd Qu.: 115503 3rd Qu.: 89301 3rd Qu.: 314139 3rd Qu.: 22208
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
##
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 15 Min. : 815 Min. : 3191 Length:5566
## 1st Qu.: 43706 1st Qu.: 5480 1st Qu.: 9062 Class :character
## Median : 125153 Median : 11584 Median : 15870 Mode :character
## Mean : 955185 Mean : 37018 Mean : 21132
## 3rd Qu.: 329764 3rd Qu.: 25098 3rd Qu.: 26156
## Max. :687035890 Max. :12038175 Max. :314638
##
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 6.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.573e+07 1st Qu.: 68.0 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.749e+07 Median : 162.0 Median : 2.00 Median : 0.000
## Mean :1.044e+08 Mean : 907.3 Mean : 18.27 Mean : 1.853
## 3rd Qu.:5.679e+07 3rd Qu.: 448.8 3rd Qu.: 8.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00 Max. :274.000
## NA's :1490
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 11.00 Median : 0.0000 Median : 0.000 Median : 4.00
## Mean : 73.49 Mean : 0.4265 Mean : 2.031 Mean : 43.29
## 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
##
## COMP_G COMP_H COMP_I COMP_J
## Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 75.0 Median : 7.00 Median : 7.00 Median : 1.00
## Mean : 348.2 Mean : 41.02 Mean : 55.91 Mean : 24.76
## 3rd Qu.: 199.8 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00
## Max. :150633.0 Max. :19515.00 Max. :29290.00 Max. :38720.00
##
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 4.00
## Mean : 15.56 Mean : 15.15 Mean : 51.33 Mean : 83.76
## 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.00
##
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 2.00 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.00
## Median : 2.00 Median : 6.00 Median : 3.00 Median : 2.00
## Mean : 3.27 Mean : 30.98 Mean : 34.17 Mean : 12.19
## 3rd Qu.: 3.00 3rd Qu.: 17.00 3rd Qu.: 12.00 3rd Qu.: 6.00
## Max. :204.00 Max. :16030.00 Max. :22248.00 Max. :6687.00
##
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.64 Mean :0 Mean : 0.05031 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :4679
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 257.5 Mean : 3.383 Mean : 2.829 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :4679 NA's :2224 NA's :2224 NA's :2224
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.047e+07 1st Qu.: 602
## Median :2.00 Median :3.231e+07 Median :1.339e+08 Median : 1440
## Mean :1.58 Mean :9.180e+09 Mean :6.005e+09 Mean : 9866
## 3rd Qu.:2.00 3rd Qu.:1.148e+08 3rd Qu.:4.970e+08 3rd Qu.: 4089
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2224 NA's :2224 NA's :2224 NA's :8
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 1 : 125 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 NA's:5441 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median : 2.000
## Mean : 4881 Mean : 5.757 Mean : 4.277
## 3rd Qu.: 3298 3rd Qu.: 1.000 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :130.000
## NA's :8 NA's :8 NA's :5400
## WAL-MART POST_OFFICES LOG_GDP_CAPITA
## Min. : 1.000 Min. : 1.000 Min. : 8.068
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 9.112
## Median : 1.000 Median : 1.000 Median : 9.672
## Mean : 2.059 Mean : 2.081 Mean : 9.697
## 3rd Qu.: 1.750 3rd Qu.: 2.000 3rd Qu.:10.172
## Max. :26.000 Max. :225.000 Max. :12.659
## NA's :5464 NA's :117
Raw Variables that we will consider using:
IBGE_RES_POP
Active_pop
IBGE_CROP_PRODUCTION
IDHM_Longevidade
IDHM_Educacao
IDHM_Renda
PAY_TV
FIXED_PHONES
AREA
GVA_AGROPEC
GVA_INDUSTRY
GVA_SERVICES
TAXES
GDP
POP_GDP
GDP_CAPITA
MUN_EXPENDIT
COMP_TOT
Cars
Motorcycles
Pr_Assets
Pu_Assets
We choose IBGE_RES_POP since it is the resident population of each municipality, covering both Brazilians and foreigners.
IBGE_RES_POP has 5 NA values (see the summary above), which could be due to the particular municipality not having any recorded population. We will remove these rows: dropping so few observations should have little impact on the GDP_CAPITA analysis, and it is hard to assign a value to them.
Once these 5 rows are removed, the number of NAs in other variables decreases too (IBGE_DU_RURAL, for example, goes from 78 to 73). This is likely because the same municipalities have NA values in several variables.
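Whether the NA values really sit in the same rows can be checked directly. The sketch below is a minimal illustration on a made-up data frame (`toy` is not the real data): for each column it compares the total number of NAs with the number that fall in the rows where IBGE_RES_POP is missing.

```r
# Toy stand-in for the brazil data frame; the values are made up for illustration.
toy <- data.frame(
  IBGE_RES_POP = c(805, NA, 10936, NA, 23513),
  IBGE_DU      = c(239, NA, 3178, NA, NA),
  AREA         = c(3.57, 204.4, NA, 415.8, 1026.4)
)

# Rows that would be dropped by filter(!is.na(IBGE_RES_POP)).
dropped <- toy[is.na(toy$IBGE_RES_POP), ]

# For each column: total NAs, and how many of them sit in the dropped rows.
na_total      <- colSums(is.na(toy))
na_in_dropped <- colSums(is.na(dropped))
print(rbind(total = na_total, in_dropped_rows = na_in_dropped))
```

A column whose NA count falls after the filter (IBGE_DU in this toy example) shares missing rows with IBGE_RES_POP, while a column whose count is unchanged (AREA here) is missing in different rows.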
brazil <- brazil%>%
filter(!is.na(`IBGE_RES_POP`))
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:5561 MG : 853 0:5534 Min. : 805
## Class :character SP : 645 1: 27 1st Qu.: 5231
## Mode :character RS : 496 Median : 10936
## BA : 416 Mean : 34296
## PR : 399 3rd Qu.: 23513
## SC : 292 Max. :11253503
## (Other):2460
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 805 Min. : 0.00 Min. : 239 Min. : 60
## 1st Qu.: 5223 1st Qu.: 0.00 1st Qu.: 1572 1st Qu.: 874
## Median : 10934 Median : 0.00 Median : 3178 Median : 1850
## Mean : 34218 Mean : 77.56 Mean : 10308 Mean : 8864
## 3rd Qu.: 23397 3rd Qu.: 10.00 3rd Qu.: 6727 3rd Qu.: 4628
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :2 NA's :2
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3.0 Min. : 174 Min. : 0.0 Min. : 5
## 1st Qu.: 486.8 1st Qu.: 2802 1st Qu.: 38.0 1st Qu.: 158
## Median : 931.0 Median : 6177 Median : 92.0 Median : 377
## Mean : 1462.6 Mean : 27612 Mean : 383.5 Mean : 1546
## 3rd Qu.: 1831.2 3rd Qu.: 15306 3rd Qu.: 232.0 3rd Qu.: 952
## Max. :33809.0 Max. :10463636 Max. :129464.0 Max. :514794
## NA's :73
## IBGE_5-9 IBGE_10-14 Active_pop IBGE_60+
## Min. : 7 Min. : 12 Min. : 94 Min. : 29
## 1st Qu.: 220 1st Qu.: 260 1st Qu.: 1735 1st Qu.: 341
## Median : 516 Median : 589 Median : 3842 Median : 723
## Mean : 2071 Mean : 2383 Mean : 18223 Mean : 3006
## 3rd Qu.: 1301 3rd Qu.: 1479 3rd Qu.: 9633 3rd Qu.: 1725
## Max. :684443 Max. :783702 Max. :7058221 Max. :1293012
##
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM Ranking 2010 IDHM
## Min. : 0 Min. : 0 Min. : 1 Min. :0.4180
## 1st Qu.: 911 1st Qu.: 2333 1st Qu.:1391 1st Qu.:0.5990
## Median : 3473 Median : 13845 Median :2782 Median :0.6650
## Mean : 14171 Mean : 57356 Mean :2782 Mean :0.6592
## 3rd Qu.: 11171 3rd Qu.: 55490 3rd Qu.:4173 3rd Qu.:0.7180
## Max. :1205669 Max. :3274885 Max. :5565 Max. :0.8620
##
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4000 Min. :0.6720 Min. :0.2070 Min. :-72.92
## 1st Qu.:0.5720 1st Qu.:0.7690 1st Qu.:0.4900 1st Qu.:-50.87
## Median :0.6540 Median :0.8080 Median :0.5600 Median :-46.52
## Mean :0.6429 Mean :0.8016 Mean :0.5591 Mean :-46.21
## 3rd Qu.:0.7070 3rd Qu.:0.8360 3rd Qu.:0.6310 3rd Qu.:-41.41
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. : 51.47
##
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1 Min. : 3
## 1st Qu.:-22.841 1st Qu.: 169.4 1st Qu.: 88 1st Qu.: 118
## Median :-18.097 Median : 406.5 Median : 247 Median : 328
## Mean :-16.450 Mean : 894.2 Mean : 3099 Mean : 6577
## 3rd Qu.: -8.490 3rd Qu.: 628.9 3rd Qu.: 816 3rd Qu.: 1151
## Max. : 4.585 Max. :874579.0 Max. :2047668 Max. :5543127
## NA's :1
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 59 A : 51
## 1st Qu.: 204.54 Vale Do Contestado : 45 B : 168
## Median : 415.86 Amazônia Atlântica : 40 C : 521
## Mean : 1515.03 Araguaia-Tocantins : 39 D :1889
## 3rd Qu.: 1025.73 Cariri : 37 E : 648
## Max. :159533.33 (Other) :3057 NA's:2284
## NA's :1 NA's :2284
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:5561 Min. : 0 Min. : 1
## 1st Qu.: 5450 Class :character 1st Qu.: 4193 1st Qu.: 1724
## Median : 11591 Mode :character Median : 20434 Median : 7432
## Mean : 37477 Mean : 47277 Mean : 176171
## 3rd Qu.: 25311 3rd Qu.: 51238 3rd Qu.: 41026
## Max. :12176866 Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 7 Min. : 17 Min. : -14159
## 1st Qu.: 10112 1st Qu.: 17252 1st Qu.: 42253 1st Qu.: 1303
## Median : 31216 Median : 35747 Median : 119481 Median : 5108
## Mean : 490191 Mean : 123904 Mean : 834110 Mean : 119046
## 3rd Qu.: 115644 3rd Qu.: 89363 3rd Qu.: 314190 3rd Qu.: 22251
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
##
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 15 Min. : 815 Min. : 3191 Length:5561
## 1st Qu.: 43645 1st Qu.: 5481 1st Qu.: 9062 Class :character
## Median : 125111 Median : 11584 Median : 15866 Mode :character
## Mean : 955869 Mean : 37043 Mean : 21125
## 3rd Qu.: 329780 3rd Qu.: 25114 3rd Qu.: 26157
## Max. :687035890 Max. :12038175 Max. :314638
##
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 6 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.573e+07 1st Qu.: 68 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.749e+07 Median : 163 Median : 2.00 Median : 0.000
## Mean :1.045e+08 Mean : 908 Mean : 18.28 Mean : 1.854
## 3rd Qu.:5.681e+07 3rd Qu.: 450 3rd Qu.: 8.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446 Max. :1948.00 Max. :274.000
## NA's :1489
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 3.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 11.00 Median : 0.0000 Median : 0.000 Median : 4.00
## Mean : 73.55 Mean : 0.4267 Mean : 2.032 Mean : 43.31
## 3rd Qu.: 39.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 15.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
##
## COMP_G COMP_H COMP_I COMP_J
## Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 32.0 1st Qu.: 1.00 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 75.0 Median : 7.00 Median : 7.00 Median : 1.00
## Mean : 348.5 Mean : 41.05 Mean : 55.96 Mean : 24.78
## 3rd Qu.: 200.0 3rd Qu.: 25.00 3rd Qu.: 24.00 3rd Qu.: 5.00
## Max. :150633.0 Max. :19515.00 Max. :29290.00 Max. :38720.00
##
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 1.00
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 4.00
## Mean : 15.58 Mean : 15.16 Mean : 51.37 Mean : 83.82
## 3rd Qu.: 2.00 3rd Qu.: 3.00 3rd Qu.: 13.00 3rd Qu.: 14.00
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.00
##
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 1.000 Min. : 0 Min. : 0.0 Min. : 0.0
## 1st Qu.: 2.000 1st Qu.: 2 1st Qu.: 1.0 1st Qu.: 0.0
## Median : 2.000 Median : 6 Median : 3.0 Median : 2.0
## Mean : 3.272 Mean : 31 Mean : 34.2 Mean : 12.2
## 3rd Qu.: 3.000 3rd Qu.: 17 3rd Qu.: 12.0 3rd Qu.: 6.0
## Max. :204.000 Max. :16030 Max. :22248.0 Max. :6687.0
##
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 5.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 12.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 51.68 Mean :0 Mean : 0.05035 Mean : 3.131
## 3rd Qu.: 31.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :123.00000 Max. :97.000
## NA's :4674
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 2.0 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 40.0 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 0.000
## Median : 82.0 Median : 1.000 Median : 2.00 Median : 1.000
## Mean : 257.5 Mean : 3.384 Mean : 2.83 Mean : 1.312
## 3rd Qu.: 200.0 3rd Qu.: 2.000 3rd Qu.: 2.00 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.00 Max. :83.000
## NA's :4674 NA's :2220 NA's :2220 NA's :2220
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.00 Min. :0.000e+00 Min. :0.000e+00 Min. : 2
## 1st Qu.:1.00 1st Qu.:0.000e+00 1st Qu.:4.048e+07 1st Qu.: 602
## Median :2.00 Median :3.234e+07 Median :1.339e+08 Median : 1440
## Mean :1.58 Mean :9.183e+09 Mean :6.007e+09 Mean : 9873
## 3rd Qu.:2.00 3rd Qu.:1.149e+08 3rd Qu.:4.976e+08 3rd Qu.: 4095
## Max. :8.00 Max. :1.947e+13 Max. :8.016e+12 Max. :5740995
## NA's :2220 NA's :2220 NA's :2220 NA's :8
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 4 Min. : 0.000 1 : 125 Min. : 1.000
## 1st Qu.: 591 1st Qu.: 0.000 NA's:5436 1st Qu.: 1.000
## Median : 1285 Median : 0.000 Median : 2.000
## Mean : 4885 Mean : 5.761 Mean : 4.277
## 3rd Qu.: 3299 3rd Qu.: 1.000 3rd Qu.: 3.000
## Max. :1134570 Max. :3236.000 Max. :130.000
## NA's :8 NA's :8 NA's :5395
## WAL-MART POST_OFFICES LOG_GDP_CAPITA
## Min. : 1.000 Min. : 1.000 Min. : 8.068
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 9.112
## Median : 1.000 Median : 1.000 Median : 9.672
## Mean : 2.059 Mean : 2.081 Mean : 9.697
## 3rd Qu.: 1.750 3rd Qu.: 2.000 3rd Qu.:10.172
## Max. :26.000 Max. :225.000 Max. :12.659
## NA's :5459 NA's :117
IBGE_CROP_PRODUCTION is chosen because it represents the earnings from crop production, and these earnings contribute to the economy. Agriculture is also widely reported to be one of the principal bases of Brazil’s economy.
##### IDHM_Longevidade
IDHM is the Human Development Index, which is calculated from IDHM_Longevidade, IDHM_Educacao, and IDHM_Renda.
The Human Development Index may be related to the economy: a higher value indicates a longer life expectancy (IDHM_Longevidade) and a higher education level (IDHM_Educacao), which can raise Gross National Income because more people are able to work and contribute to the economy.
We will extract these three variables and analyse each of them separately.
There is 1 NA value in IDHM_Longevidade. As it is hard to assign a value to it, we will drop this row.
brazil <- brazil%>%
filter(!is.na(IDHM_Longevidade))
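The NA handling in this section follows one pattern: count the missing values in a column, then drop the affected rows. A minimal sketch of that pattern in base R is shown here, assuming a small made-up table (`toy`, `model_cols` and the values are illustrative only and are not taken from BRAZIL_CITIES.csv). It counts NAs per column and reports how many rows a single complete-case filter would remove.

```r
# Toy stand-in for the municipality table (values are made up).
toy <- data.frame(
  CITY             = c("A", "B", "C", "D"),
  IDHM_Longevidade = c(0.81, NA, 0.77, 0.84),
  AREA             = c(120.5, 300.2, NA, 95.0),
  Cars             = c(1500, 220, 310, NA)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows that are complete for the modelling columns.
model_cols <- c("IDHM_Longevidade", "AREA", "Cars")
toy_clean  <- toy[complete.cases(toy[, model_cols]), ]

# Report how many rows the filter removed.
rows_dropped <- nrow(toy) - nrow(toy_clean)
cat("Rows dropped:", rows_dropped, "\n")
```

Tracking the number of dropped rows at each step makes it easier to see how much of the sample each variable costs.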
PAY_TV and FIXED_PHONES are selected because they are luxury goods that only better-off people can afford. A municipality with a higher number of PAY_TV and/or FIXED_PHONES may therefore have a higher GDP, although this could also be due to other factors. We will include PAY_TV and FIXED_PHONES in our analysis for now.
There are no NA values in PAY_TV or FIXED_PHONES.
A municipality with a bigger area might have more space for economic development, which could affect GDP.
There is 1 NA value for AREA, possibly because the city is not developed enough for its area to have been calculated. As it is hard to assign a value to it, we will drop this row.
brazil <- brazil%>%
filter(!is.na(`AREA`))
A higher MUN_EXPENDIT means higher expenditure in the municipality, which might reflect individuals being better off. Expenditure might in turn affect GDP.
There are 1489 NA values in MUN_EXPENDIT. These may be municipalities with no recorded expenditure, but as it is difficult to assign a value to them, we will drop these rows.
brazil <- brazil%>%
filter(!is.na(MUN_EXPENDIT))
Owning a vehicle such as a car or a motorcycle is part of private consumption, which might have an impact on GDP.
There are 8 NA values for Cars, possibly because these cities have no cars. As we cannot confidently assign a value to these cities, we will drop the rows with NA values.
brazil <- brazil%>%
filter(!is.na(`Cars`))
The summary above shows 8 NA values for Motorcycles, possibly because these cities have no motorcycles. As we cannot confidently assign a value to these cities, we will drop the rows with NA values.
brazil <- brazil%>%
filter(!is.na(`Motorcycles`))
summary(brazil)
## CITY STATE CAPITAL IBGE_RES_POP
## Length:4066 MG : 615 0:4041 Min. : 815
## Class :character SP : 542 1: 25 1st Qu.: 5156
## Mode :character RS : 468 Median : 11031
## PR : 332 Mean : 37587
## BA : 250 3rd Qu.: 23922
## SC : 249 Max. :11253503
## (Other):1610
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN
## Min. : 815 Min. : 0.00 Min. : 290 Min. : 60
## 1st Qu.: 5150 1st Qu.: 0.00 1st Qu.: 1574 1st Qu.: 865
## Median : 11017 Median : 0.00 Median : 3241 Median : 1924
## Mean : 37491 Mean : 96.74 Mean : 11420 Mean : 10018
## 3rd Qu.: 23904 3rd Qu.: 11.00 3rd Qu.: 6948 3rd Qu.: 5050
## Max. :11133776 Max. :119727.00 Max. :3576148 Max. :3548433
## NA's :1 NA's :1
## IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1-4
## Min. : 3 Min. : 174 Min. : 0.0 Min. : 5.0
## 1st Qu.: 477 1st Qu.: 2722 1st Qu.: 37.0 1st Qu.: 150.2
## Median : 915 Median : 6308 Median : 92.0 Median : 375.0
## Mean : 1421 Mean : 30805 Mean : 420.8 Mean : 1691.2
## 3rd Qu.: 1748 3rd Qu.: 16381 3rd Qu.: 240.0 3rd Qu.: 979.0
## Max. :33809 Max. :10463636 Max. :129464.0 Max. :514794.0
## NA's :55
## IBGE_5-9 IBGE_10-14 Active_pop IBGE_60+
## Min. : 7.0 Min. : 12 Min. : 94 Min. : 36.0
## 1st Qu.: 212.0 1st Qu.: 250 1st Qu.: 1702 1st Qu.: 339.2
## Median : 514.5 Median : 592 Median : 3941 Median : 754.0
## Mean : 2262.5 Mean : 2613 Mean : 20404 Mean : 3413.6
## 3rd Qu.: 1334.8 3rd Qu.: 1536 3rd Qu.: 10388 3rd Qu.: 1864.2
## Max. :684443.0 Max. :783702 Max. :7058221 Max. :1293012.0
##
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM Ranking 2010 IDHM
## Min. : 0 Min. : 0 Min. : 1 Min. :0.4400
## 1st Qu.: 1037 1st Qu.: 2762 1st Qu.:1203 1st Qu.:0.6100
## Median : 4038 Median : 16585 Median :2472 Median :0.6800
## Mean : 15485 Mean : 61425 Mean :2576 Mean :0.6686
## 3rd Qu.: 12625 3rd Qu.: 60599 3rd Qu.:3904 3rd Qu.:0.7230
## Max. :1205669 Max. :3274885 Max. :5564 Max. :0.8620
##
## IDHM_Renda IDHM_Longevidade IDHM_Educacao LONG
## Min. :0.4170 Min. :0.6720 Min. :0.2660 Min. :-72.92
## 1st Qu.:0.5830 1st Qu.:0.7750 1st Qu.:0.5030 1st Qu.:-51.40
## Median :0.6680 Median :0.8130 Median :0.5730 Median :-47.40
## Mean :0.6533 Mean :0.8062 Mean :0.5704 Mean :-46.65
## 3rd Qu.:0.7150 3rd Qu.:0.8400 3rd Qu.:0.6390 3rd Qu.:-41.78
## Max. :0.8910 Max. :0.8940 Max. :0.8250 Max. : 51.47
##
## LAT ALT PAY_TV FIXED_PHONES
## Min. :-33.688 Min. : 0.0 Min. : 1.0 Min. : 4
## 1st Qu.:-23.640 1st Qu.: 195.3 1st Qu.: 90.0 1st Qu.: 137
## Median :-19.806 Median : 430.9 Median : 260.0 Median : 375
## Mean :-17.545 Mean : 1022.2 Mean : 3641.6 Mean : 7899
## 3rd Qu.: -9.681 3rd Qu.: 641.5 3rd Qu.: 880.5 3rd Qu.: 1430
## Max. : 3.350 Max. :874579.0 Max. :2047668.0 Max. :5543127
## NA's :1
## AREA REGIAO_TUR CATEGORIA_TUR
## Min. : 3.57 Corredores Das Águas: 49 A : 46
## 1st Qu.: 196.56 Vale Do Contestado : 37 B : 140
## Median : 401.38 Cariri : 30 C : 414
## Mean : 1320.71 Rota Do Yucumã : 30 D :1383
## 3rd Qu.: 943.82 Trilhas Do Rio Doce : 29 E : 485
## Max. :122461.09 (Other) :2293 NA's:1598
## NA's :1598
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## Min. : 786 Length:4066 Min. : 0 Min. : 1
## 1st Qu.: 5354 Class :character 1st Qu.: 4835 1st Qu.: 1880
## Median : 11661 Mode :character Median : 22184 Median : 8559
## Mean : 41050 Mean : 49549 Mean : 204934
## 3rd Qu.: 25854 3rd Qu.: 54289 3rd Qu.: 50500
## Max. :12176866 Max. :1402282 Max. :63306755
##
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES
## Min. : 2 Min. : 9 Min. : 17 Min. : -235
## 1st Qu.: 10570 1st Qu.: 17205 1st Qu.: 45494 1st Qu.: 1468
## Median : 34758 Median : 36062 Median : 126260 Median : 6116
## Mean : 581146 Mean : 135527 Mean : 952261 Mean : 141943
## 3rd Qu.: 133709 3rd Qu.: 92256 3rd Qu.: 342166 3rd Qu.: 26746
## Max. :464656988 Max. :41902893 Max. :569910503 Max. :117125387
##
## GDP POP_GDP GDP_CAPITA GVA_MAIN
## Min. : 18 Min. : 815 Min. : 4586 Length:4066
## 1st Qu.: 46564 1st Qu.: 5380 1st Qu.: 9706 Class :character
## Median : 133937 Median : 11636 Median : 17614 Mode :character
## Mean : 1107127 Mean : 40570 Mean : 22469
## 3rd Qu.: 364308 3rd Qu.: 25550 3rd Qu.: 28018
## Max. :687035890 Max. :12038175 Max. :314638
##
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B
## Min. :1.421e+06 Min. : 8.0 Min. : 0.00 Min. : 0.000
## 1st Qu.:1.574e+07 1st Qu.: 76.0 1st Qu.: 1.00 1st Qu.: 0.000
## Median :2.749e+07 Median : 184.0 Median : 3.00 Median : 0.000
## Mean :1.046e+08 Mean : 1063.5 Mean : 20.35 Mean : 1.933
## 3rd Qu.:5.680e+07 3rd Qu.: 519.8 3rd Qu.: 10.00 3rd Qu.: 2.000
## Max. :4.577e+10 Max. :530446.0 Max. :1948.00 Max. :274.000
##
## COMP_C COMP_D COMP_E COMP_F
## Min. : 0.00 Min. : 0.0000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 4.00 1st Qu.: 0.0000 1st Qu.: 0.000 1st Qu.: 1.00
## Median : 14.00 Median : 0.0000 Median : 0.000 Median : 5.00
## Mean : 85.95 Mean : 0.4921 Mean : 2.342 Mean : 51.08
## 3rd Qu.: 47.00 3rd Qu.: 0.0000 3rd Qu.: 1.000 3rd Qu.: 17.00
## Max. :31566.00 Max. :332.0000 Max. :657.000 Max. :25222.00
##
## COMP_G COMP_H COMP_I COMP_J
## Min. : 2 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 34 1st Qu.: 2.00 1st Qu.: 2.00 1st Qu.: 0.00
## Median : 81 Median : 9.00 Median : 8.00 Median : 1.00
## Mean : 400 Mean : 48.33 Mean : 65.35 Mean : 30.48
## 3rd Qu.: 231 3rd Qu.: 29.00 3rd Qu.: 28.00 3rd Qu.: 6.00
## Max. :150633 Max. :19515.00 Max. :29290.00 Max. :38720.00
##
## COMP_K COMP_L COMP_M COMP_N
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 1.00 1st Qu.: 2.0
## Median : 0.00 Median : 0.00 Median : 4.00 Median : 5.0
## Mean : 19.63 Mean : 18.61 Mean : 62.66 Mean : 102.8
## 3rd Qu.: 3.00 3rd Qu.: 4.00 3rd Qu.: 15.00 3rd Qu.: 17.0
## Max. :23738.00 Max. :14003.00 Max. :49181.00 Max. :76757.0
##
## COMP_O COMP_P COMP_Q COMP_R
## Min. : 1.000 Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 2.000 1st Qu.: 2.00 1st Qu.: 1.00 1st Qu.: 0.0
## Median : 2.000 Median : 6.00 Median : 4.00 Median : 2.0
## Mean : 3.425 Mean : 35.48 Mean : 40.57 Mean : 14.5
## 3rd Qu.: 4.000 3rd Qu.: 19.00 3rd Qu.: 14.00 3rd Qu.: 7.0
## Max. :153.000 Max. :16030.00 Max. :22248.00 Max. :6687.0
##
## COMP_S COMP_T COMP_U HOTELS
## Min. : 0.00 Min. :0 Min. : 0.00000 Min. : 1.000
## 1st Qu.: 6.00 1st Qu.:0 1st Qu.: 0.00000 1st Qu.: 1.000
## Median : 14.00 Median :0 Median : 0.00000 Median : 1.000
## Mean : 59.36 Mean :0 Mean : 0.03788 Mean : 3.381
## 3rd Qu.: 35.00 3rd Qu.:0 3rd Qu.: 0.00000 3rd Qu.: 3.000
## Max. :24832.00 Max. :0 Max. :64.00000 Max. :97.000
## NA's :3425
## BEDS Pr_Agencies Pu_Agencies Pr_Bank
## Min. : 5.0 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 42.0 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 90.0 Median : 1.000 Median : 2.000 Median : 1.000
## Mean : 290.8 Mean : 3.919 Mean : 3.069 Mean : 1.383
## 3rd Qu.: 239.0 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :13247.0 Max. :1693.000 Max. :626.000 Max. :83.000
## NA's :3425 NA's :1544 NA's :1544 NA's :1544
## Pu_Bank Pr_Assets Pu_Assets Cars
## Min. :0.000 Min. :0.000e+00 Min. :0.000e+00 Min. : 3
## 1st Qu.:1.000 1st Qu.:0.000e+00 1st Qu.:4.478e+07 1st Qu.: 723
## Median :2.000 Median :3.749e+07 Median :1.505e+08 Median : 1657
## Mean :1.615 Mean :1.198e+10 Mean :4.336e+09 Mean : 11497
## 3rd Qu.:2.000 3rd Qu.:1.296e+08 3rd Qu.:5.602e+08 3rd Qu.: 4727
## Max. :8.000 Max. :1.947e+13 Max. :2.893e+12 Max. :5740995
## NA's :1544 NA's :1544 NA's :1544
## Motorcycles Wheeled_tractor UBER MAC
## Min. : 33.0 Min. : 0.000 1 : 103 Min. : 1.000
## 1st Qu.: 607.2 1st Qu.: 0.000 NA's:3963 1st Qu.: 1.000
## Median : 1370.5 Median : 0.000 Median : 2.000
## Mean : 5401.2 Mean : 7.081 Mean : 4.597
## 3rd Qu.: 3638.0 3rd Qu.: 2.000 3rd Qu.: 4.000
## Max. :1134570.0 Max. :3236.000 Max. :130.000
## NA's :3927
## WAL-MART POST_OFFICES LOG_GDP_CAPITA
## Min. : 1.000 Min. : 1.000 Min. : 8.431
## 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 9.181
## Median : 1.000 Median : 1.000 Median : 9.776
## Mean : 2.151 Mean : 2.179 Mean : 9.764
## 3rd Qu.: 2.000 3rd Qu.: 2.000 3rd Qu.:10.241
## Max. :26.000 Max. :225.000 Max. :12.659
## NA's :3973 NA's :85
Pr_Assets and Pu_Assets are selected because assets can earn interest and fund investment, and investment is a factor that affects GDP.
There are 1544 NA values for Pr_Assets, possibly because these cities have no private assets. As it is difficult to assign a value to them, we will drop these rows.
brazil <- brazil%>%
filter(!is.na(Pr_Assets))
There are 1544 NA values for Pu_Assets, possibly because these cities have no public assets. As it is difficult to assign a value to them, we will drop these rows.
brazil <- brazil%>%
filter(!is.na(Pu_Assets))
After removing all the NA values, we can compute a correlation matrix.
We will select our variables first.
brazil3 <- brazil%>%
dplyr::select("IBGE_RES_POP", "Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV", "FIXED_PHONES", "AREA", "GVA_AGROPEC", "GVA_INDUSTRY", "GVA_SERVICES", "TAXES", "GDP", "POP_GDP", "GDP_CAPITA", "MUN_EXPENDIT", "COMP_TOT", "Cars", "Motorcycles", "Pr_Assets", "Pu_Assets")
Before doing so, we will double-check the selected variables for NAs.
summary(brazil3)
## IBGE_RES_POP Active_pop IBGE_CROP_PRODUCTION IDHM_Longevidade
## Min. : 1641 Min. : 358 Min. : 0 Min. :0.6770
## 1st Qu.: 10328 1st Qu.: 3826 1st Qu.: 6572 1st Qu.:0.7940
## Median : 18948 Median : 7970 Median : 32439 Median :0.8230
## Mean : 56654 Mean : 31660 Mean : 85956 Mean :0.8164
## 3rd Qu.: 37652 3rd Qu.: 17681 3rd Qu.: 88482 3rd Qu.:0.8450
## Max. :11253503 Max. :7058221 Max. :3274885 Max. :0.8940
## IDHM_Educacao IDHM_Renda PAY_TV FIXED_PHONES
## Min. :0.3150 Min. :0.4380 Min. : 7.0 Min. : 30
## 1st Qu.:0.5320 1st Qu.:0.6240 1st Qu.: 220.2 1st Qu.: 375
## Median :0.6020 Median :0.6930 Median : 583.5 Median : 902
## Mean :0.5932 Mean :0.6766 Mean : 5779.7 Mean : 12631
## 3rd Qu.:0.6590 3rd Qu.:0.7290 3rd Qu.: 1763.8 3rd Qu.: 3096
## Max. :0.8250 Max. :0.8910 Max. :2047668.0 Max. :5543127
## AREA GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## Min. : 3.61 Min. : 0 Min. : 2 Min. : 9
## 1st Qu.: 249.04 1st Qu.: 11106 1st Qu.: 5661 1st Qu.: 30409
## Median : 495.38 Median : 34661 Median : 23559 Median : 91102
## Mean : 1684.11 Mean : 67995 Mean : 324442 Mean : 925640
## 3rd Qu.: 1195.48 3rd Qu.: 80643 3rd Qu.: 139071 3rd Qu.: 266119
## Max. :122461.09 Max. :1402282 Max. :63306755 Max. :464656988
## TAXES GDP POP_GDP GDP_CAPITA
## Min. : -235 Min. : 34 Min. : 1573 Min. : 4849
## 1st Qu.: 4210 1st Qu.: 116175 1st Qu.: 10810 1st Qu.: 12565
## Median : 15862 Median : 259097 Median : 20184 Median : 21051
## Mean : 226337 Mean : 1738249 Mean : 61249 Mean : 25624
## 3rd Qu.: 61560 3rd Qu.: 709298 3rd Qu.: 40625 3rd Qu.: 31663
## Max. :117125387 Max. :687035890 Max. :12038175 Max. :314638
## MUN_EXPENDIT COMP_TOT Cars Motorcycles
## Min. :2.823e+06 Min. : 15.0 Min. : 7 Min. : 114
## 1st Qu.:2.593e+07 1st Qu.: 188.0 1st Qu.: 1671 1st Qu.: 1197
## Median :4.494e+07 Median : 377.0 Median : 3442 Median : 2650
## Mean :1.575e+08 Mean : 1664.0 Mean : 18065 Mean : 8205
## 3rd Qu.:8.971e+07 3rd Qu.: 926.8 3rd Qu.: 9085 3rd Qu.: 6498
## Max. :4.577e+10 Max. :530446.0 Max. :5740995 Max. :1134570
## Pr_Assets Pu_Assets
## Min. :0.000e+00 Min. :0.000e+00
## 1st Qu.:0.000e+00 1st Qu.:4.478e+07
## Median :3.749e+07 Median :1.505e+08
## Mean :1.198e+10 Mean :4.336e+09
## 3rd Qu.:1.296e+08 3rd Qu.:5.602e+08
## Max. :1.947e+13 Max. :2.893e+12
After confirming that there are no NA values, we can now plot the correlation matrix.
corrplot(cor(brazil3), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.4, method="number", type="upper")
As seen in the correlation plot, many variables are highly correlated. To reduce this multicollinearity, we will derive new variables.
In addition, IBGE_RES_POP and Active_pop have a correlation value of 1, so the two variables carry nearly the same information. Both measure the population of the municipality, but we choose Active_pop because it is the economically active group, which earns income, spends it, and pays taxes, and is therefore likely to be the most closely related to GDP_CAPITA.
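Highly correlated pairs can also be listed programmatically instead of being read off the plot. The sketch below is a minimal base R illustration on simulated data. The variable names `pop`, `active_pop` and `area` and the 0.8 cut-off are assumptions made for the example, not values used in this analysis.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
toy <- data.frame(
  pop        = x,
  active_pop = x + rnorm(n, sd = 0.05),  # nearly a copy of pop
  area       = rnorm(n)                  # unrelated to the others
)

cm <- cor(toy)

# Keep each pair once by blanking the lower triangle and the diagonal.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Positions whose absolute correlation exceeds the assumed cut-off.
threshold <- 0.8
hits <- which(abs(cm) > threshold, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(cm)[hits[, "row"]],
  var2 = colnames(cm)[hits[, "col"]],
  r    = cm[hits]
)
print(high_pairs)
```

A table like `high_pairs` gives a short list of candidates from which one variable of each pair can be dropped or transformed.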
Since GDP_CAPITA is GDP divided by population, we will, for consistency, also divide several other variables by population.
We divide PAY_TV by population to approximate the number of PAY_TV per person.
brazil <- brazil %>%
mutate(PAY_TV_p = PAY_TV/POP_GDP)
We divide FIXED_PHONES by population to approximate the number of fixed phones per person.
brazil <- brazil %>%
mutate(FIXED_PHONES_p = FIXED_PHONES/POP_GDP)
We divide Cars by population to approximate the number of cars per person.
brazil <- brazil %>%
mutate(Cars_p =Cars/POP_GDP)
We divide Motorcycles by population to approximate the number of motorcycles per person.
brazil <- brazil %>%
mutate(Motorcycles_p = Motorcycles/POP_GDP)
GVA (gross value added) tells us how much value is added in a municipality and is used in the calculation of GDP. We divide GVA_AGROPEC by population to approximate the gross value added per person.
brazil <- brazil %>%
mutate(GVA_AGROPEC_p = GVA_AGROPEC/POP_GDP)
We divide GVA_INDUSTRY by population to approximate the gross value added per person.
brazil <- brazil %>%
mutate(GVA_INDUSTRY_p = GVA_INDUSTRY/POP_GDP)
We divide GVA_SERVICES by population to approximate the gross value added per person.
brazil <- brazil %>%
mutate(GVA_SERVICES_p = GVA_SERVICES/POP_GDP)
We divide MUN_EXPENDIT by population to approximate the expenditure per person in each municipality.
brazil <- brazil %>%
mutate(MUN_EXPENDIT_p = MUN_EXPENDIT/POP_GDP)
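The per-capita variables above all follow the same rule: divide a column by POP_GDP and store the result under the original name with a `_p` suffix. As a sketch of the same idea in one step, the base R loop below applies that rule to a list of columns. The toy values and the `count_vars` vector are assumptions for illustration only.

```r
toy <- data.frame(
  POP_GDP = c(1000, 5000, 20000),
  PAY_TV  = c(50, 400, 3000),
  Cars    = c(200, 1500, 9000)
)

# Columns to convert to per-capita values (illustrative choice).
count_vars <- c("PAY_TV", "Cars")

# Append a "<name>_p" column for each listed variable.
for (v in count_vars) {
  toy[[paste0(v, "_p")]] <- toy[[v]] / toy$POP_GDP
}
print(toy)
```

Keeping the variable list in one place makes it less likely that a column is divided by the wrong denominator or skipped.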
We will also derive new variables such as pop_density: the higher the population density of a city, the more spending is likely to take place in that city, which could increase GDP_CAPITA. pop_density is calculated as population divided by area.
##### pop_density
brazil <- brazil %>%
mutate(pop_density = POP_GDP/AREA)
We will also derive a tax-to-GDP ratio, which measures tax revenue relative to the size of the economy. It is calculated as taxes divided by GDP. A higher ratio suggests a greater ability to sustain economic growth.
##### tax_to_gdp
brazil <- brazil %>%
mutate(tax_to_gdp = TAXES/GDP)
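A ratio like this is worth a quick range check. If TAXES and GDP are recorded in the same units (an assumption here), the ratio would normally be expected to lie between 0 and 1, yet the summary of the derived variables further down reports a mean of about 10.15 and a maximum of 324.67 for tax_to_gdp. The sketch below, on made-up values, shows one way to flag rows outside the expected range for inspection. The 0 to 1 range is itself an assumption.

```r
toy <- data.frame(
  CITY  = c("A", "B", "C"),
  TAXES = c(120, 900, 4000),
  GDP   = c(1500, 10000, 40)   # C has an implausibly small GDP
)
toy$tax_to_gdp <- toy$TAXES / toy$GDP

# Flag rows where the ratio falls outside the assumed 0..1 range.
suspect <- toy[toy$tax_to_gdp < 0 | toy$tax_to_gdp > 1, ]
print(suspect)
```

Rows flagged this way are candidates for a closer look at the underlying TAXES and GDP values before the ratio is used in a model.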
Select variables first
brazil4 <- brazil%>%
dplyr::select("Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV_p", "FIXED_PHONES_p", "GVA_AGROPEC_p", "GVA_INDUSTRY_p", "GVA_SERVICES_p", "tax_to_gdp", "MUN_EXPENDIT_p", "GDP_CAPITA", "COMP_TOT", "Cars_p", "Motorcycles_p", "Pr_Assets", "Pu_Assets", "pop_density")
We will again confirm that there are no NA values in our data set.
summary(brazil4)
## Active_pop IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Educacao
## Min. : 358 Min. : 0 Min. :0.6770 Min. :0.3150
## 1st Qu.: 3826 1st Qu.: 6572 1st Qu.:0.7940 1st Qu.:0.5320
## Median : 7970 Median : 32439 Median :0.8230 Median :0.6020
## Mean : 31660 Mean : 85956 Mean :0.8164 Mean :0.5932
## 3rd Qu.: 17681 3rd Qu.: 88482 3rd Qu.:0.8450 3rd Qu.:0.6590
## Max. :7058221 Max. :3274885 Max. :0.8940 Max. :0.8250
## IDHM_Renda PAY_TV_p FIXED_PHONES_p GVA_AGROPEC_p
## Min. :0.4380 Min. :0.0006389 Min. :0.001669 Min. : 0.0000
## 1st Qu.:0.6240 1st Qu.:0.0168189 1st Qu.:0.028060 1st Qu.: 0.3333
## Median :0.6930 Median :0.0339047 Median :0.068502 Median : 1.5590
## Mean :0.6766 Mean :0.0470778 Mean :0.086474 Mean : 3.7226
## 3rd Qu.:0.7290 3rd Qu.:0.0614278 3rd Qu.:0.117982 3rd Qu.: 4.8311
## Max. :0.8910 Max. :0.4805407 Max. :1.037262 Max. :75.9531
## GVA_INDUSTRY_p GVA_SERVICES_p tax_to_gdp MUN_EXPENDIT_p
## Min. : 0.00017 Min. : 0.00124 Min. : -2.0920 Min. : 90.45
## 1st Qu.: 0.43654 1st Qu.: 2.49250 1st Qu.: 0.0398 1st Qu.: 1912.71
## Median : 1.54206 Median : 6.22514 Median : 0.0691 Median : 2353.07
## Mean : 4.46843 Mean : 7.78589 Mean : 10.1533 Mean : 2582.38
## 3rd Qu.: 4.67058 3rd Qu.: 10.88331 3rd Qu.: 0.1127 3rd Qu.: 2912.14
## Max. :183.79405 Max. :108.80127 Max. :324.6684 Max. :12680.22
## GDP_CAPITA COMP_TOT Cars_p Motorcycles_p
## Min. : 4849 Min. : 15.0 Min. :0.0003393 Min. :0.008056
## 1st Qu.: 12565 1st Qu.: 188.0 1st Qu.:0.1123884 1st Qu.:0.095797
## Median : 21051 Median : 377.0 Median :0.2648455 Median :0.133799
## Mean : 25624 Mean : 1664.0 Mean :0.2440236 Mean :0.146095
## 3rd Qu.: 31663 3rd Qu.: 926.8 3rd Qu.:0.3552831 3rd Qu.:0.183595
## Max. :314638 Max. :530446.0 Max. :0.6409822 Max. :0.534068
## Pr_Assets Pu_Assets pop_density
## Min. :0.000e+00 Min. :0.000e+00 Min. : 0.225
## 1st Qu.:0.000e+00 1st Qu.:4.478e+07 1st Qu.: 17.222
## Median :3.749e+07 Median :1.505e+08 Median : 33.820
## Mean :1.198e+10 Mean :4.336e+09 Mean : 179.058
## 3rd Qu.:1.296e+08 3rd Qu.:5.602e+08 3rd Qu.: 82.319
## Max. :1.947e+13 Max. :2.893e+12 Max. :13533.497
corrplot(cor(brazil4), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="number", type="upper")
corrplot(cor(brazil4), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="square", type="upper")
We chose to order the variables alphabetically for easy identification. From the correlation plot, we see that Active_pop is highly correlated with COMP_TOT (correlation value = 0.97). In view of this, we should include only one of them in the subsequent model building. As a result, COMP_TOT is excluded from the subsequent model building.
corrplot(cor(brazil4[,c(1:13, 15:19)]), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="number", type="upper")
corrplot(cor(brazil4[,c(1:13, 15:19)]), diag=FALSE, order="alphabet", tl.pos="td", tl.cex=0.5,number.cex=0.5, method="square", type="upper")
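The calls above exclude COMP_TOT by position (column 14 of brazil4). As a hedged alternative, the column can be dropped by name, which keeps working if the column order changes. The sketch below uses a three-column toy data frame. Only the column names are borrowed from the analysis.

```r
toy <- data.frame(Active_pop = 1:3, COMP_TOT = 4:6, Cars_p = 7:9)

# Drop the excluded variable by name rather than by position.
keep <- setdiff(names(toy), "COMP_TOT")
toy_reduced <- toy[, keep]
print(names(toy_reduced))
```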
Having selected the variables we need, we can now build the model.
We will fit a linear model of GDP_CAPITA on all the independent variables that we have identified previously.
Before joining, we found that the municipality names in name_muni (in mun) and CITY (in brazil) are not written consistently. We will therefore convert both variables to upper case.
mun$name_muni <- toupper(mun$name_muni)
brazil$CITY <- toupper(brazil$CITY)
brazil_mun <- right_join(mun, brazil, by=c("name_muni"="CITY", "abbrev_state"="STATE"))
## Warning: Column `abbrev_state`/`STATE` joining character vector and factor,
## coercing into character vector
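Because the join relies on text keys, it can help to check which attribute rows have no matching boundary record. The sketch below is a minimal base R illustration with two made-up tables, `toy_mun` and `toy_brazil`, which stand in for mun and brazil and are not the real data. It builds an upper-cased name-plus-state key and lists the rows that would be left without a match.

```r
toy_mun <- data.frame(
  name_muni    = c("Curitiba", "Rio De Janeiro", "Recife"),
  abbrev_state = c("PR", "RJ", "PE"),
  stringsAsFactors = FALSE
)
toy_brazil <- data.frame(
  CITY  = c("CURITIBA", "Rio de Janeiro", "Manaus"),
  STATE = c("PR", "RJ", "AM"),
  stringsAsFactors = FALSE
)

# Build a comparable key: upper-cased name plus state code.
key_mun    <- paste(toupper(toy_mun$name_muni), toy_mun$abbrev_state, sep = "|")
key_brazil <- paste(toupper(toy_brazil$CITY), toy_brazil$STATE, sep = "|")

# Attribute rows that would find no boundary polygon in the join.
unmatched <- toy_brazil[!key_brazil %in% key_mun, ]
print(unmatched)
```

The toy names are plain ASCII on purpose. Real municipality names contain accented characters, and how `toupper()` treats them can depend on the locale, so a check like this is a useful complement to the upper-casing step.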
Check for missing geometries
any(is.na(st_dimension(brazil_mun)))
## [1] FALSE
qtm(brazil_mun, "GDP_CAPITA", border=NULL, scale=0.6) + tm_legend(main.title="GDP_CAPITA in 2016", main.title.position="centre")
Select variables to use first
brazil6 <- brazil%>%
dplyr::select("Active_pop", "IBGE_CROP_PRODUCTION", "IDHM_Longevidade", "IDHM_Educacao", "IDHM_Renda", "PAY_TV_p", "FIXED_PHONES_p", "GVA_AGROPEC_p", "GVA_INDUSTRY_p", "GVA_SERVICES_p", "tax_to_gdp", "MUN_EXPENDIT_p", "GDP_CAPITA", "Cars_p", "Motorcycles_p", "Pr_Assets", "Pu_Assets", "pop_density")
Before we conduct the linear regression, we will plot histograms of each of the variables, to have an idea of the distribution of the variables.
Active_pop <- ggplot(data=brazil6, aes(x=Active_pop))+
geom_histogram(bins=25, color="black", fill="light blue")
IBGE_CROP_PRODUCTION <- ggplot(data=brazil6, aes(x=IBGE_CROP_PRODUCTION))+
geom_histogram(bins=25, color="black", fill="light blue")
IDHM_Longevidade <- ggplot(data=brazil6, aes(x=IDHM_Longevidade))+
geom_histogram(bins=25, color="black", fill="light blue")
IDHM_Educacao <- ggplot(data=brazil6, aes(x=IDHM_Educacao))+
geom_histogram(bins=25, color="black", fill="light blue")
IDHM_Renda <- ggplot(data=brazil6, aes(x=IDHM_Renda))+
geom_histogram(bins=25, color="black", fill="light blue")
PAY_TV_p <- ggplot(data=brazil6, aes(x=PAY_TV_p))+
geom_histogram(bins=25, color="black", fill="light blue")
FIXED_PHONES_p <- ggplot(data=brazil6, aes(x=FIXED_PHONES_p))+
geom_histogram(bins=25, color="black", fill="light blue")
GVA_AGROPEC_p <- ggplot(data=brazil6, aes(x=GVA_AGROPEC_p))+
geom_histogram(bins=25, color="black", fill="light blue")
GVA_INDUSTRY_p <- ggplot(data=brazil6, aes(x=GVA_INDUSTRY_p))+
geom_histogram(bins=25, color="black", fill="light blue")
GVA_SERVICES_p <- ggplot(data=brazil6, aes(x=GVA_SERVICES_p))+
geom_histogram(bins=25, color="black", fill="light blue")
tax_to_gdp <- ggplot(data=brazil6, aes(x=tax_to_gdp))+
geom_histogram(bins=25, color="black", fill="light blue")
MUN_EXPENDIT_p <- ggplot(data=brazil6, aes(x=MUN_EXPENDIT_p))+
geom_histogram(bins=25, color="black", fill="light blue")
GDP_CAPITA <- ggplot(data=brazil6, aes(x=GDP_CAPITA))+
geom_histogram(bins=25, color="black", fill="light blue")
Cars_p <- ggplot(data=brazil6, aes(x=Cars_p))+
geom_histogram(bins=25, color="black", fill="light blue")
Motorcycles_p <- ggplot(data=brazil6, aes(x=Motorcycles_p))+
geom_histogram(bins=25, color="black", fill="light blue")
Pr_Assets <- ggplot(data=brazil6, aes(x=Pr_Assets))+
geom_histogram(bins=25, color="black", fill="light blue")
Pu_Assets <- ggplot(data=brazil6, aes(x=Pu_Assets)) +
geom_histogram(bins=25, color="black", fill="light blue")
pop_density <- ggplot(data=brazil6, aes(x=pop_density))+
geom_histogram(bins=25, color="black", fill="light blue")
ggarrange(Active_pop, IBGE_CROP_PRODUCTION, IDHM_Longevidade, IDHM_Educacao, IDHM_Renda, PAY_TV_p, FIXED_PHONES_p, GVA_AGROPEC_p, GVA_INDUSTRY_p, GVA_SERVICES_p, tax_to_gdp, MUN_EXPENDIT_p, GDP_CAPITA, Cars_p, Motorcycles_p, Pr_Assets, Pu_Assets, pop_density, ncol=3, nrow=6)
We can tell from the histograms that most of the variables are skewed, either left or right. Some, however, resemble a normal distribution, for example IDHM_Longevidade, IDHM_Educacao, IDHM_Renda, and Motorcycles_p (even though it is slightly right-skewed).
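Skewness can also be quantified rather than judged by eye. The sketch below defines a simple moment-based skewness function in base R and applies it to simulated data. The `skewness` helper and the simulated variables are illustrative assumptions, not part of the packages loaded for this exercise. Values near 0 indicate a roughly symmetric distribution, and positive values indicate a long right tail.

```r
# Moment-based sample skewness: third central moment over sd cubed.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

set.seed(42)
symmetric    <- rnorm(5000)    # roughly normal
right_skewed <- rlnorm(5000)   # long right tail

sk_sym   <- skewness(symmetric)
sk_right <- skewness(right_skewed)
sk_log   <- skewness(log(right_skewed))  # log transform removes the tail

print(round(c(symmetric = sk_sym, right = sk_right, logged = sk_log), 2))
```

The last value illustrates why a log transform is often considered for strongly right-skewed variables such as income or population counts.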
brazil_lm <- lm(GDP_CAPITA~., data=brazil6)
summary(brazil_lm)
##
## Call:
## lm(formula = GDP_CAPITA ~ ., data = brazil6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36833 -3483 -1012 1588 223485
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.314e+04 5.651e+03 -2.326 0.02011 *
## Active_pop -1.338e-03 2.810e-03 -0.476 0.63404
## IBGE_CROP_PRODUCTION 6.195e-03 1.121e-03 5.524 3.65e-08 ***
## IDHM_Longevidade 1.839e+03 8.235e+03 0.223 0.82335
## IDHM_Educacao -2.345e+03 3.799e+03 -0.617 0.53706
## IDHM_Renda 1.903e+04 6.977e+03 2.728 0.00642 **
## PAY_TV_p -1.145e+04 5.345e+03 -2.142 0.03231 *
## FIXED_PHONES_p 1.249e+04 4.289e+03 2.912 0.00362 **
## GVA_AGROPEC_p 7.222e+02 3.940e+01 18.328 < 2e-16 ***
## GVA_INDUSTRY_p 1.091e+03 2.146e+01 50.864 < 2e-16 ***
## GVA_SERVICES_p 8.076e+02 2.991e+01 27.005 < 2e-16 ***
## tax_to_gdp 1.586e+01 5.950e+00 2.665 0.00774 **
## MUN_EXPENDIT_p 3.499e+00 2.213e-01 15.813 < 2e-16 ***
## Cars_p 4.868e+03 2.923e+03 1.665 0.09598 .
## Motorcycles_p 2.096e+03 2.758e+03 0.760 0.44735
## Pr_Assets 1.751e-09 8.438e-10 2.076 0.03804 *
## Pu_Assets -1.197e-08 7.664e-09 -1.562 0.11835
## pop_density 1.315e+00 3.109e-01 4.230 2.42e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9102 on 2504 degrees of freedom
## Multiple R-squared: 0.8263, Adjusted R-squared: 0.8251
## F-statistic: 700.6 on 17 and 2504 DF, p-value: < 2.2e-16
AIC(brazil_lm)
## [1] 53159.52
BIC(brazil_lm)
## [1] 53270.34
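The p-values in a coefficient table like the one above can also be extracted programmatically, which avoids transcription slips when deciding which variables to keep. The following is a minimal sketch on simulated data, where `x1`, `x2` and the 0.05 threshold are assumptions for the example. It pulls the `Pr(>|t|)` column from `summary()` and splits the predictors by significance.

```r
set.seed(7)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 + rnorm(n)     # y depends on x1 only
fit <- lm(y ~ x1 + x2)

# Coefficient matrix: estimate, std. error, t value, p-value.
coefs    <- summary(fit)$coefficients
p_values <- coefs[-1, "Pr(>|t|)"]   # drop the intercept row

alpha <- 0.05
significant     <- names(p_values)[p_values < alpha]
not_significant <- names(p_values)[p_values >= alpha]

print(significant)
print(not_significant)
```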
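With reference to the report above, it is clear that not all the independent variables are statistically significant. We will revise the model by removing IDHM_Longevidade, IDHM_Educacao, Cars_p and Motorcycles_p, which are not statistically significant (p > 0.05), together with PAY_TV_p. Note that Active_pop and Pu_Assets are kept in the revised model even though their p-values are also above 0.05.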
brazil_lm2 <- lm(GDP_CAPITA~Active_pop + IBGE_CROP_PRODUCTION + IDHM_Renda + Pr_Assets + Pu_Assets + FIXED_PHONES_p + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + pop_density, data=brazil6)
summary(brazil_lm2)
##
## Call:
## lm(formula = GDP_CAPITA ~ Active_pop + IBGE_CROP_PRODUCTION +
## IDHM_Renda + Pr_Assets + Pu_Assets + FIXED_PHONES_p + GVA_AGROPEC_p +
## GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp +
## pop_density, data = brazil6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -37751 -3439 -1025 1539 223491
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.605e+04 2.363e+03 -6.794 1.36e-11 ***
## Active_pop -2.318e-03 2.765e-03 -0.838 0.40198
## IBGE_CROP_PRODUCTION 6.010e-03 1.104e-03 5.446 5.66e-08 ***
## IDHM_Renda 2.578e+04 3.909e+03 6.595 5.18e-11 ***
## Pr_Assets 1.978e-09 8.398e-10 2.355 0.01858 *
## Pu_Assets -1.093e-08 7.641e-09 -1.431 0.15258
## FIXED_PHONES_p 9.577e+03 3.901e+03 2.455 0.01415 *
## GVA_AGROPEC_p 7.374e+02 3.851e+01 19.150 < 2e-16 ***
## GVA_INDUSTRY_p 1.094e+03 2.131e+01 51.342 < 2e-16 ***
## GVA_SERVICES_p 8.016e+02 2.970e+01 26.993 < 2e-16 ***
## MUN_EXPENDIT_p 3.384e+00 2.148e-01 15.757 < 2e-16 ***
## tax_to_gdp 1.555e+01 5.941e+00 2.617 0.00893 **
## pop_density 1.211e+00 3.079e-01 3.932 8.67e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9109 on 2509 degrees of freedom
## Multiple R-squared: 0.8257, Adjusted R-squared: 0.8248
## F-statistic: 990.2 on 12 and 2509 DF, p-value: < 2.2e-16
ols_regress(brazil_lm2)
## Model Summary
## --------------------------------------------------------------------
## R 0.909 RMSE 9109.326
## R-Squared 0.826 Coef. Var 35.549
## Adj. R-Squared 0.825 MSE 82979821.453
## Pred R-Squared 0.819 MAE 4298.952
## --------------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## -----------------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## -----------------------------------------------------------------------------------
## Regression 986015686339.237 12 82167973861.603 990.216 0.0000
## Residual 208196372025.838 2509 82979821.453
## Total 1.194212e+12 2521
## -----------------------------------------------------------------------------------
##
## Parameter Estimates
## -------------------------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## -------------------------------------------------------------------------------------------------------------
## (Intercept) -16052.762 2362.934 -6.794 0.000 -20686.263 -11419.261
## Active_pop -0.002 0.003 -0.019 -0.838 0.402 -0.008 0.003
## IBGE_CROP_PRODUCTION 0.006 0.001 0.053 5.446 0.000 0.004 0.008
## IDHM_Renda 25776.575 3908.704 0.086 6.595 0.000 18111.958 33441.193
## Pr_Assets 0.000 0.000 0.038 2.355 0.019 0.000 0.000
## Pu_Assets 0.000 0.000 -0.034 -1.431 0.153 0.000 0.000
## FIXED_PHONES_p 9576.562 3900.617 0.035 2.455 0.014 1927.802 17225.321
## GVA_AGROPEC_p 737.390 38.506 0.201 19.150 0.000 661.883 812.896
## GVA_INDUSTRY_p 1094.057 21.309 0.508 51.342 0.000 1052.271 1135.842
## GVA_SERVICES_p 801.646 29.699 0.302 26.993 0.000 743.410 859.882
## MUN_EXPENDIT_p 3.384 0.215 0.167 15.757 0.000 2.963 3.806
## tax_to_gdp 15.546 5.941 0.022 2.617 0.009 3.896 27.195
## pop_density 1.211 0.308 0.041 3.932 0.000 0.607 1.814
## -------------------------------------------------------------------------------------------------------------
AIC(brazil_lm2)
## [1] 53158.51
BIC(brazil_lm2)
## [1] 53240.17
Adjusted R2 = 0.8248
AIC = 53158.51
BIC = 53240.17
In order to use this model for the rest of our analysis, we need to check the assumptions of multiple linear regression.
##### 6.2.1 Check for multicollinearity
ols_vif_tol(brazil_lm)
## Variables Tolerance VIF
## 1 Active_pop 0.1280218 7.811170
## 2 IBGE_CROP_PRODUCTION 0.7140220 1.400517
## 3 IDHM_Longevidade 0.2998088 3.335459
## 4 IDHM_Educacao 0.2956733 3.382112
## 5 IDHM_Renda 0.1267334 7.890583
## 6 PAY_TV_p 0.5532049 1.807648
## 7 FIXED_PHONES_p 0.2860655 3.495703
## 8 GVA_AGROPEC_p 0.5983846 1.671166
## 9 GVA_INDUSTRY_p 0.7002448 1.428072
## 10 GVA_SERVICES_p 0.5456202 1.832777
## 11 tax_to_gdp 0.9697863 1.031155
## 12 MUN_EXPENDIT_p 0.5852129 1.708780
## 13 Cars_p 0.2096788 4.769200
## 14 Motorcycles_p 0.8963326 1.115657
## 15 Pr_Assets 0.2709912 3.690157
## 16 Pu_Assets 0.1250262 7.998324
## 17 pop_density 0.6246143 1.600988
Since the VIFs of all independent variables are below 10, we conclude that there is no sign of serious multicollinearity among them. Active_pop, IDHM_Renda and Pu_Assets have VIFs close to 8, which indicates moderate correlation with other predictors, but not enough to be a significant cause for concern.
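To make the Tolerance and VIF columns concrete, the following is a minimal sketch in base R on simulated data (the variables x1, x2 and x3 are hypothetical and are not part of the Brazil data set). It assumes the usual definition, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors, and Tolerance = 1 / VIF.

```r
# Simulated predictors (hypothetical data, not the Brazil data set)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1
x3 <- rnorm(n)                  # generated independently of x1 and x2
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
tolerance <- 1 / vif_manual

round(cbind(Tolerance = tolerance, VIF = vif_manual), 3)
```

In this toy example x1 and x2 should show clearly larger VIFs than x3, which mirrors how the near-8 values in the table above flag the more strongly correlated predictors.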
Test the assumption of linearity and additivity of the relationship between the dependent and independent variables
ols_plot_resid_fit(brazil_lm2)
Looking at the residuals vs fitted values plot, the red line is approximately at 0. There is no pattern in the residual plot, suggesting that we can assume a linear relationship between the predictors and outcome variables.
ols_plot_resid_hist(brazil_lm2)
The figure suggests that the residuals of the multiple linear regression model resemble a normal distribution.
We will run another test to further verify this assumption.
fnorm <- fitdist(residuals(brazil_lm2), distr="norm")
summary(fnorm)
## Fitting of the distribution ' norm ' by maximum likelihood
## Parameters :
## estimate Std. Error
## mean 1.639439e-13 179.8293
## sd 9.085818e+03 128.1039
## Loglikelihood: -26565.26 AIC: 53134.51 BIC: 53146.18
## Correlation matrix:
## mean sd
## mean 1 0
## sd 0 1
plot(fnorm)
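The Q-Q plot and P-P plot appear to show no large deviation from the normal distribution. From the code chunk below, at the 5% significance level, the KS statistic is smaller than the computed critical value (0.1968861 < 0.3205551), which on its own would mean that we do not reject the null hypothesis that the residuals are normally distributed. However, this critical value is computed with length(brazil6), which for a data frame returns the number of columns (18) rather than the number of observations. With n = 2522 observations, 1.36/sqrt(n) is approximately 0.027, which the KS statistic exceeds, and the Anderson-Darling statistic is infinite. The evidence that the residuals are normally distributed is therefore weak.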
par(mfrow=c(2,2))
plot(brazil_lm2)
gofstat(fnorm,discrete=FALSE)
## Goodness-of-fit statistics
## 1-mle-norm
## Kolmogorov-Smirnov statistic 0.1968861
## Cramer-von Mises statistic 39.4846927
## Anderson-Darling statistic Inf
##
## Goodness-of-fit criteria
## 1-mle-norm
## Akaike's Information Criterion 53134.51
## Bayesian Information Criterion 53146.18
KScritvalue <- 1.36/sqrt(length(brazil6))
KScritvalue
## [1] 0.3205551
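The difference between length() and nrow() on a data frame matters for the critical value computed above. The sketch below uses a toy data frame (hypothetical, not brazil6) and a helper named ks_crit, introduced here only for illustration. It assumes the common large-sample approximation 1.36 / sqrt(n) for the 5% critical value of the one-sample KS test.

```r
# Toy data frame (hypothetical): 100 observations, 3 columns
set.seed(123)
toy <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

length(toy)  # number of columns
nrow(toy)    # number of observations

# Large-sample 5% critical value for the one-sample KS test
ks_crit <- function(n) 1.36 / sqrt(n)

ks_crit(length(toy))  # based on the column count, far too lenient
ks_crit(nrow(toy))    # based on the sample size
```

Using the row count gives a much stricter threshold, so a statistic that passes against the column-based value may still fail against the sample-size-based one.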
Looking at the scale-location plot, the variance of the residuals increases with the fitted values, suggesting heteroscedasticity.
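A visual reading of the scale-location plot can be complemented by a formal test. The following is a minimal base R sketch of the studentized (Koenker) form of the Breusch-Pagan test on simulated data. The data and object names are hypothetical, and this test is not part of the original analysis. The statistic is n times the R² of an auxiliary regression of the squared residuals on the predictors, compared against a chi-squared distribution.

```r
# Simulated data (hypothetical) where the error spread grows with x
set.seed(42)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor
aux <- lm(residuals(fit)^2 ~ x)

# Test statistic n * R^2, chi-squared with df = number of predictors (here 1)
bp_stat <- n * summary(aux)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p.value = bp_p)
```

A small p-value indicates that the residual variance depends on the predictors, which is the pattern described for the plot above.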
As some of our assumptions are not met, we fit a second specification, regressing log(GDP_CAPITA) on all other variables, to dampen extreme values and reduce heteroscedasticity.
brazil_lglm <- lm(log(GDP_CAPITA)~., data=brazil6)
summary(brazil_lglm)
##
## Call:
## lm(formula = log(GDP_CAPITA) ~ ., data = brazil6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.40751 -0.15373 -0.01677 0.13063 1.99913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.418e+00 1.628e-01 39.413 < 2e-16 ***
## Active_pop 1.626e-07 8.097e-08 2.008 0.04474 *
## IBGE_CROP_PRODUCTION 2.764e-07 3.232e-08 8.553 < 2e-16 ***
## IDHM_Longevidade 7.303e-01 2.373e-01 3.077 0.00211 **
## IDHM_Educacao 1.116e-01 1.095e-01 1.019 0.30822
## IDHM_Renda 3.231e+00 2.011e-01 16.068 < 2e-16 ***
## PAY_TV_p -4.996e-01 1.540e-01 -3.243 0.00120 **
## FIXED_PHONES_p 7.280e-02 1.236e-01 0.589 0.55592
## GVA_AGROPEC_p 2.209e-02 1.136e-03 19.451 < 2e-16 ***
## GVA_INDUSTRY_p 1.766e-02 6.184e-04 28.556 < 2e-16 ***
## GVA_SERVICES_p 1.453e-02 8.619e-04 16.858 < 2e-16 ***
## tax_to_gdp 8.545e-04 1.715e-04 4.983 6.67e-07 ***
## MUN_EXPENDIT_p 8.822e-05 6.377e-06 13.834 < 2e-16 ***
## Cars_p 5.273e-01 8.424e-02 6.259 4.53e-10 ***
## Motorcycles_p 5.670e-02 7.949e-02 0.713 0.47572
## Pr_Assets 2.495e-14 2.432e-14 1.026 0.30494
## Pu_Assets -6.188e-13 2.209e-13 -2.801 0.00513 **
## pop_density 1.161e-05 8.959e-06 1.296 0.19511
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2623 on 2504 degrees of freedom
## Multiple R-squared: 0.8402, Adjusted R-squared: 0.8391
## F-statistic: 774.6 on 17 and 2504 DF, p-value: < 2.2e-16
AIC(brazil_lglm)
## [1] 427.1781
BIC(brazil_lglm)
## [1] 538.0015
With reference to the report above, it is clear that not all the independent variables are statistically significant. We will revise the model by removing those variables which are not statistically significant.
##### 6.3.1 Selecting statistically significant variables
brazil_lglm2 <- lm(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil6)
summary(brazil_lglm2)
##
## Call:
## lm(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + IDHM_Longevidade +
## IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p +
## MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387, Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
ols_regress(brazil_lglm2)
## Model Summary
## -------------------------------------------------------------
## R 0.916 RMSE 0.263
## R-Squared 0.839 Coef. Var 2.651
## Adj. R-Squared 0.838 MSE 0.069
## Pred R-Squared 0.833 MAE 0.189
## -------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
##
## ANOVA
## ------------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## ------------------------------------------------------------------------
## Regression 904.463 9 100.496 1451.153 0.0000
## Residual 173.962 2512 0.069
## Total 1078.424 2521
## ------------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------------
## (Intercept) 6.462 0.158 40.848 0.000 6.152 6.773
## IBGE_CROP_PRODUCTION 0.000 0.000 0.081 8.616 0.000 0.000 0.000
## IDHM_Longevidade 0.663 0.236 0.041 2.813 0.005 0.201 1.126
## IDHM_Renda 3.362 0.174 0.375 19.370 0.000 3.022 3.703
## GVA_AGROPEC_p 0.022 0.001 0.202 20.869 0.000 0.020 0.024
## GVA_INDUSTRY_p 0.018 0.001 0.275 28.963 0.000 0.017 0.019
## GVA_SERVICES_p 0.015 0.001 0.183 17.938 0.000 0.013 0.016
## MUN_EXPENDIT_p 0.000 0.000 0.134 13.231 0.000 0.000 0.000
## tax_to_gdp 0.001 0.000 0.040 4.928 0.000 0.001 0.001
## Cars_p 0.521 0.080 0.108 6.492 0.000 0.364 0.678
## ----------------------------------------------------------------------------------------------
AIC(brazil_lglm2)
## [1] 435.3718
BIC(brazil_lglm2)
## [1] 499.5327
Adjusted R2=0.8381
AIC = 435.3718
BIC = 499.5327
ols_plot_resid_hist(brazil_lglm2)
fnorm <- fitdist(residuals(brazil_lglm2), distr="norm")
summary(fnorm)
## Fitting of the distribution ' norm ' by maximum likelihood
## Parameters :
## estimate Std. Error
## mean 6.268681e-18 0.005229764
## sd 2.626362e-01 0.003697760
## Loglikelihood: -206.6859 AIC: 417.3718 BIC: 429.0374
## Correlation matrix:
## mean sd
## mean 1 0
## sd 0 1
plot(fnorm)
par(mfrow=c(2,2))
plot(brazil_lglm2)
gofstat(fnorm,discrete=FALSE)
## Goodness-of-fit statistics
## 1-mle-norm
## Kolmogorov-Smirnov statistic 0.05891883
## Cramer-von Mises statistic 3.24343354
## Anderson-Darling statistic 21.14080686
##
## Goodness-of-fit criteria
## 1-mle-norm
## Akaike's Information Criterion 417.3718
## Bayesian Information Criterion 429.0374
KScritvalue <- 1.36/sqrt(length(brazil6))
KScritvalue
## [1] 0.3205551
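The plots are now more favourable and the adjusted R2 is higher (0.8381 against 0.8248). The AIC and BIC values are also much lower, although they are not directly comparable with those of the earlier model because the response variable is now on the log scale. The set of significant variables has changed slightly too.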
summary(brazil_lglm2)
##
## Call:
## lm(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION + IDHM_Longevidade +
## IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p +
## MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387, Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
The following equation includes variables that are significant in the log multiple linear regression model.
###### log(GDP_CAPITA) = 6.462 + 2.758e-07 IBGE_CROP_PRODUCTION + 0.6633 IDHM_Longevidade + 3.362 IDHM_Renda + 0.5208 Cars_p + 0.02223 GVA_AGROPEC_p + 0.01779 GVA_INDUSTRY_p + 0.01459 GVA_SERVICES_p + 8.189e-05 MUN_EXPENDIT_p + 8.434e-04 tax_to_gdp
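Because the fitted model has log(GDP_CAPITA) as its response, each coefficient acts multiplicatively on GDP per capita rather than additively. As a rough sketch of how to read them, the snippet below takes three coefficients from the summary output above and converts each into the approximate percentage change in GDP per capita for a one-unit increase in the predictor, holding the other variables constant.

```r
# Coefficients copied from the summary(brazil_lglm2) output above
b <- c(GVA_AGROPEC_p  = 0.02223,
       GVA_INDUSTRY_p = 0.01779,
       GVA_SERVICES_p = 0.01459)

# In a log-linear model, a one-unit increase in x multiplies the response by exp(b),
# i.e. changes it by (exp(b) - 1) * 100 percent
pct_change <- (exp(b) - 1) * 100
round(pct_change, 2)
```

For small coefficients such as these, the percentage change is close to 100 times the coefficient itself.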
Next, we visualise the residuals of the multiple linear regression model fitted above.
In order to perform the spatial autocorrelation test, we will need to convert brazil into a SpatialPointsDataFrame.
brazil.point.sf <- st_as_sf(brazil, coords=c("LONG", "LAT"), crs=4674)%>%
st_transform(crs=4674)
brazil.polygon.sf <- right_join(mun, brazil, by=c("name_muni" = "CITY", "abbrev_state" = "STATE"))
## Warning: Column `abbrev_state`/`STATE` joining character vector and factor,
## coercing into character vector
brazil.polygon.sf <- st_as_sf(brazil.polygon.sf, crs=4674) %>%
st_transform(crs=4674)
Next, we will export the residuals of the model and save them as a separate data frame.
mlr.output <- as.data.frame(brazil_lglm2$residuals)
We will then append the model residuals to the brazil.point.sf and brazil.polygon.sf objects as a new column, MLR_RES.
brazil.point.res.sf <- cbind(brazil.point.sf,
brazil_lglm2$residuals) %>%
rename(`MLR_RES` = `brazil_lglm2.residuals`)
brazil.polygon.res.sf <- cbind(brazil.polygon.sf,
brazil_lglm2$residuals) %>%
rename(`MLR_RES` = `brazil_lglm2.residuals`)
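Note that cbind() attaches the residuals by position, so it relies on the rows of the spatial objects being in the same order as the data used to fit the model. Where that ordering is in doubt, one alternative is to join on a key. The sketch below illustrates the idea in base R with a toy data frame. The id column and all object names are hypothetical and are not taken from the original data.

```r
# Toy data (hypothetical) with an identifier column
dat <- data.frame(id = c("a", "b", "c", "d"),
                  x  = c(1, 2, 3, 4),
                  y  = c(2.1, 3.9, 6.2, 7.8))
fit <- lm(y ~ x, data = dat)

# Keep the identifier next to each residual
res <- data.frame(id = dat$id, MLR_RES = residuals(fit))

# A copy of the data in a different row order, as might result from a join
shuffled <- dat[c(3, 1, 4, 2), ]

# Joining by key attaches each residual to the right row regardless of order
joined <- merge(shuffled, res, by = "id", sort = FALSE)
joined
```

With a key-based join, a reordering introduced by an earlier step does not silently misalign the residuals.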
We will now inspect the spatial point and spatial polygon objects respectively.
brazil.point.res.sf
## Simple feature collection with 2522 features and 91 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -72.9165 ymin: -33.68757 xmax: -34.82395 ymax: 2.816682
## geographic CRS: SIRGAS 2000
## First 10 features:
## CITY STATE CAPITAL IBGE_RES_POP IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR
## 1 ABADIÂNIA GO 0 15757 15609 148
## 2 ABDON BATISTA SC 0 2653 2653 0
## 3 ABREU E LIMA PE 0 94429 94407 22
## 4 AÇAILÂNDIA MA 0 104047 104018 29
## 5 ACAJUTIBA BA 0 14653 14643 10
## 6 ACARÁ PA 0 53569 53516 53
## 7 ACARAÚ CE 0 57551 57542 9
## 8 ACEGUÁ RS 0 4394 4265 129
## 9 ACOPIARA CE 0 51160 51160 0
## 10 ACRELÂNDIA AC 0 12538 12535 3
## IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9
## 1 4655 3233 1422 10656 139 650 894
## 2 848 234 614 724 12 32 49
## 3 28182 25944 2238 81482 1050 4405 6255
## 4 27523 20612 6911 78081 1442 5896 7924
## 5 4116 3632 484 12727 216 849 1282
## 6 11833 3014 8819 12590 265 1082 1436
## 7 14680 7410 7270 24117 358 1615 2084
## 8 1398 314 1084 1059 17 46 76
## 9 15041 7885 7156 25159 365 1521 1929
## 10 3473 1679 1794 5902 118 508 671
## IBGE_10.14 Active_pop IBGE_60. IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION
## 1 1087 6896 990 10307 33085
## 2 63 479 89 5502 26195
## 3 7019 54749 8004 387 2595
## 4 8368 49197 5254 27137 89420
## 5 1404 7412 1564 4570 11442
## 6 1537 7281 989 41637 342851
## 7 2558 15123 2379 18505 38871
## 8 119 684 117 31149 119866
## 9 2422 15121 3801 17482 4244
## 10 710 3484 411 5807 31152
## IDHM.Ranking.2010 IDHM IDHM_Renda IDHM_Longevidade IDHM_Educacao ALT
## 1 2202 0.690 0.671 0.841 0.579 1017.55
## 2 2092 0.690 0.660 0.812 0.625 720.98
## 3 2477 0.679 0.625 0.791 0.632 27.06
## 4 2633 0.672 0.643 0.785 0.602 229.05
## 5 4613 0.580 0.560 0.723 0.487 183.93
## 6 5513 0.506 0.517 0.757 0.332 7.40
## 7 4125 0.601 0.554 0.758 0.517 17.29
## 8 2259 0.687 0.703 0.852 0.541 237.92
## 9 4279 0.595 0.563 0.724 0.517 312.96
## 10 4079 0.604 0.584 0.808 0.466 205.89
## PAY_TV FIXED_PHONES AREA REGIAO_TUR
## 1 227 720 1045.13 Região Turística Do Ouro E Cristais
## 2 109 260 237.16 Vale Do Contestado
## 3 1418 4661 126.19 Costa Náutica Coroa Do Avião
## 4 1225 2618 5806.44 <NA>
## 5 426 297 181.48 <NA>
## 6 964 181 4343.55 Araguaia-Tocantins
## 7 3032 479 845.47 Litoral Extremo Oeste
## 8 171 298 1551.34 Pampa Gaúcho
## 9 426 598 2265.35 <NA>
## 10 184 369 1807.95 <NA>
## CATEGORIA_TUR ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY
## 1 C 19614 Rural Adjacente 42.84 16728.30
## 2 D 2577 Rural Adjacente 24996.75 3578.87
## 3 D 99622 Urbano 7.80 384262.09
## 4 <NA> 111757 Urbano 159853.84 488799.22
## 5 <NA> 15129 Rural Adjacente 23176.61 7.02
## 6 D 55513 Rural Adjacente 441281.74 43934.34
## 7 C 62557 Rural Adjacente 74352.86 94123.87
## 8 D 4858 Urbano 130383.04 10632.62
## 9 <NA> 53931 Rural Adjacente 35057.66 17008.70
## 10 <NA> 15020 Rural Adjacente 91143.89 14.02
## GVA_SERVICES GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 1 138198.58 63396.20 261161.91 26822.58 287984.49 18427 15628.40
## 2 16011.10 17842.64 62429.36 2312.65 64742.01 2617 24739.02
## 3 526.04 336141.88 1254241.30 170264.52 1424505.83 98990 14390.40
## 4 779.84 364811.85 1793302.18 206244.13 1999546.31 110543 18088.40
## 5 33546.04 48142.87 111888.63 3269.82 115158.46 15764 7305.15
## 6 97080.41 188941.14 771237.62 17.18 788.42 54080 14578.70
## 7 187.46 188055.40 543990.18 36737.25 580727.43 61715 9409.83
## 8 66901.48 30193.24 238110.37 12221.86 250332.23 4731 52913.18
## 9 137661.95 155449.90 345.18 26.52 371701.24 53358 6966.18
## 10 32079.69 85106.34 222349.24 5261.66 227610.90 14120 16119.75
## GVA_MAIN
## 1 Demais serviços
## 2 Administração, defesa, educação e saúde públicas e seguridade social
## 3 Demais serviços
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Agricultura, inclusive apoio à agricultura e a pós colheita
## 7 Administração, defesa, educação e saúde públicas e seguridade social
## 8 Agricultura, inclusive apoio à agricultura e a pós colheita
## 9 Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 37513019 288 5 9 26 0 2 7 117
## 2 19506956 69 2 0 4 0 0 2 35
## 3 119645700 841 1 0 130 0 2 26 434
## 4 214456331 1334 47 1 113 0 5 75 657
## 5 27275310 96 2 0 4 0 0 1 57
## 6 106368816 162 8 3 6 0 0 4 99
## 7 101483437 638 41 1 38 8 0 6 363
## 8 22028721 168 8 0 8 0 1 2 86
## 9 85042995 365 2 0 26 0 0 6 255
## 10 22507579 107 2 0 15 0 0 0 56
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1 12 57 2 1 0 7 15 3 11 5 1
## 2 8 3 1 1 0 4 0 2 1 3 0
## 3 27 36 14 3 4 18 30 2 47 20 6
## 4 61 80 18 5 21 38 72 3 21 52 12
## 5 2 3 1 2 0 1 2 3 3 4 2
## 6 3 2 1 0 0 1 3 3 12 0 1
## 7 7 28 3 0 6 9 14 2 71 17 6
## 8 8 9 0 1 0 2 13 3 6 3 5
## 9 1 15 5 1 4 7 5 2 5 2 4
## 10 1 5 1 0 0 1 1 3 8 6 0
## COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1 8 0 0 1 34 1 1 1 1
## 2 3 0 0 NA NA 0 1 0 1
## 3 41 0 0 NA NA 2 3 2 3
## 4 53 0 0 2 56 2 3 2 3
## 5 9 0 0 NA NA 0 1 0 1
## 6 16 0 0 NA NA 1 1 1 1
## 7 18 0 0 NA NA 1 3 1 3
## 8 13 0 0 NA NA 0 1 0 1
## 9 25 0 0 1 22 1 3 1 3
## 10 8 0 0 1 27 0 1 0 1
## Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1 33724584 67091904 2838 1426 0 <NA> NA NA
## 2 0 42909056 976 345 2 <NA> NA NA
## 3 155632735 460626103 14579 10122 0 <NA> NA NA
## 4 125525251 1494221307 9935 24208 17 <NA> NA NA
## 5 0 50185684 834 1444 0 <NA> NA NA
## 6 22821995 37523391 652 3342 0 <NA> NA NA
## 7 57802114 529074069 3371 10448 0 <NA> NA NA
## 8 0 13450411 2046 591 5 <NA> NA NA
## 9 33077714 262320355 3158 11056 0 <NA> NA NA
## 10 0 86310524 1223 3343 0 <NA> NA NA
## POST_OFFICES LOG_GDP_CAPITA PAY_TV_p FIXED_PHONES_p Cars_p
## 1 3 9.656845 0.012318880 0.039073099 0.15401313
## 2 1 10.116137 0.041650745 0.099350401 0.37294612
## 3 1 9.574317 0.014324679 0.047085564 0.14727750
## 4 1 9.803026 0.011081661 0.023683092 0.08987453
## 5 1 8.896335 0.027023598 0.018840396 0.05290535
## 6 1 9.587317 0.017825444 0.003346893 0.01205621
## 7 1 9.149510 0.049129061 0.007761484 0.05462205
## 8 2 10.876408 0.036144578 0.062988797 0.43246671
## 9 10 8.848822 0.007983807 0.011207317 0.05918513
## 10 1 9.687801 0.013031161 0.026133144 0.08661473
## Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1 0.07738644 2.324849e-03 0.9078146199 7.499787269 2035.764
## 2 0.13183034 9.551681e+00 1.3675468093 6.118112342 7453.938
## 3 0.10225275 7.879584e-05 3.8818273563 0.005314072 1208.665
## 4 0.21899170 1.446078e+00 4.4218016518 0.007054630 1940.026
## 5 0.09160112 1.470224e+00 0.0004453184 2.128015732 1730.228
## 6 0.06179734 8.159795e+00 0.8123953402 1.795125925 1966.879
## 7 0.16929434 1.204778e+00 1.5251376489 0.003037511 1644.389
## 8 0.12492074 2.755930e+01 2.2474360600 14.141086451 4656.250
## 9 0.20720417 6.570272e-01 0.3187656959 2.579968327 1593.819
## 10 0.23675637 6.454950e+00 0.0009929178 2.271932720 1594.021
## pop_density tax_to_gdp MLR_RES geometry
## 1 17.631299 9.313897e-02 -0.001283972 POINT (-48.71881 -16.18267)
## 2 11.034744 3.572101e-02 -0.241843560 POINT (-51.02527 -27.60899)
## 3 784.452017 1.195253e-01 0.240172219 POINT (-34.89913 -7.904449)
## 4 19.037999 1.031455e-01 0.316636847 POINT (-47.50666 -4.951377)
## 5 86.863566 2.839409e-02 -0.164691960 POINT (-38.01829 -11.66261)
## 6 12.450645 2.179042e-02 0.400526860 POINT (-48.20046 -1.963437)
## 7 72.994902 6.326075e-02 0.093778846 POINT (-40.11824 -2.885311)
## 8 3.049622 4.882256e-02 -0.013513261 POINT (-54.16473 -31.86402)
## 9 23.553976 7.134762e-05 -0.207225765 POINT (-39.45571 -6.092762)
## 10 7.809950 2.311691e-02 0.364934669 POINT (-67.05232 -10.07379)
brazil.polygon.res.sf
## Simple feature collection with 2522 features and 95 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 3.605727
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni code_state abbrev_state CAPITAL IBGE_RES_POP
## 1 5200100 ABADIÂNIA 52 GO 0 15757
## 2 4200051 ABDON BATISTA 42 SC 0 2653
## 3 2600054 ABREU E LIMA 26 PE 0 94429
## 4 2100055 AÇAILÂNDIA 21 MA 0 104047
## 5 2900306 ACAJUTIBA 29 BA 0 14653
## 6 1500206 ACARÁ 15 PA 0 53569
## 7 2300200 ACARAÚ 23 CE 0 57551
## 8 4300034 ACEGUÁ 43 RS 0 4394
## 9 2300309 ACOPIARA 23 CE 0 51160
## 10 1200013 ACRELÂNDIA 12 AC 0 12538
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1 15609 148 4655 3233 1422
## 2 2653 0 848 234 614
## 3 94407 22 28182 25944 2238
## 4 104018 29 27523 20612 6911
## 5 14643 10 4116 3632 484
## 6 53516 53 11833 3014 8819
## 7 57542 9 14680 7410 7270
## 8 4265 129 1398 314 1084
## 9 51160 0 15041 7885 7156
## 10 12535 3 3473 1679 1794
## IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 Active_pop IBGE_60.
## 1 10656 139 650 894 1087 6896 990
## 2 724 12 32 49 63 479 89
## 3 81482 1050 4405 6255 7019 54749 8004
## 4 78081 1442 5896 7924 8368 49197 5254
## 5 12727 216 849 1282 1404 7412 1564
## 6 12590 265 1082 1436 1537 7281 989
## 7 24117 358 1615 2084 2558 15123 2379
## 8 1059 17 46 76 119 684 117
## 9 25159 365 1521 1929 2422 15121 3801
## 10 5902 118 508 671 710 3484 411
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM.Ranking.2010 IDHM IDHM_Renda
## 1 10307 33085 2202 0.690 0.671
## 2 5502 26195 2092 0.690 0.660
## 3 387 2595 2477 0.679 0.625
## 4 27137 89420 2633 0.672 0.643
## 5 4570 11442 4613 0.580 0.560
## 6 41637 342851 5513 0.506 0.517
## 7 18505 38871 4125 0.601 0.554
## 8 31149 119866 2259 0.687 0.703
## 9 17482 4244 4279 0.595 0.563
## 10 5807 31152 4079 0.604 0.584
## IDHM_Longevidade IDHM_Educacao LONG LAT ALT PAY_TV
## 1 0.841 0.579 -48.71881 -16.182672 1017.55 227
## 2 0.812 0.625 -51.02527 -27.608987 720.98 109
## 3 0.791 0.632 -34.89913 -7.904449 27.06 1418
## 4 0.785 0.602 -47.50666 -4.951377 229.05 1225
## 5 0.723 0.487 -38.01829 -11.662613 183.93 426
## 6 0.757 0.332 -48.20046 -1.963437 7.40 964
## 7 0.758 0.517 -40.11824 -2.885311 17.29 3032
## 8 0.852 0.541 -54.16473 -31.864015 237.92 171
## 9 0.724 0.517 -39.45571 -6.092762 312.96 426
## 10 0.808 0.466 -67.05232 -10.073794 205.89 184
## FIXED_PHONES AREA REGIAO_TUR CATEGORIA_TUR
## 1 720 1045.13 Região Turística Do Ouro E Cristais C
## 2 260 237.16 Vale Do Contestado D
## 3 4661 126.19 Costa Náutica Coroa Do Avião D
## 4 2618 5806.44 <NA> <NA>
## 5 297 181.48 <NA> <NA>
## 6 181 4343.55 Araguaia-Tocantins D
## 7 479 845.47 Litoral Extremo Oeste C
## 8 298 1551.34 Pampa Gaúcho D
## 9 598 2265.35 <NA> <NA>
## 10 369 1807.95 <NA> <NA>
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 1 19614 Rural Adjacente 42.84 16728.30 138198.58
## 2 2577 Rural Adjacente 24996.75 3578.87 16011.10
## 3 99622 Urbano 7.80 384262.09 526.04
## 4 111757 Urbano 159853.84 488799.22 779.84
## 5 15129 Rural Adjacente 23176.61 7.02 33546.04
## 6 55513 Rural Adjacente 441281.74 43934.34 97080.41
## 7 62557 Rural Adjacente 74352.86 94123.87 187.46
## 8 4858 Urbano 130383.04 10632.62 66901.48
## 9 53931 Rural Adjacente 35057.66 17008.70 137661.95
## 10 15020 Rural Adjacente 91143.89 14.02 32079.69
## GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 1 63396.20 261161.91 26822.58 287984.49 18427 15628.40
## 2 17842.64 62429.36 2312.65 64742.01 2617 24739.02
## 3 336141.88 1254241.30 170264.52 1424505.83 98990 14390.40
## 4 364811.85 1793302.18 206244.13 1999546.31 110543 18088.40
## 5 48142.87 111888.63 3269.82 115158.46 15764 7305.15
## 6 188941.14 771237.62 17.18 788.42 54080 14578.70
## 7 188055.40 543990.18 36737.25 580727.43 61715 9409.83
## 8 30193.24 238110.37 12221.86 250332.23 4731 52913.18
## 9 155449.90 345.18 26.52 371701.24 53358 6966.18
## 10 85106.34 222349.24 5261.66 227610.90 14120 16119.75
## GVA_MAIN
## 1 Demais serviços
## 2 Administração, defesa, educação e saúde públicas e seguridade social
## 3 Demais serviços
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Agricultura, inclusive apoio à agricultura e a pós colheita
## 7 Administração, defesa, educação e saúde públicas e seguridade social
## 8 Agricultura, inclusive apoio à agricultura e a pós colheita
## 9 Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 37513019 288 5 9 26 0 2 7 117
## 2 19506956 69 2 0 4 0 0 2 35
## 3 119645700 841 1 0 130 0 2 26 434
## 4 214456331 1334 47 1 113 0 5 75 657
## 5 27275310 96 2 0 4 0 0 1 57
## 6 106368816 162 8 3 6 0 0 4 99
## 7 101483437 638 41 1 38 8 0 6 363
## 8 22028721 168 8 0 8 0 1 2 86
## 9 85042995 365 2 0 26 0 0 6 255
## 10 22507579 107 2 0 15 0 0 0 56
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1 12 57 2 1 0 7 15 3 11 5 1
## 2 8 3 1 1 0 4 0 2 1 3 0
## 3 27 36 14 3 4 18 30 2 47 20 6
## 4 61 80 18 5 21 38 72 3 21 52 12
## 5 2 3 1 2 0 1 2 3 3 4 2
## 6 3 2 1 0 0 1 3 3 12 0 1
## 7 7 28 3 0 6 9 14 2 71 17 6
## 8 8 9 0 1 0 2 13 3 6 3 5
## 9 1 15 5 1 4 7 5 2 5 2 4
## 10 1 5 1 0 0 1 1 3 8 6 0
## COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1 8 0 0 1 34 1 1 1 1
## 2 3 0 0 NA NA 0 1 0 1
## 3 41 0 0 NA NA 2 3 2 3
## 4 53 0 0 2 56 2 3 2 3
## 5 9 0 0 NA NA 0 1 0 1
## 6 16 0 0 NA NA 1 1 1 1
## 7 18 0 0 NA NA 1 3 1 3
## 8 13 0 0 NA NA 0 1 0 1
## 9 25 0 0 1 22 1 3 1 3
## 10 8 0 0 1 27 0 1 0 1
## Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1 33724584 67091904 2838 1426 0 <NA> NA NA
## 2 0 42909056 976 345 2 <NA> NA NA
## 3 155632735 460626103 14579 10122 0 <NA> NA NA
## 4 125525251 1494221307 9935 24208 17 <NA> NA NA
## 5 0 50185684 834 1444 0 <NA> NA NA
## 6 22821995 37523391 652 3342 0 <NA> NA NA
## 7 57802114 529074069 3371 10448 0 <NA> NA NA
## 8 0 13450411 2046 591 5 <NA> NA NA
## 9 33077714 262320355 3158 11056 0 <NA> NA NA
## 10 0 86310524 1223 3343 0 <NA> NA NA
## POST_OFFICES LOG_GDP_CAPITA PAY_TV_p FIXED_PHONES_p Cars_p
## 1 3 9.656845 0.012318880 0.039073099 0.15401313
## 2 1 10.116137 0.041650745 0.099350401 0.37294612
## 3 1 9.574317 0.014324679 0.047085564 0.14727750
## 4 1 9.803026 0.011081661 0.023683092 0.08987453
## 5 1 8.896335 0.027023598 0.018840396 0.05290535
## 6 1 9.587317 0.017825444 0.003346893 0.01205621
## 7 1 9.149510 0.049129061 0.007761484 0.05462205
## 8 2 10.876408 0.036144578 0.062988797 0.43246671
## 9 10 8.848822 0.007983807 0.011207317 0.05918513
## 10 1 9.687801 0.013031161 0.026133144 0.08661473
## Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1 0.07738644 2.324849e-03 0.9078146199 7.499787269 2035.764
## 2 0.13183034 9.551681e+00 1.3675468093 6.118112342 7453.938
## 3 0.10225275 7.879584e-05 3.8818273563 0.005314072 1208.665
## 4 0.21899170 1.446078e+00 4.4218016518 0.007054630 1940.026
## 5 0.09160112 1.470224e+00 0.0004453184 2.128015732 1730.228
## 6 0.06179734 8.159795e+00 0.8123953402 1.795125925 1966.879
## 7 0.16929434 1.204778e+00 1.5251376489 0.003037511 1644.389
## 8 0.12492074 2.755930e+01 2.2474360600 14.141086451 4656.250
## 9 0.20720417 6.570272e-01 0.3187656959 2.579968327 1593.819
## 10 0.23675637 6.454950e+00 0.0009929178 2.271932720 1594.021
## pop_density tax_to_gdp MLR_RES geom
## 1 17.631299 9.313897e-02 -0.001283972 MULTIPOLYGON (((-48.84178 -...
## 2 11.034744 3.572101e-02 -0.241843560 MULTIPOLYGON (((-51.03724 -...
## 3 784.452017 1.195253e-01 0.240172219 POLYGON ((-35.10602 -7.8251...
## 4 19.037999 1.031455e-01 0.316636847 MULTIPOLYGON (((-47.00353 -...
## 5 86.863566 2.839409e-02 -0.164691960 MULTIPOLYGON (((-37.98092 -...
## 6 12.450645 2.179042e-02 0.400526860 MULTIPOLYGON (((-48.30974 -...
## 7 72.994902 6.326075e-02 0.093778846 MULTIPOLYGON (((-40.33112 -...
## 8 3.049622 4.882256e-02 -0.013513261 POLYGON ((-54.1094 -31.4331...
## 9 23.553976 7.134762e-05 -0.207225765 MULTIPOLYGON (((-39.15667 -...
## 10 7.809950 2.311691e-02 0.364934669 POLYGON ((-67.13424 -9.6762...
Next, we will convert the brazil.point.res.sf simple feature object into a SpatialPointsDataFrame by using as_Spatial().
brazil.point.sp <- as_Spatial(brazil.point.res.sf)
brazil.point.sp
## class : SpatialPointsDataFrame
## features : 2522
## extent : -72.9165, -34.82395, -33.68757, 2.816682 (xmin, xmax, ymin, ymax)
## Warning in proj4string(x): CRS object has comment, which is lost in output
## crs : +proj=longlat +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +no_defs
## variables : 91
## names : CITY, STATE, CAPITAL, IBGE_RES_POP, IBGE_RES_POP_BRAS, IBGE_RES_POP_ESTR, IBGE_DU, IBGE_DU_URBAN, IBGE_DU_RURAL, IBGE_POP, IBGE_1, IBGE_1.4, IBGE_5.9, IBGE_10.14, Active_pop, ...
## min values : ABADIÂNIA, AC, 0, 1641, 1641, 0, 595, 187, 3, 528, 2, 9, 30, 34, 358, ...
## max values : ZÉ DOCA, TO, 1, 11253503, 11133776, 119727, 3576148, 3548433, 33809, 10463636, 129464, 514794, 684443, 783702, 7058221, ...
brazil.polygon.res.sf <- st_make_valid(brazil.polygon.res.sf)
qtm(brazil.polygon.res.sf, "GDP_CAPITA", borders=NULL, scale=0.7)+
tm_legend(main.title="GDP_CAPITA",
main.title.position="centre")
brazil.polygon.res.sf <- st_make_valid(brazil.polygon.res.sf)
qtm(brazil.polygon.res.sf, "MLR_RES", borders=NULL,scale = 0.7) + tm_legend(
main.title = "Residuals",
main.title.position = "centre")
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
tm_shape(mun)+
tm_polygons()+
tm_shape(brazil.point.res.sf)+
tm_dots(col="MLR_RES", alpha=0.6, style="quantile")
## Variable(s) "MLR_RES" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
The figures above reveal signs of spatial autocorrelation.
We will now perform Moran’s I test to confirm this observation.
The hypotheses are as follows:
H0: The residuals of the regression model are randomly distributed
H1: The residuals of the regression model are not randomly distributed
Confidence level: 0.95
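To show what the Moran’s I statistic measures, here is a minimal base R sketch on a toy example of five locations along a line, each neighbouring the adjacent ones. The data and weights are hypothetical. It assumes the standard definition I = (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2), where z are the values centred on their mean and S0 is the sum of all weights.

```r
# Toy example (hypothetical): 5 locations on a line with a spatial trend
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Binary contiguity weights: each location neighbours the adjacent ones
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

z  <- x - mean(x)   # centred values
S0 <- sum(W)        # sum of all weights

moran_I    <- (n / S0) * sum(W * outer(z, z)) / sum(z^2)
expected_I <- -1 / (n - 1)   # expectation under spatial randomness

c(I = moran_I, expected = expected_I)
```

Here the statistic is 0.5 against an expected value of -0.25 under randomness, reflecting that similar values sit next to each other. The test used in the analysis compares the observed statistic with this expectation.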
The code chunk below will tell us the upper limit for the distance band.
coords <- coordinates(brazil.point.sp)
k <- knn2nb(knearneigh(coords))
kdists <- unlist(nbdists(k, coords, longlat=FALSE))
summary(kdists)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.005484 0.105464 0.161013 0.220505 0.252019 3.093780
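The output above shows that the largest nearest-neighbour distance is 3.093780. Because the coordinates are unprojected longitude/latitude and longlat=FALSE is used, this distance is in decimal degrees. Using a threshold just above this value (3.10) guarantees that every unit has at least one neighbour.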
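The reasoning behind this threshold can be checked on a small example. The sketch below uses base R only and made-up coordinates (not the Brazil data): it takes the largest nearest-neighbour distance as the threshold and counts how many neighbours each point has within it.

```r
# Toy coordinates (hypothetical, for illustration only)
set.seed(42)
xy <- cbind(x = runif(30), y = runif(30))

# Pairwise Euclidean distances; self-distances are excluded
d <- as.matrix(dist(xy))
diag(d) <- Inf

# Distance from each point to its nearest neighbour
nn_dist <- apply(d, 1, min)

# Use the largest nearest-neighbour distance as the distance band
threshold <- max(nn_dist)

# Number of neighbours each point has within the band
n_neighbours <- rowSums(d <= threshold)
min(n_neighbours)
```

The smallest neighbour count can never be zero, because every point's nearest neighbour lies within the band by construction.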
Compute the distance-based weight matrix
nb <- dnearneigh(coords, 0, 3.10, longlat=FALSE)
nb_lw <- nb2listw(nb, style='B')
summary(nb_lw)
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 2522
## Number of nonzero links: 640004
## Percentage nonzero weights: 10.06219
## Average number of links: 253.7684
## Link number distribution:
##
## 1 2 3 4 5 6 7 8 10 12 13 14 15 16 17 18 19 20 21 22
## 1 4 3 6 5 4 2 7 4 2 6 10 3 2 1 2 4 4 3 7
## 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
## 10 7 10 6 10 13 9 8 11 7 11 11 1 3 3 2 3 9 5 4
## 43 44 45 46 47 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
## 1 1 2 3 2 3 8 2 2 2 1 2 1 3 7 3 3 2 2 3
## 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
## 1 5 6 2 6 4 8 5 6 4 4 4 6 6 9 6 3 5 2 2
## 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## 6 6 5 4 3 8 4 5 6 3 13 3 10 6 7 4 2 9 6 9
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
## 6 8 4 10 8 5 6 4 8 6 8 5 4 3 10 6 5 4 3 5
## 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
## 4 3 1 5 4 7 4 1 8 5 8 5 7 5 4 1 4 7 10 3
## 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163
## 1 3 9 4 2 9 6 10 7 4 7 8 2 9 3 11 9 11 4 4
## 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183
## 7 7 3 5 7 7 1 3 9 7 4 1 4 1 4 6 3 8 7 2
## 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203
## 6 5 9 5 3 5 7 4 5 6 10 11 3 4 9 4 1 6 2 8
## 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223
## 5 5 2 5 5 2 4 5 7 9 3 7 2 5 7 2 8 6 10 5
## 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 241 242 243 244
## 3 8 3 2 3 6 6 6 4 7 4 5 2 4 6 4 4 6 1 5
## 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264
## 4 2 8 4 4 6 5 4 2 4 4 4 5 5 8 2 4 2 5 1
## 265 266 267 268 269 270 271 272 273 275 276 277 278 279 280 281 282 283 284 285
## 2 2 3 4 2 3 5 2 1 7 7 5 4 3 2 2 4 4 2 5
## 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305
## 6 4 4 5 6 6 7 4 8 3 7 4 6 8 6 2 9 3 6 6
## 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325
## 11 4 5 3 2 6 5 1 3 6 6 4 5 11 4 8 4 10 6 4
## 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345
## 4 7 6 3 4 4 6 3 6 4 5 4 4 2 8 6 7 6 5 7
## 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365
## 2 6 7 8 7 6 6 13 8 7 14 5 12 10 5 6 6 4 5 4
## 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385
## 5 4 3 7 6 4 7 4 10 3 5 9 2 9 13 7 5 3 5 2
## 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405
## 1 4 6 5 2 3 8 7 11 5 4 5 8 1 8 8 7 7 8 6
## 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425
## 3 6 6 5 6 2 7 5 5 5 6 4 6 5 3 2 5 3 6 5
## 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445
## 1 3 7 6 4 6 4 5 5 6 2 3 4 4 6 3 6 6 3 5
## 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465
## 1 3 1 5 7 3 9 7 4 3 4 8 2 3 6 3 3 3 2 9
## 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485
## 8 4 7 2 3 4 3 2 4 1 4 6 4 4 2 4 4 7 6 3
## 486 487 488 489 490 491 492 493 494 496 497 499 500 501 502 503 504 505 506 507
## 2 4 2 3 5 3 2 2 2 3 5 2 3 2 2 1 3 2 4 1
## 508 509 511 512 513 514 515 516 517 518 519 520 521 522 524 525 526 527 528
## 3 2 2 4 2 2 4 2 5 5 3 4 2 3 3 1 1 1 1
## 1 least connected region:
## 2119 with 1 link
## 1 most connected region:
## 28 with 528 links
##
## Weights style: B
## Weights constants summary:
## n nn S0 S1 S2
## B 2522 6360484 640004 1280008 859921536
Computing Global Moran’s I test
lm.morantest(brazil_lglm2, nb_lw)
##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil6)
## weights: nb_lw
##
## Moran I statistic standard deviate = 14.216, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 2.007820e-02 -8.180248e-04 2.160505e-06
Since the p-value (< 2.2e-16) is smaller than the alpha value (0.05), we reject the null hypothesis. There is sufficient evidence to conclude that the residuals are not randomly distributed, which agrees with the spatial pattern seen in the residual maps above.
Since the observed Moran I = 0.0200782, which is greater than 0, we can infer that the residuals show a clustered pattern (positive spatial autocorrelation), although the magnitude is small.
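As a check on how the test statistic relates to the sample estimates, the sketch below recomputes the standard deviate from the three values printed by lm.morantest() above. It assumes the usual normal approximation, where the deviate is the observed statistic minus its expectation, divided by the square root of the variance.

```r
# Values copied from the lm.morantest() output above
observed_I  <- 2.007820e-02
expectation <- -8.180248e-04
variance    <- 2.160505e-06

# Standard deviate: distance of the observed I from its expectation,
# measured in standard errors
z <- (observed_I - expectation) / sqrt(variance)

# One-sided p-value for the alternative hypothesis "greater"
p_value <- pnorm(z, lower.tail = FALSE)

round(z, 3)
p_value < 2.2e-16
```

The result should be close to the reported standard deviate of 14.216, which shows that the very small p-value comes from the tiny variance rather than from a large Moran's I.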
There are 2 different approaches, and 2 different kernels to use.
### 7.1 CV approach and Gaussian Kernel
There are two possible approaches that can be used: CV (cross-validation) and the AIC corrected (AICc) approach. There are two possible kernels that can be used: the Gaussian and the bi-square kernel. We will be testing out all 4 different methods.
##### 7.1.1 Computing Fixed Bandwidth GWR Model
In the code chunk below, we use bw.gwr() of the GWmodel package to determine the optimal fixed bandwidth to use in the model.
We will use gw.dist() to calculate the dMat value
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="gaussian", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 CV score: 178.8391
## Fixed bandwidth: 14.55489 CV score: 177.7677
## Fixed bandwidth: 8.998324 CV score: 176.9538
## Fixed bandwidth: 5.56418 CV score: 181.3198
## Fixed bandwidth: 11.12074 CV score: 177.1448
## Fixed bandwidth: 7.686598 CV score: 177.1898
## Fixed bandwidth: 9.809016 CV score: 176.979
## Fixed bandwidth: 8.497289 CV score: 176.9898
## Fixed bandwidth: 9.307981 CV score: 176.9533
## Fixed bandwidth: 9.499359 CV score: 176.9597
## Fixed bandwidth: 9.189703 CV score: 176.9518
## Fixed bandwidth: 9.116603 CV score: 176.9519
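The result shows that the recommended bandwidth is 9.189703, the value with the lowest CV score (176.9518). Because the coordinates are unprojected and longlat=FALSE is used, this distance is in decimal degrees, not metres.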
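To illustrate what the CV scores in the log above measure, here is a minimal base R sketch of leave-one-out cross-validation for a fixed Gaussian bandwidth. The data are simulated and the helper cv_score() is hypothetical; it is a simplified stand-in for the idea, not the GWmodel implementation.

```r
# Toy data (hypothetical): the slope of x drifts from west to east
set.seed(1)
n      <- 60
coords <- cbind(runif(n, 0, 10), runif(n, 0, 10))
x      <- rnorm(n)
y      <- 1 + (0.5 + 0.2 * coords[, 1]) * x + rnorm(n, sd = 0.3)
X      <- cbind(1, x)               # design matrix with an intercept column
d      <- as.matrix(dist(coords))   # Euclidean distances between points

# Leave-one-out CV score for one fixed bandwidth
cv_score <- function(bw) {
  pred <- numeric(n)
  for (i in seq_len(n)) {
    w    <- exp(-0.5 * (d[i, ] / bw)^2)  # Gaussian kernel weights around point i
    w[i] <- 0                            # exclude point i from its own local fit
    beta <- lm.wfit(X, y, w)$coefficients
    pred[i] <- sum(X[i, ] * beta)        # predict point i from its neighbours
  }
  sum((y - pred)^2)                      # sum of squared prediction errors
}

# Search a range of bandwidths for the smallest CV score
best <- optimize(cv_score, interval = c(1, 15))
best$minimum     # selected bandwidth
best$objective   # CV score at that bandwidth
```

A small bandwidth lets the local fits follow the drifting slope but makes them noisy, while a large one approaches the global regression. The CV score balances the two.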
Constructing the fixed bandwidth gwr model
gwr.fixed <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.fixed
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:14:14
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.fixed, kernel = "gaussian", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 9.189703
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 5.2353e+00 6.3630e+00 6.5508e+00 6.8287e+00 7.3867
## IBGE_CROP_PRODUCTION 5.3055e-08 2.9313e-07 3.1528e-07 3.3611e-07 0.0000
## IDHM_Longevidade 3.0786e-01 5.9856e-01 6.8837e-01 7.7004e-01 1.5507
## IDHM_Renda 2.0687e+00 2.6508e+00 3.1007e+00 3.7208e+00 4.3203
## GVA_AGROPEC_p 2.0202e-02 2.1351e-02 2.2135e-02 2.3136e-02 0.0295
## GVA_INDUSTRY_p 1.6383e-02 1.7275e-02 1.7884e-02 1.8100e-02 0.0247
## GVA_SERVICES_p 9.6425e-03 1.4560e-02 1.4730e-02 1.5160e-02 0.0187
## MUN_EXPENDIT_p 7.2041e-05 7.4460e-05 7.7082e-05 9.0463e-05 0.0001
## tax_to_gdp 5.8077e-04 7.5617e-04 7.7445e-04 8.9061e-04 0.0012
## Cars_p -2.8773e-01 3.3830e-01 6.6848e-01 7.9079e-01 0.9821
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 26.5562
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2495.444
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 320.7882
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 298.2736
## Residual sum of squares: 164.8792
## R-square value: 0.847111
## Adjusted R-square value: 0.8454833
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:14:25
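The adjusted R2 value of the GWR model is 0.8454833, slightly higher than the 0.8381 of the global regression. The p-value (< 2.2e-16) belongs to the F-test of the global regression.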
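The adjusted R2 can be recovered from the other diagnostics in the output, which helps to show what "adjusted" means for a GWR model. The sketch below assumes the usual adjustment formula, with the effective number of parameters taking the place of the parameter count. All inputs are copied from the gwr.fixed output above.

```r
# Values copied from the gwr.fixed diagnostic information above
n   <- 2522        # number of data points
enp <- 26.5562     # effective number of parameters
r2  <- 0.847111    # R-square value

# Penalise R-squared by the effective degrees of freedom (n - enp)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - enp - 1)
round(adj_r2, 5)
```

The result agrees with the reported 0.8454833 to within rounding of the inputs.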
We will now set adaptive = TRUE since we are calculating the adaptive bandwidth
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.adaptive <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach = "CV", kernel="gaussian", adaptive = TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 1566 CV score: 177.7881
## Adaptive bandwidth: 976 CV score: 177.3087
## Adaptive bandwidth: 610 CV score: 179.5891
## Adaptive bandwidth: 1201 CV score: 177.1208
## Adaptive bandwidth: 1341 CV score: 177.4033
## Adaptive bandwidth: 1115 CV score: 177.1015
## Adaptive bandwidth: 1061 CV score: 177.1428
## Adaptive bandwidth: 1147 CV score: 177.1051
## Adaptive bandwidth: 1093 CV score: 177.0983
## Adaptive bandwidth: 1082 CV score: 177.1206
## Adaptive bandwidth: 1102 CV score: 177.0993
## Adaptive bandwidth: 1089 CV score: 177.099
## Adaptive bandwidth: 1097 CV score: 177.1006
## Adaptive bandwidth: 1092 CV score: 177.0962
## Adaptive bandwidth: 1090 CV score: 177.0996
## Adaptive bandwidth: 1092 CV score: 177.0962
The result shows that 1092 is the recommended number of nearest neighbours to be used.
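An adaptive bandwidth is a count of nearest neighbours rather than a distance, so the kernel becomes narrow where points are dense and wide where they are sparse. The base R sketch below illustrates this with simulated coordinates. The choice of k = 10 is arbitrary, and whether the point itself counts towards k is an implementation detail that this sketch does not try to match.

```r
# Toy coordinates (hypothetical): a dense cluster plus a sparse scatter
set.seed(7)
coords <- rbind(
  cbind(rnorm(40, 0, 0.3), rnorm(40, 0, 0.3)),   # points 1-40: dense cluster
  cbind(runif(20, -5, 5), runif(20, -5, 5))      # points 41-60: sparse scatter
)
d <- as.matrix(dist(coords))
k <- 10

# Local bandwidth = distance to the k-th nearest neighbour
# (position 1 of the sorted row is the point itself, at distance 0)
local_bw <- apply(d, 1, function(row) sort(row)[k + 1])

summary(local_bw[1:40])    # dense area: short distances
summary(local_bw[41:60])   # sparse area: longer distances
```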
Constructing the adaptive bandwidth gwr model
gwr.adaptive <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.adaptive
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:16:11
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.adaptive, kernel = "gaussian", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 1092
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 6.4622e+00 6.4623e+00 6.4624e+00 6.4624e+00 6.4624
## IBGE_CROP_PRODUCTION 2.7576e-07 2.7579e-07 2.7579e-07 2.7579e-07 0.0000
## IDHM_Longevidade 6.6324e-01 6.6325e-01 6.6326e-01 6.6326e-01 0.6633
## IDHM_Renda 3.3624e+00 3.3624e+00 3.3624e+00 3.3625e+00 3.3626
## GVA_AGROPEC_p 2.2231e-02 2.2232e-02 2.2232e-02 2.2232e-02 0.0222
## GVA_INDUSTRY_p 1.7787e-02 1.7787e-02 1.7787e-02 1.7787e-02 0.0178
## GVA_SERVICES_p 1.4594e-02 1.4594e-02 1.4594e-02 1.4594e-02 0.0146
## MUN_EXPENDIT_p 8.1886e-05 8.1887e-05 8.1887e-05 8.1888e-05 0.0001
## tax_to_gdp 8.4337e-04 8.4338e-04 8.4339e-04 8.4340e-04 0.0008
## Cars_p 5.2077e-01 5.2082e-01 5.2085e-01 5.2086e-01 0.5209
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 10.00117
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.999
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 435.4654
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 423.3596
## Residual sum of squares: 173.9611
## R-square value: 0.8386896
## Adjusted R-square value: 0.8380471
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:16:22
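The adjusted R2 is 0.8380471, and the global regression p-value is < 2.2e-16. Note that gwr.basic() was called without adaptive=TRUE, so the output reports "Fixed bandwidth: 1092": the value was treated as a fixed distance rather than a number of neighbours. At that distance the coefficients barely vary, and the fit is almost identical to the global regression (adjusted R2 0.8381).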
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed2 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach = "AIC", kernel="gaussian", adaptive = FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 AICc value: 412.4282
## Fixed bandwidth: 14.55489 AICc value: 381.0282
## Fixed bandwidth: 8.998324 AICc value: 316.9127
## Fixed bandwidth: 5.56418 AICc value: 188.2232
## Fixed bandwidth: 3.441763 AICc value: 69.8468
## Fixed bandwidth: 2.130036 AICc value: -3.991837
## Fixed bandwidth: 1.319345 AICc value: -0.5756209
## Fixed bandwidth: 2.631071 AICc value: 23.68389
## Fixed bandwidth: 1.82038 AICc value: -16.35188
## Fixed bandwidth: 1.629001 AICc value: -18.72857
## Fixed bandwidth: 1.510723 AICc value: -16.39338
## Fixed bandwidth: 1.702101 AICc value: -18.5296
## Fixed bandwidth: 1.583823 AICc value: -18.28341
## Fixed bandwidth: 1.656923 AICc value: -18.77527
## Fixed bandwidth: 1.67418 AICc value: -18.72591
## Fixed bandwidth: 1.646258 AICc value: -18.77651
## Fixed bandwidth: 1.639667 AICc value: -18.76563
## Fixed bandwidth: 1.650332 AICc value: -18.77875
## Fixed bandwidth: 1.652849 AICc value: -18.77845
## Fixed bandwidth: 1.648776 AICc value: -18.77829
## Fixed bandwidth: 1.651293 AICc value: -18.77878
The result shows that the recommended bandwidth is 1.651293 (in decimal degrees, since the coordinates are unprojected).
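To make the AIC-based criterion more concrete, the sketch below reproduces the AIC and AICc that the model output reports for the global regression, using only its residual sum of squares. It assumes the standard Gaussian log-likelihood and counts the error variance as a parameter. The GWR version of AICc replaces the parameter count with a quantity derived from the hat matrix, which is not reproduced here.

```r
# Values copied from the "Results of Global Regression" block
n   <- 2522        # number of data points
rss <- 173.962     # residual sum of squares
k   <- 9 + 1 + 1   # 9 slopes + intercept + error variance

# -2 * log-likelihood at the maximum-likelihood variance estimate rss / n
neg2_loglik <- n * log(2 * pi) + n * log(rss / n) + n

aic  <- neg2_loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction

round(c(AIC = aic, AICc = aicc), 2)
```

Both values should agree with the reported AIC (435.3718) and AICc (435.477) to within rounding, so a lower AICc reflects a smaller residual sum of squares net of the penalty for model complexity.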
Constructing the fixed bandwidth gwr model
gwr.fixed2 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed2, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.fixed2
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:18:32
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.fixed2, kernel = "gaussian", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 1.651293
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 3.5215e+00 6.3136e+00 6.8398e+00 7.7959e+00 11.3203
## IBGE_CROP_PRODUCTION -7.5668e-07 2.8135e-07 3.8394e-07 5.6728e-07 0.0000
## IDHM_Longevidade -3.9407e+00 2.5051e-01 7.1351e-01 1.4550e+00 6.9458
## IDHM_Renda -1.7380e+00 1.3014e+00 2.1133e+00 3.0724e+00 5.3516
## GVA_AGROPEC_p -3.2826e-02 1.7117e-02 2.2537e-02 4.0983e-02 0.1108
## GVA_INDUSTRY_p -3.2572e-01 1.5245e-02 1.9853e-02 2.6880e-02 0.1789
## GVA_SERVICES_p -6.9099e-02 1.3607e-02 1.5936e-02 1.9998e-02 0.3289
## MUN_EXPENDIT_p -2.3390e-04 5.9268e-05 7.3281e-05 9.6047e-05 0.0004
## tax_to_gdp -1.4518e-02 2.9002e-04 7.5854e-04 1.0303e-03 0.0040
## Cars_p -1.4322e+00 5.3910e-01 9.4238e-01 1.5196e+00 8.6382
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 350.0251
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2171.975
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): -18.77878
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -349.4203
## Residual sum of squares: 115.7155
## R-square value: 0.8926995
## Adjusted R-square value: 0.8753995
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:18:42
The adjusted R2 is 0.8753995 and the AICc is -18.77878. The p-value (< 2.2e-16) belongs to the F-test of the global regression.
dmat <- gw.dist(dp.locat = coordinates(brazil.point.sp))
bw.adaptive2 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="gaussian", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 1566 AICc value: 375.3037
## Adaptive bandwidth (number of nearest neighbours): 976 AICc value: 334.4534
## Adaptive bandwidth (number of nearest neighbours): 610 AICc value: 296.5408
## Adaptive bandwidth (number of nearest neighbours): 385 AICc value: 234.011
## Adaptive bandwidth (number of nearest neighbours): 244 AICc value: 146.8711
## Adaptive bandwidth (number of nearest neighbours): 159 AICc value: 52.35883
## Adaptive bandwidth (number of nearest neighbours): 104 AICc value: -58.53556
## Adaptive bandwidth (number of nearest neighbours): 72 AICc value: -131.4551
## Adaptive bandwidth (number of nearest neighbours): 50 AICc value: -200.397
## Adaptive bandwidth (number of nearest neighbours): 39 AICc value: -237.422
## Adaptive bandwidth (number of nearest neighbours): 29 AICc value: -265.716
## Adaptive bandwidth (number of nearest neighbours): 26 AICc value: -266.6002
## Adaptive bandwidth (number of nearest neighbours): 21 AICc value: -275.1402
## Adaptive bandwidth (number of nearest neighbours): 21 AICc value: -275.1402
The result shows that 21 is the recommended number of nearest neighbours to be used.
Constructing the adaptive bandwidth gwr model
gwr.adaptive2 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive2, kernel="gaussian", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.adaptive2
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:20:50
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.adaptive2, kernel = "gaussian", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: gaussian
## Fixed bandwidth: 21
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 6.1794e+00 6.4128e+00 6.4903e+00 6.5236e+00 6.6368
## IBGE_CROP_PRODUCTION 2.2399e-07 2.7965e-07 2.8483e-07 2.8985e-07 0.0000
## IDHM_Longevidade 6.0373e-01 6.5560e-01 6.8403e-01 6.9853e-01 0.8447
## IDHM_Renda 3.0376e+00 3.2130e+00 3.2873e+00 3.4315e+00 3.6493
## GVA_AGROPEC_p 2.1810e-02 2.2031e-02 2.2160e-02 2.2375e-02 0.0228
## GVA_INDUSTRY_p 1.7480e-02 1.7705e-02 1.7791e-02 1.7958e-02 0.0183
## GVA_SERVICES_p 1.3994e-02 1.4559e-02 1.4602e-02 1.4631e-02 0.0147
## MUN_EXPENDIT_p 7.7340e-05 7.9584e-05 8.0709e-05 8.3413e-05 0.0001
## tax_to_gdp 7.8632e-04 8.0696e-04 8.2160e-04 8.5048e-04 0.0009
## Cars_p 3.1531e-01 4.8545e-01 5.6320e-01 5.9127e-01 0.6732
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 13.07303
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2508.927
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 407.0152
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 393.227
## Residual sum of squares: 171.7826
## R-square value: 0.8407097
## Adjusted R-square value: 0.8398794
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:21:01
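The adjusted R2 is 0.8398794, and the global regression p-value is < 2.2e-16. Note that gwr.basic() was called without adaptive=TRUE, so the output reports "Fixed bandwidth: 21": the value was treated as a fixed distance rather than a number of neighbours.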
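With four Gaussian-kernel models calibrated, it helps to place their diagnostics side by side. The sketch below collects the AICc and adjusted R2 values printed in the model outputs above into a data frame and sorts it by AICc. The column names are our own and the numbers are copied by hand, not extracted from the model objects.

```r
# Diagnostics copied from the four Gaussian-kernel model outputs
models <- data.frame(
  model       = c("gwr.fixed", "gwr.adaptive", "gwr.fixed2", "gwr.adaptive2"),
  approach    = c("CV", "CV", "AIC", "AIC"),
  bw_reported = c(9.189703, 1092, 1.651293, 21),   # as printed in each output
  aicc        = c(320.7882, 435.4654, -18.77878, 407.0152),
  adj_r2      = c(0.8454833, 0.8380471, 0.8753995, 0.8398794)
)

# Lower AICc is better; for reference, the global regression has
# AICc 435.477 and adjusted R-squared 0.8381
ranked <- models[order(models$aicc), ]
ranked
```

On both criteria gwr.fixed2 ranks first, while gwr.adaptive is almost indistinguishable from the global regression.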
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed3 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="bisquare", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 CV score: 177.2407
## Fixed bandwidth: 14.55489 CV score: 182.7619
## Fixed bandwidth: 29.10215 CV score: 177.4802
## Fixed bandwidth: 20.11145 CV score: 176.3287
## Fixed bandwidth: 17.98903 CV score: 176.0173
## Fixed bandwidth: 16.6773 CV score: 176.8538
## Fixed bandwidth: 18.79972 CV score: 175.9784
## Fixed bandwidth: 19.30076 CV score: 176.0674
## Fixed bandwidth: 18.49006 CV score: 175.9627
## Fixed bandwidth: 18.29869 CV score: 175.9709
## Fixed bandwidth: 18.60834 CV score: 175.9648
## Fixed bandwidth: 18.41696 CV score: 175.9641
## Fixed bandwidth: 18.53524 CV score: 175.9629
## Fixed bandwidth: 18.46214 CV score: 175.963
## Fixed bandwidth: 18.50732 CV score: 175.9627
The result shows that the recommended bandwidth is 18.50732 (in decimal degrees, since the coordinates are unprojected).
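The bi-square bandwidth is about twice the Gaussian one found by CV, which is easier to interpret once the two kernels are compared directly. The sketch below uses the textbook forms of both kernels, which we assume match the GWmodel definitions, and evaluates them at the bandwidths reported in the model outputs. The helper names gaussian_w() and bisquare_w() are our own.

```r
# Textbook kernel functions (assumed to match the package definitions)
gaussian_w <- function(d, bw) exp(-0.5 * (d / bw)^2)
bisquare_w <- function(d, bw) ifelse(d < bw, (1 - (d / bw)^2)^2, 0)

# Weights at a few distances, in the same units as the coordinates
d <- c(0, 5, 10, 15, 20)
weights <- data.frame(
  distance = d,
  gaussian = round(gaussian_w(d, bw = 9.189703), 3),
  bisquare = round(bisquare_w(d, bw = 18.50732), 3)
)
weights
```

The Gaussian kernel never reaches zero, so every municipality contributes a little to each local fit. The bi-square kernel gives exactly zero weight beyond its bandwidth, so it needs a larger bandwidth to use a comparable amount of data.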
Constructing the fixed bandwidth gwr model
gwr.fixed3 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed3, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.fixed3
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:21:48
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.fixed3, kernel = "bisquare", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Fixed bandwidth: 18.50732
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 4.8252e+00 6.3883e+00 6.5991e+00 7.0581e+00 7.7033
## IBGE_CROP_PRODUCTION -3.3349e-07 2.8214e-07 3.2965e-07 3.6047e-07 0.0000
## IDHM_Longevidade 5.5489e-02 5.3515e-01 6.4530e-01 7.8001e-01 2.2054
## IDHM_Renda 1.6200e+00 2.4049e+00 2.9986e+00 3.5334e+00 4.3333
## GVA_AGROPEC_p 1.8017e-02 2.0991e-02 2.2101e-02 2.4050e-02 0.0524
## GVA_INDUSTRY_p 1.0660e-02 1.7002e-02 1.7903e-02 1.8188e-02 0.0524
## GVA_SERVICES_p -4.8527e-04 1.4595e-02 1.4910e-02 1.5538e-02 0.0316
## MUN_EXPENDIT_p 4.1144e-05 7.3532e-05 7.5580e-05 9.5306e-05 0.0002
## tax_to_gdp 2.1798e-04 7.5500e-04 7.7579e-04 9.2807e-04 0.0016
## Cars_p -5.7055e-01 4.8067e-01 7.4510e-01 8.9274e-01 2.6361
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 34.21613
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2487.784
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 262.811
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 233.6465
## Residual sum of squares: 160.3009
## R-square value: 0.8513564
## Adjusted R-square value: 0.8493112
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:21:57
The adjusted R2 of the GWR model is 0.8493112, compared with 0.8381 for the global regression, whose p-value is < 2.2e-16.
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.adaptive3 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="CV", kernel="bisquare", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth: 1566 CV score: 183.7409
## Adaptive bandwidth: 976 CV score: 204.2585
## Adaptive bandwidth: 1932 CV score: 176.7985
## Adaptive bandwidth: 2157 CV score: 176.5077
## Adaptive bandwidth: 2297 CV score: 176.7875
## Adaptive bandwidth: 2071 CV score: 176.5366
## Adaptive bandwidth: 2210 CV score: 176.601
## Adaptive bandwidth: 2123 CV score: 176.4937
## Adaptive bandwidth: 2103 CV score: 176.5012
## Adaptive bandwidth: 2136 CV score: 176.4752
## Adaptive bandwidth: 2143 CV score: 176.4837
## Adaptive bandwidth: 2130 CV score: 176.4768
## Adaptive bandwidth: 2138 CV score: 176.4798
## Adaptive bandwidth: 2133 CV score: 176.4711
## Adaptive bandwidth: 2133 CV score: 176.4711
The result shows that the recommended adaptive bandwidth is 2133 nearest neighbours.
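An adaptive bandwidth is a number of neighbours rather than a distance, so the effective distance differs from one regression point to the next. The sketch below illustrates the idea on randomly generated points. The coordinates, the neighbour count `k`, and the choice to count the point itself as the first neighbour are all assumptions made for the example, not details taken from GWmodel.

```r
# Adaptive bandwidth: each regression point uses the distance to its
# k-th nearest neighbour as its own local bandwidth.
set.seed(1)
pts <- cbind(x = runif(50), y = runif(50))   # hypothetical point coordinates
dist_mat <- as.matrix(dist(pts))             # pairwise Euclidean distances

k <- 10                                      # hypothetical neighbour count
# Sort each row of distances; position 1 is the point itself (distance 0).
local_bw <- apply(dist_mat, 1, function(d) sort(d)[k])
summary(local_bw)
```

Points in dense areas end up with a short local bandwidth and points in sparse areas with a long one, which is the motivation for using an adaptive kernel when observations are unevenly spread.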
Constructing the adaptive bandwidth gwr model
gwr.adaptive3 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.adaptive3, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.adaptive3
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:23:22
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.adaptive3, kernel = "bisquare", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Fixed bandwidth: 2133
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 6.4622e+00 6.4623e+00 6.4624e+00 6.4624e+00 6.4624
## IBGE_CROP_PRODUCTION 2.7576e-07 2.7579e-07 2.7579e-07 2.7579e-07 0.0000
## IDHM_Longevidade 6.6324e-01 6.6325e-01 6.6326e-01 6.6327e-01 0.6633
## IDHM_Renda 3.3623e+00 3.3624e+00 3.3624e+00 3.3625e+00 3.3626
## GVA_AGROPEC_p 2.2231e-02 2.2232e-02 2.2232e-02 2.2232e-02 0.0222
## GVA_INDUSTRY_p 1.7787e-02 1.7787e-02 1.7787e-02 1.7787e-02 0.0178
## GVA_SERVICES_p 1.4594e-02 1.4594e-02 1.4594e-02 1.4594e-02 0.0146
## MUN_EXPENDIT_p 8.1885e-05 8.1887e-05 8.1887e-05 8.1888e-05 0.0001
## tax_to_gdp 8.4337e-04 8.4338e-04 8.4339e-04 8.4340e-04 0.0008
## Cars_p 5.2076e-01 5.2082e-01 5.2086e-01 5.2087e-01 0.5209
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 10.00123
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.999
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 435.4649
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 423.359
## Residual sum of squares: 173.9611
## R-square value: 0.8386896
## Adjusted R-square value: 0.8380471
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:23:31
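The adjusted R2 is 0.8380471, and the p-value of the global regression is < 2.2e-16. Note that the output reports a fixed bandwidth of 2133: gwr.basic() was called without adaptive=TRUE, so the neighbour count was interpreted as a distance, and the local coefficients are almost identical to the global estimates.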
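The header of the model above reads "Fixed bandwidth: 2133". A rough sketch of what that implies for the kernel weights follows. It assumes the point coordinates are in decimal degrees and spans the study area using the bounding box printed for the municipality layer later in this document; `bisquare_weight` is an illustrative helper, not a GWmodel function.

```r
# Illustrative bi-square kernel (not a GWmodel function)
bisquare_weight <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

# Assumed extent of the study area in coordinate units (decimal degrees),
# taken from the bounding box of the municipality layer.
x_range <- (-28.83594) - (-73.99045)
y_range <- 3.605727 - (-33.75118)
max_dist <- sqrt(x_range^2 + y_range^2)

# Smallest weight any observation can receive when 2133 is a fixed distance
min_weight <- bisquare_weight(max_dist, 2133)
min_weight
```

Under these assumptions even the most distant pair of points receives a weight above 0.99, so every local regression is close to an unweighted one. This is consistent with the coefficient summary above, where the minimum and maximum of each local coefficient nearly coincide with the global estimate.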
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
bw.fixed4 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="bisquare", adaptive=FALSE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Fixed bandwidth: 23.54559 AICc value: 337.9902
## Fixed bandwidth: 14.55489 AICc value: 194.3918
## Fixed bandwidth: 8.998324 AICc value: 96.12548
## Fixed bandwidth: 5.56418 AICc value: 11773.8
## Fixed bandwidth: 11.12074 AICc value: 126.2947
## Fixed bandwidth: 7.686598 AICc value: 79.04865
## Fixed bandwidth: 6.875907 AICc value: 62.14869
## Fixed bandwidth: 6.374872 AICc value: 2233.271
## Fixed bandwidth: 7.185563 AICc value: 69.59574
## Fixed bandwidth: 6.684528 AICc value: 25167.85
## Fixed bandwidth: 6.994185 AICc value: 65.04558
## Fixed bandwidth: 6.802807 AICc value: 4120.464
## Fixed bandwidth: 6.921085 AICc value: 63.21221
## Fixed bandwidth: 6.847985 AICc value: 61.4571
## Fixed bandwidth: 6.830728 AICc value: 130.1504
## Fixed bandwidth: 6.85865 AICc value: 61.72205
## Fixed bandwidth: 6.841393 AICc value: 61.29374
## Fixed bandwidth: 6.83732 AICc value: 2014.162
## Fixed bandwidth: 6.843911 AICc value: 61.3561
## Fixed bandwidth: 6.839837 AICc value: 61.25522
## Fixed bandwidth: 6.838876 AICc value: 61.23142
## Fixed bandwidth: 6.838281 AICc value: 61.21671
## Fixed bandwidth: 6.837914 AICc value: 61.20763
## Fixed bandwidth: 6.837687 AICc value: 61.20209
## Fixed bandwidth: 6.837547 AICc value: 61.19847
## Fixed bandwidth: 6.83746 AICc value: 63.5259
## Fixed bandwidth: 6.8376 AICc value: 61.19983
## Fixed bandwidth: 6.837514 AICc value: 61.20102
The result shows that the recommended fixed bandwidth is 6.837547 (in the units of the point coordinates), the value with the lowest AICc (61.19847).
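As a sanity check on the AICc values reported in the model outputs, the figure for the global regression can be reproduced by hand from its residual sum of squares. The sketch assumes the corrected AIC of Fotheringham et al. (2002, eq. 2.33), which the GWmodel output cites, with sigma-hat taken as sqrt(RSS / n) and the trace of the hat matrix equal to the 10 estimated coefficients of the global model. The helper name `gwr_aicc` is illustrative.

```r
# AICc = 2 n log(sigma_hat) + n log(2 pi) + n (n + tr(S)) / (n - 2 - tr(S))
# `gwr_aicc` is an illustrative helper, not a GWmodel function.
gwr_aicc <- function(rss, n, tr_s) {
  sigma_hat <- sqrt(rss / n)
  2 * n * log(sigma_hat) + n * log(2 * pi) + n * (n + tr_s) / (n - 2 - tr_s)
}

# Global regression: RSS and n from the printed diagnostics,
# tr(S) = 10 (intercept plus nine explanatory variables).
aicc_global <- gwr_aicc(rss = 173.962, n = 2522, tr_s = 10)
aicc_global
```

The result should be close to the AICc of 435.477 printed for the global regression. For a GWR model, tr(S) grows as the bandwidth shrinks, so the last term penalises very small bandwidths even when they reduce the residual sum of squares.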
Constructing the fixed bandwidth gwr model
gwr.fixed4 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=bw.fixed4, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.fixed4
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:26:03
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = bw.fixed4, kernel = "bisquare", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Fixed bandwidth: 6.837547
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 3.7228e+00 6.3174e+00 6.7982e+00 7.6754e+00 10.5289
## IBGE_CROP_PRODUCTION -3.0273e-07 2.9595e-07 3.5234e-07 4.6128e-07 0.0000
## IDHM_Longevidade -3.4846e+00 2.7581e-01 7.3632e-01 1.2301e+00 4.7401
## IDHM_Renda -9.3779e-01 1.5373e+00 2.2102e+00 3.0612e+00 5.0615
## GVA_AGROPEC_p -7.6501e-02 1.8035e-02 2.3185e-02 3.4198e-02 0.0932
## GVA_INDUSTRY_p -6.2565e-01 1.5045e-02 1.9287e-02 2.0152e-02 0.3404
## GVA_SERVICES_p -6.4762e-02 1.4122e-02 1.6024e-02 1.8955e-02 1.0405
## MUN_EXPENDIT_p -1.5178e-04 6.3667e-05 7.6561e-05 9.2970e-05 0.0004
## tax_to_gdp -7.6089e-03 5.5813e-04 7.9482e-04 1.0216e-03 0.2455
## Cars_p -1.1388e+01 5.4224e-01 1.0497e+00 1.3031e+00 6.2200
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 163.8131
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2358.187
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 61.19893
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): -84.40912
## Residual sum of squares: 135.6657
## R-square value: 0.8742001
## Adjusted R-square value: 0.8654576
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:26:13
The adjusted R2 is 0.8654576, and the p-value of the global regression is < 2.2e-16.
dmat <- gw.dist(dp.locat=coordinates(brazil.point.sp))
gw.adaptive4 <- bw.gwr(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, approach="AIC", kernel="bisquare", adaptive=TRUE, longlat=FALSE, dMat=dmat)
## Take a cup of tea and have a break, it will take a few minutes.
## -----A kind suggestion from GWmodel development group
## Adaptive bandwidth (number of nearest neighbours): 1566 AICc value: 212.9455
## Adaptive bandwidth (number of nearest neighbours): 976 AICc value: 126.6319
## Adaptive bandwidth (number of nearest neighbours): 610 AICc value: 25.90335
## Adaptive bandwidth (number of nearest neighbours): 385 AICc value: -68.74051
## Adaptive bandwidth (number of nearest neighbours): 244 AICc value: -169.9868
## Adaptive bandwidth (number of nearest neighbours): 159 AICc value: -198.9041
## Adaptive bandwidth (number of nearest neighbours): 104 AICc value: -154.3044
## Adaptive bandwidth (number of nearest neighbours): 190 AICc value: -191.1021
## Adaptive bandwidth (number of nearest neighbours): 136 AICc value: -198.2876
## Adaptive bandwidth (number of nearest neighbours): 169 AICc value: -198.4536
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602
## Adaptive bandwidth (number of nearest neighbours): 154 AICc value: -199.307
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162
## Adaptive bandwidth (number of nearest neighbours): 152 AICc value: -199.4733
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602
## Adaptive bandwidth (number of nearest neighbours): 146 AICc value: -199.3602
## Adaptive bandwidth (number of nearest neighbours): 145 AICc value: -199.6369
## Adaptive bandwidth (number of nearest neighbours): 151 AICc value: -200.2273
## Adaptive bandwidth (number of nearest neighbours): 144 AICc value: -199.8091
## Adaptive bandwidth (number of nearest neighbours): 144 AICc value: -199.8091
## Adaptive bandwidth (number of nearest neighbours): 143 AICc value: -200.5664
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836
## Adaptive bandwidth (number of nearest neighbours): 150 AICc value: -199.3836
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162
## Adaptive bandwidth (number of nearest neighbours): 149 AICc value: -198.6162
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 148 AICc value: -199.379
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643
## Adaptive bandwidth (number of nearest neighbours): 147 AICc value: -199.2643
The result shows that the recommended adaptive bandwidth is 143 nearest neighbours, the value with the lowest AICc (-200.5664) and the one passed to the model below.
Constructing the adaptive bandwidth gwr model
gwr.adaptive4 <- gwr.basic(log(GDP_CAPITA)~ IBGE_CROP_PRODUCTION + IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p + GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data=brazil.point.sp, bw=gw.adaptive4, kernel="bisquare", longlat=FALSE)
## Warning in proj4string(data): CRS object has comment, which is lost in output
Display the model output
gwr.adaptive4
## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2020-05-31 17:31:03
## Call:
## gwr.basic(formula = log(GDP_CAPITA) ~ IBGE_CROP_PRODUCTION +
## IDHM_Longevidade + IDHM_Renda + GVA_AGROPEC_p + GVA_INDUSTRY_p +
## GVA_SERVICES_p + MUN_EXPENDIT_p + tax_to_gdp + Cars_p, data = brazil.point.sp,
## bw = gw.adaptive4, kernel = "bisquare", longlat = FALSE)
##
## Dependent (y) variable: GDP_CAPITA
## Independent variables: IBGE_CROP_PRODUCTION IDHM_Longevidade IDHM_Renda GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p tax_to_gdp Cars_p
## Number of data points: 2522
## ***********************************************************************
## * Results of Global Regression *
## ***********************************************************************
##
## Call:
## lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.41555 -0.15453 -0.02065 0.13162 2.01229
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.462e+00 1.582e-01 40.848 < 2e-16 ***
## IBGE_CROP_PRODUCTION 2.758e-07 3.201e-08 8.616 < 2e-16 ***
## IDHM_Longevidade 6.633e-01 2.357e-01 2.813 0.00494 **
## IDHM_Renda 3.362e+00 1.736e-01 19.370 < 2e-16 ***
## GVA_AGROPEC_p 2.223e-02 1.065e-03 20.869 < 2e-16 ***
## GVA_INDUSTRY_p 1.779e-02 6.141e-04 28.963 < 2e-16 ***
## GVA_SERVICES_p 1.459e-02 8.136e-04 17.938 < 2e-16 ***
## MUN_EXPENDIT_p 8.189e-05 6.189e-06 13.231 < 2e-16 ***
## tax_to_gdp 8.434e-04 1.711e-04 4.928 8.84e-07 ***
## Cars_p 5.208e-01 8.022e-02 6.492 1.01e-10 ***
##
## ---Significance stars
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 0.2632 on 2512 degrees of freedom
## Multiple R-squared: 0.8387
## Adjusted R-squared: 0.8381
## F-statistic: 1451 on 9 and 2512 DF, p-value: < 2.2e-16
## ***Extra Diagnostic information
## Residual sum of squares: 173.962
## Sigma(hat): 0.2627404
## AIC: 435.3718
## AICc: 435.477
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: bisquare
## Fixed bandwidth: 143
## Regression points: the same locations as observations are used.
## Distance metric: Euclidean distance metric is used.
##
## ****************Summary of GWR coefficient estimates:******************
## Min. 1st Qu. Median 3rd Qu. Max.
## Intercept 6.4390e+00 6.4571e+00 6.4649e+00 6.4677e+00 6.4766
## IBGE_CROP_PRODUCTION 2.7127e-07 2.7610e-07 2.7660e-07 2.7706e-07 0.0000
## IDHM_Longevidade 6.5988e-01 6.6303e-01 6.6512e-01 6.6655e-01 0.6805
## IDHM_Renda 3.3347e+00 3.3497e+00 3.3556e+00 3.3682e+00 3.3874
## GVA_AGROPEC_p 2.2190e-02 2.2211e-02 2.2223e-02 2.2243e-02 0.0223
## GVA_INDUSTRY_p 1.7755e-02 1.7777e-02 1.7785e-02 1.7804e-02 0.0178
## GVA_SERVICES_p 1.4546e-02 1.4588e-02 1.4592e-02 1.4595e-02 0.0146
## MUN_EXPENDIT_p 8.1389e-05 8.1666e-05 8.1783e-05 8.2020e-05 0.0001
## tax_to_gdp 8.3676e-04 8.3967e-04 8.4126e-04 8.4386e-04 0.0008
## Cars_p 5.0383e-01 5.1791e-01 5.2486e-01 5.2732e-01 0.5350
## ************************Diagnostic information*************************
## Number of data points: 2522
## Effective number of parameters (2trace(S) - trace(S'S)): 10.27537
## Effective degrees of freedom (n-2trace(S) + trace(S'S)): 2511.725
## AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 432.771
## AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 420.5244
## Residual sum of squares: 173.7561
## R-square value: 0.8388797
## Adjusted R-square value: 0.8382203
##
## ***********************************************************************
## Program stops at: 2020-05-31 17:31:12
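The adjusted R2 is 0.8382203, and the p-value of the global regression is < 2.2e-16. As with the previous adaptive model, the output reports a fixed bandwidth (here 143) because gwr.basic() was called without adaptive=TRUE, which explains why the local coefficients vary so little around the global estimates.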
We will now compare the adjusted R2 values of the four calibration approaches (eight models in total, since each approach has a fixed and an adaptive version) before deciding which is the most suitable.
We compare the models on adjusted R2: the higher the adjusted R2, the better the fit.
Adjusted R2 is used instead of R2 because it penalises the number of parameters in the model, so a model is not favoured merely for being more flexible.
A higher adjusted R2 means that a larger share of the variation in log(GDP_CAPITA) is explained by the regression model.
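As a quick check of how adjusted R2 relates to R2, the sketch below applies the standard ordinary least squares adjustment to the global regression reported in the outputs above (R2 = 0.8387, 2522 observations, 9 explanatory variables). The helper name `adj_r2` is illustrative, and the GWR models replace the number of predictors with an effective number of parameters, whose exact treatment in GWmodel is not shown here.

```r
# Adjusted R2 for a linear model with n observations and p predictors
# (`adj_r2` is an illustrative helper).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

global_adj_r2 <- adj_r2(r2 = 0.8387, n = 2522, p = 9)
round(global_adj_r2, 4)
```

The value should agree with the adjusted R2 of 0.8381 printed for the global regression. With 2522 observations and only 9 predictors the penalty is small, which is why R2 and adjusted R2 differ only in the fourth decimal place.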
CV approach and Gaussian kernel:
Fixed: 0.8454853
Adaptive: 0.8380471
AIC approach and Gaussian kernel:
Fixed: 0.8753995
Adaptive: 0.8398794
CV approach and bi-square kernel:
Fixed: 0.8493112
Adaptive: 0.8380471
AIC approach and bi-square kernel:
Fixed: 0.8654576
Adaptive: 0.8382203
From the adjusted R2 values above, we can conclude that the fixed bandwidth model calibrated with the AIC approach and Gaussian kernel has the highest adjusted R2, 0.8753995. Hence, this model is used.
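The comparison can also be tabulated so that the best model is picked programmatically rather than by eye. The sketch below uses the adjusted R2 values listed above; the data frame and column names are illustrative, and the last pair of values is assumed to belong to the AIC approach with the bi-square kernel, since they match the gwr.fixed4 and gwr.adaptive4 outputs.

```r
# Adjusted R2 of the eight calibrated GWR models (values from the outputs above)
gwr_compare <- data.frame(
  approach  = rep(c("CV", "AIC", "CV", "AIC"), each = 2),
  kernel    = rep(c("gaussian", "gaussian", "bisquare", "bisquare"), each = 2),
  bandwidth = rep(c("fixed", "adaptive"), times = 4),
  adj_r2    = c(0.8454853, 0.8380471, 0.8753995, 0.8398794,
                0.8493112, 0.8380471, 0.8654576, 0.8382203)
)

# Rank the models from best to worst and extract the top one
gwr_ranked <- gwr_compare[order(-gwr_compare$adj_r2), ]
best <- gwr_ranked[1, ]
best
```

Ranking the table this way also makes it easy to see that both AIC-calibrated fixed bandwidth models sit above their CV counterparts.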
In addition to regression residuals, the output feature class table includes fields for observed and predicted y values, condition number, local R2, residuals, and explanatory variable coefficients and standard errors.
We will now visualise the GWR output of the fixed bandwidth model calibrated with the AIC approach and Gaussian kernel, as identified above.
To visualise the SDF, we first need to convert it into an sf data.frame.
brazil.sf.fixed2 <- st_as_sf(gwr.fixed2$SDF) %>%
st_transform(crs=4674)
gwr.fixed2.output <- as.data.frame(gwr.fixed2$SDF)
brazil.sf.fixed2 <- cbind(brazil.polygon.res.sf, as.matrix(gwr.fixed2.output))
brazil.sf.fixed2
## Simple feature collection with 2522 features and 133 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: -73.99045 ymin: -33.75118 xmax: -28.83594 ymax: 3.605727
## geographic CRS: SIRGAS 2000
## First 10 features:
## code_muni name_muni code_state abbrev_state CAPITAL IBGE_RES_POP
## 1 5200100 ABADIÂNIA 52 GO 0 15757
## 2 4200051 ABDON BATISTA 42 SC 0 2653
## 3 2600054 ABREU E LIMA 26 PE 0 94429
## 4 2100055 AÇAILÂNDIA 21 MA 0 104047
## 5 2900306 ACAJUTIBA 29 BA 0 14653
## 6 1500206 ACARÁ 15 PA 0 53569
## 7 2300200 ACARAÚ 23 CE 0 57551
## 8 4300034 ACEGUÁ 43 RS 0 4394
## 9 2300309 ACOPIARA 23 CE 0 51160
## 10 1200013 ACRELÂNDIA 12 AC 0 12538
## IBGE_RES_POP_BRAS IBGE_RES_POP_ESTR IBGE_DU IBGE_DU_URBAN IBGE_DU_RURAL
## 1 15609 148 4655 3233 1422
## 2 2653 0 848 234 614
## 3 94407 22 28182 25944 2238
## 4 104018 29 27523 20612 6911
## 5 14643 10 4116 3632 484
## 6 53516 53 11833 3014 8819
## 7 57542 9 14680 7410 7270
## 8 4265 129 1398 314 1084
## 9 51160 0 15041 7885 7156
## 10 12535 3 3473 1679 1794
## IBGE_POP IBGE_1 IBGE_1.4 IBGE_5.9 IBGE_10.14 Active_pop IBGE_60.
## 1 10656 139 650 894 1087 6896 990
## 2 724 12 32 49 63 479 89
## 3 81482 1050 4405 6255 7019 54749 8004
## 4 78081 1442 5896 7924 8368 49197 5254
## 5 12727 216 849 1282 1404 7412 1564
## 6 12590 265 1082 1436 1537 7281 989
## 7 24117 358 1615 2084 2558 15123 2379
## 8 1059 17 46 76 119 684 117
## 9 25159 365 1521 1929 2422 15121 3801
## 10 5902 118 508 671 710 3484 411
## IBGE_PLANTED_AREA IBGE_CROP_PRODUCTION IDHM.Ranking.2010 IDHM IDHM_Renda
## 1 10307 33085 2202 0.690 0.671
## 2 5502 26195 2092 0.690 0.660
## 3 387 2595 2477 0.679 0.625
## 4 27137 89420 2633 0.672 0.643
## 5 4570 11442 4613 0.580 0.560
## 6 41637 342851 5513 0.506 0.517
## 7 18505 38871 4125 0.601 0.554
## 8 31149 119866 2259 0.687 0.703
## 9 17482 4244 4279 0.595 0.563
## 10 5807 31152 4079 0.604 0.584
## IDHM_Longevidade IDHM_Educacao LONG LAT ALT PAY_TV
## 1 0.841 0.579 -48.71881 -16.182672 1017.55 227
## 2 0.812 0.625 -51.02527 -27.608987 720.98 109
## 3 0.791 0.632 -34.89913 -7.904449 27.06 1418
## 4 0.785 0.602 -47.50666 -4.951377 229.05 1225
## 5 0.723 0.487 -38.01829 -11.662613 183.93 426
## 6 0.757 0.332 -48.20046 -1.963437 7.40 964
## 7 0.758 0.517 -40.11824 -2.885311 17.29 3032
## 8 0.852 0.541 -54.16473 -31.864015 237.92 171
## 9 0.724 0.517 -39.45571 -6.092762 312.96 426
## 10 0.808 0.466 -67.05232 -10.073794 205.89 184
## FIXED_PHONES AREA REGIAO_TUR CATEGORIA_TUR
## 1 720 1045.13 Região Turística Do Ouro E Cristais C
## 2 260 237.16 Vale Do Contestado D
## 3 4661 126.19 Costa Náutica Coroa Do Avião D
## 4 2618 5806.44 <NA> <NA>
## 5 297 181.48 <NA> <NA>
## 6 181 4343.55 Araguaia-Tocantins D
## 7 479 845.47 Litoral Extremo Oeste C
## 8 298 1551.34 Pampa Gaúcho D
## 9 598 2265.35 <NA> <NA>
## 10 369 1807.95 <NA> <NA>
## ESTIMATED_POP RURAL_URBAN GVA_AGROPEC GVA_INDUSTRY GVA_SERVICES
## 1 19614 Rural Adjacente 42.84 16728.30 138198.58
## 2 2577 Rural Adjacente 24996.75 3578.87 16011.10
## 3 99622 Urbano 7.80 384262.09 526.04
## 4 111757 Urbano 159853.84 488799.22 779.84
## 5 15129 Rural Adjacente 23176.61 7.02 33546.04
## 6 55513 Rural Adjacente 441281.74 43934.34 97080.41
## 7 62557 Rural Adjacente 74352.86 94123.87 187.46
## 8 4858 Urbano 130383.04 10632.62 66901.48
## 9 53931 Rural Adjacente 35057.66 17008.70 137661.95
## 10 15020 Rural Adjacente 91143.89 14.02 32079.69
## GVA_PUBLIC GVA_TOTAL TAXES GDP POP_GDP GDP_CAPITA
## 1 63396.20 261161.91 26822.58 287984.49 18427 15628.40
## 2 17842.64 62429.36 2312.65 64742.01 2617 24739.02
## 3 336141.88 1254241.30 170264.52 1424505.83 98990 14390.40
## 4 364811.85 1793302.18 206244.13 1999546.31 110543 18088.40
## 5 48142.87 111888.63 3269.82 115158.46 15764 7305.15
## 6 188941.14 771237.62 17.18 788.42 54080 14578.70
## 7 188055.40 543990.18 36737.25 580727.43 61715 9409.83
## 8 30193.24 238110.37 12221.86 250332.23 4731 52913.18
## 9 155449.90 345.18 26.52 371701.24 53358 6966.18
## 10 85106.34 222349.24 5261.66 227610.90 14120 16119.75
## GVA_MAIN
## 1 Demais serviços
## 2 Administração, defesa, educação e saúde públicas e seguridade social
## 3 Demais serviços
## 4 Demais serviços
## 5 Administração, defesa, educação e saúde públicas e seguridade social
## 6 Agricultura, inclusive apoio à agricultura e a pós colheita
## 7 Administração, defesa, educação e saúde públicas e seguridade social
## 8 Agricultura, inclusive apoio à agricultura e a pós colheita
## 9 Administração, defesa, educação e saúde públicas e seguridade social
## 10 Administração, defesa, educação e saúde públicas e seguridade social
## MUN_EXPENDIT COMP_TOT COMP_A COMP_B COMP_C COMP_D COMP_E COMP_F COMP_G
## 1 37513019 288 5 9 26 0 2 7 117
## 2 19506956 69 2 0 4 0 0 2 35
## 3 119645700 841 1 0 130 0 2 26 434
## 4 214456331 1334 47 1 113 0 5 75 657
## 5 27275310 96 2 0 4 0 0 1 57
## 6 106368816 162 8 3 6 0 0 4 99
## 7 101483437 638 41 1 38 8 0 6 363
## 8 22028721 168 8 0 8 0 1 2 86
## 9 85042995 365 2 0 26 0 0 6 255
## 10 22507579 107 2 0 15 0 0 0 56
## COMP_H COMP_I COMP_J COMP_K COMP_L COMP_M COMP_N COMP_O COMP_P COMP_Q COMP_R
## 1 12 57 2 1 0 7 15 3 11 5 1
## 2 8 3 1 1 0 4 0 2 1 3 0
## 3 27 36 14 3 4 18 30 2 47 20 6
## 4 61 80 18 5 21 38 72 3 21 52 12
## 5 2 3 1 2 0 1 2 3 3 4 2
## 6 3 2 1 0 0 1 3 3 12 0 1
## 7 7 28 3 0 6 9 14 2 71 17 6
## 8 8 9 0 1 0 2 13 3 6 3 5
## 9 1 15 5 1 4 7 5 2 5 2 4
## 10 1 5 1 0 0 1 1 3 8 6 0
## COMP_S COMP_T COMP_U HOTELS BEDS Pr_Agencies Pu_Agencies Pr_Bank Pu_Bank
## 1 8 0 0 1 34 1 1 1 1
## 2 3 0 0 NA NA 0 1 0 1
## 3 41 0 0 NA NA 2 3 2 3
## 4 53 0 0 2 56 2 3 2 3
## 5 9 0 0 NA NA 0 1 0 1
## 6 16 0 0 NA NA 1 1 1 1
## 7 18 0 0 NA NA 1 3 1 3
## 8 13 0 0 NA NA 0 1 0 1
## 9 25 0 0 1 22 1 3 1 3
## 10 8 0 0 1 27 0 1 0 1
## Pr_Assets Pu_Assets Cars Motorcycles Wheeled_tractor UBER MAC WAL.MART
## 1 33724584 67091904 2838 1426 0 <NA> NA NA
## 2 0 42909056 976 345 2 <NA> NA NA
## 3 155632735 460626103 14579 10122 0 <NA> NA NA
## 4 125525251 1494221307 9935 24208 17 <NA> NA NA
## 5 0 50185684 834 1444 0 <NA> NA NA
## 6 22821995 37523391 652 3342 0 <NA> NA NA
## 7 57802114 529074069 3371 10448 0 <NA> NA NA
## 8 0 13450411 2046 591 5 <NA> NA NA
## 9 33077714 262320355 3158 11056 0 <NA> NA NA
## 10 0 86310524 1223 3343 0 <NA> NA NA
## POST_OFFICES LOG_GDP_CAPITA PAY_TV_p FIXED_PHONES_p Cars_p
## 1 3 9.656845 0.012318880 0.039073099 0.15401313
## 2 1 10.116137 0.041650745 0.099350401 0.37294612
## 3 1 9.574317 0.014324679 0.047085564 0.14727750
## 4 1 9.803026 0.011081661 0.023683092 0.08987453
## 5 1 8.896335 0.027023598 0.018840396 0.05290535
## 6 1 9.587317 0.017825444 0.003346893 0.01205621
## 7 1 9.149510 0.049129061 0.007761484 0.05462205
## 8 2 10.876408 0.036144578 0.062988797 0.43246671
## 9 10 8.848822 0.007983807 0.011207317 0.05918513
## 10 1 9.687801 0.013031161 0.026133144 0.08661473
## Motorcycles_p GVA_AGROPEC_p GVA_INDUSTRY_p GVA_SERVICES_p MUN_EXPENDIT_p
## 1 0.07738644 2.324849e-03 0.9078146199 7.499787269 2035.764
## 2 0.13183034 9.551681e+00 1.3675468093 6.118112342 7453.938
## 3 0.10225275 7.879584e-05 3.8818273563 0.005314072 1208.665
## 4 0.21899170 1.446078e+00 4.4218016518 0.007054630 1940.026
## 5 0.09160112 1.470224e+00 0.0004453184 2.128015732 1730.228
## 6 0.06179734 8.159795e+00 0.8123953402 1.795125925 1966.879
## 7 0.16929434 1.204778e+00 1.5251376489 0.003037511 1644.389
## 8 0.12492074 2.755930e+01 2.2474360600 14.141086451 4656.250
## 9 0.20720417 6.570272e-01 0.3187656959 2.579968327 1593.819
## 10 0.23675637 6.454950e+00 0.0009929178 2.271932720 1594.021
## pop_density tax_to_gdp MLR_RES Intercept IBGE_CROP_PRODUCTION.1
## 1 17.631299 9.313897e-02 -0.001283972 6.597287 2.777367e-07
## 2 11.034744 3.572101e-02 -0.241843560 8.189936 5.672827e-07
## 3 784.452017 1.195253e-01 0.240172219 7.690896 1.498522e-06
## 4 19.037999 1.031455e-01 0.316636847 5.621880 3.475785e-07
## 5 86.863566 2.839409e-02 -0.164691960 6.731945 1.842714e-07
## 6 12.450645 2.179042e-02 0.400526860 6.920214 1.615765e-07
## 7 72.994902 6.326075e-02 0.093778846 7.936980 4.870924e-07
## 8 3.049622 4.882256e-02 -0.013513261 7.357133 2.706744e-07
## 9 23.553976 7.134762e-05 -0.207225765 7.075304 7.461556e-08
## 10 7.809950 2.311691e-02 0.364934669 7.660411 -2.618270e-08
## IDHM_Longevidade.1 IDHM_Renda.1 GVA_AGROPEC_p.1 GVA_INDUSTRY_p.1
## 1 1.4813248 1.8415302 0.02156504 0.015148346
## 2 0.0847707 1.3069467 0.01358269 0.014479816
## 3 0.4576879 1.3033866 0.07639569 0.028447246
## 4 1.5009399 3.3379677 0.04059035 0.027162696
## 5 1.4044373 1.1242071 0.07743475 0.005213864
## 6 1.5240248 0.8009765 0.07633172 0.049969254
## 7 -0.7909009 2.4032606 0.05842963 0.055760725
## 8 0.5034629 2.1229538 0.02265421 0.014055985
## 9 -0.1253044 2.9617273 0.08345629 0.032905428
## 10 0.0753320 2.5994594 0.03769048 -0.035619436
## GVA_SERVICES_p.1 MUN_EXPENDIT_p.1 tax_to_gdp.1 Cars_p.1 y
## 1 0.026897387 1.327321e-04 9.011492e-05 0.6328407 9.656845
## 2 0.015768268 7.854375e-05 3.379579e-04 1.3767096 10.116137
## 3 0.048717274 -1.457816e-05 1.200115e-03 1.6374547 9.574317
## 4 -0.004672260 1.894414e-04 6.255445e-04 1.4015512 9.803026
## 5 0.028399896 1.337906e-04 1.000011e-04 4.5140828 8.896335
## 6 0.030518674 9.615383e-05 3.918865e-04 4.9286169 9.587317
## 7 0.006838951 3.386892e-05 -1.358367e-03 3.5756980 9.149510
## 8 0.014682190 7.910365e-05 1.008933e-03 0.9234775 10.876408
## 9 0.023164711 3.474733e-05 -8.985955e-06 2.7743166 8.848822
## 10 -0.010880621 -9.682922e-05 -1.104611e-03 4.5244044 9.687801
## yhat residual CV_Score Stud_residual Intercept_SE
## 1 9.671149 -0.014303751 0 -0.06485682 1.0504153
## 2 10.481137 -0.364999808 0 -1.73304301 0.4033128
## 3 9.205809 0.368507968 0 1.76656698 0.6033006
## 4 9.649833 0.153192724 0 0.93719792 0.9127804
## 5 9.023612 -0.127276741 0 -0.56613874 0.5272937
## 6 9.510185 0.077131549 0 0.60697619 1.3335524
## 7 9.094195 0.055314680 0 0.25579239 1.1426845
## 8 10.942259 -0.065851575 0 -0.35832177 0.7946915
## 9 8.997019 -0.148196388 0 -0.67582022 0.6705061
## 10 9.694589 -0.006788715 0 -0.17136088 2.9725849
## IBGE_CROP_PRODUCTION_SE IDHM_Longevidade_SE IDHM_Renda_SE GVA_AGROPEC_p_SE
## 1 8.484024e-08 1.3177188 0.8873868 0.003504811
## 2 1.290444e-07 0.5151458 0.4163829 0.002020650
## 3 7.413602e-07 0.7879453 0.8727743 0.014356430
## 4 4.614686e-07 1.2285379 0.8864023 0.012913531
## 5 1.946111e-07 0.6984230 0.7654763 0.012859999
## 6 4.642615e-07 1.6630944 1.2057719 0.019326965
## 7 1.100752e-06 1.4288003 1.2359445 0.031319249
## 8 1.675201e-07 1.0194275 0.6575403 0.002914116
## 9 1.870839e-07 0.8438609 0.8137637 0.017063834
## 10 1.072791e-06 3.7547695 3.5833143 0.041169811
## GVA_INDUSTRY_p_SE GVA_SERVICES_p_SE MUN_EXPENDIT_p_SE tax_to_gdp_SE
## 1 0.0022550210 0.004879588 2.750713e-05 0.0008638681
## 2 0.0009756658 0.001779603 1.323539e-05 0.0002918066
## 3 0.0049360525 0.008842004 3.746259e-05 0.0006001835
## 4 0.0071949655 0.018413722 3.756291e-05 0.0011783000
## 5 0.0027032225 0.007730917 2.916379e-05 0.0007335302
## 6 0.0138239499 0.021059938 7.987995e-05 0.0014168033
## 7 0.0100904776 0.012943162 7.238931e-05 0.0012485628
## 8 0.0015549421 0.003430252 2.436199e-05 0.0004721408
## 9 0.0055450193 0.008943831 4.619489e-05 0.0009341655
## 10 0.1126677386 0.039796305 3.679609e-04 0.0031127056
## Cars_p_SE Intercept_TV IBGE_CROP_PRODUCTION_TV IDHM_Longevidade_TV
## 1 0.4335130 6.280646 3.27364338 1.12415854
## 2 0.2218850 20.306662 4.39602790 0.16455671
## 3 0.9374201 12.748034 2.02131471 0.58086250
## 4 1.4909104 6.159072 0.75320076 1.22172860
## 5 0.9069956 12.766973 0.94686971 2.01086928
## 6 2.2354432 5.189308 0.34802910 0.91637903
## 7 1.4761085 6.945907 0.44250867 -0.55354192
## 8 0.3458147 9.257847 1.61577302 0.49386826
## 9 0.8596388 10.552184 0.39883464 -0.14848937
## 10 3.8876976 2.577020 -0.02440614 0.02006302
## IDHM_Renda_TV GVA_AGROPEC_p_TV GVA_INDUSTRY_p_TV GVA_SERVICES_p_TV
## 1 2.0752283 6.1529837 6.7176073 5.5122252
## 2 3.1388098 6.7219413 14.8409592 8.8605529
## 3 1.4933832 5.3213568 5.7631572 5.5097551
## 4 3.7657480 3.1432423 3.7752365 -0.2537379
## 5 1.4686374 6.0213648 1.9287586 3.6735482
## 6 0.6642852 3.9494932 3.6146872 1.4491341
## 7 1.9444729 1.8656141 5.5260739 0.5283833
## 8 3.2286294 7.7739552 9.0395554 4.2802074
## 9 3.6395420 4.8908286 5.9342314 2.5900211
## 10 0.7254344 0.9154882 -0.3161458 -0.2734078
## MUN_EXPENDIT_p_TV tax_to_gdp_TV Cars_p_TV Local_R2 coords.x1 coords.x2
## 1 4.8253691 0.104315591 1.459796 0.8409292 -48.71881 -16.182672
## 2 5.9343752 1.158157225 6.204609 0.8586014 -51.02527 -27.608987
## 3 -0.3891390 1.999580526 1.746767 0.9438067 -34.89913 -7.904449
## 4 5.0433106 0.530887256 0.940064 0.9505465 -47.50666 -4.951377
## 5 4.5875581 0.136328509 4.976962 0.8851918 -38.01829 -11.662613
## 6 1.2037292 0.276599065 2.204761 0.9745541 -48.20046 -1.963437
## 7 0.4678718 -1.087944044 2.422382 0.9492447 -40.11824 -2.885311
## 8 3.2470118 2.136933421 2.670440 0.8619977 -54.16473 -31.864015
## 9 0.7521900 -0.009619232 3.227305 0.9705163 -39.45571 -6.092762
## 10 -0.2631508 -0.354871591 1.163775 0.9980819 -67.05232 -10.073794
## geom
## 1 MULTIPOLYGON (((-48.84178 -...
## 2 MULTIPOLYGON (((-51.03724 -...
## 3 POLYGON ((-35.10602 -7.8251...
## 4 MULTIPOLYGON (((-47.00353 -...
## 5 MULTIPOLYGON (((-37.98092 -...
## 6 MULTIPOLYGON (((-48.30974 -...
## 7 MULTIPOLYGON (((-40.33112 -...
## 8 POLYGON ((-54.1094 -31.4331...
## 9 MULTIPOLYGON (((-39.15667 -...
## 10 POLYGON ((-67.13424 -9.6762...
summary(gwr.fixed2$SDF$yhat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.302 9.481 9.989 9.923 10.309 14.606
The maximum fitted value (yhat) is 14.606.
Remove the code_muni column from the brazil.sf.fixed2 data frame.
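The y column in the output above is identical to LOG_GDP_CAPITA, so the fitted values are on a log scale. The sketch below converts the summary back to R$, under the assumption that LOG_GDP_CAPITA is a natural logarithm. The object names are illustrative and the numbers are copied from the summary output.

```r
# Summary statistics of yhat, copied from the output above (log scale).
yhat_summary <- c(min = 8.302, q1 = 9.481, median = 9.989,
                  mean = 9.923, q3 = 10.309, max = 14.606)

# Assumption: LOG_GDP_CAPITA was created with log(), i.e. the natural logarithm.
# exp() then maps the fitted values back to GDP per capita in R$.
gdp_capita_hat <- exp(yhat_summary)
print(round(gdp_capita_hat))
```

Note that exp() applied to the mean of the log values gives a geometric mean, not the arithmetic mean of the predicted GDP per capita.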
brazil.sf.fixed2 <- subset(brazil.sf.fixed2, select= -code_muni)
The code chunk below plots a choropleth map to visualise the local R2.
qtm(brazil.sf.fixed2, "Local_R2", border=NULL)
The local R2 values range from about 0.75 to 1.00, which is relatively high. The choropleth shows a large area of Brazil in darker shades, indicating higher local R2 values. This suggests that the local models fit the data well across most of the country.
The upper parts of Brazil are shaded darker, which suggests that the independent variables account for more of the variation in GDP_CAPITA there than elsewhere.
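To make the meaning of R2 concrete, the sketch below computes an ordinary, unweighted R2 from the y and yhat values of the ten municipalities printed above. This is only an illustration: Local_R2 in the GWR output is computed with spatial kernel weights around each municipality, so the value here is not expected to match any Local_R2 entry.

```r
# y and yhat for the ten municipalities printed above.
y    <- c(9.656845, 10.116137, 9.574317, 9.803026, 8.896335,
          9.587317, 9.149510, 10.876408, 8.848822, 9.687801)
yhat <- c(9.671149, 10.481137, 9.205809, 9.649833, 9.023612,
          9.510185, 9.094195, 10.942259, 8.997019, 9.694589)

# Unweighted R2 = 1 - SSE / SST over these ten rows only.
sse <- sum((y - yhat)^2)      # unexplained variation
sst <- sum((y - mean(y))^2)   # total variation around the mean
r2_ten_rows <- 1 - sse / sst
print(r2_ten_rows)
```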
The code chunk below plots a choropleth map to visualise the intercept of the regression model.
qtm(brazil.sf.fixed2, "Intercept", border=NULL)
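The local intercept varies from about 2 to 12 across municipalities. The intercept is the predicted value of the dependent variable when all independent variables are zero. A positive intercept does not by itself tell us the sign of the slopes: row 10 of the output above, for example, has a positive intercept (7.66) but negative local coefficients for GVA_INDUSTRY_p and GVA_SERVICES_p. The direction of each relationship has to be read from the individual coefficient estimates.
The residual is the difference between the observed GDP_CAPITA (y) and the predicted GDP_CAPITA (yhat). The residual is 0 if there is no difference between the observed and predicted values.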
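The sign of the intercept and the signs of the slope coefficients are separate questions. As a quick check, the sketch below takes the local coefficient estimates of row 10 from the output above (the vector name is illustrative) and lists the slopes that are negative even though the intercept is positive.

```r
# Local coefficient estimates for row 10, copied from the output above.
coefs_row10 <- c(Intercept              = 7.660411,
                 IBGE_CROP_PRODUCTION.1 = -2.618270e-08,
                 IDHM_Longevidade.1     = 0.0753320,
                 IDHM_Renda.1           = 2.5994594,
                 GVA_AGROPEC_p.1        = 0.03769048,
                 GVA_INDUSTRY_p.1       = -0.035619436,
                 GVA_SERVICES_p.1       = -0.010880621,
                 MUN_EXPENDIT_p.1       = -9.682922e-05,
                 tax_to_gdp.1           = -1.104611e-03,
                 Cars_p.1               = 4.5244044)

# Drop the intercept and keep the names of the negative slopes.
slopes <- coefs_row10[names(coefs_row10) != "Intercept"]
negative_slopes <- names(slopes)[slopes < 0]
print(negative_slopes)
```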
qtm(brazil.sf.fixed2,"residual", border=NULL)
## Variable(s) "residual" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
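The definition of the residual can be checked against the ten rows printed above. The sketch below recomputes y minus yhat and compares it with the residual column; the tolerance is an assumption chosen to allow for the rounding of the printed values.

```r
# y, yhat and residual for the ten municipalities printed above.
y        <- c(9.656845, 10.116137, 9.574317, 9.803026, 8.896335,
              9.587317, 9.149510, 10.876408, 8.848822, 9.687801)
yhat     <- c(9.671149, 10.481137, 9.205809, 9.649833, 9.023612,
              9.510185, 9.094195, 10.942259, 8.997019, 9.694589)
residual <- c(-0.014303751, -0.364999808, 0.368507968, 0.153192724,
              -0.127276741, 0.077131549, 0.055314680, -0.065851575,
              -0.148196388, -0.006788715)

# Residual = observed minus predicted; compare with the printed column.
recomputed <- y - yhat
max_abs_diff <- max(abs(recomputed - residual))
print(max_abs_diff < 1e-5)  # TRUE if the printed residuals equal y - yhat up to rounding
```

A negative residual means the model over-predicts for that municipality, and a positive residual means it under-predicts.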
The code chunk below allows us to visualise the fitted values (yhat) of the regression model.
yhat is the value of the dependent variable that the fitted regression model predicts for each municipality. Comparing the yhat map with the observed values shows where the predicted and observed GDP_CAPITA differ.
qtm(brazil.sf.fixed2, "yhat", border=NULL)
Overall, our regression model performs well in explaining the factors affecting GDP_CAPITA in Brazil.